HIERARCHICAL EXTENSIONS OF BAYESIAN PARAMETRIC MODELS FOR WHOLE GENOME PREDICTION

By

Wenzhao Yang

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Animal Science – Doctor of Philosophy
Quantitative Biology – Doctor of Philosophy

2014

ABSTRACT

HIERARCHICAL EXTENSIONS OF BAYESIAN PARAMETRIC MODELS FOR WHOLE GENOME PREDICTION

By

Wenzhao Yang

Whole genome prediction (WGP) is increasingly used to predict breeding values (BV) of plants and animals based on single nucleotide polymorphism (SNP) marker panels. Two particularly popular WGP models, labeled BayesA and BayesB, specify all SNP-associated effects to be independent of each other. In this dissertation, we further extend these two models to allow for greater flexibility to infer upon BV and SNP effects in three different frameworks: 1) allowing for correlated SNP effects, 2) reaction norm modeling of genotype by environment interaction (G×E), and 3) bivariate WGP models. We complement these efforts by focusing on strategies to infer upon key hyperparameters that anchor some of these specifications.

Based on a first-order nonstationary antedependence specification, we extended BayesA and BayesB to account for spatial correlation between SNP effects due to proximal quantitative trait loci (QTL); we label the corresponding extensions ante-BayesA and ante-BayesB, respectively. Using simulation studies and application to the publicly available heterogeneous stock mice data and other provided benchmark data, we determined that the antedependence models had significantly higher WGP accuracies compared to their conventional counterparts, especially at higher levels of linkage disequilibrium (LD).

Subsequently, we extended reaction norm (RN) and random regression (RR) models to account for G×E. Several specifications on the SNP-specific variance-covariance matrices (VCV) of intercept and slope effects were considered using independent inverted Wishart (IW) prior densities (IW-BayesA, IW-BayesB and IW-BayesC). Two potentially more flexible RR/RN models using the square root free Cholesky decomposition (CD) were proposed (CD-BayesA and CD-BayesB). Based on a RN simulation study and a RR data analysis in pigs, RR/RN WGP models provided greater WGP accuracies compared to conventional WGP models, although differences were not substantial between the competing IW- versus CD-based methods except with simpler genetic architectures (i.e., low numbers of QTL).

We also developed bivariate WGP models based on much the same specifications for SNP-specific VCV as in the RR/RN models (i.e., IW-BayesA, CD-BayesA and CD-BayesB), comparing them to the more conventional bivariate genomic BLUP (bGBLUP) model. Using an LD simulation study, the three bivariate trait models generally demonstrated higher WGP accuracy than univariate BayesA or BayesB when the number of pleiotropic QTL was relatively large and the heritability of the trait was low. Furthermore, in an application to data from pine trees, CD-BayesB exhibited higher predictive ability compared to other competing models.

Comparisons between competing WGP models require appropriate tuning of key hyperparameters. Hence, we also studied three alternative Metropolis-Hastings (MH) sampling strategies to infer upon key hyperparameters in BayesA and BayesB.
In both simulation studies and an application to the heterogeneous stock mice data, strategies that relied more heavily on Metropolis-Hastings sampling of key hyperparameters demonstrated significantly greater computational efficiency than strategies that deferred to Gibbs sampling.

ACKNOWLEDGEMENTS

I would like to thank my advisor, Dr. Robert Tempelman, for taking me as his student and guiding me throughout my PhD research project with encouragement and patience. Dr. Robert Tempelman not only offered me a great opportunity to work with him but also helped me develop critical thinking. I greatly appreciate the guidance and mentorship he provided to me over the years. I want to thank my guidance committee members for their constructive and thought-provoking suggestions on my research work. Dr. Juan Steibel, who was my second reader, provided important data resources for my research project. Dr. Cathy Ernst helped me understand genetics and guided me through my dual major. Dr. Yuehua Cui inspired me to learn quantitative genetics, and Dr. Qing Lu gave me great input from an epigenetics perspective. Furthermore, I would like to thank the United States Department of Agriculture (USDA), the Department of Animal Science and the Quantitative Biology program for sponsoring my research project. I am also grateful to my colleagues Nora Bello, Chunyu Chen, Heng Wang, Lei Zhou, Igseo Choi, Yvonne Badke, Jose Luis Gualdron and Pablo Reeb for their feedback and friendship. Last but not least, I would like to express my thanks to my husband and my parents for their unconditional love and support.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
Chapter 1 Introduction
Chapter 2 A Bayesian antedependence model for whole genome prediction
  2.1 Background
  2.2 Materials and Methods
    2.2.1 Conventional WGP model
    2.2.2 Antedependence extensions of WGP models
    2.2.3 Simulation study
    2.2.4 Application to Heterogeneous Stock Mice Dataset
    2.2.5 Application to Simulated Genomic Data from Hickey and Gorjanc
    2.2.6 Bayesian inference
    2.2.7 Prior specifications
  2.3 Results
    2.3.1 Simulation Study
    2.3.2 Application to Heterogeneous Stock Mice data
    2.3.3 Application to Hickey and Gorjanc Data
  2.4 Discussion and Conclusion
Chapter 3 Improving the computational efficiency of fully Bayes inference and assessing the effect of misspecification on hyperparameters in whole genome prediction model
  3.1 Introduction
  3.2 Materials and Methods
    3.2.1 WGP Model
    3.2.2 Univariate Metropolis Hastings sampling on ν and Gibbs update on s² (DFMH)
    3.2.3 Univariate Metropolis Hastings sampling for each of ν and s² (UNIMH)
    3.2.4 Bivariate Metropolis Hastings sampling on ν and s² (BIVMH)
    3.2.5 Simulation Study
    3.2.6 Data Application: Assessment of computational efficiency comparisons
  3.3 Results
    3.3.1 Simulation Study
    3.3.2 Application to Heterogeneous Stock Mice data
  3.4 Discussion
  3.5 Conclusions
Chapter 4 Random regression and reaction norm extensions of whole genome prediction models to account for genotype by environment interaction
  4.1 Introduction
  4.2 Materials and Methods
    4.2.1 Random regression and reaction norm models
    4.2.2 Conventional BayesA and BayesB (BayesA\BayesB)
    4.2.3 Bivariate Normality (IW-BayesC)
    4.2.4 Bivariate Student t and Variable Selection (IW-BayesA\IW-BayesB)
    4.2.5 Cholesky decomposition specifications (CD-BayesA\CD-BayesB)
    4.2.6 Bayesian inference
    4.2.7 Simulation Study
    4.2.8 MSU Pig Resource Population data
    4.2.9 Priors used for data analyses
  4.3 Results
    4.3.1 Simulation Study
    4.3.2 MSU Pig Resource Population data
  4.4 Discussion
  4.5 Conclusions
Chapter 5 Exploring alternative specifications for bivariate trait whole genome prediction models
  5.1 Introduction
  5.2 Methods and Materials
    5.2.1 Whole genome prediction models
    5.2.2 Univariate BayesA and BayesB (uBayesA\uBayesB)
    5.2.3 Bivariate Ridge regression (bGBLUP)
    5.2.4 Bivariate Student-t (IWBayesA)
    5.2.5 Cholesky decomposition specifications (CDBayesA\CDBayesB)
    5.2.6 Bayesian inference
    5.2.7 Simulation studies
    5.2.8 Pine data analyses
    5.2.9 Priors used for data analyses
  5.3 Results
    5.3.1 Simulation Studies
    5.3.2 Pine data analyses
  5.4 Discussion
  5.5 Conclusions
Chapter 6 Discussion, Conclusions and Future Work
APPENDICES
  APPENDIX A: Chapter 2
  APPENDIX B: Chapter 3
  APPENDIX C: Chapter 4
  APPENDIX D: Chapter 5
BIBLIOGRAPHY

LIST OF TABLES

Table 2.1: Summary statistics for 6 different marker densities in the simulation study over 20 replicates
Table 4.1: Summary of six scenarios in LD simulation

Table 5.1: Summary of two different populations compared in an LD simulation study

LIST OF FIGURES

Figure 2.1: Average posterior means of $s_g^2$ (BayesA, BayesB) and $s_\delta^2$ (ante-BayesA, ante-BayesB) across 20 replicates for six different levels of LD, comparing BayesA and ante-BayesA in (A) and BayesB versus ante-BayesB in (B). Significant differences in posterior means between competing methods at each LD level are indicated by *(P<0.01), **(P<0.001), or ***(P<0.0001).

Figure 2.2: Average posterior means of $\pi_g$ (BayesB) versus $\pi_\delta$ (ante-BayesB) across 20 replicates as a function of six different LD levels. Significant differences in posterior means between competing methods at each LD level are indicated by *(P<0.01), **(P<0.001), or ***(P<0.0001).

Figure 2.3: Average posterior means of $\nu_g$ (BayesA, BayesB) and $\nu_\delta$ (ante-BayesA, ante-BayesB) across 20 replicates for six different levels of LD, comparing BayesA and ante-BayesA in (A) and BayesB versus ante-BayesB in (B). No significant differences (P>0.01) were determined between the two sets of competing procedures at each LD level.

Figure 2.4: Average accuracies of estimated breeding value across 20 replicates for analyses based on each of six LD levels. Differences in accuracy between BayesA and ante-BayesA (bottom symbols) and between BayesB and ante-BayesB (top symbols) indicated as significant by *(P<0.01) or **(P<0.001).

Figure 2.5: Boxplots of average accuracies of estimated breeding value across 9 replicates for four traits in Generations 6, 8 and 10 for benchmark data from Hickey and Gorjanc (2012). Differences in accuracy between ante-BayesB (black) and BayesB (dark gray) and between ante-BayesA (light gray) and BayesA (white) indicated as significant by *(0.05

Chapter 1 Introduction

GBLUP can accommodate the m >> n issue since the specification of random effects facilitates a borrowing of information across markers. However, it has been speculated that the distributional assumptions in GBLUP may be too strong, depending upon the genetic architecture of the trait, i.e., the distribution of the QTL effects themselves, often believed to be non-normal (HAYES and GODDARD 2001), or the relative number of QTL to the number of markers. Meuwissen et al. (2001) introduced parametric Bayesian models labeled "BayesA" and "BayesB" to provide additional distributional flexibility, with both approaches often demonstrating better fit for WGP compared to GBLUP (Meuwissen, Hayes et al. 2001; Habier, Fernando et al. 2007; Hayes, Bowman et al. 2009). The "BayesA" model specifies marker-specific genetic effects to be normally distributed with mean 0 and marker-specific variances that are independent random draws from a scaled inverted chi-square distribution; in essence, the genetic effects are marginally specified to be IID Student t distributed (de los Campos, Hickey et al. 2012). The "BayesB" model uses this same distributional assumption as one component of a mixture distribution, the other component being a point spike at 0; i.e., no effects for those markers belonging to that component.
Since then, several other "Bayesian alphabet" models have been developed as well (de los Campos, Naya et al. 2009; Verbyla, Hayes et al. 2009; Habier, Fernando et al. 2011; Wang, Ding et al. 2013); nevertheless, it has been duly noted that these developments, and any such comparisons involving new models, might be tainted by misspecification or inappropriate tuning of the key hyperparameters that anchor their corresponding distributional specifications (GIANOLA 2013).

SNP effects have been jointly analyzed under a multivariate WGP framework across heterogeneous environments (Burgueno, de los Campos et al. 2012) or multiple traits (CALUS and VEERKAMP 2011). For heterogeneous environments in WGP, genotype by environment interaction (G×E) can be detected by modeling SNP-specific intercept and slope effects of environmental covariates (Lillehammer, Hayes et al. 2009) in random regression (RR) and reaction norm (RN) models (Berry, Buckley et al. 2003; Cardoso and Tempelman 2012). For multiple traits in WGP, pleiotropic regions of the genome can be detected by modeling SNP-specific pleiotropic effects in multivariate trait models (van Binsbergen, Veerkamp et al. 2012). In RR/RN models and bivariate trait models, the same prior densities can be specified on the genetic variance-covariance matrices (VCV) of the SNP-specific effects. Calus and Veerkamp (2011) proposed a multiple trait BayesA model with a conjugate inverted Wishart (IW) prior on the VCV. This specification is potentially inflexible since uncertainty in all elements of a VCV is governed by a single degrees of freedom parameter (MUNILLA and CANTET 2012). Bello et al. (2010) suggested that the square root free Cholesky decomposition (CD) of the VCV in bivariate mixed models might allow greater flexibility, as uncertainty can be differentially expressed on each element of a VCV using such a parameterization.

There are three overarching goals in this dissertation, all pertaining to the meaningful development of WGP models with improved accuracies. First, one objective was to potentially improve WGP accuracy by extending existing models to account for spatially induced correlations between SNP effects due to the proximity of QTL (Chapter 2), as originally anticipated by Gianola et al. (2003). A second objective was to investigate computational strategies that might allow one to reliably infer upon all key hyperparameters that underlie these and other more conventional WGP models (Chapter 3), including an assessment of the implications of hyperparameter misspecification. Finally, I deemed it imperative to provide greater flexibility in bivariate SNP effects modeling than currently developed for WGP models, whether for inferring upon genotype by environment interactions from a reaction norm perspective (Chapter 4) or for bivariate trait WGP models (Chapter 5). I conclude this dissertation with some further concluding thoughts and areas for future research in Chapter 6.

Chapter 2 A Bayesian antedependence model for whole genome prediction

2.1 Background

Whole genome prediction (WGP) using commercially available medium to high density (>50,000) single nucleotide polymorphism (SNP) panels has transformed livestock and plant breeding. Typically, the allelic substitution effects of all SNP markers are jointly estimated in WGP evaluation models assuming additive inheritance and summed to predict the breeding value of each individual animal based on its SNP genotypes (Meuwissen, Hayes et al. 2001).
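In code, this prediction step amounts to a genotype-weighted sum of estimated marker effects. The minimal sketch below is illustrative only; the array names and toy values are hypothetical and it is not software associated with this dissertation.

```python
import numpy as np

def predict_breeding_values(Z, g_hat):
    """Predicted breeding values: sum of estimated allelic substitution
    effects weighted by each animal's 0/1/2 SNP genotype codes."""
    return Z @ g_hat

# Toy example: 10 animals, 50 markers (arbitrary values, illustration only)
rng = np.random.default_rng(3)
Z = rng.integers(0, 3, size=(10, 50)).astype(float)   # SNP genotypes coded 0, 1, 2
g_hat = rng.normal(0.0, 0.05, size=50)                # estimated SNP effects
ebv = predict_breeding_values(Z, g_hat)
```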
This technology is expected not only to dramatically increase rates of genetic improvement for economically important traits such as meat and milk production in livestock (Wiggans, VanRaden et al. 2011) or crop production (Lorenz, Chao et al. 2011), but also to improve predictions of genetic predisposition to human diseases for personalized medicine (de los Campos, Gianola et al. 2010). Currently, the number (m) of available SNP markers is typically much greater than the number (n) of animals having phenotypic records. Hence, hierarchical mixed model or Bayesian approaches have been generally adopted in WGP to efficiently borrow information across these many markers by specifying their corresponding effects to be random. Following MEUWISSEN et al. (2001), these effects are typically specified to be either Gaussian or Student t-distributed (BayesA), or a mixture of either of these two densities with a point mass on zero (BayesB). When these effects are specified to be Gaussian, best linear unbiased prediction of these effects is typically pursued because of computational tractability (VanRaden 2008; Hayes, Bowman et al. 2009); applied to WGP, this procedure is often known as GBLUP. Thus far, the distributional specifications for these various hierarchical modeling approaches have been based on a prior assumption of independence between all such effects.

GIANOLA et al. (2003) anticipated that some of these effects might be spatially correlated within chromosomes, such that greater inference efficiency might be provided by modeling these effects as correlated. Their proposed specifications required either equally spaced markers and/or within-chromosome correlations depending strictly on the physical/linkage map distance between markers. However, the assumption of equal spacing is rather tenuous for most currently available SNP marker panels. Even more importantly, the inferred correlation structure is likely to be nonstationary, given that it should be primarily driven by the proximity of SNP markers to quantitative trait loci (QTL) of major effect. In other words, we anticipate that the correlation between the inferred effects of adjacent SNPs distal to major QTL would be substantially smaller than for those proximal to these QTL.

Antedependence models have been increasingly advocated for the analysis of repeated measures data (ZIMMERMAN and NÚÑEZ-ANTÓN 2010) to parsimoniously account for nonstationary correlations between repeated measurements over time. In this paper, we develop first-order antedependence counterparts to BayesA and BayesB. Through a simulation study, a cross-validation study involving the publicly available heterogeneous stock mice data (Valdar, Solberg et al. 2006; Valdar, Solberg et al. 2006), and journal-provided reference data (HICKEY and GORJANC 2012) used to benchmark our proposed methods against others, we demonstrate that, compared to their conventional counterparts, these antedependence-based WGP models improve the accuracy of genomic merit prediction as well as potentially increase the sensitivity of QTL detection, which is the key objective of genome wide association studies (GWAS).
2.2 Materials and Methods

2.2.1 Conventional WGP model

The base linear mixed model used for WGP is generally written as follows:

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{g} + \mathbf{W}\mathbf{u} + \mathbf{e} \quad [1]$$

Here, $\mathbf{y} = \{y_i\}_{i=1}^{n}$ is an n × 1 vector of phenotypes, $\boldsymbol{\beta}$ is a p × 1 unknown vector of fixed effects connected to y via a known n × p incidence or covariate matrix X (e.g., environmental effects), and $\mathbf{g} = \{g_j\}_{j=1}^{m}$ is an m × 1 vector of random SNP effects connected to y via a known n × m matrix Z of SNP genotypes coded as 0, 1, or 2 copies of the minor allele for each SNP (column) and animal (row). Furthermore, $\mathbf{u} = \{u_k\}_{k=1}^{q}$ is a q × 1 vector of random polygenic effects connected to y via a known n × q incidence matrix W, and $\mathbf{e} = \{e_i\}_{i=1}^{n}$ is the residual vector. We assume that $\mathbf{u} \sim N(\mathbf{0}, \mathbf{A}\sigma_u^2)$, where A denotes the pedigree-derived numerator relationship matrix (HENDERSON 1976); polygenic effects are often included in WGP models due to insufficient genome coverage by Z (CALUS and VEERKAMP 2007). Furthermore, we specify $\mathbf{g} \sim N(\mathbf{0}, \boldsymbol{\Sigma}_g)$, where $\boldsymbol{\Sigma}_g = \text{diag}\{\sigma^2_{g_j}\}$, and $\mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma_e^2)$. From a Bayesian perspective, a subjective prior may also be specified on $\boldsymbol{\beta}$ using $\boldsymbol{\beta} \sim N(\boldsymbol{\beta}_0, \mathbf{V}_\beta)$, with $\boldsymbol{\beta}_0$ and $\mathbf{V}_\beta$ taken as known (SORENSEN and GIANOLA 2002).

Now the distinction between GBLUP, BayesA, and BayesB in MEUWISSEN et al. (2001) depends upon the characterization of $\boldsymbol{\Sigma}_g$. If $\boldsymbol{\Sigma}_g = \mathbf{I}\sigma_g^2$ (i.e., $\sigma^2_{g_j} = \sigma_g^2 \;\forall j$), then the model is defined to be GBLUP. If, instead, the diagonal elements of $\boldsymbol{\Sigma}_g$ are independent random draws from a scaled inverted chi-square distribution, i.e., $\sigma^2_{g_j} \sim \chi^{-2}(\nu_g, \nu_g s_g^2)$ such that $E(\sigma^2_{g_j}) = \nu_g s_g^2 / (\nu_g - 2)$, then the model is said to be BayesA, such that marginally $g_j$ is a random draw from a Student t distribution with mean 0, degrees of freedom $\nu_g$ and scale parameter $s_g^2$ (de los Campos, Naya et al. 2009; Gianola, de los Campos et al. 2009). Now BayesB further extends BayesA by including a two-component mixture, with one component being $\chi^{-2}(\nu_g, \nu_g s_g^2)$ and the other component being a spike or point mass at 0; i.e.,

$$\sigma^2_{g_j} \mid \nu_g, s_g^2 \;\begin{cases} = 0 & \text{with probability } \pi_g \\ \sim \chi^{-2}(\nu_g, \nu_g s_g^2) & \text{with probability } (1-\pi_g) \end{cases} \quad [2]$$

That is, $\pi_g$ ($0 < \pi_g < 1$) represents the proportion of SNP markers having no associated genetic effects on the trait of interest.

Clear warnings have been provided on how sensitive inferences using BayesA or BayesB may be to the specification of these hyperparameters (de los Campos, Naya et al. 2009; Gianola, de los Campos et al. 2009). It has not been widely appreciated that $\nu_g$ and $s_g^2$ are estimable; this recognition is critical as both hyperparameters help define the genetic architecture in BayesA and BayesB. That is, $\nu_g$ characterizes the variability of $\sigma^2_{g_j}$ about a typical variance component of $s_g^2$. Details on how to estimate $\nu_g$ and $s_g^2$ in the context of BayesA were previously provided by YI and XU (2008). Furthermore, $\pi_g$ is estimable in BayesB. For both BayesA and BayesB, we specify the prior distribution $p(\nu_g) \propto (\nu_g + 1)^{-2}$, similar to what we have previously adopted in other applications (Kizilkaya and Tempelman 2005; Bello, Steibel et al. 2010). Furthermore, we specify $\pi_g \sim \text{Beta}(\alpha_\pi, \beta_\pi)$ for BayesB, with the values of $\alpha_\pi$ and $\beta_\pi$ chosen to reflect prior uncertainty on $\pi_g$. We also specify a proper conjugate prior on $s_g^2$ in BayesB, i.e., $s_g^2 \sim \text{Gamma}(\alpha_s, \beta_s)$, recognizing that the specifications on $\alpha_s$ and $\beta_s$ become increasingly influential as $\pi_g \to 1$.
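To make this hierarchy concrete, the following minimal sketch draws marker variances and effects from the BayesB prior of Equation [2]; setting $\pi_g = 0$ recovers BayesA, under which each $g_j$ is marginally Student t distributed. It is an illustration only, with arbitrary hyperparameter values, and is not the sampler used for the analyses in this dissertation.

```python
import numpy as np

rng = np.random.default_rng(42)

def draw_bayesb_effects(m, nu_g, s2_g, pi_g):
    """Draw m SNP effects from the BayesB prior hierarchy.

    With probability pi_g a marker variance is exactly zero (no effect);
    otherwise sigma2_gj ~ scaled inverse chi-square(nu_g, nu_g * s2_g),
    and g_j | sigma2_gj ~ N(0, sigma2_gj).  pi_g = 0 gives BayesA.
    """
    nonzero = rng.uniform(size=m) > pi_g
    # scaled inverse chi-square draw: nu * s2 / chi-square(nu)
    sigma2 = np.where(nonzero, nu_g * s2_g / rng.chisquare(nu_g, size=m), 0.0)
    g = rng.normal(0.0, np.sqrt(sigma2))
    return g, sigma2

# Example: 1000 markers with arbitrary hyperparameter values
g, sigma2 = draw_bayesb_effects(m=1000, nu_g=4.0, s2_g=0.01, pi_g=0.9)
```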
Finally, we specify noninformative priors $\sigma_e^2 \sim \chi^{-2}(-1, 0)$ and $\sigma_u^2 \sim \chi^{-2}(-1, 0)$, which are congruent with specifying uniform priors on $\sigma_e$ and $\sigma_u$, respectively, and in line with the recommendations for variance components by GELMAN (2006). We similarly specify $s_g^2 \sim \chi^{-2}(-1, 0)$ in BayesA, given that m is generally large enough for stable inference on $s_g^2$ without the need for more informative priors.

2.2.2 Antedependence extensions of WGP models

We propose a nonstationary first-order antedependence correlation structure for g based on the relative physical location of the SNP markers along the chromosome(s):

$$g_j = \begin{cases} \delta_1 & \text{if } j = 1 \\ t_{j,j-1}\, g_{j-1} + \delta_j & \text{if } 2 \le j \le m \end{cases} \quad [3]$$

Here $\delta_j \sim NID(0, \sigma^2_{\delta_j})$, $j = 1, \ldots, m$, whereas $t_{j,j-1}$ is the marker-interval-specific antedependence parameter (ZIMMERMAN and NÚÑEZ-ANTÓN 2010) of $g_j$ on $g_{j-1}$ in the specified order. We can rewrite the recursive expression in [3] in matrix notation:

$$\mathbf{g} = \mathbf{T}\mathbf{g} + \boldsymbol{\delta} \quad [4]$$

where $\boldsymbol{\delta} = \{\delta_j\}_{j=1}^{m} = (\mathbf{I} - \mathbf{T})\mathbf{g}$, I is an m × m identity matrix, and T has all null values except for the elements $t_{j,j-1}$ at the corresponding subscript addresses. It can be readily seen using Equation [4] that $\text{var}(\mathbf{g}) = \boldsymbol{\Sigma}_g = (\mathbf{I} - \mathbf{T})^{-1} \boldsymbol{\Delta}\, (\mathbf{I} - \mathbf{T})^{-1\prime}$, where $(\mathbf{I} - \mathbf{T})^{-1}$ is a lower triangular matrix with diagonal elements equal to 1 and $\boldsymbol{\Delta} = \text{diag}\{\sigma^2_{\delta_j}\}_{j=1}^{m}$. As further illustrated in File S1 from the Supporting Information, $\boldsymbol{\Sigma}_g^{-1} = (\mathbf{I} - \mathbf{T})' \boldsymbol{\Delta}^{-1} (\mathbf{I} - \mathbf{T})$ is a readily determined tri-diagonal matrix (ZIMMERMAN and NÚÑEZ-ANTÓN 2010), which is important as it facilitates inference on g.

Some of the other developments closely follow the BayesA and BayesB models of MEUWISSEN et al. (2001). That is, we specify $\sigma^2_{\delta_j} \sim \chi^{-2}(\nu_\delta, \nu_\delta s_\delta^2)$ in a model which we label ante-BayesA. Similarly, we propose an ante-BayesB model whereby we specify a mixture similar to Equation [2], except that it is specified on $\sigma^2_{\delta_j}$; i.e., a mixture of a point mass on zero with probability $\pi_\delta$ and a scaled inverted chi-square prior $\chi^{-2}(\nu_\delta, \nu_\delta s_\delta^2)$ with probability $(1-\pi_\delta)$. As we suggested earlier for $\pi_g$, we believe that $\pi_\delta$ is estimable, such that ante-BayesA is merely a special case of ante-BayesB. In turn, BayesA is merely a special case of ante-BayesA, as is BayesB of ante-BayesB, when $\mathbf{T} = \mathbf{0}$; i.e., $t_{j,j-1} = 0 \;\forall j$.

These antedependence extensions, nevertheless, do require inference on the m−1 unknown non-zero elements $\{t_{j,j-1}\}_{j=2}^{m}$ of T. Borrowing from DANIELS and POURAHMADI (2002) and BELLO et al. (2010), we specify $t_{j,j-1} \sim N(\mu_t, \sigma_t^2)$ as a conjugate prior in both ante-BayesA and ante-BayesB, thereby allowing flexible inference on the nonstationary correlation structure in $\boldsymbol{\Sigma}_g$. However, it should be further noted that if interval j,j−1 spans the last SNP of one particular linkage group or chromosome and the first SNP of the subsequent linkage group, then we set the corresponding $t_{j,j-1} = 0$. The remaining priors are specified on the hyperparameters that essentially characterize the hypothesized genetic architecture of the trait and are virtually identical to those previously prescribed for BayesA and BayesB; i.e., $p(\nu_\delta) \propto (\nu_\delta + 1)^{-2}$, $s_\delta^2 \sim \text{Gamma}(\alpha_s, \beta_s)$, and $\pi_\delta \sim \text{Beta}(\alpha_\pi, \beta_\pi)$, with $\alpha_\pi$, $\beta_\pi$, $\alpha_s$ and $\beta_s$ again all specified as known. Similarly, we also estimate $\mu_t$ and $\sigma_t^2$ by placing subjective priors $\mu_t \sim N(\mu_{t0}, s_{t0}^2)$ and $\sigma_t^2 \sim \chi^{-2}(\nu_t, \nu_t s_t^2)$ on these key hyperparameters, where $\mu_{t0}$, $s_{t0}^2$, $\nu_t$ and $s_t^2$ are specified to be known.
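As an illustration of why this parameterization is computationally convenient, the following minimal Python sketch builds $\boldsymbol{\Sigma}_g^{-1} = (\mathbf{I}-\mathbf{T})'\boldsymbol{\Delta}^{-1}(\mathbf{I}-\mathbf{T})$ for a handful of markers and confirms that it is tri-diagonal. The marker count and parameter values are arbitrary; this is not the implementation used for the analyses in this dissertation.

```python
import numpy as np

def ante_precision(t, delta2):
    """Precision matrix of SNP effects under first-order antedependence.

    t      : length m-1 array of antedependence parameters t_{j,j-1}
    delta2 : length m array of innovation variances sigma^2_{delta_j}
    Returns (I - T)' Delta^{-1} (I - T), which is tri-diagonal.
    """
    m = len(delta2)
    T = np.zeros((m, m))
    T[np.arange(1, m), np.arange(0, m - 1)] = t      # sub-diagonal elements t_{j,j-1}
    I_minus_T = np.eye(m) - T
    Dinv = np.diag(1.0 / np.asarray(delta2))
    return I_minus_T.T @ Dinv @ I_minus_T

# Toy example with m = 5 markers (arbitrary values, for illustration only)
rng = np.random.default_rng(1)
t = rng.normal(0.0, 0.3, size=4)          # t_{j,j-1}, j = 2,...,5
delta2 = rng.uniform(0.01, 0.05, size=5)  # sigma^2_{delta_j}

Q = ante_precision(t, delta2)
# All elements more than one position off the main diagonal are zero:
print(np.allclose(np.triu(Q, k=2), 0.0))  # True: Sigma_g^{-1} is tri-diagonal
```

Because the precision matrix is tri-diagonal, the full conditional density of each effect involves only its immediate neighbors, which helps keep the per-cycle cost of MCMC sampling manageable even for large m.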
As in MEUWISSEN et al. (2001) and subsequent work, our implementation strategy is based on the use of Markov Chain Monte Carlo (MCMC) methods; however, we also additionally infer upon the key hyperparameters, i.e., $\nu_g$ ($\nu_\delta$), $s_g^2$ ($s_\delta^2$), and $\pi_g$ ($\pi_\delta$), that characterize the genetic architecture of the trait, as alluded to earlier. Further details on the full conditional densities and any necessary Metropolis-Hastings strategies used to sample from the joint posterior density of all unknown parameters using MCMC are provided in Appendix A1.

2.2.3 Simulation study

We compare the performance of BayesA and BayesB with their antedependence counterparts, ante-BayesA and ante-BayesB, in a simulation study. Twenty replicated datasets were each generated from a base population containing 50 unrelated males and 50 unrelated females. Each dataset underwent random mating while maintaining constant population size for 6001 generations beyond the base population. The entire genome was composed of one chromosome of length 1 Morgan. All 20,001 potential SNP markers were equally spaced on this genome, with a potential QTL placed directly in the middle of each interval of adjacent markers. In the base population, all 20,000 QTL and 20,001 SNP marker alleles were coded as monomorphic. The number of simulated crossover events per meiosis was generated from a Poisson (mean = 1) distribution, with the locations of the crossover events uniformly distributed throughout the chromosome in accordance with the Haldane mapping function. The mutation rate for both QTL and SNP markers was specified to be $10^{-4}$ per locus per generation and to be recurrent, that is, switching between one of two alternative allelic states 0 and 1 whenever mutation occurred, so as to ensure biallelic loci (Coster, Bastiaansen et al. 2010; Daetwyler, Pong-Wong et al. 2010).

In Generation 6001, all SNP markers and QTL with a minor allele frequency (MAF) less than 0.05 were discarded. We then randomly selected only 30 of the remaining QTL and their corresponding allelic substitution effects. For each of these k = 1, 2, ..., 30 QTL, an allelic substitution effect ($\alpha_k$) was drawn from a reflected gamma distribution with shape parameter 0.4 and scale parameter 1.66, with a positive or negative sign on $\alpha_k$ sampled with equal probability. The genetic variance at QTL k was determined to be $2 p_k (1 - p_k)\alpha_k^2$, where $p_k$ is the MAF at QTL k. The total genetic variance was subsequently determined to be the summation of these terms across the 30 selected QTL, i.e., as $2\sum_{k=1}^{30} p_k (1 - p_k)\alpha_k^2$. Now the true breeding values (TBV) were defined to be a genotype-based linear function of the 30 generated QTL effects which, because these QTL were located between various SNP, are not subsets of g. These TBV were further scaled such that the total genetic variance was 1, as per MEUWISSEN and GODDARD (2010). Residual effects were, in turn, sampled from a standard normal distribution, such that the heritability was 0.50. That is, each phenotypic record was generated by adding the TBV for that animal plus its corresponding residual. Hence, 100 animals with known phenotypes and genotypes in Generation 6001 were simulated for inferring upon the SNP effects using each of the competing methods. Genotypes and the TBV for each of 100 offspring were also generated in Generation 6002, based on randomly mating animals in Generation 6001.
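A compact sketch of this trait-simulation step is given below. It is illustrative only: it assumes a post-filtering QTL genotype matrix is already available, uses hypothetical variable names, treats the counted allele frequency as a stand-in for the MAF, and is not the simulation code used to generate the datasets analyzed here.

```python
import numpy as np

rng = np.random.default_rng(2025)

def simulate_trait(qtl_geno, n_qtl=30, shape=0.4, scale=1.66, h2=0.5):
    """Simulate TBV and phenotypes from a 0/1/2 QTL genotype matrix.

    Returns (tbv, phenotypes) with total genetic variance scaled to 1 and h2 = 0.5.
    """
    n, n_cand = qtl_geno.shape
    chosen = rng.choice(n_cand, size=n_qtl, replace=False)
    # Reflected gamma allelic substitution effects: gamma magnitude, random sign
    alpha = rng.gamma(shape, scale, size=n_qtl) * rng.choice([-1.0, 1.0], size=n_qtl)
    p = qtl_geno[:, chosen].mean(axis=0) / 2.0            # allele frequencies
    total_var = np.sum(2.0 * p * (1.0 - p) * alpha**2)    # sum of 2p(1-p)alpha^2
    alpha /= np.sqrt(total_var)                           # scale genetic variance to 1
    tbv = qtl_geno[:, chosen] @ alpha
    e = rng.normal(0.0, np.sqrt((1.0 - h2) / h2), size=n) # residual SD = 1 when h2 = 0.5
    return tbv, tbv + e

# Example with arbitrary genotypes (for illustration only)
geno = rng.integers(0, 3, size=(100, 500)).astype(float)
tbv, y = simulate_trait(geno)
```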
For each of the 20 replicated datasets, the effect of 6 different marker densities on the comparison between the competing methods was investigated by retaining every SNP marker, or every 4th, 7th, 10th, 15th, or 20th SNP marker, from those with MAF > 0.05. That is, the datasets were used as a blocking factor in comparing different marker densities for the accuracy of predicting genetic merit in Generation 6002 using each of the four different methods: BayesA, BayesB, ante-BayesA and ante-BayesB. Accuracy was defined as the correlation between the estimated breeding values (EBV) for Generation 6002, using just the Generation 6001 phenotypes and genotypes, and the corresponding TBV of Generation 6002. These EBV are based on the posterior mean ($\hat{\mathbf{g}}$) of g; i.e., the EBV are elements of $\mathbf{Z}\hat{\mathbf{g}}$.

Comparisons were also drawn between the BayesA/BayesB procedures and their antedependence counterparts for inference on the key hyperparameters that characterize genetic architecture. This was conducted using a multifactorial ANOVA on the posterior means, with replicate as the blocking factor, to assess the importance of model, marker density, and their interaction across the 20 replicates. Furthermore, an assessment of the relative ability of ante-BayesB compared to BayesB to identify the top QTL by genetic variance was based on the difference in the posterior probabilities of $\delta_j$ and $g_j$, respectively, of adjacent SNP markers being non-zero. As QTL were placed between SNP markers and never on top of SNP markers, we calculated this probability of association by determining the proportion of MCMC cycles in which either or both of the two markers adjacent to the known QTL were chosen to be non-zero within each analysis. All comparisons were based on the linear mixed model in Equation [1], with X being a column vector of ones, except that polygenic effects (u) were ignored for simplicity and computational tractability.

2.2.4 Application to Heterogeneous Stock Mice Dataset

We used a dataset publicly available from the Wellcome Trust (http://gscan.well.ox.ac.uk/) which includes phenotypic records on 2,296 mice, each genotyped for 12,147 SNP markers. This data resource, which also includes pedigree information, was based on an advanced intercross mating among 8 inbred strains after 50 generations of random mating (Valdar, Solberg et al. 2006). The average linkage disequilibrium (LD), as measured by r² between adjacent markers, is 0.62 (Legarra, Robert-Granie et al. 2008), which is high compared to the commonly used SNP panels available for livestock populations. For example, the average r² between adjacent markers in most commercially available livestock SNP panels ranges from 0.10 to 0.37 for markers that are generally around 100 kb apart (Du, Clutter et al. 2007; De Roos, Hayes et al. 2008; Abasht, Sandford et al. 2009; Jarmila, Sargolzaei et al. 2010). Given this high pairwise LD, we considered only a random subset of all markers from this dataset to ensure adjacent-marker LD levels that are representative of livestock populations. We first excluded SNP markers if the percentage of missing genotypes across samples was greater than 10% or if the MAF was less than 2.5%. We also discarded animals having greater than 20% missing SNP genotypes. We then randomly selected 50 SNP markers from each of the 19 autosomes, leading to an average LD of r² = 0.35 between adjacent markers. The resulting dataset then involved records on 1,917 animals with genotypes on 950 SNPs.
As in LEGARRA et al. (2008), we also added the random effect of cage to the WGP model of [1]; i.e., $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{g} + \mathbf{W}\mathbf{u} + \mathbf{S}\mathbf{c} + \mathbf{e}$, where $\mathbf{c} \sim N(\mathbf{0}, \mathbf{I}\sigma_c^2)$ and S is the corresponding incidence matrix, with all other terms defined as before. Furthermore, we specified GELMAN's prior $\sigma_c^2 \sim \chi^{-2}(-1, 0)$ on $\sigma_c^2$ in addition to all previously provided prior specifications. Also, as per LEGARRA et al. (2008), we chose to use the data provided on body weight at 6 weeks that was already pre-corrected for fixed effects, such that X was a column vector of ones and $\boldsymbol{\beta}$ consisted of just an overall mean. Missing SNP genotypes were simply imputed from binary distributions based on their corresponding allelic frequencies in the dataset, following LEGARRA et al. (2008).

We adopted the same within-family cross-validation technique as described in LEGARRA et al. (2008) by randomly partitioning each family into two. This partitioning was replicated 20 times to obtain 20 different, nearly equal sized partitions of training and validation data subsets. Also, as in LEGARRA et al. (2008), we compared the various methods using predictive abilities, defined as the correlation between the phenotypes in the validation subset and their corresponding predictions based on inferences from the training data subset.

2.2.5 Application to Simulated Genomic Data from Hickey and Gorjanc

To provide a benchmark comparison of our proposed methods with competing methods in other papers in this issue, we analyzed simulated datasets provided by, and described in detail by, Hickey and Gorjanc (2012). They generated 10 replicated datasets for each of four different traits, whereby 9000 QTL effects were generated for Trait 1 and 900 QTL effects were generated for Trait 2. Traits 3 and 4 mirrored Traits 1 and 2, respectively, with the further requirement that the MAF for these QTL was less than 0.30. Since we were permitted to simultaneously run 144 jobs on the High Performance Computing Cluster at MSU (hpcc.msu.edu), we chose to compare the four methods for each of the four traits on each of the first nine datasets (4 × 4 × 9 = 144). For all analyses, training data were based on 2000 animals in Generations 4 and 5, whereas TBV were provided on 500 animals within each of Generations 6, 8 and 10. To facilitate computing tractability, we saved every tenth SNP marker that had a MAF > 0.20. This led to a range of 2884 to 2952 SNP markers and an average LD between adjacent markers of 0.16 to 0.17 across the nine replicates. All four models also included polygenic effects. Antedependence methods were directly compared with their classical counterparts for accuracy (correlation of EBV with TBV) and bias (deviation from 1 of the slope from regressing TBV on EBV) in these latter validation generations using a Wilcoxon signed rank test.

2.2.6 Bayesian inference

For each of the four methods (BayesA, BayesB, ante-BayesA and ante-BayesB) in both our simulation study and the heterogeneous stock mice application, we ran MCMC for 50,000 cycles of burn-in followed by an additional 300,000 cycles; for the benchmark data from Hickey and Gorjanc (2012), the corresponding numbers were 80,000 and 1,000,000, respectively. Every tenth post-burn-in MCMC cycle was subsequently saved for inference. We monitored MCMC convergence via inspection of trace plots and determined the effective sample size (ESS), i.e., the effective number of independent draws from the joint posterior density, for all key hyperparameters using the R package CODA (Plummer, Best et al. 2006).
The larger number of MCMC cycles for the Hickey and Gorjanc data was chosen to ensure that the ESS for all hyperparameters exceeded 100. Inferences were primarily based on the posterior means and posterior standard deviations for key parameters, including those hyperparameters that characterize genetic architecture.

2.2.7 Prior specifications

For all analyses in this paper, we chose $\alpha_\pi = 10$ and $\beta_\pi = 1$ in both BayesB and ante-BayesB to reflect the prior belief that most of the markers will not be associated with any genetic effects; however, the dispersion of this corresponding beta distribution is still large enough that values of $\pi_g$ ($\pi_\delta$) close to 0.70 are plausible. Based on preliminary runs, we also found that this prior specification led to superior mixing properties of the MCMC chains relative to a naive Uniform(0,1) prior, yet facilitated domination of the data over the prior information since $\alpha_\pi + \beta_\pi \ll m$. For BayesB and ante-BayesB, we always specified $\alpha_s = \beta_s = 0.1$ for the Gamma prior on $s_g^2$ ($s_\delta^2$). For the antedependence-based models, we specified $\mu_{t0} = 0$, $s_{t0}^2 = 1$, $\nu_t = -1$, and $s_t^2 = 0$; i.e., a standard normal prior on $\mu_t$ and GELMAN's prior on $\sigma_t^2$. We also always specified a flat prior on $\boldsymbol{\beta}$ by defining $\mathbf{V}_\beta^{-1} = \mathbf{0}$. Prior specifications for all other parameters (e.g., variance components) were based on those previously recommended in this paper.

2.3 Results

2.3.1 Simulation Study

For the six different marker densities, the average distances between adjacent markers ranged from 0.046 to 0.918 cM over the 20 replicates, whereas the average LD between adjacent markers, measured by r² values, ranged from 0.15 to 0.31, as shown in Table 2.1. Among the 30 chosen QTL within each of the 20 replicates, anywhere from 6 to 11 of the QTL had variances greater than 2% of the total genetic variance.

Table 2.1: Summary statistics for 6 different marker densities in the simulation study over 20 replicates

Marker density level†   Average number of markers per replicate   Average distance between adjacent marker loci (cM) per replicate   Average r² between adjacent marker loci per replicate
1                       108                                       0.918                                                               0.15
2                       145                                       0.689                                                               0.18
3                       217                                       0.459                                                               0.21
4                       311                                       0.321                                                               0.24
5                       545                                       0.184                                                               0.27
6                       2182                                      0.046                                                               0.31

†Marker density levels 1 through 6 pertain to saving every 20th, 15th, 10th, 7th, 4th, and every single SNP marker from a single 1 M chromosome within each data replicate.

It is important to recognize that none of the modeling assumptions behind BayesA, BayesB, ante-BayesA, or ante-BayesB truly match the data generation model, which is based on thousands of generations of LD created between markers and QTL, even for simulated data. This goes beyond the fact that the QTL effects were drawn from reflected Gamma distributions in our simulation study, as typically done (e.g., Meuwissen, Hayes et al. 2001; Meuwissen and Goddard 2010). That is, the process of recombination over thousands of generations, in terms of how it generates LD between QTL and SNP markers, is not explicitly captured in any known WGP model, including any of the competing models, especially when the effects of neighboring SNP markers rather than the causal QTL effects are being estimated. Hence, there is no way to surmise the "true" values of the key hyperparameters, whether for $s_g^2$, $\nu_g$, or $\pi_g$ in BayesA or BayesB, or for $s_\delta^2$, $\nu_\delta$, $\pi_\delta$, $\mu_t$ or $\sigma_t^2$ in ante-BayesA or ante-BayesB.
However, one should anticipate that estimates of $s_g^2$ or $s_\delta^2$ should be inversely related to marker density, since they closely represent the mean value of the variance components $\{\sigma^2_{g_j}\}_{j=1}^{m}$ or $\{\sigma^2_{\delta_j}\}_{j=1}^{m}$, respectively, accounted for by each SNP. Indeed, we observe this phenomenon in the comparison between BayesA and ante-BayesA in Figure 2.1A. We also note a similar comparison between $s_\delta^2$ and $s_g^2$ for BayesB versus ante-BayesB in Figure 2.1B, but further recognize that the corresponding estimates of $s_g^2$ and $s_\delta^2$ are roughly one order of magnitude greater than those seen in Figure 2.1A. That is, $s_g^2$ and $s_\delta^2$ specify a typical value for $\sigma^2_{g_i}$ and $\sigma^2_{\delta_i}$, respectively, over many more loci in (ante-)BayesA than in their (ante-)BayesB counterparts. In spite of the lower values observed in Figure 2.1A, however, there was a significant difference (P<0.01) between $s_g^2$ and $s_\delta^2$ when r² ≥ 0.21.

Figure 2.1: Average posterior means of $s_g^2$ (BayesA, BayesB) and $s_\delta^2$ (ante-BayesA, ante-BayesB) across 20 replicates for six different levels of LD, comparing BayesA and ante-BayesA in (A) and BayesB versus ante-BayesB in (B). Significant differences in posterior means between competing methods at each LD level are indicated by *(P<0.01), **(P<0.001), or ***(P<0.0001).

As marker density increased, we also expected that the estimates of $\pi_g$ or $\pi_\delta$ should increase as well; that is, it becomes increasingly unlikely that an individual SNP marker is associated with a particular QTL at greater marker density. Indeed, we observed this in Figure 2.2. It was particularly interesting that the posterior means of $\pi_\delta$ were generally lower than those of $\pi_g$, with differences widening with increasing marker density (i.e., LD level), such that the differences were significant beyond r² = 0.24 (P<0.01). Note the subtle difference in interpretation between $\pi_g$ and $\pi_\delta$: $\pi_g$ pertains to the probability of non-association for the corresponding SNP, whereas $\pi_\delta$ pertains to the probability of non-association conditional on a neighboring SNP.

Figure 2.2: Average posterior means of $\pi_g$ (BayesB) versus $\pi_\delta$ (ante-BayesB) across 20 replicates as a function of six different LD levels. Significant differences in posterior means between competing methods at each LD level are indicated by *(P<0.01), **(P<0.001), or ***(P<0.0001).

The estimates of $\nu_g$ and $\nu_\delta$ also changed as a function of marker density, for ante-BayesA versus BayesA in Figure 2.3A and for ante-BayesB versus BayesB in Figure 2.3B. Specifically, the posterior means of $\nu_g$, and particularly of $\nu_\delta$, both decreased with increasing marker density. Since these parameters, respectively, characterize the heterogeneity of $\sigma^2_{g_j}$ and $\sigma^2_{\delta_j}$ across SNP or, alternatively, the heaviness of the tails of the resulting marginal Student t distributions on $g_j$ and $\delta_j$ across SNP, our results imply that these hierarchical methods, and particularly those based on nonstationary first-order antedependence correlation structures, identify SNP with large effects as being more outlying relative to a normal distribution when marker density increases. However, these differences between $\nu_g$ and $\nu_\delta$ were not seen to be statistically significant at any marker density.

Figure 2.3: Average posterior means of $\nu_g$ (BayesA, BayesB) and $\nu_\delta$ (ante-BayesA, ante-BayesB) across 20 replicates for six different levels of LD, comparing BayesA and ante-BayesA in (A) and BayesB versus ante-BayesB in (B).
No significant differences (P>0.01) were determined between the two sets of competing procedures at each LD level.

Figures A2.1 and A2.2 (see Appendix A2) show, respectively, the average posterior means for $\mu_t$ and $\sigma_t^2$ against LD level across the 20 replicates under both ante-BayesA and ante-BayesB. There was no evidence (P>0.01) across these 20 replicates that the posterior means of $\mu_t$ were different from zero at any LD level; however, at higher LD levels, the posterior means tended to converge to zero, as anticipated. Similarly, Figure A2.2 showed that the posterior estimates for $\sigma_t^2$ were also lower at higher LD levels. Again, this was somewhat anticipated, since there should be less disparity among the values of the antedependence parameters ($t_{j,j-1}$) between adjacent markers with increasing marker density.

The average accuracies of the EBV over the 20 replicated datasets are plotted as a function of the average r² (i.e., the different marker densities) between adjacent markers for the four different methods in Figure 2.4. As anticipated, given the simulated genetic architecture of few QTL, the accuracies for the BayesB methods were consistently greater than those of their corresponding BayesA counterparts at all marker densities. Also, ante-BayesA and ante-BayesB outperformed their classical counterparts, with differences increasing with LD level. Specifically, ante-BayesA had significantly greater accuracies compared to conventional BayesA, as did ante-BayesB compared to BayesB (P<0.01), when average LD levels exceeded r² = 0.24.

Figure 2.4: Average accuracies of estimated breeding value across 20 replicates for analyses based on each of six LD levels. Differences in accuracy between BayesA and ante-BayesA (bottom symbols) and between BayesB and ante-BayesB (top symbols) indicated as significant by *(P<0.01) or **(P<0.001).

We anticipated that the antedependence parameters $t_{j,j-1}$ would have greater importance at higher marker densities. To demonstrate this, we standardized the posterior means of these parameters as a ratio over their posterior standard deviations, i.e., $\tilde{t}_{j,j-1} = E(t_{j,j-1} \mid \mathbf{y}) \big/ \sqrt{\text{var}(t_{j,j-1} \mid \mathbf{y})}$, for each analysis. We then determined the proportion of these $\tilde{t}_{j,j-1}$ whose absolute value exceeded an arbitrary value of 2 for each data replicate and marker density analysis to indicate the relative importance of these antedependence parameters. We present boxplots of these proportions across the 20 replicates for ante-BayesA and for ante-BayesB in Figure A2.3 (see Appendix A2). We anticipated and noted that a higher proportion of the $\tilde{t}_{j,j-1}$ exceeded 2 in datasets characterized by higher marker densities, thereby indicating that, in general, nonstationary serial correlation between adjacent markers becomes increasingly important with higher levels of LD. We believe this phenomenon is responsible for driving the differences in accuracies between ante-BayesA (ante-BayesB) and BayesA (BayesB) with increasing LD levels, as seen earlier in Figure 2.4.

Hierarchical methods that are similar to BayesB, in that they jointly infer upon all SNP effects, have been increasingly advocated as tools for GWAS (Hoggart, Whittaker et al. 2008; Lee, van der Werf et al. 2008; Logsdon, Hoffman et al. 2010). Figure A2.4 (see Appendix A2) shows the average (across 20 replicates) posterior mean probabilities of identifying the largest QTL by genetic variance within each replicate using BayesB and ante-BayesB, respectively.
These estimated posterior probabilities increased with LD level for both models but were significantly greater for ante-BayesB than for BayesB, with statistical significance also increasing with LD or marker density. That is, the precision for detecting QTL was increasingly greater for ante-BayesB compared to BayesB at higher LD levels. We observed this consistently across data replicates, with the ability of ante-BayesB to better track causal variants increasing with marker density (see Appendix A2, Figure A2.5).

2.3.2 Application to Heterogeneous Stock Mice data

We summarize posterior inferences of key parameters using BayesA and BayesB in Table A2.1 and for their antedependence counterparts in Table A2.2 (see Appendix A2) for the heterogeneous stock mice data. Inferences on $\sigma_u^2$, $\sigma_c^2$, and $\sigma_e^2$ were consistent with results previously reported by LEGARRA et al. (2008). As expected from our simulation study, the estimates for $\nu_g$ ($\nu_\delta$) and $s_g^2$ ($s_\delta^2$) were substantially greater for BayesB (ante-BayesB) than for BayesA (ante-BayesA). Although the posterior mean for $\pi_g$ of 0.81 (BayesB) was only slightly larger than that for $\pi_\delta$ of 0.80 (ante-BayesB), the posterior mean of $s_\delta^2$ was substantially larger in ante-BayesB compared to $s_g^2$ in BayesB. The average estimates ± empirical standard errors of the predictive ability correlations over the 20 cross-validation partitions of training and validation data subsets were 0.57±0.01, 0.62±0.01, 0.60±0.01 and 0.66±0.01 for BayesA, BayesB, ante-BayesA and ante-BayesB, respectively. The differences between BayesA and ante-BayesA and between BayesB and ante-BayesB were both determined to be statistically significant (P<0.005), indicating the relative advantage of the antedependence methods. Furthermore, BayesB and ante-BayesB had significantly greater predictive abilities than BayesA and ante-BayesA, respectively (P<0.001).

2.3.3 Application to Hickey and Gorjanc Data

Average posterior means for key hyperparameters for each of the four methods across the nine replicates are provided in Table A2.3, whereas the corresponding average ESS are provided in Table A2.4 (see Appendix A2). Estimates of $\pi_g$ ($\pi_\delta$) and $s_g^2$ ($s_\delta^2$) were lower, whereas estimates of $\nu_g$ ($\nu_\delta$) were higher, for traits with higher numbers of QTL (Traits 1 and 3) compared to those with lower numbers of QTL (Traits 2 and 4), relative to the same number of markers. A side-by-side comparison of the accuracies of the four methods across the validation generations (6, 8, and 10) is provided in Figure 2.5. It is remarkable to note that ante-BayesA had generally significantly greater accuracies than BayesA for Traits 1 and 3 (larger numbers of QTL), an advantage that was still maintained in Generation 10, whereas ante-BayesB had generally significantly greater accuracies than BayesB for Traits 2 and 4 (lower numbers of QTL), but only in Generations 6 and 8. An assessment of the bias of the four procedures, based on regressing TBV on EBV, is provided in Figure A2.6 (see Appendix A2). For all traits, all four methods had some significant bias in Generation 6 but not in later generations.

Figure 2.5: Boxplots of average accuracies of estimated breeding value across 9 replicates for four traits in Generations 6, 8 and 10 for benchmark data from Hickey and Gorjanc (2012).
Differences in accuracy between ante-BayesB (black) and BayesB (dark gray) and between ante-BayesA (light gray) and BayesA (white) indicated as significant by *(0.05

2.4 Discussion and Conclusion

We anticipate that the relative advantages of the proposed antedependence models over their conventional counterparts should be even greater for the higher density SNP panels (>500,000) that are being developed for livestock or for situations where there is sequence data (MEUWISSEN and GODDARD 2010). Along those lines, we anticipate that these methods would also perform better in populations where LD between markers is greater due to other phenomena, e.g., selection history. Our simulation studies were also based on a particular genetic architecture, i.e., 30 QTL that were randomly distributed throughout a 1 M chromosome (or, equivalently, 900 QTL for a 30 M genome). Although this is not the focus of our paper, we realize that genetic architecture (i.e., number of QTL, average QTL substitution effect, marker density, etc.) can impact the relative merit of BayesA, BayesB, and GBLUP, based on other studies where key hyperparameters such as $\pi_g$, $\nu_g$ and $s_g^2$ are arbitrarily specified to be known (Daetwyler, Pong-Wong et al. 2010; Meuwissen and Goddard 2010). That is, the greater the number of QTL, each with small effects, relative to the number of SNP markers, the more likely the genetic architecture reflects the GBLUP assumptions ($\pi_g = 0$ and $\nu_g \to \infty$ such that $\sigma^2_{g_j} = s_g^2 \;\forall j$). Conversely, BayesB would be favored in the situation where SNP marker density is high relative to the number of QTL ($\pi_g > 0$). However, we believe that formal comparisons in data fit between BayesA, BayesB, and GBLUP, along with ante-BayesA and ante-BayesB, are not entirely necessary, since ante-BayesB represents the most general model. As previously noted, BayesA is a special case of ante-BayesA, as is BayesB of ante-BayesB, when $\mathbf{T} = \mathbf{0}$, in which case $\pi_\delta = \pi_g$, $\nu_\delta = \nu_g$, and $s_\delta^2 = s_g^2$. Furthermore, BayesB becomes BayesA as $\pi_g \to 0$, whereas BayesA becomes GBLUP as $\nu_g \to \infty$. Nevertheless, our claim that one only needs to fit ante-BayesB, rather than any of the other three competing submodels, vitally depends upon reliable inferences being provided on these key hyperparameters defining genetic architecture, rather than arbitrarily specifying them (Daetwyler, Pong-Wong et al. 2010; Meuwissen and Goddard 2010) or estimating only a subset thereof (Habier, Fernando et al. 2011). We provide details on MCMC inference strategies for these and other unknown parameters in Appendix A1. We are currently pursuing more suitable inferential strategies for variable selection (O'HARA and SILLANPAA 2009) when inferring upon $\pi_g$ or $\pi_\delta$. Also, although our proposed antedependence methods seem to work well under additive genetic model assumptions, it is not clear how well they may perform in the presence, for example, of extensive non-additive gene action, where nonparametric approaches may be warranted (Gianola, Wu et al. 2010). Nevertheless, even in the extensive presence of such phenomena, genetic variance is still considered to be primarily additive (Hill, Goddard et al. 2008).

Although the scope of this work was focused on the potential merit of these antedependence models for WGP, we suggested earlier that there may also be merit in using these models for GWAS in both livestock and human populations. It has become increasingly recognized that GWAS procedures based on joint analyses of all SNP markers are more powerful than the conventional series of single-SNP analyses. Our results suggest that modeling nonstationary correlations between SNP effects will further augment this power.
At any rate, we recognize that for reasonably accurate GWAS, that a greater marker density (m) per chromosome and sample size (n) should be considered (e.g., MEUWISSEN and GODDARD 2010) than those studied in this paper; i.e., most of the posterior probabilities reported in Appendix Figure A2.4 and Figure A2.5 are too low to be of practical benefit in current applications. We also acknowledge that our ante-BayesA and ante-BayesB models increase the computational load relative to their conventional counterparts. Since m is typically large, the computing time for the proposed antedependence models is bottlenecked primarily by the m elements of δ , the m diagonal elements of ∆ and the m-1 non-zero elements of T. Similarly, computing time for the two conventional methods, BayesA and BayesB, is primarily restricted by the dimension of δ and ∆ ; i.e., roughly 2/3 as many variables for the antedependence-based models, ignoring the remaining parameters such as variance components and hyperparameters. Hence, the computing time for the antedependence based procedures should be somewhat less than 1/3 greater than for their conventional counterparts. Indeed, we discovered from our simulation study that computing time for all four competing models were linear in m with the antedependence based models taking less than 30% greater computing time compared to the conventional counterparts for the wide range of values of m considered in this paper. We recognize for much larger number of SNP markers, than those pursued in this study, that alternative algorithmic adaptations already developed for models similar to conventional BayesA or BayesB, 35 such as those based on the EM algorithm (Shepherd, Meuwissen et al. 2010) or variational Bayes (Logsdon, Hoffman et al. 2010), would be worth exploring. We believe the proposed antedependence models provide opportunities for further study and extension. For example, it has been previously recognized that basing inferences on allelic effects on the use of multiple marker haplotypes rather than single markers increases accuracy of WGP (Calus, Meuwissen et al. 2008; Villumsen, Janss et al. 2008) or GWAS (Grapes, Dekkers et al. 2004). Given the difficulty in how to appropriately specify these haplotypes, we believe our antedependence-based methods may help bridge these two different strategies as the effects of adjacent SNP markers connected by large values of t j , j −1 may somewhat determine “effective haplotype” effects. We also think that our antedependence specifications might facilitate multiple breed inference if, for example, genomic effect differences between breeds is primarily due to differences in SNP associations with QTL, as partly manifested in T 36 Chapter 3 Improving the computational efficiency of fully Bayes inference and assessing the effect of misspecification on hyperparameters in whole genome prediction model 3.1 Introduction Genomic predictions based on high density single nucleotide polymorphisms (SNP) markers distributed over the whole genome have become increasingly adopted for animal and plant breeding. Parametric Bayesian methods have been particularly popular, most notably BayesA and BayesB as first presented by MEUWISSEN et al. (2001). BayesB specifies a mixture prior on the SNP specific effects having point mass at zero with probability π or randomly drawn, with probability (1- π), from a Student t distribution with degrees of freedom v and scale parameter s2; BayesA is BayesB with π = 0. 
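As a concrete illustration of this mixture prior, SNP effects under BayesB can be drawn as in the short R sketch below (illustrative values only; ν = 4.2 and s² = 0.04 are the specifications of Meuwissen et al. (2001) noted later in this chapter, and pi0 denotes the point-mass probability π of this paragraph):

# BayesB prior on SNP effects: zero with probability pi0 (point mass), otherwise a draw
# from a Student t with df v, scaled by sqrt(s2). Purely illustrative values.
set.seed(1)
m   <- 2000     # number of SNP effects
pi0 <- 0.95     # hypothetical probability of a zero effect
v   <- 4.2      # degrees of freedom
s2  <- 0.04     # scale parameter
g   <- ifelse(runif(m) < pi0, 0, sqrt(s2) * rt(m, df = v))
mean(g == 0)    # close to pi0; the non-zero effects are heavy-tailed

Equivalently, each non-zero effect can be drawn hierarchically as g_j | σ²_gj ~ N(0, σ²_gj) with σ²_gj ~ χ⁻²(ν, νs²), which is the representation used in the computational strategies described below.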
Hence π is typically believed to be the proportion of SNPs that are not associated or in linkage disequilibrium (LD) with causal variants although this interpretation is somewhat complicated by the existence of LD. These hyperparameters (v, s2 and π) are relevant in that they partly determine the genetic architecture of traits and can be further shown to depend upon SNP marker densities used in the analyses (YANG and TEMPELMAN 2012) . Now inference in BayesA/B like models is conducted using either Markov Chain Monte Carlo (MCMC) methods for fully Bayesian inference or faster albeit approximate methods based on the use of the expectation maximization (EM) algorithm or its various derivatives (Shepherd, Meuwissen et al. 2010). Unfortunately, it has not been readily established how to properly infer upon these hyperparameters in the EM based methods such that they are often arbitrarily “tuned” or specified (KARKKAINEN and SILLANPAA 2012). Furthermore, although it is possible to infer upon these same hyperparameters 37 using MCMC, the poor efficiency and speed of these implementations have seemingly discouraged this practice (de los Campos, Hickey et al. 2013). In particular, it has been noted that the correlation between v and s2 across MCMC cycles is generally so large that these two hyperparameters are nearly non-identifiable from each other (Habier, Fernando et al. 2011; de los Campos, Hickey et al. 2013). This particular MCMC analysis was based on a strategy first presented by Yi and Xu (2008) that invokes a Gibbs update for the full conditional density (FCD) on s2, as it is conditionally conjugate with a Gamma prior, whereas a Metropolis Hastings (MH) update was used on sampling from the FCD of v since it is not recognizable (YI and XU 2008). We label this particular algorithm as DFMH (i.e., sampling v using MH) and it is the control or reference strategy for this paper. Now computational efficiency in MCMC schemes is related to the degree of mixing or autocorrelation between subsequent samples of the same parameter. The most popular metric for inferring the degree of mixing or autocorrelation for a fixed number of MCMC cycles is the effective sample size (ESS), which can be readily computed using software packages like CODA (Plummer, Best et al. 2006). The ESS determines the effective number of independent draws such that a greater degree of autocorrelation between subsequent samples for the same parameter would lead to a smaller ESS and hence poorer computational efficiency. Now, although there are clear exceptions, MCMC sampling strategies that lead to a greater ESS for a certain total number of MCMC cycles tend to have greater computational cost per cycle. This realization is reflected in other recent quantitative genetics applications (Shariati and Sorensen 2008; 38 Waagepetersen, Ibanez-Escriche et al. 2008) who derived various metrics to integrate together these two components of computational efficiency. We surmised that there may be a number of strategies that could improve the computational efficiency of inferring upon key hyperparameters in a BayesA/B WGP model compared to DFMH. Furthermore, the efficiency of any such strategy could markedly depend on the use of an appropriate scale. For example, a highly nonlinear relationship between two variables can be rendered somewhat linear by transforming either one or both of the corresponding parameters. 
When ν and s² are both log-transformed, the resulting scatterplot of the transformed variables against each other tends to demonstrate a more linear relationship. Hence this change of variables might facilitate potentially more efficient MCMC sampling strategies based, for example, on multivariate proposal densities.

Specification of the key hyperparameters in BayesA/B WGP models has been treated arbitrarily across a wide range of genomic selection studies. Meuwissen et al. (2001) chose ν = 4.2 and s² = 0.04 for their BayesB model based on population genetics arguments in a simulation study. Daetwyler et al. (2010) set both ν and s² to 1 in their BayesB analyses across all simulation scenarios. It seems more reasonable to choose different specifications of the key hyperparameters across different situations, since their estimates depend on many factors, such as the marker density used in the analysis (YANG and TEMPELMAN 2012).

There were two primary objectives in this study. First, we wanted to explore alternative strategies to improve the computational efficiency of estimating hyperparameters in BayesA/B WGP models. Second, given the prevalent practice of specifying rather than estimating these hyperparameters, we wanted to assess the impact of misspecifying these hyperparameters on accuracy of breeding value prediction.

3.2 Materials and Methods

3.2.1 WGP Model

The WGP model used for comparison of the various computational strategies and/or hyperparameter specifications can be denoted as follows:

y_i = x_i′β + Σ_{j=1}^{m} z_ij g_j + e_i .        [1]

Here y_i is the phenotype for the ith animal (i = 1, 2, …, n), β is a vector of fixed effects such that x_i′ is the known incidence row vector connecting y_i to β, z_ij is the genotype covariate for SNP j on animal i coded as 0, 1, or 2 copies of a reference allele, g_j is the random effect of SNP j, and e_i is the residual. The WGP model in matrix algebra notation can be written as:

y = Xβ + Zg + e ,        [2]

where X = {x_i′}_{i=1}^{n}, Z = {z_ij}, and g = {g_j}_{j=1}^{m} ~ N(0, G) with variance-covariance matrix G = diag{σ²_gj}_{j=1}^{m}, and the residual vector is e = {e_i}_{i=1}^{n} ~ N(0, Iσ²_e).

We compared three sampling strategies under BayesA and BayesB specifications (Meuwissen, Hayes et al. 2001) on σ²_gj in the WGP model. In BayesB, σ²_gj has a mixture prior of two components: a scaled inverted chi-square distribution σ²_gj ~ χ⁻²(ν, νs²) with probability π and a spike at 0 with probability (1 − π). Here π loosely represents the proportion of SNP markers having associated genetic effects on the phenotype. BayesA is a special case of BayesB when π = 1. Following Yang and Tempelman (2012), we specify the following prior distributions on the hyperparameters: p(ν) ∝ (ν + 1)⁻² and the Gelman prior s² ~ χ⁻²(−1, 0) (GELMAN 2006) for BayesA, and a proper conjugate prior s² ~ Gamma(0.1, 0.1) together with π ~ p(π | α_π, β_π) = Beta(α_π = 1, β_π = 8) for BayesB. For all three computational strategies that we subsequently describe, we adopt the same commonly used MCMC strategies for sampling all parameters/random variables other than ν, s² and π, as outlined, for example, by Meuwissen et al. (2001). We now describe each of the three computational strategies in turn.

3.2.2 Univariate Metropolis-Hastings sampling on ν and Gibbs update on s² (DFMH)

This strategy, which we designate as DFMH, closely follows Yi and Xu (2008).
The FCD of ν does not have a recognizable form; hence sampling from this FCD requires a strategy other than a Gibbs step. Here, we used the MH algorithm to sample from the FCD of ν drawing from our experiences in various other applications (Kizilkaya, Carnier et al. 2003; Kizilkaya and Tempelman 2005; Bello, Steibel et al. 2010; Yang and Tempelman 2012). More specifically, we generate from the FCD of ξ = log(ν ) , ensuring that the FCD of ξ takes into account the Jacobian of the transformation from ν to ξ (see Appendix B1.1). Since ξ can conceptually be defined 41 anywhere on the continuous real line, we believe this transformation better justifies the use of a Gaussian proposal density centered on the value of ξ from the previous MCMC cycle; i.e., a random walk MH step (CHIB and GREENBERG 1995); alternatively, a heavier-tailed Student t proposal density (CHIB and GREENBERG 1995) could be used as well. During the first half of burn-in, we adaptively tune the variance of this proposal density such that the MH acceptance ratios are intermediate (i.e., 25-75%) adapting the strategy described by Muller (1991) and in accordance with standard recommendations (Gelman, Carlin et al. 2003; Carlin and Louis 2008). This proposal density variance was then fixed for the last half of burn-in in order to ensure a proper convergent MCMC 2 algorithm. Yi and Xu (2008) demonstrated that the FCD of s is Gamma, provided that a conditionally conjugate Gamma or noninformative prior is used. Using the Gelman prior 2 (GELMAN 2006) for s as we have previously advocated for BayesA (YANG and TEMPELMAN 2012), the FCD of s can be shown to be Gamma with shape 0.5 ( mν + 1) 2 m and scale 0.5ν ∑ s g2 j . Hence, for DFMH, we sampled ν using the described MH update j=1 2 and s with a Gibbs update. In DFMH, we sampled 10 MH samples per MCMC cycle for ν . 3.2.3 Univariate Metropolis Hastings sampling for each of ν and s (UNIMH) 2 Metropolis Hastings sampling, if properly tuned with good proposal densities and intermediate acceptance rates, can often lead to faster mixing and hence greater MCMC efficiency relative to Gibbs sampling. This is because MH sampling typically proposes bigger jumps throughout the posterior density compared to the use of Gibbs sampling. 42 Hence, we propose a second strategy, UNIMH, whereby we again use MH to sample from ν but also use MH to sample from s . As with ν in DFMH, we sample s by first 2 2 ( ) a change of variable to its logarithm (i.e., ψ = log s ) and use a random walk MH 2 algorithm based on a Gaussian proposal density for ψ . Similar to what was done for ν , the variance of this proposal density was only tuned for intermediate acceptance rates during the first half of burn-in to ensure a properly convergent MCMC chain. In UNIMH, 10 MH samples per MCMC cycle were specified for sampling ν and s . Details on this 2 strategy are further provided in Appendix B1.2. 2 3.2.4 Bivariate Metropolis Hastings sampling on ν and s (BIVMH) 2 As previously noted, the posterior correlation between ν and s can be high; hence, it might be advantageous to jointly sample both parameters together with a bivariate random walk MH sampler as demonstrated with another application by Ntzoufras (2011). Hence, we propose a third sampling algorithm that we label BIVMH. 
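To fix ideas, the R sketch below (a simplified stand-alone illustration, not the dissertation's implementation; the function names are ours) shows the two updates that DFMH alternates between: a random-walk MH step on ξ = log(ν) including the Jacobian of the transformation, and the Gamma-based Gibbs update on s² under the Gelman prior for BayesA, with the second Gamma parameter taken here as a rate, 0.5ν Σ_j σ_gj⁻², which is our reading of the expression given above. UNIMH simply replaces the latter with an analogous random-walk MH step on log(s²), and BIVMH, described next, proposes the two log-scale parameters jointly.

# Sketch of the DFMH hyperparameter updates, conditioning on the current SNP-specific
# variances sig2g (for BayesB, only the currently non-zero variances would enter these sums).

log_fcd_nu <- function(nu, s2, sig2g) {
  # kernel (in nu) of prod_j p(sig2g_j | nu, s2) under sig2g_j ~ inv-chi^2(nu, nu*s2),
  # times the prior p(nu) proportional to (nu + 1)^-2
  m <- length(sig2g)
  m * (0.5 * nu * log(0.5 * nu * s2) - lgamma(0.5 * nu)) -
    0.5 * nu * sum(log(sig2g)) - 0.5 * nu * s2 * sum(1 / sig2g) - 2 * log(nu + 1)
}

mh_update_nu <- function(nu, s2, sig2g, sd_prop = 0.3) {
  xi      <- log(nu)
  xi_star <- rnorm(1, xi, sd_prop)       # Gaussian random-walk proposal on xi = log(nu)
  # the "+ xi" terms are the log-Jacobian of the change of variable from nu to xi
  log_acc <- (log_fcd_nu(exp(xi_star), s2, sig2g) + xi_star) -
             (log_fcd_nu(exp(xi), s2, sig2g) + xi)
  if (log(runif(1)) < log_acc) exp(xi_star) else nu
}

gibbs_update_s2 <- function(nu, sig2g) {
  # Gamma FCD of s2 under the Gelman prior (BayesA case); rgamma parameterized by shape and rate
  m <- length(sig2g)
  rgamma(1, shape = 0.5 * (m * nu + 1), rate = 0.5 * nu * sum(1 / sig2g))
}

# one DFMH cycle for the hyperparameters (10 MH draws of nu per MCMC cycle, as in the text):
# for (k in 1:10) nu <- mh_update_nu(nu, s2, sig2g)
# s2 <- gibbs_update_s2(nu, sig2g)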
Here, we divided the burn-in period for this strategy into four stages of equal lengths with respect to the number of MCMC cycles; arguably, a more efficient implementation might be possible given that these stages may not necessarily need to be of the same length. In Stage 1, we sampled log(ν ) and log( s ) from their respective FCD using the UNIMH 2 strategy previously described, fine-tuning the variances of the two separate Gaussian proposal densities to ensure MH acceptance rates falling between 25% and 75%. In Stage 2, we sampled log(ν ) and log( s 2 ) using UNIMH, fixing the variances of their respective proposal densities to those values tuned at the end of Stage 1 while computing 43 the empirical correlation between the samples of log(ν ) and log( s ) drawn within the 2 same cycle. In Stage 3, log(ν ) and log( s ) were jointly sampled together using a 2 bivariate Gaussian proposal density with variances based on those tuned at the end of Stage 1 and a covariance based on the correlation computed from Stage 2. During Stage 3, we further fine-tuned the proposal variances to ensure intermediate acceptance rates for joint samples of log(ν ) and log( s ) with the proposal covariance based on the same 2 correlation derived in Stage 3. In Stage 4, we drew samples using the same joint MH random walk from the newly tuned bivariate Gaussian proposal density in Stage 3 but without further tuning in order to ensure a proper convergent MCMC chain. Upon the end of Stage 4, and hence burn-in, we saved samples for the hyperparameters of ν and s 2 (i.e., back-transformed) for MCMC-based fully Bayesian inference. Ten MH samples per MCMC cycle for ν and s were drawn at each Stage. Details on this strategy are 2 further provided in Appendix B1.3. 3.2.5 Simulation Study In order to compare the efficiency of the three sampling strategies, DFMH, UNIMH and BIVMH under BayesA and BayesB modeling specifications, we simulated 15 replicated datasets using the HaploSim package in R (Coster, Bastiaansen et al. 2010). The simulated genome was composed of one chromosome of length 1 Morgan consisting of 100,000 equally spaced loci. For each of the 100 animals in the base population, every 5th locus on this chromosome was heterozygous (i.e., for a total of 20,000 such loci) whereas the remaining 80,000 loci were completely monomorphic, similar to that in Coster et al. (2010). Individuals were randomly mated to generate 100 animals within 44 each of 6000 subsequent generations in order to generate LD between loci. The number of recombinations per each meiosis event was drawn from a Poisson(1) distribution with the position of each recombination being randomly drawn from a uniform distribution on the chromosome (i.e., no interference). Furthermore, we specified the recurrent mutation rate to be 10-5 per locus per generation. After Generation 6000, random matings were used to augment the population size to 1000 individuals in Generation 6001. In Generation 6001, we deleted loci with a minor allele frequency (MAF) less than 0.05 and randomly selected 30 from the remaining loci to be quantitative trait loci (QTL). Following Meuwissen et al. (2001), we simulated substitution effects α for these 30 QTL from a reflected gamma distribution with shape parameter 0.4 and scale parameter 1.66 such that the true breeding values (TBV) were genotype-based linear combinations of α . Phenotypes for animals in generation 6001 were generated based on heritability of 50%; i.e., such that s e2 = var(TBV). 
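The last step of this data-generating process can be sketched in R as follows (illustrative only: a made-up QTL genotype matrix stands in for the HaploSim-generated genotypes):

set.seed(2)
n <- 1000; n_qtl <- 30; h2 <- 0.5
Zqtl  <- matrix(rbinom(n * n_qtl, 2, 0.3), n, n_qtl)       # stand-in 0/1/2 QTL genotypes
alpha <- sample(c(-1, 1), n_qtl, replace = TRUE) *
         rgamma(n_qtl, shape = 0.4, scale = 1.66)          # reflected gamma substitution effects
tbv   <- drop(Zqtl %*% alpha)                              # true breeding values
sig2e <- var(tbv) * (1 - h2) / h2                          # equals var(tbv) when h2 = 0.5
y     <- tbv + rnorm(n, 0, sqrt(sig2e))                    # phenotypes for Generation 6001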
Additionally, genotypes for 1000 offspring in generation 6002 were based on random matings of individuals in Generation 6001. Again, TBV were based on linear combinations of α based on QTL genotypes inherited from Generation 6001. After discarding SNP with MAF< 0.05, we then selected every 1st, 4th and 10th SNP markers for inclusion in analyses in order to consider the effect of different marker densities; i.e., high (around 2394 SNPs), medium (around 598 SNPs), and low (around 239 SNPs). We compared the computational efficiency of inferring on key hyperparameters (e.g., v and s2) for genetic architecture under these three different marker densities. We ignored fitting polygenic effects for all comparisons in the 45 simulation study to facilitate further computational feasibility, recognizing that the relative efficiency of each strategy should not differ otherwise. We compared the computational efficiencies of the three MCMC strategies on each replicated dataset, considering each of three different marker densities. Now computational efficiency, as it pertains to a particular hyperparameter, was defined as the effective sample size (ESS) for the post-burn-in MCMC cycles divided by total CPU time; i.e. ESS/CPU recorded in #/seconds. That is, the greater ESS/CPU, the greater the computational efficiency for inferring the posterior density of that particular hyperparameter. Given that many researchers do not infer upon some or even all hyperparameters in WGP models because of perceived inferential challenges, we thought it important to assess the impact of their misspecification on the accuracy of genomic prediction. Using the same simulated data as described previously, we focused on five different scenarios, all at the medium marker density (selecting every 4th marker). Each scenario was based on setting s2 to be an arbitrary multiplicative constant of the average posterior mean at the 2 medium marker density ( smed ) based on the complete (BayesA or BayesB) model that was used to infer upon the other hyperparameters (ν and π where applicable) as well. 2 2 2 These five scenarios were to set 1) s2 = smed , 2) s2 = 0.1 smed , 3) s2 = 0.01 smed , 4) s2 = 10 2 2 2 , and 5) s2 = 100 smed . Note that the specification of smed depended upon which smed model (BayesA or BayesB) was employed, as described later. We also wondered if one could roughly specify a good working value for s2 by merely basing it on an estimate derived from, say, analysis of the same phenotype but 46 based on a SNP panel with a different marker density. As s2 represents a typical value for the SNP-specific variances s g2 j , then its value should be inversely related to the number of SNP markers as we have observed previously (YANG and TEMPELMAN 2012). For example, given that there were four times as many markers at the high marker density as there were at the medium marker density in the simulation study, an initial specification 2 for s2 at the high marker density is to use s2 = 0.25 smed . Similarly, since there were half as many SNP markers for the low marker density specification, an initial specification in 2 . These specifications for s2 were compared for their effect that case might be s2 = 2 smed on accuracy of breeding value prediction relative to the situation where s2 is inferred upon along with all other hyperparameters under both BayesA and BayesB for all 15 replicated datasets. 
In all cases, accuracy was defined as correlation between estimated breeding values (EBV) and TBV where EBV =Z g for g is the posterior mean of g and TBV is defined as before. 3.2.6 Data Application: Assessment of computational efficiency comparisons In this dataset, 2,296 mice were genotyped for 12,147 SNP markers with a high pairwise LD of r2=0.6 (Legarra, Robert-Granie et al. 2008). After data cleaning on genotypes (YANG and TEMPELMAN 2012), there were 1940 animals with 10,467 SNP markers. We selected 50, 100 and 200 SNP markers from each of the 19 autosomes to create three different marker densities using pre-corrected body weight at 6 weeks as our phenotypes. As in Yang and Tempelman (2012), we also modeled the random effects of cage in addition to SNP effects and polygenic effects in the WGP model using the 47 2 2 Gelman prior (GELMAN 2006) specified on the cage s c and the polygenic variance s u . After merging phenotypes with the genotypes, we were left with 1917 animals with complete phenotypes and genotypes on 950, 1900 and 3800 SNP markers across the 19 autosomes. 3.3 Results 3.3.1 Simulation Study By selecting every single, 4th and 10th SNP markers for inclusion, the average r2 between adjacent SNPs, was 0.17, 0.24 and 0.32 for the three marker densities over the 15 replicated datasets. Inferences on s2 in the three sampling strategies DFMH, UNIMH and BIVMH were compared under both BayesA (Appendix Figure B2.1A) and BayesB (Appendix Figure B2.1B) specifications. We observed estimates (posterior means) of s2 decreased as the marker density increased and that estimates derived from BayesB were generally one order of magnitude greater than those in BayesA. Furthermore, estimates of π generally increased (Appendix Figure B2.2) whereas estimates of v generally decreased as marker density increased (Appendix Figure B2.3). All of these results are consistent with our previous work (YANG and TEMPELMAN 2012). For quality control, we checked to see that the estimates for the key hyperparameters should be the same between the three computational strategies within each replicated dataset under the same model, allowing for Monte Carlo error. Pairwise scatterplots of the estimates of s2 under the three different strategies for each of the three different marker densities in BayesA (Appendix Figure B2.4) and in BayesB (Appendix Figure B2.5) indicated generally good agreement as did for π using BayesB (Appendix 48 Figure B2.6) and for ν under BayesA (Appendix Figure B2.7) and BayesB (Appendix Figure B2.8). The greatest differences between DFMH and UNIMH or between DFMH and BIVMH were found in estimating ν under BayesB (Appendix Figure B2.8). Figure 3.1 and 3.2 illustrate side-by-side boxplots of ESS/CPU for each of the three strategies under each of the three marker densities for v and s 2 , respectively, under the BayesA model. In all cases, ESS/CPU were higher for BIVMH and UNIMH compared to DFMH (P<0.05). For v at high marker density (r2=0.32), BIVMH had higher ESS/CPU than UNIMH (P<0.0001). For s 2 at high and median marker densities (r2=0.32 and 0.24), ESS/CPU were higher for BIVMH than UNIMH (P<0.0001). Interestingly, the differences in efficiencies between the three strategies widened as marker density increased. Efficiencies for the three alternative sampling strategies were also compared under BayesB model for v (Figure 3.3), s 2 (Figure 3.4), and π (Figure 3.5). 
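As a point of reference, the efficiency criterion summarized in Figures 3.1 through 3.5 can be computed as in the short R sketch below (the autocorrelated chain and the CPU time are stand-ins; the ESS is obtained with the CODA package noted earlier):

library(coda)
set.seed(3)
draws <- as.numeric(arima.sim(list(ar = 0.9), n = 40000))  # stand-in for post-burn-in draws of a hyperparameter
cpu_seconds <- 120                                         # stand-in for the recorded total CPU time
ess <- effectiveSize(mcmc(draws))                          # effective sample size (ESS)
unname(ess) / cpu_seconds                                  # ESS/CPU, recorded in #/second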
We found that UNIMH and BIVMH had significantly greater computational efficiencies compared to DFMH for all three hyperparameters (P<0.05). For v , UNIMH had significantly higher ESS/CPU compared to BIVMH at low marker density (P<0.05). For s 2 , UNIMH had significantly greater computational efficiencies compared to BIVMH at median marker density whereas BIVMH had higher ESS/CPU than UNIMH at low marker density (P<0.05). For π , BIVMH had higher ESS/CPU compared to UNIMH at high and median marker densities (P<0.05). 49 Figure 3.1: Boxplots of effective sample size for v divided by total CPU time in seconds across 15 replicates for three different levels of LD comparing DFMH, UNIMH and BIVMH under BayesA model. Different letters indicate significant difference with P < 0.05. 2 Figure 3.2: Boxplots of effective sample size for s divided by total computational time in seconds across 15 replicates for three different levels of LD comparing DFMH, UNIMH and BIVMH under BayesA model. Different letters indicate significant difference with P< 0.05. 50 Figure 3.3: Boxplots of effective sample size for v divided by total computational time in seconds across 15 replicates for three different levels of LD comparing DFMH, UNIMH and BIVMH under BayesB model. Different letters indicate significant difference with P < 0.05. Figure 3.4: Boxplots of effective sample size for s 2 divided by total computational time in seconds across 15 replicates for three different levels of LD comparing DFMH, UNIMH and BIVMH under BayesB model. Different letters indicate significant difference with P < 0.05. 51 Figure 3.5: Boxplots of effective sample size for π divided by total computational time in seconds across 15 replicates for three different levels of LD comparing DFMH, UNIMH and BIVMH under BayesB model. Different letters indicate significant difference with P< 0.05. We also separately looked at the components of computational efficiency; i.e., ESS and CPU/cycle in seconds for each parameter in both models and under all three strategies. As anticipated, DFMH was computationally less expensive in terms of CPU/cycle compared to the proposed strategies UNIMH and BIVMH; however, the ESS for the 40,000 MCMC cycles that were drawn in each analyses were such that UNIMH and BIVMH generally far exceeded that of DFMH. What was particularly ominous was how quickly the ESS measures degraded with increasing marker densities thereby suggesting that high density marker panels lead to analyses that require not only greater CPU/cycle but also a greater number of MCMC cycles to ensure that ESS values are sufficiently great enough to ensure reliable inference. We were interested as to whether accuracy of breeding value prediction might depend on misspecification of hyperparameters, say, s 2 . We assessed the impact on 52 accuracy of breeding value predictions based on setting s 2 to a wide range of values 2 based on various multiples (0.01x to 100x) of the average posterior mean ( smed =7x10-4 2 for BayesA, smed =4x10-2 for BayesB) across the 15 replicates under the medium marker density. For BayesA (Figure 3.6) we determined no significant difference in accuracies 2 2 and s 2 = 0.01 smed ); however, breeding value when s2 was understated (i.e., s 2 = 0.1 smed accuracies were significantly compromised when s 2 was overstated (P<0.01), particularly 2 at s2 = 100 smed (P<0.0001). For BayesB (Figure 3.7), we did not see any significant differences in accuracy of prediction between the various scenarios. 
Figure 3.6: Boxplots of accuracies of breeding value predictions under the BayesA model across 15 replicates at medium marker density (pairwise r² = 0.24) for s² set equal to different multiples of the average posterior mean of s² (s²_med = 7×10⁻⁴) in a fully Bayes analysis. Significant differences from 1 s²_med are indicated by * (P < 0.01) and *** (P < 0.0001).

Figure 3.7: Boxplots of accuracies of breeding value predictions under the BayesB model across 15 replicates at medium marker density (pairwise r² = 0.24) for s² set equal to different multiples of the average posterior mean of s² (s²_med = 4×10⁻²) in a fully Bayes analysis. Significant differences from 1 s²_med are indicated by * (P < 0.01) and *** (P < 0.0001).

We wondered if some of the non-significant differences in these comparisons could be partly attributed to compensation in the inferences on other hyperparameters, specifically ν and π (in BayesB). Indeed, we noted that as the specification on s² increased from 0.01 s²_med to 100 s²_med, the posterior means of ν also increased under both BayesA (Figure 3.8) and BayesB (Figure 3.9). This was somewhat anticipated given the high posterior correlation between these two hyperparameters. Note from the prior specification on σ²_gj that E(σ²_gj | σ²_gj > 0) = νs²/(ν − 2); that is, the average value of the MCMC draws of {σ²_gj}_{j=1}^{m} will be somewhat constrained by νs²/(ν − 2). So if s² is understated, the estimate of ν (ν > 2) should decrease accordingly to compensate, such that there is a good deal of flexibility in maintaining the value of νs²/(ν − 2). However, if s² is overstated, there is very little flexibility to bring down νs²/(ν − 2) by increasing ν, since νs²/(ν − 2) can never be less than s². We believe this is the reason why understating the value of s² is less serious than overstating it, at least for BayesA, as further indicated by our results. The misspecification also impacts estimates of π in BayesB, as further illustrated in Figure 3.10. This provides BayesB with even more flexibility than BayesA against misspecification of s²; that is, overstated values of s² merely concentrate the non-zero SNP effects {g_j}_{j=1}^{m} on a smaller number of markers, as reflected in the corresponding estimates of π. This may be a key reason why we noticed non-significant differences in accuracy of breeding value prediction between the various specifications of s² under BayesB in Figure 3.7. Misspecification of hyperparameters could then be another reason why BayesB often outperforms BayesA in many other comparisons.

Figure 3.8: Boxplots of posterior means and medians for ν under the BayesA model across 15 replicates at medium marker density (pairwise r² = 0.24) for s² set equal to different multiples of the average posterior mean of s² (s²_med = 7×10⁻⁴) in a fully Bayes analysis.

Figure 3.9: Boxplots of posterior means and medians for ν under the BayesB model across 15 replicates at medium marker density (pairwise r² = 0.24) for s² set equal to different multiples of the average posterior mean of s² (s²_med = 4×10⁻²) in a fully Bayes analysis.

Figure 3.10: Boxplots of posterior means and medians for π under the BayesB model across 15 replicates at medium marker density (pairwise r² = 0.24) for s² set equal to different multiples of the average posterior mean of s² (s²_med = 4×10⁻²) in a fully Bayes analysis.

We also wondered if estimates of s² based on analyses at a given marker density could be extrapolated to other marker densities for analysis of the same phenotypes.
Recall that s²_med = 7×10⁻⁴ for BayesA and s²_med = 4×10⁻² for BayesB with the medium marker density panel. For the high marker density panel, which involved four times as many markers, we specified s² = s²_med/4, whereas for the low marker density panel, which had roughly 2.5 times fewer markers, we specified s² = 2.5 s²_med. We found no significant differences in accuracies in any case (see Figures 3.11 and 3.12), except for a significantly lower accuracy when extrapolating s² to the higher marker density using BayesA (P = 0.04).

Figure 3.11: Accuracy of breeding value predictions under the BayesA model across 15 replicates at high and low marker densities (pairwise LD r² = 0.32 and 0.17) using DFMH (red), DFMH with fixed scale s² = 7×10⁻⁴/4 = 1.75×10⁻⁴ (green) at high marker density, and DFMH with fixed scale s² = 7×10⁻⁴ × 2.5 = 1.75×10⁻³ (blue) at low marker density. A significant difference in accuracy between DFMH (red) and DFMH with s² = 1.75×10⁻⁴ (green) was found at P < 0.05.

Figure 3.12: Accuracy of breeding value predictions under the BayesB model across 15 replicates at high and low marker densities (pairwise LD r² = 0.32 and 0.17) using DFMH (red), DFMH with fixed scale s² = 0.04/4 = 0.01 (green) at high marker density, and DFMH with fixed scale s² = 0.04 × 2.5 = 0.1 (blue) at low marker density. No significant differences in accuracy between any pair of methods were found.

3.3.2 Application to Heterogeneous Stock Mice data

We summarize posterior inferences for the key hyperparameters under BayesA and BayesB analyses of the heterogeneous stock mice data in Appendix Tables B2.1, B2.2 and B2.3 for the three marker densities: 950, 1900 and 3800 SNP. For the 950 SNP marker analysis (Appendix Table B2.1), posterior means of parameters were close to estimates previously provided for these same data by Yang and Tempelman (2012). For all analyses, the number of MCMC cycles post burn-in was the same. Under the BayesA model, the ESS for s² was twice as large using UNIMH and BIVMH compared to DFMH, while the ESS for ν was four times larger in UNIMH and BIVMH compared to DFMH; the ESS for s² and for ν were similar between UNIMH and BIVMH. Under the BayesB model, the ESS using UNIMH and BIVMH was 7 to 8 times greater for ν and about 2 times greater for s² and π than that for DFMH. For the 1900 marker panel (Appendix Table B2.2), the ESS for UNIMH and BIVMH relative to DFMH were between 10 and 14 times greater for ν and around 3 times greater for s² under BayesA; under the BayesB model, these ratios were respectively between 10 and 15 for ν and close to 2 for s² and π. For the 3800 marker panel (Appendix Table B2.3), these respective ratios were between 12 and 13 for ν and around 4 for s² using BayesA, whereas they were between 10 and 15 for ν, about 3 for s² and about 4 for π using BayesB.

3.4 Discussion

Most researchers do not typically infer upon the key hyperparameters (i.e., ν, s² and π) that partly determine the genetic architecture in BayesA/B WGP models. This is in part due to the high posterior correlation that exists between some of these hyperparameters, in particular ν and s² (Habier, Fernando et al. 2011; de los Campos, Hickey et al. 2013). Nevertheless, some (Riedelsheimer, Technow et al. 2012; Technow, Riedelsheimer et al. 2012; Technow and Melchinger 2013) have successfully used techniques previously presented by Yi and Xu (2008) and Yang and Tempelman (2012) to infer upon these hyperparameters; these techniques closely mirror the strategy labeled DFMH in this paper.
We considered two alternative sampling strategies to DFMH, each involving the use of MH, in an attempt to improve the computational efficiency of WGP models as measured by the ratio ESS/CPU. Using simulation studies and empirical data analyses, we demonstrated that strategies borrowing more heavily on MH sampling had better computational efficiencies compared to DFMH. Simple modifications such as sampling s 2 with a MH rather than a Gibbs step (UNIMH) or joint sampling of s 2 and v with a bivariate MH step (BIVMH) lead to substantial improvements in ESS/CPU. We concede that our investigation is not exhaustive with respect to assessing all possible strategies to improve computational efficiency in these models; in fact, there may be a hybrid involving some or all of the three presented sampling strategies that might be computationally more efficient. Deviations of MH sampling such as Langevin-Hastings could also have been explored and assessed here as well although its advantage relative to MH sampling has not been too convincing in other animal breeding models (Shariati and Sorensen 2008; Waagepetersen, Ibanez-Escriche et al. 2008). In other work that we do not report here, we attempted to base the covariance matrix for the proposal density in BIVMH on the negative Hessian of the joint FCD of log( v ) and log( s 2 ). 61 However, we determined this matrix to be positive definite generally only when v < 50, thereby negating its use in this way. Recently, non-MCMC (i.e. expectation-maximization) schemes have been increasingly popular; however, it is often not straightforward how to estimate key hyperparameters in these implementations (KARKKAINEN and SILLANPAA 2012). In any case, we encourage further development and work in this area including the Bayesian LASSO model (de los Campos, Naya et al. 2009). We have previously demonstrated that it may be advantageous to specify nonstationary correlation structures between adjacent SNP using a first-order antedependence specification (YANG and TEMPELMAN 2012). In work not reported here, we also evaluated the three alternative sampling strategies in the context of antedependence versions of BayesA and BayesB and drew conclusions virtually identical to what we draw here. Overspecifying s 2 appeared to have deleterious effects on accuracy of genomic selection using BayesA models although no such effect was observed in BayesB models likely due to the counteracting influence of π . It appeared that underspecification of s 2 lead to more robust genomic predictions as there is greater flexibility for inference on π and s 2 to compensate for this. We also determined that it may be reasonable to consider specifying values for s 2 for one marker density based on a previous estimate from another marker density by taking into account the direct inverse relationship between s 2 and marker density. At any rate, it should be fully appreciated that these hyperparameters should not be arbitrarily specified in BayesA models. We anticipate that these issues are also 62 pertinent to determining tuning parameters for various nonparametric approaches as well. We do recognize however computational challenges may be formidable for marker density panels that far exceed those that we considered in this paper. At the very least then, some hyperparameters should be estimated based on simple model-based approximations; for example, s 2 in BayesA should not be much different in magnitude from the variance component for SNP effects in a GBLUP (Meuwissen, Hayes et al. 
2001)analysis; hence, a REML-like estimator could be used to provide a reasonable specification. If this is deemed to be computationally intractable relative to the marker density, then extrapolations based on analyses based on lower marker densities might be pursued similar to those presented in this paper. 3.5 Conclusions In WGP Bayesian hierarchical models, log transformation and jointly drawing v and s 2 can improve MCMC efficiency for inference on all hyperparameters. Even separate univariate MH draws on v and s 2 is substantially more efficient than Gibbs sampling of s 2 . Overspecification of key hyperparameters s 2 can reduce accuracy of breeding value prediction under BayesA model. BayesB model is more robust to misspecification of s 2 due to inference on association probability π . However, it’s important to estimate all hyperparameters since misspecification of s 2 can lead to poor inference on v and π . 63 Chapter 4 Random regression and reaction norm extensions of whole genome prediction models to account for genotype by environment interaction 4.1 Introduction Whole genome prediction (WGP) has become a revolutionary process for selecting animals and plants for genetic merit on economically important traits using high density single nucleotide polymorphism (SNP) markers (Meuwissen, Hayes et al. 2001). Many WGP methods have been investigated to improve accuracy of breeding value (BV) prediction (de los Campos, Hickey et al. 2013). Meuwissen et al. (2001) proposed two hierarchical Bayesian methods, i.e. scaled-t density prior with and without point mass at zero, namely BayesB and BayesA, respectively. To infer upon key hyperparameters, fully hierarchical Bayesian WGP approaches based on BayesA have been developed and applied in many studies (Yi and Xu 2008; Jia and Jannink 2012; Yang and Tempelman 2012). Genotype by environment (G×E) interaction refers to how genotypes influence phenotypes differentially in different environments (FALCONER 1952). That is to say, the genetic merit, even ranking, of animals for certain quantitative traits could be substantially different across different environments. The existence of G×E has been found for various traits in various livestock and plant species (Deeb and Cahaner 2001; Berry, Buckley et al. 2003; Beerda, Ouweltjes et al. 2007; Bohmanova, Misztal et al. 2008; Knap and Su 2008; Hadjipavlou and Bishop 2009; Lillehammer, Hayes et al. 2009). Recently, it has been determined that some SNP and hence quantitative trait loci (QTL) effects are different across environments (Lillehammer, Arnyasi et al. 2007; 64 Lillehammer, Odegard et al. 2007; Lillehammer, Goddard et al. 2008; Lillehammer, Hayes et al. 2009). In fact, Lillehammer et al. (2007) determined in their analyses that some QTL may not have been otherwise inferred without allowing for G×E. However, little work has been considered to jointly model SNP effects across different environments under a WGP framework (Burgueno, de los Campos et al. 2012). Burgueno et al. (2012) adopted factor analytic models to account for G×E based on SNP and/or pedigree derived relationships. Their model did not consider information due to environmental covariates that might potentially drive G×E. If G×E is present, but is not considered in WGP models, then selection of animals for certain environments could be suboptimal. 
The existence of G×E further complicates the process of WGP validation; that is, sometimes the effects of markers estimated under one population (i.e., training set) are retested in another population or environment (i.e. validation set) (Daetwyler, Calus et al. 2013); a clear example is the use of parental genotypes and data as training data with progeny genotypes and data used as validation within the context of future environments as validation. If G×E effects are important, then this validation strategy may not work as intended. Random regression (RR) and reaction norm (RN) models have played an important role in detecting G×E of a linear or even higher order nature (Calus, Groen et al. 2002; Berry, Buckley et al. 2003; Calus and Veerkamp 2003; Mattar, Silva et al. 2011; Cardoso and Tempelman 2012; Streit, Reinhardt et al. 2012). RR models have been typically used for modeling genetic merit of traits with repeated measurements over time (Berry, Buckley et al. 2003) whereas RN models have been applied to quantitative traits where genetic merit is typically modeled as a function of key environmental covariate(s) 65 (Streit, Reinhardt et al. 2012). For both RR and RN models, BV can be modeled as function of an intercept, reflecting an average environment, and a linear function of a key environmental covariate (Calus, Groen et al. 2002). In QTL mapping studies, the QTL specific intercept and slope effects of environmental covariates have been modeled to account for G×E (Lillehammer, Arnyasi et al. 2007; Lillehammer, Odegard et al. 2007; Lillehammer, Goddard et al. 2008). In a genome wide association study (GWAS) focusing on the detection of G×E, SNP specific intercept and slope effects of environmental covariates have been modeled (Lillehammer, Hayes et al. 2009). With increasing availability of high marker densities in WGP, we develop genomic RR/RN models by specifying SNP substitution effects as random intercept and linear functions of age or environmental covariates in a manner similar to Streit et al. (2013). In “Bayesian alphabet” WGP models like BayesA or BayesB (MEUWISSEN and GODDARD 2010), SNP specific genetic variances are modeled. These variances are of no inherent interest but are used to specify a distribution for the SNP effects that are heavier tailed (e.g. Student t) than Gaussian. For RR/RN models used to specify G×E in WGP, 2x2 genetic variance-covariance matrices (VCV) of the SNP-specific intercepts and slopes are modeled. Conjugate priors on these trait-specific VCV, such as independent inverted Wishart (IW) densities, have been used for bivariate genomic analyses (CALUS and VEERKAMP 2011), thereby rendering marginal distributions on SNP intercept and slope effects as bivariate Student t. An alternative specification, the square root free Cholesky decomposition (CD) of the VCV, has been applied in bivariate trait analyses to model random and residual variance-covariance matrices (Bello, Steibel et al. 2010). The CD specification re-parameterizes VCV into generalized autoregressive parameters 66 (GARP) and innovation variances to provide potentially greater flexibility relative to IW based specifications (POURAHMADI 1999). In this paper, we develop and test five possible RR/RN models based on IW or CD based specifications. The objectives of this study were to compare these five models to two conventional BayesA and BayesB models for assessing the accuracy of BV prediction in WGP, and to compare the five RR/RN models in the ability to detect G×E of a linear nature. 
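Before describing the models, note that for a single 2 × 2 intercept-slope VCV, the square-root-free Cholesky (GARP/innovation variance) reparameterization mentioned above amounts to the simple identities sketched below in R (made-up numbers; φ is the GARP and ψ1, ψ2 are the innovation variances):

Sigma <- matrix(c(1.0, 0.6,
                  0.6, 0.8), 2, 2)            # hypothetical VCV of (intercept, slope) effects
phi  <- Sigma[2, 1] / Sigma[1, 1]             # GARP: regression of slope on intercept
psi1 <- Sigma[1, 1]                           # innovation variance of the intercept
psi2 <- Sigma[2, 2] - phi^2 * Sigma[1, 1]     # innovation variance of slope given intercept
L <- matrix(c(1, phi,
              0, 1), 2, 2)                    # unit lower-triangular Cholesky factor
L %*% diag(c(psi1, psi2)) %*% t(L)            # reconstructs Sigma exactly

Modeling (φ, ψ1, ψ2) separately, rather than the VCV itself, is what allows the CD-based models developed below to place a separate prior on each of these components.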
4.2 Materials and Methods

4.2.1 Random regression and reaction norm models

The random regression (RR) model for WGP can be denoted as follows:

y_ik = x_i′β + z_i′(g1 + d_ik g2) + a_i + e_ik ,        [1]

where y_ik is the kth phenotypic record on the ith animal (i = 1, 2, …, n; k = 1, 2, …, t); β is the vector of fixed effects; x_i′ is the incidence row vector connecting elements of β to animal i; z_i′ = [z_i1 z_i2 z_i3 … z_im] is the vector of genotypes for animal i, coded as 0, 1, or 2 copies of the minor allele at each SNP; g1 = {g_1j}_{j=1}^{m} is the vector of SNP-specific intercept effects; g2 = {g_2j}_{j=1}^{m} is the vector of SNP-specific temporal slope effects; d_ik is the environmental covariate for record k on animal i; and e_ik is the random residual. Finally, a_i is the random effect of animal i, characterized by a variance component σ²_a. This particular random effect may represent residual polygenic effects, permanent environmental effects, or both; nevertheless, it is particularly required in addition to the other terms when there are repeated records per animal. The environmental covariates d_ik are assumed to be known without error. In Equation [1], one could write the total breeding value (BV) for animal i in an environment characterized by covariate d_ik as z_i′(g1 + d_ik g2), which, in turn, is a function of the intercept BV z_i′g1 and the slope BV z_i′g2. One could then think of z_i′g1 as a measure of overall genetic merit for animal i (if d_ik is recentered to an average of 0), whereas z_i′g2 measures the environmental sensitivity of that same animal's genetic merit; i.e., the greater the value of |z_i′g2|, the greater the sensitivity of that animal's genetic merit to different environments as represented by different values of d_ik. We further write the RR-WGP model in matrix notation as follows:

y = Xβ + Zg1 + DZg2 + Wa + e ,        [2]

where e = {e_ik}, D is an nt × nt diagonal matrix with the environmental covariates {d_ik} along the diagonal, X = {x_i′ ⊗ 1_{t×1}}_{i=1}^{n}, and Z = {z_i′ ⊗ 1_{t×1}}_{i=1}^{n}, with ⊗ denoting the Kronecker product. We assume e ~ N(0, R = Iσ²_e) in this paper, although generalizations to heterogeneous residual variances over time would be possible too. Also, we might specify arbitrarily informative or diffuse priors p(β) on the "fixed effects" β, p(σ²_e) on σ²_e, and p(σ²_a) on σ²_a.

Reaction norm (RN) models could somewhat be considered simplifications of RR models whereby typically only one phenotypic record (y_i) is observed per animal (i.e., t = 1), although the environmental covariate (d_i) unique to animal i may vary across animals. Sometimes these covariates also need to be inferred (Su, Madsen et al. 2006), but for the purposes of this study we also consider d_i as known. The RN model can then be written as:

y_i = x_i′β + z_i′(g1 + d_i g2) + e_i .        [3]

As with the RR model in Equations [1] and [2], we could add an effect for animal or, equivalently, residual polygenic effects based on a known correlation (i.e., numerator relationship) matrix A between animals if the number of SNP markers is not considered large enough to model genetic variability.

4.2.2 Conventional BayesA and BayesB (BayesA\BayesB)

In conventional WGP models, all elements of g2 are zero. BayesB specifies a mixture prior of a point mass at zero with non-association probability (1 − π) and a Student t density with degrees of freedom ν and scale parameter s² with association probability π. BayesA is a special case of BayesB when π = 1 (Meuwissen, Hayes et al. 2001).
Priors such as p(ν), p(s²), and/or p(π) can be specified on these hyperparameters for BayesA or BayesB in order to properly "tune" them or account for their uncertainty, as we and others have done previously (Yi and Xu 2008; Technow, Riedelsheimer et al. 2012; Yang and Tempelman 2012) or strongly advocated (JIA and JANNINK 2012).

4.2.3 Bivariate Normality (IW-BayesC)

The simplest specification for SNP-specific intercept and slope effects might be based on multivariate normality. Suppose we reorder g = [g1′ g2′]′ instead as g* = [g_11 g_21 g_12 g_22 … g_1m g_2m]′ = [g_.1′ g_.2′ … g_.m′]′, where g_.j represents the random intercept and slope effects of SNP j. Here all g_.j are specified to be independently multivariate normal with null mean vector and common variance-covariance matrix Σ_g, where

Σ_g = [ σ²_g1     σ_g1g2 ]
      [ σ_g1g2    σ²_g2  ] .        [4]

This specification, more or less, represents a bivariate extension of what Habier et al. (2011) describe as BayesCπ with π = 0, where π defines the probability of non-association. However, as we illustrate later, this is effectively equivalent to a classical mixed model analysis. One can specify a conjugate inverted Wishart prior on Σ_g with Σ_g ~ IW(v_0, Σ_0), and we denote this extension as IW-BayesC.

4.2.4 Bivariate Student t and Variable Selection (IW-BayesA\IW-BayesB)

We consider an extension of IW-BayesC whereby intercept and slope effects are specified to have heterogeneous variance-covariance matrices across SNP. For SNP j, we specify g_.j to be conditionally bivariate normal; i.e.,

g_.j ~ N( 0_{2×1} , G_j ) ,  G_j = [ σ²_g1j       σ_g1j,g2j ]
                                   [ σ_g1j,g2j    σ²_g2j    ] .        [5]

We then specify all G_j to have independent conjugate inverted Wishart prior densities; i.e., G_j ~ IW(v_g, Σ_g), characterized by a degrees of freedom parameter v_g and a scale matrix Σ_g structured as in Equation [4]. We denote this specification as IW-BayesA, noting the obvious bivariate extension of BayesA as first proposed by Meuwissen et al. (Meuwissen, Hayes et al. 2001). Note that as v_g → ∞, G_j = Σ_g ∀j, such that IW-BayesA reverts back to IW-BayesC. We specify a prior p(v_g) on v_g and a conjugate Wishart prior Σ_g ~ W(v_0, S_0) on Σ_g. As alluded to by Munilla and Cantet (2012) and Bello et al. (2010), the variability of the three components of G_j (i.e., two SNP-specific variances and a SNP-specific covariance) under IW-BayesA/B is primarily controlled by one hyperparameter: v_g.

Mirroring the extension of BayesA to BayesB by Meuwissen et al. (2001), we also further modified IW-BayesA by specifying a mixture prior on G_j such that G_j = 0_{2×2} with probability (1 − π) and G_j ~ IW(v_g, Σ_g) with probability π. We name this procedure IW-BayesB, specifying a prior p(π) on π. This specification is perhaps more dubious than IW-BayesA, given its all-or-none assumption with respect to SNP effects on both intercept and slope, whereas IW-BayesA likely has more flexibility to specify large SNP effects for, say, the intercept, but near-zero effects for the slope.

4.2.5 Cholesky decomposition specifications (CD-BayesA\CD-BayesB)

Based on our previous experiences, e.g. Bello et al. (2010), we conjectured that the specification of inverted Wishart prior densities on G_j might be rather inflexible, as such specifications imply either that all SNP have non-zero effects for both intercept and slope (IW-BayesA, IW-BayesC) or, if they do not, that both effects are 0 (i.e., IW-BayesB), thereby not allowing for the possibility that some SNP effects are overall important (i.e., non-zero intercept) but environmentally robust (i.e., zero slope).
Furthermore, these IW specifications are additionally inflexible in that the heterogeneity of every single component of G_j is controlled by a single parameter, v_g. We subsequently developed an alternative parameterization based on the square-root-free Cholesky decomposition (CD) of G_j, as based on our previous work (Bello, Steibel et al. 2010). The CD parameterization provides potentially greater flexibility by modeling the following relationship between g2 and g1:

g2 = Ψ g1 + g_2|1 .        [6]

Here g_2|1 = {g_2|1,j}_{j=1}^{m} is the vector of SNP-specific slope effects conditional on intercept effects, whereas Ψ = diag{φ_j}_{j=1}^{m} represents a diagonal matrix of SNP-specific associations between intercept and slope effects. Hence, we can re-write the RN/RR model [2] as:

y = Xβ + Zg1 + DZ(Ψg1 + g_2|1) + e ,        [7a]

or

y = Xβ + (Z + DZΨ)g1 + DZg_2|1 + e .        [7b]

Note for SNP j that if φ_j ≈ 0, intercept effects are independent of slope effects. If g_2|1,j ≈ 0 and φ_j ≠ 0, then intercept and slope effects are perfectly correlated. If g_2|1,j ≈ 0 and φ_j ≈ 0, then the SNP is said to be environmentally robust (i.e., g_2j ≈ 0). For SNP j, we specify g_1j ~ N(0, σ²_g1j) with σ²_g1j ~ χ⁻²(ν_1, ν_1 s_1²), whereas g_2|1,j ~ N(0, σ²_g2|1,j) with σ²_g2|1,j ~ χ⁻²(ν_2, ν_2 s_2²). In essence, these two mixtures specify two separate univariate Student t densities for the elements of g1 and g_2|1. Furthermore, we specify independent normal priors on the SNP-specific association parameters between intercept and slope effects; i.e., φ_j ~ N(μ_φ, σ²_φ) ∀j. We label this model as CD-BayesA. Alternatively, consider a variable selection extension of CD-BayesA such that σ²_g1j and σ²_g2|1,j have these same respective inverted chi-square priors with corresponding probabilities π_1 and π_2, such that σ²_g1j = 0 and σ²_g2|1,j = 0 with probabilities (1 − π_1) and (1 − π_2). For obvious reasons, we then label this model as CD-BayesB. For both models, we specify diffuse or informative priors on μ_φ and on σ²_φ.

4.2.6 Bayesian inference

In order to conduct fully Bayesian inference using Markov chain Monte Carlo methods, it is necessary to derive the full conditional densities (FCD) for each unknown parameter to be inferred. For each of the aforementioned RN/RR models, we present these FCD in Appendix C1.

4.2.7 Simulation Study

In order to discern the ability of the various models to differentially fit various naively defined genetic architectures, we conducted a simple simulation study. We targeted six specific scenarios, as outlined in Table 4.1. Key specifications were based on an overall or average genetic correlation (ρ_g1g2) between intercept and slope, as further described and defined in Appendix C1.3, targeting values of ρ_g1g2 = 0, ρ_g1g2 = 0.5, and ρ_g1g2 = 0.8. We also investigated the effects of the number of QTL influencing both intercept and slope (Mboth = 20, 50, or 100), the number of QTL influencing the intercept only (Mint = 20 or 50), and the heritability (h² = 0.20 or 0.50). One may think of Mint as the number of environmentally robust QTL (i.e., QTL with consistent genetic effects across environments), whereas Mboth denotes the number of environmentally sensitive QTL, i.e., QTL whose effects are influenced by environmental effects.
Table 4.1: Summary of six scenarios in LD simulation Scenario Number of QTLs Number of QTLs Overall genetic Average for both intercept for intercept only correlation ( ρ g1g2 ) heritability and slope ( Mint ) [and range across ( Mboth ) 1 2 3 4 5 6 100 100 100 50 50 20 (h 2 ) replicates] 0 0 0 50 50 20 0 [-0.07, 0.07] 0.5 [0.39, 0.61] 0.8 [0.68, 0.85] 0.5 [0.40, 0.66] 0.5 [0.40, 0.66] 0.5 [0.37, 0.71] 74 0.5 0.5 0.5 0.5 0.2 0.5 The first three scenarios (Scenarios 1-3) all entailed Mboth = 100 and h2 = 0.50 and characterized situations that seemingly best agreed with an IW-BayesB or a CD-BayesB situation i.e., Mint = 0. The only differences between each of these three scenarios were differences in ρ g1g2 . Scenarios 4-6 seemingly best agreed with the CD-BayesB specifications (i.e., Mint ≠ 0) and were studied in order to assess the effect of h2 (0.5 vs 0.2 for Scenarios 4 vs. 5) and Mint (50 versus 20 for Scenarios 4 vs. 6). Twenty replicated datasets were generated under each of the six different scenarios. For each replicate, we used the R package HaploSim (Coster, Bastiaansen et al. 2010) to generate 6000 historical generations based on a constant population size of 100 animals as based on 200 unique haplotypes in the base generation. For all cases, the genome was originally composed of one chromosome with 1 Morgan in length and having 100,000 loci. For 20,000 of these loci, the biallelic minor allele frequency was 0.5 whereas the remaining loci (i.e., 80,000) were specified to be monomorphic in the base population. The number of recombinations for each meiosis event was drawn from a Poisson(1) distribution with genomic positions for recombination randomly chosen from a uniform distribution. For the 6000 historical generations, we specified the recurrent mutation rate for all loci as 10-5 per locus per generation. After 6000 generations, two additional Generations 6001 and 6002 were generated to expanded to randomly mated population sizes of n = 2000 animals each. For each replicate, we deleted SNP with minor allele frequency (MAF) < 0.05 in Generation 6001. Around 2200 SNPs remained after data editing. The genotype matrix Z was then based on the number of minor alleles (0, 1 or 2) at each locus for each animal. 75 We randomly chose Mboth + Mint of these SNP to be QTL. Variances for QTL-specific intercept effects s g21 j were drawn from scaled inverted chi-square distributions with scale s12 = 2 and degrees of freedom ν 1 = 5. Variances for any QTL-specific slope effects conditional on intercept effects, (i.e., s g22|1 j ) for each of Mboth QTL, were also drawn from scaled inverted chi-square densities always with ν 2 = 5 and with s22 = 2 ( ρ g1g2 = 0), s22 =1.5 ( ρ g1g2 = 0.5), or s22 = 0.72 ( ρ g1g2 = 0.8). QTL effects for intercepts {g } =j M both + M int QTL ,1 j j =1 and conditional slopes { gQTL ,2|1 j } j = M both j =1 were then independently generated from normal distributions with null means and their corresponding variances s g21 j and s g22|1 j . Hence, QTL effects for { gQTL ,1 j } =j M both + M int j =1 and { gQTL ,2|1 j } j = M both j =1 were each Student t-distributed. The association parameters {φQTL , j } j = M both j =1 between intercept and slope for each of Mboth QTL were generated from independent normal distributions, always with variance 2 s QTL ,φ = 0.05 and with mean mQTL ,φ = 0, for ρ g g = 0, mQTL ,φ =0.5, for ρ g g = 0.5, and 1 2 1 2 mQTL ,φ =0.8, for ρ g g = 0.8. 
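The draws just described can be sketched in R as follows (Scenario 4-like settings with ρ_g1g2 = 0.5; the helper rsinvchisq and all values shown are for illustration only, and the slope effects are then assembled from these quantities as described in the next paragraph):

rsinvchisq <- function(k, nu, s2) nu * s2 / rchisq(k, df = nu)   # scaled inverse chi-square draws

set.seed(4)
Mboth <- 50; Mint <- 50                        # Scenario 4-like settings
nu1 <- 5; s1sq <- 2                            # hyperparameters for intercept variances
nu2 <- 5; s2sq <- 1.5                          # conditional-slope variances (rho = 0.5 case)
mu_phi <- 0.5; var_phi <- 0.05                 # association parameters (rho = 0.5 case)

sig2_g1   <- rsinvchisq(Mboth + Mint, nu1, s1sq)     # QTL-specific intercept variances
sig2_g2.1 <- rsinvchisq(Mboth, nu2, s2sq)            # conditional slope variances (Mboth QTL only)
g1  <- rnorm(Mboth + Mint, 0, sqrt(sig2_g1))         # intercept effects (marginally Student t)
g21 <- rnorm(Mboth, 0, sqrt(sig2_g2.1))              # conditional slope effects
phi <- rnorm(Mboth, mu_phi, sqrt(var_phi))           # QTL-specific association parameters
# slope effects g2 = phi * g1[1:Mboth] + g21 are then formed as described in the next paragraph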
Subsequently, the slope effect for QTL j was determined as $g_{QTL,2j} = \phi_{QTL,j}\, g_{QTL,1j} + g_{QTL,2|1,j}$; j = 1, 2, …, Mboth, consistent with Equation [6]. An environmental covariate $d_i$, unique to each animal, was randomly drawn from N(0, 0.36). Writing $\mathbf{Z}_{QTL,int}$ as the subset of the n × Mint SNP genotypes in Z designated to be QTL for intercepts only, and $\mathbf{Z}_{QTL,both}$ as the subset of the n × Mboth SNP genotypes in Z designated to be QTL for both intercepts and slopes, the true breeding values $\left\{ TBV_i \right\}_{i=1}^{n}$ in each of Generations 6001 and 6002 were generated as

$\left\{ TBV_i \right\}_{i=1}^{n} = \mathbf{Z}_{QTL,both} \left\{ g_{QTL,1j} \right\}_{j=1}^{M_{both}} + \mathbf{Z}_{QTL,int} \left\{ g_{QTL,1j} \right\}_{j=M_{both}+1}^{M_{both}+M_{int}} + diag\left( d_i \right) \mathbf{Z}_{QTL,both} \left\{ g_{QTL,2j} \right\}_{j=1}^{M_{both}}$   [8]

Residuals $e_i$ for the record on each animal were drawn from a normal distribution with null mean and variance $\sigma_e^2$ determined by $h^2$; i.e.,

$\sigma_e^2 = \dfrac{var\left( \left\{ TBV_i \right\}_{i=1}^{n} \right)\left( 1 - h^2 \right)}{h^2}$   [9]

Phenotypic records were then generated as $y_i = TBV_i + e_i$.

Since the existence of G×E would imply that we select for genetic merit tailored to specific environments, we compared the accuracy of predicting TBV among the five aforementioned RN methods as well as the two conventional methods (BayesA, BayesB) at three environmental covariate values: d = -1.2, 0.0 and 1.2, representing -2, 0 and +2 standard deviations of d, respectively. Accuracy for a particular value of d was defined as the correlation between $\left\{ TBV_{i,d} \right\}_{i=1}^{n}$ and $\left\{ EBV_{i,d} \right\}_{i=1}^{n}$, where $TBV_{i,d}$ for Generation 6002 animals is based on Equation [8] but with all $d_i$ = d, whereas $EBV_{i,d} = \mathbf{z}_i' \hat{\mathbf{g}}_1 + d\, \mathbf{z}_i' \hat{\mathbf{g}}_2$, with $\mathbf{z}_i'$ being the SNP genotypes for animal i in Generation 6002 and $\hat{\mathbf{g}}_1$ and $\hat{\mathbf{g}}_2$ the respective posterior means of the SNP-specific intercepts and slopes estimated using only Generation 6001 data. For each of the five RR/RN models, we also assessed the relative accuracy of predicting the intercept and slope components of the BV separately in all six scenarios. Intercept BV accuracy was determined as the correlation between the true intercept BV (the first two expressions on the right side of Equation [8]) and the estimated intercept BV ($\mathbf{Z}\hat{\mathbf{g}}_1$), whereas slope BV accuracy was determined as the correlation between the true slope BV ($\mathbf{Z}_{QTL,both} \left\{ g_{QTL,2j} \right\}_{j=1}^{M_{both}}$) and the estimated slope BV ($\mathbf{Z}\hat{\mathbf{g}}_2$). We compared prediction accuracies of BV between each pair of the RR/RN and conventional methods using a Wilcoxon signed rank test.

4.2.8 MSU Pig Resource Population data

The genotypes used for this analysis were based on a commercial platform for low density genotyping (8434 SNP) in swine marketed as the Neogen Porcine GeneSeek Genomic Profiler LD (version 1) (GeneSeek, a Neogen Company, Lincoln, NE) (Badke, Bates et al. 2013). We received complete phenotype and genotype information on 928 F2 animals derived from a Duroc × Pietrain resource population at Michigan State University (Edwards, Ernst et al. 2008; Choi, Steibel et al. 2010). Any SNP with MAF < 0.01 was deleted. For adjacent SNP in complete LD (pairwise r2 = 1), one SNP from each pair was deleted at random. We then excluded SNPs with P-value < 10-4 for the Hardy-Weinberg equilibrium test. Genotypes for the remaining 5271 SNPs were standardized as $\left( z_{ij} - 2p_j \right)/\sqrt{2p_j\left( 1 - p_j \right)}$, where $z_{ij}$ is the genotype of the jth SNP on the ith animal and $p_j$ is the allele frequency of the reference ("0") allele for the jth SNP (de los Campos, Hickey et al. 2013).
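The genotype standardization just described can be sketched in base R as follows; the toy genotype matrix and allele frequency calculation are illustrative only and not drawn from the actual pig data:

```r
# Sketch of column-wise genotype standardization (z_ij - 2*p_j) / sqrt(2*p_j*(1 - p_j))
# applied to a 0/1/2 genotype matrix (toy example).
set.seed(3)
Z <- matrix(rbinom(20 * 10, 2, 0.3), nrow = 20, ncol = 10)  # 20 animals x 10 SNPs
p <- colMeans(Z) / 2                                        # frequency of the counted allele
Zstd <- sweep(Z, 2, 2 * p, "-")                             # center each SNP: z_ij - 2*p_j
Zstd <- sweep(Zstd, 2, sqrt(2 * p * (1 - p)), "/")          # scale by sqrt(2*p_j*(1 - p_j))
round(apply(Zstd, 2, var), 2)                               # roughly unit variance under HWE
```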
Pedigree information was available for the 928 F2 animals, including their parents and grandparents. There were 140 unique full-sib families among the 928 F2 animals, with an average full-sib family size of around 6. Map information was available on SNPs for each of the 18 autosomes from Duarte et al. (Duarte, Bates et al. 2013). We focused on back fat thickness as our response variable; it was measured at the 10th rib by B-mode ultrasound at weeks 10, 13, 16, 19 and 22 of age (Edwards et al. 2008). We used the following RR model for data analysis:

$y_{ijkl} = \mu + \sum_{\omega=1}^{4} \beta_\omega d_i^\omega + sex_j + \sum_{\omega=1}^{4} \left( sex_j \times \beta_\omega d_i^\omega \right) + litter_k + u_{1l} + d_i u_{2l} + \mathbf{z}_l'\left( \mathbf{g}_1 + d_i \mathbf{g}_2 \right) + e_{ijkl}$.   [10]

Here, $\mu$ is the overall mean; $d_i^\omega$ is the ωth order polynomial (ω = 1, 2, 3, 4) on week i, with $\beta_\omega$ being the corresponding partial regression coefficient; $sex_j$ is the fixed effect of the sex of animal j; $\left( sex_j \times \beta_\omega d_i^\omega \right)$ is the fixed interaction between $d_i^\omega$ and $sex_j$; $litter_k$ is the random effect of litter k; $u_{1l}$ and $u_{2l}$ are the permanent environmental intercept and slope effects for animal l; $\mathbf{z}_l' = \left[ z_{l1}\ z_{l2}\ z_{l3} \cdots z_{lm} \right]$ is the vector of genotypes on animal l; $\mathbf{g}_1$ and $\mathbf{g}_2$ are the vectors of SNP-specific intercept and slope effects; $d_i$ is the recoded covariate for week i; and $e_{ijkl}$ is the residual effect on $y_{ijkl}$. We rescaled $d_i$ as -1, -0.5, 0, 0.5 and 1 for weeks 10, 13, 16, 19 and 22, respectively.

Random effect specifications were as follows. Litter effects were presumed to be normally and independently distributed with null mean and variance component $\sigma^2_{litter}$. We decided not to fit polygenic effects since they are strongly confounded with litter effects, permanent environmental effects and SNP effects and hence would severely impede MCMC performance. The permanent environmental effects $\mathbf{u} = \left[ \mathbf{u}_1'\ \mathbf{u}_2' \right]'$ for $\mathbf{u}_1 = \left\{ u_{1l} \right\}$ and $\mathbf{u}_2 = \left\{ u_{2l} \right\}$ were assumed to be multivariate normal with null mean vector and variance-covariance matrix $\mathbf{I} \otimes \boldsymbol{\Sigma}$, where $\mathbf{I}$ is the identity matrix and $\boldsymbol{\Sigma}$ is the 2 × 2 unstructured covariance matrix between permanent environmental intercept and slope effects.

With pedigree information available, we also applied a conventional polygenic model (RR-BLUP) as a control comparison for the genomic (i.e., SNP-based) RR models. Fixed effects, random litter effects and permanent environmental effects were defined as in Equation [10], except that we replaced SNP effects by polygenic effects for both intercept and slope as follows:

$y_{ijkl} = \mu + \sum_{\omega=1}^{4} \beta_\omega d_i^\omega + sex_j + \sum_{\omega=1}^{4} \left( sex_j \times \beta_\omega d_i^\omega \right) + litter_k + u_{1l} + d_i u_{2l} + a_{1l} + d_i a_{2l} + e_{ijkl}$.   [11]

We assumed that the polygenic effects $\mathbf{a} = \left[ \mathbf{a}_1'\ \mathbf{a}_2' \right]'$ for intercept $\mathbf{a}_1 = \left\{ a_{1l} \right\}$ and slope $\mathbf{a}_2 = \left\{ a_{2l} \right\}$ have a joint multivariate normal distribution with null mean vector and variance-covariance matrix $\mathbf{A} \otimes \boldsymbol{\Sigma}_a$, where $\mathbf{A}$ is the numerator relationship matrix and $\boldsymbol{\Sigma}_a$ is the 2 × 2 unstructured covariance matrix between polygenic intercept and slope effects.

All five aforementioned RR models, along with RR-BLUP, were compared to conventional WGP specifications (i.e., BayesA and BayesB) based on 20 random across-litter cross-validation splits of the data using a Wilcoxon signed rank test. That is, for each split, data on all individuals from 126 (90%) litters were randomly chosen as the training data, whereas data on all subjects in the remaining 14 (10%) litters were used as validation data.
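One such across-litter split could be generated along the following lines. This is a minimal base-R sketch with a toy stand-in data frame; `pheno`, `litter` and `backfat` are illustrative names, not those of the actual dataset:

```r
# One random across-litter split: ~90% of litters for training, ~10% for validation.
set.seed(4)
# Toy stand-in for the phenotype file: 928 pigs assigned to 140 litters
pheno <- data.frame(id = 1:928, litter = sample(1:140, 928, replace = TRUE),
                    backfat = rnorm(928))
litters    <- unique(pheno$litter)
n_valid    <- round(0.10 * length(litters))        # hold out ~10% of litters (~14)
valid_lits <- sample(litters, n_valid)
train      <- subset(pheno, !(litter %in% valid_lits))
valid      <- subset(pheno,   litter %in% valid_lits)
# Each competing model would be fitted to `train`, back fat predicted for animals in
# `valid`, and cor(predicted, observed) reported as the cross-validation performance.
```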
In this cross-validation, hyperparameters and variance components were fixed to estimates derived from analyses of the entire data. In all cases except RR-BLUP, inferences were based on posterior means of the MCMC samples from the posterior distributions. To expedite analyses under the RR-BLUP model, we used ASReml 3.0 to provide analytical BLUP solutions rather than MCMC, noting that, conditionally on the variance components, these MCMC and BLUP inferences are identical. For all eight models, the Pearson correlation between these predictions and the observed records in the validation data was used as the measure of performance of the competing methods.

4.2.9 Priors used for data analyses

For all analyses in this paper, we specified a vaguely informative prior $p(\nu) \propto (\nu+1)^{-2}$ and Gelman's prior $s^2 \sim \chi^{-2}(-1, 0)$, which is also informative when bounded, for conventional BayesA, as we have done previously (Yang and Tempelman 2012). For conventional BayesB, the proper conjugate priors $s^2 \sim Gamma(0.1, 0.1)$ and $\pi \sim p(\pi|\alpha_\pi, \beta_\pi) = Beta(\alpha_\pi = 1, \beta_\pi = 8)$ were used. For IW-BayesC, a conjugate inverted Wishart prior was specified on the scale matrix $\boldsymbol{\Sigma}_g$, with $\boldsymbol{\Sigma}_g \sim IW(v_0 = -3, \mathbf{S}_0 = \mathbf{0}_{2 \times 2})$. For IW-BayesA and IW-BayesB, we specified a prior on $v_g$ such that $p(v_g) \propto (v_g+1)^{-2}$ and a conjugate Wishart prior $\boldsymbol{\Sigma}_g \sim W(v_0, \mathbf{S}_0)$ on the scale matrix $\boldsymbol{\Sigma}_g$. As in conventional BayesB, the same proper prior $\pi \sim Beta(\alpha_\pi = 1, \beta_\pi = 8)$ was specified on $\pi$ in IW-BayesB.

In CD-BayesA and CD-BayesB, we used the same non-informative priors $p(\nu_1) \propto (\nu_1+1)^{-2}$ and $p(\nu_2) \propto (\nu_2+1)^{-2}$ for intercept and conditional slope effects. In addition, Gelman's priors $s_1^2 \sim \chi^{-2}(-1, 0)$ and $s_2^2 \sim \chi^{-2}(-1, 0)$ were used in CD-BayesA, while $s_1^2 \sim Gamma(0.1, 0.1)$ and $s_2^2 \sim Gamma(0.1, 0.1)$ were specified in CD-BayesB. For CD-BayesB, we specified the proper priors $\pi_1 \sim Beta(\alpha_{\pi_1} = 1, \beta_{\pi_1} = 8)$ and $\pi_2 \sim Beta(\alpha_{\pi_2} = 1, \beta_{\pi_2} = 8)$. For both CD-BayesA and CD-BayesB, the mean of the SNP-specific association parameters between intercept and slope, $\mu_\phi$, was specified using $p(\mu_\phi) = N(\tau = 0, \zeta^2 = 1)$, and the variance of the association parameters, $\sigma_\phi^2$, was specified with Gelman's prior $p(\sigma_\phi^2) = \chi^{-2}(-1, 0)$.

4.3 Results

4.3.1 Simulation Study

In Figure 4.1, we compare the accuracies of the seven competing models at each of three environmental covariate values (d = -1.2, 0 and 1.2) for each of the six scenarios described in Table 4.1. Again, these results were based on 20 replicated datasets per scenario. We focus first on the comparisons in Scenarios 1-3, where the simulated genetic architecture seemed more congruent with an IW-BayesA or CD-BayesA like specification (i.e., Mint = 0). When $\rho_{g_1 g_2}$ = 0 in Scenario 1 (Figure 4.1(1)), there were significant (P < 0.05) differences in accuracy of nearly 30 percentage points (i.e., >84% vs. <56% at d = -1.2, and >82% vs. <52% at d = 1.2) in favor of each of the five RN models over the two conventional models (BayesA/BayesB). Such differences were also significant when $\rho_{g_1 g_2}$ = 0.5 (Scenario 2 in Figure 4.1(2)) and when $\rho_{g_1 g_2}$ = 0.8 (Scenario 3 in Figure 4.1(3)), although these differences became increasingly asymmetric, being progressively larger at d = -1.2 and progressively smaller at d = 1.2. For each of these three scenarios, IW-BayesA had the highest accuracy relative to all other models at d = -1.2 and 1.2 (P < 0.05), whereas it did not appear to be different from either of the CD models when d = 0.
Furthermore, the differences between the RN and conventional models were trivial (<2-3%) at d = 0, such that IW-BayesB and IW-BayesC were not even judged to be different from BayesA and BayesB. We further compared accuracies at d = -1.2, 0 and 1.2 in the three other scenarios (4, 5 and 6), where the simulated genetic architecture seemed more congruent with a CD-BayesB like specification (i.e., Mint > 0) based on $\rho_{g_1 g_2}$ = 0.5. In Scenario 4 (Figure 4.1(4)), where Mboth = Mint = 50 and $h^2$ = 0.5, both conventional BayesA and BayesB were inferior (P < 0.05) to all RN models at both d = -1.2 and 1.2, as expected. However, IW-BayesA surprisingly outperformed (P < 0.05) all other RN methods at both d = -1.2 and 1.2, although such differences were small (i.e., <2-3%). Differences between all models were even smaller at d = 0 (i.e., <1-2%). The specifications for Scenario 5 (Figure 4.1(5)) differed from Scenario 4 only in a lower $h^2$ = 0.2. In that case, IW-BayesA also had significantly (P < 0.05) higher accuracy compared to the other RN models at d = -1.2 and d = +1.2, except for CD-BayesB at d = +1.2. At d = 0, CD-BayesB outperformed (P < 0.05) the other models, although all differences were very small. The final Scenario 6 (Figure 4.1(6)) differed from Scenario 4 only in the lower numbers of QTL (i.e., Mboth = Mint = 20). Not only was IW-BayesC significantly lower in accuracy compared to all other RN models at all three values of d, it was even lower in accuracy compared to the conventional (i.e., BayesA and BayesB) models at d = 0. CD-BayesB had the highest accuracy among the five RN methods at each value of d, although that difference was not significant compared to IW-BayesA and IW-BayesB at d = -1.2.

Figure 4.1: Average accuracy of breeding value prediction for seven methods (BayesA, BayesB, IW-BayesA, IW-BayesB, IW-BayesC, CD-BayesA, CD-BayesB) at three environmental covariate values in six scenarios: (1) Mboth = 100, Mint = 0, $\rho_{g_1 g_2}$ = 0, $h^2$ = 0.5; (2) Mboth = 100, Mint = 0, $\rho_{g_1 g_2}$ = 0.5, $h^2$ = 0.5; (3) Mboth = 100, Mint = 0, $\rho_{g_1 g_2}$ = 0.8, $h^2$ = 0.5; (4) Mboth = 50, Mint = 50, $\rho_{g_1 g_2}$ = 0.5, $h^2$ = 0.5; (5) Mboth = 50, Mint = 50, $\rho_{g_1 g_2}$ = 0.5, $h^2$ = 0.2; (6) Mboth = 20, Mint = 20, $\rho_{g_1 g_2}$ = 0.5, $h^2$ = 0.5. Different letters indicate significant differences at P < 0.05.

We attempted to better understand these results by focusing on which of the RN methods performed best for inferring upon the intercept and slope components of the BV. For all six simulation scenarios, we found that CD-BayesB always had the significantly highest accuracy for intercept BV compared to all other RN models (Figure 4.2A), whereas IW-BayesC and IW-BayesB were generally among the worst. For slope BV accuracy, there was no evidence of differences between any of the models in the low heritability Scenario 5 (Figure 4.2B). However, IW-BayesA did outperform the other RN methods in Scenarios 1-4, except in Scenario 3 where CD-BayesB was not found to be inferior to IW-BayesA. In Scenario 6, IW-BayesA only outperformed IW-BayesC. IW-BayesC and IW-BayesB generally had among the lowest slope BV accuracies, although CD-BayesA was poorest in Scenario 4. Hence, the general advantage of CD-BayesB for predicting environment-specific BV appeared to accrue from its greater accuracy in inferring the intercept components of genetic merit, whereas that of IW-BayesA appeared to accrue from inferring the slope components.
Figure 4.2: Accuracy of intercept (A) and slope (B) breeding value prediction for five RN methods (IW-BayesA, IW-BayesB, IW-BayesC, CD-BayesA, CD-BayesB) under six scenarios: (1) Mboth = 100, Mint = 0, $\rho_{g_1 g_2}$ = 0, $h^2$ = 0.5; (2) Mboth = 100, Mint = 0, $\rho_{g_1 g_2}$ = 0.5, $h^2$ = 0.5; (3) Mboth = 100, Mint = 0, $\rho_{g_1 g_2}$ = 0.8, $h^2$ = 0.5; (4) Mboth = 50, Mint = 50, $\rho_{g_1 g_2}$ = 0.5, $h^2$ = 0.5; (5) Mboth = 50, Mint = 50, $\rho_{g_1 g_2}$ = 0.5, $h^2$ = 0.2; (6) Mboth = 20, Mint = 20, $\rho_{g_1 g_2}$ = 0.5, $h^2$ = 0.5. Different letters indicate significant differences at P < 0.05.

4.3.2 MSU Pig Resource Population data

Predictive ability was calculated as the correlation between observed phenotypes $y_{ijkl}$ in the validation dataset and their predicted values based on inferences from the training dataset. The predictive abilities of each of the eight models for each of the 20 cross-validation sets are illustrated in Figure 4.3. The five RR Bayesian methods had ~2.5% higher (P < 0.0001) predictive ability than the two conventional methods and the RR-BLUP model. In addition, no significant differences in predictive ability were found among the five SNP-based RR methods.

Figure 4.3: Predictive ability for eight methods (BayesA, BayesB, RR-BLUP, IW-BayesA, IW-BayesB, IW-BayesC, CD-BayesA, CD-BayesB) from cross-validation analysis using back fat thickness in the MSU Pig Resource Population data. Different letters indicate significant differences at P < 0.0001.

In an attempt to better understand these results, we focused on the non-mixture methods (conventional BayesA, IW-BayesA and CD-BayesA) for estimated intercept BV and slope BV, respectively. Based on estimated intercept BV using the complete final analysis data, Figure 4.4 shows a very high correlation (~0.9996) between the two RR methods IW-BayesA and CD-BayesA. In contrast, lower correlations were determined between conventional BayesA and IW-BayesA (~0.8606) and between conventional BayesA and CD-BayesA (~0.8589). As conventional BayesA does not model genomic effects for the slope on weeks of age, we can only compare IW-BayesA and CD-BayesA for the estimated slope BV. Figure 4.5 demonstrates a high correlation (~0.9845) between IW-BayesA and CD-BayesA for the estimated slope component of BV.

Figure 4.4: Estimated intercept breeding values from conventional BayesA, IW-BayesA and CD-BayesA using the complete final analysis data on back fat thickness in the MSU Pig Resource Population. Reference line is y = x.

Figure 4.5: Estimated slope breeding values from IW-BayesA and CD-BayesA using the complete final analysis data on back fat thickness in the MSU Pig Resource Population. Reference line is y = x.

To further investigate differences in detecting QTL, we computed the absolute values of the posterior means of SNP effects at three different ages (10, 16 and 22 weeks) for these same three methods in Figures 4.6-4.8. In the conventional BayesA model, the SNP effects are by necessity estimated to be the same at any age, and hence only one plot is provided; this plot demonstrated that three chromosomes (2, 6 and 11) had some relatively large SNP peaks. IW-BayesA (Figure 4.7) and CD-BayesA (Figure 4.8) also demonstrated peaks on these and other chromosomes at all three ages. However, as might be anticipated, SNP effects under these RR models tended to increase with increasing age. On chromosome 6, we found two relatively large SNP peaks with IW-BayesA and CD-BayesA, respectively.
With estimated SNP intercept and slope effects from IWBayesA and CD-BayesA, we can further demonstrate regression lines of estimated SNP effects on rescaled weeks of age (-1, -0.5, 0, 0.5, 1) for the two SNP markers (Appendix Figure C2.1). 90 Figure 4.6: Estimated SNP effects from conventional BayesA against marker position using the complete final analyses data on back fat thickness in MSU Pig Resource Population. 91 Figure 4.7: Estimated SNP effects from IW-BayesA against marker position using the complete final analyses data on back fat thickness in MSU Pig Resource Population when A) at week 10, B) at week 16, C) at week 22. 92 Figure 4.8: Estimated SNP effects from CD-BayesA against marker position using the complete final analyses data on back fat thickness in MSU Pig Resource Population when A) at week 10, B) at week 16, C) at week 22. 93 4.4 Discussion Although RR and RN models have been extensively used for modeling GxE in classical polygenic models (CARDOSO and TEMPELMAN 2012), they have not been as extensively adapted for WGP models. Several efforts have been made to infer upon G×E using RN models in QTL mapping and GWAS studies (Lillehammer, Arnyasi et al. 2007; Lillehammer, Odegard et al. 2007; Lillehammer, Goddard et al. 2008; Lillehammer, Hayes et al. 2009). To improve power of QTL detection, Lillehammer et al. (2007) proposed RN models to estimate the QTL intercept and slope effects based on haplotypes with identity by descent (IBD) information. They applied their models to a Norwegian Red cattle population using herd-year mean estimates as environmental covariates (Lillehammer, Arnyasi et al. 2007; Lillehammer, Goddard et al. 2008) which is typical of RN models. Lillehammer et al. (2009) compared their models with and without pedigree information in an Australian dairy bull population to scan one SNP at a time based on genotypes. WGP models (i.e., joint analysis of all SNP) have been advocated to be very important for conducting GWAS (Wang, Misztal et al. 2012); hence, we explored and compared the alternative WGP RR alternatives for that purpose as well. We consider both RR and RN WGP models in this paper since the modeling issues are almost identical, albeit the circumstances are rather different. RN models are intended for those situations where there is typically one measure per animal and environmental effects and animal BV might be characterized by linear functions of these covariates, (DE JONG 1995). RR models are intended for longitudinal data collection in the sense that there are repeated measurements for each animal over time, such as back fat thickness in pigs as analyzed in this paper. Because of this repeated measures dynamic, 94 it becomes even more imperative to model animal effects (i.e. additive genetics and/or permanent environmental effects) in RR genomic models relative to RN models. Alternative strategies for modeling GxE might be pursued based on, for example, factor analysis, when environments cannot be classified by linear functions of covariates (Burgueno, de los Campos et al. 2012). The simplest RN/RR specification that we considered was IW-BayesC, being essentially identical to a classical mixed model specification. Unlike any of the other four RR/RN specifications, IW-BayesC assumes all pairs of SNP-specific intercept and slope effects to be normally distributed whereas each of the other methods specify either t-distributed or null effects. 
Furthermore, being simpler, the only hyperparameters requiring inference in IW-BayesC are the variance components; this is not a trivial advantage given the difficulty of inferring upon degrees of freedom, for example, in heavier-tailed specifications (Habier, Fernando et al. 2011). In fact, IW-BayesC is identical in principle to a classical RR or RN approach for mixed effects modeling. That is, we can set up the mixed model equations from the RR/RN-WGP model [2] as follows:

$\begin{bmatrix} \mathbf{X}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{X}'\mathbf{R}^{-1}\mathbf{Z} & \mathbf{X}'\mathbf{R}^{-1}\mathbf{DZ} \\ \mathbf{Z}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{Z}'\mathbf{R}^{-1}\mathbf{Z} + \mathbf{I}g^{11} & \mathbf{Z}'\mathbf{R}^{-1}\mathbf{DZ} + \mathbf{I}g^{12} \\ \left(\mathbf{DZ}\right)'\mathbf{R}^{-1}\mathbf{X} & \left(\mathbf{DZ}\right)'\mathbf{R}^{-1}\mathbf{Z} + \mathbf{I}g^{12} & \left(\mathbf{DZ}\right)'\mathbf{R}^{-1}\mathbf{DZ} + \mathbf{I}g^{22} \end{bmatrix} \begin{bmatrix} \hat{\boldsymbol{\beta}} \\ \hat{\mathbf{g}}_1 \\ \hat{\mathbf{g}}_2 \end{bmatrix} = \begin{bmatrix} \mathbf{X}'\mathbf{R}^{-1}\mathbf{y} \\ \mathbf{Z}'\mathbf{R}^{-1}\mathbf{y} \\ \left(\mathbf{DZ}\right)'\mathbf{R}^{-1}\mathbf{y} \end{bmatrix}$   [11]

where $\begin{bmatrix} g^{11} & g^{12} \\ g^{12} & g^{22} \end{bmatrix} = \boldsymbol{\Sigma}_g^{-1}$, whereby one could readily base $\sigma_e^2$ and $\boldsymbol{\Sigma}_g$ on their REML estimates. Of course, it might also be necessary to further specify polygenic effects and/or permanent environmental effects in [11], particularly for the RR case. In the strictest sense, this computational approach is not exactly equivalent to IW-BayesC, which is a fully Bayes procedure that takes into account the uncertainty in $\sigma_e^2$ and $\boldsymbol{\Sigma}_g$. Nevertheless, based on experiences with our simulation study (not reported), inferences on variance components, SNP effects and, hence, BV are effectively the same, provided that relatively diffuse priors on $\sigma_e^2$ and $\boldsymbol{\Sigma}_g$ are specified in IW-BayesC. Furthermore, this mixed model/REML approach is computationally far more efficient than having to conduct MCMC for IW-BayesC. This efficiency can be further enhanced when the number of markers well exceeds the number of animals for which genomic BV are estimated. That is, one can design a set of mixed model equations equivalent to [11] but with potentially much smaller dimensionality by directly solving for genomic intercept and slope BV in a genomic animal model rather than explicitly modeling SNP-specific effects (Habier, Fernando et al. 2007; Strandén and Garrick 2009). Extending Strandén and Garrick (2009) further and assuming, say, one record per genotyped animal in a RN-like situation, we can reparameterize Equation [11] as follows:

$\begin{bmatrix} \mathbf{X}'\mathbf{X} & \mathbf{X}' & \mathbf{X}'\mathbf{D} \\ \mathbf{X} & \mathbf{I} + \left(\mathbf{ZZ}'\right)^{-1}\sigma_e^2 g^{11} & \mathbf{D} + \left(\mathbf{ZZ}'\right)^{-1}\sigma_e^2 g^{12} \\ \mathbf{D}'\mathbf{X} & \mathbf{D} + \left(\mathbf{ZZ}'\right)^{-1}\sigma_e^2 g^{12} & \mathbf{D}'\mathbf{D} + \left(\mathbf{ZZ}'\right)^{-1}\sigma_e^2 g^{22} \end{bmatrix} \begin{bmatrix} \hat{\boldsymbol{\beta}} \\ \hat{\mathbf{u}}_1 \\ \hat{\mathbf{u}}_2 \end{bmatrix} = \begin{bmatrix} \mathbf{X}'\mathbf{y} \\ \mathbf{y} \\ \mathbf{D}'\mathbf{y} \end{bmatrix}$   [12]

where $\mathbf{u}_1 = \mathbf{Zg}_1$ and $\mathbf{u}_2 = \mathbf{Zg}_2$. Extensions by Misztal et al. (Misztal, Legarra et al. 2009) to RR and RN models, whereby records from genotyped and ungenotyped animals are combined into the analysis, would be relatively straightforward as well.

One objective of this paper was to develop and compare five alternative RR/RN models against each other and against the two conventional WGP models BayesA and BayesB. In the simulation study, we first investigated the effect of an "average" genetic correlation ($\rho_{g_1 g_2}$) between intercept and slope. In Scenarios 1-3 of the simulation study, each mimicking a CD-BayesA-like or even IW-BayesA-like process (because ν1 = ν2), we considered three different levels of $\rho_{g_1 g_2}$ representing low, moderate and high positive genetic correlations between traits. We compared the accuracy of genomic prediction in environments characterized by low, average and high values of the environmental covariate. We found significantly higher accuracies for the five RR/RN methods compared to the two conventional WGP methods at the extreme environments (d = -1.2 and d = 1.2) under all specifications of $\rho_{g_1 g_2}$.
We found the difference in accuracies between the RN versus the conventional models became greater with increasing ρ g1g2 at d = -1.2 whereas, curiously, we found the converse trends at d = 1.2. This result might be due to the fact that the intercept is defined at d = 0. Hence, positive genetic correlations between intercept and slope would build positive associations between genomic evaluations at d = 0 with those genomic evaluations based on d ≥ 0, but negative associations between genomic evaluations at d = 0 and those genomic evaluations based on d appreciably less than 0; i.e, where SNP or animal-specific reaction norms start to “crisscross” each other. In other words, if we had specified ρ g1g2 to be negative, we would have found the opposite trends. We found that IW-BayesA tended to have significantly greater accuracies in Scenarios 1-3 than all other RN models at d=1.2 and d=-1.2. This was initially surprising to us; however, we would note two considerations. Firstly, the degrees of freedom specification were the same for QTL effects for both intercept and in slope, i.e., ν1 = ν2. Hence, CD-BayesA might confer little or no advantage to IW-BayesA then because of the greater parsimony of a single degrees of freedom specification of the latter. Secondly, 97 with data simulated based on LD, one might anticipate that the assumption of independence between effects of adjacent SNP markers is somewhat distorted (YANG and TEMPELMAN 2012), although independence is specified for every model in this study. Subsequently it may be rather difficult to predict the relative performance of WGP models under LD, since the true model cannot be specified parametrically under LD, even when QTL effects are generated from known distributions. In fact, we conducted a separate simulation study (results not shown) whereby the data generation strategy exactly match the assumptions of either of the two models, CD-BayesA and IW-BayesA; but based on an assumption of linkage equilibrium between SNP markers. In those comparisons, CD-BayesA did outperform IW-BayesA when the data generation model was based on a CD-BayesA model and vice versa. Similar conclusions have been recently drawn by Wimmer et al. (2013) who found that the presence of high levels of LD, high levels of complexity ( (Mboth + Mint)/m) of genetic architecture, and low levels of determinedness (n/m) will tend to mute differences in performance between various Bayesian alphabet models. At an average level of performance (d=0), there appeared to be very little difference between any of the models, including the two conventional models not based on any RN specification whatsoever. This was not surprising since one would expect that conventional WGP models would, by default, predict to an average environment. Scenarios 4-6 in the simulation study were intended to mimic a CD-BayesB like process whereby only a fraction of the QTL effects having general performance effects (i.e. intercept) also showed environmental sensitivity (i.e., non-zero slope effects). Scenario 4 (heritability = 0.5) and Scenario 5 (heritability = 0.2) were the same in that 98 half of the 100 QTL were environmentally sensitive. As expected, all RN models had lower accuracy in Scenario 5 compared to Scenario 4 because of the lower heritability, although somewhat surprisingly IW-BayesA generally maintained its advantage over all models. 
Realizing that the total number of QTL have also been known to influence prediction accuracy comparisons between conventional BayesA and BayesB (MEUWISSEN and GODDARD 2010), we considered Scenario 6 which involved a total of 40 QTL, Mint = 20 environmentally robust QTL and Mboth = 20 environmentally sensitive QTL, 2/5 of what was specified in Scenario 4 with everything else being the same. As anticipated, CD-BayesB finally started to emerge as the most accurate of the 4 RN methods particularly at d = 0 and d = 1.2. Again, these results are in agreement with Wimmer et al (2013) who determined that variable selection methods (like BayesB or CD-BayesB here), perform best under genetic architectures with low complexities. Conversely, it was this scenario where the performance of IW-BayesC started to plummet, even being inferior to conventional BayesA/BayesB analyses at d = 0. Of course, we should be quick to note that QTL effects were simulated from heavy-tailed t-distributions, perhaps thereby stacking the odds against IW-BayesC. A particularly odd result was that the comparisons on accuracies did not necessarily match up with estimated accuracies of predicting their components, i.e. intercept and slope BV. That is, CD-BayesB was among the best for inferring upon intercept BV in Figure 4.2A, whereas IW-BayesA was typically among the best for inferring upon slope BV. Nevertheless, this does help explain why IW-BayesA was often 99 the best for inferring upon genomic BV at d = -1.2 and d = 1.2, whereas CD-BayesB generally dominated at d = 0. Our simulation study was based on arbitrary specifications of genetic architecture based on (Mboth + Mint) QTL randomly located on a 1M chromosome; based on arguments provided by Meuwissen and Goddard (2010), one might readily extrapolate these simulation results to the case of nchr*(Mboth + Mint) QTL for a nchr M genome based on nchr chromosomes. We realize that other determinants such as marker density can also influence the comparisons among the five RN WGP models. Furthermore, if we had specified even greater Mboth + Mint QTL, the more complex genetic architecture might then more likely reflect the IW-BayesC assumptions ( π g = 1 , ν g → ∞ such that Σ= Σ g∀j ). j Given our simulation results then, it perhaps was not too surprising that we did not observe any meaningful differences between the various models with an application to data from a pig resource population. Firstly, the model applied was a RR, rather than a RN model, implying that there is greater phenotypic information provided by the repeated records in a RR context thereby potentially muting any real differences between the various candidate models. Furthermore, the genetic architecture of the trait analyzed was presumably far more complex and the level of determinedness far less than anything considered in the simulation study. Based on an analysis using microsatellite markers in the MSU pig resource population, Choi et al. (2010) found highly significant QTL for back fat thickness at week of age 10, 13, 16, 19 and 22 on chromosome 6 using a QTL mapping approach without considering G×E. Our results based on a RR WGP analysis using a low density 100 SNP chip also indicated potential QTL on chromosome 6. Nevertheless, the RR WGP specifications allowed us to explicitly model these potential QTL effects as a function of age. 4.5 Conclusions Five RR/RN methods have been developed in this paper under the frame work of WGP. 
Based on a RN simulation study and a RR data analysis in pigs, RR/RN WGP models provide greater accuracies in genomic evaluations compared to more conventional WGP models. We believe that it’s important to account for SNP specific intercept and slope effects in RN or RR data situations where SNP genotypes are available. Nevertheless, differences in predictive performance between the various RR/RN WGP models were not overwhelming such that simpler specifications such as IW-BayesA may be suitable for analyses that involve high degrees of genetic complexity or low levels of determinedness as previously mentioned by Wimmer et al (2013). Conversely, based on our simulation results, we anticipate that CD-BayesB might show greater promise when marker density is large relative to the number of QTL; i.e., low degree of complexity. It is important that efficient software and/or algorithms be developed for these models in order to allow for meaningful comparisons in these situations. 101 Chapter 5 Exploring alternative specifications for bivariate trait whole genome prediction models 5.1 Introduction With the advent of genotyping and sequencing technologies, whole genome prediction (WGP) has become commonly used for genetically selecting animals and plants for economically important traits (de los Campos, Hickey et al. 2013). Numerous approaches, including non-parametric methods (Gianola, Wu et al. 2010), Bayesian parametric “alphabet” methods (Meuwissen, Hayes et al. 2001; Gianola, de los Campos et al. 2009; Habier, Fernando et al. 2011), and generalized expectation-maximization methods (KARKKAINEN and SILLANPAA 2012) for single trait analyses have been developed. There may be, however, other untapped opportunities to improve prediction accuracy in WGP. It is well known, for example, that many economically important traits are genetically correlated. Multiple trait analyses have been recently used to account for correlations among traits due to specific genes in genome wide association studies (ZHU and ZHANG 2009), including for differential mapping of pleiotropic versus non-pleiotropic QTLs (Banerjee, Yandell et al. 2008). A large number of genetic evaluation methods have been developed and applied to jointly analyze correlated traits in livestock (Gianola and Sorensen 2004; Banerjee, Yandell et al. 2008). Some of these methods involve independent analyses on sets of transformed variables using techniques based on, for example, factor analysis, principal component analysis, canonical analysis and cluster analysis (Weller, Wiggans et al. 1996; Musani, Zhang et al. 2006; de los Campos and Gianola 2007; Vichi and Saporta 2009). 102 For quantitative genetic analysis, these methods generally require a two-step approach of reducing either number of traits and/or number of genetic effects. Another approach to multiple trait modeling is to model the linear regression relationships among traits in a multilayer system, namely structural equation models (SEM) or path coefficient models (GIANOLA and SORENSEN 2004) although this might seem rather complex with large numbers of SNPs. Banerjee et al. (2008) used seeming unrelated regression (SUR) to identify pleiotropic QTL for multiple traits. This method allows each trait to have a separate set of QTL or trait specific QTL and facilitates a computational efficient sampling algorithm. However, their method models trait correlations due to residuals rather than due to QTL, thereby providing no information on genetic correlation between traits. 
In order to further improve WGP accuracy, several efforts have been made to develop Bayesian approaches in multiple trait models (CALUS and VEERKAMP 2011; JIA and JANNINK 2012). Calus and Veerkamp (2011) demonstrated that, for traits having a high genetic correlation with each other, multiple trait WGP model analyses lead to higher WGP accuracies compared to single trait analyses, particularly for the lower heritability trait. Among multiple trait WGP models investigated by Calus and Veerkamp (2011), BayesSSVS, a variable selection method with a spike and slab prior on the SNP effects, outperformed other models that assumed a normal density on all SNP. Jia and Jannink (2012) further confirmed that the advantage of multiple trait WGP models over single trait counterparts was greatly influenced by several factors, i.e. heritabilities and genetic correlations between traits as well as the number of QTLs. 103 BayesA and BayesB are popular “Bayesian alphabet” models used for single trait WGP. BayesA specifies a scaled-t prior on SNP effects and is a special case of BayesB which, similar to BayesSSVS, specifies a mixture prior of point mass at zero and scaled-t density (Meuwissen, Hayes et al. 2001). These specifications often have higher WGP prediction accuracies compared to procedures based on Gaussian distribution assumptions (e.g. ridge regression or GBLUP) if SNP effects deviate substantially from normality. Calus and Veerkamp (2011) recently extended BayesA for use in multiple trait analyses and determined similar advantages over multiple trait GBLUP predictions. By estimating SNP-specific pleiotropic effects for multiple traits, we may infer upon the most important pleiotropic regions in the genome given the relative locations of SNP markers (van Binsbergen, Veerkamp et al. 2012). However, the assumption of a multiple trait BayesA model, derived from conjugate inverted Wishart (IW) prior densities on the SNP specific variance-covariance matrices (VCV), might be potentially inflexible since the uncertainty in all elements of a VCV is based on a single degrees of freedom parameter (MUNILLA and CANTET 2012). An alternative parameterization on VCV for random and residual effects was proposed by Bello et al. (2010) who suggested that the square root free Cholesky decomposition (CD) of the VCV in bivariate mixed models might allow greater flexibility as uncertainty can be differentially expressed on each element of a VCV using such a parameterization. In this study, our objectives were: 1) To reaffirm the greater accuracies of prediction provided by bivariate trait models relative to single trait conventional WGP approaches and 2) To assess whether there may be greater flexibility, and hence greater 104 WGP accuracy, using CD-based parameterizations compared to IW based specifications on SNP-specific variance-covariance matrices. 5.2 Methods and Materials 5.2.1 Whole genome prediction models In a bivariate trait WGP model, SNP substitution effects are estimated for two traits simultaneously. 
The general bivariate trait WGP model can be denoted as follows:

$y_{ij} = \mathbf{x}_{ij}'\boldsymbol{\beta}_j + \mathbf{z}_i'\mathbf{g}_j + e_{ij}$,   [1]

where $y_{ij}$ is the phenotype record for the ith animal on the jth trait (i = 1, 2, …, n; j = 1, 2); $\boldsymbol{\beta}_j$ is the vector of fixed effects on trait j; $\mathbf{x}_{ij}'$ is the incidence row vector connecting elements of $\boldsymbol{\beta}_j$ to animal i; $\mathbf{z}_i' = \left[ z_{i1}\ z_{i2}\ z_{i3} \cdots z_{im} \right]$ is the vector of genotypes coded as 0, 1 or 2 copies of the minor allele at each SNP for animal i; $\mathbf{g}_j = \left\{ g_{jk} \right\}_{k=1}^{m}$ is the vector of SNP substitution effects on trait j; and $e_{ij}$ is the random residual for the ith animal on the jth trait. We can rewrite Equation [1] using matrix notation as:

$\begin{bmatrix} \mathbf{y}_1 \\ \mathbf{y}_2 \end{bmatrix} = \begin{bmatrix} \mathbf{X}_1 & \mathbf{0} \\ \mathbf{0} & \mathbf{X}_2 \end{bmatrix}\begin{bmatrix} \boldsymbol{\beta}_1 \\ \boldsymbol{\beta}_2 \end{bmatrix} + \begin{bmatrix} \mathbf{Z} & \mathbf{0} \\ \mathbf{0} & \mathbf{Z} \end{bmatrix}\begin{bmatrix} \mathbf{g}_1 \\ \mathbf{g}_2 \end{bmatrix} + \begin{bmatrix} \mathbf{e}_1 \\ \mathbf{e}_2 \end{bmatrix}$,   [2]

where $\mathbf{y}_j = \left\{ y_{ij} \right\}_{i=1}^{n}$, $\mathbf{X}_j = \left\{ \mathbf{x}_{ij}' \right\}_{i=1}^{n}$, and $\mathbf{Z} = \left\{ \mathbf{z}_i' \right\}_{i=1}^{n}$. The animals' genomic merit for the two traits can be subsequently represented as $\mathbf{u}_1 = \mathbf{Zg}_1$ and $\mathbf{u}_2 = \mathbf{Zg}_2$, respectively. For the various bivariate trait WGP models investigated, we assumed that the pairs of residuals on animal i, i.e., $\mathbf{e}_{i.} = \left[ e_{i1}\ e_{i2} \right]'$, i = 1, 2, …, n, follow independent bivariate normal densities with a null mean vector and a common variance-covariance matrix $\boldsymbol{\Sigma}_e = \begin{bmatrix} \sigma_{e_1}^2 & \sigma_{e_1 e_2} \\ \sigma_{e_1 e_2} & \sigma_{e_2}^2 \end{bmatrix}$. Similarly, effects of SNP k on the two traits follow independent bivariate normal densities with a null mean vector and a common variance-covariance matrix $\boldsymbol{\Sigma}_g = \begin{bmatrix} \sigma_{g_1}^2 & \sigma_{g_1 g_2} \\ \sigma_{g_1 g_2} & \sigma_{g_2}^2 \end{bmatrix}$. Diffuse proper Gaussian or flat priors are typically specified on $\boldsymbol{\beta}_1$ and $\boldsymbol{\beta}_2$ (Sorensen and Gianola 2002). For the residual variance-covariance matrix $\boldsymbol{\Sigma}_e$, we might typically specify a conjugate inverted Wishart prior with degrees of freedom $v_0$ and scale matrix $\boldsymbol{\Sigma}_0$.

5.2.2 Univariate BayesA and BayesB (uBayesA\uBayesB)

We re-label the conventional single trait BayesA and BayesB models as uBayesA and uBayesB, respectively, to emphasize their univariate analysis. We infer upon key hyperparameters in these models using prior specifications and strategies previously outlined by Yang and Tempelman (2012).

5.2.3 Bivariate ridge regression (bGBLUP)

Outside of some strategies for rescaling, we specify the realized relationship matrix based on the unscaled genotype matrix for SNPs, derived as G = ZZ′, in a bivariate mixed effects model that we label bivariate genomic BLUP or bGBLUP. In the bGBLUP model, we specified multivariate normal distributions having null means for each of $\mathbf{u} = \left[ \mathbf{u}_1'\ \mathbf{u}_2' \right]'$ and $\mathbf{e} = \left[ \mathbf{e}_1'\ \mathbf{e}_2' \right]'$ such that $var(\mathbf{u}) = \left(\mathbf{ZZ}'\right) \otimes \boldsymbol{\Sigma}_g$ and $var(\mathbf{e}) = \mathbf{I}_{2\times2} \otimes \boldsymbol{\Sigma}_e$. Based on these specifications, we use ASReml 3.0 (Gilmour, Gogel et al. 2009) to provide REML estimates of $\boldsymbol{\Sigma}_g$ and $\boldsymbol{\Sigma}_e$ in order to compute the BLUP $\hat{\mathbf{u}}$ of $\mathbf{u}$ and hence $\hat{\mathbf{g}}$ of $\mathbf{g}$ as necessary.

5.2.4 Bivariate Student-t (IWBayesA)

A convenient and previously used extension for bivariate trait WGP is to apply a conjugate inverted Wishart prior on heterogeneous SNP-specific variance-covariance matrices. This specification represents a multivariate extension of BayesA (Calus and Veerkamp 2011), which we label IWBayesA. For the joint effects of SNP k on the two traits, we specify a bivariate normal density conditionally as follows:

$\mathbf{g}_{.k} \sim N\left( \mathbf{0}_{2\times1},\ \mathbf{G}_k = \begin{bmatrix} \sigma_{g_{1k}}^2 & \sigma_{g_{1k} g_{2k}} \\ \sigma_{g_{1k} g_{2k}} & \sigma_{g_{2k}}^2 \end{bmatrix} \right)$   [4]

where $\mathbf{G}_k$ is the SNP-specific variance-covariance matrix for the two traits and is regarded as a random draw from a conjugate inverted Wishart prior with degrees of freedom $v_g$ and scale matrix $\boldsymbol{\Sigma}_g = \begin{bmatrix} \sigma_{g_1}^2 & \sigma_{g_1 g_2} \\ \sigma_{g_1 g_2} & \sigma_{g_2}^2 \end{bmatrix}$.
For a fully hierarchical Bayesian model as developed in Yang and Tempelman (2012), we inferred upon hyperparameters after specifying a prior $p(v_g)$ on $v_g$, and also a conjugate Wishart prior $\boldsymbol{\Sigma}_g \sim W(v_0, \mathbf{S}_0)$ on $\boldsymbol{\Sigma}_g$. Note that the uncertainty of IWBayesA is controlled by only one scalar, $v_g$ (Munilla and Cantet 2012). Furthermore, IWBayesA assumes that every single SNP is pleiotropic. Conceptually, IWBayesA is not much different from the model developed (with the same label) for reaction norm modeling in Chapter 4.

5.2.5 Cholesky decomposition specifications (CDBayesA\CDBayesB)

In order to address the potential inflexibility of IWBayesA, we developed an alternative approach based on the square root free Cholesky decomposition (CD) of $\mathbf{G}_k$. We have previously shown that the CD parameterization can provide greater flexibility for modeling variance-covariance matrices (Bello, Steibel et al. 2010). Based on a particular order for the two traits, we can write the SNP effects on the second trait, $\mathbf{g}_2$, as a linear regression on the SNP effects on the first trait, $\mathbf{g}_1$:

$\mathbf{g}_2 = \boldsymbol{\Psi}\mathbf{g}_1 + \mathbf{g}_{2|1}$   [5]

Hence we can re-write the general bivariate trait WGP model [2] as:

$\begin{bmatrix} \mathbf{y}_1 \\ \mathbf{y}_2 \end{bmatrix} = \begin{bmatrix} \mathbf{X}_1 & \mathbf{0} \\ \mathbf{0} & \mathbf{X}_2 \end{bmatrix}\begin{bmatrix} \boldsymbol{\beta}_1 \\ \boldsymbol{\beta}_2 \end{bmatrix} + \begin{bmatrix} \mathbf{Z} & \mathbf{0} \\ \mathbf{0} & \mathbf{Z} \end{bmatrix}\begin{bmatrix} \mathbf{g}_1 \\ \boldsymbol{\Psi}\mathbf{g}_1 + \mathbf{g}_{2|1} \end{bmatrix} + \begin{bmatrix} \mathbf{e}_1 \\ \mathbf{e}_2 \end{bmatrix}$   [6]

where $\mathbf{g}_{2|1} = \left\{ g_{2|1,k} \right\}_{k=1}^{m}$ is the vector of SNP effects on the second trait conditional on the first trait, and $\boldsymbol{\Psi} = diag\left\{ \phi_k \right\}_{k=1}^{m}$ is a diagonal matrix of SNP-specific association effects between the SNP effects on the two traits.

Suppose we specify $g_{1k} \sim N(0, \sigma_{g_{1k}}^2)\;\forall k$ with $\sigma_{g_{1k}}^2 \sim \chi^{-2}(v_1, v_1 s_1^2)\;\forall k$ for the SNP effects and their respective variances on Trait 1. Similarly, we specify $g_{2|1,k} \sim N(0, \sigma_{g_{2|1,k}}^2)\;\forall k$ with $\sigma_{g_{2|1,k}}^2 \sim \chi^{-2}(v_{2|1}, v_{2|1} s_{2|1}^2)\;\forall k$ for the SNP effects and variances on Trait 2 conditional on Trait 1. We label this model CDBayesA, given its CD-based multivariate extension of BayesA; conceptually, it is very similar to the model of the same name used in Chapter 4 for reaction norm modeling.

For the SNP-specific association effects between the two traits, we specify independent normal priors $\phi_k \sim N(\mu_\phi, \sigma_\phi^2)\;\forall k$. Key hyperparameters, namely the degrees of freedom ($v_1$, $v_{2|1}$) and scale ($s_1^2$, $s_{2|1}^2$) parameters, can be inferred upon in CDBayesA using prior specifications similar to those in the conventional uBayesA model. Here, the SNP-specific variance-covariance matrices $\mathbf{G}_k$ are still specified very generally by three parameters, as with IWBayesA, but expressed by an alternative parameterization; i.e.,

$\mathbf{G}_k = \begin{bmatrix} \sigma_{g_{1k}}^2 & \sigma_{g_{1k}}^2 \phi_k \\ \sigma_{g_{1k}}^2 \phi_k & \sigma_{g_{1k}}^2 \phi_k^2 + \sigma_{g_{2|1,k}}^2 \end{bmatrix}$

However, unlike IWBayesA, whereby the uncertainty on $\mathbf{G}_k$ is essentially controlled by one degrees of freedom parameter, the uncertainty on $\mathbf{G}_k$ in CDBayesA is controlled by three such parameters: the two different degrees of freedom terms $v_1$ and $v_{2|1}$ as well as the variance component $\sigma_\phi^2$. Nevertheless, CDBayesA, like IWBayesA, assumes that every SNP has a pleiotropic effect.

In an attempt to provide even greater flexibility than CDBayesA, i.e., to allow not only pleiotropic effects but also non-pleiotropic and/or potentially null effects for each SNP on both traits, we developed a variable selection approach analogous to a BayesB type of specification, which we naturally label CDBayesB. Let $\sigma_{g_{1k}}^2$ have a mixture prior of point mass at zero with probability $(1-\pi_1)$ or a random draw from $\chi^{-2}(v_1, v_1 s_1^2)$ with probability $\pi_1$, for k = 1, 2, …, m.
Similarly, $\sigma_{g_{2|1,k}}^2$ has a mixture prior of point mass at zero with probability $(1-\pi_{2|1})$ or a random draw from $\chi^{-2}(v_{2|1}, v_{2|1} s_{2|1}^2)$ with probability $\pi_{2|1}$, for k = 1, 2, …, m. For the association effects $\phi_k$ between the two traits at each SNP k = 1, 2, …, m, we specify a mixture prior of point mass at zero with non-association probability $(1-\pi_\phi)$ or a random draw from $N(\mu_\phi, \sigma_\phi^2)$ with association probability $\pi_\phi$ (a short numerical sketch of drawing from these mixture priors is given below). Hence, in CDBayesB, we could infer upon SNP effects that are non-zero and trait-specific (i.e., non-pleiotropic). For Trait 1, this would occur when $g_{1k} \neq 0$ with $\phi_k = 0$ and $g_{2|1,k} = 0$, whereas for Trait 2 this would entail $g_{1k} = 0$ and $g_{2|1,k} \neq 0$ regardless of the value of $\phi_k$. Pleiotropic effects will be inferred if $g_{1k} \neq 0$ and $\phi_k \neq 0$ regardless of the value of $g_{2|1,k}$, although a value of $g_{2|1,k} = 0$ would then imply a situation of "perfect" pleiotropy between the two traits (i.e., a SNP-specific genetic correlation equal to ±1). Using prior specifications similar to those in CDBayesA, we could infer upon the key hyperparameters, i.e., the degrees of freedom ($v_1$, $v_{2|1}$) and scale parameters ($s_1^2$, $s_{2|1}^2$). Furthermore, we could specify informative or diffuse priors $p(\pi_1)$, $p(\pi_{2|1})$, $p(\pi_\phi)$, $p(\mu_\phi)$ and $p(\sigma_\phi^2)$ on $\pi_1$, $\pi_{2|1}$, $\pi_\phi$, $\mu_\phi$ and $\sigma_\phi^2$, respectively. This model is somewhat analogous to the same-named model for reaction norms in Chapter 4, except that here we specify three rather than two mixture distributions to provide even greater flexibility.

5.2.6 Bayesian inference

For fully Bayesian inference using Markov chain Monte Carlo (MCMC) methods, we require strategies for drawing random samples from the full conditional densities (FCD) of each unknown parameter (or blocks thereof) under all models. FCD for uBayesA and uBayesB have been illustrated in our previous work (Yang and Tempelman 2012). In this paper, we present the FCD for the three bivariate trait WGP models (IWBayesA, CDBayesA, CDBayesB) in Appendix D1. The fourth bivariate WGP model (bGBLUP) was analyzed using classical REML and BLUP for computational expedience, although we determined that these inferences were not practically different from MCMC-based inferences (results not shown).

5.2.7 Simulation studies

We designed a naive small-scale simulation study involving independent markers (i.e., not in LD) using a response surface design (Table D2.1 in Appendix D2) based on five factors that we thought might be particularly important for influencing WGP accuracies on the lower heritability trait between two rather different models: IWBayesA versus CDBayesB. These five factors were the number (n) of animals, the number (M1) of QTL controlling Trait 1 ($h^2$ = 0.8), the number (M2) of QTL controlling Trait 2 ($h^2$ = 0.1), the number (M12) of QTL pleiotropically controlling both traits, and the variability ($\sigma_{QTL,\phi}^2$) of the associations between the two traits across QTL. Among the five factors considered, only M12 had a significant interaction with model; i.e., the difference between IWBayesA and CDBayesB in WGP accuracy on Trait 2 depended on M12 (P < 0.0001), as further noted in Table D2.2 of Appendix D2. We used this knowledge to design a more focused LD simulation study to compare the WGP accuracies for bivariate trait analyses involving six different models (uBayesA, uBayesB, bGBLUP, IWBayesA, CDBayesA, CDBayesB).
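As referenced above, the three-part mixture prior that defines CDBayesB can be sketched numerically in base R. The hyperparameter values below are illustrative placeholders (roughly matching the Beta(1, 8)-type priors used elsewhere), not estimates from any analysis in this dissertation:

```r
# Illustrative draws from the CDBayesB prior for m SNPs: spike-and-slab mixtures on
# sigma2_g1k and sigma2_g2|1k, and on the association parameters phi_k.
set.seed(5)
m <- 2000
pi1 <- 0.1; pi21 <- 0.1; piphi <- 0.1            # "slab" probabilities (illustrative)
nu1 <- 5; s1sq <- 2; nu21 <- 5; s21sq <- 2       # scaled inverse chi-square hyperparameters
mu_phi <- 0.8; s2_phi <- 0.05                    # normal component for phi_k
sigma2_g1  <- ifelse(runif(m) < pi1,  nu1  * s1sq  / rchisq(m, nu1),  0)
sigma2_g21 <- ifelse(runif(m) < pi21, nu21 * s21sq / rchisq(m, nu21), 0)
phi        <- ifelse(runif(m) < piphi, rnorm(m, mu_phi, sqrt(s2_phi)), 0)
g1  <- rnorm(m, 0, sqrt(sigma2_g1))              # zero whenever sigma2_g1k = 0
g21 <- rnorm(m, 0, sqrt(sigma2_g21))             # zero whenever sigma2_g2|1k = 0
g2  <- phi * g1 + g21                            # Equation [5] applied SNP by SNP
# A SNP is inferred as pleiotropic when g1k != 0 and phi_k != 0; e.g.:
table(pleiotropic = g1 != 0 & phi != 0)
```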
Two populations were targeted, differing only in the number of pleiotropic QTL influencing both traits (M12 = 10 versus M12 = 30), with all other specifications being the same, as indicated in Table 5.1.

Table 5.1: Summary of the two populations compared in the LD simulation study.

Factors                                              Population 1   Population 2
Constant:
  Heritability of Trait 1                            0.5            0.5
  Heritability of Trait 2                            0.1            0.1
  Residual covariance between the two traits         0              0
  Number of SNPs                                     2000           2000
  Number of animals                                  500            500
  Mean of association parameters (μφ)                0.8            0.8
  Variance of association parameters (σ²φ)           0.05           0.05
  Number of QTLs for Trait 1 (M1)                    10             10
  Number of QTLs for Trait 2 (M2)                    10             10
Of interest:
  Number of QTLs for both traits (M12)               10             30

For each of the two populations or scenarios, we generated 20 replicates based on a constant population size of 100 for 6000 generations of random mating using the hypred package in R (Technow 2012). For each replicate, we defined the genome as one chromosome of length 1 Morgan having 20,000 SNP loci; the number of recombinations for each meiosis event was drawn from a Poisson(1) distribution, with crossing-over locations drawn from a uniform distribution. In the base population, all loci were monomorphic, with polymorphisms created by a recurrent mutation rate of 2.5 × 10-4 per locus per generation for each of the first 6000 generations. After 6000 generations, two additional generations (6001 and 6002) were created with expanded population sizes of 500 animals each. In Generation 6001, we excluded SNP with a minor allele frequency (MAF) < 0.1. We defined genotype dosages (i.e., the genotype matrix) in Generations 6001 and 6002 as counts (0, 1, 2) of the minor allele for all remaining SNPs. We randomly selected 2000 SNPs plus an additional M1 + M2 + M12 SNPs to be QTLs in Generation 6001, with M1 = 10, M2 = 10 and M12 = 10 in Population 1, and M1 = 10, M2 = 10 and M12 = 30 in Population 2. We generated QTL effects $\left\{ g_{QTL,1j} \right\}_{j=1}^{M_1+M_{12}}$ for Trait 1 from a reflected gamma distribution with shape = 0.4 and scale = 2.24. QTL effects $\left\{ g_{QTL,2|1,j} \right\}_{j=M_1+1}^{M_1+M_{12}+M_2}$ for Trait 2, conditional on Trait 1, were also generated from a reflected gamma distribution, with shape = 0.4 and scale = 1.34. The association variables between Trait 1 and Trait 2, $\left\{ \phi_{QTL,j} \right\}_{j=M_1+1}^{M_1+M_{12}}$, were simulated from N($\mu_{QTL,\phi}$ = 0.8, $\sigma_{QTL,\phi}^2$ = 0.05). The effect of QTL j on Trait 2 was thereby determined as $g_{QTL,2j} = \phi_{QTL,j}\, g_{QTL,1j} + g_{QTL,2|1,j}$; j = 1, 2, …, M1 + M12 + M2, consistent with Equation [5], noting that $\phi_{QTL,j} = 0$ and $g_{QTL,2|1,j} = 0$ for the M1 QTL affecting Trait 1 only, whereas $g_{QTL,1j} = 0$ and $\phi_{QTL,j} = 0$ for the M2 QTL affecting Trait 2 only. Hence, pleiotropic QTL effects were generated from a complex bivariate distribution.
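A minimal base-R sketch of the "reflected gamma" draws used for the QTL effects (gamma-distributed magnitudes with a random sign) is given below; the function name and the Population 2 QTL counts used as arguments are illustrative only:

```r
# Sketch: reflected gamma QTL effects, i.e., gamma magnitudes with a random +/- sign.
set.seed(6)
r_reflected_gamma <- function(n, shape, scale) {
  sample(c(-1, 1), n, replace = TRUE) * rgamma(n, shape = shape, scale = scale)
}
# Population 2 settings as an example: M1 = 10, M12 = 30, M2 = 10
g_qtl1  <- r_reflected_gamma(10 + 30, shape = 0.4, scale = 2.24)  # Trait 1 effects (M1 + M12 QTL)
g_qtl21 <- r_reflected_gamma(30 + 10, shape = 0.4, scale = 1.34)  # Trait 2 | Trait 1 (M12 + M2 QTL)
phi_qtl <- rnorm(30, mean = 0.8, sd = sqrt(0.05))                 # associations for the M12 pleiotropic QTL
```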
If we define $\mathbf{Z}_{QTL,1}$ as the subset of the n × M1 SNP genotypes in Z designated to be QTL for Trait 1 only and $\mathbf{Z}_{QTL,12}$ as the subset of the n × M12 SNP genotypes in Z designated to be QTL for both traits, the true breeding values $\left\{ TBV_{1i} \right\}_{i=1}^{n}$ for Trait 1 in each of Generations 6001 and 6002 were generated using

$\left\{ TBV_{1i} \right\}_{i=1}^{n} = \mathbf{Z}_{QTL,12} \left\{ g_{QTL,1j} \right\}_{j=1}^{M_{12}} + \mathbf{Z}_{QTL,1} \left\{ g_{QTL,1j} \right\}_{j=M_{12}+1}^{M_{12}+M_1}$   [7a]

Similarly, if we define $\mathbf{Z}_{QTL,2}$ as the subset of the n × M2 SNP genotypes in Z designated to be QTL for Trait 2 only, the true breeding values $\left\{ TBV_{2i} \right\}_{i=1}^{n}$ for Trait 2 in each of Generations 6001 and 6002 were generated using

$\left\{ TBV_{2i} \right\}_{i=1}^{n} = \mathbf{Z}_{QTL,12} \left\{ g_{QTL,2j} \right\}_{j=1}^{M_{12}} + \mathbf{Z}_{QTL,2} \left\{ g_{QTL,2j} \right\}_{j=M_{12}+1}^{M_{12}+M_2}$   [7b]

Based on heritabilities $h_1^2$ = 0.5 and $h_2^2$ = 0.1 for Traits 1 and 2, respectively, we generated the pair of residuals on each animal from a bivariate normal distribution with null mean and variance-covariance matrix

$\boldsymbol{\Sigma}_e = \begin{bmatrix} var\left( \left\{ TBV_{1i} \right\}_{i=1}^{n} \right)\left( 1 - h_1^2 \right)/h_1^2 & 0 \\ 0 & var\left( \left\{ TBV_{2i} \right\}_{i=1}^{n} \right)\left( 1 - h_2^2 \right)/h_2^2 \end{bmatrix}$   [8]

In other words, residuals were specified to be uncorrelated, since it has been determined previously by Jia and Jannink (2012) that the nature of the residual correlation between two traits is inconsequential to the accuracy of WGP in bivariate models. Phenotypic records for the two traits were then generated as $y_{1i} = TBV_{1i} + e_{1i}$ and $y_{2i} = TBV_{2i} + e_{2i}$. Prediction accuracies of breeding values for the two traits in Generation 6002 were defined as the correlation between $\left\{ TBV_{1i} \right\}_{i=1}^{n}$ and $\left\{ EBV_{1i} \right\}_{i=1}^{n}$ for Trait 1, and the correlation between $\left\{ TBV_{2i} \right\}_{i=1}^{n}$ and $\left\{ EBV_{2i} \right\}_{i=1}^{n}$ for Trait 2. The factor of interest here, M12, influences the overall genetic correlation ($\rho_{g_1 g_2}$) between the two traits, which we determined as the correlation between $\left\{ TBV_{1i} \right\}_{i=1}^{n}$ and $\left\{ TBV_{2i} \right\}_{i=1}^{n}$ in Generation 6001.

5.2.8 Pine data analyses

Resende et al. (2012) provide a data set of loblolly pine phenotypes and genotypes for demonstration of WGP methods, which has been previously used by Jia and Jannink (2012). The original data set had genotypes on 4854 SNPs and 926 individuals. After we excluded SNPs with MAF < 0.05 and with P < 10-4 in the HWE test, 2684 SNPs remained. Following de los Campos et al. (2013), we standardized the genotype matrix as $\left( z_{ij} - 2p_j \right)/\sqrt{2p_j\left( 1 - p_j \right)}$, where $z_{ij}$ is the genotype of the jth SNP (minor allele dosage of 0, 1 or 2) on the ith individual and $p_j$ is the allele frequency of one allele for the jth SNP. Although raw phenotypes were not publicly available, the authors provided deregressed EBVs for 17 traits. Following Jia and Jannink (2012), we fitted deregressed EBVs as response variables to compare the various WGP models. We selected two disease resistance traits, i.e., rust gall volume (RGV), with a heritability of 0.12, and presence or absence of rust (RBIN), with a heritability of 0.21. After merging the deregressed EBVs for the two traits with the SNP genotypes, 807 individuals with deregressed EBVs on both traits and each genotyped for 2684 SNPs remained in the final data set. Hyperparameters for each of the six models (uBayesA, uBayesB, bGBLUP, IWBayesA, CDBayesA, CDBayesB) were estimated. Bayesian inference was based on 600,000 MCMC iterations with a burn-in period of 50,000 cycles for uBayesA, uBayesB and IWBayesA.
However, we found that MCMC samples of hyperparameters under the CDBayesA and CDBayesB models mixed very slowly, particularly for the scale ($s_1^2$, $s_{2|1}^2$) and degrees of freedom ($v_1$, $v_{2|1}$) parameters. To alleviate this problem, we arbitrarily specified the degrees of freedom ($v_1$, $v_{2|1}$) to be 4 for both traits in the CDBayesA model, since this specification did not influence the WGP accuracy of BV, as we found in Chapter 3. We fixed the scale parameters ($s_1^2$, $s_{2|1}^2$) in the CDBayesA model, unique to each trait, to their corresponding REML estimates in a bGBLUP model based on the Cholesky decomposition of $\boldsymbol{\Sigma}_g$. Nevertheless, we still estimated the mean and variance of the association parameters (i.e., $\mu_\phi$ and $\sigma_\phi^2$) in the CDBayesA model using MCMC. For the CDBayesB model, we fixed the degrees of freedom ($v_1$, $v_{2|1}$) to 5. We also fixed the scale parameters ($s_1^2$, $s_{2|1}^2$) and probabilities ($\pi_1$, $\pi_{2|1}$), unique to each trait, to their corresponding estimates from uBayesB. Other hyperparameters ($\pi_\phi$, $\mu_\phi$ and $\sigma_\phi^2$) in CDBayesB were estimated using MCMC.

To further compare the six models by cross-validation, we randomly split the data 20 different times into a training subset of 726 individuals (90%) and a validation subset of the remaining 81 individuals (10%), thereby leading to 20 cross-validation replicates. In order to investigate the influence of the specification of trait order in CDBayesA and CDBayesB, we first analyzed the training data setting RBIN as Trait 1 and RGV as Trait 2, labeling these two models as CDBayesA1 and CDBayesB1. Then, we switched the order of the two traits and labeled the two models as CDBayesA2 and CDBayesB2. For all methods, we predicted the deregressed EBVs in the validation dataset based on posterior mean estimates of the SNP effects for the two traits from the training dataset. Performance of each model, namely cross-validation predictive ability, was evaluated by the Pearson correlation between the predicted and the fitted deregressed EBVs in the validation data. In cross-validation, we expected to see larger differences in predictive ability among the competing models for the lower heritability trait RGV than for RBIN. For each of the models, we also assessed inferences on the effects of the various SNPs on RGV to see if there might be any meaningful differences between the models in this respect.

5.2.9 Priors used for data analyses

In the simulation study, we specified a non-informative prior $p(\nu) \propto (\nu+1)^{-2}$ and Gelman's prior $s^2 \sim \chi^{-2}(-1, 0)$ for uBayesA, as we have done previously (Yang and Tempelman 2012). We specified a non-informative prior $p(\nu) \propto (\nu+1)^{-2}$, a proper conjugate prior $s^2 \sim Gamma(0.1, 0.1)$ and $\pi \sim Beta(\alpha_\pi = 1, \beta_\pi = 8)$ in uBayesB. For IWBayesA, we specified a proper conjugate prior $p(\boldsymbol{\Sigma}_g) \propto W(v_0, \boldsymbol{\Sigma}_0)$, where $v_0$ is 4 and $\boldsymbol{\Sigma}_0$ is a 2 × 2 identity matrix. For CDBayesA, we specified the same non-informative priors on the degrees of freedom ($\nu_1$, $\nu_{2|1}$) and Gelman's prior on the scale parameters ($s_1^2$, $s_{2|1}^2$) as in uBayesA. Priors $p(\mu_\phi) \propto N(0, 1)$ and $p(\sigma_\phi^2) \propto \chi^{-2}(-1, 0)$ were specified on the mean and variance of the association parameters in CDBayesA. For CDBayesB, we specified the same priors on the degrees of freedom ($\nu_1$, $\nu_{2|1}$), scale parameters ($s_1^2$, $s_{2|1}^2$) and probabilities ($\pi_1$, $\pi_{2|1}$) as in uBayesB. We specified the same priors on $\mu_\phi$ and $\sigma_\phi^2$ as in CDBayesA. For $\pi_\phi$,
5.2.9 Priors used for data analyses

In the simulation study, we specified a non-informative prior $p(\nu) \propto (\nu+1)^{-2}$ and Gelman's prior $s^2 \sim \chi^{-2}(-1,0)$ for uBayesA, as we have done previously (YANG and TEMPELMAN 2012). We specified a non-informative prior $p(\nu) \propto (\nu+1)^{-2}$, a proper conjugate prior $s^2 \sim \mathrm{Gamma}(0.1, 0.1)$ and $\pi \sim \mathrm{Beta}(\alpha_\pi = 1, \beta_\pi = 8)$ in uBayesB. For IWBayesA, we specified a proper conjugate prior $p(\Sigma_g) \propto \mathrm{W}(v_0, \Sigma_0)$, where $v_0$ is 4 and $\Sigma_0$ is a 2 × 2 identity matrix. For CDBayesA, we specified the same non-informative prior on the degrees of freedom ($\nu_1$, $\nu_{2|1}$) and Gelman's prior on the scale parameters ($s_1^2$, $s_{2|1}^2$) as in uBayesA. Priors $p(m_\phi) \propto \mathrm{N}(0,1)$ and $p(s_\phi^2) \propto \chi^{-2}(-1,0)$ were specified on the mean and variance of the association parameters in CDBayesA. For CDBayesB, we specified the same priors on the degrees of freedom ($\nu_1$, $\nu_{2|1}$), scale parameters ($s_1^2$, $s_{2|1}^2$) and probabilities ($\pi_1$, $\pi_{2|1}$) as in uBayesB. We specified the same priors on $m_\phi$ and $s_\phi^2$ as in CDBayesA. For $\pi_\phi$, we specified the proper prior $\pi_\phi \sim \mathrm{Beta}(\alpha_\pi = 1, \beta_\pi = 8)$.

For the analysis of the pine data, we used the same priors on the hyperparameters in uBayesA and uBayesB. For IWBayesA, we specified a proper conjugate prior $p(\Sigma_g) \propto \mathrm{W}(v_0, \Sigma_0)$, where $v_0$ is 2 and $\Sigma_0$ is a 2 × 2 diagonal matrix. The first and second diagonal elements of $\Sigma_0$ were specified to be the estimates of the scale parameters for RGV and RBIN using uBayesA, i.e., 1.323e-05 and 3.215e-05, respectively. For CDBayesA and CDBayesB, we fixed the degrees of freedom and scale parameters to estimates from the corresponding univariate analyses (uBayesA and uBayesB) because of the slow mixing previously noted. We used the same priors on $m_\phi$, $s_\phi^2$ and $\pi_\phi$ for CDBayesA and CDBayesB as specified in the simulation study.
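As a purely illustrative aside, the prior specifications listed above for the simulation study can be examined by drawing from them directly. The sketch below (Python/NumPy) assumes a shape/rate parameterization for the $\mathrm{Gamma}(0.1, 0.1)$ prior (the text does not state the parameterization) and uses inversion of the CDF for the non-informative prior $p(\nu) \propto (\nu+1)^{-2}$; it is not part of any analysis reported here.

```python
import numpy as np

rng = np.random.default_rng(0)

# p(nu) proportional to (nu + 1)^-2 for nu > 0 has CDF F(nu) = nu / (nu + 1),
# so a prior draw is obtained by inverting the CDF: nu = u / (1 - u).
u = rng.uniform(size=100_000)
nu_draws = u / (1.0 - u)                       # heavy right tail; prior median equals 1

# Gamma(0.1, 0.1) prior on s^2 (shape/rate assumed; NumPy takes scale = 1/rate).
s2_draws = rng.gamma(shape=0.1, scale=1.0 / 0.1, size=100_000)

# Beta(alpha_pi = 1, beta_pi = 8) prior on pi, implying a prior mean of 1/9.
pi_draws = rng.beta(1.0, 8.0, size=100_000)
```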
5.3 Results

5.3.1 Simulation Studies

In the simulation study, the overall genetic correlation ($\rho_{g_1g_2}$) between the two traits was 0.48 and 0.63, respectively, for Populations 1 and 2 over the 20 replicates per population. Population 2 had a much smaller between-replicate standard deviation (~0.13) for $\rho_{g_1g_2}$ compared to Population 1 (~0.29).

Figure 5.1 illustrates the average WGP accuracy of predicting breeding values (BV) in Generation 6002 for the two traits over the 20 replicates for each of Populations 1 (Figure 5.1A) and 2 (Figure 5.1B). For Trait 1 (h² = 0.5) in Population 1, bGBLUP had >5% lower (P<0.05) average accuracy compared to IWBayesA, while IWBayesA had ~2% lower (P<0.05) accuracy compared to the other four models (uBayesA\uBayesB\CDBayesA\CDBayesB), including the two based on univariate WGP analyses. For Trait 2 (h² = 0.1) in Population 1, bGBLUP and uBayesA had ~8% and ~3% lower (P<0.05) average accuracy, respectively, compared to the other four models (uBayesB\IWBayesA\CDBayesA\CDBayesB). No significant difference was found between uBayesB and the three bivariate models (IWBayesA\CDBayesA\CDBayesB) for Trait 2 in Population 1. For Trait 1 in Population 2 (Figure 5.1B), we found that both bGBLUP and IWBayesA had ~2% lower (P<0.05) average accuracy than the other four models (uBayesA\uBayesB\CDBayesA\CDBayesB). For Trait 2 in Population 2, the three bivariate trait models (IWBayesA\CDBayesA\CDBayesB) outperformed bGBLUP (~3%), while bGBLUP had ~5% higher (P<0.05) accuracy than uBayesA and uBayesB. No significant difference was found among the three Bayesian bivariate trait models (IWBayesA\CDBayesA\CDBayesB) for Trait 2 in Population 2.

Figure 5.1: Accuracy of breeding value prediction for six methods (uBayesA\uBayesB\bGBLUP\IWBayesA\CDBayesA\CDBayesB) in two scenarios: A) number of QTLs for both traits = 10; B) number of QTLs for both traits = 30. Different letters indicate significant differences with P<0.05.

5.3.2 Pine data analyses

The average predictive abilities for the eight different models over the 20 different replicates in the cross-validation are summarized in Figure 5.2. For the lower heritability trait RGV, we found that CDBayesB1 and CDBayesB2 had ~5% greater (P<0.05) predictive accuracy compared to the other six models, whereas bGBLUP, CDBayesA1 and CDBayesA2 had lower (P<0.05) predictive accuracies (~6% and ~9%) compared to the other three bivariate trait models. There was no evidence that IWBayesA had different predictive accuracy for RGV compared to uBayesA and uBayesB. However, four of the models (uBayesA\uBayesB\IWBayesA\CDBayesA1), including the two univariate models, outperformed bGBLUP (~4%–6%) for RGV.

For the higher heritability trait RBIN, we found that bGBLUP, CDBayesA1 and CDBayesA2 had lower (P<0.05) predictive accuracies (~6% and ~9%) compared to the other three bivariate trait models. Furthermore, there was no evidence of a difference among uBayesA, uBayesB and three of the bivariate trait models (IWBayesA\CDBayesB1\CDBayesB2). Across the cross-validation replicates, the estimated SNP effects for either trait agreed rather well between the two specifications of trait order using CDBayesB (refer to panels A and B of Figure D2.1 in Appendix D2). However, in contrast with CDBayesB, there was less agreement between the two different trait orders using CDBayesA (refer to panels C and D of Figure D2.1 in Appendix D2). To further demonstrate the differences among three methods (uBayesA, IWBayesA and CDBayesB1), Figure 5.3 shows the absolute values of estimated SNP effects on RGV plotted against SNP index using all of the data. IWBayesA (Figure 5.3B) detected the same number of extremely large SNP effects as uBayesA (Figure 5.3A). However, compared to IWBayesA and uBayesA, larger SNP effects were inferred with CDBayesB1 (Figure 5.3C) for RGV, which might partially explain the higher predictive accuracy for RGV in cross-validation for that particular model.

Figure 5.2: Average predictive ability from cross-validation using the loblolly pine data set for eight methods (uBayesA\uBayesB\bGBLUP\IWBayesA\CDBayesA1\CDBayesB1\CDBayesA2\CDBayesB2), where CDBayesA1 and CDBayesB1 used RBIN as the first trait and RGV as the second trait, and CDBayesA2 and CDBayesB2 used RGV as the first trait and RBIN as the second trait. Different letters indicate means are different (P<0.05) from each other.

Figure 5.3: Estimated SNP effects for RGV from 807 individuals and 2684 SNPs in the pine data set using three methods: A) uBayesA; B) IWBayesA; C) CDBayesB1.

5.4 Discussion

Multiple trait extensions to WGP have been developed to improve prediction accuracy by accounting for genetic correlations between traits (CALUS and VEERKAMP 2011; JIA and JANNINK 2012). Many studies have shown the advantage of multiple trait models compared to their univariate counterparts, especially for lower heritability traits. For multiple trait models, variable selection methods such as BayesSSVS have shown some advantage in prediction accuracy over models that are not based on such mixtures and/or that assume normality (CALUS and VEERKAMP 2011). Nevertheless, in Jia and Jannink (2012), a variable selection method, BayesCπ, based on normality for one of the mixture components, demonstrated advantages over bivariate BayesA, which is similar to our IWBayesA, and over GBLUP. A Gaussian prior on SNP effects might not be an ideal specification for genetic architectures characterized by few large QTL effects. Furthermore, heavy-tailed variable selection methods like BayesB, popularized for univariate WGP analyses, have not yet been considered in multiple trait analysis. In this study, we developed multiple trait WGP models based on the two univariate methods BayesA and BayesB. We did not, however, pursue a variable selection version of IWBayesA, analogous to the IWBayesB developed in the previous chapter, as we believed it to be dubious to attempt to fit a model where SNP effects were either both zero or both non-zero for the two traits. In univariate trait WGP analyses, many factors, e.g., the number of animals, number of SNP markers, number of QTLs and heritability of the trait, could influence prediction accuracy (MEUWISSEN and GODDARD 2010).
According to Jia and Jannink (2012), factors such as the number of QTLs, the genetic correlation and the heritabilities of the two traits could influence prediction accuracy in bivariate trait WGP analyses. They pursued "change one factor at a time" techniques for their simulation study experimental design, which reduced the total number of simulated replicates to be generated and analyzed by the various competing WGP models compared to a full factorial design. Conversely, we used a response surface design to quickly pinpoint factors that might be particularly important for influencing differences in accuracy between CDBayesB and IWBayesA for WGP, leading us to focus on M12, the number of pleiotropic loci.

In previously developed bivariate trait WGP simulations, as well as in the distributional assumptions of various models, QTLs have generally been assumed to always be pleiotropic, often to the point that the genetic correlation between the two traits is uniform throughout the genome (CALUS and VEERKAMP 2011; JIA and JANNINK 2012). Under such scenarios, these investigators generally found greater advantages for bivariate trait compared to univariate trait WGP models. Conversely, we considered a situation whereby QTL may be either pleiotropic or non-pleiotropic in their effects on the two traits. We specifically defined three categories of QTLs, where M1 and M2 were the numbers of non-pleiotropic QTLs for each of the two respective traits, while M12 represented the number of pleiotropic QTLs. We further allowed, in both our simulations and in some of our models (e.g., IWBayesA, CDBayesA and CDBayesB), for the possibility that the strength of association (i.e., genetic correlation) might be rather heterogeneous across pleiotropic QTL, as association variables (φj) between the two traits for the M12 QTL were drawn from a normal distribution.

In a focused LD simulation study, the only difference between the two competing scenarios was M12 (M12 = 10 in Scenario 1, M12 = 30 in Scenario 2). We found that bivariate trait models improved the accuracy of WGP compared to univariate BayesB for the lower heritability trait in Scenario 2 but not in Scenario 1. That is, increasing M12 seemed to provide more power to detect pleiotropic effects in bivariate trait models. However, we did not detect any difference in WGP accuracy between CDBayesB and IWBayesA in Scenario 2, even though CDBayesB, unlike IWBayesA, implicitly distinguishes between pleiotropic versus non-pleiotropic QTL.

Although IWBayesA is a convenient choice for bivariate trait WGP modeling and closely mirrors the multivariate BayesA procedure previously developed by Calus and Veerkamp (2011), it is imperative that hyperparameters like the degrees of freedom and scale parameters are properly "tuned" or inferred upon rather than set to arbitrary values; otherwise, WGP accuracies can be badly compromised (JIA and JANNINK 2012). In this study, we inferred upon hyperparameters based on the specification of diffuse prior distributions. However, we also recognized in this study that mixing problems can arise in real data applications, such that it might be necessary to tune these hyperparameters somewhat based on univariate analyses. In our LD simulation study, we found that IWBayesA had much lower accuracy compared to other competing methods for the higher heritability trait. Conversely, in Jia and Jannink (2012), IWBayesA outperformed uBayesA in their default simulation scenario.
The reason for the different results might be how the QTL were generated differently in our two studies. IWBayesA assumes that every SNP has a pleiotropic effect, which partly agrees with the specifications in Jia and Jannink's simulation study. However, we also specified a substantial proportion of QTL in our study to be non-pleiotropic (i.e., trait-specific). This may have rendered IWBayesA rather inflexible relative to other competing models, including even univariate WGP models, for the higher heritability trait.

In our simulation study, bGBLUP performed poorly compared to IWBayesA and even to the univariate WGP models (uBayesA\uBayesB) for Trait 1 (the high heritability trait) under both Scenarios 1 and 2. bGBLUP assumes that the SNP effects for the two traits follow a light-tailed (multivariate normal) distribution, an assumption that is often violated when there are only a few QTL that underlie both traits. Our simulation study and the analysis of the pine data indicated that bGBLUP had lower WGP accuracy compared to IWBayesA, which effectively assumes that SNP effects follow a heavier-tailed multivariate t distribution. This is also consistent with conclusions drawn by Jia and Jannink (2012).

We found that CDBayesB had much higher predictive ability in cross-validation for RGV in the pine data analysis. Jia and Jannink (2012) studied the same data set in their comparisons. However, they did not find any significant difference in cross-validation performance between bivariate and univariate trait WGP models (except for the situation in which they assumed some missing values for one trait). With the ability to differentially infer upon both pleiotropic and non-pleiotropic effects, we believe that CDBayesB offers more flexibility compared to other competing models, including those previously tested by Jia and Jannink (2012) and Calus and Veerkamp (2011).

The CDBayesA model is a special case of CDBayesB with the probabilities of non-zero effects on both traits and of non-zero associations set to 1. Unlike with IWBayesA, it is necessary to specify an order for the two traits in both CD models. In the LD simulation, we analyzed the data under the two models using the same order of traits in which we simulated the data. However, it might not be obvious whether the specified order of traits is important in actual applications. We initially used the higher heritability trait (RBIN) as the first trait and the lower heritability trait (RGV) as the second trait for both CD models. After switching the order of these two traits, we found that predictive ability was unaffected for both CDBayesA and CDBayesB. One possible reason for this is that these two models might be far more flexible than IWBayesA for distinguishing between pleiotropic versus non-pleiotropic QTL.

5.5 Conclusions

Alternative Cholesky-based parameterizations (CDBayesA\CDBayesB) and an inverted Wishart specification on VCVs (IWBayesA) for bivariate trait WGP models were investigated for their advantage in prediction accuracy compared to bivariate ridge regression (bGBLUP) and univariate WGP models (uBayesA\uBayesB). With both non-pleiotropic and pleiotropic QTLs specified in the two scenarios of the LD simulation, the three bivariate trait WGP models had higher accuracy (~8%) than the two univariate trait models when the number of pleiotropic QTLs was relatively large. For the low heritability trait in the two scenarios, the three Bayesian bivariate trait WGP models outperformed bGBLUP (P<0.05).
However, we did not find any significant differences among the three Bayesian bivariate trait WGP models in either scenario. Jointly accounting for pleiotropic and non-pleiotropic SNP effects in CDBayesB is clearly more flexible than bivariate models (CDBayesA and IWBayesA) that assume all SNPs are pleiotropic. Owing to this flexibility, CDBayesB had higher predictive ability (~5%) compared to the other competing models, regardless of the order of the two traits, in the application to the pine data.

Chapter 6 Discussion, Conclusions and Future Work

This dissertation has focused on extending statistical models and developing computing strategies to better conduct whole genome prediction (WGP) for the selection of breeding stock for economically important traits based on high density single nucleotide polymorphism (SNP) marker panels. The primary intent of this work was to develop greater flexibility of WGP models in a number of potentially different ways. One key enhancement was to model potential spatial correlation between SNP effects due to the presence of QTL (Chapter 2). Another was to allow for potentially different modes of genetic action (i.e., pleiotropic versus trait-specific), whether for reaction norm models that account for a specific form of genotype by environment interaction (Chapter 4) or for bivariate trait analysis (Chapter 5). Additional hybrid models that combine the features of various WGP models in this dissertation (e.g., bivariate antedependence models) could be conceptually derived and tested in future work as well.

Some researchers might be rather critical of these efforts, recognizing that this dissertation has only added further to the "Bayesian alphabet" (Gianola, de los Campos et al. 2009; Gianola 2013), given proposed model labels such as "ante-BayesA" or "CDBayesB", for example. This criticism is certainly warranted if key hyperparameters are not properly tuned, since improper tuning would only distort comparisons between the models proposed in this dissertation and the more conventional models used in current WGP implementations. Hence, this work has been prepared with this issue keenly in mind, presenting fully Bayesian inferential strategies to infer upon these key hyperparameters wherever possible. In fact, an entire chapter (Chapter 3) addresses computational efficiencies of alternative strategies and the impact of hyperparameter misspecification in WGP models. However, we recognize that there is much more work that needs to be done on this front, particularly as the applications in this dissertation were smaller in scale, i.e., with respect to the number of SNP genotypes m and the number of phenotypes n, compared to many current applications. Even in those cases, some difficulties were encountered. For example, we resorted to inferring upon some key hyperparameters using conventional univariate models before properly tuning the key bivariate hyperparameters for some of the bivariate genomic analyses in Chapter 5.

Some of the models developed in this dissertation did not always perform better than more conventional specifications; indeed, this was contrary to our expectations based on the simulated genetic architectures. We realize that all of the simulation studies and applications considered in this dissertation are by no means exhaustive; nevertheless, we do, for example, note the following.
Antedependence specifications (ante-BayesA/ante-BayesB) seemed to show particular advantages when linkage disequilibrium was substantial (Chapter 2), whereas finely constructed bivariate genomic models such as CDBayesB, which differentially model pleiotropic from non-pleiotropic QTL, did particularly well when the genetic architecture was simple (Chapter 5); i.e., low numbers (mQTL) of QTL relative to the number of SNP markers (m). However, on nearly as many occasions, we did not detect meaningful differences in WGP accuracy between seemingly disparate model specifications. For example, results were often counterintuitive in our reaction norm model work (Chapter 4) in that sometimes IW-BayesA, a model that assumes complete pleiotropy throughout the genome, did better than CD-BayesB, which was constructed to loosen that requirement. Recent work from Wimmer et al. (2013) might be particularly enlightening in that regard; that is, they concluded that Bayesian variable selection models are not likely to confer substantial advantages over simpler GBLUP specifications when the heritability is low, the level of determinedness (n/m) is low, the model complexity (mQTL/n) is high and/or the LD is high. This may partially explain why some of the proposed variable selection methods in this dissertation (e.g., CD-BayesB) did not confer substantial advantages for some of the smaller scale examples considered. Nevertheless, these rules do not necessarily apply to comparisons of different distributional forms, e.g., Student t versus normal, with extensions to various bivariate forms, e.g., based on inverted Wishart specifications versus more flexible specifications based on the Cholesky decomposition, especially if model complexity is high. Furthermore, the level of determinedness is increasing so fast in some populations, e.g., Holsteins, that now n > m (LEGARRA and DUCROCQ 2012), such that it might become increasingly feasible to develop more comprehensive WGP models. This might be particularly true for GWAS types of analyses, where it has been noted in this dissertation that inferences on individual SNP effects may be sensitive to model specification. Furthermore, although m and pairwise LD will admittedly increase with sequencing technologies, thereby limiting the effectiveness of more elaborate WGP model specifications, it is also then more likely that future WGP models might be based on haplotypes of SNPs rather than single SNPs per se, thereby further complicating the issue of WGP model fit and choice relative to the work of, e.g., Wimmer et al. (2013). Hence, this continues to be a promising and exciting area of research.

There are certainly other strong limitations to the work in this dissertation that may further distort the comparisons between the various WGP models, particularly for analyses that involve real data. Firstly, we assumed genetic effects to be strictly additive, such that it is unpredictable what the effects of non-additive gene action (Gianola, Wu et al. 2010) might be on our comparisons. Certainly, there may be other nonlinearities that were not accounted for in our work as well. Plasmode-based simulations may represent a more effective way of reassessing the relative performance of WGP models (Vaughan, Divers et al. 2009). We have already mentioned the computational limitations of some of our proposed models, particularly when hyperparameters need to be estimated.
Animal breeders have been reticent, at best, to attempt to infer upon or properly tune these hyperparameters for good reason; it has been rather difficult to do so except, perhaps, based on some method of moments based determinations (de los Campos, Hickey et al. 2013). Although the toolkit in this dissertation was based on MCMC, it might be prudent to pursue other computationally feasible analytical approximations based on, for example, variational Bayes (Logsdon, Hoffman et al. 2010) or expectation-maximization like methods (KARKKAINEN and SILLANPAA 2012). A similar argument also applies to hyperparameter estimation. For example, in ridge regression or GBLUP like models, REML could be used to estimate the key “hyperparameters” like the common SNP variance component and the residual variance for example; certainly something similar could be done for Student t (e.g. Bayes A) implementations as well; e.g. (Pinheiro, Liu et al. 2001). This should be another fruitful area for future research. 132 APPENDICES 133 APPENDIX A: Chapter 2 A1 Markov Chain Monte Carlo Implementation Strategy for Ante-BayesA and Ante-BayesB In order to conduct MCMC, it is necessary to first specify the joint posterior density of all unknown parameters (SORENSEN and GIANOLA 2002). To do this, we interchangeably reparameterize the joint density of the data y and the random SNP effects, using g for ante-BayesA and δ for ante-BayesB in order to exploit algorithmic efficiencies that are unique to either model. For instance with ante-BayesA, we write p ( y,g | β, u, σ δ , t, s e2 ) = p ( y | β, u, g, s e2 ) p ( g | σ δ , t ) [A1] Note the component p ( y | β, u, g, s e2= ) N ( Xβ + Zg + Wu, Is e2 ) is based on Equation [1] −1 −1 whereas g ~ p ( g | σ δ , t ) = N ( 0, Σg ) with Σg = ( I − T ) Δ ( I − T ) ′ are defined by elements in σ δ = s δ21 s δ22 s δ23  s δ2m  specified along the diagonal of ∆, and by t = t2,1 , t3,2 ,..., tm ,m −1  ' specified just below the diagonal elements in T as previously indicated. For ante-BayesB, we reparameterize [A1] differently: p ( y,δ | β, u, σ δ , t, s e2 ) = p ( y | β, δ, t, s e2 ) p ( δ | σ δ ) [A2] recognizing that δ= (I − T)g such that [A2] represents a linear transformation of [A1]. That is, the first component of [A2] is based on ( p ( y | β, δ, t, s )= N Xβ + Z ( I − T ) δ + Wu, Is 2 e −1 ) whereas p (δ | σ ) = ∏ N ( 0,s ) . m 2 e δ j =1 2 δj We’ll subsequently represent [A1] and [A2] together as p ( y,g ( δ ) | β, u, σ δ , t, s e2 , mt , s t2 ) 134 to recognize the interchangeability between g and δ when conditioning on t. The joint posterior density of all unknown parameters can be written as products of specifications provided previously: p ( β, g, u,s e2 , σ δ , t, mt ,s t2 ,ν δ , sδ2 | y ) ∝  m  p ( y,g ( δ ) | β, u, σ δ , t,s e2 , mt ,s t2 ) p ( β )  ∏ p ( t j , j −1 | mt ,s t2 )   j =2  [A3]  m  2 2 2 2 2  ∏ p s δ j |ν δ , sδ , p δ  p (s u |ν u , su ) , p (s e |ν e , Se ) p (ν δ )  j =1  ( ) p ( sδ2 | α s , β s ) p ( mt | mt 0 , st20 ) p (s t2 | vt , st2 ) p (p δ | α pp ,β ) From the paper, p ( β ) = N ( β, Vβ ) , p ( t j , j −1 | mt ,s t2 ) = N ( mt ,s t2 ) , p (s u2 | ν u , su2 ) = χ −2 (ν u ,ν u su2 ) , ( ) p s e | ν e , se = χ 2 2 −2 (ν ,ν s ) , p ( s e 2 e e 2 δ ( ) ) ( ) | α s , β s = Gamma (α s , β s ) , p mt | mt 0 , st 0 = N mt 0 , st 0 , 2 2 p (s t2 | vt , st2 ) = χ −2 ( vt , vt st2 ) , and p (π δ | α π , βπ ) = Beta (α π , βπ ) . 
Furthermore, ( j ( j p s δ2 | ν δ , sδ2 , π δ ) is a mixture analogous to Equation [2] for ante-BayesB whereas ) ( p s δ2 | ν δ , sδ2 , π δ= 1= χ −2 ν δ ,ν δ sδ2 ) for ante-BayesA as described in the paper. For some parameters, we subsequently derive and present FCD separately for ante-BayesA ( π δ = 1) from ante-BayesB ( π δ < 1) as some MCMC sampling strategies appear to be simpler or more computationally efficient for one or the other model. Now MCMC requires random draws from the full conditional densities of each unknown parameter (or blocks thereof) conditional on all other parameters and the data (SORENSEN and GIANOLA 2002). These full conditional densities are provided below for various classes of these unknown parameters. 135 To sample all fixed and random effects in ante-BayesA, write θ = [β ' g' u '] ' as the (p+m+q) vector of fixed and random effects, Q = [ X Z  W ] as the n x (p+m+q) overall model incidence matrix with Σ − = diag ( Vβ−1 Σg −1 A −1s u−2 ) as a block diagonal matrix with the corresponding listed components as the various blocks. It can be readily demonstrated (SORENSEN and GIANOLA 2002) that the FCD of θ is ( ) θ | y,ELSE ~ N θˆ , C [A4] where ELSE denotes all other parameters in [A3] other than θ and C θˆ = CQ ' y+ β 0 ' Vβ−1 01x ( m + q )  ' for= (Q ' Q + Σ ) − −1 s e2 . Note that with a typical “flat” prior for β is defined by Vβ−1 = 0 such that θˆ = CQ ' y . Also, note that univariate or multivariate block FCD subsets of θ could also be partitioned and sampled using [A4] based on results from Wang and Gianola (1994) . The structure of Σg −1 = {Σg jj ' } contained within Σ − is a simple tri-diagonal matrix: using ZIMMERMAN and NÚÑEZjj s δ−j2 + t 2j +1, js δ−j2+1 for j = 1,2,….,m-1 with ANTÓN (2010), the diagonal elements are Σ= g Σg mm = s δ−m2 whereas the elements adjacent to the diagonal are Σg j , j +1 = Σg j +1, j = −t j +1, j s δ−j2+1 . To sample marker-specific variances in ante-BayesA: Consider now the FCD for s δ2 , j=1,2,…,m: j ( ) ( ) ( p s δ2j | y ,ELSE ∝ p g | t21 , t32 ,..., tm,m −1 , s δ21 , s δ22 , s δ23 ,..., s δ2m p s δ2j | ν δ , sδ2 136 ) [A5] We use Chan and Jeliazkov (2009, pg 461) to simplify the first component of [A5] as follows: ( p g | t21 , t32 ,..., tG ,G −1 , s δ21 , s δ22 , s δ23 ,..., s δ2G ∝ G −1 ∝Δ 1/2 −1 1/2 ) 1/2 1  1  exp  − g ' Σg −1g  = ( I − T ) ' Δ −1 ( I − T ) exp  − g ' ( I − T ) ' Δ −1 ( I − T ) g   2   2  ( ) G  1  exp  − δ ' Δ −1δ  ∝ ∏ s δ2j  2  j =1 −1/2  1 δ j2 exp  −  2 s δ2 j      [A6] Using the component in [A6] pertaining to s δ2j in [A5] and ( ) ν s2 ν  − δ δ − δ +1 2s δ2 j 2  2  p s δ2j | ν δ , sδ2 ∝ s δ j e then νδ  ν δ sδ2  2 ν s2  νδ  − δ δ 2   − + 1 −1/2 δ   2 1 j   s 2  2  e 2s δ2 j  exp  − p s δ2j | y ,ELSE ∝∝ s δ2j 2  2 sδ  νδ  δ j j   Γ   2  ν +1   1 (δ j2 + ν δ sδ2 )  − δ +1 2  2   exp  − ∝ sδ j  2  s δ2j   ( ) ( ) [A7] ( ) ( ) ( ) i.e. p s δ2j | y , ELSE= χ −2 ν δ + 1, δ j2 +ν δ sδ2 . As a sidenote, elements of δ can be recursively derived from g: 0  0  1  −t 0  21 1  0 δ= ( I − T ) g = 0 −t32   1    0 −tm ,m −1 0  137 0   g1   g1      0 g2 g 2 − t21 g1      g3 − t32 g 2  0   g3  =     0         1   g m   g m − tm ,m −1 g m −1  [A8] To sample fixed and random effects other than SNP effects in ante-BayesB, here we deem it computationally tractable to sample the rest of the location parameters separately from g. 
We again use Equation [A4] except that now we define θ = [β ' u '] ' as a (p+ q)x1 vector of fixed and random polygenic effects with Q = [ X W ] being the corresponding n x (p+ q) submodel incidence matrix and Σ − = diag ( 0 pxp A −1s u−2 ) being the corresponding block diagonal matrix. We then sample using Equation [A4] and C θˆ = CQ ' ( y-Zg ) + β 0 ' Vβ−1 01xq  ' for= (Q ' Q + Σ ) − −1 s e2 . To sample random SNP effects and variances in ante-BayesB, we consider the collapsed sampling strategy (Liu 1994) for jointly sampling s δ2j and δj as previously adapted for Bayes B in Meuwissen et al. (2001). Consider the previously described mixture prior on the conditional variances 0 with probability π δ   p s δ2j | vδ , sδ2 , π δ =  −2 2   χ ( vδ , vδ sδ ) with probability 1-π δ ( ) ( [A9] ) We jointly sample s δ2j and δj from p s δ2j , δ j | ELSE , y , by sampling first from ( ) p s δ2j | y , ELSE except δ j and then from p (δ j | ELSE , y ) . The first component of [A2] implies the following linear model: y = Xβ + Hδ + Wu + e [A10] 138 where= H Z ( I − T ) . Let’s further partition H into the jth column, hj, and other −1 remaining columns H − j ; similarly, we represent δ − j as all elements of δ other than δ j . Then we further rewrite [A10] as follows: y =Xβ + H − j δ − j + h jδ j + Wu + e [A11] It can be readily demonstrated, following similar developments for BayesB provided by Meuwissen et al. (2001), that: ( ) ∫ p (s , δ | y, ELSE ) ) p ( δ | s ) p (s | ν , s , π ) d δ p s δ2j | y , ELSE except δ j = ∝ ∫ p ( y | β, δ, u, t, s δ 2 e 2 δj 2 j δj δj j 2 δj δ 2 δ j j    1  1 * * 2  − − − − exp y h ' y h exp δ δ δ ∫  2s e2 ( j j j ) ( j j j )   2s δ2 j dδ j δj j   −1/2  1  ∝ p s δ2j | ν δ , sδ2 , π V j exp  − y *j ' V j−1y *j   2  ( ) ( ) ∝ p s δ2j | ν δ , sδ2 , π [A12] = V j h j h 'js δ2j + Is e2 . where y *j =y − Xβ − H − j δ j − Wu and Since [A12 ] is not a recognizable distributional form, a Metropolis Hastings step is required. We adapt the independence chains implementation (CHIB and GREENBERG ( ) 1995) as also adapted by Meuwissen et al. (2001) using the prior p s δ2j | ν δ , sδ2 , π δ as the , candidate density. That is, at MCMC cycle [k], one samples a candidate, say, s δ2[*] j ( ) 2 from p s δ2j | ν δ , sδ2 , π δ conditioned upon the updated values for ν δ , sδ and π δ . One 139 accepts s δ2j [ k ] = s δ2j* as the value for in cycle [k] with probability based on the Metropolis- ( ) Hastings acceptance ratio q s δ2j[k −1] → s δ2j* : ( q s δ j[k −1] → s δ j* 2 2 ) ) ( ( )   p s 2 | y , ELSE except δ p s 2 | ν , s 2 , π  δ j* j δ j[ k −1] δ δ min  ,1   2 2 2 =   p s δ j[k −1] | y , ELSE except δ j p s δ j* | ν δ , sδ , π  ;     1, otherwise ( ) ( ) [A13] If the proposal s δ2j* is rejected, then set s δ2j [ k ] = s δ2j [ k−1] ; i.e., the value of s δ2j in the previous MCMC cycle. It can be demonstrated that using Meuwissen et al. (2001) that [A13] is further equal to: ( q s δ2j[k −1] → s δ2j* )     1  * −1/2 exp  − y *' V j *−1y *  V j     2  min  ,1 1 − 1/2 − =  1   [A14] [t −1]  V [t −1]  exp  − y *' V j y *   j   2     1, otherwise  ( ) −1 Note that neither the determinant V j nor the inverse V j are trivial computations since m is typically large. 
Adapting a development from Rohan Fernando (personal communication) for BayesB, it can be readily shown that [A14] further simplifies: 140 ( q s δ2j[k −1] → s δ2j* = where v*j (h )  * 2     ' h y −1/2 ( 1 j j) *   ( v ) exp  −   j    2   v*j   ,1 min   2   =   h j ' y *j )   ( 1 [ t +1] −1/2   ( v j ) exp  − 2 v[t +1]      j       1, otherwise  ' h j ) s δ2j * + ( h j ' h j )= s e2 and v[jk −1] 2 j (h [A15] ' h j ) s δ2j [ k −1] + ( h j ' h j ) s e2 . Once 2 j s δ2 is sampled, one could immediately draw δ j from p (δ j | ELSE , y ) readily seen to be j  h 'j y *j  s e2  p (δ j | ELSE , y ) = N  ' ,  h j h j + s δ−2 h 'j h j + s δ−2  j j   [A16] ( ) in order to complete the joint collapsed sampler draw from p s δ2j , δ j | ELSE , y . One h j −1 t j , j −1h j + z j −1 , j = could demonstrate the following backward recursive relationship= m, m -2,….,2 with zj denoting column j of Z and hm= zm. Hence for computational ( tractability, one could use this relationship in sampling pairs from p s δ2j , δ j | ELSE , y starting with j = m and working recursively backwards to j=1. To sample proportion of SNP markers associated with zero-effects in AnteBayesB, the FCD of πδ is based on the following: m ( ) p ( π δ | y , ELSE ) ∝ ∏ p s δ2j | ν δ , sδ2 , π δ p ( π δ | αδ , β δ ) j =1 141 [A17] ) = I (s δ ∑ where p ( π δ | αδ , β δ ) = Beta (αδ , β δ ) = . Let m1 m 2 j =1 j ) 0 denote the number of zero-valued elements sampled in σ δ for a particular MCMC cycle where I(.) denotes the indicator function. Then it can be readily demonstrated that Equation [A17] is simply = p ( π δ | y, ELSE ) Beta (αδ + m1 , βδ + m − m1 ) . To sample antedependence parameters and their corresponding hyperparameters, consider now deriving the joint FCD of t = t2,1 , t3,2 ,..., tm ,m −1  ' : p ( t | y ,ELSE )  m  ∝ p g | t2,1 , t3,2 ,..., tm,m −1 , s δ21 , s δ22 , s δ23 ,..., s δ2G  ∏ p ( t j , j −1 | mt , s t2 )  [A18]  j =2  ( ) Borrowing developments, again from Chan and Jeliakov (2009, pg 462), the first component of [A18] can be rewritten as: ( p g | t21 , t32 ,..., tm ,m −1 , s δ21 , s δ22 , s δ23 ,..., s δ2m )  1  exp  − g ' ( I − T ) ' Δ −1 ( I − T ) g   2  2 2 2   1 ( g3 − t32 g 2 )   1 ( g 2 − t21 g1 )  g m − tm ,m −1 g m −1 )  ( 1    exp  −  exp  − ∝ xp  − 2 2 2  2      s 2 s 2 s δ2 δ3 δm        1  ∝ exp  − g ( −1) − Ψt ' Δ(−−11) g ( −1) − Ψt   2  ∝ ( I − T ) ' Δ −1 ( I − T ) ( 1/2 ) ( ) [A19] 142 saving only terms that are functions of t with Ψ =diag ( g1 , g 2, ..., g m −1 ) being a diagonal m-1 x m-1 matrix with the listed elements, g ( −1) =  g 2 g3  g m  ' , and ( ∆ ( −1) = diag s δ22 , s δ23 ..., s δ2m ) being a diagonal m-1 x m-1 matrix with the listed elements. Hence, [A18] can be rewritten as follows: ( ) p ( t | y,ELSE ) ∝ p g ( −1) | t, Δ( −1) p ( t | 1mt , Is t2 )  1    1  ∝  exp  − g ( −1) − Ψt ' Δ(−−11) g ( −1) − Ψt   exp  − 2 ( t − 1mt ) ' ( t − 1mt )   2    2s t    1  ∝  exp  − t − tˆ ' Σt−1 t − tˆ   [A20]  2   ( ) ( ) ( ( ) ) where ( Σˆ t = Ψ ' Δ(−−11) Ψ + Is t−2 ) −1 [A21] and ( tˆ = Ψ ' Δ(−−11) Ψ + Is t−2 ) ( Ψ ' Δ( −1 −1 −1) g ( −1) + 1s t−2 mt ) ( ) [A22] 2 Note that Ψ 'Δ(−−11) Ψ + Is t−2 is diagonal with elements g j s δ−j2+1 + s t−2 , j = 1,2,…,m-1, −1 −2 whereas element j of Ψ 'Δ g + 1s t mt is g j g j +1s δ−j2+1 + s t−2 mt , j = 1,2,…,m-1. 
In other ( words, the FCD of t j +1, j is t j +1, j | ELSE , y ~ N tˆj +1, j , sˆ t ( j +1, j2) tˆj +1, j = g j g j +1s δ−j2+1 + s t−2 mt (g ) j 2 sδ + s −2 j +1 −2 t [A23] 143 ) where and sˆ t ( j +1, j2) = (( g ) s 2 j −2 δ j +1 + s t−2 ) −1 [A24] Note further that tˆj +1, j can be written as a weighted average: g j g j +1s δ−j2+1 + s t−2 mt s δ−j2+1 ( g j ) g j +1 s t−2 ˆ = t j +1, j = + mt 2 2 2 s δ−j2+1 ( g j ) + s t−2 s δ−j2+1 ( g j ) + s t−2 g j s δ−j2+1 ( g j ) + s t−2 2 [A25] Now with g j = 0 , as one might anticipate occasionally with ante-BayesB with markers defined at the beginning of a linkage group, tˆj +1, j = mt and sˆ 2 t ( j +1, j ) = s t2 such that one 2 draws t j +1, j from its prior density based on updated values of mt and s t . For the much more common situation in ante-BayesB (assuming large π δ ) where gj ≠ 0 but s δ2j+1 = 0, the FCD of t j +1, j can be shown to be a normal with mean tˆj +1, j = sˆ 2 t ( j +1, j ) g j +1 gj and variance = s t2 . With p ( mt ) specified to be normal with prior mean mt0 and prior variance s2t0 then Gibbs sampling can be used for the corresponding parameters. p ( mt ,| y,ELSE ) = N ( m t , s t2 ) [A26] where 144 m −1 m t = s 2 t 1 mt 0 st20 m −1 t + 1 + 2 st20 st [A27] m for t = ∑t j , j −1 2 m −1 and  1 m −1  = s  2 + 2  st   st 0 −1 2 t [A28] 2 The FCD of s t given that the prior p (s t2 | ν t , st2 ) is scaled inverted chi-square 2 with known hyperparameters ν t and st can be derived as follows: p (s t2 ,| y, ELSE )  m  ∝  ∏ p ( t j , j −1 | mt , s t2 )  p (s t2 | ν t , st2 )  j =2  ν s2 ν  t t − t +1 − 2  − ( m −1)/2  1 m 2  exp  − 2 ∑ ( t j , j −1 − mt )   s t2  2  e 2s t ∝  ( 2πs t2 )    2s t j =2   [A29]  ν + m −1    1  m − t +1  2 ∝  (s t2 )  2  exp  − 2  ∑ ( t j , j −1 − mt ) + ν t st2     2s j =2   t       m   2 That is, p (s t2 ,| y , ELSE = ) χ −2  m +ν t , ∑ ( t j , j −1 − mt ) +ν t st2  . Note that we advocate j =2   2 the non-informative specificationsν t = −1 and st = 0 in the paper. 145 To sample the scale parameter for the random SNP effects, borrowing results 2 from Yi and Xu (2008), the FCD for sδ based on the specification of a conjugate prior p ( sδ2 | α s , β s ) = Gamma (α s , β s ) can be written as follows:  m  p ( sδ2 | y , ELSE ) ∝  ∏ I s δ2j > 0 p s δ2j | ν δ , sδ2  p ( sδ2 | α s , β s )  j =1  νδ   2 2   s ν   δ δ 2 sν αs   ν  − δ δ 2  β  m − δ +1 α s −1 2 2 2 s ( δj s)   2 2  2  =  ∏ I sδ j > 0 sδ j e sδ2 ) e − β s sδ  ( ν   j =1  Γ (α s ) Γ δ     2     m1ν δ +1  ν m  αs + −1 2 ∝ ( sδ2 ) exp  − sδ2  δ ∑ I s δ2j > 0 s δ−j2 + β s      2 j =1   ( ( ) ( ) ) ( [A30] ) i.e., a Gamma distribution with parameters α s + m1νδ +1 ν and δ 2 2 ∑ I (s δ m 2 j =1 j ) > 0 s δ−j2 + β s . To sample the degrees of freedom parameter for the random SNP effects, simple Metropolis updates could be used for sampling ν δ . For an arbitrary prior p (ν δ ) , the corresponding FCD is as follows:  m  p (ν δ | ELSE ) ∝  ∏ I s δ2j > 0 p s δ2j | ν δ , sδ2  p (ν δ )  j =1  νδ    ν δ sδ2  2  2  ν s   ν  − δ δ  m − δ +1 2  2s g2   2 2  2  s gj =  ∏ I sδ j > 0 e j  p (ν δ ) ν   j =1  Γ δ    2       ( ( ) ( ) ) 146 [A31] Details on how to ν δ can be based on a random walk Metropolis Hastings step; we have provided details on this in other non-genomic applications involving the sampling of degrees of freedom parameters (Kizilkaya and Tempelman 2005; Bello, Steibel et al. 2010). 
To sample the residual variance, given a specified scaled inverted chi-square prior 2 p (s e2 ,| α e , se2 ) = χ −2 (α e , α e se2 ) , the corresponding FCD of s e can be written as follows: p (s e2 | y , ELSE ) ( ∝ ( 2πs ∝s ) 2 − n /2 e ν + n  − e +1 2  2  e ) ν ν s2   1  2 − 2e +1 − 2es ee2 e exp  − 2 ( y-Xβ − Zg − Wu ) ' ( y-Xβ − Zg − Wu )  s e  2s e   1  exp  − 2 ( ( y-Xβ − Zg − Wu ) ' ( y-Xβ − Zg − Wu ) +ν e se2 )   2s e  [A32] In other words, [A32] is χ −2 (ν e + n, ( y-Xβ − Zg − Wu ) ' ( y-Xβ − Zg − Wu ) +ν e se2 ) . Note 2 that we advocate the non-informative specificationsν e = −1 and se = 0 in the paper. To sample the polygenic variance, given a conjugate scaled inverted-chi square 2 prior p (s u2 ,| α u , su2 ) = χ −2 (α u , α u su2 ) , the FCD of s e is classically given as follows: p (s | y,ELSE ) ∝ (s 2 u ν u + q  2 - 2 +1 u )  1 ( u′A -1u +ν u su2 )   exp   2  s u2   147 [A33] In other words, [A33] is χ −2 (ν u + q, u′A -1u +ν u su2 ) . Note that we advocate the non2 informative specifications ν u = −1 and su = 0 in the paper. 148 A2 Supplementary Figures and Tables Figure A2.1: Average posterior means of mt and empirical standard errors across 20 replicates for each of six different LD levels using ante-BayesA and ante-BayesB. No significant differences (P>0.01) were determined between the competing procedures with each other or from zero at each LD level. 149 Figure A2.2: Average posterior means of s t and empirical standard errors across 20 replicates for each of six different LD levels using ante-BayesA and ante-BayesB. No significant differences (P>.01) were determined between the two sets of competing procedures at each LD level. 2 150 Figure A2.3: Box-plot of proportions of the absolute posterior means of elements of {t } m j , j −1 j = 2 divided by their respective posterior standard deviations that exceeded 2 across all 20 replicates for each of six different levels of LD using ante-BayesA (A) and anteBayesB (B). 151 Figure A2.4: Average posterior probabilities of association for the top QTL within each of 20 replicates using BayesB and ante-BayesB for each of six different LD levels. LDspecific differences between the two methods declared significant by *(P<0.01), **( P <0.001), or ***(P<0.0001). 152 153 Figure A2.5: Bar plots of posterior probabilities of association of either or both of two bracketing SNP to each of the six largest QTL effects within each of the first four replicates (A,B,C,D) at the highest (r2=0.31), medium (r2=0.24) and lowest (r2=0.15) average LD levels. Posterior probabilities using BayesB and ante-BayesB are represented by green and black bars, respectively, whereas gray bars represent the proportion of the genetic variance accounted for by the corresponding QTL. QTL location is labeled on x-axis for each replicate. 154 Figure A2.6: Boxplots of estimated slopes for within-replicate regressions of true breeding values on estimated breeding values across 9 replicates for four traits in Generations 6, 8 and 10 from benchmark data of Hickey and Gorjanc (2011) using anteBayesB (black), BayesB (dark gray), anteBayesA (light gray) and BayesA (white). Differences from unity indicated as significant by *(0.05 0 p s g2 j | ν , s 2  p (ν )  j =1  (    m =  ∏ I s g2 j > 0  j =1   ( ) ( ) ν )  ν s2  2 ν s2 ν   2  − +1 − 2s 2   s 2  2 e g j gj ν  Γ  2    1  2  (1 + v )   As this FCD is not recognizable, we could use a random walk normal MH step on ξ = log(ν ) . 
Note that the Jacobian from ν to ξ is exp(ξ ) . The corresponding FCD for ξ is as follows: p (ξ | ELSE )  ( exp(ξ ) s 2 / 2 )exp (ξ )/2   ∝  Γ ( exp(ξ ) / 2 )    Where = m1 ∑ I (s m j =1 2 gj m1 exp ( ξ ) s  m  exp ( ξ )  − − +1 2s g2 j 2 2  2   s 0 s I > e gj gj ∏ j =1  ( ) ) > 0 , hence 164 2  1  exp(ξ )  (1 + exp(ξ ) ) 2  log p (ξ | ELSE )  exp(ξ )  exp(ξ )   ξ + log( s 2 ) − log(2) ) − log Γ  = m1  (  +  2   2 ∑ I (s m j =1 2 gj   exp(ξ )  exp(ξ ) s 2  2 > 0 − + 1 log(s g j ) −  − 2 log(1 + exp(ξ )) + ξ 2   2  s 2  g j   ) Suppose the value of ξ in the current cycle i is ξ [i ] , we could propose a random walk value for ξ [i +1] in the next cycle from a Gaussian distribution:  − (ξ * −ξ [i ] )2  1  p (ξ *) = exp  2cv2   2π cv   That is equivalent to generate a random variable, say δ from N(0, cv2 ) and add it to ξ [i ] * to propose ξ= ξ [i ] + δ . To determine the odds ratio α = p (ξ * | ELSE ) p (ξ [i ] | ELSE ) , we evaluated this ratio as: α ( exp log p (ξ * | ELSE ) − log p (ξ [i ] | ELSE ) ) To implement this Metropolis sampling strategy, we first generated u from a Uniform(0,1) distribution. Then, 1) If α > 1 , accept ξ [i +1] = ξ * ; 2) If α > u , accept ξ [i +1] = ξ * ; 3) If α < u , then set ξ [i +1] = ξ [i ] . The following tuning procedure is to determine cv2 : 1) For the last 10 cycles, the rate of acceptance is greater than 80%, increase cv2 by a factor of 1.2. 165 2) For the last 10 cycles, the rate of acceptance is less than 20%, decrease cv2 by a factor of 0.7. 3) After the burn-in, keep cv2 constant and monitor subsequent acceptance rates to ensure that they fall within 25 to 75%. To sample the scale parameter for the random SNP effects, borrowing results from Yi and Xu (2008), the FCD for s 2 based on the specification of a conjugate prior p ( s 2 | α s , β s ) = Gamma (α s , β s ) can be written as follows:  m  p ( s 2 | ELSE ) ∝  ∏ I s g2 j > 0 p s g2 j | ν , s 2  p ( s 2 | α s , β s )  j =1  (    m =  ∏ I s g2 j > 0  j =1   ( ∝ ( s2 ) αs + m1ν −1 2 ) ( ) ν )  s 2ν  2 s 2ν ν  − 2  2  − +1   s 2  2  e 2s g j gj ν  Γ  2    ( β s )α s 2 α s −1 − β s s2 (s ) e   Γ (α s )    ν m  exp  − s 2  ∑ I s g2 j > 0 s g−2j + β s      2 j =1   ( ) mν i.e., a Gamma distribution with parameters α s + 21 and ν ∑ I (s 2 m j =1 2 gj ) > 0 s g−2j + β s . B1.2 Sampling strategy for UNIMH To sample the degrees of freedom parameter for the random SNP effects, the FCD for sampling ν is as follows: 166  m  p (ν | ELSE ) ∝  ∏ I ( g j ≠ 0 ) p ( g j | ν g , sg2 )  p (ν )  j =1    v +1 v +1  Γ 2 − 2   1 1/2   m  g 1 2   1 + j2   = ∏ I ( g j ≠ 0 )    2   v  π vs   vs   (1 + v ) 2  j =1 Γ     2 As this FCD is not recognizable, we could use a random walk normal MH step on ξ = log(ν ) . Note that the Jacobian from ν to ξ is exp(ξ ) . 
The corresponding FCD for ξ is as follows: p (ξ | ELSE )  Γ ((exp(ξ ) +1) / 2 )    1 ∝    2  Γ ( exp(ξ ) / 2 )   π exp(ξ ) s  m1 1 (1 + exp(ξ ) ) Where = m1 2 m1 /2 exp ( ξ ) +1 −  m  2 2   g j   I ( g j ≠ 0 ) 1 + 2 ∏   exp(ξ ) s   j =1    exp(ξ ) ∑I (g m j =1 j ≠ 0 ) , hence log p (ξ | ELSE )    1  exp(ξ ) +1   exp(ξ )  1 = m1  log Γ  +  − log Γ   + log  2  2 2 2 π exp ( ξ ) s           exp(ξ ) +1   g 2j I ( g j ≠ 0)  −  ) − 2 log(1 + exp (ξ )) + ξ ∑  log(1 + 2  2 exp(ξ ) s   j =1   m Suppose the value of ξ in the current cycle i is ξ [i ] , we could propose a random walk value for ξ [i +1] in the next cycle from a Gaussian distribution: 167  − (ξ * −ξ [i ] )2  1  p (ξ *) = exp  2 c 2   2π cv v   That is equivalent to generate a random variable, say δ from N(0, cv2 ) and add it to ξ [i ] * to propose ξ= ξ [i ] + δ . To determine the odds ratio α = p (ξ * | ELSE ) p (ξ [i ] | ELSE ) , we evaluated this ratio as: α ( exp log p (ξ * | ELSE ) − log p (ξ [i ] | ELSE ) ) To implement this Metropolis sampling strategy, we first generated u from a Uniform(0,1) distribution. Then, 1) If α > 1 , accept ξ [i +1] = ξ * ; 2) If α > u , accept ξ [i +1] = ξ * ; 3) If α < u , then set ξ [i +1] = ξ [i ] . The following tuning procedure is to determine cv2 : 1) For the last 10 cycles, the rate of acceptance is greater than 70%, increase cv2 by a factor of 1.2. 2) For the last 10 cycles, the rate of acceptance is less than 20%, decrease cv2 by a factor of 0.7. 3) After the burn-in, keep cv2 constant and monitor subsequent acceptance rates to ensure that they fall within 25 to 75%. 168 To sample the scale parameter for the random SNP effects, the FCD for sampling s 2 is as follows:  m  p ( s 2 | ELSE ) ∝  ∏ I ( g j ≠ 0 ) p ( g j | ν , s 2 )  p ( s 2 | α s , β s )  j =1    v +1 v +1  Γ α 2 − 2   1 1/2   m gj  β s ) s 2 αs −1 − β s s2 ( 2    =  ∏ I ( g j ≠ 0)  (s ) e    1 + Γ (α s )  v   π vs 2   vs 2   j =1  Γ    2   Even if this FCD is recognizable, we could use a random walk normal MH step on 2 ψ = log( s 2 ) . Note that the Jacobian from s to ψ is exp(ψ ) . The corresponding FCD for ψ is as follows: p (ψ | ELSE )   v +1 v +1  Γ α 2 − 2   1 1/2   m gj  β s ) s 2 αs −1 − β s s2 ( 2     ∏ I ( g j ≠ 0)  (s ) e    1 + Γ (α s )  v   π vs 2   vs 2   j =1  Γ    2    Γ ( (ν + 1) / 2 )    1 ∝     Γ ( v / 2 )   π vexp(ψ )  m1 ( exp(ψ ) ) α s −1 Where = m1 m1 /2 v +1 −  m  2 2   g j   I ( g j ≠ 0 ) 1 +  ∏   vexp(ψ )   j =1    e − β s exp (ψ ) exp(ψ ) ∑I (g m j =1 j ≠ 0 ) , hence 169 log p (ψ | ELSE )   1  v +1 v 1 = m1  log Γ  ) +  − log Γ   + log( π vexp(ψ )   2  2 2   ν + 1  g 2j I g 0 log(1 ) ≠ − + + (α s − 1)ψ − β s exp(ψ ) + ψ ( )   ∑ j   vexp(ψ )  j =1   2  m Suppose the value of ψ in the current cycle i is ψ [i ] , we could propose a random walk value for ψ [i +1] in the next cycle from a Gaussian distribution:  − (ψ * −ψ [i ] )2  1  p (ψ *) = exp  2 2cs   2π cs   That is equivalent to generate a random variable, say δ from N (0, cs2 ) and add it to ψ [i ] * to propose ψ = ψ [i ] + δ . To determine the odds ratio α = p (ψ * | ELSE ) p (ψ [i ] | ELSE ) , we evaluated this ratio as: α ( exp log p (ψ * | ELSE ) − log p (ψ [i ] | ELSE ) ) To implement this Metropolis sampling strategy, we first generated u from a Uniform(0,1) distribution. 
Then, 1) If α > 1 , accept ψ [i +1] = ψ * ; 2) If α > u , accept ψ [i +1] = ψ * ; 3) If α < u , then set ψ [i +1] = ψ [i ] . The following tuning procedure is to determine cs2 : 1) For the last 10 cycles, the rate of acceptance is greater than 70%, increase cs2 by a factor of 1.2. 170 2) For the last 10 cycles, the rate of acceptance is less than 20%, decrease cs2 by a factor of 0.7. 3) After the burn-in, keep cs2 constant and monitor subsequent acceptance rates to ensure that they fall within 25 to 75%. B1.3 Sampling strategy for BIVMH To sample the degrees of freedom and scale parameters for the random SNP effects, we divided burn-in into four stages with equal length as follows: 2 In stage 1, we sample log(ν ) and log( s ) using UNIMH (see sampling strategy 2) with fine-tuning procedure on cv2 and cs2 , which are also the variances for the two separate Gaussian proposal densities; 2 In stage 2, we sample log(ν ) and log( s ) using UNIMH with fixing cv2 and cs2 to the values tuned from the last cycle in stage 1 and compute correlation r between samples of log(ν ) and log( s 2 ) within stage 2; 2 In stage 3, we jointly sample log(ν ) and log( s ) using a bivariate Gaussian proposal density with variances cv2 and cs2 based on those tuned at the end of Stage 1 and a covariance based on the correlation computed from Stage 2. Joint density for ν and s 2 is as follows: 171  m  p ( v, s 2 | ELSE ) ∝  ∏ I ( g j ≠ 0 ) p ( g j | ν , s 2 )  p (ν ) p ( s 2 | α s , β s )  j =1    v +1 v +1  Γ αs 2 − 2   1 1/2   m  g 1 ( βs ) 2   j 2 α s −1 − β s s 2  =  ∏ I ( g j ≠ 0) 1 +  s ( ) e     2 Γ (α s )  v   π vs 2   vs 2  1 v + ( )  j =1  Γ    2   As this density is not recognizable, we could use a random walk normal MH step on ξ = log(ν ) and ψ = log( s 2 ) . Note that the Jacobian from ν to ξ is exp(ξ ) . Note that the Jacobian from s 2 to ψ is exp(ψ ) . The corresponding joint density for ξ and ψ is as follows: p (ξ ,ψ | ELSE )  Γ ((exp(ξ ) +1) / 2 )    1      Γ ( exp(ξ ) / 2 )   π exp(ξ )exp(ψ )  m1 1 (1 + exp(ξ ) ) Where = m1 ( exp(ψ ) ) α s −1 2 ∑I (g m j =1 j m1 /2 exp ( ξ ) +1 −  m  2 2   g j   I ( g j ≠ 0 ) 1 +  ∏  ( ) ( ) exp exp ξ ψ    j =1    e − β s exp (ψ ) exp(ψ )exp(ξ ) ≠ 0 ) , hence log p (ξ ,ψ | ELSE )   1  exp(ξ ) +1   exp(ξ )  1  = m1  log Γ  log − Γ +      + 2    2  2  π exp(ξ )exp(ψ )      exp(ξ ) +1   g 2j 0 log(1 ) − I g ≠ − + ( j )   2  ∑ exp(ξ )exp(ψ )  j =1  2 log(1 + exp(ξ )) + (α s − 1)ψ − β s exp(ψ ) + ψ + ξ m 172 Suppose the value of η =[ξ ,ψ ]′ in the current cycle i is η[i ] , we could propose a random walk value for η[i +1] in the next cycle from a bivariate Gaussian distribution: = p ( η *) 1 2π cη2Σ 1/2 −1   exp  − ( η * −η[i ] )′ ( cη2Σ ) ( η * −η[i ] )    2 That is equivalent to generate a random variable, say δ from N(0, cηΣ ) and add it to η[i ]  c2 v to propose η = η + δ , where Σ =  r cv2cs2 * [i ] r cv2cs2  2  , cv and cs2 were fixed value cs2  2 from last cycle in stage 1, correlation r between samples of log(ν ) and log( s ) computed from stage 2. To determine the odds ratio α = α p ( η* | ELSE ) p ( η[i ] | ELSE ) ( , we evaluated this ratio as: exp log p ( η* | ELSE ) − log p ( η[i ] | ELSE ) ) To implement this Metropolis sampling strategy, we first generated u from a Uniform(0,1) distribution. Then, 1) If α > 1 , accept η[i +1] = η * ; 2) If α > u , accept η[i +1] = η * ; 3) If α < u , then set η[i +1] = η[i ] . 
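A minimal sketch of this joint update is given below (Python/NumPy; the function log_joint, standing for the log of the joint density of ξ and ψ given ELSE with Jacobians included, is assumed to be available, and all names are illustrative):

```python
import numpy as np

def bivmh_update(eta, log_joint, c_eta2, cv2, cs2, r, rng):
    """One joint random-walk Metropolis update of eta = (log nu, log s^2) using the
    bivariate Gaussian proposal described for BIVMH: cv2 and cs2 are the univariate
    proposal variances carried over from stage 1, r is the correlation between the
    two chains estimated in stage 2, and c_eta2 is the common scaling factor."""
    cov = c_eta2 * np.array([[cv2, r * np.sqrt(cv2 * cs2)],
                             [r * np.sqrt(cv2 * cs2), cs2]])
    eta_prop = rng.multivariate_normal(eta, cov)
    if np.log(rng.uniform()) < log_joint(eta_prop) - log_joint(eta):
        return eta_prop, True                 # joint candidate accepted
    return np.asarray(eta), False             # joint candidate rejected
```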
The following tuning procedure is to 2 determine cη : 2 1) For the last 10 cycles, the rate of acceptance is greater than 60%, increase cη by a factor of 1.2. 173 2 2) For the last 10 cycles, the rate of acceptance is less than 10%, decrease cη by a factor of 0.7. 2 3) After the burn-in, keep cη constant and monitor subsequent acceptance rates to ensure that they fall within 25 to 75%. 2 In stage 4, we jointly sample log(ν ) and log( s ) using a bivariate Gaussian proposal 2 density with fixing value of cη at the end of stage 3. After burn-in, we started to save all samples on ν and s 2 using MH with the bivariate Gaussian proposal density. 174 B2 Supplementary tables and figures Table B2.1: Average posterior means (PMEAN), posterior standard deviations (PSD), posterior medians (PMED), and effective sample size (ESS) for residual variance ( s e2 ), cage variance ( s c2 ), polygenic variance ( s u2 ) and key hyperparameters (ν , s 2 , and π ) based on BayesA and BayesB analyses of subsets with 950 SNPs from the heterogeneous stock mice dataset. DFMH Parameter PMEAN PSD BayesA s e2 0.16 0.09 2 sc 2.02 0.21 2 su 4.02 0.36 18.60 47.26 ν s2 2e-3 6e-4 BayesB s e2 0.16 0.06 s c2 1.94 0.21 s u2 3.92 0.32 29.16 65.25 ν 2 s 0.02 5e-3 0.83 0.10 π ESS UNIMH PMEAN PSD ESS BIVMH PMEAN PSD ESS 2964 2515 2295 291 279 0.16 2.02 3.98 22.28 2e-3 0.08 3218 0.21 2797 0.36 2602 61.07 856 6e-4 498 0.16 2.02 3.98 24.49 2e-3 0.09 0.21 0.36 64.30 6e-4 3240 2757 2690 1092 504 4918 4301 4113 395 314 391 0.16 1.91 3.91 30.61 0.02 0.83 0.06 0.21 0.30 72.58 5e-3 0.10 0.16 1.91 3.91 35.80 0.02 0.83 0.06 0.21 0.30 81.36 5e-3 0.10 6283 5466 5274 2957 813 639 175 6171 5441 5280 2346 782 619 Table B2.2: Average posterior means (PMEAN), posterior standard deviations (PSD), posterior medians (PMED), and effective sample size (ESS) for residual variance ( s e2 ), cage variance ( s c2 ), polygenic variance ( s u2 ) and key hyperparameters (ν , s 2 , and π ) based on BayesA and BayesB analyses of subsets with 1900 SNPs from the heterogeneous stock mice dataset. DFMH Parameters PMEAN PSD UNIMH ESS BIVMH PMEAN PSD ESS PMEAN PSD ESS BayesA s e2 s 2 c s u2 ν s2 0.16 2.03 3.80 14.82 8e-4 0.08 1648 0.21 1515 0.39 1314 36.03 121 3e-4 109 0.16 2.00 3.80 19.96 8e-4 0.07 0.20 0.38 55.46 3e-4 2228 2016 1983 1075 303 0.16 2.00 3.80 22.75 8e-4 0.07 0.21 0.39 61.18 3e-4 3195 2786 2238 1425 315 0.16 1.98 3.68 23.57 2e-3 0.85 0.07 4228 0.19 4037 0.35 3971 52.94 215 1e-3 194 0.09 208 0.16 1.98 3.68 30.50 2e-3 0.85 0.07 0.19 0.36 72.51 1e-3 0.08 5210 4468 4300 2474 475 424 0.16 1.98 3.68 33.75 2e-3 0.85 0.07 0.19 0.34 81.49 1e-3 0.09 6185 5891 5284 2948 542 594 BayesB s e2 s 2 c s u2 ν s2 π 176 Table B2.3: Average posterior means (PMEAN), posterior standard deviations (PSD), posterior medians (PMED), and effective sample size (ESS) for residual variance ( s e2 ), cage variance ( s c2 ), polygenic variance ( s u2 ) and key hyperparameters (ν , s 2 , and π ) based on BayesA and BayesB analyses of subsets with 3800 SNPs from the heterogeneous stock mice dataset. 
DFMH PSD ESS Parameters PMEAN BayesA s e2 0.16 0.07 s c2 2.01 0.20 s u2 3.43 0.33 3.14 0.88 ν 2 s 5e-4 2e-4 BayesB s e2 0.16 0.07 s c2 1.96 0.19 2 su 3.27 0.32 6.95 27.24 ν 2 s 1e-3 9e-4 0.88 0.10 π UNIMH PMEAN PSD ESS BIVMH PMEAN PSD ESS 1030 1011 827 111 103 0.16 2.01 3.43 3.14 5e-4 0.07 0.20 0.34 2.05 2e-4 1792 1565 1529 1280 407 0.16 2.01 3.43 3.24 5e-4 0.07 0.20 0.34 1.50 2e-4 2962 2537 1968 1339 456 3579 3134 2896 198 163 194 0.16 1.96 3.27 6.71 1e-3 0.88 0.07 0.19 0.33 20.79 9e-4 0.09 4198 3986 3761 2513 419 405 0.16 1.96 3.27 9.05 1e-3 0.88 0.07 0.19 0.31 30.10 9e-4 0.09 5230 4273 4158 3127 489 512 177 2 Figure B2.1: Average posterior means of s (BayesA, BayesB) using DFMH, UNIMH and BIVMH across 15 replicates at LD level of 0.17, 0.24 and 0.32 comparing DFMH, UNIMH and BIVMH using BayesA model in (A) and using BayesB model in (B). 178 Figure B2.2: Average posterior means of π using BayesB model across 15 replicates as a function of LD levels 0.17, 0.24 and 0.32 comparing DFMH, UNIMH and BIVMH. 179 Figure B2.3: Average posterior means of v (BayesA, BayesB) across 15 replicates for three different levels of LD comparing DFMH, UNIMH and BIVMH in BayesA model (A) and in BayesB model (B). 180 2 Figure B2.4: Average posterior means of s in BayesA model using DFMH, UNIMH and BIVMH across 15 replicates for three LD levels of 0.17 (bottom), 0.24(middle) and 0.32 (top) comparing DFMH, UNIMH and BIVMH. 2 Figure B2.5: Average posterior means of s in BayesB model using DFMH, UNIMH and BIVMH across 15 replicates for three LD levels of 0.17 (bottom), 0.24(middle) and 0.32 (top) comparing DFMH, UNIMH and BIVMH. 181 Figure B2.6: Average posterior means of π using BayesB model across 15 replicates at three LD levels of 0.17 (bottom), 0.24(middle) and 0.32 (top) comparing DFMH, UNIMH and BIVMH. Figure B2.7: Average posterior means of v across 15 replicates for three LD levels of 0.17 (bottom), 0.24(middle) and 0.32 (top) comparing DFMH, UNIMH and BIVMH in BayesA model. 182 Figure B2.8: Average posterior means of v across 15 replicates for three LD levels of 0.17 (bottom), 0.24(middle) and 0.32 (top) comparing DFMH, UNIMH and BIVMH in BayesB model. 183 APPENDIX C: Chapter 4 C1 Markov Chain Monte Carlo (MCMC) Implementation Strategy for all methods C1.1 MCMC Implementation Strategy for IW-BayesB/IW-BayesA /IW-BayesC For the RR/RN WGP model, we know that y is the n x 1 vector of phenotypes for animals, β is the q x 1 vector of fixed effects, g1 represents the m x 1 vector of SNP specific random intercept effects, g2 represents the m x 1 vector of SNP specific random slope effects, X is the n x q design matrix for fixed effect, Z is a m x m genotype matrix, D is the n x n diagonal matrix with the environmental covariates on the diagonal. y=Xβ + Zg1 + DZg 2 + e To sample location parameters β, g1, g2 computationally efficient, we adopted a GaussSeidel updating algorithm in MCMC implementation strategy (LEGARRA and MISZTAL 2008). To sample the random effects more efficiently in order to facilitate MCMC mixing, we block sampled random intercept and slope effects one SNP at a time, such that we ′ ′ define a vector g =  g11 , g 21 ,..., g1 j , g 2 j ,..., g1m , g 2 m  where g j =  g1 j , g 2 j  are random intercept and slope effects for SNP j. 
Based on the priors described for the three models in the Materials and Methods, the joint posterior density of all unknown parameters can be written as

p(β, g, G, νg, Σg, π, σe² | y) ∝ p(y | β, g, σe²) p(β) [∏j p(gj | Gj)] [∏j p(Gj | νg, Σg, π)] p(Σg | ν0, Σ0) p(νg) p(π) p(σe²),

where Gj is the 2 × 2 genetic variance-covariance matrix for the random intercept and slope effects of SNP j and Σg is the corresponding 2 × 2 scale matrix. In IW-BayesB (π < 1), Gj has a mixture prior with point mass at a 2 × 2 matrix of zeros with probability 1 − π and an inverted Wishart distribution with degrees of freedom νg and scale matrix Σg with probability π. In IW-BayesA (π = 1), Gj has an inverted Wishart prior with degrees of freedom νg and scale matrix Σg. In IW-BayesC (π = 1 and νg = ∞), all SNPs share a common genetic variance-covariance matrix Σg.

To sample fixed effects in IW-BayesA/IW-BayesB/IW-BayesC, the FCD for the kth element of β is

βk | y, β−k, else ~ N(β̂k, vk), with β̂k = x'.k(e + x.kβk) / (x'.kx.k) and vk = σe²(x'.kx.k)⁻¹,

where e = y − Xβ − Zg1 − DZg2 is the current residual vector. Immediately after sampling βk, we update the residual by e = e − x.k(βk[t+1] − βk[t]), where βk[t+1] is the value sampled at cycle [t+1] and βk[t] is the value from cycle [t].

To sample random intercept and slope effects in IW-BayesA, the FCD for the jth block of g is

gj | y, β, g−j, Gj, Σg, σe² ~ N(ĝj, Vgj),

where, writing Wj = [z.j  Dz.j],

ĝj = (W'jWj + Gj⁻¹σe²)⁻¹ W'j(e + Wjgj) and Vgj = (W'jWj σe⁻² + Gj⁻¹)⁻¹.

After sampling gj, we update the residual by e = e − z.j(g1j[t+1] − g1j[t]) − Dz.j(g2j[t+1] − g2j[t]).

To sample the genetic variance-covariance matrices of the random effects in IW-BayesA, given the specified inverted Wishart prior p(Gj | νg, Σg) ∝ IW(νg, Σg), the FCD for Gj is

p(Gj | y, else) ∝ p(gj | Gj) p(Gj | νg, Σg) ∝ |Gj|^(−((νg+1)+3)/2) exp(−½ trace(Gj⁻¹[gjg'j + Σg(νg − 3)])).

Hence Gj | y, else ~ IW(νg + 1, gjg'j + Σg(νg − 3)).

To sample the random effects and genetic variance-covariance matrices in IW-BayesB, we followed the collapsed sampling strategy of Liu (1994) and jointly sampled gj and Gj as adapted from BayesB (Meuwissen et al. 2001).
We first sample from

p(Gj | y, ELSE except gj) ∝ p(Gj | νg, Σg, π) |Vj|^(−1/2) exp(−½ y*'j Vj⁻¹ y*j),

where y*j = y − Xβ − Z−jg1,−j − DZ−jg2,−j = [z.j  Dz.j]gj + e and Vj = var(y*j) = [z.j  Dz.j] Gj [z.j  Dz.j]' + Iσe².

Since this FCD is not of recognizable form, we adopt a Metropolis-Hastings (MH) algorithm using the mixture prior p(Gj | νg, Σg, π) as the candidate density. At MCMC cycle [t], we sample a candidate G*j from the candidate density conditional on the updated values of the hyperparameters and accept Gj[t] = G*j with probability given by the MH acceptance ratio α(Gj[t−1], G*j), where Gj[t−1] is the value at cycle [t−1]. Adapting Meuwissen et al. (2001),

α(Gj[t−1], G*j) = min{ [p(G*j | ELSE except gj) p(Gj[t−1] | νg, Σg, π)] / [p(Gj[t−1] | ELSE except gj) p(G*j | νg, Σg, π)], 1 }
= min{ [|V*j|^(−1/2) exp(−½ y*'j (V*j)⁻¹ y*j)] / [|Vj[t−1]|^(−1/2) exp(−½ y*'j (Vj[t−1])⁻¹ y*j)], 1 }.

If the candidate G*j is rejected, we set Gj[t] = Gj[t−1]. It can be demonstrated that this MH ratio can be computed from lower-dimensional quantities as

α(Gj[t−1], G*j) = min{ [|R*j|^(−1/2) exp(−½ r'j (R*j)⁻¹ rj)] / [|Rj[t−1]|^(−1/2) exp(−½ r'j (Rj[t−1])⁻¹ rj)], 1 },

where

rj = [z.j  Dz.j]' y*j = [z.j  Dz.j]'[z.j  Dz.j]gj + [z.j  Dz.j]'e

and

Rj = var(rj) = [z.j  Dz.j]'[z.j  Dz.j] Gj [z.j  Dz.j]'[z.j  Dz.j] + [z.j  Dz.j]'[z.j  Dz.j]σe².

If a non-zero matrix Gj is sampled, one then draws gj from the same full conditional as in IW-BayesA. If either gj[t+1] or gj[t] is non-zero, the residual is updated immediately after sampling gj by e = e − z.j(g1j[t+1] − g1j[t]) − Dz.j(g2j[t+1] − g2j[t]).

To sample the proportion of SNP markers associated with non-zero effects in IW-BayesB, with the specified prior p(π | απ, βπ) = Beta(απ, βπ), the FCD of π is based on

p(π | y, else) ∝ [∏j p(Gj | νg, Σg, π)] p(π | απ, βπ).

Let m1 = Σj I(Gj ≠ 0₂ₓ₂) denote the number of non-zero matrices sampled among the Gj at a particular MCMC cycle, where I(·) denotes the indicator function. Then π | y, else ~ Beta(απ + m1, βπ + m − m1).
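As a small illustration of this conjugate step, the hypothetical snippet below draws π from its Beta full conditional given the 2 × 2 matrices sampled in the current cycle; the function and variable names are assumptions, not code from the dissertation.

```python
import numpy as np

def sample_pi(G_list, alpha_pi=1.0, beta_pi=1.0, rng=None):
    """Draw pi from Beta(alpha_pi + m1, beta_pi + m - m1), where m1 is the
    number of SNPs whose 2x2 matrix G_j is non-zero in the current cycle."""
    rng = rng or np.random.default_rng()
    m = len(G_list)
    m1 = sum(1 for G in G_list if np.any(G != 0.0))
    return rng.beta(alpha_pi + m1, beta_pi + m - m1)

rng = np.random.default_rng(0)
G_list = [np.zeros((2, 2)), np.array([[0.4, 0.1], [0.1, 0.3]]), np.zeros((2, 2))]
pi_draw = sample_pi(G_list, alpha_pi=1.0, beta_pi=9.0, rng=rng)
```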
To sample the scale matrix of the genetic variance-covariance matrices in IW-BayesA/IW-BayesB, given the specified Wishart prior p(Σg) ∝ W(ν0, Σ0/ν0), the FCD for Σg is

p(Σg | y, else) ∝ [∏j I(Gj ≠ 0₂ₓ₂) p(Gj | νg, Σg)] p(Σg)
∝ |Σg|^((ν0 + νgm − 3)/2) exp(−½ trace(Σg[(νg − 3) Σj I(Gj ≠ 0₂ₓ₂) Gj⁻¹ + ν0Σ0⁻¹])).

Hence Σg | y, else ~ W( ν0 + νgm, [(νg − 3) Σj I(Gj ≠ 0₂ₓ₂) Gj⁻¹ + ν0Σ0⁻¹]⁻¹ ).

To sample the degrees of freedom of the genetic variance-covariance matrices in IW-BayesA/IW-BayesB, with the specified non-informative prior p(νg) ∝ 1/(1 + νg)², the FCD for νg is

p(νg | y, else) ∝ { ∏j I(Gj ≠ 0₂ₓ₂) [ |(νg − 3)Σg|^(νg/2) / (2^νg Γ₂(νg/2)) ] |Gj|^(−(νg+3)/2) exp(−½ trace(Gj⁻¹Σg(νg − 3))) } × 1/(1 + νg)².

We sampled νg using a random walk Metropolis-Hastings algorithm, as described for sampling degrees of freedom parameters in other non-genomic applications (Kizilkaya and Tempelman 2005; Bello et al. 2010).

To sample random intercept and slope effects in IW-BayesC, the FCD for the jth block of g is

gj | y, β, g−j, Σg, σe² ~ N(ĝj, Vgj),

where, with Wj = [z.j  Dz.j] as before,

ĝj = (W'jWj + Σg⁻¹σe²)⁻¹ W'j(e + Wjgj) and Vgj = (W'jWj σe⁻² + Σg⁻¹)⁻¹.

After sampling gj, the residual is updated by e = e − z.j(g1j[t+1] − g1j[t]) − Dz.j(g2j[t+1] − g2j[t]).

To sample the genetic variance-covariance matrix of the random effects in IW-BayesC, with the inverted Wishart prior p(Σg | ν0, Σ0) ∝ IW(ν0, Σ0), the FCD follows from the joint posterior density as

p(Σg | y, else) ∝ p(g | Σg) p(Σg | ν0, Σ0) ∝ |Σg|^(−(m + ν0 + 3)/2) exp(−½ trace(Σg⁻¹(Sg + Σ0))),

where Sg = [g'1g1  g'1g2; g'2g1  g'2g2]. That is, Σg | y, else ~ IW(ν0 + m, Sg + Σ0).

To sample the residual variance in IW-BayesA/IW-BayesB/IW-BayesC, given the specified scaled inverted chi-square prior p(σe² | νe, Se) = χ⁻²(νe, νeSe), the FCD for σe² is

p(σe² | y, else) ∝ (σe²)^(−(n + νe)/2 − 1) exp(−(e'e + νeSe)/(2σe²)),

where e = y − Xβ − Zg1 − DZg2; that is, σe² | y, else ~ χ⁻²(df = n + νe, scale = e'e + νeSe).
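For illustration only: a scaled inverted chi-square draw can be generated from a chi-square variate, so the residual variance update above reduces to the short sketch below (assumed variable names, not the dissertation's code); here "scale" denotes the full quantity e'e + νeSe as in the text.

```python
import numpy as np

def sample_residual_variance(e, nu_e, S_e, rng):
    """Draw sigma2_e from chi^-2(df = n + nu_e, scale = e'e + nu_e * S_e).

    If x ~ chi^2(df), then scale / x is a draw from the scaled inverted
    chi-square with that df and total scale, matching the FCD above."""
    df = e.size + nu_e
    scale = e @ e + nu_e * S_e
    return scale / rng.chisquare(df)

rng = np.random.default_rng(2)
e = rng.normal(scale=0.4, size=500)
sigma2_e = sample_residual_variance(e, nu_e=4.0, S_e=0.16, rng=rng)
```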
C1.2 MCMC implementation strategy for CD-BayesA/CD-BayesB

As presented in the Materials and Methods, a square root free Cholesky decomposition can be applied to the genetic variance-covariance matrix. Let g2 = Ψg1 + g2|1, where g2|1 is the vector of SNP-specific environmental slope effects conditional on the intercept effects and Ψ = diag{φj} is a diagonal matrix of SNP-specific associations between intercept and slope effects. The RR/RN WGP model can then be rewritten as

y = Xβ + Zg1 + DZ(Ψg1 + g2|1) + e = Xβ + (Z + DZΨ)g1 + DZg2|1 + e.

Based on the priors described for the two CD models in the Materials and Methods, the joint posterior density of all unknown parameters is

p(β, g1, g2|1, sg1, sg2|1, φ, mφ, σφ², ν1, ν2, s1², s2², π1, π2, σe² | y)
∝ p(y | β, g1, g2|1, φ, σe²) p(β) [∏j p(g1j | σ²g1j)] [∏j p(σ²g1j | ν1, s1², π1)] [∏j p(g2|1j | σ²g2|1j)] [∏j p(σ²g2|1j | ν2, s2², π2)] [∏j p(φj | mφ, σφ²)] p(mφ) p(σφ²) p(ν1) p(ν2) p(s1²) p(s2²) p(π1) p(π2) p(σe²),

where sg1 = [σ²g11, σ²g12, ..., σ²g1m]' is the vector of SNP-specific intercept variances, sg2|1 = [σ²g2|1,1, ..., σ²g2|1,m]' is the vector of SNP-specific variances of the slopes conditional on the intercepts, and φ = [φ1, φ2, ..., φm]' is the vector of SNP-specific association parameters between intercept and slope. In CD-BayesB (π1 < 1 and π2 < 1), σ²g1j (σ²g2|1j) has a mixture prior with point mass at zero with probability 1 − π1 (1 − π2) and a scaled inverted chi-square distribution with degrees of freedom ν1 (ν2) and scale parameter s1² (s2²) with probability π1 (π2). In CD-BayesA (π1 = 1 and π2 = 1), σ²g1j (σ²g2|1j) has a scaled inverted chi-square prior with degrees of freedom ν1 (ν2) and scale parameter s1² (s2²).

To sample fixed effects in CD-BayesA/CD-BayesB, the FCD for the kth element of β is

βk | y, β−k, else ~ N(β̂k, vβk), with β̂k = x'.k(e + x.kβk)/(x'.kx.k) and vβk = σe²(x'.kx.k)⁻¹,

where e = y − Xβ − (Z + DZΨ)g1 − DZg2|1. Immediately after sampling βk, the residual is updated by e = e − x.k(βk[t+1] − βk[t]).

To sample random intercept effects in CD-BayesA, define z*.g1j = [z1j(1 + d1φj), z2j(1 + d2φj), ..., znj(1 + dnφj)]' as column j of Z + DZΨ. The FCD is

g1j | y, β, g1,−j, g2|1, σ²g1j, σe², φ ~ N(ĝ1j, vg1j),

with

ĝ1j = z*'.g1j(e + z*.g1j g1j) / (z*'.g1j z*.g1j + σe²σg1j⁻²) and vg1j = (σe⁻² z*'.g1j z*.g1j + σg1j⁻²)⁻¹.

Immediately after sampling g1j, the residual is updated by e = e − z*.g1j(g1j[t+1] − g1j[t]).

To sample random slope effects conditional on the intercepts in CD-BayesA, define z*.g2|1j = [z1jd1, z2jd2, ..., znjdn]' as the jth column of DZ. The FCD is

g2|1j | y, β, g2|1,−j, g1, σ²g2|1j, σe², φ ~ N(ĝ2|1j, vg2|1j),

with

ĝ2|1j = z*'.g2|1j(e + z*.g2|1j g2|1j) / (z*'.g2|1j z*.g2|1j + σe²σg2|1j⁻²) and vg2|1j = (σe⁻² z*'.g2|1j z*.g2|1j + σg2|1j⁻²)⁻¹.

Immediately after sampling g2|1j, the residual is updated by e = e − z*.g2|1j(g2|1j[t+1] − g2|1j[t]).

To sample the variances of the SNP intercept effects in CD-BayesA, given the prior p(σ²g1j | ν1, s1²) = χ⁻²(ν1, ν1s1²),

p(σ²g1j | y, else) ∝ p(g1j | σ²g1j) p(σ²g1j | ν1, s1²) ∝ (σ²g1j)^(−(ν1+1)/2 − 1) exp(−(ν1s1² + g1j²)/(2σ²g1j)),

that is, σ²g1j | y, else ~ χ⁻²(df = ν1 + 1, scale = g1j² + ν1s1²).

To sample the variances of the SNP slope effects conditional on the intercepts in CD-BayesA, given the prior p(σ²g2|1j | ν2, s2²) = χ⁻²(ν2, ν2s2²), one similarly obtains σ²g2|1j | y, else ~ χ⁻²(df = ν2 + 1, scale = g2|1j² + ν2s2²).

To sample the variances and effects of the SNP intercepts in CD-BayesB, we followed the collapsed sampling strategy of Liu (1994) and jointly sampled g1j and σ²g1j as adapted from BayesB (Meuwissen et al. 2001). With z*.g1j defined as above (column j of Z + DZΨ), we first sample from

p(σ²g1j | ELSE except g1j) = ∫ p(σ²g1j, g1j | ELSE) dg1j ∝ p(σ²g1j | ν1, s1², π1) |Vg1j|^(−1/2) exp(−½ y*'g1j Vg1j⁻¹ y*g1j),

where y*g1j = y − Xβ − (Z + DZΨ)−jg1,−j − DZg2|1 and Vg1j = z*.g1j z*'.g1j σ²g1j + Iσe². Using this expression, the Metropolis-Hastings acceptance ratio for sampling from p(σ²g1j | ELSE except g1j) at MCMC cycle [t], based on using p(σ²g1j | ν1, s1², π1) as the candidate density, is

α(σ²g1j[t−1], σ²g1j*) = min{ [p(σ²g1j* | ELSE except g1j) p(σ²g1j[t−1] | ν1, s1², π1)] / [p(σ²g1j[t−1] | ELSE except g1j) p(σ²g1j* | ν1, s1², π1)], 1 },

which, following Meuwissen et al. (2001), reduces to

α(σ²g1j[t−1], σ²g1j*) = min{ [(v*g1j)^(−1/2) exp(−r²g1j/(2v*g1j))] / [(vg1j[t−1])^(−1/2) exp(−r²g1j/(2vg1j[t−1]))], 1 },

where rg1j = z*'.g1j y*g1j and vg1j = var(z*'.g1j y*g1j) = (z*'.g1j z*.g1j)² σ²g1j + (z*'.g1j z*.g1j) σe².

If a non-zero value of σ²g1j is sampled, g1j is then drawn from the same full conditional as in CD-BayesA. If either g1j[t+1] or g1j[t] is non-zero, the residual is updated immediately thereafter as e = e − z*.g1j(g1j[t+1] − g1j[t]).

To sample the variances and effects of the SNP slopes conditional on the intercepts in CD-BayesB, we analogously jointly sampled g2|1j and σ²g2|1j (Meuwissen et al. 2001). With z*.g2|1j the jth column of DZ, we first sample from

p(σ²g2|1j | ELSE except g2|1j) ∝ p(σ²g2|1j | ν2, s2², π2) |Vg2|1j|^(−1/2) exp(−½ y*'g2|1j Vg2|1j⁻¹ y*g2|1j),

where y*g2|1j = y − Xβ − (Z + DZΨ)g1 − (DZ)−jg2|1,−j and Vg2|1j = z*.g2|1j z*'.g2|1j σ²g2|1j + Iσe². The corresponding acceptance ratio has the same simplified form as above with

rg2|1j = z*'.g2|1j y*g2|1j and vg2|1j = (z*'.g2|1j z*.g2|1j)² σ²g2|1j + (z*'.g2|1j z*.g2|1j) σe².

If a non-zero value of σ²g2|1j is sampled, g2|1j is drawn from the same full conditional as in CD-BayesA, and if either g2|1j[t+1] or g2|1j[t] is non-zero the residual is updated as e = e − z*.g2|1j(g2|1j[t+1] − g2|1j[t]) immediately thereafter.

To sample the proportion of SNP markers with non-zero intercept effects in CD-BayesB, with prior p(π1 | απ1, βπ1) = Beta(απ1, βπ1), let m1 = Σj I(σ²g1j > 0) denote the number of non-zero intercept variances sampled at a particular MCMC cycle, where I(·) denotes the indicator function; then π1 | y, else ~ Beta(απ1 + m1, βπ1 + m − m1).

To sample the proportion of SNP markers with non-zero slope effects conditional on the intercepts in CD-BayesB, with prior p(π2 | απ2, βπ2) = Beta(απ2, βπ2), let m2 = Σj I(σ²g2|1j > 0); then π2 | y, else ~ Beta(απ2 + m2, βπ2 + m − m2).

To sample the association parameters between intercept and slope effects in CD-BayesA/CD-BayesB, with prior φj ~ N(mφ, σφ²), define G = diag{g1j} and z*.φj = [z1jd1g1j, z2jd2g1j, ..., znjdng1j]' as column j of DZG. The FCD for φj is

φj | y, β, φ−j, g1, g2|1, σe², else ~ N(φ̂j, vφj),

with

φ̂j = (σe²σφ⁻²mφ + z*'.φj(e + z*.φjφj)) / (z*'.φj z*.φj + σφ⁻²σe²) and vφj = (σe⁻² z*'.φj z*.φj + σφ⁻²)⁻¹.

Immediately after sampling φj, the residual is updated by e = e − z*.φj(φj[t+1] − φj[t]).
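A rough sketch of one scalar update of an association parameter φj with the corresponding residual adjustment is given below; it is an illustrative rendition under our own variable names (the working covariate d * z_j * g1_j plays the role of z*.φj), not the code used for the analyses.

```python
import numpy as np

def update_phi_j(e, z_j, d, g1_j, phi_j, m_phi, sigma2_phi, sigma2_e, rng):
    """One Gibbs update of the association parameter phi_j in CD-BayesA/CD-BayesB.

    The working covariate x is the j-th column of DZG, i.e. d * z_j * g1_j."""
    x = d * z_j * g1_j
    rhs = x @ e + (x @ x) * phi_j                   # x'(e + x * phi_j)
    prec = x @ x / sigma2_e + 1.0 / sigma2_phi      # posterior precision
    mean = (m_phi / sigma2_phi + rhs / sigma2_e) / prec
    phi_new = rng.normal(mean, np.sqrt(1.0 / prec))
    e_new = e - x * (phi_new - phi_j)               # residual bookkeeping
    return phi_new, e_new
```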
To sample the mean of the association parameters in CD-BayesA/CD-BayesB, given the specified prior p(mφ) = N(τ, ζ²), the FCD for mφ is

p(mφ | y, ELSE) ∝ [∏j p(φj | mφ, σφ²)] p(mφ | τ, ζ²) ∝ exp(−(mφ − m̂φ)²/(2σ²mφ)),

so that mφ | y, ELSE ~ N(m̂φ, σ²mφ) with

m̂φ = [(ζ²)⁻¹τ + (σφ²/m)⁻¹φ̄] / [(ζ²)⁻¹ + (σφ²/m)⁻¹], σ²mφ = [(ζ²)⁻¹ + (σφ²/m)⁻¹]⁻¹, and φ̄ = (1/m)Σj φj.

To sample the variance of the association parameters in CD-BayesA/CD-BayesB, with specified prior p(σφ²) = Gamma(αφ, βφ), the FCD for σφ² is

p(σφ² | y, ELSE) ∝ [∏j p(φj | mφ, σφ²)] p(σφ²) ∝ (σφ²)^(−(m − 2αφ)/2 − 1) exp(−[(φ − mφ)'(φ − mφ) + βφ]/(2σφ²)),

that is, σφ² | y, ELSE ~ χ⁻²(df = m − 2αφ, scale = (φ − mφ)'(φ − mφ) + βφ).

To sample the scale parameter for the SNP intercepts in CD-BayesA/CD-BayesB, with prior p(s1² | α1, β1) = Gamma(α1, β1),

p(s1² | y, ELSE) ∝ (s1²)^(α1 + m1ν1/2 − 1) exp(−s1²[(ν1/2) Σj I(σ²g1j > 0) σg1j⁻² + β1]),

that is, a Gamma distribution with shape α1 + m1ν1/2 and rate (ν1/2) Σj I(σ²g1j > 0) σg1j⁻² + β1.

To sample the scale parameter for the SNP slopes conditional on the intercepts in CD-BayesA/CD-BayesB, with prior p(s2² | α2, β2) = Gamma(α2, β2), the FCD is analogously a Gamma distribution with shape α2 + m2ν2/2 and rate (ν2/2) Σj I(σ²g2|1j > 0) σg2|1j⁻² + β2.

To sample the degrees of freedom of the SNP intercept effects in CD-BayesA/CD-BayesB, with the specified non-informative prior p(ν1) = 1/(1 + ν1)², the FCD for ν1 is

p(ν1 | y, ELSE) ∝ { ∏j I(σ²g1j > 0) [ (ν1s1²/2)^(ν1/2) / Γ(ν1/2) ] (σ²g1j)^(−(ν1/2 + 1)) exp(−ν1s1²/(2σ²g1j)) } p(ν1).

We sampled ν1 using a random walk Metropolis-Hastings algorithm, as described for sampling degrees of freedom parameters in other non-genomic applications (Kizilkaya and Tempelman 2005; Bello et al. 2010).
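Because this Metropolis step recurs for every degrees-of-freedom parameter in these appendices, a generic sketch may be useful. The function below is a hypothetical random-walk Metropolis update on the log scale (the Jacobian term log ν is added to the log target), not the implementation used in the dissertation.

```python
import numpy as np

def rw_metropolis_log_scale(log_target, nu_current, step_sd, rng):
    """One random-walk Metropolis update for a positive parameter nu.

    log_target(nu) returns the log full conditional of nu itself; working
    on zeta = log(nu) adds the Jacobian term zeta (= log nu) to the target."""
    zeta = np.log(nu_current)
    zeta_prop = zeta + rng.normal(0.0, step_sd)
    log_alpha = (log_target(np.exp(zeta_prop)) + zeta_prop) \
              - (log_target(np.exp(zeta)) + zeta)
    if np.log(rng.uniform()) < log_alpha:
        return np.exp(zeta_prop), True
    return nu_current, False

# toy target: log kernel of a Gamma(3, 1) density for nu
rng = np.random.default_rng(3)
nu, accepted = rw_metropolis_log_scale(lambda nu: 2.0 * np.log(nu) - nu,
                                       nu_current=5.0, step_sd=0.4, rng=rng)
```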
To sample the degrees of freedom of the SNP slopes conditional on the intercepts in CD-BayesA/CD-BayesB, with the specified non-informative prior p(ν2) = 1/(1 + ν2)², the FCD for ν2 is

p(ν2 | y, ELSE) ∝ { ∏j I(σ²g2|1j > 0) [ (ν2s2²/2)^(ν2/2) / Γ(ν2/2) ] (σ²g2|1j)^(−(ν2/2 + 1)) exp(−ν2s2²/(2σ²g2|1j)) } p(ν2).

We again sampled ν2 using a random walk Metropolis-Hastings algorithm (Kizilkaya and Tempelman 2005; Bello et al. 2010).

To sample the residual variance in CD-BayesA/CD-BayesB, given the specified scaled inverted chi-square prior p(σe² | νe, Se) = χ⁻²(νe, νeSe), the FCD is

p(σe² | y, else) ∝ (σe²)^(−(n + νe)/2 − 1) exp(−(e'e + νeSe)/(2σe²)),

where e = y − Xβ − (Z + DZΨ)g1 − DZg2|1; that is, σe² | y, else ~ χ⁻²(df = n + νe, scale = e'e + νeSe).

C1.3 Derivation of the overall genetic correlation between intercept and slope

The RR/RN WGP model can be written as

yi = x'iβ + Σj zij(g1j + dig2j) + ei, i = 1, 2, ..., n.

The genetic variance at environmental covariate di for animal i with genotypes zij is then

var(Σj zij(g1j + dig2j)) = Σj zij²σ²g1j + di²Σj zij²σ²g2j + 2diΣj zij²σg1g2j.

Thus, for any environment d, the genetic variance is

var(Σj zij(g1j + dg2j)) = Σj zij²σ²g1j + d²Σj zij²σ²g2j + 2dΣj zij²σg1g2j.

Following de los Campos et al. (2012), the overall genetic variance across all animals is

Vg = n⁻¹ΣiΣj zij²σ²g1j + d²n⁻¹ΣiΣj zij²σ²g2j + 2dn⁻¹ΣiΣj zij²σg1g2j,

where n⁻¹ΣiΣj zij²σ²g1j is the overall genetic variance for the intercepts across animals, n⁻¹ΣiΣj zij²σ²g2j is the overall genetic variance for the slopes, and n⁻¹ΣiΣj zij²σg1g2j is the overall genetic covariance between intercepts and slopes. Therefore, the overall genetic correlation ρg1g2 between intercept and slope is

ρg1g2 = [n⁻¹ΣiΣj zij²σg1g2j] / √{ [n⁻¹ΣiΣj zij²σ²g1j][n⁻¹ΣiΣj zij²σ²g2j] }.

C2 Supplementary tables and figures

Figure C2.1: Estimated SNP effects as a function of the environmental covariate (rescaled weeks of age: 10, 13, 16, 19 and 22) for back fat thickness for two SNP markers (solid line and dashed line) using the complete final analysis data from the MSU Pig Resource Population under models A) IW-BayesA and B) CD-BayesA.

APPENDIX D: Chapter 5

D1 Markov Chain Monte Carlo (MCMC) implementation strategy for Bayesian hierarchical methods

D1.1 MCMC implementation strategy for IW-BayesA

To sample the location parameters computationally efficiently, we adopted a Gauss-Seidel updating algorithm within the MCMC implementation (Legarra and Misztal 2008).
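A minimal sketch of the Gauss-Seidel style single-site update with residual bookkeeping, in the spirit of Legarra and Misztal (2008) but not their code, is shown below; looping such a function over all columns of the design matrices at each MCMC cycle avoids forming or inverting large coefficient matrices. The variable names and the shrinkage ratio lambda_k are our own illustrative choices.

```python
import numpy as np

def gauss_seidel_update(e, x_k, beta_k, sigma2_e, lambda_k, rng):
    """Sample one location effect from its full conditional and adjust the residual.

    e        : current residual with ALL effects (including this one) removed
    x_k      : covariate/dummy column for this effect
    lambda_k : ratio sigma2_e / sigma2_effect (use 0.0 for a fixed effect)
    """
    xpx = x_k @ x_k
    rhs = x_k @ e + xpx * beta_k            # x_k'(e + x_k * beta_k)
    mean = rhs / (xpx + lambda_k)
    var = sigma2_e / (xpx + lambda_k)
    beta_new = rng.normal(mean, np.sqrt(var))
    e_new = e - x_k * (beta_new - beta_k)   # keep the residual in sync
    return beta_new, e_new
```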
In the bivariate trait WGP model, y1 is the n × 1 vector of phenotypes for trait 1, y2 is the n × 1 vector of phenotypes for trait 2, β1 (β2) is the q × 1 vector of fixed effects for trait 1 (trait 2), g1 (g2) is the m × 1 vector of SNP substitution effects for trait 1 (trait 2), X1 (X2) is the n × q design matrix for β1 (β2), and Z is the n × m genotype matrix:

[y1; y2] = [X1 0; 0 X2][β1; β2] + [Z 0; 0 Z][g1; g2] + [e1; e2].   [1]

To sample the random SNP effects more efficiently, we block sample the trait-specific effects SNP by SNP; that is, we work with the vector g = [g11, g21, ..., g1k, g2k, ..., g1m, g2m]', where gk = [g1k, g2k]' contains the SNP effects on traits 1 and 2 for the kth SNP. Let β = [β1' β2']' and y = [y1' y2']'. Based on the priors described for IW-BayesA in the Methods, the joint posterior density of all unknown parameters is

p(β, g, G, νg, Σg, Σe | y) ∝ p(y | β, g, Σe) p(β) [∏k p(gk | Gk)] [∏k p(Gk | νg, Σg)] p(Σg | ν0, Σ0) p(νg) p(Σe),   [2]

where G is the 2m × 2m genetic variance-covariance matrix and Σg is a 2 × 2 scale matrix for the trait-specific random SNP effects. Each Gk is assumed to have an inverted Wishart (IW) prior with degrees of freedom νg and scale matrix Σg. The residuals for the two traits, e = [e1' e2']', are assumed to follow a bivariate normal distribution with null mean and variance-covariance matrix

R = [Iσ²e1  Iσe1e2; Iσe1e2  Iσ²e2],

so that the inverse of the residual variance-covariance matrix is

R⁻¹ = [Ir¹¹  Ir¹²; Ir¹²  Ir²²], where [r¹¹  r¹²; r¹²  r²²] = [σ²e1  σe1e2; σe1e2  σ²e2]⁻¹.

To sample fixed effects on trait 1, the FCD for the lth element of β1 is

β1l | y, β1,−l, g1, g2, G, Σe ~ N(β̂1l, vβ1l),

with

β̂1l = x(1)'.l(r¹¹e1 + r¹²e2 + r¹¹x(1).lβ1l) / (x(1)'.l x(1).l r¹¹) and vβ1l = (x(1)'.l x(1).l r¹¹)⁻¹.

Immediately after sampling β1l, we update the trait 1 residual using e1 = e1 − x(1).l(β1l[t+1] − β1l[t]), where x(1).l contains the covariate/dummy variable values for fixed effect l on trait 1.

To sample fixed effects on trait 2, the FCD for the lth element of β2 is analogously

β2l | y, β2,−l, g1, g2, G, Σe ~ N(β̂2l, vβ2l),

with

β̂2l = x(2)'.l(r¹²e1 + r²²e2 + r²²x(2).lβ2l) / (x(2)'.l x(2).l r²²) and vβ2l = (x(2)'.l x(2).l r²²)⁻¹,

and the trait 2 residual is updated immediately after sampling β2l using e2 = e2 − x(2).l(β2l[t+1] − β2l[t]).

To sample the trait-specific random SNP effects, the FCD for the kth block of g is

gk = [g1k, g2k]' | y, β1, β2, g−k, {Gk}, Σe ~ N(ĝk, Vgk),

where

ĝk = Vgk [ z'.k(r¹¹e1 + r¹²e2 + r¹¹z.kg1k + r¹²z.kg2k); z'.k(r¹²e1 + r²²e2 + r¹²z.kg1k + r²²z.kg2k) ]

and

Vgk = ( [r¹¹z'.kz.k  r¹²z'.kz.k; r¹²z'.kz.k  r²²z'.kz.k] + Gk⁻¹ )⁻¹.

Immediately after sampling gk, we update e1 = e1 − z.k(g1k[t+1] − g1k[t]) and e2 = e2 − z.k(g2k[t+1] − g2k[t]).

To sample the genetic variance-covariance matrix of the random SNP effects, given the specified conjugate prior p(Gk | νg, Σg) ∝ IW(νg, Σg) for the kth SNP, the FCD of Gk is

p(Gk | else) ∝ p(gk | Gk) p(Gk | νg, Σg) ∝ |Gk|^(−((νg+1)+3)/2) exp(−½ trace(Gk⁻¹[gkg'k + Σg])),

hence Gk | else ~ IW(νg + 1, gkg'k + Σg).

To sample the scale matrix of the genetic variance-covariance matrices, given the conjugate Wishart prior p(Σg) ∝ W(ν0, Σ0/ν0), the FCD of Σg is

p(Σg | else) ∝ [∏k p(Gk | νg, Σg)] W(ν0, Σ0/ν0) ∝ |Σg|^((ν0 + νgm − 3)/2) exp(−½ trace(Σg[Σk Gk⁻¹ + ν0Σ0⁻¹])),

hence Σg | else ~ W( ν0 + νgm, [Σk Gk⁻¹ + ν0Σ0⁻¹]⁻¹ ).

To sample the degrees of freedom of the genetic variance-covariance matrices, given the "non-informative" prior p(νg) ∝ 1/(1 + νg)², the FCD for νg is

p(νg | else) ∝ 2^(−νgm) Γ₂(νg/2)^(−m) |Σg|^(νgm/2) [∏k |Gk|^(−(νg+3)/2) exp(−½ trace(Gk⁻¹Σg))] × 1/(1 + νg)²,

where Γ₂(νg/2) = π^(1/2) ∏p=1,2 Γ(νg/2 + (1 − p)/2) ∝ Γ(νg/2)Γ(νg/2 − 1/2). This FCD is not of recognizable form, so a sampling strategy for non-standard distributions is required. In order to use proposal densities, especially in a random walk Metropolis implementation, it is more appropriate to transform the variable so that its parameter space is defined on the real line.
Using ζ = log(νg), the relevant FCD is

p(ζ | else) ∝ [ 0.5^(exp(ζ)) |Σg|^(exp(ζ)/2) / (Γ(exp(ζ)/2)Γ(exp(ζ)/2 − 1/2)) ]^m [∏k |Gk|^(−(exp(ζ)+3)/2) exp(−½ trace(Gk⁻¹Σg))] × exp(ζ)/(1 + exp(ζ))².

Hence

log p(ζ | else) = m[ exp(ζ)(log(0.5) + 0.5 log|Σg|) − log Γ(exp(ζ)/2) − log Γ(exp(ζ)/2 − 0.5) ]
− 0.5(exp(ζ) + 3) Σk log|Gk| − 0.5 Σk trace(ΣgGk⁻¹) − 2 log(1 + exp(ζ)) + ζ.

We sampled νg using a random walk Metropolis-Hastings algorithm, as described for sampling degrees of freedom parameters in other non-genomic applications (Kizilkaya and Tempelman 2005; Bello et al. 2010). Suppose the value of ζ at the current cycle i is ζ[i]. Generate a random variable δ from N(0, c²) and propose ζ* = ζ[i] + δ. Determine the ratio α = p(ζ* | else)/p(ζ[i] | else); for numerical stability it is wiser to evaluate this ratio as α = exp( log p(ζ* | else) − log p(ζ[i] | else) ). To implement this Metropolis (within Gibbs) scheme, first generate U from a Uniform(0, 1) distribution. If α > 1, accept ζ[i+1] = ζ*; if α > U, accept ζ[i+1] = ζ*; if α < U, set ζ[i+1] = ζ[i].

To sample the residual variance-covariance matrix, given the specified inverted Wishart prior p(Σe | ν0, Σ0) = IW(ν0, Σ0), the FCD for Σe is

p(Σe | ELSE, y) ∝ p(y | β, g, Σe) p(Σe | ν0, Σ0) ∝ |Σe|^(−(n + ν0 + 3)/2) exp(−½ trace(Σe⁻¹(Se + Σ0⁻¹))),

where Se = [e'1e1  e'1e2; e'2e1  e'2e2]. Hence Σe | else ~ IW(n + ν0, Se + Σ0⁻¹).

D1.2 MCMC implementation strategy for CD-BayesA/CD-BayesB

As defined in the Methods, a square root free Cholesky decomposition (CD) can be applied to the genetic variance-covariance matrices. Let g2 = Ψg1 + g2|1, where g2|1 is the vector of SNP substitution effects for trait 2 conditional on trait 1 and Ψ = diag{φk} is a diagonal matrix of SNP-specific associations between the two traits. Based on the priors described for the two CD models in the Methods, the joint posterior density of all unknown parameters is

p(β, g1, g2|1, sg1, sg2|1, φ, mφ, σφ², ν1, ν2, s1², s2², π1, π2, πφ, Σe | y)
∝ p(y | β, g1, g2|1, φ, Σe) p(β) [∏k p(g1k | σ²g1k)] [∏k p(σ²g1k | ν1, s1², π1)] [∏k p(g2|1k | σ²g2|1k)] [∏k p(σ²g2|1k | ν2, s2², π2)] [∏k p(φk | mφ, σφ², πφ)] p(mφ) p(σφ²) p(ν1) p(ν2) p(s1²) p(s2²) p(π1) p(π2) p(πφ) p(Σe),

where sg1 = [σ²g11, ..., σ²g1m]' is the vector of SNP-specific variances for trait 1, sg2|1 = [σ²g2|1,1, ..., σ²g2|1,m]' is the vector of SNP-specific variances for trait 2 conditional on trait 1, and φ = [φ1, ..., φm]' is the vector of SNP-specific association parameters between the two traits. In CD-BayesB (π1 < 1, π2 < 1 and πφ < 1), σ²g1k (σ²g2|1k) has a mixture prior with point mass at zero with probability 1 − π1 (1 − π2) and a scaled inverted chi-square distribution with degrees of freedom ν1 (ν2) and scale parameter s1² (s2²) with probability π1 (π2). In CD-BayesA (π1 = 1, π2 = 1 and πφ = 1), σ²g1k (σ²g2|1k) has a scaled inverted chi-square prior with degrees of freedom ν1 (ν2) and scale parameter s1² (s2²).

To sample the location parameters computationally efficiently, we again adopted a Gauss-Seidel updating algorithm within the MCMC implementation (Legarra and Misztal 2008).

To sample fixed effects on trait 1 in CD-BayesA/CD-BayesB, the FCD for the lth element of β1 is

β1l | y, β1,−l, g1, g2|1, {σ²g1k}, {σ²g2|1k}, Σe, φ ~ N(β̂1l, vβ1l),

with

β̂1l = x(1)'.l(r¹¹e1 + r¹²e2 + r¹¹x(1).lβ1l) / (x(1)'.l x(1).l r¹¹) and vβ1l = (x(1)'.l x(1).l r¹¹)⁻¹,

where now e1 = y1 − X1β1 − Zg1 and e2 = y2 − X2β2 − ZΨg1 − Zg2|1. Immediately after sampling β1l, the trait 1 residual is updated using e1 = e1 − x(1).l(β1l[t+1] − β1l[t]).

To sample fixed effects on trait 2 in CD-BayesA/CD-BayesB, the FCD for the lth element of β2 is

β2l | y, β2,−l, g1, g2|1, {σ²g1k}, {σ²g2|1k}, Σe, φ ~ N(β̂2l, vβ2l),

with

β̂2l = x(2)'.l(r¹²e1 + r²²e2 + r²²x(2).lβ2l) / (x(2)'.l x(2).l r²²) and vβ2l = (x(2)'.l x(2).l r²²)⁻¹,

and the trait 2 residual is updated immediately after sampling β2l using e2 = e2 − x(2).l(β2l[t+1] − β2l[t]).

To sample the random SNP effects for trait 1 in CD-BayesA, the FCD for the kth element of g1 is

g1k | y, β1, β2, g1,−k, g2|1, {σ²g1k}, {σ²g2|1k}, Σe, φ ~ N(ĝ1k, v1k),

where

ĝ1k = [ (r¹¹ + r¹²φk)z'.ke1 + (r¹² + r²²φk)z'.ke2 + (r¹¹ + 2r¹²φk + r²²φk²)(z'.kz.k)g1k ] / [ (r¹¹ + 2r¹²φk + r²²φk²)(z'.kz.k) + σg1k⁻² ]

and

v1k = [ (r¹¹ + 2r¹²φk + r²²φk²)(z'.kz.k) + σg1k⁻² ]⁻¹.

Immediately after sampling g1k, the residuals are updated using e1 = e1 − z.k(g1k[t+1] − g1k[t]) and e2 = e2 − φkz.k(g1k[t+1] − g1k[t]).

To sample the random SNP effects for trait 2 conditional on trait 1 in CD-BayesA, the FCD for the kth element of g2|1 is

g2|1k | y, β1, β2, g1, g2|1,−k, {σ²g1k}, {σ²g2|1k}, Σe, φ ~ N(ĝ2|1k, v2|1k),

where

ĝ2|1k = [ r¹²z'.ke1 + r²²z'.ke2 + r²²(z'.kz.k)g2|1k ] / [ r²²(z'.kz.k) + σg2|1k⁻² ] and v2|1k = [ r²²(z'.kz.k) + σg2|1k⁻² ]⁻¹.

Immediately after sampling g2|1k, the trait 2 residual is updated using e2 = e2 − z.k(g2|1k[t+1] − g2|1k[t]).

To sample the random association parameters in CD-BayesA, with z.gk = g1kz.k, the FCD for φk (k = 1, 2, ..., m) is

φk | y, β, g1, g2|1, {σ²g1k}, {σ²g2|1k}, Σe, φ−k ~ N(φ̂k, vφk),

with

φ̂k = [ σ²e2σφ⁻²mφ + z'.gk(e2 + z.gkφk) ] / [ z'.gkz.gk + σ²e2σφ⁻² ] and vφk = [ σe2⁻²z'.gkz.gk + σφ⁻² ]⁻¹.

Immediately after sampling φk, the trait 2 residual is updated using e2 = e2 − g1kz.k(φk[t+1] − φk[t]).
To sampling SNP specific variances on trait 1 in CDBayesA, ( ) given that the prior density is scaled inverted chi-square: s g21k | v1 , s12  χ −2 v1 , v1s12 , FCD for s g21k (k=1,2,…,m) can be derived as follows: ( ) ( ) ( p s g21k | else ∝ p g1k | s g21k p s g21k | ν 1 , s12 ( ∝ 2πs g21k ∝s ) −1/2  g 2 exp  − 1k2  2s g 1k  ) ν1 ν s  2 ν 1s12 ν  − 1 +1 − 2   2  2s s g21k  2 e g1k   Γ  ν 1  2 2 1 1 2 2  (ν +1)  −ν 1s1 + g1 k − 1 +1 2 2s g1 k 2  2  g1 k e ( v1 + 1, scale = g12k + v1s12 Thus, s g21k | else  χ −2 df = ) To sample SNP specific variances on trait 2 conditional on trait 1 in CDBayesA, given that the prior density is scaled inverted chi-square: for s g2 (k=1,2,…,m) can be derived as follows: 2|1k 225 s g22|1k |ν 2|1 , s2|21  χ −2 ( v2|1 , v2|1s2|21 ) , FCD ( ( ) p s g2 | else ∝ p g 2|1k | s g2 ( 2|1 k ∝ 2πs 2 g 2|1 k ) −1/2 2|1 k  g 2|1k exp  − 2  2s g  2 2|1 k  (ν 2|1 +1) ∝s − 2  g2|1 k 2  +1  −  ) p (s  2 s g   2 g 2|1 k  ν 2|1 −  2 | ν 2|1 , s2|21  +1  − 2|1 k  e ) 2 ν 2|1s2|1 2s g 2 2|1 k 2 2 + g 2|1 k ν 2|1s2|1 2s g 2 e 2|1 k ( v2|1 1, scale = g 2|21k + v2|1s2|21 Thus, s g22|1k | else  χ −2 df =+ ) To sample random SNP effects and variances for trait 1 in CDBayesB, according 2 to the collapsed sampling strategy (Liu 1994), we jointly sampled g1k and s g1k as adapted in Bayes B (Meuwissen et al. 2001). Let’s first sample from p (s g21k | ELSE except g1k ) = ∫ p (s 2 g 1k , g1k | ELSE )dg1k g1 k  n  ∝ ∫  ∏ p ( yi1, yi 2 | β, g −1k , g1k , g 2|1 , R )  p ( g1k | s g21k ) p (s g21k | ν 1 , s12 , π 1 ) dg1k  g1 k  i =1 ∝ p (s g21k | ν 1 , s12 , π 1 ) V1k −1/2  1  exp  − y *−1k ' V1−k1y *−1k   2  where y *−1k =  y1,* −1k ' y *2,−1k '  ' , y1,* −1k =y1 -X1β1 − Z − k g −1k ,  z  .k 2 y *2,−1k = y 2 -X 2β 2 − ( ZΨ )− k g −= 1k − Zg 2|1 and V1k φ z  [ z.k φk z.k ]s g1k + R ⊗ I  k .k  Using that expression, the random walk Metropolis-Hastings acceptance ratio for ( ) sampling from p s g21k | ELSE except g1k at MCMC cycle [t] based on using ( ) p s g21k | ν 1 , s12 , π 1 as the candidate density looks as follows: 226 α (s 2[ t −1] g1 k ,s 2* g1 k ) ( ( ) ( ) ( ) )   p s g2* | ELSE except g1k p s g2[ t −1] | ν 1 , s12 , π 1  1k 1k  min  ,1  p s g2[ t −1] | ELSE except g1k p s g2* | ν 1 , s12 , π 1  = 1k 1k    1, otherwise;  According to Meuwissen et al. (2001), this ratio is further equal to: ( α s g2[t −1] , s g2* 1k 1k ) −1/2     1  V1k * exp  − y *−1k ' V1k *−1y *−1k      2   min  ,1  =  V [t −1] −1/2 exp  − 1 y * ' V [t −1] −1 y *    −1k − 1k  1k  1k    2     1, otherwise  ( ) Using results from Rohan Fernando, we could simplify the Metropolis acceptance ratio for sampling SNP specific variance for trait 1 further as follows: ( α s g2[t −1] , s g2* 1k 1k )     w12k  * −1/2   ( v1k ) exp  − *    2v1k  ,1 min    =  w12k   [ t −1] −1/2 exp v −  ( )      1k 2v1k [t −1]       1, otherwise Where  Ir11 Ir12   z.k g1k + e1  w1k =  z.' k φk z.' k   12  22    Ir Ir  φk z.k g1k + e 2   r11z g + r11e + r12φ z g +r12e  =  z.' k φk z.' k   12 .k 1k 12 1 22 k .k 1k 22 2   r z.k g1k + r e1 + r φk z.k g1k +r e 2  = ( r11 + 2r12φk + r 22φk2 ) z.' k z.k g1k + ( r11 + φk r12 ) z.' k e1 + ( r12 + φk r 22 ) z.' k e 2 and 227 = v1k var ( w1k ) = ( r11 + 2r12φk + r 22φk2 ) ( z.' k z.k ) s g21k + ( r11 + φk r12 ) ( z.' k z.k ) s e21 2 2 2 + ( r12 + φk r 22 ) ( z.' k z.k ) s e22 + ( r11 + φk r12 )( r12 + φk r 22 )( z.' 
k z.k ) s e12 2 2 n n 2 2 2 2 11 12 2 k k ik g1 k k =i 1 =i 1 = ( r + 2r φ + r φ 11 12 22 )  ∑ z    s  + (r + φ r )  ∑ z  2 ik  2  s e1  n 2   n  + ( r12 + φk r 22 )  ∑ zik2  s e22 + ( r11 + φk r12 )( r12 + φk r 22 )  ∑ zik2  s e12 =  i 1=  i1  2 If a non-zero value for s g1k is sampled, then one can draw samples of g1k using the same full conditionals as with CD-BayesA. If either g1[kt +1] or g1[kt ] are nonzero, residual for ( ) e1 − z.k g1k [ t +1] − g1k [ t ] and residual for trait 2 also trait 1 needs to be updated as e1 = ( ) e 2 − φk z.k g1k [ t +1] − g1k [ t ] immediately thereafter. needs to be updated as e 2 = To sample random SNP effects and variances for trait 2 conditional on trait 1 in CDBayesB, according to the collapsed sampling strategy (Liu 1994), we jointly sampled g 2|1k and s g2 2|1 k as adapted in Bayes B (Meuwissen et al. 2001). Let’s first sample from p (s g22|1k | ELSE except g 2 k ) = ∫ p (s 2 g 2|1k , g 2|1k | ELSE )dg 2|1k g2|1 k ∝  n  p ( yi1, yi 2 | β, g1k , g 2|1k , g −2|1k , R )  p ( g 2|1k | s g22|1k ) p (s g22|1k | ν 2 , s22 , π 2 ) dg 2|1k ∫g  ∏ i =1  2|1 k ∝ p (s g22|1k | ν 2 , s22 , π 2 ) V2|1k −1/2  1  exp  − y *−2|1k ' V2|−11k y *−2|1k   2  where y *−2|1k =  y1* ' y *2,−2|1k ' ' , y1* =y1 -X1β1 − Z1g1 , y *2,−2|1k= y 2 -X 2β 2 − ZΨg1 − Zg −2|1k and 0 V2|1k   [ 0 z.k ]s g2 2|1k + R ⊗ I =  z. k  228 Using that expression, the random walk Metropolis-Hastings acceptance ratio for ( ) sampling from p s g22|1k | ELSE except g 2|1k at MCMC cycle [t] based on using ( ) p s g22|1k | ν 2 , s22 , π 2 as the candidate density looks as follows: ( α s 2[ t −1] g2|1 k ,s 2* g2|1 k ) ( ( ) ( ) ( ) )   p s g2* | ELSE except g 2|1k p s g2[ t −1] | ν 2 , s22 , π 2  2|1 k 2|1 k  min  ,1 t − 2[ 1] 2* 2 =  p sg  | ELSE except g p s | ν , s , π g2|1 k 2|1k 2 2 2 2|1 k    1, otherwise;  According to Meuwissen et al. (2001), this ratio is further equal to: ( α s g2[t −1] , s g2* 2|1 k 2|1 k ) −1/2     1  exp  − y *−2|1k ' V2|1k *−1y *−2|1k  V2|1k *     2   min  ,1  =  V [t −1] −1/2 exp  − 1 y * ' V [t −1] −1 y *    2|1k −2|1k −2|1k   2|1k    2     1, otherwise  ( ) Using results from Rohan Fernando, we could simplify the Metropolis acceptance ratio for sampling SNP specific variance for trait 1 further as follows: ( α s g2[t −1] , s g2* 2|1 k 2|1 k ) 2     w2|1  −1/2 k   ( v2|1k * ) exp  − *   v 2   k 2|1   ,1 min  2 =     ( v [t −1] )−1/2 exp  − w2|1k     2v [t −1]    2|1k 2|1k       1, otherwise  Where 229 e1   Ir11 Ir12   w2|1k = 0 z.' k   12  22    Ir Ir   z.k g 2|1k + e 2   r11e + r12 z g +r12e  = 0 z.' k   12 1 22 .k 2|1k 22 2   r e1 + r z.k g 2|1k +r e 2  = r12 z.' k e1 +r 22 z.' k e 2 +r 22 z.' k z.k g 2|1k and v2|1k = var ( w2|1k ) = ( r12 ) ( z.' k z.k ) s e21 + ( r 22 ) ( z.' k z.k ) s e22 + r12 r 22 ( z.' k z.k ) s e12 + ( r 22 ) ( z.' k z.k ) s g21k 2 2 2 2 2 n n n  n 2 2 22 2  2  2 12 22  2  22 2  2  2 + r z s r z s r r z s r + + ( )  ∑ ik  e1 ( )  ∑ ik  e2  ∑ ik  e12 ( )  ∑ zik  s g1k =  i 1=   i 1=  i1=  i1  12 2 2 If a non-zero value for s g2|1k is sampled, then one can draw samples of g 2|1k using the [ t +1] [t ] same full conditionals as with CDBayesA. If either g 2|1k or g 2|1k are nonzero, residual ( ) e 2 − z.k g 2|1k [ t +1] − g 2|1k [ t ] immediately for trait 2 also needs to be updated as e 2 = thereafter. 
To sample random association parameters φk (k=1,2,…,m) in CDBayesB, given  N ( mφ , s φ2 ) with prob π φ the specified prior distribution, i.e. φk ~  , FCD on φk can be with prob 1- π φ 0 derived following Geweke (1994): If we define y *2,−φk = y 2 -X 2β 2 − ( ZΓ )− k φ− k − Zg 2|1 = z.gkφk + e 2 and z.gk = g1k z.k , then the ( )(  y *2,−φ − z.gkφk ' y *2,−φ − z.gkφk k k likelihood function kernel is : exp  − 2  2s e2  230 )  . Conditional on    y *2,−φk k ' y *2,−φk k φk = 0 , the value of the kernel is: exp  −  2s e22    . Conditional on φk ≠ 0 , the  corresponding kernel density is: ( )(  y *2,−φ − z.gkφk ' y *2,−φ − z.gkφk k k s φ exp  − 2  2s e2  −1 ( )  exp  − (φ k   )(  y *2,−φ − z.gkφk ' y *2,−φ − z.gkφk k k =s φ exp  −  2s e22  −1 ( )(  y *2,−φ − z.gkφk ' y *2,−φ − z.gkφk k k = s φ−1 exp  −  2s e22    2 − mφ )   2s φ2   )  exp  − (φ k     )  exp  − (φ 2ωk2 k     − φk − φˆk ) − (φ 2 k ) 2 2 φk 2v 2 − mφ )   2s φ2    2 2 ˆ2  exp  −  φk + mφ − φk   2ωk2 2s φ2 2vφ2k         where 2  2 n   2 2 −2 2  s e s φ mφ +g1k ∑ zik e2 j +  g1k ∑ ( zik )  φk  g1k ∑ ( z jk )  j =1 −2  2 = = i i 1 1    = + sφ and vφ k φˆk =   s e22  2 n 2 −2 2 + g z s s ( ) ∑ k ik e φ 1     i =1     n −1 n 2 2 To remove the conditioning on φk = 0 or on φk ≠ 0 , it is necessary to further integrate this expression over φk . This integration yields: ( )(  y *2,−φ − z.gkφk ' y *2,−φ − z.gkφk k k = exp  −  sφ 2s e22  vφk )  exp  −  φ   mφ 2 φˆk 2   + −    2ωk2 2s φ2 2vφ2k      2 k Thus, the conditional Bayes Factor in favor of φk ≠ 0 over φk = 0 is, 231 ( )(  y *2,−φ ' y *2,−φ − y *2,−φ − z.gkφk ' y *2,−φ − z.gkφk k k k k BFk exp  2  sφ 2s e2   φˆ 2 m 2  v = φ k exp  k 2 − φ 2   2v  sφ  φ k 2s φ  vφ k )  exp  −  φ mφ 2 φˆk 2 + −    2ωk2 2s φ2 2vφ2k   2 k        To draw φk from its conditional distribution, the conditional posterior probability that φk = 0 is computed from the conditional Bayes factor (BF) is pˆ k = 1 − πφ (1 − π φ ) + π φ BFk . Based on a comparison of this probability with a drawing from a Uniform(0,1), the ( 2 choice φk = 0 or φk ≠ 0 is made. If φk ≠ 0 , then a draw is made from a N φˆk , vφ k ) distribution. Immediately after sampling φk , if either φk[ t +1] or φk[ t ] are nonzero, we ( ) e 2 − g1k z.k φk [ t +1] − φk [ t ] . update residual for the second trait using e 2 = To sample proportion of SNP markers associated with non-zero SNP effects for ( ) trait 1 in CDBayesB, given a specified prior p π 1 | απ1 , βπ1 = Beta (απ1 , βπ1 ) , the FCD of π 1 is based on the following,  m  j =1  ( ) p (π 1 | ELSE ) ∝  ∏ p s g21 j |ν 1, s12 , π 1  p (π 1 | απ , βπ   = Let m1 ∑ I (s m j =1 2 g1 j 1 1 ) ) > 0 denote the number of non-zero values sampled in s g21 j at a particular MCMC cycle where I(.) denotes the indicator function. Then we can write p (π 1 | ELSE = ) beta(απ + m1, βπ + m − m1 ) . 1 1 232 To sample proportion of SNP markers associated with non-zero SNP effects for trait 2 conditional on trait 1 in CDBayesB, given a specified prior ( p π 2|1 | απ , βπ 2|1 2|1 ) = Beta(α  m  j =1 π 2|1 , βπ 2|1 ) , the FCD of π 2|1 is based on the following,  ) ( ( p (π 2|1 | ELSE ) ∝  ∏ p s g22|1 j |ν 2|1, s2|21, π 2|1  p π 2|1 | απ , βπ   = Let m2 ∑ I (s m j =1 2 g2|1 j 2|1 2|1 ) ) > 0 denote the number of non-zero values sampled in s g2 at a 2|1 j particular MCMC cycle where I(.) denotes the indicator function. 
To sample the proportion of non-zero association parameters between the two traits in CD-BayesB, given a specified prior $p\left(\pi_{\phi} \mid \alpha_{\phi}, \beta_{\phi}\right) = \text{Beta}\left(\alpha_{\phi}, \beta_{\phi}\right)$, the FCD of $\pi_{\phi}$ is based on the following:

$$p\left(\pi_{\phi} \mid ELSE\right) \propto \left[\prod_{k=1}^{m} p\left(\phi_k \mid m_{\phi}, \sigma^{2}_{\phi}, \pi_{\phi}\right)\right] p\left(\pi_{\phi} \mid \alpha_{\phi}, \beta_{\phi}\right).$$

Let $m_{\phi}^{+} = \sum_{k=1}^{m} I\left(\phi_k \neq 0\right)$ denote the number of non-zero values sampled for $\phi_k$ at a particular MCMC cycle, where $I(\cdot)$ denotes the indicator function. Then we can write $p\left(\pi_{\phi} \mid ELSE\right) = \text{Beta}\left(\alpha_{\phi} + m_{\phi}^{+}, \beta_{\phi} + m - m_{\phi}^{+}\right)$.

To sample the scale parameter $s_{1}^{2}$ for the random SNP effects of trait 1 in CD-BayesA/CD-BayesB, given a specified prior $p\left(s_{1}^{2} \mid \alpha_{1}, \beta_{1}\right) = \text{Gamma}\left(\alpha_{1}, \beta_{1}\right)$, we can write the FCD as follows:

$$p\left(s_{1}^{2} \mid ELSE\right) \propto \left[\prod_{j:\,\sigma^{2}_{g_{1j}}>0} p\left(\sigma^{2}_{g_{1j}} \mid \nu_{1}, s_{1}^{2}\right)\right] p\left(s_{1}^{2} \mid \alpha_{1}, \beta_{1}\right)$$
$$= \left[\prod_{j:\,\sigma^{2}_{g_{1j}}>0} \frac{\left(s_{1}^{2}\nu_{1}/2\right)^{\nu_{1}/2}}{\Gamma\left(\nu_{1}/2\right)}\left(\sigma^{2}_{g_{1j}}\right)^{-\left(\frac{\nu_{1}}{2}+1\right)}\exp\left(-\frac{s_{1}^{2}\nu_{1}}{2\sigma^{2}_{g_{1j}}}\right)\right]\left(s_{1}^{2}\right)^{\alpha_{1}-1}\exp\left(-\beta_{1}s_{1}^{2}\right)$$
$$\propto \left(s_{1}^{2}\right)^{\alpha_{1}+\frac{m_{1}\nu_{1}}{2}-1}\exp\left[-s_{1}^{2}\left(\frac{\nu_{1}}{2}\sum_{j:\,\sigma^{2}_{g_{1j}}>0}\sigma^{-2}_{g_{1j}}+\beta_{1}\right)\right].$$

That is, a gamma distribution with shape parameter $\alpha_{1} + \dfrac{m_{1}\nu_{1}}{2}$ and rate parameter $\dfrac{\nu_{1}}{2}\displaystyle\sum_{j:\,\sigma^{2}_{g_{1j}}>0}\sigma^{-2}_{g_{1j}} + \beta_{1}$.

To sample the scale parameter $s_{2|1}^{2}$ for the random SNP effects of trait 2 conditional on trait 1 in CD-BayesA/CD-BayesB, given a specified prior $p\left(s_{2|1}^{2} \mid \alpha_{2}, \beta_{2}\right) = \text{Gamma}\left(\alpha_{2}, \beta_{2}\right)$, we can write the FCD as follows:

$$p\left(s_{2|1}^{2} \mid ELSE\right) \propto \left[\prod_{j:\,\sigma^{2}_{g_{2|1j}}>0} p\left(\sigma^{2}_{g_{2|1j}} \mid \nu_{2|1}, s_{2|1}^{2}\right)\right] p\left(s_{2|1}^{2} \mid \alpha_{2}, \beta_{2}\right)$$
$$= \left[\prod_{j:\,\sigma^{2}_{g_{2|1j}}>0} \frac{\left(s_{2|1}^{2}\nu_{2|1}/2\right)^{\nu_{2|1}/2}}{\Gamma\left(\nu_{2|1}/2\right)}\left(\sigma^{2}_{g_{2|1j}}\right)^{-\left(\frac{\nu_{2|1}}{2}+1\right)}\exp\left(-\frac{s_{2|1}^{2}\nu_{2|1}}{2\sigma^{2}_{g_{2|1j}}}\right)\right]\left(s_{2|1}^{2}\right)^{\alpha_{2}-1}\exp\left(-\beta_{2}s_{2|1}^{2}\right)$$
$$\propto \left(s_{2|1}^{2}\right)^{\alpha_{2}+\frac{m_{2}\nu_{2|1}}{2}-1}\exp\left[-s_{2|1}^{2}\left(\frac{\nu_{2|1}}{2}\sum_{j:\,\sigma^{2}_{g_{2|1j}}>0}\sigma^{-2}_{g_{2|1j}}+\beta_{2}\right)\right].$$

That is, a gamma distribution with shape parameter $\alpha_{2} + \dfrac{m_{2}\nu_{2|1}}{2}$ and rate parameter $\dfrac{\nu_{2|1}}{2}\displaystyle\sum_{j:\,\sigma^{2}_{g_{2|1j}}>0}\sigma^{-2}_{g_{2|1j}} + \beta_{2}$.

To sample the degrees of freedom for the SNP effects on trait 1 in CD-BayesA/CD-BayesB, with a specified non-informative prior $p\left(\nu_{1}\right) \propto 1/\left(1+\nu_{1}\right)^{2}$, we can write the FCD for $\nu_{1}$ as follows:

$$p\left(\nu_{1} \mid ELSE\right) \propto \left[\prod_{j:\,\sigma^{2}_{g_{1j}}>0} \frac{\left(s_{1}^{2}\nu_{1}/2\right)^{\nu_{1}/2}}{\Gamma\left(\nu_{1}/2\right)}\left(\sigma^{2}_{g_{1j}}\right)^{-\left(\frac{\nu_{1}}{2}+1\right)}\exp\left(-\frac{s_{1}^{2}\nu_{1}}{2\sigma^{2}_{g_{1j}}}\right)\right] p\left(\nu_{1}\right).$$

We sampled $\nu_{1}$ using a random walk Metropolis-Hastings algorithm, as described in other non-genomic applications involving the sampling of degrees of freedom parameters (Kizilkaya & Tempelman 2005; Bello 2010).

To sample the degrees of freedom for the SNP effects on trait 2 conditional on trait 1 in CD-BayesA/CD-BayesB, with a specified non-informative prior $p\left(\nu_{2|1}\right) \propto 1/\left(1+\nu_{2|1}\right)^{2}$, we can write the FCD for $\nu_{2|1}$ as follows:

$$p\left(\nu_{2|1} \mid ELSE\right) \propto \left[\prod_{j:\,\sigma^{2}_{g_{2|1j}}>0} \frac{\left(s_{2|1}^{2}\nu_{2|1}/2\right)^{\nu_{2|1}/2}}{\Gamma\left(\nu_{2|1}/2\right)}\left(\sigma^{2}_{g_{2|1j}}\right)^{-\left(\frac{\nu_{2|1}}{2}+1\right)}\exp\left(-\frac{s_{2|1}^{2}\nu_{2|1}}{2\sigma^{2}_{g_{2|1j}}}\right)\right] p\left(\nu_{2|1}\right).$$

We sampled $\nu_{2|1}$ using a random walk Metropolis-Hastings algorithm, as described in other non-genomic applications involving the sampling of degrees of freedom parameters (Kizilkaya & Tempelman 2005; Bello 2010).
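For concreteness, the following Python sketch (again illustrative rather than the dissertation's implementation; the random-walk step size of 0.5 is an arbitrary assumption) implements the two trait-1 hyperparameter updates just described: the Gibbs draw of the scale parameter $s_{1}^{2}$ from its gamma full conditional and a random walk Metropolis-Hastings update of the degrees of freedom $\nu_{1}$ under the $p\left(\nu_{1}\right) \propto \left(1+\nu_{1}\right)^{-2}$ prior.

```python
import numpy as np
from scipy.special import gammaln

def sample_scale_s1sq(sig2_g1, nu1, alpha1, beta1, rng):
    """Gibbs draw of s1^2 from Gamma(alpha1 + m1*nu1/2, rate = nu1/2*sum(1/sig2) + beta1).
    sig2_g1 holds the current SNP-specific variances for trait 1 (zero if excluded)."""
    nz = sig2_g1[sig2_g1 > 0]
    shape = alpha1 + nz.size * nu1 / 2.0
    rate = (nu1 / 2.0) * np.sum(1.0 / nz) + beta1
    return rng.gamma(shape, 1.0 / rate)          # numpy parameterizes gamma by scale = 1/rate

def log_fcd_nu1(nu1, sig2_g1, s1_sq):
    """Log full conditional of nu1 (up to an additive constant)."""
    if nu1 <= 0.0:
        return -np.inf
    nz = sig2_g1[sig2_g1 > 0]
    half = nu1 / 2.0
    # scaled inverse chi-square kernel over the currently non-zero SNP variances
    log_lik = (nz.size * (half * np.log(s1_sq * half) - gammaln(half))
               - (half + 1.0) * np.sum(np.log(nz))
               - half * s1_sq * np.sum(1.0 / nz))
    log_prior = -2.0 * np.log(1.0 + nu1)         # p(nu1) proportional to (1 + nu1)^(-2)
    return log_lik + log_prior

def mh_update_nu1(nu1, sig2_g1, s1_sq, rng, step=0.5):
    """Random walk Metropolis-Hastings update for nu1 (step size is illustrative)."""
    cand = nu1 + rng.normal(0.0, step)
    log_ratio = log_fcd_nu1(cand, sig2_g1, s1_sq) - log_fcd_nu1(nu1, sig2_g1, s1_sq)
    return cand if np.log(rng.uniform()) < log_ratio else nu1
```

The updates for $s_{2|1}^{2}$ and $\nu_{2|1}$ have exactly the same form, with the trait-2-conditional variances and hyperparameters substituted.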
To sample the mean of the association parameters between the two traits in CD-BayesA/CD-BayesB, given a specified prior $m_{\phi} \sim N\left(\tau, \zeta^{2}\right)$, the FCD can be written as follows:

$$p\left(m_{\phi} \mid ELSE\right) \propto \left[\prod_{k:\,\phi_k \neq 0} p\left(\phi_k \mid m_{\phi}, \sigma^{2}_{\phi}\right)\right] p\left(m_{\phi} \mid \tau, \zeta^{2}\right)$$
$$\propto \exp\left(-\frac{1}{2\sigma^{2}_{\phi}}\sum_{k=1}^{m} I\left(\phi_k \neq 0\right)\left(\phi_k - m_{\phi}\right)^{2}\right)\exp\left(-\frac{\left(m_{\phi} - \tau\right)^{2}}{2\zeta^{2}}\right)$$
$$\propto \exp\left(-\frac{\left(m_{\phi} - \hat{m}_{\phi}\right)^{2}}{2\sigma^{2}_{m_{\phi}}}\right),$$

where

$$\hat{m}_{\phi} = \frac{m_{\phi}^{+}\bar{\phi}\,\zeta^{2} + \tau\sigma^{2}_{\phi}}{m_{\phi}^{+}\zeta^{2} + \sigma^{2}_{\phi}} \quad\text{for}\quad \bar{\phi} = \frac{1}{m_{\phi}^{+}}\sum_{k=1}^{m}\phi_k I\left(\phi_k \neq 0\right) \quad\text{and}\quad \sigma^{2}_{m_{\phi}} = \frac{\zeta^{2}\sigma^{2}_{\phi}}{m_{\phi}^{+}\zeta^{2} + \sigma^{2}_{\phi}}.$$

Hence, $m_{\phi} \mid ELSE \sim N\left(\hat{m}_{\phi}, \sigma^{2}_{m_{\phi}}\right)$.

To sample the variance of the association parameters between the two traits in CD-BayesA/CD-BayesB, given a specified prior $p\left(\sigma^{2}_{\phi}\right) \propto \left(\sigma^{2}_{\phi}\right)^{-1/2}$, the FCD for $\sigma^{2}_{\phi}$ can be written as follows:

$$p\left(\sigma^{2}_{\phi} \mid ELSE\right) \propto \left[\prod_{k:\,\phi_k \neq 0} p\left(\phi_k \mid m_{\phi}, \sigma^{2}_{\phi}\right)\right] p\left(\sigma^{2}_{\phi}\right)$$
$$\propto \left(2\pi\sigma^{2}_{\phi}\right)^{-m_{\phi}^{+}/2}\exp\left(-\frac{1}{2\sigma^{2}_{\phi}}\sum_{k=1}^{m} I\left(\phi_k \neq 0\right)\left(\phi_k - m_{\phi}\right)^{2}\right)\left(\sigma^{2}_{\phi}\right)^{-1/2}$$
$$\propto \left(\sigma^{2}_{\phi}\right)^{-\left(\frac{m_{\phi}^{+}-1}{2}+1\right)}\exp\left(-\frac{\sum_{k=1}^{m} I\left(\phi_k \neq 0\right)\left(\phi_k - m_{\phi}\right)^{2}}{2\sigma^{2}_{\phi}}\right).$$

Hence, $\sigma^{2}_{\phi} \mid ELSE \sim \chi^{-2}\left(\text{df} = m_{\phi}^{+} - 1,\ \text{scale} = \sum_{k=1}^{m} I\left(\phi_k \neq 0\right)\left(\phi_k - m_{\phi}\right)^{2}\right)$, a scaled inverse chi-square distribution.

To sample the residual variance-covariance matrix in CD-BayesA/CD-BayesB, given a specified inverted Wishart prior $p\left(\boldsymbol{\Sigma}_{e} \mid v_{0}, \boldsymbol{\Sigma}_{0}\right) = IW\left(v_{0}, \boldsymbol{\Sigma}_{0}\right)$, we can write the FCD for $\boldsymbol{\Sigma}_{e}$ as follows:

$$p\left(\boldsymbol{\Sigma}_{e} \mid ELSE, \mathbf{y}\right) \propto p\left(\mathbf{y} \mid \boldsymbol{\beta}, \mathbf{g}, \boldsymbol{\Sigma}_{e}\right) p\left(\boldsymbol{\Sigma}_{e} \mid v_{0}, \boldsymbol{\Sigma}_{0}\right)$$
$$\propto \left|\boldsymbol{\Sigma}_{e}\right|^{-\frac{n}{2}}\exp\left(-0.5\,\text{trace}\left(\boldsymbol{\Sigma}_{e}^{-1}\mathbf{S}_{e}\right)\right)\left|\boldsymbol{\Sigma}_{e}\right|^{-\frac{1}{2}\left(v_{0}+3\right)}\exp\left(-0.5\,\text{trace}\left(\boldsymbol{\Sigma}_{e}^{-1}\boldsymbol{\Sigma}_{0}^{-1}\right)\right)$$
$$\propto \left|\boldsymbol{\Sigma}_{e}\right|^{-\frac{1}{2}\left(n+v_{0}+3\right)}\exp\left(-0.5\,\text{trace}\left(\boldsymbol{\Sigma}_{e}^{-1}\left(\mathbf{S}_{e}+\boldsymbol{\Sigma}_{0}^{-1}\right)\right)\right),$$

where $\mathbf{S}_{e} = \begin{bmatrix}\mathbf{e}_{1}'\mathbf{e}_{1} & \mathbf{e}_{1}'\mathbf{e}_{2}\\ \mathbf{e}_{1}'\mathbf{e}_{2} & \mathbf{e}_{2}'\mathbf{e}_{2}\end{bmatrix}$. Hence, $\boldsymbol{\Sigma}_{e} \mid ELSE \sim IW\left(n + v_{0},\ \mathbf{S}_{e} + \boldsymbol{\Sigma}_{0}^{-1}\right)$.

D2 Supplementary tables and figures

Table D2.1: Summary of the response surface design (RSD) used in the LE simulation.

Controlling factors (fixed value):
  Heritability of Trait 1: 0.8
  Heritability of Trait 2: 0.1
  Residual covariance between the two traits: 0
  Number of SNPs: 2000
  Mean of the association parameters: 0.8
Investigated factors (levels):
  Number of animals: 2000, 4000, 6000
  Variance of the association parameters: 2e-3, 1.001, 2
  Number of QTL on Trait 1: 20, 310, 600
  Number of QTL on Trait 2: 20, 310, 600
  Number of QTL on both traits: 20, 310, 600

Table D2.2: P-values for the fixed effects obtained by fitting accuracy for the lower-heritability trait as the response variable in the LE simulation under the RSD.

  Fixed effect                                            P-value
  Number of animals (n)                                   <0.0001
  Number of QTL on Trait 2 (M2)                           <0.0001
  Number of QTL on both traits (M12)                      <0.0001
  Variance of the association parameters (sigma^2_phi)    0.0005
  M2*M12                                                  <0.0001
  M12*M12                                                 <0.0001
  M12*Method                                              <0.0001
  M12*M12*Method                                          0.0038

Figure D2.1: Estimated SNP effects for Rust_gall_vol and Rust_bin plotted against SNP index using the whole Pine data set, comparing CD-BayesA1 with CD-BayesA2 and CD-BayesB1 with CD-BayesB2. Panels A) and C) are for Rust_gall_vol; panels B) and D) are for Rust_bin.

BIBLIOGRAPHY

Abasht, B., E. Sandford, et al. (2009). "Extent and consistency of linkage disequilibrium and identification of DNA markers for production and egg quality traits in commercial layer chicken populations." BMC Genomics 10(Suppl. 2): S2. Badke, Y., R. Bates, et al. (2013). "Methods of tagSNP selection and other variables affecting imputation accuracy in swine." BMC Genetics 14(1): 8. Banerjee, S., B. S. Yandell, et al. (2008). "Bayesian quantitative trait loci mapping for multiple traits." Genetics 179(4): 2275-2289. Beerda, B., W. Ouweltjes, et al. (2007). "Effects of genotype by environment interactions on milk yield, energy balance, and protein balance." Journal of Dairy Science 90(1): 219-228. Bello, N. M., J. P. Steibel, et al.
(2010). "Hierarchical Bayesian modeling of random and residual variance-covariance matrices in bivariate mixed effects models." Biometrical Journal 52(3): 297-313. Berry, D. P., F. Buckley, et al. (2003). "Estimation of genotype X environment interactions, in a grassbased system, for milk yield, body condition score, and body weight using random regression models." Livestock Production Science 83(2-3): 191-203. Bohmanova, J., I. Misztal, et al. (2008). "Short communication: Genotype by environment interaction due to heat stress." Journal of Dairy Science 91(2): 840-846. Burgueno, J., G. de los Campos, et al. (2012). "Genomic Prediction of Breeding Values when Modeling Genotype x Environment Interaction using Pedigree and Dense Molecular Markers." Crop Science 52(2): 707-719. Calus, M. P. L., A. F. Groen, et al. (2002). "Genotype x environment interaction for protein yield in Dutch dairy cattle as quantified by different models." Journal of Dairy Science 85(11): 3115-3123. Calus, M. P. L., T. H. E. Meuwissen, et al. (2008). "Accuracy of genomic selection using different methods to define haplotypes." Genetics 178(1): 553-561. 242 Calus, M. P. L. and R. F. Veerkamp (2003). "Estimation of environmental sensitivity of genetic merit for milk production traits using a random regression model." Journal of Dairy Science 86(11): 3756-3764. Calus, M. P. L. and R. F. Veerkamp (2007). "Accuracy of breeding values when using and ignoring the polygenic effect in genomic breeding value estimation with a marker density of one SNP per cM." Journal of Animal Breeding and Genetics 124(6): 362-368. Calus, M. P. L. and R. F. Veerkamp (2011). "Accuracy of multi-trait genomic selection using different methods." Genetics Selection Evolution 43. Cardoso, F. F. and R. J. Tempelman (2012). "Linear reaction norm models for genetic merit prediction of Angus cattle under genotype by environment interaction." Journal of Animal Science 90(7): 2130-2141. Carlin, B. P. and T. A. Louis (2008). Bayesian Methods for Data Analysis. Boca Raton, FL, CRC Press. Chan, J. C. C. and I. Jeliazkov (2009). "MCMC Estimation of Restricted Covariance Matrices." Journal of Computational and Graphical Statistics 18(2): 457-480. Chib, S. and E. Greenberg (1995). "Understanding the Metropolis-Hastings algorithm." American Statistician 49(4): 327-335. Choi, I., J. P. Steibel, et al. (2010). "Application of alternative models to identify QTL for growth traits in an F-2 Duroc x Pietrain pig resource population." Bmc Genetics 11. Coster, A., J. W. M. Bastiaansen, et al. (2010). "QTLMAS 2009: simulated dataset." BMC Proceedings 4(Suppl 1): S3. Coster, A., J. W. M. Bastiaansen, et al. (2010). "Sensitivity of methods for estimating breeding values using genetic markers to the number of QTL and distribution of QTL variance." Genetics Selection Evolution 42: 9. Daetwyler, H. D. (2009). Genome-wide evaluation of populations. Wageningen, Netherlands, Wageningen Universiteit (Wageningen University). 243 Daetwyler, H. D., M. P. L. Calus, et al. (2013). "Genomic Prediction in Animals and Plants: Simulation of Data, Validation, Reporting, and Benchmarking." Genetics 193(2): 347-+. Daetwyler, H. D., R. Pong-Wong, et al. (2010). "The Impact of Genetic Architecture on Genome-Wide Evaluation Methods." Genetics 185(3): 1021-1031. Daniels, M. J. and M. Pourahmadi (2002). "Bayesian analysis of covariance matrices and dynamic models for longitudinal data." Biometrika 89(3): 553-566. De Donato, M., S. O. Peters, et al. (2013). 
"Genotyping-by-Sequencing (GBS): A Novel, Efficient and Cost-Effective Genotyping Method for Cattle Using Next-Generation Sequencing." PLoS ONE 8(5): e62137. de Jong, G. (1995). "Phenotypic plasticity as a product of selection in a varialbe environment." American Naturalist 145(4): 493-512. de los Campos, G. and D. Gianola (2007). "Factor analysis models for structuring covariance matrices of additive genetic effects: a Bayesian implementation." Genetics Selection Evolution 39(5): 481 - 494. de los Campos, G., D. Gianola, et al. (2010). "Predicting genetic predisposition in humans: the promise of whole-genome markers." Nature Reviews Genetics 11(12): 880886. de los Campos, G., J. M. Hickey, et al. (2012). "Whole Genome Regression and Prediction Methods Applied to Plant and Animal Breeding." Genetics. de los Campos, G., J. M. Hickey, et al. (2013). "Whole-Genome Regression and Prediction Methods Applied to Plant and Animal Breeding." Genetics 193(2): 327-+. de los Campos, G., H. Naya, et al. (2009). "Predicting Quantitative Traits With Regression Models for Dense Molecular Markers and Pedigree." Genetics 182(1): 375385. De Roos, A. P. W., B. J. Hayes, et al. (2008). "Linkage disequilibrium and persistence of phase in Holstein Friesian, Jersey and Angus cattle." Genetics 179: 1503 - 1512. 244 Deeb, N. and A. Cahaner (2001). "Genotype-by-environment interaction with broiler genotypes differing in growth rate. 1. The effects of high ambient temperature and nakedneck genotype on lines differing in genetic background." Poultry Science 80(6): 695-702. Du, F. X., A. C. Clutter, et al. (2007). "Characterizing linkage disequilibrium in pig populations." International Journal of Biological Sciences 3(3): 166-178. Duarte, J. L. G., R. O. Bates, et al. (2013). "Genotype imputation accuracy in a F2 pig population using high density and low density SNP panels." Bmc Genetics 14. Edwards, D. B., C. W. Ernst, et al. (2008). "Quantitative trait loci mapping in an F-2 Duroc x Pietrain resource population: I. Growth traits." Journal of Animal Science 86(2): 241-253. Falconer, D. S. (1952). "The problem of environment and selection." The American Naturalist 86: 293-298. Gelman, A. (2006). "Prior distributions for variance parameters in hierarchical models." Bayesian Analysis 1(3): 515-533. Gelman, A., J. B. Carlin, et al. (2003). Bayesian Data Analysis. Boca Raton, FL., CRC Press. Gianola, D. (2013). "Priors in whole-genome regression: the bayesian alphabet returns." Genetics 194(3): 573-596. Gianola, D., G. de los Campos, et al. (2009). "Additive Genetic Variability and the Bayesian Alphabet." Genetics 183(1): 347-363. Gianola, D., M. Perez-Enciso, et al. (2003). "On marker-assisted prediction of genetic value: Beyond the ridge." Genetics 163(1): 347-365. Gianola, D. and D. Sorensen (2004). "Quantitative Genetic Models for Describing Simultaneous and Recursive Relationships Between Phenotypes." Genetics 167(3): 14071424. Gianola, D., X.-L. Wu, et al. (2010). "A non-parametric mixture model for genomeenabled prediction of genetic value for a quantitative trait." Genetica 138(9): 959-977. 245 Gilmour, A. R., B. J. Gogel, et al. (2009). "2009 ASReml User Guide Release 3.0 " VSN International Ltd, Hemel Hempstead, HP1 1ES, UK. Goddard, M. E. and B. J. Hayes (2009). "Mapping genes for complex traits in domestic animals and their use in breeding programmes." Nature Reviews Genetics 10(6): 381-391. Goddard, M. E., N. R. Wray, et al. (2009). "Estimating Effects and Making Predictions from Genome-Wide Marker Data." 
Statistical Science 24(4): 517-529. Grapes, L., J. C. M. Dekkers, et al. (2004). "Comparing linkage disequilibrium-based methods for fine mapping quantitative trait loci." Genetics 166(3): 1561-1570. Habier, D., R. L. Fernando, et al. (2007). "The Impact of Genetic Relationship Information on Genome-Assisted Breeding Values." Genetics 177(4): 2389-2397. Habier, D., R. L. Fernando, et al. (2011). "Extension of the Bayesian alphabet for genomic selection." Bmc Bioinformatics 12: 186. Hadjipavlou, G. and S. C. Bishop (2009). "Age-dependent quantitative trait loci affecting growth traits in Scottish Blackface sheep." Animal Genetics 40(2): 165-175. Hayes, B. and M. E. Goddard (2001). "The distribution of the effects of genes affecting quantitative traits in livestock." Genetics Selection Evolution 33(3): 209-229. Hayes, B. J., P. J. Bowman, et al. (2009). "Invited review: Genomic selection in dairy cattle: Progress and challenges." J Dairy Sci 92(2): 433 - 443. Henderson, C. R. (1976). "A Simple Method for Computing the Inverse of a Numerator Relationship Matrix Used in Prediction of Breeding Values." Biometrics 32(1): 69-83. Henderson, C. R. (1984). Applications of Linear Models in Animal Breeding. Guelph, Canada, University of Guelph. Hickey, J. M. and G. Gorjanc (2012). "Simulated Data for Genomic Selection and Genome-Wide Association Studies Using a Combination of Coalescent and Gene Drop Methods." G3-Genes Genomes Genetics 2(4): 425-427. 246 Hill, W. G., M. E. Goddard, et al. (2008). "Data and Theory Point to Mainly Additive Genetic Variance for Complex Traits." PLoS Genet 4(2): e1000008. Hoggart, C. J., J. C. Whittaker, et al. (2008). "Simultaneous Analysis of All SNPs in Genome-Wide and Re-Sequencing Association Studies." PLoS Genetics 4(7): e1000130. Jarmila, B., M. Sargolzaei, et al. (2010). "Characteristics of linkage disequilibrium in North American Holsteins." BMC Genomics 11: 11. Jia, Y. and J. L. Jannink (2012). "Multiple-Trait Genomic Selection Methods Increase Genetic Value Prediction Accuracy." Genetics 192(4): 1513-+. Karkkainen, H. P. and M. J. Sillanpaa (2012). "Back to Basics for Bayesian Model Building in Genomic Selection." Genetics 112(139014). Kizilkaya, K., P. Carnier, et al. (2003). "Cumulative t-link threshold models for the genetic analysis of calving ease scores." Genetics Selection Evolution 35(6): 489 - 512. Kizilkaya, K. and R. J. Tempelman (2005). "A general approach to mixed effects modeling of residual variances in generalized linear mixed models." Genetics Selection Evolution 37(1): 31-56. Knap, P. W. and G. Su (2008). "Genotype by environment interaction for litter size in pigs as quantified by reaction norms analysis." Animal 2(12): 1742-1747. Lee, S. H., J. H. van der Werf, et al. (2008). "Predicting unobserved phenotypes for complex traits from whole-genome SNP data." PLoS Genet 4(10): e1000231. Legarra, A. and V. Ducrocq (2012). "Computational strategies for national integration of phenotypic, genomic, and pedigree data in a single-step best linear unbiased prediction." Journal of Dairy Science 95(8): 4629-4645. Legarra, A. and I. Misztal (2008). "Technical note: Computing strategies in genome-wide selection." Journal of Dairy Science 91(1): 360-366. Legarra, A., C. Robert-Granie, et al. (2008). "Performance of Genomic Selection in Mice." Genetics 180(1): 611-618. 247 Lillehammer, M., M. Arnyasi, et al. (2007). "A genome scan for quantitative trait locus by environment interactions for production traits." Journal of Dairy Science 90(7): 34823489. 
Lillehammer, M., M. E. Goddard, et al. (2008). "Quantitative trait locus-by-environment interaction for milk yield traits on Bos taurus autosome 6." Genetics 179(3): 1539-1546. Lillehammer, M., B. J. Hayes, et al. (2009). "Gene by environment interactions for production traits in Australian dairy cattle." Journal of Dairy Science 92(8): 4008-4017. Lillehammer, M., J. Odegard, et al. (2007). "Random regression models for detection of gene by environment interaction." Genetics Selection Evolution 39(2): 105-121. Liu, J. S. (1994). "The Collapsed Gibbs Sampler in Bayesian Computations with Applications to a Gene-Regulation Problem." Journal of the American Statistical Association 89(427): 958-966. Logsdon, B., G. Hoffman, et al. (2010). "A variational Bayes algorithm for fast and accurate multiple locus genome-wide association analysis." BMC Bioinformatics 11(1): 58. Lorenz, A. J., S. M. Chao, et al. (2011). Genomic Selection in Plant Breeding: Knowledge and Prospects. Advances in Agronomy, Vol 110. San Diego, Elsevier Academic Press Inc. 110: 77-123. Mattar, M., L. O. C. Silva, et al. (2011). "Genotype x environment interaction for longyearling weight in Canchim cattle quantified by reaction norm analysis." Journal of Animal Science 89(8): 2349-2355. Meuwissen, T. and M. Goddard (2010). "Accurate Prediction of Genetic Values for Complex Traits by Whole-Genome Resequencing." Genetics 185(2): 623-631. Meuwissen, T. H. E., B. J. Hayes, et al. (2001). "Prediction of total genetic value using genome-wide dense marker maps." Genetics 157(4): 1819-1829. Misztal, I., S. E. Aggrey, et al. (2013). "Experiences with a single-step genome evaluation." Poultry Science 92(9): 2530-2534. 248 Misztal, I., A. Legarra, et al. (2009). "Computing procedures for genetic evaluation including phenotypic, full pedigree, and genomic information." Journal of Dairy Science 92(9): 4648-4655. Muller, P. (1991). "A generic approach to posterior integration and Gibbs sampling. Technical Report 91-09." Retrieved March 12, 2013, from http://www.stat.purdue.edu/research/technical_reports/1991-tr.html. Munilla, S. and R. J. C. Cantet (2012). "Bayesian conjugate analysis using a generalized inverted Wishart distribution accounts for differential uncertainty among the genetic parameters - an application to the maternal animal model." Journal of Animal Breeding and Genetics 129(3): 173-187. Musani, S. K., H. G. Zhang, et al. (2006). "Principal component analysis of quantitative trait loci for immune response to adenovirus in mice." Hereditas 143(1): 189-197. Ntzoufras, I. (2011). Bayesian Modeling Using WinBugs, John Wiley & Sons. O'Hara, R. B. and M. J. Sillanpaa (2009). "A Review of Bayesian Variable Selection Methods: What, How and Which." Bayesian Analysis 4(1): 85-117. Pinheiro, J. C., C. H. Liu, et al. (2001). "Efficient algorithms for robust estimation in linear mixed-effects models using the multivariate t distribution." Journal of Computational and Graphical Statistics 10(2): 249-276. Plummer, M., N. Best, et al. (2006). "CODA: convergence diagnostics and output analysis for MCMC." R News 6(1): 7-11. Pourahmadi, M. (1999). "Joint mean-covariance models with applications to longitudinal data: Unconstrained parameterisation." Biometrika 86(3): 677-690. Resende, M. F. R., P. Munoz, et al. (2012). "Accuracy of Genomic Selection Methods in a Standard Data Set of Loblolly Pine (Pinus taeda L.)." Genetics 190(4): 1503-+. Riedelsheimer, C., F. Technow, et al. (2012). 
"Comparison of whole-genome prediction models for traits with contrasting genetic architecture in a diversity panel of maize inbred lines." BMC Genomics 13(1): 452. 249 Shariati, M. and D. Sorensen (2008). "Efficiency of alternative MCMC strategies illustrated using the reaction norm model." Journal of Animal Breeding and Genetics 125(3): 176-186. Shepherd, R. K., T. H. Meuwissen, et al. (2010). "Genomic selection and complex trait prediction using a fast EM algorithm applied to genome-wide markers." BMC Bioinformatics 11: 529. Sorensen, D. and D. Gianola (2002). Likelihood, Bayesian, and MCMC methods in quantitative genetics. New York, Springer-Verlag. Strandén, I. and D. J. Garrick (2009). "Technical note: Derivation of equivalent computing algorithms for genomic predictions and reliabilities of animal merit." Journal of Dairy Science 92(6): 2971-2975. Streit, M., F. Reinhardt, et al. (2012). "Reaction norms and genotype-by-environment interaction in the German Holstein dairy cattle." Journal of Animal Breeding and Genetics 129(5): 380-389. Streit, M., R. Wellmann, et al. (2013). "Using Genome-Wide Association Analysis to Characterize Environmental Sensitivity of Milk Traits in Dairy Cattle." G3-Genes Genomes Genetics 3(7): 1085-1093. Su, G., P. Madsen, et al. (2006). "Bayesian analysis of the linear reaction norm model with unknown covariates." Journal of Animal Science 84(7): 1651-1657. Technow, F. (2012). "R Package Hypred: Simulation of Genomic Data in Applied Genetics." Technow, F. and A. Melchinger (2013). "Genomic prediction of dichotomous traits with Bayesian logistic models." Theoretical and Applied Genetics: 1-11. Technow, F., C. Riedelsheimer, et al. (2012). "Genomic prediction of hybrid performance in maize with models incorporating dominance and population specific marker effects." Theoretical and Applied Genetics 125(6): 1181-1194. Valdar, W., L. C. Solberg, et al. (2006). "Genome-wide genetic association of complex traits in heterogeneous stock mice." Nature Genetics 38(8): 879-887. 250 Valdar, W., L. C. Solberg, et al. (2006). "Genetic and Environmental Effects on Complex Traits in Mice." Genetics 174(2): 959-984. van Binsbergen, R., R. F. Veerkamp, et al. (2012). "Makeup of the genetic correlation between milk production traits using genome-wide single nucleotide polymorphism information." Journal of Dairy Science 95(4): 2132-2143. VanRaden, P. M. (2008). "Efficient Methods to Compute Genomic Predictions." Journal of Dairy Science 91(11): 4414-4423. Vaughan, L. K., J. Divers, et al. (2009). "The use of plasmodes as a supplement to simulations: A simple example evaluating individual admixture estimation methodologies." Computational Statistics & Data Analysis 53(5): 1755-1766. Vazquez, A. I., G. de los Campos, et al. (2012). "A Comprehensive Genetic Approach for Improving Prediction of Skin Cancer Risk in Humans." Genetics 192(4): 1493-1502. Verbyla, K. L., B. J. Hayes, et al. (2009). "Accuracy of genomic selection using stochastic search variable selection in Australian Holstein Friesian dairy cattle." Genetics Research 91(5): 307-311. Vichi, M. and G. Saporta (2009). "Clustering and disjoint principal component analysis." Computational Statistics & Data Analysis 53(8): 3194-3208. Villumsen, T. M., L. Janss, et al. (2008). "The importance of haplotype length and heritability using genomic selection in dairy cattle." Journal of Animal Breeding and Genetics 126: 3-13. Waagepetersen, R., N. Ibanez-Escriche, et al. (2008). 
"A comparison of strategies for Markov chain Monte Carlo computation in quantitative genetics." Genetics Selection Evolution 40(2): 161-176. Wang, C. L., X. D. Ding, et al. (2013). "Bayesian methods for estimating GEBVs of threshold traits." Heredity 110(3): 213-219. Wang, C. S., J. J. Rutledge, et al. (1994). "Bayesian-Analysis of Mixed Linear-Models Via Gibbs Sampling with an Application to Litter Size in Iberian Pigs." Genetics Selection Evolution 26(2): 91-115. 251 Wang, H., I. Misztal, et al. (2012). "Genome-wide association mapping including phenotypes from relatives without genotypes." Genetics Research 94(2): 73-83. Weller, J. I., G. R. Wiggans, et al. (1996). "Application of a canonical transformation to detection of quantitative trait loci with the aid of genetic markers in a multi-trait experiment." Theoretical and Applied Genetics 92(8): 998-1002. Wiggans, G. R., P. M. VanRaden, et al. (2011). "The genomic evaluation system in the United States: Past, present, future." Journal of Dairy Science 94(6): 3202-3211. Wimmer, V., C. Lehermeier, et al. (2013). "Genome-Wide Prediction of Traits with Different Genetic Architecture Through Efficient Variable Selection." Genetics 195(2): 573-+. Yang, W. and R. J. Tempelman (2012). "A Bayesian antedependence model for whole genome prediction." Genetics 190(4): 1491-1501. Yi, N. and S. Xu (2008). "Bayesian Lasso for quantitative trait loci mapping." Genetics 179(2): 1045 - 1055. Zhu, W. S. and H. P. Zhang (2009). "Rejoinder: Why do we test multiple traits in genetic association studies?" Journal of the Korean Statistical Society 38(1): 25-27. Zimmerman, D. L. and V. A. Núñez-Antón (2010). Antedependence Models for Longitudinal Data, Chapman and Hall/CRC. 252