Ix... . .1“.
......1... ...
gain...“

. .
i

. ....rl
a L

t

7

 

 

 

 

. I.
. . ...pS-ﬂﬂuu-EHVIWd‘IIP
. . ... n..n..!.¢\4..t1x.. :0”
g . .nz. v.8...m1w
..

.. .u z 1.. ....
a... .
1 gray?
5:50.! I:
\}

... -s‘:
1 ..w.al?.‘\.
. av]... .

x
.16. \h

 

This is to certify that the

dissertation entitled

. HIERARCHICAL BAYESIAN THRESHOLD MODELS APPLIED TO THE
QUANTITATIVE GENETIC ANALYSIS OF CALVIN; EASE SCORES
IN ITALIAN PIEMONTESE CATTLE

presented by
Kadir Kizilkaya

has been accepted towards fulﬁllment
of the requirements for

 

 

Phone degreein Mina]. SCience

IGMJ' We“

Major professor I i

Date 87/L'Ql/02— r

 

MS U is an Afﬁrmative Action/Equal Opportunity Institution 0- 12771

 

LIBRARY
Michigan State
University

 

 

 

PLACE IN RETURN Box to remove this checkout from your record.
TO AVOID FINES return on or before date due.
MAY BE RECALLED with earlier due date if requested.

 

DATE DUE DATE DUE DATE DUE

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

6/01 cJCIRC/Dateouopss-sz

HIERARCHICAL BAYESIAN THRESHOLD MODELS APPLIED TO THE
QUANTITATIVE GENETIC ANALYSIS OF CALVING EASE SCORES IN
ITALIAN PIEMONTESE CATTLE

By

Kadir Kizilkaya

A DISSERTATION

Submitted to
Michigan State University
in partial fulﬁllment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Department of Animal Science

2002

ABSTRACT

HIERARCHICAL BAYESIAN THRESHOLD MODELS APPLIED TO THE
QUANTITATIVE GENETIC ANALYSIS OF CALVING EASE SCORES IN
ITALIAN PIEMONTESE CATTLE
By

Kadir Kizilkaya

First parity calving ease scores for Italian Piemontese cattle were analyzed using
sire and maternal grandsire threshold models. Genetic parameters were estimated by
approximate marginal maximum likelihood (MML) methods such as the historically
popular expectation-maximization method and Laplace's method in addition to
inferentially exact Markov Chain Monte Carlo (MCMC) methods. Laplacian MML and
MCMC point estimates of variance components and direct and maternal heritabilities
were seen to be statistically signiﬁcant with measures of uncertainty that were virtually
identical to each other. Furthermore, the joint modal estimates of sire effects and
associated standard errors conditioned on MML estimates of variance and covariance
components were seen to differ little from the respective posterior means and standard
deviations derived from MCMC. These results suggest that there may be little need to
consider computationally intensive MCMC methods for national breed genetic
evaluations derived from large calving ease datasets in cattle production industries.

A heavier tailed Student t residual distribution may be speciﬁed as an alternative
to the normal distribution for modeling underlying liability variables that characterize a

threshold probit-link model. Both t—link and probit link models were applied to simulated

data sets characterized by various degrees of freedom speciﬁcations for t-error on the
liability variables. Model choice, using either the deviance information criteria (DIC) or
the log marginal likelihood (LML), was generally correctly assigned in all cases, whether
for the direct linear model analysis of liability variables or for the threshold model
analysis of the corresponding ordinal data. The threshold t-link sire maternal-grandsire
model was found to better ﬁt Italian Piemontese calving ease data compared to the
regular threshold cumulative probit link model; nevertheless, the rank correlation on
posterior means of breeding values between a threshold-t and regular threshold model
analyses exceeded 0.98.

A hierarchical generalized linear mixed model based on a structural multifactorial
model with ﬁxed and random effects that multiplicatively inﬂuence residual
heteroskedasticity was developed. Validation of models and MCMC algorithms were
based on simulated normal and ordinal categorical data with heteroskedastic residual
structures. Simulation results indicated that DIC and LML were useful in correctly
choosing between heteroskedastic and homoskedastic models. The residual variance for
male calves was signiﬁcantly greater than that for female calves and signiﬁcant residual
heteroskedasticity existed across herds, whether for linear model analyses of birth
weights or for threshold model analysis of calving ease scores. However, the high
correlation between posterior mean of sire effects from heteroskedastic and
homoskedastic models indicated that there was no signiﬁcant rerankings of sire genetic

merit between the two models.

ACKNOWLEDGEMENTS

I would like to express my sincere gratitude to Dr. Robert Tempelman for his guidance,
encouragement, permanent support and patience during these past ﬁve wonderful years as
a doctoral student. Appreciation is also extended to Drs. Dennis Banks, Catherine Ernst,
Joseph Gardiner and Frank Lupi for their encouragement and for serving on the guidance
committee. Also I would like to express my sincere thanks to Department of Animal
Science for kindly offering me an assistantship during my PhD program. I wish to thank
Dr. Guilherme Rosa for his kindly help and advice during preparation of my dissertation.
I greatly appreciate the logistical assistance provided by Dr. Peter Saarna in Animal
Breeding and Genetics Group Quantitative Genetics Lab. I also would like to thank my
fellow graduate students Fernando Cardoso, David Edwards, Hugo Roman in Animal
Science; Cengiz Caner in Food Packaging; Savas Berber in Physics and Astronomy and
Dr. Jose Sanchez, Jeff Smeenk and Dr. Mohammed Khan in Crop and Soil Science for
their friendship, encouragement and always being there to help me. I wish to thank Ersel
and Sule Catal for friendship and support. I am thankful to Dr. Richard Harwood and the
CS. Mott Foundation for the ﬁnancial support in the last term of my program. I also
would like to thank Collage of Agriculture and Natural Resources (CANR) for their
support through the Dissertation Completion Fellowship. I am thankful to Adnan
Menderes University in Aydin, Turkey for their understanding and help. Finally I would
like to thank my wife, Aysegul, my daughter, Mervenur and my son, Ismail Can, for their

incomparable support and understanding.

iv

PREFACE
Chapter 1 and Chapter 2 in this dissertation were written in the style required for
publication in the Journal of Animal Breeding and Genetics. Chapter 3 and Chapter 4

were written in the style required for publication in the Genetics Selection Evolution.

TABLE OF CONTENTS

 

 

 

 

LIST OF TABLES ..... - - - - - -- -- - - --...VIII
LIST OF FIGURES ..... -- - - - x
INTRODUCTION _ - - -- 1
REFERENCES ...................................................................................................... 5
CHAPTER 1: LITERATURE REVIEW - - -- -- 8
INTRODUCTION ................................................................................................. 8
STATISTICAL MODELS FOR GENETIC EVALUATION OF CALVING
EASE .................................................................................................................... 12
BAYESIAN INFERENCE ................................................................................. 15
VARIANCE COMPONENT ESTIMATION ................................................... 17
ROBUST MIXED LINEAR MODELS ............................................................ 21
HETEROSKEDASTIC MIXED EFFECTS MODELS .................................. 26

CHAPTER 2: BAYESIAN INFERENCE STRATEGIES FOR THE PREDICTION
OF GENETIC MERIT USING THRESHOLD MODELS WITH AN
APPLICATION TO CALVING EASE SCORES IN ITALIAN PIEMONTESE

 

CATTLE - - - - 42
ABSTRACT ......................................................................................................... 42
INTRODUCTION ............................................................................................... 43
MATERIALS AND METHODS ....................................................................... 45

Data .............................................................................................................. 45
Threshold Model ......................................................................................... 47
MML Using the EM Approach .................................................................. 48
Laplace's Method ........................................................................................ 50
Markov Chain Monte Carlo (MCMC) ..................................................... 52
Inference on Heritabilities and Genetic Correlations .............................. 54
RESULTS AND DISCUSSION ......................................................................... 54
Assessment of MCMC Convergence and Mixing .................................... 54
Variance Component Inference ................................................................. 55
Heritabilities and Genetic Correlations .................................................... 57
Comparison of Inferences on Sire Effects ................................................ 58
CONCLUSIONS ................................................................................................. 60
REFERENCES .................................................................................................... 61

CHAPTER 3: CHAPTER 3: AN ASSESSMENT OF CUMULATIVE t-LINK
THRESHOLD MODELS FOR THE QUANTITATIVE GENETICS ANALYSIS

 

OF CALVING EASE SCORES - 67
ABSTRACT ......................................................................................................... 67
INTRODUCTION ............................................................................................... 68
MODEL CONSTRUCTION .............................................................................. 70

vi

MODEL COMPARISON ................................................................................... 75

DATA ................................................................................................................... 78

Simulation Study ......................................................................................... 78

Italian Piemontese Calving Ease Data ...................................................... 79

RESULTS ............................................................................................................ 82

Simulation Study ......................................................................................... 82

Application to Calving Ease Scores in Italian Piemontese Cattle .......... 85

Genetic Parameter Inference ............................................................. 85

Inferences on Sire Effects ................................................................... 87

DISCUSSION ...................................................................................................... 88

REFERENCES .................................................................................................... 93
CHAPTER 4: BAYESIAN STRUCTURAL MODELING OF HETEROGENEOUS

RESIDUAL VARIANCES IN GENERALIZED LINEAR MIXED MODELS ..... 104

ABSTRACT ....................................................................................................... 104

INTRODUCTION ............................................................................................. 105

MATERIALS and METHODS ....................................................................... 108
HIERARCHICAL CONSTRUCTION OF THE HETEROSKEDASTIC

GENERALIZED LINEAR MIXED MODEL ....................................... 108

MODEL COMPARISON ......................................................................... 116

DATA ......................................................................................................... 117

Simulation Study ............................................................................... 117

Italian Piemontese Birth Weight and Calving Ease Data ............. 119

RESULTS .......................................................................................................... 122

Simulation Study ....................................................................................... 122

Application to Birth Weight and Calving Ease Scores in Italian

Piemontese Cattle ...................................................................................... 125

Genetic Parameter Inference ........................................................... 125

Inference on Sire Effects .................................................................. 127

DISCUSSION .................................................................................................... 128

REFERENCES .................................................................................................. 131

CONCLUSIONS AND RECOMMENDATIONS - 147

 

vii

LIST OF TABLES

CHAPTER 1: LITERATURE REVIEW
Table I. Itemized dystocia costs (Albera et al. 1999). - 38

Table 11. Direct heritability (hzd), maternal heritability (hzm), and direct-maternal

genetic correlation (rg) estimates :1: standard errors, where available, for calving ease
score.---- - - 39

CHAPTER 2: BAYESIAN INFERENCE STRATEGIES FOR THE PREDICTION
OF GENETIC MERIT USING THRESHOLD MODELS WITH AN
APPLICATION TO CALVING EASE SCORES IN ITALIAN PIEMONTESE
CATTLE

Table 1. Effective sample sizes for (co)variance components and threshold
parameters using MCMC for ﬁrst parity calving ease scores in Piemontese cattle. 65

Table II. Estimation of variance-covariance components and genetic parameters for
calving difﬁculty in Piemontese cattle using EM, Laplacian, and MCMC algorithms.
66

 

CHAPTER 3: AN ASSESSMENT OF CUMULATIVE t-LINK THRESHOLD
MODELS FOR THE QUANTITATIVE GENETICS ANALYSIS OF CALVING
EASE SCORES

Table I. Posterior inference on the residual degrees of freedom parameter (v) in
simulation study using the cumulative t-link model. 97

Table II. Posterior inference on sire variance (of) in simulation study using the
cumulative t-link model. 98

Table III. Deviance information criteria (DIC) and log marginal likelihood (LML)
comparisons between models in simulation study. 99

Table IV. Posterior inference on genetic parameters of calving ease scores in Italian
Piemontese cattle. -- - 1 - - 1- 100

Table V. MCMC chain-speciﬁc deviance information criterion (DIC) and log

marginal likelihood (LML) values for the analysis of calving ease scores in Italian
Piemontese cattle. - _ - 101

CHAPTER 4: BAYESIAN STRUCTURAL MODELING OF HETEROGENEOUS
RESIDUAL VARIANCES IN GENERALIZED LINEAR MIXED MODELS

Table I. Posterior inference on the dispersion and heterogeneity parameters in
simulation study using linear mixed effects model. 135

viii

Table II. Posterior inference on the dispersion and heterogeneity parameters in
simulation study using cumulative probit mixed model- - 136

Table III. Posterior inference on variance components of birth weight and calving
ease scores in Italian Piemontese cattle. - 137

Table IV. Posterior inference on genetic parameters of birth weight and calving
ease scores in Italian Piemontese cattle. - 138

Table V. MCMC chain-speciﬁc deviance information criterion (DIC) and log

marginal likelihood (LML) values for the analysis of birth weight and calving ease
scores in Italian Piemontese cattle. 139

ix

LIST OF FIGURES

CHAPTER 1: LITERATURE REVIEW

Figure l. A flowchart interaction of direct additive and maternal genetic effects with
other factors in terms of their effect on calving difﬁculty (Burfening 1991) ............. 40

Figure 2. Mapping of underlying liabilities to observed calving ease phenotypes.
Thresholds 1, 2, and 3 delimit normally distributed underlying liabilities into 4
calving ease phenotypes (unassisted calving, assisted easy calving, assisted difﬁcult
calving, and Caesarean and embryotomy). - 41

CHAPTER 3: AN ASSESSMENT OF CUMULATIVE t-LINK THRESHOLD
MODELS FOR THE QUANTITATIVE GENETICS ANALYSIS OF CALVING
EASE SCORES

Figure l. Scatterplot of posterior means of sire progeny differences based on
cumulative t-link versus cumulative probit link model with superimposed line of
best ﬁt.- 102

Figure 2. Scatterplot of posterior standard deviations of sire progeny differences
based on cumulative t-link versus cumulative probit link model with superimposed
line of best fit. 103

CHAPTER 4: BAYESIAN STRUCTURAL MODELING OF HETEROGENEOUS
RESIDUAL VARIANCES IN GENERALIZED LINEAR MIXED MODELS

 

 

Figure 1. DIC differences in simulation study - - 140
Figure 2. Approximate 95% PPI on herd-speciﬁc delta for BW. 141
Figure 3. Approximate 95% PPI on herd-specific delta for CD 142

 

Figure 4. Scatterplot of posterior means of sire progeny differences for BW based on
homoskedastic versus heteroskedastic LMM model. 143

Figure 5. Scatterplot of posterior standard deviations of sire progeny differences for
BW based on homoskedastic versus heteroskedastic LMM model. 144

Figure 6. Scatterplot of posterior means of sire progeny differences for CD based on
homoskedastic versus heteroskedastic CPM model _ -_ _ 145

Figure 7. Scatterplot of posterior standard deviations of sire progeny differences for
CD based on homoskedastic versus heteroskedastic CPM model. 146

INTRODUCTION

Proﬁtable cattle production requires careful cost control. A herd's cost of
production is strongly associated with its general level of ﬁtness and fertility (Banos
1999). Consequently, over the last two decades seedstock selection emphasis has been
increasingly directed towards ﬁtness traits such as conformation scores, calving ease, and
stillbirth incidences.

Calving ease, or conversely calving difﬁculty, is particularly important for beef
and dairy cattle production since calving problems, compared to successful unassisted
calvings, generate additional costs well beyond parturition. Costs are immediately
incurred, of course, as veterinary fees, increased labor costs, and potential loss of calf, but
long term losses also accrue due to subsequent declines in dam health and fertility,
directly resulting in reduced long-term productivity (Albera et a1. 1999). Although all
breeds experience calving difficulty to various degrees, incidence rates are particularly
high in the heavy muscled Italian Piemontese breed (Carnier et al. 2000) relative to
reported estimates on the continental and British breeds (V arona et al. 1999; Bennett and
Gregory 2001). In the Gennplasm Evaluation Program at the Roman L. Hruska U.S.
Meat Animal Research Center (MARC) in Clay Center, NE (Wheeler et al. 1996) the
Piemontese breed has been concluded to be the only breed out of 10 studied to “provide
the greatest opportunity to produce, lean tender meat”. Hence the importance of
controlling calving difﬁculty in this breed cannot be understated. One strategy for
improving calving ease is through genetic selection.

Genetic evaluation systems for normally distributed production characters have

been based on well-characterized linear mixed models (Henderson 1973). Breeding sires

are typically ranked by best linear unbiased predictions (BLUP) or, equivalently,
empirical Bayes estimates of their corresponding random effects. Furthermore, variance
component and derivative heritabilities are typically estimated using restricted maximum
likelihood (REML), which has desirable properties relative to other methods (Mc Culloch
and Searle 2001). Calving ease, however, is generally recorded as a 4 or 5 level ordinal
category trait, such that normality assumptions are clearly violated. Abdel-Azim and
Berger (1998) and Luo et al. (2001) have demonstrated that REML estimates of variance
components based on linear mixed model analysis of categorical data yield biased
estimates of heritabilities and genetic correlations.

A suitable model for genetic analysis of categorical data is the cumulative probit
or threshold model adapted by Gianola and Foulley (1983) and Harville and Mee (1984).
However, the proposed inference procedures rely upon rather strong asymptotic or
approximate assmnptions. Variance component estimation has been historically based on
a joint maximization of the marginal posterior density of the variance components using
an approximate invocation of the expectation-maximization (EM) algorithm (Harville
and Mee 1984; Stiratelli et al. 1984). However, in various simulation tests of this
method, some signiﬁcant biases in heritability estimates were found (Hoeschele et al.
1987, Simianer and Schaeffer 1989). Furthermore, under the threshold model, breeding
sire merit is typically estimated using elements of the joint posterior modes of these
random effects, conditionally on estimated variance components. These joint posterior
modes are intended to approximate the posterior means, shown to be optimal selection
criteria by Fernando and Gianola (1986). Whether or not those approximations work well

as intended for breed genetic evaluations is not clearly known.

Markov Chain Monte Carlo (MCMC) methods have facilitated exact sample
Bayesian inference for many applications in animal breeding, including inference on
categorical traits. In MCMC, inference on parameters is based on random but correlated
draws from posterior distributions. Earlier studies have highlighted MCMC analysis for
genetic parameter estimation in production traits (Wang et al. 1994a,b; Luo et al. 1999;
Varona et al. 1999). The superiority of MCMC inference over, for example, approximate
EM-based inference on variance components for categorical data under conventional
threshold models has been demonstrated in simulation studies (Hoeschele and Tier 1995).
MCMC allows additional modeling possibilities for quantitative genetic analyses of
calving ease that have not yet been fully studied and exploited.

For example, it has been noted that preferential treatment may be partly
responsible for the records of some high producing animals. Stranden and Gianola
(1999) developed hierarchical Bayesian models based on a Student t distributed error
structure to provide outlier-robustness on genetic evaluations. Calving ease is often
subjectively scored by herdspersons on an ordinal scale such that data quality might
conceptually be a greater issue here than for continuously distributed production
characters. Albert and Chib (1993) proposed a cumulative I link model providing greater
modeling ﬂexibility relative to cumulative probit model for the analysis of ordinal
categorical data. In animal breeding applications, a cumulative I link model might be
considered to be an outlier-robust model to minimize the impact of outlying records on
genetic merit predictions for calving ease on breeding sires.

As another example, many studies have indicated that residual variances are

heterogeneous for normally distributed production characters across herds, sexes, and

other factors (Hill et al. 1983; lbanez et al. 1999). If this phenomenon is not properly
taken into account (i.e. a homoskedastic error distribution is assumed), differences in
within-subclass variances result in biased breeding value predictions such that a
disproportionate numbers of animals may be selected from environments characterized
by high variability. This bias has the potential effect of substantially reducing genetic
progress due to breedstock selection based on these predictions (Hill 1984; Weigel and
Gianola 1992).

A structural model for heterogeneity of residual variance on a conceptual
underlying scale has been developed for the threshold model by Foulley and Gianola
(1996). However, this model, as well as a subsequent model by Jaﬁiezic et al. (1999),
invokes analytical approximations, which appear tenuous, particularly for the analysis of
categorical data. Exact MCMC inference on residual heteroskedasticity is conceptually
possible but requires development and testing.

Cumulative t—link and heterogeneous variance models need to be tested as
alternatives to conventional threshold models used currently by the beef cattle industry
for genetic evaluation of breeding sires for calving ease of their daughters as dams and
progeny as calves. A hierarchical Bayesian framework seems to be suitable for the

construction of these models, using MCMC for statistical inference.

REFERENCES

Abdel-Azim, G. A.; Berger, P. J ., 1999: Properties of threshold model predictions. J.
Anim. Sci. 77: 582-590.

Albera, A.; Carnier, P.; Groen, A. F., 1999: Breeding for improved calving performance
in Piemontese cattle economic value. Proceedings international workshop on EU
concerted action genetic improvement of functional traits in cattle (GIFT);
breeding goals and selection schemes. Bulletin no:23. Wageningen, The
Netherlands 7-9th November, 1999.

Albert, J. H.; Chib, S., 1993: Bayesian analysis of binary and polychotomous response
data. J. Am. Stat. Assoc. 88: 669-679.

Banos, G., 1999: From research to application: A summary of scientiﬁc developments
and possible implementation to the genetic improvement for functional traits.
Bulletin no:23. Wageningen, The Netherlands 7-9“1 November, 1999.

Bennett, G. L.; Gregory, K. E., 2001: Genetic (co)variances for calving difficulty score in
composite and parental populations of beef cattle: I. Calving difficulty score, birth
weight, weaning weight, and postweaning gain. J. Anim. Sci. 79: 45-51.

Carnier, P.; Albera, A.; Dal Zotto, R.; Groen, A. F.; Bona, M.; Bittante, G., 2000: Genetic
parameters for direct and maternal calving performance over parities in Piemontese
cattle. J. Anim. Sci. 78: 2532-2539.

Fernando, R.L.; Gianola, D., 1986: Optimal properties of the conditional mean as a
selection criterion. Theor. Appl. Genet. 72: 822-825.

Foulley, J. L.; Gianola, D., 1996: Statistical analysis of ordered categorical data via a
structural heteroskedastic threshold model. Genet. Sel. Evol. 28: 249-273.

Gianola, D.; Foulley, J. L., 1983: Sire evaluation for ordered categorical data with a
threshold model. Genet. Sel. Evol. 15 :201-224.

Harville, D. A.; Mee, R. W., 1984: A mixed-model procedure for analyzing ordered
categorical data. Biometrics 40: 393-408.

Henderson, C.R., 1973: Sire evaluation and genetic trends. In: Proc. Anim. Breed.

Genet. Symp. in Honor of Dr. J.L. Lush. ASAS and ADSA, Charnpaign, IL, pp.
10-41

Hill W. G., 1984: On selection among group with heterogeneous variance. Anim. Prod.
39: 473-477.

Hoeschele, I.; Gianola, D.; Foulley, J. L., 1987: Estimation of variance components with
quasi-continuous data using Bayesian methods. J. Anim. Breed. Genet. 104: 334-
349.

Hoeschele, 1.; Tier, B., 1995: Estimation of variance components of threshold characters
by marginal posterior mode and means via Gibbs sampling. Genet. Sel. Evol. 27:
519-540.

lbanez, M. A.; Carabano, M. J .; Alenda, R., 1999: Identiﬁcation of sources of
heterogeneous residual and genetic variances in milk yield data from the Spanish

Holstein-Friesian population and impact on genetic evaluation. Livest. Prod. Sci.
59: 33-49.

Jaffrezic, F.; Robert-Granie, C.; Foulley, J. L., 1999: A quasi-score approach to the
analysis of ordered categorical data via a mixed heteroskedastic threshold model.
Genet. Sel. Evol. 31: 301-318.

Luo, M. F.; Boettcher, P. J .; Dekkers, J. C. M.; Schaeffer, L. R., 1999: Bayesian analysis
for estimation of genetic parameters of calving ease and stillbirth for Canadian
Holsteins. J. Dairy Sci. 82: 1848.

Luo, M. F.; Boettcher, P. J .; Schaeffer, L. R.; Dekkers, J. C. M., 2001: Bayesian
inference for categorical traits with an application to variance components
estimation. J. Dairy Sci. 84: 694-704.

McClulloch, C.E.; Searle, SR, 2001: Generalized, linear, and mixed modals. Wiley,
NewYork.

Simianer, H.; Schaeffer, L. R., 1989: Estimation of covariance components between one
continuous and one binary trait. Genet. Sel. Evol. 21: 303-315.

Stiratelli, R.; Laird, N.; Ware, J. H., 1984: Random-effects models for serial observations
with binary response. Biometrics. 40: 961 -971.

Stranden, I.; Gianola, D., 1999: Mixed effects linear models with t-distributions for
quantitative genetic analysis: a Bayesian approach, Genet. Sel. Evol. 31 : 25-42.

Varona L.; Misztal, 1.; Bertrand, J. K., 1999: Threshold-linear versus linear-linear
analysis of birth weight and calving ease using an animal model: I. Variance
component estimation. J. Anim. Sci. 77: 1994-2002.

Wang, C. S.; Rutledge, J. J .; Gianola, D., 1994a: Bayesian analysis of mixed linear

models vi Gibbs sampling with an application to litter size in Iberian pigs. Genet.
Sel. Evol. 26: 91-115.

Wang, C. S.; Gianola, D.; Sorensen, D.; Jensen, J .; Christensen, A.; Rutledge, J. J.,
1994b: Response to selection for litter size in Danish landrace pigs-A Bayesian
analysis. Theor. And Appl. Genet. 88: 220-230.

Weigel K. A.; Gianola, D., 1992: Estimation of heterogeneous within-herd variance
components using empirical bayes method: A simulation study. J. Dairy Sci. 75:
2824-2833.

Wheeler, T.J.; Cundiff, L.V.; Koch, R.M.; Crouse, J .D., 1996: Characterization of
biological types of cattle (Cycle IV): carcass traits and longissimus palatability. J.
Anim. Sci. 74:1023-1035.

CHAPTER 1

Literature Review

INTRODUCTION

Dystocia, or calving difﬁculty, is deﬁned as an abnormal or difﬁcult birth
requiring assistance (Manfredi et al. 1991). The percentages of dams requiring some
assistance at calving range from 28.9% in American Gelbvieh (V arona et al. 1999) to
88.4% in Italian Piemontese cattle (Carnier et al. 2000), although these numbers may be
further indicative of management differences as well. The percentages of dams requiring
Caesarean section at calving range from are 0.4% in Holstein Fresian (Meijering 1984),
to 13.7% in Italian Piemontese cattle (Carnier et al. 2000). Calving difﬁculty, or
conversely calving case, is scored subjectively according to its departure from a normal
calving, the latter of which results with a healthy dam and a healthy calf without any
human intervention (Meijering 1984). These scores are generally deﬁned on an ordinal
scale from 1 (unassisted calving) to 4 or 5 (Caesarean section), the score thereby

reﬂecting the amount of required assistance at birth.

Injuries or suffocation resulting from difﬁcult or delayed calving cause the death
of many calves either at birth or immediately thereafter in spite of the fact that about 80%
of all calves lost at birth are anatomically normal (Whitter and Thome 1995). In various
beef breeds, calf mortality at or near the time of birth has been shown to be four times
greater (P<0.01) in calves experiencing dystocia (20.4%) than in those not experiencing
dystocia (5.0%) (Laster and Gregory 1973). In addition to the loss of the calf or dam,
calving difﬁculties negatively impact cattle production through increased veterinary labor

costs, subsequent dam health and fertility problems (i.e., increases in the postpartum

interval or decreases in overall conception rate) and subsequently decreased dam milk

production (Walker et al. 1994; Albera et al. 1999).

Anderson (1998) estimated that calving difﬁculty results in annual losses of 25
million dollars in the state of Nebraska and overall annual losses in the United States are
estimated at between 500 million and 750 million dollars (Walker et a1. 1994). Albera et
al. (1999) itemized economic losses derived from market prices or supplied by
veterinarians and Piemontese extension specialists in the computation of dystocia costs in
Italian Piemontese cattle. These losses are highlighted in Table I. Since calving difficulty
has both major direct and indirect costs on production, cattle breeders have an interest in

genetic analysis of calving difficulty to selectively improve populations for better calving

ease.

Genetic and environmental factors contribute to the cause and severity of calving
difﬁculty. Figure 1 taken from Burfening (1991) ﬂowcharts the complexity of this trait
and serves to point out the numerous interacting factors that affect calving difﬁculty. As
seen from Figure 1, calving case is both a characteristic of the cow and of the calf.
Furthermore, a breeding bull may inﬂuence calving case either as a sire or as a maternal
grandsire. For example, a calfs sire contributes to the calving ease as a characteristic of
the calf through the sire's direct genetic effect. This particular genetic effect may be
inﬂuenced by the birth weight, gestation length and shape of the calf as inherited from its
sire. Sire breed differences for calving ease are known to exist. Laster et al. (1973) have
shown in a study involving Hereford and Angus cows that calves sired by Charolais,
Simmental, Limousin and South Devon bulls experienced signiﬁcantly more calving

difﬁculty than calves sired by Hereford, Angus and Jersey bulls (P<0.01). A maternal

genetic effect, on the other hand, is determined by genes passed on by the sire to his
daughter that affect her pelvic measures, uterine environment, hormonal control and other
factors that determine her ability to calve easily as a dam. That same sire also passes on
genes to his daughter's calf that directly inﬂuences its ability to be calved out easily (i.e.

direct genetic effects).

Pollak and Freeman (1976) and Berger and Freeman (1978) determined that calf
sex was an important source of variability for calving difﬁculty in Holsteins. This is not
too surprising since bull calves are generally larger than heifer calves at birth. Laster and
Gregory (1973) determined calf losses in various beef breeds to be higher in male calves
(22.4%) than in female calves (16.3%) delivered from dams experiencing difficult births.
However, they found no difference in calf mortality between sexes when calving was

unassisted.

Calf birth weight is an important factor affecting calving difﬁculty. Laster et al.
(1973) determined that the regression of calving difﬁculty on birth weight was highly
signiﬁcant (P<0.005). Birth weight is also genetically correlated to calving difﬁculty
(V arona et al. 1999), which indicates that genetic selection against calving difﬁculty
reduces birth weight. The effect of birth weight on calving difficulty is signiﬁcant in
heifers but as dams become older, the effect of birth weight becomes less important
(Herring 1996). Calving difficulty in two-year-old dams (54% Hereford and Angus cows
in US. Meat Animal Research Center, MARC and 30% Hereford cows in Colorado State
University, CSU) is three to four times more frequent than in three-years-olds (16% in
MARC and 11% in CSU), since the younger dams have not yet attained full physical or

skeletal maturity (Meijering 1984). Calving difﬁculty problems are minimal when a dam

10

reaches four to ﬁve years old. In an analysis of calving difﬁculty data collected over 14
years, Berger (1994) determined that fewer in Holstein cows needed assistance for
calving in second (18.42%), third (16.97%) and later parities than in ﬁrst parity (40.40%).

Season of calving also impacts calving difficulty. Berger (1994) reported that
calving difﬁculty in Holsteins is more frequent during winter than in summer. Philipsson
(1976) explained this by noting that summer is grazing season which enables cows to
exercise on the pasture. In contrast, cows are conﬁned and receive less exercise during
the winter.

Pelvic opening of the dams also represents an important source of variation in the
frequency of calving difﬁculties. Disproportion in size between the fetus and darn
appears to be the major cause of dystocia. Thus, there is a critical need for adequate

growth in heifers to allow an increase in pelvic area.

Both direct and maternal genetic variability appears to exist for calving ease based
on studies reported in Table II. This table reports heritability and genetic correlation
estimates by study, model and method of variance component estimation. These models
and methods are discussed in more detail later. Although reported heritability estimates
are highly variable, direct heritability estimates generally concentrate around intermediate
values (0.15-0.20) with maternal heritability estimates being generally lower. The
estimated genetic correlations between direct and maternal effects are also generally
negative, thereby indicating that the ability of a female calf to be easily delivered from
her own dam is antagonistic with that same female's ability to subsequently calve easily

herself. Since calving ease and body dimensions are inherently related, a calf may be

11

small at birth allowing her own easy delivery but then being small as a dam would have

subsequent calving problems.

STATISTICAL MODELS FOR GENETIC EVALUATION OF CALVING EASE
A primary goal of breeding programs is to maximize genetic gain in animal traits

affecting livestock production. Since the greatest selection differential is on sires in most
beef cattle populations, particularly where the use of artiﬁcial insemination is prominent,
accurate sire ranking is important. Achieving accurate sire selection requires inference on
genetic parameters such as, for example, heritability, or the proportion of total variability
that is genetic, whether for direct or maternal genetic effects. The greater the heritability,
the greater the potential for genetic improvement of the character through selective
breeding. For continuous production characters like weight gain, estimation of genetic
parameters is typically carried out using restricted maximum likelihood (REML) under
linear mixed models. In linear mixed models, the inﬂuence of ﬁxed effects such as age,
breed or sex along with genetic and other random effects are jointly inferred upon for the
analysis of normally distributed traits. Predicted breeding values of individual animals
using best linear unbiased prediction (BLUP) are useful for seedstock selection in
livestock improvement programs (Henderson, 1973).

In addition to obvious production characters affecting farm revenue, economic
efﬁciency in animal production can also be enhanced by considering genetic selection for
health and reproductive ﬁtness traits such as conformation score, calving ease and
ovulation rate (Dematawewa and Berger, 1997). However, unlike production traits,
ﬁtness traits such as calving case are often not continuous in their expression but are

discrete. Thus, key multivariate normal distributional assumptions invoked under a linear

12

mixed model analysis are violated, particularly when the number of ordinal categories is
small; i.e. $5. Gianola (1982) argued against using BLUP to predict breeding values for
ordinal data since estimated breeding values and residuals can be shown to be dependent
upon each other. Furthermore, several recent studies illustrate that variance component
estimation based on linear mixed models consistently yielded biased estimates of
heritabilities and correlation coefﬁcients in the analysis of categorical data (Abdel-Azim

and Berger 1998; Luo et al. 2001).

Threshold models, more commonly known as cumulative link models in the
statistics literature, are appropriate for the multifactorial analysis of ordinal categorical
traits. Threshold models are probably best understood by the concept that the ordinal
trait is inﬂuenced by an underlying liability or latent variable binned by various
thresholds that impose a discontinuity on the visible expression of the trait (Falconer and
MacKay, 1996). In Figure 2, for example, a liability value between the ﬁrst and second
thresholds maps to an observed category of assisted easy calving. The concept of the
threshold model dates back to Wright (1934) who inferred upon the variability of the
number of toe digits between and within strains of guinea pigs and hypothesized an
underlying normal distribution of liability variables with strain-speciﬁc means and
common variance. Using calving ease data as the application, both Gianola and Foulley
(1983) and Harville and Mee (1984) proposed breeding value estimation procedures
under a threshold mixed effects model based on the generalized linear model formalized
by Nelder and Wedderburn (1972). Unlike the linear mixed model, the threshold mixed
model does appropriately account for the multinomial distribution of calving ease data,

conditional upon a "linked" function of both ﬁxed and random effects as deﬁned

13

previously. The typical link function used for threshold mixed models in animal breeding
is the cumulative probit function. The probit function is simply the inverse function of
the standard normal cumulative density function (cdf). In a cumulative probit threshold

model for the analysis of data with C ordinal categories, the cumulative probability,

Prob(J S j), up to a certain ordinal score j, 135C, on a subject is written as standard

normal cdf of a linear function of the (i-I)th threshold parameter and the ﬁxed and
random effects associated with that subject. Using Figure 2, for example, the probability
of a dam experiencing an easy unassisted calving or easier calving is modeled as the cdf
of the ﬁrst threshold plus any ﬁxed or random effects that inﬂuence the magnitude of the
liability variable. Gianola and Foulley (1983) developed scoring equations for the
threshold mixed model under the Bayesian paradigm. These equations maximize the
joint posterior density of the ﬁxed and random effects, conditional upon known or
estimated variance components. The corresponding joint modes are then reported as
point estimates with standard errors based on the information matrix of this joint
posterior density. These joint modal or maximum a posteriori (MAP) estimates are
currently used for genetic evaluations on calving ease (Berger 1994; Wang et al. 1997).
Genetic evaluation models are of two broad types. The ﬁrst model is the sire or
sire and maternal grandsire model where calf records on, say, calving case are connected
directly to sires with genetic relationships identiﬁed only through known paternal
ancestry. The second is the animal model where each record is connected directly to each
calf identiﬁcation with all known genetic relationships explicitly modeled. Whereas
animal models are now predominantly used for genetic evaluation of production traits

under a linear mixed mode], sire or sire and maternal grandsire models are most oﬁen

14

used for threshold mixed model analysis of calving ease data. If the predominant genetic
relationship is that due to paternal halfsibs, there should be intuitively very little
difference in sire genetic evaluations between those computed under a sire versus an
animal model. Indeed, these models can be shown to be equivalent under a linear mixed
model framework if the only genetic relationships are paternal half sibs on calves making
the records. However, threshold animal models have been shown to lead to MAP
estimates of breeding values that are much more biased than those determined under
threshold sire models (Mayer, 1995), since the approximation invoked for MAP breaks
down substantially when the number of records per individual in a model is small. Such
a number on average is smaller in animal models than in sire models since there are
generally many more unique calf identiﬁcations than unique sire identiﬁcations in a set of
data.

BAYESIAN INFERENCE

Likelihood-based methods have been generally used to infer upon genetic
parameters in animal breeding and genetics. However, recent theoretical and
computational developments in Bayesian inference, such as Markov Chain Monte Carlo
methods, have inspired researchers to increasingly use Bayesian methods in animal
breeding and genetic applications.

In Bayesian inference, as with likelihood inference, the joint density of the data y
is characterized by a probability distribution p(y|0) . The quantity0 denotes the vector of
unknown parameters, including those that an investigator may wish to infer upon. A
Bayesian model essentially consists of two parts. The ﬁrst is the sampling distribution or

likelihood function p(y|0) which is the information provided by y on 0. The second part

15

is a prior distribution p(0). In the Bayesian approach, researchers may have some prior
knowledge about 0 which could be incorporated in the analysis using their speciﬁcation
of p(0). This feature creates one main distinction between Bayesian and frequentist or
likelihood approaches.

Aﬁer combining the prior information p(0) with the available information from

data p(yl0) , inference onO is based on the posterior distribution:

p(OIy)=p—(¥"%I;-@. where p(y>= ip<y|0)p(0)do.

In the posterior distribution, p(y) is constant relative to inference on 0 . Thus the posterior

distribution can be written in the more compact form
p(OIy) 0c p(yl0)p(9)
where 0:: denotes 'proportional to'. Herein lies another basic distinction between

likelihood and Bayesian inference. Bayesian inference is conditional upon a single

realization of y whereas likelihood or frequentist inference in based on conceptual
repeated realizations of y. Furthermore, provided that p(Gly) can be determined,

Bayesian inference is exact for small samples whereas likelihood inference generally is

not.
With 0=(0l ,0, ,...,0n )' , marginal and conditional densities of any parameter 0i can
be obtained from the joint density p((il ,02 ,...,0n) . That is the marginal posterior density of

any one parameter of interest, say 0i , can be determined by:

p(e,|y)= [p(e,,e,,....,9,,|y)do_i ,where 9_,=(9,,e,,...,e,,,,e,,,,...,e,,)'.

16

When the posterior distribution is determined, inference on parameters can be made using
any features of this distribution including location (mean, mode and median) and

dispersion (variance and standard deviation).

VARIANCE COMPONENT ESTIMATION

Variance components and their derivative genetic parameters (i.e., direct and
maternal genetic heritabilities and genetic correlation) in a threshold mixed model have
been historically estimated using approximate marginal maximum likelihood (ML)
techniques. The techniques are used to maximize the approximate marginal posterior
density of the variance components. One historically popular technique in animal
breeding is based upon an approximate invocation of the expectation-maximization (EM)
algorithm proposed by Harville and Mee (1984), and Stiratelli et al. (1984). In simulation
studies based on this algorithm, some biases were determined in heritability estimates.
Hoeschele et al. (1987) reported that mean square error of MML was larger than that of
REML when contemporary subclass sizes were small. Simianer and Schaeffer (1989)
also determined EM-MML heritability estimates to be biased upwards by 50% in a binary
trait simulation study when the mean subclass sizes were approximately 2.3 and 1.1 for

each dataset. Hoeschele et al. (1987) and Simianer and Schaeffer (1989) considered that
the approximation ulY,0: ~ N(ii,Cuu ) , where u is breeding values, Y is ordinal
categorical data, 0: is genetic variance, 0 is the posterior mode estimates of breeding

values and C... is the u-part of the inverse of the coefﬁcient matrix, used in EM-MML is
related to the bias of MML since this approximation is based on a normality assumption,

which may not hold with small subclass sizes.

17

A second approximate MML algorithm is based on Laplace's method (Leonard
1982) and was ﬁrst adapted to animal breeding by Tempelman and Gianola (1993). In a
sire model simulation study by Tempelman (1998), binary data was generated from each
of three populations (additive genetic variance = 0.20, 0.60 or 1.00). Laplacian MML
estimates were determined to be much less biased than EM-MML estimates of variance
components along with 20 to 30% less relative error. Since the Laplacian procedure is
analogous to the derivative-free REML procedure introduced by Graser et al. (1987), it
facilitates hypothesis testing of variance components via marginal likelihood ratio tests
(Tempelman, 1998).

Markov Chain Monte Carlo (MCMC) is a general method for the simulation of
random variables from probability density functions known only up to proportionality
(Geyer 1992) and thus is particularly well suited to Bayesian inference. The root of
MCMC methods can be traced back to the application of Metropolis et al. (1953) in
physics. The Gibbs sampler introduced by Geman and Geman (1984) is simply one
implementation of MCMC whereby the joint posterior distribution of all unknowns can
be derived from knowledge of the distributions of each unknown parameter conditional
on all other parameters and the data. These distributions are typically referred to as fully
conditional posterior distributions (FCD's). Due to the Markov Chain properties, MCMC
samples are autocorrelated such that the information content of these samples is less than
if the samples could be drawn independently. This problem can be readily overcome,
however, by simply drawing more MCMC samples. In contrast to the approximate EM-
MML algorithm, which simply yields point estimates, MCMC methods can retrieve the

entire posterior distribution of any unknown parameter of interest (Sorensen et al. 1994).

18

Wang et al. (1993, 1994a) ﬁrst used the Gibbs sampler in animal breeding,
applying the method to both linear mixed sire and animal models. Sorensen et a1. (1994)
and Wang et al. (1994b) implemented Gibbs sampling in a method for drawing Bayesian
inferences about selection response in livestock populations over time. Jensen et al.
(1994) extended the use of the Gibbs sampler to models that involve covariance
components, such as that would exist between direct and maternal additive genetic effects
for beef production traits.

The Gibbs sampler was introduced for the analysis of binary data by Zeger and
Karim (1991). Extensions of their procedure for the analysis of ordinal categorical
responses were presented by Albert and Chib (1993), who used it in conjunction with
data augmentation (Tanner and Wong 1987; Gelfand et al. 1992) in a somewhat
computationally simpler strategy relative to Zeger and Karim (1991). Sorensen et al.
(1995) subsequently adapted Albert and Chib's work to quantitative genetic inference on
ordinal data. By augmenting the observed categorical data with the MCMC generated
underlying liability variables, the FCD's for all ﬁxed and random effects and variance
components were found to have the same simple forms as in the linear mixed model,
using liability variables generated from their FCD as, essentially, the data (Sorensen et al.
1994). Van Tassel] et al. (1998) further extended the Gibbs sampler for multiple trait
analysis of several continuous and ordered categorical traits, applying their sampler to the
joint analysis of twinning and ovulation rates from a herd of cattle selected for twinning
rate at the US Meat Animal Research Center.

Hoeschele and Tier (1995) demonstrated that MCMC point estimates (posterior

means) of variance components have frequency properties that are superior to

19

approximate EM-MML estimates in simulated ordinal categorical data characterized by
small contemporary subclass sizes (i.e. small herds). However, these differences were
shown to be increasingly smaller with larger sample sizes.

Varona et al. (1999) analyzed calving difﬁculty scores from Gelbvieh cattle by
using a threshold animal model using MCMC to obtain the marginal posterior mean and
standard deviation of the direct and maternal heritabilities and the direct-matemal genetic
correlation. The posterior means i standard deviations of the direct and maternal
heritabilities for calving difﬁculty were O.23i0.036 and 0.10:0.018, respectively. The

posterior mean of genetic correlation (—0.36i0.090) demonstrated an antagonistic genetic
relationship between the direct and maternal effects for calving ease. Luo et al. (2001)
also studied statistical models including direct and maternal genetic effects for calving
ease scores using Gibbs sampling for inference. Inference properties on heritabilities and
genetic correlations comparing animal models versus sire and maternal grandsire models
and threshold mixed models versus linear mixed models were compared. They concluded
that linear mixed models produced significant downward or upward biased (P<0.05)
point estimates of heritabilities in contrast to point estimates derived using threshold
mixed models. Also, threshold sire and maternal grandsire models yielded smaller mean
squared error of heritability estimates than a threshold animal model. Furthermore,
treating herd-year-season (HYS) as random effects in the threshold sire models reduced
the variance of estimates of genetic parameters for the categorical traits. Luo et al.
(2001) concluded that the threshold sire and maternal grandsire model was most

appropriate for the quantitative genetic analysis of calving ease.

20

ROBUST MIXED LINEAR MODELS
For normally distributed data or alternatively for underlying liability variables in
threshold mixed models, residual terms are typically assumed to be normally distributed
with zero mean and common variance in animal breeding applications. This assumption,
however, may make the resulting analysis vulnerable to the presence of outliers (Rogers
and Tukey 1972, Lange et al. 1989). Heavier-tailed densities (such as the univariate or

multivariate Student t distributions) are viable alternatives to the Gaussian one. The t

family of distributions [t(,u, o-2 , v)] is characterized by three parameters: center [I , scale

0'2 and degrees of freedom parameter v e (0, 00) that determines the heaviness of the

tails (Gelman 1995). When v = 1 , the t distribution is the Cauchy distribution with a very
thick tail, inﬁnite mean and inﬁnite variance. Conversely, when v —> oo , the t distribution
approaches the lighter-tailed normal distribution. Having a thicker tail, the t distribution
can provide robustness against unusual or outlying observations when used to model the
density of the residual terms.

Lange et al. (1989) considered the t-distribution as a useful extension of the
normal distribution for statistical modeling of datasets having outliers. A maximum
likelihood approach was used for parameter estimation including the degrees of freedom
parameter. Numerous datasets were analyzed based on the maximum likelihood strategy
for a general t-distributed error model. Generally, the t model was found to ﬁt much
better than a normal model based on the likelihood ratio test. The asymptotic standard
errors for the location parameter estimates under the t model also were generally smaller
compared to the normal models. Reasonable estimates of degrees of freedom were

obtained for all datasets, except for a radioimmunoassay dataset. In a nonlinear regression

21

analysis of radioimmunoassay dataset, the maximum likelihood estimate of the degrees of
freedom parameter was 0.29, which was concluded to be not very satisfactory. Lange et
al. (1989) concluded that the low estimate stemmed from attempting to accommodate one
very extreme outlier. When that outlier was removed, the resulting estimate increased to
1.2. Fraser (1979) suggested that maximum likelihood estimation of degrees of freedom
is not advisable when degrees of freedom estimation goes much below 1 suggesting that
the t-model is not well suited to data with extreme outliers. Fraser (1979) further
suggested that in problems with small sample sizes, researchers might ﬁx the degrees of
freedom at some predetermined value, such as 4, rather than attempt to estimate the
degrees of freedom simultaneously with other parameters. Based on the results from
different analysis, they concluded that outliers and robustness can be handled by the
maximum likelihood estimator for all the parameters in the t-distributed error model
using the EM algorithm.

Lange and Sinsheimer (1993) extended the work by Lange et al. (1989) further by
considering alternative heavy-tailed distributions to the Student t. Each of the models
considered in their paper, including the logistic, slash, t and contaminated normal
distributions, were shown to be derived as scale mixtures of normals. More speciﬁcally,
given that the residual terms e,- are speciﬁed conditionally as e, ~ N (0,073) , i=1,2,. . .,n,
alternative heavy-tailed residual densities can be marginally speciﬁed by alternative

distributions on A). For example, a Student 1 distribution is represented as a scale mixture

of normals with Gamma distributed scaling factors xi, ~ GammaGE, %) , v being the

degrees of freedom (Lange et al. 1989). Alternatively, the slash distribution is

22

characterized by scaling factors having the density p(zi, I v) = WI,"—l , , i =1 ,2,. . . .,n . Under

a contaminated normal error model, each residual is drawn from a mixture of two
populations based on Prob(2., = 2. | ¢, y) = 7 if 2. = (15 or 1— y, if x1 =1 . In a logistic error

model, the probability is transformed to odds to remove the upper bound and then taking
the logarithm of odds also removes the lower bound for the linear function of the

explanatory variables. Lange and Sinsheimer (1993) chose the following logistic
model u, =0,/ (1+e92 «3.1.1., ”°‘["“"* ”2 ) for their analysis. Using maximum likelihood in large

datasets, the authors found that the t, slash and contaminated methods showed great
promise in muting the effects of outliers. The slash and t-models had particularly superior
estimation properties while the contaminated and logistic model were less satisfactory.
However, in the analysis of radioimmunoassay data, Lange et al. (1989) found potential
problems with the t and slash error models and suggested that the best alternative with a
data set of this nature may be to discard the outliers and simply use the normal error
models.

Geweke (1993) considered methods for Bayesian inference in econometric
models in which errors were independent and identically t—distributed using the Gibbs
sampler. Posterior odds ratios were calculated for t-error models having different degrees
of freedom versus a normal error model. Geweke applied these models to fourteen
macroeconomic time series datasets and found that the posterior odds ratio always
heavily favored the independent Student-t error model over the normal error model.

In animal breeding and genetics, it has been noted that preferential treatment
commonly inﬂuences the records of high producing or economically valuable animals

and consequently, their predicted breeding values (Stranden 1996). Stranden and Gianola

23

(1998) simulated the effect of preferential treatment of cows for milk production in four
herds of a multiple ovulation and embryo transfer scheme under selection. Three mixed
effect linear models were compared in terms of their ability to handle preferential
treatment: the classical Gaussian model, a model with multivariate t-distributed error
clustered by herd, and a model with independent t-distributed error. The posterior
distributions of all parameters were obtained using the Gibbs sampler. The three models
were found to have similar performance in the absence of preferential treatment.
Conversely, when the preferential treatment was prevalent and substantial in effect, the
univariate t-model lead to substantially less biased inference on breeding values and
genetic trends than those obtained with the Gaussian model.

Stranden and Gianola (1999) also presented a Bayesian approach for inferences
about parameters of mixed effects linear models with t-distributed error effects in
quantitative genetic application. Data was generated based on a preferential treatment
process. The univariate t-model for the errors led to better estimates of additive genetic
and error variances than either a herd-clustered t-model or a Gaussian sampling process.
Using a univariate t-model, posterior distributions of breeding values were found to be
sharper and the posterior mean estimates of heritabilities were closer to the true values
compared to estimates derived from the other two models.

Rosa (1999) considered the application of robust mixed linear models based on (-
distribution, slash and contaminated normal error distributions to pup birth weight from
reproduction toxicology study. Marginal posterior densities of degrees of freedom, for

the t and slash error distributions, were determined to be concentrated about small values,

24

suggesting the inadequacy of the Gaussian distribution. Rosa (1999) also computed
Bayes factors to verify the better ﬁt of the long-tailed distributions to this data set.

In statistical mapping of quantitative trait loci (QTL), a common assumption is
normally distributed phenotypic observations; any deviation from this assumption can
affect power and robustness of QTL detection. Von Rohr and Hoeschele (2002)
demonstrated the application of a skewed Student-t sampling model under four different
error distributions and determined that residual, additive QTL and dominance QTL
variance estimates were much closer to the true value when the analysis was performed
with the skewed Student-t model rather than with normal model. Replacement of the
normal by a skewed Student-t penetrance function also clearly improved the accuracy of
parameter estimation. Thus, their results indicated substantial beneﬁts of heavy-tailed
skewed error models for QTL mapping.

Albert and Chib (1993) considered a cumulative t-link function, rather than a
cumulative probit link function for the analysis of ordinal categorical data. This
speciﬁcation is equivalent to specifying the underlying latent variables in a threshold
model to be t-distributed rather than normally distributed. Gianola and Sorensen (1996)
extended Albert and Chib’s (1993) work to describe a hierarchical Bayes model with
correlated random effects suitable for quantitative genetic analysis of categorical data.
Albert and Chib (1993) also presented Gibbs sampling schemes for models where the
latent distribution was either univariate or multivariate t, and also considered Bayes

factors for contrasting different models as well as for identiﬁcation of outliers.

25

HETEROSKEDASTIC MIXED EFFECTS MODELS

An important assumption in most genetic evaluation models is that variance
components associated with random effects are constant across all environmental
conditions. However, the existence of heterogeneous variance, or heteroskedasticity, for
milk production (Hill et al. 1983; lbanez et al. 1999), growth performance (Garrick et al.
1989), and conformation (Robert-Granie et al. 1997) has been ﬁrmly established. More
recently, work on estimating heteroskedasticity on the underlying scale in threshold
models have been investigated in animal breeding (Foulley and Gianola 1996; Jaffrezic et
al. 1999; Ducrocq 2000).

A number of possible reasons for heteroskedasticity have been suggested,
including a positive relationship between herd means and variance, spatial gradients in
residual variability across geographical regions, and temporal gradients in residual
variability within herds due to changes in herd management. If heteroskedasticy is not
properly taken into account, differences in within-subclass variances can result in biased
breeding value predictions and disproportionate numbers of animals selected from
environments characterized by high variability, thereby reducing genetic progress (Hill
1984; Weigel and Gianola 1992).

Alternative methods have been proposed to take into account heterogeneity of
variance in quantitative genetics and animal breeding. For continuous production data, a
logarithmic transformation can be used to alleviate the heterogeneous variance problem if
the variances are a simple function of the mean of trait. Scaling of observations by the
estimated standard deviations was proposed by Hill (1984) as another method of

accounting for heterogeneous variance.

26

Heterogeneity of genetic and residual variances can be accounted for in genetic
evaluation using BLUP, provided that these variance components are known (Gianola
1986). However, variance components are rarely known and need to be estimated. Since
most preferred methods of variance components estimation, including ML and REML,
rely on asymptotic properties, such procedures may not yield reliable estimates if
variance components are estimated separately for each of many small subclasses (Weigel
and Gianola 1992). This was demonstrated by Winkelman and Schaeffer (1988) who
used REML and ANOVA to estimate within-herd genetic and residual variances.

Gianola et al. (1992) presented a Bayesian approach to deal with heterogeneity of
residual variances with respect to some criterion of classiﬁcation (strata) of the data. In
their model, variance components (or variance covariance matrices) were assumed to
have prior inverted chi-square (or inverted Wishart) distribution. The resulting empirical
Bayes estimates of within-herd residual variances can be essentially shown to be
weighted averages of the residual variance using data from the herd (i.e. REML) and the
average across-herd residual variance. Gianola et al. (1992) indicated that when the
amount of information in a particular herd stratum is large, the REML part of the
estimator dominates; otherwise, the average across herd estimate dominates.

Foulley et al. (1992) proposed a structural linear model for assessing the source of

heterogeneity of residual variances in Gaussian mixed linear models. Their procedures

are based on the concept of a log link function, 7,. = ln 0: = mpt. , for variance

components where a": is the residual variance in subclass i, and m, is a known incidence
matrix for the unknown vector A of dispersion parameters. The marginal likelihood

(marginal posterior density of A) is maximized with respect to A. to determine point

27

estimates. Hypothesis testing of any of the dispersion parameters in A are then based on
a marginal likelihood ratio test against subset models. The estimation algorithms in
Foulley et al. (1992) showed remarkable similarities to the mixed model equations of
Henderson.

San Cristobal et al. (1993) extended the method of Foulley et al. (1992) by
identifying potential multifactorial sources of heterogeneity of residual and genetic
variances in mixed linear Gaussian models. They developed the model based on a
structural linear model for log residual and genetic variances as a function of ﬁxed and
random effects, just as is done for location parameters in a regular linear mixed effects
model.

Foulley and Gianola (1996) adopted the structural linear model and log link
function for heterogeneity of residual variance of underyling liabilites in the cumulative
probit threshold model. Approximate statistical inference and hypothesis testing on all
dispersion parameters and goodness of ﬁt was again based on a marginal likelihood test.
Foulley and Gianola (1996) applied their method to calving ease scores from the US
Simmental breed and concluded that the heteroskedastic threshold model had a better ﬁt
to the data than the standard threshold model.

J affrezic et al. (1999) also used a structural linear model on log-variance to infer
upon heterogeneity of residual variances in mixed threshold model. For parameter
estimation they presented an approximate quasi-score approach including two steps: i) a
marginalization with respect to the random effects leading to quasi-score estimators, ii)

an approximation of variance-covariance matrix of the observations.

28

Ducrocq (2000) analyzed the calving ease data of French dairy breeds (Normande
and Montbeliarde) using the heteroskedastic threshold model as proposed by Foulley and
Gianola (1996). He ﬁrst selected a satisfactory model to describe the effects of
environmental factors on the location and dispersion parameters of the underlying
liability variables. Four random effects were then added: sire of calf, sire of dam, dam
(within sire of dam) and herd-year-season, assuming homogeneous ratios of variance
components across environments. Accounting for heterogeneity of the residual variance
in the analysis signiﬁcantly improved all model ﬁt criteria (likelihood ratio, AIC and chi-
square statistics). He determined that direct and maternal heritability estimates were
similar for both breeds but were substantially lower than those obtained by Manfredi
(1990). Direct heritabilities were 5.4% in both breeds, while maternal heritabilities were

very low (around 3%).

The speciﬁc objectives for this dissertation were to

1) Compare variance component, genetic parameter and breeding value estimates
based on the two marginal maximum likelihood procedures (EM and Laplace),
and MCMC using a threshold sire-matemal grandsire model in order to assess the
relative need for MCMC methods in a national genetic evaluation system for

calving ease.

2) Assess the potential utility of outlier-robust threshold models in simulated and

actual calving ease data.

29

3) Develop and apply a Bayesian structural multiplicative model on residual
variances for observed and augmented variables in heteroskedastic generalized

linear mixed models.

30

REFERENCES

Abdel-Azim, G. A.; Berger, P. J ., 1999: Properties of threshold model predictions. J.
Anim. Sci. 77: 582-590.

Albera, A.; Carnier, P.; Groen, A. F ., 1999: Breeding for improved calving performance
in Piemontese cattle economic value. Proceedings international workshop on EU
concerted action genetic improvement of functional traits in cattle (GIFT);
breeding goals and selection schemes. Bulletin no:23. Wageningen, The
Netherlands 7-9th November, 1999.

Albert, J. H.; Chib, S., 1993: Bayesian analysis of binary and polychotomous response
data. J. Am. Stat. Assoc. 88: 669-679.

Anderson, R, 1998: Minimizing calving difﬁculty in beef cattle. University of Minnesota
extension service. hm;://www.mes.umn.edu/Documents/D/I/DI5778.htrnl.

Bennett, G. L.; Gregory, K. E., 2001: Genetic (co)variances for calving difﬁculty score in
composite and parental populations of beef cattle: I. Calving difﬁculty score, birth
weight, weaning weight, and postweaning gain. J. Anim. Sci. 79: 45-51.

Berger, P. J ., 1994: Genetic prediction for calving ease in the United States: Data,
models, and use by the dairy industry. J. Dairy Sci. 77: 1146-1153.

Berger, P. J.; Freeman, A. E., 1978: Prediction of sire merit for calving difﬁculty. J.
Dairy Sci. 61: 1146-1152.

Bink, M. C. A. M.; Quaas, R. L.; Van Arendonk, J. A. M., 1998: Bayesian estimation of
dispersion parameters with a reduced animal model including polygenic and QTL
effects. Genet. Sel. Evol. 30: 103-125.

Burfening, P. J ., 1991: Factors affecting calving difﬁculty and implications to breeding
and management programs. Proceedings of the international beef symposium.
January 15-17, 1991. Montana.

Carnier, P.; Albera, A.; Dal Zotto, R.; Groen, A. F.; Bona, M.; Bittante, G., 2000: Genetic
parameters for direct and maternal calving performance over parities in Piemontese
cattle. J. Anim. Sci. 78: 2532-2539.

Dematawewa, C. M. B.; Berger, P. J., 1997: Effect of dystocia on yield, fertility, and cow

losses and an economical evaluation of dystocia sources for Holsteins. J. Dairy Sci.
80: 754-761.

Dong, M. C.; Quaas, R. L.; Pollak, E. J ., 1991: Estimation of genetic parameters of

calving ease and birth weight by a threshold model. J. Anim. Sci. 69 (Suppl. 1):
204 (Abstr.).

31

Ducrocq, V., 2000: Calving ease evaluation of French dairy bulls with a heteroskedastic
threshold model with direct and maternal effects. Genetic evaluations for
conformation and other functional traits: 123. Bulletin no: 25. Proceedings of the
2000 interbull meeting. Bled, Slovenia, May 14-15, 2000.

Falconer, D. S.; Mackay, T. F. C., 1996: Introduction to quantitative genetics, 4th edn.
Harlow, UK: Longman.

F oulley, J. L.; Irn, S.; Gianola, D.; Hoeschele, 1., 1987: Emprical Bayes estimation of
parameter for n polygenic traits. Genet. Sel. Evol. 19: 197:224.

Foulley, J. L.; San Cristobal, M.; Gianola, D.; Irn, S., 1992: Marginal likelihood and
Bayesian approaches to the analysis of heterogeneous residual variances in mixed
linear Gaussian models. Comp. Stat. Data Anal. 13 (1992) 291-305.

Foulley, J. L.; Gianola, D., 1996: Statistical analysis of ordered categorical data via a
structural heteroskedastic threshold model. Genet. Sel. Evol. 28: 249-273.

Fraser, D. A. S., 1979: Inference and linear models, New York: McGraw-Hill.

Garrick, D. J .; Pollak, E. J .; Quaas, R. L.; Van Vleck, L. D., 1989: Variance
heterogeneity in direct and maternal weight traits by sex and percent purebred for
Simmental-sired calves. J Anim. Sci. 67: 2515-2528.

Gelfand, A. E.; Smith, A. F. M.; Lee, T. M., 1992: Bayesian analysis of constrained
parameter and truncated data problems using Gibbs sampling. J. Am. Stat.
Assoc. 87: 523-532.

Gelman, A.; Rubin, D., 1992: Inference from iterative simulation using multiple
sequences. Stat. Sci. 15: 201-224.

Gelman, A.; Carlin, J. B.; Stern, H. S.; Rubin, D. B., 1995: Bayesian data analysis, lSt
edn., Chapman & Hall, London.

Geman, S.; Geman, D., 1984: Stochastic relaxation, Gibbs distribution and Bayesian
restoration of images. IEE Transactions on Pattern Analysis and Machine
Intelligence 6: 721-741.

Geyer, C. J ., 1992: Practical Markov chain Monte-Carlo (with discussion). Stat. Sci. 7:
467-511.

Geweke, J., 1993: Bayesian treatment of the independent Student-t linear model, J. Appl.
Econometrics 8: 19-40.

Gianola, D., 1982: Theory and analysis of threshold characters. J. Anim. Sci. 54: 1079-
1096.

32

Gianola, D.; Foulley, J. L.. 1983: Sire evaluation for ordered categorical data with a
threshold model. Genet. Sel. Evol. 15 :201-224.

Gianola, D., 1986: On selection criteria and estimation of parameters when the variance
is heterogeneous. Theor. Appl. Genet. 72: 671-677.

Gianola, D.; Foulley, J. L.; Fernando, R. L.; Henderson, C. R.; Weigel, K. A., 1992:
Estimation of heterogeneous variance using empirical bayes methods: Theoretical
considerations. J. Dairy Sci. 75 : 2805-2823.

Gianola, D.; Sorensen, D., 1996: A mixed effects threshold model with a t distribution.
47th Annual Meeting of the European Association for Animal Production,
Lillehammer, Norway, 15p.

Graser, H. U.; Smith, S. P.; Tier, B., 1987: A derivative-free approach for estimating
variance components in animal models by restricted maximum likelihood. J. Anim.
Sci. 64:1362-1370.

Harville, D. A.; Mee, R. W., 1984: A mixed-model procedure for analyzing ordered
categorical data. Biometrics 40: 393-408.

Henderson, C. R., 1973: Sire evaluation and genetic trends. In: Proc. Anim. Breeding
Genet. Symp. In Honor of Dr. Jay L. Lush. Pp 10-41. Am. Soc. Anim. Sci.
Charnpaign, IL.

Herring, W.O., 1996: Calving difﬁculty in beef cattle. Agricultural publication G2035.
Published by University extension, University of Missouri-Columbia.

Hill , W. G.; Edwards, M. R.; Ahmed, M. K. A.; Thompson, R., 1983: Heritability of

milk yield and composition at different levels and variability of production. Anim.
Prod. 36: 59-68.

Hill W. G., 1984: On selection among group with heterogeneous variance. Anim. Prod.
39: 473-477.

Hoeschele, 1.; Gianola, D.; Foulley, J. L., 1987: Estimation of variance components with

quasi-continuous data using Bayesian methods. J. Anim. Breed. Genet. 104: 334-
349.

Hoeschele, I.; Gianola, D., 1989: Bayesian versus Maximum Quasi Likelihood methods
for sire evaluation with categorical data. J. Dairty Sci. 72: 1569-1577.

Hoeschele, 1.; Tier, B., 1995: Estimation of variance components of threshold characters

by marginal posterior mode and means via Gibbs sampling. Genet. Sel. Evol. 27:
519-540.

33

lbanez, M. A.; Carabano, M. J .; Alenda, R., 1999: Identiﬁcation of sources of
heterogeneous residual and genetic variances in milk yield data from the Spanish

Holstein-Friesian population and impact on genetic evaluation. Livest. Prod. Sci.
5 9: 33-49.

Jaffrezic, F.; Robert-Granie, C.; Foulley, J. L., 1999: A quasi-score approach to the

analysis of ordered categorical data via a mixed heteroskedastic threshold model.
Genet. Sel. Evol. 31: 301-318.

Jensen, J .; Wang, C. S.; Sorensen, D. A.; Gianola, D., 1994: Bayesian inference on
variance and covariance components for traits inﬂuenced by maternal and direct
genetic effects, using the Gibbs sampler. Acta Agric. Scand. 44: 193-201.

Lange, K. L.; Little, R. J. A.; Taylor, J. M. G., 1989: Robust statistical modeling using
the t distribution. J. Am. Stat Assoc. 84:881-896.

Lange, K. L.; Sinsheimer, J. S., 1993: Normal/independent distributions and their
applications in robust regression. J. Am. Stat. Assoc. 84 :881-896.

Laster, D. B.; Glimp, H. A.; Cundiff, L. V; Gregory, K. E., 1973: Factors affecting
dystocia and the effects of dystocia on subsequent reproduction in beef cattle. J.
Anim. Sci. 36: 695-705.

Laster, D. 3.; Gregory, K. E., 1973: Factors inﬂuencing peri- and early potrratal calf
mortality. J. Anim. Sci. 37: 1092-1097.

Luo, M. F .; Boettcher, P. J .; Dekkers, J. C. M.; Schaeffer, L. R., 1999: Bayesian analysis
for estimation of genetic parameters of calving ease and stillbirth for Canadian
Holsteins. J. Dairy Sci. 82: 1848.

Luo, M. F.; Boettcher, P. J.; Schaeffer, L. R.; Dekkers, J. C. M., 2001: Bayesian
inference for categorical traits with an application to variance components
estimation. J. Dairy Sci. 84 : 694-704.

Manfredi, E. J .; Ducrocq, V.; Foulley, J. L., 1991a: Genetic analysis of dystocia in dairy
cattle. J. Dairy Sci.. 74: 1715-1723.

Manfredi, E. J .; San Cristobal, M.; Foulley, J. L., 1991b: Some factor affecting the

estimation of genetic parameters for cattle dystocia under a threshold model.
Anim. Prod. 53: 151-156.

Matos, C. A. P.; Thomas, D. L.; Young, L. D.; Gianola, D., 1994: Analysis of lamb
survival using linear and threshold models with maternal effects. 5th World

Congress of Genetics Applied to Livestock Production, Ontario, Canada. Vol. 18:
426-429.

34

Matos, C. A. P.; Thomas, D. L.; Gianola, D.; Tempelman, R. J.; Young, L. D., 1997:
Genetic analysis of discrete reproductive traits in sheep using linear and nonlinear
models. 1. Estimation of genetic parameters. J. Anim. Sci. 75 : 76-87.

McGuirk, B. J .; Going, I; Gilmour, A. R., 1998: The genetic evaluation of beef sires used
for crossing with dairy cows in the UK. 2. Genetic parameters and sire merit
predictions for calving survey trits. Animal Sci. 66: 47-54.

McGuirk, B. J .; Going, I; Gilmour, A. R., 1999: The genetic evaluation of UK Holstein
F riesian sires for calving ease and related traits. Animal Sci. 68: 413-422.

Meijering, A., 1984: Dystocia ans stillbirth in cattle - a review of causes, relations and
implications. Livest. Prod. Sci., 11: 143-177.

Metropolis, N.; Rosenbluth, A. W.; Rosenbluth, M. N.; Teller, A.; Teller, H., 1953:

Equations of state calculations by fast computing machines. J. Chemical Physics
21: 1087-1091.

Olesen, I.; Perez-Enciso, M.; Gianola, D.; Thomas, D. L., 1994: A comparison of normal
and nonnormal mixed models for number of lambs born in Norwegian sheep. J.
Anim. Sci. 72: 1166-1173.

Philipsson, J ., 1976: Studying calving difﬁculty, stillbirth, and associated factors in
Swedish cattle breeds. V. Effects of calving performance and stillbirth in Swedish

Fresian heifers on productivity in the subsequent lactation. Acta. Agric. Scand. 26:
230-260.

Pollak, E. J.; Freeman, A. E., 1976: Parameter estimation and sire evaluation for dystocia
and calf size in Holsteins. J. Dairy Sci. 59: 1817-1825.

Rarnirez-Valverde, R.; Misztal, 1.; Bertrand, J. K., 2001: Comparison of threshold vs
linear and animal vs sire models for predicting direct and maternal genetic effects
on calving difﬁculty in beef cattle. J. Anim. Sci. 79: 333-338.

Rogers, W. H.; Tukey, J. W., 1972: Understanding some long tailed distributions.
Statistica Neerlandia 26 : 21 1-226.

Robert-Granie, c.; Ducrocq, V.; F oulley, J. L., 1997: Heterogeneity of variance for type
traits in the Montbeliarde cattle breed. Genet. Sel. Evol. 29: 545-570.

Rosa, G. J. M.; 1999: Robust mixed linear models in quantitative genetics: Bayesian

analysis via Gibbs sampling, in: Proceedings of International Symposium on
Animal Breeding and Genetics, 21-24 September 1999, Brazil, pp. 133-159.

35

San Cristobal, M.; Foulley, J. L.; Manfredi, E., 1993: Inference about multiplicative

heteroskedastic components of variance in a mixed linear Gaussian model with an
application to beef cattle breeding. Genet. Sel. Evol. 25: 3-30.

Simianer, H.; Schaeffer, L. R., 1989: Estimation of covariance components between one
continuous and one binary trait. Genet. Sel. Evol. 21 : 303-315.

Sorensen, D. A.; Andersen, S.; Gianola, D.; Korsgaard, I., 1995: Bayesian inference in
threshold models using Gibbs sampling. Genet. Sel. Evol. 27: 229-249.

Stiratelli, R.; Laird, N.; Ware, J. H., 1984: Random-effects models for serial observations
with binary response. Biometrics. 40: 961-971.

Stranden, I. J ., 1996: Robust mixed effects linear models with t-distributions and
application to dairy cattle breeding. PhD thesis, University of Wisconsin-Madison.

Stranden, 1.; Gianola, D.; 1998: Attenuating effects of preferential treatment with
Student-t mixed linear models: a simulation study, Genet. Sel. Evol. 30: 565-583.

Stranden, 1.; Gianola, D.; 1999: Mixed effects linear models with t-distributions for
quantitative genetic analysis: a Bayesian approach, Genet. Sel. Evol. 31 : 25-42.

Tanner, M. A.; Wong, W. H., 1987: The calculation of posterior distributions by data
augmentation (with discussion). J. Am. Stat. Assoc. 82: 528-550.

Tempelman, R, J .; Gianola D., 1993: Marginal maximum likelihood estimation of

variance components in Poisson mixed models using Laplacian integration.
Genet. Sel. Evol. 25 : 305-319.

Tempelman, R. J ., 1998: Generalized linear mixed models in dairy breeding. J. Dairy Sci.
81: 1428-1444.

Van Tassell, C. P.; Van Vleck L. D.; Gregory, K. E., 1998: Bayesian analysis of twinning
and ovulation rates using a multiple-trait threshold model and Gibbs sampling. J.
Anim. Sci. 76: 2048-2061.

Varona L.; Misztal, I.; Bertrand, J. K., 1999: Threshold-linear versus linear-linear
analysis of birth weight and calving ease using an animal model: I. Variance

component estimation. J. Anim. Sci. 77: 1994-2002.

Von Rohr, P.; Hoeschele, I., 2002: Bayesian QTL mapping using skewed Student-t
distribution. Genet. Sel. Evol. 34: 1-21.

Walker, D.; Ritchie, H.; Hawkis, D.; Gibson, C., 1994: Pelvic measurement and calving
difﬁculty in beef cattle. MSU extension beef bulletins. 23300001.

36

Walsh, B., 2000: Markov Chain Monte Carlo and Gibbs sampling. Lecture notes for EEB
5962.

Wang, C. S.; Rutledge, J. J.; Gianola, D., 1993: Marginal inferences about variance
components in a mixed linear model using Gibbs sampling. Genet. Sel. Evol. 25 :
41-62.

Wang, C. S.; Rutledge, J. J .; Gianola, D., 1994a: Bayesian analysis of mixed linear
models vi Gibbs sampling with an application to litter size in Iberian pigs. Genet.
Sel. Evol. 26: 91-115.

Wang, C. S.; Gianola, D.; Sorensen, D.; Jensen, J .; Christensen, A.; Rutledge, J. J.,
1994b: Response to selection for litter size in Danish landrace pigs-A Bayesian
analysis. Theor. And Appl. Genet. 88: 220-230.

Wang, C. S.; Quaas, R. L.; Pollak, E. J ., 1997: Bayesian analysis of calving ease score
and birth weights. Genet. Sel. Evol. 29: 117-143.

Weigel K. A.; Gianola, D., 1992: Estimation of heterogeneous within-herd variance

components using empirical bayes method: A simulation study. J. Dairy Sci. 75 :
2824-2833.

Weller, J. I.; Misztal, I; Gianola, D., 1988: Genetic analysis of dystocia and calf mortality
in Israeli-Holsteins by threshold and linear models. J. Dairy Sci. 71: 2491-2501.

Whittier, J .C .; Thorne, J .G., 1995: Assisting the beef cow at calving time. Agricultural

publication G2007. Published by University extension, University of Missouri-
Columbia.

Winkelman, A.; Schaeffer, L. R., 1988: Effect of heterogeneity of variance in dairy sire
evaluation. J. Dairy Sci. 71: 3033-3041.

Wright, S., 1934: An analysis of variability in number of digits in an inbred strain of
guinea pigs. Genetics 19: 506-5 18.

Zeger, S. L.; Karim, M. R., 1991: Generalized linear models with random effects-A
Gibbs sampling approach. J. Am. Stat. Assoc. 86: 79-86.

37

Table I. Itemized dystocia costs (Albera et al. 1999).

 

 

Parameter Dollars'

Cost of caesarean section 99.99
Labor of the farmer per hour 4.58
Price of a newborn male calf 545.46
Price of a newborn female calf 409.12
Cost of involuntary culling after ﬁrst calving 306.40
Cost of involuntary culling after other calvings 483.67

 

'Based on lEuro = 8 0.8802.

38

 

 

- ”do ".8 55685
- Rd 85 =8 Bx
- 35 N3 .8me..
- m3 8 .o 582::
- and $3 2820:
- 93 mg 5328
- mm... 8.9 32820
- :3 35 53.5%
- E .o and 8?... 32m: 38: 28s romeo .23 .358
36. 86 36 098882 msoocowESo:
o _ .9 8.: 3° 83:38: 425. 2285 88s 8225
3.3%.? 8.385 8.32 .o 328.55 4.23. 325 883 a a .2an
2.? 2:. 3° 52%: 0202 325 $8: a s a:
839...? 8.32 .o 3.335 £23.00 2222 22.85. 88: .... a 22.;
8.32.? 8.386 8.32 .o 5328 0202 3:: 88: a .0 «83>
- - 8.32 .c 52%: 42mm 228$ 88: .3 .0 U359:
- - 8.388 52%: 42mm 8:: 38: a a 63.622
- - 8de _ .o £2583
- - modﬁc _ .o 2820:
- - 8.338 .355 32mm 2285 $8: ...... .0 V3.522
- - 5.336 5395.—
- - 8.336 289.0:
- - 8.3_ 3 22820 32.“: has: 38: .... a 63.622
e _ .o- 2 .c a _ .o 55555 - 228:: :8: .a .... 9.8
c _ .o- So 8... 5320:
m3 _ 3 3° 85:52 .282: 223:: :8: .... a 6&5:
- - 3o 5220: 252:2 5:: 53: SEE:
a. EN; ...: 32m 8502 382 Sam

 

.808 030 wag—no Lou 6528.6 80:3 Echo Havana H 86:58 A35
833:8 88% 3838388 an .35 banana; 3832. .35 £385: 525 .: as;

38

Maternal Components (Dam Effects)1 I

Paternal Components (Sire Effects)
—DAM’S GENES ‘l f l SIRE’S GENES
Size & Muscling

 

Fetal Effects

 

CALF’S GENES
Direct Effects

 

Matern al Effects

 

 

V

V
Maternal Dam’s Pelvic L——> CALF’S SIZE
Preparation Size

 

 

Birth weight, shape, sex

 

Maternal Effects

|

 

i

, + CALVING DIFFICULTIES — DYSTOCIA

f ‘ +
Still Births Impaired Fertility

Loss of dams
Labor costs

Reduced Milk Production

 

Figure l. A ﬂowchart interaction of direct additive and maternal genetic effects with
other factors in terms of their effect on calving difficulty (Burfening 1991).

40

   
    

 

 

Threshold 2

 

     
 

 

Threshold 1
Threshold 3
Assisted Assisted C‘ d
Unassisted easy diﬁicult E118???“ an
calving calving calving ‘1‘ 0 011W
'3 ’3 " 1 0 1 z 3
Llablhty

Figure 2. Mapping of underlying liabilities to observed calving ease phenotypes.
Thresholds 1, 2, and 3 delimit normally distributed underlying liabilities into 4 calving
ease phenotypes (unassisted calving, assisted easy calving, assisted difficult calving, and
Caesarean and embryotomy).

41

CHAPTERZ

Bayesian Inference Strategies for the Prediction of Genetic Merit Using
Threshold Models with an Application to Calving Ease Scores in Italian

Piemontese Cattle

ABSTRACT

First parity calving difﬁculty scores from Italian Piemontese cattle were analyzed
using a threshold mixed effects model. The model included the ﬁxed effects of age of
dam and sex of calf and their interaction and the random effects of sire, maternal
grandsire, and herd-year-season. Covariances between sire and maternal grandsire
effects were modeled using a numerator relationship matrix based on male ancestors.
Field data consisted of 23,953 records collected between 1989 and 1998 from 4,741 herd-
year-seasons. Variance and covariance components were estimated using two alternative
approximate marginal maximum likelihood (ML) methods, one based on expectation-
maximization (EM) and the other based on Laplacian integration. Inferences were
compared to those based on three separate runs or sequences of Markov Chain Monte
Carlo (MCMC) sampling in order to assess the validity of approximate MML estimates
derived from data with similar size and design structure. Point estimates of direct
heritability were 0.24, 0.25 and 0.26 for EM, Laplacian and MCMC (posterior mean),
respectively whereas corresponding maternal heritability estimates were 0.10, 0.11 and
0.12, respectively. The covariance between additive direct and maternal effects was
found to be not different from zero based on MCMC-derived conﬁdence sets. The
conventional joint modal estimates of sire effects and associated standard errors based on

MML estimates of variance and covariance components differed little from the respective

42

posterior means and standard deviations derived from MCMC. Therefore, there may be
little need to pursue computation—intensive MCMC methods for inference on genetic
parameters and genetic merits using conventional threshold sire and maternal grandsire

models for large datasets on calving ease.

INTRODUCTION

Calving ease has a signiﬁcant economic impact on beef and dairy production
through increased risk to the survival of both calf and cow (McGuirk et al. 1998). The
Italian Piemontese breed has a particularly high percentage of difficult calvings (Carnier
et al. 2000) relative to reported estimates from other breeds (Varona et al. 1999; Bennett
and Gregory 2001 ). Accurate inference on genetic parameters and genetic merit are
important precursors for effective sire selection strategies to improve calving ease.

The threshold mixed model developed by Gianola and Foulley (1983) is one
example of generalized linear mixed models (Breslow and Clayton 1993) which have
become increasingly popular for the multifactorial mixed effects analyses of non-normal
phenotypes. There have been a number of recent studies illustrating advantages of
threshold mixed models over linear mixed models for the analysis of ordinal calving ease
data (Luo et al. 2001; Rameriz-Valverde et al. 2001).

Empirical Bayes or maximum a posteriori (MAP) estimates of breeding values
are based upon a joint maximization of the posterior density of ﬁxed and random effects
conditioned upon variance components being known, or set equal to their estimates
(Foulley et al. 1987). These MAP estimates are currently used for published genetic

evaluations on calving ease (Berger 1994; Wang et al. 1997). Variance components and

43

their derivative genetic parameters in a threshold mixed model have been historically
estimated using approximate and deterministic marginal maximum likelihood (MML)
techniques. One such technique is based upon an approximate invocation of the
expectation-maximization (EM) algorithm as proposed by Harville and Mee (1984) and
Stiratelli et al. (1984). Some signiﬁcant biases on heritability estimates using this
algorithm were reported in simulation studies conducted by Hoeschele et al. (1987),
Hoeschele and Gianola (1989) and Simianer and Schaeffer (1989), particularly when
contemporary subclass sizes were small and genetic parameters were large in magnitude.

A second approximate MML algorithm is based on Laplace's method and was
ﬁrst adapted to animal breeding by Tempelman and Gianola (1993). In a sire model
simulation study, Tempelman (1998) determined that Laplacian MML estimates tended
to be much less biased than EM-MML estimates of variance components. Since the
Laplacian procedure is analogous to the derivative-free REML procedure introduced by
Graser et al. (1987), it also facilitates hypothesis testing of variance components via
marginal likelihood ratio tests (Tempelman, 1998).

Markov Chain Monte Carlo (MCMC) techniques allow small sample inference
and have been utilized in animal breeding and genetics (Sorensen et al. 1995; Hoeschele
and Tier 1995; Varona et al. 1999; Luo et al. 1999; Luo et al. 2001). The relative
improvements in properties of MCMC point estimates (posterior means) of variance
components relative to approximate MML estimates from data characterized by small
contemporary subclass sizes have been demonstrated in Simulation studies by Hoeschele
and Tier (1995), although differences were shown to be increasingly smaller with larger

sample sizes. Even so, in the largest simulated dataset considered by Hoeschele and Tier

(1995), herd-year-season (HYS) subclasses were large in average size (>50) and maternal
effects were not considered. In contrast, many breed association datasets used for genetic
evaluations are large yet characterized by many small contemporary subclass Sizes. The
computing requirements for MCMC are not trivial for large datasets, and the need for
MCMC, not only for inference on variance components but as well as on breeding values,
needs to be validated. Furthermore, execution of MCMC requires far greater care than
conventional MAP implementations for national genetic evaluations.

The objectives of this study were 1) to determine and compare variance
component and corresponding genetic parameter estimates for calving ease scores in an
Italian Piemontese population using the two MML procedures (EM and Laplace), and
MCMC based on a threshold Sire-maternal grandsire model, and 2) to compare
conventional MAP estimates of breeding values and approximate standard errors with
corresponding MCMC posterior means and stande deviations in order to assess the
relative need for MCMC methods in a national genetic evaluation system for calving
ease, and 3) to demonstrate the utility of a Metropolis-Hastings update on improving

MCMC mixing on the threshold parameters.

MATERIALS AND METHODS
Data
First pan'ty calving ease scores recorded on Italian Piemontese cattle from
January, 1989 to July, 1998 by ANABORAPI (Associazione Nazionale Allevatori Bovini
di Razza Piemontese, Strada Trinita 32a, 12061 Carru, Italy) were used in this study.

Only herds with at least 50 records over that time period were considered, leaving a total

45

of 23,953 records. Calving difﬁculty was coded into ﬁve categories by breeders and
recorded by technicians who visited the farmers monthly: 1) unassisted delivery, 2)
assisted easy calving 3) assisted difficult calving 4) caesarean section and 5) foetotomy.
Categories 4 and 5 were combined for analyses as the incidence of foetotomy was less
than 0.5%. The general frequencies of ﬁrst parity calving ease scores in the data set were
2,747 (11.47%) for unassisted delivery, 14,131 (58.99%) for assisted easy calving, 3,683
(15.38%) for assisted difﬁcult calving and 3,392 (14.16%) for caesarean section and
foetotomy.

The effects of age of dam and sex of calf and their interaction were modeled by
combining eight different age groups (20 to 23, 23 to 25, 25 to 27, 27 to 29, 29 to 31, 31
to 33, 33 to 35, and 35 to 38 months) with sex of calf for a total of 16 nominal subclasses.
Herd-year-season (HYS) subclasses were created from combinations of herd, year, and
two different seasons (from November to April and from May to October) as in Carnier

et al. (2000), except that HYS was treated as random in this study.

The total number of identiﬁed sires in the pedigree ﬁle was 9,090. A total of
1,817 of these Sires were identiﬁed as being both calf sires and calf maternal grandsires,
thereby indicating the density of genetic relationships accruing from selection over time.
After pedigree pruning (i.e. treating as unknown those sires that do not have daughters
with calving ease records and appear in the pedigree ﬁle only once), the number of sires

that uniquely contributed to inference on genetic parameters and breeding values was

determined to be 3,624.

46

Threshold Model
Calving ease scores are determined by unobserved underlying continuous

variables or liabilities and a set of ﬁxed thresholds, r, < 12.... < r

m-l 9

with To = —oo and
rm = 00, where m is the number of categories. An observed calving ease score is
dependent upon underlying variable, which is bounded between two unobserved
thresholds (Gianola and Foulley 1983). More speciﬁcally, calving ease scores y,~ for

individual i, in this study are deﬁned by the following bins for (1,, the underlying liability

for individual i:

‘

 

1 To < U ,. s 2',
_ 2 2', < U .- S 2', (1)
ﬁ-h Q<Q$g
k4 r3 < U, < r, i=1....n
A linear model is used to characterize the distribution of the unobserved liabilities:
U = n + e (2)

where U is a n x 1 vector of liabilities on calving difﬁculty, I] = E (U) is the vector of
mean liabilities, and e is a vector of liability residuals with e | of ~ N (0,10: ).

In addition to the speciﬁcations on To and t4 , identiﬁability constraints require
that one of I, , 12 , or 13 be set to an arbitrary constant; in this case, we set I, = 0 such

1',

that the vector of estimable thresholds is 1' =[ ] . An additional identiﬁability

T3
constraint was satisﬁed by setting of =1 (Gianola and Foulley 1983).

The mean liability is written as a linear combination of ﬁxed and random factors

as typical of a sire and maternal grandsire model,

47

n=Xﬂ+le+Z2m+Z3h (3)
where ll is a p x 1 vector of ﬁxed intercept and age-sex effects, s is a qa x 1 vector of

random sire effects, m is a qa x 1 vector of random maternal grandsire (MGS) effects, h
is a qh x 1 vector of random HYS effects, and X, Z1, Z2, and Z3 are known incidence

matrices. We assume:

[;]~N([g],c=com]

and
h ~ N(o,Ia,3)
02
where G() =[ “ ‘2'") , with of denoting the sire variance, of, denoting the maternal
0"?" 0.,"

grandsire variance, as," denoting the sire-maternal grandsire covariance, and a:

denoting the HYS variance. Furthermore, ® denotes the Kronecker product (Searle
1982), and A is the numerator additive relationship matrix between sires due to identiﬁed

male ancestors (Henderson 1976). Also, h is assumed independent of s and m.

MML Using the EM Approach
Variance components were ﬁrst estimated by MML using the EM-type algorithm
of Harville and Mee (1984). Applied to the problem at hand, the EM algorithm is based

on iterating through

2 §'A"§ +trace(A"(~3n)
0', = (4a)
qa

 

m'A"m+trace A"ém
qa

 

§'A"m + trace(A"(~3m)
0's". = (4c)
qa

 

and

 

a}, = (4d)

Writing 0' =[t' [1' s' m' h'] , then C“, Um," , CW and UN, represent the sire by

sire, MGS by MGS, sire by MGS and HYS by HYS portions, respectively, of

 

é _ 62 log p(O | 0.3,af,,am,a,f,y)
6960'

and 0' = [9' [3' s' a" h '] is the joint maximizer of the conditional posterior density
p(0 | of , of, , as," , a: ,y) , readily determined using scoring methods as presented by
Gianola and Foulley (1983). In our study, Newton Raphson was used to compute 0

based on Fortran90 program that implemented FSPAK90 (Misztal and Perez-Enciso
1998) due to the dimension and sparsity of 62.

At convergence, the resulting MML estimates are asymptotically considered to be
joint maximizers of p(af , 0'3, , as," , a; | y), the joint posterior density of the

variance/covariance components. Convergence was determined to occur when the
absolute difference in all MML estimates from one iterate to the next was less than 1045.
Further details and examples on the implementation of this MML technique invoking BM

in animal breeding can be found in Foulley et al. (1987) and Hoeschele et al. (1987).

49

Laplace's Method
Laplace's method was the second approximate MML technique applied to the
data. Details on implementing Laplace's method for variance component estimation in
generalized linear mixed models are provided in Tempelman and Gianola (1993) and in
Tempelman (1998). Applied to the problem at hand, Laplace's method is based on
maximizing the following approximate log marginal joint density of the (co)variance

components.

It
2 2 2 ~ I A v A o A r A _ I A l A I A I A
10g p (as 3 am 9 arm ’ ah I y) ~ 2 In ((D (Ty, -xiB-zlis-12im-z3ih) (D (TM-l -xip-zlis-12im-z3ih ))
i=l

A

[g' m']G" [f

m]_ﬂ1n(a,f)—:—l:——%ln|(~3|
0h

 

q
-41 G _.
2 n( o) 2 2

(5)
where n is the total number of observations. The MML estimates of the variance
components that jointly maximize (5) were determined using the simplex algorithm
(Nelder and Mead 1965), as previously implemented for derivative free REML (Meyer

1989). Convergence was assumed when the variance of the values of
log p(af ,0'3,,0'm,0',f ly) in the simplex was less than 10‘. In order to control the

probability of converging to a local maximum, the simplex algorithm was restarted once
from the ﬁrst set of converged estimates.

Smith and Graser (1986) presented a method for deriving asymptotic standard
errors of variance components for derivative-free REML that could be adopted in

Laplace's method, doing so jointly for all variance components. We determined this joint

50

assessment to be numerically unstable for our problem. Therefore, approximate standard
errors of the Laplace MML estimates were derived separately for each component by

holding each of the other components equal to their MML estimates in
log p(af ,03” amp; I y) and proﬁling log p(af ,0'3,,0'sm ,0: | y) over a grid of values

for the component of interest. Speciﬁcally, a second order polynomial was ﬁtted to this

surface in the following manner:
b0 +b10',2 +b2 (0',2 )2 = log p(o',2 | &3,,y) (6)
where 6'3, denotes the ML estimates of all variance components other than the

component of (r= s, m, sm and h) of current interest. The function in (6) was ﬁtted using

10 equally spaced values of of local to each MML estimate 6'3 such that the

—1
corresponding asymptotic standard errors was computed as (,/—2b2) .
Final MAP solutions to 0 were determined by jointly maximizing
p(0 | 0'52 , 0,3,, cm, 0'; ,y) via Newton-Raphson with the variance components set equal to

their approximate MML estimates (EM—based method and Laplace's method).

Asymptotic standard errors of the MAP estimate of s were determined as the square roots
of the diagonal elements of C“. evaluated at the MML estimates of the variance
components.

Formal hypothesis testing was carried out on each (co)variance component.

Likelihood ratio tests for am and for a: were based on computing the difference

between -2 log p(af , of, , as," , of | y) evaluated at the joint MML estimates with
-2 log p(af ,0'3, ,0”, , a": | y) maximized with respect all (co)variance components except

51

the parameter(s) of interest which were set to 0. The difference involving cm was
compared to a 1,2 whereas the difference involving 0': was compared to an equal
mixture of I: and 1,2 as required by Stram and Lee (1994). The likelihood ratio test for

testing Ho: of = 0 was based on comparing the difference between

—2 log p(af ,0"; ,0'_,,,,,0',f ly) evaluated at the joint MML estimates with

—2 log p(og2 = 0,03,,0'w, = 0,0,3 ly) maximized with respect to a“: and 0'; against a

distribution based on a equal mixture of 1,2 and 122 (Stram and Lee 1994). The

hypothesis test Ho: of, = 0 was similarly tested by determining the difference between

—2 log p(of , of, ,0”, , a"; | y) evaluated at the joint MML estimates with

—2 log p(ofmf, = 0,01,," = 0,0,3 ly) maximized with respect to a"; and of.

Markov Chain Monte Carlo (MCMC)

The same model was considered here as with the approximate MML/MAP
procedures previously outlined. The same MCMC procedures as outlined by Albert and
Chib (1993) and adapted to animal breeding by Sorensen et al. (1995) were applied here
with one important exception. The threshold parameters in 1' were sampled jointly using
the multivariate Metropolis-Hastings (MH) update presented by Cowles (1996). In
agreement with the simulation studies by Cowles (1996), the Cowles MH update
facilitated considerably faster mixing on the threshold parameters relative to sampling

from univariate full conditionals on 12 and t3 , as presented by Albert and Chib (1993).

This issue is not trivial since the slowest mixing parameters in previous MCMC threshold

52

model analyses are generally the threshold parameters (Sorensen et al. 1995; Varona et
al. 1999). Flat priors were invoked on the ﬁxed effects and on the variance components.
Three separate MC MC chains of 85,000 cycles were generated. Starting values
for parameters in chain 1 were based on Laplace estimates of variance components. In
chain 2, starting values were set equal to each Laplace variance component estimate plus
three times the corresponding asymptotic standard error whereas for chain 3, starting
values were set equal to each Laplace variance component estimate minus three times the
corresponding asymptotic standard error. Aﬁer discarding the ﬁrst 5,000 cycles within
each chain as "bum-in" (Gelman and Rubin, 1992), each set of the remaining 80,000
cycles within each chain were subsequently saved to determine the marginal posterior
density of each variance component and identiﬁable threshold parameters and to
determine the posterior means and standard deviations of each location parameter.
Generally, MCMC implementations are most problematic for variance component
and threshold parameters in terms of convergence and mixing (Sorensen et al. 1995).

Convergence diagnostics were assessed for each variance component and threshold

parameters by computing the potential scale reduction factor Ii of Gelman and Rubin
(1992) throughout bum-in. This diagnostic is based on an AN OVA technique that
computes the ratio of between chain variability to within chain variability, and which
should approach 1 at convergence. The effectiveness of MCMC mixing aﬁer bum-in was
determined by effective sample size (ESS) of the samples (Sorensen et al. 1995;
Hoeschele and Tier 1995) using the initial positive sequence estimator (Geyer 1992).

The ESS is an estimate on the information content of the MCMC samples in terms of an

equivalent number of independent samples.

53

Inference on Heritabilities and Genetic Correlations

o o o a o o o o u 2 . .
Point estimates and posterior densrtres of the additive direct vanance 0'0 , additive
maternal of, variances, and the corresponding direct-maternal covariance O'DM can be

derived from 0'3 , 0'3, and 0",," (Manfredi et al. 1991a,b; Matos et al. 1994; Luo et al.

1999), using
of, 4 0 0 0'3
00M = —2 4 0 arm (7)
a; 1—4 4 0';

The phenotypic variance (0,2,) is deﬁned by
Inferences on additive direct (11,3) and maternal (hf, ) heritabilities and the direct-

matemal genetic correlation (rDM) were further determined using the following

2 2
O C a 0 O- Q
relationshlps: hf, = —’2), hi, = —i and rDM = D“ . Note that WIth HYS treated as
0'

2 2 2
I’ 0P JUDGM

random and hence of included as part of of, , the heritabilities estimated in this study are

 

effectively reported as "across-herd" heritabilities rather than as "within-herd"

heritabilities as would be for the case when HYS are treated as ﬁxed.

RESULTS AND DISCUSSION
Assessment of MCMC Convergence and Mixing

Based on the raw trace plots of samples from 85,000 cycles and Gelman and
Rubin's Ii computed from the 3 chains, it was determined that 5,000 cycles was a

54

sufﬁciently long bum-in period for all dispersion and threshold parameters and within all

chains; i.e., the length of the burn-in period was enough to eliminate the effect of the

different starting values. The Ii values for the slowest mixing parameters (variance
components and threshold parameters) were all equal to 1.00 by the end of the bum-in
period.

The estimated ESS for each variance component and threshold parameter is given
in Table I and is based on a sum of separate determinations from each of the three
separate chains. The ESS for these parameters ranged from 1,705 to 6,953, indicating
suﬁicient MCMC mixing. Several animal breeders have recently suggested 100 as the
minimum ESS for reliable statistical inference (U imari et al. 1996; Bink et al. 1998). The

ESS of 3,211 and 3,110 for 1’2 and I, were found to be higher than that for the variance

components and considerably higher than what has been determined for threshold
parameters in comparable MCMC studies (V arona et al. 1999; Luo et al. 2001). Again,

improved mixing was most likely due to the Cowles MH update.

Variance Component Inference
Point estimates of variance components based on MML using EM, Laplace's
method and based on posterior means, modes and medians using MCMC are given in
Table H. The median has been suggested (Raftery and Lewis 1992) as a more robust
point estimator compared to the posterior mean. However, this study demonstrated no
appreciable differences between the posterior mean, mode, and median of parameters

since the posterior distributions of variance components and threshold parameters were

55

nearly symmetric. The point estimates from the three independent MCMC chains were,
as expected, highly repeatable (not shown) because of the individually large ESS.

In this study, the EM-MML method produced slightly lower point estimates of
variance components compared to the Laplace MML method, which in turn lead to
slightly lower estimates relative to any of the MCMC point estimates. These results are
in agreement with what was anticipated given the simulation study results of Tempelman
(1998). That is, genetic variance estimates based on the use of the EM-MML method
may be slightly biased downwards. Nevertheless, there appeared to be no signiﬁcant
difference in these estimates between all three methods in this study.

All likelihood ratio tests (test statistics not presented) based on Laplace's method

2

m 9

indicated that all components of variance and covariance (of ,0' 0's",,0':) were

statistically signiﬁcant (P<0.0001). Furthermore, the asymptotic standard errors of the
Laplace MML estimates closely corresponded to the posterior standard deviations
determined using MCMC, thereby suggesting that the joint posterior density of the
variance components was close to being multivariate normal. Further evidence of this is
indicated by the 95% posterior probability intervals being nearly symmetric about the
respective posterior means.

Table II also indicates that maternal genetic variability for calving difﬁculty is
appreciable in Piemontese cattle. However, there was no evidence to support a non-zero

additive-matemal covariance.

56

Heritabilities and Genetic Correlations

Inferences on direct and maternal heritabilities and the genetic correlation are
presented in Table H. Point estimates of the heritabilities and genetic correlation obtained
from the MCMC algorithm were similar across all chains and estimation methods
(posterior mean, mode and median). These results indicated that the posterior
distributions of genetic parameters were nearly symmetric. As with the variance
components, the EM-MML estimates were found to be slightly lower than the Laplace
MML estimates and the various MCMC point estimates.

Our heritability estimates were similar to ﬁndings of Varona et al. (1999) who
used a threshold animal model with direct and maternal effects, but they were
substantially higher than the threshold model estimates reported by Manfredi et al.
(1991a,b), McGuirk et al. (1998), McGuirk et al. (1999), Luo et al. (1999), and Bennett
and Gregory (2001). The genetic correlation did not appear to be statistically signiﬁcant;
nevertheless, the 95% posterior probability interval was extensive (i.e. from -0.29 to
0.25). In contrast, signiﬁcantly negative direct-matemal genetic correlations have been
reported using threshold models (Luo et al. 1999; Bennett and Gregory 2001) and linear
models (Carnier et al. 2000) for the analysis of calving ease.

Carnier et al. (2000) used a linear mixed model to analyze a data set from a source
virtually identical to that used in this study and reported lower heritability estimates and a
signiﬁcantly negative genetic correlation estimate compared to estimates reported in
Table II. In ﬁeld data studies of various categorical traits in animal breeding, Weller et
al. (1988), Olsen et al. (1994), Matos et al. (1997), Varona et al. (1999) also found that

linear model heritability estimates were smaller than threshold model heritability

57

estimates, as anticipated from theory (Dempster and Lerner 1950). Our results suggest
that selection of sires for calving ease of their progeny should not result in antagonistic
maternal effects over successive generations. The differences in results from Carnier et
a1. (2000) may be based on model speciﬁcation (i.e. threshold sire and maternal grandsire
versus linear animal model; ﬁxed versus random HYS effects) and substantial differences
in data editing. Nearly twice as many records were used by Carnier et al. (2000) whereas
the current study concentrated on herds having 50 or more records over the same time
period. To investigate the impact of the HYS speciﬁcation further, data were analyzed
with two linear mixed (sire-matemal grandsire) models, treating HYS as ﬁxed in one case
and as random in the other. Treating HYS as ﬁxed in a threshold mixed model was not
feasible due to the extreme category problem (Hoeschele and Tier 1995). A statistically
signiﬁcant genetic estimate of -0.72 was found with ﬁxed HYS effects whereas a
nonsigniﬁcant genetic correlation estimate of -0.05 was estimated when HYS were
treated as random. This implies that the direct-matemal genetic covariance may be
heterogeneous across different environments (i.e. different herds) and particularly
inﬂuenced by herd size since ﬁxed versus random HYS speciﬁcations should lead to

similar results for large herds.

Comparison of Inferences on Sire Effects
Empirical Bayes or MAP estimates of sire effects were computed by maximizing
the joint density of ﬁxed and random effects conditioned upon estimated variance
components by EM and Laplacian algorithms; the respective MAP sire effect estimates

are labeled MAP-EM and MAP-Laplace. Posterior means of solutions were determined

58

to be corresponding point estimates of sire effects using MCMC. The Pearson and rank
correlations between these estimates were computed and found to be greater than 0.99
between MAP-EM and MAP-Laplace, between MAP-EM and MCMC, and between
MAP-Laplace and MCMC; i.e. there were no substantial differences in ranking. A simple
linear regression involving estimated sire effects of MCMC on MAP-EM, MCMC on
MAP-Laplace and MAP-Laplace on MAP-EM indicating regression slope estimates only
slightly different from one, respectively, 1.06, 1.03 and 1.03. Wang et al. (1997)
illustrated a similar relationship in a simulation study whereby the regression coefficient
of MAP on MCMC-derived posterior means for direct genetic effects was slightly greater
than one, 1.08. They also determined the corresponding correlation to be high (>0.99).

Posterior standard errors of the sire effects can be used to derive approximate
accuracies of estimated genetic merits. These standard errors were determined as the
standard deviation of the samples from MCMC and from the square root of the diagonals
of C5,, based on both EM and Laplace estimates of variances. Simple linear regression
analyses of these standard errors based on MCMC versus MAP-EM, MCMC versus
MAP-Laplace and MAP-Laplace versus MAP-EM, resulted in intercepts not different
from 0 but estimated slopes of 1.09, 1.03, and 1.07, respectively, indicating that the
posterior standard errors were on average larger under MCMC than under MAP
estimates. This was anticipated as uncertainty on the variance components is accounted
for in fully marginal inference using MCMC but not so in MAP. In all cases, the

correlation between the estimated standard errors was near one.

59

CONCLUSIONS

In spite of the results of several smaller simulation studies, our work showed that
there were no appreciable differences between approximate MML/MAP versus MCMC
methods for inferences on genetic parameters and genetic merit on calving ease under a
threshold sire and maternal grandsire model and derived from a relatively large data set.
Furthermore, MCMC mixing was substantially improved relative to previously published
implementations due to the use of a joint MH update for the threshold parameters.
Maternal genetic variation was estimated to be an important source of variability
although the genetic correlation between direct and maternal effects was not signiﬁcant.
The impact of random versus ﬁxed HYS effects on the estimates of this genetic
correlation warrants further study.

Hence, there does not appear to be a pressing need to use MCMC methods for
national genetic evaluations of calving ease based on conventionally speciﬁed threshold
and maternal grandsire sire models. However, this may not be true for animal model

speciﬁcations.

6O

REFERENCES

Albert, J. H.; Chib, S., 1993: Bayesian analysis of binary and polychotomous response
data. J. Am. Stat. Assoc. 88: 669-679.

Bennett, G. L.; Gregory, K. E., 2001: Genetic (co)variances for calving difﬁculty score in
composite and parental populations of beef cattle: I. Calving difficulty score, birth
weight, weaning weight, and postweaning gain. J. Anim. Sci. 79: 45-51.

Berger, P. J ., 1994: Genetic prediction for calving ease in the United States: Data,
models, and use by the dairy industry. J. Dairy Sci. 77: 1146-1153.

Bink, M. C. A. M.; Quaas, R. L.; Van Arendonk, J. A. M., 1998: Bayesian estimation of
dispersion parameters with a reduced animal model including polygenic and QTL
effects. Genet. Sel. Evol. 30: 103-125.

Breslow, N.; Clayton, D., 1993: Approximate inference in generalized linear mixed
models. J. Am. Stat. Assoc. 88: 9-25.

Carnier, P.; Albera, A.; Dal Zotto, R.; Groen, A. F.; Bona, M.; Bittante, G., 2000: Genetic

parameters for direct and maternal calving performance over parities in Piemontese
cattle. J. Anim. Sci. 78: 2532-2539.

Cowles, M. K., 1996: Accelerating Monte Carlo Markov Chain convergence for
cumulative—link generalized linear models. Stat. and Comp. 6: 101-111.

Dempster, E.R.; Lerner, I. M., 1950: Heritability of threshold characters. Genetics 35:
212-236.

Foulley, J. L.; Irn, S.; Gianola, D.; Hoeschele, I., 1987: Emprical Bayes estimation of
parameter for n polygenic traits. Genet. Sel. Evol. 19: 197:224.

Gelman, A.; Rubin, D., 1992: Inference from iterative simulation using multiple
sequences. Stat. Sci. 15: 201-224.

Geyer, C. J ., 1992: Practical Markov chain Monte-Carlo (with discussion). Stat. Sci. 7:
467-51 1.

Gianola, D.; Foulley, J. L., 1983: Sire evaluation for ordered categorical data with a
threshold model. Genet. Sel. Evol. 15 :201-224.

Graser, H. U.; Smith, S. P.; Tier, B., 1987: A derivative-free approach for estimating

variance components in animal models by restricted maximum likelihood. J. Anim.
Sci. 64:1362-1370.

61

Harville, D. A.; Mee, R. W., 1984: A mixed-model procedure for analyzing ordered
categorical data. Biometrics 40: 393-408.

Henderson, C. R., 1976: Inverse of a matrix of relationships due to sires and maternal
grandsires in an inbred population. J. Dairy Sci. 59: 1585-1588.

Hoeschele, 1.; Gianola, D.; Foulley, J. L., 1987: Estimation of variance components with
quasi-continuous data using Bayesian methods. J. Anim. Breed. Genet. 104: 334-
349.

Hoeschele, 1.; Gianola, D., 1989: Bayesian versus Maximum Quasi Likelihood methods
for sire evaluation with categorical data. J. Dairty Sci. 72: 1569-1577.

Hoeschele, 1.; Tier, B., 1995: Estimation of variance components of threshold characters

by marginal posterior mode and means via Gibbs sampling. Genet. Sel. Evol. 27:
519-540.

Luo, M. F .; Boettcher, P. J .; Dekkers, J. C. M.; Schaeffer, L. R., 1999: Bayesian analysis
for estimation of genetic parameters of calving ease and stillbirth for Canadian
Holsteins. J. Dairy Sci. 82: 1848.

Luo, M. F.; Boettcher, P. J .; Schaeffer, L. R.; Dekkers, J. C. M., 2001: Bayesian
inference for categorical traits with an application to variance components
estimation. J. Dairy Sci. 84: 694-704.

Manfredi, E. J .; San Cristobal, M.; Foulley, J. L., 1991a: Some factor affecting the

estimation of genetic parameters for cattle dystocia under a threshold model. Anim.
Prod. 53: 151-156.

Manfredi, E. J .; Ducrocq, V.; Foulley, J. L., 1991b: Genetic analysis of dystocia in dairy
cattle. J. Dairy Sci. 74: 1715-1723.

Matos, C. A. P.; Thomas, D. L.; Young, L. D.; Gianola, D., 1994: Analysis of lamb
survival using linear and threshold models with maternal effects. 5'“ World

Congress of Genetics Applied to Livestock Production, Ontario, Canada. Vol. 18:
426-429.

Matos, C. A. P.; Thomas, D. L.; Gianola, D.; Tempelman, R. J .; Young, L. D., 1997:
Genetic analysis of discrete reproductive traits in sheep using linear and nonlinear
models. 1. Estimation of genetic parameters. J. Anim. Sci. 75 : 76-87.

McGuirk, B. J .; Going, I; Gilmour, A. R., 1998: The genetic evaluation of beef sires used

for crossing with dairy cows in the UK. 2. Genetic parameters and sire merit
predictions for calving survey trits. Animal Sci. 66: 47-54.

62

McGuirk, B. J .; Going, I; Gilmour, A. R., 1999: The genetic evaluation of UK Holstein
Friesian sires for calving ease and related traits. Animal Sci. 68: 413-422.

Meyer, K., 1989: Restricted maximum likelihood to estimate variance components for
animal models with several random effects using a derivative-free algorithm.
Genet. Sel. Evol. 21 : 317-340.

Misztal, I.; Perez-Enciso, M., 1998: FSPAK90 - A Fortran 90 interface to sparse-matrix
package FSPAK with dynamic memory allocation and sparse matrix structure.
Proc. 6th World Cong. Gen. Appl. Livest. Prod. V0127: 467-468.

Nelder J. A.; Mead, R., 1965: Computer Journal. 7: 308-313.

Olesen, 1.; Perez-Enciso, M.; Gianola, D.; Thomas, D. L., 1994: A comparison of normal
and nonnormal mixed models for number of lambs born in Norwegian sheep. J.
Anim. Sci. 72: 1166-1173.

Raﬁery, A. E.; Lewis, S. M., 1992: Pages 765-776 in Bayesian Statistics 4. J. M.
Bernardo, J. O. Berger, A. P. David, and A. F. M. Smith, ed. Oxford Univ. Press,
Oxford, UK.

Ramirez-Valverde, R.; Misztal, 1.; Bertrand, J. K., 2001: Comparison of threshold vs
linear and animal vs sire models for predicting direct and maternal genetic effects
on calving difﬁculty in beef cattle. J. Anim. Sci. 79: 333-338.

Searle, S. R., 1982: Matrix Algebra Useful for Statistics. John Wiley & Sons, New York.

Simianer, H.; Schaeffer, L. R., 1989: Estimation of covariance components between one
continuous and one binary trait. Genet. Sel. Evol. 21: 303-315.

Smith, S. P.; Graser, H. U., 1986: Estimating variance components in a class of mixed
models by restricted maximum likelihood. J. Dairy Sci. 69: 1156-1165.

Sorensen, D. A.; Andersen, S.; Gianola, D.; Korsgaard, 1., 1995: Bayesian inference in
threshold models using Gibbs sampling. Genet. Sel. Evol. 27: 229-249.

Stiratelli, R.; Laird, N.; Ware, J. H., 1984: Random-effects models for serial observations
with binary response. Biometrics. 40: 961-971.

Strarn, D. 0.; Lee, J. W., 1994: Variance components testing in the longitudinal mixed
effects model. Biometrics. 50: 1171-1177.

Tempelman, R, J .; Gianola D., 1993: Marginal maximum likelihood estimation of

variance components in Poisson mixed models using Laplacian integration. Genet.
Sel. Evol. 25: 305-319.

63

Tempelman, R. J ., 1998: Generalized linear mixed models in dairy breeding. J. Dairy Sci.
81: 1428-1444.

Uimari, P.; Thaller, G.; Hoeschele, 1., 1996: The use of multiple markers in a Bayesian
method for mapping quantitative trait loci. Genetics. 143: 1831-1842.

Varona L.; Misztal, 1.; Bertrand, J. K., 1999: Threshold-linear versus linear-linear
analysis of birth weight and calving ease using an animal model: I. Variance
component estimation. J. Anim. Sci. 77: 1994-2002.

Wang, C. S.; Quaas, R. L.; Pollak, E. J ., 1997: Bayesian analysis of calving ease score
and birth weights. Genet. Sel. Evol. 29: 117-143.

Weller, J. 1.; Misztal, I; Gianola, D., 1988: Genetic analysis of dystocia and calf mortality
in Israeli-Holsteins by threshold and linear models. J. Dairy Sci. 71: 2491-2501.

Table 1. Effective sample sizes for (co)variance components and threshold parameters
using MCMC for ﬁrst parity calving ease scores in Piemontese cattle.

 

 

 

PARAMETER
0:2 0'; 0m 0': 72 1'3
ESSa 1,799 1,705 1,799 6,953 3,211 3,110

 

“The combined effective sample sizes based on the sum of determinations from 3
independent MCMC chains post bum-in, each based on 80,000 samples.

65

Table II. Estimation of variance-covariance components and genetic parameters for
calving difﬁculty in Piemontese cattle usirg EM, Laplacian, and MCMC algorithms.

 

 

 

Variance EMa Laplaceb MCMC

component PMc PMDd PMEc PPIf

0'32 0.083 0.091i‘0.01 1 0.095i'0.014 0.093 0.095 0.070-0. 123
0'3, 0.055 0.060i0.008 0.064i0.010 0.062 0.062 0.046—0.084
0's," 0.041 0.045i0.007 0.045i0.008 0.044 0.044 0.029-0.061
0'; 0.185 0.208i0.013 0.217i0.013 0.217 0.217 0.192-0.243
0'12, 0.331 0.363 038010.055 0.373 0.373 0.278-0.493
0;, 0.140 0.151 0.170i0.037 0.163 0.163 0106-0252
O’DM -0.002 -0.002 -0.011i0.035 -0.01 1 -0.011 -0.084—0.055
hf) 0.236 0.251 0.259i0.0348 0.253 0.258 0.194-0.330
hi4 0.100 0.105 0.116i0.0256 0.112 0.114 0.072-0.173
rDM -0.008 -0.009 -0.034i0.136 -0.051 -0.037 -0.292-0.245

 

alJoint MML estimates using EM algorithm.

l’Joint MML estimates using Laplace algorithm :t asymptotic standard error.

°Posterior mean i posterior standard deviation.
dMarginal posterior mode.
eMarginal posterior median.

f95% posterior probability interval.

66

CHAPTER 3

An Assessment of Cumulative t-Link Threshold Models for the

g Quantitative Genetic Analysis of Calving Ease Scores

ABSTRACT

The Student-t and other heavy-tailed distributions have been speciﬁed for
residuals to facilitate robust statistical inference in linear mixed models. In this study, we
develop a hierarchical threshold mixed model based on a cumulative t-link speciﬁcation
for the analysis of ordinal data, speciﬁcally, calving ease scores. The validation of our
model and our Markov Chain Monte Carlo (MCMC) algorithm was carried out on
simulated data from normally and t4 (i.e. a t distribution with 4 degrees of freedom)
distributed populations using the deviance information criterion (DIC) and a related
measure to validate recently proposed model choice criteria. The simulation study
indicated that although inference on the degrees of freedom parameter is possible,
MCMC mixing was problematic. Nevertheless, the DIC was shown to be a satisfactory
measure of model ﬁt to data. We applied a sire maternal and grandsire cumulative t-link
model to a calving ease dataset from 8,847 Italian Piemontese ﬁrst parity dams. The
cumulative :4 -link model was shown to lead to posterior means of direct and maternal
heritabilities (0.40:t0.06, 0.11i0.04) and a direct maternal genetic correlation (-
0.58i0.15) that were not different from the corresponding posterior means of the
heritabilities (0.42:1:0.07, 0.14i0.04) and the genetic correlation (-0.55:t0.14) inferred
under the conventional cumulative probit link threshold model. Furthermore, the

correlation (>.99) between posterior means of sire progeny merit from the two models

67

suggested no meaningful rerankings. Nevertheless, the cumulative t-link model was

decisively chosen as the better ﬁtting model using DIC.

INTRODUCTION

Data quality is an increasingly important issue for genetic evaluation of livestock,
both from a national and international perspective (Emanuelson et al. [13]). Breed
associations and government agencies typically invoke arbitrary data quality control edits
in order to minimize the impact of recording error, preferential treatment and/or
injury/disease on predicted breeding values (Bertrand and Wiggans [4]) in the belief that
the data residuals should be normally distributed.

It has been recently demonstrated that the speciﬁcation of Student t distributed
residuals in linear mixed models may effectively mute the impact of residual outliers,
particularly in situations where preferential treatment of some breedstock may be
anticipated (Stranden and Gianola [3 8]). Based on the work of Lange et a1. [22] and
others, Stranden and Gianola [3 9] developed the corresponding hierarchical Bayesian
models for animal breeding, using Markov Chain Monte Carlo (MCMC) methods for
inference. In their models, residuals were speciﬁed as either having independent
(univariate) t distributions or multivariate t distributions within herd clusters. Outside of
possibly longitudinal studies, the multivariate speciﬁcation is of dubious merit ([32],
[38], [39]) such that all of our subsequent discussion pertains to the univariate
speciﬁcation only.

A Student t distribution can be represented as a scale mixture of normals with

Gamma distributed scaling factors (Lange et al. [22]). Alternative speciﬁcations of scale

mixture of normals, all of which lead to heavy-tailed residual densities relative to the
normal, are further considered by Carlin and Polson [6] and in an animal breeding
context by Rosa [32]. Using Bayes factors approximations, Rosa [32] noted that these
heavier tailed models provided better ﬁts to a dataset on birth weights in rats than a
model based on a Gaussian error distribution. More recent applications include linkage
mapping of quantitative trait loci (von Rohr and Hoeschele [44]).

Auxiliary traits such as calving ease or milking speed are often subjectively
scored on an ordinal scale. It might then be anticipated that data quality would be an
issue of greater concern in these traits than more objectively measured production
characters, particularly since record keeping is generally unsupervised, being the
responsibility of the attending herdsperson. Luo et al. [23] has suggested that a decline in
the diligence of data recording was partially responsible for their lower heritability
estimates of calving ease relative to earlier estimates from the same Canadian Holstein
population.

The cumulative probit link (CP) generalized linear mixed model, otherwise called
the threshold model, is currently the most commonly used genetic evaluation model for
calving ease ([3, 46]). MCMC methods are particularly well suited to this model since the
augmentation of the joint posterior density with normally distributed underlying or latent
liability variables facilitate implementations very similar to those developed for linear
mixed effects models ([1, 36]). A cumulative I link (CT) model has been proposed by
Albert and Chib [l] for the analysis of ordinal categorical data, thereby providing greater

modeling ﬂexibility relative to the CP model. The CT model can be created by simply

69

augmenting the joint posterior density with t-distributed rather than normally distributed
underlying liability variables based on a Gamma scale mixture of normals.

The objectives of this study were to validate MCMC inference of the CT
generalized linear mixed (sire) model via a simulation study and to compare the ﬁt of this
model with the CP model for the quantitative genetic analysis of calving ease scores in

Italian Piemontese cattle.

MODEL CONSTRUCTION

Suppose that elements of the n x 1 data vector Y = {Y,.} , can take values in any

one of C mutually exclusive ordered categories. The classical CP model for ordinal data

(Gianola and Foulley [17]) can be written as follows:

Prob(Y,. ___ j | [3,“,1') = (D[Tj “(xiP+zi“)]_¢[Tj-I ’(xiP+zi“)] , (1)

 

 

0' 0'

where j = 1,2,. . .,C denotes the index for categories. Also <1>(.) denotes the standard

normal cumulative distribution ﬁmction, B and u are the vectors of unknown ﬁxed and

random effects, and 1" = [TO I, 21.] is a vector of unknown threshold parameters
satisfying 1, < 2'2... < Q with To = -oo and rc = +00. Furthermore, x, and z, are known

incidence row vectors. Latent liability variables (L = {L,.};,) can be introduced to

alternatively deﬁne the same speciﬁcation as in (1) but in two hierarchical stages:

,.
Prob(Y,=j|L,,r)=Zl(r,_, <L,<r,)1(Y,.=j) (2a)

j-I

L, | p,u,af ~ N(x',.s + z',u,a§) (2b)

70

for i = 1,2,. . .,n. Here 1(.) denotes an indicator function, which is equal to 1 when the
expression in the ﬁmction is true and is equal to 0 otherwise. As shown by Albert and
Chib [1] and in an animal breeding context by Sorensen et al. [36]; this introduction of L
facilitates a tractable MCMC implementation.

The CT model is a generalization of (1), that is,

Prob(Y,. =1, 3,1-33.03): F‘itj -(x.n+z.u)]_R [Tr—I -(x.-s+z.u)] (3)

 

 

(7' (3'

e C

for j =1 ,2,. . .,C where F v represents the cumulative density function of a standard Student
t-distribution with degrees of freedom v. Note that as v —) oo, (3) -—) (1) such that the

standard CP model is simply a special case of the CT model. Like the CP model, the CT

model can also be represented as a two-stage speciﬁcation, with the ﬁrst stage as in (2a)

-%( v+l)

but the second stage speciﬁed as:
. . 2
(L,-(x,|3 +z,u))

p(ﬂ)
2 1+ 2 , (4)

F(%)F[%)(va§ )3 v0.

i.e., L,- is Student t-distributed with location parameter p, = x,[3 + z',u , scale parameter

 

p(L. Ir,u.af,v)=

0'} >0 and degrees of freedom v>0 for i=1,2,. . .,n. In turn, equation (4) can be

represented by a two-stage scale mixture of normals:

2
L,- mumm— ~ NW ”Leif—J (5a)
L exp {-9g-v} (5b)

71

Note that (5b) speciﬁes a Gamma density with parameters v/2 and v/2, thereby having an
expectation of 1. The remaining stages of our hierarchical model are characteristic of

animal breeding models. We write

B ~ 1303) (6)
where p(ﬂ) is a subjective prior, typically speciﬁed to be ﬂat or vaguely informative.
F urthennore, the random effects are typically characterized by a structural multivariate
prior speciﬁcation:

ultp ~ p(ul¢) ~ N(0,G(¢)) (7)
Here G(qr) is a variance-covariance matrix that is a function of several unknown
variance components or variance-covariance matrices in q) , depending on whether or not

there are multiple sets of random effects and/or speciﬁed covariances between these sets;
an example of the latter is the covariance between additive and maternal genetic effects.
Furthermore, ﬂat priors, inverted Gamma densities, inverted Wishart densities or
products thereof may be speciﬁed for the prior density p(tp) on q) , depending, again, on
the number of sets of random effects and whether there are any covariances thereof
(Jensen et al. [20]).

Finally, a prior is required for the degrees of freedom parameter v. We use the

prior

p(v) cc 6+1_v)'2' (8)

which is consistent with a Uniform(0,1) prior on l/(l+v).

As with CP models, there are identiﬁability issues involving elements of ‘t with

0'} such that constraints are necessary. Typically, one element of 1', , 2'2 ,..., To, is restricted

72

to an arbitrary value (e.g. 2', =0) along with 0'3 = 1. This is the parameterization we
chose such that inference on 0'3 is not subsequently considered in this paper.

Presuming that the elements of Y are conditionally independent given [1 and u, we

can write the joint posterior density of all unknown parameters and latent variables (L) as

follows:
p(ﬂ,u,r,¢,v,L,l l y)

0C(ljpmbozy''L'")1"(Li'1"“"’3”'i)1’(4lV)]P(Ii)rv(ul(101200.00) (9)

n

i=1 '

where 1. = {1,}
A MCMC inference strategy involves determining and generating random
variables from the full conditional densities (FCD's) of each parameter or blocks thereof.
Many of the FCD's can be directly derived using results from Sorensen et al. [36] jointly
with results from Stranden and Gianola [39]. Let 0= [6' u']'. It can be readily shown that

the FCD of Q is multivariate normal:

0|y,v,1.,L~N(0,W'R"W+X‘) (10)
where R'l -diag{2 }" isadiagonal matrix 2’ -[0Pxp 0M J W-[X Z] for
- i ,3 9 — -l 9 -
' 001p (9(a))
X=[xl x2 x,,]', Z=[z, 22 z,,]',and
é=['}]=[w'R"W+2']"w'R"L (11)
u

Generation of individual elements 0,, j=1,2,. . .,p+q or blocks thereof of 0 from their

respective FCD is straightforward using the strategy presented by Wang et al. [45].

73

The FCD of individual elements of L and T are straightforward to generate from
using results from Sorensen et al. [36]. We, however, prefer the Metropolis-Hastings and
method of composition joint update of L and I presented by Cowles [11]. She
demonstrated and we have further noted in our previous applications (Kizilkaya et al.
[21]) that the resulting MCMC mixing properties using this joint update are vastly
superior to using separate Gibbs updates on individual elements of L and ‘t as outlined

by Sorensen et al. [36].

If some partitions of (p form a variance-covariance matrix, then their respective
FCD can be readily shown to be inverted-Wishart (Jensen et al. [20]) whereas if other
partitions of (p involve scalar variance but no covariance components, then the FCD of
each component can be shown to be inverted-gamma.

The F CD of 7t, can be shown to be

v+l

— -|
p(/l,. | 1-, ,L,0, v) 0: kg 2 J x exp[--:—'((L,-x,ﬂ-z,u)2 + v)] (12)
that is, the kernel in (12) speciﬁes that distribution to be
v +1 1 . . 2
Gamma(——2—,§(v+(L,-x,ﬂ-z,u) )J. Here L,- denotes all elements of 1. = {1,} except

for 1,, i =1,2,....n.

Finally, the FCD of v can be shown to be

I!

o“
p(vlﬂ,u,L,).)oc —2-— [II/1,7] exp(--:-/1,D(—l—— (13)

I7(V/2) 121 1+ v)2

74

given the speciﬁcation for p(v) in (8). Equation (13) is not a recognizable density such
that a Metropolis-Hastings update is required. We utilized a random walk
implementation (Chib and Greenberg [10]) of Metropolis-Hastings sampling;
speciﬁcally, a normal density with expectation equal to the parameter value from the

previous MCMC cycle was used as the proposal density for drawing from the FCD of

x = log (v) , using equation (13) and the necessary J acobian for this transformation. The

Metropolis-Hastings acceptance ratio was tuned to intermediate rates (40-50%) during
the MCMC burn-in period to optimize MCMC mixing (Chib and Greenberg [10]). Since

the variance of a t-density is not deﬁned for v S 2, we truncate sampled from (13) such

that v > 2, or equivalently K > log(2).

MODEL COMPARISON

Model choice is an important issue that has not received considerable attention in
animal breeding until only very recently ([19, 31]). Likelihood ratio tests have been used
to compare differences in ﬁt between various models and their reduced subsets; however,
these tests do not facilitate more general model comparisons. The Bayes factor has a
strong theoretical justiﬁcation as a general model choice criterion; however algorithms
for Bayes factors computations are either computationally intensive (e.g. Chib [9]) or
numerically unstable (Newton and Raftery [30]). Furthermore, as Gelfand and Ghosh
[15] indicate, Bayes factors lack clear interpretation in the case of improper priors which
are particularly frequent speciﬁcations in animal breeding hierarchical models. Akaike's
information criterion or Schwarz's Bayesian criterion are analytical measures that provide

an asymptotic representation of Bayes factors and reﬂect a compromise between

75

goodness of ﬁt and number of parameters. As the total number of parameters and latent
variables often exceeds the number of observations in an animal breeding (e.g. animal
model) analysis, the effective number of parameters in hierarchical models is not always
so obvious. The MCMC sample average of the posterior log likelihoods, or data
sampling log densities, may be used as a means for comparing different models
(Dempster [12]); however, as Speigelhalter et al. [37] indicate, it is not always so obvious
how to proceed when these densities are similar but the number of parameters and/or the
numbers of hierarchical stages of the candidate models vary. Speigelhalter et al. [37]
proposed the deviance information criterion (DIC) for comparing alternative
constructions of hierarchical models. The DIC is based on the posterior distribution of
the deviance statistic, which is -2 times the sampling distribution of the data as speciﬁed
in the ﬁrst stage of a hierarchical model. It may not be so obvious, however, what exactly
the data sampling stage is in a hierarchical model. For example, the data sampling stage
for the CT model may be speciﬁed in one way as

r, —(x,.[l +z,u) z',_, —(x,B + z,u)

Prob(Y,. =j||3,u,1:,2,,0'3)=<1> 0' -(D 0 (14)

e e

a a

given the speciﬁcations of (2a) and (5a) or it may be speciﬁed more marginally using (3).

 

 

 

 

We prefer a more marginalized or heavier-tailed ﬁrst stage speciﬁcation such as (3) for
CT and (l) for CP, potentially leading to a more stable implementation with justiﬁcation
provided by Satagopan et al. [34] but with their context being the stabilization of the

Bayes factor estimator of Newton and Raftery [30].

76

The DIC is computed as the sum of average Bayesian deviance (5) plus the
"effective number of parameters" (pp) with respect to a model, such that smaller DIC
values indicate better ﬁt to the data. Let G denote the number of cycles after

convergence in a MCMC chain. Furthermore, we represent all unknown parameters in the

marginalized ﬁrst stage speciﬁcation by 3 = (B,u,r, v) with 3 excluding v=oo in the CP

model. Then, for the CT model, then the average Bayesian deviance can be estimated
using (3) by

If

5 = -2[iZIog(Pr°b(Y =% | B'“~""J"'ﬂ"")))

g=l 121

where the superscript [g] denotes the MCMC cycle g, g = 1,2,. . .,G for the sampled value
of the corresponding parameter. Furthermore, p0 can be estimated as p,, = D — 0(8)

where

D(§) = —2[:log Prob(Y = y, IB,ﬁ,¥,lI-)).

i=1

Here the bar notation (e. g. 8) denotes the corresponding posterior mean vector.
We alternatively considered the conditional predictive ordinate (CPO) as the basis

for model choice (Gelfand [14]). Deﬁned for observation 1', we write the CPO as

(i

7?(y,. |y_,.,M, ) = léZ(Prob(Y = y, I p181,u[gl,.rlg]))-| ]-

g-l
using (1) for the CP model (Model MI) and

G

-l
72(y, |y_,,M,)z[—éZ(Prob(Y = y, IBlgl,ulxl,11gl,lel))-l]

s=l

using (3) for the CT model (Model M2). Here y-i denotes all observations other than y,-.
The log marginal likelihood (LML) of the data for a certain model, say Mk, can then be

estimated as:

LML = L(y | M.,) = ﬁled/Hy. IL. M.»

i=1

Larger values of LML indicate better model ﬁt to the data.

DATA
Simulation Study
A simulation study was used to validate the CT model and the utility of the DIC
and the LML for model choice between CP and CT. Three replicated datasets were
generated from each of two different populations as characterized by the distribution of
the liability residuals. Population I had a residual density that was standard normal

whereas Population II had a residual density that was standard Student-t distributed with

scale parameter 03 = 1 and degrees of freedom ve = 4. All datasets were generated based

on a simple random effects (sire) model with null mean. Liability data for 50 progeny

from each of 50 unrelated sires was generated by summing independently drawn sire
effects from N(0, 0',2 = 0.10) with independently drawn residuals from N(0, 0'3 = 1.00)

for a total of 2500 records. These underlying liabilities were mapped to ordinal data with

four categories based on the threshold parameter values of r, = —0.50, 1', =1.00, and
1, =2.00 for all populations. Ordinal data from each replicated dataset was analyzed using

both CP and CT sire models. For purposes of parameter identiﬁability, we invoked the

restrictions 0'} = 1 and r, = —0.50. As a positive control, the underlying liability data for

78

each replicate was analyzed using both normal and t distributed error mixed linear
models. For the t distributed error model, the MCMC procedure adapted was similar to
that presented in Stranden and Gianola [39], except that the degrees of freedom parameter
(v>2) was inferred as a continuous (rather than discrete) parameter, using the Metropolis-
Hastings update as presented earlier. Graphical inspection of the chains based on
preliminary analyses was used to determine a common length of bum-in period. For each
replicated data set within each population, a burn-in period of 20,000 cycles was
discarded before saving samples from each of an additional 100,000 MCMC cycles.
Furthermore, DIC and LML values were computed for each model on each replicated
dataset to validate those measures as model choice criteria. In all cases, ﬂat priors were

invoked on the variance components.

Italian Piemontese Calving Ease Data

First parity calving ease scores recorded on Italian Piemontese cattle from
January, 1989 to July, 1998 by ANABORAPI (Associazione Nazionale Allevatori Bovini
di Razza Piemontese, Strada Trinita 32a, 12061 Cam‘r, Italy) were used for this study. In
order to limit computing demands, only herds that were represented by at least 100
records over that nine-year period were considered for the demonstration of the proposed
methods in this paper, leaving a total of 8,847 records. Calving ease was coded into ﬁve
categories by breeders and subsequently recorded by technicians who visited the breeders
monthly. The ﬁve ordered categories are: 1) unassisted delivery, 2) assisted easy calving
3) assisted difﬁcult calving 4) caesarean section and 5) foetotomy. As the incidence of

foetotomy was less than 0.5%, the last two ordinal categories were combined, leaving a

79

total of four mutually exclusive categories. The general frequencies of ﬁrst parity calving
ease scores in the data set were 951 (10.75%) for unassisted delivery; 5,514 (62.32%) for
assisted easy calving; 1,316 (14.88%) for assisted difﬁcult calving; and 1,066 (12.05%)
for caesarean section and foetotomy.

The effects of dam age, sex of the calf, and their interaction were considered by
combining eight different age groups (20 to 23, 23 to 25, 25 to 27, 27 to 29, 29 to 31, 31
to 33, 33 to 35, and 35 to 38 months) with sex of calf for a total of 16 nominal subclasses.
Herd-year-season (HYS) subclasses were created from combinations of herd, year, and
two different seasons (from November to April and from May to October) as in Carnier
et al. [7] who also analyzed calving ease data from this same population. The sire
pedigree ﬁle was further pruned by striking out identiﬁcations of sires having no
daughters with calving ease records and appearing only once as either a sire or a maternal
grandsire of a sire having daughters with records in the data ﬁle. Pruning results in no
loss of pedigree information on parameter estimation yet is effective in reducing the
number of random effects and hence computing demands. The number of sires remaining
in the pedigree ﬁle aﬁer pruning was 1,929. Therefore, as in Kizilkaya et al. [21], the CP
and CT models used for the analysis of calving ease data included the ﬁxed effects of age

of dam classiﬁcations, sex of calf and their interaction in B , the random effects of

independent herd-year-season effects in h, random sire effects in s and random maternal

grandsire effects in m. We assume:

qumwm)

and

80

h~N(0,ra,f)

0' 0" o n a n 2 o
where G0 =[ " é" , wrth 0'3 denoting the srre varrance, 0',,, denoting the maternal
0.57” am

. . . . . . 2
grandsrre varrance, 0',,,, denoting the srre-matemal grandsrre covariance, and 0,,

denoting the HYS variance. Furthermore, ® denotes the Kronecker (direct) product
(Searle [35]), and A is the numerator additive relationship matrix between sires due to
identiﬁed male ancestors (Henderson [18]). Also, h is assumed independent of s and m.
Flat unbounded priors were placed on all ﬁxed effects and variance components. Based
on preliminary analyses of the data and results from the simulation study, v was not
inferred upon but held constant to v = 4 in the CT model. MCMC inference was based on
the running of three different chains for each model. For each chain in the CP model, a
total of 5,000 cycles of burn-in period followed by saving samples from each of 100,000
additional cycles was executed based on the experiences of Kizilkaya et al. [21].
Because of initially anticipated slower mixing, the corresponding burn-in period for each
chain in the CT model was 10,000 cycles followed by saving each of 200,000 additional
cycles. To facilitate diagnosis of MCMC convergence by 5,000 cycles, the starting
values on variance components for each chain within a model were widely discrepant,
with one chain starting at the posterior mean of all (co)variance components based on a
preliminary analysis, another chain starting at the posterior mean minus 3 posterior
standard deviations for each (co)variance component and the ﬁnal chain starting at the
posterior mean plus 3 posterior standard deviations for each (co) variance component.
For both the simulation study and the calving ease data analysis, the effective

number of independent samples (ESS) for each parameter was determined using the

81

initial positive sequence estimator of Geyer [16] as adapted by Sorensen et al. [3 6].

Furthermore, key genetic parameters, speciﬁcally direct heritability (11,21 ), maternal

heritability (hi) and the direct-matemal genetic correlation (rdm) were inferred upon in

the calving ease data using the functions of Go as presented by Kizilkaya et al. [21] and
Luo et al. [23], for example. The only difference in the computation of heritabilities

between the CP and the CT model was that the marginal residual variance for the
underlying liabilities is not 03 in CT, as it is in CP, but is equal to __v_§0_3 (Stranden
v .—

and Gianola [39]). Posterior means and standard deviation of elements of s were also

compared between the CP and the CT model.

RESULTS
Simulation Study

Table I summarizes inference on v based on the replicated datasets from the two
populations, comparing the CP versus CT models for the analysis of ordinal categorical
data and comparing the Gaussian linear mixed model versus the t-error linear mixed
model for the analysis of the matched latent or underlying normal liabilities, as if they
were directly observed. The inference on v was surprisingly sharp and seemingly
unbiased for the t-error mixed model analysis of liability data from Population I (v=4),
with 95% equal-tailed posterior probability intervals (PPI) not exceeding 1.5 in width;
furthermore, the corresponding ESS were relatively large indicating stable MCMC
inference. Conversely, inference on v based on the t-error mixed model analysis of

liability data from Population II (v = 00) indicated extremely wide 95% PPI and posterior

82

means exceeding 100, correctly indicating stronger evidence of Gaussian distributed
versus t-distributed residuals for data from that population. Furthermore, ESS were
generally very small (~ 20) indicating that inference on v was rather unreliable for data
from Population II, at least given the speciﬁed MCMC sampling scheme. That is, ﬁve
times as many MCMC samples would be needed to attain a minimum ESS of 100 as
advocated by previous investigators ([5, 41]).

Inference on v in ordinal data under the CT model was also interesting. In
Population I, the 95% PPI correctly concentrated on low values for v although the PPI
were understandably wider than for the corresponding analyses of liability data under I-
error linear mixed models. Also, the ESS on v were considerably smaller (<25) for the
CT model analysis of ordinal data than for the corresponding matched linear model
analyses of liability data, such that acceptably accurate inference on v would require
substantially more sampling. In replicated ordinal data from Population II, the 95% PPI
on v were wide and concentrated on high values of v, consistent with what was expected.
Furthermore, as with the t-error mixed model analysis of liability data, MCMC mixing on
v using the CT model on ordinal data was seen to be particularly problematic in

Population II as manifested by the small ESS.
Table II summarizes inference on 0,2 based on the replicated datasets from the

two populations, comparing the CP versus CT models for the analysis of ordinal
categorical data and comparing the Gaussian linear mixed model versus the t-error linear
mixed model for the analysis of the underlying liabilities. In the analyses of liability data

from replicated datasets from both populations, the 95% PPI were in good agreement

with 03 =0.10; furthermore, very large ESS indicating very good MCMC mixing.

83

MCMC mixing on 0'? was understandably slower in the analysis of ordinal categorical
data, particularly in replicated data from Population 11 using CT due to the generally high
posterior sampling correlation between 0': and v.

Because of the problem of MCMC mixing of v in ordinal data, the MCMC chains
were rerun with v held constant (v = 4) for the DIC and LML comparisons between
models. In Table III, the DIC and the LML are given for each replicated dataset within
each population for the CP versus the CT model analyses of ordinal data and for the
Gaussian linear mixed model versus the t-error linear mixed model analyses of liability
data. Speigelhalter et al. [3 7] suggested that a DIC difference exceeding 7 to be a
substantial indication of an important difference in model ﬁt. Given that, the model
choices based on DIC for the linear mixed model analyses of liability data were
resoundingly in favor of the correct model. However, in the comparison between CP and
CT models for ordinal data analysis, the correct (CT) model was decisively chosen in
only one of the replicates of Population I (v = 4) whereas the correct (CP) model was
decisively chosen in only two of the replicates of Population 11 (v = 00), the other
comparisons being somewhat indecisive (i.e. DIC differences < 7). This may not be too
surprising given the information content of ordinal data relative to underlying continuous
liability data. Comparisons based on LML (larger is better) lead to similar conclusions as

those based on DIC (smaller is better).

Application to Calving Ease Scores in Italian Piemontese Cattle
Genetic Parameter Inference

Sire and maternal grandsire CP and CT models were used for the analysis of
calving ease scores in Italian Piemontese cattle. Because of the MCMC mixing problems
encountered in inferring upon v, this parameter was held constant to v =4. Posterior
inferences on key genetic parameters are summarized in Table IV and are based on the
combined results from each of the three separate chains. The posterior mean, median and
modal estimates (not shown) of the two heritabilities, and the genetic correlation using
the MCMC algorithms were similar to each other within both models, implying that the
posterior densities were symmetric and unimodal; this was further manifested by the fact
that the 95% PPI are closely matched by the posterior mean :1: 2 standard deviations. In
this study the total ESS for dispersion parameters across the three chains ranged from
1,420 to 9,305, indicating sufficient MCMC mixing under both models. Table IV shows
that the ESS from the CT model were found to be almost double those of the CP model,
attributable to the twice as large post-bum-in period for CT model. Considering v as
known also improves mixing of, particularly, genetic parameters, in the CT model
relative to joint inference with v (results not shown). Although the n x l auxiliary variable
vector). is included in the CT model, this augmentation does not appear to adversely

impact ESS and hence mixing of key genetic parameters relative to the CP model.

In this study, the CT model produced posterior means of genetic variance

components that were nearly twice as large as those estimated using the CP model.

Furthermore, the marginal residual variance is _v__ 0,,2 in the CT model such that
v .—

seemingly twice as much residual variance is inferred under a CT model (with v = 4) than

85

under a CP model (with v = 00). Although these results might at ﬁrst sight imply that
greater genetic and residual variation is directly captured by the CT model, it should be
realized that the variance of underlying variables L is only deﬁned proportionately to the
marginal residual variance. This is further apparent in that the 95% PPI of heritabilities

were only very slightly concentrated towards lower values in the CT model relative to the
CP model such that the corresponding 95% PPI for both h; and h; overlapped

substantially between the two models. Furthermore, the posterior density of rd," was very

similar between CT and CP, with most of the density being between -0.2 and -0.8.

In order to compare the CP and CT models for ﬁt to the calving ease data, LML and DIC

values, broken down into its components 15 and pp, are reported in Table V. Since it is
difﬁcult to quantify the degree of Monte Carlo error on DIC (Zhu and Carlin [47]), we
report DIC and LML values separately for each of the three chains under each model. It
can be seen that there are relatively inconsequential differences in the measures from one
chain to the next within each model relative to between models, thereby indicating
considerably small Monte Carlo errors on the DIC difference between the two models.
Both model choice criteria were overwhelmingly in favor of the CT model with v = 4. As

anticipated, the model complexity, as measured by pp, is higher for the CT model;

however, the complexity penalty is strongly counteracted by a smaller mean deviance D ,

thereby resulting in a smaller DIC favoring choice of the CT model.

86

Inferences on Sire Effects

Posterior means of elements of s were determined to be corresponding point
estimates of progeny differences (EPD) under both CT and CP models. The relationships
between these estimates are shown to be strongly linear in Figure 1, with no hint of
substantial reranking as indicated by a Pearson correlation of 0.99. The CT model had
greater spread in EPD's compared to the CP model, as further manifested by a least-
squares estimated slope of 1.38. This is not too surprising since a larger additive genetic
(sire) variance was inferred in the CT model such that posterior means of elements of 9
should be more dispersed in the CT model relative to the CP model. However, as
discussed later, this is not practically important since the variance of L is deﬁned only
proportionately to the marginal residual variability which is also larger in the CT model.

Posterior standard deviations of elements of s are analogous to standard errors of
prediction in mixed effects model analysis and can be used to derive approximate
reliabilities of EPD's (Wang et al. [46]). That is, the standard errors of prediction were
simply determined as the standard deviation of the MCMC samples of elements of s.
Figure 2 provides scatter plots of these standard errors with the corresponding least
squares regression line for the CT model versus the CP model. In this case, the
correlation between the estimated standard errors was near unity. The estimated slope
was nearly 1.41 indicating that the posterior standard errors were on average 41% larger

under the CT model than under the CP model; nevertheless, these need to be considered

proportionately to 0'3 which was also larger under the CT model.

87

DISCUSSION

Given the recent momentum in using heavy-tailed residual speciﬁcations for the
analysis of production data in animal breeding ([32, 38, 39, 44]) a hierarchical threshold
(CT) mixed model based on a cumulative t-link speciﬁcation was developed, validated by
simulation and applied to a small calving ease dataset from Italian Piemontese cattle.
The simulation study indicated that inference on v is possible in a CT model; however, it
appears that either a more suitable MCMC strategy is needed or many more samples are
required than pursued in our study to ensure a more reliable inference on v. Until this
issue is satisfactorily resolved, we advocate ﬁxing v at some arbitrarily low value in a CT
model analysis. We chose v = 4 since this value minimally assures deﬁned ﬁrst, second,
and third moments while providing a liability variable distribution that is maximally
heavy-tailed. One can then use model choice criteria such as DIC to assess whether or
not the CT model is a better data ﬁt than the CP model. We further note that for the case
where v is ﬁxed, that MCMC mixing was not negatively affected by using the CT model,
even though our data augmentation (of 2.) implementation might be of concern to those
who might prefer Metropolis-Hastings sampling on all parameters (von Rohr and
Hoeschele [44]) instead of introducing augmented variables. More recently, it has been
demonstrated that data augmentation can be strategically used to enhance MCMC
mixing; the strategies discussed by van Dyk and Meng [42] may facilitate more favorable
mixing on v and hence deserve further consideration in CT model applications to animal
breeding. Of particular note was that due to the implementation of the algorithm of
Cowles[l 1], MCMC mixing of 1' was not seen to have been the most limiting (results not

reported) as in previous animal breeding implementations ([36],[43]).

88

Our point estimates for heritabilities are substantially higher than corresponding
threshold model estimates for calving ease reported by Manfredi et al. [25, 26], McGuirk
et al. [28, 29], Varona et al. [43], Luo et al. [23], and Bennett and Gregory [2].
Nevertheless, our inference on a strongly negative direct-matemal genetic correlation is
in agreement with previous work on calving ease using threshold models ([2, 23, 43]) and
linear mixed models using data from the same source (Carnier et al. [7]). Hence, it
appears from our results, in agreement with other studies, that selection of sires for
calving ease of their progeny as calves should result in antagonistic effects in the ability
of their daughters to calve easily as dams in successive generations. What was most
surprising is that the 95% PPI for hzd are greater and do not overlap with corresponding
PPI in Kizilkaya et al. (2002) who used a larger data set on ﬁrst parity records from herds
with greater than 50 rather than 100 records over the nine year period from the same data
source. This result may be indicative of heterogeneity in genetic and residual variance
due to size of herd or other confounding factors (e. g. region); this is an area for ﬁrrther
research that our group has started with respect to residual variability.

Two Bayesian model choice criteria, DIC and LML, were used to choose between
the CP and CT models. In a simulation study, it was demonstrated that both DIC and
LML were able to decisively choose the correct model in most cases whereas, in the
remaining cases, these measures were too similar between the two models to allow a
deﬁnitive choice. In the analysis of calving ease scores in Italian Piemontese cattle, the
CT model was overwhelmingly chosen as the best ﬁtting model by both model choice
criteria. Nevertheless, in the examination of EPD's there were no real tangible

differences between the two models in terms of sire genetic rankings. Although the

89

posterior standard deviations of sire progeny merit appear to be greater in the CT model
relative to the CP model, the impact on reported accuracies of these predictions would be
shown to be negligible. For example, the Beef Improvement Federation (USA) measure
(Wang et al. [46]) of accuracy is somewhat directly related to the genetic variance which
is higher in the CT versus the CP model.

Our study involved sire models, where calf records are connected directly to sires
with genetic relationships identiﬁed only through known paternal ancestry. Sire models
are different from animal models where each record is connected directly to a calf
identiﬁcation with all known genetic (maternal and paternal) relationships explicitly
modeled. For animals (e. g. dams), with effectively less data information, we would
anticipate greater differences in predicted genetic merit between CT and CP animal
models than given in our study. However, a sire model has been seen to be more stable
than an animal model in CP implementations (Mayer [27]), particularly when paternal
half-sib relationships are predominant as in our data. Furthermore, the fact that a sire and
maternal grandsire model only accounts directly for a portion of the additive genetic
variance and of the maternal genetic variance implies that a t-error assumption is
essentially placed on a composite source of error, i.e. the sum of the residual variance and
remaining genetic variation, attributable to unknown dams and to Mendelian sampling.
From the perspective of using heavy tailed densities to mute residual outliers, this is
signiﬁcant since deviant dam and/or Mendelian sampling effects may be muted as well in
a CT sire model speciﬁcation.

Presently, it does not appear to be feasible to apply MCMC methods to the very

large datasets used for routine genetic evaluation of livestock by breed associations and

90

national recording organizations. Kizilkaya et al. [21] recently demonstrated, however,
very little differences in predicted genetic merit and standard error of prediction in a CP
sire model between inference provided by MCMC and by approximate empirical Bayes
procedures currently utilized by the industry (e.g. Berger [3]). Empirical Bayes
procedures are based on using the joint posterior modes of the sire effects as the EPD's,
conditional on estimated variance components as if they were known with certainty.
Based on results from Gianola and Foulley [17], the CT model can be readily
implemented using empirical Bayes methods since the probability density function and
the cumulative density function of a standard Student t distribution could be substituted
for the corresponding Gaussian functions needed to derive the necessary scoring
equations used to determine the required joint posterior modes. Furthermore, the degrees
of freedom parameter, v, could be jointly estimated with the variance components using
Laplace's method and tested for statistical signiﬁcance using marginal likelihood ratio
tests (Tempelman and Gianola [40]). Empirical Bayes implementation of the CT sire
model and the comparison of results with MCMC inference under the CT sire model
deserve further consideration. Unfortunately, however, these comparisons may not
necessarily apply to CP or CT animal models since joint modal estimates of EPD's in the
CP animal model can be badly biased (Mayer [27]).

In this study, we have considered only two cumulative link models for the
analysis of calving ease; conceptually, there are many others including those proposed by
Chen and Dey [8]. In fact, Albert and Chib [1] demonstrate that the cumulative logistic
link model is roughly equivalent to the CT model with v = 8. Other models can be

contrived by considering alternative heavy-tailed distributions for the underlying

91

variables, such as those considered by Rosa [32]. Some of the resulting models may be
shown to have better ﬁt to calving ease data than demonstrated with the CT model in our
paper. Speciﬁcations based on skewness (Von Rohr and Hoeschele [44]) may also have
merit.

The substantial residual and genetic correlations between birth weight and calving
difﬁculty imply that genetic evaluations of calving case would substantially beneﬁt from
a bivariate threshold/linear multiple trait analysis with birth weight ([24, 43]). Further
work on providing modeling ﬂexibility with t-distributed speciﬁcations on both traits

jointly is needed given our results and those already presented by Stranden and Gianola

[38, 39] and Rosa [32].

92

REFERENCES

[1] Albert J .H., Chib S., Bayesian analysis of binary and polychotomous response data, J.
Am. Stat. Assoc. 88 (1993) 669-679.

[2] Bennett G.L., Gregory K.E., Genetic (co)variances for calving difficulty score in
composite and parental populations of beef cattle: I. Calving difﬁculty score, birth
weight, weaning weight, and postweaning gain, J. Anim. Sci. 79 (2001) 45-51.

[3] Berger P.J., Genetic prediction for calving ease in the United States: Data, models,
and use by the dairy industry, J. Dairy Sci. 77 (1994) 1146-1153.

[4] Bertrand J .K., Wiggans G.R., Validation of data and review of results from genetic
evaluation systems for US beef and dairy cattle. Proc. 6'h World Congr. Genet. Appl.
Livest. Prod. 27 (1998): 327-330. Armidale, Australia, January 11-16, 1998.

[5] Bink M.C.A.M., Quaas R.L., Van Arendonk J .A.M., Bayesian estimation of
dispersion parameters with a reduced animal model including polygenic and QTL
effects, Genet. Sel. Evol. 30 (1998) 103-125.

[6] Carlin B.P., Polson N.G., Inference for nonconjugate Bayesian models using the
Gibbs sampler, The Canadian Journal of Statistics 19 (1991) 399-405.

[7] Carnier P., Albera A., Dal Zotto R., Groen A.F., Bona M., Bittante G., Genetic
parameters for direct and maternal calving performance over parities in Piemontese
cattle, J. Anim. Sci. 78 (2000) 2532-2539.

[8] Chen M-H., Dey D.K., Bayesian analysis for correlated ordinal data models, in: Dey
D.K., Ghosh S.K., Mallick B.K. (Ed), Generalized Linear Models: A Bayesian
Perspective, Marcel Dekker, New York, 2000, pp. 133-158.

[9] Chib S., Marginal likelihood from the Gibbs output, J. Am. Stat. Assoc. 90 (1995)
773-795.

[10] Chib S., Greenberg E., Understanding the Metropolis Hastings algorithm, Am. Stat.
49 (1995) 327-335.

[1 l] Cowles M. K., Accelerating Monte Carlo Markov Chain convergence for cumulative
link generalized linear models, Stat. and Comp. 6 (1996) 101-111.

[12] Dempster AP, The direct use of likelihood for signiﬁcance testing, Stat. and Comp.
7 (1997) 247-252.

[1 3] Emanuelson U., F ikse F ., Banos G., Impact of national genetic evaluation models on
international comparisons, in: Computational Cattle Breeding '99, 18-20 March
1999, Tuusula, Finland.

93

[14] Gelfand A.E., Model determination using sampling-based methods, in: Gilks W.R.,
S. Richardson 8., Spiegelhalter D.J.(Ed.), Markov Chain Monte Carlo in practrce,
Chapman&Hall, New York, 1996, pp. 145-162.

[15] Gelfand A.E., Ghosh S.K., Model choice: A minimum posterior predictive loss
approach, Biometrika 85 (1998) 1-11.

[16] Geyer C. J ., Practical Markov chain Monte-Carlo (with discussion), Stat. Sci. 7
(1992) 467-511.

[17] Gianola D., Foulley J. L., Sire evaluation for ordered categorical data with a
threshold model, Genet. Sel. Evol. 15 (1983) 201-224.

[1 8] Henderson C.R., Inverse of a matrix of relationships due to sires and maternal
grandsires in an inbred population, J. Dairy Sci. 59 (1976) 1585-1588.

[19] Heringstad B., Rekaya R., Gianola D., Klemetsdal G., Weigel K.A., Bayesian
analysis of liability of clinical mastitis in Norwegian cattle with a threshold model:
Effcts of data sampling method and model speciﬁcation, J. Dairy Sci. 84 (2001)
2337-2346.

[20] Jensen J ., Wang C.S., Sorensen D.A., Gianola D., Bayesian inference on variance
and covariance components for traits inﬂuenced by maternal and direct genetic

effects, using the Gibbs sampler, Acta Agric. Scand. Sect. A. Animal Sci. 44 (1994)
193-201.

[21] Kizilkaya K., Banks B.D., Carnier P., Albera A., Bittante G., Tempelman R.J.,
Bayesian inference strategies for the prediction of genetic merit using threshold
models with an application to calving ease scores in Italian Piemontese cattle, J.
Anim. Breed, Genet. (in press).

[22] Lange K.L., Little R.J.A., Taylor J .M.G., Robust statistical modeling usin the t
distribution, J. Am. Stat. Assoc. 84 (1989) 881-896.

[23] Luo M.F., Boettcher P.J., Dekkers J .C.M., Schaeffer L.R., Bayesian analysis for
estimation of genetic parameters of calving ease and stillbirth for Canadian
Holsteins, J. Dairy Sci. 82 (1999) 1848.

[24] Luo M.F., Boettcher P.J., Schaeffer L.R., Dekkers J .C.M., Bayesian inference for

categorical traits with an application to variance components estimation, J. Dairy
Sci. 84 (2001) 694-704.

[25] Manfredi E.J., San Cristobal M., F oulley J .L., Some factor affecting the estimation

of genetic parameters for cattle dystocia under a threshold model, Anim. Prod. 53
(1991a) 151-156.

94

[26] Manfredi E.J., Ducrocq V., Foulley J .L., Genetic analysis of dystocia in dairy cattle,
J. Dairy Sci. 74 (1991b) 1715-1723.

[27] Mayer M., Inequality of maximum a posteriori estimators with equivalent sire and
animal models for threshold traits, Genet. Sel. Evol. 27 (1995) : 423-435.

[28] McGuirk B.J., Going 1., Gilmour A.R., The genetic evaluation of beef sires used for
crossing with dairy cows in the UK. 2. Genetic parameters and sire merit predictions
for calving survey traits, Animal Sci. 66 (1998) 47-54.

[29] McGuirk B.J., Going 1., Gilmour A.R., The genetic evaluation of UK Holstein
Friesian sires for calving ease and related traits, Animal Sci. 68 (1999) 413-422.

[30] Newton M.A., Raftery A.E., Approximate Bayesian inference with the weighted
likelihood bootstrap, J .R. Stat. Soc. Ser. B. 56 (1994) 3-48.

[31] Rekaya R., Weigel K.A., Gianola D., Application of a structural model for genetic
covariances in international dairy sire evaluations, J. Dairy Sci. 84 (2001) 1525-
1530.

[32] Rosa G.J.M., Robust mixed linear models in quantitative genetics: Bayesian analysis
via Gibbs sampling, in: Proceedings of International Symposium on Animal
Breeding and Genetics, 21-24 September 1999, Brazil, pp. 133-159.

[33] Rubin D.B., Iteratively reweighted least squares. In Encyclopedia of Statistical
Sciences, 4 (1983) 272-275.

[34] Satagopan J .M. Newton M.A., Raftery A.E., Easy estimation of normalizing
constants and Bayes factors from posterior simulation: stabilizing the harmonic
mean estimator, Technical Report 1028, Department of Statistics. 2001.
http://www.stat.wisc.edu/~newton/papers/abstracts/trl 028a.html

[3 5] Searle S.R., Matrix Algebra Useful for Statistics, John Wiley & Sons, NewYork,
1982.

[36] Sorensen D.A., Andersen S., Gianola D., Korsgaard I., Bayesian inference in
threshold models using Gibbs sampling, Genet. Sel. Evol. 27 (1995) 229-249.

[37] Spiegelhalter D.J., Best N.G., Carlin B.P., van der Linde A., Bayesian measures of
model complexity and ﬁt, J .R. Statist. Soc. B 64 (2002) 1-34.

[3 8] Stranden 1., Gianola D., Attenuating effects of preferential treatment with Student-t
mixed linear models: a simulation study, Genet. Sel. Evol. 30 (1998) 565-583.

95

[39] Stranden I., Gianola D., Mixed effects linear models with t-distributions for
quantitative genetic analysis: a Bayesian approach, Genet. Sel. Evol. 31 (1999) 25-
42.

[40] Tempelman R.J., Gianola D., A mixed effects model for overdispersed count data in
animal breeding, Biometrics 52 (1996) 265-279.

[41] Uimari P., Thaller G., Hoeschele I., The use of multiple markers in a Bayesian
method for mapping quantitative trait loci, Genetics 143 (1996) 1831-1842.

[42] van Dyk D.A., Meng X-L., The art of data augmentation (with discussion). J. Comp.
and Grap. Stat. 10 (2001) l-SO.

[43] Varona L., Misztal 1., Bertrand J. K., Threshold-linear versus linear-linear analysis
of birth weight and calving ease using an animal model: I. Variance component
estimation, J. Anim. Sci. 77 (1999) 1994-2002.

[44] von Rohr P., Hoeschele 1., Bayesian QTL mapping using skewed Student-t
distribution, Genet. Sel. Evol. 34 (2002) 1-21.

[45] Wang C.S., Rutledge J .J ., Gianola D., Bayesian analysis of mixed linear models via
Gibbs sampling with an application to litter size in Iberian pigs, Genet. Sel. Evol. 26
(1994) 91-115

[46] Wang C.S., Quaas R.L., Pollak E.J., Bayesian analysis of calving ease score and
birth weights, Genet. Sel. Evol. 29 (1997) 117-143.

[47] Zhu L., Carlin B.P., Comparing hierarchical models for spatio-temporally
misaligned data using the deviance information criterion, Statistics in Medicine 19
(2000) 2265-2278.

96

.Amama 530 we 838:8 85:03 0253 REE 05 m5? 838% 8098305 Mo 52:50 0280b“. 05...
.352: £538 8:88 35.33 $8.
douse-ow 23:8..." um :88 85809.
.508 803% m 8m 330:9: coumaaom comm dousngmmu .0328“
36300 53> uoEooam = 8:233 “8038.“ we moopwou v 53> coasnmammv 3668 a 53, @250on _ comm—among

 

 

 

 

 

E 8. 2 z. - one 9.3 a 3% 2 ”$5 - $2 ”3:. 4 am 3
2 ~28 - 3.8 3&8 a 3.5 om 03.8 - 3.8 8.0: a 3; Na
om owes - 8.4 3% a 2.8 a 2.32 - 3.9 SE. I. 02 7:
mm ”Em - 32.. 3.2 a as :5 Sn - Ga 2d a a? E
3 8.2 - 8.4 SM 4 was R? 3.4 - a? 08 a NS. 2
E 32 - «3. a: a ”mm 3.3 :3. - 2a 98 a o; Z
38 an: :3 Em a SE 38 .E :3 new a. 2m .easo-=o§a£
3520 «do aaﬁq

 

.6on

...—E: 033383 05 main beam cows—58mm 5 A3 83:8qu :83on mo moocwow 3:28.. 23 :0 ooaocomﬁ 8538 A «Sun.

97

.AmaoC 8960 .8 88830 88888 gamma 3:8 05 wﬁw: 8888 835388 .«o 838:: 0368.8 056

.1885 £58ng 8:88 35.33 $3.
83833 855m um :88 82.88%.
.9833 m 8m 38888 sous—~88 comm 8889388 8358
888350 53, 3&8on a noun—smog ”8258..“ mo 88an v 53» 8085588 F5288 8 5m? vommooam H 8288903

 

 

 

98

t: a; - 5o 85 a N3 in? m; - So So a 3o w:
o: 8o - 8d :3 a MS 23.. 3o - 5o 85 a N3 E
a: a; - 3o 85 a :5 2a.: 2.3 - 3o 85 a o; 7:
8a :5 - 3o 85 a 3o 8%: 3.0 - 8d 8.0 a o; 2
85 N2 - 3o :3 a m; 3.2. mg - 3o 25 « Ed 8
33 a; - 86 So a :6 88% 8o - mod 85 a m; E
..mmm or; :3 ..am a sa ummm .5 $8 ..8 a :3 833-8388
9% 355 «do 32%:

 

.388 x83 038383 05 86: 35% 828886 8 va 8583 8% no 88888 Summon .= 033—.

.3883 m .8.“ 330:3." scum—:98 scam .aousnﬁmsu 3:28.—

Swﬁmsmo 55 93:00am a cows—smog ”888$ mo moouwou v 53> cocsnwummu 3662 ~ 53, 2&6on H coumﬁmoma

 

 

 

 

 

33. Si 23. 3% $3. $3 a Ga- RS 3
83- g 3 £3- Ea $3”- $3 83.- on 3 Na
33. £3 $3- 3% 29m- $3 83.- :3 7:
Sam- 8% $3- RE 331 83 move- 23 E
83- 23 $3- :3 Q 3- $3 $3. _ K.” 2
$3. $3 33. 83 23. 8% 83: S; E
,9: 05 dﬁ 05 as: 05 d3 05 “633-8338
388 .358
x57: gum—58:0 x5. :poﬁ 0333850 .258 8.5.: “macaw E58 Hobo 56950
3% :88 38 mo we .22 $8 Esmﬂo maﬁé

 

.buBm

cots—25m E 30on 50253 385388 326 vocati— HEwEE wo_ was ORG «route cow—Eco? 35305 A: 033,—.

99

HES—o 8:: 05 888 mmm E owcﬁ 8 “£8 H H mos—g BEBE imam: .836 Mo 838ch

8:268 0353 3:5 2: maﬁa 38:0 0202 8:: 05 888 8353 E35039: mo 698:: 938mm". 33 2:6
.EEBE b23305 Bremen uBEIBao $30

dozﬁﬁé wavcﬂm H 588 How—88mg

downtomoc 8m “x8 coma

 

 

Ex I 82 5% - 8:
EH vac- - m3- m; H woo- 82 m3- - who- :6 H «We. a...
:3 I 3: 8% I a:
83 3o - 3o :3 H :.o 85 m3 - So 35 H Ed a”:
SHE I £1: 68; I mm:
53 m3 - ”do 85 H 93 ER one - one So H as a”:
:53 I 08.2 38m I and H
Ewe $3 I 33 cm; H 82 83 83 I M: S 23 H 2; Nb
$8; I 3a :3 I 32 E
ER 83 - Sod sod H E; 35 mod - mood 23 H 53 b
5» I $2 3% - 2.2 2
SE 83 - Rod 23 H 83 $2 83 - Sod _ :3 H 3.90 Nb
53 I 3m: :8; - 23 H
:2 $2 - a; :55 H ES 53“ 83 - SS 085 H 3:. Nb
,.me Ora $3 Em H E Hmmm era :3 new H 35 “3935

 

388 H5: 33230

 

 

 

388 vi: “58 0333830
.233 88.8an SES— E 388 38 wag—8 mo Eouogm own—how :o 8:203 Hotonom .>— 933—.

100

.mhouogn mo Hon—En: 258% on?

.883 :88 85609 no 893 853% 563mm,.

60.83% mammoxmm .«o omega on?

 

 

 

god- 912 gm v3.2 wing m 330
«mod- :3; mg _mm.m _ 3&3 N 530
36.x- $12 2% mama“ wvmdﬁ _ EEO
mend- 9%.: $5 2.5% ﬂ QWS m 520
35.x- 3A.: awn and ~ 893 N 530
mend- 9&2 an Nana a v3.3 g £30

“505 3%:50
d3 93 0 am 985 an

 

.238 8820805 5%: E 888 88 wag—3 mo imbued
2: é 32$ 32d Banzai 3538. 3 Ba 6:: 8:35 Homage? 859% ommoaméﬁo 0292 .> 2.3

101

a as us 8: uaaauaa 5S 355 a5 Boa
“5:25:80 3%? #53 0333850 no woman mooaohobmv ~30on ohm 338:3 mo 338 Bremen .«o “BR—utmom A 0.59,.—

 

E. .32.. 9.5358
3-

   

Fl

”WI 1 aNielmm‘O

me

102

.5 63 .«o 0:: womaEtoE—m 53, .308 Ma: .505 0363850
9%? ME? «33383 co woman moocouobﬁ >5on 2% 33833 .«o maggot “5933 8560a 90 “29838 .N «...—ugh

, xc=wima «>32:an

 

ad to no No to o
l 1 ‘ w‘ . ‘ :11 “\‘w . - . . o

to

S m

m

0o m

to m

We w

”I
go

No

103

CHAPTER 4

Bayesian structural modeling of heterogeneous residual variances in

generalized linear mixed models

ABSTRACT

A Bayesian hierarchical generalized linear mixed model (GLMM) based on a
structural multifactorial model with ﬁxed and random effects that multiplicatively
inﬂuence residual heteroskedasticity was developed. The two speciﬁc GLMM
considered in this paper are the linear mixed mode] (LMM) for the analysis of normally
distributed characters like birth weight and the threshold or cumulative probit mixed
model (CPMM) for the analysis of ordinal categorical data like calving ease. We
validated our models using Markov Chain Monte Carlo (MCMC) methods for posterior
inference on parameters specifying residual heteroskedasticity. Three replicated datasets
on normally distributed and ordinal categorical data from each of four different
populations, characterized by various levels of residual heteroskedasticity, were
generated. Residual heteroskedasticity parameters, including ﬁxed and random effects
speciﬁcations, were particularly well estimated for the analysis of normal data and when
heteroskedasticity was extreme. Furthermore, the deviance information criterion (DIC)
was useful in correctly choosing between heteroskedastic and homoskedastic models.
Sire and maternal grandsire heteroskedastic linear and cumulative probit link threshold
models were ﬁtted, respectively, to birth weight (BW) and calving ease (CE) data on
8,847 Italian Piemontese ﬁrst parity dams. The residual variance for male calves was

signiﬁcantly greater than that for female calves for both BW and CE, translating to

substantial differences in the posterior means i standard deviations of direct heritabilities

104

for BW (0.32 i 0.05 for females versus 0.25 i 0.04 for males) and for CE (0.504 t 0.083
for females versus 0.370 i 0.064 for males). Although the heteroskedastic LMM and

CPMM were chosen by DIC as the better ﬁtting model for BW and CE, respectively, the
high correlation between posterior means of sire effects (>0.97) suggested no meaningful

rerankings.

INTRODUCTION

An important assumption in many genetic evaluation models is that variance
components associated with random and residual effects are homogeneous across all
conditions, whether it be across sexes of animal or environmental conditions, such as
herds. These models are said to be homoskedastic. However, the existence of
heterogeneous variances, or heteroskedasticity, for milk production ([16], [18]), grth
performance (Garrick et al. [9]) and conformation traits (Robert-Granie et a1. [30]) has
been ﬁrmly established.

A number of possible causes for heteroskedasticity have been suggested,
including a positive relationship between herd means and variances, differences across
geographical regions, and changes in variability over time due to changing herd
managements (Weigel et al. [40]). If this phenomenon is not properly taken into account
in generalized linear mixed models, differences in within-subclass variances can result in
biased breeding value predictions with disproportionate nmnbers of animals selected
from environmental subclasses characterized by high variability. This bias potentially
translates into a reduced genetic progress due to breedstock selection based on these

predictions ([17], [40]).

105

Generalized linear mixed models (GLMM) can be readily extended to take into
account heteroskedasticity. As an example, best linear unbiased prediction (BLUP) can
be reliably used in breedstock selection when random effects and residual variances are
heterogeneous, provided they are correctly speciﬁed across subclasses characterized by
unique variance components (Gianola [12]). However, variance components are rarely
known and may need to be estimated for a large number of small subclasses when
heteroskedasticity is present. Since most preferred methods of variance components
estimation, including maximum likelihood (ML) and restricted maximum likelihood
(REML), have only asymptotically desirable properties, such procedures may not yield
reliable estimates within small subclasses ([40], [41]).

Foulley et al. [7] introduced an empirical Bayes procedure for assessing sources

of residual heteroskedasticity in Gaussian linear mixed models (LMM). Their method

was based on the use of a structural log link model 7,. = Ina: = m3)» , where of, is the
residual variance component speciﬁc to subclass i, m ,. is a known incidence matrix
unique to subclass i connecting 0': to A. , a vector of unknown dispersion parameters.

Hypothesis testing on elements of A. was based on an approximate marginal likelihood
ratio test. In addition, Foulley et al. [7] and San Cristobal et al. [32] further extended the
structural log link model such that 3. includes both "ﬁxed" and "random" factor
partitions. As in a classical linear model, the distinction between these two classes of
effects for modeling heteroskedasticity is such that levels of a random factor derive from
a probability distribution (Searle et al. [33]); that is, a structural prior is placed on random
effects. Conversely, ﬁxed effects involve parameters that might be characterized by

subjective or noninformative priors. For example, sex of animal might be considered

106

ﬁxed whereas herds might be considered random with respect to a structural
classiﬁcation model for heteroskedasticity (San Cristobal et al. [32]). As with random
location effects in classical linear mixed effects models, specifying structural priors on
effects inﬂuencing heteroskedasticity is particularly beneﬁcial in that statistical
information on one effect (e.g. herd) is borrowed from the distribution of all such effects
' within the same factor (Robinson [31]).

Foulley and Gianola [8] adapted the structural log linear ﬁxed effects model for
heterogeneous residual variances in a threshold or cumulative probit mixed model
(CPMM), using likelihood procedures for residual heteroskedastic inference on calving
ease in American Simmentals as a function of sex of calf and age of dam. Ducrocq [6]
analyzed calving ease data of French dairy breeds (Normande and Montbeliarde) using
the methods developed by Foulley and Gianola [8] method, considering the effects of
sex, age of dam, parity, region, and year on residual heteroskedasticity on calving ease
scores.

Many of the proposed residual heteroskedastic models for animal breeding,
however, invoke analytical approximations, which appear tenuous, particularly for the
analysis of categorical data. Furthermore, we perceive the lack of a unifying framework
for structural modeling of heterogeneous variances in GLMM analysis for both
continuous production and categorical ﬁtness traits. That is, structural prior or random
effects speciﬁcations have been developed for Gaussian error models, as by Foulley et al.
[7] and San Cristobal et al. [32] but not for any other GLMM, such as the CPMM, in
animal breeding. The objectives of our study were 1) to develop and validate a Bayesian

structural multiplicative model on residual variances for observed or augmented variables

107

in a heteroskedastic GLMM, concentrating on a LMM analysis of normal data and a
CPMM analysis of ordinal data based on use of Markov Chain Monte Carlo (MCMC)
methods and 2) to apply the model using MCMC to a calving ease dataset derived from

the Italian Piemontese population.

MATERIALS and METHODS
HIERARCHICAL CONSTRUCTION OF THE HETEROSKEDASTIC
GENERALIZED LINEAR MIXED MODEL

In a number of GLMM, data augmentation schemes exist such that a nxl vector

of either observed or augmented variables L = {L}; conceptually maps one-to-one to

n
i:

the data vector Y ={ ,.} l . Examples where such augmented variables are useful include

threshold models (Sorensen et al. [34]) and censored data models (Sorensen et a1. [3 5]).
We write a linear mixed effects model as

L=XB+Zu+e (1)
where B is a vector of ﬁxed location effects, u is a vector of random location effects, X

and Z are known design matrices and e is a vector of normally distributed residuals with
variance covariance matrix R having a certain heteroskedasticity speciﬁcation as deﬁned

later. The linear model in (l) is equivalent to the following distributional speciﬁcation:
LlB,u,R~p(Llli,u,R)=N(XB+Zu,R) (2)
For normally distributed data as in a LMM, yEL such that

p(Y | B,u) -_-= p(L | B,u) whereas for ordinal data with, say, C = 4 categories, L maps to y

as follows:

108

 

rl to < L, S r,
2 r < L. s r
K = T l I 2 (3)
3 Q<LSQ
L4 13 <L, <r4 i=1....n
where 1:o = —oo < r, < 12 < 173 < r4 = 00 are threshold parameters that deﬁne bin
boundaries for y based on L.
We further partition e according to residual variance subclasses
e' = [e]l e}, e; where e” ~N(0,R,,, =1”, of”) pertains to the nuxl subvector of

residuals identiﬁed with the k"1 level (Ir—=1 ,2,. ...s) of a ﬁxed effect subclass (e.g. sex) and
I"1 level (l=1,2,. . ..t) of a random effect subclass (e.g. herd) that jointly inﬂuence the
residual variance for the k!“ subclass. For pedagogical purposes, we just concentrate on
the effect of one ﬁxed factor and one random factor on residual heteroskedasticity but

extensions to additional factors are possible. We partition the data

y' = [y], y',2 y:,,] and the corresponding augmented variables

L' = [L], L], L5,] accordingly. We propose a multiplicative structural effects

model as follows:

02 £25,, k=1,2,...s;1=1,2,....t. (4)

ck!

where 6;: is the residual variance identiﬁed with the kth level of the ﬁxed effect subclass

and 6, >0 is a random multiplicative scaling factor unique to the 1th level of the random
effect. Note then that ﬁxed and random effects speciﬁcations are provided on both
location and residual dispersion parameters, although the classes of effects considered do
not need to be the same as for the log structural heteroskedastic models of Foulley et al.

[7] and San Cristobal et al. [32].

109

The prior density on the ﬁxed location effects [3 is speciﬁed
[5 ~ 1303) (5)
where p(ﬂ) is a subjective prior, typically speciﬁed to be ﬂat or vaguely informative.

Furthermore, the random location effects are typically characterized by a structural

multivariate prior speciﬁcation:

ulcp ~ p(uw) = N (0,G(¢)) (6)
Here C((p) is a variance-covariance matrix that is a function of several unknown
variance components or variance-covariance matrices in q) , depending on whether or not

there are multiple sets of random effects and/or speciﬁed covariances between these sets;
an example of the latter is the covariance between additive and maternal genetic effects.
Furthermore, ﬂat priors, inverted Gamma densities, inverted Wishart densities or

products thereof may be speciﬁed for the prior density on (p ,
‘P ~ p(v) (7)

depending, again, on the number of sets of random effects and whether there are any

covariances thereof (Jensen et al. [20]).

A subjective conjugate inverted-gamma prior density or, alternatively, a ﬂat prior
density may be speciﬁed separately for each 6': , i.e.

53, ~P(53.) (8)
for k =1 ,2,. . .,s. Conversely, a structural prior is used to model the random residual

dispersion effects, 6,, 1: 1,2,. . .,t. We conveniently choose this structural prior to be an

inverted-gamma density with parameters on. and one-1,

110

 

—1 a" _a_ —1
p(6,la.)oc(—“I:(a—))—(6,) ‘ . "emf—“:5 ); I= 1,2,....1. (9)
e I

Here E(§,) = land Var(6,) =

 

1 2 such that as a, ——> oo , the random dispersion effects
a _

inﬂuence on residual heteroskedasticity diminishes. Note that with the speciﬁcation in
(5), there is a borrowing of information across levels 1= 1, 2,. . ., t of the random factor,

just as there is for classical random effects modeling of location parameters. Generally,

a, is unknown such that a subjective prior may be placed on it. One noninformative prior

used in our applications is speciﬁed as follows:

6!. ~ p(a.) 0C

1
(1+a¢)2 (10)

which is identical to specifying a Uniform(0,1) prior on

 

1
+ a)

The remaining hierarchical speciﬁcations in this heteroskedastic GLMM depend
upon the ﬁrst (data sampling) stage of the n x 1 data vector y. For a LMM analysis of
normal error data, equation (1) would sufﬁce (i.e. y a L) such that no augmented
variables are required, whereas for a CPMM analysis of ordinal data with C ordinal
categories, numbered j = 1,2,. . .,C, we would specify the ﬁrst stage of our hierarchical

model using Sorensen et al. [34]:

p(y I L,‘r)=1—"-I{Zl(‘rj-l < Likl < Tj)1(yild :j)} (Ila)

i=l j=|

followed by, using a scalar representation of (2),

L,“ |B,u,6‘:6, ~ N(x;k,B+z;k,u,6:6,). (11b)

111

Here L,“ and y,“ are the ith elements of LH and y“, respectively, whereas x'm and 2',“ are

known incidence row vectors corresponding to subject i within the )‘d'’1 residual variance
subclass for k = l,2...,s,l=1,2,...,t, and i = 1,2,...,nk1. As before,1' = [To 1, 1C],

denotes a vector of unknown threshold parameters that delimit the augmented variables L
into their respective observed data bins y with 1(.) denoting the indicator function having

value 1 if the condition within the function is true, being 0 otherwise.

Recall previously that To = -oo and rc = +00 but further note that in addition 1:1 is

ﬁxed to an arbitrary constant in ‘order to satisfy identiﬁability constraints as in Sorensen
et al. [34]. We further adopt the alternative parameterization presented by Sorensen et al.
[34] in which residual variance is explicitly modeled, rather than typically constrained to
equal to 1, as typically invoked in homoskedastic error CPMM (Gianola and Foulley
[13]). This speciﬁcation thereby requires one additional constraint on 1' , such that C-3
parameters in 1' are uniquely identiﬁable. A prior distribution on the uniquely identiﬁable
elements of 1' may be speciﬁed provided that the order constraints on elements of r are

satisﬁed (Sorenson et al. [34]), i.e.

'r'=[t2 t3 rm] ~p(1:); rz<r3<m<1'(..l (12)
Note that our parameterization of the heteroskedastic cumulative probit threshold

model, as per Foulley and Gianola [8], does not readily extend to the modeling of binary

outcomes since it is required that C-3 2 0.

The joint posterior density of B,u,{6';: }5

kg, ’91}; Aka, and any other parameters

necessary for the GLMM in question (i.e. L and t in the CPMM) is simply speciﬁed as

112

the product of the various stages of the hierarchical model. That is, for the LMM where

y=L, the joint posterior density of all unknowns speciﬁed to proportionality is:

p(ﬂ “{0. ,:=,} ,{6 },,.,¢,a.ly)

oc p(y I r u {0. 1:1,. 6,}’)p(mp(u | 0120) (13)
[I110(0')1[I:1P(: I0t )1p(a )
whereas for the CPMM,

p,(L 1', p, u,{a h.-. {6,};,.,¢,a. Iy)
«p(yIL r)p(L|B u,a.{ LIZ. WI.) (14)

p(r)p(l3)p(u I ¢)p(¢)[1:1p(0. )](1:1p(5) Ia )]p(a.)
Note that p(L | B, u, {0' I}; I ,{6,};=l) is equivalent to (2) given that R is a function of

{03: 1;] and {6111:1'

A MCMC inference strategy requires determination of and sampling from the full
conditional distributions (FCD) of each parameter or groupings thereof. The FCD for
ﬁxed and random location effects, regardless of whether inference is based on the LMM

or the CPMM, can be readily shown to be multivariate normal

“ I -l r -l -'
_2 I3 x R x x R 2
B9 “1(P90—‘ez 0’2e 9 "no.39619629°--9619ae9y9L~N 9

ii Z'R"X z'R"z+(G(¢))"

(15)

113

-1
A xrR-lx xoR-IZ , -1
where [B] =[ )4] [X R L] is the typical mixed model

a Z'R"X z'R"z+(G(¢) Z'R"L

solutions to B and u based on current MCMC-sampled values of R, L, and (p.

S.

l I

Furthermore, R'l = k

I

ll

R;,' , X = {x'ﬂd} , and Z = {z'm}. Univariate strategies for sampling

1.
I

from (15) are elucidated in Wang et al. [39].

Using results from Sorensen et al. [34], it is straightforward to generate from the
FCD of individual elements of L and I under the CPMM. We prefer, however, the
Metropolis-Hastings and method of composition joint update of L and 1: presented by
Cowles [5]. She demonstrated and we have further noted in our previous applications
(Kizilkaya et al. [21 22]) that the resulting MCMC mixing properties using this joint
update are vastly superior to using separate Gibbs updates on individual elements of L
and 1‘ as outlined by Sorensen et al. [34].

For either the CPM or the LMM, it is straightforward to generate elements of
(p from their FCD using results presented in previous work. If some partitions of «p form
a variance-covariance matrix, then their respective FCD can be readily shown to be
inverted-Wishart (Jensen et al. [20]) whereas if other partitions of q) involve scalar
variance but no covariance components, then the FCD of each component can be shown
to be inverted-gamma ([34], [39]).

We subsequently derive the F CD for 6: for either the CPMM or the LMM.

Under the CPMM,

114

—2 —2 —2 —2 —2 —2
p(O’q “avg-9'2"" "06119099.. 9 ".903: 96195 29 M619ﬂu9¢9ae9L979Y)

" ' L; Lu "xu "Z11 11"ku "Zu
«(nmaw )611- -2‘ " 13255 B ") “6)

‘1

 

 

 

 

9 x
:99” ( :(L’d -X,,,B-Z,,,u) (Lu -X,,,B-Z,,,ll)
-2 ""2 _ 1.. 26, _2
°' (“99) p(a.)
K )

 

 

for k = 1,2,. . .,s. Under the LMM, y“ is substituted for Lu and conditioning on t and L is

not required in the speciﬁcation of the FCD of 6": . With a ﬂat noninformative prior

speciﬁed for 0,,2 (i. e. p(o' 2)oc 1, the FCD for 0,2 is then simply inverted gamma with

I
2"”

—1 and 6,, = 2'3“" "XUB‘ZWELH ‘ xuﬂ-Zuu)

. It can

 

 

parameters 0,, =

also be readily shown that if one speciﬁes p(E 2) to be inverted gamma, the FCD of 0':

will also be inverted gamma.

The FCD of 6, can be similarly derived:

p(6,|63.6 6,... ...63 .6 ..6... ,6-..,..6,... , 6.11.u.¢,a..y,L.r)

CCU—[(03 6,)2 "3’ Jexp[_%§(1m -Xul3-Z,,,u) (Lu ‘qu-Zuu)

 

 

 

 

 

1.1 63 6,
(ac-1a) -(a,+l) ae—l
r(a,) "—_(‘5’) 6"" a, (17)
f \
,. L -X B- -Z n L -X B- -Z n
‘24,,“ 2(H H H 2)-(2H kl H )w _1
m 5, ‘2 +a,+l exp _k=l
5:
I 2

 

 

115

That is, the FCD for 6, is then simply inverted gamma with shape parameters

5
Z nu

a, = E]?— + a, and scale parameter

p, =:(L"’ -X,,,ﬂ—Z,,,u)_(2L,,, —X,,,|$—Z,,,u) +a, —1 for I = 1,2,...,t.
k=l 20c,

The F CD for a, based on the prior p(a, ) adopted from (10), is

e,’

at %exp[—(qc —1)§61-l ]l:ll(6’ )“(a,+l)p (ac)

Now (18) is not a recognizable density. We sample from y/ = log (a, ) using a

p(a,111,..,q1,6'3,6,3,...,62 6,,6,,...,6,,y,1.,r)
(18)

random walk Metropolis-Hastings sampler with a Gaussian proposal density tuned during
MCMC bum-in such that the Metropolis-Hastings acceptance rates are intermediate for

optimal MCMC mixing (Chib and Greenberg [4]).

MODEL COMPARISON
The deviance information criterion (DIC) has been recently proposed for
comparing goodness of ﬁt for alternative constructions of hierarchical models to data
(Speigelhalter et al. [36]) and has been increasingly used in animal breeding ([15], [22],
[29]). The DIC is based on the posterior distribution of the deviance statistic or -2 times
the sampling distribution of the data as speciﬁed in the ﬁrst stage of a hierarchical model.

For the LMM, the data sampling stage is speciﬁed as:

116

p(y l B,u,¢p,6: ,...,6": ,§,,...,5,,y)

s I n m _ , 4 _ 19
=Hn(2n)'1(63,6,)"9 app—"i X“B+Z““]l:“[y“ XHWZEEI] ( )

k=l I=l
whereas for the CPMM, the data sampling stage is based on further marginalizing the

product of (11a) and (11b) across all n observations over L and is written as follows:

p(y | B,u,q1,63l,...,E:,6,,...,6,,y)

 

 

= ﬁﬁﬁpmbocu = 11,, |mun-,5:,...,6:,5,,...,6,) (20)
k=l Is] is]
s I n1, 7, —(x;,,ﬂ+z;,,u) Ty, -1-(321139‘111‘1)
= (D ,1, —(D u;
UUU 1 Jags, 1 1 J636, 1

The DIC is computed as the sum of average Bayesian deviance (5) plus the
"effective number of parameters" (pp) with respect to a model, such that smaller DIC
values indicate better ﬁt to the data. The log marginal likelihood (LML) of the data for a
certain model is an alternative model choice criterion that we have adopted in previous
research (Kizilkaya et al. [22]). Larger values of LML indicate better model ﬁt to the
data. We base our LML computations on (19) and (20) in comparing the two respective
models. More information about DIC and LML determinations can be found in

Speigelhalter et al. [36] , Gelfand [10] and Kizilkaya et al. [22].

DATA
Simulation Study
A simulation study was carried out to validate Bayesian inference on the proposed
heteroskedastic error LMM and CPMM and to assess the ability of the DIC and the LML

to correctly choose between homoskedastic and heteroskedastic error GLMM. A simple

117

mixed effects model was used to generate underlying variables liability data L for n =
2500 progeny from each of 50 unrelated sires:

LW = ,u + sex, + herd J. + sire, + em,
Here ,u =O.5, {sen}:=1 (sex, =-O.5 and sex, =0.5) represents a 2 x 1 vector of ﬁxed sex

100
j=I

effects. F urtherrnore, {herd I} ~ N (0,10,?) is a 100 x 1 vector of random effects and

{sirek K: I ~ N (0,103) represent a 50x1 vector of independent random effects with

0,? =0.25 and 0,2 =0.10. Finally, e”. = {em} ~ N (0,1,0: 6]. )is the vector of residuals
associated with the n”. records from sex i and herd j. Three replicated datasets from each
of four different populations or different values of or, were generated: Population I) one=3,
Population II) ae=12, and Population III) one=50 each with 6: =1, 62 =1.25 and

5]. ~ Inverted- Gamma(a,,,ae —1),j=1,2....100; and Population IV with homoskedastic
error, i.e. 06:60 with 0: =6; =1 and 6,. =1,j=1,2,...,100. The values ae=3, 12 and 50

represent extreme, moderate and mild levels, respectively, of residual heteroskedasticity
across herds. Levels of ﬁxed and random location effects were randomly assigned to

individuals in data generation. Augmented data L was mapped to ordinal data y based on

C=4 categories with r, =-0.50, 2', =1.00 and r3=2.00 in all populations. Both L and y

were analyzed using the appropriate GLMM (LMM and CPMM, respectively) based on

both homogeneous and heterogeneous residual variance structures with ﬂat priors utilized

on B , 0: , 0,: , 0,2 , 0,? , and a noninformative prior, as previously described, used for on...

For the purposes of parameter identiﬁability, we invoked the restrictions r, =-0.50

118

and r2 =1 .00. MCMC was used for Bayesian posterior inference on all parameters. For

each replicated data set within each population, a burn-in period of 10,000 cycles was
discarded before saving samples from each of an additional 100,000 MCMC cycles.
Graphical inspection of the MCMC chains based on preliminary analyses was used to
determine a common length of burn-in period. Furthermore, DIC and LML values were
computed for each model on each replicated dataset to validate those measures as reliable

model choice criteria.

Italian Piemontese Birth Weight and Calving Ease Data

Birth weight and ﬁrst parity calving ease scores recorded on Italian Piemontese
cattle from January, 1989 to July, 1998 by Associazione Nazionale Allevatori Bovini di
Razza Piemontese (ANABORAPI), Strada Trinita 32a, 12061 Carru, Italy were used for
this study. As in Kizilkaya et al. [22], only herds that were represented by at least 100
records were considered in the study, leaving a total of 8,847 records. Calving ease was
scored into ﬁve categories by breeders and subsequently recorded by technicians who
visited the breeders monthly. The ﬁve ordered categories are: 1) unassisted delivery, 2)
assisted easy calving 3) assisted difﬁcult calving 4) Caesarean section and 5) foetotomy.
As the incidence of foetotomy was less than 0.5%, the last two ordinal categories were
combined, leaving a total of four mutually exclusive categories. The general frequencies
of ﬁrst parity calving ease scores in the data set were 951 (10.75%) for unassisted
delivery; 5,514 (62.32%) for assisted easy calving; 1,316 (14.88%) for assisted difficult

calving; and 1,066 (12.05%) for Caesarean section and foetotomy.

119

The effects of dam age, sex of the calf, and their interaction were considered by
combining eight different age groups (20 to 23, 23 to 25, 25 to 27, 27 to 29, 29 to 31, 31
to 33, 33 to 35, and 35 to 38 months) with sex of calf for a total of 16 nominal subclasses.
Herd-year-season (HYS) subclasses were created from combinations of herd, year, and
two different seasons (from November to April and from May to October) as in Carnier
et al. [3] who also analyzed calving ease data from this same population. The sire
pedigree ﬁle was further pruned by striking out identiﬁcations of sires having no
daughters with calving ease records and appearing only once as either a sire or a maternal
grandsire of a sire having daughters with records in the data ﬁle. The number of sires
remaining in the pedigree ﬁle after pruning was 1,929.

As in Kizilkaya et al. [21 22], the LMM and CPMM used for the analyses of birth
weight and calving ease data included the ﬁxed effects of age of dam classiﬁcations, sex

of calf and their interaction in [1 , the random effects of independent herd-year-season

effects in h, random sire effects in s and random maternal grandsire effects in m. We

assume:
8 0
I 1911169099)
111 0
and
h ~ N(0,ra,f)
0': as»: 9 2
where G() = 2 , With 0,, denoting the sire variance, 03, denoting the maternal

grandsire variance, 0,", denoting the sire-matemal grandsire covariance, and 0:

denoting the HYS variance. Furthermore, (3 denotes the Kronecker product (Searle

120

[33]), and A is the numerator additive relationship matrix between sires due to identiﬁed
male ancestors (Henderson [14]). Also, h is assumed independent of s and m. Flat priors

were placed on all ﬁxed location effects and variance components.

Residual heteroskedasticity was modeled as a ﬁmction of ﬁxed sex effects (0'3”!
and 53m ) and random herd effects (6]. ,j=1,2,. . .,66) where 0‘]. ~ Inverted-

Gamma( 09,, a, -1) in a similar way as in the simulation study. The same ﬂat priors that

were considered for ﬁxed dispersion effects and a, in the simulation study were also

considered here.

MCMC inference was based on the running of three different chains for each
model. For each chain, a total of 20,000 burn-in cycles followed by saving samples from
each of 100,000 additional cycles was generated as previously done in Kizilkaya et al.
[21 22]. To facilitate diagnosis of MCMC convergence by 20,000 cycles, the starting
values on variance components for each chain within a model were widely discrepant,
with one chain starting at the posterior mean of all (co)variance components based on a
preliminary analysis, another chain starting at the posterior mean minus 3 posterior
standard deviations for each (co)variance component and the ﬁnal chain starting at the

posterior mean plus 3 posterior standard deviations for each (co) variance component.

Key genetic parameters, speciﬁcally direct heritability (113 ), maternal heritability

(hi) and the direct-maternal genetic correlation (rdm) were inferred upon in the calving

ease data using the transformations on G0 as considered by Kizilkaya et al. [21 22];

however, hi and hi, was determined separately for each sex dependent upon the use of

either 62 or 0'2 in the denominator.
e Male eFemale

121

For both the simulation study and the calving ease data analysis, the effective
number of independent samples (ESS) was determined using the initial positive sequence

estimator of Geyer [11] as adapted by Sorensen et al. [34].

RESULTS
Simulation Study

Posterior means and standard deviations and 95% equal-tailed posterior
probability intervals (95%PPI) on 0,2 , 0,3 , E: , 0,: and (1,3 based on the three replicated
datasets ﬁ'om each of the four populations, are provided in Table I for LMM analysis of
L and Table II for CPMM for the corresponding mapped ordinal values of y. For each of

the four populations in the simulation study, posterior means of 03 and 0: were

generally slightly biased upwards for the LMM analysis on L and for the CPMM analysis
on y. Nevertheless, 95% PPIs of these two variance components under both the LMM
and CPMM analyses included the true values of parameters in all cases, being
understandably wider for the CPMM analysis of y due to the substantial loss in data

information in mapping from continuous L to discrete y.

The 95% PPI of 0’: and 0": in both the LMM and CPMM analyses of L and y,

similarly, always included the true values of 6,: =1 and 5,: =1.25, respectively, for each

of the three replicated datasets from each of Populations I, II, and HI. Furthermore, for

Population IV with homoskedastic error, posterior means and 95% PPIs for

63' and 632 were found to be similar each other and concentrated around the true value

0: = 6,: =1 as anticipated. Also, 95%PPI for are inferred on each of the three datasets

122

from each of Populations I, II, and HI included the corresponding true parameter value.

Posterior standard deviations and widths of 95%PPIs of ate increased as (1, increased

from 3 to 12 to 50 as in Populations I, II, and 1H, respectively. Furthermore, because of
the lower information content of ordinal data relative to underlying liability data, CPMM

analyses produced understandably wider 95% PPI and larger standard errors on a,

relative to LMM analyses.

To further validate the residual heteroskedastic GLMM that we propose, the
posterior means of each element of 8 as unique for each level of random factor for herd-
speciﬁc residual variances were compared against the true elements of 8 for each
simulation replicate from Populations I, H and 111 based on LMM and CPMM analyses of g
L and y, respectively. The linear relationship between estimated and true values of 8

was estimated based on simple linear regression. We anticipated that as when a,

increased, the linear relationship between posterior means and true values of 8 should
approach 0 with greater degree of statistical shrinkage of posterior means to 8 = 1. This
would be consistent with what is expected with random location effects models where a
greater degree of statistical shrinkage is observed when random effects variances are
small as opposed to being large (Robinson [31]). The least-squares estimated slopes
between posterior means and true values of 8 averaged across each of the three replicates
from Populations I, H, and 1H were 0.882, 0.670 and 0.181, respectively, for the LMM
analysis of L and 0.669, 0.456 and 0.141 , respectively, for the CPMM analysis of y,
further illustrating the greater degree of shrinkage with the less informative CPMM

analysis.

123

It is important to generate sufﬁcient number of MCMC cycles in order to
minimize the impact of Monte Carlo error on posterior inference. Several animal breeders
have recently suggested 100 as the minimum effective sample size (ESS) for reliable

statistical inference ([2], [37]). In our simulation study, the ESS for

0,? , 0,? , 0",? , 5,? and a, ranged from 250-78,000 in LMM for the analysis of L and from

160-15,000 in CPMM for the analysis of y. As anticipated, the CPMM analysis of y was
characterized by substantially lower ESS than the LMM analysis of the corresponding L
from which y was mapped via 1' ; nevertheless, ESS appeared sufﬁcient for reliable
posterior inference in both GLMM.

The simulation study was also used to validate DIC and LML as model choice
criteria. For this purpose, DIC and LML values for each replicated dataset based on
homoskedastic and heteroskedastic GLMM within each of the four populations were
determined for the analysis of L using a LMM and for the analysis of y using a CPMM.
A DIC difference exceeding 7 has been suggested by Speigelhalter et al. [36] as an
indication of a meaningful difference in model ﬁt. The DIC differences between
homoskedastic and heteroskedastic GLMM (LMM for y and CPMM for L) for each
matched replicate within populations are shown in Figure 1. In all cases, DIC differences
were clearly in favor the correct model. Furthermore, as expected, the magnitude of DIC
differences involving LMM analyses of L was higher than that for CPMM analyses of y
since, again, y is less informative than L. The DIC differences between homoskedastic

and heteroskedastic error GLMM approached 0 with increasing 0,.

124

Application to Birth Weight and Calving Ease Scores in Italian Piemontese Cattle
Genetic Parameter Inference

Sire and maternal grandsire LMM and CPMM were used for the analyses of birth
weight and calving ease scores, respectively, in Italian Piemontese cattle. Posterior
means and standard deviations, 95% equal-tailed PPI and ESS on dispersion and genetic
parameters for birth weight and calving ease scores are summarized in Table III and
Table IV, respectively. In all cases, reported inferences were based on combining
samples from the three separate MCMC chains after burnin. In addition to the posterior
means, the posterior modes and medians (results not reported) using the MCMC
algorithm were calculated for each parameter and were found to be very similar to each
other, whether for LMM analysis of birth weights or for the CPM analysis of calving
ease scores. These results imply that the posterior densities were symmetric and
unimodal.

The total number of ESS for dispersion parameters across the three chains ranged
from 1,836 to 21,839 for the LMM analysis of birth weights and from 841 to 14,045 for
the CPMM analysis of calving ease scores. As anticipated from the results of the
simulation study, the CPMM analysis generated lower ESS than the LMM analysis.

Relative to reported results for the homogeneous CPMM in Kizilkaya et al. [22], the

additional inference on a, , 53...... , and 592m. did not appear to adversely impact MCMC

mixing and, subsequently, ESS in the CPMM. We observed similar comparisons in the
heteroskedastic versus homoskedastic (results not reported) LMM analysis of birth

weights.

125

For the purposes of model comparison, birth weights were also analyzed using
homoskedastic LMM. Estimates of 0,? and 0?, from homoskedastic LMM and CPMM
were found to be somewhat higher than the corresponding estimates under the

heteroskedastic LMM and CPMM. However, the corresponding 95% PPI for these

parameters overlapped considerably between the homoskedastic and heteroskedastic
GLMM. Of particular note, the estimate of 0,? using homoskedastic LMM for birth

weight was found to be signiﬁcantly greater than the corresponding estimate in the
heteroskedastic LMM since the respective 95% PPI did not overlap whatsoever.
Regardless of the GLMM considered, we observed that the posterior mean of the
residual variance from homoskedastic model is nearly equal to the average posterior
means of sex-speciﬁc residual variances in the heteroskedastic error models. Table III
further indicated that the residual variance for male calves is signiﬁcantly greater than for
female calves for birth weight and calving ease scores using LMM and CPMM
respectively. This signiﬁcant difference between residual variances translated into sex-
speciﬁc differences for direct and maternal heritabilities, as indicated in Table IV.

Heteroskedastic LMM and CPMM lead to very low posterior means (<5) for 0:,:

indicating that there is considerable residual heteroskedasticity across herds for both birth

weights and calving ease.

Approximate 95% PPI on herd-speciﬁc 6i in the birth weight and calving ease
scores analyses were computed and presented in Figure 2 for LMM model and Figure3
for CPMM, respectively. As seen in ﬁgures, the PPI for 6, in many herds do not overlap

with the expected average value of 1, indicating that residual variances for these herds is

signiﬁcantly higher or lower than average.

126

In order to compare the homoskedastic and heteroskedastic LMM (and CPMM)
model for ﬁt to the birth weight (and calving ease) datasets, LML and DIC values with its
components 5 and p,, are given in Table V. DIC and LML values in Table V were
reported separately for each of the three chains under LMM and CPM models. As seen
from Table V, DIC and LML results are very consistent within each model indicating
small Monte Carlo errors on the DIC and LML differences between homoskedastic and
heteroskedastic residual models. Both DIC and LML model choice criteria were in favor
of the heteroskedastic LMM and CPM models. Surprisingly, the model complexity, as

measured by p D , was lower for the heteroskedastic LMM model whereas the

heteroskedastic CPMM model generated higher model complexity value. Kizilkaya et al.
[22] analyzed the calving ease data by using homoskedastic cumulative t-link model and
determined that homoskedastic cumulative t-link mixed model (CTMM) was
overwhelmingly chosen as the best ﬁtting model choice criteria. However, the DIC and
LML comparisons between the heteroskedastic CPMM in this paper and the
homoskedastic CTMM in Kizilkaya et al. [22], strongly favor the heteroskedastic error

CPMM for goodness of ﬁt.

Inference on Sire Effects
Posterior means of elements of s were computed under homoskedastic and
heteroskedastic error GLMM models, as corresponding point estimates of expected
progeny differences (EPD). The relationship between these EPD's were determined by
using simple regression and shown in Figure 4 and Figure 6. As seen from ﬁgures, there

is strong linear relationship between posterior mean estimates of s, indicating no

127

reranking problem of sires based on Pearson correlations of 0.97 from LMM and 0.98
from CPM.

The standard errors of prediction were computed as the posterior standard
deviation of the MCMC samples of elements of s. The scatter plots of these standard
errors for both models with the corresponding least squares regression line for the
homoskedastic versus heteroskedastic LMM and CPMM models were presented in
Figure 5 and Figure 7, respectively. Regression coefﬁcient and correlation between
models in birth weight and calving case analysis were near unity, indicating almost

perfect linear relationship between posterior standard errors.

DISCUSSION

There is already extensive animal breeding research dealing with statistical
inference on heterogeneous variances based on maximrun likelihood or empirical Bayes
methods over the last two decades ([6], [8], [9], [16], [18], [19], [30]). In this study, we
developed a general framework for structural modeling of heterogeneous residual
variances in GLMM based on fully Bayesian inference using MCMC methods. We
validated our heteroskedastic GLMM by simulation and applied it to birth weight and
calving difficulty data from Italian Piemontese cattle using linear mixed models and
cumulative probit mixed models, respectively. Our method could also be conceptually
extended to a heteroskedastic analysis of count data based on a Poisson log-normal model
or to censored data models (Sorensen et al. [3 5]).

The results from our simulation study indicated that inference on parameters

specifying ﬁxed and random sources of residual heteroskedasticity is possible in both

128

linear and threshold mixed models. Furthermore, model choice criteria such as DIC can
be used with conﬁdence to assess to choose between heteroskedastic and homoskedastic
GLMM speciﬁcations.

Residual variances for BW and CE were shown to be greater in male calves
compared to female calves in Italian Piemontese cattle, directly resulting in smaller
heritabilities of direct and maternal effects for male calves. Furthermore estimates from a
homoskedastic speciﬁcation were midway between the separate male and female
posterior mean estimates derived from a heteroskedastic speciﬁcation. Our results for
birth weights are in good agreement with the results from Garrick et al. [9]. Ducrocq [6]
also estimated 7-18% residual variances in male calves relative to female calves for
calving ease scores in Norrnande and Montbeliarde breeds. Our female calf posterior
means of direct and maternal heritabilities were substantially higher than corresponding
threshold model estimates with homoskedastic residual assumption for calving ease
reported by Manfredi et al. [25 26], McGuirk et al. [27 28], Varona et al. [38], Luo et al.
[23], Bennett and Gregory [1] and Ducrocq [6]. Direct and maternal heritabilities were
also determined to be greater for female than for male calves and differences between
them were signiﬁcant for BW and CE. However, these results should be treated with
caution since we did not consider sex-speciﬁc inference on genetic variance. This is an
area for further research, particularly as the power for inferring upon sex-speciﬁc genetic
variance is substantially lower than that for residual variance. Inference on strongly
negative direct-maternal genetic correlations for BW and CE is in agreement with

previous work ([1], [3], [6], [22], [23], [38]).

129

There appeared to be no appreciable differences in sire re-rankings between
homoskedastic and heteroskedastic models. This should be not too surprising if sires are
randomly used across lowly and highly variable herds and calf sex ratios within each sire
do not deviate appreciably from 50%. However, substantial differences in rerankings
might be anticipated using animal models, particularly for across herd rankings of dams.

In the analyses of simulation and ﬁeld data, two Bayesian model choice criteria
DIC and LML, were utilized to choose between homoskedastic and heteroskedastic
GLMMs and the homoskedastic GLMM was clearly rejected with small DIC and large
LML values. In the analyses of BW and CD datasets in Italian Piemontese cattle,
heteroskedastic GLMMs were overwhelmingly chosen as the best ﬁtting models by both
model choice criteria. These results and results from previous studies showed the
existence of heteroskedastic residual structure and deﬁciencies of many current genetic
evaluation systems.

To further ﬁne-tune genetic evaluation systems, heterogeneous variances could be
additionally modeled as a function of age of dam, region or year in models. In addition,
heavy-tailed distributions for production traits and liability variables in ﬁtness traits can
be jointly modeled with heteroskedastic speciﬁcations in GLMM as we are working on
currently. Residual and genetic correlations between birth weight and calving difﬁculty
imply that genetic evaluations of calving ease would substantially beneﬁt from a bivariate
threshold/linear multiple trait analysis with birth weight ([24], [3 8]). Further work on
providing heteroskedastic modeling options with normal and t-distributed speciﬁcations

involving both traits jointly is needed.

130

REFERENCES

[1] Bennett G.L., Gregory K.E., Genetic (co)variances for calving difficulty score in
composite and parental populations of beef cattle: I. Calving difficulty score, birth
weight, weaning weight, and postweaning gain, J. Anim. Sci. 79 (2001) 45-51.

[2] Bink M.C.A.M., Quaas R.L., Van Arendonk J .A.M., Bayesian estimation of
dispersion parameters with a reduced animal model including polygenic and QTL
effects, Genet. Sel. Evol. 30 (1998) 103-125.

[3] Carnier P., Albera A., Dal Zotto R., Groen A.F., Bona M., Bittante G., Genetic
parameters for direct and maternal calving performance over parities in Piemontese
cattle J. Anim. Sci. 78 (2000) 2532-2539.

[4] Chib S., Greenberg E., Understanding the Metropolis Hastings algorithm, Am. Stat.
49 (1995) 327-335.

[5] Cowles M.K., Accelerating Monte Carlo Markov Chain convergence for cumulative
link generalized linear models, Stat. and Comp. 6 (1996) 101-111.

[6] Ducrocq, V., 2000: Calving ease evaluation of French dairy bulls with a
heteroskedastic threshold model with direct and maternal effects. Genetic evaluations
for conformation and other functional traits: 123. Bulletin no: 25. Proceedings of the
2000 interbull meeting. Bled, Slovenia, May 14-15, 2000.

[7] Foulley J .L., San Cristobal M., Gianola D., Irn S., Marginal likelihood and Bayesian
approaches to the analysis of heterogeneous residual variances in mixed linear
Gaussian models, Comp. Stat. Data Anal. 13 (1992) 291-305.

[8] Foulley J .L., Gianola D., Statistical analysis of ordered categorical data via a
structural heteroskedastic threshold model Genet. Sel. Evol. 28 (1996) 249-273.

[9] Garrick D.J., Pollak E.J., Quaas R.L., Van Vleck L.D., Variance heterogeneity in
direct and maternal weight traits by sex and percent purebred for Simmental-sired
calves, J Anim. Sci. 67 (1989) 2515-2528.

[10] Gelfand A.E., Model determination using sampling-based methods, in: Gilks W.R.,
S. Richardson S., Spiegelhalter D.J.(Ed.), Markov Chain Monte Carlo in practice,
Chapman&Hall, New York, 1996, pp. 145-162.

[1 1] Geyer C.J., Practical Markov chain Monte-Carlo (with discussion), Stat. Sci. 7
(1992) 467-511.

[12] Gianola D., On selection criteria and estimation of parameters when the variance is
heterogeneous, Theor. Appl. Genet. 72 (1986) 671-677.

131

[13] Gianola D., Foulley J .L., Sire evaluation for ordered categorical data with a
threshold model, Genet. Sel. Evol. 15 (1983) 201-224.

[14] Henderson C.R., Inverse of a matrix of relationships due to sires and maternal
grandsires in an inbred population, J. Dairy Sci. 59 (1976) 1585-1588.

[15] Heringstad B., Rekaya R., Gianola D., Klemetsdal G., Weigel K.A., Bayesian
analysis of liability of clinical mastitis in Norwegian cattle with a threshold model:
Effects of data sampling method and model speciﬁcation, J. Dairy Sci. 84 (2001)
2337-2346.

[16] Hill W.G., Edwards M.R., Ahmed M.K.A., Thompson R., Heritability of milk yield
and composition at different levels and variability of production, Anim. Prod. 36
( 1983) 59-68.

[17] Hill W.G., On selection among group with heterogeneous variance. Anim. Prod. 39
(1984) 473-477.

[1 8] lbanez M.A., Carabano M.J., Alenda R., Identiﬁcation of sources of heterogeneous
residual and genetic variances in milk yield data from the Spanish Holstein-Friesian
population and impact on genetic evaluation, Livest. Prod. Sci. 59 (1999) 33-49.

[19] J affrezic F., Robert-Granie C., F oulley J .L., A quasi-score approach to the analysis
of ordered categorical data via a mixed heteroskedastic threshold model, Genet. Sel.
Evol. 31 (1999) 301-318.

[20] Jensen J ., Wang C.S., Sorensen D.A., Gianola D., Bayesian inference on variance
and covariance components for traits inﬂuenced by maternal and direct genetic
effects, using the Gibbs sampler, Acta Agric. Scand. 44 (1994) 193-201.

[21] Kizilkaya K., Banks B.D., Carnier P., Albera A., Bittante G., Tempelman R.J.,
Bayesian inference strategies for the prediction of genetic merit using threshold
models with an application to calving ease scores in Italian Piemontese cattle, J.
Anim. Bred, Genet. (in press).a

[22] Kizilkaya K., Carnier P., Albera A., Bittante G., Tempelman R.J., An assessment of
cumulative t-link threshold models for the genetic analysis of calving ease scores,
Genet. Sel. Evol. (submitted).b

[23] Luo M.F., Boettcher P.J., Dekkers J .C.M., Schaeffer L.R., Bayesian analysis for
estimation of genetic parameters of calving ease and stillbirth for Canadian
Holsteins, J. Dairy Sci. 82 (1999) 1848.

[24] Luo M.F., Boettcher P.J., Schaeffer L.R., Dekkers J .C.M., Bayesian inference for

categorical traits with an application to variance components estimation, J. Dairy
Sci. 84 (2001) 694-704.

132

[25] Manfredi E.J., San Cristobal M., F oulley J .L., Some factor affecting the estimation
of genetic parameters for cattle dystocia under a threshold model, Anim. Prod. 53
(1991a)151-156.

[26] Manfredi E.J., Ducrocq V., Foulley J .L., Genetic analysis of dystocia in dairy cattle,
J. Dairy Sci. 74 (1991b) 1715-1723.

[27] McGuirk B.J., Going 1., Gilmour A.R., The genetic evaluation of beef sires used for
crossing with dairy cows in the UK. 2. Genetic parameters and sire merit predictions
for calving survey traits, Animal Sci. 66 (1998) 47-54.

[28] McGuirk B.J., Going I., Gilmour A.R., The genetic evaluation of UK Holstein
Friesian sires for calving ease and related traits, Animal Sci. 68 (1999) 413—422.

[29] Rekaya R., Weigel K.A., Gianola D., Application of a structural model for genetic
covariances in international dairy sire evaluations, J. Dairy Sci. 84 (2001) 1525-
1530.

[30] Robert-Granie C., Ducrocq V., Foulley J .L., Heterogeneity of variance for type traits
in the Montbeliarde cattle breed. Genet. Sel. Evol. 29 (1997) 545-570.

[31] Robinson G.K. That BLUP is a good thing - the estimation of random effects. Stat.
Sci. 6 (1991) 15-51.

[32] San Cristobal M., Foulley J .L., Manfredi E., Inference about multiplicative
heteroskedastic components of variance in a mixed linear Gaussian model with an
application to beef cattle breeding, Genet. Sel. Evol. 25 (1993) 3-30.

[33] Searle S.R., Matrix Algebra Useful for Statistics, John Wiley & Sons, NewYork,
1982.

[34] Sorensen D. A., Andersen S., Gianola D., Korsgaard 1., Bayesian inference in
threshold models using Gibbs sampling, Genet. Sel. Evol. 27 (1995) 229-249.

[3 5] Sorensen, D.A., D. Gianola, and LR. Korsgaard. Bayesian mixed-effects model
analysis of a censored data distribution with animal breeding applications. Acta
Agric. Scand. See. A. 48 (1998) 222-229.

[36] Spiegelhalter D.J., Best N.G., Carlin B.P., van der Linde A., Bayesian measures of
model complexity and ﬁt, J .R. Statist. Soc. B 64 (2002) 1-34.

[37] Uimari P., Thaller G., Hoeschele I., The use of multiple markers in a Bayesian
method for mapping quantitative trait loci, Genetics 143 (1996) 1831-1842.

133

[3 8] Varona L., Misztal I., Bertrand J. K., Threshold-linear versus linear-linear analysis
of birth weight and calving case using an animal model: I. Variance component
estimation, J. Anim. Sci. 77 (1999) 1994-2002.

[39] Wang C.S., Rutledge J .J ., Gianola D., Bayesian analysis of mixed linear models via
Gibbs sampling with an application to litter size in Iberian pigs. Genet. Sel. Evol. 26
(1994) 91-115.

[40] Weigel K. A., Gianola D., Estimation of heterogeneous within-herd variance
components using empirical bayes method: A simulation study, J. Dairy Sci. 75
(1992) 2824-2833.

[41] Winkelman A., Schaeffer L.R., Effect of heterogeneity of variance in dairy sire
evaluation. J. Dairy Sci. 71 (1988) 3033-3041.

134

 

 

 

66.56N . -66.MN mvN . 6HN6.6NN 66. .- . 6.6 66.366. 66. .-N66 3.366.. 5.6-6. .6 n6.6H..N6 6. 64.6.6 m66HN. .6 m->.
666666.666.” 666NNNH6N666 v. . .-..66 66.33.. 6. . 766.6 6663.6.— wm6-a . .6 3.6HnN.6 a. 6.566 .66“. . .6 N->.
NNN . mm-m . .mm mw.vvaHE..m6m 66. .-666 6666666 66. .-N66 3.366.. 666.6 . .6 V6.6H6N6 mN6-6. .6 V6666. .6 .->.
6N.”...N-NNN. 6m. .6 . H60 . 6 6m. .-6. .. 66.6HNN. N. . . -m66 m66HN6.. 3.6-3.6 n66HNm6 9N6-666 V66Hm . .6 ...--...
66169186. hm. .6NH6v. .6. . ...-NM. 666“.” . N. . w . . .-666 66.366. 3.6-8.6 666HNm6 .m6-N. .6 666ﬁoN6 N-...
66.35-.. . . .N 6N.6¢6HN666. 6m. .-6. .. 66.6H..N.. vo. . 66.6 V6.6H666 26... . .6 V66HmN6 6. 6.66.6 M66HN. .6 .-...
66.6N-mv6 3.66666. 66. .-NN.. 3.33.. m . . . -N66 66.6HN6. $6.36 66.3.66 6 . 6-8.6 M66HN. .6 m-..
. .6.-V6.6 N. .NH . N.6 an. .-. ... ..66HVN. m6. 736 66.366 6m6-6N6 66.6HwN6 vN,6-666 36Hm . .6 N-..
M6.N.-m6.n aw. .Hn66 .v. ...m. .. >66H6N.. w . . .-m66 66.6H66.. 3.6... . .6 V66HVN6 N66. .6 66621.6 . .6 .-..
hm.v-6n.N 3.63%..“ 6m. .-.6.. 6. 6Hw... 66..-»..6 66.6“.66 mv.6-mN6 mo6me6 v. 6.666 N66H666 m.
6N.v-mm.N 3.356 S. 756.. 6. .6HmN.. 6N. 75.6 666HN6.. Nv6-NN6 36“.”. m6 .N6-m66 M6666 . .6 N-.
m 0.1m m .N 6V6HNm...” 6N. .-m66 66.6H66.. m6. .656 66.3666 26... . .6 V6.6HVN6 .N.6-..6.6 86“.”... .6 .-.
56°66 Qwﬁzn. 7.6966 Omani“. 7.6.366 Oman—)7. 7.6.666 QmHZn. 7.6.666 QmHSr. magma-60.3.6606

b b 30 .b

NI!

NI

.0608

ﬂue-.60 .0868 .805. 60.3 6636 0066.586 0. $8258.. 90006882. .05 606.63% 05 no 85.86:. 8.580.. .. 9.63..

135

 

 

 

.m.N6m-N . .6. $5633. .6 .6. .66.. 6N6H66.. Nv..-n66 N. .6Hv. . . mm6-VN6 566H6m6 6N6-666 36%.». .6 m->.
65.56.1666 66.666 366.66. 6N. .666 666H. ... NN. .-m66 566H56.. v6.6.5 . .6 ~.6.6H.~N6 5. 6.666 66.6H. .6 N->.
56.6.6 m-V6éN 3.6663” . N.mN5 v. . T566 566H66.. n . . . -666 566HN6.. 666-6. .6 66.6.IT5N6 VN6-666 V6666 . .6 .->.
5.». . 6v-65.6 66.656 3.6665. 66. .-m N. . 6. 663%.. 5N. .-N66 66.6H66.. .m6-mN6 5666666 6N66. .6 66.6H5 . .6 6...
66666-6 . .6. 56. .56H5m66. $1-56.. 666HmN.. 6N. .-.66 566HV6.. 6v.6-NN6 666”: m6 .m6-N. .6 m66H6. .6 N-...
V6.6mVN-.6.m. 66.6566HN6.v65 mm. .-m . .. 6. 6Hmm. N. . .-566 6666666 666-5 . .6 666HmN6 6. 6-666 666H. .6 .-.:
V6.N6 .666 Nvdo . H566.q 66. .66.. m. .6HVN.. .N. .666 66.6H66. 6v6-mN6 56.3666 6. 6-566 66.6HN. .6 m-..
6N. . N666 v66“ . 6.6. ...N-m . .. 3636.. .N. .666 6. 6H . 6.. mV6-6N6 66.6H6m6 6N6-666 V66H6. .6 N-..
5v.»- . - . 66 6N.N-H65.5 N6. .-666 . .6H66.. 66.6- . 5.6 566HV66 6N6-v. .6 3.6H6N6 .N6-666 66.6.4.6. .6 .-..
mm.6-N6.m 666Hmv... 5N. .656 m . 6HV66 . .. .656 6. 6H666 6.6-6 . .6 666H6N6 m . 6-66.6 666H666 6-.
V6.v-mm.N 666H6v6 6m. .-N6.6 . .6HN. .. .N. .666 6. 6..-.66. . 6v6-6N6 66.6H6N6 6N6-566 666Hm. .6 N-.
66.6-6m.N V6.6H.6.m .v. .-m66 N. .6Hm . .. v. . .-656 666HV66 666-5 . .6 66.6HmN6 6N6-666 V66Hm .6 .-.
7.51666 Qmﬁia. .n.n.o\om6 DmHSE 7.6.0666 DmHEn. 7.51666 Qmﬁzn. 7.6.366 QmHEA. 30889-623369.

.6 ﬂab b ..b b

NI

NI

.0on 606.8

.58.. 033.6850 66.0: 6.030 5.8.68.0 6. 0.0.0880». 66.056880”. .05 8.000606 05 co 00:205.. 8.60600“. ... 0.6:...

136

.3830 085 05 $88 mmm 5 owﬁu 9 8.3.. H H mos—g BEBE 2N3: 8500 no 88850

8:268 2660a 32$ 2: wEm: Basso 0202 025 05 88% 838% “59.335 .«o “38:: gauche .89 25.

.7352: bsmnmnenm .85me 3:84.33 $30

.33 I 03.:

$3 I ~35

.cowmgoc @3983 H :38 85609.

downtomoc 3m «x2 coma

 

35.: E: I 23 2:... H 8m; 3.2 mad I 0de «23 H 89o A 3%
so? I 83; $3 I and
3&2 8% I $2 3&3 H 88.... RS: 23 I max on; H 83 a
:05 I as $.33: EH ...:
33; 82. I ammo 285 H ”22 and $3 I a? 32 H $3 mm- R
88; I ca. s 88H I ”3d ﬁg»
33 Had I 83 885 H 8:6 25 $5.: I 33 23 H 3:: mm
53 I 8a. a 53 I and $5
2% ”$2 - $85 285 H 82: 35 852 I ”8;: as; H m3: mm
823 I 03d 53 I E: H
a3 8:. I a; 23 H 83 v3.3 33 I 82 83 H 83 Nb
3% I Sn :53 I 83 .5.
$3 $3 I 83 :3 H H85 83 $2 I 3:. 83 H 33 b
a: I ”E a? I a: s
3 $2. I 23. 83 H v85 33 a; I 82 .2 3 H 093 Nb
$8; I on: at; I a5: v.
33 £3 - 53 ~86 H :3 $3 $2 I $3 33 H mm: Nb
Hmmm or: :3 Em H 33 Hmmm can :3 ..am H E “sagas;

 
 

$38 Soto 3me H58 95.25850

 

338 98%? 3me H354
.238 88.8805 Swan: 5 $88 38 $330 was Emma? is mo manages 393.5, no 8:203 8560A A: «Bah

137

@530 085 05 $88 mmm E 0mg 2 Homo.— H _ was?» 3335 Xmas 639 we 88:58

8:263. 9683 REE 05 mama: £35 0202 025 2: $88 838% 262839: mo 338:: 028qu .83 on?
.3325 bzssoa stoma 33.33 $5

donuts“. @3283 H 538 85509.

downtown—V 8m ~on coma

 

 

 

 

$3-83 $3-83 : - as -
33 $3 I :2 :3 H H23 23 $3 I 23 Sod H ”mod E u - H H we
53-23 $322,: 2 - :5 -
83 $3 I 335 Rod H 33 $3 83 I ”3o :3 H E; E m - ‘ H F
82-82 H _ 5-2:
mg 82- - £3- 33 H $3- 23 23 I $2- :3 H 82- 5
83-23 86.3; E... -
m8 $3 I «8.0 3:. H 33 23 a; I Sod Rod H mm; x ..H 3
$3.82 $2443.: -
aim was I :3 33 H 32 2mm 33 I 23 Rad H £3 HE F
53E 38.3: 2 -
ma $3 I 83 mm; H Rod :3 $3 I 83. 83 H wood E S
$3.23 :2.§8.: I -
$3 82 I 3% Hood H 22 $2 82 I $3 35 H 3% E ms
Hmmm or: £3 Ham H $5 $3 OE $3 Dam H 3: $3835
338 3me we: H508 oiﬁxsﬁdu 388 ﬂoat». @828 Mods

 

.238 omega—080E SE“: 5 880m 88 wag—mo 98 Emma? 5:: mo £80883 038% cc oocouomﬁ Showman .>— 03:.

138

$80883 no 5983 953mm 2?

.889, :88 Summon no @093 853% 56989.

68838 5%on mo owﬁozw 2E.“

 

 

 

 

 

 

3%. MRS 2» 32.2 ”3.2 m 520
was- mane m8 01.2 ”3.2 N 596
3%- $5: 3» 3.2 2&2 H 520
momummﬁou—mOuowom
82- 9A.: 2: Eu: $92 m 520
8;- 3...: a: 35.3 892 N 520
83. 8...: g 2.2 Home _ 526
amoﬂmmvov—moaom
228

3 as. $3... 25 :3... 9%.? m 520
32.8- 3.2. woo 5.3. 3%? N 326
32.8. ”2.3 mg ~83. 9A.? _ £20
momumNVOXmoHoao:

:18. 03.3 mg H33. 08.? m 520
v3.8- 03.3 08 3...: ca? N :35
v3.8- 08.3. 08 v8.3. 25.? _ 520
ﬂmoﬂmﬂuoxmoﬁam

2::
as: 93 0 am .. 65 _. m

 

.238 38:0an 52d: 5 880m 980 wag—mo 28 Emma‘s 5.5 mo mama—«SH
05 Sm was?» 325 ©0258:— EEwSE mo. was GR: 85:8 coumgomﬁ 853% o£oo%-5m:o 0202 .> 03:.

139

$me coumwaﬁmm E mooaﬂobmv 05 .— «...—ME

—
_

2:-

 

 

OO

18N

18M

seawwa 310

18..

 

 

223 o o
2.5 o o

 

 

8w

 

140

 

 

 

 

ow

mm

 

.Bm é «2% 058323 8 7: £3 oséxoa? .N 2:5

.5:
cm m1 ow nm em mm 0m mu 0— m o

$8

I
a

0‘!
mean Jonarsod 93190

141

mo

 

 

 

 

ow

mm

 

.mo .8 2% 058%-an 8 E .33 038mg? .m 955

3.82
Om m? o7 mm cm mm on mm c— m o

o

_
ﬁu
G
0
mV

n mu

0 .d
0
NF
m.
m
.M‘

... m...
mw
m“

142

.6608 ESE
0:980meon 3%? 0538.883 co woman 3m .8.“ mooaobgv >5on 2% m0 838 How—88a mo «29.33% .9 0.53,.”

 

.22... ounutmxmgoz

 

 

 

 

WW1 onsepaxsmamu

 

 

 

 

143

.338 224 3333888:
3%? £33883 no woman 3m 8m moocouobmv xaowoa 2% mo 83333 335% Snowman .«o 8.983% .m 952..—

 

 

 

224 ozmaumxmgo:
3 N. v F no co to No o
r _ h _ _ b _ O
- Nd H
w
. a
.. vo m
S
, 0.0 a
p
,. 3 m
m.
I P m
- Ne w

 

v._‘

 

 

 

144

.638 SEQ
038—on8.8qu 3%? 3863880: no woman m0 Sm mooaeobmu ~30on 9% MO 338 Showman .«o 8.933% .0 «...—ME

 

 

 

 

  

 

 

 

SEQ 0550930on
1 PI
0
O
,. no- . m
09 .. m
P P v
_ _ _ W-
by
m.
3
O
. m
0
- F

 

 

145

.388 SEC 0538—858:
3%? 2630x882 no 382 m0 Sm moocuuotmv acumen 2% mo meow—«3% 2353 Sigma .«o “29835 .5 «...—ur—

 

 

 

 

sEo uﬁaumﬁoso:
93 to 93 0.0 m3 No who to mod 0
- P b P p p _ . _ O

I to m

m

. m

, No x

m

B

7 no u...

3

w

, v.0 w

 

- md

 

 

146

CONCLUSIONS AND RECOMMENDATIONS

Fitness traits such as conformation score, calving ease and ovulation rate affect
livestock production proﬁtability. This fact has been reﬂected in animal breeding as
selection emphasis has shifted gradually away from yield to ﬁtness traits in order to
maximize farm income. Calving ease is particularly important for cost control to beef and
dairy cattle production because of the increased risk that calving problems add to the
survival of both calf and dam. Accurate inference on genetic parameters and genetic
merit are important for effective sire selection strategies to improve calving ease.

Variance components and their derivative genetic parameters in a threshold sire-
matemal grandsire model analysis of calving ease have been historically estimated using
approximate marginal maximum likelihood (MML) procedures based on expectation-
maximization (EM). In order to assess the validity of approximate MML estimates,
inferences were compared to those based on sequences of Markov Chain Monte Carlo
(MCMC) sampling in Chapter H. Laplacian ML and MCMC point estimates of
variance components and direct and maternal heritabilities were found to be signiﬁcant
for calving ease in first parity Italian Piemontese dams. However, the covariance between
additive direct and maternal effects was found to be not different from zero based on
MCMC-derived posterior probability intervals. Furthermore, the joint modal estimates of
sire effects and associated standard errors based on MML estimates of variance
components differed little from the respective posterior means and standard deviations
using MCMC. The results suggest that it may not be necessary to apply computationally

intensive MCMC methods for inferences on genetic parameters and sire genetic merits

147

using threshold models on large calving ease data sets, at least based on a conventional
genetic evaluation model.

A hierarchical threshold mixed model based upon a cumulative t-link rather than a
cumulative probit link speciﬁcation for the analysis of ordinal data was developed and
applied to calving ease scores in Chapter III. The model and MCMC algorithm were
validated on simulated liability and categorical data sets from normal and t distributions
(with 4 degrees of freedom) using the deviance information criterion (DIC) and log
marginal likelihood (LML). The simulation study indicated that inference on the t error
degrees of freedom was reliable for the analyses of liability and ordinal data sets;
however, MCMC mixing was problematic, especially for ordinal data. Both DIC and
LML were able to choose the correct model in most cases. In the analysis of calving ease
scores on Italian Piemontese cattle, the cumulative t-link model was overwhelmingly
chosen as the best ﬁtting model. However, the posterior means of direct and maternal
heritabilities from cumulative probit link and t-link threshold models were found to be
not meaningfully different from each other. Furthermore, the examination of posterior
means of sire effects showed that there was no real difference between two models in
terms of genetic rankings of sires. A surprising result is that the covariance between
additive direct and maternal effects was found to be signiﬁcantly negative in Chapter III,
in stark contrast to the results presented in Chapter H. The result in Chapter 11 further
appeared to be dependent upon whether or not herds were treated as random or ﬁxed.

In Chapter IV, a Bayesian hierarchical generalized mixed model based on a
structural multifactorial model with ﬁxed and random effects that multiplicatively

inﬂuence heterogeneity of residual variance was developed. The models and MCMC

148

algorithms were validated on simulated normal and ordinal categorical data characterized
by heteroskedastic residual structures. The simulation study further indicated that DIC
and LML were useful in correctly choosing between heteroskedastic and homoskedastic
models. Application of the method on Italian Piemontese calving ease data indicated that
the residual variance for male calves was signiﬁcantly greater than that for female calves;
and that residual variances for some herds were signiﬁcantly higher or lower than
average. However, as in Chapter 2, the high correlation between posterior mean of sire
effects from heteroskedastic and homoskedastic models suggested no meaningful
reranking.

It appears that better hierarchical Bayesian models confer primary advantages in
more accurately modeling variability in calving ease without providing major
perturbations on sire rankings. Whereas this may be true for cumulative probit mixed
effects models, this is less likely to be the case for linear mixed effects models, as hinted
by a heteroskedastic analysis of birth weight in Chapter IV. Furthermore, signiﬁcant
rerankings may be expected to occur in animal model rather than sire model
speciﬁcations, particularly for dams.

In future research, heterogeneous residual variance could be modeled as a
function of age of dam, year or region as further extensions to the models considered in
Chapter IV. Furthermore, heteroskedastic t-error based mixed models for residuals might
be considered for the analyses of production and liability variables; work is already
underway on this front with preliminary results indicating heteroskedastic t-error models

as the best ﬁtting relative to those models considered in this dissertation.

149

More signiﬁcantly, heterogeneity of residual and genetic correlations might exist
between birth weight and calving difﬁculty. As this phenomenon potentially inﬂuences
national selection strategies, further work on providing greater modeling ﬂexibility with
normal and t-distributed homoskedastic and heteroskedastic error structures on both traits

jointly is needed.

150

   

    

SITY

' lllj‘lijjjiﬂjﬂjjllﬁ

26

   

jj‘m '