Ix... . .1“. ......1... ... gain...“ . . i . ....rl a L t 7 . I. . . ...pS-flfluu-EHVIWd‘IIP . . ... n..n..!.¢\4..t1x.. :0” g . .nz. v.8...m1w .. .. .u z 1.. .... a... . 1 gray? 5:50.! I: \} ... -s‘: 1 ..w.al?.‘\. . av]... . x .16. \h This is to certify that the dissertation entitled . HIERARCHICAL BAYESIAN THRESHOLD MODELS APPLIED TO THE QUANTITATIVE GENETIC ANALYSIS OF CALVIN; EASE SCORES IN ITALIAN PIEMONTESE CATTLE presented by Kadir Kizilkaya has been accepted towards fulfillment of the requirements for Phone degreein Mina]. SCience IGMJ' We“ Major professor I i Date 87/L'Ql/02— r MS U is an Affirmative Action/Equal Opportunity Institution 0- 12771 LIBRARY Michigan State University PLACE IN RETURN Box to remove this checkout from your record. TO AVOID FINES return on or before date due. MAY BE RECALLED with earlier due date if requested. DATE DUE DATE DUE DATE DUE 6/01 cJCIRC/Dateouopss-sz HIERARCHICAL BAYESIAN THRESHOLD MODELS APPLIED TO THE QUANTITATIVE GENETIC ANALYSIS OF CALVING EASE SCORES IN ITALIAN PIEMONTESE CATTLE By Kadir Kizilkaya A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Department of Animal Science 2002 ABSTRACT HIERARCHICAL BAYESIAN THRESHOLD MODELS APPLIED TO THE QUANTITATIVE GENETIC ANALYSIS OF CALVING EASE SCORES IN ITALIAN PIEMONTESE CATTLE By Kadir Kizilkaya First parity calving ease scores for Italian Piemontese cattle were analyzed using sire and maternal grandsire threshold models. Genetic parameters were estimated by approximate marginal maximum likelihood (MML) methods such as the historically popular expectation-maximization method and Laplace's method in addition to inferentially exact Markov Chain Monte Carlo (MCMC) methods. Laplacian MML and MCMC point estimates of variance components and direct and maternal heritabilities were seen to be statistically significant with measures of uncertainty that were virtually identical to each other. Furthermore, the joint modal estimates of sire effects and associated standard errors conditioned on MML estimates of variance and covariance components were seen to differ little from the respective posterior means and standard deviations derived from MCMC. These results suggest that there may be little need to consider computationally intensive MCMC methods for national breed genetic evaluations derived from large calving ease datasets in cattle production industries. A heavier tailed Student t residual distribution may be specified as an alternative to the normal distribution for modeling underlying liability variables that characterize a threshold probit-link model. Both t—link and probit link models were applied to simulated data sets characterized by various degrees of freedom specifications for t-error on the liability variables. Model choice, using either the deviance information criteria (DIC) or the log marginal likelihood (LML), was generally correctly assigned in all cases, whether for the direct linear model analysis of liability variables or for the threshold model analysis of the corresponding ordinal data. The threshold t-link sire maternal-grandsire model was found to better fit Italian Piemontese calving ease data compared to the regular threshold cumulative probit link model; nevertheless, the rank correlation on posterior means of breeding values between a threshold-t and regular threshold model analyses exceeded 0.98. A hierarchical generalized linear mixed model based on a structural multifactorial model with fixed and random effects that multiplicatively influence residual heteroskedasticity was developed. Validation of models and MCMC algorithms were based on simulated normal and ordinal categorical data with heteroskedastic residual structures. Simulation results indicated that DIC and LML were useful in correctly choosing between heteroskedastic and homoskedastic models. The residual variance for male calves was significantly greater than that for female calves and significant residual heteroskedasticity existed across herds, whether for linear model analyses of birth weights or for threshold model analysis of calving ease scores. However, the high correlation between posterior mean of sire effects from heteroskedastic and homoskedastic models indicated that there was no significant rerankings of sire genetic merit between the two models. ACKNOWLEDGEMENTS I would like to express my sincere gratitude to Dr. Robert Tempelman for his guidance, encouragement, permanent support and patience during these past five wonderful years as a doctoral student. Appreciation is also extended to Drs. Dennis Banks, Catherine Ernst, Joseph Gardiner and Frank Lupi for their encouragement and for serving on the guidance committee. Also I would like to express my sincere thanks to Department of Animal Science for kindly offering me an assistantship during my PhD program. I wish to thank Dr. Guilherme Rosa for his kindly help and advice during preparation of my dissertation. I greatly appreciate the logistical assistance provided by Dr. Peter Saarna in Animal Breeding and Genetics Group Quantitative Genetics Lab. I also would like to thank my fellow graduate students Fernando Cardoso, David Edwards, Hugo Roman in Animal Science; Cengiz Caner in Food Packaging; Savas Berber in Physics and Astronomy and Dr. Jose Sanchez, Jeff Smeenk and Dr. Mohammed Khan in Crop and Soil Science for their friendship, encouragement and always being there to help me. I wish to thank Ersel and Sule Catal for friendship and support. I am thankful to Dr. Richard Harwood and the CS. Mott Foundation for the financial support in the last term of my program. I also would like to thank Collage of Agriculture and Natural Resources (CANR) for their support through the Dissertation Completion Fellowship. I am thankful to Adnan Menderes University in Aydin, Turkey for their understanding and help. Finally I would like to thank my wife, Aysegul, my daughter, Mervenur and my son, Ismail Can, for their incomparable support and understanding. iv PREFACE Chapter 1 and Chapter 2 in this dissertation were written in the style required for publication in the Journal of Animal Breeding and Genetics. Chapter 3 and Chapter 4 were written in the style required for publication in the Genetics Selection Evolution. TABLE OF CONTENTS LIST OF TABLES ..... - - - - - -- -- - - --...VIII LIST OF FIGURES ..... -- - - - x INTRODUCTION _ - - -- 1 REFERENCES ...................................................................................................... 5 CHAPTER 1: LITERATURE REVIEW - - -- -- 8 INTRODUCTION ................................................................................................. 8 STATISTICAL MODELS FOR GENETIC EVALUATION OF CALVING EASE .................................................................................................................... 12 BAYESIAN INFERENCE ................................................................................. 15 VARIANCE COMPONENT ESTIMATION ................................................... 17 ROBUST MIXED LINEAR MODELS ............................................................ 21 HETEROSKEDASTIC MIXED EFFECTS MODELS .................................. 26 CHAPTER 2: BAYESIAN INFERENCE STRATEGIES FOR THE PREDICTION OF GENETIC MERIT USING THRESHOLD MODELS WITH AN APPLICATION TO CALVING EASE SCORES IN ITALIAN PIEMONTESE CATTLE - - - - 42 ABSTRACT ......................................................................................................... 42 INTRODUCTION ............................................................................................... 43 MATERIALS AND METHODS ....................................................................... 45 Data .............................................................................................................. 45 Threshold Model ......................................................................................... 47 MML Using the EM Approach .................................................................. 48 Laplace's Method ........................................................................................ 50 Markov Chain Monte Carlo (MCMC) ..................................................... 52 Inference on Heritabilities and Genetic Correlations .............................. 54 RESULTS AND DISCUSSION ......................................................................... 54 Assessment of MCMC Convergence and Mixing .................................... 54 Variance Component Inference ................................................................. 55 Heritabilities and Genetic Correlations .................................................... 57 Comparison of Inferences on Sire Effects ................................................ 58 CONCLUSIONS ................................................................................................. 60 REFERENCES .................................................................................................... 61 CHAPTER 3: CHAPTER 3: AN ASSESSMENT OF CUMULATIVE t-LINK THRESHOLD MODELS FOR THE QUANTITATIVE GENETICS ANALYSIS OF CALVING EASE SCORES - 67 ABSTRACT ......................................................................................................... 67 INTRODUCTION ............................................................................................... 68 MODEL CONSTRUCTION .............................................................................. 70 vi MODEL COMPARISON ................................................................................... 75 DATA ................................................................................................................... 78 Simulation Study ......................................................................................... 78 Italian Piemontese Calving Ease Data ...................................................... 79 RESULTS ............................................................................................................ 82 Simulation Study ......................................................................................... 82 Application to Calving Ease Scores in Italian Piemontese Cattle .......... 85 Genetic Parameter Inference ............................................................. 85 Inferences on Sire Effects ................................................................... 87 DISCUSSION ...................................................................................................... 88 REFERENCES .................................................................................................... 93 CHAPTER 4: BAYESIAN STRUCTURAL MODELING OF HETEROGENEOUS RESIDUAL VARIANCES IN GENERALIZED LINEAR MIXED MODELS ..... 104 ABSTRACT ....................................................................................................... 104 INTRODUCTION ............................................................................................. 105 MATERIALS and METHODS ....................................................................... 108 HIERARCHICAL CONSTRUCTION OF THE HETEROSKEDASTIC GENERALIZED LINEAR MIXED MODEL ....................................... 108 MODEL COMPARISON ......................................................................... 116 DATA ......................................................................................................... 117 Simulation Study ............................................................................... 117 Italian Piemontese Birth Weight and Calving Ease Data ............. 119 RESULTS .......................................................................................................... 122 Simulation Study ....................................................................................... 122 Application to Birth Weight and Calving Ease Scores in Italian Piemontese Cattle ...................................................................................... 125 Genetic Parameter Inference ........................................................... 125 Inference on Sire Effects .................................................................. 127 DISCUSSION .................................................................................................... 128 REFERENCES .................................................................................................. 131 CONCLUSIONS AND RECOMMENDATIONS - 147 vii LIST OF TABLES CHAPTER 1: LITERATURE REVIEW Table I. Itemized dystocia costs (Albera et al. 1999). - 38 Table 11. Direct heritability (hzd), maternal heritability (hzm), and direct-maternal genetic correlation (rg) estimates :1: standard errors, where available, for calving ease score.---- - - 39 CHAPTER 2: BAYESIAN INFERENCE STRATEGIES FOR THE PREDICTION OF GENETIC MERIT USING THRESHOLD MODELS WITH AN APPLICATION TO CALVING EASE SCORES IN ITALIAN PIEMONTESE CATTLE Table 1. Effective sample sizes for (co)variance components and threshold parameters using MCMC for first parity calving ease scores in Piemontese cattle. 65 Table II. Estimation of variance-covariance components and genetic parameters for calving difficulty in Piemontese cattle using EM, Laplacian, and MCMC algorithms. 66 CHAPTER 3: AN ASSESSMENT OF CUMULATIVE t-LINK THRESHOLD MODELS FOR THE QUANTITATIVE GENETICS ANALYSIS OF CALVING EASE SCORES Table I. Posterior inference on the residual degrees of freedom parameter (v) in simulation study using the cumulative t-link model. 97 Table II. Posterior inference on sire variance (of) in simulation study using the cumulative t-link model. 98 Table III. Deviance information criteria (DIC) and log marginal likelihood (LML) comparisons between models in simulation study. 99 Table IV. Posterior inference on genetic parameters of calving ease scores in Italian Piemontese cattle. -- - 1 - - 1- 100 Table V. MCMC chain-specific deviance information criterion (DIC) and log marginal likelihood (LML) values for the analysis of calving ease scores in Italian Piemontese cattle. - _ - 101 CHAPTER 4: BAYESIAN STRUCTURAL MODELING OF HETEROGENEOUS RESIDUAL VARIANCES IN GENERALIZED LINEAR MIXED MODELS Table I. Posterior inference on the dispersion and heterogeneity parameters in simulation study using linear mixed effects model. 135 viii Table II. Posterior inference on the dispersion and heterogeneity parameters in simulation study using cumulative probit mixed model- - 136 Table III. Posterior inference on variance components of birth weight and calving ease scores in Italian Piemontese cattle. - 137 Table IV. Posterior inference on genetic parameters of birth weight and calving ease scores in Italian Piemontese cattle. - 138 Table V. MCMC chain-specific deviance information criterion (DIC) and log marginal likelihood (LML) values for the analysis of birth weight and calving ease scores in Italian Piemontese cattle. 139 ix LIST OF FIGURES CHAPTER 1: LITERATURE REVIEW Figure l. A flowchart interaction of direct additive and maternal genetic effects with other factors in terms of their effect on calving difficulty (Burfening 1991) ............. 40 Figure 2. Mapping of underlying liabilities to observed calving ease phenotypes. Thresholds 1, 2, and 3 delimit normally distributed underlying liabilities into 4 calving ease phenotypes (unassisted calving, assisted easy calving, assisted difficult calving, and Caesarean and embryotomy). - 41 CHAPTER 3: AN ASSESSMENT OF CUMULATIVE t-LINK THRESHOLD MODELS FOR THE QUANTITATIVE GENETICS ANALYSIS OF CALVING EASE SCORES Figure l. Scatterplot of posterior means of sire progeny differences based on cumulative t-link versus cumulative probit link model with superimposed line of best fit.- 102 Figure 2. Scatterplot of posterior standard deviations of sire progeny differences based on cumulative t-link versus cumulative probit link model with superimposed line of best fit. 103 CHAPTER 4: BAYESIAN STRUCTURAL MODELING OF HETEROGENEOUS RESIDUAL VARIANCES IN GENERALIZED LINEAR MIXED MODELS Figure 1. DIC differences in simulation study - - 140 Figure 2. Approximate 95% PPI on herd-specific delta for BW. 141 Figure 3. Approximate 95% PPI on herd-specific delta for CD 142 Figure 4. Scatterplot of posterior means of sire progeny differences for BW based on homoskedastic versus heteroskedastic LMM model. 143 Figure 5. Scatterplot of posterior standard deviations of sire progeny differences for BW based on homoskedastic versus heteroskedastic LMM model. 144 Figure 6. Scatterplot of posterior means of sire progeny differences for CD based on homoskedastic versus heteroskedastic CPM model _ -_ _ 145 Figure 7. Scatterplot of posterior standard deviations of sire progeny differences for CD based on homoskedastic versus heteroskedastic CPM model. 146 INTRODUCTION Profitable cattle production requires careful cost control. A herd's cost of production is strongly associated with its general level of fitness and fertility (Banos 1999). Consequently, over the last two decades seedstock selection emphasis has been increasingly directed towards fitness traits such as conformation scores, calving ease, and stillbirth incidences. Calving ease, or conversely calving difficulty, is particularly important for beef and dairy cattle production since calving problems, compared to successful unassisted calvings, generate additional costs well beyond parturition. Costs are immediately incurred, of course, as veterinary fees, increased labor costs, and potential loss of calf, but long term losses also accrue due to subsequent declines in dam health and fertility, directly resulting in reduced long-term productivity (Albera et a1. 1999). Although all breeds experience calving difficulty to various degrees, incidence rates are particularly high in the heavy muscled Italian Piemontese breed (Carnier et al. 2000) relative to reported estimates on the continental and British breeds (V arona et al. 1999; Bennett and Gregory 2001). In the Gennplasm Evaluation Program at the Roman L. Hruska U.S. Meat Animal Research Center (MARC) in Clay Center, NE (Wheeler et al. 1996) the Piemontese breed has been concluded to be the only breed out of 10 studied to “provide the greatest opportunity to produce, lean tender meat”. Hence the importance of controlling calving difficulty in this breed cannot be understated. One strategy for improving calving ease is through genetic selection. Genetic evaluation systems for normally distributed production characters have been based on well-characterized linear mixed models (Henderson 1973). Breeding sires are typically ranked by best linear unbiased predictions (BLUP) or, equivalently, empirical Bayes estimates of their corresponding random effects. Furthermore, variance component and derivative heritabilities are typically estimated using restricted maximum likelihood (REML), which has desirable properties relative to other methods (Mc Culloch and Searle 2001). Calving ease, however, is generally recorded as a 4 or 5 level ordinal category trait, such that normality assumptions are clearly violated. Abdel-Azim and Berger (1998) and Luo et al. (2001) have demonstrated that REML estimates of variance components based on linear mixed model analysis of categorical data yield biased estimates of heritabilities and genetic correlations. A suitable model for genetic analysis of categorical data is the cumulative probit or threshold model adapted by Gianola and Foulley (1983) and Harville and Mee (1984). However, the proposed inference procedures rely upon rather strong asymptotic or approximate assmnptions. Variance component estimation has been historically based on a joint maximization of the marginal posterior density of the variance components using an approximate invocation of the expectation-maximization (EM) algorithm (Harville and Mee 1984; Stiratelli et al. 1984). However, in various simulation tests of this method, some significant biases in heritability estimates were found (Hoeschele et al. 1987, Simianer and Schaeffer 1989). Furthermore, under the threshold model, breeding sire merit is typically estimated using elements of the joint posterior modes of these random effects, conditionally on estimated variance components. These joint posterior modes are intended to approximate the posterior means, shown to be optimal selection criteria by Fernando and Gianola (1986). Whether or not those approximations work well as intended for breed genetic evaluations is not clearly known. Markov Chain Monte Carlo (MCMC) methods have facilitated exact sample Bayesian inference for many applications in animal breeding, including inference on categorical traits. In MCMC, inference on parameters is based on random but correlated draws from posterior distributions. Earlier studies have highlighted MCMC analysis for genetic parameter estimation in production traits (Wang et al. 1994a,b; Luo et al. 1999; Varona et al. 1999). The superiority of MCMC inference over, for example, approximate EM-based inference on variance components for categorical data under conventional threshold models has been demonstrated in simulation studies (Hoeschele and Tier 1995). MCMC allows additional modeling possibilities for quantitative genetic analyses of calving ease that have not yet been fully studied and exploited. For example, it has been noted that preferential treatment may be partly responsible for the records of some high producing animals. Stranden and Gianola (1999) developed hierarchical Bayesian models based on a Student t distributed error structure to provide outlier-robustness on genetic evaluations. Calving ease is often subjectively scored by herdspersons on an ordinal scale such that data quality might conceptually be a greater issue here than for continuously distributed production characters. Albert and Chib (1993) proposed a cumulative I link model providing greater modeling flexibility relative to cumulative probit model for the analysis of ordinal categorical data. In animal breeding applications, a cumulative I link model might be considered to be an outlier-robust model to minimize the impact of outlying records on genetic merit predictions for calving ease on breeding sires. As another example, many studies have indicated that residual variances are heterogeneous for normally distributed production characters across herds, sexes, and other factors (Hill et al. 1983; lbanez et al. 1999). If this phenomenon is not properly taken into account (i.e. a homoskedastic error distribution is assumed), differences in within-subclass variances result in biased breeding value predictions such that a disproportionate numbers of animals may be selected from environments characterized by high variability. This bias has the potential effect of substantially reducing genetic progress due to breedstock selection based on these predictions (Hill 1984; Weigel and Gianola 1992). A structural model for heterogeneity of residual variance on a conceptual underlying scale has been developed for the threshold model by Foulley and Gianola (1996). However, this model, as well as a subsequent model by Jafiiezic et al. (1999), invokes analytical approximations, which appear tenuous, particularly for the analysis of categorical data. Exact MCMC inference on residual heteroskedasticity is conceptually possible but requires development and testing. Cumulative t—link and heterogeneous variance models need to be tested as alternatives to conventional threshold models used currently by the beef cattle industry for genetic evaluation of breeding sires for calving ease of their daughters as dams and progeny as calves. A hierarchical Bayesian framework seems to be suitable for the construction of these models, using MCMC for statistical inference. REFERENCES Abdel-Azim, G. A.; Berger, P. J ., 1999: Properties of threshold model predictions. J. Anim. Sci. 77: 582-590. Albera, A.; Carnier, P.; Groen, A. F., 1999: Breeding for improved calving performance in Piemontese cattle economic value. Proceedings international workshop on EU concerted action genetic improvement of functional traits in cattle (GIFT); breeding goals and selection schemes. Bulletin no:23. Wageningen, The Netherlands 7-9th November, 1999. Albert, J. H.; Chib, S., 1993: Bayesian analysis of binary and polychotomous response data. J. Am. Stat. Assoc. 88: 669-679. Banos, G., 1999: From research to application: A summary of scientific developments and possible implementation to the genetic improvement for functional traits. Bulletin no:23. Wageningen, The Netherlands 7-9“1 November, 1999. Bennett, G. L.; Gregory, K. E., 2001: Genetic (co)variances for calving difficulty score in composite and parental populations of beef cattle: I. Calving difficulty score, birth weight, weaning weight, and postweaning gain. J. Anim. Sci. 79: 45-51. Carnier, P.; Albera, A.; Dal Zotto, R.; Groen, A. F.; Bona, M.; Bittante, G., 2000: Genetic parameters for direct and maternal calving performance over parities in Piemontese cattle. J. Anim. Sci. 78: 2532-2539. Fernando, R.L.; Gianola, D., 1986: Optimal properties of the conditional mean as a selection criterion. Theor. Appl. Genet. 72: 822-825. Foulley, J. L.; Gianola, D., 1996: Statistical analysis of ordered categorical data via a structural heteroskedastic threshold model. Genet. Sel. Evol. 28: 249-273. Gianola, D.; Foulley, J. L., 1983: Sire evaluation for ordered categorical data with a threshold model. Genet. Sel. Evol. 15 :201-224. Harville, D. A.; Mee, R. W., 1984: A mixed-model procedure for analyzing ordered categorical data. Biometrics 40: 393-408. Henderson, C.R., 1973: Sire evaluation and genetic trends. In: Proc. Anim. Breed. Genet. Symp. in Honor of Dr. J.L. Lush. ASAS and ADSA, Charnpaign, IL, pp. 10-41 Hill W. G., 1984: On selection among group with heterogeneous variance. Anim. Prod. 39: 473-477. Hoeschele, I.; Gianola, D.; Foulley, J. L., 1987: Estimation of variance components with quasi-continuous data using Bayesian methods. J. Anim. Breed. Genet. 104: 334- 349. Hoeschele, 1.; Tier, B., 1995: Estimation of variance components of threshold characters by marginal posterior mode and means via Gibbs sampling. Genet. Sel. Evol. 27: 519-540. lbanez, M. A.; Carabano, M. J .; Alenda, R., 1999: Identification of sources of heterogeneous residual and genetic variances in milk yield data from the Spanish Holstein-Friesian population and impact on genetic evaluation. Livest. Prod. Sci. 59: 33-49. Jaffrezic, F.; Robert-Granie, C.; Foulley, J. L., 1999: A quasi-score approach to the analysis of ordered categorical data via a mixed heteroskedastic threshold model. Genet. Sel. Evol. 31: 301-318. Luo, M. F.; Boettcher, P. J .; Dekkers, J. C. M.; Schaeffer, L. R., 1999: Bayesian analysis for estimation of genetic parameters of calving ease and stillbirth for Canadian Holsteins. J. Dairy Sci. 82: 1848. Luo, M. F.; Boettcher, P. J .; Schaeffer, L. R.; Dekkers, J. C. M., 2001: Bayesian inference for categorical traits with an application to variance components estimation. J. Dairy Sci. 84: 694-704. McClulloch, C.E.; Searle, SR, 2001: Generalized, linear, and mixed modals. Wiley, NewYork. Simianer, H.; Schaeffer, L. R., 1989: Estimation of covariance components between one continuous and one binary trait. Genet. Sel. Evol. 21: 303-315. Stiratelli, R.; Laird, N.; Ware, J. H., 1984: Random-effects models for serial observations with binary response. Biometrics. 40: 961 -971. Stranden, I.; Gianola, D., 1999: Mixed effects linear models with t-distributions for quantitative genetic analysis: a Bayesian approach, Genet. Sel. Evol. 31 : 25-42. Varona L.; Misztal, 1.; Bertrand, J. K., 1999: Threshold-linear versus linear-linear analysis of birth weight and calving ease using an animal model: I. Variance component estimation. J. Anim. Sci. 77: 1994-2002. Wang, C. S.; Rutledge, J. J .; Gianola, D., 1994a: Bayesian analysis of mixed linear models vi Gibbs sampling with an application to litter size in Iberian pigs. Genet. Sel. Evol. 26: 91-115. Wang, C. S.; Gianola, D.; Sorensen, D.; Jensen, J .; Christensen, A.; Rutledge, J. J., 1994b: Response to selection for litter size in Danish landrace pigs-A Bayesian analysis. Theor. And Appl. Genet. 88: 220-230. Weigel K. A.; Gianola, D., 1992: Estimation of heterogeneous within-herd variance components using empirical bayes method: A simulation study. J. Dairy Sci. 75: 2824-2833. Wheeler, T.J.; Cundiff, L.V.; Koch, R.M.; Crouse, J .D., 1996: Characterization of biological types of cattle (Cycle IV): carcass traits and longissimus palatability. J. Anim. Sci. 74:1023-1035. CHAPTER 1 Literature Review INTRODUCTION Dystocia, or calving difficulty, is defined as an abnormal or difficult birth requiring assistance (Manfredi et al. 1991). The percentages of dams requiring some assistance at calving range from 28.9% in American Gelbvieh (V arona et al. 1999) to 88.4% in Italian Piemontese cattle (Carnier et al. 2000), although these numbers may be further indicative of management differences as well. The percentages of dams requiring Caesarean section at calving range from are 0.4% in Holstein Fresian (Meijering 1984), to 13.7% in Italian Piemontese cattle (Carnier et al. 2000). Calving difficulty, or conversely calving case, is scored subjectively according to its departure from a normal calving, the latter of which results with a healthy dam and a healthy calf without any human intervention (Meijering 1984). These scores are generally defined on an ordinal scale from 1 (unassisted calving) to 4 or 5 (Caesarean section), the score thereby reflecting the amount of required assistance at birth. Injuries or suffocation resulting from difficult or delayed calving cause the death of many calves either at birth or immediately thereafter in spite of the fact that about 80% of all calves lost at birth are anatomically normal (Whitter and Thome 1995). In various beef breeds, calf mortality at or near the time of birth has been shown to be four times greater (P<0.01) in calves experiencing dystocia (20.4%) than in those not experiencing dystocia (5.0%) (Laster and Gregory 1973). In addition to the loss of the calf or dam, calving difficulties negatively impact cattle production through increased veterinary labor costs, subsequent dam health and fertility problems (i.e., increases in the postpartum interval or decreases in overall conception rate) and subsequently decreased dam milk production (Walker et al. 1994; Albera et al. 1999). Anderson (1998) estimated that calving difficulty results in annual losses of 25 million dollars in the state of Nebraska and overall annual losses in the United States are estimated at between 500 million and 750 million dollars (Walker et a1. 1994). Albera et al. (1999) itemized economic losses derived from market prices or supplied by veterinarians and Piemontese extension specialists in the computation of dystocia costs in Italian Piemontese cattle. These losses are highlighted in Table I. Since calving difficulty has both major direct and indirect costs on production, cattle breeders have an interest in genetic analysis of calving difficulty to selectively improve populations for better calving ease. Genetic and environmental factors contribute to the cause and severity of calving difficulty. Figure 1 taken from Burfening (1991) flowcharts the complexity of this trait and serves to point out the numerous interacting factors that affect calving difficulty. As seen from Figure 1, calving case is both a characteristic of the cow and of the calf. Furthermore, a breeding bull may influence calving case either as a sire or as a maternal grandsire. For example, a calfs sire contributes to the calving ease as a characteristic of the calf through the sire's direct genetic effect. This particular genetic effect may be influenced by the birth weight, gestation length and shape of the calf as inherited from its sire. Sire breed differences for calving ease are known to exist. Laster et al. (1973) have shown in a study involving Hereford and Angus cows that calves sired by Charolais, Simmental, Limousin and South Devon bulls experienced significantly more calving difficulty than calves sired by Hereford, Angus and Jersey bulls (P<0.01). A maternal genetic effect, on the other hand, is determined by genes passed on by the sire to his daughter that affect her pelvic measures, uterine environment, hormonal control and other factors that determine her ability to calve easily as a dam. That same sire also passes on genes to his daughter's calf that directly influences its ability to be calved out easily (i.e. direct genetic effects). Pollak and Freeman (1976) and Berger and Freeman (1978) determined that calf sex was an important source of variability for calving difficulty in Holsteins. This is not too surprising since bull calves are generally larger than heifer calves at birth. Laster and Gregory (1973) determined calf losses in various beef breeds to be higher in male calves (22.4%) than in female calves (16.3%) delivered from dams experiencing difficult births. However, they found no difference in calf mortality between sexes when calving was unassisted. Calf birth weight is an important factor affecting calving difficulty. Laster et al. (1973) determined that the regression of calving difficulty on birth weight was highly significant (P<0.005). Birth weight is also genetically correlated to calving difficulty (V arona et al. 1999), which indicates that genetic selection against calving difficulty reduces birth weight. The effect of birth weight on calving difficulty is significant in heifers but as dams become older, the effect of birth weight becomes less important (Herring 1996). Calving difficulty in two-year-old dams (54% Hereford and Angus cows in US. Meat Animal Research Center, MARC and 30% Hereford cows in Colorado State University, CSU) is three to four times more frequent than in three-years-olds (16% in MARC and 11% in CSU), since the younger dams have not yet attained full physical or skeletal maturity (Meijering 1984). Calving difficulty problems are minimal when a dam 10 reaches four to five years old. In an analysis of calving difficulty data collected over 14 years, Berger (1994) determined that fewer in Holstein cows needed assistance for calving in second (18.42%), third (16.97%) and later parities than in first parity (40.40%). Season of calving also impacts calving difficulty. Berger (1994) reported that calving difficulty in Holsteins is more frequent during winter than in summer. Philipsson (1976) explained this by noting that summer is grazing season which enables cows to exercise on the pasture. In contrast, cows are confined and receive less exercise during the winter. Pelvic opening of the dams also represents an important source of variation in the frequency of calving difficulties. Disproportion in size between the fetus and darn appears to be the major cause of dystocia. Thus, there is a critical need for adequate growth in heifers to allow an increase in pelvic area. Both direct and maternal genetic variability appears to exist for calving ease based on studies reported in Table II. This table reports heritability and genetic correlation estimates by study, model and method of variance component estimation. These models and methods are discussed in more detail later. Although reported heritability estimates are highly variable, direct heritability estimates generally concentrate around intermediate values (0.15-0.20) with maternal heritability estimates being generally lower. The estimated genetic correlations between direct and maternal effects are also generally negative, thereby indicating that the ability of a female calf to be easily delivered from her own dam is antagonistic with that same female's ability to subsequently calve easily herself. Since calving ease and body dimensions are inherently related, a calf may be 11 small at birth allowing her own easy delivery but then being small as a dam would have subsequent calving problems. STATISTICAL MODELS FOR GENETIC EVALUATION OF CALVING EASE A primary goal of breeding programs is to maximize genetic gain in animal traits affecting livestock production. Since the greatest selection differential is on sires in most beef cattle populations, particularly where the use of artificial insemination is prominent, accurate sire ranking is important. Achieving accurate sire selection requires inference on genetic parameters such as, for example, heritability, or the proportion of total variability that is genetic, whether for direct or maternal genetic effects. The greater the heritability, the greater the potential for genetic improvement of the character through selective breeding. For continuous production characters like weight gain, estimation of genetic parameters is typically carried out using restricted maximum likelihood (REML) under linear mixed models. In linear mixed models, the influence of fixed effects such as age, breed or sex along with genetic and other random effects are jointly inferred upon for the analysis of normally distributed traits. Predicted breeding values of individual animals using best linear unbiased prediction (BLUP) are useful for seedstock selection in livestock improvement programs (Henderson, 1973). In addition to obvious production characters affecting farm revenue, economic efficiency in animal production can also be enhanced by considering genetic selection for health and reproductive fitness traits such as conformation score, calving ease and ovulation rate (Dematawewa and Berger, 1997). However, unlike production traits, fitness traits such as calving case are often not continuous in their expression but are discrete. Thus, key multivariate normal distributional assumptions invoked under a linear 12 mixed model analysis are violated, particularly when the number of ordinal categories is small; i.e. $5. Gianola (1982) argued against using BLUP to predict breeding values for ordinal data since estimated breeding values and residuals can be shown to be dependent upon each other. Furthermore, several recent studies illustrate that variance component estimation based on linear mixed models consistently yielded biased estimates of heritabilities and correlation coefficients in the analysis of categorical data (Abdel-Azim and Berger 1998; Luo et al. 2001). Threshold models, more commonly known as cumulative link models in the statistics literature, are appropriate for the multifactorial analysis of ordinal categorical traits. Threshold models are probably best understood by the concept that the ordinal trait is influenced by an underlying liability or latent variable binned by various thresholds that impose a discontinuity on the visible expression of the trait (Falconer and MacKay, 1996). In Figure 2, for example, a liability value between the first and second thresholds maps to an observed category of assisted easy calving. The concept of the threshold model dates back to Wright (1934) who inferred upon the variability of the number of toe digits between and within strains of guinea pigs and hypothesized an underlying normal distribution of liability variables with strain-specific means and common variance. Using calving ease data as the application, both Gianola and Foulley (1983) and Harville and Mee (1984) proposed breeding value estimation procedures under a threshold mixed effects model based on the generalized linear model formalized by Nelder and Wedderburn (1972). Unlike the linear mixed model, the threshold mixed model does appropriately account for the multinomial distribution of calving ease data, conditional upon a "linked" function of both fixed and random effects as defined 13 previously. The typical link function used for threshold mixed models in animal breeding is the cumulative probit function. The probit function is simply the inverse function of the standard normal cumulative density function (cdf). In a cumulative probit threshold model for the analysis of data with C ordinal categories, the cumulative probability, Prob(J S j), up to a certain ordinal score j, 135C, on a subject is written as standard normal cdf of a linear function of the (i-I)th threshold parameter and the fixed and random effects associated with that subject. Using Figure 2, for example, the probability of a dam experiencing an easy unassisted calving or easier calving is modeled as the cdf of the first threshold plus any fixed or random effects that influence the magnitude of the liability variable. Gianola and Foulley (1983) developed scoring equations for the threshold mixed model under the Bayesian paradigm. These equations maximize the joint posterior density of the fixed and random effects, conditional upon known or estimated variance components. The corresponding joint modes are then reported as point estimates with standard errors based on the information matrix of this joint posterior density. These joint modal or maximum a posteriori (MAP) estimates are currently used for genetic evaluations on calving ease (Berger 1994; Wang et al. 1997). Genetic evaluation models are of two broad types. The first model is the sire or sire and maternal grandsire model where calf records on, say, calving case are connected directly to sires with genetic relationships identified only through known paternal ancestry. The second is the animal model where each record is connected directly to each calf identification with all known genetic relationships explicitly modeled. Whereas animal models are now predominantly used for genetic evaluation of production traits under a linear mixed mode], sire or sire and maternal grandsire models are most ofien 14 used for threshold mixed model analysis of calving ease data. If the predominant genetic relationship is that due to paternal halfsibs, there should be intuitively very little difference in sire genetic evaluations between those computed under a sire versus an animal model. Indeed, these models can be shown to be equivalent under a linear mixed model framework if the only genetic relationships are paternal half sibs on calves making the records. However, threshold animal models have been shown to lead to MAP estimates of breeding values that are much more biased than those determined under threshold sire models (Mayer, 1995), since the approximation invoked for MAP breaks down substantially when the number of records per individual in a model is small. Such a number on average is smaller in animal models than in sire models since there are generally many more unique calf identifications than unique sire identifications in a set of data. BAYESIAN INFERENCE Likelihood-based methods have been generally used to infer upon genetic parameters in animal breeding and genetics. However, recent theoretical and computational developments in Bayesian inference, such as Markov Chain Monte Carlo methods, have inspired researchers to increasingly use Bayesian methods in animal breeding and genetic applications. In Bayesian inference, as with likelihood inference, the joint density of the data y is characterized by a probability distribution p(y|0) . The quantity0 denotes the vector of unknown parameters, including those that an investigator may wish to infer upon. A Bayesian model essentially consists of two parts. The first is the sampling distribution or likelihood function p(y|0) which is the information provided by y on 0. The second part 15 is a prior distribution p(0). In the Bayesian approach, researchers may have some prior knowledge about 0 which could be incorporated in the analysis using their specification of p(0). This feature creates one main distinction between Bayesian and frequentist or likelihood approaches. Afier combining the prior information p(0) with the available information from data p(yl0) , inference onO is based on the posterior distribution: p(OIy)=p—(¥"%I;-@. where p(y>= ip oo , the t distribution approaches the lighter-tailed normal distribution. Having a thicker tail, the t distribution can provide robustness against unusual or outlying observations when used to model the density of the residual terms. Lange et al. (1989) considered the t-distribution as a useful extension of the normal distribution for statistical modeling of datasets having outliers. A maximum likelihood approach was used for parameter estimation including the degrees of freedom parameter. Numerous datasets were analyzed based on the maximum likelihood strategy for a general t-distributed error model. Generally, the t model was found to fit much better than a normal model based on the likelihood ratio test. The asymptotic standard errors for the location parameter estimates under the t model also were generally smaller compared to the normal models. Reasonable estimates of degrees of freedom were obtained for all datasets, except for a radioimmunoassay dataset. In a nonlinear regression 21 analysis of radioimmunoassay dataset, the maximum likelihood estimate of the degrees of freedom parameter was 0.29, which was concluded to be not very satisfactory. Lange et al. (1989) concluded that the low estimate stemmed from attempting to accommodate one very extreme outlier. When that outlier was removed, the resulting estimate increased to 1.2. Fraser (1979) suggested that maximum likelihood estimation of degrees of freedom is not advisable when degrees of freedom estimation goes much below 1 suggesting that the t-model is not well suited to data with extreme outliers. Fraser (1979) further suggested that in problems with small sample sizes, researchers might fix the degrees of freedom at some predetermined value, such as 4, rather than attempt to estimate the degrees of freedom simultaneously with other parameters. Based on the results from different analysis, they concluded that outliers and robustness can be handled by the maximum likelihood estimator for all the parameters in the t-distributed error model using the EM algorithm. Lange and Sinsheimer (1993) extended the work by Lange et al. (1989) further by considering alternative heavy-tailed distributions to the Student t. Each of the models considered in their paper, including the logistic, slash, t and contaminated normal distributions, were shown to be derived as scale mixtures of normals. More specifically, given that the residual terms e,- are specified conditionally as e, ~ N (0,073) , i=1,2,. . .,n, alternative heavy-tailed residual densities can be marginally specified by alternative distributions on A). For example, a Student 1 distribution is represented as a scale mixture of normals with Gamma distributed scaling factors xi, ~ GammaGE, %) , v being the degrees of freedom (Lange et al. 1989). Alternatively, the slash distribution is 22 characterized by scaling factors having the density p(zi, I v) = WI,"—l , , i =1 ,2,. . . .,n . Under a contaminated normal error model, each residual is drawn from a mixture of two populations based on Prob(2., = 2. | ¢, y) = 7 if 2. = (15 or 1— y, if x1 =1 . In a logistic error model, the probability is transformed to odds to remove the upper bound and then taking the logarithm of odds also removes the lower bound for the linear function of the explanatory variables. Lange and Sinsheimer (1993) chose the following logistic model u, =0,/ (1+e92 «3.1.1., ”°‘["“"* ”2 ) for their analysis. Using maximum likelihood in large datasets, the authors found that the t, slash and contaminated methods showed great promise in muting the effects of outliers. The slash and t-models had particularly superior estimation properties while the contaminated and logistic model were less satisfactory. However, in the analysis of radioimmunoassay data, Lange et al. (1989) found potential problems with the t and slash error models and suggested that the best alternative with a data set of this nature may be to discard the outliers and simply use the normal error models. Geweke (1993) considered methods for Bayesian inference in econometric models in which errors were independent and identically t—distributed using the Gibbs sampler. Posterior odds ratios were calculated for t-error models having different degrees of freedom versus a normal error model. Geweke applied these models to fourteen macroeconomic time series datasets and found that the posterior odds ratio always heavily favored the independent Student-t error model over the normal error model. In animal breeding and genetics, it has been noted that preferential treatment commonly influences the records of high producing or economically valuable animals and consequently, their predicted breeding values (Stranden 1996). Stranden and Gianola 23 (1998) simulated the effect of preferential treatment of cows for milk production in four herds of a multiple ovulation and embryo transfer scheme under selection. Three mixed effect linear models were compared in terms of their ability to handle preferential treatment: the classical Gaussian model, a model with multivariate t-distributed error clustered by herd, and a model with independent t-distributed error. The posterior distributions of all parameters were obtained using the Gibbs sampler. The three models were found to have similar performance in the absence of preferential treatment. Conversely, when the preferential treatment was prevalent and substantial in effect, the univariate t-model lead to substantially less biased inference on breeding values and genetic trends than those obtained with the Gaussian model. Stranden and Gianola (1999) also presented a Bayesian approach for inferences about parameters of mixed effects linear models with t-distributed error effects in quantitative genetic application. Data was generated based on a preferential treatment process. The univariate t-model for the errors led to better estimates of additive genetic and error variances than either a herd-clustered t-model or a Gaussian sampling process. Using a univariate t-model, posterior distributions of breeding values were found to be sharper and the posterior mean estimates of heritabilities were closer to the true values compared to estimates derived from the other two models. Rosa (1999) considered the application of robust mixed linear models based on (- distribution, slash and contaminated normal error distributions to pup birth weight from reproduction toxicology study. Marginal posterior densities of degrees of freedom, for the t and slash error distributions, were determined to be concentrated about small values, 24 suggesting the inadequacy of the Gaussian distribution. Rosa (1999) also computed Bayes factors to verify the better fit of the long-tailed distributions to this data set. In statistical mapping of quantitative trait loci (QTL), a common assumption is normally distributed phenotypic observations; any deviation from this assumption can affect power and robustness of QTL detection. Von Rohr and Hoeschele (2002) demonstrated the application of a skewed Student-t sampling model under four different error distributions and determined that residual, additive QTL and dominance QTL variance estimates were much closer to the true value when the analysis was performed with the skewed Student-t model rather than with normal model. Replacement of the normal by a skewed Student-t penetrance function also clearly improved the accuracy of parameter estimation. Thus, their results indicated substantial benefits of heavy-tailed skewed error models for QTL mapping. Albert and Chib (1993) considered a cumulative t-link function, rather than a cumulative probit link function for the analysis of ordinal categorical data. This specification is equivalent to specifying the underlying latent variables in a threshold model to be t-distributed rather than normally distributed. Gianola and Sorensen (1996) extended Albert and Chib’s (1993) work to describe a hierarchical Bayes model with correlated random effects suitable for quantitative genetic analysis of categorical data. Albert and Chib (1993) also presented Gibbs sampling schemes for models where the latent distribution was either univariate or multivariate t, and also considered Bayes factors for contrasting different models as well as for identification of outliers. 25 HETEROSKEDASTIC MIXED EFFECTS MODELS An important assumption in most genetic evaluation models is that variance components associated with random effects are constant across all environmental conditions. However, the existence of heterogeneous variance, or heteroskedasticity, for milk production (Hill et al. 1983; lbanez et al. 1999), growth performance (Garrick et al. 1989), and conformation (Robert-Granie et al. 1997) has been firmly established. More recently, work on estimating heteroskedasticity on the underlying scale in threshold models have been investigated in animal breeding (Foulley and Gianola 1996; Jaffrezic et al. 1999; Ducrocq 2000). A number of possible reasons for heteroskedasticity have been suggested, including a positive relationship between herd means and variance, spatial gradients in residual variability across geographical regions, and temporal gradients in residual variability within herds due to changes in herd management. If heteroskedasticy is not properly taken into account, differences in within-subclass variances can result in biased breeding value predictions and disproportionate numbers of animals selected from environments characterized by high variability, thereby reducing genetic progress (Hill 1984; Weigel and Gianola 1992). Alternative methods have been proposed to take into account heterogeneity of variance in quantitative genetics and animal breeding. For continuous production data, a logarithmic transformation can be used to alleviate the heterogeneous variance problem if the variances are a simple function of the mean of trait. Scaling of observations by the estimated standard deviations was proposed by Hill (1984) as another method of accounting for heterogeneous variance. 26 Heterogeneity of genetic and residual variances can be accounted for in genetic evaluation using BLUP, provided that these variance components are known (Gianola 1986). However, variance components are rarely known and need to be estimated. Since most preferred methods of variance components estimation, including ML and REML, rely on asymptotic properties, such procedures may not yield reliable estimates if variance components are estimated separately for each of many small subclasses (Weigel and Gianola 1992). This was demonstrated by Winkelman and Schaeffer (1988) who used REML and ANOVA to estimate within-herd genetic and residual variances. Gianola et al. (1992) presented a Bayesian approach to deal with heterogeneity of residual variances with respect to some criterion of classification (strata) of the data. In their model, variance components (or variance covariance matrices) were assumed to have prior inverted chi-square (or inverted Wishart) distribution. The resulting empirical Bayes estimates of within-herd residual variances can be essentially shown to be weighted averages of the residual variance using data from the herd (i.e. REML) and the average across-herd residual variance. Gianola et al. (1992) indicated that when the amount of information in a particular herd stratum is large, the REML part of the estimator dominates; otherwise, the average across herd estimate dominates. Foulley et al. (1992) proposed a structural linear model for assessing the source of heterogeneity of residual variances in Gaussian mixed linear models. Their procedures are based on the concept of a log link function, 7,. = ln 0: = mpt. , for variance components where a": is the residual variance in subclass i, and m, is a known incidence matrix for the unknown vector A of dispersion parameters. The marginal likelihood (marginal posterior density of A) is maximized with respect to A. to determine point 27 estimates. Hypothesis testing of any of the dispersion parameters in A are then based on a marginal likelihood ratio test against subset models. The estimation algorithms in Foulley et al. (1992) showed remarkable similarities to the mixed model equations of Henderson. San Cristobal et al. (1993) extended the method of Foulley et al. (1992) by identifying potential multifactorial sources of heterogeneity of residual and genetic variances in mixed linear Gaussian models. They developed the model based on a structural linear model for log residual and genetic variances as a function of fixed and random effects, just as is done for location parameters in a regular linear mixed effects model. Foulley and Gianola (1996) adopted the structural linear model and log link function for heterogeneity of residual variance of underyling liabilites in the cumulative probit threshold model. Approximate statistical inference and hypothesis testing on all dispersion parameters and goodness of fit was again based on a marginal likelihood test. Foulley and Gianola (1996) applied their method to calving ease scores from the US Simmental breed and concluded that the heteroskedastic threshold model had a better fit to the data than the standard threshold model. J affrezic et al. (1999) also used a structural linear model on log-variance to infer upon heterogeneity of residual variances in mixed threshold model. For parameter estimation they presented an approximate quasi-score approach including two steps: i) a marginalization with respect to the random effects leading to quasi-score estimators, ii) an approximation of variance-covariance matrix of the observations. 28 Ducrocq (2000) analyzed the calving ease data of French dairy breeds (Normande and Montbeliarde) using the heteroskedastic threshold model as proposed by Foulley and Gianola (1996). He first selected a satisfactory model to describe the effects of environmental factors on the location and dispersion parameters of the underlying liability variables. Four random effects were then added: sire of calf, sire of dam, dam (within sire of dam) and herd-year-season, assuming homogeneous ratios of variance components across environments. Accounting for heterogeneity of the residual variance in the analysis significantly improved all model fit criteria (likelihood ratio, AIC and chi- square statistics). He determined that direct and maternal heritability estimates were similar for both breeds but were substantially lower than those obtained by Manfredi (1990). Direct heritabilities were 5.4% in both breeds, while maternal heritabilities were very low (around 3%). The specific objectives for this dissertation were to 1) Compare variance component, genetic parameter and breeding value estimates based on the two marginal maximum likelihood procedures (EM and Laplace), and MCMC using a threshold sire-matemal grandsire model in order to assess the relative need for MCMC methods in a national genetic evaluation system for calving ease. 2) Assess the potential utility of outlier-robust threshold models in simulated and actual calving ease data. 29 3) Develop and apply a Bayesian structural multiplicative model on residual variances for observed and augmented variables in heteroskedastic generalized linear mixed models. 30 REFERENCES Abdel-Azim, G. A.; Berger, P. J ., 1999: Properties of threshold model predictions. J. Anim. Sci. 77: 582-590. Albera, A.; Carnier, P.; Groen, A. F ., 1999: Breeding for improved calving performance in Piemontese cattle economic value. Proceedings international workshop on EU concerted action genetic improvement of functional traits in cattle (GIFT); breeding goals and selection schemes. Bulletin no:23. Wageningen, The Netherlands 7-9th November, 1999. Albert, J. H.; Chib, S., 1993: Bayesian analysis of binary and polychotomous response data. J. Am. Stat. Assoc. 88: 669-679. Anderson, R, 1998: Minimizing calving difficulty in beef cattle. University of Minnesota extension service. hm;://www.mes.umn.edu/Documents/D/I/DI5778.htrnl. Bennett, G. L.; Gregory, K. E., 2001: Genetic (co)variances for calving difficulty score in composite and parental populations of beef cattle: I. Calving difficulty score, birth weight, weaning weight, and postweaning gain. J. Anim. Sci. 79: 45-51. Berger, P. J ., 1994: Genetic prediction for calving ease in the United States: Data, models, and use by the dairy industry. J. Dairy Sci. 77: 1146-1153. Berger, P. J.; Freeman, A. E., 1978: Prediction of sire merit for calving difficulty. J. Dairy Sci. 61: 1146-1152. Bink, M. C. A. M.; Quaas, R. L.; Van Arendonk, J. A. M., 1998: Bayesian estimation of dispersion parameters with a reduced animal model including polygenic and QTL effects. Genet. Sel. Evol. 30: 103-125. Burfening, P. J ., 1991: Factors affecting calving difficulty and implications to breeding and management programs. Proceedings of the international beef symposium. January 15-17, 1991. Montana. Carnier, P.; Albera, A.; Dal Zotto, R.; Groen, A. F.; Bona, M.; Bittante, G., 2000: Genetic parameters for direct and maternal calving performance over parities in Piemontese cattle. J. Anim. Sci. 78: 2532-2539. Dematawewa, C. M. B.; Berger, P. J., 1997: Effect of dystocia on yield, fertility, and cow losses and an economical evaluation of dystocia sources for Holsteins. J. Dairy Sci. 80: 754-761. Dong, M. C.; Quaas, R. L.; Pollak, E. J ., 1991: Estimation of genetic parameters of calving ease and birth weight by a threshold model. J. Anim. Sci. 69 (Suppl. 1): 204 (Abstr.). 31 Ducrocq, V., 2000: Calving ease evaluation of French dairy bulls with a heteroskedastic threshold model with direct and maternal effects. Genetic evaluations for conformation and other functional traits: 123. Bulletin no: 25. Proceedings of the 2000 interbull meeting. Bled, Slovenia, May 14-15, 2000. Falconer, D. S.; Mackay, T. F. C., 1996: Introduction to quantitative genetics, 4th edn. Harlow, UK: Longman. F oulley, J. L.; Irn, S.; Gianola, D.; Hoeschele, 1., 1987: Emprical Bayes estimation of parameter for n polygenic traits. Genet. Sel. Evol. 19: 197:224. Foulley, J. L.; San Cristobal, M.; Gianola, D.; Irn, S., 1992: Marginal likelihood and Bayesian approaches to the analysis of heterogeneous residual variances in mixed linear Gaussian models. Comp. Stat. Data Anal. 13 (1992) 291-305. Foulley, J. L.; Gianola, D., 1996: Statistical analysis of ordered categorical data via a structural heteroskedastic threshold model. Genet. Sel. Evol. 28: 249-273. Fraser, D. A. S., 1979: Inference and linear models, New York: McGraw-Hill. Garrick, D. J .; Pollak, E. J .; Quaas, R. L.; Van Vleck, L. D., 1989: Variance heterogeneity in direct and maternal weight traits by sex and percent purebred for Simmental-sired calves. J Anim. Sci. 67: 2515-2528. Gelfand, A. E.; Smith, A. F. M.; Lee, T. M., 1992: Bayesian analysis of constrained parameter and truncated data problems using Gibbs sampling. J. Am. Stat. Assoc. 87: 523-532. Gelman, A.; Rubin, D., 1992: Inference from iterative simulation using multiple sequences. Stat. Sci. 15: 201-224. Gelman, A.; Carlin, J. B.; Stern, H. S.; Rubin, D. B., 1995: Bayesian data analysis, lSt edn., Chapman & Hall, London. Geman, S.; Geman, D., 1984: Stochastic relaxation, Gibbs distribution and Bayesian restoration of images. IEE Transactions on Pattern Analysis and Machine Intelligence 6: 721-741. Geyer, C. J ., 1992: Practical Markov chain Monte-Carlo (with discussion). Stat. Sci. 7: 467-511. Geweke, J., 1993: Bayesian treatment of the independent Student-t linear model, J. Appl. Econometrics 8: 19-40. Gianola, D., 1982: Theory and analysis of threshold characters. J. Anim. Sci. 54: 1079- 1096. 32 Gianola, D.; Foulley, J. L.. 1983: Sire evaluation for ordered categorical data with a threshold model. Genet. Sel. Evol. 15 :201-224. Gianola, D., 1986: On selection criteria and estimation of parameters when the variance is heterogeneous. Theor. Appl. Genet. 72: 671-677. Gianola, D.; Foulley, J. L.; Fernando, R. L.; Henderson, C. R.; Weigel, K. A., 1992: Estimation of heterogeneous variance using empirical bayes methods: Theoretical considerations. J. Dairy Sci. 75 : 2805-2823. Gianola, D.; Sorensen, D., 1996: A mixed effects threshold model with a t distribution. 47th Annual Meeting of the European Association for Animal Production, Lillehammer, Norway, 15p. Graser, H. U.; Smith, S. P.; Tier, B., 1987: A derivative-free approach for estimating variance components in animal models by restricted maximum likelihood. J. Anim. Sci. 64:1362-1370. Harville, D. A.; Mee, R. W., 1984: A mixed-model procedure for analyzing ordered categorical data. Biometrics 40: 393-408. Henderson, C. R., 1973: Sire evaluation and genetic trends. In: Proc. Anim. Breeding Genet. Symp. In Honor of Dr. Jay L. Lush. Pp 10-41. Am. Soc. Anim. Sci. Charnpaign, IL. Herring, W.O., 1996: Calving difficulty in beef cattle. Agricultural publication G2035. Published by University extension, University of Missouri-Columbia. Hill , W. G.; Edwards, M. R.; Ahmed, M. K. A.; Thompson, R., 1983: Heritability of milk yield and composition at different levels and variability of production. Anim. Prod. 36: 59-68. Hill W. G., 1984: On selection among group with heterogeneous variance. Anim. Prod. 39: 473-477. Hoeschele, 1.; Gianola, D.; Foulley, J. L., 1987: Estimation of variance components with quasi-continuous data using Bayesian methods. J. Anim. Breed. Genet. 104: 334- 349. Hoeschele, I.; Gianola, D., 1989: Bayesian versus Maximum Quasi Likelihood methods for sire evaluation with categorical data. J. Dairty Sci. 72: 1569-1577. Hoeschele, 1.; Tier, B., 1995: Estimation of variance components of threshold characters by marginal posterior mode and means via Gibbs sampling. Genet. Sel. Evol. 27: 519-540. 33 lbanez, M. A.; Carabano, M. J .; Alenda, R., 1999: Identification of sources of heterogeneous residual and genetic variances in milk yield data from the Spanish Holstein-Friesian population and impact on genetic evaluation. Livest. Prod. Sci. 5 9: 33-49. Jaffrezic, F.; Robert-Granie, C.; Foulley, J. L., 1999: A quasi-score approach to the analysis of ordered categorical data via a mixed heteroskedastic threshold model. Genet. Sel. Evol. 31: 301-318. Jensen, J .; Wang, C. S.; Sorensen, D. A.; Gianola, D., 1994: Bayesian inference on variance and covariance components for traits influenced by maternal and direct genetic effects, using the Gibbs sampler. Acta Agric. Scand. 44: 193-201. Lange, K. L.; Little, R. J. A.; Taylor, J. M. G., 1989: Robust statistical modeling using the t distribution. J. Am. Stat Assoc. 84:881-896. Lange, K. L.; Sinsheimer, J. S., 1993: Normal/independent distributions and their applications in robust regression. J. Am. Stat. Assoc. 84 :881-896. Laster, D. B.; Glimp, H. A.; Cundiff, L. V; Gregory, K. E., 1973: Factors affecting dystocia and the effects of dystocia on subsequent reproduction in beef cattle. J. Anim. Sci. 36: 695-705. Laster, D. 3.; Gregory, K. E., 1973: Factors influencing peri- and early potrratal calf mortality. J. Anim. Sci. 37: 1092-1097. Luo, M. F .; Boettcher, P. J .; Dekkers, J. C. M.; Schaeffer, L. R., 1999: Bayesian analysis for estimation of genetic parameters of calving ease and stillbirth for Canadian Holsteins. J. Dairy Sci. 82: 1848. Luo, M. F.; Boettcher, P. J.; Schaeffer, L. R.; Dekkers, J. C. M., 2001: Bayesian inference for categorical traits with an application to variance components estimation. J. Dairy Sci. 84 : 694-704. Manfredi, E. J .; Ducrocq, V.; Foulley, J. L., 1991a: Genetic analysis of dystocia in dairy cattle. J. Dairy Sci.. 74: 1715-1723. Manfredi, E. J .; San Cristobal, M.; Foulley, J. L., 1991b: Some factor affecting the estimation of genetic parameters for cattle dystocia under a threshold model. Anim. Prod. 53: 151-156. Matos, C. A. P.; Thomas, D. L.; Young, L. D.; Gianola, D., 1994: Analysis of lamb survival using linear and threshold models with maternal effects. 5th World Congress of Genetics Applied to Livestock Production, Ontario, Canada. Vol. 18: 426-429. 34 Matos, C. A. P.; Thomas, D. L.; Gianola, D.; Tempelman, R. J.; Young, L. D., 1997: Genetic analysis of discrete reproductive traits in sheep using linear and nonlinear models. 1. Estimation of genetic parameters. J. Anim. Sci. 75 : 76-87. McGuirk, B. J .; Going, I; Gilmour, A. R., 1998: The genetic evaluation of beef sires used for crossing with dairy cows in the UK. 2. Genetic parameters and sire merit predictions for calving survey trits. Animal Sci. 66: 47-54. McGuirk, B. J .; Going, I; Gilmour, A. R., 1999: The genetic evaluation of UK Holstein F riesian sires for calving ease and related traits. Animal Sci. 68: 413-422. Meijering, A., 1984: Dystocia ans stillbirth in cattle - a review of causes, relations and implications. Livest. Prod. Sci., 11: 143-177. Metropolis, N.; Rosenbluth, A. W.; Rosenbluth, M. N.; Teller, A.; Teller, H., 1953: Equations of state calculations by fast computing machines. J. Chemical Physics 21: 1087-1091. Olesen, I.; Perez-Enciso, M.; Gianola, D.; Thomas, D. L., 1994: A comparison of normal and nonnormal mixed models for number of lambs born in Norwegian sheep. J. Anim. Sci. 72: 1166-1173. Philipsson, J ., 1976: Studying calving difficulty, stillbirth, and associated factors in Swedish cattle breeds. V. Effects of calving performance and stillbirth in Swedish Fresian heifers on productivity in the subsequent lactation. Acta. Agric. Scand. 26: 230-260. Pollak, E. J.; Freeman, A. E., 1976: Parameter estimation and sire evaluation for dystocia and calf size in Holsteins. J. Dairy Sci. 59: 1817-1825. Rarnirez-Valverde, R.; Misztal, 1.; Bertrand, J. K., 2001: Comparison of threshold vs linear and animal vs sire models for predicting direct and maternal genetic effects on calving difficulty in beef cattle. J. Anim. Sci. 79: 333-338. Rogers, W. H.; Tukey, J. W., 1972: Understanding some long tailed distributions. Statistica Neerlandia 26 : 21 1-226. Robert-Granie, c.; Ducrocq, V.; F oulley, J. L., 1997: Heterogeneity of variance for type traits in the Montbeliarde cattle breed. Genet. Sel. Evol. 29: 545-570. Rosa, G. J. M.; 1999: Robust mixed linear models in quantitative genetics: Bayesian analysis via Gibbs sampling, in: Proceedings of International Symposium on Animal Breeding and Genetics, 21-24 September 1999, Brazil, pp. 133-159. 35 San Cristobal, M.; Foulley, J. L.; Manfredi, E., 1993: Inference about multiplicative heteroskedastic components of variance in a mixed linear Gaussian model with an application to beef cattle breeding. Genet. Sel. Evol. 25: 3-30. Simianer, H.; Schaeffer, L. R., 1989: Estimation of covariance components between one continuous and one binary trait. Genet. Sel. Evol. 21 : 303-315. Sorensen, D. A.; Andersen, S.; Gianola, D.; Korsgaard, I., 1995: Bayesian inference in threshold models using Gibbs sampling. Genet. Sel. Evol. 27: 229-249. Stiratelli, R.; Laird, N.; Ware, J. H., 1984: Random-effects models for serial observations with binary response. Biometrics. 40: 961-971. Stranden, I. J ., 1996: Robust mixed effects linear models with t-distributions and application to dairy cattle breeding. PhD thesis, University of Wisconsin-Madison. Stranden, 1.; Gianola, D.; 1998: Attenuating effects of preferential treatment with Student-t mixed linear models: a simulation study, Genet. Sel. Evol. 30: 565-583. Stranden, 1.; Gianola, D.; 1999: Mixed effects linear models with t-distributions for quantitative genetic analysis: a Bayesian approach, Genet. Sel. Evol. 31 : 25-42. Tanner, M. A.; Wong, W. H., 1987: The calculation of posterior distributions by data augmentation (with discussion). J. Am. Stat. Assoc. 82: 528-550. Tempelman, R, J .; Gianola D., 1993: Marginal maximum likelihood estimation of variance components in Poisson mixed models using Laplacian integration. Genet. Sel. Evol. 25 : 305-319. Tempelman, R. J ., 1998: Generalized linear mixed models in dairy breeding. J. Dairy Sci. 81: 1428-1444. Van Tassell, C. P.; Van Vleck L. D.; Gregory, K. E., 1998: Bayesian analysis of twinning and ovulation rates using a multiple-trait threshold model and Gibbs sampling. J. Anim. Sci. 76: 2048-2061. Varona L.; Misztal, I.; Bertrand, J. K., 1999: Threshold-linear versus linear-linear analysis of birth weight and calving ease using an animal model: I. Variance component estimation. J. Anim. Sci. 77: 1994-2002. Von Rohr, P.; Hoeschele, I., 2002: Bayesian QTL mapping using skewed Student-t distribution. Genet. Sel. Evol. 34: 1-21. Walker, D.; Ritchie, H.; Hawkis, D.; Gibson, C., 1994: Pelvic measurement and calving difficulty in beef cattle. MSU extension beef bulletins. 23300001. 36 Walsh, B., 2000: Markov Chain Monte Carlo and Gibbs sampling. Lecture notes for EEB 5962. Wang, C. S.; Rutledge, J. J.; Gianola, D., 1993: Marginal inferences about variance components in a mixed linear model using Gibbs sampling. Genet. Sel. Evol. 25 : 41-62. Wang, C. S.; Rutledge, J. J .; Gianola, D., 1994a: Bayesian analysis of mixed linear models vi Gibbs sampling with an application to litter size in Iberian pigs. Genet. Sel. Evol. 26: 91-115. Wang, C. S.; Gianola, D.; Sorensen, D.; Jensen, J .; Christensen, A.; Rutledge, J. J., 1994b: Response to selection for litter size in Danish landrace pigs-A Bayesian analysis. Theor. And Appl. Genet. 88: 220-230. Wang, C. S.; Quaas, R. L.; Pollak, E. J ., 1997: Bayesian analysis of calving ease score and birth weights. Genet. Sel. Evol. 29: 117-143. Weigel K. A.; Gianola, D., 1992: Estimation of heterogeneous within-herd variance components using empirical bayes method: A simulation study. J. Dairy Sci. 75 : 2824-2833. Weller, J. I.; Misztal, I; Gianola, D., 1988: Genetic analysis of dystocia and calf mortality in Israeli-Holsteins by threshold and linear models. J. Dairy Sci. 71: 2491-2501. Whittier, J .C .; Thorne, J .G., 1995: Assisting the beef cow at calving time. Agricultural publication G2007. Published by University extension, University of Missouri- Columbia. Winkelman, A.; Schaeffer, L. R., 1988: Effect of heterogeneity of variance in dairy sire evaluation. J. Dairy Sci. 71: 3033-3041. Wright, S., 1934: An analysis of variability in number of digits in an inbred strain of guinea pigs. Genetics 19: 506-5 18. Zeger, S. L.; Karim, M. R., 1991: Generalized linear models with random effects-A Gibbs sampling approach. J. Am. Stat. Assoc. 86: 79-86. 37 Table I. Itemized dystocia costs (Albera et al. 1999). Parameter Dollars' Cost of caesarean section 99.99 Labor of the farmer per hour 4.58 Price of a newborn male calf 545.46 Price of a newborn female calf 409.12 Cost of involuntary culling after first calving 306.40 Cost of involuntary culling after other calvings 483.67 'Based on lEuro = 8 0.8802. 38 - ”do ".8 55685 - Rd 85 =8 Bx - 35 N3 .8me.. - m3 8 .o 582:: - and $3 2820: - 93 mg 5328 - mm... 8.9 32820 - :3 35 53.5% - E .o and 8?... 32m: 38: 28s romeo .23 .358 36. 86 36 098882 msoocowESo: o _ .9 8.: 3° 83:38: 425. 2285 88s 8225 3.3%.? 8.385 8.32 .o 328.55 4.23. 325 883 a a .2an 2.? 2:. 3° 52%: 0202 325 $8: a s a: 839...? 8.32 .o 3.335 £23.00 2222 22.85. 88: .... a 22.; 8.32.? 8.386 8.32 .o 5328 0202 3:: 88: a .0 «83> - - 8.32 .c 52%: 42mm 228$ 88: .3 .0 U359: - - 8.388 52%: 42mm 8:: 38: a a 63.622 - - 8de _ .o £2583 - - modfic _ .o 2820: - - 8.338 .355 32mm 2285 $8: ...... .0 V3.522 - - 5.336 5395.— - - 8.336 289.0: - - 8.3_ 3 22820 32.“: has: 38: .... a 63.622 e _ .o- 2 .c a _ .o 55555 - 228:: :8: .a .... 9.8 c _ .o- So 8... 5320: m3 _ 3 3° 85:52 .282: 223:: :8: .... a 6&5: - - 3o 5220: 252:2 5:: 53: SEE: a. EN; ...: 32m 8502 382 Sam .808 030 wag—no Lou 6528.6 80:3 Echo Havana H 86:58 A35 833:8 88% 3838388 an .35 banana; 3832. .35 £385: 525 .: as; 38 Maternal Components (Dam Effects)1 I Paternal Components (Sire Effects) —DAM’S GENES ‘l f l SIRE’S GENES Size & Muscling Fetal Effects CALF’S GENES Direct Effects Matern al Effects V V Maternal Dam’s Pelvic L——> CALF’S SIZE Preparation Size Birth weight, shape, sex Maternal Effects | i , + CALVING DIFFICULTIES — DYSTOCIA f ‘ + Still Births Impaired Fertility Loss of dams Labor costs Reduced Milk Production Figure l. A flowchart interaction of direct additive and maternal genetic effects with other factors in terms of their effect on calving difficulty (Burfening 1991). 40 Threshold 2 Threshold 1 Threshold 3 Assisted Assisted C‘ d Unassisted easy difiicult E118???“ an calving calving calving ‘1‘ 0 011W '3 ’3 " 1 0 1 z 3 Llablhty Figure 2. Mapping of underlying liabilities to observed calving ease phenotypes. Thresholds 1, 2, and 3 delimit normally distributed underlying liabilities into 4 calving ease phenotypes (unassisted calving, assisted easy calving, assisted difficult calving, and Caesarean and embryotomy). 41 CHAPTERZ Bayesian Inference Strategies for the Prediction of Genetic Merit Using Threshold Models with an Application to Calving Ease Scores in Italian Piemontese Cattle ABSTRACT First parity calving difficulty scores from Italian Piemontese cattle were analyzed using a threshold mixed effects model. The model included the fixed effects of age of dam and sex of calf and their interaction and the random effects of sire, maternal grandsire, and herd-year-season. Covariances between sire and maternal grandsire effects were modeled using a numerator relationship matrix based on male ancestors. Field data consisted of 23,953 records collected between 1989 and 1998 from 4,741 herd- year-seasons. Variance and covariance components were estimated using two alternative approximate marginal maximum likelihood (ML) methods, one based on expectation- maximization (EM) and the other based on Laplacian integration. Inferences were compared to those based on three separate runs or sequences of Markov Chain Monte Carlo (MCMC) sampling in order to assess the validity of approximate MML estimates derived from data with similar size and design structure. Point estimates of direct heritability were 0.24, 0.25 and 0.26 for EM, Laplacian and MCMC (posterior mean), respectively whereas corresponding maternal heritability estimates were 0.10, 0.11 and 0.12, respectively. The covariance between additive direct and maternal effects was found to be not different from zero based on MCMC-derived confidence sets. The conventional joint modal estimates of sire effects and associated standard errors based on MML estimates of variance and covariance components differed little from the respective 42 posterior means and standard deviations derived from MCMC. Therefore, there may be little need to pursue computation—intensive MCMC methods for inference on genetic parameters and genetic merits using conventional threshold sire and maternal grandsire models for large datasets on calving ease. INTRODUCTION Calving ease has a significant economic impact on beef and dairy production through increased risk to the survival of both calf and cow (McGuirk et al. 1998). The Italian Piemontese breed has a particularly high percentage of difficult calvings (Carnier et al. 2000) relative to reported estimates from other breeds (Varona et al. 1999; Bennett and Gregory 2001 ). Accurate inference on genetic parameters and genetic merit are important precursors for effective sire selection strategies to improve calving ease. The threshold mixed model developed by Gianola and Foulley (1983) is one example of generalized linear mixed models (Breslow and Clayton 1993) which have become increasingly popular for the multifactorial mixed effects analyses of non-normal phenotypes. There have been a number of recent studies illustrating advantages of threshold mixed models over linear mixed models for the analysis of ordinal calving ease data (Luo et al. 2001; Rameriz-Valverde et al. 2001). Empirical Bayes or maximum a posteriori (MAP) estimates of breeding values are based upon a joint maximization of the posterior density of fixed and random effects conditioned upon variance components being known, or set equal to their estimates (Foulley et al. 1987). These MAP estimates are currently used for published genetic evaluations on calving ease (Berger 1994; Wang et al. 1997). Variance components and 43 their derivative genetic parameters in a threshold mixed model have been historically estimated using approximate and deterministic marginal maximum likelihood (MML) techniques. One such technique is based upon an approximate invocation of the expectation-maximization (EM) algorithm as proposed by Harville and Mee (1984) and Stiratelli et al. (1984). Some significant biases on heritability estimates using this algorithm were reported in simulation studies conducted by Hoeschele et al. (1987), Hoeschele and Gianola (1989) and Simianer and Schaeffer (1989), particularly when contemporary subclass sizes were small and genetic parameters were large in magnitude. A second approximate MML algorithm is based on Laplace's method and was first adapted to animal breeding by Tempelman and Gianola (1993). In a sire model simulation study, Tempelman (1998) determined that Laplacian MML estimates tended to be much less biased than EM-MML estimates of variance components. Since the Laplacian procedure is analogous to the derivative-free REML procedure introduced by Graser et al. (1987), it also facilitates hypothesis testing of variance components via marginal likelihood ratio tests (Tempelman, 1998). Markov Chain Monte Carlo (MCMC) techniques allow small sample inference and have been utilized in animal breeding and genetics (Sorensen et al. 1995; Hoeschele and Tier 1995; Varona et al. 1999; Luo et al. 1999; Luo et al. 2001). The relative improvements in properties of MCMC point estimates (posterior means) of variance components relative to approximate MML estimates from data characterized by small contemporary subclass sizes have been demonstrated in Simulation studies by Hoeschele and Tier (1995), although differences were shown to be increasingly smaller with larger sample sizes. Even so, in the largest simulated dataset considered by Hoeschele and Tier (1995), herd-year-season (HYS) subclasses were large in average size (>50) and maternal effects were not considered. In contrast, many breed association datasets used for genetic evaluations are large yet characterized by many small contemporary subclass Sizes. The computing requirements for MCMC are not trivial for large datasets, and the need for MCMC, not only for inference on variance components but as well as on breeding values, needs to be validated. Furthermore, execution of MCMC requires far greater care than conventional MAP implementations for national genetic evaluations. The objectives of this study were 1) to determine and compare variance component and corresponding genetic parameter estimates for calving ease scores in an Italian Piemontese population using the two MML procedures (EM and Laplace), and MCMC based on a threshold Sire-maternal grandsire model, and 2) to compare conventional MAP estimates of breeding values and approximate standard errors with corresponding MCMC posterior means and stande deviations in order to assess the relative need for MCMC methods in a national genetic evaluation system for calving ease, and 3) to demonstrate the utility of a Metropolis-Hastings update on improving MCMC mixing on the threshold parameters. MATERIALS AND METHODS Data First pan'ty calving ease scores recorded on Italian Piemontese cattle from January, 1989 to July, 1998 by ANABORAPI (Associazione Nazionale Allevatori Bovini di Razza Piemontese, Strada Trinita 32a, 12061 Carru, Italy) were used in this study. Only herds with at least 50 records over that time period were considered, leaving a total 45 of 23,953 records. Calving difficulty was coded into five categories by breeders and recorded by technicians who visited the farmers monthly: 1) unassisted delivery, 2) assisted easy calving 3) assisted difficult calving 4) caesarean section and 5) foetotomy. Categories 4 and 5 were combined for analyses as the incidence of foetotomy was less than 0.5%. The general frequencies of first parity calving ease scores in the data set were 2,747 (11.47%) for unassisted delivery, 14,131 (58.99%) for assisted easy calving, 3,683 (15.38%) for assisted difficult calving and 3,392 (14.16%) for caesarean section and foetotomy. The effects of age of dam and sex of calf and their interaction were modeled by combining eight different age groups (20 to 23, 23 to 25, 25 to 27, 27 to 29, 29 to 31, 31 to 33, 33 to 35, and 35 to 38 months) with sex of calf for a total of 16 nominal subclasses. Herd-year-season (HYS) subclasses were created from combinations of herd, year, and two different seasons (from November to April and from May to October) as in Carnier et al. (2000), except that HYS was treated as random in this study. The total number of identified sires in the pedigree file was 9,090. A total of 1,817 of these Sires were identified as being both calf sires and calf maternal grandsires, thereby indicating the density of genetic relationships accruing from selection over time. After pedigree pruning (i.e. treating as unknown those sires that do not have daughters with calving ease records and appear in the pedigree file only once), the number of sires that uniquely contributed to inference on genetic parameters and breeding values was determined to be 3,624. 46 Threshold Model Calving ease scores are determined by unobserved underlying continuous variables or liabilities and a set of fixed thresholds, r, < 12.... < r m-l 9 with To = —oo and rm = 00, where m is the number of categories. An observed calving ease score is dependent upon underlying variable, which is bounded between two unobserved thresholds (Gianola and Foulley 1983). More specifically, calving ease scores y,~ for individual i, in this study are defined by the following bins for (1,, the underlying liability for individual i: ‘ 1 To < U ,. s 2', _ 2 2', < U .- S 2', (1) fi-h Q0.99). Posterior standard errors of the sire effects can be used to derive approximate accuracies of estimated genetic merits. These standard errors were determined as the standard deviation of the samples from MCMC and from the square root of the diagonals of C5,, based on both EM and Laplace estimates of variances. Simple linear regression analyses of these standard errors based on MCMC versus MAP-EM, MCMC versus MAP-Laplace and MAP-Laplace versus MAP-EM, resulted in intercepts not different from 0 but estimated slopes of 1.09, 1.03, and 1.07, respectively, indicating that the posterior standard errors were on average larger under MCMC than under MAP estimates. This was anticipated as uncertainty on the variance components is accounted for in fully marginal inference using MCMC but not so in MAP. In all cases, the correlation between the estimated standard errors was near one. 59 CONCLUSIONS In spite of the results of several smaller simulation studies, our work showed that there were no appreciable differences between approximate MML/MAP versus MCMC methods for inferences on genetic parameters and genetic merit on calving ease under a threshold sire and maternal grandsire model and derived from a relatively large data set. Furthermore, MCMC mixing was substantially improved relative to previously published implementations due to the use of a joint MH update for the threshold parameters. Maternal genetic variation was estimated to be an important source of variability although the genetic correlation between direct and maternal effects was not significant. The impact of random versus fixed HYS effects on the estimates of this genetic correlation warrants further study. Hence, there does not appear to be a pressing need to use MCMC methods for national genetic evaluations of calving ease based on conventionally specified threshold and maternal grandsire sire models. However, this may not be true for animal model specifications. 6O REFERENCES Albert, J. H.; Chib, S., 1993: Bayesian analysis of binary and polychotomous response data. J. Am. Stat. Assoc. 88: 669-679. Bennett, G. L.; Gregory, K. E., 2001: Genetic (co)variances for calving difficulty score in composite and parental populations of beef cattle: I. Calving difficulty score, birth weight, weaning weight, and postweaning gain. J. Anim. Sci. 79: 45-51. Berger, P. J ., 1994: Genetic prediction for calving ease in the United States: Data, models, and use by the dairy industry. J. Dairy Sci. 77: 1146-1153. Bink, M. C. A. M.; Quaas, R. L.; Van Arendonk, J. A. M., 1998: Bayesian estimation of dispersion parameters with a reduced animal model including polygenic and QTL effects. Genet. Sel. Evol. 30: 103-125. Breslow, N.; Clayton, D., 1993: Approximate inference in generalized linear mixed models. J. Am. Stat. Assoc. 88: 9-25. Carnier, P.; Albera, A.; Dal Zotto, R.; Groen, A. F.; Bona, M.; Bittante, G., 2000: Genetic parameters for direct and maternal calving performance over parities in Piemontese cattle. J. Anim. Sci. 78: 2532-2539. Cowles, M. K., 1996: Accelerating Monte Carlo Markov Chain convergence for cumulative—link generalized linear models. Stat. and Comp. 6: 101-111. Dempster, E.R.; Lerner, I. M., 1950: Heritability of threshold characters. Genetics 35: 212-236. Foulley, J. L.; Irn, S.; Gianola, D.; Hoeschele, I., 1987: Emprical Bayes estimation of parameter for n polygenic traits. Genet. Sel. Evol. 19: 197:224. Gelman, A.; Rubin, D., 1992: Inference from iterative simulation using multiple sequences. Stat. Sci. 15: 201-224. Geyer, C. J ., 1992: Practical Markov chain Monte-Carlo (with discussion). Stat. Sci. 7: 467-51 1. Gianola, D.; Foulley, J. L., 1983: Sire evaluation for ordered categorical data with a threshold model. Genet. Sel. Evol. 15 :201-224. Graser, H. U.; Smith, S. P.; Tier, B., 1987: A derivative-free approach for estimating variance components in animal models by restricted maximum likelihood. J. Anim. Sci. 64:1362-1370. 61 Harville, D. A.; Mee, R. W., 1984: A mixed-model procedure for analyzing ordered categorical data. Biometrics 40: 393-408. Henderson, C. R., 1976: Inverse of a matrix of relationships due to sires and maternal grandsires in an inbred population. J. Dairy Sci. 59: 1585-1588. Hoeschele, 1.; Gianola, D.; Foulley, J. L., 1987: Estimation of variance components with quasi-continuous data using Bayesian methods. J. Anim. Breed. Genet. 104: 334- 349. Hoeschele, 1.; Gianola, D., 1989: Bayesian versus Maximum Quasi Likelihood methods for sire evaluation with categorical data. J. Dairty Sci. 72: 1569-1577. Hoeschele, 1.; Tier, B., 1995: Estimation of variance components of threshold characters by marginal posterior mode and means via Gibbs sampling. Genet. Sel. Evol. 27: 519-540. Luo, M. F .; Boettcher, P. J .; Dekkers, J. C. M.; Schaeffer, L. R., 1999: Bayesian analysis for estimation of genetic parameters of calving ease and stillbirth for Canadian Holsteins. J. Dairy Sci. 82: 1848. Luo, M. F.; Boettcher, P. J .; Schaeffer, L. R.; Dekkers, J. C. M., 2001: Bayesian inference for categorical traits with an application to variance components estimation. J. Dairy Sci. 84: 694-704. Manfredi, E. J .; San Cristobal, M.; Foulley, J. L., 1991a: Some factor affecting the estimation of genetic parameters for cattle dystocia under a threshold model. Anim. Prod. 53: 151-156. Manfredi, E. J .; Ducrocq, V.; Foulley, J. L., 1991b: Genetic analysis of dystocia in dairy cattle. J. Dairy Sci. 74: 1715-1723. Matos, C. A. P.; Thomas, D. L.; Young, L. D.; Gianola, D., 1994: Analysis of lamb survival using linear and threshold models with maternal effects. 5'“ World Congress of Genetics Applied to Livestock Production, Ontario, Canada. Vol. 18: 426-429. Matos, C. A. P.; Thomas, D. L.; Gianola, D.; Tempelman, R. J .; Young, L. D., 1997: Genetic analysis of discrete reproductive traits in sheep using linear and nonlinear models. 1. Estimation of genetic parameters. J. Anim. Sci. 75 : 76-87. McGuirk, B. J .; Going, I; Gilmour, A. R., 1998: The genetic evaluation of beef sires used for crossing with dairy cows in the UK. 2. Genetic parameters and sire merit predictions for calving survey trits. Animal Sci. 66: 47-54. 62 McGuirk, B. J .; Going, I; Gilmour, A. R., 1999: The genetic evaluation of UK Holstein Friesian sires for calving ease and related traits. Animal Sci. 68: 413-422. Meyer, K., 1989: Restricted maximum likelihood to estimate variance components for animal models with several random effects using a derivative-free algorithm. Genet. Sel. Evol. 21 : 317-340. Misztal, I.; Perez-Enciso, M., 1998: FSPAK90 - A Fortran 90 interface to sparse-matrix package FSPAK with dynamic memory allocation and sparse matrix structure. Proc. 6th World Cong. Gen. Appl. Livest. Prod. V0127: 467-468. Nelder J. A.; Mead, R., 1965: Computer Journal. 7: 308-313. Olesen, 1.; Perez-Enciso, M.; Gianola, D.; Thomas, D. L., 1994: A comparison of normal and nonnormal mixed models for number of lambs born in Norwegian sheep. J. Anim. Sci. 72: 1166-1173. Rafiery, A. E.; Lewis, S. M., 1992: Pages 765-776 in Bayesian Statistics 4. J. M. Bernardo, J. O. Berger, A. P. David, and A. F. M. Smith, ed. Oxford Univ. Press, Oxford, UK. Ramirez-Valverde, R.; Misztal, 1.; Bertrand, J. K., 2001: Comparison of threshold vs linear and animal vs sire models for predicting direct and maternal genetic effects on calving difficulty in beef cattle. J. Anim. Sci. 79: 333-338. Searle, S. R., 1982: Matrix Algebra Useful for Statistics. John Wiley & Sons, New York. Simianer, H.; Schaeffer, L. R., 1989: Estimation of covariance components between one continuous and one binary trait. Genet. Sel. Evol. 21: 303-315. Smith, S. P.; Graser, H. U., 1986: Estimating variance components in a class of mixed models by restricted maximum likelihood. J. Dairy Sci. 69: 1156-1165. Sorensen, D. A.; Andersen, S.; Gianola, D.; Korsgaard, 1., 1995: Bayesian inference in threshold models using Gibbs sampling. Genet. Sel. Evol. 27: 229-249. Stiratelli, R.; Laird, N.; Ware, J. H., 1984: Random-effects models for serial observations with binary response. Biometrics. 40: 961-971. Strarn, D. 0.; Lee, J. W., 1994: Variance components testing in the longitudinal mixed effects model. Biometrics. 50: 1171-1177. Tempelman, R, J .; Gianola D., 1993: Marginal maximum likelihood estimation of variance components in Poisson mixed models using Laplacian integration. Genet. Sel. Evol. 25: 305-319. 63 Tempelman, R. J ., 1998: Generalized linear mixed models in dairy breeding. J. Dairy Sci. 81: 1428-1444. Uimari, P.; Thaller, G.; Hoeschele, 1., 1996: The use of multiple markers in a Bayesian method for mapping quantitative trait loci. Genetics. 143: 1831-1842. Varona L.; Misztal, 1.; Bertrand, J. K., 1999: Threshold-linear versus linear-linear analysis of birth weight and calving ease using an animal model: I. Variance component estimation. J. Anim. Sci. 77: 1994-2002. Wang, C. S.; Quaas, R. L.; Pollak, E. J ., 1997: Bayesian analysis of calving ease score and birth weights. Genet. Sel. Evol. 29: 117-143. Weller, J. 1.; Misztal, I; Gianola, D., 1988: Genetic analysis of dystocia and calf mortality in Israeli-Holsteins by threshold and linear models. J. Dairy Sci. 71: 2491-2501. Table 1. Effective sample sizes for (co)variance components and threshold parameters using MCMC for first parity calving ease scores in Piemontese cattle. PARAMETER 0:2 0'; 0m 0': 72 1'3 ESSa 1,799 1,705 1,799 6,953 3,211 3,110 “The combined effective sample sizes based on the sum of determinations from 3 independent MCMC chains post bum-in, each based on 80,000 samples. 65 Table II. Estimation of variance-covariance components and genetic parameters for calving difficulty in Piemontese cattle usirg EM, Laplacian, and MCMC algorithms. Variance EMa Laplaceb MCMC component PMc PMDd PMEc PPIf 0'32 0.083 0.091i‘0.01 1 0.095i'0.014 0.093 0.095 0.070-0. 123 0'3, 0.055 0.060i0.008 0.064i0.010 0.062 0.062 0.046—0.084 0's," 0.041 0.045i0.007 0.045i0.008 0.044 0.044 0.029-0.061 0'; 0.185 0.208i0.013 0.217i0.013 0.217 0.217 0.192-0.243 0'12, 0.331 0.363 038010.055 0.373 0.373 0.278-0.493 0;, 0.140 0.151 0.170i0.037 0.163 0.163 0106-0252 O’DM -0.002 -0.002 -0.011i0.035 -0.01 1 -0.011 -0.084—0.055 hf) 0.236 0.251 0.259i0.0348 0.253 0.258 0.194-0.330 hi4 0.100 0.105 0.116i0.0256 0.112 0.114 0.072-0.173 rDM -0.008 -0.009 -0.034i0.136 -0.051 -0.037 -0.292-0.245 alJoint MML estimates using EM algorithm. l’Joint MML estimates using Laplace algorithm :t asymptotic standard error. °Posterior mean i posterior standard deviation. dMarginal posterior mode. eMarginal posterior median. f95% posterior probability interval. 66 CHAPTER 3 An Assessment of Cumulative t-Link Threshold Models for the g Quantitative Genetic Analysis of Calving Ease Scores ABSTRACT The Student-t and other heavy-tailed distributions have been specified for residuals to facilitate robust statistical inference in linear mixed models. In this study, we develop a hierarchical threshold mixed model based on a cumulative t-link specification for the analysis of ordinal data, specifically, calving ease scores. The validation of our model and our Markov Chain Monte Carlo (MCMC) algorithm was carried out on simulated data from normally and t4 (i.e. a t distribution with 4 degrees of freedom) distributed populations using the deviance information criterion (DIC) and a related measure to validate recently proposed model choice criteria. The simulation study indicated that although inference on the degrees of freedom parameter is possible, MCMC mixing was problematic. Nevertheless, the DIC was shown to be a satisfactory measure of model fit to data. We applied a sire maternal and grandsire cumulative t-link model to a calving ease dataset from 8,847 Italian Piemontese first parity dams. The cumulative :4 -link model was shown to lead to posterior means of direct and maternal heritabilities (0.40:t0.06, 0.11i0.04) and a direct maternal genetic correlation (- 0.58i0.15) that were not different from the corresponding posterior means of the heritabilities (0.42:1:0.07, 0.14i0.04) and the genetic correlation (-0.55:t0.14) inferred under the conventional cumulative probit link threshold model. Furthermore, the correlation (>.99) between posterior means of sire progeny merit from the two models 67 suggested no meaningful rerankings. Nevertheless, the cumulative t-link model was decisively chosen as the better fitting model using DIC. INTRODUCTION Data quality is an increasingly important issue for genetic evaluation of livestock, both from a national and international perspective (Emanuelson et al. [13]). Breed associations and government agencies typically invoke arbitrary data quality control edits in order to minimize the impact of recording error, preferential treatment and/or injury/disease on predicted breeding values (Bertrand and Wiggans [4]) in the belief that the data residuals should be normally distributed. It has been recently demonstrated that the specification of Student t distributed residuals in linear mixed models may effectively mute the impact of residual outliers, particularly in situations where preferential treatment of some breedstock may be anticipated (Stranden and Gianola [3 8]). Based on the work of Lange et a1. [22] and others, Stranden and Gianola [3 9] developed the corresponding hierarchical Bayesian models for animal breeding, using Markov Chain Monte Carlo (MCMC) methods for inference. In their models, residuals were specified as either having independent (univariate) t distributions or multivariate t distributions within herd clusters. Outside of possibly longitudinal studies, the multivariate specification is of dubious merit ([32], [38], [39]) such that all of our subsequent discussion pertains to the univariate specification only. A Student t distribution can be represented as a scale mixture of normals with Gamma distributed scaling factors (Lange et al. [22]). Alternative specifications of scale mixture of normals, all of which lead to heavy-tailed residual densities relative to the normal, are further considered by Carlin and Polson [6] and in an animal breeding context by Rosa [32]. Using Bayes factors approximations, Rosa [32] noted that these heavier tailed models provided better fits to a dataset on birth weights in rats than a model based on a Gaussian error distribution. More recent applications include linkage mapping of quantitative trait loci (von Rohr and Hoeschele [44]). Auxiliary traits such as calving ease or milking speed are often subjectively scored on an ordinal scale. It might then be anticipated that data quality would be an issue of greater concern in these traits than more objectively measured production characters, particularly since record keeping is generally unsupervised, being the responsibility of the attending herdsperson. Luo et al. [23] has suggested that a decline in the diligence of data recording was partially responsible for their lower heritability estimates of calving ease relative to earlier estimates from the same Canadian Holstein population. The cumulative probit link (CP) generalized linear mixed model, otherwise called the threshold model, is currently the most commonly used genetic evaluation model for calving ease ([3, 46]). MCMC methods are particularly well suited to this model since the augmentation of the joint posterior density with normally distributed underlying or latent liability variables facilitate implementations very similar to those developed for linear mixed effects models ([1, 36]). A cumulative I link (CT) model has been proposed by Albert and Chib [l] for the analysis of ordinal categorical data, thereby providing greater modeling flexibility relative to the CP model. The CT model can be created by simply 69 augmenting the joint posterior density with t-distributed rather than normally distributed underlying liability variables based on a Gamma scale mixture of normals. The objectives of this study were to validate MCMC inference of the CT generalized linear mixed (sire) model via a simulation study and to compare the fit of this model with the CP model for the quantitative genetic analysis of calving ease scores in Italian Piemontese cattle. MODEL CONSTRUCTION Suppose that elements of the n x 1 data vector Y = {Y,.} , can take values in any one of C mutually exclusive ordered categories. The classical CP model for ordinal data (Gianola and Foulley [17]) can be written as follows: Prob(Y,. ___ j | [3,“,1') = (D[Tj “(xiP+zi“)]_¢[Tj-I ’(xiP+zi“)] , (1) 0' 0' where j = 1,2,. . .,C denotes the index for categories. Also <1>(.) denotes the standard normal cumulative distribution fimction, B and u are the vectors of unknown fixed and random effects, and 1" = [TO I, 21.] is a vector of unknown threshold parameters satisfying 1, < 2'2... < Q with To = -oo and rc = +00. Furthermore, x, and z, are known incidence row vectors. Latent liability variables (L = {L,.};,) can be introduced to alternatively define the same specification as in (1) but in two hierarchical stages: ,. Prob(Y,=j|L,,r)=Zl(r,_, 0 and degrees of freedom v>0 for i=1,2,. . .,n. In turn, equation (4) can be represented by a two-stage scale mixture of normals: 2 L,- mumm— ~ NW ”Leif—J (5a) L exp {-9g-v} (5b) 71 Note that (5b) specifies a Gamma density with parameters v/2 and v/2, thereby having an expectation of 1. The remaining stages of our hierarchical model are characteristic of animal breeding models. We write B ~ 1303) (6) where p(fl) is a subjective prior, typically specified to be flat or vaguely informative. F urthennore, the random effects are typically characterized by a structural multivariate prior specification: ultp ~ p(ul¢) ~ N(0,G(¢)) (7) Here G(qr) is a variance-covariance matrix that is a function of several unknown variance components or variance-covariance matrices in q) , depending on whether or not there are multiple sets of random effects and/or specified covariances between these sets; an example of the latter is the covariance between additive and maternal genetic effects. Furthermore, flat priors, inverted Gamma densities, inverted Wishart densities or products thereof may be specified for the prior density p(tp) on q) , depending, again, on the number of sets of random effects and whether there are any covariances thereof (Jensen et al. [20]). Finally, a prior is required for the degrees of freedom parameter v. We use the prior p(v) cc 6+1_v)'2' (8) which is consistent with a Uniform(0,1) prior on l/(l+v). As with CP models, there are identifiability issues involving elements of ‘t with 0'} such that constraints are necessary. Typically, one element of 1', , 2'2 ,..., To, is restricted 72 to an arbitrary value (e.g. 2', =0) along with 0'3 = 1. This is the parameterization we chose such that inference on 0'3 is not subsequently considered in this paper. Presuming that the elements of Y are conditionally independent given [1 and u, we can write the joint posterior density of all unknown parameters and latent variables (L) as follows: p(fl,u,r,¢,v,L,l l y) 0C(ljpmbozy''L'")1"(Li'1"“"’3”'i)1’(4lV)]P(Ii)rv(ul(101200.00) (9) n i=1 ' where 1. = {1,} A MCMC inference strategy involves determining and generating random variables from the full conditional densities (FCD's) of each parameter or blocks thereof. Many of the FCD's can be directly derived using results from Sorensen et al. [36] jointly with results from Stranden and Gianola [39]. Let 0= [6' u']'. It can be readily shown that the FCD of Q is multivariate normal: 0|y,v,1.,L~N(0,W'R"W+X‘) (10) where R'l -diag{2 }" isadiagonal matrix 2’ -[0Pxp 0M J W-[X Z] for - i ,3 9 — -l 9 - ' 001p (9(a)) X=[xl x2 x,,]', Z=[z, 22 z,,]',and é=['}]=[w'R"W+2']"w'R"L (11) u Generation of individual elements 0,, j=1,2,. . .,p+q or blocks thereof of 0 from their respective FCD is straightforward using the strategy presented by Wang et al. [45]. 73 The FCD of individual elements of L and T are straightforward to generate from using results from Sorensen et al. [36]. We, however, prefer the Metropolis-Hastings and method of composition joint update of L and I presented by Cowles [11]. She demonstrated and we have further noted in our previous applications (Kizilkaya et al. [21]) that the resulting MCMC mixing properties using this joint update are vastly superior to using separate Gibbs updates on individual elements of L and ‘t as outlined by Sorensen et al. [36]. If some partitions of (p form a variance-covariance matrix, then their respective FCD can be readily shown to be inverted-Wishart (Jensen et al. [20]) whereas if other partitions of (p involve scalar variance but no covariance components, then the FCD of each component can be shown to be inverted-gamma. The F CD of 7t, can be shown to be v+l — -| p(/l,. | 1-, ,L,0, v) 0: kg 2 J x exp[--:—'((L,-x,fl-z,u)2 + v)] (12) that is, the kernel in (12) specifies that distribution to be v +1 1 . . 2 Gamma(——2—,§(v+(L,-x,fl-z,u) )J. Here L,- denotes all elements of 1. = {1,} except for 1,, i =1,2,....n. Finally, the FCD of v can be shown to be I! o“ p(vlfl,u,L,).)oc —2-— [II/1,7] exp(--:-/1,D(—l—— (13) I7(V/2) 121 1+ v)2 74 given the specification for p(v) in (8). Equation (13) is not a recognizable density such that a Metropolis-Hastings update is required. We utilized a random walk implementation (Chib and Greenberg [10]) of Metropolis-Hastings sampling; specifically, a normal density with expectation equal to the parameter value from the previous MCMC cycle was used as the proposal density for drawing from the FCD of x = log (v) , using equation (13) and the necessary J acobian for this transformation. The Metropolis-Hastings acceptance ratio was tuned to intermediate rates (40-50%) during the MCMC burn-in period to optimize MCMC mixing (Chib and Greenberg [10]). Since the variance of a t-density is not defined for v S 2, we truncate sampled from (13) such that v > 2, or equivalently K > log(2). MODEL COMPARISON Model choice is an important issue that has not received considerable attention in animal breeding until only very recently ([19, 31]). Likelihood ratio tests have been used to compare differences in fit between various models and their reduced subsets; however, these tests do not facilitate more general model comparisons. The Bayes factor has a strong theoretical justification as a general model choice criterion; however algorithms for Bayes factors computations are either computationally intensive (e.g. Chib [9]) or numerically unstable (Newton and Raftery [30]). Furthermore, as Gelfand and Ghosh [15] indicate, Bayes factors lack clear interpretation in the case of improper priors which are particularly frequent specifications in animal breeding hierarchical models. Akaike's information criterion or Schwarz's Bayesian criterion are analytical measures that provide an asymptotic representation of Bayes factors and reflect a compromise between 75 goodness of fit and number of parameters. As the total number of parameters and latent variables often exceeds the number of observations in an animal breeding (e.g. animal model) analysis, the effective number of parameters in hierarchical models is not always so obvious. The MCMC sample average of the posterior log likelihoods, or data sampling log densities, may be used as a means for comparing different models (Dempster [12]); however, as Speigelhalter et al. [37] indicate, it is not always so obvious how to proceed when these densities are similar but the number of parameters and/or the numbers of hierarchical stages of the candidate models vary. Speigelhalter et al. [37] proposed the deviance information criterion (DIC) for comparing alternative constructions of hierarchical models. The DIC is based on the posterior distribution of the deviance statistic, which is -2 times the sampling distribution of the data as specified in the first stage of a hierarchical model. It may not be so obvious, however, what exactly the data sampling stage is in a hierarchical model. For example, the data sampling stage for the CT model may be specified in one way as r, —(x,.[l +z,u) z',_, —(x,B + z,u) Prob(Y,. =j||3,u,1:,2,,0'3)=<1> 0' -(D 0 (14) e e a a given the specifications of (2a) and (5a) or it may be specified more marginally using (3). We prefer a more marginalized or heavier-tailed first stage specification such as (3) for CT and (l) for CP, potentially leading to a more stable implementation with justification provided by Satagopan et al. [34] but with their context being the stabilization of the Bayes factor estimator of Newton and Raftery [30]. 76 The DIC is computed as the sum of average Bayesian deviance (5) plus the "effective number of parameters" (pp) with respect to a model, such that smaller DIC values indicate better fit to the data. Let G denote the number of cycles after convergence in a MCMC chain. Furthermore, we represent all unknown parameters in the marginalized first stage specification by 3 = (B,u,r, v) with 3 excluding v=oo in the CP model. Then, for the CT model, then the average Bayesian deviance can be estimated using (3) by If 5 = -2[iZIog(Pr°b(Y =% | B'“~""J"'fl""))) g=l 121 where the superscript [g] denotes the MCMC cycle g, g = 1,2,. . .,G for the sampled value of the corresponding parameter. Furthermore, p0 can be estimated as p,, = D — 0(8) where D(§) = —2[:log Prob(Y = y, IB,fi,¥,lI-)). i=1 Here the bar notation (e. g. 8) denotes the corresponding posterior mean vector. We alternatively considered the conditional predictive ordinate (CPO) as the basis for model choice (Gelfand [14]). Defined for observation 1', we write the CPO as (i 7?(y,. |y_,.,M, ) = léZ(Prob(Y = y, I p181,u[gl,.rlg]))-| ]- g-l using (1) for the CP model (Model MI) and G -l 72(y, |y_,,M,)z[—éZ(Prob(Y = y, IBlgl,ulxl,11gl,lel))-l] s=l using (3) for the CT model (Model M2). Here y-i denotes all observations other than y,-. The log marginal likelihood (LML) of the data for a certain model, say Mk, can then be estimated as: LML = L(y | M.,) = filed/Hy. IL. M.» i=1 Larger values of LML indicate better model fit to the data. DATA Simulation Study A simulation study was used to validate the CT model and the utility of the DIC and the LML for model choice between CP and CT. Three replicated datasets were generated from each of two different populations as characterized by the distribution of the liability residuals. Population I had a residual density that was standard normal whereas Population II had a residual density that was standard Student-t distributed with scale parameter 03 = 1 and degrees of freedom ve = 4. All datasets were generated based on a simple random effects (sire) model with null mean. Liability data for 50 progeny from each of 50 unrelated sires was generated by summing independently drawn sire effects from N(0, 0',2 = 0.10) with independently drawn residuals from N(0, 0'3 = 1.00) for a total of 2500 records. These underlying liabilities were mapped to ordinal data with four categories based on the threshold parameter values of r, = —0.50, 1', =1.00, and 1, =2.00 for all populations. Ordinal data from each replicated dataset was analyzed using both CP and CT sire models. For purposes of parameter identifiability, we invoked the restrictions 0'} = 1 and r, = —0.50. As a positive control, the underlying liability data for 78 each replicate was analyzed using both normal and t distributed error mixed linear models. For the t distributed error model, the MCMC procedure adapted was similar to that presented in Stranden and Gianola [39], except that the degrees of freedom parameter (v>2) was inferred as a continuous (rather than discrete) parameter, using the Metropolis- Hastings update as presented earlier. Graphical inspection of the chains based on preliminary analyses was used to determine a common length of bum-in period. For each replicated data set within each population, a burn-in period of 20,000 cycles was discarded before saving samples from each of an additional 100,000 MCMC cycles. Furthermore, DIC and LML values were computed for each model on each replicated dataset to validate those measures as model choice criteria. In all cases, flat priors were invoked on the variance components. Italian Piemontese Calving Ease Data First parity calving ease scores recorded on Italian Piemontese cattle from January, 1989 to July, 1998 by ANABORAPI (Associazione Nazionale Allevatori Bovini di Razza Piemontese, Strada Trinita 32a, 12061 Cam‘r, Italy) were used for this study. In order to limit computing demands, only herds that were represented by at least 100 records over that nine-year period were considered for the demonstration of the proposed methods in this paper, leaving a total of 8,847 records. Calving ease was coded into five categories by breeders and subsequently recorded by technicians who visited the breeders monthly. The five ordered categories are: 1) unassisted delivery, 2) assisted easy calving 3) assisted difficult calving 4) caesarean section and 5) foetotomy. As the incidence of foetotomy was less than 0.5%, the last two ordinal categories were combined, leaving a 79 total of four mutually exclusive categories. The general frequencies of first parity calving ease scores in the data set were 951 (10.75%) for unassisted delivery; 5,514 (62.32%) for assisted easy calving; 1,316 (14.88%) for assisted difficult calving; and 1,066 (12.05%) for caesarean section and foetotomy. The effects of dam age, sex of the calf, and their interaction were considered by combining eight different age groups (20 to 23, 23 to 25, 25 to 27, 27 to 29, 29 to 31, 31 to 33, 33 to 35, and 35 to 38 months) with sex of calf for a total of 16 nominal subclasses. Herd-year-season (HYS) subclasses were created from combinations of herd, year, and two different seasons (from November to April and from May to October) as in Carnier et al. [7] who also analyzed calving ease data from this same population. The sire pedigree file was further pruned by striking out identifications of sires having no daughters with calving ease records and appearing only once as either a sire or a maternal grandsire of a sire having daughters with records in the data file. Pruning results in no loss of pedigree information on parameter estimation yet is effective in reducing the number of random effects and hence computing demands. The number of sires remaining in the pedigree file afier pruning was 1,929. Therefore, as in Kizilkaya et al. [21], the CP and CT models used for the analysis of calving ease data included the fixed effects of age of dam classifications, sex of calf and their interaction in B , the random effects of independent herd-year-season effects in h, random sire effects in s and random maternal grandsire effects in m. We assume: qumwm) and 80 h~N(0,ra,f) 0' 0" o n a n 2 o where G0 =[ " é" , wrth 0'3 denoting the srre varrance, 0',,, denoting the maternal 0.57” am . . . . . . 2 grandsrre varrance, 0',,,, denoting the srre-matemal grandsrre covariance, and 0,, denoting the HYS variance. Furthermore, ® denotes the Kronecker (direct) product (Searle [35]), and A is the numerator additive relationship matrix between sires due to identified male ancestors (Henderson [18]). Also, h is assumed independent of s and m. Flat unbounded priors were placed on all fixed effects and variance components. Based on preliminary analyses of the data and results from the simulation study, v was not inferred upon but held constant to v = 4 in the CT model. MCMC inference was based on the running of three different chains for each model. For each chain in the CP model, a total of 5,000 cycles of burn-in period followed by saving samples from each of 100,000 additional cycles was executed based on the experiences of Kizilkaya et al. [21]. Because of initially anticipated slower mixing, the corresponding burn-in period for each chain in the CT model was 10,000 cycles followed by saving each of 200,000 additional cycles. To facilitate diagnosis of MCMC convergence by 5,000 cycles, the starting values on variance components for each chain within a model were widely discrepant, with one chain starting at the posterior mean of all (co)variance components based on a preliminary analysis, another chain starting at the posterior mean minus 3 posterior standard deviations for each (co)variance component and the final chain starting at the posterior mean plus 3 posterior standard deviations for each (co) variance component. For both the simulation study and the calving ease data analysis, the effective number of independent samples (ESS) for each parameter was determined using the 81 initial positive sequence estimator of Geyer [16] as adapted by Sorensen et al. [3 6]. Furthermore, key genetic parameters, specifically direct heritability (11,21 ), maternal heritability (hi) and the direct-matemal genetic correlation (rdm) were inferred upon in the calving ease data using the functions of Go as presented by Kizilkaya et al. [21] and Luo et al. [23], for example. The only difference in the computation of heritabilities between the CP and the CT model was that the marginal residual variance for the underlying liabilities is not 03 in CT, as it is in CP, but is equal to __v_§0_3 (Stranden v .— and Gianola [39]). Posterior means and standard deviation of elements of s were also compared between the CP and the CT model. RESULTS Simulation Study Table I summarizes inference on v based on the replicated datasets from the two populations, comparing the CP versus CT models for the analysis of ordinal categorical data and comparing the Gaussian linear mixed model versus the t-error linear mixed model for the analysis of the matched latent or underlying normal liabilities, as if they were directly observed. The inference on v was surprisingly sharp and seemingly unbiased for the t-error mixed model analysis of liability data from Population I (v=4), with 95% equal-tailed posterior probability intervals (PPI) not exceeding 1.5 in width; furthermore, the corresponding ESS were relatively large indicating stable MCMC inference. Conversely, inference on v based on the t-error mixed model analysis of liability data from Population II (v = 00) indicated extremely wide 95% PPI and posterior 82 means exceeding 100, correctly indicating stronger evidence of Gaussian distributed versus t-distributed residuals for data from that population. Furthermore, ESS were generally very small (~ 20) indicating that inference on v was rather unreliable for data from Population II, at least given the specified MCMC sampling scheme. That is, five times as many MCMC samples would be needed to attain a minimum ESS of 100 as advocated by previous investigators ([5, 41]). Inference on v in ordinal data under the CT model was also interesting. In Population I, the 95% PPI correctly concentrated on low values for v although the PPI were understandably wider than for the corresponding analyses of liability data under I- error linear mixed models. Also, the ESS on v were considerably smaller (<25) for the CT model analysis of ordinal data than for the corresponding matched linear model analyses of liability data, such that acceptably accurate inference on v would require substantially more sampling. In replicated ordinal data from Population II, the 95% PPI on v were wide and concentrated on high values of v, consistent with what was expected. Furthermore, as with the t-error mixed model analysis of liability data, MCMC mixing on v using the CT model on ordinal data was seen to be particularly problematic in Population II as manifested by the small ESS. Table II summarizes inference on 0,2 based on the replicated datasets from the two populations, comparing the CP versus CT models for the analysis of ordinal categorical data and comparing the Gaussian linear mixed model versus the t-error linear mixed model for the analysis of the underlying liabilities. In the analyses of liability data from replicated datasets from both populations, the 95% PPI were in good agreement with 03 =0.10; furthermore, very large ESS indicating very good MCMC mixing. 83 MCMC mixing on 0'? was understandably slower in the analysis of ordinal categorical data, particularly in replicated data from Population 11 using CT due to the generally high posterior sampling correlation between 0': and v. Because of the problem of MCMC mixing of v in ordinal data, the MCMC chains were rerun with v held constant (v = 4) for the DIC and LML comparisons between models. In Table III, the DIC and the LML are given for each replicated dataset within each population for the CP versus the CT model analyses of ordinal data and for the Gaussian linear mixed model versus the t-error linear mixed model analyses of liability data. Speigelhalter et al. [3 7] suggested that a DIC difference exceeding 7 to be a substantial indication of an important difference in model fit. Given that, the model choices based on DIC for the linear mixed model analyses of liability data were resoundingly in favor of the correct model. However, in the comparison between CP and CT models for ordinal data analysis, the correct (CT) model was decisively chosen in only one of the replicates of Population I (v = 4) whereas the correct (CP) model was decisively chosen in only two of the replicates of Population 11 (v = 00), the other comparisons being somewhat indecisive (i.e. DIC differences < 7). This may not be too surprising given the information content of ordinal data relative to underlying continuous liability data. Comparisons based on LML (larger is better) lead to similar conclusions as those based on DIC (smaller is better). Application to Calving Ease Scores in Italian Piemontese Cattle Genetic Parameter Inference Sire and maternal grandsire CP and CT models were used for the analysis of calving ease scores in Italian Piemontese cattle. Because of the MCMC mixing problems encountered in inferring upon v, this parameter was held constant to v =4. Posterior inferences on key genetic parameters are summarized in Table IV and are based on the combined results from each of the three separate chains. The posterior mean, median and modal estimates (not shown) of the two heritabilities, and the genetic correlation using the MCMC algorithms were similar to each other within both models, implying that the posterior densities were symmetric and unimodal; this was further manifested by the fact that the 95% PPI are closely matched by the posterior mean :1: 2 standard deviations. In this study the total ESS for dispersion parameters across the three chains ranged from 1,420 to 9,305, indicating sufficient MCMC mixing under both models. Table IV shows that the ESS from the CT model were found to be almost double those of the CP model, attributable to the twice as large post-bum-in period for CT model. Considering v as known also improves mixing of, particularly, genetic parameters, in the CT model relative to joint inference with v (results not shown). Although the n x l auxiliary variable vector). is included in the CT model, this augmentation does not appear to adversely impact ESS and hence mixing of key genetic parameters relative to the CP model. In this study, the CT model produced posterior means of genetic variance components that were nearly twice as large as those estimated using the CP model. Furthermore, the marginal residual variance is _v__ 0,,2 in the CT model such that v .— seemingly twice as much residual variance is inferred under a CT model (with v = 4) than 85 under a CP model (with v = 00). Although these results might at first sight imply that greater genetic and residual variation is directly captured by the CT model, it should be realized that the variance of underlying variables L is only defined proportionately to the marginal residual variance. This is further apparent in that the 95% PPI of heritabilities were only very slightly concentrated towards lower values in the CT model relative to the CP model such that the corresponding 95% PPI for both h; and h; overlapped substantially between the two models. Furthermore, the posterior density of rd," was very similar between CT and CP, with most of the density being between -0.2 and -0.8. In order to compare the CP and CT models for fit to the calving ease data, LML and DIC values, broken down into its components 15 and pp, are reported in Table V. Since it is difficult to quantify the degree of Monte Carlo error on DIC (Zhu and Carlin [47]), we report DIC and LML values separately for each of the three chains under each model. It can be seen that there are relatively inconsequential differences in the measures from one chain to the next within each model relative to between models, thereby indicating considerably small Monte Carlo errors on the DIC difference between the two models. Both model choice criteria were overwhelmingly in favor of the CT model with v = 4. As anticipated, the model complexity, as measured by pp, is higher for the CT model; however, the complexity penalty is strongly counteracted by a smaller mean deviance D , thereby resulting in a smaller DIC favoring choice of the CT model. 86 Inferences on Sire Effects Posterior means of elements of s were determined to be corresponding point estimates of progeny differences (EPD) under both CT and CP models. The relationships between these estimates are shown to be strongly linear in Figure 1, with no hint of substantial reranking as indicated by a Pearson correlation of 0.99. The CT model had greater spread in EPD's compared to the CP model, as further manifested by a least- squares estimated slope of 1.38. This is not too surprising since a larger additive genetic (sire) variance was inferred in the CT model such that posterior means of elements of 9 should be more dispersed in the CT model relative to the CP model. However, as discussed later, this is not practically important since the variance of L is defined only proportionately to the marginal residual variability which is also larger in the CT model. Posterior standard deviations of elements of s are analogous to standard errors of prediction in mixed effects model analysis and can be used to derive approximate reliabilities of EPD's (Wang et al. [46]). That is, the standard errors of prediction were simply determined as the standard deviation of the MCMC samples of elements of s. Figure 2 provides scatter plots of these standard errors with the corresponding least squares regression line for the CT model versus the CP model. In this case, the correlation between the estimated standard errors was near unity. The estimated slope was nearly 1.41 indicating that the posterior standard errors were on average 41% larger under the CT model than under the CP model; nevertheless, these need to be considered proportionately to 0'3 which was also larger under the CT model. 87 DISCUSSION Given the recent momentum in using heavy-tailed residual specifications for the analysis of production data in animal breeding ([32, 38, 39, 44]) a hierarchical threshold (CT) mixed model based on a cumulative t-link specification was developed, validated by simulation and applied to a small calving ease dataset from Italian Piemontese cattle. The simulation study indicated that inference on v is possible in a CT model; however, it appears that either a more suitable MCMC strategy is needed or many more samples are required than pursued in our study to ensure a more reliable inference on v. Until this issue is satisfactorily resolved, we advocate fixing v at some arbitrarily low value in a CT model analysis. We chose v = 4 since this value minimally assures defined first, second, and third moments while providing a liability variable distribution that is maximally heavy-tailed. One can then use model choice criteria such as DIC to assess whether or not the CT model is a better data fit than the CP model. We further note that for the case where v is fixed, that MCMC mixing was not negatively affected by using the CT model, even though our data augmentation (of 2.) implementation might be of concern to those who might prefer Metropolis-Hastings sampling on all parameters (von Rohr and Hoeschele [44]) instead of introducing augmented variables. More recently, it has been demonstrated that data augmentation can be strategically used to enhance MCMC mixing; the strategies discussed by van Dyk and Meng [42] may facilitate more favorable mixing on v and hence deserve further consideration in CT model applications to animal breeding. Of particular note was that due to the implementation of the algorithm of Cowles[l 1], MCMC mixing of 1' was not seen to have been the most limiting (results not reported) as in previous animal breeding implementations ([36],[43]). 88 Our point estimates for heritabilities are substantially higher than corresponding threshold model estimates for calving ease reported by Manfredi et al. [25, 26], McGuirk et al. [28, 29], Varona et al. [43], Luo et al. [23], and Bennett and Gregory [2]. Nevertheless, our inference on a strongly negative direct-matemal genetic correlation is in agreement with previous work on calving ease using threshold models ([2, 23, 43]) and linear mixed models using data from the same source (Carnier et al. [7]). Hence, it appears from our results, in agreement with other studies, that selection of sires for calving ease of their progeny as calves should result in antagonistic effects in the ability of their daughters to calve easily as dams in successive generations. What was most surprising is that the 95% PPI for hzd are greater and do not overlap with corresponding PPI in Kizilkaya et al. (2002) who used a larger data set on first parity records from herds with greater than 50 rather than 100 records over the nine year period from the same data source. This result may be indicative of heterogeneity in genetic and residual variance due to size of herd or other confounding factors (e. g. region); this is an area for firrther research that our group has started with respect to residual variability. Two Bayesian model choice criteria, DIC and LML, were used to choose between the CP and CT models. In a simulation study, it was demonstrated that both DIC and LML were able to decisively choose the correct model in most cases whereas, in the remaining cases, these measures were too similar between the two models to allow a definitive choice. In the analysis of calving ease scores in Italian Piemontese cattle, the CT model was overwhelmingly chosen as the best fitting model by both model choice criteria. Nevertheless, in the examination of EPD's there were no real tangible differences between the two models in terms of sire genetic rankings. Although the 89 posterior standard deviations of sire progeny merit appear to be greater in the CT model relative to the CP model, the impact on reported accuracies of these predictions would be shown to be negligible. For example, the Beef Improvement Federation (USA) measure (Wang et al. [46]) of accuracy is somewhat directly related to the genetic variance which is higher in the CT versus the CP model. Our study involved sire models, where calf records are connected directly to sires with genetic relationships identified only through known paternal ancestry. Sire models are different from animal models where each record is connected directly to a calf identification with all known genetic (maternal and paternal) relationships explicitly modeled. For animals (e. g. dams), with effectively less data information, we would anticipate greater differences in predicted genetic merit between CT and CP animal models than given in our study. However, a sire model has been seen to be more stable than an animal model in CP implementations (Mayer [27]), particularly when paternal half-sib relationships are predominant as in our data. Furthermore, the fact that a sire and maternal grandsire model only accounts directly for a portion of the additive genetic variance and of the maternal genetic variance implies that a t-error assumption is essentially placed on a composite source of error, i.e. the sum of the residual variance and remaining genetic variation, attributable to unknown dams and to Mendelian sampling. From the perspective of using heavy tailed densities to mute residual outliers, this is significant since deviant dam and/or Mendelian sampling effects may be muted as well in a CT sire model specification. Presently, it does not appear to be feasible to apply MCMC methods to the very large datasets used for routine genetic evaluation of livestock by breed associations and 90 national recording organizations. Kizilkaya et al. [21] recently demonstrated, however, very little differences in predicted genetic merit and standard error of prediction in a CP sire model between inference provided by MCMC and by approximate empirical Bayes procedures currently utilized by the industry (e.g. Berger [3]). Empirical Bayes procedures are based on using the joint posterior modes of the sire effects as the EPD's, conditional on estimated variance components as if they were known with certainty. Based on results from Gianola and Foulley [17], the CT model can be readily implemented using empirical Bayes methods since the probability density function and the cumulative density function of a standard Student t distribution could be substituted for the corresponding Gaussian functions needed to derive the necessary scoring equations used to determine the required joint posterior modes. Furthermore, the degrees of freedom parameter, v, could be jointly estimated with the variance components using Laplace's method and tested for statistical significance using marginal likelihood ratio tests (Tempelman and Gianola [40]). Empirical Bayes implementation of the CT sire model and the comparison of results with MCMC inference under the CT sire model deserve further consideration. Unfortunately, however, these comparisons may not necessarily apply to CP or CT animal models since joint modal estimates of EPD's in the CP animal model can be badly biased (Mayer [27]). In this study, we have considered only two cumulative link models for the analysis of calving ease; conceptually, there are many others including those proposed by Chen and Dey [8]. In fact, Albert and Chib [1] demonstrate that the cumulative logistic link model is roughly equivalent to the CT model with v = 8. Other models can be contrived by considering alternative heavy-tailed distributions for the underlying 91 variables, such as those considered by Rosa [32]. Some of the resulting models may be shown to have better fit to calving ease data than demonstrated with the CT model in our paper. Specifications based on skewness (Von Rohr and Hoeschele [44]) may also have merit. The substantial residual and genetic correlations between birth weight and calving difficulty imply that genetic evaluations of calving case would substantially benefit from a bivariate threshold/linear multiple trait analysis with birth weight ([24, 43]). Further work on providing modeling flexibility with t-distributed specifications on both traits jointly is needed given our results and those already presented by Stranden and Gianola [38, 39] and Rosa [32]. 92 REFERENCES [1] Albert J .H., Chib S., Bayesian analysis of binary and polychotomous response data, J. Am. Stat. Assoc. 88 (1993) 669-679. [2] Bennett G.L., Gregory K.E., Genetic (co)variances for calving difficulty score in composite and parental populations of beef cattle: I. Calving difficulty score, birth weight, weaning weight, and postweaning gain, J. Anim. Sci. 79 (2001) 45-51. [3] Berger P.J., Genetic prediction for calving ease in the United States: Data, models, and use by the dairy industry, J. Dairy Sci. 77 (1994) 1146-1153. [4] Bertrand J .K., Wiggans G.R., Validation of data and review of results from genetic evaluation systems for US beef and dairy cattle. Proc. 6'h World Congr. Genet. Appl. Livest. Prod. 27 (1998): 327-330. Armidale, Australia, January 11-16, 1998. [5] Bink M.C.A.M., Quaas R.L., Van Arendonk J .A.M., Bayesian estimation of dispersion parameters with a reduced animal model including polygenic and QTL effects, Genet. Sel. Evol. 30 (1998) 103-125. [6] Carlin B.P., Polson N.G., Inference for nonconjugate Bayesian models using the Gibbs sampler, The Canadian Journal of Statistics 19 (1991) 399-405. [7] Carnier P., Albera A., Dal Zotto R., Groen A.F., Bona M., Bittante G., Genetic parameters for direct and maternal calving performance over parities in Piemontese cattle, J. Anim. Sci. 78 (2000) 2532-2539. [8] Chen M-H., Dey D.K., Bayesian analysis for correlated ordinal data models, in: Dey D.K., Ghosh S.K., Mallick B.K. (Ed), Generalized Linear Models: A Bayesian Perspective, Marcel Dekker, New York, 2000, pp. 133-158. [9] Chib S., Marginal likelihood from the Gibbs output, J. Am. Stat. Assoc. 90 (1995) 773-795. [10] Chib S., Greenberg E., Understanding the Metropolis Hastings algorithm, Am. Stat. 49 (1995) 327-335. [1 l] Cowles M. K., Accelerating Monte Carlo Markov Chain convergence for cumulative link generalized linear models, Stat. and Comp. 6 (1996) 101-111. [12] Dempster AP, The direct use of likelihood for significance testing, Stat. and Comp. 7 (1997) 247-252. [1 3] Emanuelson U., F ikse F ., Banos G., Impact of national genetic evaluation models on international comparisons, in: Computational Cattle Breeding '99, 18-20 March 1999, Tuusula, Finland. 93 [14] Gelfand A.E., Model determination using sampling-based methods, in: Gilks W.R., S. Richardson 8., Spiegelhalter D.J.(Ed.), Markov Chain Monte Carlo in practrce, Chapman&Hall, New York, 1996, pp. 145-162. [15] Gelfand A.E., Ghosh S.K., Model choice: A minimum posterior predictive loss approach, Biometrika 85 (1998) 1-11. [16] Geyer C. J ., Practical Markov chain Monte-Carlo (with discussion), Stat. Sci. 7 (1992) 467-511. [17] Gianola D., Foulley J. L., Sire evaluation for ordered categorical data with a threshold model, Genet. Sel. Evol. 15 (1983) 201-224. [1 8] Henderson C.R., Inverse of a matrix of relationships due to sires and maternal grandsires in an inbred population, J. Dairy Sci. 59 (1976) 1585-1588. [19] Heringstad B., Rekaya R., Gianola D., Klemetsdal G., Weigel K.A., Bayesian analysis of liability of clinical mastitis in Norwegian cattle with a threshold model: Effcts of data sampling method and model specification, J. Dairy Sci. 84 (2001) 2337-2346. [20] Jensen J ., Wang C.S., Sorensen D.A., Gianola D., Bayesian inference on variance and covariance components for traits influenced by maternal and direct genetic effects, using the Gibbs sampler, Acta Agric. Scand. Sect. A. Animal Sci. 44 (1994) 193-201. [21] Kizilkaya K., Banks B.D., Carnier P., Albera A., Bittante G., Tempelman R.J., Bayesian inference strategies for the prediction of genetic merit using threshold models with an application to calving ease scores in Italian Piemontese cattle, J. Anim. Breed, Genet. (in press). [22] Lange K.L., Little R.J.A., Taylor J .M.G., Robust statistical modeling usin the t distribution, J. Am. Stat. Assoc. 84 (1989) 881-896. [23] Luo M.F., Boettcher P.J., Dekkers J .C.M., Schaeffer L.R., Bayesian analysis for estimation of genetic parameters of calving ease and stillbirth for Canadian Holsteins, J. Dairy Sci. 82 (1999) 1848. [24] Luo M.F., Boettcher P.J., Schaeffer L.R., Dekkers J .C.M., Bayesian inference for categorical traits with an application to variance components estimation, J. Dairy Sci. 84 (2001) 694-704. [25] Manfredi E.J., San Cristobal M., F oulley J .L., Some factor affecting the estimation of genetic parameters for cattle dystocia under a threshold model, Anim. Prod. 53 (1991a) 151-156. 94 [26] Manfredi E.J., Ducrocq V., Foulley J .L., Genetic analysis of dystocia in dairy cattle, J. Dairy Sci. 74 (1991b) 1715-1723. [27] Mayer M., Inequality of maximum a posteriori estimators with equivalent sire and animal models for threshold traits, Genet. Sel. Evol. 27 (1995) : 423-435. [28] McGuirk B.J., Going 1., Gilmour A.R., The genetic evaluation of beef sires used for crossing with dairy cows in the UK. 2. Genetic parameters and sire merit predictions for calving survey traits, Animal Sci. 66 (1998) 47-54. [29] McGuirk B.J., Going 1., Gilmour A.R., The genetic evaluation of UK Holstein Friesian sires for calving ease and related traits, Animal Sci. 68 (1999) 413-422. [30] Newton M.A., Raftery A.E., Approximate Bayesian inference with the weighted likelihood bootstrap, J .R. Stat. Soc. Ser. B. 56 (1994) 3-48. [31] Rekaya R., Weigel K.A., Gianola D., Application of a structural model for genetic covariances in international dairy sire evaluations, J. Dairy Sci. 84 (2001) 1525- 1530. [32] Rosa G.J.M., Robust mixed linear models in quantitative genetics: Bayesian analysis via Gibbs sampling, in: Proceedings of International Symposium on Animal Breeding and Genetics, 21-24 September 1999, Brazil, pp. 133-159. [33] Rubin D.B., Iteratively reweighted least squares. In Encyclopedia of Statistical Sciences, 4 (1983) 272-275. [34] Satagopan J .M. Newton M.A., Raftery A.E., Easy estimation of normalizing constants and Bayes factors from posterior simulation: stabilizing the harmonic mean estimator, Technical Report 1028, Department of Statistics. 2001. http://www.stat.wisc.edu/~newton/papers/abstracts/trl 028a.html [3 5] Searle S.R., Matrix Algebra Useful for Statistics, John Wiley & Sons, NewYork, 1982. [36] Sorensen D.A., Andersen S., Gianola D., Korsgaard I., Bayesian inference in threshold models using Gibbs sampling, Genet. Sel. Evol. 27 (1995) 229-249. [37] Spiegelhalter D.J., Best N.G., Carlin B.P., van der Linde A., Bayesian measures of model complexity and fit, J .R. Statist. Soc. B 64 (2002) 1-34. [3 8] Stranden 1., Gianola D., Attenuating effects of preferential treatment with Student-t mixed linear models: a simulation study, Genet. Sel. Evol. 30 (1998) 565-583. 95 [39] Stranden I., Gianola D., Mixed effects linear models with t-distributions for quantitative genetic analysis: a Bayesian approach, Genet. Sel. Evol. 31 (1999) 25- 42. [40] Tempelman R.J., Gianola D., A mixed effects model for overdispersed count data in animal breeding, Biometrics 52 (1996) 265-279. [41] Uimari P., Thaller G., Hoeschele I., The use of multiple markers in a Bayesian method for mapping quantitative trait loci, Genetics 143 (1996) 1831-1842. [42] van Dyk D.A., Meng X-L., The art of data augmentation (with discussion). J. Comp. and Grap. Stat. 10 (2001) l-SO. [43] Varona L., Misztal 1., Bertrand J. K., Threshold-linear versus linear-linear analysis of birth weight and calving ease using an animal model: I. Variance component estimation, J. Anim. Sci. 77 (1999) 1994-2002. [44] von Rohr P., Hoeschele 1., Bayesian QTL mapping using skewed Student-t distribution, Genet. Sel. Evol. 34 (2002) 1-21. [45] Wang C.S., Rutledge J .J ., Gianola D., Bayesian analysis of mixed linear models via Gibbs sampling with an application to litter size in Iberian pigs, Genet. Sel. Evol. 26 (1994) 91-115 [46] Wang C.S., Quaas R.L., Pollak E.J., Bayesian analysis of calving ease score and birth weights, Genet. Sel. Evol. 29 (1997) 117-143. [47] Zhu L., Carlin B.P., Comparing hierarchical models for spatio-temporally misaligned data using the deviance information criterion, Statistics in Medicine 19 (2000) 2265-2278. 96 .Amama 530 we 838:8 85:03 0253 REE 05 m5? 838% 8098305 Mo 52:50 0280b“. 05... .352: £538 8:88 35.33 $8. douse-ow 23:8..." um :88 85809. .508 803% m 8m 330:9: coumaaom comm dousngmmu .0328“ 36300 53> uoEooam = 8:233 “8038.“ we moopwou v 53> coasnmammv 3668 a 53, @250on _ comm—among E 8. 2 z. - one 9.3 a 3% 2 ”$5 - $2 ”3:. 4 am 3 2 ~28 - 3.8 3&8 a 3.5 om 03.8 - 3.8 8.0: a 3; Na om owes - 8.4 3% a 2.8 a 2.32 - 3.9 SE. I. 02 7: mm ”Em - 32.. 3.2 a as :5 Sn - Ga 2d a a? E 3 8.2 - 8.4 SM 4 was R? 3.4 - a? 08 a NS. 2 E 32 - «3. a: a ”mm 3.3 :3. - 2a 98 a o; Z 38 an: :3 Em a SE 38 .E :3 new a. 2m .easo-=o§a£ 3520 «do aafiq .6on ...—E: 033383 05 main beam cows—58mm 5 A3 83:8qu :83on mo moocwow 3:28.. 23 :0 ooaocomfi 8538 A «Sun. 97 .AmaoC 8960 .8 88830 88888 gamma 3:8 05 wfiw: 8888 835388 .«o 838:: 0368.8 056 .1885 £58ng 8:88 35.33 $3. 83833 855m um :88 82.88%. .9833 m 8m 38888 sous—~88 comm 8889388 8358 888350 53, 3&8on a noun—smog ”8258..“ mo 88an v 53» 8085588 F5288 8 5m? vommooam H 8288903 98 t: a; - 5o 85 a N3 in? m; - So So a 3o w: o: 8o - 8d :3 a MS 23.. 3o - 5o 85 a N3 E a: a; - 3o 85 a :5 2a.: 2.3 - 3o 85 a o; 7: 8a :5 - 3o 85 a 3o 8%: 3.0 - 8d 8.0 a o; 2 85 N2 - 3o :3 a m; 3.2. mg - 3o 25 « Ed 8 33 a; - 86 So a :6 88% 8o - mod 85 a m; E ..mmm or; :3 ..am a sa ummm .5 $8 ..8 a :3 833-8388 9% 355 «do 32%: .388 x83 038383 05 86: 35% 828886 8 va 8583 8% no 88888 Summon .= 033—. .3883 m .8.“ 330:3." scum—:98 scam .aousnfimsu 3:28.— Swfimsmo 55 93:00am a cows—smog ”888$ mo moouwou v 53> cocsnwummu 3662 ~ 53, 2&6on H coumfimoma 33. Si 23. 3% $3. $3 a Ga- RS 3 83- g 3 £3- Ea $3”- $3 83.- on 3 Na 33. £3 $3- 3% 29m- $3 83.- :3 7: Sam- 8% $3- RE 331 83 move- 23 E 83- 23 $3- :3 Q 3- $3 $3. _ K.” 2 $3. $3 33. 83 23. 8% 83: S; E ,9: 05 dfi 05 as: 05 d3 05 “633-8338 388 .358 x57: gum—58:0 x5. :pofi 0333850 .258 8.5.: “macaw E58 Hobo 56950 3% :88 38 mo we .22 $8 Esmflo mafié .buBm cots—25m E 30on 50253 385388 326 vocati— HEwEE wo_ was ORG «route cow—Eco? 35305 A: 033,—. 99 HES—o 8:: 05 888 mmm E owcfi 8 “£8 H H mos—g BEBE imam: .836 Mo 838ch 8:268 0353 3:5 2: mafia 38:0 0202 8:: 05 888 8353 E35039: mo 698:: 938mm". 33 2:6 .EEBE b23305 Bremen uBEIBao $30 dozfifié wavcflm H 588 How—88mg downtomoc 8m “x8 coma Ex I 82 5% - 8: EH vac- - m3- m; H woo- 82 m3- - who- :6 H «We. a... :3 I 3: 8% I a: 83 3o - 3o :3 H :.o 85 m3 - So 35 H Ed a”: SHE I £1: 68; I mm: 53 m3 - ”do 85 H 93 ER one - one So H as a”: :53 I 08.2 38m I and H Ewe $3 I 33 cm; H 82 83 83 I M: S 23 H 2; Nb $8; I 3a :3 I 32 E ER 83 - Sod sod H E; 35 mod - mood 23 H 53 b 5» I $2 3% - 2.2 2 SE 83 - Rod 23 H 83 $2 83 - Sod _ :3 H 3.90 Nb 53 I 3m: :8; - 23 H :2 $2 - a; :55 H ES 53“ 83 - SS 085 H 3:. Nb ,.me Ora $3 Em H E Hmmm era :3 new H 35 “3935 388 H5: 33230 388 vi: “58 0333830 .233 88.8an SES— E 388 38 wag—8 mo Eouogm own—how :o 8:203 Hotonom .>— 933—. 100 .mhouogn mo Hon—En: 258% on? .883 :88 85609 no 893 853% 563mm,. 60.83% mammoxmm .«o omega on? god- 912 gm v3.2 wing m 330 «mod- :3; mg _mm.m _ 3&3 N 530 36.x- $12 2% mama“ wvmdfi _ EEO mend- 9%.: $5 2.5% fl QWS m 520 35.x- 3A.: awn and ~ 893 N 530 mend- 9&2 an Nana a v3.3 g £30 “505 3%:50 d3 93 0 am 985 an .238 8820805 5%: E 888 88 wag—3 mo imbued 2: é 32$ 32d Banzai 3538. 3 Ba 6:: 8:35 Homage? 859% ommoaméfio 0292 .> 2.3 101 a as us 8: uaaauaa 5S 355 a5 Boa “5:25:80 3%? #53 0333850 no woman mooaohobmv ~30on ohm 338:3 mo 338 Bremen .«o “BR—utmom A 0.59,.— E. .32.. 9.5358 3- Fl ”WI 1 aNielmm‘O me 102 .5 63 .«o 0:: womaEtoE—m 53, .308 Ma: .505 0363850 9%? ME? «33383 co woman moocouobfi >5on 2% 33833 .«o maggot “5933 8560a 90 “29838 .N «...—ugh , xc=wima «>32:an ad to no No to o l 1 ‘ w‘ . ‘ :11 “\‘w . - . . o to S m m 0o m to m We w ”I go No 103 CHAPTER 4 Bayesian structural modeling of heterogeneous residual variances in generalized linear mixed models ABSTRACT A Bayesian hierarchical generalized linear mixed model (GLMM) based on a structural multifactorial model with fixed and random effects that multiplicatively influence residual heteroskedasticity was developed. The two specific GLMM considered in this paper are the linear mixed mode] (LMM) for the analysis of normally distributed characters like birth weight and the threshold or cumulative probit mixed model (CPMM) for the analysis of ordinal categorical data like calving ease. We validated our models using Markov Chain Monte Carlo (MCMC) methods for posterior inference on parameters specifying residual heteroskedasticity. Three replicated datasets on normally distributed and ordinal categorical data from each of four different populations, characterized by various levels of residual heteroskedasticity, were generated. Residual heteroskedasticity parameters, including fixed and random effects specifications, were particularly well estimated for the analysis of normal data and when heteroskedasticity was extreme. Furthermore, the deviance information criterion (DIC) was useful in correctly choosing between heteroskedastic and homoskedastic models. Sire and maternal grandsire heteroskedastic linear and cumulative probit link threshold models were fitted, respectively, to birth weight (BW) and calving ease (CE) data on 8,847 Italian Piemontese first parity dams. The residual variance for male calves was significantly greater than that for female calves for both BW and CE, translating to substantial differences in the posterior means i standard deviations of direct heritabilities 104 for BW (0.32 i 0.05 for females versus 0.25 i 0.04 for males) and for CE (0.504 t 0.083 for females versus 0.370 i 0.064 for males). Although the heteroskedastic LMM and CPMM were chosen by DIC as the better fitting model for BW and CE, respectively, the high correlation between posterior means of sire effects (>0.97) suggested no meaningful rerankings. INTRODUCTION An important assumption in many genetic evaluation models is that variance components associated with random and residual effects are homogeneous across all conditions, whether it be across sexes of animal or environmental conditions, such as herds. These models are said to be homoskedastic. However, the existence of heterogeneous variances, or heteroskedasticity, for milk production ([16], [18]), grth performance (Garrick et al. [9]) and conformation traits (Robert-Granie et a1. [30]) has been firmly established. A number of possible causes for heteroskedasticity have been suggested, including a positive relationship between herd means and variances, differences across geographical regions, and changes in variability over time due to changing herd managements (Weigel et al. [40]). If this phenomenon is not properly taken into account in generalized linear mixed models, differences in within-subclass variances can result in biased breeding value predictions with disproportionate nmnbers of animals selected from environmental subclasses characterized by high variability. This bias potentially translates into a reduced genetic progress due to breedstock selection based on these predictions ([17], [40]). 105 Generalized linear mixed models (GLMM) can be readily extended to take into account heteroskedasticity. As an example, best linear unbiased prediction (BLUP) can be reliably used in breedstock selection when random effects and residual variances are heterogeneous, provided they are correctly specified across subclasses characterized by unique variance components (Gianola [12]). However, variance components are rarely known and may need to be estimated for a large number of small subclasses when heteroskedasticity is present. Since most preferred methods of variance components estimation, including maximum likelihood (ML) and restricted maximum likelihood (REML), have only asymptotically desirable properties, such procedures may not yield reliable estimates within small subclasses ([40], [41]). Foulley et al. [7] introduced an empirical Bayes procedure for assessing sources of residual heteroskedasticity in Gaussian linear mixed models (LMM). Their method was based on the use of a structural log link model 7,. = Ina: = m3)» , where of, is the residual variance component specific to subclass i, m ,. is a known incidence matrix unique to subclass i connecting 0': to A. , a vector of unknown dispersion parameters. Hypothesis testing on elements of A. was based on an approximate marginal likelihood ratio test. In addition, Foulley et al. [7] and San Cristobal et al. [32] further extended the structural log link model such that 3. includes both "fixed" and "random" factor partitions. As in a classical linear model, the distinction between these two classes of effects for modeling heteroskedasticity is such that levels of a random factor derive from a probability distribution (Searle et al. [33]); that is, a structural prior is placed on random effects. Conversely, fixed effects involve parameters that might be characterized by subjective or noninformative priors. For example, sex of animal might be considered 106 fixed whereas herds might be considered random with respect to a structural classification model for heteroskedasticity (San Cristobal et al. [32]). As with random location effects in classical linear mixed effects models, specifying structural priors on effects influencing heteroskedasticity is particularly beneficial in that statistical information on one effect (e.g. herd) is borrowed from the distribution of all such effects ' within the same factor (Robinson [31]). Foulley and Gianola [8] adapted the structural log linear fixed effects model for heterogeneous residual variances in a threshold or cumulative probit mixed model (CPMM), using likelihood procedures for residual heteroskedastic inference on calving ease in American Simmentals as a function of sex of calf and age of dam. Ducrocq [6] analyzed calving ease data of French dairy breeds (Normande and Montbeliarde) using the methods developed by Foulley and Gianola [8] method, considering the effects of sex, age of dam, parity, region, and year on residual heteroskedasticity on calving ease scores. Many of the proposed residual heteroskedastic models for animal breeding, however, invoke analytical approximations, which appear tenuous, particularly for the analysis of categorical data. Furthermore, we perceive the lack of a unifying framework for structural modeling of heterogeneous variances in GLMM analysis for both continuous production and categorical fitness traits. That is, structural prior or random effects specifications have been developed for Gaussian error models, as by Foulley et al. [7] and San Cristobal et al. [32] but not for any other GLMM, such as the CPMM, in animal breeding. The objectives of our study were 1) to develop and validate a Bayesian structural multiplicative model on residual variances for observed or augmented variables 107 in a heteroskedastic GLMM, concentrating on a LMM analysis of normal data and a CPMM analysis of ordinal data based on use of Markov Chain Monte Carlo (MCMC) methods and 2) to apply the model using MCMC to a calving ease dataset derived from the Italian Piemontese population. MATERIALS and METHODS HIERARCHICAL CONSTRUCTION OF THE HETEROSKEDASTIC GENERALIZED LINEAR MIXED MODEL In a number of GLMM, data augmentation schemes exist such that a nxl vector of either observed or augmented variables L = {L}; conceptually maps one-to-one to n i: the data vector Y ={ ,.} l . Examples where such augmented variables are useful include threshold models (Sorensen et al. [34]) and censored data models (Sorensen et a1. [3 5]). We write a linear mixed effects model as L=XB+Zu+e (1) where B is a vector of fixed location effects, u is a vector of random location effects, X and Z are known design matrices and e is a vector of normally distributed residuals with variance covariance matrix R having a certain heteroskedasticity specification as defined later. The linear model in (l) is equivalent to the following distributional specification: LlB,u,R~p(Llli,u,R)=N(XB+Zu,R) (2) For normally distributed data as in a LMM, yEL such that p(Y | B,u) -_-= p(L | B,u) whereas for ordinal data with, say, C = 4 categories, L maps to y as follows: 108 rl to < L, S r, 2 r < L. s r K = T l I 2 (3) 3 Q0 is a random multiplicative scaling factor unique to the 1th level of the random effect. Note then that fixed and random effects specifications are provided on both location and residual dispersion parameters, although the classes of effects considered do not need to be the same as for the log structural heteroskedastic models of Foulley et al. [7] and San Cristobal et al. [32]. 109 The prior density on the fixed location effects [3 is specified [5 ~ 1303) (5) where p(fl) is a subjective prior, typically specified to be flat or vaguely informative. Furthermore, the random location effects are typically characterized by a structural multivariate prior specification: ulcp ~ p(uw) = N (0,G(¢)) (6) Here C((p) is a variance-covariance matrix that is a function of several unknown variance components or variance-covariance matrices in q) , depending on whether or not there are multiple sets of random effects and/or specified covariances between these sets; an example of the latter is the covariance between additive and maternal genetic effects. Furthermore, flat priors, inverted Gamma densities, inverted Wishart densities or products thereof may be specified for the prior density on (p , ‘P ~ p(v) (7) depending, again, on the number of sets of random effects and whether there are any covariances thereof (Jensen et al. [20]). A subjective conjugate inverted-gamma prior density or, alternatively, a flat prior density may be specified separately for each 6': , i.e. 53, ~P(53.) (8) for k =1 ,2,. . .,s. Conversely, a structural prior is used to model the random residual dispersion effects, 6,, 1: 1,2,. . .,t. We conveniently choose this structural prior to be an inverted-gamma density with parameters on. and one-1, 110 —1 a" _a_ —1 p(6,la.)oc(—“I:(a—))—(6,) ‘ . "emf—“:5 ); I= 1,2,....1. (9) e I Here E(§,) = land Var(6,) = 1 2 such that as a, ——> oo , the random dispersion effects a _ influence on residual heteroskedasticity diminishes. Note that with the specification in (5), there is a borrowing of information across levels 1= 1, 2,. . ., t of the random factor, just as there is for classical random effects modeling of location parameters. Generally, a, is unknown such that a subjective prior may be placed on it. One noninformative prior used in our applications is specified as follows: 6!. ~ p(a.) 0C 1 (1+a¢)2 (10) which is identical to specifying a Uniform(0,1) prior on 1 + a) The remaining hierarchical specifications in this heteroskedastic GLMM depend upon the first (data sampling) stage of the n x 1 data vector y. For a LMM analysis of normal error data, equation (1) would suffice (i.e. y a L) such that no augmented variables are required, whereas for a CPMM analysis of ordinal data with C ordinal categories, numbered j = 1,2,. . .,C, we would specify the first stage of our hierarchical model using Sorensen et al. [34]: p(y I L,‘r)=1—"-I{Zl(‘rj-l < Likl < Tj)1(yild :j)} (Ila) i=l j=| followed by, using a scalar representation of (2), L,“ |B,u,6‘:6, ~ N(x;k,B+z;k,u,6:6,). (11b) 111 Here L,“ and y,“ are the ith elements of LH and y“, respectively, whereas x'm and 2',“ are known incidence row vectors corresponding to subject i within the )‘d'’1 residual variance subclass for k = l,2...,s,l=1,2,...,t, and i = 1,2,...,nk1. As before,1' = [To 1, 1C], denotes a vector of unknown threshold parameters that delimit the augmented variables L into their respective observed data bins y with 1(.) denoting the indicator function having value 1 if the condition within the function is true, being 0 otherwise. Recall previously that To = -oo and rc = +00 but further note that in addition 1:1 is fixed to an arbitrary constant in ‘order to satisfy identifiability constraints as in Sorensen et al. [34]. We further adopt the alternative parameterization presented by Sorensen et al. [34] in which residual variance is explicitly modeled, rather than typically constrained to equal to 1, as typically invoked in homoskedastic error CPMM (Gianola and Foulley [13]). This specification thereby requires one additional constraint on 1' , such that C-3 parameters in 1' are uniquely identifiable. A prior distribution on the uniquely identifiable elements of 1' may be specified provided that the order constraints on elements of r are satisfied (Sorenson et al. [34]), i.e. 'r'=[t2 t3 rm] ~p(1:); rz. 666666.666.” 666NNNH6N666 v. . .-..66 66.33.. 6. . 766.6 6663.6.— wm6-a . .6 3.6HnN.6 a. 6.566 .66“. . .6 N->. NNN . mm-m . .mm mw.vvaHE..m6m 66. .-666 6666666 66. .-N66 3.366.. 666.6 . .6 V6.6H6N6 mN6-6. .6 V6666. .6 .->. 6N.”...N-NNN. 6m. .6 . H60 . 6 6m. .-6. .. 66.6HNN. N. . . -m66 m66HN6.. 3.6-3.6 n66HNm6 9N6-666 V66Hm . .6 ...--... 66169186. hm. .6NH6v. .6. . ...-NM. 666“.” . N. . w . . .-666 66.366. 3.6-8.6 666HNm6 .m6-N. .6 666fioN6 N-... 66.35-.. . . .N 6N.6¢6HN666. 6m. .-6. .. 66.6H..N.. vo. . 66.6 V6.6H666 26... . .6 V66HmN6 6. 6.66.6 M66HN. .6 .-... 66.6N-mv6 3.66666. 66. .-NN.. 3.33.. m . . . -N66 66.6HN6. $6.36 66.3.66 6 . 6-8.6 M66HN. .6 m-.. . .6.-V6.6 N. .NH . N.6 an. .-. ... ..66HVN. m6. 736 66.366 6m6-6N6 66.6HwN6 vN,6-666 36Hm . .6 N-.. M6.N.-m6.n aw. .Hn66 .v. ...m. .. >66H6N.. w . . .-m66 66.6H66.. 3.6... . .6 V66HVN6 N66. .6 66621.6 . .6 .-.. hm.v-6n.N 3.63%..“ 6m. .-.6.. 6. 6Hw... 66..-»..6 66.6“.66 mv.6-mN6 mo6me6 v. 6.666 N66H666 m. 6N.v-mm.N 3.356 S. 756.. 6. .6HmN.. 6N. 75.6 666HN6.. Nv6-NN6 36“.”. m6 .N6-m66 M6666 . .6 N-. m 0.1m m .N 6V6HNm...” 6N. .-m66 66.6H66.. m6. .656 66.3666 26... . .6 V6.6HVN6 .N.6-..6.6 86“.”... .6 .-. 56°66 Qwfizn. 7.6966 Omani“. 7.6.366 Oman—)7. 7.6.666 QmHZn. 7.6.666 QmHSr. magma-60.3.6606 b b 30 .b NI! NI .0608 flue-.60 .0868 .805. 60.3 6636 0066.586 0. $8258.. 90006882. .05 606.63% 05 no 85.86:. 8.580.. .. 9.63.. 135 .m.N6m-N . .6. $5633. .6 .6. .66.. 6N6H66.. Nv..-n66 N. .6Hv. . . mm6-VN6 566H6m6 6N6-666 36%.». .6 m->. 65.56.1666 66.666 366.66. 6N. .666 666H. ... NN. .-m66 566H56.. v6.6.5 . .6 ~.6.6H.~N6 5. 6.666 66.6H. .6 N->. 56.6.6 m-V6éN 3.6663” . N.mN5 v. . T566 566H66.. n . . . -666 566HN6.. 666-6. .6 66.6.IT5N6 VN6-666 V6666 . .6 .->. 5.». . 6v-65.6 66.656 3.6665. 66. .-m N. . 6. 663%.. 5N. .-N66 66.6H66.. .m6-mN6 5666666 6N66. .6 66.6H5 . .6 6... 66666-6 . .6. 56. .56H5m66. $1-56.. 666HmN.. 6N. .-.66 566HV6.. 6v.6-NN6 666”: m6 .m6-N. .6 m66H6. .6 N-... V6.6mVN-.6.m. 66.6566HN6.v65 mm. .-m . .. 6. 6Hmm. N. . .-566 6666666 666-5 . .6 666HmN6 6. 6-666 666H. .6 .-.: V6.N6 .666 Nvdo . H566.q 66. .66.. m. .6HVN.. .N. .666 66.6H66. 6v6-mN6 56.3666 6. 6-566 66.6HN. .6 m-.. 6N. . N666 v66“ . 6.6. ...N-m . .. 3636.. .N. .666 6. 6H . 6.. mV6-6N6 66.6H6m6 6N6-666 V66H6. .6 N-.. 5v.»- . - . 66 6N.N-H65.5 N6. .-666 . .6H66.. 66.6- . 5.6 566HV66 6N6-v. .6 3.6H6N6 .N6-666 66.6.4.6. .6 .-.. mm.6-N6.m 666Hmv... 5N. .656 m . 6HV66 . .. .656 6. 6H666 6.6-6 . .6 666H6N6 m . 6-66.6 666H666 6-. V6.v-mm.N 666H6v6 6m. .-N6.6 . .6HN. .. .N. .666 6. 6..-.66. . 6v6-6N6 66.6H6N6 6N6-566 666Hm. .6 N-. 66.6-6m.N V6.6H.6.m .v. .-m66 N. .6Hm . .. v. . .-656 666HV66 666-5 . .6 66.6HmN6 6N6-666 V66Hm .6 .-. 7.51666 Qmfiia. .n.n.o\om6 DmHSE 7.6.0666 DmHEn. 7.51666 Qmfizn. 7.6.366 QmHEA. 30889-623369. .6 flab b ..b b NI NI .0on 606.8 .58.. 033.6850 66.0: 6.030 5.8.68.0 6. 0.0.0880». 66.056880”. .05 8.000606 05 co 00:205.. 8.60600“. ... 0.6:... 136 .3830 085 05 $88 mmm 5 owfiu 9 8.3.. H H mos—g BEBE 2N3: 8500 no 88850 8:268 2660a 32$ 2: wEm: Basso 0202 025 05 88% 838% “59.335 .«o “38:: gauche .89 25. .7352: bsmnmnenm .85me 3:84.33 $30 .33 I 03.: $3 I ~35 .cowmgoc @3983 H :38 85609. downtomoc 3m «x2 coma 35.: E: I 23 2:... H 8m; 3.2 mad I 0de «23 H 89o A 3% so? I 83; $3 I and 3&2 8% I $2 3&3 H 88.... RS: 23 I max on; H 83 a :05 I as $.33: EH ...: 33; 82. I ammo 285 H ”22 and $3 I a? 32 H $3 mm- R 88; I ca. s 88H I ”3d fig» 33 Had I 83 885 H 8:6 25 $5.: I 33 23 H 3:: mm 53 I 8a. a 53 I and $5 2% ”$2 - $85 285 H 82: 35 852 I ”8;: as; H m3: mm 823 I 03d 53 I E: H a3 8:. I a; 23 H 83 v3.3 33 I 82 83 H 83 Nb 3% I Sn :53 I 83 .5. $3 $3 I 83 :3 H H85 83 $2 I 3:. 83 H 33 b a: I ”E a? I a: s 3 $2. I 23. 83 H v85 33 a; I 82 .2 3 H 093 Nb $8; I on: at; I a5: v. 33 £3 - 53 ~86 H :3 $3 $2 I $3 33 H mm: Nb Hmmm or: :3 Em H 33 Hmmm can :3 ..am H E “sagas; $38 Soto 3me H58 95.25850 338 98%? 3me H354 .238 88.8805 Swan: 5 $88 38 $330 was Emma? is mo manages 393.5, no 8:203 8560A A: «Bah 137 @530 085 05 $88 mmm E 0mg 2 Homo.— H _ was?» 3335 Xmas 639 we 88:58 8:263. 9683 REE 05 mama: £35 0202 025 2: $88 838% 262839: mo 338:: 028qu .83 on? .3325 bzssoa stoma 33.33 $5 donuts“. @3283 H 538 85509. downtown—V 8m ~on coma $3-83 $3-83 : - as - 33 $3 I :2 :3 H H23 23 $3 I 23 Sod H ”mod E u - H H we 53-23 $322,: 2 - :5 - 83 $3 I 335 Rod H 33 $3 83 I ”3o :3 H E; E m - ‘ H F 82-82 H _ 5-2: mg 82- - £3- 33 H $3- 23 23 I $2- :3 H 82- 5 83-23 86.3; E... - m8 $3 I «8.0 3:. H 33 23 a; I Sod Rod H mm; x ..H 3 $3.82 $2443.: - aim was I :3 33 H 32 2mm 33 I 23 Rad H £3 HE F 53E 38.3: 2 - ma $3 I 83 mm; H Rod :3 $3 I 83. 83 H wood E S $3.23 :2.§8.: I - $3 82 I 3% Hood H 22 $2 82 I $3 35 H 3% E ms Hmmm or: £3 Ham H $5 $3 OE $3 Dam H 3: $3835 338 3me we: H508 oifixsfidu 388 float». @828 Mods .238 omega—080E SE“: 5 880m 88 wag—mo 98 Emma? 5:: mo £80883 038% cc oocouomfi Showman .>— 03:. 138 $80883 no 5983 953mm 2? .889, :88 Summon no @093 853% 56989. 68838 5%on mo owfiozw 2E.“ 3%. MRS 2» 32.2 ”3.2 m 520 was- mane m8 01.2 ”3.2 N 596 3%- $5: 3» 3.2 2&2 H 520 momummfiou—mOuowom 82- 9A.: 2: Eu: $92 m 520 8;- 3...: a: 35.3 892 N 520 83. 8...: g 2.2 Home _ 526 amoflmmvov—moaom 228 3 as. $3... 25 :3... 9%.? m 520 32.8- 3.2. woo 5.3. 3%? N 326 32.8. ”2.3 mg ~83. 9A.? _ £20 momumNVOXmoHoao: :18. 03.3 mg H33. 08.? m 520 v3.8- 03.3 08 3...: ca? N :35 v3.8- 08.3. 08 v8.3. 25.? _ 520 flmoflmfluoxmofiam 2:: as: 93 0 am .. 65 _. m .238 38:0an 52d: 5 880m 980 wag—mo 28 Emma‘s 5.5 mo mama—«SH 05 Sm was?» 325 ©0258:— EEwSE mo. was GR: 85:8 coumgomfi 853% o£oo%-5m:o 0202 .> 03:. 139 $me coumwafimm E mooaflobmv 05 .— «...—ME — _ 2:- OO 18N 18M seawwa 310 18.. 223 o o 2.5 o o 8w 140 ow mm .Bm é «2% 058323 8 7: £3 oséxoa? .N 2:5 .5: cm m1 ow nm em mm 0m mu 0— m o $8 I a 0‘! mean Jonarsod 93190 141 mo ow mm .mo .8 2% 058%-an 8 E .33 038mg? .m 955 3.82 Om m? o7 mm cm mm on mm c— m o o _ fiu G 0 mV n mu 0 .d 0 NF m. m .M‘ ... m... mw m“ 142 .6608 ESE 0:980meon 3%? 0538.883 co woman 3m .8.“ mooaobgv >5on 2% m0 838 How—88a mo «29.33% .9 0.53,.” .22... ounutmxmgoz WW1 onsepaxsmamu 143 .338 224 3333888: 3%? £33883 no woman 3m 8m moocouobmv xaowoa 2% mo 83333 335% Snowman .«o 8.983% .m 952..— 224 ozmaumxmgo: 3 N. v F no co to No o r _ h _ _ b _ O - Nd H w . a .. vo m S , 0.0 a p ,. 3 m m. I P m - Ne w v._‘ 144 .638 SEQ 038—on8.8qu 3%? 3863880: no woman m0 Sm mooaeobmu ~30on 9% MO 338 Showman .«o 8.933% .0 «...—ME SEQ 0550930on 1 PI 0 O ,. no- . m 09 .. m P P v _ _ _ W- by m. 3 O . m 0 - F 145 .388 SEC 0538—858: 3%? 2630x882 no 382 m0 Sm moocuuotmv acumen 2% mo meow—«3% 2353 Sigma .«o “29835 .5 «...—ur— sEo ufiaumfioso: 93 to 93 0.0 m3 No who to mod 0 - P b P p p _ . _ O I to m m . m , No x m B 7 no u... 3 w , v.0 w - md 146 CONCLUSIONS AND RECOMMENDATIONS Fitness traits such as conformation score, calving ease and ovulation rate affect livestock production profitability. This fact has been reflected in animal breeding as selection emphasis has shifted gradually away from yield to fitness traits in order to maximize farm income. Calving ease is particularly important for cost control to beef and dairy cattle production because of the increased risk that calving problems add to the survival of both calf and dam. Accurate inference on genetic parameters and genetic merit are important for effective sire selection strategies to improve calving ease. Variance components and their derivative genetic parameters in a threshold sire- matemal grandsire model analysis of calving ease have been historically estimated using approximate marginal maximum likelihood (MML) procedures based on expectation- maximization (EM). In order to assess the validity of approximate MML estimates, inferences were compared to those based on sequences of Markov Chain Monte Carlo (MCMC) sampling in Chapter H. Laplacian ML and MCMC point estimates of variance components and direct and maternal heritabilities were found to be significant for calving ease in first parity Italian Piemontese dams. However, the covariance between additive direct and maternal effects was found to be not different from zero based on MCMC-derived posterior probability intervals. Furthermore, the joint modal estimates of sire effects and associated standard errors based on MML estimates of variance components differed little from the respective posterior means and standard deviations using MCMC. The results suggest that it may not be necessary to apply computationally intensive MCMC methods for inferences on genetic parameters and sire genetic merits 147 using threshold models on large calving ease data sets, at least based on a conventional genetic evaluation model. A hierarchical threshold mixed model based upon a cumulative t-link rather than a cumulative probit link specification for the analysis of ordinal data was developed and applied to calving ease scores in Chapter III. The model and MCMC algorithm were validated on simulated liability and categorical data sets from normal and t distributions (with 4 degrees of freedom) using the deviance information criterion (DIC) and log marginal likelihood (LML). The simulation study indicated that inference on the t error degrees of freedom was reliable for the analyses of liability and ordinal data sets; however, MCMC mixing was problematic, especially for ordinal data. Both DIC and LML were able to choose the correct model in most cases. In the analysis of calving ease scores on Italian Piemontese cattle, the cumulative t-link model was overwhelmingly chosen as the best fitting model. However, the posterior means of direct and maternal heritabilities from cumulative probit link and t-link threshold models were found to be not meaningfully different from each other. Furthermore, the examination of posterior means of sire effects showed that there was no real difference between two models in terms of genetic rankings of sires. A surprising result is that the covariance between additive direct and maternal effects was found to be significantly negative in Chapter III, in stark contrast to the results presented in Chapter H. The result in Chapter 11 further appeared to be dependent upon whether or not herds were treated as random or fixed. In Chapter IV, a Bayesian hierarchical generalized mixed model based on a structural multifactorial model with fixed and random effects that multiplicatively influence heterogeneity of residual variance was developed. The models and MCMC 148 algorithms were validated on simulated normal and ordinal categorical data characterized by heteroskedastic residual structures. The simulation study further indicated that DIC and LML were useful in correctly choosing between heteroskedastic and homoskedastic models. Application of the method on Italian Piemontese calving ease data indicated that the residual variance for male calves was significantly greater than that for female calves; and that residual variances for some herds were significantly higher or lower than average. However, as in Chapter 2, the high correlation between posterior mean of sire effects from heteroskedastic and homoskedastic models suggested no meaningful reranking. It appears that better hierarchical Bayesian models confer primary advantages in more accurately modeling variability in calving ease without providing major perturbations on sire rankings. Whereas this may be true for cumulative probit mixed effects models, this is less likely to be the case for linear mixed effects models, as hinted by a heteroskedastic analysis of birth weight in Chapter IV. Furthermore, significant rerankings may be expected to occur in animal model rather than sire model specifications, particularly for dams. In future research, heterogeneous residual variance could be modeled as a function of age of dam, year or region as further extensions to the models considered in Chapter IV. Furthermore, heteroskedastic t-error based mixed models for residuals might be considered for the analyses of production and liability variables; work is already underway on this front with preliminary results indicating heteroskedastic t-error models as the best fitting relative to those models considered in this dissertation. 149 More significantly, heterogeneity of residual and genetic correlations might exist between birth weight and calving difficulty. As this phenomenon potentially influences national selection strategies, further work on providing greater modeling flexibility with normal and t-distributed homoskedastic and heteroskedastic error structures on both traits jointly is needed. 150 SITY ' lllj‘lijjjifljfljjllfi 26 jj‘m '