QUANTITATIVE GENETIC AND GENOMIC MODELING OF FEED EFFICIENCY IN DAIRY CATTLE By Yongfang Lu A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Animal Science Doctor of Philosophy 2016 ABSTRACT QUANTITATIVE GENETIC AND GENOMIC MODELING OF FEED EFFICIENCY IN DAIRY CATTLE By Yongfang Lu Residual feed efficiency (rFE) is an increasingly important novel trait for maintaining the economic and environmental sustainability of dairy cattle production. Given that rFE is a derived trait based on the key energy source trait, dry matter intake (DMI), and key energy sink traits like milk energy (MILKE) and metabolic body weight (MBW), a more thorough quantitative genetic study of rFE would provide important baseline knowledge for dairy cattle management. A dairy consortium dataset focused on eventually developing genetic evaluations for rFE in US Holsteins, has been collected on DMI, MILKE, and MBW records from nearly 7000 dairy cattle in four countries. My dissertation was designed to address some motivating quantitative genetic research questions that could be addressed from the analyses of this dataset. In Chapter 2, I reassessed the merit of using residual feed intake (RFI), defined as the estimated residual from regressing DMI on energy sink traits, as a measure of rFE, given that it is most commonly used in dairy cattle breeding research. We proposed a multiple trait (MT) modeling strategy involving all of the component traits, demonstrating that the use of the Cholesky Decomposition (CD) on the genetic and residual variance-covariance matrices lead to a more potentially flexible quantitative genetic approach to modeling rFE compared to RFI. The advantages of this MT approach were confirmed by simulation when the genetic versus residual relationships between energy sink and source traits were rather divergent. However, there appeared to be no meaningful differences in quantitative genetic inferences when applied to the dairy consortium dataset. Similar conclusions on non-distinctions between the two models for genome wide association (GWA) analyses were also drawn in Chapter 4 although the MT GWA analyses shed further light that quantitative genetic inferences on DMI are distinctly independent of those on rFE. I also investigated heterogeneous genetic relationships across environments for rFE with two broadly different approaches. In Chapter 3, I extended the MT model to discover substantial heterogeneity in genetic and residual partial efficiencies and variance components defining rFE as functions of environmental and management factors. Subsequently, I inferred upon genotype by environment interaction at the genomic level across environment conditions in Chapter 5, determining that some genomic regions are sensitive to environmental covariates such as average production and temperature. iv ACKNOWLEDGMENTS This dissertation could not have been completed without the assistance and support from many professors, colleagues, and my family. I would like to acknowledge people who have made great contributions to my PhD research and my life. First, I want to express my sincere appreciation to my advisor, Dr. Robert Tempelman, who has offered unconditional support and understanding over the years. Thank you for inspiring and thoughtful guidance through my graduate training. You have continually and convincingly conveyed a spirit of adventure in regard to research and scholarship. I would like to thank my committee members, Dr. Mike VandeHaar, Dr. Juan P. Steibel, Dr. Nora M. Bello, and Dr. Andrew O. Finley for their encouragement, insightful comments, inspiring discussions, and useful suggestions. I appreciate your patience and understanding over the years. Special thanks to Dr.Mike VandeHaar, who has made every effort to create feed efficiency consortium dataset. My whole dissertation was based on the viability of this dataset. In addition, thanks to my colleagues and department of Animal Science. They have supported me all the time. I am very grateful for the friendship of all of the members of the animal genetic and breeding research group. I have worked closely and puzzled over many research and personal problems with them. Finally, many thanks to my parents and dear sister. They have let me pursue PhD degree freely and encouraged me as always. Also, to my husband, thank you for taking care of me and bringing so much happiness to my life. v TABLE OF CONTENTS LIST OF TABLES ....................................................................................................................... viii LIST OF FIGURES ........................................................................................................................ x Chapter1: Introduction .................................................................................................................... 1 1.1 Data Resources ..................................................................................................................... 3 1.2 Research Issues to Resolve with Genetics of Residual Feed Efficiency .............................. 4 1.3 Statistical Approaches .......................................................................................................... 7 1.4 Specific Objectives ............................................................................................................... 7 Chapter2: An Alternative Approach to Modeling Genetic Merit of Feed Efficiency in Dairy Cattle ............................................................................................................................................... 9 2.1 Abstract ................................................................................................................................ 9 2.2 Introduction .......................................................................................................................... 9 2.3 Materials and Methods ....................................................................................................... 12 2.3.1 Proposed MT Model ................................................................................................... 12 2.3.2 Functional Similarities between RFI and Proposed MT Model ................................. 15 2.3.3 Simulation Study ........................................................................................................ 17 2.3.4 Application to Dairy Consortium Data ....................................................................... 19 2.4 Results ................................................................................................................................ 23 2.4.1 Simulation Study ........................................................................................................ 23 2.4.2 Application to Dairy Consortium Data ....................................................................... 28 2.4.2.1 Variance Component Estimates .......................................................................... 28 2.4.2.2 Partial Efficiencies/Regressions .......................................................................... 29 2.4.2.3 Correlation between EBV ................................................................................... 30 2.4.2.4 Cross-validation Study ........................................................................................ 31 2.5 Discussion .......................................................................................................................... 32 2.6 Conclusions ........................................................................................................................ 36 Chapter3: Modeling Genetic and Non-Genetic Variation of Feed Efficiency and its Partial Relationships between Component Traits as a Function of Management and Environmental Factors ........................................................................................................................................... 37 3.1 Abstract .............................................................................................................................. 37 vi 3.2 Introduction ........................................................................................................................ 37 3.3 Materials and Methods ....................................................................................................... 40 3.3.1 The Multiple Trait Model ........................................................................................... 40 3.3.2 Application to Dairy Consortium Data ....................................................................... 47 3.3.2.1 Data Source and Editing ..................................................................................... 47 3.3.2.2 Model Comparison.............................................................................................. 49 3.3.2.3 Cross-validation Assessment .............................................................................. 50 3.3.2.4 Marginal Means .................................................................................................. 51 3.3.3 Simulation Study ........................................................................................................ 51 3.3.4 MCMC Implementation ............................................................................................. 52 3.4 Results ................................................................................................................................ 53 3.4.1 Analysis of Dairy Consortium Data ........................................................................... 53 3.4.2 Inferences on Factors Influencing PE ......................................................................... 53 3.4.3 Inferences on Variance Components and Heritability of rFE .................................... 60 3.4.4 Cross-validation .......................................................................................................... 64 3.4.5 Simulation Study Assessment .................................................................................... 65 3.5 Discussion .......................................................................................................................... 66 3.6 Conclusions ........................................................................................................................ 71 Chapter4: Genome Wide Association Analyses based on Alternative Strategies for Modeling Feed Efficiency ............................................................................................................................. 72 4.1 Abstract .............................................................................................................................. 72 4.2 Introduction ........................................................................................................................ 73 4.3 Materials and Methods ....................................................................................................... 75 4.3.1 The MT Model............................................................................................................ 75 4.3.2 Single SNP Associations ............................................................................................ 78 4.3.3 Window-Based Associations ...................................................................................... 79 4.3.4 Application to Dairy Consortium Data ....................................................................... 80 4.3.4.1 Data Description ................................................................................................. 80 4.3.4.2 Real Data Analyses ............................................................................................. 83 4.4 Results and Discussion ....................................................................................................... 84 4.4.1 Genetic Parameter Estimates ...................................................................................... 84 4.4.2 Single SNP Associations ............................................................................................ 85 4.4.3 Window-Based Associations ...................................................................................... 88 vii 4.4.4 General Discussion ..................................................................................................... 93 4.5 Conclusions ........................................................................................................................ 96 Chapter5: Modeling genotype by environment interaction over multiple covariates for genome-wide association analyses of residual feed efficiency in dairy cattle ............................................ 98 5.1 Abstract .............................................................................................................................. 98 5.2 Introduction ........................................................................................................................ 99 5.3 Materials and Methods ..................................................................................................... 101 5.3.1 Conventional Genomic Prediction Model for Residual Feed Intake ........................ 101 5.3.2 Reaction Norm Extension Involving Multiple Covariates ....................................... 102 5.3.3 Backsolving for SNP-specific Effects ...................................................................... 104 5.3.4 Genome Wide Associations on SNP-specific Overall Effects and Reaction Norms 104 5.3.5 Simulation Study ...................................................................................................... 108 5.3.6 Application to RFI Data ........................................................................................... 109 5.3.7 Cross-validation Assessment .................................................................................... 111 5.4 Results .............................................................................................................................. 111 5.4.1 Simulation Study ...................................................................................................... 111 5.4.2 Real Data Analyses ................................................................................................... 114 5.4.3 Genome-wide Association Analyses for RFI ........................................................... 116 5.5 Discussion ........................................................................................................................ 119 5.6 Conclusions ...................................................................................................................... 123 Chapter6: Conclusions and Future Work .................................................................................... 124 6.1 Conclusions ...................................................................................................................... 124 6.2 Implications of Dissertation for Genetic Improvement and Control of Feed Efficiency . 127 6.3 Future Work ..................................................................................................................... 128 APPENDICES ............................................................................................................................ 131 APPENDIX A: Chapter 1 ...................................................................................................... 132 APPENDIX B: Chapter 2 ....................................................................................................... 136 APPENDIX C: Chapter 3 ....................................................................................................... 151 APPENDIX D: Chapter 4....................................................................................................... 158 REFERENCES ........................................................................................................................... 169 viii LIST OF TABLES Table 2.1: Ranges of values on seven key parameters for testing proposed multivariate model versus RFI model under a response surface design simulation study. .......................................... 18 Table 2.2: Frequency of number of cows and lactations by station in dairy consortium study.... 20 Table 2.3: Estimates of variance components, heritabilities, and repeatabilities of rFE from dairy consortium data analyses based on proposed multivariate approach versus RFI model. ............. 29 Table 2.4: Estimates of partial efficiencies based on proposed multivariate model in dairy consortium data analysis ............................................................................................................... 30 Table 2.5: Cross Validation Prediction Accuracies for each of 5 different validation datasets (fold) based on proposed (MT) versus RFI analyses of dairy consortium data ...................................... 31 Table 3.1: Distribution of cows by region. ................................................................................... 48 Table 3.2: Posterior inferences characterizing heterogeneity of genetic and residual partial efficiencies of DMI1 on MILKE2 .................................................................................................. 55 Table 3.3: Posterior inferences characterizing heterogeneity of genetic and residual partial efficiencies of DMI1 on MBW2 .................................................................................................... 56 Table 3.4: Posterior inferences characterizing heterogeneity of genetic and residual variation of DMI1|MILKE2,MBW3 .................................................................................................................. 62 Table 3.5: Inferences on station and parity specific heritabilities for DMI1|MILKE2,MBW3...... 64 Table 3.6: Cross validation prediction accuracies for each of 5 different validation datasets (fold) for homogeneous and heterogeneous models. .............................................................................. 65 Table 3.7: Summary on 95% highest posterior density interval (HPD) and coverage probabilities across 20 replicated datasets in simulation study ......................................................................... 66 Table 4.1: Brief characterization of dataset by stations1 ............................................................... 82 Table 4.2: Estimated variance components and heritability for residual feed efficiency and its component traits. ........................................................................................................................... 84 ix Table 4.3: Residual and genetic covariances between the three component traits ....................... 85 Table 4.4: Physical location1 (in Mb) of significant regions detected for each trait .................... 89 Table 4.5: Symbol, name, and involved biological functions of genes overlapped with 2 significant regions identified for residual feed efficiency in Holstein cows ................................ 91 Table 5.1: Brief characterization of dataset by stations .............................................................. 107 Table 5.2: Areas under receiver operating characteristic curve (AUC) between the classic GBLUP and the reaction norm (RN) model at different false positive rates. ............................. 112 Table 5.3: Estimated variance components and likelihood ratio test for residual feed intake based on the reaction norm (RN) models with various breakpoints on temperature (TP).................... 114 Table 5.4: Estimated variance components and likelihood ratio test for dry matter intake based on the reaction norm (RN) models with various breakpoints on temperature (TP).................... 115 Table 5.5: Symbol, name, and biological process of genes have reaction norm (RN) effects with temperature breaks at 65F (TP65) and contemporary group milk production (AP) for the residual feed intake ................................................................................................................................... 118 Table 5.6: Cross Validation Prediction Accuracies for each of 5 different validation datasets (fold) under the regular GBLUP and the reaction norm (RN) model. .................................................. 119 Table A.1: True values for each of seven key parameters in 162 run response surface design simulation study. ......................................................................................................................... 132 Table B.1: Posterior inferences characterizing heterogeneity of genetic and residual partial efficiencies of MBW1 on MILKE2 ............................................................................................. 147 Table B.2: Marginal Mean and Variance Component Inferences of Genetic and Residual Variance Components of MILKE1.............................................................................................. 148 Table B.3: Marginal Mean and Variance Component Inferences of Genetic and Residual Variance Components of MBW1| MILKE2 ................................................................................ 150 x LIST OF FIGURES Figure 2.1: Estimated (with standard error bars) mean residual (: grey bars) and genetic (: white bars) partial efficiencies under proposed multivariate model and estimated partial regression coefficients (: black bars) under RFI model for DMI on (a) MILKE (j=1) as a function of couplets of values for [] and for DMI on (b) MBW (j=2) as a function of couplets of values for [] in response surface design simulation study........................ 24 Figure 2.2: Estimated (with standard error bars) mean residual (: grey bars) and genetic (: white bars) partial efficiencies under proposed multivariate model and estimated partial regression coefficients (: black bars) under RFI model for DMI on (a) MILKE (j=1) as a function of couplets of values for [] and for DMI on (b) MBW (j=2) as a function of couplets of values for [] in response surface design simulation study........................ 25 Figure 2.3: Estimated (with standard error bars) mean parameters under RFI (white bars) vs proposed multivariate (black bars) models for (a) genetic variances versus , (b) residual variances versus and (c) heritabilities versus as a function of equivalent partial efficiencies ( defined as and ) or nonequivalent partial efficiencies ( defined asand ) in response surface design simulation study. Any bars not sharing the same letter within a pair are different from each other (P<0.05). ....................................................................................................................................... 26 Figure 2.4: Differences in accuracies of genetic merit prediction between proposed multivariate and RFI models versus for = 0.2(--), 0.3(--), 0.4 (--),0.5(--), and 0.6(--) in response surface design simulation study. .................................................. 27 Figure 2.5: Difference in accuracies of prediction of genetic merit between proposed multivariate and RFI models versus for being 0.0500(--), 0.0875(--), 0.1250 (--), 0.1625(--), and 0.2000(--) in response surface design simulation study. ...... 28 Figure 2.6: Scatterplot of EBV of rFE based on proposed multivariate model () versus RFI () model of dairy consortium data. Reference line of slope 1 and intercept 0 superimposed....................................................................................................................................................... 31 xi Figure 3.1: Posterior means of station-specific genetic (and ) and residual (and ) partial efficiencies versus estimated partial regression coefficients (and) previously reported by Tempelman et al., 2015) for DMI1 on MILKE2 (Panel a) and MBW3 (Panel a). .................... 59 Figure 4.1: Comparison of BLUP of SNP effects for RFI versus DMI conditional on MILKE and MBW. Line of slope 1 and intercept 0 superimposed. ................................................................. 86 Figure 4.2: Comparison of single SNP inferences (-log10 (p)) for DMI conditional on MILKE and MBW versus those for RFI. Line of slope 1 and intercept 0 superimposed. ......................... 86 Figure 4.3: The Manhattan plots of single SNP for DMI conditional on MILKE and MBW(a) and for RFI (b). Bonferroni correction threshold superimposed. ........................................................ 87 Figure 4.4:The Manhattan plots of single SNP on MILKE (a), MBW(b) , and DMI(c). Bonferroni correction threshold superimposed. ............................................................................ 88 Figure 4.5: The Manhattan plots of 1MB windows for DMI conditional on MILKE and MBW(a) and for RFI (b). Bonferroni correction threshold superimposed. ................................................. 90 Figure 4.6: The Manhattan plot of 1-Mb windows on MILKE (a), MBW(b) , and DMI(c). Bonferroni correction threshold superimposed. ............................................................................ 92 Figure 5.2: Receiver operating characteristic (ROC) curves for random intercept for 1Mb non-overlapping windows based on the reaction norm (RN) model and the classic GBLUP averaged across 10 replicates. Line of slope 1 and intercept 0 superimposed. .......................................... 112 Figure 5.3: Receiver operating characteristic (ROC) curves for random slopes on first to fourth order polynomial terms of AP for 1Mb non-overlapping windows based on the reaction norm (RN) model and the classic GBLUP averaged across 10 replicates. Line of slope 1 and intercept 0 superimposed. .......................................................................................................................... 113 Figure 5.4: Comparison on -log10(p) of SNP (a) and 1Mb window (b) for random intercept between the GBLUP and the reaction norm (RN) model. Line of slope 1 and intercept 0 superimposed. ............................................................................................................................. 116 Figure C.1: Scatterplot of log10(p) for SNPs (a) and windows (b) based on the homogeneous partial regression RFI versus heterogeneous partial regression RFI models. Line of slope 1 and intercept 0 superimposed. ........................................................................................................... 156 xii Figure C.2: Manhatten plots for windows on RFI based on all cows (a) and genotyped cows (b). Bonferroni correction threshold superimposed. .......................................................................... 157 Figure D.1: Manhattan plot for SNP-specific random intercept based on the regular GBLUP (a) and the RN model (b) .................................................................................................................. 163 Figure D.2: Manhattan plot for Window-specific random intercept based on the regular GBLUP (a) and the RN model (b) ............................................................................................................ 164 Figure D.3: Manhattan plot for SNP-specific random slope on the linear (a) and quadratic (b) terms of AP ................................................................................................................................. 164 Figure D.4: Manhattan plot for SNP-specific random slope on the linear (a) and quadratic (b) terms of TP65 .............................................................................................................................. 165 Figure D.5: Manhattan plot for SNP-specific random slope on the linear (a) and quadratic (b) terms of RH (1) and interaction between TP65 and RH (2) ....................................................... 166 Figure D.6: Manhattan plot for Window-specific random on the linear (a) and quadratic (b) terms of AP ........................................................................................................................................... 167 Figure D.7: Manhattan plot for Window-specific random on the linear (a) and quadratic (b) terms of TP65........................................................................................................................................ 167 Figure D.8: Manhattan plot for Window-specific random slope on the linear (a) and quadratic (b) terms of RH (1) and interaction between TP65 and RH (2) ....................................................... 168 1 Chapter1: Introduction According to the FAO, global food production must increase by 70% to support an anticipated population of 9.1 billion by 2050 (http://www.fao.org/fileadmin/templates/wsfs/docs/expert_paper/How_to_Feed_the_World_in_2050.pdf). Milk is a complete source of nutrients and milk production must increase even further, roughly 80%, by 2050 (Steinfeld et al., 2006). However, increased milk production is associated with higher feed consumption and concomitant greater greenhouse gas emissions. Feed cost accounts for 50-80% of total cost for dairy production (Connor et al., 2013, Sova et al., 2013). Furthermore, for every kg of milk produced, there is an average 1.5 kg of carbon dioxide equivalent of greenhouse gas emissions (GHG) (Hagemann et al., 2012) though Capper et al. (2009) indicated that improving milk production will reduce GHG for every unit milk production. Thus, selection on production traits without considering dry matter intake (DMI) may threaten the economic and environmental sustainability of dairy cattle production. Hence, there have been strong arguments put forward to improve feed efficiency (FE) in dairy cattle (Hayes et al., 2013, Pryce et al., 2015, VandeHaar et al., 2016). Overall feed efficiency has very broad definitions that generally can be classified into two types of traits: ratio based traits such as feed conversion rate, and residual based traits such as residual feed intake (RFI) (Berry and Crowley, 2013). Overall feed efficiency, defined as the ratio of feed intake captured in milk products, has been doubled for the past 50 years. This increase is mostly due to the dilution of maintenance; i.e., feeding cows with much more feed than the maintenance requirements to promote the proportion of feed utilized for milk production 2 (VandeHaar and St-Pierre, 2006, VandeHaar et al., 2016). Yet, VandeHaar et al. (2016) suggested that dilution of maintenance effect diminishes due to the decreased feed digestibility as cows consume higher levels of feed, and therefore more attention move toward residual based FE traits that define and enhance digestive and metabolic efficiency of feed. Thus, the overall feed efficiency has two major components, 1) feed digestive and metabolic efficiency and 2) dilution of maintenance (i.e. less feed being directed towards body maintenance). RFI and other residual based measures that evaluate variation in digestive and metabolic efficiency among animals that are independent of the dilution of maintenance are not complete measures of overall feed efficiency given they do not favor small cows with lower maintenance requirement. In this dissertation, we will refer to residual feed efficiency (rFE) as the feed digestive and metabolic efficiency that are defined by RFI or similar measures that do not necessarily equate with overall feed efficiency and economic efficiency. Note that selection for small body weight and hence a smaller maintenance requirement can be easily pursued in dairy breeding (Pryce et al., 2015). Whole genome prediction (WGP), based on the use of high density single nucleotide polymorphism (SNP) markers (Meuwissen et al., 2001) has recently played a prominent role in accelerating the genetic improvement of dairy cattle since it was implemented in 2009 (Garcia-Ruiz et al., 2016, Taylor et al., 2016). That is, higher accuracies of genetic merit prediction and larger genetic gains are achieved by using genomic information compared to traditional selection based on using only pedigree information. Genomic selection (GS) based on the WGP has proven to be an efficient way to improve economically important traits in dairy cattle. With the implementation of GS, genetic merit on milk yield has increased by 8% over the past 8 years as 3 reported by the Council on Dairy Cattle Breeding (https://www.cdcb.us/eval/summary/trend.cfm?). Genetic merit has also improved for reproductive traits as well (Garcia-Ruiz et al., 2016). It has also been noted that GS is particularly useful selection for lowly heritable traits (Calus et al., 2013). Multiple studies have reported the estimated heritability of rFE to be between 0.08 and 0.35 depending on various measures and populations. Hence, genetic variation for rFE appears to exist and should be considered in genetic selection programs (Vallimont et al., 2010, 2011, Pryce et al., 2014, Tempelman et al., 2015, VandeHaar et al., 2016). In particular, the use of GS for selecting and breeding cows with higher rFE appears to be promising in dairy cattle (Pryce et al., 2015, VandeHaar et al., 2016). Furthermore, the use of genome wide association analyses (GWA) could be also helpful to detect important genomic regions influencing rFE. Inference on genetic merit depends on the availability of phenotypic records from animals or their relatives. Similarly, the use of WGP and GWA relies upon a sufficiently large number of genotyped animals in order to obtain reliable and meaningful results. Thanks to the existence of national Dairy Herd Improvement (DHI) programs, milk yields and their components (i.e. fat and protein yields) on dairy cows have been regularly collected for decades. Furthermore, there are now over 1.4 million US dairy cattle that are genotyped (https://www.cdcb.us/News/08_11_16_CDCB_Livability_1_1_clean_002.pdf). However, DMI is currently not a routinely recorded trait as individual DMI records requires substantial 4 investments in facilities and labor. Hence, the lack of DMI data is a major obstacle to the genetic improvement in rFE. Combining data from different institutions is being increasingly used to assemble large enough datasets for genetic inferences (de Haas et al., 2012, Berry et al., 2014, Andersson et al., 2015, Hardie et al., 2015). Currently, there are several research consortia that involve the collection of DMI records from dairy cattle. The dairy feed efficiency project led by Michigan State University and funded by USDA-NIFA (United States Department of Agriculture National Institute of Food and Agriculture) has led to the creation of a database including records on DMI, milk yield (MY), milk components, body weight (BW), and body condition score (BCS) on nearly 7,000 cows from 16 different research stations in U.S., Canada, Dutch, and the United Kingdom (Tempelman et al., 2015); this is the dataset that has been used in this particular dissertation. The global DMI (gDMI) project led by Wageningen University also combined DMI data from multiple countries but with a primary objective of predicting genetic merit and selecting for DMI rather than for rFE per se (Veerkamp et al., 2013). There is yet another project involving dairy researchers from Canada, Australia, and New Zealand to perform genomic studies on residual feed intake (RFI). Some data has been shared across these three projects. The availability of the USDA-NIFA consortium dataset greatly facilitates genetic studies on feed efficiency and its component traits. With this dataset, I hope to answer some questions that are critical to genetic and genomic selection on rFE based on a relatively large population. 5 One question is how should we evaluate rFE in dairy cattle? There are a few different measures for rFE in dairy cattle, including feed conversion ratio, gross efficiency, RFI, and net energy efficiency (Koch et al., 1963, Kennedy et al., 1993, Berry and Crowley, 2013, Berry and Pryce, 2014). Among them, the currently most popular is RFI defined as the difference between actual DMI and that predicted based on energy sink requirements for production and maintenance (Koch et al., 1963). These predictions assumed that the partial relationships relating DMI to these energy sink traits are the same at genetic and non-genetic levels. Furthermore, the use of RFI requires that records on all the component traits used to compute RFI are available on each animal. Currently, the gDMI research group focuses on the analysis of DMI rather than RFI, in selecting for FE (Veerkamp et al., 2013). There have been reservations . A natural question is under what conditions would selection for DMI and RFI be equivalent? As indicated previously, assembling data from various research stations is necessary to build a sufficiently large dataset for inference; nevertheless, there may be issues in integrating data from rather disparate sources (Banos et al., 2012). Data quality can be partly addressed by invoking careful edits and filters. However, there may be heterogeneity at various levels which may somewhat contaminate inferences involving combined datasets if these issues are not appropriately handled. Heterogeneity in residual variances has been widely recognized and considered in breeding for many economic traits in dairy cattle (Lopez-Romero et al., 2003, Ronnegard et al., 2010, Ronnegard et al., 2013, Ou et al., 2016) as well as heterogeneity at the genetic level (Martin-Collado et al., 2015, Shirali et al., 2015, Tempelman et al., 2015). For 6 example, Tempelman et al. (2015) found that genetic variances for RFI varied greatly across countries. Also, de Haas et al. (2012) also reported low genetic correlations among DMI from different countries, suggesting DMI may be genetically heterogeneous across countries. Each of those two studies implied that heterogeneity was influenced by research station or even country. However, residual and genetic heterogeneity of rFE may be caused by various other factors including weather, location, parities, rations, farm management, etc (Martin-Collado et al., 2015). Inferring such heterogeneity in rFE could provide important base knowledge for fine-tuning management across production systems. Finally, GWA analyses of rFE require further research. There have been smaller scale GWA studies conducted for the rFE in dairy cattle populations (Bolormaa et al., 2011, Yao et al., 2013). Although Veerkamp et al. (2012) performed GWA analysis on DMI, no GWA study in dairy cattle has directly compared differences in GWA inferences between rFE and DMI to my knowledge. This is somewhat intriguing as there is some controversy as to whether to include DMI or rFE as a key trait for genetic improvement programs in dairy cattle (Berry and Pryce, 2014). With rFE, there are other considerations involving GWA inferences. That is, are allelic substitution effects in a certain portion of the genome of a consistent magnitude across environments or are these effects environmentally sensitive? To address these concerns, GWA analyses should attempt to model genotype by environment (GxE) interaction in which the allelic substitution effects for various genotypes are modeled as functions of potentially important environmental covariates. This phenomenon is believed to be fairly common for complex traits 7 within a wide variety of species including dairy cattle (Valdar et al., 2006, Dempfle et al., 2008, Goddard and Hayes, 2009). Information on GxE could be particularly exploited in dairy cattle breeding because of advanced reproductive technologies (e.g. artificial insemination, embryo transfer, synchronized breedings) that facilitate the ready exchange of germplasm across environments. By developing methodology and corresponding data analyses to answer these questions, I hope to develop some base knowledge on the complexities of genetic and environmental modeling of rFE that could be incorporated in future dairy cattle management schemes. In this dissertation, I proposed an alternative strategy for modeling rFE based on a multivariate (MT) model involving the components traits of rFE, namely DMI, milk energy (MILKE), and metabolic body weight (MBW). One can readily think of rFE as being DMI adjusted for (or conditional on) MILKE and MBW. The Cholesky decomposition (Pourahmadi, 2007) on the genetic and residual variance-covariance matrices in the MT facilitates such a characterization at both genetic and non-genetic levels and has been previously used to characterize the relationship between production and reproduction traits in dairy cattle (Bello et al., 2012). Exploring partial relationships between the component traits of rFE at genetic and non-genetic levels could allow a richness of insight not otherwise allowed with RFI modeling. This dissertation aims to provide a new perspective to evaluate and perform genetic analysis on rFE using the USDA-NIFA funded dairy consortium dataset. The first objective of this 8 dissertation is to demonstrate how the Cholesky decomposition of genetic and residual variance-covariance matrices in an MT model on the key components traits of rFE leads to a potentially more elegant characterization of rFE compared to anything else previously provided. A second objective is to explore potential heterogeneity of partial efficiencies and variance components of rFE at both genetic and non-genetic levels as functions of various environment and management factors. A third objective was to assess whether or not there might be any advantages to using the MT over the RFI model for GWA and particularly to assess how the associations may differ between the various component traits of rFE and rFE itself. My final objective was to investigate the impact of GxE through reaction norm modeling and to assess its impact on GWA and resulting inferences. All objectives involve the development of methodologies and/or models, followed by a simple simulation study assessment and an application to the USDA-NIFA consortium dataset. 9 Chapter2: An Alternative Approach to Modeling Genetic Merit of Feed Efficiency in Dairy Cattle Genetic improvement of residual feed efficiency (rFE) in dairy cattle requires greater attention given increasingly important resource constraint issues. A widely accepted yet occasionally contested measure of rFE in dairy cattle is residual feed intake (RFI). The use of RFI is limiting for a number of reasons, including interpretation, differences in recording frequencies between the various component traits that define RFI, and potential differences in genetic versus non-genetic relationships between the component traits. Hence, analyses focusing on dry matter intake (DMI) as the response are often preferred. We propose an alternative multiple-trait (MT) modeling strategy that exploits the Cholesky decomposition to provide a potentially more robust measure of rFE. We assessed both approaches by simulation as well as by application to 26,383 weekly records from 50 to 200 days in milk on 2,470 cows from a dairy rFE consortium study involving 7 institutions. Although the proposed MT model fared better than the RFI model when simulated genetic and non-genetic associations between DMI and component traits were substantially different from each other, there were no meaningful differences in predictive performance between the two models when applied to the consortium data. Residual feed efficiency (rFE) based on the efficient conversion of feed nutrients into saleable milk directly affects the profitability of dairy production. Thus, improving rFE in cattle is important to maximizing dairy production on limited inputs, especially as constraints on feed production become increasingly relevant. In addition, improving rFE is also of environmental 10 importance because more nutrients are directed into milk production with less nutrient loss in manure and methane excreted as rFE increases (Richardson and Herd, 2004). A popular measure of rFE is residual feed intake (RFI) which is defined as the difference between actual feed intake and that predicted based on requirements for production and maintenance or the so-(Koch et al., 1963). The typical statistical modeling strategy for RFI includes two stages (Tempelman et al., 2015). For reasons that will be explained later, we will refer to DMI as Trait #3 such that yi3 represents the DMI record for animal i. In the first stage, DMI, as the energy source, is typically specified as a linear function of various energy sinks plus other fixed or random (non-animal) effects that potentially influence DMI; i.e. . [2.1] Here represents a vector of various fixed effects connected to yi3 via known incidence row vector. Key energy sinks include milk energy (MILKE), metabolic BW (MBW) being defined as BW raised to the ¾ power and BW change (BW) all indexed by animal or record i in Equation [2.1]. Note then that b1, b2, and b3 are partial regression coefficients of DMI on energy sinks MILKE, MBW, and BW, respectively. Now RFIi is merely the estimated residual from Model [2.1], thereby repri; this variable is further typically specified as the response variable in a second stage variance components model: [2.2] 11 In Equation [2.2], is the overall mean, whereas is the animal genetic merit for RFI and connected to RFIi via known incidence row vector . Furthermore, for A being the numerator (Henderson, 1976) or genomic relationship matrix (VanRaden, 2008) or a hybrid of the two (Aguilar et al., 2010), whereas other potential random animal effects may include permanent environmental effects if there is more than one record per animal. Finally is a corresponding residual such that . Conceptually, Equations [2.1] and [2.2] could be combined together as one model as shown in Equation [2.3], thereby reinforcing that RFI is really just an adjusted measure of DMI; i.e. [2.3] As argued by Tempelman et al. (2015), analyses based on a single-stage model [2.3] may be desirable if effects in Equation [2.1] are not orthogonal to effects in Equation [1.2]. Given these limitations of RFI but, nevertheless, an inherent desire to meaningfully characterize rFE beyond a selection index involving DMI, we propose an alternative parameterization for multiple trait (MT) modeling of DMI jointly with key energy sink traits. Our objective is to demonstrate how this parameterization not only leads to a potentially more elegant characterization of rFE compared to RFI, but furthermore reconciles approaches focused on the analyses of RFI versus DMI (Berry and Pryce, 2014). 12 2.3.1 Proposed MT Model Our proposed strategy for characterizing rFE is based on the square root free or modified Cholesky decomposition (CD) (Pourahmadi et al., 2007) which our group has previously adapted for the joint analysis of milk production and reproduction data albeit in a non-genetic context (Bello et al., 2010). We apply this decomposition on each variance-covariance matrix (e.g., genetic and residual) partition in a MT model analysis on DMI and two key energy sink traits (i.e., MILKE and MBW). As with RFI modeling, we prefer to keep BW as a covariate for DMI even in this proposed approach, in part due to its seemingly very low heritability (<1%) and greater relative variability. Statistically, the order for the three traits is rather arbitrary; however, for modeling rFE, it is necessary to specify DMI as the last trait in the sequence as noted later. We write the MT linear mixed model for MILKE, MBW, and DMI in order as Traits 1, 2, and 3, respectively; extensions to any other number of rFE component traits are relatively straightforward, provided again that DMI is specified last. We momentarily assume one record per animal such that the model can be written in a classical quantitative genetics framework as: ; j=1,2,3 [2.4] Here is the vector of n x 1 responses for Trait j whereas denotes the vector of fixed effects for Trait j connected to by corresponding known incidence matrix X.j, whereas denotes the q x 1 vector of additive genetic effects for Trait j connected to by corresponding known incidence matrix Z.j.n x 1 13 Let u denote the vector of additive genetic merit for all three traits jointly; i.e. where elements are ordered by animals within traits. Suppose that we reorder u by traits within animals as with representing the genetic merit for the three traits on animal i. We specify the typical animal breeding MT modeling assumptions using where G is the 3 x 3 genetic variance covariance matrix for the three traits with diagonal element j, (j=1,2,3) being the genetic variancefor Trait j and off-diagonals defining the genetic covariance between Traits j and j(jjj) across the animals. Then, for A being the relationship matrix, as previously noted, and denoting the Kronecker product with q also counting potential ancestors without phenotypic records. Similarly, writing as the 3 x 1 vector of residuals for animal i, we assume where R is a 3 x 3 residual variance covariance matrix with residual variances () along the diagonal and residual covariances in the off-diagonals. A CD on G merely involves the following reparameterization of additive genetic effects for the three traits in a sequential manner: [2.5a] [2.5b] [2.5c] 14 -- [2.7] Now with DMI specified as Trait 3 in this MT sequence, is the genetic variance for DMI conditional on genetic effects of MILKE and MBW. As a corollary, represents the genetic merit of DMI conditional on the genetic merit of MILKE and of MBW for animal i with . It is this term that we propose as an alternative measure for 15 genetic merit of rFE as a counterpart to in Equations [2.2] or [2.3]. Negative values for indicates the animal i has favorably above average genetic merit for rFE with defining the genetic variance of our proposed measure of rFE. Of course, the magnitude of the partial efficiencies and factor into these relationships as well; i.e. together, these coefficients partly determine genetic covariance between each pair of the three traits; e.g., from Equation [2.7], . 2. 2. 2. 2.3.2 Functional Similarities between RFI and Proposed MT Model 2.- 16 2.-2.2. 2. 2.2.,2. 17 2.3.3 Simulation Study - 18 Table 2.1: Ranges of values on seven key parameters for testing proposed multivariate model versus RFI model under a response surface design simulation study. 1 Simulated values for each of 162 runs are provided in Appendix Table A.1. -2.-he , , was determined to be the EBV conditionally on the corresponding REML estimates of and . Similarly, the BLUP of, , using the proposed MT approach was based on the corresponding REML estimates of variance-covariance matrices G and R and subsequent CD-based transformations. Comparisons were made between the two models for estimates of partial efficiencies (MT model) versus partial regressions (RFI model) as well as for estimates of variance components and heritabilities. for the RFI analysis and the correlation between and for the proposed MT approach. Second order response surface model analyses were used to infer upon differences in accuracies between the two models as a function of the 7 parameters. 19 2.3.4 Application to Dairy Consortium Data Data source and editing: DMI, milk yields (MY), BW, all measured in kg, as well as fat, protein, and lactose components of milk were collected from Holstein cows at 7 different North American research stations with most of the data being a subset of data from a larger international consortium Tempelman et al. (2015). These research stations were University of Alberta (AB), Iowa State University (ISU), Michigan State University (MSU), the University of Florida (UF), the University of Wisconsin-Madison (UW), the USDA Dairy Forage Research Center (USDFRC) in Madison, WI, and the USDA Animal Genomics and Improvement Laboratory (AGIL) in Beltsville, MD. For AB, all the above mentioned traits were collected on a monthly basis except DMI; more detailed description of that data can be found in Berry et al. (2014) and Manafiazar et al. (2013). Data editing procedures and conversion to weekly records on DMI, MILKE, MBW, and were conducted as according to those outlined in Tempelman et al. (2015) with a data summary provided in Table 2.2. This included restricting the data from 50 to 200 DIM; i.e., at a period of lactation that is somewhat post-peak such that cows are not likely to be in a negative energy balance. Pedigrees involving a total of 8,721 animals going back 3 generations were procured as well. 20 Table 2.2: Frequency of number of cows and lactations by station in dairy consortium study. AB 1AB= University of Alberta, UF = University of Florida, ISU = Iowa State University, MSU = Michigan State University, USDFRC = USDA Forage Research Center, UW = University of Wisconsin-Madison, AGIL=USDA Animal Genomics Improvement Laboratory Statistical model and data analysis: All analyses were conducted using either the proposed MT approach or the single-step RFI model. However, given that there were multiple weekly records per lactation with some cows, in turn, having multiple lactations, it was necessary to additionally model two sets of permanent environmental effects. That is, the linear mixed model for DMI using the one-stage RFI approach in Equation [2.3] and for each of the three traits using the proposed MT approach in Equation [2.4] were as follows: ; j=1,2,3 [2.10] Equation [2.10] then slightly extends Equation [2.4] (and also Equation [2.3] for RFI analyses) in that two more sets of random effects were fitted. Here include the fixed effects of linear effect of , research station, parity, rations, and fourth order polynomial function of DIM for trait j. The vector u.j represents the genetic effects for Trait j as indicated previously. Furthermore, represents the vector of within-lactation permanent environmental effects for Trait j, identifying common environmental effects consistently associated with an animal within lactation such that is the corresponding incidence matrix. 21 Similarly, represents the vector of between-lactation permanent environmental effects for Trait j identifying common environmental effects consistently associated with an animal across lactations, with being the corresponding incidence matrix. Both sets of permanent environmental effects were also considered in Connor et al. (2013) and Tempelman et al. (2015). Suppose that for the MT analyses, we specify and across all traits. As we did earlier for u and e, we could reorder elements of pw and pb by traits within animals; i.e. rewrite and , where and are the within-lactation and between-lactation permanent environmental effects, respectively, for the three traits on animal i. We then assumeand ; i.e., Pw and Pb are 3 x 3 covariance variance matrices for within-lactation and between-lactation permanent environmental effects, respectively. With this reordering for pw and pb, we specify independence for these effects across animals; i.e. and . We also adapt the same CD-based reparameterization for elements of pw and pb as we did for u in Equations [5abc] and for e in Equations [2.8abc]. Thus, the SCVC for DMI|MILKE&MBW are estimated on four different levels, i.e., conditional genetic (), conditional within-lactation permanent environment (), conditional between-lactation permanent environment (), and conditional residual variance () for our proposed rFE variable. In other words, the corresponding heritability is simply the ratio of and , whereas the corresponding repeatability is the ratio of and . Similar representations of 22 heritabilities () and repeatabilities () for the RFI modeling approach were determined by extending Equation [3] as well, with further details provided in Tempelman et al. (2015). Estimates of variance components, heritabilities, and partial efficiencies at both genetic (i.e., u) and non-genetic (i.e., pw, pb, and e) levels were obtained by ASREML based on using the -covariance matrices (Gilmour et al., 2009), which directly provides estimates of SCVC and partial efficiencies and their approximate standard errors at the various levels. Furthermore, the correlation between the two models for EBV was also assessed. Cross-validation assessment: In order to test the robustness of our cross-validation tests with respect to different strategies of splitting the data, a hybrid data splitting strategy was developed for five-fold cross validation with application to the dairy consortium data. The entire dataset was initially split into two equal non-overlapping sets (Split A and Split B) of records on completely different animals. In Split A, the data were further randomly divided across animals into five nearly equally sized subsplits, i.e. each subsplit included all the records from a randomly chosen 20% of the animals within Split A. In Split B, the data were also randomly divided into five subsplits except that roughly 20% of the records from each animal was represented within each subsplit. One validation dataset was then based on randomly assigning one of the 5 subsplits from Split A with one of the 5 subsplits from Split B, such that the remaining data constituted the training data for that particular fold as part of a 5 fold cross-validation. This cross-validation thereby allowed a simultaneous assessment of the two competing models on both across and within animal cross-validation predictive abilities. The 23 performance metric used was the cross-validation predictive accuracy defined as the correlation between predicted DMI in the validation dataset based on estimates derived from the analyses of the training dataset. 2.4.1 Simulation Study Under simulated scenarios where partial efficiencies were equivalent at both genetic and residual levels, (i.e., and ), estimates of partial efficiencies were virtually identical to estimates of partial regression coefficients (b1 and b2) under the RFI analysis (Figure 2.1) as anticipated from Equation [2.9]. Conversely, under scenarios where partial efficiencies were not the same at the genetic and the corresponding residual levels, (i.e.,and ), partial regression estimates based on the RFI model were intermediate to the two corresponding partial efficiency estimates derived from the MT model; i.e. was intermediate to and whereas was intermediate to and (Figure 2.2). At any rate, whether or not genetic and residual partial efficiencies were equivalent (Figure 2.1) or non-equivalent (Figure 2.2), one can observe that the MT model provided unbiased estimates of these partial efficiencies. 24 Figure 2.1: Estimated (with standard error bars) mean residual (: grey bars) and genetic (: white bars) partial efficiencies under proposed multivariate model and estimated partial regression coefficients (: black bars) under RFI model for DMI on (a) MILKE (j=1) as a function of couplets of values for [] and for DMI on (b) MBW (j=2) as a function of couplets of values for [] in response surface design simulation study. 25 Figure 2.2: Estimated (with standard error bars) mean residual (: grey bars) and genetic (: white bars) partial efficiencies under proposed multivariate model and estimated partial regression coefficients (: black bars) under RFI model for DMI on (a) MILKE (j=1) as a function of couplets of values for [] and for DMI on (b) MBW (j=2) as a function of couplets of values for [] in response surface design simulation study. Under equivalent partial efficiencies at the genetic and residual levels, estimated variance components, and heritabilities were virtually identical for the two modeling approaches, again as anticipated; i.e., and such that (Figure 2.3). Conversely, under non-equivalent partial efficiencies, we observed that(P<0.05), whereas no significant differences were observed between and as well as between and . 26 Figure 2.3: Estimated (with standard error bars) mean parameters under RFI (white bars) vs proposed multivariate (black bars) models for (a) genetic variances versus , (b) residual variances versus and (c) heritabilities versus as a function of equivalent partial efficiencies ( defined as and ) or nonequivalent partial efficiencies ( defined asand ) in response surface design simulation. Bars not sharing the same letter within a pair are different from each other (P<0.05). Differences in accuracies of genetic merit prediction between the proposed MT and classical RFI approaches were determined for their prediction of u3|1,2. For the situation where the corresponding partial efficiencies were the same at both the genetic and residual levels, i.e., and , differences in the accuracy of genetic prediction were close to 0 or even slightly higher for the classical RFI model (Figures 2.4 and 2.5). Again, this result was somewhat anticipated given Equation [2.9] because the classical RFI approach is not only mathematically equivalent but also is more parsimonious compared to the proposed MT approach in those specific cases. Nevertheless, when genetic partial efficiencies were substantially different relative to the corresponding residual partial efficiencies, i.e., 27 and , our proposed MT model fared substantially better (Figures 2.4 and 2.5). In particular, the accuracy of genetic merit prediction was as much as 4 percentage points greater for the proposed approach when and were most divergent from each other (Figure 2.4). However, the difference for accuracy of genetic merit prediction between the two approaches was far less pronounced across various combinations of and as shown in Figure 2.5, likely in part due to their smaller range of investigated values relative to and . Figure 2.4: Differences in accuracies of genetic merit prediction between proposed multivariate and RFI models versus for = 0.2(--), 0.3(-------- 28 Figure 2.5: Difference in accuracies of prediction of genetic merit between proposed multivariate and RFI models versus for being 0.0500(--), 0.0875(---- -- --. 2.4.2 Application to Dairy Consortium Data 2.4.2.1 Variance Component Estimates Estimates of variance components, heritabilities, and repeatabilities using each of the two modeling approaches on the dairy consortium data are provided in Table 2.3. Heritability estimates were very similar; the estimate (± standard error) of using the proposed MT model was 0.14 ± 0.03, whereas the estimate of using the one-stage RFI model was 0.16 ± 0.03. Repeatability estimates were also very similar (i.e., 0.64 ± 0.01 versus 0.65 ± 0.01) between the two modeling approaches. 29 Table 2.3: Estimates of variance components, heritabilities, and repeatabilities of rFE from dairy consortium data analyses based on proposed multivariate approach versus RFI model. - - 2.4.2.2 Partial Efficiencies/Regressions Estimated partial efficiencies based on the proposed MT approach are reported in Table 2.4; key estimates pertain to or the partial efficiencies of DMI on MILKE and to or the partial efficiencies of DMI on MBW for x = u, pw, pb, and e. For example, the coefficient defines the change in the genetic merit for DMI for every unit change in the genetic merit of MILKE. Thus, these partial efficiencies partly characterize rFE at various levels in that the smaller their values, the more efficient the conversion of feed to MILKE or MBW at the various levels (i.e. genetic effects, within/across-permanent environmental effects or residual effects). 30 Estimates of ranged from 0.32 to 0.52 whereas estimates of ranged from 0.04 to 0.16. Under the RFI analysis, the corresponding partial regression coefficient estimates were 0.38±0.005 for b1 and 0.12±0.004 for b2 which were well within the range of the corresponding partial efficiency estimates from the proposed MT analysis. Table 2.4: Estimates of partial efficiencies based on proposed multivariate model in dairy consortium data analysis - 1Estimated partial efficiencies (± standard errors) of MBW on MILKE (), MBW on MILKE () and of DMI on MBW () at the 2genetic (), within-lactation permanent environment () between-lactation permanent environment ()and residual () levels. 2.4.2.3 Correlation between EBV The estimated correlation between EBV of rFE for cows having their own records based on the two different modeling approaches was = 0.93. A scatterplot of the two sets of EBV is provided in Figure 2.6 further demonstrates the high level of agreement between the two different sets of EBV although it is apparent that demonstrates a slightly greater level of shrinkage relative to likely in part to the fact that the estimate of is slightly smaller than that for as provided earlier in Table 2.3. 31 Figure 2.6: Scatterplot of EBV of rFE based on proposed multivariate model () versus RFI () model of dairy consortium data. Reference line of slope 1 and intercept 0 superimposed 2.4.2.4 Cross-validation Study The accuracies for the 5-fold cross validation were provided in Table 2.5. There was no evidence of a significant difference in predictive accuracy between the two modeling approaches, whether based on within animal or across animal splits of the data, or even combined. Table 2.5: Cross Validation Prediction Accuracies for each of 5 different validation datasets (fold) based on proposed (MT) versus RFI analyses of dairy consortium data 32 We have proposed a modeling strategy for rFE that we believe helps to address the controversy between those researchers that prefer to focus on the analysis of DMI versus those that prefer analyzing RFI for quantitative genetic inference on rFE, recognizing that DMI is easier to understand, even though DMI is not a direct measure of rFE (Berry and Pryce, 2014). In particular, we demonstrated that MT inference involving DMI and the energy sink traits such as MILKE and MBW is equivalent to single trait inference on RFI under one special condition: that the partial efficiencies relating DMI to the energy sink traits are the same at genetic and non-genetic (i.e. permanent environment or residual) levels. However, we would quickly add that even Kennedy et al. (1993) had earlier suggested that selection on RFI is equivalent to multiple trait selection on its components. In fact, we would suggest that our proposed MT approach more Kennedy et al. (1993) which is far more challenging to impleMore recently, another group (Strathe et al., 2014) also presented a strategy for rFE modeling in pigs which is even more closely aligned to our proposed MT strategy. We do believe our proposed MT model facilitates a more comprehensive investigation into the relationship between DMI and the key energy sinks (MILKE and MBW) at different levels; i.e., genetic, permanent environmental, and residual levels whereas the classical RFI approach assumes that such relationships are constant across all such levels. Indeed, our simulation results intuitively proved that two modeling approaches have similar accuracy of genetic merit 33 prediction when partial efficiencies are equivalent at the various levels (i.e., ); in fact, it could be effectively argued that RFI modeling would then be preferred since there would be fewer parameters to estimate. Nevertheless, we also demonstrated by simulation that there can be greater accuracies in genetic merit prediction using the proposed MT approach when these partial efficiencies are rather different from each other between the genetic and non-genetic levels (i.e.,). Our empirical analysis of a subset of a larger dairy consortium study (Tempelman et al., 2015) suggested no real apparent advantage of our proposed approach relative to classical RFI modeling. This result again may be due to the fact that the partial efficiencies may not have demonstrated enough heterogeneity across the various genetic and non-genetic levels to warrant a greater cross-validation prediction accuracy for the proposed approach. However, we believe the proposed MT approach warrants future consideration and research for several reasons. Firstly, a MT approach facilitates the incorporation of all data on animals that might have missing records on DMI or some of the energy sink components of rFE; conversely a RFI modeling strategy would require deletion of all data from any time period missing a record on any rFE component trait. Hence, an even greater accuracy of selection could be demonstrated based on our approach if such actually more subtle for rFE modeling because all component traits are generally not measured with the same level of frequency. For instance, we summarized records on all of the traits to a weekly basis just as we did previously, even though recording frequencies did not only differ among traits within each research station, they also differed among research stations for the same 34 traits(Tempelman et al., 2015). Our proposed approach thereby provides an opportunity to better formally account for these recording frequency differences. Secondly, the proposed MT approach is not any more difficult to implement than any standard MT analysis. Variance-covariance component matrices can be readily estimated using REML in mixed model software followed by use of the CD to estimate the corresponding partial efficiencies. In fact, as previously noted, ASREML software (Gilmour et al., 2009) has the their estimated standard errors using the average information matrix. With denoting a standard MT-based EBV for MILKE, MBW, and DMI, the estimated genetic partial efficiencies () can be used in a rearrangement of Equation [2.6] to backsolve for the BLUP of being the last element of , and which we propose as our genetic merit for rFE. Note that a MT analysis is an important prerequisite for forming a selection index on the corresponding traits. Suppose that that the relative , , and are known for MILKE, MBW, and DMI, respectively, such that a portion of the selection index, at least as it pertains to just these 3 traits, is for animal i based on a MT analysis. Noting that a selection index would involve DMI or rFE, but not both, the selection index for animal i could be re-expressed as a function of its EBV for MILKE, MBW, and our proposed measure of rFE as using Equation [2.5c]. So, for example, if > 0, 0 < < 1 (as based on our results), < 0, and > || as expected, then the ratio of the economic weight of MILKE relative to that () on rFE should be smaller compared 35 with the same ratio involving the weight () on MILKE relative to that () on DMI in a selection index. In other words, rFE would be weighted more highly (in absolute value) relative to MILKE as compared with DMI relative to MILKE in a selection index. Given the previously noted similarities between our rFE measure and RFI, a similar comparison could be drawn between the economic weights of DMI and RFI relative to MILKE in alternative expressions of the same selection index. Finally, our approach allows us to better formally investigate any potential heterogeneity in these partial efficiencies at the various levels across, say, research stations or even diets; this would involve further extending work by Bello et al. (2010) who similarly investigated heterogeneities in the associations between milk production and reproduction at the contemporary group and cow levels. We failed to include BW as a fourth trait in our proposed model given its very high variability and low heritability, even though it is conceptually straightforward to do so. Hence, we treated BW as a covariate in both our proposed MT approach and the RFI model such that both sets of analysis on the dairy data implied a common partial efficiency or regression of DMI on BW at all genetic and non-genetic levels. Better approaches to deal with BW within the context of our proposed model warrant future research. We also did not report the effect of ; i.e. the partial efficiency of MBW on MILKE at level x = u or e in our MT vs RFI modeling comparisons because we observed that neither the magnitude nor the range of the discrepancies between and had any impact (results not reported). 36 It has been shown that higher accuracies and larger genetic gains can be achieved by using genomic selection for rFE traits (deHaas et al., 2014). Even though we used pedigree-based BLUP in this paper, it is conceptually rather easy to substitute the numerator relationships matrix with a genomic relationship matrix or even a H relationship matrix that facilitates the use of genotyped and non-genotyped animals (Aguilar et al., 2010). We believe genomic extensions of our proposed model will be promising and may facilitate further discoveries as to which genes are important for rFE. In this study, an alternative MT model strategy based on a modified Cholesky decomposition for modeling rFE in dairy cattle was proposed as an alternative to the conventional RFI modeling approach. Both strategies were further studied and compared using simulation and application to a subset of dairy research consortium data. We demonstrated the equivalence between proposed and RFI modeling strategies when partial efficiencies between DMI with any of the energy sink traits were equivalent at all levels whereas the proposed MT approach had up to 4 percentage points greater accuracy in genetic merit prediction when there were large discrepancies between these partial relationships at genetic versus non-genetic levels. Application to the dairy research consortium data did not suggest any significant difference in cross-validation predictive accuracy between the two approaches. Nevertheless, we believe our proposed approach warrants further consideration in future research, especially when data on some of the rFE component traits are missing. 37 Chapter3: Modeling Genetic and Non-Genetic Variation of Feed Efficiency and its Partial Relationships between Component Traits as a Function of Management and Environmental Factors Residual feed efficiency (rFE), characterized as the fraction of feed nutrients converted into milk or meat, is of increasing economic importance in the dairy industry. We conjecture that rFE is a complex trait whose variation and relationships or partial efficiencies (PE) involving the conversion of DMI to milk energy and metabolic body weight may be highly heterogeneous across environments or management scenarios. In this study, a hierarchical Bayesian multivariate mixed model was proposed to jointly infer upon such heterogeneity at both genetic and non-genetic levels on PE and variance components (VC). The heterogeneity was modeled by embedding mixed effects specifications on PE and VC in addition to those directly specified on the component traits. We validated the model by simulation and applied it to a joint analysis of a dairy rFE consortium dataset with 5,088 Holstein cows from 13 research stations in Canada, the Netherlands, the United Kingdom, and the United States. Although no differences were detected among research stations for PE at the genetic level, there was some evidence of heterogeneity in residual PE. Furthermore, substantial heterogeneity in VC across stations, parities, and ration were observed with heritability estimates of rFE ranging from 0.16 to 0.46 across stations. Residual feed efficiency (rFE) is becoming more important for the economic and environmental sustainability of dairy production and increases as a greater proportion of feed nutrients are directed towards milk production (Connor, 2015). A commonly used measure of 38 rFE is residual feed intake (RFI) which is defined as the difference between actual and predicted model analysis whereby partial regression relationships are specified between DMI and energy sink covariates such as milk energy (MILKE) and metabolic body weight (MBW), for example. Given that there has been some reluctance to directly incorporate RFI in breeding goals because RFI can be a difficult concept to understand and explain and there are possible genetic antagonisms with other performance traits (Berry and Pryce, 2014), some investigators have focused their attention on DMI as the key phenotype for rFE analyses (Berry et al., 2014, de Haas et al., 2015). Recently, Lu et al. (2015) proposed a multiple trait (MT) mixed model analysis of DMI with MILKE and MBW that further resolves whether RFI or DMI should be considered as the key response variable for rFE. They demonstrated that Cholesky decompositions performed on each of the estimated (e.g., by REML) 3 x 3 genetic (G) and residual (R) variance-covariance matrices among the 3 key traits (MILKE, MBW, and DMI) lead to a parameterization whereby the estimated partial regression relationships between DMI and MILKE, and between DMI and MBW are essentially partitioned into genetic and residual components. Additionally, the Cholesky decomposition leads to a determination of EBV for rFE that is identical to that based on a classical RFI analysis under the special case whereby the partial regression relationships relating DMI to MILKE and to MBW are specified to be identical at both genetic and residual levels. As a corollary, Lu et al. (2015) demonstrated by simulation that the greater the discrepancy between the partial relationships at genetic and non-genetic levels, the greater the EBV accuracy for their proposed MT approach relative to a classical RFI analysis. Furthermore, the MT approach facilitates the incorporation of data on cows that might be missing, even 39 selectively, on any of the 3 key phenotypes that would otherwise be discarded in a RFI analysis (Pollak et al., 1984). A hierarchical Bayesian extension of a 2-trait MT model was developed earlier in a non-genetic context by (Bello et al., 2010) who later inferred heterogeneous partial regression relationships between calving intervals and milk yield at both herd and cow levels as a function of environmental and herd management factors for Michigan dairy herds. We surmise that the genetic and residual partial relationships between DMI and MILKE, or between DMI and MBW could also be modeled as a linked multifactorial function of various factors including parities, research stations, and rations, for example. Indeed, based on station-specific RFI analyses, Tempelman et al. (2015) determined that estimated partial regression coefficients of DMI on MILKE and on MBW were highly heterogeneous across research stations in several countries. Furthermore, adaptation of the hierarchical Bayesian approach as proposed by (Bello et al., 2010) would also infer the degree of heterogeneity in heritabilities of rFE across different management conditions. The 2 primary objectives of this study were 1) to identify potential management or environmental factors that might impact genetic and residual partial regression relationships (i.e., PE) between DMI and MILKE, and between DMIand MBW, and 2) to infer whether there is evidence of heterogeneity of genetic and residual variances for rFE across these same management or environmental factors. 40 3.3.1 The Multiple Trait Model Our study closely combines developments provided in Lu et al. (2015) with those provided in (Bello et al., 2010). The 3 key component traits of rFE are numbered as follows in the MT model: 1) MILKE, 2) MBW, and 3) DMI. We write this MT model as follows: . [3.1] Here is the vector of responses for the 3 traits on record or animal , i nFurthermore, is the vector of fixed effects connected to by known incidence matrix such that denotes the subvector of fixed effects for trait j, j = 1,2,3. Note that denotes the Kronecker product such that we assume the same fixed effects incidence row vector for each of the 3 traits for ease of presentation, although further generalization is possible. Similarly, is the vector for animal genetic effects connected to by known incidence matrix such that denotes the subvector of random genetic effects on trait j for all n animals. Hence, we also assume the same random effects incidence row vector for each of the 3 traits, although again further generalization is possible. To further simplify presentation, we specifically focus on the situation where there is 1 record per animal, and genetic merit is explicitly modeled only for animals having records, although extensions to the more common situation where genetic evaluations are also desired on animals without any records or multiple records is readily apparent. Finally, denotes the sub-vector of residuals for the 3 traits on animal i. Now 41 can be alternatively reordered by traits within animals; i.e., with denoting the vector of random genetic effects for the three traits on animal i. for animals and () -, then we specify 42 [3.2] with such that , and all other terms defined as before. Mirroring what was presented previously, albeit in a non-genetic context (Bello et al., 2012) and further extending Lu et al. (2015), we write the Cholesky decomposition for as , where is an identity matrix of order 3, is a diagonal matrix of sequentially conditional variance components (VC) and is a lower triangular matrix such that and are modeled as being potentially unique to the environmental or management circumstances pertaining to animal i. Now where is based on the specification whose elements can be written recursively Lu et al. (2015) as [3.3a] [3.3b] [3.3c] For instance, is used to represent the genetic effect of MBW conditional on MILKE for animal i whereas pertains to the genetic effect of DMI conditional on MILKE and MBW and which we have previously proposed as the genetic effect for rFE for animal i (Lu et al., 2015). Similarly, we define is the Mendelian sampling genetic variance of rFE for animal i. referred to , , and as partial efficiencies (PE) at the genetic level. We further extend their work using Bello et al. (2010) to allow for the possibility that these genetic PE are potentially heterogeneous across cows as manifested by the additional subscript i. 43 Lu et al. (2015) likewise demonstrated, where is a diagonal matrix and is a lower triangular matrix except that here both and are potentially unique to the environmental circumstances peculiar to animal i following Bello et al. (2010). Now is the covariance matrix of as defined within whose elements can be demonstrated to be written recursively, using Lu et al. (2015), as [3.4a] [3.4b] [3.4c] Here we refer to,, and as PE at the residual level, again being unique to animal i. Equations [3.3c] and [3.4c] are particularly key for defining rFE in that they specify the key relationships between the energy source (DMI) and energy sink traits (MILKE and MBW) at both the genetic (i.e., , and ) and residual (, and ) levels. For example, defines the change in for every unit change in holding constant whereas defines the change in for every unit change in holding constant on animal i. Furthermore, and are similarly interpreted but at the residual or non-genetic level with respect to animal i. In essence, heterogeneous PE at the genetic or residual levels indirectly imply heterogeneous genetic and residual correlations between traits across animals as determined by different environments and management factors. As in Bello et al. (2010), we specify a structural mixed effects model on each of these PE. For example, at the genetic level, we write 44 [3.5] with r = 21, 31, or 32 denoting indices representing PE relationships between MBW and MILKE (r = 21), between DMI and MILKE (r = 31) and between DMI and MBW (r = 32) as per Equations [3.3b] and [3.3c]. Here represents a connected to by known incidence row vector whereas represents a by known incidence row vector -informative or vaguely informative prior distributions that do not presume exchangeability amongst the elements of utilized for factors characterized by a large number of subclasses, with each subclass characterized by relatively limited data (Bello et al., 2010). We similarly specify a structural mixed effects model for the residual PE , , and specific to observation i [3.6] with r = 21, 31, or 32 as before as per Equations [3.4b] and [3.4c]. Here, denotes a x 1 vector of fixed effects connected to by known incidence row vector whereas represent a x 1 vector of random effects connected to by where terms are similarly interpreted as with genetic PE. Following Bello et al. (2010), each VC specific to animal i in is also specified as a function of environmental or management factors in a structural mixed effects model: 45 [3.7] where s =1, 2|1, and 3|1,2 is used to denote, respectively, the genetic variance for MBW, the genetic variance for MBW conditional on MILKE, and the genetic variance for DMI conditional on MILKE and MBW. Here is a x 1 vector of fixed effects connected to by known incidence row vector whereas is a x 1 vector of random effects connected to by known incidence row vector . Independent inverted Gamma priors IG(,) are assigned to each element of such that with coefficient of variation (CV) of as previously demonstrated by Kizilkaya and Tempelman (2005) and Bello et al. (2010). That is, characterizes the standard deviation of subclass-specific variances expressed relative to the average subclass variance. Finally, the VC in are similarly modeled with a mixed effects specification: [3.8] where s = 1, 2|1, and 3|1,2 is used to denote, respectively, the residual variance for MBW, the residual variance for MBW conditional on MILKE, and the residual variance for DMI conditional on MILKE and MBW. Here denotes a x 1 vector of fixed effects connected to by known incidence row vector whereas represents a x 1 vector of 46 random effects connected to by known incidence row vector . Here, elements of are assumed to be random independent draws from IG(,) such that and. Following Bello et al. (2010) and Kizilkaya and Tempelman (2005), vaguely informative priors are specified on and as follows: [3.9] [3.10] for s = 1, 2|1, and 3|1,2. For all analyses in this paper, flat unbounded priors are specified on ,,,,,,,,,,,,,,,,,,and ,as also adapted by Bello et al. (2010), although vaguely or even rather informative priors could be specified as well. Due to these flat prior specifications, it is imperative to invoke identifiability restrictions on ,,,,,,,,,,, and similar to those commonly invoked for fixed classification factors (i.e., ,, and ) in classical linear model specifications. In that regard, we utilize the corner parameterization (Clayton, 1996, Kizilkaya and Tempelman, 2005) also referred to as the set-to-zero restriction (Milliken and Johnson., 2009) and as also used by Bello et al. (2010) whereby an overall 47 intercept is specified and the effect corresponding to one arbitrarily chosen level of each fixed Full conditional densities (FCD) of all unknown (hyper)parameters as required for conducting Markov Chain Monte Carlo (MCMC) Bayesian analysis are provided in the Appendix B. 3.3.2 Application to Dairy Consortium Data 3.3.2.1 Data Source and Editing A dataset of MILKE, DMI, and MBW on 5,088 Holstein cows was collected on 13 research stations and studies from the United Kingdom, the Netherlands, Canada, and the United States (US); data sources and editing procedures used for this study are extensively characterized elsewhere (Berry et al., 2014, Lu et al., 2015, Tempelman et al., 2015, Manzanilla-Pech et al., 2016). Phenotypes on all 3 traits were further condensed into 42-d records by taking the average of the first 6 weekly records on all 3 traits between 50 and 200 DIM. If data from multiple lactations were available on individual cows, only the earlier lactation was chosen. One research station was in Canada, being the University of Alberta (AB). Six stations were in the US, specifically, Iowa State University (ISU), Michigan State University (MSU), the University of Florida (UF), the University of Wisconsin-Madison (UW), the USDA Dairy Forage Research Center (USDFRC) in Madison, WI, and the USDA Beltsville Agricultural Research Center (BARC) in Beltsville, MD. Four research stations were in the Netherlands including an Leeuwarden, a third study (ZOM) based on the work by Zom et al. (2012) ; and a compilation of studies (NLN) based on data collected from various nutritional experiments, all of which were 48 previously characterized in detail by Tempelman et al. (2015). The remaining 2 herds were from the United Kingdom, the Langhill (LAN) farm near Edinburgh from 1992 to 2001 and from the Scottish Agricultural College (SAC) Dairy Research Centre based at Crichton Royal Farm near Dumfries with data collection from 2003 to 2011; again, these data are also described extensively elsewhere (Tempelman et al., 2015). A summary of the number of cows by research station/study is provided in Table 3.1. Table 3.1: Distribution of cows by region. Region Station No. of cows with phenotypes Canada AB 236 United States UF 175 ISU 742 MSU 158 UW 237 USDFRC 565 BARC 286 Netherlands TGEN 569 NBZ 99 ZOM 661 NLN 329 United Kingdom LAN 590 SAC 441 Overall 5,088 1AB = University of Alberta; UF = University of Florida; ISU = Iowa State University; MSU = Michigan State University; UW = University of WisconsinMadison; USDFRC = USDA Dairy Forages Research Center; BARC = USDA Beltsville Agricultural Research Center; TGEN Netherlands; ZOM = data based on the work of Zom et al. (2012); NLN = compilation of studies previously characterized by Tempelman et al. (2015); LAN = Langhill farm, Edinburgh, UK; SAC = Scottish Agricultural College. 49 The specification for the full hierarchical Bayesian model included the fixed effects of research station and parity (primiparous versus multiparous) for each of,,,,, and. Research station, parity, and the linear and quadratic effects of DIM were specified in ,,,,, and . Finally, research station, parity, ration with station, and up to 4th order polynomial effects of DIM were specified in ,, and. Ration within station was specified in all cases as the single random effects factor for each of , , , and . Given that there were a total of 271 rations across the 13 research stations, it was more important to specify these ration effects as random at the deepest levels of the model hierarchy (i.e., for PE and VC) in order to facilitate efficient borrowing of information across rations. Conversely, treating ration effects as fixed in ,, and was deemed to be feasible because these effects are specified closer to the data (y) in the model hierarchy. To account for changes in energy balance, the average daily change in BW (BW) was fitted as a covariate in , , and as well. 3.3.2.2 Model Comparison Model comparison and selection in hierarchical Bayesian inference is often formally based on the Deviance Information Criterion (DIC) (Spiegelhalter et al., 2002) as well as on the relative sizes of the posterior z-scores; i.e., the ratio of the posterior means to their posterior standard deviations (Gelman et al., 2012). The absolute values of these posterior z-scores can be used in Wald-like tests to ascertain, for example, whether or not they exceed 1.96 in absolute value for a Type I error rate of 5%, assuming the marginal posterior density of the corresponding parameter is reasonably symmetric. The DIC has previously been used in a stepwise manner to infer 50 evidence for fixed and random sources of heterogeneity on VC and PE within a 2-trait model by Bello et al. (2012). However adapting their stepwise process would be far more onerous for our 3-trait model given the need for an assessment of which fixed or random effects factors to keep for each of 12 different mixed model subsets (i.e., 3 separate models for each of Equations [3.5], [3.6], [3.7], and [3.8]) beyond the classical mixed model specification in Equation [3.1]. Thus, we only compared 5 models starting at one extreme from a purely homoscedastic MT model with homogeneous PE and VC at all levels (M0) as developed and used previously in Lu et al. (2015) to the proposed fully heteroscedastic MT model with heterogeneous PE and VC, based on all structural mixed effects model specifications as detailed in this paper (M4). The remaining models evaluated were M1) fixed effects (i.e., without random effects) modeling on both genetic and residual PE with homogeneous genetic and residual VC, M2) fixed effects modeling on genetic and residual PE and on genetic and residual VC and M3) full mixed effects modeling on VC with only fixed effects modeling on PE. 3.3.2.3 Cross-validation Assessment To assess the relative importance of modeling heterogeneous genetic and residual VC and PE, we assessed the cross-validation performance of our proposed model (M4) relative to the more conventional MT mixed model (M0) of Lu et al. (2015). More specifically, a 5-fold cross validation study was conducted by merely randomly partitioning the entire dataset into 5 equally sized subsets such that for each of the folds, 4 subsets were chosen as training data with the remaining subset chosen as validation data. The performance of the 2 models were compared with respect to cross-validation predictive accuracy defined as the correlation between true DMI 51 and prediction in the validation datasets based on estimates derived from the analyses of the training datasets. 3.3.2.4 Marginal Means All reported estimates were based on the full model (M4), determined to be the best fitting of those evaluated. To facilitate interpretation, however, estimates of all fixed effects were re-expre-software (Littell et al., 2002, Milliken and Johnson., 2009); further illustration in the context of our MT model is presented by Bello et al. (2010) and Bello et al. (2012). For example, linear combinations of ,,,,, and were used to estimate marginal genetic and residual PE relationships for the various research stations, averaged across levels of the remaining factor(s) (i.e., parity) and at the midpoint covariate values (i.e., DIM = 125 d). Marginal mean linear combinations of ,,,,, andwere exponentiated to the VC scale in a manner similar to that described by Bello et al. (2010). Pairwise comparisons between marginal posterior means were conducted between levels of each fixed effects factor (i.e., stations and parity) using 2-tailed Bayesian P-values in the manner described previously by Bello et al. (2010). 3.3.3 Simulation Study We realize that our proposed Bayesian MT model on 3 traits is massively hierarchical in nature since it entails a total of 15 different mixed model specifications, including 3 within the basic first stage specification for yi in Equation [3.1], one for each of , , and in Equation [3.5] , one for each of , , and in Equation [3.6], one for each of , 52 , and in Equation [3.7], and one for each of ,, and in Equation [3.8]. Hence, similar to Bello et al. (2010), we used a simulation study to validate the proposed model and Bayesian inference strategy as developed elsewhere. Twenty replicated datasets were simulated based on only the subset of the data and pedigree deriving from the US research stations to facilitate computational tractability. An overall intercept and station effects were generated as fixed effects for PE (,,,,, and ) and for the VCs (,,,,, and), whereas ration within station effects were generated as random effects with variance components ,, for the genetic PE and ,, and for the residual PE and with hyperparameters ,, and for the genetic VC and ,, and for the residual VC. True values for these (hyper)parameters were roughly determined as estimates from the analysis of subset of data deriving from the US. Coverage probabilities of the 95% highest posterior density (HPD) interval for each parameter or hyper-parameter were based on how many times the HPD included the true values over the 20 replicated datasets. 3.3.4 MCMC Implementation For all analyses involving either the dairy or simulated data, trace plots, autocorrelation, and effective sample size (ESS) for samples of all unknown parameters drawn from the posterior density using MCMC were monitored (Brooks and Gelman, 1998) using the R package Coda (Plummer et al., 2006). The number of burnin cycles were 30,000 for the simulation study and 20,000 for the dairy data analysis. After burnin, every tenth cycle was saved from 100,000 53 MCMC cycles such that posterior inference was based on 10,000 cycles. This sampling strategy was sufficient to ensure that ESS > 100 for all hyperparameters. 3.4.1 Analysis of Dairy Consortium Data As indicated previously, only a small subset of all possible models was considered, the simplest model (M0) being equivalent to a classical homogeneous (co)variance matrix multiple-trait modeling specification as in Lu et al. (2015). DIC values were expressed relative to Mo (i.e. DIC = 0 for Mo) which was determined to be the worst fitting model. DIC values became progressively smaller (i.e., better fitting) for increasingly more complex models starting with M1) fixed effects but no random effects modeling on both genetic and residual PE with no structural modeling at all for VC (DIC = -1,673); M2) only fixed effects modeling on both genetic and residual PE and VC (DIC = -4,305); M3) full mixed effects modeling on VC with fixed effects modeling only on PE (DIC = -4,633); and finally M4) full mixed effects modeling on both VC and PE (DIC = -5,400), the best fitting of all models and upon which all subsequent inferences on the dairy data analysis are based. Clearly then, it was important to structurally fit genetic and residual sources of heterogeneous PE and VE per Equations [3.5], [3.6], [3.7], and [3.8], recognizing that DIC differences exceeding 7 are generally considered to represent significant difference in model fit (Spiegelhalter et al., 2002). 3.4.2 Inferences on Factors Influencing PE The posterior means (PMEAN) and posterior standard deviations (PSD), 95% Highest Posterior Density (HPD) intervals, and ESS for all PE related inferences on marginal means of 54 station, parity, linear, and quadratic effects of DIM at both genetic (, and ) and residual levels (, and ) as well as variance components for random effects at the genetic (,, and ) and residual (,, and ) levels are summarized in Appendix Table B.1 for MBW|MILKE, in Table 3.2 for DMI|MILKE, and in Table 3.3 for DMI|MBW, respectively. 55 Table 3.2: Posterior inferences characterizing heterogeneity of genetic and residual partial efficiencies of DMI1 on MILKE2 1 Dry matter intake (in Kg) 2 Milk energy (in Mcal) 3 Posterior (marginal) mean (± posterior standard deviation) 4 95% Highest posterior density interval 5 Effective sample size 6 Station levels characterized in body of paper. 56 Table 3.3: Posterior inferences characterizing heterogeneity of genetic and residual partial efficiencies of DMI1 on MBW2 1 Dry matter intake (in Kg) 2 Metabolic body weight (in Kg) 3 Posterior (marginal) mean (± posterior standard deviation) 4 Estimates not sharing the same letters within factors are statistically different (P < 0.05). 5 95% Highest posterior density interval 6 Effective sample size 7 Station levels characterized in body of paper. 57 Although we do not particularly highlight results regarding MBW|MILKE in this paper, it is intriguing to note from Supplementary Table B.1 that the genetic PE between the 2 traits were not generally different from 0 for any one research station nor for either parity because 0 fell within the 95% HPD in all cases. Conversely, residual relationships were universally positive. Taken together and using Equation [3.7] in (Lu et al., 2015), these results imply a universally negligible genetic correlation but a positive residual correlation between MBW and MILKE. At any rate, there was also no formal evidence of any heterogeneity in these relationships between research stations or parities based on pairwise comparisons between the corresponding marginal means. Also, there was no evidence of heterogeneous genetic variances but strong evidence of heterogeneous residual variances across stations for MILKE (Supplementary Table B.2) whereas there was no evidence of any heterogeneous genetic nor residual variances for MBW|MILKE across stations (Supplementary Table B.3) Because our focus was on rFE, we were particularly interested in PE relationships involving DMI. As a basis for reference, the overall mean PE at the genetic and residual levels were respectively determined to be and where,,, and denote the PMEAN of the corresponding parameters for r = 31 or 32; i.e., PE of DMI on MILKE and of DIM on MBW, respectively. Specifically, the PMEAN(± PSD) of was 0.38(± 0.10) kg/Mcal whereas the PMEAN (± PSD) of was 0.10(± 0.05) kg/kg0.75 implying then, for example, the genetic merit of DMI is estimated to increase by 0.38 kg on average for every Mcal increase in the genetic merit of MILKE holding constant MBW. At the residual level, the PMEAN(± PSD) of 58 was 0.33 (± 0.06) kg/Mcal whereas the PMEAN(± PSD) of was 0.07(± 0.04) kg/kg0.75 such that, for example, as the residual or temporary environment effect of MBW increases by 1 kg0.75, the residual effect of DMI is estimated to increase by 0.07 kg holding constant MILKE. These overall genetic and residual PE estimates are in reasonable agreement with averages of station specific partial regression coefficients relating DMI to MILKE and to MBW within a classical RFI analysis on the same data reported previously by Tempelman et al. (2015) who, in turn, demonstrated these estimates to be consistent with estimates reported by NRC (2001). A scatterplot of the station-specific marginal means for genetic and residual PE versus the corresponding partial regression coefficients relating DMI to MILKE and to MBW from the classical RFI analyses in Tempelman et al. (2015) are provided in Figure 3.1, noting that estimates from Station AB are not included in Figure 3.1 since that station was not considered in Tempelman et al. (2015). Correlations between station-specific partial regression coefficients relating DMI to MILKE, reported by Tempelman et al. (2015), with corresponding genetic and residual PE estimates were 0.66 and 0.76, respectively. Correlations between station-specific partial regression coefficients relating DMI to MBW, reported by Tempelman et al. (2015), with corresponding genetic and residual PE estimates were 0.39 and 0.56, respectively. Although the agreement between the 2 analyses was generally good, it should be clearly seen that the genetic PE were generally larger than residual PE, particularly for DMI on MILKE (Figure 3.1a ). Furthermore, there appeared to be less heterogeneity between stations in the genetic PE relative to the residual PE for DMI on MBW (Figure 3.1b). 59 1 Dry matter intake (in Kg) 2 Milk energy (in Mcal) 3 Metabolic body weight (in Kg) Figure 3.1: Posterior means of station-specific genetic (and ) and residual (and ) partial efficiencies versus estimated partial regression coefficients (and) previously reported by Tempelman et al., 2015) for DMI1 on MILKE2 (Panel a) and MBW3 (Panel a). A list of these same station-specific marginal mean inferences for the genetic and residual PE are provided in Table 3.2 for DMI on MILKE and in Table 3.3 for DMI on MBW. Inference summaries are also provided for primiparous versus multiparous animals as well as for linear and quadratic coefficients of DIM on these PE. Although the estimated marginal mean genetic PE between DMI and MILKE ranged widely from 0.28 to 0.49 kg/Mcal across stations, there was no formal evidence of any difference (P > 0.05) between any pair of stations or between primiparous versus multiparous cows. Variation due to ration within station effects on genetic PE between DMI and MILKE seemed substantial; based on the reported point estimate kg/Mcal such that one might anticipate roughly a range of = 0.28 (a) (b) 60 kg/Mcal difference on genetic PE between the most extreme ration effects, assuming normality and adjusting for station and DIM effects. As with the genetic PE, any differences in the marginal mean residual PE of DMI to MILKE did not appear to be important even though again point estimates varied widely from 0.27 kg/Mcal to 0.49 kg/Mcal. Furthermore, the estimate kg/Mcal further suggested that ration within station could be a substantial source of variability for the residual PE relationship between DMI and MILKE. Linear and quadratic effects of DIM also did not seem to be important for either residual or genetic PE; in other words, stage of lactation did not appear to be important for modeling PE between DMI and MILKE. The posterior summary of station and parity level effects on the genetic and residual PE of DMI to MBW are provided in Table 3.3. There appeared to be no significant differences between stations for the genetic PE although point estimates ranged from 0.07 kg/kg0.75 to 0.21 kg/kg0.75. Furthermore, heterogeneity in genetic PE of DMI to MBW due to the rations appeared to be relatively substantial (kg/kg0.75). However, at the residual level, there was formal evidence of at least one difference for PE of DMI on MBW between ZOM (0.05 ± 0.02 kg/kg0.75) and USDFRC (0.13 ± 0.04 kg/kg0.75) even though point estimates ± standard errors ranged more widely from 0.02 ± 0.04 kg/kg0.75 (SAC) to 0.18 ± 0.12 kg/kg0.75(UF). The magnitude of variation in the residual PE of DMI on MBW due to the ration was again rather substantial (kg/kg0.75). 3.4.3 Inferences on Variance Components and Heritability of rFE Supplementary Tables C.2 and C.3, and Table 3.4 summarize marginal inferences on fixed and random effects influencing VC for MILKE, MBW|MILKE, and DMI|MILKE,MBW, 61 respectively. We focus in particular on DMI|MILKE,MBW; i.e., our proposed rFE trait. The overall genetic variance as based on the PMEAN (± PSD) of was 0.65 (± 0.19) kg2, whereas the overall residual variance was based on the PMEAN (± PSD) of being 1.86(± 0.45) kg2 , such that the PMEAN (± PSD) of overall heritability was 0.26 (± 0.08). Evidence for heterogeneity in the genetic variance for rFE was considerable (Table 3.4). The NBZ station appeared to have the smallest estimated genetic variance (0.47 ± 0.10 kg2) whereas the largest genetic variance was estimated at UF (1.28 ± 0.41 kg2) among all stations. No significant differences in genetic variance in rFE were found between primiparous and multiparous cows. Furthermore, genetic variances were demonstrated to be highly heterogeneous between rations within stations with the 95% HPD for the CV on genetic variance between rations being between 0.26 and 0.52. 62 Table 3.4: Posterior inferences characterizing heterogeneity of genetic and residual variation of DMI1|MILKE2,MBW3 1 Dry matter intake (in Kg) 2 Milk energy (in Mcal) 3 Metabolic body weight (in Kg) 4 Posterior (marginal) mean (± posterior standard deviation) 5 Estimates not sharing the same letters within factors are statistically different (P < 0.05). 6 95% Highest posterior density interval 7 Effective sample size 8 Station levels characterized in body of paper. 9 Coefficient of variation of ration within station-specific genetic and residual variance components 63 The magnitude of heterogeneity on the residual variance of rFE across research stations was also quite considerable (Table 3.4). The marginal PMEAN (± PSD) for stations ranged from a low of 0.85 (± 0.24) kg2 at NLN to 3.01 (± 1.25) kg2 at LAN. Furthermore, primiparous cows were inferred to have a substantially lower residual variance (1.01 ± 0.18 kg2) compared to multiparous cows (2.91 ± 0.47 kg2 ). Also, there appeared to be considerable evidence for heterogeneity of residual variances across rations as the 95% HPD the CV on genetic variance between rations was defined as between 0.47 to 1.57. Table 3.5 summarizes inferences on heritabilities of rFE across different stations and parity classes. Evidence in favor of heterogeneity of heritability was substantial with the smallest heritability estimated at LAN (0.16 ± 0.06) at the largest estimates at UF (0.46 ± 0.13) and NLN (0.41 ± 0.09). Furthermore, the estimated heritability (0.39 ± 0.06) for primiparous cows was almost twice that (0.22 ± 0.04) for multiparous cows with the difference being significant (P < 0.05). 64 Table 3.5: Inferences on station and parity specific heritabilities for DMI1|MILKE2,MBW3 1 Dry matter intake (in Kg) 2 Milk energy (in Mcal) 3 Metabolic body weight (in Kg) 4 Posterior mean (± posterior standard deviation) 5 Estimates not sharing the same letters are statistically different (P < 0.05). 6 95% Highest posterior density interval 7 Effective sample size 8 Station levels characterized in body of paper. 3.4.4 Cross-validation The accuracies for the 5-fold cross validation on the dairy consortium data are provided in Table 3.6. There was strong evidence for differences in model fit between the homogeneous (M0) and fully heterogeneous (M4) models. The accuracy for predicting DMI was consistently higher (~4%) using the heterogeneous model compared to that of the homogeneous model, which 65 further implies that substantial heterogeneity of some nature (i.e., on genetic or residual PE, or on genetic or residual VC) exists in this consortium dataset. Table 3.6: Cross validation prediction accuracies for each of 5 different validation datasets (fold) for homogeneous and heterogeneous models. 3.4.5 Simulation Study Assessment Coverage frequencies, ranging from 16/20 to 20/20 for the 95% HPD of key hyperparameters across the 20 replicates in the simulation study, are listed in Table 3.7 and were as nearly expected (i.e., 95%). Summing across the last column of Table 3.7, the overall coverage frequency across all parameters was 458/470 or very close to the expected 95%. The posterior means of were more variable and their 95% HPD thus was wider than those of, as there is typically greater uncertainty for parameters characterizing distribution of random effects as opposed to those for residuals, as reported in an earlier study (Bello et al., 2010). 66 Table 3.7: Summary on 95% highest posterior density interval (HPD) and coverage probabilities across 20 replicated datasets in simulation study We have previously proposed a MT model analysis involving MILKE, MBW, and DMI using the Cholesky decomposition to provide a statistically more elegant approach for characterizing the genetic merit and heritability of rFE relative to classical RFI approaches (Lu et al., 2015). Note that the proposed MT model is more closely related than the classical specification of RFI to the genotypic regression model proposed by Kennedy et al. (1993); so the developments we 67 have presented here somewhat represent a heterogeneous yet more elegant extension of their genotypic regression model. As we had also indicated in Lu et al. (2015), it certainly would have been ideal to include BW as a fourth trait given that, like MILKE and MBW, it is considered to be an important energy sink for lactating dairy cattle(Berry and Crowley, 2013). But, for reasons already outlined in Lu et al. (2015), we modeled it as a covariate to account for its effect on rFE in our MT analyses. We further augment this work here by modeling heterogeneity in the genetic and residual (co)variances, albeit indirectly, between MILKE, MBW, and DMI, by specifying structural mixed effects models on each key parameter defined by the square root free Cholesky decomposition on the (co)variance matrices, similar to Bello et al. (2010). Based on formal model choice criteria (DIC), we determined that specifying such heterogeneity was important for the analysis of an international dairy feed efficiency consortium dataset. Due to computing constraints, we did not consider every possible submodel given that there are a total of 15 different structural model specifications, 3 per each of Equations [3.1], [3.5], [3.6], [3.7], and [3.8]. We might further conclude, nevertheless, that based on our analyses, specifying this heterogeneity was ultimately most important for VC and somewhat less so for PE, particularly for PE at the genetic level. The latter is not particularly too surprising since it is typically more challenging to detect heterogeneity in genetic (co)variances relative to residual (co)variances. It was somewhat surprising to us that we found no evidence of differences between parity classes for genetic or residual PE for DMI on MBW or for DMI on MILKE given that the metabolic relationships between these traits have been reported to differ between primiparous and multiparous cows (Wathes et al., 2007). 68 There was mild evidence of some heterogeneity in residual PE for DMI on MBW between stations with the only significant difference involving two research stations. Any differences in ration-specific or station specific PE could be partly attributed to differences, for example, in management strategies, ration compositions, measurement error, or ambient temperature; e.g., whether or not the animal is a thermo-neutral environment (NRC, 1981). Furthermore, we demonstrated that genetic PE were generally larger than residual PE implying that the strength of the relationships between DMI with MILKE and MBW is stronger at the genetic level than at the residual level. In a classical RFI analysis, the residual and genetic PE are constrained to be identical as previously noted by Lu et al. (2015); nevertheless estimates of these 2 sets of PE tracked very well with the corresponding partial regression coefficients from the station-specific RFI-based analyses conducted by Tempelman et al. (2015) as demonstrated earlier in Figure 3.1. Note that any heterogeneity in the PE, particularly at the genetic level, has implications for accurately estimating breeding values on rFE, because these PE determine the key relationship between the genetic merit of DMI and that of rFE using Equation [3.3c] for example. We concluded that the genetic PE of MBW|MILKE did not differ from 0 for any station, implying then no evidence of genetic association between MILKE and MBW. Our results are then consistent with near zero genetic correlation estimates between these 2 traits as reported elsewhere (Berry et al., 2003, Spurlock et al., 2012). As previously indicated, there was much stronger evidence of a difference in genetic and residual variances and, consequently, in heritabilities for rFE between research stations, adding further evidence that rFE is indeed a complex trait as also indicated by Tempelman et al. (2015) 69 who previously determined country specific differences for heritabilities of RFI based on analyses of weekly data. The average heritability estimates reported in Table 3.6 appeared to be somewhat larger here than in Lu et al. (2015) and in Tempelman et al. (2015). Any differences between the current study with those 2 previous studies may be partly due to the fact that 42 d data were used here instead of weekly data, such that the effect of BW was better characterized by relatively less measurement error in the current study. However, it could also reflect the fact that the modeled heterogeneity in genetic and residual PE in Equations [3.3c] and [3.4c] leads to more accurate characterizations of genetic and residual effects of rFE (i.e., and , respectively), thereby resulting in a more reliable estimates of the heritabilities of rFE. Our analyses furthermore indicated that the heritability of rFE in primiparous cows was nearly double that of multiparous cows due to the large difference between the parity classes for residual variance. Again, these results may partly reflect the differences in energy dynamics as it relates to growth demands in primiparous cows relative to multiparous cows. Since selection indices involving rFE inherently depend upon the correct specification of genetic/residual variances and covariances between the key traits of MILKE, MBW, and DMI (Lu et al., 2015), it appears that differential selection index weightings may need to reflect this heterogeneity across stations and parities as well. It is also important to note that between-station heterogeneity may partly reflect differences in time periods for data collection (see Table 1 in Tempelman et al., 2015), noting that higher heritability estimates tended to be associated with more recent data. Although our genetic analysis is based on the use of pedigree information and not on genomic marker information to specific animal relationships, it is conceptually easy to extend the 70 proposed multivariate model for genomic evaluations simply by replacing the A with the genomic or realized relationship matrix G or even a hybrid matrix H based on data from both genotyped and non-genotyped animals as with single step GBLUP (Aguilar et al., 2010). We focused on a pedigree-based analysis in this paper for one primary reason; A is a relatively sparse matrix with 83.6% of its entries being 0, thereby allowing us to effectively utilize sparse in R) to help improve computational efficiency for sampling elements of u whereas G is typically a dense matrix. We also determined (not reported) inconsequential differences in EBV between using A versus G based on computationally tractable MT analyses assuming homogeneous PE and VC specifications (Lu et al., 2015). Nevertheless, we do believe it will be important to further optimize our software code to facilitate the incorporation of genomic information in future analyses. One potential extension of our proposed model is to directly model genetic effects as random effects on genetic and residual PE and VC, much like what we did with ration effects. In other words, one could specify Equations [3.5], [3.6], [3.7], and [3.8] as a function of polygenic or even SNP-specific effects (i.e., in genomic) models. In fact, with respect to residual variances as per Equation [3.8], related work has already been conducted by Sorensen and Waagepetersen (2003) for polygenic effects and by Yang et al. (2011) for genomic marker effects. With respect to PE, somewhat indirectly related work has been conducted in beef cattle by Savietto et al. (2014) and in poultry(Aggrey and Rekaya, 2013, Rekaya and Aggrey, 2015) whereby animal specific random regressions are specified on partial regression relationships between DMI and MBW and average daily gain as based on a classical RFI model. At any rate, it seems likely that 71 many more records would be needed than would be required here to infer upon genetic heterogeneity of this nature at these higher levels of the Bayesian model hierarchy. Our simulation study, nevertheless, indicates that it is possible to infer upon the nature of the heterogeneity specified in our proposed model. We have conducted a hierarchical Bayesian analysis to infer upon potential heterogeneity on genetic and residual components of DMI conversion to MILKE and DMI to MBW as well as in genetic and residual VC of rFE across research stations. We detected no evidence of heterogeneity in genetic PE with slight evidence of heterogeneity in the residual PE of DMI to MBW across research stations. Yet evidence for genetic and residual heteroscedasticity was substantial across stations and rations within stations levels for rFE. The estimated heritability of rFE ranged from 0.16 to 0.46 across stations, thereby implying that rFE is a more complex trait than what is currently considered in most quantitative genetic analyses. Heterogeneous relationships across environments should be taken into consideration in the genetic characterization and management of rFE in dairy cattle. 72 Chapter4: Genome Wide Association Analyses based on Alternative Strategies for Modeling Feed Efficiency Genome wide association (GWA) of residual feed efficiency (rFE) using genome wide association (GWA) analysis could help target important genomic regions influencing rFE. Data provided by an international dairy rFE research consortium consisted of phenotypic records on dry matter intakes (DMI), milk energy (MILKE), and metabolic body weight (MBW) on 6937 cows from 16 stations in four counties. Of these 6937 cows, 4916 had genotypes on 57,347 single nucleotide polymorphism (SNP) markers. We adapted our previously proposed multiple trait (MT) approach for modeling rFE to conduct a GWA study and compared it to a GWA analysis based on the more classical residual feed intake (RFI) model. Both models were based on a single-step genomic BLUP procedure that allows the use of phenotypes from both genotyped and non-genotyped cows. Estimated effects for single SNP markers were small and not statistically important for rFE. However, upon further refining this analysis to develop joint tests on SNP markers within non-overlapping 1-Mb windows, significant associations were detected between rFE with a region on each of BTA 12 and 26 under both MT and RFI models. There was no overlap between detected genomic regions for rFE and genomic regions influencing its component traits (i.e. DMI, MILKE, and MBW). This confirms that rFE and its component traits are different such that genomic regions that control DMI are not necessarily synonymous with genomic regions influencing rFE. 73 Residual feed efficiency (rFE), defined as the efficiency of converting feed nutrients into milk and body tissue, has gained considerable attention in recent years, partly due to increasing constraints on arable land as well as mounting concerns regarding environmental pollution (Richardson and Herd, 2004). Most notably, rFE has been included in the national selection index for beef and dairy cattle breeding in Australia in 2014 and 2015, respectively (Gonzalez-Recio et al., 2014, Byrne et al., 2016) with some other countries considering the same (Hayes et al., 2013). Whole genome prediction (WGP) based on the use of single nucleotide polymorphism (SNP) markers (Meuwissen et al., 2001) is a particularly promising strategy to improve rFE in the long term(Hayes et al., 2013, Pryce et al., 2015, VandeHaar et al., 2016). Nevertheless, it is equally important to identify potential causal variants or quantitative trait loci (QTL) affecting rFE in order to fine-tune physiological strategies for further improving rFE (Herd and Arthur, 2009). Lu et al. (2015) proposed a model that elegantly derived rFE based on a simple reparameterization of a standard multiple trait (MT) analysis of its key component traits, namely, dry matter intake (DMI), milk energy (MILKE) and metabolic body weight (MBW). The MT model recognizes that the partial regressions relating DMI to MILKE and MBW may be different at genetic and non-genetic levels whereas residual feed intake (RFI), a classical measure of rFE, is based on specifying one universal set of partial relationships at the phenotypic level. Although Lu et al. (2015) used pedigree information to determine the numerator relationship matrix (A) between animals, it is certainly possible to substitute A with genomic relationships (G) constructed from SNP markers or even a hybrid relationship matrix (H) 74 combining genotyped and non-genotyped animals having phenotypes as in the single step (SSBLUP) approach (Aguilar et al., 2010). One important advantage of MT over RFI analyses is that it allows the use of data on animals with missing records on any of its component traits; subsequently, MT analyses mitigates the impact of selection bias when records on at least one trait are rarely missing such as, for example, MILKE (Pollak et al., 1984). Furthermore, the MT model is potentially useful for identifying potential pleiotropic SNP markers associated with multiple traits (Banerjee et al., 2008). Thus, MT extensions to GWA based on the model of Lu et al. (2015) could help further elucidate how candidate regions for rFE differ, if at all, from those of its component traits. Increasingly, more GWA studies have been based on mixed model association methods that simultaneously fit all markers to infer upon any one particular marker of interest (Yang et al., 2014). A particularly popular strategy is based on pursuing generalized least squares (GLS) inference upon the SNP marker of interest (i.e., treating it as a fixed effect) while treating all other SNP markers as random. This strategy is a feature of the EMMAX software (Kang et al., 2010) with an efficient implementation provided in Gualdron Duarte et al. (2014) . In GWA analyses that simultaneously fit all SNP markers, any given marker may exhibit only a weak association with a closely linked QTL, in part because of the multicollinearity issues incurred with linkage disequilibrium (LD) with neighboring markers and the large Bonferroni multiple comparisons penalty incurred with testing many different SNP in high LD with each other (Johnson et al., 2010). Hence there has been some momentum to using GWA inference based on joint tests on SNP markers within various partitions or windows containing groups of SNP 75 markers in order to increase sensitivity and specificity of GWA (Pahl and Schafer, 2010, Moser et al., 2015, Goddard et al., 2016). In this study, we compare GWA analysis on rFE based on using the classical RFI model with the MT model proposed by Lu et al. (2015) on an international rFE consortium dataset involving phenotypes on 6937 cows of which 4916 cows are genotyped. Our objectives were to 1) infer upon the genetic architecture for rFE and its components traits; 2) compare GWA inferences for rFE between base on the MT model and the RFI model; 3) demonstrate the utility of window-based inference for GWA. 4.3.1 The MT Model We propose extending our previously proposed MT model (Lu et al., 2015) to conduct GWA analyses on rFE as well as on its component traits, DMI, MILKE, and MBW. We reintroduce the MT model in Equation [4.1] for the three component traits arranged in order as i=1) MILKE, i=2) MBW, and i=3) DMI . [4.1] Now is the vector of responses for all n cows on Trait i =1,2, and 3 is a vector of fixed effects that included a linear regression on BW, station, first 76 through fourth order polynomial regressions on days in milk (DIM), parity class (i.e. primiparous versus multiparous), ration, the interaction between first through fourth polynomial terms on DIM and parity class for trait i. All covariates and dummy variables for these effects are defined by the known incidence matrix Xi. Furthermore, and represent vectors of animal genomic effects and residual effects, respectively, for trait i.ector of animal genomic effects for all three traits were - [4.2a] 77 [4.2b] [4.2c] Here denotes the genetic merit for subject i on MBW conditional on MILKE. denotes the genetic merit for subject i on DMI conditional on MILKE and MBW (i.e., DMI|MILKE,MBW), which Lu et al. (2015) have previously proposed as an alternative expression of genetic merit for rFE. Furthermore, ,, and are partial efficiencies or relationships at the genetic level and derived from the CD on u, that are used to specify these recursive relationships in Equations [4.2a]-[4.2c]. Finally,, , and with , , and defined to be distributionally independent from each other. Note that is the genetic variance for the rFE measure proposed by Lu et al. (2015). Lu et al. (2015) also demonstrated how relationships similar to those in Equations [4.2a], [4.2b], and [4.2c] can be specified at the residual level such that ,, and are analogously defined as residual partial efficiencies based on the CD on e. A classical RFI mode (Lu et al., 2015) can be written as a regression of DMI on MILKE and MBW with random genetic and residual effects . [4.3] Here denotes the vector of fixed effects for RFI with corresponding known incidence matrix XRFI. The fixed effects fitted in were the same as those fitted in from the MT model. Additionally, b1 and b2 are regression coefficients of DMI on MILKE and DMI on MBW, respectively, and define partial efficiencies at a composite phenotypic level. It is also assumed 78 that and. Lu et al. (2015) demonstrated that when partial efficiencies are identical at the genetic and residual levels (i.e. = and =), then inferences under the two competing models (RFI versus MT) should be essentially identical. 4.3.2 Single SNP Associations All (co)variance components ( can estimated using REML for both the MT ( ) and the classical RFI models (). Conditional on these REML estimates (), ixed model equations can be used to directly solve for BLUP of the animal genetic effects for each of the three component traits (i.e. ,, and ) in the MT model and for RFI (i.e., ) in the classical RFI model. Furthermore, as demonstrated by Lu et al. (2015), ,, and and the CD on u can be used to derive the BLUP of u3|1,2 (i.e., ) based on Equation [4.2c]. We partition into effects for non-genotyped () and genotyped () animals and do so similarly for , the BLUP of . As demonstrated previously by Wang et al. (2012) , the BLUP of SNP effects for trait i can be backsolved from using Equation [4.4] . [4.4] Here denote the BLUP of the SNP effects for the corresponding traits MILKE (i=1), MBW (i =2), DMI (i =3), DMI|MILKE,MBW (i =3|1,2) and RFI (i = RFI). Borrowing from Gualdron Duarte et al. (2014) , it can be readily demonstrated that where is the Bayesian posterior (co)variance matrix for conditional on , or equivalently, the prediction error (co)variance matrix 79 for conditional on based on classical linear mixed models theory (Searle et al., 1992), i.e., , or , respectively. Further details are provided in Appendix B. A test statistic used for GWA on SNP k for trait i can be written as in Equation [4.5] [4.5] Here, the denominator in Equation [4.5] is the square root of the k,k element in. Gualdron Duarte et al. (2014) and Bernal Rubio et al. (2016) demonstrated that this test statistic is equivalent to conducting GLS inference on gik while momentarily treating all other elements of gi as random, just as with EMMAX (Kang et al., 2010). That is, under the null hypothesis Ho: gik = 0, it can be demonstrated that zik ~ N(0,1). 4.3.3 Window-Based Associations Recognizing the potential issues with single SNP associations, adjusting for collinear SNP in high LD as described earlier, we also conducted window-based inferences. That is, we partitioned the genome into non-overlapping 1-Mb regions, similar to what has been done for GWA analyses in other cattle populations (Guo et al., 2012, Hawken et al., 2012, Alpay et al., 2014). This resulted in a total of R=2,673 windows. Suppose that Z can be partitioned accordingly into these R windows as with Zr having nr columns, implying then that window r contains nr SNP markers. Suppose that is similarly partitioned; i.e. such that gir is of dimension nr x 1. A joint test for all nr markers can be readily determined by computing the test statistic in Equation [4.6] [4.6] 80 where . Similar to Equation [4.5], the test statistic in Equation [4.6] allows for GLS inference on gir, treating all other SNP effects as random. That is, extending Bernal Rubio et al. (2016), under the null hypothesis Ho: gir = 0, it can be readily demonstrated that is distributed as a chi-square random variable with nr degrees of freedom. 4.3.4 Application to Dairy Consortium Data 4.3.4.1 Data Description The dataset was collected from 6937 Holstein cows involving 16 research stations or major studies from Scotland, the Netherlands, Canada, and the United States. All records on DMI, MILKE, MBW, and RFI were based on the first 28d after 50 DIM, based on the data edits described in Tempelman et al. (2015) and Lu et al. (2015). For cows having multiple lactations in the dataset, only records from the earliest lactation were used. Nine of the research stations were from the United States including Iowa State University (ISU) at Ames, IA, Michigan State University (MSU) at East Lansing, MI, the University of Florida (UF) at Gainesville, FL, the University of Wisconsin-Madison (UW) at Madison, WI, the United States Dairy Forages Research Center (USDFRC) at Madison, WI, the USDA Animal Genomics and Improvement Laboratory (AGIL) at Beltsville, MD, Virginia Polytechnic Institute and State University (VT) at Blacksburg, VA, the Dairy Research Facility at Miner Institute (MIN) at Chazy, NY, and the Purina Animal Nutrition Center (PANC) at Gray Summit, MO. Four stations/studies were from the Netherlands including an experimental herd (TGEN) in Lelystad previously described by Veerkamp et al. (2000) and Banos et al. (2012) ; the Nij Bosma Zathe (NBZ) herd located near 81 Leeuwarden and also previously described by Banos et al. (2012) ; and a compilation of studies (NLN) based on data collected from various nutritional experiments as described in Tempelman et al. (2015). Two stations derived from the UK: the Langhill (LAN) farm near Edinburgh from 1992 to 2001 upon which the herd was relocated to the Scottish Agricultural College (SAC) Dairy Research Centre based at Crichton Royal Farm near Dumfries with data collection from 2003 to 2011. Finally, the last station was the University of Alberta (UA) in Edmonton, Canada. A complete breakdown of number of animals with phenotypes and genotypes and timeline of data collection for each station is provided in Table 4.1. 82 Table 4.1: Brief characterization of dataset by stations1 1 UA = University of Alberta; UF = University of Florida; ISU = Iowa State University; MSU = Michigan State University; MIN= Dairy Research Facility at Miner Institute; UW = University of WisconsinMadison; PANC= Purina Animal Nutrition Center;USDFRC = USDA Dairy Forages Research Center; USDA AGIL = USDA Beltsville Agricultural Research Center; VT= erd, Lelystad, the Netherlands; NBZ= Nij Bosma Zathe, Leeuwarden, the Netherlands; ZOM = data based on the work of Zom et al. (2012); NLN = compilation of studies previously characterized by Tempelman et al. (2015); LAN = Langhill farm,Edinburgh, UK; SAC = Scottish Agricultural College. Area Station Number of Phenotypes Number of Genotypes Dates of Data Collection Canada UA 237 220 09/2007-06/2012 United States UF 507 377 02/2009-07/2015 ISU 952 930 05/2008-09/2015 MSU 271 264 02/2011-01/2015 MIN 57 51 02/2013-03/2013 UW 816 780 12/2007-07/2015 PANC 144 18 08/2009-12/2011 USDFRC 407 347 10/2009-04/2015 USDA AGIL 518 488 11/2007-09/2015 VT 90 54 02/2011-10/2013 NL TGEN 586 461 11/1991-04/1998 NBZ 100 91 07/2003-11/2004 ZOM 661 0 11/1991-01/2001 NLN 555 385 03/2003-03/2014 UK LAN 590 157 12/1990-03/2001 SAC 446 293 09/2003-11/2010 Overall 6937 4916 83 Genomic characterization was based on Illumina® BovineSNP50 Genotyping BeadChip, which features 54,609 informative SNP probes uniformly spanning the entire bovine genome. Genotypes were obtained on 4,917 of the 6,937 cows and pre-processed for parentprogeny conflicts checking by AGIL. These genotypes were imputed to a final set of 61,013 SNPs, including an additional 15,818 SNPs selected for the magnitude of their effect on economically important traits evaluated in the US (Wiggans et al., 2016). After editing and filtering, 57,347 SNPs were saved for the GWA analyses. The markers on the X chromosome were further partitioned into pseudoautosomal and X-specific components. 4.3.4.2 Real Data Analyses GWA analyses based on the MT and RFI models discribed above were applied to this rFE dairy consortium dataset. Variance components under both models were estimated using REML. Furthermore, BLUP and the corresponding SNP and window-based association tests, as previously described, for ,,, and under the MT model and for under the RFI model were determined as well. GWA analysis was provided for each component trait in the MT analysis. Bonferroni corrections based on a genome wide error rate of 5% were applied with the correction being based on the number (57,347) of SNP markers for single SNP associations and the number (2,673) of windows for window-based associations. Potential candidate genes located within windows deemed to have an association with the various traits were determined using the Bioconductor package biomaRt based on the UMD3.1 bovine assembly from Ensembl. 84 4.4.1 Genetic Parameter Estimates The estimated genetic and residual variances as well as the estimated heritabilities for two definitions of rFE and its component traits are provided in Table 4.2. Estimated residual and genetic covariances among the three component traits of the MT model are reported in Table 4.3. Residual variance estimates were nearly identical for RFI and DMI|MILKE,MBW such that the respective heritabilities were 0.180.04 and 0.170.03, consistent with those based on using only pedigree and not genotype information as previously reported by our group (Lu et al., 2015). The estimated genetic partial efficiencies relating DMI to MILKE and DMI to MBW were =0.420.06 and =0.110.02, respectively, whereas the estimated residual partial efficiencies relating DMI to MILKE and DMI to MBW were =0.360.07 and =0.100.02, again similar to what we previously reported(Lu et al., 2015). Table 4.2: Estimated variance components and heritability for residual feed efficiency and its component traits. 1 Milk energy 2 Metabolic body weight 3 Dry matter intake 4 Residual feed intake 5 Dry matter intake conditional on milk energy and metabolic body weight 85 Table 4.3: Residual and genetic covariances between the three component traits 1 i=1) MILKE (Milk energy), i=2) MBW (metabolic body weight), and i=3) DMI (Dry matter intake) 4.4.2 Single SNP Associations We compared GLS inferences of SNP effects for rFE between the two models (i.e. vs. ) based on the methods we described above. We noted that computed were slightly more shrunk to zero as opposed to that for (Figure 4.1). This larger spread for was likely caused by the slightly larger REML estimates of relative to . Nevertheless, they were still relatively close to and highly correlated with each other, as were diagonals of and , such that negative log10 P-values ( -log1(p) ) for GWA inference on RFI and for DMI|MILKE,MBW were in relatively good agreement (Figure 4.2). 86 Figure 4.1: Comparison of BLUP of SNP effects for RFI versus DMI conditional on MILKE and MBW. Line of slope 1 and intercept 0 superimposed. Figure 4.2: Comparison of single SNP inferences (-log10 (p)) for DMI conditional on MILKE and MBW versus those for RFI. Line of slope 1 and intercept 0 superimposed. 87 Manhattan plots for single SNP GWA inference for both rFE traits (RFI vs DMI|MILKE,MBW) are provided in Figure 4.3. There were no significant SNP associations detected for either DMI|MILKE,MBW (Figure 4.3a) nor for RFI (Figure 4.3b). For the component traits of rFE, the only significant associations were discovered for MBW (Figure 4.4b), namely two SNP markers (BovineHD0500030398 and Hapmap60480-ss46526970) located at 105,804,923bp and 105,870,613bp on BTA5 and 2 markers (BovineHD1800016753 and ARS-BFGL-NGS-109285) located at 57,516,245bp and 57,589,121bp on BTA18. However, no significant associations were determined for MILKE (Figure 4.4a) nor for DMI (Figure 4.4c). Figure 4.3: The Manhattan plots of single SNP for DMI conditional on MILKE and MBW(a) and for RFI (b). Bonferroni correction threshold superimposed. (b) (a) 88 Figure 4.4: The Manhattan plots of single SNP on MILKE (a), MBW(b) , and DMI(c). Bonferroni correction threshold superimposed. 4.4.3 Window-Based Associations The window-based Manhattan plots for RFI, and DMI|MILKE,MBW are provided in Figure 4.5. As further detailed in Table 4.4, the same 2 regions were found to be important for both RFI and DMI|MILKE,MBW, being on chromosome BTA12 centered at 2.50 Mb (BTA12_2.50) and on BTA 26 centered at 46.50Mb (BTA26_46.50). BTA26_46.50 was previously confirmed as an important region for rFE in Nelore cattle (Olivieri et al., 2014). Furthermore, Sherman et al. (2009) and Nkrumah et al. (2007) reported associations on BTA26 at 47 Mb for RFI in beef cattle, which were in good agreement with the association that we found within BTA26_46.50. (a) (b) (c) 89 Table 4.4: Physical location1 (in Mb) of significant regions detected for each trait Chr DMI3 MBW4 MILKE5 RFI6 DMI|MILKE,MBW7 5 106.50 12 2.50 2.50 14 7.50,9.50, 25.50 18 58.50,59.50 20 14.50 26 46.50 46.50 X-specific2 102.50 1Location of a window defined as center of that window 2 X-specific section on X chromosome 3 Dry matter intake 4 Metabolic body weight 5 Milk energy 6 Residual feed intake 7 Dry matter intake conditional on milk energy and metabolic body weight 90 Figure 4.5: The Manhattan plots of 1MB windows for DMI conditional on MILKE and MBW(a) and for RFI (b). Bonferroni correction threshold superimposed. We found 4 genes in the bovine genome assembly that physically overlapped with those two windows associated with rFE; these genes and a summary of their biological processes are listed in Table 4.5. These genes include disintegrin and metallopeptidase domain 12 (ADAM12, 45,848,827-46,238,138 bp). ADAM12 regulates broad biological processes, including modulation of cell morphological changes, satellite cell activation, and ectodomain shedding during signaling of muscle and fat development (Cao et al., 2003, Kawaguchi et al., 2003). The ADAM12 gene is also identified as a regulator for transforming growth factor-involved in the differentiation of human adipose tissue-derived mesenchymal stem cells into smooth muscle cells (Kim et al., 2012). Furthermore, ADAM12 regulates myogenesis and (b) (a) 91 adipogenesis in beef cattle (Coles et al., 2014) and was reported as a candidate gene for marbling score in Korean cattle (Ryu and Lee, 2016). ADAM12 was also suggested as a candidate gene for rFE in Nelore cattle (Olivieri et al., 2014, Olivieri et al., 2016). It is possible that ADMA12 gene affects rFE by directly regulating muscle development and fatty acid utilization. Thus, ADMA12 can be a promising candidate gene for rFE Table 4.5: Symbol, name, and involved biological functions of genes overlapped with 2 significant regions identified for residual feed efficiency in Holstein cows The window-based Manhattan plots for the component traits of rFE are provided in Figure 4.6. There were 2, 3, and 3 regions found to be associated with MILKE (Figure 4.6a), MBW (Figure 4.6b), and DMI (Figure 4.6c), respectively. The three detected regions for DMI were centered at 106.50 Mb on BTA5, at 58.50 Mb and at 59.50Mb on BTA18 (see Table 4.4). The detected window on BTA5 is between 2 SNPs (at 100,826,813 bp and 118,501,191 bp) detected for DMI by Veerkamp et al. (2012). The three identified regions associated with MBW were all on BTA14, centered at 7.50 Mb, 9.50 Mb, and 25.50Mb. Lee et al. (2013) detected 6 SNPs at 24-25 MB on BTA14 for carcass weight in Korean Hanwoo. Lu et al. (2013) reported 3 SNP associations at HAPMAP29891-BTC-007427 ( 8,485,892 bp) , UA-IFASA-6233 92 (10,090,807),and BTB-01532239(24,437,778 bp) in beef cattle that fall within the 3 windows detected for MBW in our study. Saatchi et al. (2014a) also reported a region at 24 Mb on BTA14 that was associated with MBW in Angus and Simmental × Angus populations. Saatchi et al. (2014b) later detected a genomic region at 24 Mb on BTA14 associated with BW across multiple beef cattle species. The 2 important regions for MILKE were centered at 14.50 Mb on BTA20 and at 102.50Mb on the X chromosome. The former window was previously reported to be associated with milk fat percentage and milk yield in dairy cattle (Plante et al., 2001, Waters et al., 2011). Figure 4.6: The Manhattan plot of 1-Mb windows on MILKE (a), MBW(b) , and DMI(c). Bonferroni correction threshold superimposed. (b) (a) (c) 93 4.4.4 General Discussion GWA analyses have been increasingly used to infer upon the genetic architecture of traits given the increasing availability of high density SNP marker panels. Our primary purpose, however, was not to just conduct GWA analyses on rFE and its component traits, DMI, MILKE, and MBW, but rather to demonstrate how the genomic architecture may differ substantially between traits that define rFE. We particularly focus on rFE(closely related to RFI) and DMI because of the current controversy on where to focus selection for improving efficiency of feed utilization (Berry and Pryce, 2014). Among genomic regions determined to be associated with the various traits using the window-based test, there was no regions in common between any of the four traits. This indicates that the genetic architecture of rFE, whether specified as RFI or as DMI|MILKE,MBW, is distinctly different from its component traits, including DMI. This result may at first seem counterintuitive since Kennedy et al. (1993) and Lu et al. (2015) demonstrated that a selection index involving DMI, MILKE, and MBW should be equivalent to a selection index on rFE, MILK, and MBW. However, because of multivariate normal assumptions on elements of u1, u2, and u3, it can be readily demonstrated that Equations [4.2a], [4.2b], and [4.2c] can be similarly expressed at the level of SNP effects. For example, by analogy with Equation [4.2c], it can be determined that such that is independent of , and also of and . Hence, GWA inferences on DMI|MILKE,MBW should be independent of inferences on DMI, MILKE or MBW simply by construction. We previously discovered quantitative genetic inferences on RFI were nearly identical to those for DMI|MILKE,MBW (Lu et al., 2015). Hence, 94 a orthogonal relationship between RFI with DMI, MILKE, or MBW similar to that between DMI|MILKE,MBW and these three component traits should nearly hold as well. In beef cattle studies, for example, no overlap was also determined between detected genomic regions for rFE and BW (Rolf et al., 2012, Saatchi et al., 2014a). The relationship suggests that GWA inferences on DMI should be somewhat driven by MILKE and MBW. As an illustration consider the 2 most significant associations, both on BTA 18, for DMI as we discovered earlier. The corresponding P-values for MILKE and MBW in those same regions were 0.05/0.008 and 0.0003/0.003. Hence, selection against DMI in those regions would have detrimental consequences for MILKE and MBW. By contrast, the 2 detected associations for DMI|MILKE,MBW, as well as for RFI, had corresponding P-values for MILKE/MBW of 0.38/0.26 and 0.32/0.10 , thereby partly confirming the genetic independence between rFE and its component traits. From the viewpoint of identifying candidate genes for maximizing efficiency, it seems that one should target attention on inferences involving g3|1,2 rather than g1. We previously noted that a general movement to more window-based inference as opposed to single SNP inference in GWA studies. Indeed, we did not detect any single SNP associations for either measure of rFE, nor for DMI or MILKE. It is possible that these traits are very complex in the sense that they are controlled by many genes each with small effects. Although we did identify two genomic 1MB regions associated with rFE based on the window test, the proportion of the genetic variance jointly accounted by those two regions was rather small (~1.1%) using the approach suggested by Fernando and Garrick (2013). Similar findings and conclusions on 95 non-detection of genomic regions for RFI in pigs and in beef cattle have been discovered (Lu et al., 2013, Do et al., 2014, Santana et al., 2014, Serão et al., 2016) . Herd and Arthur (2009) further noted that the physiological basis for feed energy utilization is a complicated process involving digestion, metabolism, activity, and thermoregulation, thereby suggesting a plethora of genes. Similarly, MILKE is a composite of milk, fat, protein, and lactose yield (Tempelman et al., 2015). GWA inferences on RFI versus DMI|MILKE,MBW were virtually identical, whether based on single SNP and window tests suggesting it may be rather inconsequential whether one chooses the MT versus RFI model for GWA. These similarities may be somewhat anticipated given the similarities in partial efficiency estimates between the genetic and residual levels for both DMI on MILKE () and DMI on MBW () as we have noted previously (Lu et al., 2015). However, we have recently noted that each of the partial efficiencies ,, , and may be heterogeneous across environments (Lu et al., 2017a) being similar to our previous findings that the partial regression parameters b1 and b2 in the RFI model of Equation [4.3] are heterogeneous across research stations(Tempelman et al., 2015). We briefly investigated whether the specification of heterogeneous station the partial regression parameters b1 and b2 to derive RFI in a manner similar to that pursued by Tempelman et al. (2015) might impact GWA inferences based on the RFI model. Figure C.1 in Appendix A provides the scatterplot of log10(p) for SNPs and window-based on the homogeneous partial regression RFI versus heterogeneous partial regression RFI models. The correlation between log10(p) values for SNPs and window-based on either model was 0.87 and 0.79. Admittedly, the detected regions between 96 the homogeneous and heterogeneous partial regression RFI models were not identical based on a formal Bonferroni test; nevertheless, both analyses models overlapped on 66% of the windows for P < 0.01. Accommodating heterogeneous partial efficiencies across research stations in a MT GWA analysis would be rather laborious using Monte Carlo Markov Chain approaches as pursued by Lu et al. (2017a). Nevertheless, these developments and others are worth pursuing in hierarchical Bayesian extensions for the MT model. For example, GBLUP assumes constant genetic variability across the genome whereas heavy tailed and/or sparse prior specifications have been known to exhibit better GWA properties (Moser et al., 2015). Similarly, one might perceive that ,, , and (i.e., pleiotropic associations) are also heterogeneous across the genome as well. It is also important to recognize that the single-step methods proposed by Wang et al. (2012) were vital to this GWA study given that 2021 of the 6937 cows with phenotypes were missing genotypes. For instance, when GWA analysis was based only using only phenotypes on genotyped cows, no significant associations were determined using window-based inference for RFI (Figure C.2bTable). Nevertheless the windows with the two most significant associations were the same in either case. A GWA study on rFE and its components traits DMI, MBW, and MILKE in dairy cattle was conducted based on a MT model involving the component traits and the RFI model. The same 2 genomic regions on BTA12 and BTA26 were concluded to be associated with rFE in both models. The BTA26 region included a putative candidate gene ADAM12 that possibly regulates 97 muscle development and fatty acid utilization. The MT model facilitated comparisons of important SNPs or regions between rFE (i.e., RFI or DMI|MILKE,MBW) and its component traits (i.e. MILKE,MBW, and DMI). GWA efforts focusing primarily on DMI will not be useful for targeting genomic regions important for rFE. 98 Chapter5: Modeling genotype by environment interaction over multiple covariates for genome-wide association analyses of residual feed efficiency in dairy cattle Residual Feed efficiency (rFE) vitally impacts the environmental sustainability of the dairy industry. Genome-wide association (GWA) analyses based on high-density genomic marker panels are useful to narrow down genomic markers or regions and, eventually the identification of potential candidate genes associated with traits of interest such as rFE. However, most such studies infer an overall effect for each marker across environments, without allowing for the fact that allelic substitution effects at each marker may be sensitive to various environmental covariates. This form of genotype by environment (GxE) interaction is plausible for complex traits like rFE, particularly for data collected over a wide range of climatic and production systems. However, as the number of plausible environmental covariates increase, such inferences can be too intractable and computationally infeasible. We tested a reaction norms (RN) model extension to an conventional genomic model (GBLUP) whereby some simplifying assumptions in the RN model allow one to tractably solve for a cumulative RN effect across multiple covariates followed by equally tractable backsolving for RN specific marker effects. Based on simulation embedded in the genotypes and design structure of our consortium data, we determined that the RN model had superior GWA properties to GBLUP at false positive rate from 0.025 to 0.15 and was able to detect RN effects. We also applied both models to the dairy rFE consortium data analyses, with environmental covariates being linear and quadratic terms on average production (AP), relative humidity (RH), temperature (TP), and the interaction between RH and TP. In a cross-validation study, the proposed RN model yielded 5% greater prediction 99 accuracy compared to the regular GBLUP analysis. Gene ELOVL4 and ATP5E were suggested as the candidate genes that have RN effects with linear term on AP and quadratic term on TP65, respectively. Our study indicated that GxE is an important genetic determinant for rFE. Residual feed efficiency (rFE) or dry matter intake (DMI) is being increasingly considered in dairy cattle selection programs. The availability of high-density single nucleotide polymorphism (SNP) marker panels has not only facilitated whole genomic prediction (WGP) for rFE or DMI but it also allows the use of genome-wide association (GWA) analyses to infer important genomic regions, and, potentially, any resident candidate genes or quantitative trait loci (QTL) that may influence these traits. Most GWA studies, however, focus on estimating the overall allelic substitution effect of any SNP marker across environments, assuming that any QTL in linkage disequilibrium (LD) with that marker has a constant effect across environments. Conversely, heterogeneity in these effects across environment, also known as genotype by environment (GxE) interaction, could be important for rFE in dairy cattle as is true with other complex traits. Advanced cattle reproductive technologies, such as artificial insemination and embryo transfer, facilitate the exchange of germplasm across many different environments such that inference and management of GxE would seem more important in dairy cattle production than in many other livestock species where such technologies are not as widely used. GxE modeling within a WGP or GWA context can lead to an extremely parameterized model if environment-specific SNP effects need to be modeled. A parsimonious strategy to model GxE is based on the use of environmental covariates. These models are often referred to as reaction 100 norm (RN) models and are based on modeling genetic or SNP effects as a function of a single covariate, often specified as the mean response for that particular environment or herd (Kolmodin, 2003). This covariate is often regarded to be a proxy for level or quality of herd management. In other words, GxE is modeled by specifying allelic substitution effects to be a linear function of a covariate(Lillehammer et al., 2009). If these SNP specific linear functions are large and important, the SNP marker is said to be environmentally sensitive. These models are being increasingly used to model GxE in WGP models (Lillehammer et al., 2009, Lopez-Cruz et al., 2015) since the dimensionality of the genomic part of the model is only doubled with a linear function on one covariate (Lillehammer et al., 2007a). A RN model is also an example of a random regression (Lillehammer et al., 2007b) or random coefficients model as the overall (i.e. average) and GxE effects for each SNP marker are modeled using random intercept and random slope specifications. There may be any number of potential covariates that could be responsible for generating GxE. However, increasing the number of such covariates, including second or higher order polynomials and interactions between covariates, may unduly increase model dimensionality as well. Recently, (Jarquin et al., 2014) proposed the use of covariance functions for modeling GxE as a function of a large number of environmental covariates. We adapt their approach and demonstrate how to extend it further to conduct GWA analyses for covariate specific components of GxE. Specifically, our objectives were to infer upon GxE for rFE and, specifically, to determine which genomic regions may be environmentally sensitive to covariate-specific values. 101 5.3.1 Conventional Genomic Prediction Model for Residual Feed Intake A commonly used response to characterize rFE is residual feed intake (RFI)(Tempelman et al., 2015). Based on definitions provided in (Berry and Crowley, 2013) and Kennedy et al. (1993), RFI is typically defined as DMI adjusted linearly for various energy sink covariates such as milk energy (MILKE), metabolic body weight (MBW), and change in body weight () over the course of the period from which DMI was recorded. Additionally specifying other effects that influence RFI, the statistical model can be specified as in Equation [5.1]. ; in [5.1] Here, the subscript i on DMI, MILKE, MBW, andindicates that records on those specific traits belong to animal i, and b1, b2 , and b3 are partial regression coefficients of DMI on ME, MBW, andrespectively. Furthermore, is a p x 1 vector of fixed effects with corresponding known row incidence matrix whereas is a vector of random SNP marker effects (i.e. random intercept) on RFI forbeing a known row incidence vector of genotypes on m markers for animal i. Even though genotypes are typically coded as the number of copies (0,1, or 2) of a reference allele, was standardized in the way implemented in VanRaden (2008). We assume throughout that and the residual . Note that RFI response for animal i, can be written as where typically b1, b2, and b3 are estimated in a first stage model to derive RFIi. In turn, RFIi. is written as a function of all 102 remaining effects in a second stage model; however, we prefer to fit all such effects simultaneously in Equation [5. Lu et al. (2015). 5.3.2 Reaction Norm Extension Involving Multiple Covariates (i.e. random slope) on - - 103 and as vectors of animal specific effects of overall and cumulative (across covariates) RN genetic merit, respectively, with where an n x 1 vector of values on covariate k. Furthermore, # denotes the Hadamard product. This covariance structure is . 104 5.3.3 Backsolving for SNP-specific Effects Using the theory of backsolving for BLUP of one set of random effects as linear function of another set within an equivalent linear mixed model (Henderson, 1977), being the BLUP of or the vector of animal effects for RN on covariate k can be determined from in Equation [5.4] using Equation [5.5]. . [5.5] Similarly, one could use the same theory to solve for or the vector of SNP RN effects specific to covariate k: [5.6] 5.3.4 Genome Wide Associations on SNP-specific Overall Effects and Reaction Norms BLUP estimates (i.e., ) by themselves do not directly indicate as to which chromosomal regions might be directly important for overall genetic or environment-specific RN effects. A common GWA inference strategy is to use the EMMAX procedure (Kang et al., 2010) whereby the marker of interest is tested using GLS, treating all other markers as random, or equivalently modeling correlation between animals based on a genomic relationship matrix constructed from all SNP markers. Recent developments described by Gualdron Duarte et al. (2014) and adopted previously by our group lead to the following GLS test statistic for testing overall genetic effects (k=0) and the RN effect for SNP j on covariate k: 105 . [5.7] Underneath the null hypothesis Ho: gkj = 0, it can be demonstrated that zkj ~ N(0,1). Using mixed model theory, and can be determined as element k,k of the expression in Equation [5.8]. . [5.8] Here is portion of the inverse of the coefficient matrix in Equation [5.4] that corresponds to the lower right portion corresponding to cumulative RN effects; i.e. [5.9] More detailed derivations can be found in Appendix D. For a GWA analysis that jointly fits all SNP markers, any given marker may exhibit only a weak association with a closely linked QTL, in part because of the high LD that exists with neighboring markers (Lu et al., 2017b). Hence, we also considered genomic based windows associations based on partitioning columns of Z, and hence elements of go, g1, g2 gc, into R distinct components or windows with each window containing a certain number of SNP markers and/or based on a genomic length. In this study, we partitioned the genome into non-overlapping 1megabase (MB) regions resulting in a total of R =2,673 windows. For joint testing Ho: gkr = 0 106 pertaining to window r for SNP effects in gk, k=0,1,2,..c, Lu et al. (2017b) extended the test from Gualdron Duarte et al. (2014) to the test statistic as in Equation [5.10] [5.10] where . [5.11] Under Ho: gkr = 0, the test statistic in Equation [5.10] is chi-square distributed with degrees of freedom nr, being the number of SNP markers in window r. Data on all key phenotypes were collected on 3,463 Holstein cows from 8 different dairy research stations including University of Alberta (AB) in Edmonton, Canada, Iowa State University (ISU) in Ames, IA, Michigan State University (MSU) in East Lansing, MI, University of Florida (UF) in Gainesville, FL, University of Wisconsin-Madison (UW) in Madison, WI, United States Dairy Forages Research Center (USDFRC) in Madison, WI, Virginia Polytechnic Institute and State University (VT) in Blacksburg, VA, and the USDA Animal Genomics and Improvement Laboratory (AGIL) in Beltsville, MD. Data editing procedures were outlined in Tempelman et al. (2015) and based on the first 28 d after 50 DIM as in Lu et al. (2017b) and Manzanilla-Pech et al. (2016). The distribution of cows in each station can be found in Table 5.1. Daily weather data including relative humidity (RH) and temperature (TP) was retrieved using the R package weatherData for all the test dates within each station. These weather records were averaged across the 28 days of data on each animal. 107 Table 5.1: Brief characterization of dataset by stations Country Station1 No. of animals USDFRC AGIL Overall 3463 1 AB = University of Alberta; UF = University of Florida; ISU = Iowa State University; MSU = Michigan State University; MIN= Dairy Research Facility at Miner Institute; UW = University of WisconsinMadison; PANC= Purina Animal Nutrition Center; USDFRC = USDA Dairy Forages Research Center; USDA AGIL = USDA Animal Genomics and Improvement Laboratory; VT= Virginia Polytechnic Institute and State University; Genomic characterization was based on Illumina® BovineSNP50 Genotyping BeadChip, which features 54,609 informative SNP probes uniformly spread over the bovine genome. Raw Genotypes were pre-(AIPL) where parent-progeny conflicts were checked. Genotypes were then increased to a final set of 61,013 SNPs, including additional 15,818 selected SNPs from bovine HD chips based on the magnitude of their effect on traits evaluated in the U.S. (Wiggans et al., 2016). Further editing and filtering were implemented to rule out SNPs with MAF <0.05 and in perfect LD which resulted in 57,347 SNPs being left. The SNPs on X chromosome were classified into two groups: pseudo-autosomal and X-specific SNPs. 108 5.3.5 Simulation Study To partly validate our proposed method and model, a simulation study was conducted based on actual genotype and data structure within our dairy rFE consortium dataset. Environmental covariates were first, second, third and fourth order polynomial terms on average contemporary group production (AP) for the contemporary group defined as a group of cows in 28-day test period within the same station. First to fourth polynomial terms of AP were standardized to mean of 0 and variance of 1. Thirty of the 57,347 markers were randomly chosen to be QTL. With columns of Z also standardized to mean 0 and variance of 1, random intercepts (i.e., QTL subset of go) was simulated from independent Gaussian distributions with mean of 0 and variance of 1000 such that variance of overall genetic merit was roughly 0.50. RN effects on first to fourth order polynomial terms of AP (i.e., c = 4) were simulated from independent Gaussian distributions with the same variance of 200 and mean of 0. Phenotypes were based on simulating breeding values on all such QTL effects plus a residual generated from a Gaussian distribution with variance = 2.0 so that the heritability under the average environmental covariate value was roughly 0.20. The heritability of rFE ranged from 0.20 to 0.46 across the values of AP. Ten replicated datasets were simulated in this manner and then analyzed based on the proposed RN model and a conventional GBLUP analyses. Variance components were estimated using REML in all cases with the R package regress. Window-based GWA tests based on non-overlapping 1MB genomic regions and the chi-square test presented by Lu et al. (2017b) were conducted for random intercepts (go) and slopes (g1, g2, g3, and g4) on each replicate using Equation [5.10]. A true positive was claimed if the 109 corresponding window including that QTL was declared to be significant based on a Bonferroni test. Conversely, a false positive was determined if a window was deemed to be statistically significant but did not include a QTL. The rates of true and false positives were used to compare receiver operating characteristic (ROC) curves averaged across the 10 replicates between the proposed RN model and conventional GBLUP. The area under the ROC curve (AUC) compared between two models at various false positive rates (FPR) (Wray et al., 2010) based on a nonparametric Wilcoxon Signed-Rank test, blocking on data replicate. 5.3.6 Application to RFI Data The proposed RN model was applied to the joint dairy rFE dataset described above. For modeling RN on milk production, it has been dbe best suited to model the relationship between milk yield and TP, recognizing that heat stress and hence potential GxE effects due to TP might not occur until beyond a certain breakpoint. This breakpoint has often deemed to be around 72ºF in various studies involving milk yield(Bohmanova et al., 2007, Aguilar et al., 2009, Ludovico et al., 2015). Below this breakpoint up until cold stress, which is not likely a major consideration with intensively managed research facilities in this study, animals are believed to be in a thermoneutral environment by which there should be little, if any, regulatory changes in metabolic heat production or evaporative heat loss . We decided to infer upon this breakpoint via a series of 8 RN models/analyses whereby 6 of the models were labeled TP55, TP60, TP65, TP70, TP75, TP80, for which TP breakpoints on RN specifications ranged from 55 to 80 ºF in 5 ºF increments, respectively. The other two models were TPN where no breakpoint was specified and TPG as a conventional GBLUP model. 110 Specifically, environmental covariates used for SNP-specific RN effects for each of 8 RN models included linear and quadratic terms on corresponding TP (i.e., TPN, TP55, TP60, TP65, TP70, TP75, or TP80), AP, RH, and crossproduct between corresponding TP and RH. Thus, a total of c=8 environmental covariates were considered for any RN analyses on RFI. The vector of fixed effects in Equation [5.3] for the RN models included the effects of stations, rations within stations, first through fourth order polynomial effects of DIM, parity effects, the interaction between first through fourth order polynomial effects of DIM and parity effects, and fixed environmental covariates. To facilitate a likelihood ratio test (or equivalently information criteria like assessments based on REML), the same fixed environmental covariates were specified in for all 8 candidate models. These fixed environmental covariates included linear and quadratic effects on RH and AP and 6th order Bspline functions on each of TP and TP*RH, the linear by linear crossproduct of TP and RH. All variance components were estimated by REML. SNP-specific RN effects for each environmental covariate were back-solved using Equation [5.6] as previously noted and single SNP hypothesis tests were conducted. We subsequently investigated inferences for non-overlapping 1MB windows based on the chi-square test in Lu et al. (2017b) as shown in Equation [5.10]. Association between regions and the trait of interest were declared if the test was significant based on a Bonferroni adjustment for the number of SNP markers for single SNP tests or number of windows for window-based tests. Potential candidate genes located within windows deemed to have RN with the various environmental covariates were determined and 111 their gene function information were retrieved using the Bioconductor package biomaRt based on the UMD3.1 bovine assembly from Ensembl. As a comparison between RFI and DMI, we also fitted the proposed RN model on DMI. The difference between the analyses on RFI and DMI was simply that the covariates ME, MBW, and were not fitted in RN model in Equations [5.2-5.3]; i.e., . All other effects were the same as those considered for RFI. 5.3.7 Cross-validation Assessment In order to compare the performances between the best fitting RN model (based on the chosen breakpoint for TP) and the regular GBLUP model for rFE, a 5-fold cross-validation was performed. DMI prediction accuracy determined as the correlation between true DMI and predicted DMI in the validation dataset based on estimates derived from the analyses of the training dataset was computed and compared to determine the performances of two models. 5.4.1 Simulation Study ROC curves for the random intercepts or overall SNP effects (go) based on the RN model and the conventional GBLUP were plotted for 1MB windows (Figure 5.1). Generally, the ROC curve for the RN model straddled the ROC curve for the GBLUP model although the ROC curve for the RN model tended to exceed that for the GBLUP model for false positive rate (FPR) rates between 0.025 and 0.30. To formally test if difference was significant, we calculated the AUC for FPR at 0.01, 0.025, 0.05, 0.15, 0.30, 0.40, and 0.50 for the regular GBLUP and the RN model. The nonparametric Wilcoxon Signed-Rank test on AUC implied that the RN model significantly 112 increased QTL detection power at FPR ranging from 0.025 to 0.15 (see Table 5.2); nevertheless differences were small. Table 5.2: Areas under receiver operating characteristic curve (AUC) between the classic GBLUP and the reaction norm (RN) model at different false positive rates. 1 Based on the nonparametric Wilcoxon Signed-Rank test on the AUC Figure 5.1: Receiver operating characteristic (ROC) curves for random intercept for 1Mb non-overlapping windows based on the reaction norm (RN) model and the classic GBLUP averaged across 10 replicates. Line of slope 1 and intercept 0 superimposed. The ROC curve for RN inferences on first to fourth order polynomial terms on AP are also provided in Figure 5.2. Generally, the ROC curves for the RN effects on all four polynomial 113 terms of AP were above the reference line with slope 1 and intercept 0. This reference line is expected if QTL windows for RN effects are simply guessed. The nonparametric Wilcoxon Signed-Rank test on AUC at FPR ranging from 0.01 to 0.50 confirmed that RN effects on all four polynomial terms of AP were significantly different from reference line (P<0.01), indicating that inferences on windows containing RN effects are at least superior to random guessing. Furthermore, there seems to be no evidence of differences between AUC for ROC curves for any of the four RN polynomial specifications. Furthermore, AUC were generally much smaller for RN inferences (Figure 5.2) compared to random intercept or overall genetic effects (Figure 5.1). Figure 5.2: Receiver operating characteristic (ROC) curves for random slopes on first to fourth order polynomial terms of AP for 1Mb non-overlapping windows based on the reaction norm (RN) model and the classic GBLUP averaged across 10 replicates. Line of slope 1 and intercept 0 superimposed. 114 5.4.2 Real Data Analyses To determine an appropriate breakpoint for TP on RFI and DMI, variance components and LRT statistics, i.e. -2 times the difference in log REML likelihood between full models in TPN and TP55-TP80 and reduced model in TPG, were calculated and summarized in Tables 3 and 4. Compared to the regular GBLUP model in TPG, the 7 tested RN models involving various breakpoints on TP all significantly fitted better than GBLUP for both RFI (Table 5.3), and DMI (Table 5.4), with seemingly greater statistical significance of RN on RFI compared to DMI based on the relative magnitude of the LRT. Table 5.3: Estimated variance components and likelihood ratio test for residual feed intake based on the reaction norm (RN) models with various breakpoints on temperature (TP). 1 Likelihood ratio test , relative to TPG model 115 Table 5.4: Estimated variance components and likelihood ratio test for dry matter intake based on the reaction norm (RN) models with various breakpoints on temperature (TP). 1 Likelihood ratio test , relative to TPG model Model TP65 was the best fitting model for RFI having the largest LRT statistic relative to GBLUP suggesting a broken stick RN relationship for TP at a breakpoint of 65ºF. This thereby suggests that GxE as a function of TP is important above 65ºF. For the best fitting RN model for RFI (i.e. TP65), Given that genotypes are standardized in the manner described previously, the diagonals of ZZvely close to 1 (0.913-1.162), implying that the diagonals of are relatively close to the diagonals of which are simply the sum of the squared environmental covariates for any one animal; i.e., element i of DD. So for a genomically non-inbred animal (i.e. corresponding diagonal element of ZZthe heritability based on this model is defined to be . This model further constrains the heritability to be lowest in an average environment (i.e., such that and subsequently ). Based on the estimates provided in Table 5.3, this implies that the estimated heritability for rFE at an average 116 environment, consistent with what we have reported previously (Lu et al., 2017b). The highest heritability determined for RFI is for at the most extreme environment. 5.4.3 Genome-wide Association Analyses for RFI Random Intercept: Genome-wide association analyses for RFI using both the single SNP and 1MB window-based inference strategies were conducted under the best fitting RN model (i.e., TP65) as well as GBLUP (i.e., TPG). Firstly, there was no evidence of an association for overall genetic effects (go) based on neither single SNP (Figure D.1 in Appendix D) nor window-based inferences (Figure D.2 in Appendix D) whether using the conventional GBLUP model or the TP65model. Furthermore, we noted that there was little change in log10(P-values) between either model for either of the two types (single SNP versus window-based) of inferences on random intercept go (Figure 5.3). Figure 5.3: Comparison on -log10(p) of SNP (a) and 1Mb window (b) for random intercept between the GBLUP and the reaction norm (RN) model. Line of slope 1 and intercept 0 superimposed. 117 We also inferred SNP-specific and window-specific associations for RN on each environmental covariate or polynomial or interaction thereof. Manhattan plots for SNP-specific RN effects (i.e. random slopes) on linear and quadratic terms of TP65, RH, AP, and interaction between TP65 and RH were provided (see Figure D.3-D.4 in Appendix D). For SNP specific associations, there was only 1 detected SNP (Hapmap58910-rs29012613, 20,171,743 bp) on BTA 9 pertaining to the linear RN on AP (see Figure D.3). Manhattan plots for window-specific RN effects on various environmental covariates are also provided (see Figures D.6-D.8 in Appendix D). A region at 20.5 Mb on BTA9 was found to have a significant association for linear RN effect of AP whereas a region at 82.5 Mb on BTA5 was associated with a quadratic RN effect of AP (see Figure D.6a and Figure D.6b). Two windows on BTA13 centered at 49.5 Mb and 58.5 Mb, were associated with the quadratic RN effect of TP65 (Figure D.7b) whereas no such region was detected to be associated with the linear RN term of TP65 (Figure D.7a). Furthermore, there was no detected important region having RN association for linear and quadratic terms of RH nor interaction between TP65 and RH (see Figure D.8). A few candidate genes in the regions that identified having substantial RN effects with corresponding environmental covariates were retrieved and are listed in Table 5.5. 118 Table 5.5: Symbol, name, and biological process of genes have reaction norm (RN) effects with temperature breaks at 65F (TP65) and contemporary group milk production (AP) for the residual feed intake - 5.4.4 Cross-validation Accuracy We computed the accuracy of DMI prediction for both the regular GBLUP model and the best fitting RN model. Prediction accuracy for DMI defined as the correlation between true and estimated DMI in validation dataset for each folder was listed in Table 5.6. Prediction accuracy for DMI in the simplified RN model was generally 5% higher compared to that based on the regular GBLUP (P<0.01). 119 Table 5.6: Cross Validation Prediction Accuracies for each of 5 different validation datasets (fold) under the regular GBLUP and the reaction norm (RN) model. In this study, we considered the high dimensional RN model proposed by Jarquin et al. (2014) to investigate GxE for RFI over linear and quadratic terms on TP, RH, AP, and interaction between TP and RH. These covariates are commonly considered in GxE studies in plant and animal breeding (Hayes et al., 2016). We did not discover any significant SNP for overall genetic effect (i.e. random intercept). Similarly, there have been other studies failing to detect any genomic associations with rFE suggesting that it is a rather complex trait (Lu et al., 2013, Do et al., 2014, Santana et al., 2014, Serão et al., 2016). We also detected no significant associations for overall genetic effects based on window-based associations. This result was unlike what we have determined previously (Lu et al., 2017b). However, that previous work entailed the use of single-step GBLUP techniques (Aguilar et al., 2010) which permit the incorporation of data on non-genotyped cows. Furthermore, that previous analysis also included cows from countries other than U.S. and Canada; we were not able to obtain weather information on data from research stations in the United Kingdom and in the Netherlands. 120 We also noticed that inferences on overall or random intercept effects whether based on single SNP or window-based associations differed very little between conventional GBLUP and the RN models. This might imply that whether or not one fits RN effects does not weaken the ability to detect QTL effects averaged across environments. Noting that cross validation indicated that our RN model yielded higher prediction accuracies for DMI compared to the regular GBLUP, GWA inferences lead to the detection of four regions having significant RN effects with environmental covariates AP or TP65. These results suggest these genomic regions are environmentally sensitive to these covariates for their effect on rFE. Among genes identified in those regions, the two most promising candidates seemed to be gene ELOVL Fatty Acid Elongase 4(ELOVL4, 19,842,903-19,881,703 bp) and ATP synthase subunit epsilon (ATP5E, 57,866,138-57,869,829bp). ELOVL4 has been related to the biosynthesis of fatty acids, while gene ATP5E controls mitochondrial ATP synthesis (Mayr et al., 2010). Fatty acids are important in feed digestion and metabolism and control body weight and insulin sensitivity (Canfora et al., 2015), whereas ATP plays important roles in intracellular energy transfer and involves in many biological processes including growth and maintenance (Maruyama, 1991, Tort et al., 2011). The RN model proposed by Jarquin et al. (2014) provided an efficient approach to investigate GxE under multiple environmental factors for WGP and GWA studies. It is much more computationally feasible compared to the classical RN model, thus widely used in plant and human genomic research (Crossa et al., 2016, De Coninck et al., 2016, Vazquez et al., 2016). Note that if all those random RN slopes are random draws from an identically and independent 121 normal distribution and zero correlations are specified between random intercept and RN slopes, then Jarqui. Note that the common variance for all RN slopes is a rather strong assumption. The natural follow-up question is what are the implications for GWA inferences if, as expected, variance components are rather heterogeneous across different RN covariates, including different polynomials or interactions? We speculate that the implications may be not so serious at first glance, although admittedly the seriousness of this issue will depend greatly on the degree of heterogeneity in variance components between RN terms. Firstly, as indicated earlier, it is important to invoke standardizations on covariates so that scale dependence is alleviated, and this is what we did in this study. Secondly, our GWA inferences are based on test statistics which are equivalent to GLS test statistics. That is, these tests specify the single SNP effect or joint SNP effects in a window fixed effects in hypothesis testing. Hence the mis-specification of variance components should only misalign somewhat the specification of genomic-based kinship covariances between individuals in a manner similar to that used in EMMAX, noting that one woul., ) with th One possible strategy to allow for covariate or RN specific variance components is to fit each covariate separately in a series of single-covariate RN analysis and estimate the covariate-specific RN variance component for covariate k for each of c separate single covariate RN analyses. Upon collating the corresponding estimates , kc across these analyses, one might then jointly fit all covariate-specific RN effects in the J. Admittedly, this might be considered a rather ad hoc 122 solution. A potentially more elegant yet computationally demanding method could be based on Bayesian variable selection. Bayesian variable selection approach automatically allows difference variances on those individual RN terms and performs shrinkage on those less important RN terms so that no preselection on the environmental covariates is needed. Some of this work has been already been done in the context of single RN covariate specifications (Yang et al., 2015). For high dimensional environmental covariate modeling, it seems rather challenging to estimate unique variance components and/or covariances between all possible RN terms in somewhat finite datasets. One conceivable strategy might be to extend the model with augmented variables such that there is a borrowing of information across RN specific variance and covariance components using hierarchical Bayesian modeling(Bello et al., 2012). Abetween any of go, g1, g2 gc. With doing so, this invokes the dubious assumption that the total genetic variance and hence, heritability, is lowest at the point where is smallest. However, it is not necessarily logical to presume that this should even be so. Perhaps one strategy to help circumvent this issue is to determine the environments (and covariate values thereof) where heritability is estimated to be lowest in a series single covariate RN analyses and specify these values as the reference values for which the standardized covariates are expressed covariance between intercept and every single slope and fit that to solve the mixed model equation of Equation [5.4] but that may be as dubious as an assumption as specifying zero covariance between intercept and every single slope. 123 Finally, -step GBLUP analyses to include information on non-genotyped animals as well. Furthermore, the analyses in this chapter were based on the RFI phenotype whereas we have highlighted a multiple trait approach to modeling rFE in this dissertation (Lu et al., 2015). The to the multiple trait approach is possible although that might potentially entail specifying RN on all of the component traits which could be somewhat laborious. We extended the high-dimensional environmental covariate RN model from Jarquin et al. (2014) to investigate which genomic regions might be contributing to environmentally sensitive to various covariates for rFE. We were able to demonstrate using a simple simulation study that this model is able to detect associations for various orders of polynomial RN effects although the impact of detection of causal variants for overall genetic effects was very slight relative to a conventional GBLUP model. We conducted GWA analyses on a dairy feed consortium study using inferences based on both single SNP marker and 1 MB window associations. We detected no association for overall genetic effects. However, two windows on BTA9 and BTA5 were found to have significant associations with the linear and quadratic RN on AP, respectively, and two genomic regions on BTA 13 were associated with the quadratic RN effects of TP65. Cross-validation confirmed that GxE plays an important role for influencing rFE. However, this model was based on some simplifying assumptions that should be extended with future methodological research. 124 Chapter6: Conclusions and Future Work The primary focus of this dissertation has been to develop meaningful methodological extensions, whether based on either classical mixed models theory or on hierarchical Bayesian inference, in order to develop a much stronger statistical framework for the quantitative genetic modeling of residual feed efficiency (rFE) in dairy cattle than what has been considered in previous research. These extensions are timely since there are simultaneous global research efforts in collecting dry matter intake (DMI) data for the purpose of improving the long-term economic and environmental sustainability of dairy cattle production. One such project is the USDA--NIFA Grant # 2011-68004-30340) which has funded this effort and provided the data for this dissertation. This data consists of DMI, body weights (BW) and milk production and its components collected between 50 and 200 days in milk (DIM) on nearly 7,000 cows from 16 research stations in 4 countries. In Chapter 2, we developed a genetic measure of rFE based on a multiple trait (MT) analyses of DMI, metabolic body weight (MBW = BW0.75), and milk energy (MILKE) based on a linear rFE is residual feed intake (RFI) which is based on the estimated residual from fitting DMI as a linear function of MILKE and MBW as well as change in body weight throughout the recording period. The RFI is then typically modeled as a function of genetic effects in a second stage model although we have made the case that RFI inferences can be based on a single-step model (Lu et al., 2015). We demonstrated in Chapter 2 how the RFI modeling strategy assumes that the partial relationships between DMI and MILKE and MBW are the same at both genetic and non- 125 genetic levels. Conversely, our proposed MT model, invoking the use of the Cholesky decomposition on the genetic and residual variance-covariance matrices, allows these partial relationships to be specified differently for genetic and residual levels. The potential impact of large discrepancies between genetic and residual partial relationships was demonstrated by simulation such that the MT model can lead up to 4% greater accuracy of genetic prediction. However, there appeared to be no appreciable difference in fit between the MT and RFI model when applied to the USDA-NIFA feed efficiency project data. Heritability estimates (± standard error) were also similar for rFE measures under both models, being 0.14 ± 0.03 under the MT model and 0.16 ± 0.03 using the RFI model. Nevertheless, the MT approach is promising in the sense that it could handle missing data; this is a non-trivial consideration for future genetic evaluations of rFE knowing that the number of records on DMI will generally be far less than MILKE, for example. This should then provide a more promising approach to provide genetic merit on rFE for all US dairy cows than the current ad hoc strategy consider in the Total Performance Index (TPI) by Holstein Association USA which bases rFE on a linear formula involving milk produced and DMI (http://www.holsteinusa.com/genetic_evaluations/ss_tpi_formula.html). In fact, we also demonstrated in Chapter 2 that once economic weights (i.e. revenues and/or costs) of DMI, MBW, and MILKE are known, it is fairly straightforward to determine the relative economic weightings of rFE to MILKE for example. Given that Tempelman et al. (2015) discovered that the partial relationships relating DMI to MILKE and MBW are heterogeneous across research stations in RFI modeling, we extended the MT model in Chapter 3 using hierarchical Bayesian analyses to allow for multifactorial modeling of these partial relationships at both the genetic and residual level as a function of 126 various management and environmental factors, including station, parity, DIM, and ration. Data on all components of feed efficiency (i.e. DMI, MILKE, MBW, and 28 days after 50 DIM for each cow within the USDA-NIFA study. Results clearly indicated that genetic and residual variation in rFE is highly heterogeneous across stations, parity, or diet such that the heritability for rFE ranged widely from 0.16 to 0.46. We also found that heritability for rFE in primiparous cows was nearly twice of that in multiparous cows. Although there was some evidence of heterogeneity across stations for partial relationships between DMI and MILKE and DMI and MBW at the residual level, there was no such evidence at the genetic level. At any rate, modeling these different layers of heterogeneity contributed to a 4% higher prediction accuracy on DMI compared to the standard specification from Chapter 2. We then investigated a comparison of the MT versus RFI approaches for genome wide association (GWA) analyses in Chapter 4. For both approaches, we adopted the single-step genomic BLUP approaches developed by Aguilar et al. (2010) using the Illumina® BovineSNP50 Genotyping BeadChip panel and 28-day records from the USDA-NIFA FE dataset. Inferences were conducted based on single SNP tests and 1 MB window-based joint tests on multiple SNP markers, the latter used to mitigate the impact of multicollinearity due to linkage disequilibrium. For either modeling approach, no single SNP associations with rFE were detected whereas the window-based approach led to the detection of 2 genomic regions within BTA12 and BTA26 significantly associated with rFE. Nevertheless, the MT approach provided important insights that GWA inferences on DMI and on rFE are not synonymous as we provided evidence that the two most significant GWA associations for DMI were partially driven by mild pleiotropic associations with MILKE and MBW whereas GWA inferences on rFE are completely different from the three other traits. 127 It is possible that any number of environmental covariates could be drivers for genotype by environment (GxE) interactions for economically important dairy traits such as rFE. These sorts of GxE inferences are typically based on reaction norm (RN) models whereby the allelic substitution effect for a particular SNP marker is modeled to be a linear function of a covariate (or polynomial or interaction thereof). Hence, it might be useful to determine specifically which SNP markers and/or regions are associated with these particular GxE covariate drivers so that we can identify genomic regions that are environmentally sensitive. We adapted the model proposed by Jarquin et al. (2014) which allows a large number of such covariate driven RN terms to be fitted simultaneously based on some simplifying assumptions; namely that the RN variance is the same for each covariate and that there is no covariance between the overall average SNP effect and its RN. After validating GWA and various levels of environmental sensitivity inferences from this model using a simple simulation study, we applied it to the North American subset of the USDA-NIFA consortium dataset. Environmental covariates considered were various terms involving temperature (TP), relative humidity (RH), and average contemporary group milk production (AP). In the GWA studies on GxE effect, 4 regions were found that were environmentally sensitive to various polynomials in TP or AP. The importance of modeling GxE as a function of RN on these covariates was further established as the GxE RN model yielded a 5% higher cross validation prediction accuracy on DMI compared to the conventional GBLUP model. We confirmed that rFE is a heritable trait such that it is possible to improve it by genetic selection. Furthermore, our proposed MT model offers an innovative way to incorporate rFE in the dairy breeding plan. We proposed to simultaneously select for DMI, MBW, and MILKE with 128 recognition that multiple-trait selection on DMI, MBW, and MILKE is the same as multiple-trait selection on rFE, MBW, and MILKE (Kennedy et al., 1993). We subsequently determined that the nature of the genetic and non-genetic relationships among the component traits of rFE is highly heterogeneous and subject to different environmental and management conditions. Furthermore, within a GxE context, we again confirmed the importance of herd management impacting allelic substitution effects, and hence heterogeneity of genetic variance. The importance of management practices on influencing genetic variability, and hence estimated genetic merit, for various dairy production traits has been made in other studies as well (Bello et al., 2012, Martin-Collado et al., 2015). Our GWA results suggested that rFE is likely controlled by many genes with small effects making it rather challenging to target specific genomic areas for special attention. Although we identified two regions having significant associations with rFE based on the window-based test, it appeared that those two regions only contribute to a very small proportion of the total genetic variance (1.1%). Thus, I personally believe it may not be beneficial to conduct further functional experiments to locate candidate genes within these two regions. We also noted that GxE and management driven heterogeneities play an important role in genetic regulation of rFE, thereby adding to the genetic complexity of this trait. This dissertation has addressed several important issues with various integrated chapters or studies although it might be apparent that there may be a need to jointly infer some of the various phenomena considered in this dissertation. For example, the heterogeneous variance and partial regression modeling at the genetic and residual levels in Chapter 3 should not be considered really distinctly different from the GxE modeling considered in Chapter 5 since GxE generates 129 heterogeneous genetic variances across environmental covariates. However, the fact that both models are already highly parameterized may require some judicious approaches to model development; for example, perhaps heterogeneous genetic variance and partial relationship modeling might be specified only at the level of the overall (i.e. random intercept) genetic effects in the RN based GxE inferences. Furthermore, we note that others (Aggrey and Rekaya, 2013, Saatchi et al., 2014a) have additionally sought to determine whether or not there is genetic variability in the partial relationships of DMI with MILKE and with MBW. Again, it may be difficult to accommodate many more extensions without increasing dataset sizes. Except for Chapter 3, the bulk of this work has been based on the implementation of classical mixed models theory; i.e., BLUP based on REML (Chapters 2, 4, and 5). We recognize that Bayesian regression methods, especially those based on heavy-tailed or variable selection specifications (de los Campos et al., 2013) may have advantages over the BLUP assumption of normally independently and identically distributed SNP effects. Implementation of these procedures using Markov Chain Monte Carlo methods can be computationally onerous and plagued by numerical instabilities; e.g. poor mixing, especially in high dimensional models (Tempelman, 2015). However, if conducted conscientiously, these Bayesian regression procedures can augment prediction accuracies as well as accuracies of GWA inferences (Moser et al., 2015). For example, although Bayesian regression strategies have already been proposed for single covariate RN models(Yang et al., 2015), they conceptually could be extended further to select covariates for higher dimensional GxE analyses as we conducted in Chapter 5. Bayesian inference may be particularly useful to extend the distributional assumptions of the with a prior distribution. 130 It is also important to recognize that the concept of an RFI-like rFE measure as I have investigated in this dissertation is not necessarily synonymous with overall feed efficiency or economic efficiency. Selection index for dairy breeding should include component traits in order to favor the dilution of maintenance and decrease feed intake. In an addition, fixed overhead costs (e.g. stall or pen space) are not adequately represented when penalizing high producing cows that have a much higher DMI consumption than anticipated (i.e. highly positive RFI or smaller feed saved). With increasingly better ways to quantify all input costs, including feed, for the individual cow, it is necessary to integrate the economic and genetic efficiency of dairy production, and promote novel economic evaluations of rFE that are not just based on minimization of feed costs. Future work on the genetics-economics interface of rFE is then vitally important. This should include more optimal herd management, including grouping strategies, to maximize milk output relative to feed and other input costs of which the genetic potential of RFI-like measures is only a small component. 131 APPENDICES 132 Supplementary Figures and Tables Table A.1: True values for each of seven key parameters in 162 run response surface design simulation study. 133 Table A.1: (con 134 Table A.1: 135 Table A.1: 136 Derivation of Full Conditional Densities for MCMC implementation Much of the following developments closely follow the Appendix in Bello et al. (2010) noting that their developments were based on a 2 trait rather than a 3 trait system. FCD of and u: We write the entire dataset on all subjects as . Furthermore, write the overall fixed and random effects design matrices as and , respectively. We also define , noting that is ordered by traits within animals and denotes the direct sum operator (Searle, 1982). Hence, it should then be readily noted that. We similarly define with inverse noting that can be readily determined by merely rearranging elements of by animals within traits rather than by traits within animals. One might specify subjective multivariate normal prior densities on fixed location parameters for each trait: , with hyperparameters and being specified as known. Bounded uniform priors are also commonly considered, i.e., 137 (Sorensen and Gianola, 2002) as, typically, enough data is available to infer upon elements of with any reasonable noninformative prior distribution in large field studies (Gelman, 2006). Using (Sorensen and Gianola, 2002) that the joint FCD of is multivariate Gaussian: [B1] for , , and . There are a number of different alternative strategies for sampling from elements of , including single site or univariate Gibbs updates and block sampling strategies that exploit the sparsity (i.e., high frequency of zero elements) in as we invoked in this study. Note then that draws of breeding values for the three traits,, , and can readily be determined as , and , respectively, for A = CCC being the Cholesky decomposition of the relationship matrix A. Furthermore, draws of can then simply be determined as a vector with elements whereas draws of can be determined as a vector with elements noting that Similarly, draws of can be determined as whereas draws of are determined as . FCD of,,. 138 Note from Equations [2.2], [2.3b] and [2.4b] that one could rewrite the model for Trait # 2 (i.e. MBW) as follows: [B2] where and and all other terms defined as before. We rearrange Equation [B2] to be a linear model in and hence of and using Equation [2.5]: [B3] It can then be readily demonstrated using Equation [B5] that the joint FCD of is multivariate normal just like Equation [B1] except one substitutes for W, for , for , for , and for y noting that , and . Similarly, one could rearrange [B2] to be a linear model in and hence of and using Equation [2.6]: [B4] 139 such that the joint FCD of is multivariate normal just like in [B1] except one substitutes for W, for , for , for , and for y noting that , and . FCD of,,,, and Note from Equations [2.2], [2.3c] and [2.4c] that one could rewrite the model for Trait #3 (i.e. DMI) as follows: [B5] where , , , and all other terms defined as before. We rearrange [B5] to be a linear model in and hence of and using Equation [2.5]: [B6] It can then be readily demonstrated that the joint FCD of is multivariate normal just like in [B1] except one substitutes 140 for W, for , for , for , and for y. Similarly, one could rearrange [B5] to be a linear model in and hence of and using Equation [2.6]: [B7] such that the joint FCD of is multivariate normal just like in [B1] except one substitutes for W, for , for , for , and for y noting that , and . We rewrite [B5] to be a linear model in and hence of and using Equation [2.5]: [B8] It can then be readily demonstrated that the joint FCD of is multivariate normal just like in [B1] except one substitutes 141 for W, for , for , for , and for y. Finally, one could also rearrange [B5] to be a linear model in and hence of and using Equation [2.6]: [B9] such that the joint FCD of is multivariate normal just like in [B1] except one substitutes for W, for , for , for , and for y noting that and . FCD offor r =21,31,32 The FCD for for r =21,31,32 can be determined as follows: 142 Note that if as adapted in this paper, then the FCD of is Scaled Inverted Chi-Square with parameters and . If is a proper scaled inverted chi-square density, then it could be readily shown that is scaled inverted chi-square distributed as well. FCD of where r=21,31,32 The FCD for for r =21,31,32 can be determined as follows: Note that if as adapted in this paper, then the FCD of is Scaled Inverted Chi-Square with parameters and . If is a proper scaled inverted chi-square density, then it could be readily shown that is scaled inverted chi-square density as well. FCD of and for s = 1, 2|1, and 3|1,2. Equation [2.7] can be rewritten as noting that and 143 and noting that elements of are typically either 0 or 1, then the FCD of can be written as: In other words, the FCD of can be demonstrated to be Inverse Gamma with parameters and. Note that I(.) refers to the indicator function which is equal to 1 if the condition defined therein is true; otherwise I(.) = 0; furthermore, ; i.e., the number of animals that are directly connected to . Using similar developments, it can be also demonstrated that the FCD for with a flat prior () is also inverse Gamma distributed with parameters: and 144 where ; i.e., the number of animals that are directly connected to . FCD for and for s = 1, 2|1, and 3|1,2. Equation [8] can be rewritten as noting that and prior of and noting that elements of are typically either 0 or 1, then the FCD of can be written as: 145 which can be demonstrated to be Inverse Gamma with parameters and . Note that I(.) refers to the indicator function which is equal to 1 if the condition defined therein is true; otherwise I(.) = 0; furthermore, . Using similar developments, it can be also demonstrated that the FCD for is also inverse Gamma distributed with parameters and FCD of Since this FCD is not recognizable, a Metropolis Hastings step is used here; further details are provided in Bello et al. (2012) FCD of 146 Since this FCD is not recognizeable, a Metropolis Hastings step is also used here Supplementary Figures and Tables 147 Table B.1: Posterior inferences characterizing heterogeneity of genetic and residual partial efficiencies of MBW1 on MILKE2 - - - - - - - - - - - - - -- - - - - - - - - - - 1 Metabolic body weight (in Kg) 2 Milk energy (in Mcal) 3 Posterior mean (± posterior standard deviation) 4 95% Highest posterior density interval 5 Effective sample size 148 Table B.1: 6 Station levels characterized in body of manuscript. Table B.2: Marginal Mean and Variance Component Inferences of Genetic and Residual Variance Components of MILKE1 1 Milk energy (in Mcal) 2 Posterior mean (± posterior standard deviation) 149 Table B.2: 3 Estimates not sharing the same letters within factors are statistically different (P < 0.05). 4 95% Highest posterior density interval 5 Effective sample size 6 Station levels characterized in body of paper. 7 Coefficient of variation of ration within station specific genetic and residual variance components. 150 Table B.3: Marginal Mean and Variance Component Inferences of Genetic and Residual Variance Components of MBW1| MILKE2 1 Metabolic body weight (in Kg) 2 Milk energy (in Mcal) 3 Posterior mean (± posterior standard deviation) 4 Estimates not sharing the same letters within factors are statistically different (P < 0.05). 5 95% Highest posterior density interval 6 Effective sample size 7 Station levels characterized in body of paper. 8 Coefficient of variation of ration within station specific genetic and residual variance components. 151 Derivations where and And one may also absorb the fixed effects too: where such that . Then Cgg part can be computed as And we already know that and where and are the j,j diagonal element of and Cgg is the prediction error variance of being the GBLUP estimate for SNP j. Consider the MT (multiple-trait) model on MILKE, MBW, and DMI , and the BV estimates and variance genetic and residual covariance matrixes between three traits are 152 It just happens that . Furthermore, . and The mixed model equation for MT model is: such that the inverse of the mixed model equations can be written as follows: 153 . Of course, the key part of that inverse pertains to the random effects for three traits which we illustrate below: Suppose one wants to backsolve for solutions of g using breeding value estimates from genotyped animals. Refer to these genotypes as Z and the corresponding breeding values from the genotyped animals as on the three traits. Then, it can be established that: 154 such that 155 . So the so- is based on dividing an element in that vector by the square root of the corresponding diagonal element in the big expression for above. In an anther word, test statistics for SNP k on trait i is , where is the SNP estimates on trait i and the denominator is the square root of the k,k element in. It can be proved that zik ~ N(0,1). Now our proposed rFE trait based on Cholesky decomposition is: .The corresponding variance covariance matrix of can be then determined as follows: For the element i of , its variance then really turns out to be : 156 Hence the fixed effects test is based on the square root of this expression as the standard error. Now, for the window-base test on window r covering SNPs j to k on trait i , the chi-square test statistics for that window will be : where and are the corresponding parts in and for SNPs j to k. statistics will be compared to a Chi-square distribution with df being number of SNPs in window r. Supplementary Figures and Tables Figure C.1: Scatterplot of log10(p) for SNPs (a) and windows (b) based on the homogeneous partial regression RFI versus heterogeneous partial regression RFI models. Line of slope 1 and intercept 0 superimposed. (b) (a) 157 Figure C.2: Manhatten plots for windows on RFI based on all cows (a) and genotyped cows (b). Bonferroni correction threshold superimposed. (a) (b) 158 Derivations Assuming environmental covariates were fitted as fixed effects, a general genomic prediction model with GxE effect can be written like this: for and being vectors for SNP-specific random intercept and slope, respectively. Suppose that . Now, we rewrite and as being the genomic breeding values for intercept and slope such that . variances are the same; i.e., for k 159 where can be also written as such that is a n x c matrix of covariates for n being number of records and c being number of environmental covariates fitted. The mixed model equation for above model can be written as . Assume To backsolve for SNP-specific random intercept and slopes, it pertains to any of the covariates use the following: 160 161 and SNP based GWA test Then, the GWA tests on the SNP-specific random intercept and slope on environmental covariate k for SNP i can be written as . Z scores will be compared to a standardized normal distribution to obtain p values for each SNP. Window-based GWA test We also considered window-based associations test with partitioning columns of Z, and hence elements of go, g1, g2 gc, into R distinct 1MB windows. Then, for the window r with genotype of for rn SNPs covered in this window, GWA tests on random intercept vector and slope vector on environmental covariate k can be written as and , where and To obtain p values for a window, Chi-square statistics will be compared to a Chi-square distribution with degree freedom being rn. A simple approach to solve for the MME model The simplest way to solve for the solutions to the MME would be as follows: 162 Where and . Now let the inverse of be written as but what we really want is this inverse: and Based on Henderson (1985) , ; j=1 and 2 Then one could derive that pertains to the random intercept and that pertains to the random slope Along with And 163 Supplementary Figures and Tables Figure D.1: Manhattan plot for SNP-specific random intercept based on the regular GBLUP (a) and the RN model (b) (a) (b) 164 Figure D.2: Manhattan plot for Window-specific random intercept based on the regular GBLUP (a) and the RN model (b) Figure D.3: Manhattan plot for SNP-specific random slope on the linear (a) and quadratic (b) terms of AP (a) (b) (b) (a) 165 Figure D.4: Manhattan plot for SNP-specific random slope on the linear (a) and quadratic (b) terms of TP65 (a) (b) 166 Figure D.5: Manhattan plot for SNP-specific random slope on the linear (a) and quadratic (b) terms of RH (1) and interaction between TP65 and RH (2) (a1) (b1) (a2) (b2) 167 Figure D.6: Manhattan plot for Window-specific random on the linear (a) and quadratic (b) terms of AP Figure D.7: Manhattan plot for Window-specific random on the linear (a) and quadratic (b) terms of TP65 (b) (a) (b) (a) 168 Figure D.8: Manhattan plot for Window-specific random slope on the linear (a) and quadratic (b) terms of RH (1) and interaction between TP65 and RH (2) (b2) (a2) (b1) (a1) 169 REFERENCES 170 REFERENCES Aggrey, S. E. and R. Rekaya. 2013. Dissection of Koch's residual feed intake: implications for selection. Poult. Sci. 92(10):2600-2605. Aguilar, I., I. Misztal, D. L. Johnson, A. Legarra, S. Tsuruta, and T. J. Lawlor. 2010. Hot topic: a unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score. J. Anim. Sci. 93(2):743-752. Aguilar, I., I. Misztal, and S. Tsuruta. 2009. Genetic components of heat stress for dairy cattle with multiple lactations. J. Anim. Sci. 92(11):5702-5711. Alpay, F., Y. Zare, M. H. Kamalludin, X. Huang, X. Shi, G. E. Shook, M. T. Collins, and B. W. Kirkpatrick. 2014. Genome-wide association study of susceptibility to infection by Mycobacterium avium subspecies paratuberculosis in Holstein cattle. PLoS ONE 9(12):e111704. Andersson, L., A. L. Archibald, C. D. Bottema, R. Brauning, S. C. Burgess, D. W. Burt, E. Casas, H. H. Cheng, L. Clarke, C. Couldrey, B. P. Dalrymple, C. G. Elsik, S. Foissac, E. Giuffra, M. A. Groenen, B. J. Hayes, L. S. Huang, H. Khatib, J. W. Kijas, H. Kim, J. K. Lunney, F. M. McCarthy, J. C. McEwan, S. Moore, B. Nanduri, C. Notredame, Y. Palti, G. S. Plastow, J. M. Reecy, G. A. Rohrer, E. Sarropoulou, C. J. Schmidt, J. Silverstein, R. L. Tellam, M. Tixier-Boichard, G. Tosser-Klopp, C. K. Tuggle, J. Vilkki, S. N. White, S. H. Zhao, H. Zhou, and F. Consortium. 2015. Coordinated international action to accelerate genome-to-phenome with FAANG, the Functional Annotation of Animal Genomes project. Genome Biol. 16:57. Banerjee, S., B. S. Yandell, and N. Yi. 2008. Bayesian quantitative trait loci mapping for multiple traits. Genetics 179:2275-2289. Banos, G., M. P. Coffey, R. F. Veerkamp, D. P. Berry, and E. Wall. 2012. Merging and characterising phenotypic data on conventional and rare traits from dairy cattle experimental resources in three countries. Animal : an international journal of animal bioscience 6(7):1040-1048. Bello, N. M., J. P. Steibel, R. J. Erskine, and R. J. Tempelman. 2012. Inferring Upon Heterogeneous Associations in Dairy Cattle Performance Using a Bivariate Hierarchical Model. J AGRIC BIOL ENVIR S 17(1):142-161. Bello, N. M., J. P. Steibel, and R. J. Tempelman. 2010. Hierarchical Bayesian modeling of random and residual variance-covariance matrices in bivariate mixed effects models. Biom. J. 52(3):297-313. Bernal Rubio, Y. L., J. L. Gualdron Duarte, R. O. Bates, C. W. Ernst, D. Nonneman, G. A. Rohrer, A. King, S. D. Shackelford, T. L. Wheeler, R. J. Cantet, and J. P. Steibel. 2016. Meta-analysis of genome-wide association from genomic prediction models. Anim Genet 47:36-48. 171 Berry, D. and J. Pryce. 2014. Feed Efficiency in Growing and Mature Animals. in Proc. 10th World Congress on Genetics Applied to Livestock Production. In Prec. 10th World Congress on Genetics Applied to Livestock Production. ASAS, August 17-22, 2014, Vancouver, CANADA. Berry, D. P., F. Buckley, P. Dillon, R. D. Evans, M. Rath, and R. F. Veerkamp. 2003. Genetic relationships among body condition score, body weight, milk yield, and fertility in dairy cows. J. Dairy Sci. 86:2193-2204. Berry, D. P., M. P. Coffey, J. E. Pryce, Y. de Haas, P. Lovendahl, N. Krattenmacher, J. J. Crowley, Z. Wang, D. Spurlock, K. Weigel, K. Macdonald, and R. F. Veerkamp. 2014. International genetic evaluations for feed intake in dairy cattle through the collation of data from multiple sources. J. Anim. Sci. 97(6):3894-3905. Berry, D. P. and J. J. Crowley. 2013. Cell Biology Symposium: genetics of feed efficiency in dairy and beef cattle. J. Anim Sci. 91(4):1594-1613. Bohmanova, J., I. Misztal, and J. B. Cole. 2007. Temperature-humidity indices as indicators of milk production losses due to heat stress. J. Anim. Sci. 90(4):1947-1956. Bolormaa, S., B. J. Hayes, K. Savin, R. Hawken, W. Barendse, P. F. Arthur, R. M. Herd, and M. E. Goddard. 2011. Genome-wide association studies for feedlot and growth traits in cattle. J. Anim. Sci. 89(6):1684-1697. Brooks, S. P. and A. Gelman. 1998. General methods for monitoring convergence of iterative simulations. J Comput Graph Stat 7:434-455. Byrne, T. J., B. F. Santos, P. R. Amer, D. Martin-Collado, J. E. Pryce, and M. Axford. 2016. New breeding objectives and selection indices for the Australian dairy industry. J. Dairy Sci. 99:8146-8167. Calus, M. P. L., D. P. Berry, G. Banos, Y. d. Haas, and R. F. Veerkamp. 2013. Genomic selection: the option for new robustness traits? Adv. Anim. Biosci. 4(3):618625. Canfora, E. E., J. W. Jocken, and E. E. Blaak. 2015. Short-chain fatty acids in control of body weight and insulin sensitivity. Nat Rev Endocrinol 11:577-591. Cao, Y., Z. Zhao, J. Gruszczynska-Biegala, and A. Zolkiewska. 2003. Role of metalloprotease disintegrin ADAM 12 in determination of quiescent reserve cells during myogenic differentiation in vitro. Mol. Cell Biol. 23:67256738. Capper, J. L., R. A. Cady, and D. E. Bauman. 2009. The environmental impact of dairy production: 1944 compared with 2007. Journal of animal science 87(6):2160-2167. Clayton, D. G. 1996. Generalized linear mixed models. Pages 275-301 in Markov chain Monte Carlo in practice. Springer, New York, US. 172 Coles, C. A., J. Wadeson, M. I. Knight, L. M. Cafe, W. H. Johns, J. D. White, P. L. Greenwood, and M. B. McDonagh. 2014. A disintegrin and metalloprotease-12 is type I myofiber specific in Bos taurus and Bos indicus cattle. J. Anim. Sci. 92:1473-1483. Connor, E. E. 2015. Invited review: improving feed efficiency in dairy production: challenges and possibilities. Animal 9:395-408. Connor, E. E., J. L. Hutchison, H. D. Norman, K. M. Olson, C. P. Van Tassell, J. M. Leith, and R. L. Baldwin. 2013. Use of residual feed intake in Holsteins during early lactation shows potential to improve feed efficiency through genetic selection. J. Anim. Sci. 91(8):3978-3988. Crossa, J., D. Jarquin, J. Franco, P. Perez-Rodriguez, J. Burgueno, C. Saint-Pierre, P. Vikram, C. Sansaloni, C. Petroli, D. Akdemir, C. Sneller, M. Reynolds, M. Tattaris, T. Payne, C. Guzman, R. J. Pena, P. Wenzl, and S. Singh. 2016. Genomic prediction of gene bank wheat landraces. G3 (Bethesda) 6:1819-1834. De Coninck, A., B. De Baets, D. Kourounis, F. Verbosio, O. Schenk, S. Maenhout, and J. Fostier. 2016. Needles: Toward large-scale genomic prediction with marker-by-environment interaction. Genetics 203:543-558. de Haas, Y., M. P. Calus, R. F. Veerkamp, E. Wall, M. P. Coffey, H. D. Daetwyler, B. J. Hayes, and J. E. Pryce. 2012. Improved accuracy of genomic prediction for dry matter intake of dairy cattle from combined European and Australian data sets. J. Anim. Sci. 95(10):6103-6112. de Haas, Y., J. E. Pryce, M. P. Calus, E. Wall, D. P. Berry, P. Lovendahl, N. Krattenmacher, F. Miglior, K. Weigel, D. Spurlock, K. A. Macdonald, B. Hulsegge, and R. F. Veerkamp. 2015. Genomic prediction of dry matter intake in dairy cattle from an international data set consisting of research herds in Europe, North America, and Australasia. J. Anim. Sci. 98(9):6522-6534. de los Campos, G., J. M. Hickey, R. Pong-Wong, H. D. Daetwyler, and M. P. L. Calus. 2013. Whole Genome Regression and Prediction Methods Applied to Plant and Animal Breeding. Genetics 193:327-345. deHaas, Y., J. E. Pryce, D. P. Berry, and R. F. Veerkamp. 2014. Genetic and genomic solutions to improve feed efficiency and reduce environmental impact of dairy cattle.10th World Congress on Genetics Applied to Livestock Production. ASAS, August 17-22, 2014, Vancouver, CANADA. Dempfle, A., A. Scherag, R. Hein, L. Beckmann, J. Chang-Claude, and H. Schafer. 2008. Gene-environment interactions for complex traits: definitions, methodological requirements and challenges. Eur. J. Hum. Genet. 16(10):1164-1172. Do, D. N., A. B. Strathe, T. Ostersen, S. D. Pant, and H. N. Kadarmideen. 2014. Genome-wide association and pathway analysis of feed efficiency in pigs reveal candidate genes and pathways for residual feed intake. Front Genet 5:307. 173 Fernando, R. L. and D. Garrick. 2013. Bayesian Methods Applied to GWAS, pp. 237-274 in Genome-Wide Association Studies and Genomic Prediction, edited by C. Gondro, J. van der Werf and B. Hayes. Humana Press, New York, U.S. Garcia-Ruiz, A., J. B. Cole, P. M. VanRaden, G. R. Wiggans, F. J. Ruiz-Lopez, and C. P. Van Tassell. 2016. Changes in genetic selection differentials and generation intervals in US Holstein dairy cattle as a result of genomic selection. Proc. Natl. Acad. Sci. U.S.A. 113(28):E3995-E4004. Gelman, A. 2006. Prior distributions for variance parameters in hierarchical models. Bayesian Analysis 1(3):515-533. Gelman, A., J. Hill, and M. Yajima. 2012. Why We (Usually) Don't Have to Worry About Multiple Comparisons. Journal of Research on Educational Effectiveness 5(2):189-211. Gilmour, A. R., B. J. Gogel, B. R. Cullis, and R.Thompson. 2009. ASReml 3.0 User Guide, 3.0 VSN International Ltd, Hemel Hempstead, HP1 1ES, UK www.vsni.co.uk. Goddard, M. E. and B. J. Hayes. 2009. Mapping genes for complex traits in domestic animals and their use in breeding programmes. Nat. Rev. Genet. 10(6):381-391. Goddard, M. E., K. E. Kemper, I. M. MacLeod, A. J. Chamberlain, and B. J. Hayes. 2016. Genetics of complex traits: prediction of phenotype, identification of causal polymorphisms and genetic architecture. Proc Biol Sci. 283(1835). Gonzalez-Recio, O., J. E. Pryce, M. Haile-Mariam, and B. J. Hayes. 2014. Incorporating heifer feed efficiency in the Australian selection index using genomic selection. J. Dairy Sci. 97:3883-3893. Gualdron Duarte, J. L., R. J. Cantet, R. O. Bates, C. W. Ernst, N. E. Raney, and J. P. Steibel. 2014. Rapid screening for phenotype-genotype associations by linear transformations of genomic evaluations. BMC bioinformatics 15:246. Guo, J. Z., H. Jorjani, and O. Carlborg. 2012. A genome-wide association study using international breeding-evaluation data identifies major loci affecting production traits and stature in the Brown Swiss cattle breed. BMC Genet. 13:82. Hagemann, M., A. Ndambi, T. Hemme, and U. Latacz-Lohmann. 2012. Contribution of milk production to global greenhouse gas emissions. An estimation based on typical farms. Environ Sci Pollut Res Int 19(2):390-402. Hardie, L. C., L. E. Armentano, R. D. Shaver, M. J. VandeHaar, D. M. Spurlock, C. Yao, S. J. Bertics, F. E. Contreras-Govea, and K. A. Weigel. 2015. Considerations when combining data from multiple nutrition experiments to estimate genetic parameters for feed efficiency. J. Anim. Sci. 98(4):2727-2737. Hawken, R. J., Y. D. Zhang, M. R. Fortes, E. Collis, W. C. Barris, N. J. Corbet, P. J. Williams, G. Fordyce, R. G. Holroyd, J. R. Walkley, W. Barendse, D. J. Johnston, K. C. Prayaga, B. Tier, A. 174 Reverter, and S. A. Lehnert. 2012. Genome-wide association studies of female reproduction in tropically adapted beef cattle. J. Anim. Sci. 90(5):1398-1410. Hayes, B. J., H. D. Daetwyler, and M. E. Goddard. 2016. Models for Genome x Environment Interaction: Examples in Livestock. Vol. 56. Crop Sci. Hayes, B. J., H. A. Lewin, and M. E. Goddard. 2013. The future of livestock breeding: genomic selection for efficiency, reduced emissions intensity, and adaptation. Trends Genet. 29(4):206-214. Henderson, C. R. 1976. A Simple Method for Computing the Inverse of a Numerator Relationship Matrix Used in Prediction of Breeding Values. Biometrics 32(1):69-83. Henderson, C. R. 1977. Best linear unbiased prediction of breeding values not in model for records. 60:783-787. Henderson, C. R. 1985. Best linear unbiased prediction of nonadditive genetic merits in noninbred populations. J. Anim. Sci. 60:111-117. Herd, R. M. and P. F. Arthur. 2009. Physiological basis for residual feed intake. J. Anim. Sci. 87(14):E64-71. Jarquin, D., J. Crossa, X. Lacaze, P. Du Cheyron, J. Daucourt, J. Lorgeou, F. Piraux, L. Guerreiro, P. Perez, M. Calus, J. Burgueno, and G. de los Campos. 2014. A reaction norm model for genomic selection using high-dimensional genomic and environmental data. Theor. Appl. Genet. 127(3):595-607. Johnson, R. C., G. W. Nelson, J. L. Troyer, J. A. Lautenberger, B. D. Kessing, C. A. Winkler, and S. J. O'Brien. 2010. Accounting for multiple comparisons in a genome-wide association study (GWAS). BMC Genomics 11:724. Kang, H. M., J. H. Sul, S. K. Service, N. A. Zaitlen, S. Y. Kong, N. B. Freimer, C. Sabatti, and E. Eskin. 2010. Variance component model to account for sample structure in genome-wide association studies. Nat Genet 42:348-U110. Kawaguchi, N., C. Sundberg, M. Kveiborg, B. Moghadaszadeh, M. Asmar, N. Dietrich, C. K. Thodeti, F. C. Nielsen, P. Moller, A. M. Mercurio, R. Albrechtsen, and U. M. Wewer. 2003. ADAM12 induces actin cytoskeleton and extracellular matrix reorganization during early adipocyte differentiation by regulating beta1 integrin function. J. Cell Sci. 116:3893-3904. Kennedy, B. W., L. R. Schaeffer, and D. A. Sorensen. 1988. Genetic Properties of Animal Models. Journal of Dairy Science 71:17-26. Kennedy, B. W., J. H. J. Vanderwerf, and T. H. E. Meuwissen. 1993. Genetic and statistical properties of residual feed-intake. J. Anim. Sci. 71(12):3239-3250. Kim, Y. M., J. Kim, S. C. Heo, S. H. Shin, E. K. Do, D. S. Suh, K. H. Kim, M. S. Yoon, T. G. Lee, and J. H. Kim. 2012. Proteomic Identification of ADAM12 as a Regulator for TGF-beta 1- 175 Induced Differentiation of Human Mesenchymal Stem Cells to Smooth Muscle Cells. Plos ONE 7:e40820. Kizilkaya, K. and R. J. Tempelman. 2005. A general approach to mixed effects modeling of residual variances in generalized linear mixed models. Genet. Sel. Evol. 37:31-56. Koch, R. M., K. E. Gregory, D. Chambers, and L. A. Swiger. 1963. Efficiency of Feed Use in Beef Cattle. J. Anim. Sci. 22(2):486-&. Kolmodin, R. 2003. Reaction norms for the study of genotype by environment interaction in animal breeding.Sveriges lantbruksuniv., Acta Universitatis agriculturae Sueciae. Agraria, 1401-6249. Kutner, M. H., C. J. Nachtsheim, J. Neter, and W. Li. 2005. Applied Linear Statistical Models, 5th Edition. McGraw-Hill Irwin, New York. Lee, S. H., B. H. Choi, D. Lim, C. Gondro, Y. M. Cho, C. G. Dang, A. Sharma, G. W. Jang, K. T. Lee, D. Yoon, H. K. Lee, S. H. Yeon, B. S. Yang, H. S. Kang, and S. K. Hong. 2013. Genome-wide association study identifies major loci for carcass weight on BTA14 in Hanwoo (Korean cattle). PLoS ONE 8:e74677. Lillehammer, M., M. Arnyasi, S. Lien, H. G. Olsen, E. Sehested, J. Odegard, and T. H. Meuwissen. 2007a. A genome scan for quantitative trait locus by environment interactions for production traits. J. Dairy Sci. 90:3482-3489. Lillehammer, M., B. J. Hayes, T. H. Meuwissen, and M. E. Goddard. 2009. Gene by environment interactions for production traits in Australian dairy cattle. J. Dairy Sci. 92:4008-4017. Lillehammer, M., J. Odegard, and T. H. Meuwissen. 2007b. Random regression models for detection of gene by environment interaction. Genet. Sel. Evol. 39:105-121. Littell, R. C., W. W. Stroup, R. J. Freund, and SAS Institute. 2002. SAS for linear models. 4th ed. SAS Institute ;Wiley, Cary, N.C. New York. Lopez-Cruz, M., J. Crossa, D. Bonnett, S. Dreisigacker, J. Poland, J. L. Jannink, R. P. Singh, E. Autrique, and G. de los Campos. 2015. Increased prediction accuracy in wheat breeding trials using a marker x environment interaction genomic selection model. G3 (Bethesda) 5:569-582. Lopez-Romero, P., R. Rekaya, and M. J. Carabano. 2003. Assessment of homogeneity vs. heterogeneity of residual variance in random regression test-day models in a Bayesian analysis. J. Anim. Sci. 86(10):3374-3385. Lu, D., S. Miller, M. Sargolzaei, M. Kelly, G. Vander Voort, T. Caldwell, Z. Wang, G. Plastow, and S. Moore. 2013. Genome-wide association analyses for growth and feed efficiency traits in beef cattle. J. Anim Sci. 91(8):3612-3633. 176 Lu, Y., M. J. Vandehaar, D. M. Spurlock, K. A. Weigel, L. E. Armentano, C. R. Staples, E. E. Connor, Z. Wang, N. M. Bello, and R. J. Tempelman. 2015. An alternative approach to modeling genetic merit of feed efficiency in dairy cattle. J. Anim. Sci. 98(9):6535-6551. Lu, Y., M. J. VandeHaar, D. M. Spurlock, K. A. WEigel, L. E. Armentano, C. R. Staples, E. E. Connor, Z. Wang, M. Coffey, R. F. Veerkamp, Y. de Haas, and R. J. Tempelman. 2017a. Modeling genetic and nongenetic variation of feed efficiency and its partial relationships bteween component traits as a function of management and environmental factors. Journal of Dairy Science 100((in press)). Lu, Y., M. J. VandeHaar, D. M. Spurlock, K. A. Weigel, E. E. Connor, M. Coffey, R. F. Veerkamp, Y. D. Haas, C. R. Staples, Z. Wang, M. D. Hanigan, and R. J. Tempelman. 2017b. Genome wide association analyses based on alternative strategies for modeling feed efficiency. Journal of Dairy Science (in preparation). Ludovico, A., V. B. Maion, D. E. Bronkhorst, F. D. C. R. Grecco, L. F. C. D. Cunha, I. Y. Mizubuti, K. M. de Almeida, M. D. Ludovico, and E. H. W. de Santana. 2015. Losses in milk production and quality due to milk somatic cell count and heat stress of Holsteins cows in temperate climate. Semin-Cienc Agrar 36(5):3455-3470. Manafiazar, G., T. McFadden, L. Goonewardene, E. Okine, J. Basarab, P. Li, and Z. Wang. 2013. Prediction of residual feed intake for first-lactation dairy cows using orthogonal polynomial random regression. J. Anim. Sci. 96(12):7991-8001. Manzanilla-Pech, C. I., R. F. Veerkamp, R. J. Tempelman, M. L. van Pelt, K. A. Weigel, M. VandeHaar, T. J. Lawlor, D. M. Spurlock, L. E. Armentano, C. R. Staples, M. Hanigan, and Y. De Haas. 2016. Genetic parameters between feed-intake-related traits and conformation in 2 separate dairy populations--the Netherlands and United States. J. Anim. Sci. 99(1):443-457. Martin-Collado, D., T. J. Byrne, P. R. Amer, B. F. S. Santos, M. Axford, and J. E. Pryce. 2015. Analyzing the heterogeneity of farmers' preferences for improvements in dairy cow traits using farmer typologies. J. Anim. Sci. 98(6):4148-4161. Maruyama, K. 1991. The siscovery of Adenosine-Triphosphate and the establishment of its structure. J. Hist. Biol. 24:145-154. Mayr, J. A., V. Havlickova, F. Zimmermann, I. Magler, V. Kaplanova, P. Jesina, A. Pecinova, H. Nuskova, J. Koch, W. Sperl, and J. Houstek. 2010. Mitochondrial ATP synthase deficiency due to a mutation in the ATP5E gene for the F1 epsilon subunit. Hum. Mol. Genet. 19:3430-3439. Meuwissen, T. H. E., B. J. Hayes, and M. E. Goddard. 2001. Prediction of total genetic value using genome-wide dense marker maps. Genetics 157:1819-1829. Milliken, G. A. and D. E. Johnson. 2009. Analysis of Messy Data Volume 1: Designed Experiments. CRC Press, Boca Raton,FL. 177 Moser, G., S. H. Lee, B. J. Hayes, M. E. Goddard, N. R. Wray, and P. M. Visscher. 2015. Simultaneous Discovery, Estimation and Prediction Analysis of Complex Traits Using a Bayesian Mixture Model. Plos Genet 11(4):e1004969. Nkrumah, J. D., E. L. Sherman, C. Li, E. Marques, D. H. Crews, Jr., R. Bartusiak, B. Murdoch, Z. Wang, J. A. Basarab, and S. S. Moore. 2007. Primary genome scan to identify putative quantitative trait loci for feedlot growth rate, feed intake, and feed efficiency of beef cattle. J. Anim. Sci. 85:3170-3181. NRC. 1981. Effect of environment on nutrient requirements of domestic animals. Natl. Acad. Press, Washington, D.C. NRC. 2001. Nutrient requirements of dairy cattle. 7th rev. ed. Natl. Acad. Press, Washington, D.C. Olivieri, B. F., A. S. M. Cesar, M. L. do Nascimento, A. S. Chaves, P. C. Tizioto, R. R. Tullio, D. P. D. Lanna, A. N. Rosa, T. S. Sonstegard, G. B. Mourao, J. M. Reecy, D. J. Garrick, M. A. Mudadu, L. L. Coutinho, and L. C. A. Regitano. 2014. Identification of genomic regions associated with feed efficiency in Nelore cattle. BMC Genet. 15:100. Olivieri, B. F., M. E. Mercadante, J. N. Cyrillo, R. H. Branco, S. F. Bonilha, L. G. de Albuquerque, R. M. Silva, and F. Baldi. 2016. Genomic regions associated with feed efficiency indicator traits in an experimental Nellore cattle population. PLoS ONE 11(10):e0164390. Ou, Z. N., R. J. Tempelman, J. P. Steibel, C. W. Ernst, R. O. Bates, and N. M. Bello. 2016. Genomic Prediction Accounting for Residual Heteroskedasticity. G3-Genes Genom Genet 6(1):1-13. Pahl, R. and H. Schafer. 2010. PERMORY: an LD-exploiting permutation test algorithm for powerful genome-wide association testing. Bioinformatics 26:2093-2100. Plante, Y., J. P. Gibson, J. Nadesalingam, H. Mehrabani-Yeganeh, S. Lefebvre, G. Vandervoort, and G. B. Jansen. 2001. Detection of quantitative trait loci affecting milk production traits on 10 chromosomes in Holstein cattle. J. Dairy Sci. 84:1516-1524. Plummer, M., N. , K. C. Best, and K. Vines. 2006. CODA: convergence diagnosis and output analysis for MCMC. Vol. 6. R News. Pollak, E. J., J. Vanderwerf, and R. L. Quaas. 1984. Selection bias and multiple trait evaluation. 67:1590-1595. Pourahmadi, M. 2007. Cholesky decompositions and estimation of a covariance matrix: orthogonality of variance-correlation parameters. Biometrika 94(4):1006-1013. Pourahmadi, M., M. J. Daniels, and T. Park. 2007. Simultaneous modelling of the Cholesky decomposition of several covariance matrices. J Multivariate Anal 98(3):568-587. 178 Pryce, J. E., O. Gonzalez-Recio, G. Nieuwhof, W. J. Wales, M. P. Coffey, B. J. Hayes, and M. E. Goddard. 2015. Hot topic: Definition and implementation of a breeding value for feed efficiency in dairy cows. J. Anim. Sci. 98(10):7340-7350. Pryce, J. E., W. J. Wales, Y. de Haas, R. F. Veerkamp, and B. J. Hayes. 2014. Genomic selection for feed efficiency in dairy cattle. Animal : an international journal of animal bioscience 8(1):1-10. Rekaya, R. and S. E. Aggrey. 2015. Genetic properties of residual feed intakes for maintenance and growth and the implications of error measurement. J. Anim Sci. 93(3):944-948. Richardson, E. C. and R. M. Herd. 2004. Biological basis for variation in residual feed intake in beef cattle. 2. Synthesis of results following divergent selection. Aust. J. Exp. Agric. 44(4-5):431-440. Rolf, M. M., J. F. Taylor, R. D. Schnabel, S. D. McKay, M. C. McClure, S. L. Northcutt, M. S. Kerley, and R. L. Weaber. 2012. Genome-wide association analysis for feed efficiency in Angus cattle. Anim. Genet. 43:367-374. Ronnegard, L., M. Felleki, F. Fikse, H. A. Mulder, and E. Strandberg. 2010. Genetic heterogeneity of residual variance - estimation of variance components using double hierarchical generalized linear models. GENET. SEL. EVOL. 42(8). Ronnegard, L., M. Felleki, W. F. Fikse, H. A. Mulder, and E. Strandberg. 2013. Variance component and breeding value estimation for genetic heterogeneity of residual variance in Swedish Holstein dairy cattle. J. Anim. Sci. 96(4):2627-2636. Ryu, J. and C. Lee. 2016. Genetic association of marbling score with intragenic nucleotide variants at selection signals of the bovine genome. Animal : an international journal of animal bioscience 10(4):566-570. Saatchi, M., J. E. Beever, J. E. Decker, D. B. Faulkner, H. C. Freetly, S. L. Hansen, H. Yampara-Iquise, K. A. Johnson, S. D. Kachman, M. S. Kerley, J. Kim, D. D. Loy, E. Marques, H. L. Neibergs, E. J. Pollak, R. D. Schnabel, C. M. Seabury, D. W. Shike, W. M. Snelling, M. L. Spangler, R. L. Weaber, D. J. Garrick, and J. F. Taylor. 2014a. QTLs associated with dry matter intake, metabolic mid-test weight, growth and feed efficiency have little overlap across 4 beef cattle studies. BMC genomics 15:1004. Saatchi, M., R. D. Schnabel, J. F. Taylor, and D. J. Garrick. 2014b. Large-effect pleiotropic or closely linked QTL segregate within and across ten US cattle breeds. BMC genomics 15:442. Santana, M. H. A., Y. T. Utsunomiya, H. H. R. Neves, R. C. Gomes, J. F. Garcia, H. Fukumasu, S. L. Silva, G. A. Oliveira, P. A. Alexandre, P. R. Leme, R. A. Brassaloti, L. L. Coutinho, T. G. Lopes, F. V. Meirelles, J. P. Eler, and J. B. S. Ferraz. 2014. Genome-wide association analysis of feed intake and residual feed intake in Nellore cattle. Bmc Genet 15. Savietto, D., D. P. Berry, and N. C. Friggens. 2014. Towards an improved estimation of the biological components of residual feed intake in growing cattle. J. Anim. Sci. 92:467-476. 179 Searle, S. R. 1982. Matrix Algebra Useful for Statistics. First ed. Wiley-Interscience. Searle, S. R., G. Casella, and C. E. McCulloch. 1992. Variance Components. John Wiley and Sons, New York. Serão, N. V., E. D. Mauch, S. K. Onteru, M. F. Rothschild, and J. C. M. Dekkers. 2016. Genome wide association study for residual feed intake and component traits of feed efficiency in pigs divergently selected for residual feed intake.2016 Joint Annual Meeting. ASAS, July 19-23, 2016, Salt Lake City, UT. Sherman, E. L., J. D. Nkrumah, C. Li, R. Bartusiak, B. Murdoch, and S. S. Moore. 2009. Fine mapping quantitative trait loci for feed intake and feed efficiency in beef cattle. J. Anim. Sci. 87:37-45. Shirali, M., V. H. Nielsen, S. H. Moller, and J. Jensen. 2015. Longitudinal analysis of residual feed intake and BW in mink using random regression with heterogeneous residual variance. Animal : an international journal of animal bioscience 9(10):1597-1604. Sorensen, D. and D. Gianola. 2002. Likelihood, Bayesian and MCMC Methods in Quantitative Genetics. Statistics for Biology and Health. Springer Verlag, New York. Sorensen, D. and R. Waagepetersen. 2003. Normal linear models with genetically structured residual variance heterogeneity: a case study. Genet. Res. 82:207-222. Sova, A. D., S. J. LeBlanc, B. W. McBride, and T. J. DeVries. 2013. Associations between herd-level feeding management practices, feed sorting, and milk production in freestall dairy farms. J. Anim. Sci. 96(7):4759-4770. Spiegelhalter, D. J., N. G. Best, B. R. Carlin, and A. van der Linde. 2002. Bayesian measures of model complexity and fit. J Roy Stat Soc B 64:583-616. Spurlock, D. M., J. C. Dekkers, R. Fernando, D. A. Koltes, and A. Wolc. 2012. Genetic parameters for energy balance, feed efficiency, and related traits in Holstein cattle. J. Dairy Sci. 95:5393-5402. Steinfeld, H., P. Gerber, T. D. Wassenaar, V. Castel, M. R. M, and C. d. Haan. 2006. Livestock's long shadow: environmental issues and options. Roma. Strathe, A. B., T.Mark, B. Nielsen, D. N. Do, H. N. Kadarmideen, and J. Jensen. 2014. Deriving genomic breeding values for residual feed intake from covariance functions of random regression models.In Proc. 10th World Congress Genet. Appl. Livest. Prod. ASAS, August 17-22, 2014, Vancouver, BC, Canada. Taylor, J. F., K. H. Taylor, and J. E. Decker. 2016. Holsteins are the genomic selection poster cows. Proc. Natl. Acad. Sci. U.S.A. 113(28):7690-7692. 180 Tempelman, R. J. 2015. Statistical and Computational Challenges in Whole Genome Prediction and Genome-Wide Association Analyses for Plant and Animal Breeding. Journal of Agricultural, Biological, and Environmental Statistics 20(4):442-466. Tempelman, R. J., D. M. Spurlock, M. Coffey, R. F. Veerkamp, L. E. Armentano, K. A. Weigel, Y. de haas, C. R. Staples, E. E. Connor, Y. Lu, and M. J. Vandehaar. 2015. Heterogeneity in genetic and non-genetic variation and energy sink relationships for residual feed intake across research stations and countries. J. Anim. Sci. 98(3):(in press). Tort, F., M. Del Toro, W. Lissens, J. Montoya, M. Fernandez-Burriel, A. Font, N. Bujan, A. Navarro-Sastre, E. Lopez-Gallardo, J. A. Arranz, E. Riudor, P. Briones, and A. Ribes. 2011. Screening for nuclear genetic defects in the ATP synthase-associated genes TMEM70, ATP12 and ATP5E in patients with 3-methylglutaconic aciduria. Clin. Genet. 80:297-300. Valdar, W., L. C. Solberg, D. Gauguier, W. O. Cookson, J. N. P. Rawlins, R. Mott, and J. Flint. 2006. Genetic and environmental effects on complex traits in mice. Genetics 174(2):959-984. Vallimont, J. E., C. D. Dechow, J. M. Daubert, M. W. Dekleva, J. W. Blum, C. M. Barlieb, W. Liu, G. A. Varga, A. J. Heinrichs, and C. R. Baumrucker. 2010. Genetic parameters of feed intake, production, body weight, body condition score, and selected type traits of Holstein cows in commercial tie-stall barns. J. Anim. Sci. 93(10):4892-4901. Vallimont, J. E., C. D. Dechow, J. M. Daubert, M. W. Dekleva, J. W. Blum, C. M. Barlieb, W. Liu, G. A. Varga, A. J. Heinrichs, and C. R. Baumrucker. 2011. Heritability of gross feed efficiency and associations with yield, intake, residual intake, body weight, and body condition score in 11 commercial Pennsylvania tie stalls. J. Anim. Sci. 94(4):2108-2113. VandeHaar, M. J., L. E. Armentano, K. Weigel, D. M. Spurlock, R. J. Tempelman, and R. Veerkamp. 2016. Harnessing the genetics of the modern dairy cow to continue improvements in feed efficiency. J. Anim. Sci. 99(6):4941-4954. VandeHaar, M. J. and N. St-Pierre. 2006. Major advances in nutrition: relevance to the sustainability of the dairy industry. J. Anim. Sci. 89(4):1280-1291. VanRaden, P. M. 2008. Efficient methods to compute genomic predictions. J. Anim. Sci. 91(11):4414-4423. Vazquez, A. I., Y. Veturi, M. Behring, S. Shrestha, M. Kirst, M. F. R. Resende, and G. de los Campos. 2016. Increased proportion of variance explained and prediction accuracy of survival of breast cancer patients with use of whole-genome multiomic profiles. Genetics 203:1425-1438. Veerkamp, R. F., J. E. P. , D. Spurlock, D. Berry, M. Coffey, P. Løvendahl, R. v. d. Linde, J. Bryant, F. Miglior, Z. Wang, M. Winters, N. Krattenmacher, N. Charfeddine, J. Pedersen, and Y. d. Haas. 2013. Selection on Feed Intake or Feed Efficiency:A Position Paper from gDMI Breeding Goal Discussions. Interbull Bulletin No. 47. Nantes, France, August 23 - 25, 2013. Veerkamp, R. F., M. Coffey, D. Berry, Y. de Haas, E. Strandberg, H. Bovenhuis, M. Calus, and E. Wall. 2012. Genome-wide associations for feed utilisation complex in primiparous Holstein- 181 Friesian dairy cows from experimental research herds in four European countries. Animal : an international journal of animal bioscience 6(11):1738-1749. Veerkamp, R. F., J. K. Oldenbroek, H. J. Van der Gaast, and J. H. Van der Werf. 2000. Genetic correlation between days until start of luteal activity and milk yield, energy balance, and live weights. J. Anim. Sci. 83(3):577-583. Wang, H., I. Misztal, I. Aguilar, A. Legarra, and W. M. Muir. 2012. Genome-wide association mapping including phenotypes from relatives without genotypes. Genet Res. 94:73-83. Waters, S. M., M. S. McCabe, D. J. Howard, L. Giblin, D. A. Magee, D. E. MacHugh, and D. P. Berry. 2011. Associations between newly discovered polymorphisms in the Bos taurus growth hormone receptor gene and performance traits in Holstein-Friesian dairy cattle. Anim. Genet. 42:39-49. Wathes, D. C., Z. Cheng, N. Bourne, V. J. Taylor, M. P. Coffey, and S. Brotherstone. 2007. Differences between primiparous and multiparous dairy cows in the inter-relationships between metabolic traits, milk yield and body condition score in the periparturient period. Domest. Anim. Endocrin. 33:203-225. Wiggans, G. R., T. A. Cooper, P. M. VanRaden, C. P. Van Tassell, D. M. Bickhart, and T. S. Sonstegard. 2016. Increasing the number of single nucleotide polymorphisms used in genomic evaluation of dairy cattle. J. Dairy Sci. 99:4504-4511. Wray, N. R., J. Yang, M. E. Goddard, and P. M. Visscher. 2010. The genetic interpretation of area under the ROC curve in genomic profiling. PLoS Genet. 6:e1000864. Yang, J., N. A. Zaitlen, M. E. Goddard, P. M. Visscher, and A. L. Price. 2014. Advantages and pitfalls in the application of mixed-model association methods. Nat Genet 46:100-106. Yang, W., C. Chen, J. P. Steibel, C. W. Ernst, R. O. Bates, L. Zhou, and R. J. Tempelman. 2015. A comparison of alternative random regression and reaction norm models for whole genome predictions. J. Anim. Sci. 93(6):2678-2692. Yang, Y., O. F. Christensen, and D. Sorensen. 2011. Use of genomic models to study genetic control of environmental variance. Genet. Res. 93:125-138. Yao, C., D. M. Spurlock, L. E. Armentano, C. D. Page, Jr., M. J. VandeHaar, D. M. Bickhart, and K. A. Weigel. 2013. Random Forests approach for identifying additive and epistatic single nucleotide polymorphisms associated with residual feed intake in dairy cattle. J. Anim. Sci. 96(10):6716-6729. Zom, R. L. G., G. Andre, and A. M. van Vuuren. 2012. Development of a model for the prediction of feed intake by dairy cows: prediction of feed intake. Livest Sci 143:43-57.