ANALYSIS OF HOST TRANSCRIPTOME RESPONSE TO PORCINE REPRODUCTIVE AND RESPIRATORY SYNDROME VIRUS INFECTION By Maria Eugenia Arceo A THESIS Submitted to Michigan State University in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE Animal Science 2012    ABSTRACT ANALYSIS OF HOST TRANSCRIPTOME RESPONSE TO PORCINE REPRODUCTIVE AND RESPIRATORY SYNDROME VIRUS INFECTION By Maria Eugenia Arceo Porcine Reproductive and Respiratory Syndrome (PRRS) has been affecting commercial populations of pigs in the US for more than 20 years. We evaluated differences in gene expression in pigs from the PRRS Host Genetics Consortium initiative showing a range of responses to PRRS virus infection. Pigs were allocated into four phenotypic groups according to their serum viral level and weight gain. We obtained RNA at several days post-infection and hybridized it to the 20K 70 mer-oligonucleotide Pigoligoarray. We initially used plasmode datasets to select an optimal procedure for analyzing these data. We showed that the random array effects model with the moderated F statistic and significance thresholds obtained by permutation provided the most powerful analysis procedure. We then addressed global differential gene expression between phenotypic groups. We identified cell death as a biological function significantly associated with several gene networks enriched for differentially expressed genes. We found the genes interferon-alpha 1, major histocompatibility complex, class II, DR alpha, and major histocompatibility complex, class II, DQ alpha 1 differentially expressed between phenotypic groups. Finally, we used this study as pilot data to inform the design of future time-course transcriptional profiling experiments. We concluded the best scenario for investigation of early response to PRRSV infection consists of sampling at 4 and 7 days post infection using approximately 30 pigs per phenotypic group.   To my family and those friends who became my extended family, their support and love have made it possible to accomplish this thesis iii    AKNOWLEDGEMENTS I owe a debt of gratitude to many people for having believed in me throughout this journey. First, I would like to acknowledge Dr. Cantet and Dr. Miquel for introducing me to quantitative genetics/genomics and encouraging me to pursue a career in that field. Because they believed in me, I am here today. Second, I would like to thank my advisor Dr. Juan Steibel. I would never have accomplished this thesis without his constant encouragement, guidance, trust and friendly support. I would like to express my gratitude to the members of my committee, Drs. Tempelman, Lunney, Ernst, and Steibel for their useful suggestions throughout my graduate program. Moreover, I would like to thank Drs. Steibel, Ernst and Lunney for their ideas to develop this project and their contribution of knowledge during the experimental design and statistical analysis. I would especially like to thank Dr. Tempelman for his academic advice which guided me and that I feel will have an impact in my long term career. Finally, I deeply appreciate the moral support Dr. Lunney kindly offered me. I am particularly grateful for the help and contributions to my knowledge provided by my fellow graduate students in statistical genetics. I would especially like to acknowledge Yvonne Badke, for her assistance with computational resources. I would like to express my deepest appreciation to my friends, the ones I met here and my lifelong friends. They provided me with countless hours of unconditional support. Thank you Belen, Irene, Yvonne, Leandro. iv    Finally, I would like to thank my family for their love and encouragement. Mom, Dad, Mercedes, Tomas, Maggie and Pepe: you can only imagine how much your support meant to me. v      TABLE OF CONTENTS LIST OF TABLES ....................................................................................................................... viii LIST OF FIGURES ...................................................................................................................... ix CHAPTER ONE ..............................................................................................................................1 1. Introduction ..........................................................................................................................2 References ............................................................................................................................5 CHAPTER TWO .............................................................................................................................9 I. Assessing optimal analysis models and statistical tests in a two color microarray experiment: a case study involving Porcine Reproductive and Respiratory Syndrome Virus infected pigs using plasmode datasets ...........................................................................................................10 Abstract ....................................................................................................................................10 Keywords .................................................................................................................................10 1. Introduction ........................................................................................................................11 2. . Materials and Methods .......................................................................................................14 2.1. Population, phenotypic groups and microarray design ...............................................14 2.2. Plasmode construction ................................................................................................14 2.3. Linear Model Analysis................................................................................................15 2.4. Test Statistics ..............................................................................................................16 2.5. Estimation of critical values........................................................................................16 2.6. Assessment of performance under the null hypothesis ...............................................17 2.7. Assessment of performance in a dataset with DE transcripts .....................................17 3. Results ................................................................................................................................18 3.1. Overall evidence of differential expression in lung and BLN microarray datasets ....18 3.2. Estimation of Type I error using plasmode datasets ...................................................19 3.3. Assessment of differential expression in lung dataset ................................................20 4. Discussion ..........................................................................................................................22 References ..........................................................................................................................26 CHAPTER THREE .......................................................................................................................29 II. Characterizing differential individual response to Porcine Reproductive and Respiratory Syndrome Virus infection through statistical and functional analysis of gene expression......30 Abstract ....................................................................................................................................31 Keywords .................................................................................................................................32 1. Introduction ........................................................................................................................33 vi    2. Materials and methods .......................................................................................................35 2.1. Animal model and study design ..................................................................................35 2.2. Phenotypic groups.......................................................................................................36 2.3. Microarray design and analysis ..................................................................................36 2.4. Pathway analysis .........................................................................................................38 2.5. qPCR design and analysis ...........................................................................................39 2.6. Statistical power and sample size computation ..........................................................40 3. Results ................................................................................................................................40 3.1. Microarray analysis .....................................................................................................40 3.1.1. Evidence of differential gene expression at 0 DPI.............................................41 3.1.2. Evidence of differential gene expression for remaining DPI.............................41 3.1.3. Weight gain and viral load interaction effect on gene expression .....................42 3.1.4. Weight gain and viral load effect on gene expression .......................................43 3.2. Pathway analysis .........................................................................................................44 3.2.1. HvHg vs. LvHg ..................................................................................................44 3.2.2. HvLg vs. LvLg ...................................................................................................45 3.2.3. HvHg vs. HvLg ..................................................................................................45 3.2.4. LvHg vs. LvLg ...................................................................................................46 3.3. qPCR analysis .............................................................................................................46 3.3.1. Verification of microarray findings ...................................................................46 3.3.2. Additional genes ................................................................................................47 3.4. Microarray statistical power and sample size estimation ...........................................48 4. Discussion ..........................................................................................................................50 Conflict of Interest Statement ............................................................................................56 Acknowledgements ............................................................................................................56 References ..........................................................................................................................57 CHAPTER FOUR ..........................................................................................................................63 1. Conclusion ..........................................................................................................................64 1.1. Goals and contribution of this study ...........................................................................64 1.2. Future Research Directions .........................................................................................67 References ..........................................................................................................................69 APPENDICES ...............................................................................................................................72 1. Appendix A ........................................................................................................................73 1.1. Figure Legends............................................................................................................73 2. Appendix B.........................................................................................................................80 2.1. Figure Legends............................................................................................................80 vii      LIST OF TABLES Table 2.1. Number of differentially expressed genes in a fixed versus mixed effects model at different FDR thresholds .....................................................................................22 Table 3.1. Number of putatively differentially expressed transcripts ................................43 Table 3.2. Test of immune gene expression for genes not present in the microarray platform ..............................................................................................................................48 Table 3.3. Expected discovery rate (EDR) comparing 20 and 30 biological replicates ....50 Supplementary Table 1 ......................................................................................................84 viii      LIST OF FIGURES Figure 2.1 Distribution of P-values from BLN and Lung datasets ....................................75 Figure 2.2. Realized versus nominal Type I error rate in plasmode datasets ....................76 Figure 2.3. Uniform Q-Q plot of P-values in plasmode datasets .......................................77 Figure 2.4. Proportion of rejected hypotheses in lung dataset ...........................................78 Figure 2.5. Uniform Q-Q plot in lung dataset for tests for tests that controlled the Type I error rate .............................................................................................................................79 Figure 3.1. Scatterplot of weight gain versus viral load for all pigs in PHGC trial one ....81 Figure 3.2. Histogram of P-values for the four contrasts of interest at 0 DPI ...................82 Figure 3.3. Uniform Q-Q plot for gene expression of all contrasts across 4 to 42 DPI after correcting for 0 DPI estimated effect .................................................................................83 ix    CHAPTER ONE 1    1. Introduction Porcine Reproductive and Respiratory Syndrome (PRRS) was initially described in the US over 20 years ago (Done et al., 1996). The disease causes $ 664 million annual losses to the US pork industry (Holtkamp et al., 2012). Viral replication takes place in the host’s immune cells (Rowland et al., 2003;Genini et al., 2008). Therefore, a possible way of controlling the economic impact of PRRS is addressing the host genetic component. In this context, phenotypic variation between breed-lines has been observed in disease-related and production traits of PRRS virus experimentally infected pigs (Petry et al., 2005;Vincent et al., 2006;Doeschl-Wilson et al., 2009). Moreover, the availability of whole genome microarrays (Steibel et al., 2009) and next generation sequencing (Mardis, 2008) have favored whole genome expression profiling of PRRSV infected animals, and global differential expression has been assessed in pigs showing phenotypic variation to PRRSV infection responses (Lee et al., 2004a;Lee et al., 2004b;Miller and Fox, 2004;Bates et al., 2008;Genini et al., 2008;Xiao et al., 2010a;Xiao et al., 2010b;Ait-Ali et al., 2011;Zhou et al., 2011;Wysocki et al., 2012). While most previous studies focused on comparing gene expression of PRRS virus in infected versus uninfected pigs, or gene expression between animals showing differences in post-infection viral titers, little is known of the interaction between viral load and weight gain as it relates to gene expression post-infection. This is particularly important given the reported associations of immune traits with growth rate (Galina-Pantoja et al., 2006;Boddicker et al., 2012) and the genetic correlations between growth rate and disease traits (Doeschl-Wilson et al., 2009) as well as between growth rate and immune related traits (Clapperton et al., 2009). To address differential gene expression using microarrays, different designs have been suggested for two color microarray experiments. In general, these experiments could include a 2    reference sample (reference designs) or not (loop designs) (Kerr and Churchill, 2001;Dobbin and Simon, 2002;Rosa et al., 2005;Tempelman, 2005). Among reference designs, blocked reference designs have been shown to be more efficient than common reference designs and to allow meaningful hypotheses testing using the reference samples (Steibel and Rosa, 2005). A two steps (Wolfinger et al., 2001) linear mixed model approach most often underlies the analysis of these experiments (Rosa et al., 2005). Initially, a general model that accounts for technical variation, such as array and dye effects, is fitted to the data. Then, the residuals estimated from this model are used to analyze each transcript specific expression, accounting for all appropriate sources of variation (Wolfinger et al., 2001). Consequently, this is a flexible scheme that enables the fitting of linear fixed and mixed models. To test different hypotheses in microarray experiments several statistical methods have been proposed (Cui and Churchill, 2003;Wolfinger et al., 2001). Among the most commonly used is the classical F statistic or a modification of the F statistic (Cui et al., 2005). These two methods differ on the gene specific estimated variance components they incorporate into the tests. The modified F statistic uses shrunken (towards a common value) estimated variance components (Cui et al., 2005). Shrinking variance components has been reported to enhance power for detecting differential expression (Cui et al., 2005), since microarray experiments usually involve a small number of biological samples use to test differential expression of a large number of genes. However, moderated test statistics have non-standard distributions. To assess significance of these tests, P-values cannot be obtained by comparison with a reference null distribution. In this context, permutation analyses (Anderson and Ter Braak, 2003) have been implemented to obtain the null distribution of moderated test statistics (Yang and Churchill, 3    2007). Permutation analyses imply shuffling the data to simulate a null distribution and thereby can be very computationally intensive (Anderson, 2001). The goals of this study were to evaluate the response to infection in in-vivo PRRS virus infected pigs in a time-course basis. In particular, we wanted to assess whole-genome gene expression in blood of infected pigs at different times post-infection. Furthermore, we wanted to identify significant biological functions and gene sets that characterize the response to PRRS virus infection over time. To attain for these goals, the first chapter of this thesis assesses the statistical framework needed for more accurate and powerful evaluation of microarray data differential expression, including model assessment and tests for hypothesis testing. Real data is used to create plasmode datasets for evaluation of two different proposed linear models and tests statistics (Gadbury et al., 2008;Vaughan et al., 2009). The second chapter of this thesis addresses global differential gene expression in pigs showing variation in their phenotypic response to PRRS virus experimental infection during a time-course experiment. Therefore, we evaluated a microarray whole-genome expression profile of pigs assigned to four reaction groups (phenotypic groups) according to the pigs’ weight gain and blood viral load, as part of the PRRS Host Genetics Consortium (Lunney et al., 2011). Phenotypic groups were specified as a combination of these two traits, as follows: 1) high viral load-high weight gain (HvHg), 2) high viral load-low weight gain (HvLg), 3) low viral load-high weight gain (LvHg) and 4) low viral load-low weight gain (LvLg). Individuals selected had the most extreme (highest and/or lowest) observed values for both variables defining the phenotypic groups. 4    REFERENCES 5    References Ait-Ali, T., Wilson, A.D., Carre, W., Westcott, D.G., Frossard, J.P., Mellencamp, M.A., Mouzaki, D., Matika, O., Waddington, D., Drew, T.W., Bishop, S.C., and Archibald, A.L. (2011). Host inhibits replication of European porcine reproductive and respiratory syndrome virus in macrophages by altering differential regulation of type-I interferon transcriptional response. Immunogenetics 63, 437-448. Anderson, M.J. (2001). Permutation tests for univariate or multivariate analysis of variance and regression. Canadian Journal of Fisheries and Aquatic Sciences 58, 626-639. Anderson, M.J., and Ter Braak, C.J.F. (2003). Permutation tests for multi-factorial analysis of variance. Journal of Statistical Computation and Simulation 73, 85-113. Bates, J.S., Petry, D.B., Eudy, J., Bough, L., and Johnson, R.K. (2008). Differential expression in lung and bronchial lymph node of pigs with high and low responses to infection with porcine reproductive and respiratory syndrome virus. Journal of Animal Science 86, 3279-3289. Boddicker, N., Waide, E.H., Rowland, R.R.R., Lunney, J.K., Garrick, D.J., Reecy, J.M., and Dekkers, J.C.M. (2012). Evidence for a major QTL associated with host response to Porcine Reproductive and Respiratory Syndrome Virus challenge. Journal of Animal Science 90, 1733-1746. Clapperton, M., Diack, A.B., Matika, O., Glass, E.J., Gladney, C.D., Mellencamp, M.A., Hoste, A., and Bishop, S.C. (2009). Traits associated with innate and adaptive immunity in pigs: heritability and associations with performance under different health status conditions. Genetics Selection Evolution 41. Cui, X.G., Hwang, J.T.G., Qiu, J., Blades, N.J., and Churchill, G.A. (2005). Improved statistical tests for differential gene expression by shrinking variance components estimates. Biostatistics 6, 59-75. Cui, X.Q., and Churchill, G.A. (2003). Statistical tests for differential expression in cDNA microarray experiments. Genome Biology 4. Dobbin, K., and Simon, R. (2002). Comparison of microarray designs for class comparison and class discovery. Bioinformatics 18, 1438-1445. Doeschl-Wilson, A.B., Kyriazakis, I., Vincent, A., Rothschild, M.F., Thacker, E., and GalinaPantoja, L. (2009). Clinical and pathological responses of pigs from two genetically diverse commercial lines to porcine reproductive and respiratory syndrome virus infection. Journal of Animal Science 87, 1638-1647. Done, S.H., Paton, D.J., and White, M.E.C. (1996). Porcine reproductive and respiratory syndrome (PRRS): A review, with emphasis on pathological, virological and diagnostic aspects. British Veterinary Journal 152, 153-174. 6    Gadbury, G.L., Xiang, Q.F., Yang, L., Barnes, S., Page, G.P., and Allison, D.B. (2008). Evaluating Statistical Methods Using Plasmode Data Sets in the Age of Massive Public Databases: An Illustration Using False Discovery Rates. Plos Genetics 4. Galina-Pantoja, L., Mellencamp, M.A., Bastiaansen, J., Cabrera, R., Solano-Aguilar, G., and Lunney, J.K. (2006). Relationship between immune cell phenotypes and pig growth in a commercial farm. Animal Biotechnology 17, 81-98. Genini, S., Delputte, P.L., Malinverni, R., Cecere, M., Stella, A., Nauwynck, H.J., and Giuffra, E. (2008). Genome-wide transcriptional response of primary alveolar macrophages following infection with porcine reproductive and respiratory syndrome virus. Journal of General Virology 89, 2550-2564. Holtkamp, D.J., Kliebenstein, J.B., Neumann, E.J., Zimmerman, J.J., Rotto, H., Yoder, T.K., Wang, C., Yeske, P., Mowrer, C., and Haley, C. (2012). Assessment of the economic impact of porcine reproductive and respiratory syndrome virus on United States pork producers. J. Swine Health Prod. (Manuscript in press). Kerr, K.M., and Churchill, G.A. (2001). Experimental design for gene expression microarrays. Biostatistics 2, 183-201. Lee, C., Bachand, A., Murtaugh, M.P., and Yoo, D. (2004a). Differential host cell gene expression regulated by the porcine reproductive and respiratory syndrome virus GP4 and GP5 glycoproteins. Veterinary Immunology and Immunopathology 102, 189-198. Lee, C., Rogan, D., Erickson, L., Zhang, J., and Yoo, D. (2004b). Characterization of the porcine reproductive and respiratory syndrome virus glycoprotein 5 (GP5) in stably expressing cells. Virus Research 104, 33-38. Lunney, J.K., Steibel, J., Reecy, J.M., Fritz, E., Rothschild, M.F., Kerrigan, M., Trible, B., and Rowland, R.R.R. (2011). Probing genetic control of swine responses to PRRSV infection: current progress of the PRRS host genetics consortium. BMC Proceedings 5, S30. Mardis, E.R. (2008). The impact of next-generation sequencing technology on genetics. Trends in Genetics 24, 133-141. Miller, L.C., and Fox, J.M. (2004). Apoptosis and porcine reproductive and respiratory syndrome virus. Vet Immunol Immunopathol 102, 131-142. Petry, D.B., Holl, J.W., Weber, J.S., Doster, A.R., Osorio, F.A., and Johnson, R.K. (2005). Biological responses to porcine respiratory and reproductive syndrome virus in pigs of two genetic populations. Journal of Animal Science 83, 1494-1502. Rosa, G.J.M., Steibel, J.P., and Tempelman, R.J. (2005). Reassessing design and analysis of twocolour microarray experiments using mixed effects models. Comparative and Functional Genomics 6, 123-131. Rowland, R.R.R., Lawson, S., Rossow, K., and Benfield, D.A. (2003). Lymphoid tissue tropism of porcine reproductive and respiratory syndrome virus replication during persistent 7    infection of pigs originally exposed to virus in utero. Veterinary Microbiology 96, 219235. Steibel, J.P., and Rosa, G.J.M. (2005). On reference designs for microarray experiments. Statistical Applications in Genetics and Molecular Biology 4. Steibel, J.P., Wysocki, M., Lunney, J.K., Ramos, A.M., Hu, Z.L., Rothschild, M.F., and Ernst, C.W. (2009). Assessment of the swine protein-annotated oligonucleotide microarray. Animal Genetics 40, 883-893. Tempelman, R.J. (2005). Assessing statistical precision, power, and robustness of alternative experimental designs for two color microarray platforms based on mixed effects models. Veterinary Immunology and Immunopathology 105, 175-186. Vaughan, L.K., Divers, J., Padilla, M.A., Redden, D.T., Tiwari, H.K., Pomp, D., and Allison, D.B. (2009). The use of plasmodes as a supplement to simulations: A simple example evaluating individual admixture estimation methodologies. Computational Statistics & Data Analysis 53, 1755-1766. Vincent, A.L., Thacker, B.J., Halbur, P.G., Rothschild, M.F., and Thacker, E.L. (2006). An investigation of susceptibility to porcine reproductive and respiratory syndrome virus between two genetically diverse commercial lines of pigs. Journal of Animal Science 84, 49-57. Wolfinger, R.D., Gibson, G., Wolfinger, E.D., Bennett, L., Hamadeh, H., Bushel, P., Afshari, C., and Paules, R.S. (2001). Assessing gene significance from cDNA microarray expression data via mixed models. Journal of Computational Biology 8, 625-637. Wysocki, M., Chen, H., Steibel, J.P., Kuhar, D., Petry, D., Bates, J., Johnson, R., Ernst, C.W., and Lunney, J.K. (2012). Identifying putative candidate genes and pathways involved in immune responses to porcine reproductive and respiratory syndrome virus (PRRSV) infection. Animal Genetics 43, 328-332. Xiao, S.Q., Jia, J.Y., Mo, D.L., Wang, Q.W., Qin, L.M., He, Z.Y., Zhao, X.A., Huang, Y.K., Li, A.N., Yu, J.W., Niu, Y.N., Liu, X.H., and Chen, Y.S. (2010a). Understanding PRRSV Infection in Porcine Lung Based on Genome-Wide Transcriptome Response Identified by Deep Sequencing. Plos One 5. Xiao, S.Q., Mo, D.L., Wang, Q.W., Jia, J.Y., Qin, L.M., Yu, X.C., Niu, Y.N., Zhao, X.A., Liu, X.H., and Chen, Y.S. (2010b). Aberrant host immune response induced by highly virulent PRRSV identified by digital gene expression tag profiling. Bmc Genomics 11. Yang, H.U., and Churchill, G. (2007). Estimating p-values in small microarray experiments. Bioinformatics 23, 38-43. Zhou, P., Zhai, S., Zhou, X., Lin, P., Jiang, T., Hu, X., Jiang, Y., Wu, B., Zhang, Q., Xu, X., Li, J., and Liu, B. (2011). Molecular Characterization of Transcriptome-wide Interactions between Highly Pathogenic Porcine Reproductive and Respiratory Syndrome Virus and Porcine Alveolar Macrophages in vivo. Int J Biol Sci 7, 947-959. 8    CHAPTER TWO 9    Assessing optimal analysis models and statistical tests in a two color microarray experiment: a case study involving Porcine Reproductive and Respiratory Syndrome Virus infected pigs using plasmode datasets. Abstract We addressed optimal statistical analysis frameworks for microarray data. In particular we were interested in testing differential gene expression in pigs as a response to Porcine Reproductive and Respiratory Syndrome Virus experimental infection. We identified an existing dataset suitable to derive plasmodes conforming the null distribution expected under no differential expression .We used these plasmode datasets to evaluate analysis models and test statistics. We had eight alternative analyses from the factorial combination of two models (fixed vs. random array effect), two test statistics (classic F vs. moderated F), and two ways of computing significance thresholds (with tabulated F distribution vs. with permuted distribution). Specifically, we assessed Type I error rate of the alternative analysis models. We then assessed power of the tests that controlled the Type I error rate at or close to the nominal level. Keywords Porcine Reproductive and Respiratory Syndrome virus, microarray, plasmode datasets, optimal statistical analysis 10    1. Introduction The choice of appropriate mixed model analysis for microarray data has proven to be challenging (Rosa et al., 2005). To address differential gene expression using two-color microarrays several different designs have been suggested. In general, these experiments may include a reference sample, such as reference designs, or they may not, such as loop designs (Kerr and Churchill, 2001;Dobbin and Simon, 2002;Rosa et al., 2005;Tempelman, 2005). A linear mixed model approach most often underlies the analysis of data collected from these experimental designs (Rosa et al., 2005). The mixed model analysis approach involves a twostep analysis of variance (Wolfinger et al., 2001). Initially, a general model that accounts for technical variation implicit in the microarray experiment and that may affect the estimate of expression level for the genes, such as array and dye effects, is fitted to the data. Second, the residuals from the first model (now considered as normalized expression values) are used to analyze the expression of each gene separately, accounting for all pertinent sources of variation present in the experiment (Wolfinger et al., 2001).There may be multiple sources of variation in a microarray experiment, and an important distinction has to be made between technical and biological replication (Churchill, 2002;Rosa et al., 2005). Technical replication pertains to multiple subsamples obtained from the same individual sample, whereas biological replication pertains to multiple subjects being sampled (Cui and Churchill, 2003;Rosa et al., 2005). Ideally, to evaluate different treatment conditions in any given biological population, biological replication should be present (Churchill, 2002). Technical replication would further allow accounting for variation introduced by the protocol implemented for processing the samples in the laboratory (Churchill, 2002;Cui and Churchill, 2003). Linear models can account for most sources of variation, fitting variance components to represent the natural variability at several levels. 11    A possible way of increasing the efficiency of single gene mixed model analysis is by treating the array effect as a Gaussian distributed random effect (Kerr and Churchill, 2001;Kerr, 2003b) when analyzing log-intensity data. This approach was shown to be more efficient than treating array as a fixed effect for several designs where more than two groups are being compared (Steibel, 2007). To estimate variance components in a mixed model analysis, the Restricted Maximun Likelihood (REML) algorithm is typically used. To test hypotheses in a mixed model analysis of microarray data several statistical tests have been proposed (Cui and Churchill, 2003). A commonly used test is based on the classical F statistic (Wolfinger et al., 2001;Cui and Churchill, 2003). The classical F statistic is based on transcript specific estimated variance components (Cui and Churchill, 2003). An alternative test is a moderated F statistic that borrows information across all transcripts in the microarray and shrinks the estimated variance components towards a common value (Cui et al., 2005). Microarray experiments usually involve a small number of biological samples available to test differential expression of a large number of genes. When samples sizes are small variance components are estimated with less reliability (Cui and Churchill, 2003) which may lead to many false positive decisions (rejecting a null hypothesis of no differential expression when it is actually true). A way to avoid this is by shrinking estimated variance components towards a common value. Shrinkage of variance estimates has been reported to enhance power and to reduce Type I error rate for detecting differential expression (Cui et al., 2005). However, the moderated F statistic has a non-standard distribution under the null hypothesis. Consequently, to assess significance of these tests, P-values cannot be obtained by comparison with a reference null distribution. Hence, permutation analyses (Anderson and Ter Braak, 2003) have been implemented to obtain the null distribution of these statistics (Yang and Churchill, 2007). 12    Permutation analyses imply shuffling the observed data to simulate the distribution of the test statistic under the null hypothesis (Anderson, 2001). In summary, possible alternatives to enhance power of microarray analyses when sample sizes are small are: 1) treating array effect as random t and 2) optimizing estimation of variance components using shrinkage to borrow information across transcripts. Consequently, one can choose to model array as a random or fixed effect, and to use a classic F statistic or a moderated F statistic to assess statistical significance. Selecting a linear mixed model with a moderated F statistic may result in a computationally very intensive analysis. This is because the REML algorithm would have to iterate to get variance components estimates di novo within each permuted dataset. The question remains on whether this is a helpful strategy or not. Possible ways of predicting the performance of the different test statistics and linear models are: analytical derivations, evaluation in simulated data or evaluation in plasmode datasets (Mehta et al., 2004). Analytical derivations are often times non-tractable (Gadbury et al., 2008) and thus simulations or plasmode datasets become more relevant. Simulated datasets are generated based on parametric assumptions about their distributions, whereas plasmode datasets are generated based on real data for which some true structure is known (Mehta et al., 2004;Gadbury et al., 2008). Plasmode datasets have the advantage over simulations that they do not rely on pre-specified model assumptions, making them a useful tool for evaluation of statistical methods (Gadbury et al., 2008;Steibel et al., 2009a;Vaughan et al., 2009). The goal of this study was to compare linear fixed versus mixed models, and classic F versus a moderated F statistics as ways to optimize microarray data analysis for both specificity and sensitivity. To accomplish this we generated plasmode datasets using an existing microarray 13    experiment. The experiment from which the plasmode datasets were constructed involved pigs experimentally infected with Porcine Reproductive and Respiratory Syndrome virus (PRRSV) and showing phenotypic variation in response to the disease (Wysocki et al., 2012). 2. Materials and methods 2.1. Population, phenotypic groups and microarray design The samples used in this study and the experimental design are described in detail in Wysocki et al. (2012). Briefly, Hampshire-Duroc crossbred pigs infected with PRRSV have been classified into two phenotypic groups: 1) low responders (L), with low viremia, greater weight gain, few lung lesions and 2) high responders (H), high viremia, low/no weight gain, many lung lesions. Lung and bronchial lymph node (BLN) RNA was obtained from 4 pigs of each phenotypic group at 14 days post infection, and a microarray experiment was performed using the Pigoligoarray (Steibel et al., 2009b) following a common reference design (16 arrays in total). A pooled RNA sample isolated from several different tissues of uninfected animals was used as the common reference sample. 2.2. Plasmode construction To build the plasmode datasets we used the 8 arrays from BLN tissue, where no differential expression was detected. For explanation see the Results section of this paper and the original paper from Wysocki et al. (2012). We generated the plasmode datasets by randomly partitioning the gene expression dataset from BLN tissue into two groups of four arrays each. The non-reference sample in each array was randomly assigned a treatment label “H” or “L” (phenotypic group). Reference sample label was kept throughout plasmode construction. There were a total of 35 possible label arrangements on the eight available arrays, following: 14    (n! / [(n/2)!]2) / 2 where n = 8 arrays was assumed to be split between two treatment groups. One of the combinations had confounded dye and treatment effect. Thus, we did not incorporate that combination into the final plasmode. Consequently, all results presented in this paper are based on 34 plasmode datasets where no differential expression was expected, but where the natural biological variation was conserved. 2.3. Linear Model Analysis We considered linear fixed and linear mixed models. The basic linear model used to analyze gene expression involved two steps. The first step accounted for overall effects across all transcripts by fitting the following model: ygijk = µ + Dj + Ak + DAjk + egijk th where ygijk is the log-intensity for the k array labeled with the from phenotypic group i; [1] jth dye, corresponding to a pig μ is the overall mean; Dj is the effect of jth dye, with j = 1,2; Ak is th th th the effect of the k array, with k = 1,…,8; DAjk is the interaction of the j array and k dye, 2 finally eijk is the residual with eijk~N(0,σe ). In a second step a model that accounted for gene specific variation was fitted: ˆ e gijk = µg + Tgi + Dgj + Agk + εgijk 15    [2] where ˆ e gijk is the estimated residual from [1] for transcript g; µg is the gene specific overall mean, with g = 1, …, 20736; Tgi is the effect of phenotypic group i in the expression of the g th th transcript, with i = H, L and R (reference); Dgj is the effect of the j dye in the expression of the gth transcript; Agk is the effect of the kth array in the expression of the gth transcript; finally εgijk is the gene specific residual, with εgijk ~N(0,σε2). th The fixed effects model assumed Agk were the fixed effect of the k array, with k = 1,…,8. The mixed effects model assumed the array effect as random, with Agk ~ N(0, σga2). We analyze the data using MAANOVA package (Wu et al., 2003) in R (R Development Core Team, 2010). 2.4. Test Statistics We evaluated two test statistics, the classic F test (Wu et al., 2003) and a moderated F test (Cui et al., 2005). The classical F statistic is based on transcript specific variance components whereas the moderated F statistic is based on shrunk estimates of the variance components. 2.5. Estimation of the critical values Two different approaches were used for the estimation of significance thresholds. First, critical values were obtained from a central F distribution. Second, we considered a permutation approach where the critical values were obtained by shuffling the residuals of a subset of the transcripts. This subset was obtained before the permutation analysis and corresponded to transcripts which F statistic was smaller than an initial critical value of 0.9 (Yang and Churchill, 16    2007). The number of permutations was 100 for the random array effect model, and 1000 for the fixed array effect model. The resulting test statistics obtained for the permuted subset of transcripts were then used to compute the P-values. 2.6. Assessment of performance under the null hypothesis We had eight alternative analyses from the factorial combination of two models (fixed vs. random array effect), two test statistics (classic F vs. moderated F), and two ways of computing significance thresholds (using a tabulated F distribution vs. using a permuted distribution). To assess the performance of these alternative tests under the null hypothesis we computed empirical Type I error rates in the plasmode datasets. Defining the Type I error rate (α) as the number of true null hypothesis that are rejected at a significance threshold, to estimate empirical Type I error rates we counted the number of tests that were rejected at a particular P-value threshold. We arbitrary evaluated nominal significance thresholds of 0.05, 0.01, 0.005, 0.001, and 0.0001. 2.7. Assessment of performance in a dataset with DE transcripts The performance of the statistical tests described in Section 2.6. was evaluated in the lung dataset, where differential expression between phenotypic groups was assumed to be present. Although we did not know which transcripts were differentially expressed in this experiment, we used this dataset to count the proportion of rejected hypotheses obtained with the eight alternative analyses. If two analyses methods provided the same Type I error (estimated from the plasmode analyses), the procedure leading to more rejections in the lung dataset is likely to provide more power. 17    Finally, we used the false discovery rate (Storey, 2003) to assess significance (i.e. DE transcripts) in the lung dataset. The false discovery rate (FDR), defined as the expected proportion of false rejected hypotheses (Benjamini and Hochberg, 1995), is a multiple comparison adjustment to control the Type I error rate. 3. Results 3.1. Overall evidence of differential expression in lung and BLN microarray datasets Initially, a moderated F statistic and significance thresholds obtained by permutation within a mixed effects model (eq. [1] and [2] with Agk normally distributed) was used to address differential expression between phenotypic groups in both datasets. Figure 2.1 shows histograms and Q-Q plots of P-values for each dataset. Histograms and Q-Q plots are used to assess the distributional assumptions of an observed variable. The uniform distribution reflected by a flat histogram of P-values in BLN dataset (Fig. 2.1A) indicated that there was no evidence of differential expression between phenotypic groups. On the other hand, we observed evidence of differential expression between phenotypic groups in the lung dataset reflected in a larger frequency of small P-values (Fig. 2.1B). Similar conclusions can be made upon inspection of uniform Q-Q plots. Uniform Q-Q plots were used to compare the observed quantiles of the negative log transformed P-values versus the negative log transformed quantiles of a uniform distribution, implicit under the null hypothesis of no differential expression (Fig. 2.1C and 2.1D, red x = y line). Although the Q-Q plot for the BLN dataset (Fig. 2.1C) showed a slight departure from the quantiles expected under the null hypothesis, this was likely due to chance alone. Moreover, for an adjusted p-value threshold or FDR ≤ 20% no differentially expressed (DE) transcripts were identified, as stated in Wysocki et al. (2012). On the other side, the Q-Q plot 18    for the lung dataset (Fig. 2.1D) showed a clear departure from the expected quantiles for a uniform distribution, evidencing the presence of differential gene expression (Wysocki et al., 2012). Consequently, we used BLN data to generate plasmode datasets under the assumption of no differential expression. 3.2. Estimation of Type I error using plasmode datasets Tests computed with the classic F statistic and using tabulated significance thresholds controlled the type I error rate close to nominal levels in all cases. This is shown in Fig. 2.2 where the realized Type I error rate for these tests (blue line) is very close to (for the fixed effects model, Fig. 2.2B) or exactly equal to (for the mixed effects model, Fig. 2.2A) the nominal Type I error rate (red x = y line). Similarly, tests based on significance thresholds obtained by permutation controlled Type I error rate at the nominal level for both distributional model assumptions (array as a fixed or random effect) and statistics (classic and moderated F) considered. This is evidenced by the dashed lines in Fig. 2.2A and 2.2B that overlaid with the red x = y line, indicating that the realized Type I error rate was equal to the nominal Type I error rate. Tests computed with the moderated F statistic and tabulated significance thresholds did not control the Type I error rate at the nominal level in any case, and the realized type I error rate was consistently smaller than the nominal Type I error rate indicating an overly conservative test (Fig. 2.2A and 2.2B, green line). Based on the above presented results, we focused on tests that accurately controlled the Type I error rate close to the nominal level. That is, tests based on the classic F statistic and tabulated significance thresholds as well as tests based on the moderated F statistic and permuted 19    significance thresholds. We further explored the distribution of P-values for these tests (Fig. 2.3), and we observed that the P-value distributions resembled the expected distributions under the null hypothesis (green lines versus red lines in Fig. 2.3). However, we noticed plasmode to plasmode variation in the P-value distribution from all these tests (black lines in the graphics) such that an individual list of P-values from a particular experiment might contain more or less false positives compared to the expected proportion at the nominal significance level. 3.3. Assessment of differential expression in lung dataset Tests based on the moderated F statistic and significance thresholds obtained by permutation had the largest proportion of rejected hypotheses (proportion of rejected hypothesis = 0.091, Fig. 2.4A) when array was assumed as a random effect. When the same moderated F statistic with significance thresholds obtained by permutation was used in a fixed array model, the proportion of rejected hypotheses slightly decreased (proportion of rejected hypothesis = 0.087, Fig. 2.4B). Nevertheless, in both models (fixed or random array) these moderated F-tests had the largest proportion of rejected hypotheses when compared to classical F-tests and tabulated critical values. Tests computed based on the classic F statistic assuming array as a random effect returned similar results whether the null distribution was obtained by permutation or from a tabulated F distribution. This is shown by the overlapping blue lines in Fig. 2.4A. However, for the same tests computed in the model assuming array as a fixed effect, P-values obtained with significance thresholds obtained by permutation returned a larger proportion of rejected tests than P-values obtained with tabulated significance thresholds (blue lines in Fig. 2.4B). 20    Among tests that controlled the Type I error rate close to the nominal level, we computed the number of rejected hypotheses (i. e. the number of DE transcripts) at different FDR thresholds (Table 2.1). The largest number of rejected hypotheses occurred for tests based on a moderated F statistic and significance thresholds obtained by permutation in models assuming a random array effect. In addition, the comparison of the fixed array effect versus the random array effect models showed that considering array as a random effect enhanced power (larger number of rejected hypotheses) at all FDR thresholds. That is, the number of rejected hypotheses ranged between 3 and 21 for the fixed array effect model, whereas that number ranged between 5 and 69 for the random array effect model (depending on the FDR threshold considered). Finally, we compared the quantiles of the observed distribution of P-values for tests controlling Type I error rate close to the nominal level with the expected quantiles under the null distribution (Fig 2.5). We observed that all tests had more extreme values than expected under the null distribution, revealing that they detect differential expression. Regardless the distributional assumptions on the array effect, P-values obtained based on the moderated F statistic and significance thresholds obtained by permutation had the furthermost extreme quantile values. Moreover, when array was assumed random, P-values associated with moderated F statistic and significance thresholds obtained by permutation departed more from the expected null distribution than their counterpart based on a fixed array effect model. 21    Table 2.1. Number of differentially expressed genes in a fixed versus mixed effects model at different FDR thresholds. Fixed Effects model Mixed Effects Model False Discovery Rate (%) Test 1 2.5 5 10 1 2.5 5 10 classic F statistic and tabulated significance thresholds 0 1 3 5 0 2 3 7 classic F statistic and permuted significance thresholds 2 2 3 3 2 2 3 7 moderated F statistic and tabulated significance thresholds 0 0 0 0 0 0 0 0 moderated F statistic and permuted significance thresholds 3 5 9 21 5 10 21 69   4. Discussion In this study we aimed to identify optimal statistical analysis frameworks for microarray data. In particular we were interested in testing differential gene expression in pigs as a response to PRRSV experimental infection. Consequently, we used an existing dataset (Wysocki et al., 2012) suitable to derive plasmodes (Gadbury et al., 2008;Vaughan et al., 2009) conforming the null distribution expected under no differential expression. We worked with these plasmode datasets to evaluate analysis models and test statisitcs. Specifically, we wanted to assess Type I error rate of alternative analysis models. We then assessed power of the tests that controlled the Type I error rate at or close to the nominal level. Among all testing procedures and analysis scenarios, we identified three types of tests that controlled the Type I error rate close to the nominal level. First, tests based on a moderated F statistic and significance thresholds obtained by permutation. Second, tests based on a classic F 22    statistic and tabulated significance thresholds. Third, tests based on a classic F statistic and significance thresholds obtained by permutation. These results extended the results from Yang and Churchill (2007) who reported that P-values estimated based on a moderated t-statistic using permutation controlled Type I error rate close to the nominal level. On the contrary, tests based on the moderated F statistic and tabulated significance thresholds were too conservative. This was expected since the moderated F statistic did not have a standard F distribution (Cui et al., 2005). Therefore, permutation analysis should be used to obtain the P-values (Anderson and Ter Braak, 2003;Yang and Churchill, 2007). When the array effect was assumed as fixed, tests based on the classic F statistic and significance thresholds obtained by permutation better controlled the Type I error rate to the nominal level than tests based on the same statistic and tabulated significance thresholds. In this regard, P-values associated with tabulated significance thresholds were computed based on the assumption of independent identically normally distributed residuals (Anderson, 2001). However, the analyzed dataset may have not met this assumption resulting in P-values that did not conform to the nominal F distribution. The P-values associated with significance thresholds obtained by permutation were computed not making such assumptions (Anderson, 2001), therefore resulting in an improved performance of these tests. In this study we showed that treating array as a random effect increased power of all tests evaluated. This is particularly evident in tests based on the moderated F statistic and significance thresholds obtained by permutation, where for example at FDR < 10%, 69 tests were rejected when random array effect was assumed, and only 21 tests were rejected when the array effect was assumed fixed. Modeling the array as a random or fixed effect has been discussed previously 23    (Kerr and Churchill, 2001;Wolfinger et al., 2001;Kerr, 2003a;b). Defining the relative efficiency as the ratio of the variances for fixed versus random array analysis models of the difference in means between two treatment groups, Kerr (2003a) showed that when array is assumed as random in a common reference design and its variance tends to infinity the relative efficiency equals 1. However, when the array variance tends to 0, the relative efficiency is larger than 1, and in that case to assumed array as a random effect is more efficient for estimating treatment differences. Moreover, Steibel (2007) proved that the relative efficiency of the fixed array versus random array model reached a maximum as the array and biological variance became small compared to the residual variances. Thus, treating array as a random effect allowed recovering of inter-block (i.e. inter-array) information. This has been reported before (Kerr, 2003b;Steibel, 2007), but it has been shown that the amount of recovery of information and increase in power depends on variance ratios. In this paper we showed that the practical implication of assuming array as a random effect was that the number of rejected hypothesis increased from 21 to 69 for a FDR < 10%. As mentioned above, tests based on the moderated F statistic and significance thresholds obtained by permutation exceeded in the number of rejected hypotheses all other tests. There, shrinkage improved the estimation of variance components by borrowing information across all transcripts in the microarray (Cui et al., 2005) therefore resulting in more powerful tests. This was consistent with a previous report where P-values estimated using permutation and a moderated t-statistic yielded larger number of positive genes than P-values estimated using permutation and a regular t-statistic at all FDR thresholds evaluated (Yang and Churchill, 2007). In summary, in this study we show that considering array as a random effect, and addressing differential expression using tests based on the moderated F statistic and significance 24    thresholds obtained by permutation was the most powerful analysis. However, this type of analysis may be very computational demanding. An alternative would be to consider array to be a fixed effect. While this model results in less powerful tests, for a fixed number of permutations, the elapsed computational time is reduced by 87 to 90 fold, in the BLN and lung dataset respectively. Another alternative would be to consider array as a random effect, and tests based on the classic F statistic and tabulated significance thresholds. This would enable avoiding the permutation analysis and decreasing the elapsed computational time by 48 fold. However the power of the tests resulting from this approach can be expected to be much smaller. 25    REFERENCES 26    References Anderson, M.J. (2001). Permutation tests for univariate or multivariate analysis of variance and regression. Canadian Journal of Fisheries and Aquatic Sciences 58, 626-639. Anderson, M.J., and Ter Braak, C.J.F. (2003). Permutation tests for multi-factorial analysis of variance. Journal of Statistical Computation and Simulation 73, 85-113. Benjamini, Y., and Hochberg, Y. (1995). Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B (Methodological) 57, 289-300. Churchill, G.A. (2002). Fundamentals of experimental design for cDNA microarrays. Nat Genet 32 Suppl, 490-495. Cui, X.G., Hwang, J.T.G., Qiu, J., Blades, N.J., and Churchill, G.A. (2005). Improved statistical tests for differential gene expression by shrinking variance components estimates. Biostatistics 6, 59-75. Cui, X.Q., and Churchill, G.A. (2003). Statistical tests for differential expression in cDNA microarray experiments. Genome Biology 4. Dobbin, K., and Simon, R. (2002). Comparison of microarray designs for class comparison and class discovery. Bioinformatics 18, 1438-1445. Gadbury, G.L., Xiang, Q.F., Yang, L., Barnes, S., Page, G.P., and Allison, D.B. (2008). Evaluating Statistical Methods Using Plasmode Data Sets in the Age of Massive Public Databases: An Illustration Using False Discovery Rates. Plos Genetics 4. Kerr, K.M., and Churchill, G.A. (2001). Experimental design for gene expression microarrays. Biostatistics 2, 183-201. Kerr, M.K. (2003a). Design Considerations for Efficient and Effective Microarray Studies. Biometrics 59, 822-828. Kerr, M.K. (2003b). Linear Models for Microarray Data Analysis: Hidden Similarities and Differences. Journal of Computational Biology 10, 891-901. Mehta, T., Tanik, M., and Allison, D.B. (2004). Towards sound epistemological foundations of statistical methods for high-dimensional biology. Nature Genetics 36, 943-947. R Development Core Team (2010). " R: A language and environment for statistical computing.". (Vienna, Austria: R Foundation for Statistical Computing). Rosa, G.J.M., Steibel, J.P., and Tempelman, R.J. (2005). Reassessing design and analysis of twocolour microarray experiments using mixed effects models. Comparative and Functional Genomics 6, 123-131. Steibel, J.P. (2007). Improving experimental design and statistical inference for transcriptional profiling experiments. Ph.D. 3264239, Michigan State University. 27    Steibel, J.P., Poletto, R., Coussens, P.M., and Rosa, G.J.M. (2009a). A powerful and flexible linear mixed model framework for the analysis of relative quantification RT-PCR data. Genomics 94, 146-152. Steibel, J.P., Wysocki, M., Lunney, J.K., Ramos, A.M., Hu, Z.L., Rothschild, M.F., and Ernst, C.W. (2009b). Assessment of the swine protein-annotated oligonucleotide microarray. Animal Genetics 40, 883-893. Storey, J.D. (2003). The positive false discovery rate: A Bayesian interpretation and the q-value. Annals of Statistics 31, 2013-2035. Tempelman, R.J. (2005). Assessing statistical precision, power, and robustness of alternative experimental designs for two color microarray platforms based on mixed effects models. Veterinary Immunology and Immunopathology 105, 175-186. Vaughan, L.K., Divers, J., Padilla, M.A., Redden, D.T., Tiwari, H.K., Pomp, D., and Allison, D.B. (2009). The use of plasmodes as a supplement to simulations: A simple example evaluating individual admixture estimation methodologies. Computational Statistics & Data Analysis 53, 1755-1766. Wolfinger, R.D., Gibson, G., Wolfinger, E.D., Bennett, L., Hamadeh, H., Bushel, P., Afshari, C., and Paules, R.S. (2001). Assessing gene significance from cDNA microarray expression data via mixed models. Journal of Computational Biology 8, 625-637. Wu, H., Kerr, M., Cui, X., and Churchill, G. (2003). "MAANOVA: A Software Package for the Analysis of Spotted cDNA Microarray Experiments The Analysis of Gene Expression Data," eds. G. Parmigiani, E. Garrett, R. Irizarry & S. Zeger. Springer London), 313-341. Wysocki, M., Chen, H., Steibel, J.P., Kuhar, D., Petry, D., Bates, J., Johnson, R., Ernst, C.W., and Lunney, J.K. (2012). Identifying putative candidate genes and pathways involved in immune responses to porcine reproductive and respiratory syndrome virus (PRRSV) infection. Animal Genetics 43, 328-332. Yang, H.U., and Churchill, G. (2007). Estimating p-values in small microarray experiments. Bioinformatics 23, 38-43. 28    CHAPTER THREE (Submitted to Frontiers in Livestock Genomics, undergoing review process) 29    Characterizing differential individual response to Porcine Reproductive and Respiratory Syndrome Virus infection through statistical and functional analysis of gene expression Maria E. Arceo1, Catherine W. Ernst1, Joan K. Lunney2, Igseo Choi2, Nancy E. Raney1, Tinghua Huang3, Christopher K. Tuggle3, R.R.R. Rowland4, Juan P. Steibel1,5,* 1 Department of Animal Science, Michigan State University, East Lansing, MI, USA 2 Animal Parasitic Diseases Laboratory, Agriculture Research Service, United States Department of Agriculture, Beltsville, MD, USA 3 Department of Animal Science, Iowa State University, Ames, IA, USA 4 Department of Diagnostic Medicine and Pathobiology, Kansas State University, Manhattan, KS, USA 5 Department of Fisheries and Wildlife, Michigan State University, East Lansing, MI, USA Correspondence: Dr. Juan P. Steibel Michigan State University Department of Animal Science and Department of Fisheries and Wildlife 1205 Anthony Hall, East Lansing, MI, 48824,USA 30    steibelj@msu.edu Abstract We evaluated differences in gene expression in pigs from the Porcine Reproductive and Respiratory Syndrome (PRRS) Host Genetics Consortium initiative showing a range of responses to PRRS virus infection. Pigs were allocated into four phenotypic groups according to their serum viral level and weight gain. RNA obtained from blood at 0, 4, 7, 11, 14, 28, and 42 days post infection (DPI) was hybridized to the 70-mer 20K Pigoligoarray. We used a blocked reference design for the microarray experiment. This allowed us to account for individual biological variation in gene expression, and to assess baseline effects before infection (0 DPI).Additionally, this design has the flexibility of incorporating future data for differential expression analysis. We focused on evaluating transcripts showing significant interaction of weight gain and serum viral level. We identified 491 significant comparisons (FDR ≤ 10%) across all DPI and phenotypic groups. We corroborated the overall trend in direction and level of expression (measured as fold change) at four DPI using qPCR (r = 0.91, p ≤ 0.0007). At 4 and 7 DPI, network and functional analyses were performed to assess if immune related gene sets were enriched for genes differentially expressed across four phenotypic groups. We identified cell death function as being significantly associated (FDR ≤ 5%) with several networks enriched for differentially expressed transcripts. We found the genes interferon-alpha 1(IFNA1), major histocompatibility complex, class II, DQ alpha 1 (SLA-DQA1), and major histocompatibility complex, class II, DR alpha (SLA-DRA) to be differentially expressed (p ≤ 0.05) between 31    phenotypic groups. Finally, we performed a power analysis to estimate sample size and sampling time-points for future experiments. We concluded the best scenario for investigation of early response to PRRSV infection consists of sampling at 4 and 7 DPI using about 30 pigs per phenotypic group. Keywords: porcine reproductive and respiratory syndrome, microarray, quantitative PCR, functional analysis, power analysis. 32    1. Introduction Porcine Reproductive and Respiratory Syndrome (PRRS) was initially described in the US over 20 years ago (Done et al., 1996) and a virus, now known as Porcine Reproductive and Respiratory Syndrome Virus (PRRSV), was identified as the primary causative agent (Collins et al., 1992). Overall, the disease causes $ 664 million annual losses to the US pork industry (Holtkamp et al., 2012). Viral replication takes place in the host’s immune cells (Rowland et al., 2003;Genini et al., 2008) thereby, reducing the cytokine-mediated inflammatory response (Kimman et al., 2009). In this context, a possible way of controlling PRRS is addressing the host genetic component. Host genetic response to infection, in particular, phenotype-genotype associations in immune related traits can be evaluated using currently available genomic tools (Lewis et al., 2007;Lunney and Chen, 2010). While the molecular pathways involved in the protection against PRRS have not yet been entirely elucidated (Kimman et al., 2009) genotype and immune traits associations have been documented (Clapperton et al., 2005;Wattrang et al., 2005). Furthermore, phenotypic variation between breed-lines has been observed in disease-related and production traits of experimentally infected pigs (Petry et al., 2005;Vincent et al., 2006;Doeschl-Wilson et al., 2009). These authors reported differences in clinical symptoms and lung pathology in response to PRRSV infection, as well as in virus levels in serum and/or respiratory tissues, such as lung and bronchial lymph nodes. Doeschl-Wilson et al. (2009) and Petry et al. (2005) also reported differential body weight changes in PRRSV infected pigs. 33    Studying gene expression in pigs showing phenotypic variation to PRRSV infection responses will enhance our knowledge of genetic control of the susceptibility to this disease. In this context, differential expression of a reduced number of immune related genes has been evaluated (Petry et al., 2007;Lunney et al., 2010) and global differential expression has been assessed in vivo (Bates et al., 2008;Xiao et al., 2010a;Xiao et al., 2010b;Zhou et al., 2011;Wysocki et al., 2012) and in vitro (Lee et al., 2004a;Lee et al., 2004b;Miller and Fox, 2004;Genini et al., 2008;Ait-Ali et al., 2011). Most previous studies focused on comparing gene expression of PRRSV-infected and uninfected pigs, as well as gene expression between animals showing differences in postinfection viral titers. However, little is known of the interaction between viral load and weight gain as it relates to gene expression post-infection. This is particularly important given the reported associations of immune traits with growth rate (Galina-Pantoja et al., 2006;Boddicker et al., 2012) and the genetic correlations between growth rate and disease traits (Doeschl-Wilson et al., 2009) as well as between growth rate and immune related traits (Clapperton et al., 2009). The availability of whole genome microarrays (Steibel et al., 2009c) and next generation sequencing (Mardis, 2008) have further favored whole genome expression profiling of PRRSV infected animals (Xiao et al., 2010a;Xiao et al., 2010b). Important features when evaluating gene expression are: 1) the correct modeling of the phenotypic variation and the inclusion of biological replication (Rosa et al., 2005) and 2) sampling relevant tissues and time-points (Mateu and Diaz, 2008;Lunney et al., 2010). We evaluated whole-genome expression profile of pigs assigned to 4 reaction groups (phenotypic groups) according to the pigs’ weight gain and blood viral load as part of the PRRS 34    Host Genetics Consortium (PHGC) (Lunney et al., 2011). The goals of this study were: 1) to assess global differential gene expression in commercial pigs showing variation in phenotypic response to PRRSV experimental infection, and to identify relevant molecular networks and biological functions enriched for differentially expressed genes involved in the pig’s immune response to PRRSV infection; and 2) to inform the design of future experiments, to determine the most informative early time-points and sample sizes required for powerful inferences when assessing gene expression in blood of commercial pigs experimentally infected with PRRSV. 2. Materials and methods 2.1. Animal model and study design Crossbred commercial pigs (~200) from PHGC trial one (Lunney et al., 2011) were transported to the Kansas State University bio-secure testing facility at weaning (11 to 21 d. old) and allocated to pens (10 to 15 pigs/pen). Pigs came from PRRSV-, Influenza virus- and Mycoplasma hyopneumoniae-free farms. After a 7-day acclimation period and antibiotic treatments, pigs were both intramuscularly and intranasally infected with a known isolate of PRRSV (105 tissue culture infectious dose50 of NVSL 97-7985). Blood samples were collected in Tempus™ Blood RNA Tubes (Life Technologies, Carlsbad, CA) at 0, 4, 7, 11, 14, 19, 28 and 42 days post-infection (DPI). Individual animal weight was measured at weekly intervals. Serum viral level was quantified using a semi-quantitative TaqMan PCR assay. More details on the pig resources, study design and data storage are in Lunney et al. (2011) and Boddicker et al. (2012). The study was approved by Kansas State University Institutional Animal Care and Use Committee. 35    2.2. Phenotypic groups Figure 3.1 shows a scatterplot of weight gain versus viral load (VL) for all pigs from PHGC trial one. Four phenotypic groups were defined according to the pigs’ weight gain and VL. Weight gain was defined as body weight (kg) from 0 to 42 DPI; VL was defined as the area under the curve of the log-transformed serum viral level from 0 to 21 DPI. These two variables showed moderate negative correlation (r = -0.29). Thus, bivariate data of VL and weight gain were centered at their mean values and rotated to obtain uncorrelated measures. Phenotypic groups were then specified as a combination of these two traits: 1) high viral load-high weight gain (HvHg), 2) high viral load-low weight gain (HvLg), 3) low viral load-high weight gain (LvHg) and 4) low viral load-low weight gain (LvLg). For allocation to these four groups, pigs that were within one standard deviation of the population mean for either of the traits were discarded and the remaining animals were classified to one of the groups (Fig. 3.1). 2.3. Microarray design and analysis Three pigs per group were randomly selected and their RNA isolated using the TempusTM Spin RNA Isolation Kit as per manufacturer’s instructions (Applied Biosystems/Life Technologies) from blood at 0, 4, 7, 11, 14, 28, and 42 DPI. RNA samples were reverse transcribed using the Amino Allyl MessageAmp II aRNA Amplification Kit (Ambion./Life Technologies), labeled with N-hydroxysuccinate (NHS) ester Cy3 or Cy5 dyes (GE Healthcare, CA), and hybridized to the 20K 70-mer oligonucleotide Pigoligoarray as previously described (Steibel et al., 2009c) following a block reference design (Steibel and Rosa, 2005). Each individual pig’s 0-DPI-sample served as reference for all other samples from the same animal. Reference sample dye flipping was performed across pigs to allow separation of dye and 0-DPI effects (Steibel and Rosa, 2005). Fluorescent images and fluorescence intensity data were 36    collected as previously described (Steibel et al., 2009c). Median intensities were background corrected with Normexp method fixing the offset parameter κ =50 (Ritchie et al., 2007). Background corrected data was normalized using a within print-tip loess-location normalization (Yang et al., 2002). All computations were implemented in R (R Development Core Team, 2010) through LIMMA (Smyth, 2005). Normalized log-intensities were analyzed on a transcript per transcript basis using a linear mixed model accounting for all pertinent random and fixed effects (Rosa et al., 2005) as described below: yijklm = μ + TGij + Dl + Am + Sk + eijklm th th where yijklm is the log-intensity measure at the i DPI, for the k pig corresponding to th th phenotypic group j in the m array labeled with the l dye; μ is the overall mean; TGij is the effect of DPI i and phenotypic group j in the expression of the transcript, with i = 1,…,7 and j = th th 1,…,4; Dl is the effect of l dye, with l = 1,2; and Am is the random effect of the m array, 2 th with m = 1,…,72 and Am~N(0,σa ); Sk is the random effect of the k pig, with k = 1,…12 2 2 and Sk~N(0,σs ); finally eijklm is the residual with eijklm~N(0,σe ). The mixed model was fitted using the package MAANOVA (Wu et al., 2003). An F test based on a shrinkage estimator of variance components was used to evalulate significance of fixed effects (Cui et al., 2005). Permutation based P-values (number of permutations = 100) were obtained to assess significance (Yang and Churchill, 2007). To account for multiple testing a two-stage testing procedure was used to assess significance of gene expression changes in response to weight gain and viral load status over time. First, for 37    each DPI, the interaction effect TGij was tested at 10% false discovery rate (FDR) (Storey, 2003). Then, for all transcripts with significant interaction, the effect of viral load was evaluated in high-weight-gain and low-weight-gain pigs, separately (nominal p ≤ 1/Ns, with Ns = number of significant interactions). Similarly, the effect of weight gain was evaluated in high-viral-load and low-viral-load pigs, separately. All in all, this testing protocol resulted in 4 contrasts of interest for the comparisons of phenotypic groups. Each contrast involved the comparison of two phenotypic groups. Two of these contrasts evaluated the effect of viral load and the other two the effect of weight gain. In particular, for the viral load effect, we compared 1) HvHg vs.LvHg and 2) HvLg vs.LvLg. To evaluate the weight gain effect, we compared 1) HvHg vs.HvLg and 2) LvHg vs.LvLg. Data have been deposited in NCBI’s Gene Expression Omnibus (GEO) (http://www.ncbi.nlm.nih.gov/geo/) with accession number GSE41144. Code used for these analyses can be found at https://www.msu.edu/~steibelj/JP_files/PRRSV.html 2.4. Pathway Analysis Gene set enrichment analyses were performed using Ingenuity Pathways Analysis software (IPA) (Ingenuity® Systems, www.ingenuity.com). After statistical analyses described above pathway analyses were restricted to early time-points (4 and 7 DPI). The network analyses generated a set of relevant networks (P ≤ 10-10 and number of DE genes ≥ 5) built based on a user-specified list of genes. Networks were composed of genes and gene products that are known to interact with each other and that were enriched for the DE transcripts (as defined in section 2.3.). Functional analysis identified the biological functions that were significantly enriched (Benjamini-Hochberg (1995) p < 0.05) for these DE genes. 38    2.5. qPCR design and analysis Quantitative PCR (qPCR) analysis was used to assess differential expression of 15 genes. Twelve genes were selected from microarray and pathway analyses results. In addition, three genes [interferon-alpha 1 (IFNA1); major histocompatibility complex, class II, DQ alpha 1 (SLADQA1); and major histocompatibility complex, class II, DR alpha (SLA-DRA)] not present in the microarray platform were included based on previously documented knowledge of their relevant role in immune modulation, reviewed by Lunney and Chen (2010). Probes and primers were obtained from the Porcine Immunology and Nutrition Database (Dawson et al., 2005) or designed with Primer Express Software v3.0 (Life Technologies) from sequences obtained from Ensembl (http://useast.ensembl.org/index.html). Primers and probes were designed to span exonexon junctions, if possible, to avoid false positive genomic DNA contamination (Supplementary Table 1). Synthesis of cDNA was performed with SuperScript Reverse Transcriptase and qPCR amplification was implemented using the Brilliant Kit (Agilent Technologies, Inc., CA) with 35 ng of cDNA in an ABI Prism 7500 Sequence Detector System (Life Technologies). Assays were performed in duplicate. The amplification conditions are described in Royaee et al.(2004). Ct values were obtained from each individual amplification curve. Average Ct for each target gene in each sample and DPI (4 and 7) were subtracted from the corresponding average Ct of RPL32 (housekeeping gene), producing ΔCt values. ΔΔCt values were computed by subtracting 0-DPIΔCt from ΔCt at each DPI. Resulting ΔΔCt were analyzed separately for each DPI (except 0 DPI) with the following linear model: y m = Gm + e m 39    th where ym is the ΔΔCt value for the gene in the m phenotypic group, Gm is the effect of the phenotypic group m and em~N(0,σe 2 ) is the residual. This model is equivalent to a previously described linear model (Steibel et al., 2009a). 2.6. Statistical power and sample size computation Using this experiment’s dataset as pilot data for future experiments with a similar design, we computed the expected discovery rate (EDR) and FDR as defined by Gadbury et al. (2004) to estimate statistical power at a fixed number of biological replicates (n) and Type I error rate (α). We considered either n = 20 or n = 30 per phenotypic group (4 groups) and α = 0.01. This choice of α resulted in a FDR < 10 % in all cases. Computations were performed using PowerAtlas software (Page et al., 2006). Sample sizes were selected assuming a common reference design with either 4 (n = 20) or 3 (n = 30) sampling time-points, such that the total number of microarray slides was fixed to 240. This represents a common situation where the researcher has to decide whether to allocate arrays to extra biological samples with fewer time-points or to include more time-points at the expense of sample size for a given total budget. 3. Results Dye labeled cDNA prepared from blood samples from 12 PHGC pigs at 7 different time points (0, 4, 7, 11, 14, 28, and 42 DPI) were hybridized to the Pigoligoarray using a block reference design. Three pigs per group were randomly selected from each of the four phenotypic groups defined according to the pigs’ weight gain and viral load (HvHg, HvLg, LvHg and LvLg). We addressed global differential expression in four contrasts of interest (HvHg vs. LvHg, HvLg vs. LvLg, HvHg vs. HvLg, and LvHg vs. LvLg). 3.1. Microarray analysis 40    3.1.1. Evidence of differential gene expression at 0 DPI The presence of an effect on gene expression profile that cannot be attributed to the experimental infection was addressed by evaluating differential gene expression between the four phenotypic groups at 0 DPI. Although no significant differences in transcripts were identified (FDR ≤ 0.1), inspection of P-value distributions for the four contrasts indicated a departure from the expected uniform distribution under null hypothesis (Fig. 3.2). The actual distribution of P-values for LvHg vs. LvLg indicated an excess of small P-values. This is consistent with the alternative hypothesis of differential expression. The contrasts HvHg vs. LvHg and HvLg vs. LvLg showed P-value distributions inconsistent with both null and alternative hypotheses implicit in the analysis model (Page et al., 2006). The observed deviations in the P-value distributions of these tests likely reveal the existence of unaccounted effects (Page et al., 2003). These patterns also appeared in contrasts at other time-points if these differences at 0 DPI were ignored (results not shown). 3.1.2. Evidence of differential gene expression for remaining DPI Based on the results from the previous section, differential expression between phenotypic groups after 0 DPI was corrected by subtracting the estimated difference at 0 DPI. For example, to address differential expression between two phenotypic groups of pigs, the effect was estimated following [(TGi≠0, j – TGi’=0, j) – (TGi≠0, j’ – TGi’=0, j’)]. The same procedure was used for all contrasts. After correcting for 0-DPI estimated effect, the distribution of P-values for all contrasts was consistent with the expected distribution under either null or alternative hypotheses (data not shown). This indicated that correcting each comparison estimate by the corresponding estimate at time zero accounts for pre-existing differences in gene 41    expression and/or for animal specific effects missed by the model. Consequently, we based all inferences on the above specified contrasts. Evidence of differential expression was found, as revealed by the Q-Q plot of P-values (Fig. 3.3). This plot represents the quantiles of the empirical distribution of P-values versus the expected quantiles of uniformly distributed P-values (corresponding to the null hypothesis). The represented departure from the straight line y = x indicates an excess of small P-values as compared to the expectation under the null hypothesis, consistent with the alternative hypothesis. 3.1.3. Weight gain and viral load interaction effect on gene expression A total of 491 null hypotheses were rejected (FDR ≤ 10%) when testing for the interaction between weight gain and VL. The number of transcripts showing a significant interaction was 288, 14, 177 and 12 at 4, 7 14 and 42 DPI respectively (Table 3.1). There were no significant interactions detected at 11 and 28 DPI. Transcripts showing significant weight gain and viral load interaction effect on their expression were further evaluated and results are presented in section 3.1.4. 42    Table 3.1. Number of putatively differentially expressed transcripts. Phenotypic Groups Comparisons Time (DPI) Ns HvHg vs. LvHg HvLg vs. LvLg HvHg vs. HvLg LvLg vs. LvHg 4 288 86 42 22 141 7 14 13 12 14 11 14 177 106 25 38 120 42 12 12 12 9 12 Numbers indicate differentially expressed genes per time and phenotypic groups’ comparisons. Ns= total number of transcripts being tested (i.e. with a significant interaction at the specific DPI being evaluated). 3.1.4. Weight gain and viral load effect on gene expression To declare a differentially expressed (DE) transcript we used an adjusted P-value where the null hypothesis was rejected if p ≤ 1/Ns, with Ns = total number of transcripts being tested (i.e. with a significant interaction at the specific DPI being evaluated), such that we expected one false positive per comparison. This led to a number of putatively DE transcripts per DPI and comparison, as shown in Table 3.1. At 4 and 14 DPI, we observed a similar number of DE transcripts, consisting of a large number of transcripts with significant interaction (288 and 177 respectively) mainly involving differential expression in HvHg vs. LvHg (86 and 106 transcripts) and in LvLg vs. LvHg (141 and 120 transcripts). However, at 7 and 14 DPI, the number of significant interactions is smaller (14 and 12) with most of the transcripts DE across all four contrasts. Although we reported number of DE at 42 DPI, we did not follow up on these results because response at that late sampling time-point could be due to a rebound of the disease (Boddicker et al., 2012). 43    3.2. Pathway analyses Subsequent to testing for changes in global gene expression, we assessed if immune related gene sets were enriched for DE genes across pigs from the four phenotypic groups. We performed pathway analyses to identify relevant molecular networks and biological functions associated with such networks enriched for the DE genes identified in section 3.1.4. We restricted the analyses to 4 and 7 DPI since 1) we were interested in early immune responses and 2) these were the times that provided the most power to detect future differential expression (as shown in section 3.4.). Following these pathway analyses, and to limit the interpretation of results to genes with large effects, in addition to the P-value threshold described in section 3.1.4, we considered an absolute fold-change (FC) threshold equal to, or greater than 1.5. For instance, in a specific contrast involving two groups, the gene expression level in the first phenotypic group had to be 50 % larger (or smaller) than the one in the second phenotypic group for a significantly DE gene to be considered. DE genes with a positive FC (larger than 1.5) were considered to be over-expressed, and DE genes with a negative FC (smaller than -1.5) were considered to be under-expressed in the first phenotypic group included in the contrast equation. 3.2.1. HvHg vs. LvHg At 4 DPI, pathway analyses identified 5 significant molecular networks. Significant functional categories identified in these networks were: Cell Death, Cell Morphology, Cellular Assembly and Organization, Cellular Function and Maintenance. Among DE genes in these networks and associated with these categories, c-mer proto-oncogene tyrosine kinase (MERTK) was under-expressed in this contrast. Ezrin (EZR) and moesin (MSN) were over-expressed and under-expressed in this contrast, respectively. Present in the top network was also Rho GTPase activating protein 35 (GRLF1) that was under-expressed. 44    At 7 DPI, pathway analyses identified one significant molecular network and similar to 4 DPI a significant functional category associated with the network was also Cell Death. Within this category DE genes involved in the initiation of apoptosis (PYD and CARD domain containing, PYCARD) and cytotoxicity of cytotoxic T cells (granzyme A, GZMA) were underexpressed in this contrast. Also present in this network and under-expressed in this contrast was epidermal growth factor receptor pathway substrate 15 (EPS15). 3.2.2. HvLg vs. LvLg At 4 DPI, pathway analysis identified three significant molecular networks. Significant functional categories identified in these networks were Organismal Development, and Cell Death. Among DE genes in these networks and associated with Cell Death, major histocompatibility complex, class II, DR beta 1 (HLA-DRB1/SLA-DRB1), involved in the cytotoxicity of Th1 cells, was under-expressed in this contrast. In the same network, jumonji, AT rich interactive domain (JARID2) was under-expressed in the contrast. In addition, present in the top network generated was integrin beta 7 (ITGB7) and RAS guanyl releasing protein 1 (RASGRP1) were both under-expressed in this contrast. At 7 DPI, pathway analyses results pointed at one significant molecular network. Functional categories associated with this network were: Genetic Disorder, Inflammatory Disease, and Cellular Compromise. Differentially expressed genes associated with these functions were PYCARD and GZMA, over-expressed in this contrast. 3.2.3. HvHg vs. HvLg At 4 DPI, pathway analysis identified one significant molecular network with 12 DE genes. However, no significant functional category was identified for genes in this network. 45    At 7 DPI there was also one significant network identified. The significant functional category identified for this network was Cell Death. DE genes in this category related to cell death of leukocytes (GZMA and PYCARD), and initiation of apoptosis (PYCARD) were underexpressed in this contrast. Another gene, NEDD8 activating enzyme E1 subunit 1 (NAE1), also related to initiation of apoptosis was over-expressed in the contrast. 3.2.4. LvHg vs. LvLg At 4 DPI, seven significant molecular networks were identified. However, no significant functional categories were identified for any of these networks. At 7 DPI, one significant molecular network was identified. The significant functional category identified for this network was Cell Morphology. Associated with this function, GZMA was DE and over-expressed in this contrast. 3.3. qPCR analysis 3.3.1. Verification of microarray findings Among a total of 96 comparisons for the 12 genes present in the microarray and selected for qPCR (12 genes * 4 phenotypic groups * 2 DPI), 26 significant comparisons (p ≤ 1/Ns, as defined in section 3.1.4.) were detected by the microarray. Significant comparisons occurred for nine genes: EPS15, EZR, GRLF1, GZMA, ITGB7, JARID2, MERTK, PYCARD, and RASGRP1. Each comparison corresponds to a significant test of differential expression at a specific DPI. Focusing on the direction and level of expression change of all comparisons, we evaluated the gene set correlations between microarray and qPCR measured FC. From a measurement error perspective, the overall trend in direction and amount of expression was validated at 4 DPI (r = 0.91, p ≤ 0.0007), but not at 7 DPI. As expected, correlation between measurements associated 46    with non-DE comparisons in the microarray and the qPCR were not significant (r = 0.15, p > 0.24). Based on microarray results, qPCR confirmed differential expression of two genes: JARID2 and ITGB7. JARID2 was confirmed in HvLg vs .LvLg at 4 DPI (p ≤ 0.03 and p ≤ 0.0001 for QPCR and microarray, respectively). ITGB7 was confirmed in LvHg vs. LvLg at 7 DPI (p ≤ 0.007 and p ≤ 0.004). A total of 70 non-DE comparisons were reported by the microarray analysis. This includes comparisons among the nine genes mentioned plus all comparisons for interleukin-1 alpha (IL1A); interleukin 8 (IL8); and interferon regulatory factor 1 (IRF1). Among these nonDE comparisons only two were detected DE with qPCR. These included IRF1 in HvLg vs. LvLg (p ≤ 0.007, FC ≤ -2.22) and LvHg vs. LvLg (p ≤ 0.005, FC ≤ -2.33) at 7 DPI. All these results together indicate that, although the rate of differential expression validation of individual genes is limited, the overall pattern of differential expression was confirmed for comparisons at 4 DPI. Consequently, enrichment analysis (networks and functions) identified at the earliest time-point are expected to be reproduced in future experiments. 3.3.2. Additional genes Three genes not present on the microarray were also tested using qPCR: interferon-alpha 1 (IFNA1), major histocompatibility complex, class II, DQ alpha 1 (HLA-DQA1/SLA-DQA1), and major histocompatibility complex, class II, DR alpha (HLA-DRA/SLA-DRA). SLA class II antigens are expressed in B and T cells, with numerous haplotypes identified throughout different pig populations, which led researchers to explore their association with disease responses (Lunney et al., 2009). IFNA1 encodes for an innate cytokine, and has been reported to be modulated by PRRSV (Mateu and Diaz, 2008;Kimman et al., 2009). Significance levels and FC for these genes in all comparisons are presented in Table 3.2. 47    At 4 DPI, IFNA1 was significantly DE (p ≤ 0.05) when comparing HvLg to LvLg and HvHg to HvLg. At 7 DPI, IFNA1 was significantly DE in all contrasts. SLA-DQA was DE in all contrasts except for HvHg vs. HvLg and SLA-DRA was DE in LvHg vs. LvLg Table 3.2. Test of immune gene expression for genes not present in the microarray platform. Viral Comparisons Growth Comparisons HvHg vs. LvHg HvLg vs. LvLg HvHg vs. HvLg LvHg vs. LvLg DPI Symbol P-value FC P-value FC P-value FC P-value FC IFNA 0.01 -6.67 0.02 5.45 0.05 -3.93 SLA-DQA 0.71 1.15 0.11 -1.92 0.71 1.15 0.11 -1.92 0.96 -1.03 0.62 -1.35 0.71 -1.25 0.42 -1.64 IFNA 0.04 3.25 0.02 -4.08 0.04 3.27 0.02 -4.05 SLA-DQA 0.04 1.32 0.01 -1.43 0.95 -1.01 0.00 -1.89 SLA-DRA 7 3.21 SLA-DRA 4 0.09 0.15 1.58 0.55 -1.20 0.40 -1.29 0.02 -2.45 Values in the table include significance levels and FC of genes; bolded values indicate significant comparisons and their FC   3.4. Microarray statistical power and sample size estimation In order to inform the design of future experiments, we computed the EDR per contrast at early DPI. The EDR is the multi-test equivalent to power, which is also called sensitivity (Steibel et al., 2009b). EDR should be computed at a specific nominal Type I error rate, α, and for a given sample size, conditioning on estimated effects from a previous experiment (Gadbury et al., 2004). We considered a future experiment that would include sampling time = 0 DPI plus two or three other times, selected among 4, 7, and 11 DPI. We assumed effect sizes estimated from this data, a fixed nominal error rate α = 0.01, and two sample sizes (n = 30 for three sampling time48    points or n = 20 for four sampling time-points). The sample allocation (sampling fewer time points with more biological replicates or vice versa) would require the same number of microarray slides (240) and would roughly have the same cost. For evaluating which sampling time-points have to be included in a future study, we set the threshold of inclusion to EDR > 80%. That is, the average probability of detecting an effect (assuming the effect is indeed present) to be larger than 0.8. When n = 30, two sampling time-points (other than 0 DPI) could be included. In that case, the best sampling combination would be at 4 and 7 DPI. This would provide adequate power (EDR > 80%) to detect weight gain and VL effects in all contrasts except for LvHg vs. LvLg. Changing the sampling scheme to include 11 DPI (with only 20 samples per phenotypic group), would not add to the purpose of having sufficient power in that contrast. Furthermore, sampling at 11 DPI would only result in one contrast (HvLg vs. LvLg) having EDR > 80% (Table 3.3)  49    Table 3.3. Expected discovery rate (EDR) comparing 20 and 30 biological replicates. Sampling time-points (DPI) Phenotypic Group Comparisons HvHg vs. LvHg HvLg vs. LvLg HvHg vs. HvLg LvHg vs LvLg Sample Size (n) 4 7 11 20 0.84 0.86 0.37 30 0.91 0.92 0.47 20 0.83 0.80 0.96 30 0.91 0.88 0.98 20 0.77 0.88 NA 30 0.86 0.94 0.45 20 0.55 0.42 0.62 30 0.63 0.51 0.70 All four contrasts were compared at each DPI and evaluated for future sampling with desirable power (> 80%) NA: not available. The algorithm could not reach a result 4. Discussion The first objective of this study was to assess global differential gene expression in weaned pigs showing variation in weight gain and blood viral load in response to PRRSV infection. To achieve this objective, four reaction groups (phenotypic groups) of pigs were evaluated. Our study is different from other PRRSV-response gene expression profiling experiments in three ways. First, we focused on modeling individual biological variation of gene expression. Given that a longer term objective is to find genes for diagnosis and prognosis of PRRSV infection, we were interested in characterizing the variance of expression at the 50    individual level. The resulting scope of inference is different from that obtained using pooled samples (Genini et al., 2008;Xiao et al., 2010a;Xiao et al., 2010b), since our experimental and inferential unit is the individual animal and not a pool of animals. Second, we used a blocked reference design (Steibel and Rosa, 2005). In contrast to common reference designs (Bates et al., 2008;Xiao et al., 2010a;Xiao et al., 2010b;Wysocki et al., 2012), our design allowed us to use 0 DPI samples as a reference, and still include 0 DPI in tests, which was instrumental in assessing baseline effects before infection. Additionally, because of the design used to accommodate single cDNA samples, this study has the flexibility of incorporating future data for differential expression analysis. Third, we report on whole-genome expression profiling of white blood cells from in-vivo infected pigs that complements existing results from studies using pulmonary alveolar macrophages (PAM), bronchial lymph node and lung (Petry et al., 2007;Bates et al., 2008;Genini et al., 2008;Lunney et al., 2010;Xiao et al., 2010a;Zhou et al., 2011;Wysocki et al., 2012). Furthermore, obtaining blood samples is simpler and less invasive than sampling other tissues, thus simplifying implementation of genomic diagnostics in pigs, including in-situ sampling at farms. We first assessed differential expression at 0 DPI between pigs allocated to different phenotypic groups, and observed effects that could not be solely attributed to experimental infection or random errors (Page et al., 2006). The individual baseline (0 DPI) differential expression assessment was only briefly reported before (Ait-Ali et al., 2011), and using 0 DPI correction has not been reported. This type of correction was not usually addressed either because the baseline samples were pooled (Genini et al., 2008;Xiao et al., 2010a;Xiao et al., 2010b) or because they were omitted from the expression experiment (Bates et al., 2008;Wysocki et al., 2012). 51    We tested the interaction effect between weight gain and viral load on global gene expression. From breeding and management perspectives, quantifying interaction effects at early time-points would allow making timely management and selection decisions. Consequently we concentrated on 4 to 14 DPI for further analyses. In addition, by considering early DPI we assured that the effect being evaluated was exclusively due to an initial infection stage and not to a rebound of the disease (Boddicker et al., 2012). Our results complement and extend those reported by Petry et al. (2007) and Bates et al. (2008), who tested the interaction between viral burden and genetic line as well as infection status on gene expression. Petry et al. (2005) reported that pigs from Nebraska Index Line, selected for improved reproductive traits, gained more weight than Hampshire x Duroc crossbred pigs after PRRSV infection. Therefore, the weight gain and blood viral level interaction effect we evaluated resembled the genetic line by viral burden interaction reported by Petry et al. (2007) and Bates et al. (2008). These authors reported seven genes with significant line by viral load interaction in lung or bronchial lymph node expression profiles. Querying expression levels for the same genes in our dataset, we found that DDX3Y (DEAD box proteins, ATP-dependent RNA helicase), a paralog of DDX3 reported by Bates et al. (2008), was also DE in blood cells. Significant differences in DDX3 expression occurred between low and high viral burden pigs in Nebraska Index Line, but not in Hampshire x Duroc line. Likewise, we observed significant differences (p ≤ 0.003) in DDX3Y expression in HvHg vs. LvHg, but not in HvLg vs. LvLg, at 14 DPI, although the direction of change was opposite in our results as compared to the Bates et al. (2008) experiment. The reason for this is unclear but it could be attributed to differences in tissues, genetic background, and/or functional differences between the paralogous genes. 52    Within the first objective of this study we also aimed to characterize gene networks and individual genes influencing PRRSV immune response in the four phenotypic groups considered. We further evaluated transcripts with expression subject to significant viral load by weight gain interaction effects to identify biological functions associated with relevant molecular networks. We focused on 4 and 7 DPI since these provide insights on early host anti-viral and innate immune response to PRRSV infection and they are candidate sampling time-points to be pursued in future studies. Pathway analysis revealed that cell death function was significantly associated with several networks enriched for DE genes at 4 and 7 DPI. Genes included in these networks and associated with cell death were MERTK, GZMA, and PYCARD. All these genes followed a general pattern of under-expression in high viral load compared to low viral load pigs (FC ≤ 1.5). An exception to this was HvLg vs. LvLg at 7 DPI. These overall results are consistent with those from Genini et al. (2008) that reported inhibition of apoptosis in cell lines 9 to 12 hours post infection and with results from Xiao et al. (2010b) comparing gene expression of pigs infected with a highly pathogenic strain of PRRSV compared to uninfected controls at 4 and 7 DPI. Cell death is a host defense mechanism to inhibit viral replication (Alcami and Koszinowski, 2000). Overall, the global gene expression profile showed a trend where HvLg and HvHg pigs had lower expression of the listed genes relative to LvLg and LvHg pigs, respectively, indicating that the defense mechanism mediated by cell death had reduced efficiency, thereby allowing increased viral replication. At 4 DPI, our study identified MERTK as DE and associated with cell death in HvHg vs. LvHg pigs. The product of MERTK is a phagocytic receptor that is involved in the clearance of apoptotic thymocytes. Mouse macrophages lacking MERTK showed a delayed clearance of apoptotic cells (Seitz et al., 2007). There have been no previous reports of this gene identified as DE in PRRSV response studies. 53    Key genes in the swine leukocyte antigens (SLA) complex have been well documented for their effects on production and immune traits in different pig populations (Lunney et al., 2009). At 7 DPI, we identified SLA-DRA significantly over-expressed in LvLg relative to LvHg. SLA-DQA1 followed the same trend, and in addition, it was significantly under-expressed in LvHg and HvLg relative to HvHg and LvLg, respectively. Global differential expression and functional analysis comparing PRRSV infected to uninfected pigs at the same sampling timepoints by Xiao et al. (2010a) reported that MHC class II antigens (SLA-DQA, SLA-DMB, SLADQB1 and SLA-DRA) were significantly induced in PRRSV infected lungs. Our study identified IFNA1 as being significantly DE in all contrasts but HvHg vs. LvHg at 4 DPI. Specifically, at both 4 and 7 DPI, IFNA1 was over-expressed in HvHg and LvLg relative to HvLg and LvHg pigs, respectively. This gene was reported as under-expressed in PRRSVinfected with respect to uninfected PAM at 30 (Ait-Ali et al., 2011) but not at 12 hours post infection (Genini et al., 2008). IFNA was reported under-expressed at 4 and 7 DPI in lung tissue of infected pigs relative to uninfected controls (Xiao et al., 2010a) but at 14 DPI, Lunney et al. (2010) reported no differences in expression in tracheobronchial lymph node for several innate markers (IFNA, IL1B and IL8). In addition, Petry et al. (2007) found that differences in expression of IFNA were influenced by pig genetic line. Overall, our findings of DE genes in whole blood are in agreement with previous reports on specific target tissues and cells, such as lung and PAM, following PRRSV infections. These results stress the usefulness of our study for sampling the more accessible blood to reveal the complexity of host responses to PRRSV infection. We expect that our planned, more detailed studies will generate further questions on the role of these and many other genes in anti-PRRSV responses. 54    Finally, our global differential expression results were used as pilot data to inform design of future time-course transcription profiling experiments. We evaluated different scenarios of sample sizes and sampling time-points for combinations given a fixed total sampling effort. We concluded the best scenario for future studies consists of sampling at 4 and 7 DPI using about 30 pigs per phenotypic group, and that a minimum of 20 pigs per group are needed for controlling Type I and Type II error rates to acceptable levels in most comparisons. The results obtained with a sample size n = 30 were consistent with previous results obtained from a dataset generated by Chen et al. (manuscript in preparation). Our group used the Wysocki et al. (2012) dataset of lung tissue expression at 14 DPI to evaluate statistical power of high versus low viral burden pigs, and affirmed that approximately the same sample size was needed. These results underscore the importance of computing sample size. We predict that this could be applied in a broader context, for instance, in next generation sequencing experiments. Such technology is being increasingly used for evaluating expression profiling in pigs infected with PRRSV (Xiao et al., 2010a;Xiao et al., 2010b). Even though we could expect less technical variation in expression measured with RNA-seq (Marioni et al., 2008), biological variation would remain unaffected. In such cases, the only way of increasing power of the tests would be increasing the number of biological samples (Steibel et al., 2009b). Evidence presented in this paper highlights the importance of thoughtful experimental design and accurate modeling. We acknowledge sample size is a key factor of every experiment and correct modeling of variation (biological and technical) is essential. As a result, this experiment provided information on actual sample sizes and sampling time-points needed for more precise estimation of effects of interest. Our preliminary results have already identified differential gene expression, molecular networks and biological functions affecting the four 55    phenotypic groups of pigs and the influence of PRRSV infection. Finally, due to the flexible experimental design utilized in this study, the resulting dataset can be merged with future data for increasingly powerful and precise inferences on response to PRRSV infection. Conflict of Interest Statement No competing interests. Acknowledgements This work was supported by USDA ARS funding and USDA NIFA grant #2010-65205-20433. The PRRS Host Genetics Consortium (PHGC) samples were supported through grants #07-233 and #09-244 from the U.S. National Pork Board. The authors acknowledge the careful RNA preparations performed by Amber Jean Tietgens at BARC 56    REFERENCES 57    References Ait-Ali, T., Wilson, A.D., Carre, W., Westcott, D.G., Frossard, J.P., Mellencamp, M.A., Mouzaki, D., Matika, O., Waddington, D., Drew, T.W., Bishop, S.C., and Archibald, A.L. (2011). Host inhibits replication of European porcine reproductive and respiratory syndrome virus in macrophages by altering differential regulation of type-I interferon transcriptional response. Immunogenetics 63, 437-448. Alcami, A., and Koszinowski, U.H. (2000). Viral mechanisms of immune evasion. Immunology Today 21, 447-455. Bates, J.S., Petry, D.B., Eudy, J., Bough, L., and Johnson, R.K. (2008). Differential expression in lung and bronchial lymph node of pigs with high and low responses to infection with porcine reproductive and respiratory syndrome virus. Journal of Animal Science 86, 3279-3289. Benjamini, Y., and Hochberg, Y. (1995). Controlling the False Discovery Rate - a Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society Series BMethodological 57, 289-300. Boddicker, N., Waide, E.H., Rowland, R.R.R., Lunney, J.K., Garrick, D.J., Reecy, J.M., and Dekkers, J.C.M. (2012). Evidence for a major QTL associated with host response to Porcine Reproductive and Respiratory Syndrome Virus challenge. Journal of Animal Science 90, 1733-1746. Clapperton, M., Bishop, S.C., and Glass, E.J. (2005). Innate immune traits differ between Meishan and Large White pigs. Veterinary Immunology and Immunopathology 104, 131144. Clapperton, M., Diack, A.B., Matika, O., Glass, E.J., Gladney, C.D., Mellencamp, M.A., Hoste, A., and Bishop, S.C. (2009). Traits associated with innate and adaptive immunity in pigs: heritability and associations with performance under different health status conditions. Genetics Selection Evolution 41. Collins, J.E., Benfield, D.A., Christianson, W.T., Harris, L., Hennings, J.C., Shaw, D.P., Goyal, S.M., Mccullough, S., Morrison, R.B., Joo, H.S., Gorcyca, D., and Chladek, D. (1992). Isolation of Swine Infertility and Respiratory Syndrome Virus (Isolate ATCC VR-2332) in North America and Experimental Reproduction of the Disease in Gnotobiotic Pigs. Journal of Veterinary Diagnostic Investigation 4, 117-126. Cui, X.G., Hwang, J.T.G., Qiu, J., Blades, N.J., and Churchill, G.A. (2005). Improved statistical tests for differential gene expression by shrinking variance components estimates. Biostatistics 6, 59-75. Dawson, H.D., Beshah, E., Nishi, S., Solano-Aguilar, G., Morimoto, M., Zhao, A.P., Madden, K.B., Ledbetter, T.K., Dubey, J.P., Shea-Donohue, T., Lunney, J.K., and Urban, J.F. (2005). Localized multigene expression patterns support an evolving Th1/Th2-like paradigm in response to infections with Toxoplasma gondii and Ascaris suum. Infection and Immunity 73, 1116-1128. 58    Doeschl-Wilson, A.B., Kyriazakis, I., Vincent, A., Rothschild, M.F., Thacker, E., and GalinaPantoja, L. (2009). Clinical and pathological responses of pigs from two genetically diverse commercial lines to porcine reproductive and respiratory syndrome virus infection. Journal of Animal Science 87, 1638-1647. Done, S.H., Paton, D.J., and White, M.E.C. (1996). Porcine reproductive and respiratory syndrome (PRRS): A review, with emphasis on pathological, virological and diagnostic aspects. British Veterinary Journal 152, 153-174. Gadbury, G.L., Page, G.P., Edwards, J.W., Kayo, T., Prolla, T.A., Weindruch, R., Permana, P.A., Mountz, J.D., and Allison, D.B. (2004). Power and sample size estimation in high dimensional biology. Statistical Methods in Medical Research 13, 325-338. Galina-Pantoja, L., Mellencamp, M.A., Bastiaansen, J., Cabrera, R., Solano-Aguilar, G., and Lunney, J.K. (2006). Relationship between immune cell phenotypes and pig growth in a commercial farm. Animal Biotechnology 17, 81-98. Genini, S., Delputte, P.L., Malinverni, R., Cecere, M., Stella, A., Nauwynck, H.J., and Giuffra, E. (2008). Genome-wide transcriptional response of primary alveolar macrophages following infection with porcine reproductive and respiratory syndrome virus. Journal of General Virology 89, 2550-2564. Holtkamp, D.J., Kliebenstein, J.B., Neumann, E.J., Zimmerman, J.J., Rotto, H., Yoder, T.K., Wang, C., Yeske, P., Mowrer, C., and Haley, C. (2012). Assessment of the economic impact of porcine reproductive and respiratory syndrome virus on United States pork producers. J. Swine Health Prod. (Manuscript in press). Kimman, T.G., Cornelissen, L.A., Moormann, R.J., Rebel, J.M.J., and Stockhofe-Zurwieden, N. (2009). Challenges for porcine reproductive and respiratory syndrome virus (PRRSV) vaccinology. Vaccine 27, 3704-3718. Lee, C., Bachand, A., Murtaugh, M.P., and Yoo, D. (2004a). Differential host cell gene expression regulated by the porcine reproductive and respiratory syndrome virus GP4 and GP5 glycoproteins. Veterinary Immunology and Immunopathology 102, 189-198. Lee, C., Rogan, D., Erickson, L., Zhang, J., and Yoo, D. (2004b). Characterization of the porcine reproductive and respiratory syndrome virus glycoprotein 5 (GP5) in stably expressing cells. Virus Research 104, 33-38. Lewis, C.R., Ait-Ali, T., Clapperton, M., Archibald, A.L., and Bishop, S. (2007). Genetic perspectives on host responses to porcine reproductive and respiratory syndrome (PRRS). Viral Immunology 20, 343-358. Lunney, J.K., and Chen, H. (2010). Genetic control of host resistance to porcine reproductive and respiratory syndrome virus (PRRSV) infection. Virus Research 154, 161-169. Lunney, J.K., Fritz, E.R., Reecy, J.M., Kuhar, D., Prucnal, E., Molina, R., Christopher-Hennings, J., Zimmerman, J., and Rowland, R.R.R. (2010). Interleukin-8, Interleukin-1 beta, and 59    Interferon-gamma Levels Are Linked to PRRS Virus Clearance. Viral Immunology 23, 127-134. Lunney, J.K., Ho, C.-S., Wysocki, M., and Smith, D.M. (2009). Molecular genetics of the swine major histocompatibility complex, the SLA complex. Developmental & Comparative Immunology 33, 362-374. Lunney, J.K., Steibel, J., Reecy, J.M., Fritz, E., Rothschild, M.F., Kerrigan, M., Trible, B., and Rowland, R.R.R. (2011). Probing genetic control of swine responses to PRRSV infection: current progress of the PRRS host genetics consortium. BMC Proceedings 5, S30. Mardis, E.R. (2008). The impact of next-generation sequencing technology on genetics. Trends in Genetics 24, 133-141. Marioni, J.C., Mason, C.E., Mane, S.M., Stephens, M., and Gilad, Y. (2008). RNA-seq: An assessment of technical reproducibility and comparison with gene expression arrays. Genome Research 18, 1509-1517. Mateu, E., and Diaz, I. (2008). The challenge of PRRS immunology. Veterinary Journal 177, 345-351. Miller, L.C., and Fox, J.M. (2004). Apoptosis and porcine reproductive and respiratory syndrome virus. Vet Immunol Immunopathol 102, 131-142. Page, G.P., Edwards, J.W., Gadbury, G.L., Yelisetti, P., Wang, J., Trivedi, P., and Allison, D.B. (2006). The PowerAtlas: a power and sample size atlas for microarray experimental design and research. Bmc Bioinformatics 7, 84. Page, G.P., George, V., Go, R.C., Page, P.Z., and Allison, D.B. (2003). “Are We There Yet?”: Deciding When One Has Demonstrated Specific Genetic Causation in Complex Diseases and Quantitative Traits. The American Journal of Human Genetics 73, 711-719. Petry, D.B., Holl, J.W., Weber, J.S., Doster, A.R., Osorio, F.A., and Johnson, R.K. (2005). Biological responses to porcine respiratory and reproductive syndrome virus in pigs of two genetic populations. Journal of Animal Science 83, 1494-1502. Petry, D.B., Lunney, J., Boyd, P., Kuhar, D., Blankenship, E., and Johnson, R.K. (2007). Differential immunity in pigs with high and low responses to porcine reproductive and respiratory syndrome virus infection. Journal of Animal Science 85, 2075-2092. R Development Core Team (2010). " R: A language and environment for statistical computing.". (Vienna, Austria: R Foundation for Statistical Computing). Ritchie, M.E., Silver, J., Oshlack, A., Holmes, M., Diyagama, D., Holloway, A., and Smyth, G.K. (2007). A comparison of background correction methods for two-colour microarrays. Bioinformatics 23, 2700-2707. 60    Rosa, G.J.M., Steibel, J.P., and Tempelman, R.J. (2005). Reassessing design and analysis of twocolour microarray experiments using mixed effects models. Comparative and Functional Genomics 6, 123-131. Rowland, R.R.R., Lawson, S., Rossow, K., and Benfield, D.A. (2003). Lymphoid tissue tropism of porcine reproductive and respiratory syndrome virus replication during persistent infection of pigs originally exposed to virus in utero. Veterinary Microbiology 96, 219235. Royaee, A.R., Husmann, R.J., Dawson, H.D., Calzada-Nova, G., Schnitzlein, W.M., Zuckermann, F.A., and Lunney, J.K. (2004). Deciphering the involvement of innate immune factors in the development of the host response to PRRSV vaccination. Veterinary Immunology and Immunopathology 102, 199-216. Seitz, H.M., Camenisch, T.D., Lemke, G., Earp, H.S., and Matsushima, G.K. (2007). Macrophages and dendritic cells use different Axl/Mertk/Tyro3 receptors in clearance of apoptotic cells. Journal of Immunology 178, 5635-5642. Smyth, G.K. (2005). "Limma: linear models for microarray data.," in Bioinformatics and Computational Biology Solutions using R and Bioconductor, eds. R. Gentleman, V. Carey, S. Dudoit, R. Irizarry & W. Huber. (New York: Springer), 397-420. Steibel, J.P., Poletto, R., Coussens, P.M., and Rosa, G.J.M. (2009a). A powerful and flexible linear mixed model framework for the analysis of relative quantification RT-PCR data. Genomics 94, 146-152. Steibel, J.P., and Rosa, G.J.M. (2005). On Reference Designs For Microarray Experiments. Statistical Applications in Genetics and Molecular Biology, 1-19. Steibel, J.P., Rosa, G.J.M., and Tempelman, R.J. (2009b). Optimizing design of two-stage experiments for transcriptional profiling. Computational Statistics & Data Analysis 53, 1639-1649. Steibel, J.P., Wysocki, M., Lunney, J.K., Ramos, A.M., Hu, Z.L., Rothschild, M.F., and Ernst, C.W. (2009c). Assessment of the swine protein-annotated oligonucleotide microarray. Animal Genetics 40, 883-893. Storey, J.D. (2003). The positive false discovery rate: a Bayesian interpretation and the q-value. The Annals of Statistics 31, 2013-2035. Vincent, A.L., Thacker, B.J., Halbur, P.G., Rothschild, M.F., and Thacker, E.L. (2006). An investigation of susceptibility to porcine reproductive and respiratory syndrome virus between two genetically diverse commercial lines of pigs. Journal of Animal Science 84, 49-57. Wattrang, E., Almqvist, M., Johansson, A., Fossum, C., Wallgren, P., Pielberg, G., Andersson, L., and Edfors-Lilja, I. (2005). Confirmation of QTL on porcine chromosomes 1 and 8 influencing leukocyte numbers, haematological parameters and leukocyte function. Animal Genetics 36, 337-345. 61    Wu, H., Kerr, M., Cui, X., and Churchill, G. (2003). "MAANOVA: A Software Package for the Analysis of Spotted cDNA Microarray Experiments The Analysis of Gene Expression Data," eds. G. Parmigiani, E. Garrett, R. Irizarry & S. Zeger. Springer London), 313-341. Wysocki, M., Chen, H., Steibel, J.P., Kuhar, D., Petry, D., Bates, J., Johnson, R., Ernst, C.W., and Lunney, J.K. (2012). Identifying putative candidate genes and pathways involved in immune responses to porcine reproductive and respiratory syndrome virus (PRRSV) infection. Animal Genetics 43, 328-332. Xiao, S.Q., Jia, J.Y., Mo, D.L., Wang, Q.W., Qin, L.M., He, Z.Y., Zhao, X.A., Huang, Y.K., Li, A.N., Yu, J.W., Niu, Y.N., Liu, X.H., and Chen, Y.S. (2010a). Understanding PRRSV Infection in Porcine Lung Based on Genome-Wide Transcriptome Response Identified by Deep Sequencing. Plos One 5. Xiao, S.Q., Mo, D.L., Wang, Q.W., Jia, J.Y., Qin, L.M., Yu, X.C., Niu, Y.N., Zhao, X.A., Liu, X.H., and Chen, Y.S. (2010b). Aberrant host immune response induced by highly virulent PRRSV identified by digital gene expression tag profiling. Bmc Genomics 11. Yang, H., and Churchill, G. (2007). Estimating p-values in small microarray experiments. Bioinformatics 23, 38-43. Yang, Y.H., Duboit, S., Luu, P., Lin, D.M., Peng, V., Ngai, J., and Speed, T.P. (2002). Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res 30, e15. Zhou, P., Zhai, S., Zhou, X., Lin, P., Jiang, T., Hu, X., Jiang, Y., Wu, B., Zhang, Q., Xu, X., Li, J., and Liu, B. (2011). Molecular Characterization of Transcriptome-wide Interactions between Highly Pathogenic Porcine Reproductive and Respiratory Syndrome Virus and Porcine Alveolar Macrophages in vivo. Int J Biol Sci 7, 947-959. 62    CHAPTER FOUR 63    1. Conclusion 1.1. Goals and contribution of this study Porcine Reproductive and Respiratory Syndrome (PRRS) has been affecting commercial populations of pigs in the US for more than 20 years (Done et al., 1996). Different studies on the economic impact of this disease in the pork industry reported million annual losses of approximately $ 560 in 2005 (Neumann et al., 2005) and $ 664 in 2011 (Holtkamp et al., 2012). In this regard, the PRRS Host Genetic Consortium (PHGC) was created with the objective of addressing the genetic control of the response to PRRS virus (PRRSV) infection (Lunney et al., 2011). In this thesis, the overall goal was to evaluate whole-genome expression profile of PRRSV-infected pigs to characterize their response to infection over the time. A specific goal was to use these data to inform the design of future time-course related experiments. In this regard, we evaluated whole-genome expression profile of PRRSV experimentally infected pigs from an infection trial of the PHGC. These pigs were experimentally infected with PRRSV. Genetic correlations between growth rate and disease traits (Doeschl-Wilson et al., 2009), as well as between growth rate and immune related traits (Clapperton et al., 2009) have been reported. In addition, associations of immune traits with growth rate have also been reported (Galina-Pantoja et al., 2006;Boddicker et al., 2012).Therefore we defined four phenotypic groups based on the pigs’ weight gain and viral load, and subsequently allocated the pigs into one of the four groups. We used the 20k 70-mer oligonucleotide Pigoligoarray to assess global differential gene expression in these pigs as a response to PRRSV infection. We tested differential gene expression at seven different days post infection (DPI), from 0 DPI to 42 DPI. To address differential gene expression using two colors microarrays several different designs have been assessed (Kerr and Churchill, 2001;Dobbin and Simon, 2002;Rosa et al., 64    2005;Tempelman, 2005). A linear mixed model approach most often underlies the analysis of data from these experiments (Wolfinger et al., 2001). Moreover, hypothesis testing in a mixed model analysis of microarray data could be performed with different statistical tests (Wolfinger et al., 2001;Cui and Churchill, 2003;Cui et al., 2005). In chapter one, we used plasmode datasets to select an optimal analysis framework. Specifically, we estimated Type I error rate of the alternative tests. Additionally, we assessed power of tests that controlled Type I error rate close to the nominal significance level. Among all possible combinations of different testing procedures, we identified three that control the Type I error rate close to the nominal level: tests based on a moderated F statistic and significance thresholds obtained by permutation, tests based on a classic F statistic and tabulated significance thresholds, and tests based on a classic F statistic and significance thresholds obtained by permutation. We showed that the random array effect model, as opposed to the fixed array effect model, increased power of all tests evaluated. Furthermore, we identified the most powerful procedure corresponding to tests based on a moderated F statistic and significance thresholds obtained by permutation. Therefore, all further analyses on microarray data were performed using a linear mixed model that considered array and biological sample as random effects and the hypothesis testing was performed with a moderated F statistic and significance thresholds obtained by permutation. In chapter two, we addressed global differential gene expression using a blocked reference design that allowed the modeling of individual biological variation in gene expression as well as assessing baseline effects before infection (at 0 DPI). We observed evidence of differential gene expression that could not be attributed to the experimental infection and therefore such effect was accounted for when evaluating differential gene expression at later 65    DPI. Interestingly, the use of a baseline differential expression correction has never been reported before. In addition to accounting for the baseline effect, modeling the individual biological variation in gene expression is particularly relevant in our case, since the longer term objective is to identify genes for prediction of resistance/susceptibility of individual pigs to PRRSV infection. We focused on characterizing the differential expression of genes with a significant interaction of weight gain and viral load at early DPI. From a disease management perspective, being able to identify these genes would facilitate selection decisions. By considering early DPI, we also assured that the effect being evaluated was exclusively due to an initial infection stage and not to a rebound of the disease (Boddicker et al., 2012). Subsequently, we identified relevant molecular gene networks and associated biological functions. Cell death function was significantly associated with several gene networks enriched for differentially expressed (DE) genes. We identified MERTK, GZMA and PYCARD as DE in these networks. All these genes followed and overall pattern of under expression in high viral load compared to low viral load pigs. Therefore, viral replication might be increased in high viral load pigs since the defense mechanism mediated by cell death was reduced in efficiency. Finally, we used this study as pilot data to inform the design of future time-course transcription profiling experiments. We evaluated different scenarios for sample sizes an sampling time-points, based on the objective of controlling costs while maximizing power. Therefore, the combinations of sample sizes and sampling time-points were chosen to produce a fixed number (240) of microarray slides. We concluded that the best scenario consists of sampling at 4 and 7 DPI using 30 pigs per phenotypic group. A minimum of 20 pigs per group 66    are needed for controlling Type I and Type II error rates to acceptable levels in most comparisons. 1.2. Future research directions We acknowledge sample size and accurate modeling of variance components are key factors in every experiment. Particularly, in this experiment we analyzed the data with a powerful statistical framework. In a future experiment, sample size will be enlarged to allocate 27 pigs per phenotypic group. Given the flexibility of the microarray design we are using, data from the 27 pigs can be easily merged with data from the 3 pigs per phenotypic group we have already analyzed to reach 30 pigs per phenotypic group. We showed that 30 pigs per phenotypic group is an adequate sample size to estimate differential gene expression in this type of experiment. Furthermore, we expect that a next experiment will have enough power to uncover new genes that are differentially expressed among phenotypic groups. This in turn, will translate into identifying new molecular gene networks and biological functions that are affecting the response to PRRSV infection in commercial pigs. The whole-blood genome profiling approach of this study is useful to identify genes whose expression is associated with the viremia and weight loss in pigs due to PRRSV infection. In the past there has been skepticism about using whole blood for differential expression studies as opposed to look at differential expression in specific cell types. We have shown that this approach is valuable and uncovers genes and networks previously reported when looking at differential expression in tissues that are target of the disease. The result is important because it encourages development of diagnostic tools using transcriptional profiling of whole blood. In the longer term, the diagnostic tools developed could be incorporated for breeding pigs with enhanced PRRS resistance. Ultimately, the results from 67    this thesis will allow the design of more accurate profiling experiments that will translate into a helpful tool for making decisions for improved PRRSV resistance. 68    REFERENCES 69    References Boddicker, N., Waide, E.H., Rowland, R.R.R., Lunney, J.K., Garrick, D.J., Reecy, J.M., and Dekkers, J.C.M. (2012). Evidence for a major QTL associated with host response to Porcine Reproductive and Respiratory Syndrome Virus challenge. Journal of Animal Science 90, 1733-1746. Clapperton, M., Diack, A.B., Matika, O., Glass, E.J., Gladney, C.D., Mellencamp, M.A., Hoste, A., and Bishop, S.C. (2009). Traits associated with innate and adaptive immunity in pigs: heritability and associations with performance under different health status conditions. Genetics Selection Evolution 41. Cui, X.G., Hwang, J.T.G., Qiu, J., Blades, N.J., and Churchill, G.A. (2005). Improved statistical tests for differential gene expression by shrinking variance components estimates. Biostatistics 6, 59-75. Cui, X.Q., and Churchill, G.A. (2003). Statistical tests for differential expression in cDNA microarray experiments. Genome Biology 4. Dobbin, K., and Simon, R. (2002). Comparison of microarray designs for class comparison and class discovery. Bioinformatics 18, 1438-1445. Doeschl-Wilson, A.B., Kyriazakis, I., Vincent, A., Rothschild, M.F., Thacker, E., and GalinaPantoja, L. (2009). Clinical and pathological responses of pigs from two genetically diverse commercial lines to porcine reproductive and respiratory syndrome virus infection. Journal of Animal Science 87, 1638-1647. Done, S.H., Paton, D.J., and White, M.E.C. (1996). Porcine reproductive and respiratory syndrome (PRRS): A review, with emphasis on pathological, virological and diagnostic aspects. British Veterinary Journal 152, 153-174. Galina-Pantoja, L., Mellencamp, M.A., Bastiaansen, J., Cabrera, R., Solano-Aguilar, G., and Lunney, J.K. (2006). Relationship between immune cell phenotypes and pig growth in a commercial farm. Animal Biotechnology 17, 81-98. Holtkamp, D.J., Kliebenstein, J.B., Neumann, E.J., Zimmerman, J.J., Rotto, H., Yoder, T.K., Wang, C., Yeske, P., Mowrer, C., and Haley, C. (2012). Assessment of the economic impact of porcine reproductive and respiratory syndrome virus on United States pork producers. J. Swine Health Prod. (Manuscript in press). 70    Kerr, K.M., and Churchill, G.A. (2001). Experimental design for gene expression microarrays. Biostatistics 2, 183-201. Lunney, J.K., Steibel, J., Reecy, J.M., Fritz, E., Rothschild, M.F., Kerrigan, M., Trible, B., and Rowland, R.R.R. (2011). Probing genetic control of swine responses to PRRSV infection: current progress of the PRRS host genetics consortium. BMC Proceedings 5, S30. Neumann, E.J., Kliebenstein, J.B., Johnson, C.D., Mabry, J.W., Bush, E.J., Seitzinger, A.H., Green, A.L., and Zimmerman, J.J. (2005). Assessment of the economic impact of porcine reproductive and respiratory syndrome on swine production in the United States. J Am Vet Med Assoc 227, 385-392. Rosa, G.J.M., Steibel, J.P., and Tempelman, R.J. (2005). Reassessing design and analysis of twocolour microarray experiments using mixed effects models. Comparative and Functional Genomics 6, 123-131. Tempelman, R.J. (2005). Assessing statistical precision, power, and robustness of alternative experimental designs for two color microarray platforms based on mixed effects models. Veterinary Immunology and Immunopathology 105, 175-186. Wolfinger, R.D., Gibson, G., Wolfinger, E.D., Bennett, L., Hamadeh, H., Bushel, P., Afshari, C., and Paules, R.S. (2001). Assessing gene significance from cDNA microarray expression data via mixed models. Journal of Computational Biology 8, 625-637. 71    APPENDICES 72    1. Appendix A 1.1. Figure Legends Figure 2.1. Distribution of P-values from BLN and lung datasets. The P-values resulted from testing for differential expression with a moderated F statistic and significance thresholds obtained by permutation within a mixed effects model. A) histogram of P-values from BLN dataset, B) histogram of P-values from lung dataset, C) Uniform Q-Q plot of p-values for BLN dataset, and D) Uniform Q-Q plot of P-values for lung dataset. Red x=y line in C and D represents the expected p-values quantiles under the null distribution. Figure 2.2. Realized versus nominal Type I error rate in plasmode datasets. A) Mixed effects model, and B) Fixed effects model. Each line in the plots corresponds to P-values computed for a specific combination of test statistic and significance thresholds. Blue lines correspond to the classic F statistic and green lines to the moderated F statistic. Solid lines correspond to tabulated significance thresholds and dashed lines to significance thresholds obtained by permutation. Red x = y line represents values for which the realized Type I error rate equals the nominal Type I error rate. P-values from the classic F statistic and significance thresholds obtained by permutation lay under P-values from the moderated F statistic and significance thresholds obtained by permutation. Figure 2.3. Uniform Q-Q plot of p-values in plasmode datasets. A) Negative log transformed P-values from a moderated F statistic and permuted significance thresholds obtained under a mixed effects model, B) Negative log transformed P-values from a moderated F statistic and permuted significance thresholds obtained under a fixed effects model, C) Negative log transformed p-values from a classic F statistic and tabulated significance thresholds obtained 73    under a mixed effects model, D) Negative log transformed P-values for a classic F statistic and tabulated significance thresholds obtained under a fixed effects model. Each black line corresponds to the distribution in one of the 34 plasmode datasets. The green line corresponds to the overall distribution in all 34 plasmodes. The red x = y line corresponds to the expected negative log transformed quantiles under the null distribution. Figure 2.4. Proportion of rejected hypotheses in lung dataset. A) Mixed effects model, and B) Fixed effects model. Each line in the plots corresponds to P-values computed for a specific combination of test statistic and significance thresholds. Blue lines correspond to the classic F statistic and green lines to the moderated F statistic. Solid lines correspond to tabulated significance thresholds and dashed lines to significance thresholds obtained by permutation. Red x = y line represents values for which the proportion of rejected hypotheses equals the nominal level. Figure 2.5. Uniform Q-Q plot in lung dataset for tests that controlled the Type I error rate. Each line in the plots corresponds to negative log transformed P-values computed for a specific combination of test statistic, significance thresholds, and linear model. Blue lines correspond to a fixed effects model. Green lines correspond to a mixed effects model. Solid lines correspond to the classic F statistic and tabulated significance thresholds. Dashed lines correspond to the moderated F statistics and significance thresholds obtained by permutation. Red x = y line corresponds to the expected negative log transformed quantiles under the null distribution 74    Figure 2.1. Distribution of P-values from BLN and lung datasets. For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this thesis. 75    Figure 2.2. Realized versus nominal Type I error rate in plasmode datasets. 76    Figure 2.3. Uniform Q-Q plot of P-values in plasmode datasets. 77    Figure 2.4. Proportion of rejected hypotheses in lung dataset 78    Figure 2.5. Uniform Q-Q plot in lung dataset for tests that controlled the Type I error rate 79    2. Appendix B 2.1. Figure Legends Figure 1. Scatterplot of weight gain versus viral load for all pigs in PHGC trial one. Each dot represents a pig. Color shadings indicate the four different phenotpypic groups (HvHg, HvLg, LvHg, and LvLg). Dark color indicates pigs that were classified into one of the groups. Light color indicates pigs that were not classified because they lay in the boundary of the groups. Circles indicate pigs that were selected for transcriptional profiling in this experiment. Figure 2. Histogram of P-values for the four contrasts of interest at 0 DPI. For each contrast of interest (HvHg vs. LvHg, HvLg vs. LvLg, HvHg vs. HvLg, and LvHg vs. LvLg), this figure shows the distribution of P-values at 0 DPI. For a condition of no differential expression the histograms should have a flat trend. Figure 3. Uniform Q-Q plot for gene expression of all contrasts across 4 to 42 DPI after correcting for 0 DPI estimated effect. This plot represents the quantiles of the empirical distribution of P-values versus the expected quantiles of uniformly distributed P-values (corresponding to the null hypothesis). The represented departure from the straight line y=x indicates an excess of small P-values as compared to the expectation under the null hypothesis, consistent with the alternative hypothesis of differential expression. 80    Figure 3.1. Scatterplot of weight gain versus viral load for all pigs in PHGC trial one 81    Figure 3.2. Histogram of P-values for the four contrasts of interest at 0 DPI 82    Figure 3.3. Uniform Q-Q plot for gene expression of all contrasts across 4 to 42 DPI after correcting for 0 DPI estimated effect 83    Supplementary Table 1. Primers and probes used to amplify genes evaluated with qPCR. Ensembl Accesion Number Primer Foward Primer Reverse Probe EPS15 ENSSSCT 000000042 84 AGCGAGGTTC AGGATCTTCA AG GGGCCTTCTGCTC ATCCA CCCAGAAACAGCAA GTACAGGAACTCCTT G EZR ENSSSCT 000000044 87 GCAGCGGCAG CTGATGA GTCGTTGTGGGTC CTCTTGTTC ACCAACGAGCTGTC CCAGGCCA GRLF1 ENSSSCT 000000034 46 CCCATACAAC ATGCAGATGG AT ACCTCCTTTAGGG CATGTAGCTT TGGAGGCACACAAA ATCAACGACCG GZMA ENSSSCG 000000169 02 GGAGCTCACT CGATAACCAA GAAA GCTTTAGAAGTTT AAGGTCACCCTCA T TCCTTATCCATGCTT TGACCAGGACACAC IFNA1 GQ415055 (GenBank) TCAGCTGCAA TGCCATCTG AGGGAGAGATTCT TGACCTGCCTCAGAC CCTCATTTGTG CCACAGCC IL1A ENSSSCG 000000080 90 CTGAAGAAGA GACGGTTGAG TTTAAA AAGTTGTATTTCA TGTTGCTCTGGAA CAGAAGAAGAAATC ATCAAGCCCAGATC AGC IL8 ENSSSCG 000000089 53 CCGTGTCAAC ATGACTTCCA A GCCTCACAGAGAG CTGCAGAA CTGTTGCCTTCTTGG CAGTTTTCCTGC IRF1 ENSSSCG 000000142 77 AATCCAGCCC TGATACCTTCT CT GGCCTGTTCAATG TCCAAGTC TGCCTGATGACCAC AGCAGCTACACA ITGB7 ENSSSCG 000000002 57 TCGCAGCCCA GAGTTTGACT GGCACTGGTGACA GCAAAGA TCAGGTAGCCCAGG CCCTCTCTGC JARID2 ENSSSCT 000000011 55 CCTTTCTCTGC CTTCGAGGTT CGTCCTGAGAGCT TCCGAAAT TCCTGCGCTGCCCAA CAGCA MERTK ENSG0000 0153208 GGAAAGATGG GAAGGAATTG C TCATCTTACAGAT ATATGACCCATTG TCT TTCAGCATAACCAGT GTGCAGCGTTCA Gene 84    Supplementary Table 1 (cont’d) PYCARD ENSG00 0001034 90 CAAACCAGCA CTGCACTTCGT CAGCCCGTCCACG TCTGT RASGRP 1 ENSSSC G000000 04791 GGAGAATAAA GAATCCCTCA TAAAATCA TTATTTCCTGTTCC CTCCGTCACCTCAGA AGCTCTTGGT CTCCCCACC RPL32 ENSSSC G000000 27637 GGAAAGATGG GAAGGAATTG C TCATCTTACAGAT ATATGACCCATTG TCT TTCAGCATAACCAGT GTGCAGCGTTCA SLADQA1 ENSSSC G000000 01456 GGTTCCTGAG GTGACTGTGT TT GACAGAGTGCCCG TTCTTCAA CTGGGTCAGCCCAA CACCCTCAT SLA-DRA ENSSSC G000000 01453 CCCGCCAGTG GTCAATGT AGTGGAACTTGCG GAAAAGG AGGAGTGTCAGAGA CAGTCTTCCTGCCC 85    CGGGCAGCCCTCAT CTCAAGGG