ANALYSIS OF HOST TRANSCRIPTOME RESPONSE TO PORCINE REPRODUCTIVE
AND RESPIRATORY SYNDROME VIRUS INFECTION
By
Maria Eugenia Arceo

A THESIS
Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of
MASTER OF SCIENCE
Animal Science
2012 
 

ABSTRACT
ANALYSIS OF HOST TRANSCRIPTOME RESPONSE TO PORCINE REPRODUCTIVE
AND RESPIRATORY SYNDROME VIRUS INFECTION
By
Maria Eugenia Arceo
Porcine Reproductive and Respiratory Syndrome (PRRS) has been affecting commercial
populations of pigs in the US for more than 20 years. We evaluated differences in gene
expression in pigs from the PRRS Host Genetics Consortium initiative showing a range of
responses to PRRS virus infection. Pigs were allocated into four phenotypic groups according to
their serum viral level and weight gain. We obtained RNA at several days post-infection and
hybridized it to the 20K 70 mer-oligonucleotide Pigoligoarray. We initially used plasmode
datasets to select an optimal procedure for analyzing these data. We showed that the random
array effects model with the moderated F statistic and significance thresholds obtained by
permutation provided the most powerful analysis procedure. We then addressed global
differential gene expression between phenotypic groups. We identified cell death as a biological
function significantly associated with several gene networks enriched for differentially expressed
genes. We found the genes interferon-alpha 1, major histocompatibility complex, class II, DR
alpha, and major histocompatibility complex, class II, DQ alpha 1 differentially expressed
between phenotypic groups. Finally, we used this study as pilot data to inform the design of
future time-course transcriptional profiling experiments. We concluded the best scenario for
investigation of early response to PRRSV infection consists of sampling at 4 and 7 days post
infection using approximately 30 pigs per phenotypic group.

 

To my family and those friends who became my extended family, their support and love have
made it possible to accomplish this thesis

iii 
 

AKNOWLEDGEMENTS
I owe a debt of gratitude to many people for having believed in me throughout this
journey.
First, I would like to acknowledge Dr. Cantet and Dr. Miquel for introducing me to
quantitative genetics/genomics and encouraging me to pursue a career in that field. Because they
believed in me, I am here today.
Second, I would like to thank my advisor Dr. Juan Steibel. I would never have
accomplished this thesis without his constant encouragement, guidance, trust and friendly
support.
I would like to express my gratitude to the members of my committee, Drs. Tempelman,
Lunney, Ernst, and Steibel for their useful suggestions throughout my graduate program.
Moreover, I would like to thank Drs. Steibel, Ernst and Lunney for their ideas to develop this
project and their contribution of knowledge during the experimental design and statistical
analysis. I would especially like to thank Dr. Tempelman for his academic advice which guided
me and that I feel will have an impact in my long term career. Finally, I deeply appreciate the
moral support Dr. Lunney kindly offered me.
I am particularly grateful for the help and contributions to my knowledge provided by my
fellow graduate students in statistical genetics. I would especially like to acknowledge Yvonne
Badke, for her assistance with computational resources.
I would like to express my deepest appreciation to my friends, the ones I met here and my
lifelong friends. They provided me with countless hours of unconditional support. Thank you
Belen, Irene, Yvonne, Leandro.

iv 
 

Finally, I would like to thank my family for their love and encouragement. Mom, Dad,
Mercedes, Tomas, Maggie and Pepe: you can only imagine how much your support meant to me.

v 
 

 

TABLE OF CONTENTS

LIST OF TABLES ....................................................................................................................... viii
LIST OF FIGURES ...................................................................................................................... ix
CHAPTER ONE ..............................................................................................................................1
1. Introduction ..........................................................................................................................2
References ............................................................................................................................5
CHAPTER TWO .............................................................................................................................9
I. Assessing optimal analysis models and statistical tests in a two color microarray experiment:
a case study involving Porcine Reproductive and Respiratory Syndrome Virus infected pigs
using plasmode datasets ...........................................................................................................10
Abstract ....................................................................................................................................10
Keywords .................................................................................................................................10
1. Introduction ........................................................................................................................11
2. . Materials and Methods .......................................................................................................14
2.1. Population, phenotypic groups and microarray design ...............................................14
2.2. Plasmode construction ................................................................................................14
2.3. Linear Model Analysis................................................................................................15
2.4. Test Statistics ..............................................................................................................16
2.5. Estimation of critical values........................................................................................16
2.6. Assessment of performance under the null hypothesis ...............................................17
2.7. Assessment of performance in a dataset with DE transcripts .....................................17
3. Results ................................................................................................................................18
3.1. Overall evidence of differential expression in lung and BLN microarray datasets ....18
3.2. Estimation of Type I error using plasmode datasets ...................................................19
3.3. Assessment of differential expression in lung dataset ................................................20
4. Discussion ..........................................................................................................................22
References ..........................................................................................................................26
CHAPTER THREE .......................................................................................................................29
II. Characterizing differential individual response to Porcine Reproductive and Respiratory
Syndrome Virus infection through statistical and functional analysis of gene expression......30
Abstract ....................................................................................................................................31
Keywords .................................................................................................................................32
1. Introduction ........................................................................................................................33
vi 
 

2. Materials and methods .......................................................................................................35
2.1. Animal model and study design ..................................................................................35
2.2. Phenotypic groups.......................................................................................................36
2.3. Microarray design and analysis ..................................................................................36
2.4. Pathway analysis .........................................................................................................38
2.5. qPCR design and analysis ...........................................................................................39
2.6. Statistical power and sample size computation ..........................................................40
3. Results ................................................................................................................................40
3.1. Microarray analysis .....................................................................................................40
3.1.1. Evidence of differential gene expression at 0 DPI.............................................41
3.1.2. Evidence of differential gene expression for remaining DPI.............................41
3.1.3. Weight gain and viral load interaction effect on gene expression .....................42
3.1.4. Weight gain and viral load effect on gene expression .......................................43
3.2. Pathway analysis .........................................................................................................44
3.2.1. HvHg vs. LvHg ..................................................................................................44
3.2.2. HvLg vs. LvLg ...................................................................................................45
3.2.3. HvHg vs. HvLg ..................................................................................................45
3.2.4. LvHg vs. LvLg ...................................................................................................46
3.3. qPCR analysis .............................................................................................................46
3.3.1. Verification of microarray findings ...................................................................46
3.3.2. Additional genes ................................................................................................47
3.4. Microarray statistical power and sample size estimation ...........................................48
4. Discussion ..........................................................................................................................50
Conflict of Interest Statement ............................................................................................56
Acknowledgements ............................................................................................................56
References ..........................................................................................................................57
CHAPTER FOUR ..........................................................................................................................63
1. Conclusion ..........................................................................................................................64
1.1. Goals and contribution of this study ...........................................................................64
1.2. Future Research Directions .........................................................................................67
References ..........................................................................................................................69
APPENDICES ...............................................................................................................................72
1. Appendix A ........................................................................................................................73
1.1. Figure Legends............................................................................................................73
2. Appendix B.........................................................................................................................80
2.1. Figure Legends............................................................................................................80

vii 
 

 

LIST OF TABLES
Table 2.1. Number of differentially expressed genes in a fixed versus mixed effects
model at different FDR thresholds .....................................................................................22
Table 3.1. Number of putatively differentially expressed transcripts ................................43
Table 3.2. Test of immune gene expression for genes not present in the microarray
platform ..............................................................................................................................48
Table 3.3. Expected discovery rate (EDR) comparing 20 and 30 biological replicates ....50
Supplementary Table 1 ......................................................................................................84

viii 
 

 

LIST OF FIGURES
Figure 2.1 Distribution of P-values from BLN and Lung datasets ....................................75
Figure 2.2. Realized versus nominal Type I error rate in plasmode datasets ....................76
Figure 2.3. Uniform Q-Q plot of P-values in plasmode datasets .......................................77
Figure 2.4. Proportion of rejected hypotheses in lung dataset ...........................................78
Figure 2.5. Uniform Q-Q plot in lung dataset for tests for tests that controlled the Type I
error rate .............................................................................................................................79
Figure 3.1. Scatterplot of weight gain versus viral load for all pigs in PHGC trial one ....81
Figure 3.2. Histogram of P-values for the four contrasts of interest at 0 DPI ...................82
Figure 3.3. Uniform Q-Q plot for gene expression of all contrasts across 4 to 42 DPI after
correcting for 0 DPI estimated effect .................................................................................83

ix 
 

CHAPTER ONE

1 
 

1. Introduction
Porcine Reproductive and Respiratory Syndrome (PRRS) was initially described in the
US over 20 years ago (Done et al., 1996). The disease causes $ 664 million annual losses to the
US pork industry (Holtkamp et al., 2012). Viral replication takes place in the host’s immune cells
(Rowland et al., 2003;Genini et al., 2008). Therefore, a possible way of controlling the economic
impact of PRRS is addressing the host genetic component. In this context, phenotypic variation
between breed-lines has been observed in disease-related and production traits of PRRS virus
experimentally infected pigs (Petry et al., 2005;Vincent et al., 2006;Doeschl-Wilson et al., 2009).
Moreover, the availability of whole genome microarrays (Steibel et al., 2009) and next
generation sequencing (Mardis, 2008) have favored whole genome expression profiling of
PRRSV infected animals, and global differential expression has been assessed in pigs showing
phenotypic variation to PRRSV infection responses (Lee et al., 2004a;Lee et al., 2004b;Miller
and Fox, 2004;Bates et al., 2008;Genini et al., 2008;Xiao et al., 2010a;Xiao et al., 2010b;Ait-Ali
et al., 2011;Zhou et al., 2011;Wysocki et al., 2012). While most previous studies focused on
comparing gene expression of PRRS virus in infected versus uninfected pigs, or gene expression
between animals showing differences in post-infection viral titers, little is known of the
interaction between viral load and weight gain as it relates to gene expression post-infection.
This is particularly important given the reported associations of immune traits with growth rate
(Galina-Pantoja et al., 2006;Boddicker et al., 2012) and the genetic correlations between growth
rate and disease traits (Doeschl-Wilson et al., 2009) as well as between growth rate and immune
related traits (Clapperton et al., 2009).
To address differential gene expression using microarrays, different designs have been
suggested for two color microarray experiments. In general, these experiments could include a

2 
 

reference sample (reference designs) or not (loop designs) (Kerr and Churchill, 2001;Dobbin and
Simon, 2002;Rosa et al., 2005;Tempelman, 2005). Among reference designs, blocked reference
designs have been shown to be more efficient than common reference designs and to allow
meaningful hypotheses testing using the reference samples (Steibel and Rosa, 2005). A two steps
(Wolfinger et al., 2001) linear mixed model approach most often underlies the analysis of these
experiments (Rosa et al., 2005). Initially, a general model that accounts for technical variation,
such as array and dye effects, is fitted to the data. Then, the residuals estimated from this model
are used to analyze each transcript specific expression, accounting for all appropriate sources of
variation (Wolfinger et al., 2001). Consequently, this is a flexible scheme that enables the fitting
of linear fixed and mixed models.
To test different hypotheses in microarray experiments several statistical methods have
been proposed (Cui and Churchill, 2003;Wolfinger et al., 2001). Among the most commonly
used is the classical F statistic or a modification of the F statistic (Cui et al., 2005). These two
methods differ on the gene specific estimated variance components they incorporate into the
tests. The modified F statistic uses shrunken (towards a common value) estimated variance
components (Cui et al., 2005). Shrinking variance components has been reported to enhance
power for detecting differential expression (Cui et al., 2005), since microarray experiments
usually involve a small number of biological samples use to test differential expression of a large
number of genes. However, moderated test statistics have non-standard distributions. To assess
significance of these tests, P-values cannot be obtained by comparison with a reference null
distribution. In this context, permutation analyses (Anderson and Ter Braak, 2003) have been
implemented to obtain the null distribution of moderated test statistics (Yang and Churchill,

3 
 

2007). Permutation analyses imply shuffling the data to simulate a null distribution and thereby
can be very computationally intensive (Anderson, 2001).
The goals of this study were to evaluate the response to infection in in-vivo PRRS virus
infected pigs in a time-course basis. In particular, we wanted to assess whole-genome gene
expression in blood of infected pigs at different times post-infection. Furthermore, we wanted to
identify significant biological functions and gene sets that characterize the response to PRRS
virus infection over time.
To attain for these goals, the first chapter of this thesis assesses the statistical framework
needed for more accurate and powerful evaluation of microarray data differential expression,
including model assessment and tests for hypothesis testing. Real data is used to create plasmode
datasets for evaluation of two different proposed linear models and tests statistics (Gadbury et
al., 2008;Vaughan et al., 2009).
The second chapter of this thesis addresses global differential gene expression in pigs
showing variation in their phenotypic response to PRRS virus experimental infection during a
time-course experiment. Therefore, we evaluated a microarray whole-genome expression profile
of pigs assigned to four reaction groups (phenotypic groups) according to the pigs’ weight gain
and blood viral load, as part of the PRRS Host Genetics Consortium (Lunney et al., 2011).
Phenotypic groups were specified as a combination of these two traits, as follows: 1) high viral
load-high weight gain (HvHg), 2) high viral load-low weight gain (HvLg), 3) low viral load-high
weight gain (LvHg) and 4) low viral load-low weight gain (LvLg). Individuals selected had the
most extreme (highest and/or lowest) observed values for both variables defining the phenotypic
groups.

4 
 

REFERENCES

5 
 

References
Ait-Ali, T., Wilson, A.D., Carre, W., Westcott, D.G., Frossard, J.P., Mellencamp, M.A.,
Mouzaki, D., Matika, O., Waddington, D., Drew, T.W., Bishop, S.C., and Archibald,
A.L. (2011). Host inhibits replication of European porcine reproductive and respiratory
syndrome virus in macrophages by altering differential regulation of type-I interferon
transcriptional response. Immunogenetics 63, 437-448.
Anderson, M.J. (2001). Permutation tests for univariate or multivariate analysis of variance and
regression. Canadian Journal of Fisheries and Aquatic Sciences 58, 626-639.
Anderson, M.J., and Ter Braak, C.J.F. (2003). Permutation tests for multi-factorial analysis of
variance. Journal of Statistical Computation and Simulation 73, 85-113.
Bates, J.S., Petry, D.B., Eudy, J., Bough, L., and Johnson, R.K. (2008). Differential expression in
lung and bronchial lymph node of pigs with high and low responses to infection with
porcine reproductive and respiratory syndrome virus. Journal of Animal Science 86,
3279-3289.
Boddicker, N., Waide, E.H., Rowland, R.R.R., Lunney, J.K., Garrick, D.J., Reecy, J.M., and
Dekkers, J.C.M. (2012). Evidence for a major QTL associated with host response to
Porcine Reproductive and Respiratory Syndrome Virus challenge. Journal of Animal
Science 90, 1733-1746.
Clapperton, M., Diack, A.B., Matika, O., Glass, E.J., Gladney, C.D., Mellencamp, M.A., Hoste,
A., and Bishop, S.C. (2009). Traits associated with innate and adaptive immunity in pigs:
heritability and associations with performance under different health status conditions.
Genetics Selection Evolution 41.
Cui, X.G., Hwang, J.T.G., Qiu, J., Blades, N.J., and Churchill, G.A. (2005). Improved statistical
tests for differential gene expression by shrinking variance components estimates.
Biostatistics 6, 59-75.
Cui, X.Q., and Churchill, G.A. (2003). Statistical tests for differential expression in cDNA
microarray experiments. Genome Biology 4.
Dobbin, K., and Simon, R. (2002). Comparison of microarray designs for class comparison and
class discovery. Bioinformatics 18, 1438-1445.
Doeschl-Wilson, A.B., Kyriazakis, I., Vincent, A., Rothschild, M.F., Thacker, E., and GalinaPantoja, L. (2009). Clinical and pathological responses of pigs from two genetically
diverse commercial lines to porcine reproductive and respiratory syndrome virus
infection. Journal of Animal Science 87, 1638-1647.
Done, S.H., Paton, D.J., and White, M.E.C. (1996). Porcine reproductive and respiratory
syndrome (PRRS): A review, with emphasis on pathological, virological and diagnostic
aspects. British Veterinary Journal 152, 153-174.

6 
 

Gadbury, G.L., Xiang, Q.F., Yang, L., Barnes, S., Page, G.P., and Allison, D.B. (2008).
Evaluating Statistical Methods Using Plasmode Data Sets in the Age of Massive Public
Databases: An Illustration Using False Discovery Rates. Plos Genetics 4.
Galina-Pantoja, L., Mellencamp, M.A., Bastiaansen, J., Cabrera, R., Solano-Aguilar, G., and
Lunney, J.K. (2006). Relationship between immune cell phenotypes and pig growth in a
commercial farm. Animal Biotechnology 17, 81-98.
Genini, S., Delputte, P.L., Malinverni, R., Cecere, M., Stella, A., Nauwynck, H.J., and Giuffra,
E. (2008). Genome-wide transcriptional response of primary alveolar macrophages
following infection with porcine reproductive and respiratory syndrome virus. Journal of
General Virology 89, 2550-2564.
Holtkamp, D.J., Kliebenstein, J.B., Neumann, E.J., Zimmerman, J.J., Rotto, H., Yoder, T.K.,
Wang, C., Yeske, P., Mowrer, C., and Haley, C. (2012). Assessment of the economic
impact of porcine reproductive and respiratory syndrome virus on United States pork
producers. J. Swine Health Prod. (Manuscript in press).
Kerr, K.M., and Churchill, G.A. (2001). Experimental design for gene expression microarrays.
Biostatistics 2, 183-201.
Lee, C., Bachand, A., Murtaugh, M.P., and Yoo, D. (2004a). Differential host cell gene
expression regulated by the porcine reproductive and respiratory syndrome virus GP4 and
GP5 glycoproteins. Veterinary Immunology and Immunopathology 102, 189-198.
Lee, C., Rogan, D., Erickson, L., Zhang, J., and Yoo, D. (2004b). Characterization of the porcine
reproductive and respiratory syndrome virus glycoprotein 5 (GP5) in stably expressing
cells. Virus Research 104, 33-38.
Lunney, J.K., Steibel, J., Reecy, J.M., Fritz, E., Rothschild, M.F., Kerrigan, M., Trible, B., and
Rowland, R.R.R. (2011). Probing genetic control of swine responses to PRRSV infection:
current progress of the PRRS host genetics consortium. BMC Proceedings 5, S30.
Mardis, E.R. (2008). The impact of next-generation sequencing technology on genetics. Trends
in Genetics 24, 133-141.
Miller, L.C., and Fox, J.M. (2004). Apoptosis and porcine reproductive and respiratory syndrome
virus. Vet Immunol Immunopathol 102, 131-142.
Petry, D.B., Holl, J.W., Weber, J.S., Doster, A.R., Osorio, F.A., and Johnson, R.K. (2005).
Biological responses to porcine respiratory and reproductive syndrome virus in pigs of
two genetic populations. Journal of Animal Science 83, 1494-1502.
Rosa, G.J.M., Steibel, J.P., and Tempelman, R.J. (2005). Reassessing design and analysis of twocolour microarray experiments using mixed effects models. Comparative and Functional
Genomics 6, 123-131.
Rowland, R.R.R., Lawson, S., Rossow, K., and Benfield, D.A. (2003). Lymphoid tissue tropism
of porcine reproductive and respiratory syndrome virus replication during persistent
7 
 

infection of pigs originally exposed to virus in utero. Veterinary Microbiology 96, 219235.
Steibel, J.P., and Rosa, G.J.M. (2005). On reference designs for microarray experiments.
Statistical Applications in Genetics and Molecular Biology 4.
Steibel, J.P., Wysocki, M., Lunney, J.K., Ramos, A.M., Hu, Z.L., Rothschild, M.F., and Ernst,
C.W. (2009). Assessment of the swine protein-annotated oligonucleotide microarray.
Animal Genetics 40, 883-893.
Tempelman, R.J. (2005). Assessing statistical precision, power, and robustness of alternative
experimental designs for two color microarray platforms based on mixed effects models.
Veterinary Immunology and Immunopathology 105, 175-186.
Vaughan, L.K., Divers, J., Padilla, M.A., Redden, D.T., Tiwari, H.K., Pomp, D., and Allison,
D.B. (2009). The use of plasmodes as a supplement to simulations: A simple example
evaluating individual admixture estimation methodologies. Computational Statistics &
Data Analysis 53, 1755-1766.
Vincent, A.L., Thacker, B.J., Halbur, P.G., Rothschild, M.F., and Thacker, E.L. (2006). An
investigation of susceptibility to porcine reproductive and respiratory syndrome virus
between two genetically diverse commercial lines of pigs. Journal of Animal Science 84,
49-57.
Wolfinger, R.D., Gibson, G., Wolfinger, E.D., Bennett, L., Hamadeh, H., Bushel, P., Afshari, C.,
and Paules, R.S. (2001). Assessing gene significance from cDNA microarray expression
data via mixed models. Journal of Computational Biology 8, 625-637.
Wysocki, M., Chen, H., Steibel, J.P., Kuhar, D., Petry, D., Bates, J., Johnson, R., Ernst, C.W.,
and Lunney, J.K. (2012). Identifying putative candidate genes and pathways involved in
immune responses to porcine reproductive and respiratory syndrome virus (PRRSV)
infection. Animal Genetics 43, 328-332.
Xiao, S.Q., Jia, J.Y., Mo, D.L., Wang, Q.W., Qin, L.M., He, Z.Y., Zhao, X.A., Huang, Y.K., Li,
A.N., Yu, J.W., Niu, Y.N., Liu, X.H., and Chen, Y.S. (2010a). Understanding PRRSV
Infection in Porcine Lung Based on Genome-Wide Transcriptome Response Identified by
Deep Sequencing. Plos One 5.
Xiao, S.Q., Mo, D.L., Wang, Q.W., Jia, J.Y., Qin, L.M., Yu, X.C., Niu, Y.N., Zhao, X.A., Liu,
X.H., and Chen, Y.S. (2010b). Aberrant host immune response induced by highly
virulent PRRSV identified by digital gene expression tag profiling. Bmc Genomics 11.
Yang, H.U., and Churchill, G. (2007). Estimating p-values in small microarray experiments.
Bioinformatics 23, 38-43.
Zhou, P., Zhai, S., Zhou, X., Lin, P., Jiang, T., Hu, X., Jiang, Y., Wu, B., Zhang, Q., Xu, X., Li,
J., and Liu, B. (2011). Molecular Characterization of Transcriptome-wide Interactions
between Highly Pathogenic Porcine Reproductive and Respiratory Syndrome Virus and
Porcine Alveolar Macrophages in vivo. Int J Biol Sci 7, 947-959.
8 
 

CHAPTER TWO

9 
 

Assessing optimal analysis models and statistical tests in a two color microarray
experiment: a case study involving Porcine Reproductive and Respiratory Syndrome Virus
infected pigs using plasmode datasets.
Abstract
We addressed optimal statistical analysis frameworks for microarray data. In particular
we were interested in testing differential gene expression in pigs as a response to Porcine
Reproductive and Respiratory Syndrome Virus experimental infection. We identified an existing
dataset suitable to derive plasmodes conforming the null distribution expected under no
differential expression .We used these plasmode datasets to evaluate analysis models and test
statistics. We had eight alternative analyses from the factorial combination of two models (fixed
vs. random array effect), two test statistics (classic F vs. moderated F), and two ways of
computing significance thresholds (with tabulated F distribution vs. with permuted distribution).
Specifically, we assessed Type I error rate of the alternative analysis models. We then assessed
power of the tests that controlled the Type I error rate at or close to the nominal level.
Keywords
Porcine Reproductive and Respiratory Syndrome virus, microarray, plasmode datasets,
optimal statistical analysis

10 
 

1. Introduction
The choice of appropriate mixed model analysis for microarray data has proven to be
challenging (Rosa et al., 2005). To address differential gene expression using two-color
microarrays several different designs have been suggested. In general, these experiments may
include a reference sample, such as reference designs, or they may not, such as loop designs
(Kerr and Churchill, 2001;Dobbin and Simon, 2002;Rosa et al., 2005;Tempelman, 2005). A
linear mixed model approach most often underlies the analysis of data collected from these
experimental designs (Rosa et al., 2005). The mixed model analysis approach involves a twostep analysis of variance (Wolfinger et al., 2001). Initially, a general model that accounts for
technical variation implicit in the microarray experiment and that may affect the estimate of
expression level for the genes, such as array and dye effects, is fitted to the data. Second, the
residuals from the first model (now considered as normalized expression values) are used to
analyze the expression of each gene separately, accounting for all pertinent sources of variation
present in the experiment (Wolfinger et al., 2001).There may be multiple sources of variation in
a microarray experiment, and an important distinction has to be made between technical and
biological replication (Churchill, 2002;Rosa et al., 2005). Technical replication pertains to
multiple subsamples obtained from the same individual sample, whereas biological replication
pertains to multiple subjects being sampled (Cui and Churchill, 2003;Rosa et al., 2005). Ideally,
to evaluate different treatment conditions in any given biological population, biological
replication should be present (Churchill, 2002). Technical replication would further allow
accounting for variation introduced by the protocol implemented for processing the samples in
the laboratory (Churchill, 2002;Cui and Churchill, 2003). Linear models can account for most
sources of variation, fitting variance components to represent the natural variability at several
levels.
11 
 

A possible way of increasing the efficiency of single gene mixed model analysis is by
treating the array effect as a Gaussian distributed random effect (Kerr and Churchill, 2001;Kerr,
2003b) when analyzing log-intensity data. This approach was shown to be more efficient than
treating array as a fixed effect for several designs where more than two groups are being
compared (Steibel, 2007). To estimate variance components in a mixed model analysis, the
Restricted Maximun Likelihood (REML) algorithm is typically used.
To test hypotheses in a mixed model analysis of microarray data several statistical tests
have been proposed (Cui and Churchill, 2003). A commonly used test is based on the classical F
statistic (Wolfinger et al., 2001;Cui and Churchill, 2003). The classical F statistic is based on
transcript specific estimated variance components (Cui and Churchill, 2003). An alternative test
is a moderated F statistic that borrows information across all transcripts in the microarray and
shrinks the estimated variance components towards a common value (Cui et al., 2005).
Microarray experiments usually involve a small number of biological samples available to test
differential expression of a large number of genes. When samples sizes are small variance
components are estimated with less reliability (Cui and Churchill, 2003) which may lead to many
false positive decisions (rejecting a null hypothesis of no differential expression when it is
actually true). A way to avoid this is by shrinking estimated variance components towards a
common value. Shrinkage of variance estimates has been reported to enhance power and to
reduce Type I error rate for detecting differential expression (Cui et al., 2005). However, the
moderated F statistic has a non-standard distribution under the null hypothesis. Consequently, to
assess significance of these tests, P-values cannot be obtained by comparison with a reference
null distribution. Hence, permutation analyses (Anderson and Ter Braak, 2003) have been
implemented to obtain the null distribution of these statistics (Yang and Churchill, 2007).
12 
 

Permutation analyses imply shuffling the observed data to simulate the distribution of the test
statistic under the null hypothesis (Anderson, 2001).
In summary, possible alternatives to enhance power of microarray analyses when sample
sizes are small are: 1) treating array effect as random t and 2) optimizing estimation of variance
components using shrinkage to borrow information across transcripts. Consequently, one can
choose to model array as a random or fixed effect, and to use a classic F statistic or a moderated
F statistic to assess statistical significance. Selecting a linear mixed model with a moderated F
statistic may result in a computationally very intensive analysis. This is because the REML
algorithm would have to iterate to get variance components estimates di novo within each
permuted dataset. The question remains on whether this is a helpful strategy or not.
Possible ways of predicting the performance of the different test statistics and linear
models are: analytical derivations, evaluation in simulated data or evaluation in plasmode
datasets (Mehta et al., 2004). Analytical derivations are often times non-tractable (Gadbury et al.,
2008) and thus simulations or plasmode datasets become more relevant. Simulated datasets are
generated based on parametric assumptions about their distributions, whereas plasmode datasets
are generated based on real data for which some true structure is known (Mehta et al.,
2004;Gadbury et al., 2008). Plasmode datasets have the advantage over simulations that they do
not rely on pre-specified model assumptions, making them a useful tool for evaluation of
statistical methods (Gadbury et al., 2008;Steibel et al., 2009a;Vaughan et al., 2009).
The goal of this study was to compare linear fixed versus mixed models, and classic F
versus a moderated F statistics as ways to optimize microarray data analysis for both specificity
and sensitivity. To accomplish this we generated plasmode datasets using an existing microarray

13 
 

experiment. The experiment from which the plasmode datasets were constructed involved pigs
experimentally infected with Porcine Reproductive and Respiratory Syndrome virus (PRRSV)
and showing phenotypic variation in response to the disease (Wysocki et al., 2012).
2. Materials and methods
2.1. Population, phenotypic groups and microarray design
The samples used in this study and the experimental design are described in detail in
Wysocki et al. (2012). Briefly, Hampshire-Duroc crossbred pigs infected with PRRSV have been
classified into two phenotypic groups: 1) low responders (L), with low viremia, greater weight
gain, few lung lesions and 2) high responders (H), high viremia, low/no weight gain, many lung
lesions. Lung and bronchial lymph node (BLN) RNA was obtained from 4 pigs of each
phenotypic group at 14 days post infection, and a microarray experiment was performed using
the Pigoligoarray (Steibel et al., 2009b) following a common reference design (16 arrays in
total). A pooled RNA sample isolated from several different tissues of uninfected animals was
used as the common reference sample.
2.2. Plasmode construction
To build the plasmode datasets we used the 8 arrays from BLN tissue, where no
differential expression was detected. For explanation see the Results section of this paper and
the original paper from Wysocki et al. (2012). We generated the plasmode datasets by randomly
partitioning the gene expression dataset from BLN tissue into two groups of four arrays each.
The non-reference sample in each array was randomly assigned a treatment label “H” or “L”
(phenotypic group). Reference sample label was kept throughout plasmode construction. There
were a total of 35 possible label arrangements on the eight available arrays, following:
14 
 

(n! / [(n/2)!]2) / 2
where n = 8 arrays was assumed to be split between two treatment groups. One of the
combinations had confounded dye and treatment effect. Thus, we did not incorporate that
combination into the final plasmode. Consequently, all results presented in this paper are based
on 34 plasmode datasets where no differential expression was expected, but where the natural
biological variation was conserved.
2.3. Linear Model Analysis
We considered linear fixed and linear mixed models. The basic linear model used to
analyze gene expression involved two steps. The first step accounted for overall effects across all
transcripts by fitting the following model:

ygijk = µ + Dj + Ak + DAjk + egijk
th

where ygijk is the log-intensity for the k array labeled with the
from phenotypic group i;

[1]

jth dye, corresponding to a pig

μ is the overall mean; Dj is the effect of jth dye, with j = 1,2; Ak is

th

th

th

the effect of the k array, with k = 1,…,8; DAjk is the interaction of the j array and k dye,
2

finally eijk is the residual with eijk~N(0,σe ).
In a second step a model that accounted for gene specific variation was fitted:

ˆ
e gijk = µg + Tgi + Dgj + Agk + εgijk

15 
 

[2]

where

ˆ
e gijk is the estimated residual from [1] for transcript g; µg is the gene specific overall

mean, with g = 1, …, 20736; Tgi is the effect of phenotypic group i in the expression of the g

th

th

transcript, with i = H, L and R (reference); Dgj is the effect of the j dye in the expression of the

gth transcript; Agk is the effect of the kth array in the expression of the gth transcript; finally
εgijk is the gene specific residual, with εgijk ~N(0,σε2).
th

The fixed effects model assumed Agk were the fixed effect of the k array, with k =
1,…,8. The mixed effects model assumed the array effect as random, with Agk

~ N(0, σga2).

We analyze the data using MAANOVA package (Wu et al., 2003) in R (R Development
Core Team, 2010).
2.4. Test Statistics
We evaluated two test statistics, the classic F test (Wu et al., 2003) and a moderated F test
(Cui et al., 2005). The classical F statistic is based on transcript specific variance components
whereas the moderated F statistic is based on shrunk estimates of the variance components.
2.5. Estimation of the critical values
Two different approaches were used for the estimation of significance thresholds. First,
critical values were obtained from a central F distribution. Second, we considered a permutation
approach where the critical values were obtained by shuffling the residuals of a subset of the
transcripts. This subset was obtained before the permutation analysis and corresponded to
transcripts which F statistic was smaller than an initial critical value of 0.9 (Yang and Churchill,
16 
 

2007). The number of permutations was 100 for the random array effect model, and 1000 for the
fixed array effect model. The resulting test statistics obtained for the permuted subset of
transcripts were then used to compute the P-values.
2.6. Assessment of performance under the null hypothesis
We had eight alternative analyses from the factorial combination of two models (fixed vs.
random array effect), two test statistics (classic F vs. moderated F), and two ways of computing
significance thresholds (using a tabulated F distribution vs. using a permuted distribution). To
assess the performance of these alternative tests under the null hypothesis we computed
empirical Type I error rates in the plasmode datasets. Defining the Type I error rate (α) as the
number of true null hypothesis that are rejected at a significance threshold, to estimate empirical
Type I error rates we counted the number of tests that were rejected at a particular P-value
threshold. We arbitrary evaluated nominal significance thresholds of 0.05, 0.01, 0.005, 0.001,
and 0.0001.
2.7. Assessment of performance in a dataset with DE transcripts
The performance of the statistical tests described in Section 2.6. was evaluated in the lung
dataset, where differential expression between phenotypic groups was assumed to be present.
Although we did not know which transcripts were differentially expressed in this experiment, we
used this dataset to count the proportion of rejected hypotheses obtained with the eight
alternative analyses. If two analyses methods provided the same Type I error (estimated from the
plasmode analyses), the procedure leading to more rejections in the lung dataset is likely to
provide more power.

17 
 

Finally, we used the false discovery rate (Storey, 2003) to assess significance (i.e. DE
transcripts) in the lung dataset. The false discovery rate (FDR), defined as the expected
proportion of false rejected hypotheses (Benjamini and Hochberg, 1995), is a multiple
comparison adjustment to control the Type I error rate.
3. Results
3.1. Overall evidence of differential expression in lung and BLN microarray datasets
Initially, a moderated F statistic and significance thresholds obtained by permutation
within a mixed effects model (eq. [1] and [2] with Agk normally distributed) was used to
address differential expression between phenotypic groups in both datasets. Figure 2.1 shows
histograms and Q-Q plots of P-values for each dataset. Histograms and Q-Q plots are used to
assess the distributional assumptions of an observed variable. The uniform distribution reflected
by a flat histogram of P-values in BLN dataset (Fig. 2.1A) indicated that there was no evidence
of differential expression between phenotypic groups. On the other hand, we observed evidence
of differential expression between phenotypic groups in the lung dataset reflected in a larger
frequency of small P-values (Fig. 2.1B). Similar conclusions can be made upon inspection of
uniform Q-Q plots. Uniform Q-Q plots were used to compare the observed quantiles of the
negative log transformed P-values versus the negative log transformed quantiles of a uniform
distribution, implicit under the null hypothesis of no differential expression (Fig. 2.1C and 2.1D,
red x = y line). Although the Q-Q plot for the BLN dataset (Fig. 2.1C) showed a slight departure
from the quantiles expected under the null hypothesis, this was likely due to chance alone.
Moreover, for an adjusted p-value threshold or FDR ≤ 20% no differentially expressed (DE)
transcripts were identified, as stated in Wysocki et al. (2012). On the other side, the Q-Q plot
18 
 

for the lung dataset (Fig. 2.1D) showed a clear departure from the expected quantiles for a
uniform distribution, evidencing the presence of differential gene expression (Wysocki et al.,
2012). Consequently, we used BLN data to generate plasmode datasets under the assumption of
no differential expression.
3.2. Estimation of Type I error using plasmode datasets
Tests computed with the classic F statistic and using tabulated significance thresholds
controlled the type I error rate close to nominal levels in all cases. This is shown in Fig. 2.2
where the realized Type I error rate for these tests (blue line) is very close to (for the fixed
effects model, Fig. 2.2B) or exactly equal to (for the mixed effects model, Fig. 2.2A) the nominal
Type I error rate (red x = y line).
Similarly, tests based on significance thresholds obtained by permutation controlled Type
I error rate at the nominal level for both distributional model assumptions (array as a fixed or
random effect) and statistics (classic and moderated F) considered. This is evidenced by the
dashed lines in Fig. 2.2A and 2.2B that overlaid with the red x = y line, indicating that the
realized Type I error rate was equal to the nominal Type I error rate.
Tests computed with the moderated F statistic and tabulated significance thresholds did
not control the Type I error rate at the nominal level in any case, and the realized type I error rate
was consistently smaller than the nominal Type I error rate indicating an overly conservative test
(Fig. 2.2A and 2.2B, green line).
Based on the above presented results, we focused on tests that accurately controlled the
Type I error rate close to the nominal level. That is, tests based on the classic F statistic and
tabulated significance thresholds as well as tests based on the moderated F statistic and permuted
19 
 

significance thresholds. We further explored the distribution of P-values for these tests (Fig. 2.3),
and we observed that the P-value distributions resembled the expected distributions under the
null hypothesis (green lines versus red lines in Fig. 2.3). However, we noticed plasmode to
plasmode variation in the P-value distribution from all these tests (black lines in the graphics)
such that an individual list of P-values from a particular experiment might contain more or less
false positives compared to the expected proportion at the nominal significance level.
3.3. Assessment of differential expression in lung dataset
Tests based on the moderated F statistic and significance thresholds obtained by
permutation had the largest proportion of rejected hypotheses (proportion of rejected hypothesis
= 0.091, Fig. 2.4A) when array was assumed as a random effect. When the same moderated F
statistic with significance thresholds obtained by permutation was used in a fixed array model,
the proportion of rejected hypotheses slightly decreased (proportion of rejected hypothesis =
0.087, Fig. 2.4B). Nevertheless, in both models (fixed or random array) these moderated F-tests
had the largest proportion of rejected hypotheses when compared to classical F-tests and
tabulated critical values.
Tests computed based on the classic F statistic assuming array as a random effect
returned similar results whether the null distribution was obtained by permutation or from a
tabulated F distribution. This is shown by the overlapping blue lines in Fig. 2.4A. However, for
the same tests computed in the model assuming array as a fixed effect, P-values obtained with
significance thresholds obtained by permutation returned a larger proportion of rejected tests
than P-values obtained with tabulated significance thresholds (blue lines in Fig. 2.4B).

20 
 

Among tests that controlled the Type I error rate close to the nominal level, we computed
the number of rejected hypotheses (i. e. the number of DE transcripts) at different FDR
thresholds (Table 2.1). The largest number of rejected hypotheses occurred for tests based on a
moderated F statistic and significance thresholds obtained by permutation in models assuming a
random array effect. In addition, the comparison of the fixed array effect versus the random array
effect models showed that considering array as a random effect enhanced power (larger number
of rejected hypotheses) at all FDR thresholds. That is, the number of rejected hypotheses ranged
between 3 and 21 for the fixed array effect model, whereas that number ranged between 5 and 69
for the random array effect model (depending on the FDR threshold considered).
Finally, we compared the quantiles of the observed distribution of P-values for tests
controlling Type I error rate close to the nominal level with the expected quantiles under the null
distribution (Fig 2.5). We observed that all tests had more extreme values than expected under
the null distribution, revealing that they detect differential expression. Regardless the
distributional assumptions on the array effect, P-values obtained based on the moderated F
statistic and significance thresholds obtained by permutation had the furthermost extreme
quantile values. Moreover, when array was assumed random, P-values associated with
moderated F statistic and significance thresholds obtained by permutation departed more from
the expected null distribution than their counterpart based on a fixed array effect model.

21 
 

Table 2.1. Number of differentially expressed genes in a fixed versus mixed effects model at
different FDR thresholds.
Fixed Effects model

Mixed Effects Model

False Discovery Rate (%)
Test

1

2.5

5

10

1

2.5

5

10

classic F statistic and tabulated
significance thresholds

0

1

3

5

0

2

3

7

classic F statistic and permuted
significance thresholds

2

2

3

3

2

2

3

7

moderated F statistic and tabulated
significance thresholds

0

0

0

0

0

0

0

0

moderated F statistic and permuted
significance thresholds

3

5

9

21

5

10

21

69

 

4. Discussion
In this study we aimed to identify optimal statistical analysis frameworks for microarray
data. In particular we were interested in testing differential gene expression in pigs as a response
to PRRSV experimental infection. Consequently, we used an existing dataset (Wysocki et al.,
2012) suitable to derive plasmodes (Gadbury et al., 2008;Vaughan et al., 2009) conforming the
null distribution expected under no differential expression. We worked with these plasmode
datasets to evaluate analysis models and test statisitcs. Specifically, we wanted to assess Type I
error rate of alternative analysis models. We then assessed power of the tests that controlled the
Type I error rate at or close to the nominal level.
Among all testing procedures and analysis scenarios, we identified three types of tests
that controlled the Type I error rate close to the nominal level. First, tests based on a moderated F
statistic and significance thresholds obtained by permutation. Second, tests based on a classic F
22 
 

statistic and tabulated significance thresholds. Third, tests based on a classic F statistic and
significance thresholds obtained by permutation. These results extended the results from Yang
and Churchill (2007) who reported that P-values estimated based on a moderated t-statistic using
permutation controlled Type I error rate close to the nominal level.
On the contrary, tests based on the moderated F statistic and tabulated significance
thresholds were too conservative. This was expected since the moderated F statistic did not have
a standard F distribution (Cui et al., 2005). Therefore, permutation analysis should be used to
obtain the P-values (Anderson and Ter Braak, 2003;Yang and Churchill, 2007).
When the array effect was assumed as fixed, tests based on the classic F statistic and
significance thresholds obtained by permutation better controlled the Type I error rate to the
nominal level than tests based on the same statistic and tabulated significance thresholds. In this
regard, P-values associated with tabulated significance thresholds were computed based on the
assumption of independent identically normally distributed residuals (Anderson, 2001).
However, the analyzed dataset may have not met this assumption resulting in P-values that did
not conform to the nominal F distribution. The P-values associated with significance thresholds
obtained by permutation were computed not making such assumptions (Anderson, 2001),
therefore resulting in an improved performance of these tests.
In this study we showed that treating array as a random effect increased power of all tests
evaluated. This is particularly evident in tests based on the moderated F statistic and significance
thresholds obtained by permutation, where for example at FDR < 10%, 69 tests were rejected
when random array effect was assumed, and only 21 tests were rejected when the array effect
was assumed fixed. Modeling the array as a random or fixed effect has been discussed previously

23 
 

(Kerr and Churchill, 2001;Wolfinger et al., 2001;Kerr, 2003a;b). Defining the relative efficiency
as the ratio of the variances for fixed versus random array analysis models of the difference in
means between two treatment groups, Kerr (2003a) showed that when array is assumed as
random in a common reference design and its variance tends to infinity the relative efficiency
equals 1. However, when the array variance tends to 0, the relative efficiency is larger than 1,
and in that case to assumed array as a random effect is more efficient for estimating treatment
differences. Moreover, Steibel (2007) proved that the relative efficiency of the fixed array versus
random array model reached a maximum as the array and biological variance became small
compared to the residual variances. Thus, treating array as a random effect allowed recovering of
inter-block (i.e. inter-array) information. This has been reported before (Kerr, 2003b;Steibel,
2007), but it has been shown that the amount of recovery of information and increase in power
depends on variance ratios. In this paper we showed that the practical implication of assuming
array as a random effect was that the number of rejected hypothesis increased from 21 to 69 for a
FDR < 10%.
As mentioned above, tests based on the moderated F statistic and significance thresholds
obtained by permutation exceeded in the number of rejected hypotheses all other tests. There,
shrinkage improved the estimation of variance components by borrowing information across all
transcripts in the microarray (Cui et al., 2005) therefore resulting in more powerful tests. This
was consistent with a previous report where P-values estimated using permutation and a
moderated t-statistic yielded larger number of positive genes than P-values estimated using
permutation and a regular t-statistic at all FDR thresholds evaluated (Yang and Churchill, 2007).
In summary, in this study we show that considering array as a random effect, and
addressing differential expression using tests based on the moderated F statistic and significance
24 
 

thresholds obtained by permutation was the most powerful analysis. However, this type of
analysis may be very computational demanding. An alternative would be to consider array to be
a fixed effect. While this model results in less powerful tests, for a fixed number of permutations,
the elapsed computational time is reduced by 87 to 90 fold, in the BLN and lung dataset
respectively. Another alternative would be to consider array as a random effect, and tests based
on the classic F statistic and tabulated significance thresholds. This would enable avoiding the
permutation analysis and decreasing the elapsed computational time by 48 fold. However the
power of the tests resulting from this approach can be expected to be much smaller.

25 
 

REFERENCES

26 
 

References
Anderson, M.J. (2001). Permutation tests for univariate or multivariate analysis of variance and
regression. Canadian Journal of Fisheries and Aquatic Sciences 58, 626-639.
Anderson, M.J., and Ter Braak, C.J.F. (2003). Permutation tests for multi-factorial analysis of
variance. Journal of Statistical Computation and Simulation 73, 85-113.
Benjamini, Y., and Hochberg, Y. (1995). Controlling the False Discovery Rate: A Practical and
Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B
(Methodological) 57, 289-300.
Churchill, G.A. (2002). Fundamentals of experimental design for cDNA microarrays. Nat Genet
32 Suppl, 490-495.
Cui, X.G., Hwang, J.T.G., Qiu, J., Blades, N.J., and Churchill, G.A. (2005). Improved statistical
tests for differential gene expression by shrinking variance components estimates.
Biostatistics 6, 59-75.
Cui, X.Q., and Churchill, G.A. (2003). Statistical tests for differential expression in cDNA
microarray experiments. Genome Biology 4.
Dobbin, K., and Simon, R. (2002). Comparison of microarray designs for class comparison and
class discovery. Bioinformatics 18, 1438-1445.
Gadbury, G.L., Xiang, Q.F., Yang, L., Barnes, S., Page, G.P., and Allison, D.B. (2008).
Evaluating Statistical Methods Using Plasmode Data Sets in the Age of Massive Public
Databases: An Illustration Using False Discovery Rates. Plos Genetics 4.
Kerr, K.M., and Churchill, G.A. (2001). Experimental design for gene expression microarrays.
Biostatistics 2, 183-201.
Kerr, M.K. (2003a). Design Considerations for Efficient and Effective Microarray Studies.
Biometrics 59, 822-828.
Kerr, M.K. (2003b). Linear Models for Microarray Data Analysis: Hidden Similarities and
Differences. Journal of Computational Biology 10, 891-901.
Mehta, T., Tanik, M., and Allison, D.B. (2004). Towards sound epistemological foundations of
statistical methods for high-dimensional biology. Nature Genetics 36, 943-947.
R Development Core Team (2010). " R: A language and environment for statistical computing.".
(Vienna, Austria: R Foundation for Statistical Computing).
Rosa, G.J.M., Steibel, J.P., and Tempelman, R.J. (2005). Reassessing design and analysis of twocolour microarray experiments using mixed effects models. Comparative and Functional
Genomics 6, 123-131.
Steibel, J.P. (2007). Improving experimental design and statistical inference for transcriptional
profiling experiments. Ph.D. 3264239, Michigan State University.
27 
 

Steibel, J.P., Poletto, R., Coussens, P.M., and Rosa, G.J.M. (2009a). A powerful and flexible
linear mixed model framework for the analysis of relative quantification RT-PCR data.
Genomics 94, 146-152.
Steibel, J.P., Wysocki, M., Lunney, J.K., Ramos, A.M., Hu, Z.L., Rothschild, M.F., and Ernst,
C.W. (2009b). Assessment of the swine protein-annotated oligonucleotide microarray.
Animal Genetics 40, 883-893.
Storey, J.D. (2003). The positive false discovery rate: A Bayesian interpretation and the q-value.
Annals of Statistics 31, 2013-2035.
Tempelman, R.J. (2005). Assessing statistical precision, power, and robustness of alternative
experimental designs for two color microarray platforms based on mixed effects models.
Veterinary Immunology and Immunopathology 105, 175-186.
Vaughan, L.K., Divers, J., Padilla, M.A., Redden, D.T., Tiwari, H.K., Pomp, D., and Allison,
D.B. (2009). The use of plasmodes as a supplement to simulations: A simple example
evaluating individual admixture estimation methodologies. Computational Statistics &
Data Analysis 53, 1755-1766.
Wolfinger, R.D., Gibson, G., Wolfinger, E.D., Bennett, L., Hamadeh, H., Bushel, P., Afshari, C.,
and Paules, R.S. (2001). Assessing gene significance from cDNA microarray expression
data via mixed models. Journal of Computational Biology 8, 625-637.
Wu, H., Kerr, M., Cui, X., and Churchill, G. (2003). "MAANOVA: A Software Package for the
Analysis of Spotted cDNA Microarray Experiments
The Analysis of Gene Expression Data," eds. G. Parmigiani, E. Garrett, R. Irizarry & S. Zeger.
Springer London), 313-341.
Wysocki, M., Chen, H., Steibel, J.P., Kuhar, D., Petry, D., Bates, J., Johnson, R., Ernst, C.W.,
and Lunney, J.K. (2012). Identifying putative candidate genes and pathways involved in
immune responses to porcine reproductive and respiratory syndrome virus (PRRSV)
infection. Animal Genetics 43, 328-332.
Yang, H.U., and Churchill, G. (2007). Estimating p-values in small microarray experiments.
Bioinformatics 23, 38-43.

28 
 

CHAPTER THREE
(Submitted to Frontiers in Livestock Genomics, undergoing review process)

29 
 

Characterizing differential individual response to Porcine Reproductive and Respiratory
Syndrome Virus infection through statistical and functional analysis of gene expression
Maria E. Arceo1, Catherine W. Ernst1, Joan K. Lunney2, Igseo Choi2, Nancy E. Raney1, Tinghua
Huang3, Christopher K. Tuggle3, R.R.R. Rowland4, Juan P. Steibel1,5,*
1

Department of Animal Science, Michigan State University, East Lansing, MI, USA

2

Animal Parasitic Diseases Laboratory, Agriculture Research Service, United States Department

of Agriculture, Beltsville, MD, USA
3

Department of Animal Science, Iowa State University, Ames, IA, USA

4

Department of Diagnostic Medicine and Pathobiology, Kansas State University, Manhattan, KS,

USA
5

Department of Fisheries and Wildlife, Michigan State University, East Lansing, MI, USA

Correspondence:
Dr. Juan P. Steibel
Michigan State University
Department of Animal Science and Department of Fisheries and Wildlife
1205 Anthony Hall,
East Lansing, MI, 48824,USA

30 
 

steibelj@msu.edu

Abstract
We evaluated differences in gene expression in pigs from the Porcine Reproductive and
Respiratory Syndrome (PRRS) Host Genetics Consortium initiative showing a range of
responses to PRRS virus infection. Pigs were allocated into four phenotypic groups according to
their serum viral level and weight gain. RNA obtained from blood at 0, 4, 7, 11, 14, 28, and 42
days post infection (DPI) was hybridized to the 70-mer 20K Pigoligoarray. We used a blocked
reference design for the microarray experiment. This allowed us to account for individual
biological variation in gene expression, and to assess baseline effects before infection (0
DPI).Additionally, this design has the flexibility of incorporating future data for differential
expression analysis. We focused on evaluating transcripts showing significant interaction of
weight gain and serum viral level. We identified 491 significant comparisons (FDR ≤ 10%)
across all DPI and phenotypic groups. We corroborated the overall trend in direction and level of
expression (measured as fold change) at four DPI using qPCR (r = 0.91, p ≤ 0.0007). At 4 and 7
DPI, network and functional analyses were performed to assess if immune related gene sets were
enriched for genes differentially expressed across four phenotypic groups. We identified cell
death function as being significantly associated (FDR ≤ 5%) with several networks enriched for
differentially expressed transcripts. We found the genes interferon-alpha 1(IFNA1), major
histocompatibility complex, class II, DQ alpha 1 (SLA-DQA1), and major histocompatibility
complex, class II, DR alpha (SLA-DRA) to be differentially expressed (p ≤ 0.05) between

31 
 

phenotypic groups. Finally, we performed a power analysis to estimate sample size and sampling
time-points for future experiments. We concluded the best scenario for investigation of early
response to PRRSV infection consists of sampling at 4 and 7 DPI using about 30 pigs per
phenotypic group.
Keywords: porcine reproductive and respiratory syndrome, microarray, quantitative PCR,
functional analysis, power analysis.

32 
 

1. Introduction
Porcine Reproductive and Respiratory Syndrome (PRRS) was initially described in the
US over 20 years ago (Done et al., 1996) and a virus, now known as Porcine Reproductive and
Respiratory Syndrome Virus (PRRSV), was identified as the primary causative agent (Collins et
al., 1992). Overall, the disease causes $ 664 million annual losses to the US pork industry
(Holtkamp et al., 2012).
Viral replication takes place in the host’s immune cells (Rowland et al., 2003;Genini et
al., 2008) thereby, reducing the cytokine-mediated inflammatory response (Kimman et al.,
2009). In this context, a possible way of controlling PRRS is addressing the host genetic
component. Host genetic response to infection, in particular, phenotype-genotype associations in
immune related traits can be evaluated using currently available genomic tools (Lewis et al.,
2007;Lunney and Chen, 2010). While the molecular pathways involved in the protection against
PRRS have not yet been entirely elucidated (Kimman et al., 2009) genotype and immune traits
associations have been documented (Clapperton et al., 2005;Wattrang et al., 2005). Furthermore,
phenotypic variation between breed-lines has been observed in disease-related and production
traits of experimentally infected pigs (Petry et al., 2005;Vincent et al., 2006;Doeschl-Wilson et
al., 2009). These authors reported differences in clinical symptoms and lung pathology in
response to PRRSV infection, as well as in virus levels in serum and/or respiratory tissues, such
as lung and bronchial lymph nodes. Doeschl-Wilson et al. (2009) and Petry et al. (2005) also
reported differential body weight changes in PRRSV infected pigs.

33 
 

Studying gene expression in pigs showing phenotypic variation to PRRSV infection
responses will enhance our knowledge of genetic control of the susceptibility to this disease. In
this context, differential expression of a reduced number of immune related genes has been
evaluated (Petry et al., 2007;Lunney et al., 2010) and global differential expression has been
assessed in vivo (Bates et al., 2008;Xiao et al., 2010a;Xiao et al., 2010b;Zhou et al.,
2011;Wysocki et al., 2012) and in vitro (Lee et al., 2004a;Lee et al., 2004b;Miller and Fox,
2004;Genini et al., 2008;Ait-Ali et al., 2011).
Most previous studies focused on comparing gene expression of PRRSV-infected and
uninfected pigs, as well as gene expression between animals showing differences in postinfection viral titers. However, little is known of the interaction between viral load and weight
gain as it relates to gene expression post-infection. This is particularly important given the
reported associations of immune traits with growth rate (Galina-Pantoja et al., 2006;Boddicker et
al., 2012) and the genetic correlations between growth rate and disease traits (Doeschl-Wilson et
al., 2009) as well as between growth rate and immune related traits (Clapperton et al., 2009).
The availability of whole genome microarrays (Steibel et al., 2009c) and next generation
sequencing (Mardis, 2008) have further favored whole genome expression profiling of PRRSV
infected animals (Xiao et al., 2010a;Xiao et al., 2010b). Important features when evaluating gene
expression are: 1) the correct modeling of the phenotypic variation and the inclusion of
biological replication (Rosa et al., 2005) and 2) sampling relevant tissues and time-points (Mateu
and Diaz, 2008;Lunney et al., 2010).
We evaluated whole-genome expression profile of pigs assigned to 4 reaction groups
(phenotypic groups) according to the pigs’ weight gain and blood viral load as part of the PRRS

34 
 

Host Genetics Consortium (PHGC) (Lunney et al., 2011). The goals of this study were: 1) to
assess global differential gene expression in commercial pigs showing variation in phenotypic
response to PRRSV experimental infection, and to identify relevant molecular networks and
biological functions enriched for differentially expressed genes involved in the pig’s immune
response to PRRSV infection; and 2) to inform the design of future experiments, to determine the
most informative early time-points and sample sizes required for powerful inferences when
assessing gene expression in blood of commercial pigs experimentally infected with PRRSV.
2. Materials and methods
2.1. Animal model and study design
Crossbred commercial pigs (~200) from PHGC trial one (Lunney et al., 2011) were
transported to the Kansas State University bio-secure testing facility at weaning (11 to 21 d. old)
and allocated to pens (10 to 15 pigs/pen). Pigs came from PRRSV-, Influenza virus- and
Mycoplasma hyopneumoniae-free farms. After a 7-day acclimation period and antibiotic
treatments, pigs were both intramuscularly and intranasally infected with a known isolate of
PRRSV (105 tissue culture infectious dose50 of NVSL 97-7985). Blood samples were collected in
Tempus™ Blood RNA Tubes (Life Technologies, Carlsbad, CA) at 0, 4, 7, 11, 14, 19, 28 and 42
days post-infection (DPI). Individual animal weight was measured at weekly intervals. Serum
viral level was quantified using a semi-quantitative TaqMan PCR assay. More details on the pig
resources, study design and data storage are in Lunney et al. (2011) and Boddicker et al. (2012).
The study was approved by Kansas State University Institutional Animal Care and Use
Committee.

35 
 

2.2. Phenotypic groups
Figure 3.1 shows a scatterplot of weight gain versus viral load (VL) for all pigs from
PHGC trial one. Four phenotypic groups were defined according to the pigs’ weight gain and
VL. Weight gain was defined as body weight (kg) from 0 to 42 DPI; VL was defined as the area
under the curve of the log-transformed serum viral level from 0 to 21 DPI. These two variables
showed moderate negative correlation (r = -0.29). Thus, bivariate data of VL and weight gain
were centered at their mean values and rotated to obtain uncorrelated measures. Phenotypic
groups were then specified as a combination of these two traits: 1) high viral load-high weight
gain (HvHg), 2) high viral load-low weight gain (HvLg), 3) low viral load-high weight gain
(LvHg) and 4) low viral load-low weight gain (LvLg). For allocation to these four groups, pigs
that were within one standard deviation of the population mean for either of the traits were
discarded and the remaining animals were classified to one of the groups (Fig. 3.1).
2.3. Microarray design and analysis
Three pigs per group were randomly selected and their RNA isolated using the TempusTM
Spin RNA Isolation Kit as per manufacturer’s instructions (Applied Biosystems/Life
Technologies) from blood at 0, 4, 7, 11, 14, 28, and 42 DPI. RNA samples were reverse
transcribed using the Amino Allyl MessageAmp II aRNA Amplification Kit (Ambion./Life
Technologies), labeled with N-hydroxysuccinate (NHS) ester Cy3 or Cy5 dyes (GE Healthcare,
CA), and hybridized to the 20K 70-mer oligonucleotide Pigoligoarray as previously described
(Steibel et al., 2009c) following a block reference design (Steibel and Rosa, 2005). Each
individual pig’s 0-DPI-sample served as reference for all other samples from the same animal.
Reference sample dye flipping was performed across pigs to allow separation of dye and 0-DPI
effects (Steibel and Rosa, 2005). Fluorescent images and fluorescence intensity data were
36 
 

collected as previously described (Steibel et al., 2009c). Median intensities were background
corrected with Normexp method fixing the offset parameter κ =50 (Ritchie et al., 2007).
Background corrected data was normalized using a within print-tip loess-location normalization
(Yang et al., 2002). All computations were implemented in R (R Development Core Team, 2010)
through LIMMA (Smyth, 2005). Normalized log-intensities were analyzed on a transcript per
transcript basis using a linear mixed model accounting for all pertinent random and fixed effects
(Rosa et al., 2005) as described below:

yijklm = μ + TGij + Dl + Am + Sk + eijklm
th

th

where yijklm is the log-intensity measure at the i DPI, for the k pig corresponding to
th

th

phenotypic group j in the m array labeled with the l dye; μ is the overall mean; TGij is the
effect of DPI i and phenotypic group j in the expression of the transcript, with i = 1,…,7 and j =
th

th

1,…,4; Dl is the effect of l dye, with l = 1,2; and Am is the random effect of the m array,
2

th

with m = 1,…,72 and Am~N(0,σa ); Sk is the random effect of the k pig, with k = 1,…12
2

2

and Sk~N(0,σs ); finally eijklm is the residual with eijklm~N(0,σe ).
The mixed model was fitted using the package MAANOVA (Wu et al., 2003). An F test
based on a shrinkage estimator of variance components was used to evalulate significance of
fixed effects (Cui et al., 2005). Permutation based P-values (number of permutations = 100) were
obtained to assess significance (Yang and Churchill, 2007).
To account for multiple testing a two-stage testing procedure was used to assess significance of
gene expression changes in response to weight gain and viral load status over time. First, for
37 
 

each DPI, the interaction effect TGij was tested at 10% false discovery rate (FDR) (Storey,
2003). Then, for all transcripts with significant interaction, the effect of viral load was evaluated
in high-weight-gain and low-weight-gain pigs, separately (nominal p ≤ 1/Ns, with Ns = number
of significant interactions). Similarly, the effect of weight gain was evaluated in high-viral-load
and low-viral-load pigs, separately.
All in all, this testing protocol resulted in 4 contrasts of interest for the comparisons of
phenotypic groups. Each contrast involved the comparison of two phenotypic groups. Two of
these contrasts evaluated the effect of viral load and the other two the effect of weight gain. In
particular, for the viral load effect, we compared 1) HvHg vs.LvHg and 2) HvLg vs.LvLg. To
evaluate the weight gain effect, we compared 1) HvHg vs.HvLg and 2) LvHg vs.LvLg.
Data have been deposited in NCBI’s Gene Expression Omnibus (GEO)
(http://www.ncbi.nlm.nih.gov/geo/) with accession number GSE41144. Code used for these
analyses can be found at https://www.msu.edu/~steibelj/JP_files/PRRSV.html
2.4. Pathway Analysis
Gene set enrichment analyses were performed using Ingenuity Pathways Analysis
software (IPA) (Ingenuity® Systems, www.ingenuity.com). After statistical analyses described
above pathway analyses were restricted to early time-points (4 and 7 DPI). The network analyses
generated a set of relevant networks (P ≤ 10-10 and number of DE genes ≥ 5) built based on a
user-specified list of genes. Networks were composed of genes and gene products that are known
to interact with each other and that were enriched for the DE transcripts (as defined in section
2.3.). Functional analysis identified the biological functions that were significantly enriched
(Benjamini-Hochberg (1995) p < 0.05) for these DE genes.
38 
 

2.5. qPCR design and analysis
Quantitative PCR (qPCR) analysis was used to assess differential expression of 15 genes.
Twelve genes were selected from microarray and pathway analyses results. In addition, three
genes [interferon-alpha 1 (IFNA1); major histocompatibility complex, class II, DQ alpha 1 (SLADQA1); and major histocompatibility complex, class II, DR alpha (SLA-DRA)] not present in the
microarray platform were included based on previously documented knowledge of their relevant
role in immune modulation, reviewed by Lunney and Chen (2010). Probes and primers were
obtained from the Porcine Immunology and Nutrition Database (Dawson et al., 2005) or
designed with Primer Express Software v3.0 (Life Technologies) from sequences obtained from
Ensembl (http://useast.ensembl.org/index.html). Primers and probes were designed to span exonexon junctions, if possible, to avoid false positive genomic DNA contamination (Supplementary
Table 1). Synthesis of cDNA was performed with SuperScript Reverse Transcriptase and qPCR
amplification was implemented using the Brilliant Kit (Agilent Technologies, Inc., CA) with 35
ng of cDNA in an ABI Prism 7500 Sequence Detector System (Life Technologies). Assays were
performed in duplicate. The amplification conditions are described in Royaee et al.(2004). Ct
values were obtained from each individual amplification curve. Average Ct for each target gene
in each sample and DPI (4 and 7) were subtracted from the corresponding average Ct of RPL32
(housekeeping gene), producing ΔCt values. ΔΔCt values were computed by subtracting 0-DPIΔCt from ΔCt at each DPI. Resulting ΔΔCt were analyzed separately for each DPI (except 0
DPI) with the following linear model:

y m = Gm + e m

39 
 

th

where ym is the ΔΔCt value for the gene in the m phenotypic group, Gm is the effect of the
phenotypic group m and em~N(0,σe

2

) is the residual. This model is equivalent to a previously

described linear model (Steibel et al., 2009a).
2.6. Statistical power and sample size computation
Using this experiment’s dataset as pilot data for future experiments with a similar design,
we computed the expected discovery rate (EDR) and FDR as defined by Gadbury et al. (2004) to
estimate statistical power at a fixed number of biological replicates (n) and Type I error rate (α).
We considered either n = 20 or n = 30 per phenotypic group (4 groups) and α = 0.01. This choice
of α resulted in a FDR < 10 % in all cases. Computations were performed using PowerAtlas
software (Page et al., 2006). Sample sizes were selected assuming a common reference design
with either 4 (n = 20) or 3 (n = 30) sampling time-points, such that the total number of
microarray slides was fixed to 240. This represents a common situation where the researcher has
to decide whether to allocate arrays to extra biological samples with fewer time-points or to
include more time-points at the expense of sample size for a given total budget.
3. Results
Dye labeled cDNA prepared from blood samples from 12 PHGC pigs at 7 different time
points (0, 4, 7, 11, 14, 28, and 42 DPI) were hybridized to the Pigoligoarray using a block
reference design. Three pigs per group were randomly selected from each of the four phenotypic
groups defined according to the pigs’ weight gain and viral load (HvHg, HvLg, LvHg and
LvLg). We addressed global differential expression in four contrasts of interest (HvHg vs. LvHg,
HvLg vs. LvLg, HvHg vs. HvLg, and LvHg vs. LvLg).
3.1. Microarray analysis
40 
 

3.1.1. Evidence of differential gene expression at 0 DPI
The presence of an effect on gene expression profile that cannot be attributed to the
experimental infection was addressed by evaluating differential gene expression between the
four phenotypic groups at 0 DPI. Although no significant differences in transcripts were
identified (FDR ≤ 0.1), inspection of P-value distributions for the four contrasts indicated a
departure from the expected uniform distribution under null hypothesis (Fig. 3.2). The actual
distribution of P-values for LvHg vs. LvLg indicated an excess of small P-values. This is
consistent with the alternative hypothesis of differential expression. The contrasts HvHg vs.
LvHg and HvLg vs. LvLg showed P-value distributions inconsistent with both null and
alternative hypotheses implicit in the analysis model (Page et al., 2006). The observed deviations
in the P-value distributions of these tests likely reveal the existence of unaccounted effects (Page
et al., 2003). These patterns also appeared in contrasts at other time-points if these differences at
0 DPI were ignored (results not shown).
3.1.2. Evidence of differential gene expression for remaining DPI
Based on the results from the previous section, differential expression between
phenotypic groups after 0 DPI was corrected by subtracting the estimated difference at 0 DPI.
For example, to address differential expression between two phenotypic groups of pigs, the
effect was estimated following [(TGi≠0, j –

TGi’=0, j) – (TGi≠0, j’ – TGi’=0, j’)]. The same

procedure was used for all contrasts. After correcting for 0-DPI estimated effect, the distribution
of P-values for all contrasts was consistent with the expected distribution under either null or
alternative hypotheses (data not shown). This indicated that correcting each comparison estimate
by the corresponding estimate at time zero accounts for pre-existing differences in gene

41 
 

expression and/or for animal specific effects missed by the model. Consequently, we based all
inferences on the above specified contrasts.
Evidence of differential expression was found, as revealed by the Q-Q plot of P-values (Fig. 3.3).
This plot represents the quantiles of the empirical distribution of P-values versus the expected
quantiles of uniformly distributed P-values (corresponding to the null hypothesis). The
represented departure from the straight line y = x indicates an excess of small P-values as
compared to the expectation under the null hypothesis, consistent with the alternative hypothesis.
3.1.3. Weight gain and viral load interaction effect on gene expression
A total of 491 null hypotheses were rejected (FDR ≤ 10%) when testing for the
interaction between weight gain and VL. The number of transcripts showing a significant
interaction was 288, 14, 177 and 12 at 4, 7 14 and 42 DPI respectively (Table 3.1). There were
no significant interactions detected at 11 and 28 DPI. Transcripts showing significant weight
gain and viral load interaction effect on their expression were further evaluated and results are
presented in section 3.1.4.

42 
 

Table 3.1. Number of putatively differentially expressed transcripts.
Phenotypic Groups Comparisons
Time (DPI) Ns

HvHg vs. LvHg HvLg vs. LvLg HvHg vs. HvLg LvLg vs. LvHg

4

288

86

42

22

141

7

14

13

12

14

11

14

177

106

25

38

120

42

12

12

12

9

12

Numbers indicate differentially expressed genes per time and phenotypic groups’
comparisons.
Ns= total number of transcripts being tested (i.e. with a significant interaction at the specific
DPI being evaluated).

3.1.4. Weight gain and viral load effect on gene expression
To declare a differentially expressed (DE) transcript we used an adjusted P-value where
the null hypothesis was rejected if p ≤ 1/Ns, with Ns = total number of transcripts being tested
(i.e. with a significant interaction at the specific DPI being evaluated), such that we expected one
false positive per comparison. This led to a number of putatively DE transcripts per DPI and
comparison, as shown in Table 3.1. At 4 and 14 DPI, we observed a similar number of DE
transcripts, consisting of a large number of transcripts with significant interaction (288 and 177
respectively) mainly involving differential expression in HvHg vs. LvHg (86 and 106 transcripts)
and in LvLg vs. LvHg (141 and 120 transcripts). However, at 7 and 14 DPI, the number of
significant interactions is smaller (14 and 12) with most of the transcripts DE across all four
contrasts. Although we reported number of DE at 42 DPI, we did not follow up on these results
because response at that late sampling time-point could be due to a rebound of the disease
(Boddicker et al., 2012).
43 
 

3.2. Pathway analyses
Subsequent to testing for changes in global gene expression, we assessed if immune
related gene sets were enriched for DE genes across pigs from the four phenotypic groups. We
performed pathway analyses to identify relevant molecular networks and biological functions
associated with such networks enriched for the DE genes identified in section 3.1.4. We
restricted the analyses to 4 and 7 DPI since 1) we were interested in early immune responses and
2) these were the times that provided the most power to detect future differential expression (as
shown in section 3.4.). Following these pathway analyses, and to limit the interpretation of
results to genes with large effects, in addition to the P-value threshold described in section 3.1.4,
we considered an absolute fold-change (FC) threshold equal to, or greater than 1.5. For instance,
in a specific contrast involving two groups, the gene expression level in the first phenotypic
group had to be 50 % larger (or smaller) than the one in the second phenotypic group for a
significantly DE gene to be considered. DE genes with a positive FC (larger than 1.5) were
considered to be over-expressed, and DE genes with a negative FC (smaller than -1.5) were
considered to be under-expressed in the first phenotypic group included in the contrast equation.
3.2.1. HvHg vs. LvHg
At 4 DPI, pathway analyses identified 5 significant molecular networks. Significant
functional categories identified in these networks were: Cell Death, Cell Morphology, Cellular
Assembly and Organization, Cellular Function and Maintenance. Among DE genes in these
networks and associated with these categories, c-mer proto-oncogene tyrosine kinase (MERTK)
was under-expressed in this contrast. Ezrin (EZR) and moesin (MSN) were over-expressed and
under-expressed in this contrast, respectively. Present in the top network was also Rho GTPase
activating protein 35 (GRLF1) that was under-expressed.
44 
 

At 7 DPI, pathway analyses identified one significant molecular network and similar to 4
DPI a significant functional category associated with the network was also Cell Death. Within
this category DE genes involved in the initiation of apoptosis (PYD and CARD domain
containing, PYCARD) and cytotoxicity of cytotoxic T cells (granzyme A, GZMA) were underexpressed in this contrast. Also present in this network and under-expressed in this contrast was
epidermal growth factor receptor pathway substrate 15 (EPS15).
3.2.2. HvLg vs. LvLg
At 4 DPI, pathway analysis identified three significant molecular networks. Significant
functional categories identified in these networks were Organismal Development, and Cell
Death. Among DE genes in these networks and associated with Cell Death, major
histocompatibility complex, class II, DR beta 1 (HLA-DRB1/SLA-DRB1), involved in the
cytotoxicity of Th1 cells, was under-expressed in this contrast. In the same network, jumonji, AT
rich interactive domain (JARID2) was under-expressed in the contrast. In addition, present in the
top network generated was integrin beta 7 (ITGB7) and RAS guanyl releasing protein 1
(RASGRP1) were both under-expressed in this contrast.
At 7 DPI, pathway analyses results pointed at one significant molecular network.
Functional categories associated with this network were: Genetic Disorder, Inflammatory
Disease, and Cellular Compromise. Differentially expressed genes associated with these
functions were PYCARD and GZMA, over-expressed in this contrast.
3.2.3. HvHg vs. HvLg
At 4 DPI, pathway analysis identified one significant molecular network with 12 DE
genes. However, no significant functional category was identified for genes in this network.
45 
 

At 7 DPI there was also one significant network identified. The significant functional
category identified for this network was Cell Death. DE genes in this category related to cell
death of leukocytes (GZMA and PYCARD), and initiation of apoptosis (PYCARD) were underexpressed in this contrast. Another gene, NEDD8 activating enzyme E1 subunit 1 (NAE1), also
related to initiation of apoptosis was over-expressed in the contrast.
3.2.4. LvHg vs. LvLg
At 4 DPI, seven significant molecular networks were identified. However, no significant
functional categories were identified for any of these networks.
At 7 DPI, one significant molecular network was identified. The significant functional
category identified for this network was Cell Morphology. Associated with this function, GZMA
was DE and over-expressed in this contrast.
3.3. qPCR analysis
3.3.1. Verification of microarray findings
Among a total of 96 comparisons for the 12 genes present in the microarray and selected
for qPCR (12 genes * 4 phenotypic groups * 2 DPI), 26 significant comparisons (p ≤ 1/Ns, as
defined in section 3.1.4.) were detected by the microarray. Significant comparisons occurred for
nine genes: EPS15, EZR, GRLF1, GZMA, ITGB7, JARID2, MERTK, PYCARD, and RASGRP1.
Each comparison corresponds to a significant test of differential expression at a specific DPI.
Focusing on the direction and level of expression change of all comparisons, we evaluated the
gene set correlations between microarray and qPCR measured FC. From a measurement error
perspective, the overall trend in direction and amount of expression was validated at 4 DPI (r =
0.91, p ≤ 0.0007), but not at 7 DPI. As expected, correlation between measurements associated
46 
 

with non-DE comparisons in the microarray and the qPCR were not significant (r = 0.15, p >
0.24). Based on microarray results, qPCR confirmed differential expression of two genes:
JARID2 and ITGB7. JARID2 was confirmed in HvLg vs .LvLg at 4 DPI (p ≤ 0.03 and p ≤ 0.0001
for QPCR and microarray, respectively). ITGB7 was confirmed in LvHg vs. LvLg at 7 DPI (p ≤
0.007 and p ≤ 0.004).
A total of 70 non-DE comparisons were reported by the microarray analysis. This
includes comparisons among the nine genes mentioned plus all comparisons for interleukin-1
alpha (IL1A); interleukin 8 (IL8); and interferon regulatory factor 1 (IRF1). Among these nonDE comparisons only two were detected DE with qPCR. These included IRF1 in HvLg vs. LvLg
(p ≤ 0.007, FC ≤ -2.22) and LvHg vs. LvLg (p ≤ 0.005, FC ≤ -2.33) at 7 DPI. All these results
together indicate that, although the rate of differential expression validation of individual genes
is limited, the overall pattern of differential expression was confirmed for comparisons at 4 DPI.
Consequently, enrichment analysis (networks and functions) identified at the earliest time-point
are expected to be reproduced in future experiments.
3.3.2. Additional genes
Three genes not present on the microarray were also tested using qPCR: interferon-alpha
1 (IFNA1), major histocompatibility complex, class II, DQ alpha 1 (HLA-DQA1/SLA-DQA1),
and major histocompatibility complex, class II, DR alpha (HLA-DRA/SLA-DRA). SLA class II
antigens are expressed in B and T cells, with numerous haplotypes identified throughout
different pig populations, which led researchers to explore their association with disease
responses (Lunney et al., 2009). IFNA1 encodes for an innate cytokine, and has been reported to
be modulated by PRRSV (Mateu and Diaz, 2008;Kimman et al., 2009). Significance levels and
FC for these genes in all comparisons are presented in Table 3.2.
47 
 

At 4 DPI, IFNA1 was significantly DE (p ≤ 0.05) when comparing HvLg to LvLg and
HvHg to HvLg.
At 7 DPI, IFNA1 was significantly DE in all contrasts. SLA-DQA was DE in all contrasts
except for HvHg vs. HvLg and SLA-DRA was DE in LvHg vs. LvLg
Table 3.2. Test of immune gene expression for genes not present in the microarray platform.
Viral Comparisons

Growth Comparisons

HvHg vs. LvHg HvLg vs. LvLg HvHg vs. HvLg LvHg vs. LvLg
DPI Symbol

P-value

FC

P-value

FC

P-value

FC

P-value

FC

IFNA

0.01 -6.67

0.02

5.45

0.05 -3.93

SLA-DQA

0.71

1.15

0.11 -1.92

0.71

1.15

0.11 -1.92

0.96

-1.03

0.62 -1.35

0.71

-1.25

0.42 -1.64

IFNA

0.04

3.25

0.02 -4.08

0.04

3.27

0.02 -4.05

SLA-DQA

0.04

1.32

0.01 -1.43

0.95

-1.01

0.00 -1.89

SLA-DRA

7

3.21

SLA-DRA

4

0.09

0.15

1.58

0.55 -1.20

0.40

-1.29

0.02 -2.45

Values in the table include significance levels and FC of genes; bolded values indicate
significant comparisons and their FC
 

3.4.

Microarray statistical power and sample size estimation
In order to inform the design of future experiments, we computed the EDR per contrast at

early DPI. The EDR is the multi-test equivalent to power, which is also called sensitivity (Steibel
et al., 2009b). EDR should be computed at a specific nominal Type I error rate, α, and for a
given sample size, conditioning on estimated effects from a previous experiment (Gadbury et al.,
2004). We considered a future experiment that would include sampling time = 0 DPI plus two or
three other times, selected among 4, 7, and 11 DPI. We assumed effect sizes estimated from this
data, a fixed nominal error rate α = 0.01, and two sample sizes (n = 30 for three sampling time48 
 

points or n = 20 for four sampling time-points). The sample allocation (sampling fewer time
points with more biological replicates or vice versa) would require the same number of
microarray slides (240) and would roughly have the same cost. For evaluating which sampling
time-points have to be included in a future study, we set the threshold of inclusion to EDR >
80%. That is, the average probability of detecting an effect (assuming the effect is indeed
present) to be larger than 0.8. When n = 30, two sampling time-points (other than 0 DPI) could
be included. In that case, the best sampling combination would be at 4 and 7 DPI. This would
provide adequate power (EDR > 80%) to detect weight gain and VL effects in all contrasts
except for LvHg vs. LvLg. Changing the sampling scheme to include 11 DPI (with only 20
samples per phenotypic group), would not add to the purpose of having sufficient power in that
contrast. Furthermore, sampling at 11 DPI would only result in one contrast (HvLg vs. LvLg)
having EDR > 80% (Table 3.3) 

49 
 

Table 3.3. Expected discovery rate (EDR) comparing 20 and 30 biological replicates.
Sampling time-points (DPI)
Phenotypic
Group
Comparisons
HvHg vs. LvHg

HvLg vs. LvLg

HvHg vs. HvLg

LvHg vs LvLg

Sample
Size (n)

4

7

11

20

0.84

0.86

0.37

30

0.91

0.92

0.47

20

0.83

0.80

0.96

30

0.91

0.88

0.98

20

0.77

0.88

NA

30

0.86

0.94

0.45

20

0.55

0.42

0.62

30

0.63

0.51

0.70

All four contrasts were compared at each DPI and evaluated for future sampling with desirable
power (> 80%)
NA: not available. The algorithm could not reach a result

4. Discussion
The first objective of this study was to assess global differential gene expression in
weaned pigs showing variation in weight gain and blood viral load in response to PRRSV
infection. To achieve this objective, four reaction groups (phenotypic groups) of pigs were
evaluated. Our study is different from other PRRSV-response gene expression profiling
experiments in three ways. First, we focused on modeling individual biological variation of gene
expression. Given that a longer term objective is to find genes for diagnosis and prognosis of
PRRSV infection, we were interested in characterizing the variance of expression at the
50 
 

individual level. The resulting scope of inference is different from that obtained using pooled
samples (Genini et al., 2008;Xiao et al., 2010a;Xiao et al., 2010b), since our experimental and
inferential unit is the individual animal and not a pool of animals. Second, we used a blocked
reference design (Steibel and Rosa, 2005). In contrast to common reference designs (Bates et al.,
2008;Xiao et al., 2010a;Xiao et al., 2010b;Wysocki et al., 2012), our design allowed us to use 0
DPI samples as a reference, and still include 0 DPI in tests, which was instrumental in assessing
baseline effects before infection. Additionally, because of the design used to accommodate single
cDNA samples, this study has the flexibility of incorporating future data for differential
expression analysis. Third, we report on whole-genome expression profiling of white blood cells
from in-vivo infected pigs that complements existing results from studies using pulmonary
alveolar macrophages (PAM), bronchial lymph node and lung (Petry et al., 2007;Bates et al.,
2008;Genini et al., 2008;Lunney et al., 2010;Xiao et al., 2010a;Zhou et al., 2011;Wysocki et al.,
2012). Furthermore, obtaining blood samples is simpler and less invasive than sampling other
tissues, thus simplifying implementation of genomic diagnostics in pigs, including in-situ
sampling at farms.
We first assessed differential expression at 0 DPI between pigs allocated to different
phenotypic groups, and observed effects that could not be solely attributed to experimental
infection or random errors (Page et al., 2006). The individual baseline (0 DPI) differential
expression assessment was only briefly reported before (Ait-Ali et al., 2011), and using 0 DPI
correction has not been reported. This type of correction was not usually addressed either
because the baseline samples were pooled (Genini et al., 2008;Xiao et al., 2010a;Xiao et al.,
2010b) or because they were omitted from the expression experiment (Bates et al.,
2008;Wysocki et al., 2012).

51 
 

We tested the interaction effect between weight gain and viral load on global gene
expression. From breeding and management perspectives, quantifying interaction effects at early
time-points would allow making timely management and selection decisions. Consequently we
concentrated on 4 to 14 DPI for further analyses. In addition, by considering early DPI we
assured that the effect being evaluated was exclusively due to an initial infection stage and not to
a rebound of the disease (Boddicker et al., 2012). Our results complement and extend those
reported by Petry et al. (2007) and Bates et al. (2008), who tested the interaction between viral
burden and genetic line as well as infection status on gene expression. Petry et al. (2005)
reported that pigs from Nebraska Index Line, selected for improved reproductive traits, gained
more weight than Hampshire x Duroc crossbred pigs after PRRSV infection. Therefore, the
weight gain and blood viral level interaction effect we evaluated resembled the genetic line by
viral burden interaction reported by Petry et al. (2007) and Bates et al. (2008). These authors
reported seven genes with significant line by viral load interaction in lung or bronchial lymph
node expression profiles. Querying expression levels for the same genes in our dataset, we found
that DDX3Y (DEAD box proteins, ATP-dependent RNA helicase), a paralog of DDX3 reported
by Bates et al. (2008), was also DE in blood cells. Significant differences in DDX3 expression
occurred between low and high viral burden pigs in Nebraska Index Line, but not in Hampshire x
Duroc line. Likewise, we observed significant differences (p ≤ 0.003) in DDX3Y expression in
HvHg vs. LvHg, but not in HvLg vs. LvLg, at 14 DPI, although the direction of change was
opposite in our results as compared to the Bates et al. (2008) experiment. The reason for this is
unclear but it could be attributed to differences in tissues, genetic background, and/or functional
differences between the paralogous genes.

52 
 

Within the first objective of this study we also aimed to characterize gene networks and
individual genes influencing PRRSV immune response in the four phenotypic groups considered.
We further evaluated transcripts with expression subject to significant viral load by weight gain
interaction effects to identify biological functions associated with relevant molecular networks.
We focused on 4 and 7 DPI since these provide insights on early host anti-viral and innate
immune response to PRRSV infection and they are candidate sampling time-points to be pursued
in future studies. Pathway analysis revealed that cell death function was significantly associated
with several networks enriched for DE genes at 4 and 7 DPI. Genes included in these networks
and associated with cell death were MERTK, GZMA, and PYCARD. All these genes followed a
general pattern of under-expression in high viral load compared to low viral load pigs (FC ≤ 1.5). An exception to this was HvLg vs. LvLg at 7 DPI. These overall results are consistent with
those from Genini et al. (2008) that reported inhibition of apoptosis in cell lines 9 to 12 hours
post infection and with results from Xiao et al. (2010b) comparing gene expression of pigs
infected with a highly pathogenic strain of PRRSV compared to uninfected controls at 4 and 7
DPI. Cell death is a host defense mechanism to inhibit viral replication (Alcami and
Koszinowski, 2000). Overall, the global gene expression profile showed a trend where HvLg and
HvHg pigs had lower expression of the listed genes relative to LvLg and LvHg pigs,
respectively, indicating that the defense mechanism mediated by cell death had reduced
efficiency, thereby allowing increased viral replication. At 4 DPI, our study identified MERTK as
DE and associated with cell death in HvHg vs. LvHg pigs. The product of MERTK is a
phagocytic receptor that is involved in the clearance of apoptotic thymocytes. Mouse
macrophages lacking MERTK showed a delayed clearance of apoptotic cells (Seitz et al., 2007).
There have been no previous reports of this gene identified as DE in PRRSV response studies.

53 
 

Key genes in the swine leukocyte antigens (SLA) complex have been well documented
for their effects on production and immune traits in different pig populations (Lunney et al.,
2009). At 7 DPI, we identified SLA-DRA significantly over-expressed in LvLg relative to LvHg.
SLA-DQA1 followed the same trend, and in addition, it was significantly under-expressed in
LvHg and HvLg relative to HvHg and LvLg, respectively. Global differential expression and
functional analysis comparing PRRSV infected to uninfected pigs at the same sampling timepoints by Xiao et al. (2010a) reported that MHC class II antigens (SLA-DQA, SLA-DMB, SLADQB1 and SLA-DRA) were significantly induced in PRRSV infected lungs.
Our study identified IFNA1 as being significantly DE in all contrasts but HvHg vs. LvHg at 4
DPI. Specifically, at both 4 and 7 DPI, IFNA1 was over-expressed in HvHg and LvLg relative to
HvLg and LvHg pigs, respectively. This gene was reported as under-expressed in PRRSVinfected with respect to uninfected PAM at 30 (Ait-Ali et al., 2011) but not at 12 hours post
infection (Genini et al., 2008). IFNA was reported under-expressed at 4 and 7 DPI in lung tissue
of infected pigs relative to uninfected controls (Xiao et al., 2010a) but at 14 DPI, Lunney et al.
(2010) reported no differences in expression in tracheobronchial lymph node for several innate
markers (IFNA, IL1B and IL8). In addition, Petry et al. (2007) found that differences in
expression of IFNA were influenced by pig genetic line.
Overall, our findings of DE genes in whole blood are in agreement with previous reports
on specific target tissues and cells, such as lung and PAM, following PRRSV infections. These
results stress the usefulness of our study for sampling the more accessible blood to reveal the
complexity of host responses to PRRSV infection. We expect that our planned, more detailed
studies will generate further questions on the role of these and many other genes in anti-PRRSV
responses.

54 
 

Finally, our global differential expression results were used as pilot data to inform design
of future time-course transcription profiling experiments. We evaluated different scenarios of
sample sizes and sampling time-points for combinations given a fixed total sampling effort. We
concluded the best scenario for future studies consists of sampling at 4 and 7 DPI using about 30
pigs per phenotypic group, and that a minimum of 20 pigs per group are needed for controlling
Type I and Type II error rates to acceptable levels in most comparisons. The results obtained
with a sample size n = 30 were consistent with previous results obtained from a dataset generated
by Chen et al. (manuscript in preparation). Our group used the Wysocki et al. (2012) dataset of
lung tissue expression at 14 DPI to evaluate statistical power of high versus low viral burden
pigs, and affirmed that approximately the same sample size was needed. These results
underscore the importance of computing sample size. We predict that this could be applied in a
broader context, for instance, in next generation sequencing experiments. Such technology is
being increasingly used for evaluating expression profiling in pigs infected with PRRSV (Xiao et
al., 2010a;Xiao et al., 2010b). Even though we could expect less technical variation in expression
measured with RNA-seq (Marioni et al., 2008), biological variation would remain unaffected. In
such cases, the only way of increasing power of the tests would be increasing the number of
biological samples (Steibel et al., 2009b).
Evidence presented in this paper highlights the importance of thoughtful experimental
design and accurate modeling. We acknowledge sample size is a key factor of every experiment
and correct modeling of variation (biological and technical) is essential. As a result, this
experiment provided information on actual sample sizes and sampling time-points needed for
more precise estimation of effects of interest. Our preliminary results have already identified
differential gene expression, molecular networks and biological functions affecting the four

55 
 

phenotypic groups of pigs and the influence of PRRSV infection. Finally, due to the flexible
experimental design utilized in this study, the resulting dataset can be merged with future data
for increasingly powerful and precise inferences on response to PRRSV infection.
Conflict of Interest Statement
No competing interests.
Acknowledgements
This work was supported by USDA ARS funding and USDA NIFA grant #2010-65205-20433.
The PRRS Host Genetics Consortium (PHGC) samples were supported through grants #07-233
and #09-244 from the U.S. National Pork Board.
The authors acknowledge the careful RNA preparations performed by Amber Jean Tietgens at
BARC

56 
 

REFERENCES

57 
 

References
Ait-Ali, T., Wilson, A.D., Carre, W., Westcott, D.G., Frossard, J.P., Mellencamp, M.A.,
Mouzaki, D., Matika, O., Waddington, D., Drew, T.W., Bishop, S.C., and Archibald,
A.L. (2011). Host inhibits replication of European porcine reproductive and respiratory
syndrome virus in macrophages by altering differential regulation of type-I interferon
transcriptional response. Immunogenetics 63, 437-448.
Alcami, A., and Koszinowski, U.H. (2000). Viral mechanisms of immune evasion. Immunology
Today 21, 447-455.
Bates, J.S., Petry, D.B., Eudy, J., Bough, L., and Johnson, R.K. (2008). Differential expression in
lung and bronchial lymph node of pigs with high and low responses to infection with
porcine reproductive and respiratory syndrome virus. Journal of Animal Science 86,
3279-3289.
Benjamini, Y., and Hochberg, Y. (1995). Controlling the False Discovery Rate - a Practical and
Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society Series BMethodological 57, 289-300.
Boddicker, N., Waide, E.H., Rowland, R.R.R., Lunney, J.K., Garrick, D.J., Reecy, J.M., and
Dekkers, J.C.M. (2012). Evidence for a major QTL associated with host response to
Porcine Reproductive and Respiratory Syndrome Virus challenge. Journal of Animal
Science 90, 1733-1746.
Clapperton, M., Bishop, S.C., and Glass, E.J. (2005). Innate immune traits differ between
Meishan and Large White pigs. Veterinary Immunology and Immunopathology 104, 131144.
Clapperton, M., Diack, A.B., Matika, O., Glass, E.J., Gladney, C.D., Mellencamp, M.A., Hoste,
A., and Bishop, S.C. (2009). Traits associated with innate and adaptive immunity in pigs:
heritability and associations with performance under different health status conditions.
Genetics Selection Evolution 41.
Collins, J.E., Benfield, D.A., Christianson, W.T., Harris, L., Hennings, J.C., Shaw, D.P., Goyal,
S.M., Mccullough, S., Morrison, R.B., Joo, H.S., Gorcyca, D., and Chladek, D. (1992).
Isolation of Swine Infertility and Respiratory Syndrome Virus (Isolate ATCC VR-2332)
in North America and Experimental Reproduction of the Disease in Gnotobiotic Pigs.
Journal of Veterinary Diagnostic Investigation 4, 117-126.
Cui, X.G., Hwang, J.T.G., Qiu, J., Blades, N.J., and Churchill, G.A. (2005). Improved statistical
tests for differential gene expression by shrinking variance components estimates.
Biostatistics 6, 59-75.
Dawson, H.D., Beshah, E., Nishi, S., Solano-Aguilar, G., Morimoto, M., Zhao, A.P., Madden,
K.B., Ledbetter, T.K., Dubey, J.P., Shea-Donohue, T., Lunney, J.K., and Urban, J.F.
(2005). Localized multigene expression patterns support an evolving Th1/Th2-like
paradigm in response to infections with Toxoplasma gondii and Ascaris suum. Infection
and Immunity 73, 1116-1128.
58 
 

Doeschl-Wilson, A.B., Kyriazakis, I., Vincent, A., Rothschild, M.F., Thacker, E., and GalinaPantoja, L. (2009). Clinical and pathological responses of pigs from two genetically
diverse commercial lines to porcine reproductive and respiratory syndrome virus
infection. Journal of Animal Science 87, 1638-1647.
Done, S.H., Paton, D.J., and White, M.E.C. (1996). Porcine reproductive and respiratory
syndrome (PRRS): A review, with emphasis on pathological, virological and diagnostic
aspects. British Veterinary Journal 152, 153-174.
Gadbury, G.L., Page, G.P., Edwards, J.W., Kayo, T., Prolla, T.A., Weindruch, R., Permana,
P.A., Mountz, J.D., and Allison, D.B. (2004). Power and sample size estimation in high
dimensional biology. Statistical Methods in Medical Research 13, 325-338.
Galina-Pantoja, L., Mellencamp, M.A., Bastiaansen, J., Cabrera, R., Solano-Aguilar, G., and
Lunney, J.K. (2006). Relationship between immune cell phenotypes and pig growth in a
commercial farm. Animal Biotechnology 17, 81-98.
Genini, S., Delputte, P.L., Malinverni, R., Cecere, M., Stella, A., Nauwynck, H.J., and Giuffra,
E. (2008). Genome-wide transcriptional response of primary alveolar macrophages
following infection with porcine reproductive and respiratory syndrome virus. Journal of
General Virology 89, 2550-2564.
Holtkamp, D.J., Kliebenstein, J.B., Neumann, E.J., Zimmerman, J.J., Rotto, H., Yoder, T.K.,
Wang, C., Yeske, P., Mowrer, C., and Haley, C. (2012). Assessment of the economic
impact of porcine reproductive and respiratory syndrome virus on United States pork
producers. J. Swine Health Prod. (Manuscript in press).
Kimman, T.G., Cornelissen, L.A., Moormann, R.J., Rebel, J.M.J., and Stockhofe-Zurwieden, N.
(2009). Challenges for porcine reproductive and respiratory syndrome virus (PRRSV)
vaccinology. Vaccine 27, 3704-3718.
Lee, C., Bachand, A., Murtaugh, M.P., and Yoo, D. (2004a). Differential host cell gene
expression regulated by the porcine reproductive and respiratory syndrome virus GP4 and
GP5 glycoproteins. Veterinary Immunology and Immunopathology 102, 189-198.
Lee, C., Rogan, D., Erickson, L., Zhang, J., and Yoo, D. (2004b). Characterization of the porcine
reproductive and respiratory syndrome virus glycoprotein 5 (GP5) in stably expressing
cells. Virus Research 104, 33-38.
Lewis, C.R., Ait-Ali, T., Clapperton, M., Archibald, A.L., and Bishop, S. (2007). Genetic
perspectives on host responses to porcine reproductive and respiratory syndrome (PRRS).
Viral Immunology 20, 343-358.
Lunney, J.K., and Chen, H. (2010). Genetic control of host resistance to porcine reproductive
and respiratory syndrome virus (PRRSV) infection. Virus Research 154, 161-169.
Lunney, J.K., Fritz, E.R., Reecy, J.M., Kuhar, D., Prucnal, E., Molina, R., Christopher-Hennings,
J., Zimmerman, J., and Rowland, R.R.R. (2010). Interleukin-8, Interleukin-1 beta, and
59 
 

Interferon-gamma Levels Are Linked to PRRS Virus Clearance. Viral Immunology 23,
127-134.
Lunney, J.K., Ho, C.-S., Wysocki, M., and Smith, D.M. (2009). Molecular genetics of the swine
major histocompatibility complex, the SLA complex. Developmental & Comparative
Immunology 33, 362-374.
Lunney, J.K., Steibel, J., Reecy, J.M., Fritz, E., Rothschild, M.F., Kerrigan, M., Trible, B., and
Rowland, R.R.R. (2011). Probing genetic control of swine responses to PRRSV infection:
current progress of the PRRS host genetics consortium. BMC Proceedings 5, S30.
Mardis, E.R. (2008). The impact of next-generation sequencing technology on genetics. Trends
in Genetics 24, 133-141.
Marioni, J.C., Mason, C.E., Mane, S.M., Stephens, M., and Gilad, Y. (2008). RNA-seq: An
assessment of technical reproducibility and comparison with gene expression arrays.
Genome Research 18, 1509-1517.
Mateu, E., and Diaz, I. (2008). The challenge of PRRS immunology. Veterinary Journal 177,
345-351.
Miller, L.C., and Fox, J.M. (2004). Apoptosis and porcine reproductive and respiratory syndrome
virus. Vet Immunol Immunopathol 102, 131-142.
Page, G.P., Edwards, J.W., Gadbury, G.L., Yelisetti, P., Wang, J., Trivedi, P., and Allison, D.B.
(2006). The PowerAtlas: a power and sample size atlas for microarray
experimental design and research. Bmc Bioinformatics 7, 84.
Page, G.P., George, V., Go, R.C., Page, P.Z., and Allison, D.B. (2003). “Are We There Yet?”:
Deciding When One Has Demonstrated Specific Genetic Causation in Complex Diseases
and Quantitative Traits. The American Journal of Human Genetics 73, 711-719.
Petry, D.B., Holl, J.W., Weber, J.S., Doster, A.R., Osorio, F.A., and Johnson, R.K. (2005).
Biological responses to porcine respiratory and reproductive syndrome virus in pigs of
two genetic populations. Journal of Animal Science 83, 1494-1502.
Petry, D.B., Lunney, J., Boyd, P., Kuhar, D., Blankenship, E., and Johnson, R.K. (2007).
Differential immunity in pigs with high and low responses to porcine reproductive and
respiratory syndrome virus infection. Journal of Animal Science 85, 2075-2092.
R Development Core Team (2010). " R: A language and environment for statistical computing.".
(Vienna, Austria: R Foundation for Statistical Computing).
Ritchie, M.E., Silver, J., Oshlack, A., Holmes, M., Diyagama, D., Holloway, A., and Smyth,
G.K. (2007). A comparison of background correction methods for two-colour
microarrays. Bioinformatics 23, 2700-2707.

60 
 

Rosa, G.J.M., Steibel, J.P., and Tempelman, R.J. (2005). Reassessing design and analysis of twocolour microarray experiments using mixed effects models. Comparative and Functional
Genomics 6, 123-131.
Rowland, R.R.R., Lawson, S., Rossow, K., and Benfield, D.A. (2003). Lymphoid tissue tropism
of porcine reproductive and respiratory syndrome virus replication during persistent
infection of pigs originally exposed to virus in utero. Veterinary Microbiology 96, 219235.
Royaee, A.R., Husmann, R.J., Dawson, H.D., Calzada-Nova, G., Schnitzlein, W.M.,
Zuckermann, F.A., and Lunney, J.K. (2004). Deciphering the involvement of innate
immune factors in the development of the host response to PRRSV vaccination.
Veterinary Immunology and Immunopathology 102, 199-216.
Seitz, H.M., Camenisch, T.D., Lemke, G., Earp, H.S., and Matsushima, G.K. (2007).
Macrophages and dendritic cells use different Axl/Mertk/Tyro3 receptors in clearance of
apoptotic cells. Journal of Immunology 178, 5635-5642.
Smyth, G.K. (2005). "Limma: linear models for microarray data.," in Bioinformatics and
Computational Biology Solutions using R and Bioconductor, eds. R. Gentleman, V.
Carey, S. Dudoit, R. Irizarry & W. Huber. (New York: Springer), 397-420.
Steibel, J.P., Poletto, R., Coussens, P.M., and Rosa, G.J.M. (2009a). A powerful and flexible
linear mixed model framework for the analysis of relative quantification RT-PCR data.
Genomics 94, 146-152.
Steibel, J.P., and Rosa, G.J.M. (2005). On Reference Designs For Microarray Experiments.
Statistical Applications in Genetics and Molecular Biology, 1-19.
Steibel, J.P., Rosa, G.J.M., and Tempelman, R.J. (2009b). Optimizing design of two-stage
experiments for transcriptional profiling. Computational Statistics & Data Analysis 53,
1639-1649.
Steibel, J.P., Wysocki, M., Lunney, J.K., Ramos, A.M., Hu, Z.L., Rothschild, M.F., and Ernst,
C.W. (2009c). Assessment of the swine protein-annotated oligonucleotide microarray.
Animal Genetics 40, 883-893.
Storey, J.D. (2003). The positive false discovery rate: a Bayesian interpretation and the q-value.
The Annals of Statistics 31, 2013-2035.
Vincent, A.L., Thacker, B.J., Halbur, P.G., Rothschild, M.F., and Thacker, E.L. (2006). An
investigation of susceptibility to porcine reproductive and respiratory syndrome virus
between two genetically diverse commercial lines of pigs. Journal of Animal Science 84,
49-57.
Wattrang, E., Almqvist, M., Johansson, A., Fossum, C., Wallgren, P., Pielberg, G., Andersson,
L., and Edfors-Lilja, I. (2005). Confirmation of QTL on porcine chromosomes 1 and 8
influencing leukocyte numbers, haematological parameters and leukocyte function.
Animal Genetics 36, 337-345.
61 
 

Wu, H., Kerr, M., Cui, X., and Churchill, G. (2003). "MAANOVA: A Software Package for the
Analysis of Spotted cDNA Microarray Experiments
The Analysis of Gene Expression Data," eds. G. Parmigiani, E. Garrett, R. Irizarry & S. Zeger.
Springer London), 313-341.
Wysocki, M., Chen, H., Steibel, J.P., Kuhar, D., Petry, D., Bates, J., Johnson, R., Ernst, C.W.,
and Lunney, J.K. (2012). Identifying putative candidate genes and pathways involved in
immune responses to porcine reproductive and respiratory syndrome virus (PRRSV)
infection. Animal Genetics 43, 328-332.
Xiao, S.Q., Jia, J.Y., Mo, D.L., Wang, Q.W., Qin, L.M., He, Z.Y., Zhao, X.A., Huang, Y.K., Li,
A.N., Yu, J.W., Niu, Y.N., Liu, X.H., and Chen, Y.S. (2010a). Understanding PRRSV
Infection in Porcine Lung Based on Genome-Wide Transcriptome Response Identified by
Deep Sequencing. Plos One 5.
Xiao, S.Q., Mo, D.L., Wang, Q.W., Jia, J.Y., Qin, L.M., Yu, X.C., Niu, Y.N., Zhao, X.A., Liu,
X.H., and Chen, Y.S. (2010b). Aberrant host immune response induced by highly
virulent PRRSV identified by digital gene expression tag profiling. Bmc Genomics 11.
Yang, H., and Churchill, G. (2007). Estimating p-values in small microarray experiments.
Bioinformatics 23, 38-43.
Yang, Y.H., Duboit, S., Luu, P., Lin, D.M., Peng, V., Ngai, J., and Speed, T.P. (2002).
Normalization for cDNA microarray data: a robust composite method addressing single
and multiple slide systematic variation. Nucleic Acids Res 30, e15.
Zhou, P., Zhai, S., Zhou, X., Lin, P., Jiang, T., Hu, X., Jiang, Y., Wu, B., Zhang, Q., Xu, X., Li,
J., and Liu, B. (2011). Molecular Characterization of Transcriptome-wide Interactions
between Highly Pathogenic Porcine Reproductive and Respiratory Syndrome Virus and
Porcine Alveolar Macrophages in vivo. Int J Biol Sci 7, 947-959.

62 
 

CHAPTER FOUR

63 
 

1. Conclusion
1.1.

Goals and contribution of this study

Porcine Reproductive and Respiratory Syndrome (PRRS) has been affecting commercial
populations of pigs in the US for more than 20 years (Done et al., 1996). Different studies on the
economic impact of this disease in the pork industry reported million annual losses of
approximately $ 560 in 2005 (Neumann et al., 2005) and $ 664 in 2011 (Holtkamp et al., 2012).
In this regard, the PRRS Host Genetic Consortium (PHGC) was created with the objective of
addressing the genetic control of the response to PRRS virus (PRRSV) infection (Lunney et al.,
2011). In this thesis, the overall goal was to evaluate whole-genome expression profile of
PRRSV-infected pigs to characterize their response to infection over the time. A specific goal
was to use these data to inform the design of future time-course related experiments.
In this regard, we evaluated whole-genome expression profile of PRRSV experimentally
infected pigs from an infection trial of the PHGC. These pigs were experimentally infected with
PRRSV. Genetic correlations between growth rate and disease traits (Doeschl-Wilson et al.,
2009), as well as between growth rate and immune related traits (Clapperton et al., 2009) have
been reported. In addition, associations of immune traits with growth rate have also been
reported (Galina-Pantoja et al., 2006;Boddicker et al., 2012).Therefore we defined four
phenotypic groups based on the pigs’ weight gain and viral load, and subsequently allocated the
pigs into one of the four groups. We used the 20k 70-mer oligonucleotide Pigoligoarray to assess
global differential gene expression in these pigs as a response to PRRSV infection. We tested
differential gene expression at seven different days post infection (DPI), from 0 DPI to 42 DPI.
To address differential gene expression using two colors microarrays several different
designs have been assessed (Kerr and Churchill, 2001;Dobbin and Simon, 2002;Rosa et al.,

64 
 

2005;Tempelman, 2005). A linear mixed model approach most often underlies the analysis of
data from these experiments (Wolfinger et al., 2001). Moreover, hypothesis testing in a mixed
model analysis of microarray data could be performed with different statistical tests (Wolfinger
et al., 2001;Cui and Churchill, 2003;Cui et al., 2005).
In chapter one, we used plasmode datasets to select an optimal analysis framework.
Specifically, we estimated Type I error rate of the alternative tests. Additionally, we assessed
power of tests that controlled Type I error rate close to the nominal significance level. Among all
possible combinations of different testing procedures, we identified three that control the Type I
error rate close to the nominal level: tests based on a moderated F statistic and significance
thresholds obtained by permutation, tests based on a classic F statistic and tabulated significance
thresholds, and tests based on a classic F statistic and significance thresholds obtained by
permutation. We showed that the random array effect model, as opposed to the fixed array effect
model, increased power of all tests evaluated. Furthermore, we identified the most powerful
procedure corresponding to tests based on a moderated F statistic and significance thresholds
obtained by permutation. Therefore, all further analyses on microarray data were performed
using a linear mixed model that considered array and biological sample as random effects and
the hypothesis testing was performed with a moderated F statistic and significance thresholds
obtained by permutation.
In chapter two, we addressed global differential gene expression using a blocked
reference design that allowed the modeling of individual biological variation in gene expression
as well as assessing baseline effects before infection (at 0 DPI). We observed evidence of
differential gene expression that could not be attributed to the experimental infection and
therefore such effect was accounted for when evaluating differential gene expression at later

65 
 

DPI. Interestingly, the use of a baseline differential expression correction has never been
reported before. In addition to accounting for the baseline effect, modeling the individual
biological variation in gene expression is particularly relevant in our case, since the longer term
objective is to identify genes for prediction of resistance/susceptibility of individual pigs to
PRRSV infection.
We focused on characterizing the differential expression of genes with a significant
interaction of weight gain and viral load at early DPI. From a disease management perspective,
being able to identify these genes would facilitate selection decisions. By considering early DPI,
we also assured that the effect being evaluated was exclusively due to an initial infection stage
and not to a rebound of the disease (Boddicker et al., 2012).
Subsequently, we identified relevant molecular gene networks and associated biological
functions. Cell death function was significantly associated with several gene networks enriched
for differentially expressed (DE) genes. We identified MERTK, GZMA and PYCARD as DE in
these networks. All these genes followed and overall pattern of under expression in high viral
load compared to low viral load pigs. Therefore, viral replication might be increased in high viral
load pigs since the defense mechanism mediated by cell death was reduced in efficiency.
Finally, we used this study as pilot data to inform the design of future time-course
transcription profiling experiments. We evaluated different scenarios for sample sizes an
sampling time-points, based on the objective of controlling costs while maximizing power.
Therefore, the combinations of sample sizes and sampling time-points were chosen to produce a
fixed number (240) of microarray slides. We concluded that the best scenario consists of
sampling at 4 and 7 DPI using 30 pigs per phenotypic group. A minimum of 20 pigs per group

66 
 

are needed for controlling Type I and Type II error rates to acceptable levels in most
comparisons.
1.2.

Future research directions

We acknowledge sample size and accurate modeling of variance components are key
factors in every experiment. Particularly, in this experiment we analyzed the data with a
powerful statistical framework. In a future experiment, sample size will be enlarged to allocate
27 pigs per phenotypic group. Given the flexibility of the microarray design we are using, data
from the 27 pigs can be easily merged with data from the 3 pigs per phenotypic group we have
already analyzed to reach 30 pigs per phenotypic group. We showed that 30 pigs per phenotypic
group is an adequate sample size to estimate differential gene expression in this type of
experiment. Furthermore, we expect that a next experiment will have enough power to uncover
new genes that are differentially expressed among phenotypic groups. This in turn, will translate
into identifying new molecular gene networks and biological functions that are affecting the
response to PRRSV infection in commercial pigs. The whole-blood genome profiling approach
of this study is useful to identify genes whose expression is associated with the viremia and
weight loss in pigs due to PRRSV infection. In the past there has been skepticism about using
whole blood for differential expression studies as opposed to look at differential expression in
specific cell types. We have shown that this approach is valuable and uncovers genes and
networks previously reported when looking at differential expression in tissues that are target of
the disease. The result is important because it encourages development of diagnostic tools using
transcriptional profiling of whole blood. In the longer term, the diagnostic tools developed could
be incorporated for breeding pigs with enhanced PRRS resistance. Ultimately, the results from

67 
 

this thesis will allow the design of more accurate profiling experiments that will translate into a
helpful tool for making decisions for improved PRRSV resistance.

68 
 

REFERENCES

69 
 

References
Boddicker, N., Waide, E.H., Rowland, R.R.R., Lunney, J.K., Garrick, D.J., Reecy, J.M., and
Dekkers, J.C.M. (2012). Evidence for a major QTL associated with host response to
Porcine Reproductive and Respiratory Syndrome Virus challenge. Journal of Animal
Science 90, 1733-1746.
Clapperton, M., Diack, A.B., Matika, O., Glass, E.J., Gladney, C.D., Mellencamp, M.A., Hoste,
A., and Bishop, S.C. (2009). Traits associated with innate and adaptive immunity in pigs:
heritability and associations with performance under different health status conditions.
Genetics Selection Evolution 41.
Cui, X.G., Hwang, J.T.G., Qiu, J., Blades, N.J., and Churchill, G.A. (2005). Improved statistical
tests for differential gene expression by shrinking variance components estimates.
Biostatistics 6, 59-75.
Cui, X.Q., and Churchill, G.A. (2003). Statistical tests for differential expression in cDNA
microarray experiments. Genome Biology 4.
Dobbin, K., and Simon, R. (2002). Comparison of microarray designs for class comparison and
class discovery. Bioinformatics 18, 1438-1445.
Doeschl-Wilson, A.B., Kyriazakis, I., Vincent, A., Rothschild, M.F., Thacker, E., and GalinaPantoja, L. (2009). Clinical and pathological responses of pigs from two genetically
diverse commercial lines to porcine reproductive and respiratory syndrome virus
infection. Journal of Animal Science 87, 1638-1647.
Done, S.H., Paton, D.J., and White, M.E.C. (1996). Porcine reproductive and respiratory
syndrome (PRRS): A review, with emphasis on pathological, virological and diagnostic
aspects. British Veterinary Journal 152, 153-174.
Galina-Pantoja, L., Mellencamp, M.A., Bastiaansen, J., Cabrera, R., Solano-Aguilar, G., and
Lunney, J.K. (2006). Relationship between immune cell phenotypes and pig growth in a
commercial farm. Animal Biotechnology 17, 81-98.
Holtkamp, D.J., Kliebenstein, J.B., Neumann, E.J., Zimmerman, J.J., Rotto, H., Yoder, T.K.,
Wang, C., Yeske, P., Mowrer, C., and Haley, C. (2012). Assessment of the economic
impact of porcine reproductive and respiratory syndrome virus on United States pork
producers. J. Swine Health Prod. (Manuscript in press).
70 
 

Kerr, K.M., and Churchill, G.A. (2001). Experimental design for gene expression microarrays.
Biostatistics 2, 183-201.
Lunney, J.K., Steibel, J., Reecy, J.M., Fritz, E., Rothschild, M.F., Kerrigan, M., Trible, B., and
Rowland, R.R.R. (2011). Probing genetic control of swine responses to PRRSV infection:
current progress of the PRRS host genetics consortium. BMC Proceedings 5, S30.
Neumann, E.J., Kliebenstein, J.B., Johnson, C.D., Mabry, J.W., Bush, E.J., Seitzinger, A.H.,
Green, A.L., and Zimmerman, J.J. (2005). Assessment of the economic impact of porcine
reproductive and respiratory syndrome on swine production in the United States. J Am
Vet Med Assoc 227, 385-392.
Rosa, G.J.M., Steibel, J.P., and Tempelman, R.J. (2005). Reassessing design and analysis of twocolour microarray experiments using mixed effects models. Comparative and Functional
Genomics 6, 123-131.
Tempelman, R.J. (2005). Assessing statistical precision, power, and robustness of alternative
experimental designs for two color microarray platforms based on mixed effects models.
Veterinary Immunology and Immunopathology 105, 175-186.
Wolfinger, R.D., Gibson, G., Wolfinger, E.D., Bennett, L., Hamadeh, H., Bushel, P., Afshari, C.,
and Paules, R.S. (2001). Assessing gene significance from cDNA microarray expression
data via mixed models. Journal of Computational Biology 8, 625-637.

71 
 

APPENDICES

72 
 

1. Appendix A
1.1.

Figure Legends

Figure 2.1. Distribution of P-values from BLN and lung datasets. The P-values resulted from
testing for differential expression with a moderated F statistic and significance thresholds
obtained by permutation within a mixed effects model. A) histogram of P-values from BLN
dataset, B) histogram of P-values from lung dataset, C) Uniform Q-Q plot of p-values for BLN
dataset, and D) Uniform Q-Q plot of P-values for lung dataset. Red x=y line in C and D
represents the expected p-values quantiles under the null distribution.
Figure 2.2. Realized versus nominal Type I error rate in plasmode datasets. A) Mixed
effects model, and B) Fixed effects model. Each line in the plots corresponds to P-values
computed for a specific combination of test statistic and significance thresholds. Blue lines
correspond to the classic F statistic and green lines to the moderated F statistic. Solid lines
correspond to tabulated significance thresholds and dashed lines to significance thresholds
obtained by permutation. Red x = y line represents values for which the realized Type I error rate
equals the nominal Type I error rate. P-values from the classic F statistic and significance
thresholds obtained by permutation lay under P-values from the moderated F statistic and
significance thresholds obtained by permutation.
Figure 2.3. Uniform Q-Q plot of p-values in plasmode datasets. A) Negative log transformed
P-values from a moderated F statistic and permuted significance thresholds obtained under a
mixed effects model, B) Negative log transformed P-values from a moderated F statistic and
permuted significance thresholds obtained under a fixed effects model, C) Negative log
transformed p-values from a classic F statistic and tabulated significance thresholds obtained
73 
 

under a mixed effects model, D) Negative log transformed P-values for a classic F statistic and
tabulated significance thresholds obtained under a fixed effects model. Each black line
corresponds to the distribution in one of the 34 plasmode datasets. The green line corresponds to
the overall distribution in all 34 plasmodes. The red x = y line corresponds to the expected
negative log transformed quantiles under the null distribution.
Figure 2.4. Proportion of rejected hypotheses in lung dataset. A) Mixed effects model, and
B) Fixed effects model. Each line in the plots corresponds to P-values computed for a specific
combination of test statistic and significance thresholds. Blue lines correspond to the classic F
statistic and green lines to the moderated F statistic. Solid lines correspond to tabulated
significance thresholds and dashed lines to significance thresholds obtained by permutation. Red
x = y line represents values for which the proportion of rejected hypotheses equals the nominal
level.
Figure 2.5. Uniform Q-Q plot in lung dataset for tests that controlled the Type I error rate.
Each line in the plots corresponds to negative log transformed P-values computed for a specific
combination of test statistic, significance thresholds, and linear model. Blue lines correspond to a
fixed effects model. Green lines correspond to a mixed effects model. Solid lines correspond to
the classic F statistic and tabulated significance thresholds. Dashed lines correspond to the
moderated F statistics and significance thresholds obtained by permutation. Red x = y line
corresponds to the expected negative log transformed quantiles under the null distribution

74 
 

Figure 2.1. Distribution of P-values from BLN and lung datasets.
For interpretation of the references to color in this and all other figures, the reader is referred to
the electronic version of this thesis.

75 
 

Figure 2.2. Realized versus nominal Type I error rate in plasmode datasets.

76 
 

Figure 2.3. Uniform Q-Q plot of P-values in plasmode datasets.

77 
 

Figure 2.4. Proportion of rejected hypotheses in lung dataset

78 
 

Figure 2.5. Uniform Q-Q plot in lung dataset for tests that controlled the Type I error rate

79 
 

2.

Appendix B
2.1.

Figure Legends

Figure 1. Scatterplot of weight gain versus viral load for all pigs in PHGC trial one. Each
dot represents a pig. Color shadings indicate the four different phenotpypic groups (HvHg,
HvLg, LvHg, and LvLg). Dark color indicates pigs that were classified into one of the groups.
Light color indicates pigs that were not classified because they lay in the boundary of the groups.
Circles indicate pigs that were selected for transcriptional profiling in this experiment.
Figure 2. Histogram of P-values for the four contrasts of interest at 0 DPI. For each contrast
of interest (HvHg vs. LvHg, HvLg vs. LvLg, HvHg vs. HvLg, and LvHg vs. LvLg), this figure
shows the distribution of P-values at 0 DPI. For a condition of no differential expression the
histograms should have a flat trend.
Figure 3. Uniform Q-Q plot for gene expression of all contrasts across 4 to 42 DPI after
correcting for 0 DPI estimated effect. This plot represents the quantiles of the empirical
distribution of P-values versus the expected quantiles of uniformly distributed P-values
(corresponding to the null hypothesis). The represented departure from the straight line y=x
indicates an excess of small P-values as compared to the expectation under the null hypothesis,
consistent with the alternative hypothesis of differential expression.

80 
 

Figure 3.1. Scatterplot of weight gain versus viral load for all pigs in PHGC trial one

81 
 

Figure 3.2. Histogram of P-values for the four contrasts of interest at 0 DPI

82 
 

Figure 3.3. Uniform Q-Q plot for gene expression of all contrasts across 4 to 42 DPI after
correcting for 0 DPI estimated effect

83 
 

Supplementary Table 1. Primers and probes used to amplify genes evaluated with qPCR.
Ensembl
Accesion
Number

Primer Foward

Primer Reverse

Probe

EPS15

ENSSSCT
000000042
84

AGCGAGGTTC
AGGATCTTCA
AG

GGGCCTTCTGCTC
ATCCA

CCCAGAAACAGCAA
GTACAGGAACTCCTT
G

EZR

ENSSSCT
000000044
87

GCAGCGGCAG
CTGATGA

GTCGTTGTGGGTC
CTCTTGTTC

ACCAACGAGCTGTC
CCAGGCCA

GRLF1

ENSSSCT
000000034
46

CCCATACAAC
ATGCAGATGG
AT

ACCTCCTTTAGGG
CATGTAGCTT

TGGAGGCACACAAA
ATCAACGACCG

GZMA

ENSSSCG
000000169
02

GGAGCTCACT
CGATAACCAA
GAAA

GCTTTAGAAGTTT
AAGGTCACCCTCA
T

TCCTTATCCATGCTT
TGACCAGGACACAC

IFNA1

GQ415055
(GenBank)

TCAGCTGCAA
TGCCATCTG

AGGGAGAGATTCT TGACCTGCCTCAGAC
CCTCATTTGTG
CCACAGCC

IL1A

ENSSSCG
000000080
90

CTGAAGAAGA
GACGGTTGAG
TTTAAA

AAGTTGTATTTCA
TGTTGCTCTGGAA

CAGAAGAAGAAATC
ATCAAGCCCAGATC
AGC

IL8

ENSSSCG
000000089
53

CCGTGTCAAC
ATGACTTCCA
A

GCCTCACAGAGAG
CTGCAGAA

CTGTTGCCTTCTTGG
CAGTTTTCCTGC

IRF1

ENSSSCG
000000142
77

AATCCAGCCC
TGATACCTTCT
CT

GGCCTGTTCAATG
TCCAAGTC

TGCCTGATGACCAC
AGCAGCTACACA

ITGB7

ENSSSCG
000000002
57

TCGCAGCCCA
GAGTTTGACT

GGCACTGGTGACA
GCAAAGA

TCAGGTAGCCCAGG
CCCTCTCTGC

JARID2

ENSSSCT
000000011
55

CCTTTCTCTGC
CTTCGAGGTT

CGTCCTGAGAGCT
TCCGAAAT

TCCTGCGCTGCCCAA
CAGCA

MERTK

ENSG0000
0153208

GGAAAGATGG
GAAGGAATTG
C

TCATCTTACAGAT
ATATGACCCATTG
TCT

TTCAGCATAACCAGT
GTGCAGCGTTCA

Gene

84 
 

Supplementary Table 1 (cont’d)
PYCARD

ENSG00
0001034
90

CAAACCAGCA
CTGCACTTCGT

CAGCCCGTCCACG
TCTGT

RASGRP
1

ENSSSC
G000000
04791

GGAGAATAAA
GAATCCCTCA
TAAAATCA

TTATTTCCTGTTCC CTCCGTCACCTCAGA
AGCTCTTGGT
CTCCCCACC

RPL32

ENSSSC
G000000
27637

GGAAAGATGG
GAAGGAATTG
C

TCATCTTACAGAT
ATATGACCCATTG
TCT

TTCAGCATAACCAGT
GTGCAGCGTTCA

SLADQA1

ENSSSC
G000000
01456

GGTTCCTGAG
GTGACTGTGT
TT

GACAGAGTGCCCG
TTCTTCAA

CTGGGTCAGCCCAA
CACCCTCAT

SLA-DRA

ENSSSC
G000000
01453

CCCGCCAGTG
GTCAATGT

AGTGGAACTTGCG
GAAAAGG

AGGAGTGTCAGAGA
CAGTCTTCCTGCCC

85 
 

CGGGCAGCCCTCAT
CTCAAGGG