This is to certify that the dissertation entitled IMPROVING EXPERIMENTAL DESIGN AND STATISTICAL INFERENCE FOR TRANSCRIPTIONAL PROFILING EXPERIMENTS presented by Juan Pedro Steibel has been accepted towards fulfillment of the requirements for the PhD degree in Animal Science. Major Professor's Signature. Date. MSU is an affirmative-action, equal-opportunity employer.

IMPROVING EXPERIMENTAL DESIGN AND STATISTICAL INFERENCE FOR TRANSCRIPTIONAL PROFILING EXPERIMENTS

By Juan Pedro Steibel

A DISSERTATION submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY, Department of Animal Science, 2007

ABSTRACT

IMPROVING EXPERIMENTAL DESIGN AND STATISTICAL INFERENCE FOR TRANSCRIPTIONAL PROFILING EXPERIMENTS

By Juan Pedro Steibel

Functional genomic studies resort to transcriptional profiling experiments in order to gain insights into the function of genes and their patterns of expression and regulation. The most commonly used techniques for gene expression studies in animal science are two-color microarrays and quantitative reverse transcription polymerase chain reaction (qRT-PCR). The overall objective of this dissertation was to increase the efficiency and statistical power of these studies by further optimizing experimental designs and statistical analysis methodology for microarray and validation (qRT-PCR) studies. The first study addressed the comparison of alternative reference designs, including a potentially more efficient variant called the blocked reference design (BRD). The relative efficiency of the various designs was shown to depend on the number of treatments and the relative magnitude of biological and residual variability. All designs were practically equally efficient when the biological variance was negligible relative to the residual variance; however, with a large number of treatment comparisons and a larger biological-to-residual variance ratio, the BRD can be 1.5 to 2 times more efficient than more traditional alternatives. The second study compared models based on using either log-intensities or log-ratios as the response variables in linear model analyses, demonstrating their inferential equivalences and differences. The results indicated that the main difference between the two models depended primarily on the ability of the corresponding design to recover inter-slide information under a log-intensity model compared to a log-ratio model. The amount of this recovery also depended on the relative magnitudes of the slide, biological and residual variances. The third study presented a novel method for the analysis of relative quantification RT-PCR data. Using linear mixed models, we accounted for hierarchical replication due to more than one gene assayed in each sample and multiple repeated measures in each sample-gene combination. Extensive simulations showed that the proposed model controlled the Type I error rate at the nominal level and yielded better power than traditional analysis methods. The last chapter focused on jointly optimizing microarray and qRT-PCR validation experiments in order to maximize sensitivity while controlling the false discovery rate (FDR) within a two-stage testing framework.
This optimization was based on partitioning a set of biological samples into two groups to be separately utilized for the microarray and validation steps. The results indicated that most of the samples (typically more than 60%) should be assigned to the microarray experiment. Even though the original optimization assumed independent genes, known variances, and uniform effect sizes, we showed that the results are valid for moderate departures from those assumptions. The problem of designing an adequate validation experiment, conditional on an existing microarray experiment, was also studied. We found that if independent samples are used for the validation, a limited sample size and a liberal significance level (>0.15) could be used to properly control FDR after the second-stage test.

Copyright by JUAN PEDRO STEIBEL 2007

To my daughter, Maria Eva

ACKNOWLEDGEMENTS

In the first place, I would like to acknowledge my family, especially my parents, Pedro and Silvia, and my daughter, Maria, for their constant support and understanding. Despite the distance, Dad and Mom have always been close to me. Without their support and encouragement, it would have been hard to keep up. Maria is probably the most wonderful child that I could have dreamed of, and her patience and creativity have contributed a lot to my life and provided inspiration to finish this program.

To the members of my committee, Drs. Rob Tempelman, Guilherme Rosa, Paul Coussens, R.V. Ramamoorthi and Cathy Ernst, for the encouragement, constructive criticism and guidance that they provided. I would like to say thanks especially to Dr. Coussens for allowing me to work with part of his group and for providing data, advice, ideas and help whenever I needed them.

Guilherme and Rob definitely deserve a dedicated paragraph. From the beginning, Guilherme trusted me, and gave me space and freedom to propose and develop ideas, while at the same time he offered wise advice when he considered that I needed some help. And he was right every single time. But above all, I am very grateful to him for offering me his friendship and support in every single issue of my academic and personal life. Rob Tempelman "adopted me" towards the end of my program but treated me as if I were his student from the beginning. He was very considerate of the work done and agreed to keep the course of the research that I had started. His generosity and support have made the difference to me. Both Rob and Guilherme, at this point, are not only academic role models for me, but they are also models as friends and parents. The things I learned from them are so numerous, valuable and diverse that I will not attempt to list them here. Enormous thanks go to both of them.

The experience that I gained by collaborating with other fellow students and researchers is as valuable as what I learned through my own research. Among the people I'd like to thank especially for this are Rosangela Poletto, Mahmoud Romehi, Osman Patel, Nora Bello and Patricia Almeida. It has been a pleasure to work with them, learn from them and transfer to them part of my knowledge. The fellow students and researchers in statistical genetics, Fernando Cardoso, Shirley (Lan Xiao) and Ashok Ragavendran, were always ready to help with computational resources, share a good idea or get involved in a fruitful discussion. Their contribution has made a difference on more than one occasion. Last, but not least, thanks to my friends, especially to those who have been close to me during these last four years.
Federico Aime, Pablo Ross, Nora Bello and Patricia de Almeida are among the most emblematic of the friends that I ever made. And I met them here, during my PhD program. These folks accepted me (and my circumstances) as a whole package. Their unconditional generosity and support have helped me during some very rough times of my life and I will always have a special place for them in my memories and in my heart. “If the pages of this book contain some successful verse, the reader must excuse me the discourtesy of having usurped it first. Our nothingness difiers little; it is a trivial and chance circumstance that you should be the reader of these exercises and 1 their author. " Jorge Luis Borges. vii TABLE OF CONTENTS LIST OF TABLES .............................................................................................................. x LIST OF FIGURES .......................................................................................................... xii INTRODUCTION ............................................................................................................... l 1. Experimental design and analysis of two-color microarray experiments. .............. 2 2. Analysis of relative quantification qRT-PCR data. ................................................. 5 3. Design of screening and validation experiments. .................................................... 7 4. General hypothesis ................................................................................................... 8 5. Specific Aims ........................................................................................................... 9 CHAPTER ONE ................................................................................................................ l 1 I. On reference designs for microarray experiments ................................................. 12 1. Introduction ........................................................................................................... 14 2. The traditional reference designs .......................................................................... 15 2.1. General layouts .............................................................................................. 15 2.2. Required resources: arrays and mRNA samples ........................................... 16 2.3. Linear models and contrast variance ............................................................. 17 3. An improved reference design .............................................................................. 19 3.1. Layout ............................................................................................................ 19 3.2. Required resources ........................................................................................ 20 3.3. Statistical efficiency ...................................................................................... 2O 4. Results and discussion .......................................................................................... 22 4.1. Biological and technical replication .............................................................. 22 4.2. Relative efficiency of traditional reference designs ...................................... 22 4.3. An alternative reference design ..................................................................... 23 4.4. Technical and logistic issues ......................................................................... 23 4.5. 
Concluding remarks ....................................................................................... 25 CHAPTER TWO ............................................................................................................... 33 11. Comparison of log-intensity and log-ratio linear models for two color microarrays. ................................................................................................................. 33 1 Introduction ........................................................................................................... 34 2 Linearly transforming a log intensity model to a log ratio model. ....................... 35 3. Ratio models for specific designs ......................................................................... 36 4 Recovery of inter-array information. .................................................................... 45 5 Discussion ............................................................................................................. 49 CHAPTER THREE ........................................................................................................... 60 HI. Linear mixed models for the analysis of relative quantification RT-PCR data ..... 6O 1. Background ........................................................................................................... 61 2. Results ................................................................................................................... 64 viii 2.1. Motivating example ....................................................................................... 64 2.2. Testing and estimating differential expression .............................................. 66 2.3. Validation through simulation ....................................................................... 67 2.4. Model checking and result comparison in experimental datasets ................. 70 2.5. Comparison of results with 2AMr methods ................................................... 72 3. Discussion ............................................................................................................. 73 4. Conclusions ........................................................................................................... 76 5. Methods ................................................................................................................ 76 5.1. Materials and RT-PCR reactions ................................................................... 76 5.2. Model derivation ........................................................................................... 77 5.3. Hypothesis testing and estimation ................................................................. 78 5.4. Data and response variable ............................................................................ 79 5.5. Programs for analysis .................................................................................... 79 5.6. Simulation study ............................................................................................ 79 5.7. Model selection ............................................................................................. 80 CHAPTER FOUR .............................................................................................................. 88 IV. Design and analysis of two-stage experiments for transcription profiling ............ 88 1. Introduction ........................................................................................................... 89 2. 
Methods ................................................................................................................ 90 2.1. Type I error rate in a two-stage single test. ................................................... 91 2.2. FDR and sensitivity in two-stage tests .......................................................... 92 3. Results ................................................................................................................... 94 3.1. Sample size and sample allocation ................................................................ 94 3.1.1. Known variance and constant effect sizes across independent genes...94 3.1.2. Distributed effect sizes .......................................................................... 97 3.1.3. Unknown variances ............................................................................... 98 3.1.4. Implementing two stage tests ................................................................ 98 3.1.5. Correlated genes .................................................................................. 100 3.2. Design of a validation experiment ............................................................... 101 3.2.1. Sample size calculation of a (independent) validation experiment ..... 101 3.2.2. Non-independent sample sets in screening and validation experiments ......................................................................................... 103 3.3. Estimation of effect sizes and correlation for microarray and qRT-PCR experiments .................................................................................................. 105 4. Discussion ........................................................................................................... 107 CHAPTER FIVE ............................................................................................................. 129 General discussion ........................................................................................................... 129 1. Objectives revisited and their impact in animal functional genomics. ........................ 129 2. Future research directions ............................................................................................ 135 APPENDIX ONE ............................................................................................................. 132 REFERENCES ................................................................................................................ 147 ix LIST OF TABLES Table 1.1. Number of biological replicates in non-control treatments (r), for a given number of arrays (n) and treatments (K), for each alternative reference design ................ 16 Table 1.2. Total number and specific variances of direct and indirect contrasts between treatments groups, and overall average variance of contrasts for replicated (RRD), classical (ClRD) and common (CRD) reference designs. ................................................. 17 Table 1.3. Resources required in each of the four variants of reference design for comparing K treatments, with r biological replicates for the non-control groups. q: necessary aliquot of mRNA for each hybridization ........................................................... 20 Table 2.1. Expected mean squares of relevant terms in models [4] and [5]. In both cases, the model [4] is assumed to be the data generation model. 0'22 Is the residual variance in the log-intensity scale, 0'12; is the individual variance in the log-intensity scale. .............. 52 Table 2.2. 
Relative efficiency of intra-inter-slide (random array) analysis compare to the intra-slide (fixed array) analysis in the loop design (Figure 4). 0'3 Is the residual variance in the log-intensity scale, 0% is the individual variance in the log-intensity scale. .......... 53 Table 3.1 Properties of the point estimates of the fold change. ......................................... 81 Table 3.2. Properties of the interval estimates of the fold change. .................................... 82 Table 3.3. Properties of the hypothesis tests. ..................................................................... 83 Table 3.4. Experimental data for model checking and result comparison. ........................ 84 Table 4.1 Outcomes of m=m0+m1 two-stage tests. A represents accepted hypotheses, R represents rejected hypotheses, the sub index 0 indicates true null hypotheses and 1 indicates true alternative hypotheses. The super index indicates the stage of the test ..... 112 TABLE 4.2 Error measures in two stage tests based on the quantities presented in Table 4.1 ..................................................................................................................................... 1 13 Table 4.3. Sensitivity and FDR for two stage tests based on correlated test statistics. m=0.99, J”=0.6, J“=0. 72, n1=n2=25 ........................................................................... 114 xi LIST OF FIGURES Figure 1.1: Alternative reference designs with three treatments (A, B and C) and 12 arrays: a) Common reference design (CRD) with four replicates (subindexes); reference sample (R) is the same in all arrays; b) Classical reference design (ClRD) with four replicates (subindexes) in 12 arrays; reference sample (R) is replicated; and c) Replicated reference design (RRD), in which six replications of treatments B and C are hybridized together with independent replicates of a control treatment (A). ...................................... 15 Figure 1.2. Relative efficiency of the replicated reference design (RRD) as compared to the common reference design (CRD), considering: a) the same number of replicates (r), and b) the same number of arrays (n). X-axis is the log2 of the variance ratio; curves correspond to different numbers of treatments (K). ........................................................... 18 Figure 1.3. Efficiency of the: a) common reference design (CRD), and the b) replicated reference design (RRD), relative to the classical reference design (ClRD). a: fixed number of biological replicates, I: fixed number of arrays ............................................... 19 Figure 1.4. Blocked reference design (BRD) with 3 treatments (A, B and C) and 6 replicates (subindexes) in 12 arrays. Each test sample is hybridized together with a replicate from the control treatment (A); the same set of replicates of the control group is used with the other treatments. .......................................................................................... 19 Figure 1.5. Relative efficiency of the blocked reference design (BRD) compared to: a) the replicated reference design (RRD), and to b) the common reference design (CRD). A fixed number of replicates (r) is considered. X-axis is the log2 of the variance ratio. Each curve corresponds to a different number of treatments (K). .............................................. 21 Figure 2.1. Common reference design. Bk, is the 1m biological replicate within the km group. 
................................................................................................................................. 54 Figure 2.2. Complete block design with dye swap . Bk, is the 1m biological replicate within the k’h group. ...................................................................................................................... 55 Figure 2.3. Dye swap design with technical replication. B,—,- is the j'h biological replicate within the i'h group. ............................................................................................................ 56 xii Figure 2.4. Connected loop design. Bk, is the 1m biological replicate within the k'h group 57 Figure 2.5. Split-plot design. B“, is the 1'” biological replicate within the k’h group of factor T receiving the s‘h level of factor S. The same experimental unit (By) is assigned to two levels of the factor S (subplot factor) in the same array. Independent experimental units are used for the two levels of the factor T (whole plot factor) .................................. 58 Figure 2.6. Relative contrast variance under different assumptions for the array effect in reference designs ................................................................................................................ 59 Figure 3.1.Fold change estimates ....................................................................................... 85 Figure 3.2.Histograms of p-values under null hypothesis. ................................................ 86 Figure 3.3.Result comparison between ACT (vertical axis) and linear mixed model (horizontal axis). ................................................................................................................ 87 Figure 4.1. a) Sample allocation and b) sensitivity of optimal designs as a function of the relative effect size between the two stages. The lines represent different proportions (0.005, 0.01, 0.015, 0.02 and 0 =1 stage) of selected hypotheses to be carried to the second stage. The results were calculated for: 712F099, #005, (2”)=0.4, and n=170 (see text). ................................................................................................................................. 115 Figure 4.2. Distribution of the effect size for stage 1. a) constant, b) symmetric. c) decreasing. The three distributions were scaled to have mean 1.0 or 0.4. ....................... 116 Figure 4.3. Sample size allocation and sensitivity as function of the distribution of the first and second stage effect size, average 6(1):], n=28. 710=0.99,f=0.05. The single stage sensitivity corresponds to a decaying distribution. .......................................................... 117 Figure 4.4. Sample size allocation and sensitivity as function of the distribution of the first stage effect size, average @(I)=0.4, n=l7O (c,d). ”0:099, f=0.05. The single stage sensitivity corresponds to a decaying distribution. ......................................................... 118 xiii Figure 4.5. Distribution of the ratio of the effect sizes. a) constant, b) symmetric. c) decreasing. ....................................................................................................................... 1 19 Figure 4.6. Sample size allocation (a) and sensitivity (b) as function of the distribution of the first stage effect size for different effect size rations (lines). 712F099, f=0.05 ........... 120 Figure 4.7. 
Sample allocation (a) and sensitivity (b) of optimal designs as a function of the relative effect size. $099,120.05, al’el, n=30. .................................................... 121 Figure 4.7. Distribution of the sensitivity and false discovery rate obtained from simulations based on different procedures to select the significance cutoff (am). The points indicate the mean whereas the line is the expected value obtained from the optimization algorithm. .................................................................................................... 122 Figure 4.8. Distribution of a) sensitivity and b) false discovery rate obtained from simulations for different correlation structure in the data. The x-axis numbers indicate the correlation coefficient whereas the subindex on the x-axis values indicates the block size. The points within each box indicate the mean whereas the line reflects the expected value obtained from the optimization algorithm assuming independence. ............................... 123 Figure 4.9. Sensitivity of two-step tests for correlated hypotheses as function of the sample size in the first step. N=200 simulation with m=0.99, #005, c3”)=1, n=30, R” )/m=0.01. The arrow indicates the optimal sample allocation obtained assuming uncorrelated tests. 0: p=0.4, block size=40, I: p=0.4, block size=10. ............................ 124 Figure 4.10 Minimum number of replicates per treatment in a validation ex eriment as function of proportion of null hypotheses (FDRI 1)) and effect sizes. RI =25 genes putatively differentially expressed at different levels of FDR , target sensitivity = 0.9, FDR after validation = 0.05. ............................................................................................ 125 FIGURE 4.11. Power of a single two-test for different correlations (0, 0.56, and 0.66) between the first and second stage data as a function the ratio of second stage to first stage effect sizes. The effect size of the first stage (8“) is 0.6 whereas the number of replicates per treatment is 25 for each test. ..................................................................... 126 Figure 4.12. Box-Plot of the microarray and qRT-PCR based effect size estimates and their ratios. ....................................................................................................................... 127 xiv Figure 4.13. Histogram of the estimated correlation coefficient between microarray and qRTPCR measured expressions. ...................................................................................... 128 XV INTRODUCTION Transcriptional profiling is a key aspect of functional genomics as it can provide invaluable insights into the function of genes by elucidating their pattern of expression or regulation. Drawing meaningful inferences from these studies given resource and funding constraints depends on optimizing the experimental designs and statistical analysis models and methods according to restrictions imposed by the currently available technologies. The most commonly used techniques for gene expression studies are microarrays and quantitative reverse transcription polymerase chain reaction (qRT-PCR). A two-color microarray, such as a cDNA or long-oligonucleotide microarray, can measure the relative expression of thousands of genes simultaneously between a pair of samples. 
This expression profiling technology seems particularly useful for individuals studying a wide range of less well represented organisms in their research programs and thus has become the platforms of choice for high throughput expression profiling in livestock functional genomics. Some of the challenges using these two color arrays pertain to experimental design (sample size and optimal sample allocation for a given experimental setup) and statistical analyses. In general, the statistical issues of a microarray experiment include the following items (Allison et a1., 2006): experimental design, preprocessing, inference and validation. Many statisticians have recently played a key role in the development of experimental designs and statistical methods for microarray experiments. For example, mixed models that account for multiple sources of variation in a statistical analysis have greatly improved the inferential power of two color microarrays (Cui and Churchill, 2003). However, in some areas, statistical developments are still sorely lacking. For example, there has been no extensive work on acceptable statistical practice for qRT-PCR validation of microarray results, even though such validation is encouraged or even required by many journals. Now qRT-PCR is a gene expression profiling technique that is usually assumed to be more accurate and precise than microarrays (largely due to increased dynamic range) and is commonly used as a validation tool as aforementioned. The characteristics of qRT- PCR assays make them more suited to simultaneously test a lower number of genes from a larger set of samples, and consequently are more limited as a high throughput screening tool compared to microarrays. Nevertheless, as with microarrays, qRT-PCR data presents interesting challenges for inferring upon differential gene expression, including aspects of choosing appropriate experimental designs and declaring statistical significance on differential expression, particularly when this methodology is used for validation of microarray results. 1. Experimental design and analysis of two-color microarray experiments. For two-color or (two-channel) microarrays, the experimental design characterizes how particular samples are paired within each slide and the total number of biological replicates used in the experiment. A two-color microarray experiment actually includes two blocking factors. One factor is the array (or slide) that can accommodate two samples (i.e. one per channel) whereas the other factor is the dye labeling that typically involves assignment of one of two states or colors to each sample within an array. Given the restriction of allowing a comparison of only two treatments within slides for a two channel array, a number of incomplete block (e.g. loop) designs have been proposed when several treatments are of interest (Dobbin and Simon, 2002; Kerr and Churchill, 2001). Despite it not being the most efficient design (Dobbin and Simon, 2002; Kerr and Churchill, 2001; Tempelman, 2005), the reference design is still pervasively used for two color microarray studies. In a common reference design, aliquots from a uniform single reference sample are always hybridized with a different sample from each experimental group on each array. There are several possible deviations on this design, e.g. biological vs. technical replicates from the experimental groups and the labeling of reference with one or both dyes across arrays. 
Now whereas a common reference design consistently uses a single sample as a reference, a classical reference design uses different biological replicates from the reference group across all arrays (Tempelman, 2005). While the reference is not of interest per se, the efficiency might be improved by using the reference as one of the treatments of interest in some settings. Under those circumstances with the correct sample allocation to arrays, the reference design has the potential to be a very powerful and flexible experimental layout. Several authors have compared reference designs to other alternatives (Dobbin and Simon, 2002; Kerr, 2003a; Kerr and Churchill, 2001; Tempelman, 2005; Yang and Speed, 2002) in two color systems. However, competing deviations on reference designs have not been considered exhaustively. Furthermore (Kerr and Churchill, 2001; Yang and Speed, 2002), researchers often do not make the distinction between different levels of variability that arise in these designs. Appropriately accounting for such hierarchical replication in two color microarray experiments has been shown to be crucial to obtain appropriate inferences (Rosa et al., 2005). There are several experimental situations that can benefit from the improvement of reference designs. One such situation is a screening experiment whereby several treatments are compared to a control upon which significantly expressed genes are selected for further study. Likewise, if the number of treatments or classes to be compared is not known a priori, the reference design is extendible provided there is a sufficiently large source of reference sample to be repeatedly drawn from for future hybridizations with new experimental samples. Typically, fluorescence intensity data (e.g. Cy3 and Cy5) from reference designs is further reduced to a ratio of these two values for each spot on an array before subsequent statistical analysis. The common reference design lends itself, in particular, to the quotient of fluorescence intensities between the “unknown” sample and the reference sample. In general, these ratios are subsequently logarithmically transformed to facilitate data normalization and distributional normality assumptions as required for parametric statistical methods; furthermore, ratios are easily compared across arrays. Analyses based on log-ratios are simpler for designs with one level of replication (such as a common reference design); however, log-ratio models are not easy to generalize to more complex experimental designs with hierarchical or technical levels of replication. Furthermore, even for a reference design, the use of ratios may lead to the loss of statistical information, and this should be investigated further. Linear fixed and mixed effect models based on the analysis of log intensities have recently gained popularity (Kerr et al., 2000; Rosa et al., 2005; Smyth, 2004; Wolfinger et al., 2001). These models offer a flexible way to account for multiple sources of variation that arise from hierarchically replicated microarray experiments. Log-ratio and log-intensity linear models have been compared only for simple experimental designs (Kerr, 2003b). Moreover, there is very limited research on how to implement linear mixed models for log-ratio data in order to account for multiple layers of replication (Smyth et al., 2005). 
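To make the relationship between the two modelling strategies concrete, consider a minimal gene-specific sketch; the notation here is purely illustrative and is not taken verbatim from any chapter of this dissertation. For the normalized log-intensity y_ijk observed on array i, dye channel j, for a sample from treatment k, a simple fixed-effects log-intensity model is

y_ijk = μ + A_i + D_j + T_k + ε_ijk,

and differencing the two channels of array i gives the induced log-ratio model

r_i = y_i1k − y_i2k' = (D_1 − D_2) + (T_k − T_k') + (ε_i1k − ε_i2k'),

in which the array effect A_i cancels. When arrays are treated as fixed effects the two analyses yield the same treatment contrasts; treating arrays (and biological replicates) as random in the log-intensity model is what allows inter-slide information to be recovered, and it is in such hierarchically replicated settings that the two strategies can differ.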
Given the wider availability of statistical analysis software for microarray experiments that usually accommodates only one or the other of the two different expression measures (log-ratios or log-intensities), a better understanding of the circumstances under which the two models will yield the same results, and of the limitations of each analysis, would provide invaluable insights into the interpretation and comparison of results from independent investigations. Also, a better understanding of the extent of the loss of information incurred when resorting to a simpler analysis strategy would be important when evaluating the ramifications of such a strategy in a particular experiment. A general computational framework to assess these issues would contribute to a better understanding of the similarities and differences of alternative analysis strategies.

2. Analysis of relative quantification of RT-PCR data.

The primary response variable for gene expression from qRT-PCR consists of the fractional cycle number (CT) at which threshold intensity is reached, which is directly proportional to the negative logarithm of the mRNA concentration. The corresponding constant of proportionality may be estimated from a standard curve, thereby providing absolute quantification of the mRNA present in a sample. For most comparative transcriptional profiling experiments, however, the relative level between samples rather than actual abundance is sufficiently informative for comparing several treatments. Relative quantification aims at comparing relative levels of target mRNA across samples by standardizing the total RNA quantity to an internal control gene assumed to be constant in its expression across different experimental conditions (Pfaffl, 2001). The most common summary statistic for this type of quantification is the ΔΔCT method (Livak and Schmittgen, 2001). This measure is easily computed for simple experimental designs where several groups are compared to a single control group. A series of ad hoc parametric or non-parametric approaches (Pfaffl et al., 2002) are used to obtain p-values and standard errors associated with the treatment comparisons of interest using the ΔΔCT measure. Nonetheless, more general statistical models have not been specified to analyze this measure in more efficient and elaborate designs.

In more complex experimental designs, such as those that commonly appear in animal functional genomics, hypotheses that are more complex than all pairwise comparisons are likely to be of interest. For example, two-factor interaction hypotheses, time trend contrasts and hypotheses related to the significance of variance components are increasingly common. However, there has been limited formal development of statistical models and methods adapted to these questions using relative quantification of RT-PCR data (Fu et al., 2006). Linear mixed models offer a promising approach to the statistical analysis of qRT-PCR data, as they have for microarray data. Even though linear models have been used for qRT-PCR data in the past, these uses have generally been restricted to control gene selection (Szabo et al., 2004), or they do not consider multiple random sources of variability (Cook et al., 2004). A general formulation of a linear mixed model for relative quantification of RT-PCR data will provide a flexible and powerful analysis methodology to accommodate arbitrarily complex designs and allow researchers to draw accurate inferences from hierarchically replicated experiments.
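As a concrete reference point for the ΔΔCT summary described above, the following minimal sketch computes the statistic for a single treated/control sample pair. All CT values and gene labels are invented for illustration, and a 100% amplification efficiency (doubling per cycle) is assumed.

```python
# Minimal sketch of the delta-delta-CT (ddCT) summary of Livak and Schmittgen (2001).
# The CT values below are hypothetical; 100% amplification efficiency is assumed,
# so the amount of template is taken to double with every PCR cycle.

ct_target_treated, ct_control_gene_treated = 22.1, 17.3   # treated sample
ct_target_control, ct_control_gene_control = 24.6, 17.4   # control (baseline) sample

# Normalize the target gene to the internal control gene within each sample.
delta_ct_treated = ct_target_treated - ct_control_gene_treated
delta_ct_control = ct_target_control - ct_control_gene_control

# Compare the normalized values between the treated and control samples.
delta_delta_ct = delta_ct_treated - delta_ct_control

# Lower CT means more starting template, hence the negative exponent.
fold_change = 2.0 ** (-delta_delta_ct)
print(f"ddCT = {delta_delta_ct:.2f}; estimated fold change = {fold_change:.2f}")
```

In the linear mixed model framework developed in Chapter Three, contrasts of this form are instead estimated and tested within a model that also accommodates multiple genes per sample and repeated measures within each sample-gene combination.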
3. Design of screening and validation experiments.

A general consensus has been established on the necessity to independently validate results from microarray experiments (Allison et al., 2006), although recently the utility of such validation has been questioned (Rockett, 2003; Rockett and Hellmann, 2004). Most genomicists will attempt to validate results from a microarray assay using qRT-PCR, immunohistochemistry or protein analysis techniques. Such validation is typically conducted for only a small fraction of the genes declared to be differentially expressed by a microarray experiment. The most common practice consists of selecting a few genes for qRT-PCR using mRNA from the same samples included in the microarray experiment. This practice of using the same samples again has been criticized because it provides only "technical validation" and does not preclude the possibility that a few unusual samples might have been selected by chance for the study (Allison et al., 2006), thereby biasing statistical tests for both microarrays and qRT-PCR in the same direction. Even though a better recommendation would be to use different samples for validation by qRT-PCR that are completely independent from the samples used in the microarray experiment, there are as yet no clear guidelines on how to assess appropriate sample size calculations and multiple testing considerations for a second-stage validation.

Multiple testing considerations for controlling false positive rates are generally ignored in qRT-PCR validation experiments because of the limited number of tests (i.e. selected genes) considered in the validation step. However, qRT-PCR is being increasingly automated such that there is now more research where several dozen genes are assayed simultaneously with these techniques (Perreard et al., 2006; Szabo et al., 2004). With many simultaneous multiple tests, improper attention to experiment-wise error rates could yield an unacceptably large number of false positive results even in the validation stage. For example, the conventional comparison-wise Type I error rates that were eventually realized to be too liberal for gene-specific tests in large microarray experiments might likewise not be suitable for a validation experiment. For a validation experiment using qRT-PCR, controlling the false discovery rate may be more useful to control the rate of false positives without unduly sacrificing sensitivity, as has been demonstrated for microarray experiments (Verhoeven et al., 2005). In other words, the decision rules for validating genes as being differentially expressed using qRT-PCR need to be more carefully studied. There is also a need to evaluate experimental designs for validation experiments and to provide a framework to help guide researchers in designing initial screening (microarray) and subsequent validation (qRT-PCR) experiments.

4. General hypothesis

The efficiency and power of comparative transcriptional profiling can be increased by further optimizing experimental designs and statistical analysis models and methods for microarray and validation (qRT-PCR) studies.

5. Specific Aims

This dissertation intended to propose more efficient experimental designs and statistical analysis models and methods for expression profiling experiments. In particular, linear fixed and mixed effects models are developed for the analysis of both microarray and qRT-PCR data, and attention is particularly directed to the modeling of multiple levels of variation.
The questions that will be addressed are of interest to experimental genomicists, and the proposed developments will be potentially useful not only to researchers working in functional genomics research with animals, but also with other model organisms. More efficient designs for the microarray experiments coupled with more powerful second- stage designs and analysis for the validation experiments will facilitate greater sensitivity and specificity for discovering genes that are differentially expressed between treatments of interest. The overall aim of this dissertation was to investigate optimization possibilities for the design and statistical analysis of expression profiling experiments, including the two stage process of screening and further validating differentially expressed genes using linear model methods. The specific objectives were: 1) To compare alternative reference designs for statistical efficiency of two color microarray experiments considering multiples sources of variation. 2) To investigate the ramifications of log-ratio versus log-intensity modeling in two color microarrays using linear mixed effects models 3) To develop linear mixed models for the analysis of relative quantification of RT- PCR data. 4) To propose a general framework for determining the false discovery rate and sensitivity of gene expression studies when jointly designing microarray screening experiments linked to subsequent and selective qRT-PCR validation. 10 CHAPTER ONE Juan P. Steibel and Guilherme J .M. Rosa. 2005. On reference designs for microarray experiments. Statistical Applications in Genetics and Molecular Biology. (4) 1:A36. Reprinted with permission from the publisher, The Berkeley Electronic Press, @2005. Originally published in Statistical Applications in Genetics and Molecular Biology, available at http://www.bepress.com/sagmb/vol4/iss1/art36. 11 Statistical Applications in Genetics and Molecular Biology Volume 4, Issue 1 2005 Article 36 On Reference Designs For Microarray Experiments Juan P. Steibel* Guilherme J. M. RosaJr ‘Department of Animal Science, Michigan State University, steibelj@msu.edu IDepartment of Animal Science and Department of Fisheries & Wildlife, Michigan State Uni- versity, rosag@msu.edu Copyright ©2005 by the authors. All rights reserved. No part of this publication may be re- produced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher, bepress, which has been given certain exclusive rights by the author. Statistical Applica- tions in Genetics and Molecular Biology is produced by The Berkeley Electronic Press (bepress). http://www.bepress.com/sagmb 12 On Reference Designs For Microarray Experiments* Juan P. Steibel and Guilherme J. M. Rosa Abstract We compare four variants of the reference design for microarray experiments in terms of their relative efficiency. A common reference sample across arrays is the most extensively used variation in practice, but independent samples from a reference group have also been considered in previous works. The relative efficiency of these designs depends of the number of treatments and the ratio between biological and technical variances. Here, we propose another alternative of reference structure, denoted by blocked reference design (BRD), in which each set (replication) of the treated samples is co-hybridized to an independent experimental unit of the control (reference) group. 
We provide efficiency curves for each pair of designs under different scenarios of variance ratio and number of treatment groups. The results show that the BRD is more efficient and less expensive than the traditional reference designs. Among the situations where the BRD is likely to be preferable we list time course experiments with a baseline and drug experiments with a placebo group.

KEYWORDS: Microarrays, Experimental design, Reference design, Relative efficiency

*This project was supported in part by National Research Initiative Grant no. 2004-35604-14580 from the USDA Cooperative State Research, Education, and Extension Service.

1 INTRODUCTION

The reference design is one of the simplest and most commonly used designs in microarray experiments (Churchill, 2002). Despite being generally less powerful than other alternatives such as the loop design (Dobbin and Simon, 2002; Kerr, 2003a; Kerr and Churchill, 2001; Yang and Speed, 2002), the reference design presents some advantages (Dobbin and Simon, 2002), as it simplifies the statistical analysis and facilitates the comparison of results from different experiments within a meta-analysis context. The reference design consists of the hybridization of each test sample with a common type of sample (Kim et al., 2002). In the statistical literature on design of microarray experiments, however, the term "reference design" is used for two different experimental layouts. Kerr and Churchill (2001) and Glonek and Solomon (2004) implicitly use a "replicated reference", where the reference sample includes replication at the same level as the treatment samples. Hereinafter we refer to this design as the "classical reference design" (ClRD). Conversely, Dobbin and Simon (2002) consider a common reference sample ("common reference design", CRD), which refers to a single (biological) sample that is co-hybridized to each of the test samples. Both variants of reference design are used in practice, but the CRD seems to be preferred (Alizadeh et al., 2000; Lin et al., 2002; Papp et al., 2003; Perou et al., 2000).

In general, in the CRD and ClRD the reference group is not of interest per se. Nevertheless, the reference group can be represented by one of the treatments of interest, such as the initial time in a time course experiment (Yang and Speed, 2002) or a wild type strain (Wolfinger et al., 2001). To differentiate this design from the ClRD described above, which also presents biological replication for the reference group, we will denote it by the replicated reference design (RRD).

The aim of this paper is twofold: first, to compare these alternative designs from both statistical and technical points of view, and second, to propose a new variant of the reference design, which is shown to present higher power and lower cost than the traditional alternatives. The paper is structured as follows: in Section 2, the CRD, ClRD and RRD are compared to each other in terms of the total amount of resources required and their average variance of contrasts. In Section 3, a new reference design alternative is proposed and compared to the traditional reference designs (CRD, ClRD and RRD). In Section 4, the results are discussed and general recommendations on the use of each design are provided.
2 THE TRADITIONAL REFERENCE DESIGNS

2.1 GENERAL LAYOUTS

Schematic representations of CRD, ClRD and RRD experiments are provided in Figure 1 (Panels a, b and c, respectively). The experiments involve three treatments (A, B and C) and 12 arrays; the indexes represent biological replication within each treatment group. The CRD uses a single reference sample (denoted by R) without biological replication on it, which is co-hybridized with each of the four biological replications of each treatment group. Conversely, the ClRD uses an independent reference sample in each array. The RRD, on the other hand, considers one of the treatments as a reference (the group A in Figure 1c), from which a different biological replication is hybridized in each of the 12 arrays. In this case, group A has 12 biological replications whereas the remaining treatment groups have six replications each. In the CRD and ClRD, any gene-specific dye bias can be confounded with the reference sample effect, as it will cancel out in the indirect comparisons; therefore the dye swap approach is not mandatory. The RRD, however, should include both labeling directions so dye and treatment effects can be disentangled.

[Figure 1 schematic: panel a) test samples A1-A4, B1-B4 and C1-C4 each co-hybridized with the common reference R; panel b) the same test samples co-hybridized with reference replicates R1-R12; panel c) test samples B1-B6 and C1-C6 co-hybridized with control replicates A1-A12.]

Figure 1: Alternative reference designs with three treatments (A, B and C) and 12 arrays: a) Common reference design (CRD) with four replicates (subindexes); reference sample (R) is the same in all arrays; b) Classical reference design (ClRD) with four replicates (subindexes) in 12 arrays; reference sample (R) is replicated; and c) Replicated reference design (RRD), in which six replications of treatments B and C are hybridized together with independent replicates of a control treatment (A).

2.2 REQUIRED RESOURCES: ARRAYS AND mRNA SAMPLES

In terms of resources needed for running a specific experiment, we can compare the three designs by fixing either the total number of arrays or the number of replications in each treatment group. Specifically, consider a CRD or ClRD with K treatments and r biological replicates per treatment, such that r = c(K-1), where c is an arbitrary integer value considered here to constrain the design space to balanced cases only. In this case, n = rK arrays are necessary for running the experiment. On the other hand, it is seen that a RRD can accommodate r' = cK = r + r/(K-1) biological samples with the same number of arrays (Table 1; compare also Figures 1a and 1c). Likewise, fixing the number of non-reference samples, only n = r(K-1) arrays will be needed in a RRD as compared to rK arrays in the CRD or ClRD.

Table 1. Number of biological replicates in non-control treatments (r), for a given number of arrays (n) and treatments (K), for each alternative reference design.

K    n     CRD    ClRD    RRD
2    6      3      3       6
2    10     5      5      10
2    20    10     10      20
3    12     4      4       6
3    24     8      8      12
4    12     3      3       4
4    24     6      6       8
6    30     5      5       6
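The bookkeeping behind Table 1 is simple enough to script. The sketch below (the function name and output format are ours, not the paper's) reproduces the replicate and array counts for the balanced cases r = c(K-1) described above.

```python
# Resource bookkeeping for the reference designs in the balanced case r = c(K-1).
# Function and variable names are illustrative, not taken from the dissertation.

def reference_design_resources(K: int, c: int) -> dict:
    """Replicates and arrays for CRD/ClRD versus RRD with K treatments."""
    r = c * (K - 1)                 # replicates per non-control group in the CRD or ClRD
    n = r * K                       # arrays required by the CRD or ClRD
    r_rrd_fixed_n = c * K           # replicates an RRD accommodates in the same n arrays
    n_rrd_fixed_r = r * (K - 1)     # arrays an RRD needs for the same r replicates
    return {"K": K, "n": n, "r_CRD_ClRD": r,
            "r_RRD_fixed_n": r_rrd_fixed_n, "n_RRD_fixed_r": n_rrd_fixed_r}

# Reproduces, e.g., the K=3, n=12 row of Table 1 (CRD/ClRD: 4 replicates, RRD: 6).
for K, c in [(2, 3), (3, 2), (4, 1), (6, 1)]:
    print(reference_design_resources(K, c))
```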
2.3 LINEAR MODELS AND CONTRAST VARIANCE

The analysis of log-ratio of Cy3 to Cy5 intensities is considered here through the following general gene-specific linear model:

r_g = X β_g + ε_g,    [1]

where r_g is a vector of log ratios (log Cy3 − log Cy5) for gene g in each array, after suitable data normalization (Lonnstedt and Speed, 2002). Here we assume a single spot per gene on each array, but in the case of multiple spots, the elements of r_g may represent the average (or median) normalized log-ratio from all the spots corresponding to gene g in each array. In addition, X is an incidence matrix (see example and details in the Appendix); and β_g is a vector of linear coefficients, taken to be β_g = [τ_(1-R), ..., τ_(K-R)]' or β_g = [d, τ_(2-1), ..., τ_(K-1)]' for the CRD (or ClRD) and RRD, respectively, where τ_(i-j) = μ_i − μ_j represents the contrast between treatments i and j (i, j = 1, 2, ..., K and R), and d is a dye bias term. The elements of the vector ε_g are assumed independently normally distributed with variances σ_c² for the CRD, and σ_ε² for ClRD and RRD cases. Assuming that the variance components on the log-intensity scale are σ_a² (biological variance) and σ_e² (residual variance), it is seen that σ_c² = σ_a² + 2σ_e² and σ_ε² = 2σ_a² + 2σ_e² (please refer to the Appendix for details).

While the CRD and the ClRD have only indirect comparisons between treatment groups, the RRD comprises K-1 direct and (K-1)(K-2)/2 indirect comparisons. As each kind of contrast has a different variance, we compare the designs using the A-optimality criterion discussed in Kerr and Churchill (2001). According to this criterion, the average variance of all contrasts is computed considering all the comparisons equally relevant. If this is not the case, the use of A-optimality criterion may not be appropriate for comparing experimental designs (Yang and Speed, 2003). The variances of each kind of contrast (direct and indirect), as well as the overall average variances are presented in Table 2.

Table 2. Total number and specific variances of direct and indirect contrasts between treatments groups, and overall average variance of contrasts for replicated (RRD), classical (ClRD) and common (CRD) reference designs.

                                   RRD                          ClRD                     CRD
Number of direct comparisons       K-1                          0                        0
Var[direct comparison]             2(σ_a² + σ_e²)/r             -                        -
Number of indirect comparisons     (K-2)(K-1)/2                 K(K-1)/2                 K(K-1)/2
Var[indirect comparison]           4(σ_a² + σ_e²)/r             4(σ_a² + σ_e²)/r         2(σ_a² + 2σ_e²)/r
Overall average variance           4(K-1)(σ_a² + σ_e²)/(Kr)     4(σ_a² + σ_e²)/r         2(σ_a² + 2σ_e²)/r

The efficiency of the RRD relative to the CRD may be computed using the following expression:

RE(RRD:CRD) = Var(CRD) / Var(RRD),    [2]

where Var(·) is the overall average variance of the contrasts in each design. Similar equations can be used to compare any pair of designs. For the designs presented so far, the relative efficiency is a function of the number of treatments (K) and (or) the biological-to-technical variance ratio (σ_a²/σ_e²), as discussed by Dobbin and Simon (2002). Figures 2a and 2b present the relative efficiency as a function of the variance ratio for a given number of biological replicates (r) or number of arrays (n), respectively. The comparison for fixed number of arrays is restricted to cases in which r = c(K-1), as discussed before.
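The relative efficiency comparisons summarized in Figures 2 and 3 below follow directly from the Table 2 expressions. A minimal sketch (function names are ours) evaluates equation [2] at a fixed number of replicates; with K = 4 and the biological variance shrinking to zero it approaches the value 1.33 cited in Section 4.2.

```python
# Average contrast variances from Table 2 and the relative efficiencies implied by
# equation [2], at a fixed number of replicates r. Names and grid values are illustrative.

def avg_var_crd(r, sa2, se2):
    return 2 * (sa2 + 2 * se2) / r                 # common reference design

def avg_var_clrd(r, sa2, se2):
    return 4 * (sa2 + se2) / r                     # classical reference design

def avg_var_rrd(K, r, sa2, se2):
    return 4 * (K - 1) * (sa2 + se2) / (K * r)     # replicated reference design

K, r, se2 = 4, 3, 1.0                              # residual variance sets the scale
for sa2 in (0.0, 0.25, 1.0, 4.0):                  # biological variance relative to se2
    re_rrd_crd = avg_var_crd(r, sa2, se2) / avg_var_rrd(K, r, sa2, se2)
    re_crd_clrd = avg_var_clrd(r, sa2, se2) / avg_var_crd(r, sa2, se2)
    print(f"sa2/se2={sa2:4.2f}  RE(RRD:CRD)={re_rrd_crd:4.2f}  RE(CRD:ClRD)={re_crd_clrd:4.2f}")
```

Consistent with the text, the printed RE(CRD:ClRD) equals 1 only when the biological variance is zero, and RE(RRD:CRD) falls below 1 once the biological variance dominates.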
[Figure 2: two panels plotting relative efficiency against the log2 variance ratio (biological:technical); see caption below.]

Figure 2. Relative efficiency of the replicated reference design (RRD) as compared to the common reference design (CRD), considering: a) the same number of replicates (r), and b) the same number of arrays (n). X-axis is the log2 of the variance ratio; curves correspond to different numbers of treatments (K).

The relative efficiency of the CRD as compared to the ClRD is a function of the variance ratio only (Figure 3a), and the relative efficiency of the RRD to the ClRD depends only on the number of treatments. Similar results for the relative comparison between CRD and ClRD were discussed by Tempelman (2005).

[Figure 3: panel a is plotted against the log2 variance ratio (biological:technical); panel b is plotted against the number of treatments; see caption below.]

Figure 3. Efficiency of the: a) common reference design (CRD), and the b) replicated reference design (RRD), relative to the classical reference design (ClRD). 0: fixed number of biological replicates, I: fixed number of arrays.

3 AN IMPROVED REFERENCE DESIGN

3.1 LAYOUT

An alternative layout for the reference design structures discussed so far is represented in Figure 4, which will be denoted hereinafter by "blocked reference design" (BRD). In this case, a reference treatment is used similarly to the RRD but with a set of r biological samples hybridized with every treatment. Here we deliberately consider a single dye labeling for each subject in the reference group (treatment A) but with a dye balanced across subjects within each treatment. Note that for K=2, the BRD is the same as the RRD, which in turn are balanced complete block designs.

[Figure 4: schematic of the array-by-array pairings, with control replicates A1-A6 each co-hybridized with test samples from treatments B and C; see caption below.]

Figure 4. Blocked reference design (BRD) with 3 treatments (A, B and C) and 6 replicates (subindexes) in 12 arrays. Each test sample is hybridized together with a replicate from the control treatment (A); the same set of replicates of the control group is used with the other treatments.

3.2 REQUIRED RESOURCES

Technically, all choices of reference designs (CRD, ClRD, RRD and BRD) can be compared in terms of the number of replications or arrays required, as well as the sample quantity (mRNA quantity) necessary for a given number of biological replication (Table 3).

Table 3. Resources required in each of the four variants of reference design for comparing K treatments, with r biological replicates for the non-control groups. q: necessary aliquot of mRNA for each hybridization.

                                 CRD      ClRD     RRD        BRD
Number of non-ref. samples       r        r        r          r
Number of arrays                 rK       rK       r(K-1)     r(K-1)
Number of ref. samples           1        rK       r(K-1)     r
mRNA quantity of non-ref.        q        q        q          q
mRNA quantity of ref.            rKq      q        q          (K-1)q
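The BRD layout and its resource needs can be enumerated directly. The sketch below lists one array per reference-subject/test-sample pair and tallies the Table 3 quantities for the Figure 4 example; the labels and the exact pairing order are illustrative, and dye assignment is not modeled.

```python
# Enumerate a blocked reference design (BRD): every reference subject (treatment A)
# is co-hybridized, on separate arrays, with one replicate of each other treatment.
# Labels and pairing order are illustrative; dye assignment is omitted here.

def brd_layout(treatments, r):
    """Return (reference_sample, test_sample) pairs, one tuple per array."""
    reference, *tests = treatments
    return [(f"{reference}{i}", f"{t}{i}")
            for i in range(1, r + 1) for t in tests]

treatments, r = ["A", "B", "C"], 6
arrays = brd_layout(treatments, r)
K = len(treatments)

print(f"arrays used: {len(arrays)}")                       # r(K-1) = 12, as in Table 3
print(f"reference subjects: {r}, each on {K - 1} arrays")  # each needs (K-1)q of mRNA
print("first arrays:", arrays[:4])
```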
The variance of an indirect contrast will be 2(σ_b² + 2σ_e²)/r, whereas the variance of a direct comparison will be 2(σ_b² + σ_e²)/r. The BRD has the same number of direct and indirect comparisons, as well as the same variance for the direct contrasts, as the RRD. The variance of the indirect contrasts, however, is equal to the corresponding variance in a CRD. Consequently, fixing the number of biological replicates (of the non-reference samples) in a specific experiment, the variance of any contrast under the BRD will be equal to or smaller than the variance under any of the other designs. The average variance of contrasts in a BRD is

Var(BRD) = (2/r) [σ_b² + 2(K-1)σ_e²/K],   [4]

and the efficiency of this design is the highest under any variance-ratio scenario (Figures 5a and 5b; a numerical illustration is given in the sketch at the end of this section). In addition, as the BRD uses fewer biological replicates of the reference sample than the RRD and fewer arrays than the CRD, its cost will always be the smallest among the three variants.

Figure 5. Relative efficiency of the blocked reference design (BRD) compared to: a) the replicated reference design (RRD), and b) the common reference design (CRD). A fixed number of replicates (r) is considered. The x-axis is the log2 of the variance ratio; each curve corresponds to a different number of treatments (K).

Similarly to the RRD, the BRD does not spend half of the hybridizations on a group that is not of direct interest. This allows incorporating more replicates in a BRD than in a CRD for a given number of arrays. Consequently, the advantage in relative efficiency of the BRD over the CRD is even larger than what is shown in Figure 5b for a fixed number of replicates (r). Relative to the RRD, the efficiency curves presented in Figure 5a for the BRD are the same whether the comparisons are made at a fixed number of replicates or at a fixed number of arrays.

4 RESULTS AND DISCUSSION

4.1 BIOLOGICAL AND TECHNICAL REPLICATION

This paper compares four variants of the reference design for microarray experiments, considering both the technical and the biological variability. We also distinguish between designs where the reference is not of interest per se (CRD and ClRD) and designs where the reference is one of the groups under comparison (RRD and BRD). Some of these designs have been compared in previous work. Yang and Speed (2002) described the RRD, in which one treatment is used as the reference, and compared it to a CRD for time-course experiments. The authors, however, did not discuss the issue of biological and technical replication. Even though the error variance in their work is always directly comparable to the RRD error variance discussed here (2σ_b² + 2σ_e²), the error variance for their CRD is not, unless we assume zero biological variance in both designs. In a later paper, Yang and Speed (2003) incorporated the effect of different levels of replication into design comparisons, but the relative efficiency of the RRD to the CRD was not reassessed.
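Returning to the efficiency comparison of Section 3.3, the following short sketch (restated here so it is self-contained; the variance components and design sizes are again illustrative) evaluates equation [4] and the efficiency of the BRD relative to the RRD and CRD averages of Table 2:

def avg_var_brd(K, r, s2b, s2e):
    # equation [4]: average over (K-1) direct and (K-1)*(K-2)/2 indirect BRD contrasts
    return (2.0 / r) * (s2b + 2.0 * (K - 1) / K * s2e)

def avg_var_rrd(K, r, s2b, s2e):
    return 4.0 * (K - 1) * (s2b + s2e) / (K * r)      # Table 2

def avg_var_crd(K, r, s2b, s2e):
    return 2.0 * (s2b + 2.0 * s2e) / r                # Table 2

K, r = 4, 6
for log2_ratio in (-4, 0, 4):
    s2b, s2e = 2.0 ** log2_ratio, 1.0
    v_brd = avg_var_brd(K, r, s2b, s2e)
    print(log2_ratio,
          round(avg_var_rrd(K, r, s2b, s2e) / v_brd, 2),   # RE(BRD:RRD) >= 1
          round(avg_var_crd(K, r, s2b, s2e) / v_brd, 2))   # RE(BRD:CRD) >= 1

Both ratios stay at or above one, approaching one with respect to the RRD at low biological variance and with respect to the CRD at high biological variance, as described above.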
Tempelman (2005) compared the CRD and ClRD in terms of statistical power and efficiency at different levels of technical and biological variability. Our paper is more general than the previous ones, as it compares additional reference designs and also makes a clear distinction between biological and technical variability in the analyses. The methodological approach of this paper consists of analyzing the log-ratio of Cy3 and Cy5 intensities using a linear model parameterized in terms of the variances of the log-intensities. Such a model is equivalent to the intensity model (Kerr, 2003b) if arrays are considered as fixed effects. Our procedure may also be considered a variant of the one presented by Yang and Speed (2003), where the variances and covariances of the log-ratios are expressed in terms of the variance components of the intensity model. This parameterization allows a general and straightforward comparison of the variants of the reference design for different biological-to-technical variance ratios and numbers of treatments.

4.2 RELATIVE EFFICIENCY OF TRADITIONAL REFERENCE DESIGNS

Figure 3 shows that the ClRD is the least efficient of the designs. It will be equivalent to the CRD only in the hypothetical situation of null biological variance. Likewise, the ClRD is also less efficient than the RRD, and will tend to be as efficient as the RRD only if the number of treatments is very large. Consequently, from a statistical standpoint, it makes little sense to use the classical reference design. Similarly to the ClRD, the RRD also uses a replicated reference, but this reference is one of the treatments of interest. This strategy increases its overall efficiency and allows for an increase in sample size (more replicates for a fixed number of arrays). Figure 2 shows the tendency of the RRD to be more efficient than the CRD for lower variance ratios (smaller biological variance relative to technical variance) and smaller numbers of treatments. With four treatments, if the number of non-control replicates is held fixed (Figure 2a), the relative efficiency tends to an asymptotic value (for σ_b² = 0) of 1.33, which equals the value (2.00/1.50) reported in Table 2 of Yang and Speed (2002). Here, we also performed the comparisons holding the number of arrays fixed (Figure 2b). For instance, with three treatments the RRD always outperforms the CRD, while σ_b² < σ_e² guarantees a relative efficiency above one for up to six treatments.

4.3 AN ALTERNATIVE REFERENCE DESIGN

In this paper, we propose a new variant of the reference design (denoted here as BRD) and compare it to the traditional reference designs commonly used in the literature. We show that the BRD is more efficient and less expensive than the other three variants of the reference design for experiments involving three or more treatment groups. As shown in Figures 4 and 5, the average variance of the contrasts for the BRD is always the lowest among the four alternatives. For lower biological variances, the differences in efficiency between the BRD and the RRD vanish, whereas at higher biological variances the BRD and the CRD tend to be equivalent. Consequently, for any microarray experiment the BRD may have similar efficiency to the RRD for some genes and to the CRD for others, but on average the BRD will always outperform both designs.
As compared to the ClRD, the BRD will always be more efficient, since the ClRD is outperformed by both the CRD and the RRD.

4.4 TECHNICAL AND LOGISTIC ISSUES

As far as cost is concerned, at a fixed number of biological replicates the BRD uses fewer arrays than the CRD or ClRD and fewer biological replicates for the control treatment than the RRD. Consequently, the BRD is always the least expensive among these alternatives. In spite of the reference design being less efficient than other designs under certain circumstances (Bueno Filho et al., 2005; Dobbin and Simon, 2002; Kerr, 2003a; Kerr and Churchill, 2001; Yang and Speed, 2002), it is still advocated by some authors, especially in the applied genomic sciences. For example, the CRD is suggested to be the most convenient design for sample clustering applications (Dobbin and Simon, 2002) or genetical genomics studies (Jansen and Nap, 2001), where the aim of the research is to extract the individual variation in the expression profiles. In addition, even if the aim of the experiment is to compare average profiles of gene expression among populations, the reference design may be preferable in some cases. For example, if the comparison of each treatment to a control group is more important than the other contrasts, the reference design can be as good as any other design (Glonek and Solomon, 2004). This situation may occur in preliminary or screening experiments including a large number of treatments that are compared to a reference group (e.g., several drugs against a placebo group, or a time-course experiment with a baseline). If the comparisons of treatments against a control group are the most relevant, the RRD and the BRD are the only designs considered here that guarantee direct comparisons for those contrasts and consequently should be the most efficient designs. Additionally, the BRD will be more efficient than the RRD for the rest of the comparisons (among non-control treatments). With more than three treatments, the BRD will be more efficient than the balanced incomplete block design for a fixed number of biological replicates under almost any variance-ratio scenario (refer to the formulas in the Supplementary Material of Dobbin and Simon, 2002). Fixing the number of arrays, the direct comparisons under the RRD and the BRD will have the same variance as in the balanced incomplete block design. The balanced incomplete block design, however, will still show some significant advantages over the reference designs because it provides the same variance for every contrast, whereas the BRD will have smaller variances for the comparisons against the control and larger variances for the contrasts among non-control treatments. Lastly, another situation where the reference design may be preferred over other alternatives is when the number of groups or classes is uncertain at the beginning of the experiment, for example within a class-discovery context (Dobbin and Simon, 2002). In these cases, the balanced incomplete block design is not indicated, as one needs to anticipate the number of groups (as well as each experimental unit's membership) in order to distribute them accordingly across the slides. Under these circumstances, the CRD is the most indicated design, especially if large quantities of the reference sample are available.
The literature also shows a different aspect of the comparison of the reference design to other designs based on direct comparisons (Belbin et al., 2004; Konig et al., 2004). These authors compared the reliability of an indirect measurement of fold change (through a reference) to a direct measurement. Using a CRD, both works reached the same conclusion: the direct and indirect measures of log-ratio or fold change tend to agree if some kind of filtering criterion is applied to the data. This is expected, because keeping only the most reliable spots from each array diminishes the overall technical variability and increases the reliability of indirect measurements. The results from Belbin et al. (2004) and Konig et al. (2004), however, should be interpreted with some care, as they included only limited technical replication and no biological replication.

From a practical point of view, the researcher should examine not only the statistical power and the cost of the arrays and experimental units when designing a microarray experiment; sample quantity and availability are also important issues. The CRD requires a large quantity of mRNA from a single sample (the reference). The RRD, on the other hand, requires a limited mRNA quantity from a large number of biological samples, which may be unfeasible when working with expensive or limited experimental units. Lastly, the BRD is an intermediate alternative in this respect, as it requires the same number of experimental units for all treatments (including the reference group), but the reference requires (K-1) times more mRNA than the non-control groups. This may be prohibitive when working with limited mRNA quantities and a large number of treatments (e.g., experiments involving tissue samples from small animals or embryos). Another issue to consider is the suitability of the control treatment as a reference. In general, a treatment with a large number of non-expressed genes will not be a good reference group. In these cases, a pool from several individuals, or a synthetic reference sample, should be used in a CRD layout. This issue is beyond the scope of this paper, and the reader should refer to the existing literature (Gorreta et al., 2004; Kim et al., 2002; Sterrenburg et al., 2002) for further discussion on this matter.

4.5 CONCLUDING REMARKS

In summary, this paper proposes an alternative design for microarray experiments (denoted as BRD), which exploits the flexibility of the reference design but minimizes the loss of efficiency of its indirect comparisons. Specifically, the use of one of the treatments as the reference saves arrays and provides a subset of direct comparisons, while the proper allocation of samples across arrays yields lower variances than a RRD for the indirect contrasts. The BRD presents both of these advantages with a minimal increase in the required sample quantity (as discussed above) for the reference treatment. Among the situations where the BRD is likely to be implemented, we list time-course experiments (with time zero as a control, for example) and experiments including competing drugs and a placebo group.
APPENDIX

Consider the following gene-specific model:

r_i = Y_i1kl - Y_i2k'l,

where r_i is the normalized log-ratio relative to the i-th array, Y_i1kl is the log-intensity corresponding to the Cy3-labeled l-th sample from treatment k, and Y_i2k'l is the log-intensity of the Cy5-labeled l-th sample from treatment k'. Assume also that (Rosa et al., 2005):

Y_ijkl = μ + A_i + D_j + T_k + B_l(k) + e_ijkl,

where μ is the general mean; A_i, D_j and T_k are array, dye and treatment effect terms (considered as fixed effects here), respectively; B_l(k) is a subject-specific effect, i.i.d. N(0, σ_b²); and e_ijkl is a residual term, i.i.d. N(0, σ_e²). In general, a model on the log-ratio scale can be derived from the formula above as follows:

r_i = Y_i1kl - Y_i2k'l
    = [μ + A_i + D_1 + T_k + B_l(k) + e_i1kl] - [μ + A_i + D_2 + T_k' + B_l(k') + e_i2k'l]
    = (D_1 - D_2) + (T_k - T_k') + (B_l(k) - B_l(k') + e_i1kl - e_i2k'l)
    = d + τ_(k-k') + ε_i,

where d = D_1 - D_2 is the difference between the Cy3 and Cy5 labeling effects, τ_(k-k') = T_k - T_k' is the contrast between treatments k and k', and ε_i = B_l(k) - B_l(k') + e_i1kl - e_i2k'l is a residual term containing both biological and technical components.

1. Model for CRD and ClRD (refer to Figure 1a; array 1 is the left-most array):

[r_1 ... r_12]' = X β + ε,  with X = I_3 ⊗ 1_4 (for K = 3 treatments and four arrays per treatment, the column of X corresponding to each treatment indicates the arrays hybridized with that treatment) and β = [τ_(1-R) τ_(2-R) τ_(3-R)]'.

Here T_k' is R (the reference), which is always labeled with Cy5. Consequently the dye effect (intercept) cannot be included in the model, but the effects of interest (τ) can still be estimated. The variance of each error term ε_i is given by:

a) CRD: Var(ε_i) = Var(B_l(k) - B_l(k') + e_i1kl - e_i2k'l) = σ_b² + 2σ_e²,

and the covariance between any pair of elements is null, Cov(ε_i, ε_i') = 0, so the (co)variance matrix of the residual vector ε can be expressed as Var(ε) = I_12 (σ_b² + 2σ_e²). Note that Var(B_l(k')) = 0, as there is no biological replication in the reference sample.

b) ClRD: Var(ε_i) = Var(B_l(k) - B_l(k') + e_i1kl - e_i2k'l) = 2σ_b² + 2σ_e².

Likewise, the covariance between any pair of elements is null, Cov(ε_i, ε_i') = 0. The (co)variance matrix of the residual vector ε can be expressed as Var(ε) = I_12 (2σ_b² + 2σ_e²).

2. Model for RRD (refer to Figure 1b; array 1 is the left-most array):

r = X β + ε,  with β = [d τ_(2-1) τ_(3-1)]' and X = [1_12 | x_2 | x_3], where x_2 = (-1,-1,-1, 1,1,1, 0,...,0)' and x_3 = (0,...,0, -1,-1,-1, 1,1,1)' reflect the dye swap of each test treatment against the reference treatment A.

As in this case the control treatment A is labeled with both Cy3 and Cy5, the dye effect (dye bias) can be modeled by including an intercept in the model. In a RRD, as the reference treatment is also replicated (at the biological level), the variance of each residual term is given by

Var(ε_i) = Var(B_l(k) - B_l(k') + e_i1kl - e_i2k'l) = 2(σ_b² + σ_e²),

and the residuals are independent of each other, so the residual (co)variance matrix can be expressed as Var(ε) = I_12 · 2(σ_b² + σ_e²). Notice that this matrix is equal to that of the ClRD.
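As a quick numerical check of these two residual variances (a sketch with arbitrary variance values, not part of the original appendix), one can simulate log-intensities under the model above and compare the empirical variance of the log-ratios with the analytical results:

import numpy as np

rng = np.random.default_rng(1)
s2b, s2e, n = 1.0, 0.5, 200_000                   # assumed variances and number of simulated arrays

B_trt = rng.normal(0.0, np.sqrt(s2b), n)          # biological effect of the treated sample
B_ref = rng.normal(0.0, np.sqrt(s2b), n)          # biological effect of a replicated reference
e1, e2 = rng.normal(0.0, np.sqrt(s2e), (2, n))    # channel-specific residuals

r_crd = B_trt + e1 - e2                           # single reference subject: its effect is a constant
r_rrd = B_trt - B_ref + e1 - e2                   # reference replicated at the biological level
print(round(r_crd.var(), 2), s2b + 2 * s2e)       # ~ sigma_b^2 + 2*sigma_e^2
print(round(r_rrd.var(), 2), 2 * s2b + 2 * s2e)   # ~ 2*sigma_b^2 + 2*sigma_e^2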
3. Model for BRD (refer to Figure 4; array 1 is the left-most array):

r = X β + ε,  with β = [d τ_(B-A) τ_(C-A)]' and X = [1_12 | x_B | x_C], where x_B and x_C indicate, with sign given by the dye orientation (-1 when the test sample is Cy5-labeled against the reference, +1 when it is Cy3-labeled), the arrays on which treatments B and C, respectively, are hybridized against a reference subject.

Similarly to the RRD, an intercept is included in the model to estimate (and account for) the dye effect. Here also the residual variances are given by

Var(ε_i) = Var(B_l(k) - B_l(k') + e_i1kl - e_i2k'l) = 2(σ_b² + σ_e²).

The residuals, however, are not independent. The (co)variances are given by

Cov(ε_i, ε_i') = Cov(B_l(k) - B_l(k') + e_i1kl - e_i2k'l, B_l(k'') - B_l(k''') + e_i'1k''l - e_i'2k'''l) = Cov(-B_l(k'), -B_l(k''')),

which can be expressed as

Cov(-B_l(k'), -B_l(k''')) = σ_b² if B_l(k') = B_l(k''') (i.e., the two arrays share the same reference subject), and 0 otherwise.

The residual (co)variance matrix is then written as

Var(ε) = I_6 ⊗ [ 2(σ_b² + σ_e²)   σ_b² ; σ_b²   2(σ_b² + σ_e²) ],

where ⊗ is the Kronecker product operator.

REFERENCES

Alizadeh, A. A., M. B. Eisen, R. E. Davis, C. Ma, I. S. Lossos, A. Rosenwald, J. C. Boldrick, H. Sabet, T. Tran, X. Yu, J. I. Powell, L. Yang, G. E. Marti, T. Moore, J. Hudson Jr., L. Lu, D. B. Lewis, R. Tibshirani, G. Sherlock, W. C. Chan, T. C. Greiner, D. D. Weisenburger, J. O. Armitage, R. Warnke, R. Levy, W. Wilson, M. R. Grever, J. C. Byrd, D. Botstein, P. O. Brown, and L. M. Staudt (2000). Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403(6769):503-511.

Belbin, T. J., J. Gaspar, M. Haigentz, R. Perez-Soler, S. M. Keller, M. B. Prystowsky, G. Childs, and N. D. Socci (2004). Indirect measurements of differential gene expression with cDNA microarrays. Biotechniques 36(2):310-314.

Bueno Filho, J. S. S., S. G. Gilmour, and G. J. M. Rosa (2005). Design of microarray experiments for genetical genomics studies. Submitted.

Churchill, G. A. (2002). Fundamentals of experimental design for cDNA microarrays. Nat Genet 32 Suppl:490-495.

Dobbin, K. and R. Simon (2002). Comparison of microarray designs for class comparison and class discovery. Bioinformatics 18(11):1438-1445.

Glonek, G. F. V. and P. J. Solomon (2004). Factorial and time course designs for cDNA microarray experiments. Biostatistics 5(1):89-111.

Gorreta, F., D. Barzaghi, A. J. VanMeter, V. Chandhoke, and L. Del Giacco (2004). Development of a new reference standard for microarray experiments. Biotechniques 36(6):1002-1009.

Jansen, R. C. and J. P. Nap (2001). Genetical genomics: the added value from segregation. Trends Genet 17(7):388-391.

Kerr, M. K. (2003a). Design considerations for efficient and effective microarray studies. Biometrics 59(4):822-828.

Kerr, M. K. (2003b). Linear models for microarray data analysis: Hidden similarities and differences. J Comput Biol 10(6):891-901.

Kerr, M. K. and G. A. Churchill (2001). Experimental design for gene expression microarrays. Biostatistics 2(2):183-201.

Kim, H., B. Zhao, E. C. Snesrud, B. J. Haas, C. D. Town, and J. Quackenbush (2002). Use of RNA and genomic DNA references for inferred comparisons in DNA microarray analyses. Biotechniques 33(4):924-930.

Konig, R., D. Baldessari, N. Pollet, C. Niehrs, and R. Eils (2004). Reliability of gene expression ratios for cDNA microarrays in multiconditional experiments with a reference design. Nucleic Acids Research 32(3):e29.
Lin, S. J., M. Kaeberlein, A. A. Andalis, L. A. Sturtz, P. A. Defossez, V. C. Culotta, G. R. Fink, and L. Guarente (2002). Calorie restriction extends Saccharomyces cerevisiae lifespan by increasing respiration. Nature 418(6895):344-348.

Lonnstedt, I. and T. Speed (2002). Replicated microarray data. Statistica Sinica 12(1):31-46.

Papp, B., C. Pal, and L. D. Hurst (2003). Dosage sensitivity and the evolution of gene families in yeast. Nature 424(6945):194-197.

Perou, C. M., T. Sorlie, M. B. Eisen, M. van de Rijn, S. S. Jeffrey, C. A. Rees, J. R. Pollack, D. T. Ross, H. Johnsen, L. A. Akslen, O. Fluge, A. Pergamenschikov, C. Williams, S. X. Zhu, P. E. Lonning, A. L. Borresen-Dale, P. O. Brown, and D. Botstein (2000). Molecular portraits of human breast tumours. Nature 406(6797):747-752.

Rosa, G. J. M., J. P. Steibel, and R. J. Tempelman (2005). Reassessing design and analysis of two-colour microarray experiments using mixed effects models. Comparative and Functional Genomics 6(3):123-131.

Sterrenburg, E., R. Turk, J. M. Boer, G. B. van Ommen, and J. T. den Dunnen (2002). A common reference for cDNA microarray hybridizations. Nucleic Acids Research 30(21):e116.

Tempelman, R. J. (2005). Assessing statistical precision, power, and robustness of alternative experimental designs for two color microarray platforms based on mixed effects models. Vet Immunol Immunopathol 105(3-4):175-186.

Wolfinger, R. D., G. Gibson, E. D. Wolfinger, L. Bennett, H. Hamadeh, P. Bushel, C. Afshari, and R. S. Paules (2001). Assessing gene significance from cDNA microarray expression data via mixed models. J Comput Biol 8(6):625-637.

Yang, Y. H. and T. Speed (2002). Design issues for cDNA microarray experiments. Nat Rev Genet 3(8):579-588.

Yang, Y. H. and T. P. Speed (2003). Design and analysis of comparative microarray experiments. In: Statistical Analysis of Gene Expression Microarray Data, T. P. Speed, ed. CRC Press, Boca Raton, FL.

CHAPTER TWO

COMPARISON OF LOG-INTENSITY AND LOG-RATIO LINEAR MODELS FOR TWO COLOR MICROARRAYS.

ABSTRACT: In this paper, we compare log-ratio and log-intensity models for two-color microarray experiments with technical and biological replication. A linear transformation of the log-intensity model is used to derive the equivalent log-ratio model for a variety of designs. We demonstrate that some designs, such as dye swap or connected loop designs, require the specification of random sample-pair effects in the log-ratio model. For a split-plot design where arrays are experimental units, log-ratios are not convenient, as this makes some effects non-estimable in the model. Furthermore, analyses using log-intensity values generally allow for more efficient recovery of inter-slide information in designs involving more than two treatments, as we determine for different scenarios.

Introduction

Differentially labeled fluorescence intensities are used to compare two samples for mRNA abundance of a particular transcript at any one spot on a long-oligonucleotide or cDNA microarray.
Statistical analyses of such two-color microarray experiments have often been based on using logarithms of the ratios (log-ratios) of these spot-specific fluorescence intensities as the response variables (Yang and Speed, 2003), although models based on the logarithms of the corresponding intensities (log-intensities) for each dye have also been used (Kerr et al., 2000; Wolfinger et al., 2001). Logarithms, typically to base 2, are often used to transform microarray fluorescence intensities and their corresponding ratios in the anticipation that the resulting data will facilitate analysis by parametric statistical methods that depend upon normality assumptions. Current statistical models for the analysis of log-intensities typically include fixed and random effects (i.e., mixed models) to account for different levels of replication (Rosa et al., 2005), whereas fixed effects linear models are more commonly used for log-ratios (Smyth, 2004). Multilevel models have been proposed to analyze log-ratios, although the most commonly used implementation requires a common intra-class correlation assumption across transcripts (Smyth et al., 2005). Other generalized least squares (GLS) or mixed model analyses have been used to compare experimental designs using log-ratio measures of expression (Steibel and Rosa, 2005). The literature on the comparison of log-ratio versus log-intensity models has focused its attention on models that are readily specified for either response variable (Kerr, 2003b), but not on more complex designs that demand modeling different hierarchical levels of replication.

Modeling log-intensities allows specifying the array effect as random, such that for some incomplete block designs it is possible to recover inter-array information (Graybill and Deal, 1959). On the other hand, an analysis based on ratios is strictly an intra-array analysis, such that there is a potential loss of information. In this paper we demonstrate how to derive a linear model for log-ratios starting with linear mixed model specifications for log-intensities. We further compare log-ratio and log-intensity based models for several experimental designs. In the first section, a general methodology for deriving the log-ratio model starting from a log-intensity model is presented. In the second section we apply the method to several designs and show in which cases and under which circumstances the results will be equivalent. In the last section we elaborate on the possibility of recovering inter-array information using a log-intensity model as opposed to a log-ratio model.

Linearly transforming a log-intensity model to a log-ratio model

For any two-color microarray design, a linear (mixed) model can be developed to analyze the log-intensity values (Wolfinger et al., 2001). These models usually include fixed effects such as the two dyes, treatments and other covariates, as well as random effects such as arrays and other terms to account for different levels of replication. A log-ratio statistical model can be derived from a log-intensity model by simply applying a linear transformation. After applying this transformation, the original intercept and the array effects vanish. The new intercept for the log-ratio model is simply the dye effect (Vinciotti et al., 2005), whereas the treatment effect is parameterized in terms of a set of treatment contrasts (Smyth, 2004).
Furthermore, the variance components in the new log-ratio model will be linear combinations of the variance components on the log-intensity scale. As a result, random effects often still persist in the statistical model for log-ratio data in some designs, such that they should be included. We further illustrate this property for some designs in the next section.

Ratio models for specific designs

Consider the common reference design based on, for example, n biological replicates per treatment for each of t treatments. The design then requires a total of nt arrays and 2nt intensity measurements, including nt measurements on the commonly labeled reference sample, if each gene is spotted only once on each array and there is no technical replication (i.e., no multiple measures per biological replicate within treatments). Suppose that the 2nt × 1 vector of log-intensities y is sorted as in the order provided from left to right in Figure 2.1, with the first-dye-labeled sample specified before the second-dye-labeled sample within each array. A suitable mixed effects model for the log-intensities can then be written as follows:

y = 1_2nt μ + X_D [γ_1 γ_2]' + X_T [τ_1 ... τ_t τ_R]' + Z_A [a_1 ... a_nt]' + Z_B [b_1,1 b_1,2 ... b_t,n b_R,1]' + e.   [1]

Here, X_D = (1_nt ⊗ I_2) and X_T = [1_n ⊗ I_t ⊗ (1, 0)' | 1_nt ⊗ (0, 1)'] define the design matrices for the fixed effects of dye and treatment, respectively. Similarly, Z_A = (I_nt ⊗ 1_2) and Z_B = [I_nt ⊗ (1, 0)' | 1_nt ⊗ (0, 1)'] define the design matrices for the random effects. Note that 1_n denotes a vector of ones of length n (taken as a column vector here), whereas I_t denotes the identity matrix of order t; ⊗ denotes the Kronecker or direct product (Searle, 1982), and [· | ·] denotes the horizontal concatenation of two matrices. Hence μ denotes the overall mean, γ_1 and γ_2 denote the fixed effects of the two dyes, τ_1, ..., τ_t, τ_R denote the treatment effects, a_1, a_2, ..., a_nt denote the random effects of arrays m = 1, 2, ..., nt, respectively, and b_k,l denotes the random effect of the l-th biological replicate within the k-th treatment, noting that there is only one common replicate in the reference group, having effect b_R,1. Finally, e denotes the vector of residuals, which are presumed to be distributed as N(0, I_2nt σ_e²); similarly, a_m ~ NIID(0, σ_a²) for all m and b_k,l ~ NIID(0, σ_b²) for all k, l.

Suppose that the vector of log-intensities y is sorted by array and by dye (e.g., Cy3 and Cy5) within array, such that the vector of log-ratios, r, can be determined as r = Ly, for L = I_nt ⊗ [1 -1]. Then, as expected, the dimension nt × 1 of r is half that of y. Premultiplying also the right-hand side of Equation [1] by L, we derive the following equivalent log-ratio model for r:

r = 1_nt (γ_1 - γ_2) + (1_n ⊗ I_t) [τ_1 - τ_R, τ_2 - τ_R, ..., τ_t - τ_R]' + ε,   [2]

where ε ~ N(0, I_nt σ_ε²) for σ_ε² = σ_b*² + 2σ_e², with σ_b*² denoting the biological variation for the difference between the treated and the common reference biological replicates. Either of the two model specifications ([1] or [2]) will produce identical results using conventional ordinary least squares (OLS) estimation, as appropriate when all effects are assumed to be fixed, as they are in Equation [1] if we momentarily assume array effects to be fixed. However, assuming random array effects in Equation [1] would potentially allow for the recovery of inter-block information (Kerr, 2003a) if data are missing.
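The following numpy sketch (not the authors' code; the design sizes are illustrative, and the row ordering follows the sorting assumed above) builds the incidence matrices of Equation [1] and applies the transformation r = Ly, confirming that the array effects vanish and that the intercept of the ratio model becomes the dye difference:

import numpy as np

t, n = 3, 4                                    # illustrative numbers of treatments and replicates
ones = lambda k: np.ones((k, 1))

X_D = np.kron(ones(n * t), np.eye(2))                                   # dye effects
X_T = np.hstack([np.kron(ones(n), np.kron(np.eye(t), [[1.0], [0.0]])),  # treated samples
                 np.kron(ones(n * t), [[0.0], [1.0]])])                 # common reference
Z_A = np.kron(np.eye(n * t), ones(2))                                   # array effects
L   = np.kron(np.eye(n * t), [[1.0, -1.0]])                             # r = L y

print(np.allclose(L @ Z_A, 0))       # True: array effects vanish from the log-ratio model
print(np.unique(L @ X_D, axis=0))    # [[1., -1.]]: the new intercept is the dye difference
print((L @ X_T)[0], (L @ X_T)[1])    # each row contrasts one treatment with the reference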
Now consider a dye-swap balanced block (t = 2) design (Dobbin and Simon, 2002; Kerr, 2003a) with n biological replicates per treatment and without technical replication, such that n also defines the number of arrays, as in Figure 2.2. Again ordering the vector of log-intensities y by array and then by dye within array, we have the following intensity model:

y = 1_2n μ + X_D [γ_1 γ_2]' + X_T [τ_1 τ_2]' + Z_A [a_1 ... a_n]' + e,

where X_D = (1_n ⊗ I_2), X_T = 1_n/2 ⊗ [(1,0); (0,1); (0,1); (1,0)] (so that the dye orientation of the two treatments is swapped on alternate arrays), and Z_A = (I_n ⊗ 1_2).

Invoking the same distributional specifications on the random and residual effects, and developments analogous to those used in progressing from Equation [1] to [2] for the common reference design, a fixed effects linear model may be specified for the log-ratios in this balanced block design using r = Ly, such that:

r = 1_n (γ_1 - γ_2) + (1_n/2 ⊗ [1, -1]') (τ_1 - τ_2) + ε.   [3]

Here ε ~ N(0, I_n σ_ε²), and the residual variance can be expressed as a function of the residual and biological random effects of the log-intensity model, since σ_ε² = 2σ_b² + 2σ_e².

Let us now consider the balanced block design with n biological replicates per each of the two treatments, but where a dye swap is conducted with the same two biological replicates hybridized twice (m = 2) against each other. The total number of arrays is then nm. Figure 2.3 illustrates the case for m = 2. The intensity model is specified as follows:

y = 1_4n μ + X_D [γ_1 γ_2]' + X_T [τ_1 τ_2]' + Z_A [a_1 ... a_2n]' + Z_B [b_1,1 b_2,1 ... b_1,n b_2,n]' + e,

with X_D = (1_2n ⊗ I_2), X_T = 1_n ⊗ [(1,0); (0,1); (0,1); (1,0)], Z_A = (I_2n ⊗ 1_2) and Z_B = I_n ⊗ [(1,0); (0,1); (0,1); (1,0)].

Invoking again our transformation r = Ly and the same distributional assumptions on the random and residual effects as before, it becomes apparent that we need to specify the random effects of biological replicate pairs in the log-ratio model for this design:

r = 1_2n (γ_1 - γ_2) + (1_n ⊗ [1, -1]') (τ_1 - τ_2) + (I_n ⊗ [1, -1]') [b*_1 ... b*_n]' + ε.   [4]

Here b*_l = b_1,l - b_2,l are the random biological pairing effects, i.e., the differences between the random biological replicate effects of the two treatments (1 and 2) paired against each other on the same arrays. Therefore, the distribution of these random effects, as a function of the biological variance component σ_b² on the log-intensity scale, is b*_l ~ NIID(0, 2σ_b²) for all l; similarly, it can readily be shown that ε ~ N(0, 2 I_2n σ_e²), where σ_e² is the residual variance on the log-intensity scale. Note that the log-intensity model and the log-ratio model [4] will yield identical results if a mixed model analysis that properly specifies the random effects is used in both cases. However, only one random effects factor, with half as many biological effects (b*_1, b*_2, ..., b*_n), needs to be specified in the log-ratio model [4]. If mixed model software is not available (e.g., when using OLS, thereby ignoring the distribution of b*_1, ..., b*_n), one might treat these biological pair effects as fixed and then compute a contrast based on those effects:

r = 1_2n (γ_1 - γ_2) + (I_n ⊗ [1, -1]') [b_1 ... b_n]' + ε.   [5]

The treatment difference would then be estimated by τ̂_1 - τ̂_2 = (1/n) Σ_k b̂_k. This contrast gives an unbiased estimate of the treatment difference; however, it will not lead to correct inference (i.e., standard errors and P-values), because this contrast, along with [5], ignores biological variability. Using model [5], the OLS statistic used to test the treatment effect contrast would fail to partition biological from technical variability.
That is, under the null hypothesis H0: τ_1 - τ_2 = 0, the F-statistic from model [5] does not follow a central F-distribution (as would be expected) if there is biological variability for gene expression. Conversely, because of the simpler experimental design, model [3] will provide an F-statistic that does test for the treatment effect, even in the presence of non-null σ_b². In this design, treating the array effects as fixed or random has no consequence, because each array represents a complete block.

A connected loop design is an incomplete block design that can be shown to be particularly effective for comparing t > 2 treatments in a two-color microarray system (Tempelman, 2005). Figure 2.4 illustrates such a design for t = 3. The corresponding log-intensity model for t = 3, as described in Rosa et al. (2005), is presented below:

y = 1_6n μ + X_D [γ_1 γ_2]' + X_T [τ_1 τ_2 τ_3]' + Z_A [a_1 ... a_3n]' + Z_B [b_1,1 b_2,1 b_3,1 ... b_1,n b_2,n b_3,n]' + e,

where X_D = (1_3n ⊗ I_2), X_T = 1_n ⊗ [(1,0,0); (0,1,0); (0,1,0); (0,0,1); (0,0,1); (1,0,0)], Z_A = (I_3n ⊗ 1_2), and Z_B = I_n ⊗ [(1,0,0); (0,1,0); (0,1,0); (0,0,1); (0,0,1); (1,0,0)].

This model can also be transformed to a log-ratio model using the same linear transformation (i.e., r = Ly) described previously. Equation [6] represents the corresponding log-ratio model for t = 3 as in Figure 2.4:

r = 1_3n (γ_1 - γ_2) + X_τ [τ_1 - τ_3, τ_2 - τ_3]' + Z_b [b_1,1 ... b_3,n]' + ε,   [6]

where X_τ = 1_n ⊗ [(1, -1); (0, 1); (-1, 0)] and Z_b = I_n ⊗ [(1, -1, 0); (0, 1, -1); (-1, 0, 1)] are known design matrices. Again, the distribution of the random effects may be parameterized in terms of the log-intensity model variance components: b_k,l ~ NIID(0, σ_b²) and ε ~ N(0, 2 I_3n σ_e²).

Unlike any of the other previously discussed designs, the specification of the array effects as fixed or random in the log-intensity model matters for the connected loop design. Specifying array effects as fixed leads to the log-ratio model specified in [6] above. The residual variance estimated with model [6] will be twice the residual variance of the intensity model, while the biological variance estimates will be identical to those from the log-intensity model. However, if array effects are assumed to be random in the log-intensity model, there will be recovery of inter-array information, and the estimates of the treatment differences (and their standard errors) will differ between the two models.

There are some microarray designs for which there is no equivalent model between the log-intensity and log-ratio scales, whether array effects are treated as fixed or random. This is true for the split-plot design illustrated in Figure 2.5, where there are at least two different treatment factors, say S and T. Consider the simplest case where each factor has two levels. For the subplot factor, say T, comparisons are made between two different levels, T1 and T2, on biological replicates within each array, analogous to the balanced block design in Figure 2.2. Levels of the whole-plot factor S are further superimposed on this design, such that s arrays are assigned to S1 and s arrays to S2. In other words, arrays serve as the experimental units for factor S, whereas biological replicates serve as the experimental units for T. Suppose that the data vector y is sorted by the two levels of S, by array and then by dye. Then the linear mixed model for the log-intensities is presented below.
y = 1_4s μ + X_S [α_1 α_2]' + X_D [γ_1 γ_2]' + X_T [τ_1 τ_2]' + X_ST [(ατ)_11 (ατ)_12 (ατ)_21 (ατ)_22]' + Z_A a + e,

where X_S = (I_2 ⊗ 1_2s) assigns the 2s observations of each whole-plot level to α_1 or α_2, X_D = (1_2s ⊗ I_2), X_T and X_ST are the incidence matrices for the subplot factor and the interaction implied by the layout of Figure 2.5, and Z_A = (I_2s ⊗ 1_2). Here α_1 and α_2 denote the fixed effects of the whole-plot factor, τ_1 and τ_2 denote the fixed effects of the subplot factor, and (ατ)_11, ..., (ατ)_22 denote the fixed effects of the interaction between the factors; a_i ~ NIID(0, σ_a²) and e ~ N(0, I_4s σ_e²). Given that the array serves as the experimental unit for S, array effects must be treated as random. Premultiplying the incidence matrices of this model by L, the columns corresponding to the whole-plot factor vanish. In other words, the following log-ratio model cannot test for whole-plot factor effects:

r = 1_2s (γ_1 - γ_2) + (1_s ⊗ [1, -1]') (τ_1 - τ_2) + (I_2 ⊗ 1_s/2 ⊗ [1, -1]') [(ατ)_11 - (ατ)_12, (ατ)_21 - (ατ)_22]' + ε,   [7]

where ε ~ N(0, 2 I_2s σ_e²). For the other fixed effects of interest (the subplot factor, the interaction, and the simple effects of the subplot factor within levels of the whole-plot factor), the tests based on the log-ratio model and the log-intensity model will be equivalent.

Recovery of inter-array information

The analysis of log-ratios is equivalent to performing an intra-array analysis; in other words, there is no choice other than treating array effects as fixed when analyzing log-ratios. On the other hand, one may treat array effects as fixed (intra-array analysis) or random (combined inter- and intra-array analysis) in the analysis of log-intensity data. In this section we describe a method for estimating the amount of recovery of inter-array information from a mixed model analysis of log-intensities (i.e., treating arrays as random).

Consider, for example, the reference design of Figure 2.1 with Equation [1] as the analysis model. Assuming array effects to be fixed would be equivalent to analyzing the difference (log-ratio) between the two samples at each spot. The (intra-array) estimate of the treatment difference is

τ̂_1 - τ̂_2 = (1/n) Σ_i (y_i1l - y_iRl) - (1/n) Σ_i' (y_i'2l - y_i'Rl),   [8]

where the first sum runs over arrays i hybridized with treatment 1 and the second over arrays i' hybridized with treatment 2. Equation [8] shows that the intra-array estimate of the treatment difference is actually equivalent to a log-ratio analysis: instead of analyzing the intensities, we take the log-ratio for each array, r_i = y_ikl - y_iRl, and then fit Equation [2]. Conversely, if the array effects are considered random, two estimates of a treatment difference are available:

a) the inter-array estimate, (τ̂_1 - τ̂_2)^(1) = (1/n) Σ_i (y_i1l + y_iRl)/2 - (1/n) Σ_i' (y_i'2l + y_i'Rl)/2;

b) the intra-array estimate, (τ̂_1 - τ̂_2)^(2) = (1/n) Σ_i (y_i1l - y_iRl) - (1/n) Σ_i' (y_i'2l - y_i'Rl).

Note that in these expressions the choice of treatments (k = 1, 2 in this example) determines the set of arrays (i or i') used in the computation of the contrasts. Also note that the subject index l is entirely determined by the array index i; as before, there is only one subject (R1) for the reference. A single, minimum-variance estimate of the treatment difference can be obtained by combining the two estimates with weights inversely proportional to their respective variances, which are functions of σ_e² and σ_a²:

τ̂_1 - τ̂_2 = [w^(1) (τ̂_1 - τ̂_2)^(1) + w^(2) (τ̂_1 - τ̂_2)^(2)] / (w^(1) + w^(2)).   [9]

A mixed model (Equation [1] with random array effects) may be used to obtain this combined intra- and inter-array estimate of the treatment difference. Kerr (2003a, 2003b) presents expressions to compute the variance of a treatment contrast in this design assuming either fixed or random array effects. Specifying arrays as random facilitates a more efficient estimation of the treatment difference. For more complex designs, the analytical derivation of the contrast variance under a mixed model may be unwieldy, although a numerical approach is possible using linear mixed model software (Rosa et al., 2005; Tempelman, 2005).
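A tiny sketch (with made-up numbers) of the kind of precision weighting implied by equation [9]; the function and its arguments are introduced here for illustration only:

def combine(est_inter, var_inter, est_intra, var_intra):
    # inverse-variance (precision) weights, as in equation [9]
    w1, w2 = 1.0 / var_inter, 1.0 / var_intra
    return (w1 * est_inter + w2 * est_intra) / (w1 + w2)

# hypothetical estimates: the more precise intra-array estimate dominates the average
print(round(combine(est_inter=0.9, var_inter=0.4, est_intra=1.1, var_intra=0.1), 3))  # 1.06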
For more complex designs, the analytical derivation of the contrast variance under a mixed model may be unwieldy, although a numerical approach is possible using linear mixed model software 46 (Rosa et al., 2005; Tempelman, 2005). This procedure has been used to compute the relative efficiency of two designs, but can also be applied to compute the relative efficiency of two models to estimate the treatment difference. Assume an arbitrary linear mixed model (in the log-intensity scale): y = X11 + Zu + a , where y is a vector containing the elements ygijlk, fl is a vector of fixed effects (e.g. dye and treatment effects), 11 is a vector Of random effects (e. g. array and biological replicate), a is a vector of residuals, and X and Z are design-specific incidence matrices. For n and e i:l~w~llillii Jill where G is a diagonal variance-covariance matrix for the random effects with elements we assume: 02:2 and 07,2 and R=loez, so y~N(X|3,V), with V=ZGZ’+R. In this context, the comparison. of experimental designs may be performed in terms of their relative efficiency to estimate a treatment difference. Given two alternative designs, A and B, their relative efficiency may be expressed as: A _ Var(z'1 — 2'2 )3 REAzB [10] Var(t'1 "' 1'2 )A If RE)”; is > 1 implies that design A is more efficient than design B to estimate the treatment difference. The variances under each design may be estimated analytically (Kerr, 2003a) or numerically (Stroup, 2002) depending on the particular designs and 47 models under consideration. In general, given design A with fixed and random effects incidence matrices X4 and ZA, respectively, and a contrast vector K’, such that, for example, K13: 1'1 — 72 specifies the estimated treatment difference, then Var(71 42),, = K' [X'A(z,,,(;z'A + R)‘1x,,]_ K. To estimate the increase in efficiency due to recovery of inter-array information, the equation [8] is used replacing Var(z'1 —t'2)A with the variance of the estimated treatment difference from the fixed array effects model and Var(t”'2 —f'l)3 with the contrast variance derived from the mixed model as follows: Var(’t'1—z'2)3 = K. [X'B(ZBGZ'B +R)_1XB]_ K . For example, given a common reference design, the relative efficiency of the mixed model to the fixed array effects model is illustrated in Figure 2.6. The maximum increase in efficiency occurs when the biological and array variance are small compared to the residual variances. For any variance ratio close to 1, we expect a moderate increase in efficiency on the order of 20% to 30% whereas for biological or array variances more than four times the residual variance, the gain in efficiency will be negligible. Analytical calculations are not possible in the connected loop design comparison such that we use a numerical approach to estimate the contrast variances (Table 2.2). The mixed model procedure (PROC MIXED) of SAS was used to compute the contrast variances for both models conditional on the variance components ratios 03 303 and 48 0g :03. For this design, the maximum value for the relative efficiency is 1.33 and occurs when both biological and array variance components are zero. If the magnitude of the residual variance is on the same order of magnitude of the other variance components (variances ratios equal to one), the recovery of inter block information will be less that a 5% increase in efficiency. Discussion Linear mixed models are commonly used for the analysis of two color microarray data. 
Discussion

Linear mixed models are commonly used for the analysis of two-color microarray data. Software packages are oriented either towards the analysis of log-intensities or towards log-ratios. Previous studies have compared log-ratio and log-intensity models for designs with only one level of replication (Kerr, 2003a, b; Vinciotti et al., 2005). We have extended these comparisons to experiments with technical and biological replication. An appropriate log-ratio model can be derived from a linear transformation of the log-intensity model specifying various levels of technical and biological replication. By premultiplying the model components by a contrast matrix, we have shown that for some designs (connected loop, dye swap) there is an equivalent linear model on the log-ratio scale. For the balanced block design, the two approaches are always equivalent. In particular, a balanced block design without technical replication can be analyzed on either of the two scales using fixed effects models, with identical results. If technical replication is present, the random effects of the sample (in the log-intensity model) or of the sample pair (in the log-ratio model) should also be included. Moreover, treating the biological replicates as fixed effects does not produce the correct tests, as we have shown in Table 2.1.

If more than two treatments are included in the design (for example, the loop or reference designs), the derived log-ratio model will be equivalent to the log-intensity model based on fixed array effects. However, we have also demonstrated that the analysis of log-ratios is sometimes not useful, as it is not possible to estimate certain effects (for example, the whole-plot factor in a split-plot design). This situation can be considered an extreme case of information loss in the intra-array analysis. Analyzing log-ratios rather than log-intensities may involve a loss of information, particularly if array effects are properly treated as random in the latter. The mixed model with random array effects will tend to be more efficient than the fixed-array-effects model for log-intensities (which is equivalent to the log-ratio model), as we have illustrated for a common reference design and a connected loop design. In those cases, the amount of recovery of inter-array information is a function of the variance ratios. A large residual variance relative to both the array and biological variances leads to the maximum information recovery using mixed model analyses in the common reference design. Similarly, for the connected loop design, if either the array or the biological variance is small compared to the residual variance, the recovery of inter-array information is maximized using the mixed model analysis.

It is commonly perceived that two-color microarrays yield a measure of relative expression and that, consequently, ratios are to be preferred over absolute intensities. However true this argument is, it does not invalidate the use of log-intensity models, provided that array effects are specified. Moreover, for a simple design (a dye-swap design without technical replication), the log-ratio and log-intensity models are completely equivalent. This is analogous to the comparison of a paired t-test and the ANOVA for a randomized complete block design with two treatments: the paired t-statistic is the square root of the F-test from the ANOVA, and both yield identical P-values. Another point to consider is the effect of normalization on the model comparison. In this paper we assumed the data to be properly normalized as well as log-transformed.
In general, normalization is done on the log-ratio scale using intra-array LOESS (Yang et al., 2002), from which normalized log-intensities are derived. Consequently, there is no difference induced by the normalization procedure applied before fitting either log-intensity or log-ratio models. Other methods of normalization, for example quantile normalization, may apply differently to ratios or intensities, and that could be an additional source of differences.

To conclude, we have shown that for complex designs a log-ratio model can be derived from a linear transformation of the corresponding log-intensity model. However, log-intensity models are, in general, easier to elicit, more flexible, and potentially more efficient than log-ratio models. Analyses of log-ratios may still be useful for simple designs, especially for slightly improving computational tractability, if there is no expected recovery of inter-array information from analyzing log-intensity data.

Table 2.1. Expected mean squares of relevant terms in models [4] and [5]. In both cases, model [4] is assumed to be the data-generating model. σ_e² is the residual variance on the log-intensity scale; σ_b² is the individual (biological) variance on the log-intensity scale.

              Model [4]                   Model [5]
EMS(Trt)      σ_e² + 2σ_b² + Q(Trt)       2σ_e² + 4σ_b² + Q(Trt)
EMS(B)        σ_e² + 2σ_b²                -
EMS(error)    σ_e²                        2σ_e²

Table 2.2. Relative efficiency of the intra-inter-slide (random array) analysis compared to the intra-slide (fixed array) analysis for the loop design (Figure 2.4). σ_e² is the residual variance on the log-intensity scale; σ_b² is the individual (biological) variance on the log-intensity scale. Rows: σ_a²:σ_e²; columns: σ_b²:σ_e².

              0       1/4     1/2     1       2       4       inf
0             1.33    1.22    1.17    1.11    1.07    1.04    1.00
1/4           1.22    1.15    1.12    1.08    1.05    1.03    1.00
1/2           1.17    1.12    1.09    1.06    1.04    1.02    1.00
1             1.11    1.08    1.06    1.04    1.03    1.01    1.00
2             1.07    1.05    1.04    1.03    1.02    1.01    1.00
4             1.04    1.03    1.02    1.01    1.01    1.01    1.00
inf           1.00    1.00    1.00    1.00    1.00    1.00    1.00

Figure 2.1. Common reference design. B_kl is the lth biological replicate within the kth group.

Figure 2.2. Complete block design with dye swap. B_kl is the lth biological replicate within the kth group.

Figure 2.3. Dye swap design with technical replication. B_ij is the jth biological replicate within the ith group.

Figure 2.4. Connected loop design. B_kl is the lth biological replicate within the kth group.

Figure 2.5. Split-plot design. B_kls is the lth biological replicate within the kth level of factor T receiving the sth level of factor S. Both samples on an array receive the same level of the whole-plot factor S (arrays are the experimental units for S), while independent biological replicates within each array receive the two levels of the subplot factor T.

Figure 2.6. Relative contrast variance under different assumptions for the array effect in reference designs. σ_B² is the biological variance, σ_A² is the array variance and σ_e² is the residual variance.
CHAPTER THREE

LINEAR MIXED MODELS FOR THE ANALYSIS OF RELATIVE QUANTIFICATION OF RT-PCR DATA

ABSTRACT: Quantitative reverse transcription polymerase chain reaction (qRT-PCR) is currently viewed as the most precise technique to quantify levels of messenger RNA. Relative quantification compares the expression of a target gene under two or more experimental conditions, normalized to the measured expression of a control gene. The statistical methods and software currently available for the analysis of relative quantification RT-PCR data lack the flexibility and statistical properties needed to produce valid inferences in a wide range of experimental situations. In this paper we present a novel method for the analysis of relative quantification RT-PCR data, which consists of the analysis of cycle threshold (CT) values for a target and a control gene using a general linear mixed model. Our method allows testing of a broader class of hypotheses than traditional analyses such as the comparative CT method. For all possible pairwise comparisons, the estimated fold change was the same using either linear mixed models or a comparative CT method, but a simulation study indicated that the linear mixed model approach is more powerful. In summary, the method presented in this paper is more accurate, powerful and flexible than the traditional methods for the analysis of RT-PCR data. This new method will be especially useful for studies involving more than two treatments and multiple experimental factors.

1. Background

Reverse transcription (RT), followed by quantitative polymerase chain reaction (qPCR), is currently viewed as the most accurate, sensitive, and specific technique to quantify levels of messenger RNA (Bustin, 2000). At present, several instruments and chemistries are available for the implementation of this technique, all of which rely on the same fundamental principle (Bustin, 2000). This principle consists of the specific amplification of cDNA from a target transcript over several cycles of PCR, coupled with the measurement of a fluorescence intensity assumed to be directly proportional to the amount of product at each cycle (Giulietti et al., 2001). This methodology has been extensively validated, and its accuracy and specificity have been proven for the different chemistries available (Winer et al., 1999).

The quantitative output of RT-PCR consists of an amplification curve, composed of a set of cycle numbers and associated fluorescence intensities, which is ultimately summarized in a single value called cycles to threshold (CT). The CT is a unitless value defined as the fractional cycle number at which the sample fluorescence signal passes a fixed threshold above the baseline. Because the threshold is arbitrarily set within the exponential amplification phase, the CT is inversely proportional to the log of the initial transcript copy number (or log-transcript concentration) of the assayed sample. The constant of proportionality of the CT to the log-concentration is determined by the amplification efficiency (E).

Absolute and relative quantification strategies can be applied to measure mRNA abundance using qRT-PCR (Giulietti et al., 2001; Johnson et al., 2000). Absolute quantification allows a direct comparison of expression between different treatments, but it is more costly than relative quantification. This is especially the case when several genes are profiled in one experiment, requiring the fit of a standard curve for each target gene.
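To make the CT-concentration relationship explicit (a short derivation consistent with the definitions above; the symbols N_0 for the initial copy number and N_T for the copy number at the threshold are introduced here for illustration), exponential amplification gives

N_T = N_0 \, E^{C_T} \quad \Longrightarrow \quad C_T = \frac{\log N_T - \log N_0}{\log E} = \text{constant} - \log_E N_0,

so the CT decreases linearly in the log of the initial copy number, with a slope governed by the amplification efficiency E.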
On the other hand, relative quantification is more practical to implement on a large scale, but its statistical analysis remains a challenge. Relative quantification compares the expression of a target gene under various conditions (treatments), normalized to the measured expression of an internal control gene (Pfaffl, 2001), which is assumed to be constantly expressed across samples. In general, the numerous mathematical expressions available for this calculation (Gentle et al., 2001; Liu and Saint, 2002; Livak and Schmittgen, 2001; Marino et al., 2003; Muller et al., 2002; Ramakers et al., 2003; Swillens et al., 2004; Tichopad et al., 2004; Tichopad et al., 2003; Tichopad et al., 2002) may be summarized by equation (1) (Pfaffl, 2001):

FC_trt1:trt2 = (E_Target)^{ΔCT(Target)(trt2 - trt1)} / (E_Control)^{ΔCT(Control)(trt2 - trt1)},   (1)

where FC_trt1:trt2 is the relative expression (fold change) of the target gene in a sample from treatment 1 compared to a sample from treatment 2, E_Target and E_Control are the amplification efficiencies of the target and the control genes, respectively, and ΔCT(Target)(trt2 - trt1) is the CT of treatment 2 minus the CT of treatment 1 for the target gene (and analogously for the control gene). If both amplification efficiencies take the maximum possible value (E = 2), expression (1) becomes the familiar 2^(-ΔΔCT) expression (Livak and Schmittgen, 2001). Moreover, almost any other mathematical expression or method available in the literature to calculate the fold change is a variant of expression (1). The differences among the variants of equation (1) refer mainly to the estimation of the efficiency, either from a relative standard curve (Johnson et al., 2000; Pfaffl, 2001) or from the individual amplification curves (Gentle et al., 2001; Swillens et al., 2004).

The methods based on expression (1) are mathematical equations devised to calculate the fold change between two samples. These equations, however, lack the statistical formalism needed to draw valid inferences, especially when multiple biological replicates from each experimental group are assayed (Gentle et al., 2001; Marino et al., 2003; Pfaffl et al., 2002). Moreover, many ad hoc approaches associated with formulas similar to (1) have been used with the objective of generating a set of "companion" p-values or standard errors (Livak and Schmittgen, 2001; Muller et al., 2002). However, few of them are valid in the presence of both biological and technical replication. Currently, the REST software (Pfaffl et al., 2002) is one of the few programs that implements a valid statistical analysis to test hypotheses and estimate fold changes using expression (1). However, such software is limited to the simplest kind of experimental design, because it can analyze only pairwise comparisons among groups under a completely randomized design. A linear mixed model (Cook et al., 2004) was recently proposed for the implementation of the so-called analytical method (Marino et al., 2003). Such a model is potentially more flexible than the existing alternatives, but it makes the strong assumption that there is a common random effect for the control and test genes in each biological replicate. Consequently, there is a need for a formal statistical method for the analysis of relative quantification RT-PCR data that accommodates more complex experimental designs (such as blocking factors) and the testing of general hypotheses (including interactions, and pairwise and group contrasts).
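A small sketch (hypothetical CT values; the function name and arguments are introduced here only for illustration) of the fold-change calculation in equation (1); with both efficiencies equal to 2 it reduces to the 2^(-ΔΔCT) form:

def fold_change(ct_target_trt1, ct_target_trt2, ct_control_trt1, ct_control_trt2,
                e_target=2.0, e_control=2.0):
    d_ct_target = ct_target_trt2 - ct_target_trt1      # delta-CT of the target gene
    d_ct_control = ct_control_trt2 - ct_control_trt1   # delta-CT of the control gene
    return e_target ** d_ct_target / e_control ** d_ct_control

# Hypothetical CTs: the target crosses the threshold one cycle earlier in treatment 1,
# the control gene is unchanged -> about two-fold higher expression in treatment 1.
print(fold_change(24.0, 25.0, 15.0, 15.0))   # 2.0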
The objective of this paper is to present a novel, flexible method for the analysis of relative quantification of RT-PCR data using linear mixed models. A variety of approaches are used to validate the proposed methodology, to compare it with existing methods, and to illustrate its flexibility. First, our model is compared to other alternatives using a real dataset; second, a model-free simulation based on that dataset is used for comparative validation of the methodology; third, several datasets are analyzed and different linear models are compared; and lastly, the results from comparative CT and linear models are compared in those datasets.

2. Results

2.1 Motivating example

Quantitative RT-PCR was used to study the expression of the gene diazepam binding inhibitor (DBI) in the brain of piglets subjected to weaning and social isolation (Poletto et al., 2006). The experimental layout followed a completely randomized block design (n = 3 litters) and the treatments consisted of a 2 x 2 factorial combination of weaning (early weaned or non-weaned) and social isolation (isolated or control). Preliminary assays indicated that Sus scrofa 18S ribosomal RNA (18S) was suitable for use as an endogenous control gene. The estimated amplification efficiency of the primers of the two genes (18S and DBI) was close to two (Table 3.1, Supplementary Material). All reactions were performed in triplicate, but some observations were excluded from the analysis because of evidence of non-specific amplification (as revealed by dissociation curve analyses) (Giglio et al., 2003). Model (2) was used for the analysis of the expression of DBI normalized to the expression of 18S:

$$y_{gijkr} = TG^{*}_{gi} + l_{gj} + B_{gijk} + D_{ijk} + e_{gijkr}, \qquad (2)$$

where $y_{gijkr}$ is the CT obtained from the thermocycler software for the g-th gene from the r-th well, corresponding to the k-th animal in the j-th litter subjected to the i-th treatment; $TG^{*}_{gi}$ is the mean of treatment i in the expression of gene g (18S and DBI); $l_{gj} \sim N(0, \sigma^{2}_{lg})$ is a gene-specific random effect of the j-th litter; $B_{gijk} \sim N(0, \sigma^{2}_{Bg})$ is a gene-specific random effect of the k-th piglet in the j-th litter; $D_{ijk} \sim N(0, \sigma^{2}_{D})$ is a random sample-specific effect (common to both genes); and $e_{gijkr} \sim N(0, \sigma^{2}_{e})$ is a residual term. The sample-specific effect, $D_{ijk}$, captures differences among samples that are common to both genes, particularly those that affect total mRNA concentration, such as differential extraction or amplification efficiencies among samples. The treatments consisted of the combination of two factors, and the sub-index i = 1, 2, 3, 4 corresponds to: early weaning + control (EWC), early weaning + isolation (EWI), non-weaning + control (NWC) and non-weaning + isolation (NWI), respectively.

Model (2) was fit to the data using the SAS mixed procedure (Littell, 1996) and a residual analysis was performed to check the parametric assumptions of the model. Tests of differential expression among groups were performed for the interaction of weaning by isolation and for pairwise treatment differences (simple effects). Point and interval estimates of the fold changes were approximated from the linear contrasts (in the log scale) by back-transformation. The fold changes were also estimated with the $2^{-\Delta\Delta CT}$ method (Livak and Schmittgen, 2001) (ΔCT) using the two procedures presented in the original work (Poletto et al., 2006).
In the first variant (ΔCT[1]), equation (1) (assuming E = 2) was applied to each pair of littermates and the resulting pairwise fold changes were then averaged. In the second variant (ΔCT[2]), equation (1) was applied to the averaged pairwise CT difference among littermates. A paired t-test was used to assess the significance of the fold change calculated with ΔCT[2]; no statistical tests were performed on the fold change obtained with ΔCT[1] because there were insufficient replications to implement the suggested non-parametric test (Ben Ahmed et al., 2004; Kishimoto et al., 2004; Martell et al., 1999; Pellagatti et al., 2003; Rhoden et al., 2004). Finally, model (3) was also used to analyze the data:

$$y_{gijkr} = TG^{*}_{gi} + l_{j} + D_{ijk} + e_{gijkr}. \qquad (3)$$

Model (3) is a simplified version of Model (2) without the gene-specific random sample and litter effects, and is equivalent to a previously published model for the analysis of amplification curve data (Cook et al., 2004).

2.2 Testing and estimating differential expression

There was no evidence of an interaction effect between isolation and weaning on the expression of DBI (P = 0.829), but there was a significant three-fold decrease in DBI expression due to isolation (P = 0.003). The traditional analysis method (ΔCT[2]) does not allow testing of this interaction, but it may still be used to estimate the fold change of pairwise comparisons (Fig. 3.1). The estimates of fold change were similar using Model (2), Model (3) and ΔCT[2], but the estimates were slightly larger (in absolute terms) using ΔCT[1] (Fig. 3.1). For instance, in the NWI-NWC comparison, the fold change from Models (2), (3) and ΔCT[2] was 0.35 (i.e., suppression of 2.9 fold), while ΔCT[1] yielded an estimate of 0.40 (i.e., suppression of 2.5 fold). The confidence intervals for the fold changes based on ΔCT[2] were wider than those based on Models (2) and (3), and the general conclusions were not equivalent. For example, Models (2) and (3) indicated a significant decrease in the expression of DBI in response to social isolation in both early-weaned and non-weaned animals (P = 0.013 and P = 0.019, respectively, from Model (2)), while ΔCT[2] only detected EWI - EWC as significant (P = 0.03). At a significance level of α = 5%, Model (2) and Model (3) yielded the same conclusions, but the confidence intervals were narrower for Model (3). Thus, depending on the significance level adopted, the conclusions might differ.

2.3 Validation through simulation

A simulation experiment was used to validate and compare alternative analysis methods. A fair simulation study in this case precluded the use of any of the analysis models as the data generation process. Alternatively, we permuted the real data to generate a population of 1000 datasets with known fold changes, while keeping the original data structure, distribution and variability. The simulated datasets were analyzed using four methods: Model (2), ΔCT[1], ΔCT[2] and Model (3).

Table 3.1 summarizes the point estimates of the fold change obtained for each contrast using the four analysis methods. Model (2), Model (3) and ΔCT[2] yielded unbiased estimates of the fold change, and the three methods had roughly the same mean square error. On the other hand, ΔCT[1] produced biased estimates and also showed a larger mean square error. The increase in the mean square error for the ΔCT[1]-based estimates was due to both their bias and their larger variance.
Moreover, the bias was always upwards and, consequently, over-expression was exaggerated and down-regulation was understated. For example, the expected fold change for the contrast EWI-EWC was 2.0, but the average estimate from ΔCT[1] was 2.2. Conversely, for the contrast EWC-NWC the true fold change was 0.707, but ΔCT[1] produced an average fold change estimate of 0.805.

Confidence intervals (95%) for the fold changes were computed using Model (2), Model (3) and ΔCT[2] (Table 3.2). The narrowest confidence intervals corresponded to Model (3), followed by Model (2) and ΔCT[2]. Nevertheless, a further analysis of the real coverage of these "nominal" 95% confidence intervals (values within parentheses in Table 3.2) revealed that Model (3) yielded intervals with significantly less coverage than the other two methods. The real coverage of confidence intervals obtained from Model (3) was well below the nominal 95% confidence level. Confidence intervals calculated from Model (2) exhibited the closest coverage to the nominal level. The coverage obtained from ΔCT[2]-derived confidence intervals was also close to 95%, but the width of those confidence intervals was considerably larger.

Model (2) and Model (3) allowed testing general linear hypotheses related to the interaction and the main effects (including pairwise comparisons of treatments). Conversely, ΔCT[2] only allowed single-effect contrasts between pairs of treatments (Table 3.3). Under the null hypothesis (contrast NWI - NWC in Table 3.3), Model (2) and ΔCT[2] yielded a type I error rate very close to the nominal 5% test value, and the discrepancies observed in Table 3.3 for the ΔCT[2] method were within the expected simulation error (based on extensive simulations not shown in this paper). The realized type I error rate of tests from Model (3) was clearly above the nominal error level (0.07 for α = 0.01 and 0.18 for α = 0.05). The histograms of the p-values (Figure 3.2) show in more detail the anomalous distribution of the p-values from Model (3). The expected distribution of p-values for a series of independent tests under the null hypothesis is uniform over the interval [0, 1]. While the p-values from Model (2) and ΔCT[2] exhibited the typical (uniform) distribution expected under the null hypothesis, the inverted-J shape in the Model (3) panel of Figure 3.2 reveals an excess of smaller p-values.

Under the alternative hypothesis (Table 3.3, all comparisons except NWI - NWC), Model (3) showed the highest probability of declaring significant a fold change larger than 1, but part of this apparent power comes from an inflated type I error rate, as shown before. Model (2) was more powerful than ΔCT[2] and, in absolute terms, the increase in power was more evident for larger fold changes.

2.4 Model checking and result comparison in experimental datasets

Although Model (2) is suitable only for the analysis of the described dataset, an equivalent model can be elicited for any specific data structure or design layout. The main components of Model (2) are the random sample effects and the random interaction between the sample and gene factors. Moreover, gene-specific variances are assumed for the sample-gene interaction. The measurement error term (residual effects) is assumed homoskedastic with respect to genes.
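To make this variance structure concrete, the short sketch below simulates CT data with the components assumed by Model (2): a sample effect shared by both genes, gene-specific litter and sample-gene effects with gene-specific variances, and a homoskedastic residual. All variance components, means and design sizes are hypothetical values chosen for illustration, not estimates from the piglet data.

```python
# Simulation sketch of the variance structure assumed by Model (2).
# All variance components, means and design sizes are hypothetical.
import numpy as np

rng = np.random.default_rng(1)
genes = ["18S", "DBI"]
n_trt, n_litter, n_rep = 4, 3, 3                 # treatments, litters, wells per gene
mean_ct = {"18S": 15.0, "DBI": 26.0}             # TG*, kept flat across treatments here
var_litter = {"18S": 0.05, "DBI": 0.10}          # gene-specific litter variances (l_gj)
var_gene_sample = {"18S": 0.02, "DBI": 0.20}     # gene-specific sample-gene variances (B_gijk)
var_sample, var_resid = 0.30, 0.05               # shared sample effect (D_ijk) and residual (e)

records = []
for j in range(n_litter):
    l = {g: rng.normal(0.0, var_litter[g] ** 0.5) for g in genes}
    for i in range(n_trt):                       # one animal per litter-by-treatment cell
        d = rng.normal(0.0, var_sample ** 0.5)   # common to both genes within a sample
        b = {g: rng.normal(0.0, var_gene_sample[g] ** 0.5) for g in genes}
        for g in genes:
            for r in range(n_rep):
                ct = mean_ct[g] + l[g] + b[g] + d + rng.normal(0.0, var_resid ** 0.5)
                records.append((g, i, j, r, round(ct, 2)))

print(records[:4])   # (gene, treatment, litter, well, CT)
```

Dropping the gene-specific term b (and making the litter effect common to both genes) produces data with the simpler structure implicit in Model (3), which is why omitting the gene-by-sample effect in the analysis inflates the type I error when that component is actually present.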
To test the adequacy of these assumptions in a broader set of experimental data (Abruzzo et al., 2005; Coussens et al., 2003; Coussens et al., 2004; Peirson et al., 2003; Poletto et al., 2006; Szabo et al., 2004), six different datasets were analyzed. Details of the datasets are presented in Table 3.4. The datasets included one to 63 test genes and four to 80 biological samples. All datasets but one included technical replicates (assay replicates). Twelve alternative models were compared using the Akaike information criterion and the Bayesian information criterion (Schwarz, 1978). The models represented the combinations of different assumptions: sample-specific random effect (included or not), sample-gene random interaction (homoskedastic, heteroskedastic, or not included in the model), and residual variance (homogeneous or heterogeneous across genes).

The effects included in the best-fit model for each dataset are shown in the Supplementary Material. A random sample effect was present in all models. Similarly, the gene-by-sample interaction with heterogeneous variances among genes was selected for almost all the datasets. The only exception was the MRD (Peirson et al., 2003) dataset, where the model with homogeneous variances was preferred. A possible explanation for this is that the dataset included only two genes (control and test), which may not have shown enough heterogeneity of variances to select the alternative model, while the inclusion of more test genes increases the chance of having differential variances of gene expression. Gene-specific residual variances were also generally favored by the model selection criteria. In the TLD (Abruzzo et al., 2005) dataset, heterogeneous residual variances could not be fit due to convergence problems. In the SHK (Szabo et al., 2004) dataset, the residual term included both the residual and the sample-gene interaction effects of the other models because the dataset lacks technical replicates. Consequently, the model with heterogeneous residuals indicates the presence of a gene-by-sample interaction with heterogeneous variances, heterogeneity of variance in the measurement errors, or both. In the remaining datasets, the heterogeneity of residual variances was caused by different (gene-specific) precisions for the measurement of gene expression. Disentangling the sources of such heterogeneity is beyond the scope of this paper, but we anticipate that differential amplification efficiencies may be one of the causes.

Except for a few subtle differences among the models selected for each dataset, the general model including sample and gene-sample random effects was always preferred. The inclusion of heterogeneous residual variances had only a marginal effect on the tests for differential expression (results not shown). In contrast, omitting the sample-gene effect from the models increased the type I error rates over the nominal value (as shown in the previous sections).

2.5 Comparison of results with 2^(-ΔΔCT) methods

The datasets were used to compare the estimation of ratios of expression between groups using the linear mixed model and the 2^(-ΔΔCT) methodology (ΔCT). For each comparison of interest (Supplementary Material), the point estimates and 95% confidence intervals of the log-ratio of gene expression were obtained using the linear mixed model and the ΔCT. Figure 3.3a presents the point estimates. As shown, there was an almost perfect agreement between the two model specifications.
This result can be generalized to other gene expression datasets, so that the linear mixed model and the ΔCT methodology are expected to produce very similar point estimates of fold changes. Figure 3.3b presents the results for the width of the confidence intervals. In general, there are more points above the indifference line than below it. This implies that the confidence intervals for log-ratios obtained using the linear mixed model tended to be shorter than those obtained with the ΔCT method. Nevertheless, there were differences among datasets. In general, the datasets involving only two groups (MRD (Peirson et al., 2003) and SHK (Szabo et al., 2004)) showed very similar lengths of confidence intervals for the two methods. In contrast, datasets involving multiple groups tended to show markedly shorter confidence intervals with the linear mixed models. An exception was the SPL (Coussens et al., 2003) dataset, for which the width of the confidence intervals did not show a clear pattern between the linear model and the ΔCT for the various contrasts (treatment pairs), possibly due to an apparent heterogeneity of variances among groups.

3. Discussion

In this paper we presented a novel method for the analysis of relative quantification of RT-PCR data. Our approach consists of the analysis of (raw or efficiency-corrected) CT values for a target and a control gene using a general linear mixed model. Currently, the use of qRT-PCR is pervasive in functional genomics studies and the complexity of experimental designs and sampling schemes has increased considerably (Hamalainen et al., 2001; Martinez et al., 2004; Recinos et al., 2004). However, the statistical and mathematical approaches available for the analysis of such data lack the flexibility and statistical properties necessary to produce useful and valid inferences in complex experimental layouts. Conversely, our method is flexible and allows the incorporation of an arbitrarily complex experimental protocol in both the treatment structure (factorial, time courses, etc.) and the sampling scheme (blocks, split-plots, etc.). Furthermore, the linear mixed model allows testing any general linear hypothesis; for instance, in the first real data example presented in this paper, we could test the hypothesis of interaction between social isolation and early weaning in the expression of DBI in the brain of the piglets. In contrast, the traditional analysis method (2^(-ΔΔCT)) could not test the same hypothesis and its application was restricted to pairwise comparisons of treatments. With other datasets, we could test for linear and quadratic trends in time course experiments and for interactions and main effects in a 2x3 split-plot design (results of these contrasts not shown).

For pairwise contrasts, our method (Model (2)), Model (3) and ΔCT[2] produced similar estimates of the fold change, but the other implementation of the 2^(-ΔΔCT) method (ΔCT[1]) produced slightly different estimates. The ΔCT[1] implementation is frequently used in practice, and its significance is usually assessed with non-parametric tests (Ben Ahmed et al., 2004; Kishimoto et al., 2004; Martell et al., 1999; Pellagatti et al., 2003; Rhoden et al., 2004). Such a procedure could not be applied in this case because of the limited sample size.
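The computational difference between the two comparative-CT variants is easy to state in code. The sketch below (hypothetical ΔCT values, one per animal; not code from the original work) contrasts them: because 2^(-x) is convex, averaging the pairwise fold changes (ΔCT[1]) always gives a value at least as large as back-transforming the averaged ΔΔCT (ΔCT[2]), which is consistent with the upward bias reported in the simulation study.

```python
# Sketch of the two comparative-CT variants, on hypothetical littermate-paired
# delta-CT values (delta CT = CT_target - CT_control, one value per animal).
import numpy as np

dct_trt1 = np.array([9.0, 9.4, 8.8])      # e.g., isolated littermates
dct_trt2 = np.array([10.1, 10.2, 10.0])   # e.g., control littermates

# Variant 1 (dCT[1]): apply 2**(-ddCT) to each littermate pair, then average.
fc_pairs = 2.0 ** -(dct_trt1 - dct_trt2)
fc_variant1 = fc_pairs.mean()

# Variant 2 (dCT[2]): average the pairwise ddCT first, then back-transform once.
fc_variant2 = 2.0 ** -(dct_trt1 - dct_trt2).mean()

print(round(fc_variant1, 3), round(fc_variant2, 3))  # variant 1 >= variant 2
```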
For hypothesis testing and interval estimation the methods yielded divergent results: Model (3) showed the highest significance (lower p-values and narrower confidence intervals) and ΔCT[2] showed the least significant results; Model (2) produced intermediate results. These differences are a consequence of the assumptions behind each procedure. In particular, our model assumes a Gaussian distribution of the log expression. It also assumes heterogeneous variances in the expression of the target gene and the control gene, and the presence of sample-specific effects related to the measurement protocol. A priori, all these assumptions are plausible. The assumption of normally distributed log-expression levels has been used extensively (Andersen et al., 2004; Brunner et al., 2004; Szabo et al., 2004). Also, heteroskedastic models for the analysis of several candidate control genes presented a better fit than homoskedastic alternatives (Szabo et al., 2004).

The aforementioned assumptions were reasonable in this experiment; however, we also conducted a simulation study to evaluate the performance of our model. We avoided the use of a parametric model for the simulation by creating a population of datasets using resampling methods. In other words, neither the original distribution of the CT values nor the relative technical and biological variabilities were altered. From this simulation, Model (2) emerged as the best model for the analysis, showing a correct type I error rate and confidence interval coverage. In contrast, Model (3) overstated the significance of the comparisons and its confidence intervals fell short of the nominal coverage. While these results are specific to these data, we think that the inclusion of a gene-sample specific effect separated from a sample-specific effect (as in Model (2)) is more plausible than the sole inclusion of a sample effect common to both genes (as is implicit in Model (3)).

Comparing the simulation results from Model (2) and ΔCT[2], it is evident that the simultaneous analysis of all groups (Model (2)) provides more power than the independent pairwise comparisons (ΔCT[2]). This is not surprising because a pairwise contrast within Model (2) had 6 degrees of freedom while the paired t-test associated with ΔCT[2] had only 2 degrees of freedom. Moreover, the advantage of the mixed model methodology over ΔCT[2] would be larger if more treatments were included. For instance, in a completely randomized design with 10 treatments or groups and 3 biological replicates in each, any t-test between a pair of treatments will have 4 degrees of freedom, while an ANOVA-based F-test will have 20 degrees of freedom. On the other hand, if a certain experiment is restricted to two treatments or groups, both methods will yield identical results.

We validated these conclusions by analyzing a set of experimental data that included different numbers of genes and biological replications. The assumptions regarding random effects were confirmed by model selection in every dataset. Additionally, we found heterogeneous residual variances in most of the datasets. Using the same datasets, the increased overall power of the linear mixed model analysis over the comparative CT for large numbers of groups was illustrated using the width of the log-ratio confidence interval.
In general, comparisons from experiments involving more groups tended to show significantly shorter confidence intervals using the linear mixed model, while comparisons from experiments with two groups yielded equivalent confidence intervals with either methodology.

4. Conclusions

In summary, we have shown the importance of proper modeling of qRT-PCR data for correctly controlling the type I error, and we have provided a general method for the analysis. The most important feature of our modeling approach is the use of (raw or efficiency-corrected) CT data as response variables to conduct a joint analysis of target and control gene expression, modeling the biological and technical variation simultaneously. Furthermore, Model (2) is a single implementation of the linear mixed model approach (the one most appropriate for our real data example), but it is easily extended to fit data from other designs. Finally, our method is more accurate, powerful and flexible than the existing analysis methods, and it is especially useful in studies involving more than two treatments or time points and multiple experimental factors.

5. Methods

5.1 Materials and RT-PCR reactions

Sample collection, mRNA extraction, cDNA synthesis and PCR protocols are described in detail in the Supplementary Material.

5.2 Model derivation

We assume that the expression $z_{gijk}$ (copy number or concentration of mRNA) of gene g in sample k of litter (block) j and experimental group i can be described by:

$$\log(z_{gijk}) = TG_{gi} + l_{gj} + B_{gijk}, \qquad (4)$$

where $TG_{gi}$ is the mean expression of gene g in the i-th treatment, $l_{gj}$ is the random effect of litter on each gene [$l_{gj} \sim N(0, \sigma^{2}_{lg})$] and $B_{gijk}$ is the gene- and sample-specific effect [$B_{gijk} \sim N(0, \sigma^{2}_{Bg})$].

If mRNA is isolated, cDNA is synthesized and qRT-PCR is conducted in several independent wells for each sample in the presence of primers for each of the genes, the generated data may be analyzed with the following model:

$$y_{gijkr} = TG_{gi} + l_{gj} + B_{gijk} + D_{ijk} + e_{gijkr}, \qquad (5)$$

where $y_{gijkr}$ is a measured expression level on the log scale (for example, CT), $D_{ijk}$ is a sample-specific effect introduced by the experimental protocol and $e_{gijkr}$ is a well-specific measurement error.

In Model (5), $D_{ijk}$ represents a measurement artifact that is sample-specific and is assumed to be $D_{ijk} \sim N(d_{i}, \sigma^{2}_{D})$. This implies that the experimental protocol affects the measurement on the sample in the same way for all assayed genes, but it may generate a treatment bias ($d_{i}$). These assumptions (apart from the specific Gaussian distribution) are standard in relative quantification analyses. Moreover, the existence of the $D_{ijk}$ effects is precisely the reason to include a control gene in such assays.

If we assume that $TG_{gi} = \mu + \tau_{gi}$, with $\tau_{gi} = 0$ for the control gene, and we fit Model (5) assuming $D_{ijk} \overset{i.i.d.}{\sim} N(0, \sigma^{2}_{D})$ (i.e., Model (2)), the following values for the $TG^{*}_{gi}$ effects are expected:

$$TG^{*}_{gi} = \mu + \tau_{gi} + d_{i}, \quad \text{for } g = \text{target}, \qquad (6)$$
$$TG^{*}_{gi} = \mu + d_{i}, \quad \text{for } g = \text{control}.$$

5.3 Hypothesis testing and estimation

Suppose that the interest is to estimate the fold change between EWI (i = 2) and EWC (i = 1) for the target gene (g = 2) normalized to the control gene (g = 1). This is equivalent to estimating the log-difference (or log of the fold change) using:

$$dif_{(EWI-EWC)} = (TG^{*}_{22} - TG^{*}_{21}) - (TG^{*}_{12} - TG^{*}_{11}). \qquad (7)$$

It is clear that if (6) holds (i.e.,
there is no differential expression of the control gene), the expectation of (7) is:

$$E[dif_{(EWI-EWC)}] = TG_{22} + d_{2} - TG_{21} - d_{1} - (d_{2} - d_{1}) = TG_{22} - TG_{21}, \qquad (8)$$

which is the quantity of interest. Furthermore, point estimates, hypothesis tests and confidence intervals for (7) are readily available, and the fold-change estimates may be approximated by transforming point and confidence interval limits to the correct scale. For example, if the data are CT values, the fold change estimation formula is:

$$FC_{(EWI-EWC)} = 2^{-dif_{(EWI-EWC)}}. \qquad (9)$$

5.4 Data and response variable

The response variable for Model (5) may be any measure proportional to the log-mRNA concentration in the samples. In our particular example, the amplification efficiency was close to the optimal value (E = 2), and consequently the CT values constituted a suitable response. The Supplementary Material includes a detailed explanation of an alternative response variable for when the amplification efficiency is smaller than two.

5.5 Programs for analysis

The models implemented in this paper can be readily fit using mixed model software. In particular, SAS code is available in the Supplementary Material.

5.6 Simulation study

From the expression data (raw CT) for the target (DBI) and the control (18S) genes, we computed the arithmetic mean of each treatment (averaging over all the available biological and technical replicates), subtracted the corresponding average treatment value from each individual observation and added the general mean. The result of this procedure is a dataset that keeps the original variability among litters and among technical replicates, but has a common mean for all treatments. Subsequently, the data were reshuffled to create 1000 datasets. Within each litter, the treatment memberships were permuted among the four treatments, but the technical replicates were kept together. Then, the observations corresponding to EWC animals were increased by the value 0.5 and the observations corresponding to EWI animals were decreased by the value 0.5. Consequently, the resulting population of trials had roughly the same biological and technical variability as the original data, but a known fold change for each treatment pair (second column in Table 3.1).

The analysis of this population of datasets provided a set of 1000 p-values for each of the hypotheses tested. The type I error rate was estimated as the rate of rejections (at a given nominal α) in the comparison NWI-NWC. Conversely, power was estimated by counting the number of rejections for each non-null hypothesis. The coverage of the confidence intervals was estimated from the proportion of intervals that contained the true fold change value.

5.7 Model selection

A detailed description of the models used for each dataset of Table 3.4 for model selection and result comparison is presented in the Supplementary Material.

Table 3.1. Properties of the point estimates of the fold change. Each cell presents the mean estimate with its mean squared error (in parentheses) from 1000 simulations; E(FC) is the expected (true) value of the fold change for each comparison.

Contrast      E(FC)   Model (2)        ΔCT[1]           ΔCT[2]           Model (3)
EWI - EWC     2.000   2.059 (0.3922)   2.220 (0.4958)   2.059 (0.3959)   2.054 (0.3720)
EWI - NWI     1.414   1.442 (0.1588)   1.558 (0.2004)   1.443 (0.1602)   1.440 (0.1503)
EWC - NWC     0.707   0.744 (0.0470)   0.805 (0.0618)   0.744 (0.0474)   0.742 (0.0443)
NWI - NWC     1.000   1.054 (0.0925)   1.140 (1.3200)   1.054 (0.0937)   1.051 (0.0875)
Table 3.2. Properties of the interval estimates of the fold change. Each column presents the mean width of the 95% confidence interval for the fold change, with the actual coverage of the interval in parentheses. Desirable properties are coverage close to 95% and a small interval width.

Contrast      Model (2)       ΔCT[2]          Model (3)
EWI - EWC     3.03 (93.7%)    6.03 (91.0%)    1.63 (79.8%)
EWI - NWI     2.12 (95.4%)    4.34 (94.1%)    1.14 (84.2%)
EWC - NWC     1.09 (94.7%)    2.24 (93.0%)    0.58 (81.4%)
NWI - NWC     1.55 (94.7%)    3.22 (92.9%)    0.83 (82.0%)

Table 3.3. Properties of the hypothesis tests. p: proportion of rejected tests from the 1000 simulated datasets for each contrast at two significance levels (α). The NWI-NWC contrast corresponded to the null hypothesis (no differential expression) and the expected value is p = α. For contrasts other than NWI-NWC, a larger value of p implies more power. The column indicates the analysis method.

                    α = 1%                                α = 5%
Contrast         Model (2)   ΔCT[2]   Model (3)        Model (2)   ΔCT[2]   Model (3)
EWI - EWC        0.157       0.058    0.700            0.439       0.257    0.872
EWI - NWI        0.030       0.026    0.246            0.131       0.098    0.415
EWC - NWC        0.031       0.026    0.276            0.152       0.104    0.429
NWI - NWC        0.010       0.029    0.072            0.053       0.071    0.180
wean. x isol.    0.075       -        0.416            0.249       -        0.584

Table 3.4. Experimental data for model checking and result comparison. The abbreviations are those used in the text to refer to each dataset.

Dataset name                     Experimental design                                                                      Number of genes   Biological samples   Assay replicates
SPLIT (Coussens et al., 2003)    Split-plot design. Main-plot factor: disease status. Subplot factor: infection.                 2                12                  2
TC (Coussens et al., 2004)       Longitudinal time course experiment. Timepoints: 0, 2, 4, 8, 16 hours after infection.          5                 4                  2
PFC (Poletto et al., 2006)       2 x 2 factorial in a randomized complete block design. Factors: weaning; isolation.             5                12                  3
MRD (Peirson et al., 2003)       Completely randomized design with two groups.                                                   2                 8                  3
TLD (Abruzzo et al., 2005)       Completely randomized design with two groups.                                                  64                 9                  4
SHK (Szabo et al., 2004)         Completely randomized design with one group.                                                    6                80                  1

Figure 3.1. Fold change estimates. The log2 fold changes for four contrasts (abscissa) are presented for Model (2), ΔCT[1], ΔCT[2] and Model (3); the fold change scale is included on the right axis. Segments indicate the 95% confidence interval. Comparisons whose confidence interval includes the value 0 (1 on the fold change scale) are not significant at α = 5%. A confidence interval or significance level for ΔCT[1] could not be calculated.

Figure 3.2. Histograms of p-values under the null hypothesis (panels: ΔCT[2], Model (2) and Model (3)). Under the null hypothesis, the p-values are expected to follow a uniform distribution over the [0, 1] interval. The departure from that distribution for Model (3) indicates an excess of false positives (higher frequencies for smaller p-values).

Figure 3.3.
Result comparison between the ΔCT (vertical axis) and the linear mixed model (horizontal axis). (a) Log fold changes. (b) Length of the confidence interval for the log fold changes. The 1:1 line is represented by dots. Comparisons in each dataset are indicated by different symbols (TLD, SPLIT, TC, MRD and PFC).

CHAPTER FOUR

DESIGN AND ANALYSIS OF TWO-STAGE EXPERIMENTS FOR TRANSCRIPTIONAL PROFILING

1. Introduction

Gene expression microarrays are powerful tools for screening the transcription profiles of thousands of genes simultaneously on any particular sample. Despite continuous improvements in the sensitivity and dynamic range of microarrays, this technology is still regarded as lacking some precision. This limitation has prompted genomics researchers to validate their microarray results using an independent technique, typically quantitative reverse transcription polymerase chain reaction (qRT-PCR), although some authors have expressed skepticism about the necessity of doing so (Rockett, 2003; Rockett and Hellmann, 2004). At any rate, in some scientific journals such validation is mandatory, for example in Circulation Research and in Arthritis and Rheumatism (Rockett, 2003).

Allison et al. (2006) recently posed the important question of what validation is and how it should be performed. We assume that the ultimate objective of gene expression technologies is to efficiently search for genes that are differentially expressed (DE) between two or more conditions. Generally, the expression of a set of genes that were declared DE by a microarray experiment is further screened using qRT-PCR. If a gene is also concluded to be DE using qRT-PCR, presumably in the same direction as in the microarray results, then that gene is believed to be validated. In other words, the testing procedure has two stages: the screening or pilot phase is performed using microarray technology, followed by a confirmatory or validation phase using qRT-PCR for only those gene-specific null hypotheses rejected in the first stage.

Two-stage tests and designs have previously been proposed for association studies (Lin, 2006; Satagopan and Elston, 2003; Satagopan et al., 2004). Those procedures control the family-wise error rate (FWER) and have potentially lower power when a large number of tests are considered. More recently, two-stage designs controlling false discovery rates (FDR) have also been proposed for association studies (Zehetmayer et al., 2005). Miller (2001) described a very simple procedure to control the FWER in the second stage for gene expression studies.

In practice, a researcher may be interested in assessing the number of samples necessary to obtain a certain power with a given FDR. This problem has been addressed for single-stage experiments based on microarrays (Jung, 2005). Conversely, the power and experimental design of the confirmation phase in a two-stage test have not been addressed in the literature. In this paper, we propose methods for designing two-stage experiments and provide guidelines for conducting the two-stage tests. We work under two possible design scenarios. In the first case, the total number of samples available is fixed and the main objective is to allocate samples to either the first stage (i.e. microarray) or the second stage (qRT-PCR). In the second case, the necessary number of samples to attain a given power for the second stage is determined conditional on a previous microarray experiment.
We also investigate the ramifications of using independent samples as opposed to the same samples in both stages.

The structure of the paper is as follows: In Section 2, we formulate the problem and develop the theory for FDR control and power calculation in a two-stage experiment involving multiple tests. Section 3 is divided into three parts. First, we assess the effects of the proportion of genes that are non-DE and the proportion of genes chosen to be validated in the second stage on overall sensitivity and FDR in the joint design of a screening and validation experiment. Second, we study these same effects for the design of a validation experiment upon completion of a microarray experiment. Finally, we estimate effect sizes and the correlation between expression measures on the same gene in an experimental dataset and demonstrate how to empirically infer upon sensitivity and FDR.

2. Methods

2.1 Type I error rate in a two-stage single test

Consider an expression profiling experiment comparing just two experimental groups or treatments. Let $X_{i1j}$ ($X_{i2j}$), with $j = 1, \ldots, n_1$ in both treatments, denote the measured expression level of gene i in subject j within Group 1 (2). We momentarily assume a known within-group variance $\sigma_i^2$ common to the two treatments and (log) expression levels that are normally distributed. Specifying $d_i = E(X_{i1j}) - E(X_{i2j})$ as the true mean difference, the classical z-test statistic has distribution:

$$Z_i^{(1)} = \frac{\bar{X}_{i1}^{(1)} - \bar{X}_{i2}^{(1)}}{\sigma_i^{(1)}\sqrt{2/n_1}} \sim \begin{cases} N(0,1), & d_i = 0 \\ N\!\left(\delta_i^{(1)}\sqrt{n_1/2},\, 1\right), & d_i > 0. \end{cases}$$

Here, the superscript index (1) indicates data generated under Stage 1 (microarray), such that $\delta_i^{(1)} = d_i/\sigma_i^{(1)}$ defines the effect size for Stage 1. Similarly, superscript index (2) pertains to data derived from Stage 2 (qRT-PCR). The joint distribution of the first-stage and second-stage test statistics is specified as bivariate normal with correlation $\rho^*$:

$$\begin{bmatrix} Z_i^{(1)} \\ Z_i^{(2)} \end{bmatrix} \sim N\!\left( \begin{bmatrix} \delta_i^{(1)}\sqrt{n_1/2} \\ \delta_i^{(2)}\sqrt{n_2/2} \end{bmatrix}, \begin{bmatrix} 1 & \rho^* \\ \rho^* & 1 \end{bmatrix} \right).$$

Now, if independent samples are used for the first- and second-stage testing, then $\rho^* = 0$; however, if the same samples are used in both stages, we would expect $\rho^* > 0$ to be a monotonic function of the correlation in gene expression measurements between the two techniques.

In general, the overall Type I error rate α for any particular gene over the two-stage test can be readily determined as

$$\alpha = \int_{z_{c1}}^{\infty} \left[ 1 - \Phi\!\left( \frac{z_{c2} - \rho^* z_1}{\sqrt{1-\rho^{*2}}} \right) \right] \phi(z_1)\, dz_1, \qquad [1]$$

where $z_{c1} = \Phi^{-1}(1-\alpha_1)$ and $z_{c2} = \Phi^{-1}(1-\alpha_2)$ are the critical values for declaring a gene to be DE in the first- and second-stage tests, respectively, based on stage-specific Type I error rates $\alpha_1$ and $\alpha_2$. Under a true alternative hypothesis, the power $(1-\beta)$ of the two-stage test is:

$$1-\beta = \int_{z_{c1}}^{\infty} \left[ 1 - \Phi\!\left( \frac{z_{c2} - \delta_i^{(2)}\sqrt{n_2/2} - \rho^*\!\left(z_1 - \delta_i^{(1)}\sqrt{n_1/2}\right)}{\sqrt{1-\rho^{*2}}} \right) \right] \phi\!\left(z_1 - \delta_i^{(1)}\sqrt{n_1/2}\right) dz_1. \qquad [2]$$

We momentarily assume that different samples are used for each stage. Note that under these circumstances, $\alpha = \alpha_1\alpha_2$ and $(1-\beta) = (1-\beta^{(1)})(1-\beta^{(2)})$.

2.2 FDR and sensitivity in two-stage tests

Table 4.1 delineates all possible combinations of outcomes of the true state of nature with conclusions drawn from statistical analysis using a two-stage gene expression profiling experiment. The first stage (i.e. microarray) is conducted using the same $n_1$ experimental units from each treatment for all $m = m_0 + m_1$ hypotheses or genes, $m_0$ of which are non-DE and the remaining $m_1$ of which are DE. Hence a proportion $\pi_0 = m_0/(m_0 + m_1)$ of the m gene-specific hypotheses is truly null (non-DE).
The first-stage rejection set involves the $R^{(1)}$ hypotheses declared significant by a statistical test on the first-stage experiment; these genes are subsequently tested in a second stage (i.e. qRT-PCR) using a set of $n_2$ samples per group that is independent from the first $n_1$ samples. In other words, a total of $n = n_1 + n_2$ biological replicates per treatment are used in the two-stage study. Among these $R^{(1)}$ genes, there are $R^{(2)}$ null hypotheses rejected again in the second stage, thereby creating a list of putatively validated DE genes. The remainder of the second-stage tests ($A^{(2)}$ hypotheses or genes) are then generally concluded to be non-DE. Now the correctly accepted hypotheses (true negatives) across the two stages sum to $A_0^{(1)} + A_0^{(2)}$, whereas the number of incorrectly accepted hypotheses (false negatives) sums to $A_1^{(1)} + A_1^{(2)}$; again, here the superscripts characterize the testing stage. Similarly, the number of correctly rejected second-stage tests (true positives) is $R_1^{(2)}$, whereas the number of incorrectly rejected second-stage hypotheses (false positives) is $R_0^{(2)}$. Among the $R^{(1)}$ genes tested in the second stage, $R_0^{(1)} = A_0^{(2)} + R_0^{(2)}$ is the number of truly null hypotheses, whereas $R_1^{(1)} = A_1^{(2)} + R_1^{(2)}$ is the number of truly DE hypotheses. Hence, the first-stage false discovery rate ($FDR^{(1)}$) is defined by the proportion $R_0^{(1)}/R^{(1)}$.

Among the various definitions of power for multiple tests, the sensitivity is probably the most meaningful in microarray experiments (Pawitan et al., 2005). Table 4.2 illustrates that applying a second-stage test will generally decrease both the FDR and the sensitivity. Nevertheless, the allocation of the n samples to the first stage ($n_1$) and second stage ($n_2$) should be optimized in order to attain the maximum sensitivity for identifying DE genes across the two stages. To control the FDR and maximize the sensitivity, we assume independence of the test statistics and define the following two-stage expressions:

$$FDR^{(2)} = \frac{\pi_0\,\alpha^{(1)}\alpha^{(2)}}{\pi_0\,\alpha^{(1)}\alpha^{(2)} + (1-\pi_0)\,\mathrm{Sensitivity}^{(2)}}, \qquad [3]$$

$$\mathrm{Sensitivity}^{(2)} = \frac{\sum_{i=1}^{m_1}(1-\beta_i)}{m_1}. \qquad [4]$$

From [4], the sensitivity is the average power of the single (non-null) tests and represents the proportion of DE genes that we expect to detect with the two-stage experiment.

A two-stage design for expression profiling should only have a sensitivity advantage over the single-stage design (i.e., using only microarrays) if the effect size of the second stage tends to be larger than the effect size of the first stage for any particular gene. Moreover, for a given sample partition ($n_1 : n_2$) between the first and second stages, the maximum sensitivity of the two-stage design will always correspond to the largest possible $\alpha^{(1)}$. However, the largest value of $\alpha^{(1)}$ is generally constrained by the maximum number of first-stage rejections (declared DE genes) that can be further validated in the second stage because of budget limits. For example, if there are 4000 hypotheses tested with a microarray experiment and at most 40 of those could be validated using high-throughput qRT-PCR, then the value of $\alpha^{(1)}$ should be determined such that

$$\frac{R^{(1)}}{m} = \pi_0\,\alpha^{(1)} + (1-\pi_0)\left(1-\beta^{(1)}\right) \le 0.01. \qquad [5]$$

The second-stage Type I error rate, $\alpha^{(2)}$, for statistical significance should be chosen such that the FDR in Equation [3] is controlled at a desired level. Finally, for a given total sample size n, the optimization should be done over the alternative sample partitions ($n_1 : n_2$).

3. Results

3.1 Sample size and sample allocation
3.1.1 Known variance and constant effect sizes across independent genes

The question of how many samples are needed to simultaneously control the FDR and attain a certain level of sensitivity with single-stage microarray experiments has been addressed previously (Jung, 2005). Here we optimize the design of a two-stage experiment by applying such a method to estimate the number of samples per group in a single-stage experiment. Subsequently, we maximize equation [4] with respect to the sample partition and $\alpha^{(1)}$, subject to the restrictions on the FDR and on the proportion of genes selected given in equations [3] and [5]. In summary, the two-stage design procedure proceeds as follows:

1) Specify the input parameters:
   a) f = maximum FDR level
   b) m = total number of hypotheses
   c) s = minimum target sensitivity
   d) $R^{(1)}/m$ = maximum proportion of hypotheses declared DE in the first stage to be validated in the second stage
   e) $\pi_0$ = proportion of hypotheses that are null
   f) $\delta^{(1)}$: first-stage effect sizes
   g) $\delta^{(2)}$: second-stage effect sizes
2) Obtain the overall sample size, n, for a single-stage design, based on 1a) through 1g).
3) Apply the optimization over all possible partitions of n into $n_1$ and $n_2$.

Let us consider a couple of cases:

Case 1. f = 0.05, m = 4000, $\pi_0$ = 0.99, s = 0.6, $\delta^{(1)}$ = 1. Using equation (8) in Jung (2005) along with n = 28, $\delta^{(2)}$ = 1.4 and $R^{(1)}/m$ = 0.01 (i.e., a maximum of 40 genes to be validated), the optimal sample allocation is $n_1$ = 21 replicates per treatment in the first stage and $n_2$ = 7 replicates per treatment in the second stage. With these design parameters, the sensitivities of the two-stage design and the single-stage design are both very close to 0.6. Hence, there would be little advantage in using a two-stage design in this situation.

Case 2. If twice as many hypotheses were to be validated as in Case 1, i.e., $R^{(1)}/m$ = 0.02, with all other input parameters being identical, the optimal sample allocation would then be $n_1$ = 18 samples for the first stage and $n_2$ = 10 samples for the second stage. Furthermore, the sensitivity for the two-stage design would be 0.7, thereby substantially exceeding the sensitivity of 0.63 for the single-stage design.

Using the same procedure, the optimal sample allocation and resulting sensitivity for a two-stage experiment for different proportions of genes to be validated in the second stage (i.e. $R^{(1)}/m$) are presented in Figure 4.1. In general, most of the samples are allocated to the first stage (Figure 4.1a), but this allocation changes with the proportion of declared DE first-stage hypotheses to be validated and the effect size ratio ($\delta^{(2)}:\delta^{(1)}$). As more declared DE hypotheses are allowed to be retested in the second stage, fewer samples are proportionately assigned to the first stage. That is, if only a limited number of hypotheses are retested in the second stage, the first stage needs to be as sensitive as possible in order to carry over as many of the truly DE genes as possible into the second stage. With respect to the effect size ratios, the minimum proportion of samples allocated to the first stage occurred at $\delta^{(2)}:\delta^{(1)}$ = 1.4, regardless of the proportion of genes to be carried over from the first stage. From Figure 4.1b, it is further evident that the proportion of all genes to be carried over to the second stage should be at least equal to the proportion of genes that are DE in order to have greater sensitivity with the use of a two-stage design.
For example, if 40 out of 4000 hypotheses are true positives ($\pi_1$ = 0.01) and we choose to validate only 20, the sensitivity could never exceed 0.5.

3.1.2 Varying effect sizes

In the previous section, a constant effect size was assumed for each gene within both stages; however, we know that this assumption is generally not valid for gene expression studies. Subsequently, we studied three different distributions for effect sizes, as illustrated in Figure 4.2: a) constant, b) symmetric and c) asymmetric decreasing. Figures 4.3 and 4.4 display the optimal sample allocation and sensitivity determinations based on two different across-gene means for the effect size in the first stage, $E(\delta_i^{(1)})$ = 0.4 and $E(\delta_i^{(1)})$ = 1.0, for each of the three distribution types. In general, a constant effect size for the first stage always had greater sensitivity (Figures 4.3b and 4.4b). However, the sample allocation strategy changes only marginally between the three distribution types, especially when there is a large number of samples (Figure 4.4a). With a larger effect size, and hence a smaller number of replicates to attain the desired power, the proportion allocated to the first stage appeared to be more sensitive to the distribution of the first-stage effect size, but this was only attributable to the reallocation of one or two samples per treatment.

We already demonstrated that different fixed ratios of the second-stage to first-stage effect sizes across all genes influenced the optimal sample size allocations between the two stages (Figures 4.1, 4.3 and 4.4). We also decided to investigate variability in these ratios based on similar distribution specifications as considered for the first-stage effect size (Figure 4.5). It should be noted, however, that variability in these ratios had no impact on the optimal sample size allocation and overall power, as indicated in Figure 4.6.

3.1.3 Unknown variances

Thus far, we have assumed a known variance in all of these tests (i.e. z-tests), but the procedure can be extended to a t-test for the more typical cases where variances are not known and must be inferred from the data. Figure 4.7 shows the optimal sample size allocation and sensitivity for various effect size ratios and proportions of genes to be validated when the variance is assumed unknown. We adjusted the overall n to maintain the same target minimum sensitivity level as in the single-stage case. For example, for $\delta^{(1)}$ = 1, the t-test requires n = 30, whereas the z-test needed n = 28 for the same effect size. In general, the sample size allocation is very similar to that obtained in the known-variance case, but a slightly smaller proportion is assigned to the first stage.

3.1.4 Implementing two-stage tests

In order to determine the optimal sample allocation between the two stages, the optimization procedure used in the previous sections also yielded the optimal significance level cutoffs for both stages, i.e. $\alpha^{(1)}$ and $\alpha^{(2)}$. Zehetmayer et al. (2005) proposed to use an $\alpha^{(1)}$ calculated from equation [5], assuming a priori values for the effect sizes and $\pi_0$, and to then set $\alpha^{(2)}$ such that the desired FDR is attained. Such a strategy could certainly be used in our work with the additional restriction that the proportion of hypotheses selected from the first stage is not larger than a specified limit. Alternatively, a fixed proportion could be selected in the first stage and $\alpha^{(2)}$ then set to control the FDR.
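The quantities that drive these choices, the overall type I error rate and per-gene power of the two-stage test, can be evaluated numerically. The sketch below implements equations [1] and [2] as reconstructed in Section 2.1, so it is an illustration under those assumptions rather than the authors' own code; with ρ* = 0 it reproduces α = α1α2, and with ρ* > 0 it gives the correlated-sample case revisited in Section 3.2.2 (cf. Case 4 below).

```python
# Numerical sketch of equations [1] and [2] (as reconstructed in Section 2.1).
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def two_stage_alpha(alpha1, alpha2, rho):
    """Overall type I error of the two-stage test (equation [1])."""
    zc1, zc2 = norm.ppf(1 - alpha1), norm.ppf(1 - alpha2)
    f = lambda z1: (1 - norm.cdf((zc2 - rho * z1) / np.sqrt(1 - rho ** 2))) * norm.pdf(z1)
    return quad(f, zc1, np.inf)[0]

def two_stage_power(alpha1, alpha2, rho, delta1, delta2, n1, n2):
    """Overall power of the two-stage test (equation [2])."""
    zc1, zc2 = norm.ppf(1 - alpha1), norm.ppf(1 - alpha2)
    mu1, mu2 = delta1 * np.sqrt(n1 / 2), delta2 * np.sqrt(n2 / 2)
    f = lambda z1: (1 - norm.cdf((zc2 - mu2 - rho * (z1 - mu1)) / np.sqrt(1 - rho ** 2))
                    ) * norm.pdf(z1 - mu1)
    return quad(f, zc1, np.inf)[0]

# Independent samples: the overall alpha factorizes into alpha1 * alpha2.
print(two_stage_alpha(0.05, 0.05, rho=0.0))                       # ~0.0025
# Shared samples (rho* > 0): the same cutoffs give a larger overall alpha, so
# alpha2 must be tightened to hold alpha at its target.
print(two_stage_alpha(0.05, 0.05, rho=0.66))
# Per-gene power for a Case-1-like allocation (hypothetical illustration).
print(two_stage_power(0.05, 0.05, rho=0.0, delta1=1.0, delta2=1.4, n1=21, n2=7))
```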
Furthermore, the estimation of the FDR in the first and second stages requires the specification of $\pi_0$, the proportion of all hypotheses that are truly null. At present, there are several methods to estimate $\pi_0$, including the procedure proposed by Storey (2002) that we subsequently use. The estimation of $\pi_0$ is more critical in the second stage, where only a limited number of hypotheses are tested. In such a situation, the FDR from Stage 1 is an estimate of $\pi_0$ in the validation set.

In this section we use a simulation study to assess the properties of the testing procedures based on an optimal design calculated in the previous section. For all cases, variances were assumed to be unknown, such that inferences are based on t-tests. We use four procedures to determine $\alpha^{(1)}$ and $\alpha^{(2)}$ in order to control the FDR:

1) Estimates of $\alpha^{(1)}$ and $\alpha^{(2)}$ from the optimization procedure.
2) Estimates of $\alpha^{(1)}$ from the optimization but restricted to a maximum number of rejections, with $\alpha^{(2)}$ set to control the FDR. Furthermore, $\pi_0$ is estimated from the validation experiment.
3) Similar to 2) but $\pi_0$ is estimated from the FDR of the first stage.
4) A fixed proportion of hypotheses selected in the first stage, with the second stage handled as in 3).

The design was optimized for the following input parameter values: $\pi_0$ = 0.99, f = 0.05, $\delta^{(1)}$ = 1, n = 30, $R^{(1)}/m$ = 0.01. The optimization resulted in 23 samples for the first stage and 7 samples for the second stage. The true FDR and sensitivity of the two-stage test are presented in panels a) and b), respectively, of Figure 4.8. All testing methods provided good control of the FDR and similar sensitivity. The first two procedures, however, are not expected to perform well (control the FDR and give maximum sensitivity) if the actual parameters are different from the values assumed in the optimization, because they do not account for uncertainty in the values of the effect sizes and $\pi_0$.

3.1.5 Correlated genes

The optimization has been performed assuming that the expression of the genes is not correlated, i.e. the expression of each gene is independent of the others. Therefore, we used simulation to further investigate the sensitivity obtained with the selected design when the statistical tests are correlated between genes. We specified a block-diagonal compound symmetry correlation structure between all genes based on individual blocks of either 10 or 40 genes and within-block correlation coefficients of either 0.4 or 0.75. Figure 4.9 shows the sensitivity and FDR of the two-stage procedure for these specifications.

It can be noted from Figure 4.9 that, on average, the FDR is controlled at the nominal level and the target sensitivity is obtained, but the variance of the FDR and sensitivity increases with the correlation. A set of simulation studies was run over alternative partitions of $n_1 : n_2$ to evaluate the optimal sample allocation under correlated tests (Figure 4.10). For a moderate correlation structure (block size = 10, ρ = 0.4), the optimal partition coincided with that obtained in the optimization assuming independent tests. However, for a more extreme correlation structure (block size = 40, ρ = 0.4), the optimal sample partition was slightly different, assigning one more replicate to the first stage. In practice these differences may not be relevant, and the actual gain in sensitivity might be minimal for low to moderate correlations.
Even for highly correlated gene expression, the optimal sample partition ($n_1$ = 26, $n_2$ = 4) is very close to the sample partition assuming uncorrelated tests ($n_1$ = 23, $n_2$ = 7). Moreover, any of these sample partitions yielded practically the same sensitivity, i.e., very close to the target value of 0.6.

3.2 Design of a validation experiment

3.2.1 Sample size calculation for an independent validation experiment

We previously considered the joint design of a screening and validation experiment. A simpler case may be the optimization of a validation experiment upon completion of an observed screening experiment. Several authors have addressed the issue of sample size calculation in microarray experiments (Hu et al., 2005; Jung, 2005). Jung (2005) has proposed a simple algorithm for sample size calculation to attain a target FDR and sensitivity that can easily be applied to the design of validation experiments. Given a list of putatively differentially expressed genes from the microarray, the associated FDR is an estimate of the proportion of null hypotheses in the set. Moreover, from the microarray data, an estimate of the effect size for each gene can be obtained. With these elements, and given a target FDR and sensitivity for the validation experiment, the required sample size can be readily computed.

Case 3: A microarray experiment yielded a list of 25 genes declared DE, with an overall FDR of 0.20. The average effect size was δ = 1.0. The target sensitivity of the validation experiment was 0.9 and the FDR was specified to be either 0.01 or 0.05. Following Jung (2005), the required sample size is $n_2$ = 10 or $n_2$ = 20 to control the FDR at levels of 0.05 or 0.01, respectively. Moreover, to attain such FDR levels, we expect the cutoffs for declaring statistical significance in the validation experiment to be $\alpha_2$ = 0.0364 (FDR = 0.01) or $\alpha_2$ = 0.19 (FDR = 0.05). These values in Case 3 are calculated based on the assumption that the samples used in the validation experiment are independent from those used in the microarray experiment. Also, note that if the traditional comparison-wise significance cutoffs ($\alpha_2$ = 0.01 or $\alpha_2$ = 0.05) are used (for a fixed sample size), the sensitivity is actually expected to be smaller than the intended value.

Figure 4.11 presents the sample size ($n_2$) per treatment needed to validate an experiment with 25 genes declared to be DE at different levels of FDR and different average effect sizes. Here $n_2$ is small compared to the number of replicates ($n_1$) necessary for a microarray experiment, particularly for small FDR values. As expected, the number of samples increases if the FDR from the microarray increases or if the average effect size decreases.

3.2.2 Non-independent sample sets in screening and validation experiments

Thus far we have assumed that different samples are used in the validation. But due to resource constraints, the same samples are often reused for the qRT-PCR validation experiment. Such validation might more properly be called technical validation. If the same samples are used, the two test statistics (based on the microarray and qRT-PCR data) for each gene are not independent, and this should be accounted for when determining $\alpha^{(1)}$ and $\alpha^{(2)}$. The correlation among the tests ($\rho^*$) will depend on the correlation ρ among repeated measures (i.e., the correlation of the microarray and qRT-PCR expression measures on the same sample) and on the number of samples in common between the two experiments ($n_c$); that is, it can be readily shown that $\rho^* = \rho\, n_c/\sqrt{n_1 n_2}$.
Consequently, if the same n biological samples are used for both tests, the correlation of the tests equals the repeated-measures correlation.

In the presence of correlated tests, the actual type I error rate of the two-step test needs to be calculated with equation [1]. Alternatively, given a nominal type I error level for the first step ($\alpha_1$) and a target overall α, the $\alpha^{(2)}$ for the validation step can be obtained by solving equation [1] for $z_{c2}$. The power of this test can subsequently be obtained from equation [2].

Case 4. For a certain gene, the correlation ρ among repeated measurements with microarray and qRT-PCR is 0.66. If $\alpha^{(1)}$ = 0.05 and the required overall type I error rate is α = 0.0025, the second-stage p-value cutoff should be 0.0037898. On the other hand, if ρ = 0.56, then $\alpha^{(2)}$ should be equal to 0.0051. Figure 4.12 illustrates the power of a single test for $n_1 = n_2 = 25$ and $\delta^{(1)}$ = 0.6 in the first stage for three different scenarios of correlation between measures of the same gene with different technologies (ρ = 0, ρ = 0.56, and ρ = 0.66). For a large relative effect size in the second stage, the power of the test is almost identical regardless of the degree of correlation between the two measures. But for moderate effect size ratios, there is a clear advantage in using independent samples for both stages.

The FDR of the two-stage tests also depends on the correlation between the sequential test statistics. In the previous case, if we assume for example $\rho^*$ = 0.66, $\delta^{(1)}$ = 0.6 and $\delta^{(2)}$ = 0.72, using the same value of $\alpha^{(2)}$ for the correlated tests as for the independent tests, the FDR increases appreciably (Table 4.3). The value of $\alpha^{(2)}$ can also be adjusted in correlated tests to attain an overall comparison-wise error rate equivalent to that of the independent test. In that case the FDR is only slightly larger than the FDR for independent tests, but the power is reduced from 0.57 to 0.42. Finally, adjusting $\alpha^{(2)}$ to obtain the same FDR as the uncorrelated test yields a further reduction in power in the correlated tests, to a value of 1-β = 0.37.

3.3 Estimated effect sizes and correlation for microarray and qRT-PCR assays

Throughout this paper, it is assumed that the effect sizes from the different technologies are known. For illustration purposes, effect sizes in the range of 0.4 to 1.5 are commonly used in the literature, but in practice, a pilot dataset might be used to estimate these parameters for a similar future experiment. In this section, we estimated effect sizes using an experimental dataset available from the public domain (Perreard et al., 2006). The original dataset consisted of 123 samples, of which 94 non-metastasised tumor samples were included in the study, where the absolute expression level of 53 genes was measured using a qRT-PCR assay. Expression levels of the corresponding genes in the same samples were retrieved from a more comprehensive microarray study (GSE2607 at http://www.ncbi.nlm.nih.gov/geo/). The sample was divided into two groups according to estrogen receptor status (positive, n = 49 and negative, n = 45). The expression values were normalized relative to the arithmetic mean of three housekeeping genes (MRPL19, PSMC4, and PUM1). A summary of the distribution of the effect sizes for both technologies is presented in Figure 4.13. The overall effect size ratio was slightly larger than 1.0 for the expression measured by qRT-PCR with respect to microarrays.
In fact, for some genes this ratio was as high as 1.5, whereas for others the effect size estimated from microarrays was somewhat larger than that estimated from qRT-PCR. The average effect size from both technologies was just below 1.0, indicating a moderately large effect size compared to other reports from gene expression studies (Pawitan et al., 2005). This result is not surprising considering that the genes were selected based on previous knowledge of their functional classification.

The effect sizes are a function of the experimental design, and different experimental layouts will produce different effect size estimates because of differences in efficiency. In order to generalize these results to other designs, we need to specify the relative efficiency of the two designs (Timm, 2004). For example, suppose that design A has an effect size ESA=1.0 and that design B is 1.1 times more efficient than design A; the expected effect size of design B is then ESB=1.0×1.1=1.1.

A bivariate linear model was used to estimate the residual correlation between the two technologies for each gene. The median correlation coefficient was 0.66 and the central 50% of the estimated values fell between 0.56 and 0.79 (Figure 4.14). This level of correlation is on the order of the values used in the preceding section on correlated tests. Generalizing the estimate of the correlation to other experimental layouts is not as straightforward as it is for effect size estimates, because the task requires dissecting the technical and biological sources of variation associated with each gene expression assay platform.

4 Discussion

In this study, we proposed the use of two-stage tests for expression profiling experiments to couple results from microarrays and qRT-PCR assays. The proposal consists of using the microarray experiment as a screening step, selecting a reduced number of genes that are subsequently assayed using qRT-PCR. Two-stage designs can be planned a priori by splitting a sample set for use within the two stages and defining the number of hypotheses to carry from the first to the second stage. Previous studies controlled the experimentwise type I error rate (Satagopan et al., 2004; Satagopan et al., 2002), but more recently, control of the FDR in two-stage experiments has been considered for association studies (Zehetmayer et al., 2005). Recently, the use of sensitivity has been proposed as a measure of power in multiple testing (Pawitan et al., 2005). Our results showed that the sensitivity of a two-stage experiment at a fixed FDR depends on the sample size, the value and distribution of effect sizes in the first and second stages, and the proportion of genes selected in the first stage for subsequent testing in the second stage. To attain a larger sensitivity in a two-stage test than in a single-stage test, the proportion of genes selected should be equal to or larger than the expected proportion of differentially expressed genes, and the effect sizes in the second stage should be larger than the effect sizes in the first stage. We focused most of our calculations on the particular case where the proportion of selected genes is equal to the expected proportion of true alternative hypotheses. Pawitan et al. (2005) showed that in this circumstance, for π0=0.99, the FDR is expected to be very high, usually larger than 0.2.
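A small numerical illustration of this last point (our own sketch, using the standard approximation FDR ≈ π0·α/(π0·α + (1−π0)·power) rather than the expressions derived earlier in this chapter) shows how quickly the FDR of a single-stage test deteriorates when 99% of genes are truly null:

# Our own illustration, not taken from the dissertation's formulas: approximate
# FDR of a single-stage test as a function of pi0, the cutoff and the power.
def approx_fdr(pi0, alpha, power):
    false_pos = pi0 * alpha           # expected fraction of genes falsely rejected
    true_pos = (1 - pi0) * power      # expected fraction of genes correctly rejected
    return false_pos / (false_pos + true_pos)

# With 99% of genes truly null, even a well-powered test at alpha = 0.01 leaves
# more than half of the declared discoveries as false positives (~0.55).
print(approx_fdr(pi0=0.99, alpha=0.01, power=0.80))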
In our case, however, this is not completely detrimental, since the second step is designed to eliminate the false positives and lower the overall FDR to an acceptable level. Conversely, selecting a small proportion of genes in the first stage for second-stage validation imposes a constraint on the maximum sensitivity that can be attained.

Our results indicate that, in general, more samples should be assigned to the first stage than to the second stage to maximize sensitivity. That is, generally 60 to 80% of all biological samples should be utilized for the microarray experiment. Lower numbers of first-stage replicates would be recommended in situations where a large proportion of genes is selected for further validation and the relative effect size for the second stage is around 1.4 times larger than the effect size in the first stage. We also showed that the relative sample allocation is similar for different combinations of total numbers of samples available, for different effect sizes, and for different distributions of the effect sizes. The magnitude and distribution of the effect sizes, however, do affect the overall sensitivity and the minimum sample size necessary to attain a given target FDR and sensitivity, which is in accordance with results from single-stage experimental designs (Jung, 2005).

In certain studies, including a large number of arrays may be cost prohibitive. If fewer arrays than the optimal number are used, the sensitivity of the experiment will be reduced. A way to mitigate this is to use a more liberal p-value cutoff and include more genes in the validation study (simultaneously increasing its sample size), but such a strategy will also increase the overall cost of the experiment. Reducing the sample size in the first stage while keeping a stringent significance criterion cannot be compensated for by increasing the sample size in the second stage, because the overall power of each test is (for independent tests) the product of the first-stage and second-stage power. Moreover, in the optimized designs the power of the validation step is already close to one for all genes.

The overall sample allocation between the two stages is similar assuming known or unknown variances, but the total sample size is increased if there is uncertainty about the variance, especially if large effect sizes are expected. Under these circumstances, small sample sizes lead to different total sample size requirements depending on whether a t-test (unknown variances) or a z-test (known variances) is assumed. As in the case of single-stage experiments (Jung, 2005), the differences are minimal for effect sizes of the order of 1.0 or smaller.

We also compared different alternatives for implementing the two-stage test. Our results suggest that an effective strategy is to select a fixed proportion of genes in the first stage and then estimate the p-value cutoff for statistical significance in the second stage to control the FDR at a desired level. In that case, the FDR of the genes selected in the first stage is a suitable estimate of the proportion of null hypotheses carried from the first stage. Interestingly, the p-value cutoffs for the second stage are much higher than the traditional 0.05 and 0.01 values commonly employed in validation experiments. The reason for this is that the proportion of null hypotheses in the validation set is usually small, i.e., lower than 0.5.
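The kind of calculation described above, and in Case 3 earlier, can be sketched as follows. This is our reading of an FDR-based sample-size rule in the spirit of Jung (2005), not the dissertation's own code, and it assumes a one-sided two-sample z-test with known variance and a common effect size for all differentially expressed genes in the validation set:

# Hedged sketch of a second-stage cutoff and sample-size calculation.
from math import sqrt
from scipy.stats import norm

def stage2_alpha(m_selected, fdr1, target_fdr, target_sensitivity):
    """Comparisonwise cutoff for the validation stage that yields the target FDR."""
    m0 = m_selected * fdr1            # expected null genes among those selected
    m1 = m_selected - m0              # expected truly DE genes among those selected
    return (m1 * target_sensitivity * target_fdr) / (m0 * (1 - target_fdr))

def stage2_n(effect_size, alpha2, target_sensitivity):
    """Smallest per-treatment sample size reaching the target power at cutoff alpha2."""
    n = 2
    while norm.cdf(effect_size * sqrt(n / 2) - norm.isf(alpha2)) < target_sensitivity:
        n += 1
    return n

# Case 3: 25 genes selected with FDR = 0.20, average effect size 1.0, target
# sensitivity 0.9, and a validation FDR of 0.05 or 0.01.
for f in (0.05, 0.01):
    a2 = stage2_alpha(25, 0.20, f, 0.9)
    print(f, round(a2, 4), stage2_n(1.0, a2, 0.9))
# Gives cutoffs of about 0.19 and 0.036 and sample sizes of roughly 10 and
# 19-20 per treatment, in line with the values quoted for Case 3.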
In the second section of this study, we addressed the issue of designing a validation experiment conditional on an existing microarray study, including the use of the same samples in the validation experiment (i.e., technical validation). Design optimization of a validation experiment has not been considered before. Yet we have shown that if a sufficiently large list of genes is being validated, the general principles of design of microarray experiments when controlling FDR (Jung, 2005) can be applied. Moreover, in that case, the microarray experiment can be used as a pilot study from which the proportion of differentially expressed genes in the validation set and the effect sizes can be estimated. In general, if the FDR of the microarray experiment is small and the effect size is large, few additional samples and a liberal p-value cutoff for statistical significance will be sufficient for the second-stage validation experiment.

Reusing the same samples in the validation experiment is called technical validation. This practice, though not recommended, is common (Allison et al., 2006). One reason for this protocol is that some transcriptional profiling experiments are conducted using samples stored from previous experiments, and re-conducting the validation experiment with new samples might not be feasible. We have shown that, given a false rejection (false positive) in the first stage, the chances of repeating the same decision error in the second stage are higher when the same samples are used. In order to control the overall type I error rate in these cases, the p-value cutoff for declaring statistical significance in the second stage needs to be reduced. Reducing this cutoff, however, produces a less powerful test. In order to calculate these cutoffs, the correlation between the tests, which in turn is a function of the correlation between sequential measures on the same sample, needs to be determined.

Lastly, we used a publicly available dataset to estimate effect sizes and the correlation of microarray and qRT-PCR based expression measures. We found that for this particular dataset, the effect size of the selected genes (declared as differentially expressed) was of the order of 1.0. The average ratio of effect sizes from qRT-PCR to microarray was close to 1.6 but showed large variability, with some genes having ratios as low as 0.6 and more than half of the genes having ratios larger than 1.0. Also, the repeated-measures correlation was very high, with an average value of 0.64 and more than half of the genes showing a correlation larger than 0.7. These results are specific to this particular experiment, but they are within the range of parameters considered in our study.

TABLE 4.1. Outcomes of m = m0 + m1 two-stage tests. A represents accepted hypotheses, R represents rejected hypotheses, the subscript 0 indicates true null hypotheses and 1 indicates true alternative hypotheses. The superscript indicates the stage of the test.

True state of nature    First stage test    Second stage test           Total
                        A(1)                A(2)         R(2)
H0                      A0(1)               A0(2)        R0(2)          m0
H1                      A1(1)               A1(2)        R1(2)          m1

TABLE 4.2. Error measures in two-stage tests based on the quantities presented in Table 4.1.

Error measure    First stage                                          Two-stage
Sensitivity      (A1(2) + R1(2)) / m1                                 R1(2) / m1
FDR              (A0(2) + R0(2)) / (A0(2) + R0(2) + A1(2) + R1(2))    R0(2) / (R0(2) + R1(2))

Table 4.3. Sensitivity and FDR for two-stage tests based on correlated test statistics.
π0=0.99, δ(1)=0.6, δ(2)=0.72, n1=n2=25

ρ       Nominal α2    α         Sensitivity    FDR
0.00    0.05          0.0025    0.57           0.31
0.66    0.05          0.0179    0.64           0.73
0.66    0.0038        0.0025    0.42           0.38
0.66    0.0022        0.0016    0.53           0.31

[Figures 4.1 to 4.9: plots of sample allocation, sensitivity and FDR for the two-stage tests, mostly as functions of the relative effect size δ(2)/δ(1) under different effect-size distributions; the full captions and axis labels are not legible in the scanned copy.]

Figure 4.10. Sensitivity of two-step tests for correlated hypotheses as a function of the sample size in the first step. Simulation with π0=0.99, f=0.05, n=30, R(1)/m=0.01. The arrow indicates the optimal sample allocation obtained assuming uncorrelated tests. Circles: ρ=0.4, block size=40; squares: ρ=0.4, block size=10.

Figure 4.11. Minimum number of replicates per treatment in a validation experiment as a function of the proportion of null hypotheses (FDR(1)) and effect sizes. R(1)=25 genes putatively differentially expressed at different levels of FDR; target sensitivity = 0.9; FDR after validation = 0.05.

Figure 4.12. Power of a single two-stage test for different correlations (0, 0.56, and 0.66) between the first and second stage data as a function of the ratio of second-stage to first-stage effect sizes. The effect size of the first stage (δ(1)) is 0.6 and the number of replicates per treatment is 25 for each test.

Figure 4.13. Box plot of the microarray and qRT-PCR based effect size estimates and their ratios.

Figure 4.14. Histogram of the estimated correlation coefficient between microarray and qRT-PCR measured expressions.

GENERAL DISCUSSION

Transcription profiling is essential for elucidating gene function and regulation. In particular, microarrays and qRT-PCR are the most commonly used platforms to measure levels of mRNA of target genes. Since the first emergence of microarrays as the high-throughput technique of choice for transcription profiling, there has been a consensus that sound experimental designs, appropriate statistical models and methods, and validation are crucial to the experimental process (Allison et al., 2006). In this dissertation, I addressed some particular aspects of experimental design, statistical modeling and validation of transcription profiling studies.

1. Objectives revisited and their impact in animal functional genomics.
1) To compare alternative reference designs for statistical efficiency of two-color microarray experiments considering multiple sources of variation.

Two-color microarrays (i.e., cDNA and oligonucleotide microarrays) are platforms of choice for transcription profiling in animal functional genomics studies. The reference design is one of the most popular designs despite not necessarily being the most statistically efficient. The term reference design is generally used for any experimental layout in which a common reference is hybridized against a different test sample on every array. Reference designs have previously been compared to other designs (Dobbin and Simon, 2002; Kerr, 2003a; Kerr and Churchill, 2001b; Tempelman, 2005; Yang and Speed, 2002), but an exhaustive comparison among all variants of the reference design had not yet been performed. In Chapter 1, the comparison among designs was made in terms of relative efficiency for the same amount of resources. In contrast to some previous work (Kerr and Churchill, 2001a; Yang and Speed, 2002), we made a clear distinction between technical and biological variability. Additionally, we introduced a new variant of the reference design that we called the blocked reference design (BRD). The BRD had a higher efficiency than any of the traditional design alternatives. The BRD is potentially very useful in studies where comparisons with a control group are of interest, yet comparisons between non-control groups are also important. The BRD is as efficient as the replicated reference design for direct comparisons, whereas it is as efficient as the common reference design for indirect comparisons (between non-control groups). One criticism of the common reference design is that each hybridization involves a sample that is of no intrinsic interest to the researcher. This is not the case with the BRD, where half of the hybridizations are also used for the control group. Depending on the number of groups, the BRD may be more efficient for some comparisons than other designs such as the loop design. For the comparison of these designs, we used a model based upon log-ratios of expression as the response variable; nevertheless, we took special care to consider all sources of variation in order to account for the hierarchical replication that is present in the BRD.

2) To investigate the ramifications of log-ratio versus log-intensity modeling in two-color microarrays using linear mixed effects models.

Chapter 2 compared two broad classes of statistical models used for the analysis of microarray data. Kerr (2003b) compared log-intensity and log-ratio models for some simple experimental designs, and we extended that work to more general cases by using a linear transformation that connects the two models. The focus of our study was on modeling hierarchical sources of variation and the potential loss of information incurred when using log-ratios rather than log-intensities as response variables. For this purpose we used mixed model methods that had already been presented in the context of microarray experiments (Rosa et al., 2005; Wolfinger et al., 2001). We showed that the analysis of intensities, specifying the array as a random effect, leads to the recovery of inter-block information. In our first chapter, we used a log-ratio model to compare reference designs, ignoring the inter-block information.
Accounting for such information, however, is more important when comparing reference designs to direct-comparison designs, where there is no recovery of inter-slide information, as shown by Kerr (2003a; 2003b). We showed that the recovery of inter-slide information is very important in some other designs such as the split-plot design. For that design, the analysis of log-ratios, or the use of a log-intensity model with fixed array effects, leads to the loss of all information for inferring upon the sub-plot factor. Moreover, even when the recovery of inter-slide information is rather minor, as in some cases, we showed that it is often essential to properly account for the hierarchical replication that occasionally occurs in two-color microarray designs. Formulating the correct model on the log-ratio scale is not always a trivial task; nevertheless, such a model can always be derived from a log-intensity model by applying a linear transformation. Another important point that was further evident from Chapter 2 is that treating experimental units as fixed effects generally does not lead to proper inferences, even though point estimates of treatment differences are the same as those from a log-intensity model. These results are relevant to applied genomicists, particularly those using statistical analysis software based on log-ratio models, such as LIMMA (Smyth, 2004; Smyth et al., 2005).

3) To develop linear mixed models for the analysis of relative quantification RT-PCR data.

The qRT-PCR technology is currently the method of choice for validating findings from microarray studies. In contrast to microarray data, the development of methodology for the statistical analysis of qRT-PCR data was overlooked until recently (Yuan et al., 2006). Most of the currently available statistical methods in this area are most useful for simple experimental situations where the main goal consists of pairwise comparisons of experimental groups. Recently, some linear models have been proposed for the analysis of qRT-PCR data (Cook et al., 2004; Fu et al., 2006; Szabo et al., 2004; Yuan et al., 2006). In Chapter 3 we proposed a linear model for the joint analysis of test and control genes that allows one to accommodate an arbitrarily complex design and to test general linear hypotheses in qRT-PCR experiments. In comparison with some of the previous work, we again made a clear distinction between technical and biological replicates. For example, the work by Yuan et al. (2006) or Szabo et al. (2004) considered only one level of replication. Fu et al. (2006) addressed the issue of technical and biological replicates using a generalized estimating equation model, but they simplified the model by taking the difference of CT between control and test genes in each technical replicate. Our experience is that this strategy cannot be used in most experimental designs because technical replicates of control and test genes generally cannot be matched. We used a motivating example from an animal genomics study (Poletto et al., 2006a) to demonstrate the implementation of the model and resorted to simulation to evaluate properties of the statistical procedure and the estimates derived from it. Finally, we applied the analysis model to several datasets (Coussens et al., 2003; Coussens et al., 2004; Szabo et al., 2004) to illustrate its flexibility.
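For readers unfamiliar with the comparative-CT baseline that the next paragraph refers to, the following is a minimal sketch of the standard 2^(-ΔΔCT) calculation of Livak and Schmittgen (2001); it is included only as the point of comparison, not as the mixed-model approach developed in Chapter 3, and the CT values shown are invented for illustration.

# Minimal sketch of the comparative-CT (2^-ddCT) method of Livak and
# Schmittgen (2001); hypothetical triplicate CT values, efficiency assumed to be 2.
import numpy as np

def fold_change_ddct(ct_target_trt, ct_ref_trt, ct_target_ctl, ct_ref_ctl):
    """Relative expression of the target gene in treated vs. control samples."""
    dct_trt = np.mean(ct_target_trt) - np.mean(ct_ref_trt)   # normalize to the reference gene
    dct_ctl = np.mean(ct_target_ctl) - np.mean(ct_ref_ctl)
    ddct = dct_trt - dct_ctl                                  # treated relative to control
    return 2.0 ** (-ddct)

print(fold_change_ddct([24.1, 24.3, 24.0], [16.2, 16.1, 16.3],
                       [25.4, 25.2, 25.5], [16.2, 16.3, 16.1]))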
Our model is more powerful than the ΔΔCT method (Livak and Schmittgen, 2001) and provides better control of the Type I error rate than a previously published method (Cook et al., 2004). Using simulation, we demonstrated the importance of including random effects of biological replication within gene in order to draw correct inferences in the presence of technical replication. The linear mixed model presented in Chapter 3 is not only more powerful than the classical analysis methods, but is also more flexible, as it allows testing of general linear hypotheses such as interactions and linear trends.

4) To propose a general framework for determining the false discovery rate and sensitivity of gene expression studies when jointly designing microarray screening experiments linked to subsequent and selective qRT-PCR validation.

Optimizing the design of validation experiments had not been considered previously in the statistical genomics literature. In Chapter 4, we addressed the issue of sample size calculation and allocation in validation experiments using a two-stage testing approach. Sample size calculations for two-stage tests have been developed for association studies (Satagopan et al., 2004; Satagopan et al., 2002; Zehetmayer et al., 2005) but not for transcription profiling experiments. We considered various measures of power and error control, concentrating on those cases where several genes are validated in an attempt to discover most of the differentially expressed genes within an experiment. We demonstrated that more samples are generally needed in the screening stage to ensure high sensitivity while maintaining the error control necessary for microarray experiments. Conversely, we showed that a more liberal level of statistical significance could be used in the second stage, thereby requiring fewer samples in order to attain a high sensitivity. This result per se is very important because it challenges a currently common practice; i.e., a limited sample size is often used for the microarray study whereas more samples are used in the qRT-PCR stage. Moreover, the traditional levels of significance, 0.05 and 0.01, are typically used in the second stage when a higher cutoff might yield adequate control of the false discovery rate. A similar finding was reported by Zehetmayer et al. (2005) for two-stage designs in association studies. We have to acknowledge, however, that implementing these guidelines will imply an increase in the cost of the experiments if a high sensitivity is desired. Finally, Chapter 4 addressed the power and false discovery rate of a technical validation study using the same samples in both stages. Even though technical validation has been recognized as having questionable merit (Allison et al., 2006), it is still used extensively in functional genomics. Our results suggest that if the same samples are used for the validation, the levels of significance should be adjusted downwards accordingly to appropriately control the false discovery rate. Moreover, we provide expressions to numerically calculate these levels based on the first-stage significance level and the correlation of the gene expression measures between the two technologies. Accounting for such corrections would help mitigate the problem of double false positives due to correlation among tests.

2. Future research directions

A natural extension of the linear mixed models used in Chapter 2 would be to consider borrowing information across genes.
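As one concrete illustration of what "borrowing information across genes" can mean in practice, the sketch below shows a generic empirical-Bayes moderated (shrunken) variance of the kind discussed in the shrinkage literature cited next. It is not the estimator developed in this dissertation; the prior degrees of freedom d0 and the prior variance s0_sq are hypothetical values that would in practice be estimated from the data.

# Generic sketch of a moderated (shrunken) gene-specific variance.
import numpy as np

def moderated_variance(s2_g, d_g, s0_sq, d0):
    """Shrink each gene-specific variance s2_g (with d_g df) toward s0_sq."""
    return (d0 * s0_sq + d_g * s2_g) / (d0 + d_g)

# Gene-specific residual variances from a small experiment (hypothetical values):
s2 = np.array([0.05, 0.40, 1.80, 0.10])
print(moderated_variance(s2, d_g=4, s0_sq=0.30, d0=6))
# Extreme variances are pulled toward the common value, stabilizing gene-wise
# test statistics when there are few replicates per gene.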
Shrinkage estimators for intensity-based mixed models were presented by Cui et al. (2005), while shrinkage for log-ratio models was published by Smyth (2004; 2005), but based on the use of fixed effects models or a very restricted class of mixed effects models. The concepts presented by Cui et al. (2005), however, could readily be extended to a mixed model for log-ratios. The mixed model for qRT-PCR studies assumed a simple correlation structure among genes, as derived from the sampling scheme, assuming no co-regulation or biological correlation between the mRNA levels of genes. This assumption, though implausible, is very common in microarray and qRT-PCR studies; nevertheless, a multivariate model could be used for qRT-PCR if a few genes are assayed on a large number of samples, perhaps by generalizing, for example, the model used by Poletto et al. (2006b) to incorporate technical replication. It is expected that a multivariate linear model would use information across genes and could yield important information about the correlation in the expression of co-regulated genes.

Finally, given the very simple experimental setup used in Chapter 4, the sample size allocation and rejection rules should be considered preliminary results when applied to potentially more efficient incomplete block designs. In those cases, optimization of two-stage tests could be studied using mixed models for qRT-PCR, as presented in Chapter 3, for more complex designs. Such an extension, however, is limited by the lack of knowledge regarding the correlation between the two techniques, although this correlation could be estimated using bivariate linear models. Publicly available datasets may provide some information about the relative effect sizes of the two technologies and, to some extent, about the technical correlation, but at some point future experiments might be required to fully understand the co-variation of gene expression between microarray and qRT-PCR technologies.

APPENDIX ONE

SUPPLEMENTARY MATERIAL TO CHAPTER 3

The data used in the first sections of this paper are part of a study that involved the analysis of the expression of several genes in the brain of piglets subjected to weaning and isolation stress. The complete presentation of those results is the motivation of an independent publication (Poletto et al., 2006) and a detailed description can be found in that paper. In this section we give more details of the statistical analysis, including variable definitions, data structure and program code.

Quantitative Real-Time RT-PCR. Q-RT-PCR was performed as described previously (Gibson et al., 1996; Heid et al., 1996). Briefly, a total of 2 μg of total RNA from each sample was reverse transcribed using oligo (dT)13 and SuperScript II RNase H- reverse transcriptase (Invitrogen Life Technologies Corp., Carlsbad, CA). Reverse transcribed cDNA was quantified using an ND-1000 spectrophotometer (NanoDrop Technologies Inc., Rockland, DE). A total of 30 ng of cDNA was employed in each real-time reaction. Forward and reverse primer sequences were designed with Primer Express 2.0 Software (Applied Biosystems, Foster City, CA) and synthesized by Qiagen. The oligonucleotide sequences of the primers are summarized in Table 1. To confirm that primer-dimer products were not influencing final CT values, control reactions without template but with each set of primers were performed, with the anticipated result that no product was amplified.
Q-RT-PCR was performed and analyzed on an ABI Prism 7000 Sequence Detection System (Applied Biosystems). Sus scrofa 18S ribosomal RNA was selected as the control gene to be used for normalization purposes based on preliminary test reactions, as expression of this proposed control gene did not change relative to treatments. All reactions were performed using template from individual animals, in triplicate. Relative quantification methods were determined based on the primer amplification efficiency tests (Livak and Schmittgen, 2001).

Table 1. Primer sequences and efficiencies.

Gene    Primer     Sequence                                    E(a)
18S     Forward    5'-GGCTCATTAAATCAGTTATGGTTCCT-3'            1.99
        Reverse    5'-AGCTCTAGAATTACCACAGTTATCCAAG-3'
DBI     Forward    5'-GGAAGTTAAGAACCTTAAGACCAAACC-3'           1.96
        Reverse    5'-TCGCTTGTTTGTAGTGGCTGTAG-3'

(a) Amplification efficiency obtained from a relative standard curve (R2 > 0.99). The two values are close enough to the ideal efficiency (E=2) such that the raw CT is a valid proxy for the log-concentration of mRNA.

Response variable for linear model analysis if E ≠ 2. In the presence of estimates of the amplification efficiency (E < 2.0) and a CT value from each amplification curve, the response variable for the analysis will be:

(1)  y_gijkr = -log2( E_g^(-CT_gijkr) ),

where the efficiency values may be estimated from the amplification curve (i.e., the analytical method; Marino et al., 2003) or from a relative standard curve included in the assay plate (Pfaffl, 2001). If relative standard curve quantification is selected, the ABI 7000 sequence detection system produces a "quantity" value in the output. Such a value is directly proportional to the efficiency-corrected CT in (1), and the user only has to transform the value by taking its logarithm. The back-transformation equivalent to equation (9) of the Methods section will be:

(2)  FC(EWI-EWC) = 2^( diff(EWI-EWC) ).

SAS code and data. The SAS program used to fit the linear mixed model and obtain the log-fold change estimates is presented in Figure 1.

Figure 1. SAS code for implementation of the mixed model analysis with model (2).

proc mixed data=ratio;
  class gene trt sample litter;
  model ct = gene|trt / outp=resi;
  random sample;
  random litter litter*trt / group=gene;
  estimate 'ewc-nwc' gene*trt -1  0  1  0  1  0 -1  0 / cl;
  estimate 'ewi-nwi' gene*trt  0 -1  0  1  0  1  0 -1 / cl;
  estimate 'ewi-ewc' gene*trt  1 -1  0  0 -1  1  0  0 / cl;
  estimate 'nwi-nwc' gene*trt  0  0  1 -1  0  0 -1  1 / cl;
  estimate 'interaction'        gene*trt -1  1  1 -1  1 -1 -1  1;
  estimate 'main effect: ew-nw' gene*trt -.5 -.5  .5  .5  .5  .5 -.5 -.5 / cl;
  estimate 'main effect: i-c'   gene*trt  .5 -.5  .5 -.5 -.5  .5 -.5  .5 / cl;
run;

The estimate statements provide hypothesis tests, point estimates and interval estimates of the log-fold change. The input data were prepared from the relative quantification output of the ABI 7000 Sequence Detection System. Treatment and litter memberships were added a posteriori. Table 2 presents the complete dataset.

Table 2. Data file used by the SAS program to fit the mixed model.
well Sample gene CI IRI litter weII Sample gene CI IRI litter A4 Fc1 188 17.02 nwc D4 Fc1 DBI 23.33 nwc B4 Fc1 188 16.46 nwc E4 Fc1 DBI 23.43 nwc C4 Fc1 1 88 18 nwc F4 Fc1 DBI 23.3 nwc A11 Fc11 188 18.18 nwi D11 Fc11 DBI 25.31 nwi 811 Fc11 188 18.06 nwi E11 Fc11 DBI 25.5 nwi C11 Fc11 188 17.64 nwi F11 F011 DBI 25.56 nwi A6 Fc12 188 16.07 ewi D6 Fc12 DBI 23.86 ewi 86 F012 188 16.12 ewi E6 F012 DBI 23.75 ewi C6 Fc12 188 16.13 ewi F6 Fc12 DBI 24.14 ewi A10 Fc15 188 16.23 ewi D10 Fc15 DBI 23.85 ewi 810 Fc15 188 16.35 ewi E10 Fc15 DBI 23.9 ewi C10 Fc15 188 16.56 ewi F10 F015 DBI 23.83 ewi 87 F016 188 19.45 nwi D7 F016 DBI 27.19 nwi C7 F016 188 19.11 nwi E7 F016 DBI 28.84 nwi 85 F017 188 16.64 ewc F7 F016 DBI 28 nwi CS F017 188 16.52 ewc D5 F017 DBI 23.08 ewc A9 Fc18 188 17.04 ewc E5 F017 DBI 23.16 ewc 89 F018 188 16.83 ewc D9 Fc18 DBI 22.31 ewc 09 F018 188 16.58 ewc E9 Fc18 DBI 22.78 ewc A12 Fc19 188 17.47 nwc F9 Fc18 DBI 23.18 ewc 812 Fc19 188 17.33 nwc D12 Fc19 DBI 23.93 nwc C12 Fc19 188 17.53 nwc E12 Fc19 DBI 23.78 nwc A1 F03 188 18.26 ewc F12 F019 DBI 24.35 nwc B1 F03 188 18.06 ewc D1 F03 DBI 24.83 ewc C1 F03 188 18.42 ewc E1 F03 DBI 24.71 ewc A2 Fc4 188 15.8 ewi F1 F03 DBI 24.76 ewc 82 Fc4 1 88 1 5.79 ewi D2 Fc4 DBI 24.57 ewi CZ Fc4 1 88 1 5.92 ewi E2 Fc4 DBI 24.7 ewi A3 F06 188 1 5.1 7 nwi D3 F06 DBI 24.15 nwi B3 F06 188 1 5.34 nwi E3 F06 DBI 24.22 nwi CS F06 1 88 1 5.35 nwi F3 F06 DBI 24.22 nwi A8 F09 188 16.19 nwc D8 F09 DBI 23.77 nwc 88 F09 1 88 1 6.1 2 nwc F8 F09 DBI 24.01 nwc C8 F09 1 88 1 5.96 nwc NMQQQQQQQQQ-‘Ad-‘AANNNN-FA-‘NNN-‘A-‘ODOOOO NNwwwwwwmw-‘AAAAANNNNN—t-t-‘NNNA4400030) N The dataset is presented in two sets of columns, one for the control gene and one for the target gene, but it was input into SAS as a single dataset with six columns. 141 Model comparison. Models used in the datasets listed in Table 4 of the paper. 1. Paratuberculosis infection dataset, Full model: ygijk, = GSIgl-j + B(S),-k + BI(S),-jk + GB(S)g,-k + GBI(S)g,-jk + egijkl, Where: Ygijkl is the CT observed in the [”1 well corresponding to the gm gene in the kth sample with status 1' and treatment j. G: Gene, B: Biological sample, S: Status, 1: treatment. GSIg,-j is the mean expression for gene g in the Status level i and treatment ~ 2 2 2 GB] (S ) gijk ~ N (0, 0:818 ), egg-kl ~ N(0,0'62g ). Reduced models were generated by omitting random effects or assuming homogeneous variances. 2. Paratuberculosis time course, Full model: ygikl = GTgi + Bk 'I' BTik +Gng + GBTgik + 68,-“ . . lth . th . th Where: Ygilsl IS the CT observed In the well correspondrng to the g gene In the k sample at time i. G: Gene, B: Biological sample, T: Time. GTgi is the mean expression . . . 2 2 f h , B ~ , , . ~ 9 3 ~ 9 9 or gene g Int e timer k N(0 0'8) 3le N(0 O'BT) Gng N(0 0:38 ) GBTgik ~ N (0,0: ), e8,“ ~ N (0, a; ) . Reduced models were generated by omitting Big random effects or assuming homogeneous variances. 142 3. Stress induced differential expression in piglets. Full model: See model (5) in Material and methods of the paper. 4. rd/rd mice and Taqman low desity array. Full model: ygikl = GTgI 'I' B(T)ik 'I' GB(T)gik 'I" egikl . . lth . th . th Where: ngkl IS the CT observed In the well correspondrng to the g gene In the k sample from group i. G: Gene, GT8,- is the mean expression for gene g in the group i. B: Biological sample, T: group (rd or wild type in rd/rd dataset and mutated or non-mutated in taqman low density array dataset). B(T)ik ~ N (0, 0'12” ) , GB(T)g,-k ~ N (0, 0:81 ). 8 e gikl ~ N (0, 03g ) . 
Reduced models were generated by omitting random effects or assuming homogeneous variances.

5. Single tissue housekeeping comparison:

y_gk = G_g + B_k + e_gk,

where y_gk is the CT observed in the well corresponding to the gth gene in the kth sample. G: gene; B: biological sample. B_k ~ N(0, sigma^2_B), e_gk ~ N(0, sigma^2_e,g). Reduced models were generated by omitting the random effect or assuming a homogeneous error variance.

Table 3. Effects included in the best-fit models.

Dataset   Fixed effects(a)          Random effects
                                    Sample(b)   Sample-gene(c)                 Residual(d)
SPLIT     Gene*Status*Infection     yes         Yes. Gene-specific variance    Gene-specific variance
TC        Gene*Time                 yes         Yes. Gene-specific variance    Gene-specific variance
PFC       Gene*Weaning*Isolation    yes         Yes. Gene-specific variance    Gene-specific variance
MRD       Gene*strain               yes         No/Yes(e)                      Gene-specific variance
TLD       Gene*mutation             yes         Yes. Gene-specific variance    Homogeneous variance
SHK       Gene                      yes         -                              Gene-specific variance

(a) Specification of fixed effects. (b) Random sample effect included (yes) or not (no). (c) Random sample by gene interaction included (yes/no) and variance of the effect (homogeneous or gene specific). (d) Specification of the residual variance (homogeneous or gene specific). (e) The model without the sample-gene random effect and the model with a sample-gene interaction (with homogeneous variances) yielded the same values for both selection criteria.

Table 4. Contrasts of interest.

Dataset   Comparisons with ΔCT and LMM(a)                          Comparisons with LMM only(b)
SPLIT     Simple effect of infection (3); simple effect of status  Interaction infection by status
TC        Every time versus baseline (4)                           Linear and quadratic trend
PFC       Simple effect of weaning (2); simple effect of isolation (2)   Interaction weaning by isolation
MRD       Wild type versus mutant (1)                              -
TLD       Mutated versus unmutated (1)                             -

(a) Contrasts that can be obtained with both methodologies. (b) Comparisons with LMM: contrasts that are calculated with the linear model only.

SUPPLEMENT REFERENCES

Gibson, U. E., C. A. Heid, and P. M. Williams. 1996. A novel method for real time quantitative RT-PCR. Genome Res 6: 995-1001.

Heid, C. A., J. Stevens, K. J. Livak, and P. M. Williams. 1996. Real time quantitative PCR. Genome Res 6: 986-994.

Livak, K. J., and T. D. Schmittgen. 2001. Analysis of relative gene expression data using real-time quantitative PCR and the 2(-delta delta C(T)) method. Methods 25: 402-408.

Marino, J. H., P. Cook, and K. S. Miller. 2003. Accurate and statistically verified quantification of relative mRNA abundances using SYBR Green I and real-time RT-PCR. J Immunol Methods 283: 291-306.

Pfaffl, M. W. 2001. A new mathematical model for relative quantification in real-time RT-PCR. Nucleic Acids Res 29: e45.

Poletto, R., J. M. Siegford, J. P. Steibel, P. M. Coussens, and A. J. Zanella. 2006. Investigation of changes in global gene expression in the frontal cortex of early-weaned and socially isolated piglets using microarray and quantitative real-time RT-PCR. Brain Res 1068: 7-15.

REFERENCES
Jensen, and T. F. Omtoft. 2004. Normalization of real-time quantitative reverse transcription-per data: A model-based variance estimation approach to identify genes suited for normalization, applied to bladder and colon cancer data sets. Cancer Res 64: 5245-5250. Belbin, T. J. et al. 2004. Indirect measurements of differential gene expression with cdna microarrays. Biotechniques 36: 310-314. Ben Ahmed, M., H. Houman, M. Miled, K. Dellagi, and H. Louzir. 2004. Involvement of chemokines and thl cytokines in the pathogenesis of mucocutaneous lesions of behcet's disease. Arthritis Rheum 50: 2291-2295. Brunner, A. M., I. A. Yakovlev, and S. H. Strauss. 2004. Validating internal controls for quantitative plant gene expression studies. BMC Plant Biol 4: l4. Bueno Filho, J. S. S., S. G. Guilmour, and G. J. M. Rosa. 2005. Design of microarray experiments for genetical genomics studies. Submitted. Bustin, S. A. 2000. Absolute quantification of ma using real-time reverse transcription polymerase chain reaction assays. J Mol‘ Endocrinol 25: 169-193. Churchill, G. A. 2002. Fundamentals of experimental design for cdna microarrays. Nat Genet 32 Suppl: 490-495. Cook, P., C. Fu, M. Hickey, E. S. Han, and K. S. Miller. 2004. Sas programs for real-time rt-pcr having multiple independent samples. Biotechniques 37: 990-995. Coussens, P. M., C. J. Colvin, G. J. Rosa, J. Perez Laspiur, and M. D. Elftman. 2003. Evidence for a novel gene expression program in peripheral blood mononuclear cells from mycobacterium avium subsp. Paratuberculosis-infected cattle. Infect Immun 71: 6487- 6498. 148 Coussens, P. M., A. J effers, and C. Colvin. 2004. Rapid and transient activation of gene expression in peripheral blood mononuclear cells from johne's disease positive cows exposed to mycobacterium paratuberculosis in vitro. Microb Pathog 36: 93-108. Cui, X. G., J. T. G. Hwang, J. Qiu, N. J. Blades, and G. A. Churchill. 2005. Improved statistical tests for differential gene expression by shrinking variance components estimates. Biostatistics 6: 59-75. Dobbin, K., and R. Simon. 2002. Comparison of microarray designs for class comparison and class discovery. Bioinformatics 18: 1438-1445. Fu, W. J. J., J. B. Hu, T. Spencer, R. Carroll, and G. Y. Wu. 2006. Statistical models in assessing fold change of gene expression in real-time rt-pcr experiments. Computational Biology and Chemistry 30: 21-26. Gentle, A., F. Anastasopoulos, and N. A. McBrien. 2001. High-resolution semi-quantitative real- time per without the use of a standard curve. Biotechniques 31: 502, 504—506, 508. Gibson, U. E., C. A. Heid, and P. M. Williams. 1996. A novel method for real time quantitative rt-pcr. Genome Res 6: 995-1001. Giglio, S., P. T. Monis, and C. P. Saint. 2003. Demonstration of preferential binding of sybr green i to specific DNA fragments in real-time multiplex pcr. Nucleic Acids Res 31: e136. Giulietti, A. et al. 2001. An overview of real-time quantitative pcr: Applications to quantify cytokine gene expression. Methods 25: 386-401. Glonek, G. F. V., and P. J. Solomon. 2004. Factorial and time course designs for cdna microarray experiments. Biostatistics 5: 89-111. Gorreta, F., D. Barzaghi, A. J. VanMeter, V. Chandhoke, and L. Del Giacco. 2004. Development of a new reference standard for microarray experiments. Biotechniques 36: 1002-1009. Hamalainen, H. et a1. 2001. Distinct gene expression profiles of human type 1 and type 2 t helper cells. Genome Biol 2: RESEARCH0022. Heid, C. A., J. Stevens, K. J. Livak, and P. M. Williams. 1996. 
Real time quantitative pcr. Genome Res 6: 986-994. Hu, J ., F. Zou, and F. A. Wright. 2005. Practical fdr-based sample size calculations in microarray experiments. Bioinformatics 21: 3264-3272. Jansen, R. C., and J. P. Nap. 2001. Genetical genomics: The added value from segregation. Trends Genet 17: 388-391. 149 Johnson, M. R., K. Wang, J. B. Smith, M. J. Heslin, and R. B. Diasio. 2000. Quantitation of dihydropyrimidine dehydrogenase expression by real-time reverse transcription polymerase chain reaction. Anal Biochem 278: 175-184. Jung, S. H. 2005. Sample size for fdr-control in microarray data analysis. Bioinformatics 21: 3097-3104. Kerr, M. K. 2003a. Design considerations for efficient and effective microarray studies. Biometrics 59: 822-828. Kerr, M. K. 2003b. Linear models for microarray data analysis: Hidden similarities and differences. J Comput Biol 10: 891-901. Kerr, M. K., and G. A. Churchill. 2001a. Experimental design for gene expression microarrays. Biostatistics 2: 183-201. Kerr, M. K., and G. A. Churchill. 2001b. Statistical design and the analysis of gene expression microarray data. Genetical Research 77: 123-128. Kerr, M. K., M. Martin, and G. A. Churchill. 2000. Analysis of variance for gene expression microarray data. J Comput Biol 7: 819-837. Kim, H. et a1. 2002. Use of ma and genomic DNA references for inferred comparisons in DNA microarray analyses. Biotechniques 33: 924-930. Kishimoto, Y. et al. 2004. Gene expression relevant to osteoclastogenesis in the synovium and bone marrow of mature rats with collagen-induced arthritis. Rheumatology (Oxford) 43: 1496—1503. Koni g, R., D. Baldessari, N. Pollet, C. Niehrs, and R. Eils. 2004. Reliability of gene expression ratios for cdna microarrays in multiconditional experiments with a reference design. Nucleic Acids Research 32: e29. Lin, S. J. et al. 2002. Calorie restriction extends saccharomyces cerevisiae lifespan by increasing respiration. Nature 418: 344-348. Littell, R. C. 1996. Sas system for mixed models. SAS Institute Inc., Cary, N .C. Liu, W., and D. A. Saint. 2002. A new quantitative method of real time reverse transcription polymerase chain reaction assay based on simulation of polymerase chain reaction kinetics. Anal Biochem 302: 52-59. Livak, K. J ., and T. D. Schmittgen. 2001. Analysis of relative gene expression data using real- time quantitative pcr and the 2(-de1ta delta c(t)) method. Methods 25: 402-408. Lonnstedt, 1., and T. Speed. 2002. Replicated microarray data. Statistica Sinica 12: 31-46. 150 Marino, J. H., P. Cook, and K. S. Miller. 2003. Accurate and statistically verified quantification of relative mrna abundances using sybr green i and real-time rt-pcr. J Immunol Methods 283: 291-306. Martinez, M. J. et a1. 2004. Genomic analysis of stationary-phase and exit in saccharomyces cerevisiae: Gene expression and identification of novel essential genes. Mol Biol Cell 15: 5295-5305. Miller, R. A., A. Galecki, and R. J. Shmookler-Reis. 2001. Interpretation, design, and analysis of gene array expression experiments. J Gerontol A Biol Sci Med Sci 56: B52-57. Muller, P. Y., H. Janovjak, A. R. Miserez, and Z. Dobbie. 2002. Processing of gene expression data generated by quantitative real-time rt-pcr. Biotechniques 32: 1372-+. Papp, B., C. Pal, and L. D. Hurst. 2003. Dosage sensitivity and the evolution of gene families in yeast. Nature 424: 194- 197. Pawitan, Y., S. Michiels, S. Koscielny, A. Gusnanto, and A. Ploner. 2005. False discovery rate, sensitivity and sample size for microarray studies. 
Bioinformatics 21: 3017-3024. Peirson, S. N., J. N. Butler, and R. G. Foster. 2003. Experimental validation of novel and '- conventional approaches to quantitative real-time pcr data analysis. Nucleic Acids Res 3l:e73. Pellagatti, A. et al. 2003. Gene expression profiling in polycythemia vera using cdna microarray technology. Cancer Res 63: 3940-3944. Perou, C. M. et al. 2000. Molecular portraits of human breast tumours. Nature 406: 747-752. Perreard, L. et al. 2006. Classification and risk stratification of invasive breast carcinomas using a real-time quantitative rt-pcr assay. Breast Cancer Res 8: R23. Pfaffl, M. W. 2001. A new mathematical model for relative quantification in real-time rt-pcr. Nucleic Acids Res 29: 045. Pfaffl, M. W., G. W. Horgan, and L. Dempfle. 2002. Relative expression software tool (rest) for group-wise comparison and statistical analysis of relative expression results in real-time pcr. Nucleic Acids Res 30: e36. Poletto, R., J. M. Siegford, J. P. Steibel, P. M. Coussens, and A. J. Zanella. 2006a. Investigation of changes in global gene expression in the frontal cortex of early-weaned and socially isolated piglets using microarray and quantitative real-time rt-pcr. Brain Res 1068: 7-15. Poletto, R., J. P. Steibel, J. M. Siegford, and A. J. Zanella. 2006b. Effects of early weaning and social isolation on the expression of glucocorticoid and mineralocorticoid receptor and 151 llbeta-hydroxysteroid dehydrogenase 1 and 2 mmas in the frontal cortex and hippocampus of piglets. Brain Res 1067: 36-42. Ramakers, C., J. M. Ruijter, R. H. Deprez, and A. F. Moorman. 2003. Assumption-free analysis of quantitative real-time polymerase chain reaction (per) data. Neurosci Lett 339: 62-66. Recinos, A., 3rd et a1. 2004. Liver gene expression associated with diet and lesion development in atherosclerosis-prone mice: Induction of components of alternative complement pathway. Physiol Genomics 19: 131-142. Rhoden, K. J. et a1. 2004. Real-time quantitative rt-pcr identifies distinct c-ret, ret/ptcl and ret/ptc3 expression patterns in papillary thyroid carcinoma. Lab Invest 84: 1557-1570. Rockett, J. C. 2003. To confirm or not to confirm (microarray data)--that is the question. Drug Discov Today 8: 343. Rockett, J. C., and G. M. Hellrnann. 2004. Confirming microarray data--is it really necessary? Genomics 83: 541-549. Rosa, G. J. M., J. P. Steibel, and R. J. Tempelman. 2005. Reassessing design and analysis of two-colour microarray experiments using mixed effects models. Comparative and Functional Genomics 6: 123-131. Satagopan, J. M., and R. C. Elston. 2003. Optimal two-stage genotyping in population-based association studies. Genet Epidemiol 25: 149-157. Satagopan, J. M., E. S. Venkatraman, and C. B. Begg. 2004. Two-stage designs for gene-disease association studies with sample size constraints. Biometrics 60: 589-597. Schwarz, G. 1978. Estimating dimension of a model. Annals of Statistics 6: 461-464. Smyth, G. K. 2004. Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol 3: Article3. Smyth, G. K., J. Michaud, and H. S. Scott. 2005. Use of within-array replicate spots for assessing differential expression in microarray experiments. Bioinformatics 21: 2067-2075. Steibel, J. P., and G. J. M. Rosa. 2005. On reference designs for microarray experiments. Statistical Applications in Genetics and Molecular Biology 4: A36. Sterrenburg, E., R. Turk, J. M. Boer, G. B. van Ommen, and J. T. den Dunnen. 2002. 
A common reference for cdna microarray hybridizations. Nucleic Acid Research 30: e116. Stroup, W. W. 2002. Power analysis based on spatial effects mixed models: A tool for comparing design and analysis strategies in the presence of spatial variability. Journal of Agricultural Biological and Environmental Statistics 7: 491-511. 152 Swillens, S., J. C. Goffard, Y. Marechal, A. de Kerchove d'Exaerde, and H. El Housni. 2004. Instant evaluation of the absolute initial number of cdna copies from a single real-time pcr curve. Nucleic Acids Res 32: e56. Szabo, A. et al. 2004. Statistical modeling for selecting housekeeper genes. Genome Biol 5: R59. Tempelman, R. J. 2005. Assessing statistical precision, power, and robustness of alternative experimental designs for two color rrricroarray platforms based on mixed effects models. Vet Immunol Immunopathol 105: 175-186. Tichopad, A., A. Didier, and M. W. Pfaffl. 2004. Inhibition of real-time rt-pcr quantification due to tissue-specific contaminants. Mol Cell Probes 18: 45-50. Tichopad, A., M. Dilger, G. Schwarz, and M. W. Pfaffl. 2003. Standardized determination of real-time pcr efficiency from a single reaction set-up. Nucleic Acids Res 31: e122. Tichopad, A., A. Dzidic, and M. W. Pfaffl. 2002. Improving quantitative real-time rt-pcr reproducibility by boosting primer-linked amplification efficiency. Biotechnology Letters 24: 2053-2056. Timm, N. H. 2004. Estimating effect sizes in exploratory experimental studies when using a linear model. American Statistician 58: 213-217. Verhoeven, K. J. F., K. L. Simonsen, and L. M. McIntyre. 2005. Implementing false discovery rate control: Increasing your power. Oikos 108: 643-647. Vinciotti, V. et al. 2005. An experimental evaluation of a loop versus a reference design for two- channel rrricroarrays. Bioinformatics 21: 492-501. Winer, J ., C. K. Jung, I. Shackel, and P. M. Williams. 1999. Development and validation of real- time quantitative reverse transcriptase-polymerase chain reaction for monitoring gene expression in cardiac myocytes in vitro. Anal Biochem 270: 41-49. Wolfinger, R. D. et a1. 2001. Assessing gene significance from cdna microarray expression data via mixed models. J Comput Biol 8: 625-637. Yang, Y. 11., and T. Speed. 2002. Design issues for cdna microarray experiments. Nat Rev Genet 3: 579-588. Yang, Y. H., and T. P. Speed. 2003. Design and analysis of comparative microarray experiments. In: T. P. Speed (ed.) Statistical analysis of gene expression microarray data. CRC Press, Boca Raton, FL. Zehetmayer, S., P. Bauer, and M. Posch. 2005. Two-stage designs for experiments with a large number of hypotheses. Bioinformatics 21: 3771-3777. 153 I11111111311111"leljlljllll