‘WW — V
; 1 A

‘ié‘izilgﬁ .1 ,
.3 :{v 1! t ‘
I n I 5
v why}: ‘32:,

1%
‘3‘}:

mad. . -
I"

a,

2'"

~ v.

\
. u
una-

' " Thy-(J llllv

‘

‘iii

.a

c

i

1 -:
1%;
I .

 

 

u‘m‘

f
.7

..
‘W

I. J
3‘?“

5;“.

4:42:
5,

éapnw

I ‘4-v.'u v
race

'21
3-352 3%.
. v 1.. r‘
“Efﬁa: 14
*

“as

"-
333 ‘
3‘

. 1‘2. 31:
. 3)

s (a
'1
0
“MDT
4 .

’ "'15.“
3‘,

:-

w
13’:
u
5.1,

mi...
M2" vs,
‘ ha».

I
,_
”4’:-

.4:
u
n

 

.‘9
Eff-1&3-

I THE-"S lo

/ l , .LIBRARY
A00 1 Michigan State

University

This is to certify that the
thesis entitled

Differential Expression Analysis of DNA Microarray
data with Application to the Heat Shock
Response of Arabidopsis thaliana

presented by

William R. Swindell

has been accepted towards fulfillment
of the requirements for the

MS. degree in Statistics and Probability

 

 

Major l5rofessor’s Signature
‘1 - 25 —« 2007—

Date

 

MSU is an afﬁrmative-action, equal-opportunity employer

—.—----—~-0—t-—~-4-I--‘—-I-<_I-—v~—-

PLACE IN RETURN BOX to remove this checkout from your record.
10 AVOID FINES return on or before date due.
MAY BE RECALLED with earlier due date if requested.

 

DATE DUE DATE DUE DATE DUE

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

2/05 chm' 'C/DateDueljﬂndd-I ""p'. 1'5

DIFFERENTIAL EXPRESSION ANALYSIS OF DNA MICROARRAY DATA WITH

APPLICATION TO THE HEAT SHOCK RESPONSE OF ARABIDOPSIS THALIANA

By

William R. Swindell

A THESIS
Submitted to
Michigan State University
in partial fulﬁllment of the requirements
for the degree of
MASTER OF SCIENCE
Department of Statistics and Probability

2007

ABSTRACT

DIFFERENTIAL EXPRESSION ANALYSIS OF DNA MICROARRAY DATA WITH
APPLICATION TO THE HEAT SHOCK RESPONSE OF ARABIDOPSIS THALIANA

By
William R. Swindell

DNA microarrays are widely used research tools that allow the expression level of
thousands of genes to be monitored simultaneously. A common interest arising in the
context of microarray data is to determine whether a gene’s expression level differs
between two conditions. Differential expression anlaysis provides a means for addressing
such interests, but several aspects of microarray data complicate the application of
standard two-sample methods. The Lima software package utilizes a Bayesian linear
model approach to differential expression analysis, which has gained considerable
popularity within the research community. In Chapter 1, the primary statistical challenges
associated with differential expression analysis are reviewed, and the Bayesian linear
model approach to these challenges is outlined. It is concluded that the statistical methods
implemented in Lima are useful in the absence of ideal procedures, but that attention
should be paid to several key assumptions that may not be satisﬁed by the data. Chapter 2
presents a detailed application of the Lima procedure (and other methods) to microarray
data generated from experiments performed with Arabidopsis thaliana. In particular, the
heat shock transcription factor and protein network of Arabidopsis is proﬁled under a
wide range of abiotic and biotic stress treatments in multiple cell types. The analysis
characterizes the interaction between Arabidopsis heat shock genes with heat and other
types of stress, and identiﬁes several heat shock gene expression patterns that have not

been previously described.

ACKNOWLEDGEMENTS

I would like to thank Prof. Marianne Huebner for having provided me with
encouragement throughout my graduate studies in Statistics, as well as clear guidance in
developing new skills in the analysis of DNA microarray datasets. My research
experiences working with Marianne were an important contribution to my graduate
studies at Michigan State University, and will be of considerable value in subsequent
stages of my career. I would also like to thank Prof. Andreas Weber (Plant Biology) for
the considerable amount of time he spent working with Marianne and I on several
projects. Andreas was of great assistance in having identiﬁed database resources critical
to the completion of this thesis research, and was always welcoming of questions that
arose during the course of our analyses. I thank Prof. Vince Melﬁ for taking the time to
serve on my thesis committee and for having provided comments on this manuscript.

I am especially grateful to the Michigan State University Department of
Probability and Statistics for having provided me with ﬁnancial support in the form of
teaching assistantships. In addition, I acknowledge the Michigan State University
Quantitative Biology and Modeling Initiative (QBMI) for having provided research
assistantships that have enabled the completion of several projects.

The gene expression data analyzed in chapter two of this thesis was generated by
the AtGenExpress consortium and the Arabidopsis Functional Genomics Network
(AFGN). The public availability of this data was critical to the work presented in this

thesis and provides a valuable resource for the Arabidopsis research community.

iii

TABLE OF CONTENTS

List of tables ................................................................................. v

List of ﬁgures ................................................................................ vi

CHAPTER 1

Differential expression analysis of DNA microarray data with linear

models and Bayesian statistics .............................................................. 1
Summary ..................................................................................... 1
Section 1 Introduction ....................................................................... 2
Section 2 Linear Models ................................................................... 6
Section 3 The moderated t and posterior odds statistics ............................. 14
Section 4 Discussion ...................................................................... 23
Bibliography .............................................................................. 32

CHAPTER 2

Transcriptional proﬁling of Arabidopsis heat shock genes ..................................... 35
Summary .................................................................................... 35
Section I Introduction .................................................................... 36
Section 2 Results ................................................................................................... 41
Section 3 Discussion ....................................................................... 63
Section 4 Methods ......................................................................... 72
Bibliography ............................................................................... 77
Appendix .................................................................................... 84

iv

LIST OF TABLES

Table 1 Overview. Root tissue ............................................................ 48
Table 2 Overview. Shoot tissue ............................................................ 49
Table 3 Hsf protein family ................................................................. 50
Table 4 Hsp20 protein family ............................................................. 52
Table 5 Hsp70 protein family ............................................................. 53
Table 6 Hsp90 protein family ............................................................. 54
Table 7 HsplOO protein family ............................................................ 55

LIST OF FIGURES

Figure 1-1 Saturated direct design for two-color microarray with three samples. . ..8
Figure 1-2 Comparison of ranks based on the moderated t-statistic with ranks

based on fold-change ................................................................ 30
Figure 1-3 Comparison of ranks based on the moderated t-statistic with ranks

based on the ordinary t-statistic .................................................... 31
Figure 2-1. Hsf expression response proﬁles in roots ..................................... 56
Figure 2-2. Hsf expression response proﬁles in shoots ................................... 57
Figure 2-3. Hsp20 expression response proﬁles in roots ................................ 58
Figure 2-4. Hsp20 expression response proﬁles in shoots .............................. 59
Figure 2-5. Expression response proﬁles of select Hsp20 genes under

wounding and heat stress treatments .............................................. 60
Figure 2-6. Hsp70, Hsp90 and HsplOO expression response proﬁles under

ultraviolet-B light stress treatment ................................................. 61
Figure 2-7. Expression response proﬁles of selected Hsp70, Hsp90 and

HsplOO genes under wounding and heat stress treatments. . .62

Figures in this thesis are presented in color.

vi

CHAPTER 1
DIFFERENTIAL EXPRESSION ANALYSIS OF MICROARRAY DATA WITH

LINEAR MODELS AND BAYESIAN STATISTICS

Summary

DNA microarrays are increasingly popular research tools that allow the expression level
of thousands of genes to be monitored simultaneously. Statistical tests of differential
expression are widely used in conjunction with DNA microarray data, but there are
several approaches to identifying differentially expressed genes. The goal in differential
expression analysis is to determine whether sufﬁcient evidence exists to declare that a
given gene’s expression level differs between two conditions of measurement. The
Limma software package is a recently developed tool that utilizes a Bayesian approach to
differential expression analysis. The software, moreover, is implemented within a highly
ﬂexible linear model ﬁamework, which allows investigators to evaluate complicated
microarray experiments with little difﬁcultly. In this chapter, an overview of the
challenges entailed by differential expression analysis is provided, and the statistical
methods by which the Lima procedure addresses these challenges are presented. In
particular, the basic t-test as applied to differential expression is developed within the
context of linear models. The main ideas underlying the moderated t-statistic and
posterior odds are then presented, with the aim of drawing attention to the key data
assumptions upon which statistical inferences depend. The performance of the Lima
procedure as evaluated by Smyth (2004) is then discussed in conjunction with remaining
issues that await resolution. The main conclusion emerging from this review is that
statistical inferences based upon the Lima procedure are dependent upon several
assumptions that may not be satisﬁed by microarray datasets. In the absence of ideal
methods, therefore, inferences generated by Limma should be made with appropriate
caution. These considerations underscore the challenges entailed by the complexity of

DNA microarray data.

Section 1 Introduction

DNA microarrays are a widely used tool for simultaneously monitoring the
expression level of thousands of genes throughout a genome (Brown and Botstein 1999;
Lander 1999). As the application of microarray technology has become increasingly
widespread, the statistical analysis of microarray data has become a topic of considerable
importance. Critical statistical issues arise in almost every phase of microarray
processing, including the experimental design stage (Yang and Speed 2002), extraction of
expression intensities from microarray images (Jain et a1. 2002), data preprocessing and
normalization (Quackenbush 2002), and most especially, the post-processing phase
during which a multitude of analyses are possible depending on the biological questions
under consideration (e.g., Eisen et a1. 1998, Alter et a1. 2000, Brown 2000). At the post-
processing stage, standard differential expression analysis is perhaps the most basic
procedure that is most widely implemented. The objective in differential expression
analysis is to determine whether there is sufﬁcient evidence to claim that expression
levels differ between two samples (e.g., tumor vs. benign tissue) for individual genes.
Although this basic problem is analogous to that commonly addressed by the two-sample
t-test procedure (W elch 1947), several complicating factors arise in the context of
microarray data, such that only rough agreement now exists regarding the best approach
for identiﬁcation of differentially expressed genes (Allison et a1. 2006).

The biological and technical variability commonly associated with gene
expression estimates is, at a fundamental level, a key factor that complicates differential
expression analysis. If such variability was not present or trivial, simple analytical

methods would prove robust when applied to microarray datasets. Unfortunately,

however, gene expression estimates are inherently variable, since mixtures of cell types
and possibly genotypes are common RNA sources in experiments, and the technical
processes of mRNA extraction, ampliﬁcation, and probe hybridization are subject to
various sources of noise and interference. In light of these challenges, a wide range of
approaches have been suggested and implemented in the biological literature. The ﬁrst
and most straightforward approach is to avoid hypothesis testing altogether, and identify
genes on the basis of their expression level fold-change between samples. There are, in
fact, several advantages to this approach, including a focus on effect size, non-
dependency on modeling assumptions, minimal estimation of parameters, and
importantly, straightforward interpretation. Ultimately, however, this approach is
problematic since it does not account for the sources of variation mentioned above, which
can impact different genes to varying extents. The standard two sample t-test is perhaps
the next simplest alternative to fold-change, since it does account for biological and
technical variability. The ordinary t-statistic, however, varies inversely with the standard
deviation of expression estimates, such that signiﬁcant statistics can arise for effect sizes

that are arbitrarily small. Furthermore, the standard t-test requires that the variance

associated with gene expression levels (02) be estimated for each gene based upon what
is typically a small number of replicates. While the former difﬁculty can be addressed by
adding a constant term to the denominator of the t-statistic (e.g., Efron et a1. 2001; Tusher

et a1. 2001; Broberg 2003), alternative methods are necessary to avoid independent

estimation of 0'2 for all genes represented on a microarray.

Bayes and empirical Bayes methods provide one possible way of circumventing

the gene-by- gene estimation ofoz. Bayesian methods assume a prior distribution that

characterizes how a2 (and possibly other parameters) varies across all genes considered
in an analysis. The key advantage of this approach lies in its effective use of the
replication that exists among all genes represented on a microarray, which is not
otherwise utilized in the standard t-test procedure. The empirical Bayes approach, in
particular, minimizes strict reliance on priors by estimating the parameters of prior
distributions (i.e., the hyperparameters) from the microarray dataset under consideration.
This Bayesian framework is the basis of the differential expression procedures
implemented in the Limma software package (Smyth 2004), which is part of the R
Bioconductor project and freely available online at http://www.bioconductor.org
(Gentleman et a1. 2004). The Lima package implements Bayesian approaches within a
highly ﬂexible linear model framework, which allows complex experiments to be
analyzed with relative ease by researchers without extensive statistical background. This
practical simplicity is a primary factor underlying the increasingly widespread application
Limma to a wide range of research problems involving gene expression data (e.g.,
Boutros et a1. 2004; Golden and Melov 2004; Renn et a1. 2004; Rodriguez et a1. 2004;
Peart et a1. 2005; Rensink et a1. 2005).

In this chapter, the basic features of the linear model framework that underlie the
methodology implemented in Lima are explained, and the empirical Bayesian model
utilized in differential expression testing is outlined. Particular attention is given to the
assumptions upon which inferences are based. In section 1, the basic t-test as applied to
differential expression analysis is presented within the context of linear models. In
section 2, the main ideas underlying the posterior odds and moderated t-statistic are

presented. The ﬁnal section provides a discussion of simulations and comparative

analyses in which Limma has been applied and provides an overview of persisting

problems associated with differential expression analysis.

Section 2 Linear Models

Linear models represent an efﬁcient way of combining data from multiple arrays
into a single analysis, and provide considerable ﬂexibility that allows for the analysis of
microarray experiments of arbitrary complexity. In this section, the simple two-sample t-
test will be developed in the framework of linear models. The analysis will consist of a
gene-by-gene approach in which a linear model is ﬁt to the expression values associated

with each gene individually. The response variable under consideration will thus be
represented as y g , where the subscript denotes that the linear model is being ﬁt to
expression values associated with gene g. The meaning of y g will differ depending upon

whether the gene expression data has been generated from two-color or one-color (e.g.,

Affymetrix) microarray platforms. It will generally be assumed below that expression

data has been generated from a two-color microarray platform, such that the response y g

represents the relative expression intensities of two RNA sources that have been labeled

with red (Rg ) or green (08) dyes. In particular, y 8 will represent the log-ratio of the

distinct expression intensities.

R
yg =1082[‘6§'] =1082(Rg) _10g2(Gg)

8

For the case of high density oligonucleotide platforms, the response variable y 8

will be absolute log-intensities rather than relative log-ratios. This modiﬁcation, however,
does not alter the formulation to follow, which will focus on the two-color case. It will

be assumed that each gene has been represented on a total of n arrays, such that n

replicate values of y g are available. In vector format, these n replicate measurements are

represented by the random vector y; = (ygl ,..., ygn ). Replicates represent RNA samples

that have been obtained from independent sources (biological replicates), rather than
multiple samples that have been taken from the same individual (technical replicates). It
is further assumed that the data have been normalized to prevent systematic variation
from inﬂuencing the results of the differential expression analysis.

The ﬁrst step towards building a linear model is construction of an appropriate
design matrix (X) that serves to specify the RNA targets used on arrays. The design
matrix used is generally non-unique and can vary according to the parameters included in

the model. The systematic component of the model is represented by Xa g , where u g is a
vector of coefﬁcients representing a series of contrasts. The coefﬁcient vector ug can be

viewed as the eﬂects of interest in the linear model analysis. In the familiar context of

multiple regression, these coefﬁcients are the ,6, most often estimated by least-squares
methods. The model partially explains the variance among the elements of y g

with E (y g) = Xag. The (18 can be estimated by least-squares, maximum likelihood, or

robust regression.

Consider as an example a saturated direct design for two-color microarrays with
three sources of RNA. This example was provided by Smyth (2004) and is illustrated in
Figure 1-1. Given this design, one possible design matrix and coefﬁcient vector are the

following.

 

AB BC CA

 

 

 

 

Array #1 Array #2 Array #3

 

 

 

Figure 1-1. Saturated direct design for two-color microarray with three samples.
Three RNA sources are considered (A, B, and C) using three microarrays (l, 2, and 3).
The ﬁgure inset on the left provides a summary of the RNA-array hybridization scheme,
in which the RNA source at the base of each arrow is labeled green, while the source at

the tip of each arrow is labeled red (example from Smyth 2004).

B—A
X: 0 l a :
lc—Bl

The design matrix and coefﬁcient vector given above can then be used to

calculate the expected value of the random response vector y R . In accordance with the
deﬁnitions of y, presented above for two-color microarray platforms, this expectation

represents relative log—ratios among the three sources of RNA.

y“ 0 B'—A
B—A
y —1 —l A—C

The contrast matrix (C) serves to specify which comparisons are of interest. In
general, only some of a larger number of possible contrasts of the coefﬁcients will be of

biological importance for any one experiment. These selected contrasts are represented

by [i g , which is obtained by applying the transposed contrast matrix to the coefﬁcient

vector ag.

Returning to the example presented in Figure 1-1 , the simplest possible case is
that in which only one speciﬁc contrast is of interest. Suppose, for instance, that a
researcher wishes to compare the RNA abundance in sample C to that of sample A. This

particular contrast is obtained by choosing the vector (1, 1) as the contrast matrix.
C = (1 1)

Applying this contrast matrix (vector) to the coefﬁcient vector then yields the

desired comparison.

pg =(1 1{B_A]=C—A

C—B

Typically, more than one contrast among RNA sources will be of interest. Such

scenarios, however, are easily accommodated by an alternative speciﬁcation of the

contrast matrix. For example, referring again to Figure 1-1, it may be of interest to
evaluate all possible contrasts among the three sources of RNA. In this case, the

following contrast matrix yields the appropriate set of contrasts [3 g .

The contrast matrix yields the following comparisons of interest.

The numerical values of [I g are simply the log-ratio fold changes that are of
interest. In the above example, therefore, [3 g is a vector that represents the log; fold-

change in expression between RNA sources B/A, C/B, and C/A. While more complex

hypotheses are possible, the main interest is to determine whether the elements of [I g are

signiﬁcantly different from zero, i.e., whether expression values differ between RNA

samples. For the jtlr entry of [3 g , therefore, it is of interest to test the null hypothesis
H 0:,8 g,- = 0 versus the alternative H A:,B g]. at 0.

The standard t-statistic makes two basic assumptions regarding the distribution of

the ﬁg]. and s: for any given gene. Additional assumptions will be required in the

context of the moderated t-statistic (see section 3). Let ti g represent the coefﬁcient

10

estimator of mg , and suppose that (“18 has covariance matrix Vgs: . Since we wish to

make inferences regarding the elements of [3 g , it is necessary to obtain the covariance

matrix of fig (Smyth 2004).

var(|3g) = var(CT&g)= CT var(&gk' = CTVng:

Hence, Ii 8 has covariance matrix C TVg Cs: . The elements of this matrix are a

sub-sarnple of those from Vgs: , and thus depend on VAR(&) for individual effects that
are of interest (as speciﬁed by the contrast matrix). The variance associated with the jth

contrast of interest (i.e., ﬁg}. ) is therefore proportional to 0': times the jth diagonal
T . . . .
element of C VgC . Letting thrs latter quantity be represented by vgl. , the variance

associated with the jth contrast of interest is vgi 0:. It is assumed that the estimated value

of the jth contrast of interest approximates a normal distribution with mean ﬁg,- and

variance vgio': (see Smyth 2004).

ﬂame”: ~ Nagoya":

If d g represents the degrees of ﬁ'eedom associated with error term in the linear
model, then d gs: / 0': ~ 2'3, from sampling theory (Rice 1995). Rearranging this

distributional equality yields the sampling distribution of s: .

ll

2

a
2 2 g 2
Sglag [d ]ng

g

The elements of [3 8 therefore approximate a normal distribution, and the variance

of each element follows a scaled chi-square distribution. The ratio of ﬁg]. to sg , ivy. thus

represents a normally distributed random variable divided by a chi-square random

variable with d g degrees of freedom, and is therefore a t-distribution with d g degrees of

freedom (Rice 1995).

 

The above represents the ordinary t-statistic and has been used in a number of
studies in which the interest has been to determine whether differential expression has
occurred between two RNA sources (e. g., tumor versus normal tissue). The two primary

problems with this statistic are immediately upon inspection. First, if the value of

sg ,lvgj is small by chance for a particular gene, then very large values of t 8,. may result

even if the estimated fold-change (38,-) is relatively small. Such genes are unlikely to be
of biological importance, but would nonetheless be identiﬁed as differentially expressed
based on the above statistic. The second problem lies in the estimation of s: based upon

what is a sample size equal to the number of arrays involved in the study, which would

12

most often not exceed six in a simple two sample case. Although additional assumptions
regarding the data are required, these issues are addressed by the statistics implemented

in the Lima linear modeling procedure.

13

Section 3 The moderated t and posterior odds statistics

The Limma linear modeling package evaluates hypotheses associated with
contrasts of interest based upon two different test statistics. The moderated t-statistic (7 )
is an extension of the standard t-statistic introduced in section 1, while the posterior odds
statistic (0) is based upon the work of Lonnstedt and Speed (2002), and provides an
indication of the likelihood of differential expression with respect to a given contrast. The
moderated t-statistic is generally used as a basis for obtaining p-values, while the
posterior odds is primarily used for ranking genes according to evidence for differential
expression. Gene rankings based upon the posterior odds are identical to those based
upon the moderated t-statistic if the microarray data under consideration does not have
missing values. Of the two statistics, the moderated t statistic may be of greater practical
utility, since it is based upon fewer assumptions in comparison to the posterior odds. As
discussed below, both statistics are based upon certain hyperparameters that must be
estimated from the data. One particular hyperparameter upon which the posterior odds is
based requires knowledge regarding the proportion of genes that are differentially
expressed, which is unknown and difﬁcult to estimate from the data. The posterior odds
is therefore ultimately dependent upon an assumed value regarding the proportion of
differentially expressed genes. The moderated t-statistic, however, is not dependent upon
voj. Another advantage of the moderated t-statistic is that it is easily extendable to
multiple testing of multiple contrasts using the F distribution. The central ideas leading to
the moderated t-statistic and posterior odds statistics are discussed below, with particular
attention to the assumptions that underlie their probabilistic interpretation. It should be

mentioned that some notations and concepts are introduced below without immediately

14

stating a complete and precise deﬁnition. This has been done for clarity and allows for an
initial focus on the main results and ideas, subsequently followed by details where

appropriate.

The moderated t-statistic
The central difficulty associated with the standard t-statistic is the low number of

observations available for individual genes. This low replication leads to poor estimates
of the standard deviation associated with each gene (sg ). In microarray datasets,
however, while replication associated with individual genes is low, there is ample

replication with respect to the thousands of genes that are represented on an array. The

advantage of the moderated t-statistic over the standard t-statistic lies in the manner in

which this source of replication is utilized to obtain most stable estimates of sg for
individual genes. The moderated t-statistic (7 ) is thus identical to the standard version,
except that an improved estimator of SE (3g ) is substituted into the denominator (Smyth
2004)

~ ﬁg,-

tgj 34v
g gt

 

The estimator 3g is improved in the sense that it represents a weighted average
between the observed sg for each individual gene and a prior value so. The critical

choice of s0 is made within an empirical Bayesian framework, utilizing information from

15

all genes, and is explained below. Ultimately, this shrinkage towards a prior estimate so
prevents problems associated with very large or very small sg , thereby increasing the

stability of associated with the denominator of t~ . A reﬂection of this added stability is

that 7 is now associated with degrees of freedom d g + do , where do represents the prior
degrees of freedom associated with so. For a ﬁxed level of a , therefore, a smaller
observed value of ITI is required to obtain a signiﬁcant result.

The "532 is a weighted value between the observed and prior variance for
individual genes, with the weighting determined by the relative magnitudes of d g and

do (Smyth 2004).

2 2
doso +dgsg
do +dg

~2

 

The value of d g represents the residual degrees of freedom associated with the

linear model ﬁt to gene g, and is therefore proportional to the replication that is available.

The value of d g will be identical among all genes if no values are missing from the data.

The prior degrees of ﬁ'eedom do is estimated based upon how .6 are distributed among

g

all genes represented on arrays (see below). In particular, do will be large when

variances are homogenous among all genes, and conversely, do will be small in
magnitude if variances differ greatly among genes. If the s; are similar among genes,

therefore, 3'; is nearly equal to the prior so for all genes, in which case 7 is directly

l6

proportional to the observed fold-change. If the s“: differs considerably among genes,
however, 382 is determined by the 5; associated with individual genes to a greater extent,

shrinkage towards the prior so is limited, and 7 becomes increasingly similar to the

standard t-statistic discussed in section 1.

The moderated t-statistic can be viewed as a type of offset t-statistic along the
lines of those developed by Tusher et a1. (2001), Efron et a1. (2001) and Brobery (2003).
Offset t-statistics include an added (constant) term in the denominator to prevent small

variances from leading to signiﬁcant differential expression calls. In particular, for the

moderated t-statistic, if do < co and d g > 0 then,

 

2
tog

In the above formulation of the moderated t-statistic, the moderated variance 3
includes the term (do / d g )53 , which sets a lower limit on the magnitude of the
denominator. In contrast to the statistics proposed by Tusher et a1. 2001 , Efron et a1. 2001
and Brobery 2003, however, the constant term (do /dg )53 is not arbitrary, but is instead

estimated from the data based upon an underlying distributional theory (described

below).

17

All advantages of 7 over t cited above are attained at the expense of an additional

assumption, as well as the having to estimate hyperpararneters upon which the term '5;

depends (i.e., so and do). The primary assumption introduced is that the estimators [i g
and s: are independent for different genes, which is not likely to be valid for most

microarray datasets. The estimation of so and do is somewhat less problematic, due to

the well developed distributional framework in which these parameters are embedded,

which allows stable estimates based on information borrowed from all genes considered
in the analysis. A sketch of the manner in which estimators of so and do are derived is

provided below, further details of which can be found in Smyth (2004).

A hierarchical model is assumed that describes the manner in which a: varies

across all genes. Given the prior estimator so with do degrees of freedom, and noting the

relationship ks2/02 ~ Xi expected under sampling theory (Rice 1995), it is assumed that

0: follows a scaled chi-square distribution across genes.

 

2
2 doso
0 ~
g 2
Id,

If it is further assumed that the ﬁg,- approximate a normal distribution, the joint

distribution of the observed s; and ﬂy. under the null hypothesis can be derived by

carrying out the following integration (Smyth 2004).

18

P(“,,-.s:I/3g,- =0) = jP(/§g,-|02.ﬂg,- =omega010(02)ch2

All three probability terms in the integrand are known by the assumptions stated
above. Following substitution of the appropriate normal or chi-square distributions and

integration of the resulting term, it can be shown based upon the joint distribution
P( ﬁzz-$21412 = 0) that that the 3: among genes is expected to follow a surprisingly

simple distribution (Smyth 2004).

S: ~ 83 F dg vdo

The distributional result stated above allows for estimation of the Sti and do
hyperparameters based upon the observed distribution of s: among all genes. In
particular, if 2g = log s: and f (o) to represents the digarnma function (Smyth 2004),
deﬁne the following quantity e g for each gene.

eg =zg —f(dg /2)+log(dg /2)

Lastly, letting G represent the number of genes present on arrays and h(o)

represent the trigarnma function, a closed form estimators of do and so are provided by

the following equations (Smyth 2004).

19

—1 eg—E
d0=2h mean G —6——1— —h(dg/2)

s5 = exp{§+ f(do /2) — log(do /2)}

The posterior odds statistic

The posterior odds statistic provides a convenient summary measure for

comparing the probability that a gene is differentially expressed (ﬁg,- ¢ 0) to the
probability that a gene is not differentially expressed ( ﬁg,- = 0) with respect to thejth

contrast of interest. This statistic was originally developed by L6nnstedt and Speed
(2002) and is primarily intended to provide a means of ranking genes, rather than as a

device upon which formal inferences can be based. The limitations of the posterior odds

(08,.) statistic are immediately apparent from its deﬁnition below (Smyth 2004), in which

p j represents the proportion of genes that are differentially expressed.

 

 

 

 

_ ng, ¢q?g,,s§) = ng, ¢0,Tg,,s§) = p 1. Pool/98,. :0)
8’ Pm, =0I?,,-.s:) Pm,- =0.?',..s:) 1— p,- Peg-w, =0)

The last ratio reveals that the posterior odds are dependent upon the value of p j ,

which is unknown and cannot be reliably estimated from the data. Moreover, substituting

the distribution of t3,- into the last ratio reveals that 08,. is dependent upon an additional

hyperparameter voj .

20

( W(l+do+dg)/2

[/2 2
0.: pj Vg; gi+d0+dg
g] I‘Pj ng+V0j ~2 vgi

+do+dg
\ vgj+voj J

l

N

 

 

 

 

 

The value voj represents the unsealed variance of 16y , such that vog-2 08 is the

standard deviation of the log fold-changes associated with differentially expressed genes.
Smyth (2004) derived a closed form estimator of V0; , which had not previously been

obtained in the original formulation of Lonnstedt and Speed (2002). This estimator,

however, is based only on genes that are differentially expressed, rather than on all genes
represented on the array (as with do and so ). Given that estimates of voj are based upon
a smaller number of genes, it is likely that these estimates are less stable than those
obtained for other hyperpararneters. In support of this conjecture, low stability of voj

estimates was, in fact, observed in simulation studies carried out by Smyth (2004). These

concerns, however, are compounded by the dependency of 0g,- on p j , for which no

estimator is available.

Arguably, it is conceivable that researchers could, in some cases, have a rough a
priori idea regarding what fraction of genes ( p j) should be differentially expressed in a
particular experiment. It is clear that some treatments should have a larger impact on a
transcriptome than other treatments. It is reasonable, for example, to suppose that treating
an Arabidopsis plant with a genotoxic stress will have much smaller effect on expression

levels than prolonged temperature stress. Nevertheless, there does seem to belittle basis

21

for specifying particular values of p j , such as whether 10%, 20% or 30% of the genome

should be differentially expressed by genotoxic stress. These considerations become

especially awkward given the expected dependence of p j on the number of arrays used

in an experiment and consequent levels of statistical power.

In light of the above considerations, Ogi is directly proportional to 7g]. , and for
this reason, rankings of genes based upon 0g]. are in agreement of those based upon 7g,-
(when there are no missing values in the data). The statistic 08,. therefore remains a

useful tool for ranking genes, despite its limitations as a tool for drawing statistical

inferences.

22

Section 4 Discussion

Statistical tests for differential expression are widely used throughout the research
community and are a starting point in the analysis of most microarray datasets. A number
of procedures have now been developed that represent considerable improvements over
methods employed in early studies of gene expression. The Lima package allows
researchers to test hypotheses for two sample comparisons, but also provides a ﬂexible
linear model implementation that permits testing of hypotheses that are of considerably
greater complexity. This chapter has reviewed the basic framework of linear models as
applied to microarray data, as well as the statistics used by Limma to rank genes and
carry out inferences regarding differential expression. Perhaps the most critical point
emerging from this review is that statistical inferences based upon Limma’s statistics are
dependent upon key distributional assrnnptions, and should therefore be applied with
some caution. It is clear, for example, that the posterior odds statistic should only be used
as a tool for ranking genes with respect to evidence for differential expression. The
moderated t-statistic, in contrast, is of greater potential utility as an inferential statistic.
However, as noted by Smyth (2004), the moderated t-statistic assumes that variances
associated with gene expression measurements are independent across genes, in addition
to a number of distributional requirements that need to be approximately valid for the
dataset under consideration. Like the posterior odds, therefore, the moderated t-statistie is
best used as a means for ranking genes according to evidence of differential expression.
Overall, these considerations underscore the challenges entailed by the complexity of

microarray data. For some experimental designs, especially those for which a small

23

number of arrays have been used, ideal methods for drawing inferences regarding
differential expression remain to be developed.

Simulation analyses are an important part of evaluating the efﬁciency of statistical
procedures, and provide a useful means to compare new and pre-existing methodologies.
Smyth (2004) carried out simulation analyses to explore how results obtained using the
statistics implemented in Lima compared with those based upon fold-change, the
ordinary t-statistic, an offset t-statistie (Efron et al. 2001), and the B-statistic developed
by Lonnstedt and Speed (2002) (implemented in the SMA package for R). Several

different sets of simulated data were generated, with each set exhibiting a different

pattern of variance (02) among genes. The results of these investigations revealed that,
when variances are homogenous among genes, alternative methods were comparable in
terms of false discovery rates. Differences among alternative methods, however, were
more substantial when variances were heterogeneous among genes. In such cases, it was
found that the moderated t-statistie exhibited superior performance to alternative methods
in terms of false discovery rates and the area underneath ROC curves. The second main

ﬁnding from Smyth (2004)’s simulation analyses was that estimation of the
hyperpararneters 56 and do was very accurate, while in contrast, estimation of vo could
be poor if the proportion of differentially expressed genes is speciﬁed incorrectly. In
particular, if the proportion of differentially expressed genes is not correctly speciﬁed,
then vo is generally overestimated by as much as 73% (when variances are highly similar

among genes). Since the posterior odds statistic is dependent upon the vo

hyperparameter, the simulations illustrate why this statistic provides a poor basis for

inferences about differential expression.

24

It should be noted that the simulations carried out by Smyth (2004) used data
generated under the assumptions of the hierarchical model, such that variances among
genes followed an inverse chi-square distribution and contrast coefﬁcients were
approximately normal in distribution. In an important sense, therefore, simulations were
tailored to the statistics implemented in the Limma package, such that the relative
performance of the Lima procedure could be inﬂated beyond that typically observed on
real datasets. A second important point is that the P-values associated with the moderated
t-statistie were not explicitly checked for accuracy. Hence, although the relative
performance of the moderated t-statistic was superior to that of other methods, the
accuracy associated with p-values is not guaranteed (especially with respect to real
datasets for which hierarchical model assumptions may not be valid).

In addition to simulation analyses, evaluating the relative performance of new
methods on real datasets is critical to judge how well underlying assumptions conform to
biological variability. When applied to real datasets, Smyth (2004) found that the
moderated t-statistic and posterior odds yielded rankings that were consistent with prior
biological knowledge associated with the genes being analyzed. Alternative methods, in
contrast, were in some cases not consistent with preexisting biological knowledge. In
Figures 1-2 and 1-3, the associations between gene rankings based upon fold-change, the
ordinary t-statistic, and the moderated t-statistic are shown. These plots were constructed
from the top 30 genes that emerged ﬁom Smyth (2004)’s analysis of the Swirl dataset
(Dudoit and Yang 2003). There was a weak overall association between ranks based upon
fold-change and the moderated t-statistic (r = 0.402), although there was an overall

stronger association between ranks based upon the ordinary and moderated t-statisties (r

25

= 0.765). An interesting difference evident from Figures 1-2 and 1-3 is that, with respect
to fold-change, differences from Lima rankings are of the same magnitude for both
large and small rankings. With respect to the ordinary t-statistie, however, differences
from Lima rankings are large for strongly upregulated genes, but rather small for
strongly downregulated genes. This distinction most likely reﬂects the shrinkage of large
ordinary t-statistics that arise from spuriously low variance estimates associated with
some genes.

An ideal procedure for differential expression analysis would be powerful enough
to reliably detect differentially expressed genes, but at the same time, not critically
dependent upon distributional assumptions that are unlikely to be valid for most
microarray datasets. It could be argued, for example, that if p-values generated by a
differential expression procedure are not meaningful due to heavy dependence upon
shaky assumptions, the purpose behind developing a probabilistic framework has been
defeated. At the other extreme, however, two-sample non-parametric methods, which are
free of restrictive assumptions, often lack the power to detect many differentially
expressed genes of biological relevance and are therefore not useful. The moderated t-
statistic and posterior odds have a number of positive aspects that represent
improvements over earlier approaches, even if all properties associated with the
hypothetical “ideal statistic” are not satisﬁed. Since differential expression analysis
remains an active area of investigation, there are prospects for continued improvement
through the development of new methodologies (e.g., Pan 2003; Lu et a1. 2005; Zen and

Hastie 2005).

26

Although differential expression analysis has been the primary focus of this
review, it is important to note that, in most experiments, biologically important
transcriptional changes will occur that are undetected. Differential expression analysis
effectively divides the transcriptome into “important” versus “non-important” genes
(with regard to the biological phenomena under consideration). While this schism is a
convenient summary device for the human mind seeking to understand a given process, it
may often be an inappropriate characterization. The true importance of genes with regard
to a particular process (e.g., stress response, cancer proliferation) is likely to be as
continuous as the p-values generated by a differential expression analysis. As the number
of arrays used to investigate a particular treatment increases, the effect sizes declared
signiﬁcant by differential expression analysis become increasingly small, and
correspondingly, the list of “important genes” will grow in length. The concept of
differential expression, therefore, is relative in a key sense, and in most cases, it will be
misleading to believe that genes not declared differentially expressed are unimportant. An
important future goal is to develop investigative methodologies that go beyond
differential expression. Such methods will utilize whole-genome datasets more efﬁciently
by recognizing small transcriptional changes below the differential expression threshold,
yielding outputs of greater depth than the “list of genes” approach, which (hopefully) can
be interpreted in a biologically meaningful way.

An additional issue relevant to all differential expression analysis methods is the
means by which multiple comparisons are accounted for. Regardless of the test statistic
that is used, differential expression analysis involves performing as many tests as there

are genes represented on a given array. The p-values generated by a given test statistic

27

must therefore be adjusted to account for the large number of tests that are being
performed. The Bonferroni method is the simplest approach to controlling for a large
number of hypothesis tests. Bonferroni p-value adjustments control the family-wise error
rate by dividing p-values associated with each gene by n, where n is the number of genes
being tested in the differential expression analysis. Following this adjustment, genes
associated with P-values less than a nominal type-I error rate of a can be declared as
differentially expressed at level a. The Bonferroni approach seems to perform
moderately well in many applications in the biological sciences (Sokal and Rohlf 1995).
In the context of differential expression analysis, however, it is generally agreed that it
yields results that are far too conservative, and may lead to a failure to detect genes that
are of biological importance (Allison et a1. 2006). To remedy this short-coming, it was
proposed that controlling the false discovery rate would be more appropriate in the
context of differential expression analysis (Benjamini and Hochberg 1995). This proposal
has been well-received, such that the Benj amini-Hochberg approach to adjusting for
multiple testing via control of the false discovery rate has become widely applied. The
Benjamini-Hochberg method, however, assumes that p-values generated for each
individual gene are independent of one another. This assumption is unlikely to be valid in
most cases, although it has been argued that the method is robust to certain types of
dependency structures among genes (Reiner et al. 2003). Some recent approaches have
attempted to relax the independence assumption by developing resarnpling methods to
control for false discovery rate (Pollard and van der Laan 2004; van der Laan et a1. 2004).
In addition, a large number of methods have been developed based upon mixture model

distributions (Pounds and Morris 2003; Datta and Datta 2005; Do et a1. 2005), which are

28

not free of assumptions, but are generally more powerful than the Benjamini-Hochberg
method (Allison et al. 2006).

The generation of biological knowledge from DNA microarray technology
depends critically on the statistical validity that underlies differential expression
procedures. Differential expression, however, is just one of several issues that arise in the
context of DNA microarray data analysis. Apart from differential expression, the
reliability associated with data preprocessing and normalization, clustering methods, and
classiﬁcation algorithms are also of considerable importance. There are several reasons
for optimism regarding the prospects for future improvements of procedures applied to all
phases of nricroarray data analysis. The transition time between algorithm development
and implementation, for example, should be considerably reduced in light of a freely
available user-friendly statistical computing environment (Gentleman et al. 2004). A
second positive development is the widespread adoption of the MIAME (minimum
information about a microarray experiment) standards (Brazrna et al. 2001), which will
increase the efﬁciency with which analyses can be carried out, and when necessary,
independently repeated by new investigators. With the abundance of tools now available,
the volumes of data generated by microarrays will serve as an increasingly valuable

resource for addressing biologically signiﬁcant research questions.

29

 

 

 

 

30- ./‘
e o ,’
.//
/
o 25- . O //-O
‘30 /
g C O //
L.) 20‘ 0/0
s ° /
LL: /
O O O
8 15~ O ///
5 ° / .
cc // O
m lO-i O / .
é /// O
’ o
5‘ 020//
O /
/6 o
0‘ /
T r I l l I I
0 5 10 15 20 25 30

Rank Based on Moderated t-stat'stic

Figure 1-2. Comparison of ranks based on the moderated t-statistic with ranks
based on fold-change. Smyth (2004) calculated both the moderated t-statistie and fold-
change for genes within the Swirl dataset (Dudoit and Yang 2003). The scatterplot above
displays the association among ranks calculated based on the two methods for the top 30
genes. Open circles represent strongly down-regulated genes, while closed circles
represent strongly up-regulated genes.

30

 

 

 

 

30- O /
O//
O O/
O /
,8. 254 // 0
.3 o /
£3 0 /
m C /
.1. // O
b 20" . O//
:5 /// . O
O 15‘ e //
s 0/
'O O O
33 //O
a 10‘ / O
O //
0
it. o x
M 5‘ // O
O /
// g .
/
O
o~ /
0 5 10 15 20 25 30

Rank Based on Moderated t-statistic

Figure 1-3. Comparison of ranks based on the moderated t-statistic with ranks
based on ordinary t-stastistic. Smyth (2004) calculated both the moderated t-statistic
and the ordinary t-statistic for genes within the Swirl dataset (Dudoit and Yang 2003).
The scatterplot above displays the association among ranks calculated based on the two
methods for the top 30 genes. Open circles represent strongly down-regulated genes,
while closed circles represent strongly up-regulated genes.

31

Bibliography

Allison, D. B., X. Cui, G. P. Page, and M. Sabripour. 2006. Microarray data analysis:
from disarray to consolidation and consensus. Nat. Rev. Genet. 7: 55-65.

Alter, O., P. Brown, and D. Botstein. 2000. Singular value decomposition for genome-
wide expression data processing and modeling. Proc. Natl. Acad. Sci. USA 97:
10101-10106.

Benjamini, Y., and Y. Hochberg. 1995. Controlling the false discovery rate: a powerful
and practical approach to multiple testing. J. Roy. Stat. Soc. B. 57: 289-300.

Boutros, P. C., I. D. Moffat, M. A. Franc, N. Tijet, J. Tuomisto, R. Pohjanvirta, and A. B.
Okey. 2004. Identiﬁcation of the DRE-II gene battery by phylogenetic
footprinting. Biophys. Res. Cornmun. 312(3): 707-715.

Brazma, A., P. Hingamp, J. Quackenbush, G. Sherlock, P. Spellman, C. Stoeckert, J.
Aach, W. Ansorge, C. A. Ball, H. C. Causton, T. Gaasterland, P. Glenisson, F. C.
P. Holstege, I. F. Kim, V. Markowitiz, J. C. Matese, H. Parkinson, A. Robinson,
U. Sarkans, S. Schulze-Kremer, J. Stewart, R. Taylor, J. Vilo, and M. Vingron.
2001. Minimum information about a microarray experiment (MIAME) — towards
standards for microarray data. Nat. Genet. 29: 365-371.

Broberg, P. 2003. Statistical methods for ranking differentially expressed genes. Genome
Biol. 4: R41.

Brown, P. O., and D. Botstein. 1999. Exploring the new world of the genome with DNA
microarrays. Nature Genet. 21(suppl.): 33-37.

Brown, M. P., W. N. Grundy, D. Lin, N. Cristianini, C. W. Sugnet, T. S. Furey, M. Ares,
and D. Haussler. 2000. Knowledge-based analysis of microarray gene expression
data by using support vector machines. Proc. Natl. Acad. Sci. USA 97(1): 262-
262.

Datta, S., and S. Datta. 2005. Empirical bayes screening of many p-values with
applications to microarray studies. Bioinform. 21: 1987-1994.

Do, K. A., P. Mueller, and F. A. Tang. 2005. A nonparametric Bayesian mixture model
for gene expression. J. R. Stat. Soc. Ser. C. 54: 1-18.

Dudoit, S., and Y. H. Yang. 2003. Bioconductor R packages for exploratory analysis and
normalization of cDNA microarray data. Pp. 73-101 in G. Pannigiani, E. S.
Garrett, R. A. Irizarry, and S. L. Zeger, eds. The analysis of gene expression data:
methods and software. Springer, New York.

32

Efron, B., R. Tibshirani, J. D. Storey, and V. Tusher. 2001. Empirical bayes analysis of a
microarray experiment. J. Amer. Stat. Assoc. 96: 1 151-1160.

Eisen, M. B., P. T. Spellman, P. 0. Brown, and D. Botstein. 1998. Cluster analysis and
display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA 95(25):
14863-14868.

Gentleman, R. C., V. J. Carey, D. M. Bates, B. Bolstad, M. Dettling, S. Dudoit, B. Ellis,
L. Gautier, Y. Ge, J. Gentry, K. Henrik, T. Hothom, W. Huber, S. Iaeus, R.
Irizarry, F. Leisch, C. Li, M. Maechler, A. J. Rossini, G. Sawitzki, C. Smith, G.
Smyth, L. Tierney, J. Y. H. Yang, and J. Zhang. 2004. Bioconductor: open source
software development for computational biology and bioinforrnatics. Genome
Biol. 5: R80.

Golden, R. T., and S. Melov. 2004. Microarray analysis of gene expression with age in
individual nematodes. Aging Cell 3: 111-124.

Jain, A. D., T. A. Tokuyasu, A. M. Snijders, R. Segraves, D. G. Albertson, and D. Pinkel.
2002. Fully automatic quantiﬁcation of microarray image data. Genome Res. 12:
325-332.

Lander, E. S. 1999. Array of hope. Nat. Genet. 21(suppl.): 3-4.
Lonnstedt, I., and T. P. Speed. 2002. Replicated microarray data. Statist. Sin. 12: 31-46.

Lu, Y., P. Y. Liu, P. Xiao, and H. W. Deng. 2005. Hotelling’s T2 multivariate proﬁling
for detecting differential expression of microarrays. Bioinform. 21: 3105-31 13.

Pan, W. 2003. On the use of permutation in and the performance of nonparametric
methods to detect differential gene expression. Bioinform. 19: 1333-1340.

Peart, M. J ., G. K. Smyth, R. K. van Laar, V. M. Richon, A. J. Holloway, and R. W.
Johnstone. Identiﬁcation and functional signiﬁcance of genes regulated by
structurally diverse histone deacetylase inhibitors. Proc. Natl. Acad. Sci. USA
102(10): 3697-3702.

Pollard, K. S., and M. J. van der Laan. 2004. Choice of a null distribution in resampling-
based multiple testing. J. Stat. Plann. Infer. 125: 85-100.

Pounds, S., and S. W. Morris. 2003. Estimating the occurrence of false-positives and
false negatives in microarray studies by approximating and partitioning the

empirical distribution of p-values. Bioinform. 19: 1236-1242.

Quackenbush, J. 2002. Microarray data normalization and transformation. Nat. Genet.
32(suppl.): 496-501.

33

Reiner, A., D. Yekutieli, and Y. Benjanrini. 2003. Identifying differentially expressed
genes using false discovery rate controlling procedures. Bioinform. 19: 368-3 75.

Renn, S. C. P., N. Aubin-Horth, and H. A. Hofrnann. 2004. Biologically meaningful
expression proﬁling across species using heterologous hybridization to a cdna
microarray. BMC Genomics 5(1): 42.

Rensink, W. A., S. Lobst, A. Hart, S. Stegalkina, J. Liu, and C. R. Buell. 2005. Gene
expression proﬁling of potato responses to cold, heat, and salt stress. Funct. Integ.
Genom. 5: 201-207.

Rice, J. A. 1995. Mathematical statistics and data analysis. Duxbury Press, Belmont,
California.

Rodriguez, M. W., A. C. Paquet, Y. H. Yang, and D. J. Erle. 2004. Differential gene
expression by integrin ,B7+ and ﬂ7- memory T helper cells. BMC Immun. 5:13.

Smyth, G. K. 2004. Linear models and empirical bayes methods for assessing differential
expression in microarray experiments. Stat. Appl. Genet. Mol. Biol. 3(1):3.

Sokal, R. R., and Rohlf, F. J. 1995. Biomety. W. H. Freeman, New York.

Tusher, V. G., R. Tibshirani, and G. Chu. 2001. Signiﬁcance analysis of microarrays
applied to the ionizing radiation response. Proc. Natl. Acad. Sci. 98: 5116-5121.

van der Laan, M. J ., S. Dudoit, and K. S. Pollard. 2004. Multiple testing part 1: single-
step procedures for control of general type I error rates. Stat. Appl. Genet. Mol.
Biol. 3: 13.

Welch, B. L. 1947. The generalization of ‘students’ problem when several different
population variances are involved. Biometrika 34: 28-35.

Yang, Y. H., and T. Speed. 2002. Design issues for cDNA microarray experiments. Nat.
Rev. Genet. 3: 579-588.

Zou, H., and T. Hastie. 2005. Regularization and variable selection via the elastic net. JR.
Stat. Soc. Ser. B. Stat. Methodol. 67: 301-320.

34

CHAPTER 2

TRANSCRIPTIONAL PROFILING OF ARABIDOPSIS HEAT SHOCK GENES

Summary

The heat shock response ofArabidopsis thaliana is dependent upon a complex regulatory
network involving twenty-one known transcription factors and several heat shock protein
families. It is known that heat shock proteins (Hsps) and transcription factors (Hsfs) are
involved in cellular response to various forms of stress besides heat. However, the role of
Hsps and Hsfs under cold and non-thermal stress conditions is not well understood, and it
is unclear which types of stress interact least and most strongly with Hsp and Hsf
response pathways. We have examined transcriptional response proﬁles ofArabidopsis
Hsfs and Hsps to a range of abiotic and biotic stress treatments (heat, cold, osmotic stress,
salt, drought, genotoxic stress, ultraviolet light, oxidative stress, wounding, and pathogen
infection) in three different types of tissue (roots, shoots, leaves). Our ﬁndings
demonstrate that nearly all stress treatments interact with Hsf and Hsp response
pathways, suggesting considerable cross-talk between heat and non-heat stress regulatory
networks. We identiﬁed several heat shock gene expression pattems that have not been
previously described. First, with respect to the Hsp20 protein family, large expression
responses occurred under all types of stress, with striking similarity among expression
response proﬁles. Second, a number of Hsp20, Hsp70 and HsplOO genes were
speciﬁcally upregulated twelve hours after wounding in root tissue, and exhibited a
similar expression response pattern during recovery ﬁom heat stress. Lastly, among all
protein families, large expression responses occurred under ultraviolet-B light stress in
shoot tissue but not root tissue. These ﬁndings have implications regarding the molecular
basis of cross-tolerance in plant species, and raise several new questions to be pursued in

future experimental studies of the Arabidopsis heat shock response network.

35

Section 1 Introduction

The heat shock response network ofArabidopsis thaliana involves temperature
perception mechanisms, an intricate array of signal transduction networks, and twenty-
one known transcription factors that activate heat shock proteins and other heat-stress
related genes (Never et al. 2001; Sung et al. 2003; Larkindale et al. 2005). The
availability of genome sequence data has considerably advanced our understanding of
this heat shock response pathway, as well as the molecular basis of regulatory networks
that underlie other forms of environmental stress in Arabidopsis (e.g., cold, salinity,
drought). One result of this development has been increased recognition of the cross-talk
or overlap that exists among cellular responses to different environmental stress
treatments (e.g., Cheong et al. 2002; Rensink et al. 2005; Ma et al. 2006; Mittler 2006;
Rossel et al. 2006). In this respect, heat shock proteins (and their associated transcription
factors) are of special interest. Heat shock proteins are molecular chaperones that regulate
the folding, localization, accumulation, and degradation of protein molecules in both
plant and animal species (Feder and Hoﬁnann 1999). Heat shock proteins are therefore
believed to play a broad role in many cellular processes, which may impart a generalized
role in tolerance to multiple environmental stress treatments apart ﬁorn heat stress.
Understanding the role of heat shock proteins under cold and non-thermal stress
conditions may therefore provide insight into multiple stress tolerance mechanisms
(Hoffrnann and Parsons 1991). This may be of considerable importance for improving the
production of agriculturally important crop species under ﬁeld conditions, which are best
characterized as an interaction of several different types of stress, rather than just a single

stress treatment in isolation (Mittler 2006).

36

The Arabidopsis heat shock proteins (Hsps) and transcription factors (Hsfs) have
been well characterized on the basis of genome sequence information (Agarwal et al.
2001; Lin et al. 2001; Never et al. 2001; Krishna and Gleer 2001; Scharf et al. 2001). In
addition to the twenty-one known transcription factors (Never et al. 2001), the
Arabidopsis heat shock response is partly mediated by thirteen Hsp20 proteins (Scharf et
al. 2001), eighteen Hsp70 proteins (Lin et al. 2001), seven Hsp90 proteins (Krishna and
Gleer 200.1), and up to eight members of the Hsp100 protein family (Agarwal et al.
2001). The molecular pathways leading to Hsp expression are not entirely understood
(Sung et al. 2003), but involve temperature perception mechanisms coupled with multiple
signal transduction pathways (Larkindale et al. 2005), which together lead to the
activation of Hsfs that induce expression of heat shock genes by binding to heat shock
elements (Schefﬂ et al. 1998). There are several levels at which this molecular pathway
may overlap with those underlying response to cold and non-thermal stress treatments.
However, since Hsps play a uniquely broad role in cellular processes, they are
particularly likely to underlie interactions between heat and non-heat stress response
pathways. A role of heat shock proteins in cellular response to cold and non-heat stress
treatments, for instance, has been supported by several gene expression studies. In
Arabidopsis and other plant species, various Hsps have been induced by low temperature
(Sabehat et al. 1998), osmotic stress (Sun et al. 2001), salt (Liu et al. 2006a), oxidative
stress (Banzet et al. 1998; Lee et al. 2000; Volkov et al. 2006), desiccation (Liu et al.
2006b), exposure to intense light (Desikan et al. 2001; Hihara et al. 2001; Rossel et al.

2002), wounding (Cheeng et al. 2002), and heavy metal exposure (Gyorgyey et al. 1991).

37

While a number of studies have shown the Hsp expression can be induced under
cold and non-thermal stress treatments, no comparative analyses have been carried out to
identify which particular stress treatments are the weakest and strongest inducers of Hsp
expression. It therefore remains unclear which stress-response pathways overlap most
extensively with this important part of the Arabidopsis heat shock regulatory network. If
the primary stress conditions interacting with Hsp response pathways can be identiﬁed, it
would be of considerable interest to understand how Hsfs and Hsps contribute to
tolerance under such stress conditions. The physiological role of Hsfs and Hsps in
promoting tolerance may differ depending on the nature of the stress imposed upon the
cell. Heat stress, for instance, leads directly to denaturatien of cellular proteins. It is
therefore clear how molecular chaperone activity may contribute to high temperature
tolerance via prevention of deleterious protein conformations and elimination of non-
native aggregations. With respect to cold and non-thermal stress treatments, however, the
impact on cellular protein conformations is less direct and not as well understood. The
role of Hsps as molecular chaperones, therefore, may not strictly parallel their function
under heat stress, and it is possible that their cellular function extends beyond the
chaperone activity that has been well characterized in vitro (e.g., Lee et al. 1997; Lee and
Vierling 2000). One possibility, for example, is that Hsps limit damage resulting from
accumulation of reactive oxygen species, which are generated as messengers and
elements of signal transduction pathways under a wide range of stress conditions (Pastori
and Foyer 2002). In both plant and animal species, for instance, there is evidence to
suggest that Hsps protect against reactive oxygen species (Mehlen et al. 1993; Fleming et

al. 1992; Wheeler et al. 1995; Hamdahl et al. 1999; Preville et al. 1999; Ding and Keller

38

2001; Martindale and Helbrook 2002; Kregel 2002; Neta—Sharir et al. 2005). This
hypothesis is particularly intriguing in light of the considerable interconnectivity that
exists between heat shock and oxidative stress response pathways in plant species
(Desikan et al. 2001; Panchuk et al. 2002; Miller and Mittler 2006; Volkov et al. 2006).
DNA microarray technology offers a promising approach for better understanding
the functional role of Arabidopsis heat shock proteins and transcription factors under both
heat and non-heat stress conditions. Recently, a number of genome-wide microarray
datasets have been generated and made publicly available by the AtGenExpress
consortium. These resources provide an opportunity to proﬁle Hsf and Hsp expression
over a wide range of stress conditions simultaneously. In this study, we utilized
AtGenExpress datasets to analyze transcriptional responses of Arabidopsis Hsfs and Hsps
to a total of ten different abiotic and biotic stress treatments (cold, osmotic stress, salt,
drought, genotoxic stress, ultraviolet light, oxidative stress, wounding, high temperature,
and pathogen infection). The gene expression data we consider was generated ﬁom three
different types of plant tissue (roots, shoots, leaves), with expression measurements
obtained at up to six different time points of stress exposure (0.5, l, 3, 6, 12, and 24
hours). With respect to each of ﬁve protein families (Hsf, Hsp20, Hsp70, Hsp90, and
Hsp100), we evaluated whether expression responses of each family to each stress were
signiﬁcantly large in comparison to other Arabidopsis genes. This analysis provided
indication of which types of stress interacted most and least with each protein family. In
addition, we characterized Hsf and Hsp stress-response patterns at the level of protein
families, as well as among individual genes within protein families. This allowed

identiﬁcation of family-level expression patterns under each stress, gene sub-groups

39

within families exhibiting similar expression patterns, and individual Hsf/Hsp genes with

large expression responses to several different stress treatments.

40

Section 2 Results

An overview of how strongly each stress treatment impacted expression levels for
each heat shock gene family is provided in Tables 1 and 2 for root and shoot tissue,
respectively. To compare the effect of each stress, a summary statistic was developed (I)
that represents the median level of fold-change induced by each stress among members of
a given protein family (see Equation 2 in Methods). For the Hsp20, Hsp70 and Hsp90
protein families, the largest expression responses were associated with the high
temperature treatment, with median levels of fold-change in each family (T) ranging from
one to above four (Tables 1 and 2). However, for the Hsf and Hsp100 protein families,
the largest magnitude expression responses were associated with osmotic stress (Table 1).
For each stress, it was of interest to determine whether median-level expression responses
of each gene family were large in comparison to all other genes represented on the ATHl
array. A resampling procedure was therefore carried out to evaluate the likelihood of
observed T statistics under a null hypothesis of random sampling ﬁ'om the genome (see
Table 1 caption and Methods). Signiﬁcant T statistics were found with respect to each
type of stress we considered, indicating that for one or more protein families, each stress
induced expression responses that were large in comparison to other Arabidopsis genes
(see Tables 1 and 2). High temperature was associated with a signiﬁcant T statistic for all
protein families except Hsp100 in both roots and shoot. The second strongest elicitor of
expression responses was oxidative stress, since it was associated with a signiﬁcant T
statistic for most families in both tissue types. Interestingly, for the pathogen stress

treatment (not shown in Tables 1 and 2), signiﬁcant T statistics were associated with

41

respect to the Hsp70 and Hsp90 families (P = 0.001 and 0.018, respectively), while T
statistics were non-signiﬁcant for the Hsf, Hsp20 and Hsp100 families (P > 0.096).
Protein families exhibiting strong expression responses to many stress treatments
exhibit a generalized expression response pattern. The Hsf and Hsp20 were associated
with the most stress-general expression patterns, since for both families, T was signiﬁcant
for nearly all types of stress (see Tables 1 and 2). In contrast, the Hsp70, Hsp90, and
Hsp100 families were not so widely responsive across all stress treatments. Family-level
expression response patterns speciﬁc to each tissue-treatment combination and variation

among individual heat shock genes are described in the following sections.

Heat Shock Transcription Factors

Heat shock transcription factors were most strongly upregulated under heat, cold,
osmotic, and salt stress treatments. Figure 2-1 displays gene expression response proﬁles
for all Hsf genes in roots, while Figure 2-2 displays response proﬁles of Hsf genes in
sheets. In both tissues, expression responses to cold, osmotic, and salt treatment primarily
occur over the late stages of stress exposure between 6 and 24 hours (see Figs. 2-1 and 2-
2, parts A - C). This pattern contrasts with that observed under heat stress treatment, in
which Hsfs were strongly up-regulated during early stages of stress exposure, with the
effect diminishing after the heat stress was removed beyond the 6 hr. time point (Figs. 2-
1 and 2-2, part I). A general trend among all ﬁve heat shock gene groups was a difference
between the effects of UV-B stress in sheet tissue in comparison to root tissue. With

respect to the Hsf genes, UV-B stress induced strong up-regulation over most time points

42

in sheet tissue (Fig 2-2G), but yielded comparatively low gene expression responses in
roots (Fig. 2-1G).

Considerable variation was observed among expression response patterns
associated with individual Hsf genes. To discern which Hsf genes were the least and most
stress-responsive across all stress treatments, we ranked genes according to an index (d)
(see Table 3). The value of d represents the mean proportion of time points, among all
stress and tissue types we considered, at which a gene was differentially expressed (see
Methods). Highly stress-responsive genes were associated with large values of d, while
genes less responsive to stress were associated with low values of d. The seven least-
stress responsive Hsf genes were all Class A type Hsfs, and were associated with values
of d less than or equal to 0.167. The most stress responsive Hsf gene, in contrast, was the
one class C transcription factor in the Hsf family (Hstl , d = 0.456). Cluster analysis
using the HOPACH algorithm identiﬁed three multi-member clusters of Hsf genes with
respect to stress-responses across all tissue-treatment-time combinations (stress-clusters
451, 452, and 470) (see Table 2). The heatrnap corresponding to this clustering solution is
provided in section 1C of supplemental data ﬁle 1. For comparison, the Hsfs were also
clustered with respect to their expression patterns across the developmental series
conditions analyzed by Schmid et al. (2005) (see Table 3). Members of developmental-
cluster 60 (see Table 2) exhibited a pattern of tissue-speciﬁcity that was found among
certain genes from each of the four Hsp families. The expression pattern was
characterized by strong upregulation speciﬁc to seed stages 6 — 10 (ATGE conditions 81

— 84), roots (17 days) (ATGE condition 9), ﬂowers stage 12 (ATGE conditions 34 — 37),

43

and ﬂowers stage 15 (ATGE conditions 41 — 45) (see section 1C of supplemental data

ﬁle 1).

Hsp20 Protein Family

The Hsp20 protein family exhibited the strongest overall responsiveness to
environmental stress treatments, as well as the most cohesive family-level expression
patterns among member genes. Expression response proﬁles are displayed in Figures 2-3
and 4 for all Hsp20 proteins in root and shoot tissues respectively. Tissue-speciﬁc
patterns of Hsp20 stress-response can be discerned ﬁ'orn a comparison of Figures 2-3 and
24. With respect to the UV-B treatment, for example, strong downregulation of Hsp20
genes occurred between the 3 — 6 hr. time points in roots (Fig. 2-3G). In shoots, however,
UV-B stress induced strong upregulation over this same time period (Fig. 2-4G). With
respect to the cold stress treatment, Hsp20 genes were downregulated between the 3 — 6
hr. time points in roots (Fig. 2-3A), while no such response pattern was associated with
sheets (Fig. 2-4A). More subtle tissue differences were associated with wounding and
heat stress. Hsp20 proteins were responsive to both stress treatments, but the temporal
dynamics of expression differed between the two tissue types (see Figs. 2-3 H, 2-31, 2-4H
and 2-4I).

The expression responses of Hsp20 proteins under wounding and heat stress
revealed surprising family-level patterns within root tissue. Nearly all Hsp20 proteins
exhibited strong upregulation 12 hrs. following wounding of root tissue (see Fig. 2-3H).
Under the heat treatment, Hsp20 upregulation also occurred at the 12 hr. time point (Fig

2-3 I), which represented the heat stress recovery period (9 hrs. following cessation of

heat stress). These expression responses during heat stress recovery were a unique aspect
of the Hsp20 family, since generally, all other heat shock genes were responsive only
while heat stress was directly applied (0.5 — 3 hrs.). For a number of Hsp20 genes,
moreover, the 12 hr. upregulation under heat strongly coincided with that observed under
the wounding stress treatment. This synchrony between expression response proﬁles
under wounding and heat treatment in root tissue is evident from Figure 2-5, which
displays response proﬁles of nine Hsp20 genes under wounding and heat stress
treatments.

Most Hsp20 proteins exhibited strong expression responses to several types of
stress. AtHsp14.2-P(r) exhibited the weakest overall responsiveness to stress (d = 0.096),
while in contrast, AtHsp18.5-Cl(r) showed the strongest expression responses to stress
treatments (d = 0.325) (see Table 4). Cluster analyses revealed strong similarities among
Hsp20 genes with respect to stress-response patterns and the developmental series of
Schmid et al. (2005) (see section 2C of supplemental data ﬁle 1). Members of stress-
cluster 30 (AtHsp23.6-M, AtHsp25.4-P) and stress-cluster 42 (AtHsp26.5-P(r),
AtI-Isp15.7-Cl(r), AtHsp22.0-ER) were highly responsive to stress treatments in root
tissue. The other multi-gene stress-cluster (21) consisted of Hspl 7 proteins entirely
(AtHsp17.4-CI, AtHspl7.6-CII, AtHsp17.6B-CI, AtHsp17.6C-CI, AtHsp17.7-CII), and
similar to stress-clusters 30 and 42, exhibited large expression responses to all stress
treatments, except that strong responses were present in both roots and sheets. The
members of all three of these stress-clusters, and most Hsp20 proteins in general,
exhibited a similar expression proﬁle among developmental stages (see section 2C of

supplemental data ﬁle 1). As among certain Hsfs, this developmental expression proﬁle

45

consisted of high expression levels with respect to roots (17 days), ﬂowers stage 12,

ﬂowers stage 15, and seed stages 6 — 10.

Hsp 70, Hsp90, and Hsp100 Protein Families

The Hsp70, Hsp90, and Hsp100 protein families were generally associated with
smaller magnitude expression responses across stress conditions. However, members of
these families were stress-responsive, since differential expression occurred under most
stress conditions for nearly every Hsp within these families. Members of Hsp70, Hsp90,
and Hsp100 families were most strongly induced by heat, primarily over the early portion
of the time course (0.5 -3 hrs.), although several genes within each family exhibited large
responses to the cold, osmotic, and salt stress treatments. The Hsp70, Hsp90, and Hsp100
families were all associated with a similar tissue-speciﬁc pattern under the UV-B stress
condition (see Fig. 2-6). In particular, expression levels of member genes increased at the
3 — 6 hr. time points in sheet tissue, with little or no transcriptional induction in root
tissue. In addition, the expression response pattern identiﬁed following wounding and
heat stress in root tissue was also evident with respect to AtHsp70-5, AtI-Isp70-8,
AtHsplOO-l, and to a lesser extent, AtHsp90-1 (see Fig. 2-7).

The individual gene members of the Hsp70, Hsp90, and Hsp100 families are
listed in tables 4, 5, and 6, respectively. On the basis of differential expression, these
families contained both the least and most stress responsive Arabidopsis Hsps. The most
stress-responsive was AtHsp70-4 (d = 0.439), which was differentially expressed under
all stress treatments, including all three time points of exposure to pathogen stress. In

contrast, AtHsp100-2 was the least stress-responsive Arabidopsis Hsp (d = 0.009), and

46

was differentially expressed with respect to just one time point under the heat stress
treatment.

Clustering of Hsp90 genes with respect to stress-response patterns assigned ﬁve
members to one group (AtHsp90-2, 4, 5, 6, and 7), since these genes were all associated
with highly similar (and weak) expression response patterns in root tissue (see section 4C
of supplemental data ﬁle 1). The remaining AtHsp90-l exhibited a strong expression
response pattern distinct from all other AtI-Isp90 genes, and was therefore assigned to a
singleton cluster (see Table 6). Within Hsp70 and Hsp100 families, clustering with
respect to stress-response patterns identiﬁed few sub-groups among member genes (see
Tables 5 and 7).

Various members of the Hsp70, Hsp90, and Hsp100 families were associated with
the same developmental expression pattern found among certain Hsf and Hsp20 genes.
This pattern was best exhibited by AtHsp70—4, AtHsp70-11, AtHsp90-l , and AtHsplOO-
1, all of which were highly expressed in roots (17 days), ﬂowers stage 12, ﬂowers stage

15, and seed stages 6 — 10 (see sections 3C, 4C, and 5C of supplemental data ﬁle 1).

47

Table 1. Oveview. Root tissue. Values of the T statistic associated with each tissue-

treatrnent combination with respect to each of ﬁve protein families (Hsf, Hsp20, Hsp70,

Hsp90, Hsp100) in the root tissue type. The value of T is proportional to the median level

of log fold-change induced by a given stress treatment among the n gene members

within a protein family (see Equation 2 in Methods). The P-values associated with each

statistic were obtained by genome-wide resampling and represent the probability of

obtaining an equal or larger value of T based on 10,000 random samples of n genes from

the N = 22746 genes represented on the ATHI array. P-values exceeding 0.0245 in the

table below are non-signiﬁcant following the Benjamini-Hochberg adjustment for

multiple testing (with nominal type I error rate of a = 0.05).

Treatment

cold
osmotic
salt
drought
genotoxic
oxidative
UV-B
wounding

heat

Hsf
(n=21)

0.52(0.001)

0.75(< 0.001)

1.17(< 0.001)

0.33(o.001)
O.24(0.246)
0.23(0.035)
0.27(0.050)
0.27(o.043)

0.49(0.003)

Hsp20
(n = 18)
1.24(< 0.001)
2.12(< 0.001)
1.70(< 0.001)
0.56(< 0.001)
0.72(< 0.001)
O.61(< 0.001)
O.96(< 0.001)
1.18(< 0.001)

4.32(< 0.001)

48

Hsp70
(n = 13)
O.45(0.026)
O.48(0.089)
0.45(0.341)
0.26(0.087)
O.31(0.061)
0.26(0.021)
0.27(0.107)

0.35(0.007)

1.55(< 0.001)

Hsp90
(n = 6)
O.32(0.364)
O.74(0.025)
O.64(0.133)
0.15(0.824)
O.27(O.258)
0.19(0.417)
0.17(O.797)

0.32(0.069)

1.02(< 0.001)

Hsp100
(n = 7)
O.25(0.660)
O.66(0.040)
0.52(O.269)
0.20(0.432)
0.27(O.243)
O.31(0.021)
O.24(0.288)
0.35(0.030)

0.50(0.048)

Table 2. Oveview. Shoot tissue. See Table l caption.

Treatment

cold
osmotic
salt
drought
genotoxic
oxidative
UV-B
wounding

heat

Hsf
(n = 21)
0.62(0.001)
0.83(< 0.001)
0.52(0.oo1)
0.36(0.002)
0.22(0.343)
O.29(0.005)
0.50(0.011)
O.33(0.018)

0.55(< 0.001)

Hsp20
(n = 18)

0.51(0.023)
l.35(< 0.001)
1.00(< 0.001)
O.66(< 0.001)

0.35(0.001)
0.78(< 0.001)
O.69(< 0.001)
0.62(< 0.001)

4.66(< 0.001)

49

Hsp70
(n = 13)
O.44(O.136)
O.76(0.008)
0.44(0.017)
O.31(0.057)

0.20(0.607)

0.48(< 0.001)

0.53(0.024)

0.27(0225)

1.34(< 0.001)

Hsp90
(n = 6)
0.57(0.070)
0.46(0.276)
0.39(O.116)
O.28(0.198)
O.23(O.374)
0.22(0.344)
O.69(0.020)

0.35(0.099)

1.45(< 0.001)

Hsp100
(n = 7)
0.35(0.423)
0.65(0.078)
0.3 1 (0.275)
O.28(0.192)
O.23(O.363)
O.33(0.016)
O.32(O.499)
O.28(O.248)

O.32(0.339)

Table 3. Hsf protein family. Genes are ordered from least to most stress-responsive
(according to d). The value of d represents the mean proportion of time points, among the
19 tissue-treatment combinations considered, at which a gene was differentially
expressed. Cluster IDs represent gene groupings determined by the HOPACH clustering
algorithm (see Methods). The development cluster analysis was carried out with respect
to the developmental series conditions of Schmid et al. (2005). The stress cluster analysis
was carried out with respect to expression responses observed under each of the 111
tissue-treatment-time combinations examined in this study. The jth digit in each cluster
ID indicates the group to which a gene was assigned in the jth iteration of the HOPACH

algorithm (see Pollard and van der Laan 2005).

50

Gene Name Cluster ID (Development) Cluster ID (Stress) d

Hsz9 31 452 0.018
HszS 50 452 0.061
Hsz1 a 32 452 0.070
Hsz7b 60 440 0.088
Hsz7a 80 300 0.123
Hsz6a 31 420 0.149
Hszlb 70 452 0.167
Hsz3 40 430 0.184
Hsz4c 40 452 0.193
Hsz1 d 10 451 0.202
Hsz2 60 200 0.228
Hsz4 90 451 0.228
Hsz1 e 60 460 0.237
HszB 20 410 0.272
Hsz6b 31 100 0.307
Hsz2b 60 470 0.316
Hsz4a 20 490 0.351
Hsz8 20 480 0.395
HszZa 60 470 0.439
Hsz1 40 500 0.447
Hst1 80 600 0.456

51

Table 4. Hsp20 protein family. Genes are ordered from least to most stress-responsive

(according to d). See Table 3 caption for an explanation of clustering procedures and the

value of d.

Gene Name Cluster ID (Development) Cluster ID (Stress) d
AtHsp14.2-P(r) 50 60 0.096
AtHsp25.4-P 24 30 0.167
AtHsp17.6-CII 25 21 0.184
AtHsp23 .6-M 22 30 0.184
AtHsp22.0-ER 22 42 0.193
AtHsp23.5-M 30 10 0.193
AtHsp17.6B-Cl 28 21 0.202
AtHspl7.7-CII 27 21 0.219
AtHsp21.7-Cl(r) 70 44 0.219
AtHsp26.5-P(r) 26 42 0.219
AtHsp17.6A-Cl 23 22 0.228
AtHspl 8.1-C1 40 41 0.237
AtHsp17.6C-Cl 27 21 0.254
AtHsp17.4-Cl 27 21 0.263
AtHsp15.7-Cl(r) 21 42 0.263
AtHsp17.4-CIII 10 43 0.307
AtHsp15.4-Cl(r) 60 70 0.316
AtHspl 8.5-Cl(r) 70 50 0.325

52

Table 5. Hsp70 protein family. Genes are ordered from least to most stress-responsive

(according to d). See Table 2 caption for an explanation of clustering procedures and the

value of d.

Gene Name Cluster ID (Development) Cluster ID (Stress) d
AtHsp70-15 5 54 0.202
AtHsp70-5 2 80 0.21 1
AtHsp70-8 2 70 0.219
AtHsp70-6 7 53 0.228
AtHsp70-17 6 54 0.228
AtHsp70-9 5 52 0.254
AtHsp70-1 5 54 0.272
AtHsp70-10 4 40 0.281
AtHsp70-1 1 1 60 0.333
AtHsp70-7 7 51 0.3 51
AtHsp70-3 4 20 0.368
AtHsp70-2 3 30 0.386
AtHsp70-4 1 21 0.439

53

Table 6. Hsp90 protein family. Genes are ordered from least to most stress-responsive

(according to d). See Table 2 caption for an explanation of clustering procedures and the

value of d.

Gene Name Cluster ID (Development) Cluster ID (Stress) d
AtHsp90-6 3 2 0.28 1
AtHsp90-4 3 2 0.298
AtHsp90-5 2 2 0.307
AtHsp90-7 4 2 0.3 l 6
AtHsp90-1 1 1 0.333
AtHsp90-2 3 2 0.342

54

Table 7. Hsp100 protein family. Genes are ordered from least to most stress-responsive

(according to d). See Table 2 caption for an explanation of clustering procedures and the

value of d.

Gene Name Cluster ID (Development) Cluster ID (Stress) d
AtHsp100-2 2 3 0.009
AtHsp100-5 3 3 0.140
AtHsp100-1 1 1 0.228
AtHsp100-8 3 5 0.246
AtHsp100-4 l 2 0.254
AtHsp100-3 4 3 0.316
AtHsp100-7 3 4 0.368

55

 

a W

 

 

 

(D)

@—
(Dr—r
V'l
NJ

0.. t
()1

no. (G)

(O—l

 

 

,.
N—l
WV
91—1

I T I I I I I I I I I I I I 1 1 T I
0513612240513 612240.513 61224

 

 

 

 

 

 

 

 

Figure 2-1. Hsf expression response profiles in roots. Plots shew proﬁles associated
with (A) cold stress, (B) osmotic stress, (C) salt stress, (D) drought, (E) genotoxic stress,
(F) oxidative stress, (G) UV-B light, (H) wounding, and (I) heat. Class A, B, and C
transcription factors are represented by black, red, and blue lines, respectively. The
horizontal axis of each subplot corresponds to time points at which gene expression
measurements were taken under each stress treatments (0.5, l, 3, 6, and 12 hrs.). The
vertical axis of each subplot indicates the log; fold-change associated with each Hsf (see
Equation 1). The dotted horizontal line in each plot indicates a log; fold-change of zero

(no expression response to stress). [This image is presented in color]

56

 

oo- (A) (B) (C)
<04
#-
N-

o—

 

N—

m .1 (D) (B (F)

‘0'1

 

 

V—l

N—l

.. aﬁ

N
I

a, - (G) (H)

‘01

 

 

*1

°‘ ‘ \.
’A\~.-

O -I - ..

 

 

 

 

 

 

 

or.

 

IIIIIIIIIIIIIIIIII
0.513612240.513612240.51361224

Figure 2-2. Hsf expression response proﬁles in sheets. See Figure 2-1 caption.

57

 

 

 

 

02468

 

-4

 

 

_ (G) (H)

02468
I

 

 

 

 

 

 

 

IIIIIIrI l I ll
0.513 612240.513 612240.513 61224

Figure 2-3. Hsp20 expression response proﬁles in roots. Plots show proﬁles associated
with (A) cold stress, (B) osmotic stress, (C) salt stress, (D) drought, (E) genotoxic stress,
(F) oxidative stress, (G) UV-B light, (H) wounding, and (1) heat. The cytoplamic/nuclear
Hsp205 (classes I— III) are represented by black lines. Plastidial, endoplasmic reticulum,
and mitochondrial Hsp205 (classes P, ER, and M) are represented by red lines. Class I
and Class P related Hsp205 are indicated by blue lines. The horizontal axis of each
subplot corresponds to time points at which gene expression measurements were taken
under each stress treatments (0.5, l, 3, 6, and 12 hrs.). The vertical axis of each subplot
indicates log: fold-change (see Equation 1). The dotted horizontal line in each plot

indicates a leg: fold-change of zero (no expression response to stress).

58

 

 

 

 

 

 

 

 

 

 

 

 

 

lllTllllllllllllll
051361224051361224051361224

Figure 2-4. Hsp20 expression response proﬁles in sheets. See Figure 2-3 caption.

59

 

 

 

 

 

 

 

 

 

 

 

 

 

IIIIIIIIIIIIIIIIII
0.51361224051361224051361224

Figure 2-5. Expression response proﬁles of select Hsp20 genes under wounding
and heat stress treatments. Expression response proﬁles of nine selected Hsp20
proteins under wounding (solid line) and heat (dotted line) treatments are shown.
Subplots display response proﬁles associated with (A) 17.6A-CI, (B) 17.4-CI, (C) 17.6C-
CI, (D) 17.6-CII, (E) 17.7-CII, (F) 25.4-P, (G) 23.6-M, (H) 15.7-Cl(r), and (I) 26.5-P(r).
The horizontal axis corresponds to time points at which gene expression measurements
were obtained, while the vertical axis indicates the log fold-change. The dotted
horizontal line in each plot indicates a log fold-change of zero (no expression response
to stress). For the heat stress treatment, roots were exposed to heat until the 3 hr. time

point, such that the 3-24 hr time interval represents a recovery period.

60

 

(A)

   

 

 

(C) (D)

 

 

 

 

 

 

 

 

Figure 2-6. Hsp70, Hsp90 and Hsp100 expression response profiles under
ultraviolet-B light stress treatment. Expression response proﬁles associated with all
members of the Hsp70 family (A and B), Hsp90 family (C and D), and Hsp100 family (E
and F). Proﬁles associated with root tissue are shown in A, C, and E, while expression
response proﬁles associated with shoot tissue are shown in B, D, and F. The horizontal
axis corresponds to time points at which genes expression measurements were obtained,
while the vertical axis indicates the log; fold-change. The dotted horizontal line in each
plot indicates a log fold-change of zero (no expression response to UV-B light). Hsps
were localized to the cytoplasm (black lines), plastid (red lines), chloroplast (green line),

mitochondria (blue lines), or endoplasmic reticulum (dashed blue line).

61

 

 

 

 

 

 

 

 

 

I I I I I I I I I I I
1

0.5 1 3 6 12 24 0.5

Figure 2-7. Expression response proﬁles of selected Hsp70, Hsp90 and Hsp100
genes under wounding and heat stress treatments. Expression response proﬁles of
four selected proteins (Hsp70, Hsp90, or Hsp100) under wounding (solid line) and heat
(dotted line) treatments are shown. Subplots display response proﬁles associated with (A)
AtHsp70-5, (B) AtHsp70-8, (C) AtHsp90-1, and (D) AtHsp100-1. The horizontal axis
corresponds to time points at which gene expression measurements were obtained, while
the vertical axis indicates the log fold-change. The dotted horizontal line in each plot
indicates a log; fold-change of zero (no expression response to stress). For the heat stress
treatment, roots were exposed to heat until the 3 hr. time point, such that the 3-24 hr time

interval represents a recovery period.

62

Section 3 Discussion

Heat shock proteins (Hsps) and transcription factors (Hsfs) are central components of the
A rabidopsis thaliana heat shock regulatory network. It has long been recognized that
these elements are also involved in response to cold and non-thermal stress treatments
(Feder and Hoﬁnann 1999), but the types of stress that most strongly elicit Hsp/Hsf
expression responses have not been identiﬁed, and the physiological role of these
proteins under non-heat stress treatments is unclear. The ﬁndings of this study support the
hypothesis that Hsps and Hsfs represent an intersection point between heat and non-heat
stress response pathways. Our results indicate that, to varying extents, each of nine cold
and non-thermal stress treatments interact with Hsfs and Hsps at the level of gene
expression. Several prominent family-level expression response patterns were identiﬁed.
These included highly similar stress-response proﬁles among Hsp20 proteins, a number
of Hsps speciﬁcally upregulated 12 hours after wounding and during recovery following
heat stress, and upregulation of heat shock genes to UV-B stress in shoot but not root
tissue. Our ﬁndings raise important questions to be pursued in future experimental studies
of the Arabidopsis heat shock response network.

Genome-wide transcriptional proﬁling allowed the expression of Hsf and Hsp
genes under many stress conditions to be examined within the same context. This
facilitated identiﬁcation of which stressors interact with each protein family most
strongly, which provides insight into the nature and degree of cross-talk that exists
between heat and other forms of stress. The osmotic, cold, and salt treatments were
among the strongest inducers of heat shock gene expression. These stress treatments

induced expression responses of heat shock genes that were large in comparison to other

63

Arabidopsis genes (see Table l), and also large in an absolute sense, since these stressors
induced strong fold-changes and differential expression of individual Hsf and Hsp genes
(see Figs. 2-1 to 2-4 and supplemental data). Expression response patterns were very
similar under each of these treatments, with upregulation primarily occurring over the
late stages of stress exposure (3 — 24 hours). Since osmotic, cold, and salt stress
treatments are each believed to have a deleterious impact on cellular water potential
(Kreps et al. 2002; Verslues et al. 2006), it is possible that their impact on heat shock
genes is related to this common effect. In support of this notion, several previous studies
in plant species have implicated Hsp20 proteins in tolerance to water stress treatments
(Alrnoguera ct al. 1993; Coca et al. 1996; Sun et al. 2001). Among other stress
treatments, wounding and UV-B stress induced moderately large expression responses of
heat shock genes (with strong differences among families and between tissue types). The
pathogen infection treatment was unique, since in contrast to other types of stress, it
elicited strong expression responses among the Hsp70, Hsp90, and Hsp100 families,
while most members of the Hsf and Hsp20 family were not responsive. Overall, drought
and genotoxic stress treatments were associated with weak induction of heat shock genes,
although some individual genes can be cited as an exception to this trend (e. g., Hsz8,
AtHsp15.4-Cl(r), AtHsp100-7).

The degree to which oxidative stress impacted heat shock gene expression is
difﬁcult to discern. In comparison to other Arabidopsis genes, all protein families (except
Hsp90) exhibited large expression responses to oxidative stress (see Table 1). However,
since genomic expression responses to oxidative stress were generally small, absolute

fold-changes induced by oxidative stress were nonetheless of small magnitude. Only one

64

transcription factor, for instance, was differentially expressed under oxidative stress
(Hszld). These results were surprising, since there is considerable evidence that heat
shock transcription factors can function as reactive oxygen species sensors in plants
(reviewed by Miller and Mittler 2006), and extensive interactions have been identiﬁed
between heat and oxidative stress molecular pathways (e.g., Hiimdahl et al. 1999;
Panchuk et al. 2002; Pnueli et al. 2003; Davletova et al. 2004; Neta-Sharir et al. 2005).
Moreover, since the generation of reactive oxygen species is a general response under
many types of stress (Pastori and Foyer 2002), Hsf activation by reactive oxygen species
may provide the best hypothesis to explain why heat shock genes are induced by so many
stress treatments. In view of this, an important factor to consider is the means by which
oxidative stress was experimentally induced. F or data we analyzed, oxidative stress was
induced by exogenous application of methyl viologen, which is a generator of superoxide
anion radical (Laloi et al. 2004). The impact of this reactive oxygen species on heat shock
gene expression may differ item that of others (op den Camp et al. 2003; Gadjev et al.
2006). In a recent study, for instance, Gadjev et al. (2006) demonstrated that arneng
genes upregulated more than two-fold under heat stress, relatively few were responsive to
superoxide anion radical, while most were instead responsive to the singlet oxygen
reactive oxygen species. These considerations suggest that, although the oxidative stress
treatment examined by this study may not have had a strong impact on heat shock genes,
the production of different types of reactive oxygen species (e.g., H202), leading to Hsf
activation and consequently Hsp expression, remains a pathway through which cellular

responses to heat and other forms of stress may be linked.

65

Heat shock transcription factors are of fundamental importance to understanding
stress response networks, since these proteins coordinate the expression of Hsps and
other stress-responsive genes. The Arabidopsis Hsf family is larger than that which has
been described in any other plant or animal system (Never et al. 2001), and at present, no
single Hsf has been identiﬁed as a primary trigger of the heat shock response. The
emerging picture is one of considerable complexity, with extensive interactions among
individual Hsfs and sensitivity to a diverse range of environmental signals (Miller and
Mittler 2006). We found that seven Hsfs (six class A, one class B) exhibited very weak
expression responses to heat and all other stress conditions (see Table 2, stress-clusters
451 and 452), while the remaining 14 Hsfs were strongly induced by several stress
treatments. The most distinctive expression response patterns we observed were
associated with Hsz6b and HstI (see supplemental data ﬁle 2). In root tissue, Hsz6b
exhibited approximately ﬁve-fold induction to salt and osmotic treatments across all time
points of gene expression measurement (0.5 - 24 hours). This pattern contrasted with that
observed among other Hsfs, most of which responded to salt and osmotic stress over the
late stages of stress exposure only. This early response of Hsz6b to salt and cold
treatments was, in fact, unique among all the heat shock genes that we examined,
suggesting that Hsz6b may interact with elements outside of the Hsf/Hsp response
pathway. On the basis of differential expression analysis, Hstl was the most stress-
responsive of all Hsfs. Among all treatments and tissues that we examined, this
transcription factor was, on average, differentially expressed with respect to nearly half of

the time points at which gene expression was measured. This strong expression response

66

pattern is particularly noteworthy in light of the large structural dissinrilarities between
Hst1 and all other Arabidopsis Hsfs (Never et al. 2001).

The Hsp20 family exhibited the most stress-general expression response pattern
of all the protein groups that we examined. Our results therefore suggest that this protein
family is of potential importance as a factor contributing to multiple stress tolerance in
plant species. These ﬁndings are also consistent with those of previous studies, which
have found that certain Hsp20 proteins are involved in cellular responses to a wide
variety of environmental treatments besides heat, such as alcohol (Kuo et al. 2000), cold
(Sabehat et al. 1998), heavy metals (Lin et al. 1984; Tseng et al. 1993; Guan et al. 2004;
Sun et a1. 2002), osmotic stress (Sun et al. 2001), desiccation (W ehrneyer and Vierling
2000), and oxidative stress (N eta-Sharir et al. 2005). At present, little is known regarding
how Hsp20 proteins are integrated with molecular networks that underlie cellular
responses to these stress treatments. Increasingly, it has been recognized that Hsp20
proteins can engage in a wide range of cellular processes under stress, including ATP-
independent stabilization of substrate proteins undergoing confonnatienal disruption (Sun
et al. 2002), or associating with lipid molecules to regulate ﬂuidity of the membrane
structure (Tsvetkova et al. 2002). This latter function suggests that Hsp203 could be
involved in the perception of stressful stimuli leading to the activation of signal
transduction pathways. Under temperature extremes, the role of membrane ﬂuidity as a
means of stress perception and activation of signal transduction pathways has been well
established (Sun et al. 2002). However, since non-thermal stressors may also alter

membrane ﬂuidity or lead to various types of membrane damage, interactions of Hsp205

67

with membranes could partly account for the overall stress-responsiveness of the Hsp20
family.

A striking aspect of the Hsp20 family was the similarity among the expression
response patterns of member genes. This similarity was demonstrated by our clustering
analysis (see section 2C of supplemental data 1), which interestingly, revealed a cluster
of ﬁve 17 kDa Hsp20 proteins that included both class I and II nuclear/cytosolic proteins.
This result is consistent with ﬁndings of previous studies, which have identiﬁed
functional similarities between class I and II Hsp203 (e.g., [5w et al. 2000), despite
marked differences between the amino acid sequences of the two classes (Vierling 1991).
If analysis is restricted to stress responses occurring in the root tissue type only, the
overall homology of Hsp20 expression response patterns is considerably enhanced. In
root tissue, expression patterns of 17 kDa Hsp20s are very similar to those of the 18 — 20
kDa HspZOS, including those localized to the mitochondria and endoplasmic reticulum
(see section 2C of supplemental data ﬁle 1). The similarity of expression patterns among
the Hsp20 proteins may reﬂect shared induction mechanisms, and possibly extensive
coordination among Hsp20s as cellular chaperones, such as that observed during the
formation of heat-stress granules (N over et al. 1983). Shared induction mechanisms
among Hsp20 proteins may include accumulation of denatured proteins in the cytoplasm
(Sung et al. 2003), generation of reactive oxygen species (Miller and Mittler 2006), or
changes in membrane lipid composition and ﬂuidity (Tsvetkova et al. 2002). These
processes are thought to be upstream signals leading to the activation of critical Hsfs,

which are most likely the direct inducers of Hsp20 expression under stress.

68

A number of Hsps were upregulated 12 hours after wounding, with a parallel
expression response pattern during recovery from heat stress in root tissue. While the
majority of these proteins were members of the Hsp20 family (see F ig.2-5), some
members of the Hsp70, Hsp90, and Hsp100 families also exhibited this distinctive
expression pattern (see Fig. 2-7). The upregulation of multiple Hsps following wounding
and during heat stress recovery has not been previously documented in Arabidopsis or
other plant species, and is therefore an important ﬁnding of this study. The ﬁrst indication
that Hsps are involved in the wounding response pathway was provided by the study of
Cheong et al. (2002), in which the effect of wounding on expression levels of 8,200
Arabidopsis genes was surveyed in leaf tissue. Cheong et al. (2002) identiﬁed one Hsf
upregulated 0.5 hours following wounding (AtHsz4a), along with another upregulated
both 0.5 and 6 hours after wounding (Atl-lszl). In addition, three Hsp70 proteins were
upregulated 6 hours after wounding (HSP70, HSC70-G8, HSC70-G7), as well as two 17
kDa sHSPs (AtHsp17 .8-CII and AtHspl7.7-CII). We found that the most interesting
wounding-response patterns occurred in root tissue, but our results are consistent with
those of Cheong et a1. (2002), since Hsp upregulation also occurred after wounding in
aerial shoot tissue. Overall, our ﬁndings suggest that Hsp involvement in wounding
response is greater than previously recognized, and by proﬁling Hsps simultaneously
under multiple types of stress, our results show that late wound-responsive genes are also
active during recovery from heat stress. These results point to a broad role of some Hsps
during stress recovery or acclimation, i.e., the process by which plants increase stress-
tolerance following an initial period of exposure. Following wounding of plant tissue,

both local and systemic signals are generated that coordinate defense responses aimed at

69

limiting further injury (e.g., by pathogen) (Leon et al. 2001). Acclimation to heat stress
has been especially well studied, and is characterized by an increased tolerance or
hardening to high temperatures following initial exposure (Sung et al. 2003). The
functional role of Hsps upregulated as part of the post-wounding and post-heat stress
response is unclear and warrants further investigation. With respect to heat stress
recovery, one recent study found that mutant plants lacking a 32 kDa heat shock
associated protein (Hsa32) exhibited an elevated decay in thennotolerance following
exposure to heat stress (Chamg et a1. 2006).

Ultraviolet-B radiation resulted in upregulation of heat shock proteins and
transcription factors in sheets, but did not have this effect in root tissue. This distinction
between aerial and subterranean tissue types was most marked with respect to the Hsp20
group, in which nearly all Hsp203 were upregulated in sheets and downregulated in roots.
Similar to other stress treatments, exposure to ultraviolet-B light has been associated with
the production of reactive oxygen species (Amott and Murphy 1991; Green and F luhr
1995). Speciﬁcally, ultraviolet light stress has been found to increase cellular
concentrations of H202 (Shiu and Lee 2005), which has been thought to activate Hsf
expression (Miller and Mittler 2006), especially that of Hsz4a and Hsz8 (Davletova et
al. 2005). We found that both Hsz4a and Hsz8 were strongly induced by UV-B stress
in sheets but not in roots (see supplemental data ﬁle 2). These observations are consistent
with the notion that Hsp expression in sheets results ﬁ'om UV-B induced activation of
Hsfs, possibly Hsz4a and Hsz8, with the generation of H202 as an intermediary signal.
Given the tissue-speciﬁc effect we observed, however, the generation of H202 could be

dependent upon interactions between UV-B stress and photosynthetic processes taking

70

place in chloroplast. In previous models, it has been suggested that UV-B generated
reactive oxygen species are upstream components that act upon photosynthetic genes
(i.e., H202 —> photosynthesis) (A.-H.-Mackerness et al. 1999; John et al. 2001). Our
results, however, suggest that the reverse is also plausible, in which photosynthetic
processes are upstream components leading to the generation of reactive oxygen species
under UV-B stress (i.e., photosynthesis —2 H202).

Recently, it has been emphasized that the generation of agricultural varieties
tolerant to a range of stress conditions should be a primary goal in biotechnological
applications, since under ﬁeld conditions, plants may encounter different types of stress
in combination (Mittler 2006). Focusing on overlapping elements among response
pathways that underlie diverse forms of stress may advance our knowledge of cross-
tolerance in plant species (Bowler and Fluhr 2000). The Arabidopsis heat shock proteins
and transcription factors exhibit expression responses under a wide range of stressﬁrl
stimuli, and are therefore a natural model for developing our understanding of integration
between regulatory networks associated with different kinds of stress. The ﬁndings of
this study have identiﬁed which types of stress interact least and most strongly with Hsfs
each Hsp family at the transcriptional level. In addition, new family-level expression
response patterns related to wounding and ultraviolet-B light stress have been uncovered.
These results provide insight into the nature and degree of cross talk between heat and
non-heat stress conditions, and represent a basis for further experimental investigations

into the involvement of Hsf and Hsp proteins under cold and non-thermal stress.

71

Section 4 Methods

All microarray data analyzed in this study were generated using the ATHl
Affymetrix microarray platform (Hennig et al. 2003; Redman et al. 2004), with
expression estimates obtained by gcRMA nerrnalization (Wu et al. 2004). A total of
22,810 probes were included on the ATHl platform, along with 64 control probes not
corresponding to Arabidopsis genes. Our analysis is therefore based on a total of 22,746
genes, representing approximately 80% of all known Arabidopsis genes (Schmid et al.
2005). Gene expression datasets were downloaded ﬁom AtGenExpress at
http://www.weigelworld.org/resources/microarray/AtGenExpress/ (abiotic stress and
pathogen series). Complete protocols associated with these data can be obtained from
TAIR (http://www.arabidopsis.org/) (submission numbers: ME00325, ME00326,
ME00327, ME00328, ME00329, ME00330, ME00338, ME00339, ME00340,
ME00342). In brief, the abiotic stress series data consists of gene expression
measurements performed on Arabidopsis thaliana (col-0) roots and shoots under a benign
control condition and nine enviromnental stress conditions. F or each stress condition,
expression measurements were obtained from 16 to 18-day old plants at six different time
points of stress-exposure (1/2, 1, 3, 6, 12, and 24 hours). All expression measurements
were performed with duplicate biological replications. Stress treatments included cold
(4°), osmotic stress (300 mM Mannitol), salt (150 mM NaCl), drought (15 min. dry air
stream leading to 10% loss of fresh weight), genotoxic stress (1.5 )4ng bleomycin, 22
ug/ml mitomycin), oxidative stress (10 uM methyl viologen), ultraviolet-B light stress
(15 min. exposure, 1.18 W/m2 Phillips TL40W/ 12), wounding (pin puncture), and high

temperature (3 hrs. at 38° followed by 21 hrs. recovery at 25°). From the pathogen series

72

dataset, we considered experiments involving P. infestans infection of 5-week old
Arabidopsis leaves, along with corresponding control treatments in which H20 was
applied to leaves. Expression measurements were obtained from three biological
replications at each of three post-infection time points (6, 12, and 24 hours). Pathogen
infections used 10'8 efu/ml in MgCl2 with 5 x 105 P. infestans spores applied to leaf
surfaces.

The heat shock proteins and transcription factors analyzed in this study were
selected based upon the genomic sequence analyses performed by Agarwal et al. (2001),
Lin et al. (2001), Never et al. (2001), Krishna and Gleer (2001), and Scharf et al. (2001).
Our analysis includes all of the twenty-one Hsfs identiﬁed by Never et al. (2001). Several
Hsps identiﬁed by the above-cited studies were not represented on the ATHl array
(AtHspl7.8-Cl, AtHsp70-12, AtHsp70-l3, AtHsp70-14, AtHsp70-16, AtHsp70-18,
AtHsp90-3, and AtHsp100-6), and therefore could not be included in this study. In total,
our heat shock protein analysis is based upon 18 of 19 members of the Hsp20 family (12
sHsps and 6 related sHsp-like proteins), 13 of 17 members of the Hsp70 family (11 DnaK
and 2 SSE subfamily), 6 of 7 members of the Hsp90 family, and 7 of 8 members of the
Hsp100 family (AtHsplOO-l and six homologues). The expression response patterns of
each Hsf and Hsp gene was analyzed with respect to nine abiotic stress treatments
(applied to root and shoot tissue), in addition to pathogen infection treatment (applied to
leaf tissue). In total, therefore, the expression response of each Hsf and Hsp was
examined under 19 tissue-treatment combinations.

The T statistic represents the median level of fold-change induced by a given

stress treatment among members of a given protein family. Let To,“ represent the mean

73

gcRMA normalized expression intensity of the ith gene under the jth experimental
treatment (abiotic stress or pathogen) within the kth tissue following t hours of stress

exposure (i = l...N,j = 1... 10, k = l...3, and t = 0.5... 24). For every tissue-treatment-
time combination, values of foo were associated with a corresponding control

measurement designated as You (i = 0). Log2 fold-changes (M) at each tissue-treatment-

time combination were thus calculated as the difference between expression intensities in

the jth stress treatment and corresponding control treatments.

Mijkr : Ijkr ' XiO/rr (1)

The average value of |M| occurring over all time points under a given tissue-

treatrnent combination reﬂects the overall stress-responsiveness associated with a gene’s

expression proﬁle. Letting this average value for gene i under treatment j in tissue k be

represented by Wtkl , a test statistic T was deﬁned as the median value of WWI among

the n gene members of a given protein family.

n= mgfliflnﬂ M I) (2)

The magnitude of T reﬂects how large expression responses of a protein family
are, on average, with respect to a given tissue-treatment combination. The signiﬁcance of
observed T statistics was evaluated under the null hypothesis that the n genes in each

protein family are a random sample of the N = 22746 genes represented on the ATHl

74

array, versus the alternative that the n genes are a non-random sample yielding a T
statistic larger than expected within a random sample. This hypothesis was evaluated by
the following resampling procedure. With respect to each tissue-treatment combination
and each protein family, a total of 103 random samples of n genes were drawn from
among all N = 22746 genes, and the value of T was calculated from each of the 103
random samples. This yielded null distributions speciﬁc to each tissue-treatment
combination and protein family, which were used to evaluate the signiﬁcance of observed
T statistics. An observed T statistic was signiﬁcant if the proportion of random samples
yielding a larger or equal T statistic was less than a = 0.05. A signiﬁcant Tstatistic
indicates that the expression responses among the n members of a protein family (with
respect to a given tissue-treatment combination) are larger than expected within a random
sample of n genes.

Hsf and Hsp expression response patterns within protein families and among
individual genes were analyzed by differential expression analysis and clustering.
Differential expression analysis was carried out using the Lima linear modeling
package available in the R Bioconductor software suite (Smyth 2004). In this approach, a
linear model was ﬁt for all genes with respect to each of the 19 tissue-treatment
combinations. This allowed heat shock related genes to be tested for differential
expression at every time point associated with each tissue-treatment combination. F or
each of the 19 linear model analyses performed, P-values were adjusted for multiple
comparisons using the Benjamini and Hochberg method (Benjamini & Hochberg 1995;
Reiner et al. 2003). The differential expression analysis was used to construct the index

(d) introduced in Results section.

75

The hierarchical ordered partitioning and collapsing hybrid (HOPACH) clustering
algorithm was used to identify sub-groups of genes with similar expression response
patterns in each protein family (van der Laan and Pollard 2003). In this algorithm, the
number of clusters appropriate in the ﬁnal clustering solution is determined automatically
according the median split silhouette criterion (Pollard and van der Laan 2005). The
HOPACH algorithm is particularly well-suited for ﬁnding homogenous clusters of small
size among a limited number of genes. Stress-clusters were formed by grouping Hsf/Hsp
genes with respect to their expression responses (M) under all 111 tissue-treatment-time
combinations included in our analysis (18 tissue-treatment combinations with 6 time
points + l tissue-treatment combination with 3 time points). The Euclidean distance
metric was used to measure similarity between vectors of expression responses (M)
associated with each Hsf/Hsp gene. To form developmental-clusters, genes were centered
to have a mean expression intensity of zero across the 79 developmental series
conditions, and the cosine angle similarity metric was used to cluster expression proﬁles

of Hsf/Hsp genes within each family.

76

Bibliography

A.-H.-Mackerness, S., S. L. Surplus, P. Blake, C. F. John, V. Buchanan—Wollaston, B. R.
Jordan, and B. Thomas. 1999. Ultraviolet-B-induced stress and changes in gene
expression in Arabidopsis thaliana: role of signaling pathways controlled by

jasmonic acid, ethylene and reactive oxygen species. Plant Cell Environ. 22:
1413-1423.

Agarwal, M., S. Katiyar-Agarwal, C. Sahi, D. R. Gallic, and A. Grover. 2001.
Arabidopsis thaliana Hsp100 proteins: kith and kin. Cell Stress Chap. 6: 219-224.

Almoguera, C., M. A. Coca, and J. Jordano. 1993. Tissue-speciﬁc expression of
sunﬂower heat-shock proteins in response to water-stress. Plant J. 4: 947-958.

Amett, T., and T. M. Murphy. 1991. A comparison of the effects of a fungal elicitor and
ultraviolet-radiation on ion-transport and hydrogen-peroxide synthesis by rose
cells. Env. Exp. Bot. 31: 209-216.

Banzet, N., C. Richaud, Y. Deveaux, M. Kazmaier, J. Gagnon, and C. Triantaphylides.
1998. Accumulation of small heat shock proteins, including mitochondrial Hsp22,
induced by oxidative stress and adaptive response in tomato cells. Plant J. 13:
519-527.

Benjamini, Y., and Y. Hochberg. 1995. Controlling the false discovery rate: a powerful
and practical approach to multiple testing. J. Roy. Stat. Soc. B. 57: 289-300.

Bowler, C., and R. F luhr. 2000. The role of calcium and activated oxygens as signals for
controlling cross-tolerance. Trends Plant Sci. 5: 241-246.

Charng, Y. Y., H. C. Liu, N. Y. Liu, F. C. Hsu, and S. S. Ke. 2006. Arabidopsis Hsa32, a
novel heat shock protein, is essential for acquired thermotelerance during long
recovery after acclimation. Plant Physiol. 140: 1297-1305.

Cheong, Y., H. Chang, R. Gupta, X. Wang, T. Zhu and S. Luan. 2002 Transcriptional
proﬁling reveals novel interactions between wounding, pathogen, abiotic stress,
and hormonal response in Arabidopsis. Plant Phys. 129: 661-677.

Coca, M. A., C. Almoguera, T. L. Thomas, and J. Jordano. 1996. Differential regulation
of small heat-shock genes in plants: analysis of a water-stress-inducible and
developmentally activated sunﬂower promoter. Plant Mol. Biol. 31 : 863-876.

Davletova S., K. Schlauch, J. Coutu, and R. Mittler. 2005. The zine-ﬁnger protein Zat12

plays a central role in reactive oxygen and abiotic stress signaling in Arabidopsis.
Plant Phys. 139: 847-856.

77

Desikan, R., S. A.-H.-Mackerness, J. T. Hancock, and S. J. Neill. 2001. Regulation of the
Arabidopsis transcriptome by oxidative stress. Plant Phys. 127: 159-172.

Ding, Q. X., and J. N. Keller. 2001. Proteasome inhibition in oxidative stress
neurotexicity: implications for heat shock proteins. J. Neurochem. 77: 1010-1017.

Feder, M. E., and G. E. Hofrnann. 1999. Heat-shock proteins, molecular chaperones, and
the stress response: evolutionary and ecological physiology. Ann. Rev. Physiol.
61: 243-282.

Fleming, J. E., I. Reveillaud, and A. Niedzwiecki. 1992. Role of oxidative stress in
Drosophila aging. Mutat. Res. 275: 267-279.

Gadjev, 1., S. Vanderauwera, T. S. Gechev, C. Laloi, I. N. Minkov, V. Shulaey, K. Apel,
D. Inze, R. Mittler and F. Van Breusegem. 2006. Transcriptomic footprints

disclose speciﬁcity of reactive oxygen species signaling in Arabidopsis. Plant
Phys. 141: 436-445.

Green, R., and R. Fluhr. 1995. UV-B-induced PR-l accumulation is mediated by active
oxygen species. Plant Cell 7: 203-212.

Guan, J. C., T. L. Jinn, C. H. Yeh, S. P. Feng, Y. M. Chen, and C. Y. Lin. 2004.
Characterization of the genomic structures and selective expression proﬁles of

nine class I small heat shock protein genes clustered on two chromosomes in rice
(Oryza sativa L.). Plant Mol. Biol. 56: 795-809.

Gyérgyey, J ., A. Gartner, K. Nemeth, Z. Magyar, H. Hirt, E. Heberlebors, and D. Dudits.
1991. Plant Mol. Biol. 16: 999-1007.

Héirndahl, U., R. B. Hall, K. W. Osteryoung, E. Vierling, J. F. Bemman, and C. Sundby.
1999. The chloroplast small heat shock protein undergoes oxidation-dependent

conformational changes and may protect plants from oxidative stress. Cell Stress
Chap. 4: 129-138.

Hennig, L., M. Menges, J. A. Murray and W. Gruissen, 2003 Arabidopsis transcript
proﬁling on Affymetrix GeneChip arrays. Plant Mol. Biol. 53: 457-465.

Hihara, Y., A. Kamei, M. Kanehisa, A. Kaplan, and M. Ikeuchi. 2001. DNA microarray
analysis of cyanebacterial gene expression during acclimation to high light. Plant
Cell 13: 793—806.

Hoffrnann, A. A., and P. A. Parsons. 1991 Evolutionary genetics and environmental
stress. Oxford University Press, Oxford.

78

John, C. F ., K. Morris, B. R. Jordan, B. Thomas, and S. A.-H.-Mackerness. 2001.
Ultraviolet-B exposure leads to up-regulation of senescence-associated genes in
Arabidopsis thaliana. J. Exp. Bot. 52: 1367-1373.

Krishna, P., and G. Gleer. 2001. The Hsp90 family of proteins in Arabidopsis thaliana.
Cell Stress Chap. 6: 238-246.

Kreps, J. A., Y. Wu, C. Hur-Song, T. Zhu, X. Wang, and J. F. Harper. 2002
Transcriptomic changes for Arabidopsis in response to salt, osmotic, and cold
stress. Plant Phys. 130: 2129-2141.

Kregel, K. C. 2002. Heat shock proteins: modifying factors in physiological stress
responses and acquired thermotelerance. J. Appl. Physiol. 92: 2177-2186.

Kuo, H. F ., Y. F. Tsai, L. S. Young, and C. Y. Lin. 2000. Ethanol treatment triggers a

heat shock-like response but no therrnotolerance in soybean (Glycine max cv.
Kaohsiung No.8) seedlings. Plant Cell Environ. 23: 1099-1108.

Laloi, C., K. Apel, and A. Danon. 2004. Reactive oxygen signaling: the latest news. Curr.
Opin. Plant Biol. 7: 323-328.

Larkindale, J ., J. D. Hall, M. R. Knight, and E. Vierling. 2005. Heat stress phenotypes of
Arabidopsis mutants implicate multiple signaling pathways in the acquisition of
thermotelerance. Plant Phys. 138: 882-897.

Lee, G. J ., A. M. Rosernan, H. R. Saibil, and E. Vierling. 1997. A small heat shock
protein stably binds heat-denatured model substrates and can maintain a substrate
in a folding-competent state. Embo J. 16: 659-671.

Lee, G. J ., and E. Vierling. 2000. A small heat shock protein cooperates with heat shock
protein 70 systems to reactivate a heat-denatured protein. Plant Phys. 122: 189-
197.

Lee, B. H., S. H. Won, H. S. Lee, M. Miyao, W. I. Chung, I. J. Kim, and J. Jo. 2000.
Expression of the chloroplast-localized small heat shock protein by oxidative
stress in rice. Gene 245: 283-290.

Leon, J ., E. Rejo, and J .J . Sanchez-Serrano. 2001. Wound signalling in plants. J. Exp.
Bot. 52: 1-9.

Lin, C. Y., J. K. Roberts, and J. L. Key. 1984. Acquisition of therrnotolerance in soybean

seedlings: synthesis and accumulation of heat shock proteins and their cellular
localization. Plant Physiol. 74: 152-160.

79

Lin, B.-L., J .-S. Wang, H.-C. Liu, R.-W. Chen, Y. Meyer, A. Barakat, and M. Delseny.
2001. Genomic analysis of the Hsp70 superfarnily in Arabidopsis thaliana. Cell
Stress Chap. 6: 201-208.

Liu, N. Y., S. S. Ke, K. C. Yeh, and Y. Y. Chamg. 2006a. Isolation and characterization
of tomato Hsa32 encoding a novel heat-shock protein. Plant Science 170: 976-
985.

Liu, D., X. Zhang, Y. Cheng, T. Takano, and S. Liu. 2006b. rHsp90 gene expression in
response to several environmental stresses in rice (Oryza sativa L.). Plant Physiol.
Biochem 44: 380-386.

L6w, D., K. Brindle, L. Never, and C. Forreiter. 2000. Cytosolic heat-stress proteins
Hsp17.7 class I and Hspl 7.3 class II of tomato act as molecular chaperones in
vivo. Planta 211: 575-582.

Ma, S., Q. Gong, and H. J. Behnert, 2006 Dissecting salt stress pathways. J. Exp. Bot. 57:
1097-1107.

Martindale, J. L., and N. J. Helbrook. 2002. Cellular response to oxidative stress:
signaling for suicide and survival. J. Cell. Phys. 192: 1-15.

Mehlen, P., J. Briolay, L. Smith, C. Diazlatoud, N. Fabre, D. Pauli, and A. P. Arrigo.
1993. Analysis of the resistance to heat and hydrogen peroxide stresses in cos
cells transiently expressing wild-type or deletion mutants of the Drosophila 27-
kda heat-shock protein. Europ. J. Biochem. 215: 277-284.

Miller, G., and R. Mittler. 2006. Could heat shock transcription factors function as
hydrogen peroxide sensors in plants? Ann. Bot. 98: 279-288.

Mittler, R. 2006. Abiotic stress, the ﬁeld environment and stress combination. Trends
Plant Sci. 11: 15-19.

Neta-Sharir, I., T. Isaacson, S. Lurie, and D. Weiss. 2005. Dual role for tomato heat
shock protein 21: protecting photosystem II from oxidative stress and promoting
color changes during ﬁ'uit maturation. Plant Cell 17: 1829-1838.

Never, L., K. D. Scharf, and D. Neumann. 1983. Formation of cytoplasmic heat-shock
granules in tomato cell-cultures and leaves. Mol. Cell. Biol. 3: 1648-1655.

Never, L., K. Bharti, P. D6ring, S. K. Mishra, A. Ganguli, and K.-D. Scharf. 2001.

Arabidopsis and the heat stress transcription factor world: how many heat stress
transcription factors do we need? Cell Stress Chap. 6: 177-189.

80

0p den Camp, R. G. L., D. Przybyla, C. Ochsenbein, C. Laloi, C. Kim, A. Darren, D.
Wagner, E. Hideg, C. G6bel, I. Feussner, M. Nater, and K. Apel. 2003. Rapid
induction of distinct stress responses after release of singlet oxygen in
Arabidopsis. Plant Cell 15: 2320-2332.

Panchuk, I. 1., R. A. Volkov, and F. Schdfﬂ. 2002. Heat stress and heat shock
transcription factor dependent expression and activity of ascorbate peroxidase in
Arabidopsis. Plant Phys. 129: 838-853.

Pastori, G. M., and H. Foyer. 2002 Common components, networks, and pathways of
cross-tolerance to stress. The central role of “redox” and abscisic acid-mediated
controls. Plant Phys. 129: 460-468.

Pnueli, L., H. Liang, M. Rozenberg, and R. Mittler. 2003. Growth suspension, altered
stomatal responses, and augmented induction of heat shock proteins in cytosolic
ascorbate peroxidase (Apx1)-deﬁcient Arabidopsis plants. Plant J 34: 187-203.

Pollard, K. and M. van der Laan. 2005. Cluster analysis of genomic data. Pp. 209-228 in
R. Gentleman, W. Huber, V. J. Carey, R. A. Irizarry, and S. Dudoit, eds.
Bioinforrnatics and Computational Biology Solutions Using R and Bioconductor.
Springer, New York.

Preville, X., F. Salvemini, S. Giraud, S. Chaufour, C. Paul, G. Stepien, M. V. Ursini, and
A. P. Arrigo. 1999. Mammalian small stress proteins protect against oxidative
stress through their ability to increase glucose-6-phosphate dehydrogenase
activity and by maintaining optimal cellular detoxifying machinery. Exp. Cell
Res. 247: 61-78.

Redman, J. C., B. J. Haas, G. Tanimoto, and C. D. Town. 2004 Development and
evaluation of an Arabidopsis whole genome Affymetrix probe array. Plant J. 38:
545-561.

Reiner, A., D. Yekutieli, and Y. Benjamini. 2003. Identifying differentially expressed
genes using false discovery rate controlling procedures. Bioinfonrr. 19: 368-375.

Rensink, W. A., S. Lobst, A. Hart, S. Stegalkina, J. Liu, and C. R. Buell. 2005. Gene
expression proﬁling of potato responses to cold, heat, and salt stress. Funct. Integ.
Genom. 5: 201-207.

Rossel, J. B., I. W. Wilson, and B. J. Pogson. 2002. Global changes in gene expression in
response to high light in Arabidopsis. Plant Phys. 130: 1109-1120.

Rossel, J. B., P. B. Walter, L. Hendrickson, W. S. Chow, A. Poole, P. M. Mullineaux,
and B. J. Pogson. 2006. A mutation affecting ascorbate peroxidase 2 gene

expression reveals a link between responses to high light and drought tolerance.
Plant Cell Environ. 29: 269-281.

81

Sabehat, A., S. Lurie, and D. Weiss. 1998. Expression of small heat-shock proteins at low
temperatures — A possible role in protecting against chilling injuries. Plant Phys.
117: 651-658.

Scharf, K.-D., M. Siddique, and E. Vierling. 2001. The expanding family of Arabidopsis
thaliana small heat stress proteins and a new family of proteins containing or-
crystallin domains (Acd proteins). Cell Stress Chap. 6: 225-23 7.

Schmid, M., T. S. Davison, S.R. Henz, U. J. Pape, M. Demar, M. Vingron, B. Scholkopf,
D. Weigel, and J. U. Lohmann. 2005. A gene expression map of Arabidopsis
thaliana development. Nat. Genet. 37: 501-506.

Schéfﬂ, F ., R. Prandl, and A. Reindl. 1998. Regulation of the heat-shock response. Plant
Phys. 117: 1135-1141.

Shiu, C. T., and T. M. Lee. 2005. Ultraviolet-B-induced oxidative stress and responses of
the ascorbate-glutathiene cycle in a marine macroalga Ulva fasciata. J. Exp. Bot.
56: 2851-2865.

Smyth, G. K. 2004. Linear models and empirical Bayes methods for assessing differential
expression in microarray experiments. Stat. Appl. Genet. Mol. Biol. 3: Article 3.

Sun, W. N., C. Bernard, B. van de Cotte, M. Van Montagu, and N. Verbruggen. 2001.
At-Hspl 7.6A, encoding a small heat-shock protein in Arabidopsis, can enhance
osmotolerance upon overexpression. Plant J. 27: 407-415.

Sun, W. N., M. Van Montagu, and N. Verbruggen. 2002. Small heat shock proteins and
stress tolerance in plants. Gene Struct. Expr. 1577: 1-9.

Sung, D. Y., F. Kaplan, K.-J. Lee, and G. L. Guy. 2003. Acquired tolerance to
temperature extremes. Trends Plant Sci 8: 179-187.

Tseng, T. S., S. S. Tzeng, K. W. Yeh, C. H. Yeh, F. C. Chang, Y. M. Chen, and C. Y.
Lin. 1993. The heat-shock response in rice seedlings — isolation and expression of

cDNAs that encode class-I low-molecular-weight heat-shock protein. Plant Cell
Phys. 34: 165-168.

Tsvetkova, N. M., I. Horvath, Z. Torok, W. F. Wolkers, Z. Balogi, N. Shigapova, L. M.
Crowe, F. Tablin, E. Vierling, J. H. Crowe, and L. Vigh. 2002. Small heat-shock

proteins regulate membrane lipid polymorphism. Proc. Natl. Acad. Sci. 99:
13504-13509.

Van der Lann, M., and K. Pollard. 2003. Hybrid clustering of gene expression data with
visualization and bootstrap. J. Stat. Plan. Inf. 117: 275-303.

82

Verslues, P. E., M. Agarwal, S. Katiyar-Agarway, J. H. Zhu, and J. K. Zhu. 2006.
Methods and concepts in quantifying resistance to drought, salt and freezing,
abiotic stresses that affect plant water status. Plant J. 45: 523-539.

Vierling, E. 1991. The roles of heat-shock proteins in plants. Ann. Rev. Plant Phys. Mol.
Biol. 42: 579-620.

Volkov, R. A., I. l. Panchuk, P. M. Mullineaux, and F. Schiifﬂ. 2006. Heat stress-induced
H202 is required for effective expression of heat shock genes in Arabidopsis.
Plant Mol. Biol. 61: 733-746.

Wehmeyer, N., and E. Vierling. 2000. The expression of small heat shock proteins in
seeds responds to discrete developmental signals and suggests a general protective
role in desiccation tolerance. Plant Phys. 122: 1099-1108.

Wheeler, J. C., E. T. Bieschke, and J. Tower. 1995. Muscle-speciﬁc expression of
Drosophila hsp70 in response to aging and oxidative stress. Proc. Natl. Acad. Sci.
92: 10408-10412.

Wu, Z., R. Irizarry, R. Gentleman, F. Martinez Murillo, and F. Spencer. 2004. A model

based background adjustment for oligonucleotide expression arrays. J. Amer. Stat.
99: 909-917.

83

Appendix

The following R code was used to carry out differential expression analyses using the
LIMMA linear modeling package. The matrix “X” is a 22746 x 24 matrix that contains
all expression data associated with a single stress-tissue combination (stress and control
treatments, 6 time points, 2 replicates per time point).

> library(limma)

> X = read.table("ExpressionMatrix.txt")

>wa=X

> design = model.matrix(~ -1+factor(c(l, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10,
10,11,11,12,12)))

> colnames(design) =

c("controll ","contr012","centrol3","control4","centrolS","control6","timel ",
"time2","time3","time4","time5","time6")

> ﬁt = lmFit(eset, design)

> contrastmatrix = makeContrasts(timel-controll, time2-contr'012,time3-control3,
time4-control4, time5-controlS, time6-control6,levels=design)

> ﬁt2 = contrasts.ﬁt(ﬁt, contrastmatrix)

> ﬁt2 = eBayes(ﬁtZ)

> result = decideTests(ﬁtZ, method = "separate", adjustmethod = "BH", p.value=0.05)
> resultl = abs(result[,l])

> result2 = abs(result[,2])

> result3 = abs(result[,3])

> result4 = abs(result[,4])

> result5 = abs(result[,5])

> result6 = abs(result[,6])

84

The following is an example of the R-Code used to construct Figures 2-1 to 2-4 of
Chapter 2. The matrices X1, X2, ..., X9 have six rows each (one for each time point at
which measurements were obtained). The column number of each matrix corresponds to
the number of genes represented in a given ﬁgure (i.e., 21 in Figures 2-1 and 2-2, 18 in
Figures 2-3 and 2-4).

> par(mfrow == c(3,3))

> linetypes = rep(1,21)

> thecolors = c(rep(l,15),rep(2,5),4)

>

> par(mai = c(0.05, 0.27, 0.10, 0))

> X1 = data.ﬁame(Xl)

> I'OW.naInCS(X1) = c(II().5rr,Irl "’II3II,II6","1201,1024")

> matplot(Xl, axes=F, frame=T, lty=linetypes, col=thecolors, type="l",
xlab="Time(hrs.)",ylab="M", ylim=c(-2,9))

> axis(2)

> abline(h = 0, lty = 2)

> text(0.75, 8.6, labels="(A)", pos=4)

>

> par(mai = c(0.05, 0.10, 0.10, 0.05))

> X2 = data.frame(X2)

> I'OW.naInCS(x2) = C("O.5","1","3 "’116","12","24")

> matplot(X2, axes=F, frame=T, lty=linetypes, col=thecolors, type="l",
xlab="Time(hrs.)",ylab="M", ylim=c(-2,9))

> abline(h = 0, lty = 2)

> text(0.75, 8.6, labels="(B)", pos=4)

>

> par(mai = c(0.05, 0.05, 0.10, 0.10))

> X3 = data.frame(X3)

> TOW.I'IalneS(X3) ___ C("0.5","1","3","6","12","24")

> matplot(X3, axes=F, frame—T, lty=linetypes, col=thecolors, type="l",
xlab="Time(hrs.)",ylab="M", ylim=c(-2,9))

> abline(h = 0, lty = 2)

> text(0.75, 8.6, labels=”(C)", pos=4)

>

> par(mai = c(0.10, 0.25, 0.05, 0))

> X4 = data.frame(X4)

> TOW.nameS(X4) = C("O.5","1","3","6","12","24")

> matplot(X4, axes=F, frame=T, lty=linetypes, col=thecolors, type="l",
xlab="Time(hrs.)",ylab="M", ylim=c(-2,9))

> axis(2)

> abline(h = 0, lty = 2)

> text(0.75, 8.6, labels="(D)", pos=4)

>

> par(mai = c(0.10, 0.10, 0.05, 0.05))

> X5 = data.frame(X5)

85

> row.names(XS) = c("0.5","1","3","6","12","24")

> matplot(XS, axes=F, frame=T, lty=linetypes, col=thecolors, type="l",
xlab="Time(hrs.)",ylab="M", ylim=c(-2,9))

> abline(h = 0, lty = 2)

> text(0.75, 8.6, labels="(E)", pos=4)

>

> par(mai = c(0.10, 0.05, 0.05, 0.10))

> X6 = data.frame(X6)

> row.names(X6) = c("0.5","l","3","6","12","24")

> matplot(X6, axes=F, ﬁ'ame=T, lty=linetypes, col=thecolors, type="l",
xlab="Time(hrs.)",ylab="M", ylim=c(-2,9))

> abline(h = 0, lty = 2)

> text(0.75, 8.6, labels="(F)", pos=4)

>

> par(mai = c(0.27, 0.27, 0, 0))

> X7 = data.ﬁarne(X7)

> row.names(X7) = C(IIO.5","1II,ﬂ3II,II6II,II1211,1124")

> matplot(X7, axes=F, frame=T, lty=linetypes, col=thecolors, type="l",
xlab="Time(hrs.)",ylab="M", ylim=c(-2,9)) '

> axis(2)

> axis(1,1 :6,row.names(X7))

> abline(h = 0, lty = 2)

> text(0.75, 8.6, labels="(G)", pos=4)

>

> par(mai = c(0.27, 0.10, 0, 0.05))

> X8 = data.frame(XS)

> row.names(X8) = c("0.5","l ","3","6","l2","24")

> matplot(X8, axes=F, frame=T, lty=linetypes, col=thecolors, type:
xlab="Time(hrs.)",ylab="M", ylim=c(-2,9))

> axis(l ,1 :6,row.names(X8))

> abline(h = 0, lty = 2)

> text(0.75, 8.6, labels="(H)", pos=4)

>

> par(mai = c(0.27, 0.05, 0, 0.10))

> X9 = data.frame(X9)

> row.names(X9) = c("0.5","1","3","6","12","24")

> matplot(X9, axes=F, frame=T, lty=linetypes, col=thecolors, type="l",
xlab="Time(hrs.)",ylab="M", ylim=c(-2,9))

> axis(l ,1 :6,row.names(X9))

> abline(h = 0, lty = 2)

> text(0.75, 8.6, labels="(l)", pos=4)

"l",

86

      

 
   
 

  
 

       

             

11111121111111)

1 3

’5 111111))