hwy;
i. a
dummk a

£91,
122:: ,2

i 7:»:

 

: ‘
.. 7
iw I. .....i.!w
J 3.313...
. 1:! la...
uV- ‘I‘LV ,

al.
”iiiiv.‘ idqn.
£211 1...: S. .s . .
.ioni . its}. 1‘? 0:. :11}...
|vl\ :2... LP:

. .a . .5: av)

 

Lzuz .u,.w...u..._.v.$

 

 

”cats

1
f0 E7

 

This is to certify that the
dissertation entitled

INTEGRATING GENE EXPRESSION AND METABOLIC
PROFILES TO OPTIMIZE CELLULAR FUNCTIONS

presented by

ZHENG LI

has been accepted towards fulﬁllment
of the requirements for the

PhD degree in Chemical Enﬂeering

 

 

 

Weir»; 01%

Major Professor’s Signature

A443, .23: 9004

Date

MSU is an Afﬁrmative Action/Equal Opportunity Institution

 

LIBRARY
MIChlgq. . State
University

 

 

 

-.—--.-I-c-

0--I-I-0-u-n-n-o-n-o--o-o-I-I--¢--o-I-u-o-o--u-u-I-I-O-I-a-v--o-o-n-I-.-->

 

PLACE IN RETURN BOX to remove this checkout from your record.
TO AVOID FINES return on or before date due.
MAY BE RECALLED with earlier due date if requested.

 

DATE DUE

DATE DUE

DATE DUE

 

 

 

 

 

 

 

 

 

 

 

 

 

 

6/01 c:/ClRC/DateDue.p65-p.15

 

 

INTEGRATING GENE EXPRESSION AND METABOLIC PROFILES
TO OPTIMIZE CELLULAR FUNCTIONS

By

ZHENG LI

A DISSERTATION

Submitted to
Michigan State University
in partial fulﬁllment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY
Department of Chemical Engineering and Materials Science

2006

ABSTRACT

INTEGRATING GENE EXPRESSION AND METABOLIC PROFILES
TO OPTIMIZE CELLULAR FUNCTIONS

By

ZHENG LI

With advances in high throughput technology, proﬁles of gene expressions,
proteins and metabolites can be acquired to help elucidate the network of pathways
involved in producing a speciﬁc phenotype. This dissertation presents a systems approach
that was developed to integrate gene expression, metabolic and phenotypic proﬁles to
identify active pathways that confer a phenotype. The approach involves several separate
components to i) identify genes that are relevant to a cellular or metabolic process, ii)
integrate multi-source information, and iii) reconstruct pathways and networks.

Approaches to identify genes relevant to a phenotype were developed using
genetic algorithm coupled partial least squares analysis (GA/PLS) and discussed in
Chapter 2. GA/PLS used a log linear model to identify subsets of genes that can best
predict a phenotype e.g. a metabolic function. Next we applied Bayesian network
analysis to infer network structures from metabolic data, discussed in Chapter 3.
Metabolic data was chosen initially because if Bayesian network analysis is able to infer
well-known metabolic structures, pathways e. g. TCA cycle and urea cycle, from
experimental data, which provided conﬁdence in the ability of this methodology to infer
other networks, such as, genetic regulatory networks from gene data. In Chapter 4, we
integrated both ideas ﬁ'om the previous two chapters into a Three-stage Integrative

Pathway Search (TIPSO), which combined methods to identify relevant genes with

network reconstruction. Unlike other approaches, this approach identiﬁed the active
pathways without requiring interaction measurements or libraries of genetic mutants, and
with limited amount of data. The reconstructed network was validated through in silico
perturbations studies with published results and further experiments. The framework
provided very good predictions of the effect of some, but not all, of the perturbation
studies. This may be due in part to Bayesian network analysis’ inability to handle
transients, such as cycles and feedback loops. Uncovering the additional information
from these transients would be valuable in elucidating the mechanism involved in
producing a particular phenotype. Therefore, in Chapter 5, we developed a dynamic
model using time-series gene expression data. To illustrate the ability of the model, we
applied the model to Escherichia coli K12 (E. coli) and Saccharomyces cerevisiae
(yeast), which were more readily arnendable to validatation. Finally, Chapter 6 discussed
the improvements to the current T IPS© approach namely, to include multiple metabolites,

incorporate multi-source information, and dynamic modeling for time-series data.

Copyn'ght by
ZHENG L1

2006

DEDICATION

TO LULU
FOR HER SUPPORT AND LOVE

ACKNOWLEDGMENTS

First I would like to thank Professor Christina Chan, who gave me guidance
throughout my research. Without her knowledge, patience and support this dissertation
would have not been possible. She is not only my academic advisor, but also a mentor
and a good friend. The knowledge and friendship I gained from her will deﬁnitely
inﬂuence the rest of my life. I would also thank Professor Sarat Dass, his keen insights
and valuable discussions on statistics often gave me stimulating ideas in research.

I am grateful to Professor Worden and Professor Dale for taldng time to serve on
the guidance committee and overseeing this work. Their insightful comments and
suggestions have enhanced the technical soundness of this dissertation. I am also grateful
to all my friends and colleagues. The discussions with them substantially contiibuted to
my work and broadened my knowledge.

Last but not the least, many thanks go to my family: my wife, my Mom and Dad,
and my brother. Without their encouragement and continuous support, I would not be
who I am today.

This research was supported by National Science Foundation, Whitaker
Foundation, Environmental Protection Agency. I was also partially supported by QBMI
award and DeVlieg fellowship for Computational Engineering at Michigan State

University.

TABLE OF CONTENTS

LIST OF TABLES ............................................................................................................. xi
LIST OF FIGURES ......................................................................................................... xiii
LIST OF ACRONYMS .................................................................................................... xv
CHAPTER 1 INTRODUCTION ........................................................................................ 1
1.1 Background ......................................................................................................... 1
1.2 Organization of the Dissertation ............................................................................... 5
1.3 Contribution of the Dissertation ................................................................................ 6

CHAPTER 2 IDENTIFYING GENES RELEVANT TO A CELLULAR FUNCTION 7

2.1 Introduction ............................................................................................................... 7
2.2 Methodology of GA/PLS .......................................................................................... 8
2.2.1 Assumption of GA/PLS ..................................................................................... 9
2.2.2 Partial Least Squares Analysis ......................................................................... 10
2.2.3 Genetic Algorithm ........................................................................................... 11
2.2.4 Gene Subset Selection Optimization ............................................................... 15
2.3 Application of GA/PLS to Rat Hepatocypte Culture System ................................. 16
2.3.] Experimental System ....................................................................................... 17
2.3.2 Data Preprocess ................................................................................................ 18
2.3.3 Results .............................................................................................................. 23
2.3.4 Model Robustness Evaluation .......................................................................... 34
2.4 Discussion ............................................................................................................... 4O

vii

CHAPTER 3 A BAYESIAN FRAMEWORK TO INFER BIOLOGICAL PATHWAYS

AND NETWORK ............................................................................................................. 47
3.1 Introduction ............................................................................................................. 47
3.2 MATERIAL AND METHODS .............................................................................. 52

3.2.1 Data Collection ................................................................................................ 52
3.2.2 Bayesian networks ........................................................................................... 54
3.2.3 Inferring the Bayesian Network from information theory ............................... 56

3.2.4 Detecting latent variables and comparing alternative models based on a

Bayesian scoring metric ............................................................................................ 58
3.2.5 Bayesian network predictor and target function prediction ............................. 60
3.2.6 Data discretization ........................................................................................... 61
3.2.7 Sensitivity analysis of Bayesian Networks ...................................................... 62
3.3 Results ..................................................................................................................... 64
3.3.1 Reverse engineering the sub-networks ............................................................ 65
3.3.2 Identifying relevant pathways linked to intracellular TG levels ...................... 68
3.3.3 Accuracy and sensitivity of the postulated networks ....................................... 71
3.4 Discussion ............................................................................................................... 74
CHAPTER 4 IDENTIFYING PHENOTYPE RELEVANT PAHTWAYS ..................... 84
4.1 Introduction ............................................................................................................. 84
4.2 Materials and Methods ............................................................................................ 87
4.2.1 Three Stage Integrative Pathway Search Framework (TIPSO) ........................ 87
4.2.2 Bayesian Network Inference ............................................................................ 90
4.3 Results and discussion ............................................................................................ 91

viii

4.3.1. Identifying genes relevant to cytotoxicity or LDH release using GA/PLS 91
4.3.2. Identifying genes involved in an independent pathway related to cytotoxicity

using CICA ............................................................................................................... 92
4.3.3. Reconstruct pathways related to cytotoxicity using BN ................................. 93

CHAPTER 5 STATE SPACE MODEL TO IN FER TRANSCRIPTION FACTOR

ACTIVITIES FROM DYNAMIC GENE EXPRESSION DATA ................................. 109
5.1 Introduction ........................................................................................................... 109
5.2 Materials and methods .......................................................................................... 113
5.3 Results ................................................................................................................... 120

5.3.1. Inferring TFA from simulated data ............................................................... 120
5.3.2. E. coli ............................................................................................................ 124
5.3.3. Saccharomyces Cerevisiae ............................................................................ 125
5.4 Discussion ............................................................................................................. 128

CHAPTER 6 EXTENSIONS TO IMPROVE THE HPSO FRAMEWORK ................ 134

6.1 Introduction ........................................................................................................... 134

6.2 A Hierarchical Approach Employing Metabolic and Gene Expression Proﬁles to

Identify the Pathways that Confer Cytotoxicity in HepGZ Cells ................................ 134
6.2.1 Introduction .................................................................................................... 134
6.2.2 Materials and Methods ................................................................................... 136
6.2.3 Results ............................................................................................................ 139
6.2.4 Discussion ...................................................................................................... 148
6.2.5 Conclusion ..................................................................................................... 149

ix

 

6.3 Inferring active pathways that confer a phenotype involving multiple metabolites

..................................................................................................................................... 149
6.3.1 Introduction .................................................................................................... 149
6.3.2 Materials and Methods ................................................................................... 150
6.3.3 Results and discussion ................................................................................... 150

6.4 Using a Dynamic Bayesian Network (DBN) to identify active pathways from the

dynamic proﬁles .......................................................................................................... 156
6.4.1 Introduction .................................................................................................... 156
6.4.2 Materials and Methods ................................................................................... 157
6.4.3 Results and Discussion .................................................................................. 157

6.5 Future work ........................................................................................................... 161

CHAPTER 7 CONCLUSIONS ...................................................................................... 163

Appendix Figure 1. Metabolic Reaction Map ............................................... 165

BIBLIOGRAPHY ....................................................................................................... 166

LIST OF TABLES

Table 2.1: 111 genes selected after t-test ....................................................... 19

Table 2.2: 45 genes involved in the metabolic network ...................................... 22
Table 2.3 A. Genes selected for intracellular triglyceride .................................... 25
Table 2.3 B. Genes selected for urea synthesis ................................................. 27
Table 2.4 A. Genes selected for intracellular triglyceride after noise addition ............. 36
Table 2.4 B. Genes selected for urea synthesis aﬁer noise addition .......................... 38

Table 2.5A. Genes selected for intracellular TG using ﬁtness function in equation 2.10

......................................................................................................... 42
Table 2.53. Gene selected for intracellular TG with ﬁtness function in equation 2.10 after
noise addition ........................................................................................ 43
Table 3.1. Learned TCA cycle relationship ...................................................... 54
Table 3.2. Learned urea cycle relationships .................................................... 68

Table 3.3. Relations recovered by Bayesian network analysis with IC* algorithm. . . . . ...71

Table 3.4. Bayesian score and prediction accuracy of the Figure 3.5 ........................ 73
Table 3.5 Sensitivity analysis of TG network shown in Figure 3.6 ......................... 74
Table 3.6. Relations recovered by Bayesian network analysis with MCMC algorithm, a

search and score method. A value of 1 indicates a direct connection. ...................... 81
Table 4.1 Simulating genetic perturbation and its effects on LDH release ................. 95

Table 4.2 Simulating down-regulation of PKC-5 and its effects on NF-KB in medium

culture (with 20 ng/ml TNF-a). ............................................................... 101
Table 5.1 (a) Parameters of the simulated network shown in Figure 4.2 .................. 121
Table 5.1 (b) Learned parameters of the simulated network shown in Figure 4.2. . . . . ...122
Table 5.1 (c) Simulated gene expression data ................................................. 123
Table 5.2 Eight genes regulated by ARCA and CRP ......................................... 124

xi

Table 6.1: Gene sets used in the GSEA analysis .............................................. 138

Table 6.2 Correlation between the metabolic ﬂuxes and the LDH release ................ 142
Table 6.3 Enriched Gene sets in GSEA analysis .............................................. 144
Table 6.4 Genes selected by MBPLS model ................................................... 145

Table 6.5 Important genes selected for two factors ofTG and LDH........................150
Table 6.6 Important genes selected for LDH alone ........................................... 153

Table 6.7 Experimental conditions for DBN learning ......................................... 158

xii

LIST OF FIGURES

Figure 2.1 An overview of the GA/PLS feature selection algorithm .......................... 9
Figure 2.2 Representative of the string in the GA/PLS algorithm ............................ 12
Figure 2.3 Determining w value for selecting TG genes ....................................... 14
Figure 2.4 Determining w for selecting urea synthesis genes .................................. 14
Figure 2.5 TG gene selection and metabolic function prediction ............................. 25
Figure 2.6 Pathway reconstruction for TG ....................................................... 31
Figure 2.7 PLS prediction of urea synthesis ..................................................... 32
Figure 2.8 Pathway reconstruction for urea synthesis .......................................... 34
Figure 3.1 Bayesian network based framework ................................................. 50
Figure 3.2 An example of a simple Bayesian network ......................................... 55
Figure 3.3 An illustration of the three phases of the Bayesian network learning process
using mutual information based algorithm ...................................................... 58
Figure 3.4 Reverse Engineered TCA and urea cycle learned by mutual information based
algorithm ............................................................................................. 67
Figure 3.5 Postulated sub-networks for intracellular TG accumulation, learned with IC*
algorithm ............................................................................................. 70
Figure 3.6. PLS ﬂux selection ..................................................................... 76

Figure 3.7 The effect of noise in the data on Bayesian network learning of the urea cycle.
......................................................................................................... 78

Figure 3.8 The effect of omitted measurements on Bayesian network learning. . . . . . . . ....79

Figure 4.1 Scheme of T IPS© ...................................................................... 85
Figure 4.2 Example of a Bayesian network ...................................................... 88
Figure 4.3 A representative sub-network related to cytotoxicity ............................. 91

Figure 4.4 Effect of TNFa and palmitate on stearoyl-CoA desaturase (SCD) measured by
western blotting ....................................................................................... 94

xiii

Figure 4.5 Effect of SCD activator, cloﬁbrate and ciproﬁbrate, on LDH release in the
palmitate cultures .................................................................................... 97

Figure 4.6 Effect of antioxidant N-acetyl-cysteine on SCD measured by western blotting.
......................................................................................................... 99

Figure 4.7 Effect of TNFor on PKG-8 expression measured by western blotting. . .. . ....101

Figure 4.8 Measurement of LDH release in the different culture mediums, with and

without rottlerin .................................................................................... 103
Figure 4.9 Effect of rottlerin on NF -KB measured by western blotting .................... 103

Figure 4.10 Effects of TNFor on Bcl-2 .......................................................... 105
Figure 5.1 SSM representation of the gene regulatory network ............................. 112
Figure 5.3 An example and its SSM representation of the gene regulatory motifs ....... 118
Figure 5.3 The results of using a SSM to analyze an E. coli system ........................ 125
Figure 5.4 A SSM representation of the Saccharomyces cerevisiae (yeast) system studied(é
........................................................................................................ 12

Figure 5.5 The results of using a SSM to analyze a yeast system ........................... 128
Figure 6.1 An overview of the hierarchical approach ......................................... 135
Figure 6.2 Fisher’s discriminant analysis of metabolites ..................................... 141
Figure 6.3 DBN learned gene regulatory network ............................................. 162

xiv

LIST OF ACRONYMS

ACC: acetyl-CoA carboxylase
ALDH: aldehyde dehydrogenase
ANOVA: analysis of variance

Bel-2: B-cell lymphoma 2

BSA: Bovine Serum Albumin

BN: Bayesian network analysis

ChIP: chromatin irnmunoprecipitation
EM: expectation maximization

ETC: electron transport chain

FDA: Fischer’s discriminant analysis
FFA: free fatty acid

GA: genetic algorithm

GSEA: gene set enrichment analysis
GST: glutathione—s transferase

IAP: inhibitor of apoptosis protein
ICA: independent component analysis
LDH: lactate dehydrogenase
MBPLS: multi-block partial least squares
NAC: N-acetyl-cysteine

NF -KB: nuclear factor kappa B

PLS: partial least squares analysis

PKC-delta: protein kinase C delta isoforrn

XV

PKR: double stranded RNA dependent protein kinase
ROS: reactive oxygen species

SCD: stearoyl-CoA desaturase

SOD: superoxide dismutase

SSM: state-space model

TIPS: three stage integrative pathway search system
TNF: tumor necrosis factor

TG: triglyceride

TRAF: TNF receptor associated factor

xvi

CHAPTER 1 INTRODUCTION

1.1 Background

A cell is a complex multi-scale biological system, with information ﬂowing between
genes, proteins, and metabolites. Genes transcribe to proteins and some proteins are
employed as enzymes to catalyze metabolic reactions, while some other proteins are
employed as transcription factors. Genes can be transcriptionally regulated by both
proteins (transcription factors) and metabolites. Thus, a cell is a complex system
regulated by interconnected gene regulatory and signal transduction networks. The
development of high throughput technologies, such as cDNA mircoarray and mass
spectroscopy, makes it possible to probe a cell system globally and measure their gene
expression, protein and metabolite proﬁles. Due to the quantity of data, there is an
increasing reliance on models to interpret the data and understand the underlying
mechanisms.

Cellular functions are regulated by networks of genes, proteins, and small
molecules, such as metabolites and signaling molecules. When cellular functions are
abnormally regulated, disease states may ensue. For example, apoptosis is abnormally
regulated in cancer cells, while glucose level is abnormally regulated by insulin in type 2
diabetes. Therefore, to understand the mechanisms of disease development requires
knowledge of how cellular function are regulated, which in turn would aid the
development of therapies. There are two types of approaches that have been used to
reconstruct pathways ﬁom high throughput data measurements. The ﬁrst approach

reconstructs pathways from measurements of molecular interactions such as protein-

protein and protein-DNA interactions. The second approach infers the pathways from
measurements of cellular states, such as gene expression and metabolic proﬁles.

With the ﬁrst approach, gene regulatory [Lee T.I. et al., 2002] and signal
transduction [Steffen, M. et al., 2002] networks have been reconstructed using protein-
DNA and protein-protein interactions, respectively. Protein-DNA interactions can be
measured by chromatin irnmunoprecipitation (ChIP). Lee T.I. et al. (2002) integrated
gene expression with protein-DNA interaction data to reconstruct the transcriptional
regulatory network in yeast cell cycle. Similarly, gene expression proﬁles have been
integrated with protein-protein interaction data to construct signal transduction pathways
[Steffen M. et al., 2002] while a global molecular interaction network was reconstructed
by integrating gene expression with protein-DNA and protein-protein interaction [Ideker
T. 2002]. In that case, the gene expression data were used to identify the active pathways
or sub-networks. The method assumes that genes involved in the same sub-
network/pathway should show similar expression changes. Therefore, the active sub-
network is identiﬁed by selecting a subset of genes that are connected based upon
interaction data and whereby their expression proﬁles changed signiﬁcantly.

However the availability of genome-wide protein-protein and protein-DNA
interaction measurements is more limited in mammalian systems. To address this
problem, we applied the second approach to infer networks from measurements of
cellular states, such as gene expression and metabolite proﬁles, which are observed
cellular responses induced by the underlying regulatory networks. A typical approach is
to identify co-regulated genes by clustering genes with similar expression. For example,

[Sega], E. et al., 2003] built relational probabilistic models to combine gene expression

data with DNA sequence proﬁle or protein-protein interaction data to identify co-
expressed genes that share common regulatory motifs or whose gene products interact
with each other. [Tanay, A. et al. 2004] developed an approach, SAMBA, to cluster genes
based upon the combined information from gene expression, protein-protein interaction,
protein-DNA interaction, and growth phenotype data. However, cluster analysis can not
identify potential pathways by which the cells are regulated to confer a speciﬁc
phenotype.

To infer the active pathways from gene expression data, Collins’ group has
developed a model by assuming a log linear relationship between the expression of a
gene, and the genes that regulate it, as well as the external perturbations [di Bernardo, D.
et al. 2005]. The gene expression proﬁles of yeast, under different perturbed states, such
as, gene deletion, promoter insertion and drug compound treatment, were obtained from
the literature and used to train the model. Then, the gene expression response to a new
drug (external perturbation) was obtained and the log linear model was applied to predict
the effects induced by a new drug. This method requires a large amount of data for the
network reconstruction [di Bernardo, D. et al. 2005].

Therefore, we propose an alternative approach that identiﬁes the active pathways
without requiring interaction measurements or libraries of genetic mutants, and with a
limited amount of data, namely, by integrating gene expression and phenotypic proﬁles.
The method assumes that information about the regulatory networks is contained within
the proﬁles. Central to our approach is Bayesian Network (BN) analysis, a probabilistic
graphic model, which detects implicit as well as explicit connections [Li and Chan,

2004b]. BN can detect indirect inﬂuences and unmeasured events and is not susceptible

to the existence of unobserved variables. It has been applied to infer gene regulatory
network of yeast cell cycle from gene expression data [Friedman 2004], metabolic
subnetworks from metabolic data [Li and Chan, 2004b] and protein signaling pathways
from protein activity data [Sachs et al. 2005]. Unlike previous studies, in which the nodes
are fairly well established in the literature and the BN approach was applied to infer the
connections between the nodes, this study aims to identify the relevant nodes, which are
unknown a priori to be activated by the environment or important to a cellular process of
interest. The BN approach is computationally inefﬁcient when applied to large networks
with thousands of genes. In other words, it is more difﬁcult to infer the structure of large
networks using BN analysis. Therefore, in our approach we address this shortcoming as
follows: since only part of the network is typically perturbed (active) when the cells are
subjected to an environmental stress, methods to reduce the set of genes to a smaller and
more relevant subgroup will be valuable. BN analysis can then be used to reconstruct the
active sub-network from the smaller group of genes to reveal which pathways are
induced by the external stimuli or environment.

As proof-of-concept, the framework was applied to identify pathways that are
involved in palmitate and tumor necrosis factor (TNF)-or induced cytotoxicity in human
liver cells. The level of cytotoxicity was measured by lactate dehybrogenase (LDH)
release, which is a measure of cell death. Elevated levels of free fatty acids (FFAs) and
TNFor have been shown to be involved in the pathogenesis of liver disorders, such as
fatty liver disease and steatohepatitis [Felber 2002, Kobayashi 1998, Tilg 2001, Watada

2003]. Quantiﬁcation of the genetic responses of hepatocytes to physiological actions of

 

 

these factors helped to identify the pathways involved in conferring cytotoxicity (e.g.,

producing a cellular phenotype with a high level of LDH release).

1.2 Organization of the Dissertation

In the following chapters, I will talk about techniques to select important genes relevant
to a cellular function. In chapter 2, genetic algorithm coupled partial least square analysis
(GA/PLS) is introduced and applied initially to experimental data to of primary rat
hepatocyte culture system to select genes relevant to cellular functions such as urea
production and TG accumulation. In chapter 3, Bayesian network analysis is introduced
and applied to reconstruct regulatory networks from experimentally obtained metabolic
data to illustrate its capabilities. Chapter 4 discusses the Three-stage Integrative Pathway
Search (T IPSQ) framework, which combines several techniques e.g. GA/PLS,
independent component analysis (ICA) and BN analysis, and its application to a HepG2
cell culture system to infer pathways important in regulating the cytotoxic phenotype.
These pathways were subsequently validated through both a literature survey and further
experiments. In chapter 5, multisource data including gene regulatory network structure
information and gene expression data were integrated in a dynamic state space model to
extract underlying information of transcription factor activities. Chapter 6 discusses
several extensions to improve the T [PS© framework, namely, by incorporating multiple

metabolites and multi-source information into a dynamic model

1.3 Contribution of the Dissertation

This dissertation is one of the ﬁrst endeavors to integrate phenotypic, metabolic and gene
expression proﬁles with the objective to identify active pathways that regulate the cellular
functions. It contributes in the following aspects:

1) Introduced GA/PLS to select a subset of genes relevant to cellular functions and was
applied to experimental systems.

2) Proposed a Bayesian framework to reconstruct biological pathways. Metabolic ﬂuxes
and gene regulatory network were reconstructed from experimental data.

3) Independent component analysis was applied to extract pathways relevant a cellular
function.

4) T IPSO framework integrated GA/PLS, ICA and BN to infer active pathways relevant
to a phenotype. T IPS© approach was applied to a HepG2 cell system and meaningful
pathways were extracted and veriﬁed.

5) A dynamic, state space model was introduced to infer transcription factor activities

from gene expression data.

CHAPTER 2 IDENTIFYING GENES RELEVANT TO A
CELLULAR FUNCTION

2.1 Introduction

As a step towards uncovering the interplay between genes and metabolic functions, the
objective of this chapter is to develop a framework capable of selecting a subset of genes
that can i) quantitatively predict metabolic functions based upon their expression levels;
ii) be used to reconstruct regulatory pathways. Since metabolic ﬁmctions are determined
in part by the levels and activities of enzymes in the network, which in turn are regulated
in part by their gene expression, it is reasonable to suggest that metabolic functions could
be predicted by the expression level of a subset of relevant genes. To test this assumption,
we developed a genetic algorithm coupled partial least squares analysis (GA/PLS)
framework to identify a subset of genes that can predict metabolic function(s). The
relevance of the gene subset is assessed by reconstructing their regulatory roles in the
metabolic network with aid from the current literature knowledge. Gene expression and
metabolic data were experimentally obtained for a hepatocellular system.

This concept of identifying a subset of genes that can predict metabolic
function(s) is analogous to selecting an optimal subset of wavelengths in constructing a
spectroscopic calibration model to predict chemical concentrations from signal
measurements, e.g. absorbance value [Banglore 1996]. Full spectrum data (signal
measurements) consists of absorbance values at different wavelengths, which are easily
collected in a matter of seconds, but only a subset of the spectral data is essential to

building a calibration model. The optimal subset of wavelengths used to build the

calibration model changes as the chemicals being measured change [Leardi 2000]. By
analogy, a full range of gene expression data is also easily collected, but only a subset of
these genes are relevant to a particular cellular function. This subset of genes changes as
the cellular function being predicted changes. In the current study, genetic algorithm
(GA) was used to identify a subset of genes that are most inﬂuential or relevant to a
metabolic function. Then a partial least squares analysis (PLS) model was subsequently
constructed to predict the metabolic function based upon the expression level of the
selected genes. We reviewed the biological function of the selected genes to elucidate
their potential role in regulating the metabolic function. Finally, we evaluated the model
robustness by artiﬁcially adding noise to the gene expression data and comparing the

subset of genes selected before and after noise addition.

2.2 Methodology of GA/PLS

The GA/PLS framework is illustrated in Figure 2.1. In brief, the whole dataset is
separated into three independent groups, namely training, monitoring, and testing data
sets, in order to avoid over-ﬁtting. First, GA randomly selects a subset of genes from the
training data set. The expression levels of these genes are subsequently used to construct
a PLS model to predict a metabolic function, e.g. intracellular triglyceride (TG) or urea
synthesis. The accuracy of the PLS model is evaluated using both the training and
monitoring datasets. The accuracy of the PLS model prediction is used to assess the
ﬁtness of the subset of genes selected. Genetic algorithm continues to search for a gene
subset until the accuracy of the PLS model prediction is maximized. The testing data set

is used to validate the prediction ability of the ﬁnal GA/PLS model.

 

 

FIGURE 2.1: An overview of the GA/PLS feature selection algorithm. The data was separated into
three independent subsets: training set, monitoring set, and testing set. GA randomly selects a subset of
genes from the training data set. A PLS model was used to predict metabolic function based upon the
expression of the selected genes. The prediction accuracy of the PLS model was used as the ﬁtness value of
the gene subset The gene subsets with high ﬁtness values were selected into the next iteration and
subjected to crossover and mutation to introduce diversity into the population. The process terminates when
the objective function reaches its maximum or when the termination condition (e.g., maximum number of
iterations) is satisﬁed. The ﬁnal model is validated with a testing data set.

2.2.1 Assumption of GA/PLS

We approximate the relation between the metabolic ﬁrnction and expression level of this

subset of genes with a log-linear model:

Met(treatment) _ " Gene(treatment)i C(i) (21)
Met(control) i=1 Gene(control)i I
where Met(treatment) and Met(control) are the metabolic function for the treated and

control cultures, respectively; Gene(treatment)i and Gene(control),- are the expression

level of gene i for the treated and control cultures, respectively.

 

 

G t t t -
Denoting Y as log2(Met(treatment)) and X; as log,( ene( rea men )’ ), equation (2.1)
Met(control) Gene(c0ntrol),-
is transformed to:
n
Y = 2 com (2.2)

i=1
Since DNA mircoarray data are typically measured with respect to a reference level, we
applied a log-linear model, which works well when data are presented as relative levels.
Furthermore, a log-linear model allows some of the nonlinear relationships between
metabolic function and gene expression to be captured. Log-linear models have been
applied to approximate nonlinear processes in biochemical systems [Ni and Savageau,
1996; Ni and Savageau, 1996]. In this study the coefﬁcients C(i) in equation (2.2) are
determined by partial least square (PLS) analysis and the genes, Genei, are selected by

GA/PLS as described below.

2.2.2 Partial Least Squares Analysis

PLS [Wold 1984, Geladi 1986] is a statistical approach capable of modeling a large
number of variables using a relatively small set of observations. It circumvents typical
problems associated with highly correlated and collinear nature of experimental data by
projecting the data onto a few independent latent factors. These latent factors simplify the
complex and diverse relationships by capturing the variable interactions contained in the
original data into a new set of fewer unobserved/latent variables. We used this approach
to map the levels of gene expression (X) to a metabolic function (Y), to gain an
understanding of the interplay between a hepatic function and the gene expression

proﬁle. The PLS algorithm determines, based upon a nonlinear iterative partial least

10

squares (NIPALS) approach, a set of orthogonal projection axes W, henceforth called
PLS-weights, and sample projections T. For direct projection of the samples, W* =
(W (PT*W)'l) is used:

T = XW* (2.3)
Then, regression coefﬁcients [5 in equation (2.4) are obtained by regressing Y onto the
sample projections T:

Y = TB (2.4)

With a PLS factors, the PLS model is:

A . a
Y=XWaﬂ=Taﬂ=thjﬂj (2.5)
J:
T ‘1 T
B =(Ta T) Ta Y (2.6)
2.2.3 Genetic Algorithm

GA is an optimization method based upon the theory of natural selection. A more
complete discussion of GA can be found in [Goldberg 1989]. Here we demonstrate a
feature selection algorithm that combines GA and PLS (see Figure 2.1). GA begins with
an initial, randomly selected population. Each individual in the population is a potential
solution to the optimization problem and its ﬁtness to the optimization problem is
evaluated by a ﬁtness function. The population evolves over generation by way of three
genetic operators: reproduction, crossover, and mutation. The process terminates when
the objective function reaches its maximum or when the termination condition (e.g.,
maximum number of iterations) is satisﬁed. The algorithm includes four main steps:

initialization, evaluation, reproduction and crossover, and mutation.

11

Step 1: Initialization

In GA, an individual is a string of length N, schematically shown in Figure 2.2. The ﬁrst
N—I values of the string are binary with value l/0 representing the inclusion or exclusion
of a gene in the PLS model and the N“ value represents the number of latent variables in
the PLS model. A population is a collection of different individuals, i.e. a set of different
strings (e.g., genes) of the same length. The initial population is created randomly in a
user speciﬁed bounds of the N variables in the string. For example, the bounds of the ﬁrst
N—I bits of the string in this current study were [0,1], thus the individuals in the initial
population were assigned randomly with values 0 or 1. The N“ bit of the string, i.e.

number of latent variables, is randomly assigned a value between [1,10].

 

 

 

 

 

 

 

 

 

f Genel Gene2 Gene3 Gene 156
l 1 0 O 1 0 2
Num of latent
variables

FIGURE 2.2: Representative of the string in the GA/PLS algorithm. 156 genes were selected by t-test
and addition of known relevant genes. The ﬁrst 156 elements of the string contain a binary 0/1 data
representing the exclusion or inclusion of the genes in the PLS model. The last bit of the chromosome
represents the number of latent variables in the PLS model.

Step 2: Evaluation

Each individual in the population is evaluated with a ﬁtness function to determine how
well they ﬁt or improved the optimization problem. The optimization problem in our case
is to ﬁnd a subset of genes that can be used to construct a PLS model to predict metabolic

functions with a minimal prediction error and number of latent variables. The ﬁtness

12

function is deﬁned by equation 2.7. Similar ﬁtness function has been used in [Bangalore
1996] to select wavelengths that optimize calibration models.

1
20,. —y,. )2 +LVW

 

ﬁtness = (2.7)

A

where y,- is the measured metabolic ﬂux value, y, is the PLS predicted metabolic ﬂux

value, LV is the number of latent variables in the PLS model, and w is a weighting factor
to establish an optimal balance between prediction accuracy and the model size (number
of PLS latent variables). w normalizes the model size to the same scale as the prediction
error terms. An Optimal value of w is critical to the success of the GA/PLS model. Too
large a w results in a model with too few factors to explain the gene expression data. Too
small a w causes the PLS model to include too many factors and become essentially a
curve ﬁt. An optimal w, guided by a GA search, allows one to ﬁnd a model that will
provide the most accurate prediction with a minimum number of latent variables. To
determine an optimal w value, different w values, ranging from 0 to 0.5, were used to
search for a PLS model. The different models were compared using two criteria, the
mean square error of prediction and the number of PLS latent variables. An optimal
value of w was determined by balancing these two criteria to obtain a reasonable
prediction error with a minimum number of latent variables. The optimal value for w was
determined to be 0.4 for predicting intracellular TG accumulation and 0.45 to predict urea

synthesis, shown in Figures 2.3 and 2.4, respectively.

13

 

 

 

 

A
8.E-O4
Mean 6.E-04
2,”ng 4.304
predctiou 2304
0.13100
0 w 0.1 0.2 0.3 0.4 0.5 0.6
B
4
Numof 2 T T T A A
PLSIatentI ' '
variables 0
0 W 0.1 0.2 0.3 0.4 0.5 0.6

 

 

 

 

 

 

 

 

 

 

 

 

 

 

FIGURE 2.3: Determining w value for selecting TG genes. An optimal value for w in equation 2.7 was
determined by balancing the two criteria of GA/PLS performance: the mean square error of the PLS model
prediction (shown in 3A), and the number of latent variables in the PLS model (shown in 38). Different
values of w values ranging from 0 to 0.5 was tested while monitoring the above two criteria. An optimal
value of 0.4 for w was found for intracellular TG. At this w value, both the number of latent variables and

prediction error reach their minimum levels.

A
8.E-04
6.E-04
4.E-04
Mm 23-04
8‘1““ on+oo
error 0f 0 0 1 0 2 0 '3 0 4 0 5 O 6
“Cdcum ‘V . . . . .-. .
3
B
2 e e e
Nrrm of 1 g g
PLS latent
variables 0
0 w 0.1 0.2 0.3 0.4 0.5. 0.6

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

FIGURE 2.4: Determining w for selecting urea synthesis genes. An optimal value for w in equation 2.7
was determined by balancing the two criteria in GA/PLS performance: the mean square error of the PLS
model prediction (shown in 4A), and the number of latent variables in the PLS model (shown in 4B).
Different values of w ranging from 0 to 0.5 were tested while monitoring the above two criteria. An optimal
value of 0.45 for w was found for urea synthesis. At this w value, both the number of latent variables and

prediction error reach their minimum levels.

Step 3: Reproduction

After the evaluation step, each individual in the population is assigned a probability value
according to its ﬁtness value, which determines whether it will be selected to the next
iteration or generation in GA. We used roulette selection [Holland 1975] to determine

the probability of selecting an individual into the next iteration:

Pi = §‘ (2.8)

where the ﬁtness f,- ,i=1,2, ...,N of each individual is calculated using the ﬁtness function
deﬁned in equation 2.7.

Step 4: Crossover and Mutation

Crossover and mutation was applied to introduce diversity into the population, which
ensures the escape from local optimum in the GA search algorithm. In crossover, two
distinct individuals were selected randomly from the population and some portions of the
strings are exchanged with a probability PC. In mutation, one or more bits of the string are
altered with probability Pm. In this study, we applied an arithmetic crossover with a
probability of 0.6 and a boundary mutation with probability of 0.05 (see reference

[Houck 1995] for details on the crossover and mutation methods applied in this chapter).

2.2.4 Gene Subset Selection Optimization

GA/PLS selects different subsets of genes to predict the same metabolic ﬂux given
different initial populations. In other words, different subsets of genes can give the same
or nearly same degree of accuracy in the prediction of the metabolic ﬂux based upon their

gene expression level. Therefore, a strategy was developed to identify a group of genes

15

from multiple possible solutions. The strategy involved running the GA/PLS model with
different initial populations and counting the frequency of appearance of each gene in the
multiple solutions. The initial population size ranged from 30 to 100 individuals and each
individual contained a set of different genes. GA/PLS was run 14 times with different
initial population of individuals. A gene was included in the ﬁnal subset if it was selected
by the GA/PLS model in more than half of the runs. Therefore, the genes that appeared
more than 8 times as a solution in the GA/PLS model were selected into the ﬁnal gene
subset. Similar methods have been applied for wavelength selection in spectroscopic

calibration model [Leardi 2000].

2.3 Application of GA/PLS to Rat Hepatocypte Culture System

To test the ability of GA/PLS in identifying relevant genes of a cellular function,
GA/PLS was applied to the gene expression and metabolic proﬁles obtained ﬁ'om
primary rat liver cell cultures. GA/PLS selected a subset of genes relevant to cellular
frmctions of urea production and triglyceride accumulation. Then a partial least squares
analysis (PLS) model was subsequently constructed to predict the metabolic function
based upon the expression level of the selected genes. We reviewed the biological
function of the selected genes to elucidate their potential role in regulating the metabolic
function. Finally, we evaluated the model robustness by artiﬁcially adding noise to the
gene expression data and comparing the subset of genes selected before and after noise

addition.

16

2.3.1 Experimental System

The gene expression and metabolic data were obtained from cultured primary rat
hepatocytes [Chan 2002, 2003 a,b,c]. Hepatocytes were isolated from adult female Lewis
rats. The isolated hepatocytes were cultured in a collagen sandwich conﬁguration and
incubated in standard hepatocyte culture medium containing either 0.5 U/ml insulin (high
insulin) or 50 uU/ml insulin (low insulin). The hepatocytes were cultured in this fashion
for at least 6 days prior to plasma exposure. This interval is considered the pre-
conditioning period. The six-day-old-sandwiched hepatocyte cultures were subsequently
exposed to unsupplemented, or amino acid supplemented plasma solution for an
additional seven days [Chan 2003 a,b]. The four combinations of pre-conditioning and
plasma supplementation evaluated were as follows. They were i) low insulin pre-
conditioned and unsupplemented plasma (denoted as [L,O] thereafter), ii) low insulin pre-
conditioned and amino acid supplemented plasma (denoted as [L,A]), iii) high insulin
pre-conditioned and unsupplemented plasma (denoted as [H, 0]), and iv) high insulin pre-
conditioned and amino acid supplemented plasma (denoted as [H,A]) cultures. The
metabolites were measured using HPLC and biochemical assays described elsewhere
[Chan 2002, 2003 a,b,c]. A model for hepatocyte metabolism was created based on the
known stoichiometry of the hepatic metabolic network. Metabolite measurements were
coupled to Metabolic Flux Analysis (MFA) to obtain the intracellular ﬂuxes [Chan
2002a]. The gene expression proﬁles of the samples were collected using Affymetrix
chips. All expression proﬁles were generated using total RNA, with in vitro transcription

yielding biotinylated cRNA for hybridization to Affymetrix Rat UG34A GeneChip array.

17

The chips were analyzed at the Brigham and Women’s Hospital in Boston. The

expression data can be accessed at www.msu.edu/~lizhengl/microgrrav.zip.

2.3.2 Data Preprocess

Expression levels of 8,799 genes were measured using Affyrnetrix chips and a subset of
these genes were differentially expressed across the range of experimental conditions in
this study. Student t-test was used to identify the subset of genes that were differentially
expressed between the high insulin pre-conditioned (including [11, 0], [H,A]) and the low
insulin pre—conditioned (including [L,O], [L,A]) groups. In t-test, normalized distance

A between the high insulin and low insulin groups is calculated as follows:

2 2
’51. 31h
"h "1h

where (mh,mu,) is the population mean value of the high insulin and low insulin groups,
respectively, ($11,311,) is the population variances, and (nh, 1111,) is the number of the samples
in the population. It is known that D follows approximately a student distribution. The
two populations are considered different when D exceeds a certain threshold value, which
depends on the conﬁdence level selected. Using student t-test, 179 genes were identiﬁed
as signiﬁcantly different with a 95% conﬁdence level. The 179 genes were checked
manually to exclude irrelevant ones, for example, genes expressed speciﬁcally in other
tissues, e.g., brain were removed. Finally, 111 genes remained for further GA/PLS
analysis and are listed in Table 2.1.

In addition to the genes selected by t-test, many other genes encode the enzymes
that are involved in the metabolic network deﬁned in reference [Chan 2003 a] and are
relevant to heptocellular function, but were not selected by the t-test. We included these

18

informative genes, namely, 45 genes that encode the enzymes or transcription factors

involved in the metabolic network detailed elsewhere [Chan 2003 a]. These genes are

listed in Table 2.2.

Table 2.1: 111 genes selected after t-test

 

 

No. Accession number Gene Name
Glyeeraldehydes-3-phosphate-dehydrogenase (GAPDH) (GAPDH,

1 AFFX_Rat_GAPDH_3 EC 1 .2.1.12)

2 U93306 U93306 Rattus norvegicus VEGF receptor-ZIFLK-I mRNA

3 AF000139 25—hydroxyvitamin D 1-hydroxylase (CYP1) mRNA, complete eds
RATATPBZS Rat Na+, K+ -ATPase (EC 3.6.1.3) betaZ subunit

4 D90048 gene and 5 ﬂank

5 E03344 E033440ds cDNA sequence of peroxisome forming factor

6 J01435 mitochondrial cytochrome oxidase subunits I,|l, Ill genes,
cholesterol side-chain cleavage enzyme mRNA (P450800),

7 J05156 complete cds

8 K00750 cytochrome c nuclear-encoded mitochondrial gene and ﬂanks

9 K00996 cytochrome p-450e (phenobarbital-induced) mRNA, 3 end
RATCYP45A Rat P-450(1) variant (phenobarbital-inducible)

10 K01721 cytochrome 3 end, ﬂank

11 L27129 stress activated protein kinase gamma isoform, mRNA, 5 end
M20131eds RATCYP45| Rat cytochrome P450llE1 gene, complete

12 M201 31 eds
M80784 RATTGFBET R.norvegicus type III TGF-beta receptor

13 M80784 mRNA, complete eds

14 J03960 J03960 Rat 5-lipoxygenase mRNA, complete eds
M88592 Rattus rattus peroxisome proliferator activated receptor

15 M88592 (PPAR) mRNA,
M9505800mplete$eq Rattus rattus steroid 5-alpha-reductase 2

16 M95058 Mrna

17 U12187 U12187 Rattus norvegicus ras-related protein (rad) mRNA

18 U39207 U39207 Rattus norvegicus cytochrome P450 4F5 (CYP4F5) mRNA
U11681 Rattus norvegicus rapamycin and FKBP12 target-1 protein

19 U11681 (rRAFT1)mRNA
U48596 Rattus norvegicus MAP kinase kinase kinase 1 (MEKK1)

20 U48596 mRNA

21 U68544 cyclophilin D mRNA, nuclear gene encoding mitochondrial protein
U40004 Rattus norvegicus cytochrome P450 pseudogene

22 U40004 (CYP2J3P2) mRNA

23 083538 083538 Rat mRNA for 230kDa phosphatidylinositol 4-kinase

24 AF012891 AFO12891 Rattus norvegicus frizzled related protein frpAP mRNA
$78284 bel-xshort=apoptosis inducer [rats, ovary, mRNA Partial,

25 878284 537 nt]

26 U18982 foe-related antigen 2 (fra-2) mRNA, complete eds

27 U39943 cytochrome P450 monooxygenase (CYP2J3) mRNA, complete eds
mRNA for hepatic microsomal UDP—glucuronosyltransferase

28 Y00156 (UDPGT)

29 A09811 A09811eds R.norvegicus mRNA for BRL-3A binding protein

30 ABOO4278 ABOO4278 Rat mRNA for protocadherin 2, partial eds
A8009372 Rattus norvegicus mRNA for Lysophospholipase,

31 A8009372 complete eds

l9

Table 2.1 (continued)

32

33
34

35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63

64
65

ABO10154

A801 1068
A801 1530

AF00141 7
AF058795
AF0821 26
AF085693
AF093773
AJ000347
D10891

D1 3978
D161 01
D6341 1
D7921 5
J01 435
J04807
L04672

L1 31 51
L27061
M81 784
$48325
$52878
$59893
$68736
$76404
U04934
U37099
U75923
U78090
X06889
X54250
X62660

X65296
X74833

PKN mRNA for serinlthreonine protein kinase expressed in
hippoeam pus
A8011068 Rattus rattus mRNA for type II iodothyronine deiodinase,
com plete eds
ABO11530 Rattus norvegicus mRNA for MEGF4, complete eds
AF00141? Rattus norvegicus zinc ﬁnger protein mRNA, complete
eds
AF058795 Rattus norvegicus GABA-B receptor gb2 mRNA,
complete eds
AF082126 Rattus norvegicus aryl hydrocarbon receptor (AHR)
mRNA
G protein-coupled receptor kinase-associated ADP ribosylation
factor
cytosolie malate dehydrogenase (Mdh) mRNA, complete eds
mRNA for 3 (2 ),5 -bisphosphate nucleotidase
mRNA for metabotropic glutamate receptor mGluR5, complete eds
013978 Rattus sp. mRNA for argininosuccinate lyase, complete eds
016101 RATSGLT1 RAT mRNA for sodium/glucose cotransporter
063411 RATMPR Rat mRNA for mitochondrial precursor receptor,

complete eds
079215 Rattus norvegicus mRNA for FGF—10, complete eds
mitochondrial cytochrome oxidase subunits |,||, Ill genes

J04807mRNA RATINSIIA Rat insulin II gene mRNA, 3 end
L04672 RATGPCRCTR Rattus rattus G protein-coupled receptor
mRNA, com plete eds

L1 31 51 eds RATGAPX Rat GTPase-activating protein (GAP) gene,
com plete eds

L27061 RATPHOSE Rattus norvegicus phosphodiesterase mRNA,
3 end

M81784 RATKCAB Rattus norvegicus K+ channel mRNA,
sequence

$48325 diabetes-inducible cytochrome P450RLM6 [rats, liver,
mRNA Partial, 1093 nt]

intestinal 15 kda protein=fatty acid-binding protein homolog [rats,
ileum, mRNA, 460 nt]

$59893 La=autoantigen SS-B/La {3 region, clone K51} [rats,
mRNA, 106 nt]

myosin heavy chain [rats, CCl4-eirrhotie liver fat-storing cell line,
mRNA, 2924 nt]

beta -HKA=H,K-ATPase beta-subunit [rats, Genomic, 8983 nt,
segment 2 of 2]

(CD-1) clone Kc1 Na-Ca exchanger mRNA, partial eds

small GTP-binding protein (rab3c) mRNA, partial eds
U75923UTR#1 SEG_RNTRNAIS3 Rattus norvegicus isoleucyl
tRNA synthetase mRNA

potassium channel regulator 1 mRNA, complete eds

X06889eds RNRAB3 Rat res-related mRNA rab3

X54250m RNA RRATBPZ Rat mRNA for zinc finger protein AT-BP2,
partial eds

X62660m RNA RRGTSB R.rattus mRNA for glutathione transferase
subunit 8

X65296cds RRESHVEL R.rattus mRNA for carboxylesterase (Es-
HVEL)

X74833cds RNACRB1 R.norvegicus mRNA for acetylcholine

20

Table 2.1 (continued)

66
67

68
69

7O
71

79

81
82

83
85
87

89

91
92
93

95

97
98

100
101

102
103
104
105
106
107

108
109

Y09453
Y1 0258

21 1932
E02468

E01884
E01 789

M74494
M5921 1
M64033
L22558

M27433
M22253
U61 772

L10072

AF041 246

U44750
J04629

M6061 7
JO5031
J031 70
M60655
U35775
L46791

U39044
X1 2459
X66494
X80477
X86086
X62528
X04979

X91 234
U57362
U69673
D49980
U53859
D8441 8

M1 8467
078359
J04486

A8009636

M15883

AF003008

A80061 37

D901 09

receptor beta-subunit

Y09453cds RNY09453 R.norvegicus mRNA for calcium channel
gamma subunit

Y10258eds RSKET Rattus sp. mRNA for DNA binding protein, KET
Z11932cds RRVASOVZM R.rattus mRNA for vasopressin V2
receptor

E02468cds DNA sequence of rat TNF

E01884cds DNA sequence coding for rat lL-1-beta(interleukin-1
beta)

E01789cds cDNA sequence coding for rat C-kinase type-ll (beta-2)
M74494 Rat sodium/potassium ATPase alpha-1 subunit truncated
isoform mRNA,

M59211 Rat potassium channel Kv3.2b mRNA

M64033 Rat secretin gene, complete eds

adenylyl cyclase-activated serotonin receptor (5-HT7) mRNA
M27433 Rattus norvegicus germinal histone H4 gene

M22253 Rattus norvegicus sodium channel I mRNA

U61772 Rattus norvegicus merlin (NF2) mRNA

L10072 Rattus norvegicus 5-hydroxytryptamine receptor (5HT5a)
mRNA

AF041246 Rattus norvegicus orexin receptor-2 mRNA
NAB-dependent 15-hydroxyprostaglandin dehydrogenase mRNA
J04629 Rat (Na+, K+)-ATPase-beta-2 subunit mRNA

M60617 Rat CCAAT binding transcription factor-B subunit (CBF-
A1 1) mRNA

J05031 Rat isovaleryl-CoA dehydrogenase (IVD) mRNA

J03170 Rat liver speciﬁc transcription factor (LF-B1) gene
M60655 Rat alpha-1 B adrenergic receptor mRNA

U35775 Rattus norvegicus gamma-adducin mRNA

L46791 Rattus norvegicus cholesterol esterase mRNA

U39044 Rattus norvegicus cytoplasmic dynein intermediate chain
2A mRNA

X12459 Rat mRNA for argininosuccinate synthetase (EC 6.3.4.5)
X66494 R.norvegicus CHDT1 mRNA

X80477 R.norvegicus P2X mRNA

X86086 R.norvegicus RNA for annexin Vl

X62528 R.rattus mRNA for ribonuclease inhibitor

X04979 Rat gene for apolipoprotein E /cds=(23,958)

X91234 R.norvegicus mRNA for 17-beta hydroxysteroid
dehydrogenase type 2

U57362 Rattus norvegicus collagen XII alpha 1 (Col12a1) mRNA
U69673 Rattus norvegicus protein tyrosine phosphatase 20 mRNA
D49980 Rat gene for 8-oxo-dGTPase

U53859 Rattus norvegicus calpain small subunit (css1) mRNA
084418 Rat mRNA for chromosomal protein HMGZ

M18467 Rattus norvegicus mitochondrial aspartate
aminotransferase mRNA

078359 Rat drs (a gene down-regulated by v-src) mRNA

J04486 Rat insulin growth factor-binding protein mRNA
A8009636 Rattus norvegicus mRNA for phosphoinositide 3-kinase
M15883 Rat clathrin light chain (LCB2) mRNA

AF003008 Rattus norvegicus Mxi1 protein (eri1) mRNA
A8006137 Rattus norvegicus FTA mRNA for alpha 1,2-
fucosyltransferase

090109 Rat mRNA for long-chain aeyI-CoA synthetase (EC 6.2.1.3)

21

Table 2.1 (continued)

1 10 J02646

111

M14053

J02646 Rat translational initiation factor (elF-2) alpha subunit
mRNA
M14053 Rat glucocorticoid receptor mRNA

 

Table 2.2: 45 genes involved in the metabolic network

 

 

Accession
No. number Gene Name
1 L37333 glueose-6-phosphatase (G6Pase) mRNA, complete cds
2 X58865 X58865m RNA Rat PFK-L mRNA for liver phosphofructokinase
3 U32314 U32314 Rattus norvegicus pyruvate carboxylase mRNA
4 K03243 Phosphoenolpyruvate carboxykinase (GTP) gene, exons 1-3
U07177 Rattus norvegicus lactate dehydrogenase-C (LDH-C)
5 U071 77 mRNA
6 M54926 M54926 Rat lactate dehydrogenase A mRNA, 3 end
U07181 Rattus norvegicus lactate dehydrogenase~B (LDH-B)
7 U071 81 mRNA
8 U75393 succinyl-CoA synthetase alpha subunit mRNA,
9 J04473 J04473 Rat mitochondrial fumarase mRNA, complete eds
10 K03041 ornithine carbamoyltransferase mRNA
11 J02720 J02720 Rat liver arginase mRNA, complete eds
12 227513 gene for carbamoylphosphate synthase 1, exon 38
13 010354 010354 RATAAT Rat mRNA for alanine aminotransferase
J03865mRNA RATSDH22 Rat serine dehydratase (SDH2) gene, 3
14 J03865 ﬂank
X07467 Rat mRNA for glucose-6-phosphate dehydrogenase (Gd,
15 X07467 EC 1 .1 .1.49)
16 M16235 M16235 Rat hepatic lipase mRNA
X78593 R.norvegicus mRNA for glycerol-3-phosphate
1 7 X78593 dehydrogenase
L46791 Rattus norvegicus cholesterol esterase mRNA, complete
18 L46791 eds
19 J05446 J05446 Rat glycogen synthase mRNA, complete eds
M76767 RATFASA Rattus norvegicus fatty acid synthase mRNA,
20 M76767 complete eds
Rat long chain acyl-CoA dehydrogenase (LCAD) mRNA, complete
21 J05029 eds
038448 Rat mRNA for 88k0a-diacylglyeerol kinase (DGK-lll),
22 038448 com plete eds
AF017251 Rattus norvegicus phospholipase 0 (PLDs) mRNA,
23 AF017251 complete cds
24 X12752 X12752 Rat gene for DNA binding protein C/EBP
CCAAT binding transcription factor-B subunit (CBF-B) mRNA,
25 M34238 complete eds
X54423cds RNHNF1 Rat mRNA for hepatic nuclear factor one
26 X54423 (HNF1)
X571 33m RNA RNHNF4 Rat mRNA for hepatocyte nuclear factor 4
27 X57133 (HNF4)
Y1 4933m RNA RNHNF6B R.norvegicus mRNA for hepatocyte
28 Y14933 nuclear factor 6 beta
29 X55955 X55955 Rat mRNA for hepatocyte nuclear factor 3A (HNF-3A)

22

Table 2.2 (continued)

30 L09647 hepatocyte nuclear factor 3a (HNF-3 beta) mRNA, complete eds
31 K03486 K03486 RATPKC32 Rat protein kinase C type III mRNA, 3 region
32 M15523 protein kinase C-family related mRNA, partial eds, clone RP16
33 M55417 protein kinase C-gamma (PRKC-gamma) gene, exon 1
34 X07286 X07286eds RNPKCAR Rat mRNA for protein kinase C alpha
35 X07287 X07287cds RNPKCG Rat mRNA for protein kinase C gamma
X54096 Rat mRNA for lecithin-cholesterol acyltransferase (EC
36 X54096 2.3.1.43) (LCAT)
X53588 Rat mRNA for glucokinase, alternatively spliced GK2 (EC
37 X53588 2.7.1.1)
ABOO4329 Rattus norvegicus mRNA for acetyl-CoA carboxylase,
38 ABOO4329 complete eds
39 X55286 X55286 R.norvegicus mRNA for HMG-CoA reductase
U03763eds RRU03763 Rattus rattus phospholipase mRNA,
40 U03763 complete eds
41 L03294 L03294 Rattus norvegicus Iipoprotein lipase mRNA, complete eds
42 L06040 L06040 Rattus norvegicus 12-lipoxygenase mRNA, complete eds

type II pneumoeyte CD36-related class B scavenger receptor
(SRB1 R) mRNA
JO5460 RATCHOL7H Rat cholesterol 7-alpha-hydroxylase mRNA,

43 AF071495

 

44 J05460 complete eds
S775280ds rNFIL-6=C/EBP-related transcription factor [rats,
45 877528 Genomic/mRNA, 1759 nt]
2.3.3 Results

GA/PLS was applied to gene expression and metabolic data obtained from 4
culture conditions, [H, 0], [H,A], [L,0], [L,A], as described in the Methods. The data was
separated into 3 groups, a training ([H,A], [L,Oj), monitoring ([H, 0]), and testing ([L,A])
set. We investigated the ability of GA/PLS to select a subset of genes able to predict
hepatic frmctions (intracellular TG accumulation and urea synthesis). In addition, the role
of the selected genes in regulating the aforementioned functions was reconstructed with
the aid of a literature review. Finally, we evaluated the robustness of the GA/PLS model
to noise in the gene expression data.

Genes Selected for Intracellular TG

The frequency of each gene selected by GA/PLS in the multiple runs for
intracellular TG is plotted in Figure 2.5A. 59 genes were selected with a frequency

23

greater than 8 and are listed in Table 2.3. The selected genes can be categorized into the
following groups: fatty acid and lipid metabolism, transcription factors, signal pathways,
electron transport chain and oxidative phosphorylation, and ion transporter. A PLS model
with two latent variables was constructed to predict intracellular TG based upon the
expression level of the 59 genes. As shown in Figure 2.5B, intracellular T0 was
predicted well using the GA/PLS model (mean square error is 0.0425).

14-

Frequen cy

ll|llll|' will] ‘1 "

 

 

1 9 17 25 33 41 49 57 65 73 81 89 97105113122130140150

Genes

A

FIGURE 2.5: TG gene selection and metabolic function prediction. 2.5A) TG gene frequency.
GA/PLS was run 14 times with different initial populations. The ﬁ'equency at which each gene appeared in
the 14 solutions was counted and plotted. These genes that appeared with a ﬁ'equency higher than 8 were
selected into the ﬁnal gene subset (shown in Table 2.3) for prediction and pathway reconstruction

24

 

 

 

 

 

 

 

0.8 . .
. 5 GT6 observation
0.6 _* ____________________ ’ _____________________ .' _______________ ATG prediction
‘ i ,A i i
0.4 - ---------------------- 7 --------------------- :g-[H-"l ------------- y ---------------------- g -----------------------
0: ' i i
”-1 ‘ """""""""""" ;1 mo] """"""""""""""""""" """"""""""" """"""""""""
4 0 ' . i i
2 . s a
3-0.2 ~+- ----------------------
3' [LO] §
-04 - -------------------------------------------------------------------- é --------------------- § ------ IEAI ---------
E A
no - ------------------------------------------------------------------- § ----------------------
-03
Experlmertd condition 3

FIGURE 2.5: TG gene selection and metabolic function prediction.. 2.5B) PLS prediction of TG. The
level of TG accumulation was predicted based upon the expression level of the genes selected by GA/PLS.
[H,A], [L,O] was used for training, [H,O] was used for monitoring, and [H,A] was used for testing and
validating the PLS model. The mean square error of the PLS prediction model was 0.0425.

Table 2.3 A. Genes selected for intracellular triglyceride

 

 

Functional Accession
Category Number Gene name Frequency
Fatty acid and lipid
metabolism
090109 mRNA for long-chain acyl-CoA synthetase (EC 6.2.1.3) 8
J03960 5-lipoxygenase mRNA, complete cds 8
X04979 gene for apolipoprotein E /cds=(23,958) 11
M16235 Rat hepatic lipase mRNA 8
X78593 mRNA for glycerol-3-phosphate dehydrogenase 13
long chain acyl-CoA dehydrogenase (LCAD) mRNA,
J05029 complete cds 9
AF017251 phospholipase D (PLDs) mRNA, complete eds 11
ABOO4329 mRNA for acetyl-CoA carboxylase, complete eds 12
U03763 phospholipase mRNA, complete cds 8
L06040 12-lipoxygenase mRNA, complete cds 8
cholesterol side-chain cleavage enzyme mRNA
J05156 (P4508CC), complete cds 8

25

Table 2.3A (continued)

J05460

X54096

Transcription factor

M88592
M6061 7
X54250

L09647

lon transporter

090048
016101
U78090

M74494
M5921 1
J04629

Cholesterol 7-alpha-hydroxylase mRNA, complete eds
X54096 Rat mRNA for lecithin-cholesterol acyltransferase
(EC 2.3.1.43) (LCAT)

Peroxisome proliferator activated receptor(PPAR) mRNA
Rat CCAAT binding transcription factor-B subunit

mRNA for zinc ﬁnger protein AT-BP2

hepatocyte nuclear factor 3a (HNF-3 beta) mRNA,
complete eds

Na+,K+—ATPase(EC3.6.1.3) beta2 subunit gene
mRNA for sodium/glucose cotransporter

Potassium channel regulator 1 mRNA, complete eds
sodium/potassium ATPase alpha-1 subunit truncated
isofonn mRNA,

Potassium channel Kv3.2b mRNA
(Na+, K+)—ATPase-beta-2 subunit mRNA

Electron transport chain Oxidative phosphoration

K00996

K01721
M20131

U39943

AF0001 39

Signal pathways

Others

U93306
M80784
U1 1681
U48596
083538

D10891
D6341 1
U37099
X06889
Z1 1932
L10072
M60655

M1 5523
J04486
L27061

J02646
010354
J03865
X07467

RATCYP45E Rat cytochrome p-450e (phenobarbital-
induced) mRNA, 3 end

RATCYP45A Rat P-450(1) variant (phenobarbital-inducible)
cytochrome 3 end

RATCYP45| Rat cytochrome P450llE1 gene, complete cds
cytochrome P450 monooxygenase (CYP2J3) mRNA,
complete eds

25-hydroxyvitamin 0 1-hydroxylase (CYP1) mRNA,
complete eds

VEGF receptor-2/FLK-1 mRNA

type III TGF-beta receptor mRNA, complete eds
Rapamycin and FKBP12 target-1 protein (rRAFT1) mRNA
MAP kinase kinase kinase 1 (MEKK1) mRNA

mRNA for 230kDa phosphatidylinositol 4-kinase

mRNA for metabotropic glutamate receptor mGluR5,
complete eds

mRNA for mitochondrial precursor receptor, complete cds
small GTP-binding protein (rab3c) mRNA, partial cds

Rat res-related mRNA rab3

mRNA for vasopressin V2 receptor

5-hydroxytryptamine receptor (5HT5a) mRNA

alpha-1 B adrenergic receptor mRNA

protein kinase C-family related mRNA, partial eds, clone
RP16

insulin growth factor-binding protein mRNA
Phosphodiesterase mRNA 3 end

Rat translational initiation factor (elF-2) alpha subunit
mRNA

mRNA for alanine aminotransferase

serine dehydratase (SDH2) gene, 3 ﬂank
mRNA for glucose-6-phosphate dehydrogenase (Gd, EC

26

A
OD

cocoon

—l —b
o°°o

OCDOO

10
11

12

Table 2.3A (continued)

 

 

 

1.1.1.49)
AB006137 FT A mRNA for alpha 1.2-fucosyltransferase 9
X12459 mRNA for argininosuccinate synthetase (EC 6.3.4.5) 8
U39044 cytoplasmic dynein intermediate chain 2A mRNA 9
M64033 secretin gene, complete cds 9
M27433 germinal histone H4 gene 9
NAD-dependent 15-hydroxyprostaglandin dehydrogenase
U44750 mRNA 8
X62660 mRNA for glutathione transferase subunit 8 8
mRNA for hepatic mierosomal UDP-
Y00156 glucuronosyltransferase (UDPGT) 10
AF012891 frizzled related protein frpAP mRNA 10
U18982 fos-related antigen 2 (fra-2) mRNA, complete cds 9
cyclophilin 0 mRNA, nuclear gene encoding mitochondrial
U68544 protein 10
Table 2.3 B. Genes selected for urea synthesis
Functional Accession
Catggory Number Gene name Frequency
Urea cycle
013978 m RNA for argininosuccinate lyase 12
X12459 argininosuccinate synthetase (EC 6.3.4.5) 11
K03041 Ornithine carbamoyltransferase mRNA 8
227513 gene for carbamoylphosphate synthase I, exon 38 1O
M18467 mitochondrial aspartate aminotransferase mRNA 9
gluconeogensis
U32314 Pyruvate carboxylase mRNA 9
phosphoenolpyruvate carboxykinase (GTP) gene,
K03243 exons 1-3 9
L37333 Glucose-6-phosphatase (G6Pase) mRNA 1 1
X58865 PFK-L mRNA for liver phosphofructokinase 9
TCA cycle
AF093773 Cytosolic malate dehydrogenase mRNA 9
Transcription factor
X12752 gene for DNA binding protein C/EBP 12
877528 rNFIL-6=C/EBP-related transcription factor 12
X55955 Hepatocyte nuclear factor 3A (HNF-3A) 10
Electron transport chain Oxidative phosphoration
cytochrome P450 monooxygenase (CYP2J3) mRNA,
U39943 complete cds 8
RATCYP45A Rat P-450( 1) variant (phenobarbital-
K01721 inducible) cytochrome 3 end, ﬂank 10
mitochondrial cytochrome oxidase subunits l,ll, lll
J01435 genes, 9
diabetes-inducible cytochrome P450RLM6 [rats, liver,
$48325 mRNA Partial, 1093 nt] 8
E03344 cDNA sequence of peroxisome forming factor 10

27

Table 2.38 (continued)

K00750
$48325

Signal pathways

Others

U1 21 87
L271 29
083538
AF085693
L04672

L1 31 51

L22558
AFO41 246
M60655
A8009636

M95058

AFOOO1 39
AF01 2891

878284

Y001 56
ABOO4278
N000347

06341 1
J04807

L27061

X62660
X65296
U61 772
U35775
X66494
X86086
X62528

U57362
D49980

U53859
084418
M1 5883

J02646

L46791
X55286

AFO71495

Cytochrome c nuclear-encoded mitochondrial gene
diabetes-inducible cytochrome P450RLM6

ras-related protein (rad) mRNA

Stress activated protein kinase mRNA

mRNA for 230kDa phosphatidylinositol 4-kinase
GTPase-activating protein (GIT1) mRNA

G protein-coupled receptor mRNA, complete cds
GTPase—activating protein (GAP) gene, complete cds
adenylyl cyclase-activated serotonin receptor (5-HT7)
mRNA

orexin receptor-2 mRNA
alpha-1 B adrenergic receptor mRNA
mRNA for phosphoinositide 3-kinase

steroid 5-alpha-reductase 2 mRNA
25-hydroxyvitamin D 1-hydroxylase (CYP1) mRNA,
complete cds

frizzled related protein frpAP mRNA
bcl-xshort=apoptosis inducer [rats, ovary, mRNA
Partial, 537 nt]

mRNA for hepatic microsomal UDP-
glucuronosyltransferase (UDPGT)

mRNA for protocadherin 2, partial cds

mRNA for 3 (2 ),5 -bisphosphate nucleotidase
mRNA for mitochondrial precursor receptor, complete
cds

insulin ll gene mRNA, 3 end

Phosphodiesterase mRNA, 3 end

mRNA for glutathione transferase subunit 8

mRNA for carboxylesterase (Es-HVEL)

U61772 Rattus norvegicus merlin (NF2) mRNA
U35775 Rattus norvegicus gamma-adducin mRNA
X66494 R.norvegicus CHOT1 mRNA
X86086 R.norvegicus RNA for annexin VI
X62528 R.rattus mRNA for ribonuclease inhibitor
U57362 Rattus norvegicus collagen Xll alpha 1
(Col12a1) mRNA

D49980 Rat gene for 8-oxo-dGTPase

U53859 Rattus norvegicus calpain small subunit (css1)
mRNA

084418 Rat mRNA for chromosomal protein HMGZ
M15883 Rat clathrin light chain (LCBZ) mRNA
J02646 Rat translational initiation factor (elF-2) alpha
subunit mRNA

L46791 Rattus norvegicus cholesterol esterase mRNA,
complete cds
X55286 R.norvegicus mRNA for HMG-CoA reductase
type II pneumocyte CD36-related class B scavenger
receptor (SRBi R) mRNA

A
oooco

CDCDQCOCD

cocoa“:

10

 

28

To assess the relevance of the genes that were selected by the GA/PLS framework
to predict the speciﬁed metabolic function(s), we evaluated the role of these genes in
connection with these functions by way of a literature review. Many of the genes selected
by GA/PLS were closely related to enzymes that are involved in reactions that affect TG
levels or its precursors. Gene M16235 encodes hepatic lipase enzyme which reduces TG
to glycerol and fatty acids. Gene D90109 and J 05029 encode long-chain aeyl-CoA
synthetase and dehydrogenase enzymes, respectively. These two enzymes catalyze the
initial reactions of fatty acid B-oxidization and reduce the level of long chain aeyl-CoA
which is an important precursor in TG synthesis. Similarly, genes involved in mierosomal
oxidation and fatty acid synthesis enzymes were also identiﬁed. The P450 family of
genes (such as M20131), encode mierosomal oxidase enzyme, which is involved in the
oxidation of fatty acid to a-hydroxy fatty acids.

Conversely, gene ABOO4329 encodes aeetyl-CoA carboxylase enzyme, which
catalyzes the committed step in fatty acid synthesis to produce malonyl-CoA from acetyl-
CoA. The level of fatty acid affects the level of fatty acyl-CoA, which in turn affects TG
accumulation. Phosphatidie acid is another important precursor of TG. Gene AF 017251
and U03763 encode the phospholipase D and phospholipase A2 enzymes, respectively,
which reduce phosphatidyleholine to phosphatidic acid. Another gene X78593 encodes
the glycerol-3-P dehydrogenase enzyme which is involved in the ﬁrst committed pathway
to produce phophatidic acid from dihydroxyacetone-P.

In addition, GA/PLS also identiﬁed genes involved in the signal pathways that are

relevant to TG accumulation, notably, the phosphodiesterase (L27061) and protein kinase

29

C (M15523) genes. Insulin activates phosphodiesterase to reduce the CAMP level and
thus counteracts glucagon's ability to activate TG lipase to hydrolyze TG. Protein kinase
C (PKC) is involved in Ca2+ related signal transduction. A high level of intracellular Ca2+
activates PKC, which attenuates insulin's action and activates lipolysis [Idn's 2001].
Interestingly, three genes (D90048, M74494, 104629) encoding Na+, K+-ATPase were
also identiﬁed. The activities of Na+, K+-ATPase are signiﬁcantly decreased in plasma
membranes of dietary obese rats [Izpisua 1989]. The reason for this down-regulation is
currently not known. Although, Na+, K+-ATPase inhibition has been found to be involved
in increasing Ca2+ level [Xie 2002].

The model also identiﬁed genes encoding transcription factors that regulate lipid
metabolism. Gene M88592 encodes the transcription factor PPAR-a, which has been
found to regulate the expression of many enzymes involved in fatty acid oxidation and
synthesis, e. g. fatty acid synthetase, fatty acid oxidase, etc. [Smith 2002]. It acts as a lipid
sensor to modulate cellular capacity for fatty acid oxidation and synthesis. Gene M60617
encodes the CCAAT binding transcription factor. C/EBP family of transcription factors
are involved in acute phase and inﬂammatory response of the liver as well as lipid
metabolism. This transcription factor regulates fatty acid synthase (FAS) expression
[Roder 1999] and activates genes of cholesterol and fatty acid metabolism with sterol
regulatory element-binding protein (SREBP) as a co-regulator [Magana 2000].

Indeed, many of the genes selected by the GA/PLS model are closely related to
TG metabolism. Based upon the current knowledge from the literature, we reconstructed
the role of these genes and gene products in the regulating TG metabolism. For

illustration purposes, only the genes enumerated above and their role in regulating TG

3O

accumulation are shown in Figure 2.6. This GA/PLS framework provides a mechanism to
identify potential areas (pathways) for further study in optimizing, in the current study,

minimizing cellular TG accumulation.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Phosphatidie acid Choline
AFO|725I
; x;
PLDl p’ i
K PPAR cm:
NA K Diacyl-glycerol phosphatidyleholine . jw
\
ATpase \ ’1
\ ,/
I Transcriptional
' reanlation
N‘K ”" Intra TG Insulin :
ATPase I g...) 1‘chch A
l
31c lipase 1' ﬂ diesterase >- ABOO4329 . __
glycm' Acyl-CoA Lc-ACS , Fatty acid F AS H Malonyl-CoA
I

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

l

’\ Lc-acyl-coa < Microsomal F-
, G3pdh A D90109 oxidase

dehydrog . .. __

X73533/ r T
631’ Trans enoyl CoA I a-hydroxy fatty acid

O l: O

Enzyme or transcription factor Metabolites Genes

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

FIGURE 2.6: Pathway reconstruction for TG. The pathways were reconstructed by identifying the
biological function of the genes selected by GA/PLS based upon current literature knowledge. The
elements of the pathways include gene, enzyme or gene products, and metabolites. The genes include lipid
and fatty acid metabolism genes, PPAR-a gene, P450 genes, and Nath-ATPase genes. These genes are
known in literature to affect TG metabolism. Dotted line indicates an indirect relationship.

Genes Selected for Urea Synthesis
GA/PLS also was applied to select genes relevant to urea synthesis. It selected 57

genes which are listed in Table 2.3B. Based on the expression level of these 57 genes, the

31

GA/PLS model predicted urea synthesis very well (with a mean square error of 0.0125)
as illustrated in Figure 2.7. The role of these genes in regulating urea synthesis was

investigated by reviewing the published literature.

 

 

 

 

 

 

 

 

 

1 E E a E E i ' urea wwﬁm
: : 5 : 5 A urea predichon
0.8 - ----------------------- ----------- :~ ---------- 3- -----------
0.6 .1 ....................... ...................... é .......... 1
i i 5 M1
0.4 -* -----------
“‘4'“ E
5 01 ~ ------------------------ g ----------------------- ¢ ---------- . ----------- -----------------------------------
g 0 . . . . r
g s : . z
-0.2 _,t, ............................................... :. ...................................
A
.114 -1 ...................................................................... f. ...................................
I 9
.. ........................................... [15,0] ...............................
-06 ["10] §
-0.8 ' '
Experimertal condtion

FIGURE 2.7: PLS prediction of urea synthesis. The level of urea synthesis was predicted based upon the
expression level of the genes selected by GA/PLS. [H,A], [L,O] was used for training, [H,O] was used for
monitoring data, and [H,A] was used for testing and validating the PLS model. The mean square error of
the PLS prediction model was 0.0125.

Urea cycle is essential for proper disposal of ammonia in mammals. The cycle is
mainly regulated by ﬁve enzymes: carbamoyl phosphate synthetase-I (CPS-I; EC
6.3.4.16), ornithine transcarbamylase (OTC; EC 2.1.3.3), argininosuccinate synthetase
(AS; EC 6.3.4.5), argininosuccinate lyase (AL; EC 4.3.2.1), and arginase (EC 3.5.3.1). It
is encouraging that the genes encoding four of the enzymes CPS-I (227513), OTC

(K03041), AS (X12459) and AL (D13978) were selected by GA/PLS.

32

An upstream enhancer required for liver-speciﬁc expression and hormonal
induction of the rat CPS-I gene has been identiﬁed. This enhancer region was revealed to
contain DNA elements that bind the transcription factors such as hepatocyte nuclear
factor 3 (HNF3) and C/EBP. Detailed transcriptional regulation of urea synthesis has
been reviewed in reference (24-26). It is interesting that GA/PLS identiﬁed two genes
(X12752 and 877528) that encode transcription factors C/EBP and C/EBP-related
transcription factors, respectively, and one gene X55955 that encodes the transcription
factor HNF30L. In addition, a C/EBP binding element has been found in the OTC and
arginase genes [Meijer 2000, Morris 1992, 2002].

GA/PLS also uncovered the coupling between urea cycle, TCA cycle and
glueoneogenesis. GA/PLS identiﬁed a group of genes involved in gluconeogenesis and
TCA cycle that are relevant to urea synthesis. The genes include pyruvate carboxylase
(U32314), phospheoenolpyruvate carboxykinase (K03243), g1ucose-6-phosphatase
(G6Pase) (L37333), aspartate aminotransferase (M18467), cytosolie malate
dehydrogenase (AF093773) and phosphofructokinase (X58865) genes. Parallel increases
in gluconeogenesis and urea synthesis have been found in rat liver during prolonged
starvation or by hormonal stimulation [Martin-requero 1992]. The anaplerotic effect of
increasing the ﬂux through pyruvate carboxylase may enhance the production of
aspartate, which drives urea synthesis by producing more arginosuccinate.

Using the genes highlighted above, we reconstructed the pathways involved in
regulating urea synthesis and illustrated in Figure 2.8. Urea synthesis is regulated by
genes encoding the enzymes of urea cycle, gluconeogenesis, TCA cycle and transcription

factors, such as C/EBP and HNF families.

33

 

002+NH4+ATP

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

< M18467 >

 

 

 

 

 

f

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

[Phosphoenobymvate ]

 

,_(
PC

 

 

 

 

1
CPS-I ~69 LAspartate v i PEPCK
t
Carbamoyl < AST >
' ' Citruline Oxalaeetate
/ , As
OTC _
K03041 x12459
I UREA argninosuccinate TCA
Transcriptional ‘rgin'se CYCLE CYCLE
regulation j_\
\ \ @ A.
CEBP 5 HNF , ,
A ‘\ argmmase
I \ ‘- \ \ fumarate

O

Enzyme or transcription factor

FIGURE 2.8: Pathway reconstruction for urea synthesis. The pathways were reconstructed by
identifying the biological function of the genes selected by GA/PLS. The elements of the pathways include
gene, enzyme or gene products, and metabolites. The genes include urea cycle genes, C/EBP and HNF
genes, glueoneogenie and TCA cycle genes. These genes are known in the literature to affect urea synthesis.

 

 

 

Dotted line indicates an indirect relationship.

2.3.4 Model Robustness Evaluation

Gene expression data is typically noisy, therefore, we evaluated the effect of noise

C]

Metabolites

 

 

 

0

Genes

1

addition to the gene expression data on our model results. To test the robustness of the

model, 10% noise was artiﬁcially added to the gene expression data. Noise in the form of

a Gaussian distribution was generated according to the following equation,

Noise=N*o;,*R

34

where N is a vector of random numbers of length n following a normal distribution, the o;
is the standard deviation of the data, and the R is any number between [0,1]. The
robustness of the GA/PLS framework was evaluated by comparing the genes selected
before and after noise addition.

The genes selected after noise addition are enumerated in Table 2.4A and Table
2.43 for intracellular TG and urea production, respectively. 31 of the 59 genes remain
selected for TG and 34 of the 57 genes remain selected for urea synthesis (highlighted in
bold, Table 2.4). Of the 59 genes selected for TG before noise addition, 14 were from the
supervised group (the 45 genes added aﬁer t—test). It is notable that 11 of the 14 genes
selected from the supervised group remained the same after noise addition. Similarly, 12
of the 13 genes selected from the supervised group remained for urea synthesis. In the
case of TG, most of the fatty acid and lipid metabolism genes such as long-chain fatty
aeyl-CoA synthetase and dehydrogenase, phospholipase, hepatic lipase, acetyl-CoA
carboxylyase genes remain selected. In addition, the CCAAT binding transcription factor,
Na+,K+ ATPase and P450 family genes were also selected after noise addition. In the case
of urea synthesis, the urea cycle enzymes, e.g. AS, OTC, CPS-I, aspartate
aminotransferase genes; and transcription factors genes, C/EBP and HNF; and
glueoneogenesis enzyme genes, such as pyruvate carboxylase, G6Pase,
phosphofructokinase and phosphoenolpyruvate carboxykinase genes remain selected
after noise addition. This suggests that GA/PLS framework is robust to a certain level of
noise in the gene expression data. Nevertheless, several genes such as the PPAR gene for

TG and AL gene for urea synthesis disappeared after noise addition.

35

Table 2.4 A. Genes selected for intracellular triglyceride after noise addition

 

 

Functional Accession
Category Number Gene name Frequency
Fatty acid and lipid
metabolism
cholesterol side-chain cleavage enzyme mRNA
J05156 (P45OSCC) 9
mRNA for long-chain acyl-CoA synthetase (EC
D90109 6.2.1.3) 8
M16235_at M16235 Rat hepatic lipase mRNA 11
X78593_at mRNA for glycerol-3-phosphate dehydrogenase 13
RATACOADA Rat long chain acyl-CoA
J05029_s_at dehydrogenase (LCAD) mRNA 9
mRNA for 88kDa-diaeylglyeerol kinase (DGK-
DS8448_at Ill), complete eds 10
AF017251_at phospholipase D (PLDs) mRNA, complete eds 11
mRNA for lecithin-cholesterol acyltransferase
X54096_at (EC 2.3.1.43) (LCAT) 9
mRNA for aeetyl-CoA carboxylase, complete
ABOO4329_at eds 1 2
L06040_s_at 12-lipoxygenase mRNA, complete cds 9
mRNA for glucose-6-phosphate dehydrogenase
X07467_at (Gd, EC 1.1.1.49) 11
Transcription factor
RRATBP2 Rat mRNA for zinc ﬁnger protein AT-
X54250 BP2, partial eds 9
CCAAT binding transcription factor-B subunit
M60617 (CBF-A1 1) mRNA 8
Ion transporter
Na+, K+ -ATPase (EC 3.6.1.3) beta2 subunit
090048 gene and 5 ﬂank 11
(CD-1) clone Ke1 Na-Ca exchanger mRNA,
U04934 partial eds 8
Potassium channel regulator 1 mRNA, complete
U78090 eds 10
Electron transport chain Oxidative phosphoration
E03344 cDNA sequence of peroxisome forming factor 8
cytochrome c nuclear-encoded mitochondrial
K00750 gene and ﬂanks 9
P-450(1) variant (phenobarbital-indueible)
K01721 cytochrome 3 end, ﬂank 8
U39207 Cytochrome P450 4F5 (CYP4F5) mRNA 8
Signal pathways
U93306 VEGF reeeptor-2/FLK-1 mRNA 9
M80784 type III TGF-beta receptor mRNA, complete eds 9
U48596 MAP kinase kinase kinase 1 (MEKK1) mRNA 8
083538 mRNA for 230kDa phosphatidylinositol 4-kinase 8
A09811 mRNA for BRL-3A binding protein 8
ADP ribosylation factor GTPase-activating
AF085693 protein (GlT1) mRNA 10

36

Table 2.4A (continued)

Others

L04672

L22558
M60655
J04486
M1 4053

M1 5523_s_at

AF0001 39
M95058

$78284
U1 8982
YOO‘I 56

ABO10154
ABO11530

D1 3978
D7921 5
X62660
X65296
M27433

U44750

J05031

U35775
X66494
X86086

X91234
U53859

J02646

G protein-coupled receptor mRNA, complete
eds

adenylyl eyelase-activated serotonin receptor
(5-HT7) mRNA

Rat alpha-1 B adrenergic receptor mRNA
insulin growth factor-binding protein mRNA
Rat glucocorticoid receptor mRNA

Rat protein kinase C-family related mRNA,
partial eds, clone RP16

25-hydroxyvitamin D 1-hydroxylase (CYP1)
mRNA, complete eds
steroid 5-alpha-reduetase 2 mRNA
bel-xshort=apoptosis inducer [rats, ovary, mRNA
Partial, 537 nt]

fos-related antigen 2 (fra-2) mRNA, complete
eds

mRNA for hepatic mierosomal UDP-
glucuronosyltransferase (UDPGT)

PKN mRNA for serinlthreonine protein kinase
expressed in hippocampus

mRNA for MEGF4, complete eds

mRNA for argininosuccinate lyase, complete
eds

mRNA for FGF—10, complete eds

mRNA for glutathione transferase subunit 8
mRNA for carboxylesterase (Es-HVEL)
germinal histone H4 gene

MAD-dependent 15-hydroxyprostaglandin
dehydrogenase mRNA

Rat isovaleryl-CoA dehydrogenase (IVD)
mRNA

gamma-adducin mRNA

CHOT1 mRNA

RNA for annexin VI

mRNA for 17-beta hydroxysteroid
dehydrogenase type 2

calpain small subunit (css1) mRNA

Rat translational initiation factor (elF-2) alpha
subunit mRNA

oooocooo (D

(O

A
cococooooo coo

A
oeuro co

once on

 

37

Table 2.4 B. Genes selected for urea synthesis after noise addition

 

 

Functional Accession
Category Number Gene name Frequency
Urea cycle
X12459 mRNA for argininosuccinate synthetase (EC 6.3.4.5) 8
M18467 mitochondrial aspartate aminotransferase mRNA 9
mRNA RATOTCB Rat (Sprague-Dawley) ornithine
K03041 carbamoyltransferase mRNA 8
227513 gene for carbamoylphosphate synthase l, exon 38 11
gluconeogensis
glucose-6—phosphatase (G6Pase) mRNA, complete
L37333 eds 9
X58865 mRNA for liver phosphofructokinase 9
U32314 pyruvate carboxylase mRNA 12
Rat phosphoenolpyruvate carboxykinase (GTP)
K03243 gene, exons 1-3 1 2
mRNA for glucokinase, alternatively spliced GK2 (EC
X53588 2.7.1.1) 11
J05446 Rat glycogen synthase mRNA, complete cds 9
TCA cycle
cytosolie malate dehydrogenase (Mdh) mRNA,
AF093773 complete eds 9
Transcription factor
peroxisome proliferator activated receptor (PPAR)
M88592 mRNA, 8
J03170 liver speciﬁc transcription factor (LF-Bt) gene 9
X12752 gene for DNA binding protein C/EBP 11
X55955 mRNA for hepatocyte nuclear factor 3A (HNF-3A) 12
rNFIL-6=C/EBP-related transcription factor [rats,
$77528 Genomic/mRNA, 1759 nt] 9
Electron transport chain Oxidative phosphoration
RATCYP45E Rat cytochrome p~450e (phenobarbital-
K00996 induced) mRNA, 3 end 10
RATCYP45A Rat P-450(1) variant (phenobarbital-
K01721 inducible) cytochrome 3 end, ﬂank 8
E03344cds cDNA sequence of peroxisome forming
E03344 factor 9
cytochrome P450 monooxygenase (CYP2J3) mRNA,
U39943 complete eds 10
Signal pathways
U12187 ras-related protein (rad) mRNA 9
U48596 MAP kinase kinase kinase 1 (MEKK1) mRNA 10
083538 mRNA for 230kDa phosphatidylinositol 4-kinase 9
AF082126 aryl hydrocarbon receptor (AHR) mRNA 8
ADP ribosylation factor GTPase-activating protein (GlT1)
AF085693 mRNA 12
mRNA for metabotropic glutamate receptor mGluR5,
010891 complete eds 12

38

Table 2.48 (continued)

Others

06341 1
J04807
U37099
X06889
Z1 1932

L22558
L1 0072
J04486

A3009636

M14053

M55417

090048
M95058

U1 1681
878284
U1 8982

Y00156

ABOO4278

ABO10154

$52878

$76404
U04934

X62660
E02468

E01 884
M27433
X86086
X62528
U53859

ABOOG1 37

0901 09
J02646

L467 91
X55286

AF071495

Rat mRNA for mitochondrial precursor receptor,
complete eds

Rat insulin II gene mRNA, 3 end

small GTP-binding protein (rab3c) mRNA, partial eds
ras-related mRNA rab3

mRNA for vasopressin V2 receptor

adenylyl cyclase-activated serotonin receptor (5-HT7)
mRNA

5-hydroxytryptamine receptor (5HT5a) mRNA

insulin growth factor-binding protein mRNA

mRNA for phosphoinositide 3-kinase

Rat glucocorticoid receptor mRNA

Rat protein kinase C-gamma (PRKC-gamma) gene, exon
1

Rat Na+, K+ -ATPase (EC 3.6.1.3) beta2 subunit gene
and 5 ﬂank

M9505800mpleteSeq Rattus rattus steroid 5-alpha-
reductase 2 mRNA

U11681 Rattus norvegicus rapamycin and FKBP12
target-1 protein (rRAFT1) mRNA

apoptosis inducer [rats, ovary, mRNA Partial, 537 nt]
fos-related antigen 2 (fra-2) mRNA, complete eds
mRNA for hepatic mierosomal UDP-
glucuronosyltransferase (UDPGT)

mRNA for protocadherin 2, partial eds

mRNA for serinlthreonine protein kinase expressed in
hippocampus, partial eds

fatty acid-binding protein homolog [rats, ileum, mRNA,
460 nt]

37640182 beta -HKA=H,K-ATPase beta-subunit [rats,
Genomic, 8983 nt, segment 2 of 2]

clone Kc1 Na-Ca exchanger mRNA, partial eds
X62660m RNA RRGTSB R.rattus mRNA for glutathione
transferase subunit 8

E02468cds DNA sequence of rat TNF

E01884eds DNA sequence coding for rat IL-1-
beta(interleukin-1 beta)

M27433 Rattus norvegicus germinal histone H4 gene
X86086 R.norvegicus RNA for annexin VI

mRNA for ribonuclease inhibitor

calpain small subunit (csst) mRNA

mRNA for alpha 1,2-fucosyltransferase

Rat mRNA for long-chain acyI-CoA synthetase (EC
6.2.1.3)

J02646 Rat translational initiation factor (elF-2) alpha
subunit mRNA

L46791 Rattus norvegicus cholesterol esterase mRNA,
complete eds

mRNA for HMG-CoA reductase

type II pneumocyte C036-related class B scavenger
receptor (SRBt R) mRNA

39

10

11

cocooocoooco coco coco on (D

co

Table 2.48 (continued)

cholesterol side-chain cleavage enzyme mRNA
J05156 (P4SOSCC), complete cds 9

 

2.4 Discussion

We applied a GA/PLS framework to gene expression and metabolic data to identify a
subset of genes whose expression levels could predict metabolic functions. Subsequently,
part of the regulatory pathways was reconstructed based upon the functional information
of the genes. The selection of relevant genes was obtained using GA guided by a ﬁtness
function (equation 2.7) that was deﬁned by the prediction error and the number of latent
variables. No a priori information was considered in equation 2.7. The gene selection
process was based solely upon the gene expression and metabolic data. This provides a
framework to uncover information from the data itself.

However, the information extracted by GA/PLS is redundant for both intracellular
TG and urea synthesis, i.e., many of the selected genes are not relevant to these cellular
functions. It was observed that the genes selected from the supervised group (45 genes
added after t-test) were more relevant to these cellular functions than from the
unsupervised group (1 11 genes selected by t-tcst). To reduce the redundant information,
an approach that weighs the genes based on a priori knowledge may be better suited. If
one knows that some genes are related and others are not related to a metabolic function,
this information can be incorporated into the ﬁtness function by adding a factor that
awards the inclusion of relevant genes and punishes the inclusion of irrelevant genes.
Each gene is assigned a score based upon its relevance to the metabolic ﬁrnction based

upon the literature. An overall gene relevance score (Gs in equation 2.10) can be obtained

40

by summing up the score of each individual gene and integrating it into a ﬁtness function
as follows:

1
26,- -y,-)2 +LVW1

 

ﬁtness = + Gs w2 (2.10)

To illustrate the ability of this ﬁtness ﬁmetion (equation 2.10) to incorporate domain
knowledge, we set up a hypothetical example. In this hypothetical example, we included
only the genes selected by t-test and assumed we know a priori that genes (D90109,
J04087, M88592, U37099) are relevant to intracellular TG accumulation and that the
other genes are not relevant. Therefore, a score of one was assigned to genes (090109,
104087, M88592, U37099), and a score of zero was assigned to all the other genes listed
in Table 2.1 for intracellular TG. The parameter w] and w2 were assigned values of 0.4
and 0.1, see methods for determination of the w] and w2, so that the prediction error,
number of latent variables and gene score are normalized to the same scale. Using this
new ﬁtness function in the GA/PLS model with the same gene expression and metabolic
data, we evaluated the robustness of the GA/PLS model to the addition of 10% noise in
the gene data. The genes selected before and after noise addition are listed in Tables 2.5A
and 2.53, respectively. From the results in Table 2.5, we note that the genes assigned a
relevance score of one were selected before and after noise addition and at a high
frequency (i.e., selected each time the model was run, regardless of the initial population),
which suggests that incorporating a priori knowledge of the relevance of the genes into
the ﬁtness function (equation 2.10) ensures that their selection is not affected by the noise
in the data. In addition to the genes that were deﬁned a priori to be relevant, other genes

such as P450 genes, Na+, K+-ATPase genes, which are not deﬁned a priori as relevant,

41

were also selected by the GA/PLS model with this new ﬁtness function, both before and
after noise addition. Therefore, by deﬁning an appropriate ﬁtness function, the GA/PLS
framework was able to incorporate available domain knowledge into the model as well as
able to recognize genes relevant to a metabolic ﬁmction without a prior knowledge. It is
possible that the relevance of a number of genes to a cellular function may currently not
be known. Therefore an advantage of not including a priori knowledge allows GA/PLS
model to be naive to any biases in the user and in turn may allow the uncovering of genes

that may be important and to be investigated further.

Table 2.5A. Genes selected for intracellular TG using fitness function in
equation 2.10

 

Function Accession
Category Number Gene name Freq Score
Lipid metabolism

 

mRNA for long-chain aeyl-CoA synthetase (EC

090109 6.2.1.3) 14 1
L27061 Phosphodiesterase mRNA 11 0
Transcription factor
peroxisome proliferator activated receptor (PPAR)
M88592 mRNA 14 1
J0317O liver speciﬁc transcription factor (LF-Bt) gene 10 0
CCAAT binding transcription factor-B subunit (CBF-
M60617 A1 1) mRNA 9 0
X54250m RNA RRATBPZ Rat mRNA for zinc ﬁnger
X54250 protein AT-BP2 12 0
ion transporter
J04629 (Na+, K+)—ATPase-beta-2 subunit mRNA 9 0
sodium/potassium ATPase alpha-1 subunit
M74494 truncated isoform mRNA 12 0
Electron transport chain Oxidative phosphoration
K00996 cytochrome p-450e (phenobarbital-induced) mRNA 8 0
cytochrome c nuclear-encoded mitochondrial gene
K00750 and ﬂanks 9 0
mitochondrial cytochrome oxidase subunits |,ll, lll
J01435 genes, 11 0
U39207 cytochrome P450 4F5 (CYP4F5) mRNA 8 0
E03344 cDNA sequence of peroxisome forming factor 8 0
Signal pathways
U93306 VEGF receptor-2/FLK-1 mRNA 14 0
M80784 type III TGF-beta receptor mRNA 13 0

42

Table 2.5A (continued;

Others

U68544
083538
J04807
U37099

AF085693

X74833

L22558
M60655
X04979

Y001 56
J03960

859893
U75923
M64033
M27433
J05031

X66494
X80477
X86086

X91234
U53859

J02646

clophilin 0 mRNA, nuclear gene encoding
mitochondrial protein

mRNA for 230kDa phosphatidylinositol 4-kinase
RATINSIIA Rat insulin ll gene mRNA

small GTP—binding protein (rab3c) mRNA

G protein-coupled receptor kinase-associated ADP
ribosylation factor GTPase-activating protein (GlT1)
mRNA

X mRNA for acetylcholine receptor beta-subunit
adenylyl cyclase-activated serotonin receptor (5-
HT7) mRNA

alpha-1 B adrenergic receptor mRNA
X04979 Rat gene for apolipoprotein E

mRNA for hepatic mierosomal UDP-
glucuronosyltransferase (UDPGT)
5-Iipoxygenase mRNA

La=autoantigen SS-B/La

isoleucyl tRNA synthetase mRNA

Rat secretin gene

Gerrninal histone H4 gene
lsovaleryl-CoA dehydrogenase (IVD) mRNA
X66494 R.norvegicus CHOT1 mRNA
X80477 R.norvegicus P2X mRNA
X86086 R.norvegicus RNA for annexin Vl
X91234 R.norvegicus mRNA for 17-beta
hydroxysteroid dehydrogenase type 2
calpain small subunit (css1) mRNA

translational initiation factor (elF-2) alpha subunit
mRNA

12 O
8 0
14 1
14 1
8 0
12 0
9 O
13 0
9 0
8 O
11 0
9 O
9 0
13 0
1O 0
9 0
9 0
9 0
12 0
12 0
8 0
9 0

 

Table 2.58. Gene selected for intracellular T6 with ﬁtness function in
equation 2.10 after noise addition.

 

Function Accession

 

Category Number Gene name Freq Score
Lipid metabolism
mRNA for long-chain acyl-CoA synthetase (EC
090109 6.2.1.3) 14 1
L27061 phosphodiesterase mRNA 11 0
Transcription factor
peroxisome proliferator activated receptor (PPAR)
M88592 mRNA 14 1
CCAAT binding transcription factor-B subunit (CBF-
M60617 A11) mRNA 13 0
X54250m RNA RRATBP2 Rat mRNA for zinc ﬁnger
X54250 protein AT-BP2 10 0
Ion transporter
M22253 M22253 Rattus norvegicus sodium channel I mRNA 8 0
sodium/potassium ATPase alpha-1 subunit
M74494 truncated isoform mRNA 10 0

43

Table 2.58 (continued)
Electron transport chain Oxidative phosphoration

M20131

Signal pathways

Others

U93306
M80784

U68544
J04807
U37099
X74833

L22558
M60655
X04979
U53859

Y001 56
J03960

A801 1530

010891
848325

E01 884
U75923
M64033
M27433
U61 772

U44750
J05031

X66494
X80477
X86086

X91 234

J02646

cytochrome P450llE1 gene

VEGF receptor-2/FLK-1 mRNA

type III TGF-beta receptor mRNA

cyclophilin 0 mRNA, nuclear gene encoding
mitochondrial protein

RATINSIIA Rat insulin II gene mRNA

small GTP—binding protein (rab3c) mRNA
mRNA for aeetyleholine receptor beta-subunit
adenylyl cyclase-aetivated serotonin receptor (5-
HT7) mRNA

alpha-1B adrenergic receptor mRNA

X04979 Rat gene for apolipoprotein E

calpain small subunit (css1) mRNA

mRNA for hepatic mierosomal UDP-
glucuronosyltransferase (UDPGT)

5-Iipoxygenase mRNA

mRNA for MEGF4

mRNA for metabotropic glutamate receptor mGluR5
848325 diabetes-inducible cytochrome P450RLM6
DNA sequence coding for rat lL-1-beta(interleukin-1
beta)

isoleucyl tRNA synthetase mRNA

Rat secretin gene

Gerrninal histone H4 gene

merlin (NF2) mRNA

NAD—dependent 15-hydroxyprostaglandin
dehydrogenase mRNA

lsovaleryl-CoA dehydrogenase (lVD) mRNA
X66494 R.norvegicus CHOT1 mRNA

X80477 R.norvegicus P2X mRNA

X86086 R.norvegicus RNA for annexin Vl

mRNA for 17-beta hydroxysteroid dehydrogenase
type 2

translational initiation factor (elF-2) alpha subunit
mRNA

13

14
14
14
12

12
10
10

10

06000 OOOOO 0000 0—3-30 00

©0000

O

 

In this chapter, GA/PLS was used to ﬁnd a set of possible solutions rather than a
single solution. With this method, multiple solutions of different genes give similar
prediction accuracy. By selecting genes based upon their frequency of appearance in the
multiple runs, we explored the sub-gene-space for high prediction accuracy. The

probability of signiﬁcant features (important genes) appearing in the sub-gene-space was

44

estimated based upon their frequency. The probabilistic nature of this method improved
the robustness of the GA/PLS approach. Increasing the number of runs enhances the
effectiveness of the probabilistic nature of gene selection. Although, GA can not
guarantee global optimization, it is possible to improve the performance of GA by
combining global optimization methods, such as simulated annealing.

GA/PLS can be applied to group genes according to the metabolic function they
are involved in. As opposed to gene clustering, GA/PLS used both metabolic proﬁle and
gene expression data to functionally group genes according to a metabolic function. In
addition, one gene can be assigned to more than one group, which is not permitted with
clustering methods. Typically, a gene may be involved in multiple pathways and may
regulate different cellular functions. For example, transcription factors could affect more
than one gene, which in turn may be involved in a number of different pathways. The
grouping of genes according to metabolic functions facilitates the reduction of a complex
biological system with thousands of genes to a system of functional subgroups. This
could help facilitate the optimization of cellular function.

A subsequent step after identifying the gene subsets is reconstructing the
regulatory pathways based upon the gene expression and metabolic proﬁles. Data driven
reverse engineering methods such as Bayesian network analysis, which has been used
successﬁrlly in reconstructing metabolic sub-networks from ﬂux data, [Li 2004] could be
used to infer regulatory networks at both the genetic and metabolic levels as well as the
interaction between the two levels. The resulting model could be used to suggest
hypothetical pathways involved in regulating metabolic functions, and thus likely

pathways for further experimental studies.

45

The current model does not incorporate translational and post-translational effects
in the gene selection process and thus may be a possible reason why some of the
important genes disappeared after noise addition. Incorporating other data types, such as
protein expression proﬁle, protein-protein and protein-DNA interactions would enhance
the ability of modeling frameworks to predict cellular and physiological function more
accurately.

In conclusion, GA/PLS was applied for the ﬁrst time to quantitatively integrate
experimental gene expression and metabolic ﬂux proﬁles for a cellular system. The
results indicate that this method is able to select a subset of genes capable of predicting a
metabolic function. In addition, using the selected genes, we were able to reconstruct the

pathways involved in regulating a metabolic function.

46

CHAPTER 3 A BAYESIAN FRAMEWORK TO INFER
BIOLOGICAL PATHWAYS AND NETWORK

3.1 Introduction

The introduction of array technology has made readily attainable thousands of genes and
proteins for a given metabolic or physiological state, providing a wealth of data from
which regulatory and functional relationships may be unraveled. A major challenge
limiting the application of these data is the ability to integrate this information, from the
gene through the metabolic level, in such a way that it reconstructs the biological process.
Moreover, environmental factors, in addition to the genetic wiring, contribute to the
regulation and control of the connections within the biological system. Thus, new
approaches and mathematical tools for analyzing this information are needed. The
motivation behind reverse engineering of pathways and networks from experimental data
is the belief that the data contain the structure of the biological system or phenotype.
Therefore, the goal is to provide a framework to extract the network structure and
connections from the experimental data, thus capturing the effects and changes due to
external stimuli and its interactions with the genetic wiring. In chapter 2, we introduced
GA/PLS to identify important genes relevant to a cellular function. We then used
information from the literature to manually reconstruct the pathways for TG and urea
functions, which depended upon the known information currently available. In order to
discover pathways from the data itself, we introduce a Bayesian framework in this

chapter.

47

As a ﬁrst level attempt at reconstruction, numerous studies (Friedman et al., 2000;
Pe'er et al., 2001) endeavor to infer gene regulatory pathways and networks from micro-
array and simulated data from prokaryotes and yeasts. Typically networks are not known
a priori, thus the validity and accuracy of these reconstructions cannot be accessed.
Instead, the reconstructed networks are used to suggest biological hypotheses.
Alternatively, our goal is to develop a mathematical framework that infers biochemical
pathways and networks from metabolic data from mammalian cells (systems). The
rationale for reconstructing metabolic networks ﬁrst, prior to applying this framework to
gene expression data, is to conﬁrm the ability of the proposed approach to reconstruct
known biological networks, such as the TCA and urea cycles. This approach provides a
degree of conﬁdence in the novel networks that may be reconstructed from experimental
data. Nevertheless, experimental veriﬁcation of these novel networks is ultimately still

required.

Numerous mathematical models, from Boolean or neural network models [Liang
et al., 1998] to deterministic (differential equation based) models [Chen et al., 1999] have
been applied to represent biological networks. Thus far, the former algorithms have been
used predominantly to infer network structures from gene data, both experimental
[Hakamada et al., 2001] and simulated [Akutsu et al., 1999; Maki et al., 2001]. The
limitation of the Boolean network model is that it captures only the logical relations
withinthe data. With this method, the genes take on binary (1/0) inputs and give binary
outputs according to logical or Boolean rules. Boolean methodology provides a context to
analyze logical rules such as A = B or C. The information captured by such a model

would be, for example, whether gene A can be activated by either gene B or gene C.

48

Boolean models are limited to time series data and cannot be used to evaluate
quantitatively the dependency between pathways. To infer networks and interactions
requires knowledge, to a certain degree, of the cause-effect relationships among the
variables. Typically, these data-driven approaches are based up correlation, and
correlation between two variables does not guarantee causality [Sprites et al., 1993].
Data-driven methods (herein denoted as data-driven reverse engineering methods), infer
relations between variables (pathways) from the data itself without any a priori
assumptions of the mechanisms involved, precluding statistical hypothesis testing of
possible regulatory models [Hartemink et al., 2001]. Similarly, deterministic models
require detailed information that is presently unavailable for mammalian systems,
certainly not in the detail required to model the system sufﬁciently to suggest or evaluate
hypothetical environmental conditions. Therefore, the complexity of mammalian systems
limits the applicability of deterministic models to relatively simple biological systems

and not easily to gene expression data.

In contrast, for certain microorganisms, metabolic networks and their control
circuits have been reconstructed from currently available gene sequence data combined
with a literature search of the known biochemical pathways and their related
physiological ﬁmctions [Schilling et al., 1999]. These networks are built from a
knowledge-based methodology and permit hypothesis testing of possible regulatory
models (herein denoted as model-driven hypothesis testing methods). These knowledge-
based approaches are formulated by predeﬁning or assuming a model and evaluating its
likelihood with respect to existing sequence data. Therefore, these models cannot easily

assess the effects of changes in the environment on the hypothetical networks, although,

49

the metabolic regulatory network of the E. coli has been redesigned by using metabolic

control analysis from experimental data [Farmer and Liao, 2000].

MN": |:> Mud lrlonnalon based ' Reversed Metaiollc
Profile Lea'ring network

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Biologcd aid
Blochenied
knowledge
Identify causal
related [80‘0” Hypothesis testing Hypothetlcd
and optimize <:i m metric
cellular fmctlon (Bayesian) ' Networks

 

 

 

 

 

Figure 3.1. Bayesian network based framework. Mutual information based Bayesian network structure
learning algorithm was ﬁrst applied to the metabolic proﬁle to infer the regulatory network Incorporating
known biological knowledge into the formulation of the hypothetical networks further reﬁned the
structures. Bayesian metric score identiﬁed the mostly likely network structure.

Alternatively, we developed a Bayesian-based framework, schematically shown in Ejgrgg
3_.1_, which combines data-driven reverse engineering and model-driven hypothesis testing
methods to re-construct sub-network structures, for example, TCA and urea cycles, from
hepatocellular metabolic data. The advantage of the Bayesian framework over other data-
driven methods is the ability of the Bayesian approach to perform cause-and—effect
analysis, thus providing a basis of identifying causal relationships. This is accomplished
without a priori detailed knowledge or assumptions of the biological system and the
governing equations but rather is based on the concept of conditional probability.
Therefore, similar to model-driven methods, the Bayesian network approach permits the
evaluation of several hypothetical networks by using quantitative measures, such as a

Bayesian metric score, to access the likelihood of a proposed network structure. In

50

contrast to other model-driven methods, our framework incorporates experimental data,

which allows one to evaluate the affects of environmental changes.

We integrated several Bayesian based techniques into our framework. Data-driven
Bayesian network learning method was ﬁrst applied to learn the raw metabolic regulatory
network from the data itself. Many algorithms exist that can infer the Bayesian network
structure. Only a few are computationally efﬁcient enough to deal with large datasets.
One such method is the information theory-based learning algorithm, which we have
applied to the data to infer the underlying metabolic regulatory network. The inferred
network was then compared with the known metabolic network to evaluate the ability of
our framework to learn for example the TCA and urea cycles. Next, a latent variable
detection algorithm was applied to the inferred sub-network with the goal of ﬁnding
possible latent variables in the network by discovering indirect pathways relevant to our
objective function. Methods, such as inductive causation (10"), were applied to the sub-
network composed of the identiﬁed pathways with the goal of ﬁnding possible latent
variables in the network that inﬂuence biological (objective) functions, such as
triglyceride accumulation and the rate of urea synthesis. Then, based on the pathways
learned from the data and currently known biological associations, we postulated several
metabolic regulatory network models. The utility and accuracy of these alternative
models were evaluated by comparing 1) their Bayesian metric scores, calculated from the
experimental data, to assess the model’s goodness of ﬁt to the data; and 2) how well the
model predicted the objective function(s). Thereupon an optimal model representing the
metabolic regulatory network was selected based on these criteria. Finally, a sensitivity

analysis was applied to evaluate the stability and robustness of the system or phenotype

51

with respect to changes, that is, noise, in the system parameters, namely, the

environmental or metabolic variables.

The ultimate goal of these models is to aid our understanding of the underlying
mechanism that governs the biological systems and the transition of these systems to
different states. In the current investigation, our system experiences a transition in its
microenvironment going from culture medium to plasma. Although, for the purpose of
the current investigation, we evaluated the system’s steady-state. The structure of the
biological network can be obtained from the steady-state data, which represents the
cellular, tissue, or organ phenotype or state. Once the structure is attained, one can
evaluate, using transient data, the system dynamics and how it changes from the steady-
state. Understanding the system dynamics would permit the management of transitions

from a diseased to a more beneﬁcial or normal phenotype.

3.2 MATERIAL AND METHODS

3.2.1 Data Collection

The sources of the dataset used in this study were previously published elsewhere [Chan
et al., 2003a,b,c], and what follows is a brief description of the experimental method and
the metabolic ﬂux analysis (MFA) used to obtain the dataset that was used in BN
analysis. Primary isolated hepatocytes were cultured in a collagen sandwich
conﬁguration and incubated in standard hepatocyte culture medium containing either 0.5
U/ml insulin (high insulin) or 50 uU/ml insulin (low insulin). The hepatocytes were
cultured in this fashion for at least 6 days prior to plasma exposure. This interval is

considered the pre-conditioning period. The six-day-old-sandwiched hepatocyte cultures

52

were subsequently exposed to unsupplemented, amino acid, hormone, or amino acid plus
hormone supplemented plasma solution for an additional seven days (see Chan et al.,
2003a,b for details). Therefore, six combinations of pre-conditioning-plasma
supplementation were evaluated. They were i) low-insulin, pre-conditioned, and
unsupplemented plasma; ii) low-insulin, pre-conditioned, and amino acid-supplemented
plasma; iii) high-insulin pre-conditioned and unsupplemented plasma; iv) high-insulin,
pre-conditioned, and amino acid-supplemented plasma; v) high-insulin, pre-conditioned,
and horrnone-supplemented plasma; and vi) high-insulin, pre-conditioned, and amino
acid plus hormone-supplemented plasma. A model for hepatocyte metabolism was
created based on the known stoichiometry of the hepatic metabolic network. The
stoichiometry of each reaction in the MFA model is listed in Chan 2003a. A total of 33
metabolite measurements were coupled to the MFA to estimate the other 43 intracellular
ﬂuxes (see Chan et al., 2003a,b for details). The structure of the resulting data, namely,
the measured metabolites (in bold) and the estimated intracellular ﬂux values, with their
corresponding ﬂux numbers, are illustrated in Table 3.2. To help illustrate the overall
model structure see appendix Figure 1, we applied our Bayesian-based methodology to
the data in Chan 2003a, to initially infer the known biological networks, such as the TCA
and urea cycles. After which, the methodology was applied to A infer the network

structures relevant to intracellular triglyceride (TG) accumulation.

53

Table 3.1. Learned TCA cycle relationships

 

Learned relation Biological explanation

 

 

 

 

 

7 9 9 Synthesis of oxaloacetate (pathway no. 7) controls the synthesis of
citrate (pathway no. 9).

9 9 10 Citrate synthesis controls (pathway no. 9) the reaction from citrate
to cr-ketoglutarate (pathway no. 10).

11 9 12 a-ketodehydrogenase (pathway no. 11) controls the formation of
fumarate from succinyl-CoA (pathway no. 12).

13 9 14 Synthesis of malate from fumarate (pathway no. 13) controls the
reaction from malate to oxaloacetate (pathway no. 14).

9 (- 7 9 8 Link between the TCA cycle and glueoneogenesis through

pyruvate carboxylase (pathway no. 7).

 

 

 

 

3.2.2 Bayesian networks

Bayesian networks are directed acyclic graphs (DAG) whose nodes correspond to
variables and whose arcs represent the dependencies between variables. The
dependencies are determined by the conditional probabilities of each node x,, given its
parent node pa, Pr(x,- | pa(x,-)). A Bayesian network assumes conditional independence,
such that each node is independent to its non-descendants, given its parents. In other
words, x,- and x; are conditionally independent to each other given pa (see Figge 3.2),

then

Pr(Xr I 11319.: (xi?) = ”(xi lpa (Id) 00

54

Figure 3.2. An example of a simple Bayesian network. The network consists of three nodes P,, X,, X, and
XI, in which P, is the parent node of X, and Xj. X, and X,- are conditionally independent to each other given
the parent node P.. P. is the cause of or regulates X, and X,; X, and X,- are the effects of Pa.

Also, a Bayesian network consists of the joint distribution deﬁned by a set of variables

{x,} as:
N
Pr(x1,....,x,,) = 11140:, | pa(x,.)) (3.2)

In the example above:
1’de xi: Xj.. xk) = ”(M Pr(xz'lpa) Pr(lepa) 1’ka | xi) (3-3)

The approaches developed to learn the Bayesian network structure from either

experimental or simulated data generally fall into two categories: 1) search and scoring or

55

2) constraint-based approach (dependency analysis approach). With the search-and-
scoring-based approach, the algorithm starts with a graph without arcs, and then new arcs
are added and evaluated with a speciﬁc scoring function. A scoring function is calculated,
and if the score improves, the new are is kept; otherwise it is eliminated. Thus, the
Bayesian network is learned through optimizing this scoring function. Numerous scoring
functions have been applied to this type of algorithm, such as Bayesian scoring method
[Heckerman et al., 1994] and minimum description length method [Suzuki, 1996]. Given
the immense computational time and cost associated with this approach, heuristic
methods to reduce the search space, such as the hill climb, greedy searches, and
simulated annealing, have been applied. The constraint-based approach applies the
conditional independency (Cl) test to reconstruct the Bayesian network by uncovering the
dependencies within the data. Several algorithms are available for this approach; one of
the better-known algorithms is inductive causation (IC) [Sprites et al., 1996]. In the
current investigation we combined the constraint-based method with a scoring function to

guide the development of the metabolic network.

3.2.3 Inferring the Bayesian Network from information theory

Metabolic ﬂux data are much smaller in scale than micro-array data. Nevertheless,
applying Bayesian network analysis to ﬂux data is still computationally prohibitive.
Many algorithms exist that can infer the Bayesian network structure, only a few are
computationally efﬁcient enough to deal with large datasets. The information theory-
based learning algorithm is one such method, which we’ve applied to ﬂux data to infer
the underlying metabolic regulatory network. This algorithm has been applied to real-

world data with hundreds of variables and records [Cheng et al., 2002] and is described

56

brieﬂy below. This constraint-based algorithm is performed in three separate phases:

drafting, thickening, and thinning.

Phase 1 computes the mutual information contained in each pair of ﬂuxes as a measure of
closeness indicating the correlation between ﬂuxes and creates a draft of the regulatory
network based on this information. The mutual information I(X,-,X,-) for each pair of

metabolic ﬂuxes (x,,xj) is computed as:

1(X X ) Z Pr( )1 Pixi’x”)
., . :: x.,x. 0g
‘ J xixj l J Pr(x,.)P(xj)

 

(3.4)

Then each pair of metabolic ﬂuxes with mutual information I(X,~,X,-) greater than a
threshold value, T, is sorted in a list L from high to low. An arc is drawn for the ﬁrst two
pairs of nodes in L. The pointer is then moved to the next pair of nodes. If no path exists
between them, an arc is added. To illustrate this point, we provide a modiﬁed example
from [Cheng et al., 2002]. Suppose we have ﬁve metabolic ﬂuxes A, B, C, D, E, and the

mutual information sorted as follows:

I(A,B)>I(B,C)>I(B,D)> T > (A,D)> I(A,C)>I(C,D)

then

L={[A.B],[B,C],[B,D]}

and we obtain a draft shown in Figure. 3.3 A.

 

57

 

o 9 C 0‘9 C 0 9 C
Phase 1 D Phase 2 0 Phase 3 0

Figure 3.3 An illustration of the three phases of the Bayesian network learning process using mutual
information based algorithm. Phase 1: the arcs are added to pairs of nodes whose mutual information are
larger than the threshold value E. Phase 2: the arcs are added to pairs of nodes that are not conditionally
independent using mutual information based CI test. Phase 3: the arcs are removed between pairs of nodes
that are conditionally independent using mutual information based CI test.

 

In Phase 2, arcs are added when pairs of unconnected metabolic ﬂuxes (i.e., nodes) are
not independent as determined by C1 test. The CI test is based on conditional mutual
information as deﬁned by Eq. 5 below. For I(Xi,Xj|c) less than the speciﬁed threshold

value T, (Xi,X]') are said to be independent given c, where c is a set of metabolic ﬂuxes.

 

Pr(x-,x-|c)
I(X,,Xj|c)= Z Pr(xi,xj,c)log ’ J (3.5)
.vx.

x 1,6 Pr(x,- |c)P(xj 10)

I

For example, if the metabolic ﬂuxes (A, D) in Figge. 3.3A are not independent based on

the CI test, then the are between (A, D) is added, with the resulting network shown in

Figgrc. 3.3B.

In Phase 3, each arc is examined by using the CI test and arcs are removed if the two
metabolic ﬂuxes linked by the are are conditionally independent. For example, if (B, D)
are found to be independent given (A) by the CI test, then the are between (B, D) is
removed with the resulting network shown in Figge. 3.3C.

3.2.4 Detecting latent variables and comparing alternative models based on a
Bayesian scoring metric

58

Many biological systems are not well deﬁned due to our incomplete understanding of the
underlying mechanism involved. As a result, certain relevant factors may be overlooked
in the experimental design and measurements. From a mathematical standpoint, these
factors would be considered latent variables. To detect the presence of these latent

variables, algorithms such as 10" [Pearl, 2000], and the faster FCI [Sprites et al., 1993]

have been used.

Previously [Chan et al., 2003c], we found that the intracellular TG accumulation
was not captured completely by the partial least-squares (PLS) method. A possible
explanation may be due to insufﬁcient details of the metabolic pathways pertinent to
intracellular TG. In our current model development, the IC* algorithm was applied to a
sub-network of pathways to detect possible latent variables relevant to intracellular TG.
The alternative models containing these latent variables were then compared
quantitatively by using a Bayesian scoring metric. A model’s score, BS(S), is deﬁned as

the log of the posterior probability of the model 8 given the observed data 0, p(S|D):

BS(S) =10gp(SlD) =10gp(5) +10gp(DIS) —10gp(D) (3.6)

where p(DlS) is the likelihood of the observed data D given S. For a given set of data D,
all possible structures S are assumed a prior to have equal probability, namely p(S) is
constant. p(D) is a normalizing constant, which is the a prior probability of a given
dataset that is applied to all structures S being evaluated. p(DlS) is determined by Eq. 7,
which was ﬁrst derived by [Cooper and Herskovits,l992].

_ n qi may.) r.- F<ayk+Ngk)
p(DlS)— I] HF(a,-,— Mfg-)1] Nag-r) (3.7)

 

 

59

Because P(S) and P(D) are the same for every network, thus they can be ignored in our
application with only p(DlS) remaining in the BS(S). Therefore, the score obtained is a
relative posterior probability. A Matlab Bayesian Network Toolbox (BNT) was
implemented to detect for latent variables and to evaluate the Bayesian metric score

[Murphy, 2001].

3.2.5 Bayesian network predictor and target function prediction

The Bayesian network can be used to predict the value of a particular node as a function
of the values of the remaining nodes (e.g., pathways) in the network. The target node
being predicted will be denoted here as the class node. The posterior probability that the
class node will take on a certain value given the values of the other nodes is determined
based on conditional probability. The class node can take on any number of possible
values, but the value that gives the highest posterior probability is selected as the

“predicted” value for the node.

The predictive capabilities of a Bayesian network model can be improved by
incorporating into the Bayesian network structure the current knowledge of the biological
system being characterized. The prediction accuracy of the Bayesian network model in
turn can be used to assess the correctness of the structure. The model is built as follow.
Suppose node pa in Figgge. 3.3 is the class node, and b; and c) are the known values of
nodes x,- and )9, respectively, we can predict the class value with the highest posterior

probability Pr(pa| x,, xj) according to the Bayesian rule:

Pr(pa=a1|xi=b1,xj=c1) = Pr(x,-=b1,xj=c1[pa=a1)*Pr(pa=al)/Pr(xi=b1,xj=c1) (3.8)

60

where a, is one of any possible values for node pa. The dataset of ﬂux values is divided
into a test and training set. Because nodes x,, xj are conditionally independent to each
other given the value of node pa, Pr(x,=b1,xj=c,|pa=a,) can be estimated from the

training data as follows:

Pr(xz'=b1.Xj=61[Pa=ar) = Pr(xz'=b1lpa=01)*Pr(Xj=61LDa=ar) (39)

Similarly, Pr(pa=a1) can be estimated from the training data, and Pr(x,~=b1, x,=c1) is
immaterial since it is the same for each class value of pa. Therefore, a known Bayesian
network structure inferred using information theory-based algorithm can be used to
determine the conditional probabilities of a target variable. In this chapter, intracellular
triglyceride is the class node predicted based on the values of the other ﬂuxes in the

network.

3.2.6 Data discretization

To learn the parameters and structure of the Bayesian network, we must ﬁrst deﬁne a
probability table containing all the nodes. This requires that the ﬂuxes be discretized into
nominal data. The simplest method to discretize the data is to divide the ﬂuxes into equal-
size intervals. This method, however, may be more sensitive to outliers, because extreme
values may skew the range. A method that is less sensitive to this type of outlier is to
divide the data into equal frequency intervals. In this approach, a continuous variable is
divided into k bins where each bin contains equal number of frequency or adjacent
values. These two methods are unsupervised and thus do not take into account the culture

conditions associated with the metabolic ﬂux data. Alternatively, a supervised algorithm

61

for discretization that incorporates the culture conditions uses class information entropy

to select the boundaries of the bins. Class information entropy is deﬁned as:

N n,
Ent :Z'I'VLH" (3.10)
i=1
where n,- is the number of samples in interval 1', N is the total number of samples, and H,- is

the Shannon entropy in interval i. The Shannon entropy is deﬁned in terms of the

probability of observing an event, pj:
H,=Z—pjlogpj (3.11)

where pj is the probability of culture condition j in interval 1'. The partition boundaries are
determined by minimizing the class information entropy deﬁned in Eq. 10. Each of the
aforementioned methods was applied, and the equal ﬁequency method was found to
perform the best by comparing the resulting Bayesian networks with the known
metabolic network. Therefore, we applied the equal frequency discretization method to

our data.

3.2.7 Sensitivity analysis of Bayesian Networks

Bayesian network is a construction of connections between variables where the
connections represent their degree of dependency based on conditional probabilities. The
conditional probabilities will be denoted here as parameters of the Bayesian network. The
importance of a connection to the network can be assessed through a sensitivity analysis.
The sensitivity analysis examines the posterior probability of a response variable or

network when a parameter is systematically perturbed. Some parameters are likely to

62

show considerable effects and thus are deemed important to the response variable or

network.

Let B be a Bayesian network, X be a parameter, and Y be a response variable, e be the

data, the sensitivity of X on Y given e is a partial derivative:

S<x1y.e)=5”%f2 (3.12)

It was proven by [Castillo et al., 1997] that any posterior probability of a response

variable is a ﬁaction of two linear functions of X,

 

p(y1e>(x)= ””3 (3.13)
}ar+l

then S(x[y, e) can be expressed as

S(x|y,e)=6p(ay|e)= “"372 (3.14)
x (”(+1)

To determine the value of a,B,y, we calculate p(y|e)(x) for three different values of x, for

example, if we set x to values of 0, 0.5, 1, then (1,8,7 are determined by

ﬂ=

P0
___ﬂ- po'

10 -p
=p (7+1)- ﬂ

where p0, p0'5, pl denote the corresponding posterior probabilities of p(yle) given x = 0,

0.5, 1. We applied a sensitivity analysis to our Bayesian network by using a Matlab code

63

developed by [Wang et al., 2002]. The sensitivity analysis enabled us to identify the
important or most sensitive variables (nodes) in the network with regard to the targeted
response variable (objective function), that is, intracellular TG or urea. Therefore,
carefully controlling or modulating the most sensitive parameters will facilitate the

optimization of the response variable or network structure.

3.3 Results

In this chapter, we illustrate the integration for the ﬁrst time of several Bayesian-based
techniques, such as Bayesian network structure inference [Cheng et al., 2002], latent
variable detection algorithms [Pearl, 2000], hypothesis testing using Bayesian metric
scoring [Hartemink et al., 2001], Bayesian network sensitivity analysis [Wang et al.,
2002], and Bayesian network prediction [Friedman and Goldszrnidt, 1996], into one
united framework. Moreover, we illustrate the utility of this uniﬁed framework to
biological data, and the ﬁrst application of this approach to metabolic data for the
purpose of inferring metabolic network structure of hepatocellular metabolism from the
data. We applied this framework to identify direct and indirect pathways that inﬂuence an
objective ﬁmction, such as TG accumulation and the rate of urea synthesis. Previously,
we developed a methodology that combined the metabolic ﬂux distribution obtained by
MFA with PLS to reveal the underlying structure and correlation within the data. PLS
could capture and predict urea synthesis very well, but the intracellular TG levels not as
well. Alternatively, the Bayesian-based framework could accurately predict both the level

of urea synthesis and TG accumulation for each of the cellular conditions evaluated.

64

Algorithms to detect latent variables to uncover non-obvious pathways relevant to
intracellular TG accumulation were used to help construct the Bayesian network model,
thus providing ﬂexibility in the model building process. This contributed to the improved
predictive capabilities of the Bayesian network model over the PLS method. To uncover
these latent variables, both the measured and estimated intracellular ﬂuxes, the latter
obtained from a MFA model, were combined into one data matrix X (36X76). The matrix
X was then subjected to the equal frequency discretization to obtain a discretized data
matrix D. We applied Bayesian network analysis to the data matrix D to infer sub-
networks within the larger metabolic network and to identify possible latent variables not
included in 0. Based on the latent variables identiﬁed, we proposed several alternative
metabolic sub-networks and evaluated their “accuracy” with a Bayesian scoring function

and its ability to predict the obj ective function.

3.3.1 Reverse engineering the sub-networks

Using Bayesian network analysis, we reversed engineered parts of the metabolic network
from the ﬂux data, namely, the TCA (Figure. 3.4A) and urea (Figure. 3.46) cycles, The
TCA cycle is important for producing cellular energy (ATP) from the oxidation of ﬁrels
and generating intermediates for the glueoneogenie pathway. The analysis enabled us to
infer the relationships shown in Figure. 3.43 and listed in ﬁrm. The direct links
between the TCA pathways 10—11 and 12—13 were not learned by Bayesian network
analysis but rather were identiﬁed as being coupled through oxidative phosphorylation
(ﬂux nos. 51 to 53). This recognition is rather appropriate because oxidative
phosphorylation, whereby ATP is formed by electron transfer from NADH and FADHZ

to 02, is coupled to the TCA cycle. It is noteworthy that, in our model, pathway no. 10—

65

11 combined two pathways, namely, citrate—isocitrate and isocitrate—a-ketoglutarate, into
one, citrate—a-ketoglutarate; likewise two pathways were combined into pathway no. 12—
13. If we add the missing links (10 —) 11 and 12 ——> 13) to the inferred network, the

Bayesian metric score indeed improved from —198 to —172.

Another important hepatic function is the removal of NI-I4+ derived from protein
and amino acid metabolism via its conversion to urea by the liver through the urea cycle
(Figure. 3.40. Bayesian network analysis was able to “learn” most of the pathways in
the urea cycle (Figure. 3.4D, Table 3.2). A comparison of the actual pathways and the
inferred network is shown in Figgre. 3.4. The link between pathways 16 and 15 was not
learned by Bayesian network analysis. Adding the missing relation (15 —) l6) improved
the Bayesian metric score from —116 to —102. The learned Bayesian-based network
shown in F iggre. 3.4D was evaluated on how well it predicted urea production. We found
the Bayesian model could predict urea production as well as the PLS and multi-block

PLS models, with ~92% accuracy [Chan et al., 2003c; Hwang et al., 2004].

66

    
     
 

Flux 52,53

  

Reversed
Urea cycle

 

 

 

Figure 3.4. Reverse Engineered TCA and urea cycle learned by mutual information based algorithm.
A) Actual metabolic network of TCA cycle, each node represents a metabolite and connections (arcs)
between nodes represent the metabolic ﬂuxes. B) TCA cycle inferred by Bayesian network analysis, each
node represents a ﬂux in Figure 3.4A, each are between the nodes represent a causal relation between the
ﬂuxes, the solid connections were learned and the dashed connections were not learned by Bayesian
network analysis. C) Actual metabolic network of urea cycle, each node represents a metabolite and
connections between nodes represent the metabolic ﬂuxes. D) Urea cycle inferred by Bayesian network
analysis, each are between the nodes represent a causal relation between the ﬂuxes, the solid connections
were learned and the dashed connections were not learned by Bayesian network analysis.

67

Table 3.2. Learned urea cycle relationships

 

Learned relation Biological explanation

 

l6 9 17 Synthesis of arginine (pathway no. 17) is controlled by the

formation of citrulline carbamoylate from ornithine (pathway no.
16),

l6 9 43 Aspartate arninotranferase (pathway no.43) is controlled by the

formation of citrulline carbamoylate from ornithine (pathway no.
16),

17 9 15 Urea production (hydrolysis of arginine) (pathway no.15) is
controlled by pathway no. 17

 

 

 

17 9 18 Uptake of arginine (pathway no. 1 8) is controlled by pathway no.
1 7

 

 

 

 

3.3.2 Identifying relevant pathways linked to intracellular TG levels

The ability of Bayesian network analysis to infer the known metabolic sub-networks from
the ﬂux data lends credence to the latent relationships it uncovers. Despite the
tremendous amount of information known about the hepatic metabolic network, our
understanding is incomplete. Therefore, due to our imperfect knowledge of the metabolic
network, it is plausible that important latent variables that are not directly associated but
revealed by the model to be linked to currently unassociated pathways may indeed be

consequential, as was noted previously with the uncovering of the link between ﬂux nos.

51—53 and the TCA cycle.

Applying a constraint-based algorithm to infer a sub-network containing
pathways linked to TG, we found that intracellular TG (ﬂux no. 76) was inﬂuenced by B-
hydroxybutyrate (BOH) dehydrogenase (ﬂux no. 50), glutaminase (ﬂux no. 38), and

glutamate dehydrogenase I (ﬂux no. 36), whereas extracellular TG (ﬂux no. 70) was

68

affected by lactate dehydrogenase (LDH; ﬂux no. 8), glyceraldehyde-3-P (G3P; ﬂux no.
5), free fatty acid uptake (ﬂux no. 72), glutamate dehydrogenase I (ﬂux no. 36), and
asparaginase (ﬂux no. 45). Latent variable detection (IC*) algorithm was applied to the
aforementioned pathways, along with several other pathways (ﬂux nos. 2, 26, 54, 61, 64,
73, 74, 75), a total of 17 pathways, and the results suggest that latent variables may exist
that inﬂuence intracellular TG and the glutamate related pathways (ﬂux nos. 36 and 38).
Those with direct connections learned through 10" are listed in Table 3.3. Although
direct connections to ﬂux no. 50 were not found with IC“, a direct connection between
ﬂux no. 50 and 76 was indicated by the constraint-based method; therefore, it was
included in all subsequent analyses. From the currently known pathways related to TG
metabolism and the inferred connections, we postulated several alternative networks,
shown in Figure. 3.5. We tested the hypothesis that ﬂux no. 50 may be a latent variable
between ﬂux no. 76 and ﬂux no. 36 (Figure. 3.5A) and between ﬂux no. 76 and ﬂux no.
38 (Eiggre. 359. Similarly, we tested the hypothesis that ﬂux no. 36 was a latent
variable between ﬂux nos. 76 and 38 (Figure. 3.53) and ﬂux no. 38 was a latent variable
between ﬂux nos. 76 and 36 (not shown, but will be denoted as 5B alternate). Finally, in
Figure. 3.5D, we tested the possibility that ﬂux no. 50 was the latent variable inﬂuencing
ﬂux nos. 76 and 38, in combination with ﬂux no. 38 as the latent variable for ﬂux nos. 76
and 36. We evaluated the likelihood of these models with a Bayesian metric score and

prediction accuracy of the inferred model.

69

 

 

 

 

Cholesterol

esterf75)

M

] Free fatty acld
uptake(72)

 

 

 

 

 

[Cholesterol Free fatty acid]
[esterﬂw J [ uptake(72)
[ibydroxybutyrate
(50)
Glutamate
dehydrogenase(36)

 

l

 

Intracellular T6
(76)

] [Glutamlnasemw J

 

 

A

 

 

 

Cholesterol
ester(75)

[

uptake(72)

] [Free fatty aeld

 

 

ﬂhydroxybutyrate
(50)

 

 

Glutamate
dehydrogenase(3

   

 

.1

 

lntracellular T
(76)

G] [ Glutamlnase(38) ]

 

 

 

C

  
   

 

 

j[,rﬂhydroxybutyrathe]
( 50)

 

Glutamate
dehydrogenase(3

.1

 

 

 

 

[mmcﬂs’ar 7'61 Glutamlnase(38)]
8
Cholesterol Free fatty acid
ester(75) uptake(72)

 

 

 

ﬂhydroxybutyrate
(50)

 

 

Glutamate
dehydrogenase(36)

  

 

 

 

[mmcﬂéj’a’ TGH Glutamlnasef38) ]

D

 

 

Figure 3.5. Postulated sub-networks for intracellular TG accumulation, learned with IC* algorithm.
A) Postulated network that assumes B-hydroxybutyrate as the latent variable regulating intracellular TG
accumulation and glutamate dehydrogenase pathways. The network’s Bayesian metric score is -l68. B)
Postulated network that assumes glutamate dehydrogenase pathway as the latent variable between
intracellular TG accumulation and glutaminase. The network’s Bayesian metric score is -158. C) Postulated
network that assumes B-hydroxybutyrate as latent variable regulating intracellular TG accumulation and
glutaminase pathways. The network’s Bayesian metric score is --155. D) Postulated network that assumes
B-hydroxybutyrate as the latent variable regulating intracellular TG accumulation and glutaminase
pathways while glutaminase inﬂuences intracellular TG accumulation and glutamate dehydrogenase

pathways. The network’s Bayesian metric score is

70

-152.

Table 3.3. Relations recovered by Bayesian network analysis with IC*
algorithm. A value of 0, 1, -1 indicates no direct link, a direct connection,
and the existence of a possible latent variable, respectively.

 

3.3.3 Accuracy and sensitivity of the postulated networks

Two methods were used to evaluate the performance of candidate networks, namely, the
Bayesian metric score and the prediction accuracy. Linking the known pathways in the
urea and TCA cycles that were not inferred by Bayesian network analysis improved the
Bayesian metric score of the inferred network. This illustrates that our knowledge of the
biological system can be used to help reﬁne the inferred network and the Bayesian metric
score can be used to guide the identiﬁcation of relevant pathways within a network. The
metric scores of the models in Figure. 3.5A, 3.53, 3.53 alterngte, 3.5C, and 3.5D were —
168, -—158, —153, —155, and —152, respectively. The scores indicate that the model in
Figure. 3.53 is 22,000 (e'0=22,000) times more likely than the model in Figure. 3.5A in
explaining the experimental data, and the model Figure. 3.53 altemaﬁ is ~150 times
more likely than the model in Figure. 3.53. Similarly, the model in Figgre. 3.5C is

~440,000 times more likely than the model in Figure. 3.5A. Combining Figure. 3.5C and

 

3.53 alternate into Figure. 3.5D provides a network model that is 20 times more likely

 

than the model in Figure. 3.5C, 3 times more likely than the model in Figure. 3.53

alternate, and 8.9 million times more likely than the model in Figure. 3.5A. The program

 

71

used to obtain these scores is available at
http://www.ai.mit.cdu/~murphyk/Soﬁware/bnsoft.html. In the second method, we divided
our 36 samples into a training set and a test set consisting of 24 and 12 samples,
respectively, to evaluate the hypothetical network structures shown in Figgre. 3.5. We
trained several Bayesian classifying models with our training dataset to predict
intracellular TG. The ability of the postulated networks to predict intracellular TG for the
test set are indicated in Table 3.4. The model shown in Figr_rr_'e. 3.5A classiﬁed the
samples with a prediction accuracy of ~67%. However, the model in Figure. 3.53, 3.53
_altemgte. 3.5C. and 3.5D predicted with an accuracy of 92, 92, 83, and 92, respectively.
The Bayesian metric score and the prediction accuracy method both indicated that the
model most likely to be correct given our data is the model illustrated in Figure. 3.5D.
Therefore, it is possible that B-hydroxybutyrate (ﬂux no. 50) may be a latent variable
affecting ﬂux nos. 38 and 76, whereas the glutaminase pathway (ﬂux no. 38) may be a
latent variable participant in ﬂux nos. 36 and 76 (see Table 3.4). Because glutamine is the
amino acid that is supplied to the cells in the highest quantities (4 mM vs. less than 0.8
mM for the other amino acids), a possible role played by the glutaminase (no. 38) is to
up-regulate the TCA cycle with the necessary intermediates, such as glutamate and in
turn or-ketoglutarate and oxaloacetate. An up-regulated TCA cycle increases the demand
on acetyl-CoA, which can be obtained from the breakdown of fatty acids via B-oxidation,
with some portion directed to ketone body production, such as B-hydroxybutyrate.
Therefore, it is encouraging that the model identiﬁed both ﬂux nos. 50 and 38 as relevant

inﬂuences on intracellular TG accumulation. The program used to determine the

72

prediction accuracy is Belief Network Power Predictor, available at

http://www.cs.ualberticahicheng/bnpphtm.

Table 3.4. Bayesian score and prediction accuracy of the Figure 3.5

 

 

 

 

 

 

Model Prediction Accuracy Bayesian Score
3.6 A 67% -l68
3.6 B 92% -158
3.6 B alternate 92% -153
3.6 C 83% -155
3.6 D 92% -152

 

 

 

 

 

A sensitivity analysis was applied to the intracellular TG sub-network shown in ﬂguLe,
i5, with intracellular TG (ﬂux no. 76) as the response variable. The results in Table 3.5
indicate that the intracellular TG level is most sensitive to B-hydroxybutyrate (ﬂux no.
50) and glutaminase (ﬂux no. 38) and least sensitive to glutamate dehydrogenase (ﬂux
no. 36) and cholesterol ester uptake (ﬂux no. 75). Although the sensitivity coefﬁcients are
all small, the coefﬁcients for B-hydroxybutyrate (ﬂux no. 50) and glutaminase (ﬂux no.
38) are generally an order-of-magnitude higher than the other pathways, indicating that
they have more impact on TG storage. Thus, changing ﬂux nos. 50 and 38 would more
likely alter the level of intracellular TG accumulation as oppose to changing the other

pathways.

73

Table 3.5 Sensitivity analysis of TG network shown in Figure 3.6

 

 

 

 

 

 

 

Pathway Cholesterol FFA Beta- Glutaminase Glutamate
uptake(75) uptake

sensitivity (72) hydroxy(50) (38) dehydrogenase(36)

sensitivitySA 0.0001725 0.0014 0.0094 0.0014 0

5B 0.000345 0.0013 0.0115 0.0048 0.0058

5B 0.0021 0.0023 0.0138 0.0134 0

alternative

5C 0.000469 0.000525 0.0093 0.0013 0

5D 0.0014 0.0016 0.0093 0.0134 0

 

 

 

 

 

 

3.4 Discussion

We applied the Bayesian-based framework to infer known network structures from

metabolic data and evaluated the hypothetical networks using quantitative measures, such

as Bayesian metric scores. The framework was applied to hepatocellular systems and

successfully inferred the metabolic sub-networks, TCA and urea cycles, from metabolic

data. We identiﬁed both direct and indirect pathways that inﬂuence our objective

function, taken to be either intracellular TG accumulation or rate of urea synthesis. It was

found in [Chan et al., 2003c] that urea synthesis can be predicted well with a PLS model,

but the intracellular TG accumulation was modeled only reasonably well with PLS. Thus

one of the objectives in the current study was to develop a model that could capture the

causal mechanism or system structure and in turn more accurately predict intracellular

TG accumulation.

74

 

Compared with PLS analysis [Chan et al., 2003c], Bayesian network analysis
offers the advantage/possibility of characterizing the underlying causal structure within
the data. PLS is an approach based on correlation analysis; thus it identiﬁes factors that
are highly correlated to the target variable or objective function. Thus, PLS cannot
distinguish situations in which an identiﬁed factor (x,) and the target variable (Xj) are
highly correlated but are caused or regulated by a third variable, the common cause or
parent variable (pa). Under those conditions, to achieve an optimal response in the target
or objective ﬁrnction requires modulating the common cause variable as opposed to the
variable(s) that are simply highly correlated. F or example, in our previous paper [Chan et
al., 2003c], ﬂux no. 43 (aspartate aminotransferase) was found to contribute most to
optimizing or restoring urea synthesis. However in the sub-network inferred by the
Bayesian network analysis, shown in Figure. 3.4D. ﬂux nos. 43 and 17 (argininosuccinate
synthetase and argininosuccinase) have a common cause, ﬂux no.16 (carbamoyl-P-
synthetase and ornithine transcabamylase). It suggested that ﬂux no. 43 is relevant to urea
synthesis but it is not the cause of urea synthesis. Thus, to systematically optimize urea
production, the variable of common cause (ﬂux no. 16) should be modiﬁed rather than
the conditionally independent variable, ﬂux no. 43. This suggests that ammonia,
regardless of its source, generated in the liver mitochondria and combined with HCO3‘ to
form carbamoyl phosphate, is the driving force for urea synthesis and is the source of the
up-regulation in pathways 17 and 43, argininosuccinate synthetase, and aspartate
aminotransferase, respectively. This is a reasonable ﬁnding by the model. The ability of
the Bayesian framework to perform cause-and-effect analysis provides an advantage over

PLS in optimizing the target variable. However, multivariate analysis, such as PLS, can

75

be used to pre-select a subset of variables that are highly correlated to the target variable,
for example, intracellular TG, thereby decreasing the Bayesian network learning
procedure. For example, we selected the variables with absolute regression coefﬁcient
value larger than 0.5 and applied Bayesian network analysis on this subset of variables.
The results are shown in Figge. 3.6. Although this decreases the complexity of the
learning process, it does not preclude the possibility that causal information may be less

by evaluating only the highly correlated variables.

 

 

Cysteine
[Intracellular uptake(26) ]
TG(76)

 

 

 

 

 

[ 91:1“; j Glutamate Cholesterol
up T“ ) dehydrogenase(36) ester(75)

[Glutaminase(38) Free fatty
add uptake(72) ]

 

 

 

 

 

Figure 3.6. PLS flux selection. A subset of variables with absolute PLS regression coefﬁcients larger than
0.5 were selected based on PLS analysis as being important to intracellular TG accumulation were
subsequently subjected to the search and score based Bayesian network learning approach.

In the reversed Bayesian network, some relations were missing, for example, pathways
12—>13 in Figure. 3.53 and 15—916 in Figure. 3.5D. Two possible explanations could

account for this: 1) noise in the data; and 2) omitted measurements. We tested the ﬁrst

hypothesis by adding noise to our data and examined its effect on the resulting network

76

structure. Noise in the form of a Gaussian distribution was generated according to the

following equation,
Noise = randn(n) * std(signal) * ratio (3.16)

where randn(n) is a vector of random numbers of length n following a normal
distribution, the std(signal) is the standard deviation of the data, and the ratio is any
number between [0,1]. We added three different levels of noise to the data by assigning a
value of 0.05, 0.10, or 0.20 to the ratio, representing 5, 10, and 20% noise, respectively.
The resulting urea cycles are shown in Figr_rre. 3.7. When 5% noise was added, the
pathway 16—>17 disappeared but instead pathway 15—>16 was learned (Figure. 3.83).
This indicated the possibility that the missing pathway 15—>16 may have been due to
noise in the data. Adding 10% noise to the data resulted in pathway 17—918 being
replaced by a new pathway 16-—)l 8 (Figure. 3.70, and adding 20% noise eliminated both
pathways 15—>16 and 17—>18 (Figure. 3.7D). The results suggested that Bayesian
network analysis can tolerate a certain degree (5—10%) of noise within the data to provide
a satisfactory level of performance, represented by the number of relations inferred.
Although when the noise increased to a higher level, for example, 20%, the performance
of Bayesian network approach decreased as reﬂected in the higher number of missed
relations. We tested the second explanation by omitting one measurement from the data
and examining the pathways inferred. To illustrate this idea, we removed the lactate
measurement from the dataset (see Figure. 3.8/1), which resulted in the omission of two
additional pathways, pathways 8—>7 and 7—>9, which were no longer inferred by

Bayesian network analysis (Figure. 3.83). Therefore, both noise and missing

77

measurements could contribute to an incomplete or incorrect reversed Bayesian network.
Thus care must be taken in collecting the data and, if possible, a comprehensive set of
measurements needs to be acquired in order to gain the maximum beneﬁt from Bayesian

network analysis. It is appropriate to note that incomplete data would be less of an issue

with the availability of high-throughput micro-array technology for collecting gene data.

  

  

Reversed
Urea cycle

 
    
 

Flux 16

 

  
  
  
 

O
O

Flux 15

  

Reversed
Urea cycle

  

 

 

Figure 3.7. The effect of noise in the data on Bayesian network learning of the urea cycle. 5%, 10%
and 20% noise were added to experimental data and Bayesian network was applied to each set of data. A)
Urea cycle learned from the experimental data without the addition of noise. B) Adding 5% noise resulted
in pathway 16917 not inferred but 15916 inferred instead, suggesting the possibility that 15916 were
not inferred in Figure 3.7A due to noise in the data. C) Adding 10% noise resulted in pathway 17918 not
inferred but 16918 inferred instead, which indicates noise could result in incorrect relations being learned.
0) Adding 20% noise resulted in pathway 17918 not inferred as well as pathway 15916, which indicates

noise could cause decrease performance in the Bayesian network learning methods.

78

 

 

  
 

Reversed
TCA Cycle

   

 

Figure 3.8. The effect of omitted measurements on Bayesian network learning. A) The current
network B) Not including the lactate measurement resulted in the causal relations 897 and 799 not being
inferred, indicating that omitted measurement also could account for the causal relations missed in the
networks inferred by the Bayesian network analysis.

According to the methodology developed in the proposed framework, several
hypothetical networks describing intracellular TG accumulation were postulated and
tested. Data-driven reverse engineering and model-driven hypothesis testing are two
essential components of the proposed framework. Of the two, the data-driven portion of
the analysis (i.e., the constraint based or mutual information Bayesian learning portion)
used to infer the network structure is essentially automatic; thus, this part of the Bayesian
network analysis is both fast and efﬁcient. The analysis identiﬁes hypothetical structures
containing causal information that can be subsequently reﬁned further by hypothesis
testing. This is accomplished by incorporating existing biological knowledge in
determining the pathways to include in the hypothetical structures. For instance, in our

79

intracellular TG example, we manually selected a subset of pathways to perform the IC*
algorithm to determine which pathways inﬂuence TG accumulation. The method
identiﬁed the existence of latent variables, which we determined through a process of
elimination with the help of the metric score and the model’s prediction accuracy to guide
us to the “ﬁnal” model illustrated in Figgre. 3.5D. As an indirect conﬁrmation of the
model in Figure. 3.5D, we applied the Markov Chain Monte Carlo (MCMC) algorithm,
which is a search-and-score method to the 17 pathways mentioned earlier. Because this
method is computationally expensive, it is not feasible to apply this method to large
networks. However, we obtained a solution for the sub-network after 500 iterations. This
search-and-score method identiﬁed the connection between ﬂux nos. 38 and 76 (see
Table 3.6), which we postulated to be the latent variable between ﬂux nos. 36 and 76
(Table 3.3). Thus it indirectly lends support to our hypothesis that ﬂux no. 38 may be the
latent variable linking ﬂux no. 36 with ﬂux no. 76. However, as illustrated by our
example, the hypothesis testing has to be formulated and assessed manually, slowing the
entire process. In our example, we manually postulated the four structures shown in
Figgre. 3.6 for intracellular TG. Therefore, a reverse engineering framework that
combines these two steps becomes time-consuming. If only one could automatically
incorporate the existing and, as much as possible, comprehensive knowledge of the
biological system into the hypothesis formulation of the framework, the effectiveness of

this methodology would be greatly enhanced.

80

Table 3.6. Relations recovered by Bayesian network analysis with MCMC
algorithm, a search and score method. A value of 1 indicates a direct
connection.

 

 

 

 

 

 

 

'Flux nos. 36 3s 150 72 75 76
36

0 1 0 1 1 0
38

1 o 0 0 0 1
so

0 0 o 0 o 1
72

l 0 0 o 0 0
75

1 o 0 o o 0
76

0 1 0 0 0 0

 

 

 

 

 

 

 

 

 

The data in the current study were based upon a metabolic ﬂux model, which provides a
comprehensive overview of the intracellular metabolic state. MFA invokes certain
assumptions in the development of the metabolic ﬂux model, and these assumptions
inﬂuence the results we obtain. Ideally, it would be advantageous to increase the number
of measurements and reduce the number of assumptions. This may eventually come to
pass as high-throughput technologies facilitate these measurements. Gene chip data have
the advantage of more independent measurements. Unfortunately, transcriptional,
translational, and post-translational effects currently limit our ability to infer cellular
function from gene expression data. The measurements used in this study are at the
functional level, with the disadvantage that MFA is needed to augment the data. That
said, the Bayesian network analysis is naive with respect to the assumptions used in
MFA. The Bayesian analysis uncovers the dependencies resulting from these

assumptions without bias. The ability of the Bayesian-based framework to infer the TCA

81

and urea cycles, which are well-established cellular networks, provides conﬁdence in this

approach to uncover relationships from this, as well as, other types of data.

In addition, the Bayesian methodology allowed us to infer the coupling of the
oxidative phosphorylation pathways (ﬂux no. 51—53) to the TCA cycle (ﬂux nos. 9—14).
This ﬁnding was encouraging, as oxygen uptake is an independent and direct
measurement and is not linked directly to the TCA cycle in the ﬂux model. Therefore, the
ability of the Bayesian ﬁamework to infer this coupling provides a degree of conﬁdence
in the ability of this framework to infer, as well, implicit causal relationships from the

data.

MFA is performed under the assumption of pseudo-steady-state, which limits the
evaluation of the dynamic behavior of the system. In the future, we plan to monitor the
temporal metabolic proﬁles in order to evaluate the dynamic properties of the model.
Sensitivity analysis will then be used to identify the most sensitive parameters and hence
the most important variables in the network that affect the stability of a metabolic state.
Furthermore, this analysis can be used to evaluate how to manage the transition from one
steady-state to another and thus guide us in designing experiments to collect the requisite
data for these transition studies. Because the costs for collecting experimental data are
relatively high in both time and monetary expense, this type of analysis can provide a
means to screen and thus optimize the number of experiments necessary to obtain the

relevant information.

In conclusion, we have shown that Bayesian network analysis could enable

inference of network structures from metabolic data. From the resulting network, we can

82

identify the relevant variables that most affect a target function(s), thereby providing a
framework to optimize the target ﬁmction(s) and insight into the underlying mechanisms
that govern the metabolic state. Finally, the ability of this methodology to infer well-
known metabolic structures from experimental data provides conﬁdence in the ability of
this methodology to infer other networks, such as, genetic regulatory networks from gene

data.

83

CHAPTER 4 IDENTIFYING PHENOTYPE RELEVANT
PAHTWAYS

4.1 Introduction

In chapter 2, GA/PLS was applied to identify important genes relevant to a metabolic
function. In chapter 3, Bayesian Network analysis was applied to infer metabolic network
from metabolites proﬁles. In this chapter, approaches, i.e. GA/PLS and Bayesian network
analysis are integrated into a uniﬁed framework, which we denote as the Three Stage
Integrative Pathway Search (T IPSO), to identify the genes and pathways that regulate a
particular phenotype.

The regulation of cellular function is achieved through the contribution and
interactions of genetic, signaling, and metabolic pathways. Consequently, cellular
processes may be singularly regulated, i.e., at the gene or transcription level, or controlled
by a network of interactions of genes, proteins and metabolites. Therefore, understanding
the biological function of the myriad of genes and how these genes and gene products
interact and regulate each other to yield a flmctional cell would help to identify more
appropriate pathways that should be targeted or studied for a given disease. An effective
drug should be designed to regulate the targeted pathways, while leaving other pathways
unaltered. Thus, it is important to identify the pathways in cells perturbed by external
stimuli including the drug.

With the advent of high throughput technologies, systems of genes, proteins and
metabolites for a given cellular state may be readily obtained, requiring novel analytical

frameworks to aid in unraveling the regulatory and functional relationships. Approaches

84

such as log linear regression [di Bernardo et al. 2005] and mutual information based
ARACNe [Basso et a1 2005] have been applied to yeast and human B cells, respectively,
to reverse engineering genome wide gene regulatory networks. Unfortunately, these
approaches require large amount of data. Thus the challenge remains as to how to
reconstruct active networks ﬁ'om a small number of observations. Therefore, we
developed an approach that identiﬁes the active pathways without requiring interaction
measurements or libraries of genetic mutants, and with a limited amount of data, namely,
by integrating gene expression and phenotypic proﬁles. To identify the active pathways,
we ﬁrst developed approaches to identify phenotype relevant (active) genes with GA/PLS
[Li and Chan 2004] and constrained independent component analysis (CICA) [Lin et al.
2002, Lieberrneister 2002]. BN analysis was then used to reconstruct the active sub-
network from the smaller group of genes to reveal which pathways are induced by the
external stimuli or environment. BN can detect indirect inﬂuences and unmeasured
events and is not susceptible to the existence of unobserved variables. It has been applied
to infer gene regulatory network of yeast cell cycle from gene expression data, metabolic
subnetworks from metabolic data and protein signaling from protein activity data [Li and
Chan 2004a; Friedman 2004; Sachs et al. 2005]. BN is computationally inefﬁcient when
applied to large network e. g. genome wide network. However, in our framework BN was
applied to a reduced subset of relevant genes, which circumvented this limit. In addition,
the reconstructed model can predict the effect of perturbing a gene (or several genes) on
the other genes in the network, and thus provide insight into how the genes interact

within the network. Mathematical frameworks that can reconstruct the active pathways

85

will permit better understanding of how the environment interacts with the genes to
regulate cellular ﬁmctions and phenotypes.

As proof-of-concept, the framework was applied to identify pathways that
regulate cytotoxicity in human liver cells. Elevated levels of free fatty acids (FFAs) and
TNFcr have been shown to be involved in the pathogenesis of liver disorders, such as
fatty liver disease and steatohepatitis. Saturated FFA, palmitate, was found to induce
cytotoxicity and TNFcr exacerbated this effect [Srivastava 2006]. Quantiﬁcation of the
genetic responses of hepatocytes to physiologically elevated levels of FFAs and T'NFor
may provide insight into the physiological actions of these factors and identify the
pathways involved in conferring cytotoxicity.

We developed a TIPSO framework whereby we assume, as a ﬁrst approximation,
3 log linear relationship between gene expression and cytotoxicity. The genes selected by
GA/PLS were corroborated with published results to identify known interactions and
analyzed by gene ontology to identify known functional associations. Thus, GA/PLS was
used to determine the relevance of the genes to LDH release. In order to extract an
independent pathway related to a process, such as LDH release, from the gene expression
proﬁle, we propose a constrained ICA (CICA) approach. With the relevance as
determined by GA/PLS and the LDH proﬁle as constraints, CICA was applied to extract
an independent component from the gene expression data. CICA assumes that the
expression proﬁle of thousands of genes can be represented by a reduced number of
mutually independent processes. Biological meaningful gene groups have been
successfully identiﬁed by ICA [Lieberrneister 2002; Martoglio et al. 2002]. CICA selects

a subset of genes whose proﬁles are statistically independent to the other components and

86

corresponds closest to the proﬁle of LDH release. This is achieved by minimizing the
mutual information between the independent components and maximizing the correlation
to the constraints. Finally, Bayesian network (BN) analysis was applied to reconstruct the
pathways from the expression proﬁle of the selected genes. The reconstructed network
was perturbed to identify i) which genes, when perturbed, have the greatest impact on
altering the phenotype (high LDH release) in the palmitate cultures, and ii) how does
perturbing one or several genes (nodes) affect the other genes in the network, as well as
the phenotype. The simulated perturbations of the reconstructed model were evaluated
experimentally to assess the validity of the predictions. The reconstructed network of
pathways provided potential explanation(s) on how TNFcr and palmitate induce LDH
release. The model identiﬁed i) the involvement of stearoyl-CoA desaturase (SCD) in the
palmitate-induced cytotoxicity, ii) the activation of nuclear factor kappa B (NF-KB) by
TNF-or is mediated by protein kinase C (PKC)-8, , and iii) the role of Bel-2 and PKR in

palmitate induced cytotoxicity.
4.2 Materials and Methods

4.2.1 Three Stage Integrative Pathway Search Framework (T IPS©)

We applied a mathematical framework that ﬁrst integrates genetic algorithm (GA) and
partial least squares (PLS) analyses to identify the genes relevant to LDH release, but
these genes may be involved in many independent pathways. Therefore, the framework
then applies CICA to identify an independent pathway involved in LDH release. Finally
the connections between these identiﬁed genes are reconstructed using BN analysis to

infer how the genes interact with each other in the independent pathways. The

87

reconstructed network illustrates how the genes interact under the given enviromnental

conditions to regulate LDH release. The framework is shown schematically in Figure 4.1.

 

 

Metabolic function
e. g. LDH

\ GA/PLS /

Genes involved in the
cellular process

[ Gene expression ]

LI

 

  

 

 
 
  

  

 

Genes involved in the
Independent process

\BN/

[ Pathways ]

 

 

 

 

 

Figure 4.1: Scheme of TIPSe. GA/PLS, ICA and BN are integrated to infer the pathways thtat regulate a
cellular function.

The framework being developed involved two central tasks. The ﬁrst task attempts to
identify the factors involved in producing a phenotype and the second task attempts to
uncover how these factors associated with each other in a network of pathways. The three
stage framework illustrated in Figure 4.1 was developed to integrate gene expression and
metabolic proﬁles to identify the pathways involved in producing a particular phenotype.
In the TIPS© framework, GA/PLS and CICA were applied to achieve the ﬁrst task and
EN analysis was applied to achieve the second task.

ICA was developed originally for blind source separation (BSS). Original sources
S were mixed by matrix M resulting in observation Y. The goal of B88 is then to

estimate the original source with the dc-mix matrix W resulting in Z. Thus the ICA model

can be expressed as
Z = WY (4- 1)
88

where W is the de—mix matrix and Y is the observations, Z is the estimated source signal.
Z and W are simultaneously estimated by minimizing the statistical dependence (e.g.
mutual information) between the columns in Z. The problem of ICA is then reduced to:
Minimize (Z) (4.2)
s.t. Z = WY.

In some cases, a priori information about the signal source and mix matrix are available.
This information can be used to constrain W and Z, and thereby incorporate the a priori
knowledge. We denote the signal measurements Y to be the gene expression data, S to be
the independent components (pathway), and A to be the mixing matrix. The ICA model is
then expressed as

Y = A s (4.3)
The gene expression Y is supplied to the ICA model and the mixing matrix A can be
uniquely estimated by assuming that the components in S are statistically independent to
each other. Y(i, t) represents the expression of gene 1' in experiment t, A(i,j) represents
weight of gene i in independent pathway j, S(/', t) represents the proﬁle of independent
pathway j in experiment I. Since the objective here is to identify LDH release related
independent pathway, A was further constrained with the frequency learned by GA/PLS
and S was further constrained with LDH release proﬁle. Let F(i) be the frequency of gene
i with respect to LDH release and a(i) be the weight of gene i in the pathway related to

LDH release. Thus, a is constrained to have a correlation with F by equation (4.4), where

pl is a threshold value.

Corrl = aT*diag (F)*a/(aT*a)>pl , (4.4)

89

Let s(t) be the proﬁle of LDH related pathway in experiment t and L(t) be the proﬁle of
LDH release in experiment 1. Similarly, s is constrained to have a correlation with L by
equation (4.5), where p2 is a threshold value.

Corr2 = sT*L*LT*s/(s*sT) >p2 (4.5)
For further details please refer [Lin et al. 2002].

The advantage of TIPS© is that it identiﬁes a subset of genes involved in the
active pathways in response to the environmental stimuli and then reconstructs the
pathways from this subset of genes using BN analysis. It reduces the number of samples
required for pathway reconstruction and provides pathway information speciﬁc to a

cellular function.

4.2.2 Bayesian Network Inference

Bayesian network inference was used to predict the probabilities of a phenotype, e.g.
LDH, or a gene within the network taking on a certain value upon changing a target gene.
The posterior probability that the class node will take on a certain value given the values
of the other nodes is determined based upon conditional probability. Suppose node A in
Figure 4.2 is the target node and b1 and c1 are the known values of evidence nodes B and
C, respectively, we can predict the posterior probability Pr(A| xi, xj) according to the
Bayesian rule:

Pr(A=a1|B=b1, C=c1)=Pr(B=b1,C=c, |A=a,)*Pr(A=a1)/Pr(B=b1,C=c1) (4.6)
However, applying this exact inference to a large network is computationally expensive
[Cooper 1990]. Therefore, we applied approximate inference algorithm such as logic
sampling [Henrionl988] to infer the posterior probabilities. Brieﬂy, logic sampling
generates a case by randomly assigning values to each node weighted by the probability

90

 

of that value occurring. To estimate the posterior probability Pr(X|E) where X is the
target node and E is the evidence node, we compute the ratio of the number of cases
where both E and X are true to the number of cases where just E is true, i.e. Pr(X=x|E=e)

= Pr(X=x, E=e)/Pr(E=e).

 

Figure 4.2. Example of a Bayesian network. Gene A is the parent node of Genes B and C, Gene B is the
parent node of gene 0. Genes B and gene C are conditionally independent given Gene A.

4.3 Results and discussion

4.3.1. Identifying genes relevant to cytotoxicity or LDH release using GA/PLS

We found that exposing human hepatoblastoma cells (HepG2/C3A) to three different
types of FFAs (palmitate, oleate, and linoleate at 0.7mM) in the presence and absence of
TNFcr (0, 20, 100 ng/ml), only palmitate was cytotoxic to the cells and resulted in
signiﬁcantly higher LDH release. TNFcr alone was not toxic to the cells. The effect of

TNFcr on cytotoxicity was observed only in the palmitate-treated cells. To obtain a global

91

view of the cellular processes, we capitalized upon high-throughput methods to quantify
gene expression proﬁles of the HepG2 cells.

We applied GA/PLS to the gene expression and LDH release data. The GA/PLS
algorithm determined the relevancy of each gene to LDH release by counting the
frequency with which each gene was selected by GA into a subset of genes used to
predict LDH release. The genes identiﬁed by GA/PLS to be relevant to palmitate-
induced cytotoxicity, as measured by LDH release, included oxidative stress related
genes (e.g., glutathione peroxidaase (AA664180)), apoptosis related genes (e.g., caspase
8 (AA44868), Bel-2 (AA446839)), TNF- related genes (e.g., TNF super-family
(AA77863)), mitochondria membrane permeability related genes (e.g., translocase of
outer membrane (TOM)-34 (AA457118)), ceramide related genes (e.g.,
sphingomyelinase (AA676836), sphingosine kinase (AA630354)), translational related
genes (e.g., eIFZB (AA027240)), and signal cascade related genes (e.g., protein
phosphatase 2A (AA490473) and PKCS (H11054)). The identiﬁcation by GA/PLS of
oxidative stress related genes suggests the involvement of reactive oxygen species (ROS)
in the palmitate-induced cytotoxicity. Likewise the model identiﬁed ceramide related
genes, e.g., sphingomyelinase and sphingosine kinase, are involved. In addition,
mitochondria potential and signal transduction are involved in the palmitate-induced
cytotoxicity.

4.3.2. Identifying genes involved in an independent pathway related to
cytotoxicity using CICA

GA/PLS determined the frequency with which each gene was selected to predict LDH
release. The frequency and the proﬁle of LDH release were applied as constraints in

92

 

CICA to extract an independent component from the gene expression proﬁle. The
independent component in this case identiﬁed a subset of genes whose proﬁles
corresponded to the proﬁle of LDH release. CICA determined the weights for each gene
by minimizing the mutual information between the independent components and
maximizing the correlation to the constraints. Genes that have weights signiﬁcantly
different from zero with a 95% conﬁdence using the Z-test were subjected to BN analysis

for pathway reconstruction.

4.3.3. Reconstruct pathways related to cytotoxicity using BN

BN reconstructed how the genes, identiﬁed by CICA, are connected in a network and
involved in LDH release or cytotoxicity. The resulting network is shown in Figure 4.3.
The model revealed possible mechanisms involved in palmitate-induced cytotoxicity. To
evaluate these potential mechanisms, we performed perturbation analyses and
experimental validation. In the perturbation studies, all connections relevant to the

perturbed genes were included.

93

 

Trensloeese of outer

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

TNF ligand TRAF Binding . . ,
. —— . — mrtochondnal membrane ——- 601-
superfamrly protein (TOT) 20
CPT-1 BCLZ elF28
Capase 9
| 7 Apoptotic chromatin
condensation
CREB
l
l PKR
PTP receptor N- F AAH
90043895092 I Apoptosis inhibitor
I ’ (APl)5
Capase 10 I
PP2A
PPAR binding _ Fatty aeyl-coa _ '
protein ligase ‘ HMOXZ
I TOM 70 '
l PTP non receptor
1 (PT 1)
PPZA _
PKC-Iike l-KappanB Interacting
Ras—lik protein
Cytoeh c (J, i
- , Un pling
ATPase oxidase prolein PTB
_- PKC delta PTPN9 Acetyl-CoAcarboxylase
NF-Kappa-B JNK-1 l-kappa-B _1
l Stero -CoA
desaTrase
Apoptosis related
protein
LDH

Figure 4.3 A representative sub-network related to cytotoxicity. GA/PLS and ICA select the relevant
genes, and BN analysis reconstructs the network using the subset of selected genes. The network provides
an overview of the factors and pathways involved in regulating cytotoxicity.

94

4.3.3.1 Perturbation of the reconstructed network

Based upon the reconstructed network illustrated in Figure 4.3, BN inference was applied
to predict the effect of perturbing a single gene/node on i) the other genes within the
network and ii) reducing the level of LDH release, in the palmitate and control cultures.
The reconstructed network allowed us to artiﬁcially perturb the network and identify
which nodes were the most sensitive and thus had the most impact on modulating LDH
release. An assumption in our model is that perturbations at the gene level will translate
correspondingly (qualitatively, but not necessarily quantitatively) to the protein level.
Therefore, we altered the activity of the nodes by either down-regulating (up-regulating)
the genes or inhibiting (activating) the protein activity, to study how the nodes interact
with each other in the network. Genetic perturbations of SCD, PKC-6 , Bel-2, NF-KB

and double-stranded-RNA-dependent protein kinase (PKR) were simulated with the BN
inference to study their effects on the probability of LDH taking on a phenotype either of
high or low release (see Table 4.1).

Table 4.1 Simulating genetic perturbation and its effects on LDH release

 

 

 

 

 

 

 

 

Probability of LDH Release

Control Palmitate
Gene Perturbation Low High Low High
no perturbation 0.94 0.06 0.02 0.98
Upregulation 0.94 0.06 0.55 0.451
SCD Downregulation 0.9 0.1 0.02 0.98
PKC-S Downregulation 0.9 0.1 0.02 0.98
PKR Downregulation 0.85 0.151 0.08 0.921
NF-KB Upregulation 0.94 0.06 0.15 0.851

 

 

 

 

 

 

 

 

95

Note that the probability of LDH release taking on a high or low level in the control and
palmitate cultures. Using a BN inference on the structure shown in Figure 4.4 —
simulations of up/down regulation of SCD, down-regulation of PKR, and down-
regulation of PKC-5 and upregulation of NF-KB as compared to the original condition
(unperturbed state). “1” indicates decreased probability and “T” indicates increased

probability.

4.3.3.2 Role of stearoyl-CoA desaturase (SCD) in palmitate induced cytotoxicity

A connection between SCD and acetyl-CoA carboxylase (ACC) was found in the
reconstructed network. SCD is the rate-limiting enzyme to produce monounsaturated
fatty acids. Its deﬁciency has been found to increase fatty acid oxidation by activating
AMP-activated protein kinase (AMPK) in the liver. AMPK phosphorylates ACC at Ser-
79 which inhibits ACC activity and decreases malonyl-CoA concentration [Dobrzyn et al.
2004]. Malonyl-CoA inhibits carnitine palmitoyl-CoA transferase (CPT-l) [Kemer and
Hoppel, 2000]. Thus a decreased level in malonyl-CoA activates CPT-l activity and
increases fatty acid beta-oxidation in the mitochondrion [Dobrzyn et al. 2004]. In the
palmitate cultures, the protein (Figure 4.4) expression levels of SCD were reduced
compared to the control. This may explain in part the preference of hepatocytes to
oxidize as oppose to synthesize TG from palmitate, as is the preference with the
unsaturated fatty acids. Indeed, co-supplementing palmitate with oleate enhanced the

accumulation of TG [Li 2006] and reduced the cytotoxicity of palmitate (Figure 4.4c).

96

 

(A)

B—actin

 

TNF(ng/ml)100 2o 0 100 20 o 100 20 o

Oleate Palmitate HepGZ

(B)

 

TNF (nglml) 100 20 0 100 20 0 100 20 0

 

 

Palmitate + Oleate Palmitate HepGZ
(C)
1:: /W

P0 P20 P100 P00 PO20 P0100
Experlmental Conditions

Figure 4.4 Effect of TNFo. and palmitate on stearoyl-CoA desaturase (SCD) measured by western
blotting. (A) SCD was downregulated in the palmitate (0.7 mM) cultures as compared to the oleate
(0.7mM) and control cultures. (B) Co-supplementation of oleate (0.3 mM) with palmitate (0.4mM)
prevented the downregulation of SCD. TNFor decreased the expression of SCD in the control, oleate and
co-supplementation (oleate + palmitate) cultures. (C) Co-supplementing palmitate (0.4 mM) with oleate
(0.3mM) decreased LDH release signiﬁcantly, P<0.01 (t-test). P: treated with 0.7mM palmitate for 48
hours, PO: treated with 0.4 mM palmitate plus 0.3 mM oleate for 48 hours. Number following P or PO
represents the concentration of TNFcr in ng/ml, e.g. P20 is 0.7 mM palmitate with 20 ng/ml TNFcr. Data
expressed as average +/— SD from three independent experiments.

97

To evaluate the role of SCD on LDH release, we simulated up- or down-regulation of
SCD in the reconstructed network by setting the SCD gene expression level at either a
high or low level, respectively. Up-regulating SCD in the palmitate cultures reduced the
probability from ~98% to ~45% that the LDH release would remain high (Table 4.1).
While down-regulating SCD in the control cultures suggests that the LDH release would
likely remain low (i.e., down-regulating SCD will have no effect on LDH release). The
simulation results agreed well with the literature, e.g., over-expressing SCD has been
shown to prevent palmitate-induced cytotoxicity [Listenberger et al. 2001]. The induction
of SCDl activity is transcriptionally activated [Ntambi 1995]. In further support of the
protective of role of SCD, we supplemented the cultures with 50 M of cloﬁbrate or
ciproﬁbrate, which are known to increase the activity of SCD through a PPAR
independent pathway [Rodriguez et al. 2001; Miller and Ntambi 1996]. The ﬁbrate
supplementation signiﬁcantly decreased LDH release in the palmitate cultures (Figure
4.5). In the control cultures TNFa decreased the level of SCD protein (Figure 4.4 A), but
did not affect the LDH release [Srivastava 2006], which agreed with the simulation
results shown in Table 6.1.

To examine the connection between TNFor and SCD shown in Figure 4.3, the
protein expression level of SCD in control (HepG2 medium), palmitate, oleate, and
palmitate + oleate cultures were measured as a function of the level of TNFa by western
blotting. The presence of TNFcr down-regulated the protein expression level of SCD
under all the conditions except palmitate (Figure 4.4A, B). Similarly, palmitate
supplementation decreased the protein expression level of SCD as compared to the

control and oleate cultures. Co-supplementing the palmitate cultures with oleate restored

98

 

the SCD protein expression level (Figure 4.4B) and correspondingly reduced LDH
release (Figure 4.4 C). Administrating ﬁbratcs increases the liver phospholipid proportion
of monounsaturated fatty acids [Miller and Ntambi 1996], which appears to be similar in

effect to the addition of oleate to the palmitate cultures.

 

LU-lFbleaee(%)
a 8 8

b
O

    

Ciproﬁbrate+Palm Cloﬁbrate+Palm Palm

 

 

 

Figure 4.5. Effect of SCD activator, cloﬁbrate and ciproﬁbrate, on LDH release in the palmitate
cultures. Cloﬁbrate and ciproﬁbrate are known to increase the activity of SCD. Palm: treated with 0.7 mM
palmitate for 48 hours, Palm+cloﬁbrate: treated with 50 M cloﬁbrate and 0.7 mM palmitate,
Palm+ciproﬁbrate: treated with 50 11M ciproﬁbrate and 0.7mM palmitate. 6 hours pretreatment followed by
48 hours eo-supplementation of 50 M of cloﬁbrate or ciproﬁbrate signiﬁcantly decrease LDH release in
the palmitate culture, P<0.01 (t-test). Data expressed as average +/- SD ﬁom three independent
experiments.

Oxidative stress appears to be involved in down-regulating the protein level of SCD in
the palmitate cultures. Supplementing with 10 mM N-acetyl cysteine (NAC) prevented
the down-regulation of SCD (Figure 4.6) and reduced the release of LDH. The
involvement of SCD in palmitate-induced cytotoxicity is further supported by a recent

study whereby over-expressing SCD was found to protect CHO cells from palmitate

induced cytotoxicity [Listenberger et al. 2001].

99

SCD

 

w in.» w w 1.1—r d B-aetin

NAC (10mM) + + + - - -

Figure 4.6. Effect of antioxidant N-acetyl-cysteine on SCD measured by western blotting. Down-
regulation of SCD in the palmitate culture was prevented by co-supplementing with an antioxidant N-acetyl
cysteine (10 mM).

4.3.3.3 Activation of NF-KB by TNF-a is mediated by PKC-6

In a separate study, we found the phospho-p65 NF-KB levels to be signiﬁcantly lower in
the palmitate cultures than in the control cultures (data not shown). Using the BN
inference to simulate an up-regulation of NF-KB in the palmitate cultures predicted that
the probability of LDH release being high would decrease (see Table 4.1). NF-KB is an
important cytoprotective transcription factor, which can be activated by oxidative stress
and cytokines, including TNF-or [Haddad 2002]. From Figure 4.3 we ﬁnd the connection
between TNFor and NF-KB is linked through PKC-S, suggesting that PKC-B is an
intermediate factor in the activation of NF-KB. PKC-5 has been found to be a redox-
sensitive kinase in many cell types [Kanthasamy et al. 2003]. PKC-8 is activated through
translocation, tyrosine phosphorylation or proteolysis. During proteolysis, the native
PKC-5 (72-74 kDa) is cleaved into two fragments, a catalytically active 41 kDa and a
regulatory 38 kDa fragment. The 41 kDa catalytically active fragment plays a key role in
promoting apoptotic cell death [Kanthasamy et al. 2003]. In addition, PKC-5 has been
found to associate with the P60 TNF receptor during TNFor triggered signaling

[Kilpatrick et al. 2002]. As shown in Figure 4.7, TNFcr increased the expression of the

100

 

PKC-S 41 kDa catalytic active ﬁ'agment. This conﬁrmed the connection between TNFor
and PKC-B in the reconstructed network.

Connections between TNFcr, PKC-S and NF-KB have been identiﬁed in cells
such as neutrophils [Kilpatrick et al. 2002] and pancreatic acinar cells [Satoh et al. 2004].

I I .
-..r .1.-.” ‘ e.
_ 3': r‘u-‘X‘A 4

Cleaved

 

PKC-S

 

B—actin
TNF(ngIml) 100 2f0 0 100 20 O 100 20 0

Oleate Palmitate HepG2

Figure 4.7. Effect of TNFo. on PKC-6 expression measured by western blotting. Expression of the
catalytically active PKC-S 41 kDa fragment was increased by TNFcr.

Table 4.2 Simulating down-regulation of PKC-S and its effects on NF-ch in
medium culture (with 20 nglml TNF-a).

 

Probability of NF-KB activation

 

 

 

Gene Perturbation Low High
no perturbation 0.08 0.92
PKG-6 downregulation 0.4 0.6 1

 

 

 

 

 

 

Note: The probability of NF-xB taking on a high or low level. The model predicts that simulating a
down-regulation in PKC-S will decrease the probability of NF-KB taking on a high value. “1”
indicates decreased probability and “1” indicates increased probability.

Inhibiting PKC-6 has been shown to attenuate TNF-a-mediated activation of the anti-
apoptotic transcription factor NF-KB in adherent neutrophils [Kilpatrick et al. 2002].
There has been no study to date suggesting that PKC-S mediates TNF-a’s activation of

NF-KB in HepG2 cells. Our model suggests that down-regulating PKC-S will decrease

101

the probability of NF-KB taking on a high expression level in the medium (plus TNF-or)
cultures (Table 4.2), and maintain a high probability of a high LDH release phenotype in
the palmitate cultures (Table 4.1). To determine whether PKC-5 is involved in mediating
the activation of NF-KB by TNFcr in HepG2 cells, we added rottlerin, an inhibitor of
PKC-5, and measured the activity levels of NF-KB by western blotting, and LDH release.
Rotterlin inhibits PKC-8 by inhibiting the tyrosine phosphorylation of PKC-S. The
activity of NF-KB was measured by detecting the levels of phophorylated NF-KB p65 at
Set-536 [Sakurai et al. 2003]. As shown in Figure 4.8, the activation of NF-ch p65 was
attenuated by rottlerin. Therefore, PKC-S was appropriately identiﬁed by the model to be
an important factor in mediating the TNFcr signaling pathway. In addition, the model
predicted that inhibiting PKC-5 in the palmitate cultures should maintain a high LDH
release phenotype. Indeed, as shown in Figure 4.9, rottlerin supplementation increased
LDH release in all the palmitate cultures and the medium cultures with TNFcr. Therefore,
the cytoprotective role of NF-KB on LDH release in HepG2 cells is mediated in part by

PKG-5.

102

 

TNF(ng/ml) 100 20 o 100 20 o

 

 

 

Rottlerin + + + - - -
HepGZ NF-ch
Mediu B-aetin
Palmitate nun—t m NF-ch
Medium B-actin

“—uww

Figure 4.8. Effect of rottlerin on NF-KB measured by western blotting. Expression of phospho—P65 NF-KB in
control and palmitate mediums with 0, 20, 100 ng/ml TNFu, with and without rottlerin (5 uM). TNFcr activated
phospho-P65 NF-KB and this activation was attenuated with the PKC-6 inhibitor, rottlerin.

 

%LOH Release

 

 

Experimental Condition

 

 

 

Figure 4.9 Measurement of LDH release in the different culture mediums, with and without rottlerin.
Palmitate increased LDH release compared to control cultures. Rottlerin signiﬁcantly increased LDH
release in the palmitate cultures (comparing P0 with POR, P20 with P20R, P100 with PlOOR), P<0.05 (t-
test). H: treated with culture medium (control) for 48 hours, P: treated with 0.7 mM palmitate for 48 hours;
R: treated with 511M rottlerin. Data expressed as average +/- SD from 3 independent experiments.

4.3.3.4 Role of Bel-2 and PKR in Palmitate induced cytotoxicity

103

Bel-2 is a group of proteins including pro-apoptotic members, such as Bax, Bid, Bad, and
anti-apoptotic ones such as Bel-2, Bcl-xl, Bel-w. Anti-apoptotic Bel-2 protein inhibits
apoptosis by guarding the mitochondrial gate against the release of cytochrome c and the
subsequent activation of caspases. Bel-2 was found to be connected to factors such as
TNFa, PKR, TOM20, eIF2B, which in turn inﬂuenced LDH release (Figure 4.3). The
regulation of Bel-2 by TNFa is cell dependent [Kim et al. 2002; Tamatani et a1. 1999].

The protein expression level of Bel-2 in cultured HepG2 cells as a function of
TNFa concentrations were measured with western blots (Figure 4.10A). In the control
cultures, TNFa (20-100 ng/ml) slightly suppressed the protein expression level of Bel-2
in a dose-dependent manner. Similarly, palmitate signiﬁcantly decreased Bel-2 protein
expression levels as compared to the control and oleate cultures. The suppression of Bel-
2 may explain in part the palmitate and TNFa induced cytoxicity. In support of this
ﬁnding, over-expression of Bel-2 in 2B4.11 T cell hybridoma cell lines have been shown
to inhibit palmitate induced cytotoxicity [dc Pablo et al. 1999].

Bel-2

p-actin

TNF 100 20 0 100 20 0 100 20 0

r

(nglml) Oleate Palmitate HepG2

Figure 4.10. Effects of TNF a on Bel-2. (A) Effect of palnritate and TNFa on Bel-2 expression measured
by western blotting. TNFa supplementation at 20-100 ng/ml downregulated Bel-2 in the control, palmitate,
and oleate cultures. Palmitate downreglated Bel-2 protein expression level as compared to the control and
oleate cultures

104

(B)
Bel-2

~ ~ WW ﬂ-actin

 

 

TNF (nglml) 100 2; 0 100 20 0
PKR inhibitor + + + - - -
(C)

”_ gum- .u. W... »~. as...“ 1 Bel-2

”m- “ “I * B-actin

TNF(ngImI) 100 go o 100 go o

PKR inhibitor + + + - - -

Figure 4.10. Effects of TNF a on Bel-2. (B) Effect of PKR inhibition on Bel—2 level in the control
cultures. PKR inhibitor (6uM) decreased the expression of Bel-2. (C) Effect of PKR inhibition on Bel-2
level in the palmitate cultures. PKR inhibitor (611M) increased the expression of Bel-2.

Bel-2 was connected to the translocase of outer membrane (TOM20), cysteine ligase
(GCL) and carnitine palmitoyl transferase (CPT—l) in the reconstructed network. These
connections are supported by published results [Motz 2002, Haddad 2004]. Targeting the
Bel-2 protein to the mitochondria is mediated by the interaction between the C terminus
of Bel-2 and TOM20 [Motz et al. 2002], whereas inhibiting GCL in fATIII cells has been
found to enhance Bax and p53 expressions over Bel-2 expression [Haddad 2004]. Finally,
direct interaction between Bel-2 and CPT-l has been identiﬁed with co-

immunoprecipitations [Paumen et al. 1997]. These connections predicted by the model

105

 

require further analysis into how Bel-2 may be regulated as well as its role in regulating
the mitochondrial membrane in HepG2 cells exposed to palmitate.

PKR was also found ill the model to be connected to Bel-2. Simulating a down-
regulation in the PKR node suggests that the LDH release will decrease in the palmitate
cultures and increase in the control cultures. Indeed we found that inhibiting PKR (6 uM
PKR inhibitor) in the control cultures decreased the Bel-2 protein expression (Figure
4.10B), correspondingly the LDH release increased from ~3% to ~25%. Inhibiting PKR
in the palmitate cultures up-regulated the Bel-2 protein expression (Figure 4.10C) and
decreased the LDH release from ~50% to ~40%. Therefore, the model appropriately
identiﬁed PKR to be an important factor involved in regulating Bel-2 protein expression,
and in turn LDH release.

In addition, PKR was also found to be connected to eIF2B, PP2A, apoptotic
chromatin condensation inducer and apoptosis inhibitor, suggesting that these factors are
likely involved in the apoptotic signaling pathway. These connections are supported by
published results in the literature. PKR can be cleaved by caspase-3, 7, 8 to liberate the
eIF2-or kinase domain, which phosphorylates eIF2-or [Saelens et al. 2001].
Phosphorylation of eIF2-or by PKR will inhibit protein synthesis and lead to apoptosis.
Translation of all proteins is dependent upon the binding of the eIF2-GTP-tRNA,M“
ternary complex to the 40S ribosomal subunit, and the amount of eIF2 present in the
complex is dependent upon the activity of eIF2B. PKR also can bind to PP2A at the B56
alpha regulatory subunit and increase the phosphatase activity of PP2A [Xu and Williams

2000]. PP2A is a major Ser/Thr phosphatase involved in many signal transduction

106

 

 

pathways. PP2A can dephosphorylate and inactivate the anti-apoptotic Bel-2 at Ser-7O
[Deng et al. 1998].
4.3.4. Model extension
With the availability of high dimensional biological data to characterize a cellular state,
one of the challenges is the development of robust methods that can integrate various
orthogonal datasets and identify the genes and pathways that induce a phenotype. The
signiﬁcance of the TIPS© framework is its ability to extract relevant information, both
known and unknown, from the high dimensional data. The phenotypic proﬁle guided the
information extraction process. Proteins relay information from the genes to execute
biological functions, which deﬁne the cellular phenotype, e. g. metabolic functions. Thus,
the effects of regulation occurring at the protein level manifest themselves in the
phenotype. This chapter illustrates that uncovering this information at the protein (i.e.,
intermediate) level may be achieved by integrating phenotype and gene expression data.

Metabolite proﬁles, which characterize the cellular phenotype, may also be used
as constraints. Currently, only one phenotypic proﬁle, e.g., LDH release, was used to
identify the active network perturbed by TNFa and FFA exposure. The phenotype
characterization could be improved by incorporating more metabolic functions. This is
discussed in more detail in chapter 6. An approach to obtain more constraints would be
to apply ICA or PLS to extract several latent variables from the metabolic proﬁles
[Grifﬁn et al. 2004].

The TIPS‘9 framework provides hypotheses of potential mechanisms involved in
palmitate and TNFa induced cytoxicity that may be experimentally tested and used to

further enhance the model. Experimental approaches, such as ChIP technology, which

107

measure the physiological interactions between genes and proteins, are becoming more
readily available. Integrating interaction data with the gene expression proﬁle using
models, i.e. BN, would improve the network inferred from gene expression data alone.
One way to accomplish the integration would be to deﬁne the a priori probabilities of the
connections in the gene regulatory network based upon the interaction measurement.
Hartemink et al. 2002 provided an example of combining protein-DNA interaction with
gene expression data in a BN framework to infer regulatory network underlying
pheromone response in yeast.

Due to computational limitation as well as limited availability of data, it is not
possible to reconstruct a network with high conﬁdence using all the genes across the
genome. GA/PLS and ICA provide an approach to identify relevant genes for ﬁrrther
analysis. However, useful information may be missed in the selection process due to low
abundance transcripts or the noise in the data. To address the former, methods such as
kinetically monitored reverse transcriptase-initiated PCR (kRT-PCR) could be used to
measure genes with low abundance transcripts [Holland 2002]. For noise, more replicates
of cDNA microarray should be obtained to achieve an accurate estimation of the mRNA
expression level. For example, 7 arrrays per sample are required to identify 90% of the
truly differentially expressed genes using two-sample t-test [Wang and Chen 2004].
Therefore, another approach to address the latter would be to use a more targeted array
with a smaller subset of genes. In conclusion, we have illustrated that TIPS© may be
applied to reconstruct the active associations, both known and currently unknown, from
gene expression and metabolite proﬁles to help elucidate the pathways involved in

regulating palmitate-induced cytotoxicity.

108

 

 

CHAPTER 5 STATE SPACE MODEL TO INFER
TRANSCRIPTION FACTOR ACTIVITIES FROM
DYNAMIC GENE EXPRESSION DATA

5.1 Introduction

The framework introduced in chapter 4 provided very good predictions of the effect of
some, but not all, of the perturbation studies. This may be due in part to Bayesian
network analysis’ inability to handle transients, such as cycles and feedback loops.
Uncovering the additional information from these transients would be valuable in
elucidating the mechanism involved in producing a particular phenotype. Therefore, in
chapter 5, we developed a dynamic model using time-series gene expression data. To
illustrate the ability of the model, we applied the model to Escherichia coli K12 (E. coli)
and Saccharomyces cerevisiae (yeast), which were more readily arnendable to
validatation.

Different gene regulatory motifs such as auto-regulatory, single input, multiple
input, feed-forward, has been identiﬁed from protein-DNA interaction data of
Saccharomyces cerevisiae[Lee 2002]. These various types of data capture different levels
of cellular response to environmental factors and contain within it information about the
underlying regulatory network structure. Much effort has been devoted to analyzing these
various data sets to reconstruct the regulatory features. A majority of the mathematical
approaches, such as clustering, independent component analysis (ICA), principal
component analysis (PCA), Boolean network, and Bayesian network have been applied
to infer biological information from high dimensional micro-array data. These methods

109

 

typically do not incorporate known regulatory information into the structure of the
models. Therefore, we developed a state space model (SSM) that integrates gene ‘
expression and gene regulatory network structure information to extract underlying
information on transcription factor activities.

Cells respond to environmental and physiological changes through an extensive
transcriptional regulatory network, which is composed of transcription factors (TF) and
genes. These transcription factors bind to the promoter regions of speciﬁc genes to either
positively or negatively regulate expression. High throughput technologies, such as
cDNA microarray, allow the measurement of expression data of the whole genome;
however, genome-wide measurement of the regulatory signals, i.e., transcription factor
activities (TFA), remains a challenge. Clustering has been applied to gene expression
data to identify co-regulated genes [Bar-Joseph et al., 2002; Eisen et al., 1998; Ramoni et
al., 2002] and Bayesian network analysis has been applied to infer regulatory networks
[Friedman et al., 2000]. The objective of this chapter is to infer TFAs from gene
expression data. The advent of the genome-wide binding assay to measure protein-DNA
interactions has helped to uncover the network structure describing the connections
between TFs and genes in Escherichia coli K12 (E. coli) [Salgado et al., 2000] and
Saccharomyces cerevisiae (yeast) [Lee et al., 2002]. Given the regulatory network
structure and gene expression proﬁles, the TFAs can be inferred with mathematical
modeling.

Several methods have been developed to infer TFAs from gene expression data. A
kinetic based approach [Nachman et al., 2004], which modeled mRNA transcription and

decay, did not include feedback from genes to TFs. Network Component Analysis (N CA)

110

 

 

[Liao et al., 2003; Kao et al., 2004], which assumed a log-linear relationship between a
gene’s expression and its regulatory signals, i.e. TFA, modeled the gene regulatory
network as multiple-input motifs. Feedback from genes to TFs within network structures,
such as in auto-regulation, feed-forward loops, the regulator chain, or the interaction
between TFs, is modeled as a “closed-loop” from the TF to the genes, without explicitly
modeling the feedback [Tran et al., 2005].

To complement existing approaches, we have developed a state-space model
(SSM) with hidden variables that explicitly models feedback in gene regulatory networks
to infer the regulatory signals from the gene expression proﬁle. SSM is a subclass of
dynamic Bayesian network (DBN). DBN and has been applied to infer the transcriptional
regulatory network from gene expression proﬁles, e.g., T-cell activation [Range] et al.,
2004; Beal et al., 2004]. Other models, such as Hidden Markov model (HMM), the
Boolean network, and linear and nonlinear auto-regression models are also subclasses of
DBN [Murphy and Mian, 1999]. SSM assumes the existence of state variables that
produce observations that are measurable, as well as hidden variables, which are state
variables that do not produce an observation. This feature of SSM is attractive for
modeling gene regulatory networks. As illustrated in Figure 5.1, a gene regulatory
network consisting of TFs and genes can be represented by a SSM. The state of each
gene produces observations, such as expression proﬁles, that can be measured with
cDNA microarray. The state of each TF is hidden, and thus, does not produce measurable
observations. The structure of the connections can be deduced from measurements of

protein-DNA interactions.

111

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

___]
G2 TF2
(A)
. G3 «—
e e
Space \

G1 G2 G3 G1 G2 G3 (3,
........ -.-.-.-..l.-.-.- -.-......-.. .-.-.-...-.-.- ___.)
Obsen atlon timer
'Spaee

{El} E2 E3 <1} E2 E3

 

 

1

T-
Transcription factor

 

 

Gl

T
Gene @ Gene expression

 

 

Figure 5.1: SSM representation of gene regulatory network. A) An example of a gene regulatory
network with two transcription factors, TF1 and TF2, and three genes, G1, G2, and G3. TF1 regulates GI
and G1 encodes TF1, which formulates auto-regulation between TF1 and G1. TF1 regulates G2, G2
encodes TF2, both TF1 and TF2 regulate G3, which formulates a feed-forward loop between TF1, G2, TF2
and G3. B) A SSM representation of the gene regulatory network illustrated in (A). The transcription
factors and genes make up the state space, whereas the observation space is comprised of gene expression

data.

 

We demonstrate that SSM can be applied to represent gene regulatory networks of known
structures and to infer the TFA from the gene expression proﬁle. We ﬁrst applied SSM to
learn the TFA from data simulated for the gene regulatory network illustrated in Figure
5.1. Then we applied the model to experimental data from E. coli transitioning its carbon
source from glucose to acetate [Kao et al., 2004] and to cell cycle data from
Saccharomyces cerevisiae [Spellman et al., 1998]. The inferred activity proﬁle for each
TF was validated either by a physical measurement (if available) or activity information
from the literature. Finally, further extensions to improve the current SSM model are

discussed.

5.2 Materials and methods

State-Space Model

State and Observation: In SSM, a sequence of observations (01, 02,...,OT) is generated
from a sequence of states (S1, 82,. . .,ST) with the following model:

S,=AS,.1+Wt (5.1)
O, = B S, + Vt (5.2)
where A deﬁnes the state transition probability P(S(|S(-1), i.e., how the state at time point
T can be determined from the state at time point T-l, and B deﬁnes the observation
probability P(O,|S,), i.e., how the observation at time point at T can be determined from
the state at time point T. W~N(0,Q) and V~N(0,R) deﬁne the Gaussian noise of the state
and observation, respectively. For convenience of notation, parameters A, B, W, V were
combined into a single parameter vector 0 = (A, B, W, V). In the SSM model, the
structure is time invariant and the parameters are also time invariant, i.e., the parameters
that determine the transition from T-l to T are the same as the parameters that govern the

113

.uxmr.e.-' L 011' m"

 

transition from T to T+1. However, the time-scale for each loop is not assumed to be the
same. For example, in the yeast dataset [Spellman et al., 1998], the time-scale for each
loop is the same (~ 7 minutes). However, in the E. coli [Kao etal., 2004] example, the 10
time points were taken at 0, 5, 15, 30, 60, 120, 180, 240, 300, 360 minutes. Therefore, the
time-scale for each loop using the ﬁrst 5 data points is not the same as for the last 5 data
points.

Parameter learning: 0 = (A, B, W, V) deﬁnes a SSM and is learned from N samples of
observation data 0 = (On (1),...., On (N)) by maximizing the likelihood of the
observation. An Expectation-Maximization (EM) algorithm is used to learn the
parameters. Starting with an initial guess of 0, we perform the E (expectation) step at
iteration k to estimate the value of states S_hat with 6x and 0 using inference; then, we
perform the M (maximization) step to maximize the likelihood of the conditional
probability P(0,S_hat|t9), such that 6i.+1=argmax(P(0,S_hat| 6)). For details of the
parameter learning, see [Murphy and Mian, 1999]. We used Bayes Net Toolbox
[Murphy, 2001] for the model computation.

State inference: After the parameters are learned with the EM algorithm from the
observation data, the value of the state variables, including the hidden variables, can be

recursively inferred with the Bayes rule:

P(s, 101,) = mo, 15,) Ems, I s,_1 )P(s,_1 | 01-1—1) (5.3)
t—l

Sampling method to generate simulated data
If the structure and parameters of a SSM are deﬁned, data can be generated with

sampling methods such as Gibbs sampling. The function of sample_bnet in Bayes Net

114

 

 

Toolbox [Murphy, 2001] was used to generate the simulated gene expression data and
TFA proﬁles.
Represent gene regulatory motifs within SSM framework
Different motifs, such as auto-regulation, feed-forward loops, multiple-inputs, and single
input, have been identiﬁed with protein-DNA interaction measurements [Lee et al.,
2002]. Protein-DNA interaction measurements identify the DNA sequences of a gene to
which a TF will bind. Once a TF binds to a DNA sequence (i.e., binding site) of a gene,
the TF regulates the transcription, and in turn the level of expression of that gene. Some
TFs bind to similar DNA sequences that may exist on many genes, while other TFs bind
to speciﬁc DNA sequences present in only one or a few genes. TFA is deﬁned as the
concentration of the active conformation of a TF that is capable of DNA binding. In the
current SSM representation of a gene regulatory network, each TF has one TFA node and
its TFA is the same for all binding sites. Therefore, in the model the level of activity of
the transcription factor (i.e., the TFA) indicates only whether a transcription factor is
activating (either positively or negatively) its genes or not. The likelihood that a gene is
activated by a transcription factor is inferred from the data as conditional probabilities.
For example, transcription factor TF1 binds to genes G1 and G2 at the same (or different)
binding sites. The activity level of TF1 is assumed to be the same for G1 and GZ,
however, the probabilities that G1 or G2 is activated by TF1 are different as deﬁned by
P(GllTFl) and P(GZITFI). In addition, the SSM assumes that there is a time delay
between binding and transcription.

Here we demonstrate that SSM is able to model these motifs, i.e., auto-regulation,

feed-forward loops, multiple-inputs, and single input. As shown in Figure 5.1, SSM is a

115

 

dynamic model composed of two parts: states and observations. State variables generate
observations that are measurable, whereas state variables that do not generate
observations are called hidden variables. A static gene regulatory motif can be
represented dynamically by connecting genes at time point T-l to the TFs they encode at
time point T, and connecting the TFs at time point T-l to the genes that they regulate at
time point T. Each TF has a TFA node. Each gene has a state node and an observation
node. Duplicating the state and observable variables of a gene will increase the
computational load. However, this limitation has an intrinsic advantage. Having the state
variable (gene) and the observation variable (gene expression) as separate entities allows
the SSM approach to take into account potential effects, such as post-transcriptional
effects and measurement noise. Ideally, if RNA decay and measurement noise did not
exist, then a gene that is transcriptionally activated would be measured at the mRNA
level to be activated. In other words, Pr (E=1 | G=1) =1. However, mRNA expression is
determined by the net effect of mRNA transcription and degradation. Effects, such as
RNA decay (post-transcriptional modiﬁcation) and noise in the microarray measurement,
could result in conditional probabilities Pr (E=1|G=1) less than 1. In other words, a gene
that is transcriptionally activated may be measured at the mRNA level to be inactivated if
mRNA decay dominates over transcription [Wang et al., 2002]. Figure 5.2 illustrates a
graphical representation of the auto-regulation, feed-forward loop, multiple-input, and

multi-component loop motifs.

116

 

 

stelZ

  
 

 

 

 

 

 

 

 

 

 

Stelz @

T-1 T
(A)autoregulation

 

 

 

 

 

 

 

2
9
9

I' 0x1 ap6 m ap6

ea c»

 

 

 

O

)

 

 

 

IROX

 

 

9

 

 

 

 

 

_ T-1 T
(B) Mum-component loop

Figure 5.2: An example and its SSM representation of the gene regulatory motifs of A) auto-
regulation. Transcription factor Ste12 binds to the promoter sequence of gene STE12, which encodes
transcription factor Ste12. B) multi-componcnt loop. Transcription factor Roxl binds to the promoter
sequence of gene YAP6, which encodes transcription factor Yap6. Transcription factor Yap6 binds to the
promoter sequence of gene ROXl , which encodes transcription factor Roxl.

117

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

[11er FPLMA [hrszre [arszzA IPL2B . L16 A I, 82113 , 822A
1 l l I | 1 j I

T-1 1'

 

 

 

 

 

 

 

Figure 5.2: An example and its SSM representation of the gene regulatory motifs of C) feed-forward
loop. Transcription factor Mcml binds to the promoter sequence of gene SW14, which encodes
transcription factor Swi4. Transcription factor Swi4 and Mcml bind to promoter sequence of gene CLBZ.
and D) multi-input motif. Transcription factors Fhll, Rap2, YapS all bind to the promoter sequences of
genes RPLZB, RPL16A, RPSZIB, RPSZZA.

118

 

 

Threshold determination

We used a function with a deﬁnable threshold Th to discretize the gene expression data.
Any gene that showed a change larger than Th, based upon log; ratio, was assigned a
discrete value of 1, or otherwise was assigned a value of 0. Thus, a threshold of 1
indicates that a 2-fold (2') change in the expression of a gene, relative to its initial state,
is signiﬁcant.

SSM requires an optimal threshold in order to obtain reliable results. To ﬁnd an
optimal threshold, we evaluate the TF A data for each TF over various thresholds between
—1 and 1. The optimal threshold, which gives the most appropriate proﬁle for each TF, is
determined by comparing the TFA results with known (e.g., measured or literature)
values. For example, the gene expression data that we used for Saccharomyces cerevisiae
was taken over 18 time points that spanned two cell cycles [Spellman et al., 1998]. The
TFs we studied in Saccharomyces cerevisiae are known to have phase-speciﬁc activity
during the cell cycle [Acme et al. 1998; Baetz et al., 1999; Kovacech et al., 1996; Lee et
al., 2002; MacKay et al., 2001; Mclnemy et al., 1997; Oehlen et al., 1996; Spellman et
al., 1998]. Therefore, we assumed that the activity proﬁle of each TF during the ﬁrst cell
cycle should be repeated in the second cell cycle. We identiﬁed the optimal threshold as

the one that predicted this cyclic behavior for all the transcription factors.

119

5.3 Results

5.3.1. Inferring TFA from simulated data

Before testing SSM with experimental data where the transcription factor activities are
unknown, we applied SSM to a simulated system. We created a simple regulatory
network (see Figure 5.1a) with the feed-forward loop and auto-regulation motifs
containing two TFs and three genes. From the known network structure, we constructed a
SSM representation of the network (see Figure 5.1b) and pre—defmed the 0 parameter,
which is shown in Table 5.1(a). Using the network reconstructed by the SSM, we
simulated the dynamic proﬁles of TFA and gene expression levels with the sampling
method discussed in the Methods section. Given the simulated gene expression proﬁle
(shown in table 5.1(c)) and the network structure, SSM learned the parameters 0 and the
TFA proﬁle. The learned parameters are shown in Table 5.1(b). They match closely to
the predeﬁned parameters in Table 5.1(a). The learned or inferred TFA was compared to
the simulated TFA. The learned TFA matched 95% of the time points (38 out of 40) of
the simulated TFA. The results of this simulation provided conﬁdence that the SSM may
be able to analyze a real system. The simulations were performed on Matlab R13,
Windows XP, on a PC with Celeron CPU 2.4 GHZ and RAM 512MB. The simulation
time was 328.5 seconds.

SSM requires sufﬁcient data to infer the parameters. We evaluated how many
data points are needed with the simulation data by varying the number of time points
from 6 to 40. Ten time points were needed to correctly infer 80% of the TFA proﬁle and

this percentage increased with the number of time points used. The sampling time needs

120

 

 

to be small enough to capture the dynamic proﬁle, which will vary with the biological

system being modeled.

Table 5.1 (a) Parameters of the simulated network shown in Figure 5.1

 

1.1nitial probabilities of (TF1, TF2, G1, G2, G3)

[P(x=1),P(x=2)] = [0.5, 0.5]

 

2.Conditional probabilities between gene (G) and its expression (E)

[P(E=1|G=1),P(E=2|G=l),P(E=1|G=2),P(E=2|G=2)= [0.9, 0.1, 0.1, 0.9]

 

3.Conditiona1 probabilities between transcription factors (TF) and genes (G)

TF1—>Gl; TF1->G2
[P(G=1 |TF=1),P(G=2|TF=1),P(G=1|TF=2),P(G=2|TF=2)]
= [0.9, 0.1, 0.1, 0.9]

(TF1+TF2)->G(3)
P(G=1|TF1=1,TF2=1)=O.9
P(G=1 |TF1=1,TF2=2)=0.7

P(G=1 |TF=2,TF2=1)=0.7
P(G=1 |TF=2,TF2=2)=0.01

 

4.Conditional probabilities between genes (G) and transcription factors (TF)

G1->TF(1);G2->TF2
P(TF=1|G=1)=O.9
P(TF=2|G=1)=O.1
P(TF=1 |G=2)=0.1
P(TF=2|G=2)=0.9

 

 

 

121

Table 5.1 (b) Learned parameters of the simulated network shown in Figure 5.1

 

 

1.Initial probabilities of (T F1, TF2, G1, G2, G3)

[P(TF1=1),P(TF1=2)] = [0.25, 0.75]
[P(TF2=1),P(TF2=2)] = [0.55, 0.45]
[P(Gl=1),P(G1=2)] = [049,051]
[P(GZ=1), P(G2=2)] = [045,055]
[P(G3=l),P(G3=2)] = [0.44, 0.56]

 

2.Conditional probabilities between gene (G) and its expression (E)

[P(E1=1|G1=l),P(E1=2|G1=1),P(E1=1|Gl=2),P(El=2|G1=2)= [0.83, 0.17, 0.13, 0.87]
[P(E2=1|G2=1),P(E2=2|G2=l),P(E2=l|G2=2),P(E2=2|GZ=2)= [0.95, 0.05, 0.12, 0.88]
[P(E3=1|G3=l),P(E3=2|G3=l),P(E3=l|G3=2),P(E3=2|G3=2)= [0.91, 009,015,085]

 

 

3.Conditional probabilities between transcription factors (TF) and genes (G)

TF1->Gl
[P(G=1 |TF=1),P(G=2|TF=1),P(G=1lTF=2),P(G=2|TF=2)]
= [0.94, 0.06, 0.05, 0.95]

TF1->G2
[P(G=1 |TF=1),P(G=2|TF=1),P(G=1|TF=2),P(G=2|TF=2)]
= [0.83, 017,007,093]

(TF1+TF2)->G(3)

P(G=1|TF1=1,TF2=1)=0.76
P(G=1|TF1=1,TF2=2)=O.86
P(G=1 lTF=2,TF2=1)=O.15
P(G=1 lTF=2,TF2=2)=0.04

 

4.Conditional probabilities between genes (G) and transcription factors (TF)

G1->TF(1)
P(TF=1|G=1)=0.92
P(TF=2|G=1)=0.08
P(TF=1 |G=2)=0.1

P(TF=2|G=2)=0.9

G2->TF(2)

P(TF=1|G=1)=O.44
P(TF=2|G=1)=0.56
P(TF=1 |G=2)=0.34
P(TF=2|G=2)=0.66

 

 

 

122

 

 

 

123

 

Table 5.1 (c) Simulated gene expression data

Note: T: time point, E1, E2, E3 are expression data of gene 1, 2, 3 respectively. 0:

inactive. 1: active.

 

Table 5.2 Eight genes regulated by ARCA and CRP

Gene TFs
Acs +
SucD -
SucC

Sth -
SdhA -
SucB -
mdh -
- +
Note: (+) positive control, (-) negative control

 

5.3.2. E. coli

We applied SSM to a model system: E. coli transitioning from glucose to acetate as a
carbon source. The gene expression data was obtained ﬁ'om [Kao et al., 2004] and the
regulatory information is available from the regulonDB database. The SSM model
included two TFs, CRP and ArcA, and eight of the genes (shown in Table 5.2) that are
regulated by the two TFs. The dynamic proﬁles of these two TFs were learned from the
gene expression levels of the eight genes. In [Kao et al., 2004], cAMP was measured to
indirectly indicate the activity of CRP, since the activation of CRP requires the binding of
CAMP. From the measurement of CAMP (see Figure 5 in Kao 2004), we can see that the
level decreased from its initial high but remained upregulated, around 10 fold above the
basal level, ﬁ'om the second time point onward. Indeed, SSM inferred that CRP is active
from the second point onwards (see Figure 5.3). F or the ﬁrst time point, however, CRP is
predicted to be inactive. The expression level at the ﬁrst time point is the reference; all
subsequent expression levels are measured relative to the expression level at the ﬁrst time
point. SSM also identiﬁed that ArcA was inactive for the ﬁrst four time points and active
for the remaining 6 time points (see Figure 5.3). An ArcA measurement is not readily

obtainable for comparison.

124

___-4"

 

AKA SSM Proﬁle CRP SSM Proﬁle

 

 

 

 

 

 

 

 

 

‘70 2 3 :ﬁ ; e e 6 “e; e e e e

at; I ii 2r

f; lei . . TA 11 . .

" 0 2 4 6 " 0 2 4 6
innerllours) Tilt-cilia")

 

 

 

 

Figure 5.3: The results of using a SSM to analyze an E. coli system. SSM predicts that CRP is inactive
initially and then active from the second time point onwards, as expected item the CAMP measurement
[Kao 2004]. SSM predicts that ArcA is inactive for the ﬁrst four time points, but is activated for the last six
time points.

5.3.3. Saccharomyces Cerevisiae

Next, we applied SSM to model several of the common regulatory motifs in
Saccharomyces cerevisiae. According to Lee et al. (2002), 39 out of the 106 studied
regulators were involved in feed forward loops and 10 of the 106 were involved in auto-
regulation. Lee et al. (2002) also found that in combinations of two or more of the 106
regulators, 295 were involved in multi-input motifs. Therefore, we selected TFs with the
aforementioned regulatory motifs, and whose activities could be veriﬁed by the literature.
Namely, Mle, Swi4, Swi5, and Swi6, which are well studied and well—understood
[Acme et al., 1998; Baetz et al., 1999; Kovacech et al., 1996; Lee et al., 2002; MacKay et
al., 2001; McInemy et al., 1997; Oehlen et al., 1996; Spellman et al., 1998] TFs. SSM
was applied to analyze the system illustrated in Figure 5.4. The gene regulatory motifs,
feed-forward (Mcml 9 SW15 9 Swi5 9 YJL160C + PIRl + PIR3), multiple-input
(Mcml + Swi4 + Swi6 9 CLA4 + SW14 + LSM4 + PCLl + SIMl + GIN4 +
YDR509W), and auto-regulation, (Swi5 9 SW15; Swi4 9 SW14) were obtained from

[Lee et al. 2002].

125

 

 

 

 

 

masters @Fo‘mrag

ﬂow erasure
~ (sneer emanate

Figure 5.4: A SSM representation of the Saccharomyces cerevisiae (yeast) system studied. The gene
regulatory motifs were taken ﬁom [Lee et al. 2002], including feed-forward (Mcml->SW15->Swi5-
>YJL160C+PIR1+PIRB), multiple-input (Mcml + Swi4 + Swi6 -> CLA4 +
SW14+LSM4+PCL1+SIM1+GIN4+YDR509W), auto-regulation (Swi5->SW15; Swi4->SWI4).

 

 

 

 

 

We evaluated the ability of the SSM to infer TFA proﬁles during the cell cycle of
Saccharomyces cerevisiae. Lee et al. (2002) performed a genome-wide binding analysis
to obtain the connectivity information between the TFs and genes in yeast. We coupled
the connectivity information [Lee et al., 2002] with gene expression data taken from
yeast cultures synchronized by or-factor arrest [Spellman et al., 1998]. Of the four
synchronization methods used by Spellman et al., we chose the data obtained with the or-
factor method because it presented the least amount of missing data for the genes studied.
With the (Jr-factor arrest method, the data were sampled every 7 minutes, which captured
approximately two cell cycles with the 18 time points [Spellman et al., 1998]. This
provided 9 time points (~ 63 minutes) for each cell cycle. Each phase (i.e., M, G1, S, and
G2) within a cell cycle takes about 15 minutes [Liao et al., 2003]. In other words, one cell

cycle is about 60 minutes. Therefore, a 7 minute sampling time is small enough to

126

capture the phase change proﬁle within a cell cycle. The binding motifs and gene
expression data were used by SSM to infer the TFAs.

The SSM predictions are consistent with the literature results. SSM inferred that
Mle is active during the G2/M/Gl phases, Swi4 and Swi6 are active during the GI and
S phases, and Swi5 is active during the M and G1 phases. We conﬁrmed the predictions
(Figure 5.5) made by SSM with the literature. Past studies found that Mle induced the
expression of many genes during the G2/M/G1 phases. High transcription of both FAR]
and STE2 in the G2/M phases requires Mle [Oehlen et al., 1996]. Mle is also
known to induce the transcription of CLN3, SW14, and CDC6 at the M/Gl boundary
[Spellman et al., 1998]. In contrast, Swi4 induced genes in the G1 and 8 phases. Swi4 is
the DNA binding component of SBF [Baetz et al., 1999]. Baetz et a1. (1999) indicated
that SBF promotes the induction of gene expression at the Gl/S-phase transition of the
mitotic cell cycle. MacKay et al. (2000) also showed that a complex containing Swi4
induces CLNl and CLN2 transcription in the late G1 and drives the transition to S.
Similarly, Swi6 induced genes during the G1 and 8 phases. The activity of Swi6 is very
similar to that of Swi4, and these two factors are known to be connected [Baetz et al.,
1999; MacKay et al., 2000]. MacKay et al. (2000) showed that a Swi4-Swi6 complex
induces CLNl and CLN2 transcription in late G1 until S. Baetz et al. (1999) also
suggested that the DNA binding domain of Swi4 is inaccessible in the full-length protein
when not complexed with Swi6. In contrast, Swi5 was activated during the M phase and
the M/Gl boundary. Kovacech et al. (1996) found that Swi5 is partially responsible for

the peak in EGT2 expression during late M and early G1 phases. Aerne et al. (1998)

127

 

 

 

found that SwiS regulates the expression of PCL2, PCL9, and the SICl Cdk inhibitor in

the late M phase.

‘ M’GlGl S G2 MIvI/Gl G1 S G2 M MJ'GIGI S G2 IVIIvi/Gl G1 S G2 M
_. l .4

OIIIIIIITTIIIIIIIII

 

 

 

 

OIIIIIIIIIITIIIIIII

1357911131517 1357911131517

1

“Mi'GlGl S G3 MM/Gl G1 3 (33 M NUGIGI S G2 MM/Gl G1 S G2 M
2 q M 2 -
1 ‘ l -( W

Ollﬂllllllllllrllll 0

 

 

 

 

rilllllllllllllrll

1357911131517 1357911131517

Figure 5.5: The results of using a SSM to analyze a yeast system. The SSM predictions closely followed
the experimental trends and phases in which each transcription factor is known to be active or inactive
[Acme et al., 1998; Baetz et al., 1999; Kovacech et al., 1996; Lee et al., 2002; MacKay et al., 2001;
McInemy et al., 1997; Oehlen et al., 1996; Spellman et al., 1998]. SSM inferred that Mle is active
during the G2/M/Gl phases, that Swi4 and Swi6 are active during the G1 and S phases, and that Swi5 is
active during the M and G1 phases.

In summary, SSM was applied to three systems, a simulated gene regulatory network and
two experimental systems, E. coli and S. cerevisiae. The simulation study conﬁrmed the
ability of SSM to infer network parameters and state values from observational data.
Application of SSM to the experimental systems illustrates that TFAs can be inferred

from the gene expression data given the regulatory network structure.

5.4 Discussion

SSM is a subclass of DBN. DBN has been applied to infer the structure of regulatory

networks from temporal gene expression data [Rangel et al., 2004; Beal et al., 2005; Ong
128

 

 

p7

et al., 2002; Perrin et al., 2003; Nachman et al., 2004]. Ong et al. (2002) explicitly
included operons as hidden variables in the model to facilitate the incorporation of a
priori biological knowledge of the co-expressed genes to improve the quality of the
analysis. Here, we explicitly included TFs in the model and included the connections
between TFs and genes obtained through binding analysis [Lee etal., 2002]. SSM has the
potential to infer unmeasurable, as well as unmeasured, connections and events.

The beneﬁt of the SSM over existing models (e.g., NCA [Liao et al., 2003],
kinetic modeling [Nachman et al., 2004) is that all gene regulatory motifs, including
feedback from gene to TFs, are explicitly modeled. NCA implicitly models the feedback
from gene to TFs [Tran et al., 2005]. In kinetic modeling [Nachman et al., 2004],
feedback from gene to TFs is not incorporated. In contrast, SSM can explicitly model the
feedback from gene to TFs, such as the auto-regulatory motif. This facilitates the
incorporation of domain or experimental knowledge. For example, if a TF is
experimentally knocked—out or silenced, the SSM approach could easily incorporate this
information for the auto-regulatory motif.

In SSM, the nonlinear relationship between the TFs and genes are quantiﬁed with
conditional probabilities, i.e. P(0,|S,-,). By using conditional probabilities, SSM does not
presuppose a relationship between the TFs and genes in the model, whereas NCA
assumes a log-linear relationship and kinetic modeling assumes a form for the rate law,
such as Michaelis-Menten kinetics. Conversely, NCA [Liao et al., 2003] and kinetic
modeling [Nachman et al., 2004] can infer continuous proﬁles of TFA. Another
advantage of kinetic modeling is the ability to explicitly model both mRNA transcription

and decay. In the current application of SSM the gene expression data are discretized,

129

thus allowing us to infer when the TFAs are active. This was sufﬁcient to allow
veriﬁcation of the model predictions with the literature. F or example, the model predicted
the phase in the cell cycle in which the genes that are regulated by a TF are activated,
which can be compared with the phase of the cell cycle in which the genes are known to
be activated in the literature.

The SSM assumes there is a time delay between binding and transcription. By
considering the state values at time T as a function of the state values at time T-l, the
SSM implicitly assumes a time delay for all TF effects on gene expression. In other
words, although the binding of a TF to the DNA sequence of a gene may occur quickly
[MeAdams and Arkin, 2002], there is a time offset between the binding of the TF to the
DNA sequence of a gene and the onset of transcription. This time offset ranges between
minutes to hours [Kersberg, 2004]. It has been shown that incorporating a time delay in
modeling gene regulatory networks is critical to inferring the oscillatory behavior of NF-
kB [Monk, 2003]. We further evaluated this assumption by allowing the genes to be
regulated by the current TFA in the yeast dataset. Without the time delay, the cyclic TFA
could not be inferred. This, in addition to the previous study [Monk, 2003], suggests that
this biologically relevant time delay [Kersberg, 2004] must be incorporated in the model
to accurately infer the TF A proﬁles. In some cases, if the actual time delay is on the order
of minutes and the measurements are taken on the order of hours, then considerable error
would be introduced. In those cases, it would be more appropriate to incorporate the
connection between the TFS and genes in the same time slice.

In the simulation study, the TFA inferred by the SSM matched the simulated TFA

well but not exactly. The mismatches may be due in part to the EM algorithm being a

130

 

T

 

local optimization method [Ong et al., 2002], in other words, the algorithm cannot
guarantee a TFA of (global) maximal likelihood. The optimization could be improved by
either running EM multiple times from different starting points or using a global search
algorithm, such as Markov Chain Monte Carlo (MCMC) [Murphy 2001].

The SSM model determines an optimal threshold value for discretizing the gene
expression data based upon a priori knowledge of the TFA. If no a priori knowledge is
available for the TFA dynamics, this can be addressed one of two ways. In one approach,
the TFAs could be estimated from other approaches, e. g. NCA, and the estimated TFA
could be used to determine the threshold value. In the other approach, the optimal
threshold could be determined in the SSM by including the threshold value (Th) as a part
of the parameter learning process, i.e., in the parameters 0 = (A, B, W, V, Th). How the
observation data O(Th) = (Om (1),. . .., Om (N)) is discretized depends on the threshold
value, e.g., a very high threshold value will set all the genes in the inactive state while a
very low threshold value will set all the genes in the active state. The Expectation-
Maximization (EM) algorithm could be used to learn the parameters. Starting with an
initial guess of 0, we can perform the E (expectation) step at iteration k to estimate the
value of states S_hat with 0,. and O(Th) using inference; then, we can perform the M
(maximization) step to maximize the likelihood of the conditional probability
P(0(Th),S_hat|6), such that 61+1=argmax(P(0(Th),S_hat|6)). Therefore, the parameters,
including the threshold value, can be inferred by maximizing the probability of the
observation and state values given the inferred parameters (e.g., conditional

probabilities).

131

 

 

 

The current SSM infers the most probable model, given the observed data, by
approximating the underlying structure of the noise to be Gaussian [Murphy and Mian,
1999; Perrin et al., 2003]. Gene expression is an inherently stochastic phenomenon
[McAdams and Arkin, 2002]. SSM modeled the regulatory networks stochastically using
conditional probabilities. This probabilistic SSM may capture some of the stochastic
nature of the gene regulatory network, but an accurate representation of the stochasticity
requires further understanding of the underlying structure of the noise. Without knowing
the structure of the noise, studies have assumed it to be Gaussian [Perrin et al., 2003].

The current SSM model could be extended to incorporate a step to learn the
structure before inferring the TFAs by searching for a network that gives the maximal
likelihood against the observation. The structural information obtained from the binding
analysis could be used to construct the initial network as a starting point for the search
[Nachman et al. 2004]. Alternatively, the connections indicated by the interaction data
could be used to deﬁne a priori probabilities of the connections in the network
[Hartemink et al., 2002]. Thus, the connections that are supported by the interaction
measurements would have a higher likelihood of being valid connections in the network
than the unsupported connections. By including a step to learn the structure, it could help
reﬁne the network by inferring the interactions that are unmeasurable or missing due to
error (noise) in the measurement. A more accurate network provides more conﬁdence to
the inferred TFA proﬁle. In addition, the model size was limited by the computational
tool that was used, namely Bayes Net Toolbox [Murphy, 2001], which is on a Matlab

platform. We are currently working on developing an executable SSM in a CH version

132

 

 

 

..,...

of Bayes Net Toolbox, Probabilistic Networks Library

http://www.intel.corn/techno]ogy/computing/pnl/ to handle larger model sizes.

133

(PNL)

 

 

“- .' 1

 

CHAPTER 6 EXTENSIONS TO IMPROVE THE T IPS©
FRAMEWORK

6.1 Introduction

In the previous chapters, we introduced the T [PS© framework to identify genes and
pathways relevant to a phenotype, e. g. cytotoxicity. This chapter discusses improvements
to the current T IPS© framework. Improvements include hierarchically integrating
phenotypic, metabolic and gene expression proﬁles with genomic function information,
inferring pathways that confer a phenotype involving multiples metabolites and

developing a dynamic modeling. Preliminary results are presented in this chapter.

6.2 A Hierarchical Approach Employing Metabolic and Gene
Expression Proﬁles to Identify the Pathways that Confer Cytotoxicity in
HepG2 Cells

6.2.1 Introduction

In the TIPS© approach introduced in chapter 2 to chapter 4, genes relevant to a
phenotype e.g. cytotoxicity were selected with GA/PLS and ICA without incorporating
metabolic proﬁles and functional information of the genes. It is the objective of this
section to develop an approach to hierarchically integrate phenotypic, metabolic and gene
expression proﬁles with a priori knowledge to select genes relevant to a phenotype. The
approach was applied to the cell culture system introduced in chapter 4, namely HepG2

cells exposed to FFAs and TNF-or, to study the genes involved in palmitate induced

cytotoxicity.

134

 

 

 

Metabolite Gene sets, groups or

 

 

 

 

 

 

 

 

 

 

 

 

 

Measurement expression proﬁle
Correlation Correlation
/ FDA (GSEA)

Identification of Identification of

metabolites gene sets related

related to to cytotoxicity
MB-PL

 

Identification of
genes related be
the metabolites

 

 

 

Figure 6.1 An overview of the hierarchical approach. First, important metabolic ﬂuxes relevant to a
phenotype were identiﬁed with discriminant analysis. Second, to identify the transcriptionally altered
pathways, gene set enrichment analysis (GSEA) was applied to the cDNA microarray data. Finally, the
expression of the enriched gene sets and the metabolic proﬁles were integrated with a multi-block partial
least squares analysis (MBPLS) regression model to identify the genes and gene sets that regulate the
metabolic pathways identiﬁed in step 1. This ﬁgure was prepared by Shireesh Srivastava.

An overview of the hierarchical framework that was developed is shown in Figure
6.1. The framework consisted of three stages. First, the metabolic ﬂuxes of glucose, fatty
acid and amino acid metabolism were measured to identify the metabolic changes
responsible for producing the cytotoxic phenotype. Different phenotypes induced by
TNF-or and FFA were characterized with Fisher’s Discriminant Analysis (FDA) (Chan
2003 c) to identify the metabolic ﬂuxes that contributed most to separating the
phenotypes. Ketone body (e.g., beta-hydroxybutyrate (BOH) and acetoacetate) release
and TG accumulation were identiﬁed to be the most important in separating the
phenotypes, suggesting their involvement in inducing the cytotoxicity. However, the
FDA of the metabolic changes did not provide information on the changes in the
signaling and gene regulatory pathways. To obtain this missing information, the genomic

responses were measured using cDNA microarrays, and analyzed with the gene set

135

 

 

enrichment analysis (GSEA) [Subramanian 2005]. GSEA integrates a priori knowledge
of a gene’s functional role with the expression data to detect concerted expression
changes in a set of genes responsible for producing a phenotype. GSEA indicated the
involvement of mitochondria-related, oxidative stress-related and FFA metabolism
pathways in inducing the cytotoxic phenotype. GSEA identiﬁes the possible functional
pathways, but does not provide a quantitative model to predict the effects of individual
genes on the phenotype. Therefore, in the ﬁnal step, the gene expression and metabolic
ﬂux proﬁles were integrated with multi-block partial least squares analysis (MBPLS)
[Hwang 2004] to identify the genes most relevant to the metabolic ﬂuxes which
correlated most to the cytotoxicity. In the MBPLS model, the different gene sets
identiﬁed to be enriched by GSEA formed the blocks of the MBPLS. This ensured that
the blocks were separated according to the functional role of the genes. Block scores
were extracted from each block to predict the metabolic ﬂuxes. Some of the identiﬁed
genes were experimentally perturbed to validate their predicted roles in regulating the

cytotoxicity.

6.2.2 Materials and Methods

Fisher’s Discriminant Analysis

FDA was applied to identify the measured metabolic ﬂuxes that contributed to the
separation of the different phenotypes (cytotoxic versus nontoxic). FDA was applied to
15 samples belonging to 5 groups with 27 metabolic ﬂux measurements for each sample.
FDA identiﬁes the projection axes that maximize the ratio of the between-group and the

within-group variations. Details on the FDA algorithm can be found in [Chan 20030].

136

Two axes T1 and T2 were extracted; each is a linear combination of the 27 metabolic

ﬂuxes.

T = 2W,- x Met(i) (6.1)

W,- provides the contribution of each metabolic ﬂux in separating the phenotypes in the
T1 and T2 directions.

Gene Set Enrichment analysis (GSEA) of the gene data

In order to evaluate the genetic response of the cells to the treatments, cDNA microarray
analyses were conducted. The expression levels of 19,522 genes were measured, and the
data was analyzed using GSEA. GSEA aims to identify the gene sets whose coordinated
change differentiates two phenotypes. The software GSEA-P, from
http://www.broad.harvard.edu/gseal, was used for the GSEA analysis. Thirty seven gene
sets from the molecular signature database, MsigDB [Subramanian 2005] functional gene
group c2 were selected, as shown in Table 6.1. These sets included 10 metabolic
pathways, 26 signal pathways and 1 cellular component. An enrichment score of a gene
set S characterizes whether the set of genes randomly distributed across the list or falls
mainly at the bottom or top of the list. The null hypothesis that a gene set S randomly
distributes across the ranked gene list was tested with Kolmogorov-Smimov test, with the
statistical signiﬁcance value estimated through 1000 random permutations of the
phenotype label. The gene sets with a high signiﬁcance of enrichment are considered
important in separating the distinct phenotypes. Here, the expressions of 19,522 genes
were measured and ranked based upon the comparison between the palmitate treatment
and the control using t-test.

Integrating the gene expression and metabolic ﬂux proﬁles

137

 

 

MBPLS is a hierarchical multivariate analysis method [MacGregor 1994; Lopes 2002],
where the variables are divided into different blocks based upon a priori knowledge, for
example according to different stages of a process [MacGregor 1994] or different
metabolic pathways in a cell [Hwang 2004]. In this study, we separated the genes into
different blocks based upon their functional roles in different pathways. This facilitates
the identiﬁcation of an important block (gene set) to a metabolic ﬂux, and then further
identiﬁes the important genes within the block. Gene expression X is divided into K
blocks X = [X1,X2,. . .,Xk], block scores B = [b1, b2, ...bk] are extracted from each block

Table 6.1: Gene sets used in the GSEA analysis

 

Metabolic pathway

 

 

 

Glycolysis fatty acid metabolism
Oxidative

TCA phosphorylation

PPP pathway beta-oxdiation

Fructose Sphingoglycolipid

Fatty acid biosynthesis Glutathione

8m pathway

Bcl2family_and_reg_network Map_kinase

CAPASE Mapkk

Cell death Ppara

ceramidePathway pparG

crebPathway S1 P signaling

NFKB Programmed cell death
ERK1 Erk2 Mapk

P38 pathway

Akt JNK

EGF TNFalpha

EIF2 Stress Pathway

E|F4 TNFR1 pathway

Electron Transport Chain TNFR2 pathways

ERK pathway

 

Cellular Comﬂment

 

Mitochondria

respectively. A PLS regression model was then constructed to map B to a metabolic ﬂux.

For details of the MBPLS algorithm, please refer [Hwang 2004]. Important genes sets

138

 

 

were identiﬁed by evaluating the weights of each block and importance of individual
genes were then further identiﬁed by evaluating the regression coefﬁcients of the genes
within the block. The N-way toolbox (http://www.models.kvl.dk/source/nwgytoolbox)

[Anderson 2000] was applied to conduct the MBPLS modeling.

6.2.3 Results

Metabolic ﬂux important in separating cytotoxic phenotype

FDA was applied to identify the metabolic ﬂuxes responsible for separating the
phenotypic response (cytotoxic versus nontoxic as deﬁned by the level of lactate
dehydrogenase (LDH) release). The metabolic ﬂux data were obtained by Shireesh
Srivastava. As shown in Figure 6.2 (a), palmitate cultured samples were separated from
the rest in a two dimensional space deﬁned by FDA projections T1 and T2. Furthermore,
as shown in Figure 6.2 (b) metabolic ﬂuxes of BOH, acetoacetate, intracellular TG
accumulation and amino acids such as aspartate, alanine, ornithine were the most
signiﬁcant in separating the cytotoxic (palmitate) from the nontoxic phenotype (control,
unsaturated fatty acids, TNF-or and their combinations). Intracellular TG, ornithine,
aspartate have the highest coefﬁcients in separating palmitate from the other conditions in
the T2 direction, while BOH, acetoacetate and alanine separated the two groups in the T1
direction, which suggested differences in the metabolic state between the phenotypes
exist mainly in the aforementioned metabolic ﬂuxes such as TG, BOH and acetoacetate.
While FDA can identify the important metabolic ﬂuxes to the phenotype, it does not
provide the direction of the relation, i.e., whether the variables are correlated positively or
negatively. To obtain the direction of the relationships, we performed correlation

analysis. The (Pearson’s) correlation coefﬁcients (r) between cytotoxicity and metabolic

139

ﬂuxes are shown in Table 6.2. BOH, acetoacetate, alanine release had positive
correlations. Acetoacetate and BOH are produced from acetyl-CoA, the metabolic
intermediate of fatty acid oxidation. Increased fatty acid oxidation is known to increase
oxidative stress and cell death [Gill 2006, Sanyal 2002]. TG accumulation, ornithine and
aspartate had negative correlations. TG accumulation has been found to protect cells from
palmitate induced lipoxicity [Listenberger 2003]. Ornithine and aspartate may affect fatty
acid metabolism through the TCA cycle. Thus, both the FDA and correlation analyses
suggested that metabolic ﬂuxes relevant to fatty acid oxidation are positively associated
with the palmitate-induced cytotoxicity, while metabolic ﬂuxes relevant to the

accumulation of TG are negatively correlated to cytotoxicity.

140

 

10

 

 

 

 

 

OH
.8
AP
xC)
XL

 

 

 

 

 

 

 

 

 

 

 

 

T1

 

 

 

0'4 _ o Glucose
N I Lacate
_l Asp a Acec

Acac .’ BOH X BOH
B ' . Glycerol
A J o FFA
0 X + T6
r I 1 r {P .., i I r ' 80'
(i o 0.2 0.4 0.8 - Gin
L1 ' His
-o.2 ~ ' ‘59
o Glu
6'5!
TG -NH3
\ -0.4 i , NO
+ Thr
-Ala
-0.6 a - Pro
9 Cys
.. Tyr
-08 d a Val

>l< .
Om / Met
a: Om

o Lys
. lie

-Leu
- Phe

 

 

 

 

 

 

 

Figure 6.2. Fisher’s discriminant analysis of metabolites. (a) Two dimensional score (T1 and T2) plot of
the samples, H: control, B: BSA, P: palmitate culture, 0: oleate culture, L: linoleate culture. (b) Loading
plot of the ﬁrst two dimensions for the metabolites.

141

Table 6.2 Correlation between the metabolic ﬂuxes and the LDH release

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Metabolite Correlation
Beta-

hydroxybutyrate 0.96
Acetoacetate 0.74
Lactate 0.47
Glutamate 0.37
Alanine 0.27
Fatty acid 0.25
NH4 (iri 0.19
Serine 0.14
Histidine -0.09
Aspartate -0.09
Triglyceride -0.33
Glutamine -0.35
Arginine -0.5
Lysine -0.51
Proline -0.52
Glycine —0.55
Tyrosine —0.55
lsoleucine -0.56
Leucine -0.6
Threonine -0.6
Ornithine -0.61
Valine -0.62
Phenylalanine -0.64
Methionine -0.69
Glycerol -0.7
Cysteine -0.84

 

 

 

Gene Sets Enriched by Palmitate

37 gene sets involved in a variety of cellular processes and organelles, such as fatty acid
metabolism, cell death, TNF-a signaling, etc., were evaluated to identify which of these
processes were associated with palmitate-induced cytotoxicity (Table 6.1). Of the 37
gene sets evaluated, 14 were found to be signiﬁcantly enriched with nominal p values

less than 0.05, and shown in Table 6.3. The pathways that were enriched included

142

electron transport chain (ETC), fatty acid metabolism, glycolysis, oxidative
phosphorylation, ROS, the pentose phosphate pathway (PPP), cell death, fatty acid beta-
oxidation, TCA cycle, ﬁ'uctose metabolism, glutathione, and the ERK l, ERK 2, MAP
kinase pathways. Therefore, it suggested that the altered fatty acid metabolism may be
associated with oxidative stress related pathways such as ROS, glutathione, oxidative
phosphorylation and ETC to induce cytotoxicity. Interestingly, the gene set belonging to
sphingolipid (ceramide) metabolism was not selected, suggesting that ceramide
metabolism does not play an important role in the observed toxicity. Notably, all the
enriched gene sets belong to the mitochondria, for example fatty acid beta-oxidation,
electron transport chain, oxidative phosphorylation, indicating a central role of this
organelle in the observed cytotoxicity. To further support this ﬁnding, the gene set of
mitochondria with 229 genes, deﬁned according to MsigDB, was found to be
signiﬁcantly changed in the palmitate-treated cells, with a nominal p value of 0.00189.
Genes Important to Cytotoxicity

To identify the genes that regulate the metabolic functions most closely associated with
cytotoxicity, a MBPLS model was developed to predict the metabolic processes based
upon the expression data of the gene sets identiﬁed by GSEA. Ketogenesis was identiﬁed
to be positively related to LDH release by the FDA and correlation analysis. Therefore, a
MBPLS model was developed to predict BOH based upon the expression proﬁles of the
14 enriched gene sets shown in Table 6.3. Cytoxicity-related functional pathways were

identiﬁed by evaluating the block weights. Based upon the block weigth of the MBPLS

143

Table 6.3 Enriched Gene sets in GSEA analysis

 

 

NOM p-

Gene Sets SIZE NES val
Electron Transport Chain 43 -1.998 0
Fatty acid metabolism 33 -1.824 0
Glycolysis 55 -1 .765 0
Oxidative
Phosphorylation 30 -1 .89 0.004
ROS 26 -1 .741 0.009
PPP pathway 15 -1.644 0.01
Cell death 15 -1.631 0.01
ERK pathway 31 -1.62 0.016
Fatty acids beta-
oxidation 7 -1 .58 0.02
Fatty acid biosynthesis 5 -1.574 0.021
TCA 15 -1 .631 0.023
Fructose 19 -1 .599 0.026
ERK1 Erk2 Mapk
pathway 29 -1 .542 0.034
Glutathione 23 -1 .52 0.036
Mitochondria 229 -1 .945 0.002

 

Note: Size: number of genes in the gene set; NES: normalized enrichment score; NOM p-val:
nominal P value.

identiﬁed by evaluating the block weights. Based upon the block weight of the MBPLS
model, it was identiﬁed that functional groups such as glycolysis, oxidative
phosphorylation, ETC, ERK, ROS, glutathione, and fatty acid metabolism had relatively
higher weights, suggesting the involvement of these pathways in inducing the
cytotoxicity. The genes relevant to these functional groups were identiﬁed by evaluating
the regression coefﬁcients of the individual genes, and are discussed below.
Identiﬁcation and literature evaluation of the cytotoxic pathways through BOH
BOH and acetoacetate were identiﬁed to be important in separating the phenotypes.
Increased production of ketone bodies, such as BOH and acetoacetate, were observed in
the palmitate treated cells. Increased fatty acid oxidation has been shown to be associated

with oxidative stress [Gill 2006, Sanya 2003]. Thus, the genes with large (positive or

144

negative) regression coefﬁcients to BOH release, for example, could suggest pathways

relevant to the cytotoxicity. The regression coefﬁcients of the genes were inspected to

 

identify the individual genes that were important. The genes with the largest regression
coefﬁcients are listed in Table 6.4. Genes relevant to ROS production, fatty acid
metabolism, and detoxiﬁcation of lipid peroxidation products were found to have high
regression coefﬁcients. Genes involved in ETC and oxidative phosphorylation including
NADH dehydrogenase with positive regression coefﬁcients, indicated a positive role in
the cytotoxicity. ROS are produced during the electron transfer and oxidative

phosphorylation process. NADH dehydrogenase complex, or complex I of the electron

 

transport chain, is a main source of superoxide production [Moller 2001].

Table 6.4 Genes selected by MBPLS model

 

Num VAID GenName

933 CASP8 and FADD-like apoptosis regulator
1088 glutathione S-transferase M1
1095 Glucose-6-phosphatase, catalytic
1160 mitogen-activated protein kinase 3
1 172 dihydrolipoamide dehydrogenase
1197 cytochrome P450, family 2, subfamily A, polypeptide 7
1222 clusterin (corn plement lysis inhibitor, SP-40)
1247 cytochrome P450, family 2, subfamily E, polypeptide 1
1319 aldehyde dehydrogenase 3 family, member 82
1322 alcohol dehydrogenase IB (class I), beta polypeptide
1332 acyi-Coenzyme A dehydrogenase, C-4 to C-12 straight chain
1341 ATPase, H+ transporting, lysosomal 42kDa, V1 subunit C, isoform 1
1342 ATPase, Cu++ transporting, alpha polypeptide

nuclear factor of kappa light polypeptide gene enhancer in B-cells 1

1 3 1 554 (p105)
14 1650 TNF receptor-associated factor 2 [‘I’55353,NM_021 138.3,7186]
15 2024 forkhead box M1 [AA136566,NM_202003.1.2305]
16 2056 TNF receptor-associated factor 3 [AA504259,NM_145725.1 .71 87]
17 2111 brain and reproductive organ-expressed (TNFRSF1A modulator)

a—l
OOG‘JODU'leN-‘O

‘4
NA

145

Table 6.4 (Continued)

1 8
1 9
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50

51
52
53
54
55
56
57
58
59
60
61
62
63

54

2152
2177
2179
2201
2202
2244
2408
2427
2510
2541
3926
4111
4824
5100
5110
5146
5147
5148
5186
5532
5640
5982
6532
7320
7546
8188
8331
8512
8692
8749
8959
9089
9112

9148
9160
9269
9282
9404
9456
10037
10252
10378
10441
10486
11001
11072

12701

peroxiredoxin 2 [H68845,NM_005809.4,7001]

HIV-1 Rev binding protein [AA485958,NM_004504.3,3267]

Fas (TNFRSF6)—associated via death domain

isocitrate dehydrogenase 3 (NAD+) gamma

caspase 8, apoptosis-related cysteine protease

glutathione S-transferase M4 [AA486570,NM_000850.3,2948]
v-raf murine sarcoma 3611 viral oncogene homolog

neutrophil cytosolie factor 1 (47kDa)

FOS-like antigen 2 ['l’58873,NM_005253.3,2355]

zinc finger protein 118 [N57658,NM_006955.1,7558]

(gF) insulin-like growth factor 1 receptor_(AA256532,_,Hs.239176)
interleukin 1 7D [N92873,NM_1 38284.1 ,53342]

myeloperoxidase [R05801,NM_000250.1 ,4353]

FOS—like antigen 2 [T 58873,NM_005253.3,2355]

v—raf murine sarcoma 3611 viral oncogene homolog

baculoviral lAP repeat-containing 3 [AA002126,NM_001165.3,330]
caspase 3, apoptosis-related cysteine protease

baculoviral lAP repeat-containing 2 [AA702174,NM_001166.3,329]
platelet-derived growth factor receptor, alpha polypeptide

alcohol dehydrogenase 4 (class II), pi polypeptide
BCL2-associated athanogene 4 [N25897,NM_004874.2,9530]
(gC) ESTs, Highly similar to AS3983 pyruvate kinase (EC 2.7.1.40)
KIAA1117 [AA001870,NM_015018.2,23033]

tumor necrosis factor, alpha-induced protein 3

aldehyde dehydrogenase 1 family, member A2

tumor necrosis factor receptor superfamily, member 18

nerve growth factor receptor (TNFR superfamily, member 16)
clusterin-like 1 (retinal) [H91647,NM_199167.1,27098]

epithelial membrane protein 2 [T88721,NM_001424.3,2013]
Kruppel-like factor 6 [AA055585,NM_001300.4,1316]
fucose-1-phosphate guanylyltransferase [R38619,NM_003838.2,8790]
glutathione S-transferase M5 [N56898,NM__000851.2,2949]
fructose-1,6-bisphosphatase 1 [AA699427,NM_000507.2,2203]

mitogen-activated protein kinase kinase 2
[AA425826,NM_030662.2,5605]

cytochrome P450, family 2, subfamily J, polypeptide 2

aldehyde dehydrogenase 1 family, member A1

acyI-Coenzyme A dehydrogenase, C-2 to C-3 short chain

ATP citrate lyase [H08548,NM_001096.2,47]

acylphosphatase 1, erythrocyte (common) type

acyl-CoA synthetase long-chain family member 3
mitogen-activated protein kinase kinase kinase 7

chemokine (C-C motif) ligand 5 [AA486072,NM_002985.2,6352]
NADH dehydrogenase (ubiquinone) 1 alpha subcomplex, 4, 9kDa
glutathione peroxidase 3 (plasma) [AA664180,NM_002084.2,2878]
ATPase, H+ transporting, lysosomal 13kDa, V1 subunit G isoform 1

Phosphoglucomutase 3 [AA1891 13,NM_015599.1 .5238]
amyotrophic lateral sclerosis 2 (juvenile) chromosome region, candidate
13

146

Table 6.4 (Continued)

65
66
67

68

69
70
71
72

73

74
75
76
77
78
79
80
81
82
83

12963
1 3744
1 3975

1 6995

17052
17066
171 14
17207

1 7299

1 7355
1 7461
1 7484
18054
18637
1971 7

receptor (TNFRSF)—interacting serine-threonine kinase 1
interleukin 17B [AA443286.NM_014443.2,27190]
oxidative-stress responsive 1 [R95128.NM_005109.1,9943]
copper chaperone for superoxide dismutase
[N30404,NM_005125.1 .9973]

nuclear factor of kappa light polypeptide gene enhancer in B-cells
inhibitor, beta

dual speciﬁcity phosphatase 2 [AA759046,NM_004418.2.1844]
ecotropic viral integration site 1 [AA181023,NM_005241.1,2122]
epithelial membrane protein 3 [W73810,NM_001425.1,2014]
amyotrophic lateral sclerosis 2 (juvenile) chromosome region. candidate
3

TNFRSF1A-associated via death domain
[AA916906,NM_003789.2,8717]

serine/threonine kinase 25 (STE20 homolog, yeast
2,3-bisphosphoglycerate mutase [AA678065,NM_199186.1 .669]
mitogen-activated protein kinase kinase kinase 3 [

NADH dehydrogenase (ubiquinone) ﬂavoprotein 1, 51 kDa

(gC) alcohol dehydrogenase IB (class I), beta polypeptide
ACAC (Metabolic ﬂux)

BOH (Metabolic ﬂux)

TG (Metabolic ﬂux)

LDH (Phenotype)

 

Experimental Validations

In the hierarchical approach, FDA suggested a protective role of TG accumulation.
Indeed, our laboratory had found that supplementing palmitate cultures with oleate

increased TG synthesis and reduced the cytotoxicity signiﬁcantly, to the same levels as

the control [Li 2006]. GSEA found ROS to be an important factor associated with the

cytotoxicity of saturated FFA and TNF-a. Indeed, our laboratory has found that ROS
levels were increased in response to palmitate and that ROS played an important role in
the observed cytotoxicity in the palmitate cultures [Srivastava 2006]. NADH
dehydrogenase was identiﬁed by MBPLS to be one of the genes that is positively related
to cytotoxicity. This was experimentally validated for its role in regulating palmitate-

induced cytotoxicity [Srivastava 2006]. The levels of ROS and LDH release were

147

measured in the palmitate cultured cells with NADH dehydrogenase inhibitor and both
were signiﬁcantly reduced [Srivastava 2006, Li 2006], which conﬁnned the role of

NADH dehydrogenase in the observed toxicity as predicted by the model.

6.2.4 Discussion

A hierarchical framework was developed to integrate phenotypic, metabolic and gene
expression proﬁles with functional genomics information to identify the pathways which
played an important role in regulating FFA and TNF-or induced cytotoxicity in
HepG2/C3A cells. The phenotype was connected to the gene expression proﬁle through
the intermediate metabolic level. Metabolic changes that contributed and are relevant to
the cytotoxic phenotype were identiﬁed by discriminant and correlation analyses.
Pathway-based gene expression analysis, namely GSEA, was used to identify the
functional pathways that were associated with the cytotoxic phenotype. Metabolic and
signaling pathways identiﬁed from the gene expression analysis support the results
obtained by FDA and correlation analysis of the metabolic analysis and provided
additional information on the mechanisms of toxicity. While GSEA was able to identify
the gene sets important for the phenotype, it did not provide information on which genes
within these sets were responsible for mediating the cytotoxic effect. MBPLS offered the
advantage of identifying the individual genes that were directly related to the desired
phenotype. We applied this framework to a model system of HepG2/C3A cells and
successfully identiﬁed the targets to reduce F FA toxicity. For example, GSEA identiﬁed
that mitochondrial and ROS-related genes, but not ceramide synthesis contribute to the

toxicity, which was found experimentally to be indeed the case [Srivastava 2006]. The

148

 

MBPLS prediction of the role of NADH dehydrogenase in regulating cytotoxicity was

validated with complex I inhibitor studies [Srivastava 2006, Li 2006].

6.2.5 Conclusion

In summary, this section illustrated how phenotypic, metabolic and genetic proﬁles can
be integrated hierarchically to identify phenotype relevant genes and pathways. The
metabolites whose levels were most strongly related to the cytotoxicity were identiﬁed.
The genes associated with these metabolic processes were subsequently identiﬁed.
Determining the genes which control the levels of these metabolites provided important
information on the likely mechanism of toxicity and identiﬁed the targets to control the
cytotoxicity of palmitate. This approach identiﬁed the involvement of ROS generation,
altered fatty acid and energy, but not ceramide metabolism in the cytotoxicity. Thus, the
integration of metabolic and genetic information can provide a more comprehensive
picture of the cellular states and provide potential targets that regulate the cellular

I'CSpOl'lSCS .

6.3 Inferring active pathways that confer a phenotype involving
multiple metabolites

6.3.1 Introduction

In the current application of T IPS© to identify cytotoxicity related pathways in HepG2
cells, the proﬁle of LDH release was used to characterize the phenotype. In order to
deﬁne a phenotype more accurately and robustly, more information, e.g. more
metabolites should be incorporated. Metabolites are connected to each other through the

metabolic network, which are implicitly captured by the metabolite proﬁles [Li and Chan

149

2004b]. Therefore, multiple metabolites can discriminate different phenotypes more

readily than a single biomarker.

6.3.2 Materials and Methods

Extracting pathways for multiple metabolites

Let S(t) be the proﬁle of pathways related to a phenotype in experiment t and L1 (t) be the
proﬁle of metabolite L1 in experiment t, L2(t) be the proﬁle of metabolite L2 in
experiment t. S is constrained to have a correlation with L1 and L2 by equation (6.2),
where p2 is a threshold value.

Corr2 = ST*L1*L1T*S/(S*ST) *A+ ST*L2*L2T*S/(S*ST) *B>p2 (6.2)

A and B in the above equation deﬁnes the weight of each latent variable that contributes

to the overall correlation of the pathways related to a phenotype.

6.3.3 Results and discussion

Preliminary study to identify genes relevant to both LDH and TG
We applied the approach described above to identify genes relevant to two cellular
processes, TG and LDH, hereby denoted as two factors. The results are presented in

Table 6.5. The genes selected using only LDH are listed in Table 6.6 for comparison.

Table 6.5 Important genes selected for two factors TG and LDH.

 

Access
Num Name

AA018907 protein phosphatase 3 (formerly 28), regulatory subunit B, 19kDa, alpha isoform
R06605 protein tyrosine phosphatase, non-receptor type 1 (PTPN1)

AA160670 lysophosphatidic acid phosphatase (ACP6)

AA464163 acyl-Coenzyme A dehydrogenase, very long chain (ACADVL)
R39463 aldolase C, fructose-bisphosphate (ALDOC)

AA437389 Lanosterol synthase (2,3-oxidosqualene-Ianosterol cyclase)
R46512 protein phosphatase 1, regulatory subunit 7

 

150

 

Table 6.5 (Continued)

T55835
AA644448
R46823
AA457700
T81 764
T40568
T71 976
AA490466
T77729
AA789328
N93686
M59851 3
N71 653
AA520978
N50655
AA1291 71
M443899
AA421 269
T61 256
M487588
N531 69
H21 869
AA443577
H08753
R71 691
R2221 9
W92859
AA1 86686
R06458
R98442
AA434420
M464957
N46830
AA01 0079
N58305
AA279072
AA425450
R95841
AA682469
AA022480
AA497051
AA41 7279
R53787
AA45481 9
AA608857
N62379
R88440
AA41 6584
H29475

histone deacetylase 6 (HDAC6)

protein tyrosine phosphatase, receptor type, U (PTPRU)
N-acetylgalactosaminidase, alpha- (NAGA)

stearoyl-CoA desaturase (delta-9—desaturase) (SCD)

cell division cycle 27 (CDC27)

phosphatidylinositol-4-phosphate 5-kinase, type II, alpha (PIP5K2A)
phosphatidic acid phosphatase type 28 (PPAPZB)

Gap junction protein. beta 2. 26kDa (connexin 26)_(AA490466,_,Hs.323733)
Pyruvate carboxylase (PC). nuclear gene encoding mitochondrial protein
cyclin-dependent kinase (CDC2-like) 10 (CDK10)

Aldehyde dehydrogenase 3 family, member B1 (ALDH3B1)

protein tyrosine phosphatase, receptor type, F (PTPRF)

aspartoacylase (aminoacylase 2, Canavan disease) (ASPA)
Ubiquitin-conjugating enzyme E2H (UBC8 homolog, yeast) (UBEZH)

solute carrier family 27 (fatty acid transporter), member 3 (SLC27A3)
protein phosphatase 2, regulatory subunit B (356). beta isofonn (PPPZRSB)
scavenger receptor class B, member 1 (SCARB1)

phosphatidylinositol 4-kinase, catalytic. alpha polypeptide (PIK4CA)
ketohexokinase (fructokinase) (KHK)

ATPase. H+ transporting, lysosomal interacting protein 1 (ATP6IP1)
apolipoprotein C-lll (APOC3)

COX10 homolog

tumor necrosis factor (ligand) superfamily, member 13 (T NFSF1 3)

guanine nucleotide binding protein (G protein)

TNF receptor-associated factor 1 (T RAF1)

(gF) phosphate cytidylyltransferase 2, ethanolamine (PCYT2)

protein tyrosine phosphatase

prostaglandin E synthase 2 (PTGESZ)

lecithin-cholesterol acyltransferase

UDP-glucose ceramide glucosyltransferase-Iike 1

protein tyrosine phosphatase, non-receptor type 9 (PTPN9)

Pleckstrin homology, Sec7 and coiled/coil domains 2 (cytohesin-Z) (PSCD2)
Cytokine receptor-like factor 3 (CRLF3)

protein kinase, interferon-inducible double stranded RNA dependent (PRKR)
hydroxysteroid (17-beta) dehydrogenase 7

inositol polyphosphate phosphatase-like 1 (INPPL1)

glycoprotein (transmembrane) nmb (GPNMB)

ribosomal protein 86 kinase, 90kDa

alcohol dehydrogenase lB (class I)

CREB binding protein (Rubinstein-Taybi syndrome) (CREBBP)
sialyltransferase (STHM)

protein tyrosine phosphatase. non-receptor type substrate 1 (PTPNS1)
protein phosphatase 2. regulatory subunit 8 (856), epsilon isoform (PPP2R5E)
mitogen-activated protein kinase 3

serine/threonine protein kinase SSTK (SSTK)

l-kappa-B-interacting Ras-like protein 1

Pas apoptotic inhibitory molecule 2

chemokine (C-C motif) ligand 3

Pyruvate dehjdrogenase kinase, isoengyme 2 (PDK2)

151

 

 

Table 6.5 (Continued)

AA86281 3
AA4431 21
AA464590
AA236957
R83038
T61 271
AA446839
N34876
R6261 2
AA676836
AA464421
H75862
AA452802
M098980
AA485397
M490501
N801 29

AA1 48548
N59764
AA460646
M28062
AA4551 02
W80489
AA644550
H98694
AA451 886
N481 78
M405901
M678095
AA464580
AA001 614
M485347
H03436
M488373
AA1 56571
AA459265
H92232
AA429946
T62040
AA663439
H53340
AA401 1 1 1
AA42934
T81 033

cytochrome c oxidase subunit Vlll (COX8), nuclear gene encoding mitochondrial
protein

apoptosis-inducing protein D (APPD)

protein tyrosine phosphatase. receptor type, N polypeptide 2 (PTPRN2)
Rac/Cdo42 guanine nucleotide exchange factor (GEF) 6 (ARHGEF6)

low density lipoprotein receptor-related protein 5 (LRP5)

phospholipase A2, group IIA (platelets, synovial ﬂuid) (PLA262A)
BCL2/adenovirus E1B 19kDa interacting protein 3 (BNIP3)

TRAP-binding protein domain

ﬁbronectin 1 (FN1)

acid sphingomyelinase-like phosphodiesterase (ASM3A)

zinc ﬁnger protein 144 (Mel-18) (ZNF144)

Apoptotic chromatin condensation inducer in the nucleus (ACINUS)

solute carrier family 4, sodium bicarbonate cotransporter.

protein kinase C-like 2 (PRKCL2)

Ubiquitin-Iike 4

UV radiation resistance associated gene (UVRAG)

metallothionein 1L (MT1 L)

fatty acid binding protein 3. muscle and heart (mammary-derived growth inhibitor)
(FABP3)

guanine monphosphate synthetase (GMPS)

peroxisomal biogenesis factor 6 (PEX6)

diacylglycerol kinase, delta 130kDa (DGKD)

heat shock 70kDa protein 2

acylphosphatase 1. erythrocyte (common) type (ACYP1)

translocase of outer mitochondrial membrane 20 (yeast) homolog (KIAA0016)
Pl-3-kinase-related kinase SMG-1 (SMG1)

cytochrome P450. subfamily l (dioxin-inducible)

phosphoinositide-binding protein PlP3-E

NADH dehydrogenase (ubiquinone) 1 alpha subcomplex, 3. 9kDa (NDUFA3)
G protein-coupled receptor 48 (GPR48)

acetyl-Coenzyme A carboxylase alpha

insulin receptor_

protein phosphatase 1, regulatory (inhibitor) subunit 11 (PPP1R11)
UDP-GlcNAczbetaGal beta-1.3-N-acetylglucosaminyitransferase 3 (B3GNT3)
Phosphoglucomutase 1 (PGM1)

alanyl-tRNA synthetase (AARS)

(gF) C/EBP-induced protein (LOC81558)

guanine nucleotide binding protein (G protein), alpha 11 (Go class) (GNA11)
peroxisomal short-chain alcohol dehydrogenase (humNRDR)
electron-transfer-ﬂavoprotein, beta polypeptide (ETFB)

solute carrier family 25 (mitochondrial carrier; adenine nucleotide translocator)
metallothionein 1 G (MT1 G)

glucose phosphate isomerase (GPI)

Rho GTPase activating protein 1 (ARHGAP1)

zinc ﬁnger protein 148 (pHZ-52)

 

152

Table 6.6 Important genes selected for LDH alone

 

Access
Num

AA626370
AA86281 3
AA464590
AA453404
AA400973
AA495981
T67006
T81 033
AA430654
T73556
AA485898
W92859
H75862

H57850

T47788
W8571 0
H1 1 054
AA8724ZO
N721 15
R2221 9

M644550
H77542
H92556
R31 562
R50407
AA676836
AA482070
R95841
T64223
AA481 464
H0981 9
AA1691 76
AA41 1656
R36587
H98694
AA1 60670
H95038
AA458563
AA776294
H6661 7
AA451 935
AA098980
N34876
AA281 945

Gene Name

(gM) heme oxygenase (decycling) 2 (HMOXZ)

(gC) cytochrome c oxidase subunit VIII (COX8)

(gC) protein tyrosine phosphatase, receptor type, N polypeptide 2 (PTPRN2)
(gM) PPAR binding protein

(9) lipocalin 2 (oncogene 24p3) (LCN2)

(gC) Rho GTPase activating protein 6 (ARHGAP6)

(gF) glucokinase (hexokinase 4) regulatory protein (GCKR)

(gC) zinc finger protein 148 (pHZ-52)

(gN) ATPase, H+ transporting. lysosomal V0 subunit 3 isofonn 1 (ATP6VOA1)
(gM) fatty-acid-Coenzyme A ligase, long-chain 2 (FACL2)

(gC) apoptosis related protein APR-3 (APR-3)

(gC) protein tyrosine phosphatase, non-receptor type 1

(90) apoptotic chromatin condensation inducer in the nucleus (ACINUS)
(90) protein phosphatase 2 (formerly 2A), regulatory subunit A (PR 65). beta
isoform (PPP2R1 B)

(gC) apoptosis-inducing factor (AlF)-homologous mitochondrion-associated
inducer of death (AMID)

(gC) carnitine palmitoyltransferase l, muscle (CPT1 B)

(gC) protein kinase C, delta

(gN) collagen. type Vlll. alpha 1 (COL8A1)

(gC) cyclin-dependent kinase inhibitor 2C (p18, inhibits CDK4) (CDKN2C)
(gF) phosphate cytidylyltransferase 2. ethanolamine (PCYT2)

(gM) translocase of outer mitochondrial membrane 20 (yeast) homolog
(KIAA0016)

(gC) hyperpolarization activated cyclic nucleotide-gated potassium channel 3
(9C) eukaryotic translation initiation factor 2B. subunit 2 beta, 39kDa (EIF2B2)
(gC) GDP-diacylglycerol synthase (phosphatidate cytidylyltransferase) 1 (C081)
(gM) chemokine (C-X-C motif) ligand 2 (CXCL2)

(gC) acid sphingomyelinase-like phosphodiesterase (ASM3A)

(gC) Rho guanine exchange factor (GEF) 16 (ARHGEF16)

(gC) ribosomal protein 86 kinase, 90kDa, polypeptide 3

(9M) carboxypeptidase A3 (mast cell) (CPA3)

(gC) peptidylprolyl isomerase B (cyclophilin B) (PPIB)

(gC) phosphomevalonate kinase (PMVK)

(gC) glycerol-3-phosphate dehydrogenase 2 (mitochondrial)

(gC) chemokine (C-X-C motif) ligand 16 (CXCL16)

(gM) phosphatidylinositol (4.5) bisphosphate 5-phosphatase homolog

(gC) Pl-3-kinase—related kinase SMG-1 (SMG1)

(gC) lysophosphatidic acid phosphatase (ACP6)

(gl) uncoupling protein 4

(gN) ribulose-5-phosphate-3-epimerase (RPE)

(gC) Rab geranylgeranyltransferase, alpha subunit (RABGGTA)

(gC) golgi apparatus protein 1 (GLG1)

(gC) apoptosis inhibitor 5 (API5)

(gC) protein kinase C-like 2 (PRKCL2)

(gC) TRAF-binding protein domain

(90) MAP-kinase activating death domain (MADD)

 

153

 

 

Table 6.6 (Continued)

AA45481 0
T65790

R5378?

AA022480
AA1291 71
AA01 8907
AA279072
H1 751 3

AA463931
M6001 73

AA609421
R941 91
AA459265
AA431 988
W65461
AA437389

AA487586
AA434420
AA487588
AA464580
N71 497
R98442

H04769
AA490494
T61 256
AA486407
T55835
R39463
AA490501
AA451 886
AA280677
AA457700
R83038
N62847
T73468
AA443577
AA682469
T77729
AA644448
W20487
N91 1 35

(g) tumor-associated calcium signal transducer 2 (TACSTDZ)

(gC) famesyl diphosphate synthase (famesyl pyrophosphate synthetase
(gC) protein phosphatase 2, regulatory subunit B (B56), epsilon isoform
(PPP2R5E)

(gC) CREB binding protein (Rubinstein-Taybi syndrome) (CREBBP)
(90) protein phosphatase 2. regulatory subunit B (356)

(9C) protein phosphatase 3 (formerly 2B), regulatory subunit B, 19kDa
(gK) inositol polyphosphate phosphatase-like 1 (INPPL1)

(gF) heat shock 70kDa protein 1-like_(H17513,_.Hs.80288)

(gC) inositol 1.3,4-triphosphate 5/6 kinase (ITPK1)

(gC) ubiquitin-conjugating enzyme EZA (RADG homolog) (UBEZA)

(gC) membrane protein, palmitoylated 6 (MAGUK p55 subfamily member 6)
(MPP6)

(gC) translocase of outer mitochondrial membrane 70 homolog A (yeast)
(T OMM70A)

(gF) C/EBP-induced protein (LOC81558), mRNA.
(AA459265,NM_030802.Hs.9851 )

(gC) fatty acid amide hydrolase (FAAH), mRNA.
(AA431988,NM__001441,Hs.288828)

(gC) dual speciﬁcity phosphatase 5 (DUSP5), mRNA.
(W65461,NM_004419,Hs.2128)

(g) lanosterol synthase (2,3-oxidosqualene-lanosterol
cyclase)_(AA437389._.Hs.93199)

(gC) inorganic pyrophosphatase (SlD6-306), mRNA.
(AA487586,NM_006903.Hs.5123)

(gC) protein tyrosine phosphatase, non-receptor type 9 (PTPN9)

(gC) ATPase, H+ transporting, lysosomal interacting protein 1 (ATP6IP1)
(gC) acetyl-Coenzyme A carboxylase alpha_(AA464580,_,Hs.7232)
(gC) low density lipoprotein receptor-related protein 6 (LRP6)

(gC) UDP-glucose ceramide glucosyltransferase-like 1_(R98442,_,Hs.105794)
(gC) protein kinase (CAMP-dependent. catalytic) inhibitor
beta_(H04769,_. Hs.106106)

(gC) tumor necrosis factor receptor superfamily, member 21 (T NFRSF21)
(gM) ketohexokinase (fructokinase) (KHK)

(gN) phosphodiesterase 4D interacting protein (myomegalin)

(gC) histone deacetylase 6 (HDACG)

(gC) aldolase C. fructose-bisphosphate (ALDOC)

(gC) UV radiation resistance associated gene (UVRAG)

(gC) cytochrome P450, subfamily I (dioxin-inducible)

(gC) zinc ﬁnger protein 258 (ZNF258)

(gC) stearoyl-CoA desaturase (delta-9—desaturase) (SCD)

(90) low density lipoprotein receptor-related protein 5 (LRP5)

(g) lysosomal-associated membrane protein 2 (LAMP2)

(gC) glutathione S-transferase A2 (GSTA2)

(gC) tumor necrosis factor (ligand) superfamily, member 13 (TNFSF13)
(gC) alcohol dehydrogenase IB (class I), beta polypeptide

(90) pyruvate carboxylase (PC)

(90) protein tyrosine phosphatase. receptor type

(90) BCL2-related protein A1 (BCL2A1)

(9N) chloride intracellular channel 3 (CLIC3)

 

154

 

Table 6.6 (Continued)

AA425450 (gC) glycoprotein (transmembrane) nmb (GPNMB)
R71691 (gC) TNF receptor-associated factor 1 (TRAF1)
AA186686 (gC) prostaglandin E synthase 2 (PTGESZ)
AA454588 (gC) BCL2-Iike 2 (BCL2L2)

T71976 (gC) phosphatidic acid phosphatase type 23 (PPAP2B)
N46830 (gC) cytokine receptor-like factor 3 (CRLF3)
T81764 (90) cell division cycle 27 (CD027)

AA434130 (gC) thioredoxin reductase 2 (T XNRDZ)
AA520978 (gC) ubiquitin-conjugating enzyme E2H (UBC8 homolog, yeast)
. (gC) protein kinase, interferon-inducible double stranded RNA dependent
AA010079 (PRKR)
AA446839 (gC) BCL2/adenovirus E1 B 19kDa interacting protein 3 (BNIP3)
(gC) cytochrome P450, subfamily l (aromatic compound-inducible). polypeptide
AA418907 1 (CYP1A1)

R06605 (gC) protein tyrosine phosphatase, non-receptor type 1 (PTPN1)
N62379 (gC) l-kappa-B-interacting Ras-like protein 1
R46823 (g) N-acetylgalactosaminidase, alpha- (NAGA)

 

Comparing Tables 6.5 with 6.6, we can see that many important genes such as SCD,
ACC, TNF super family, TRAFI, Bel-2 were selected in both cases. However, the two
factor case selected more lipid metabolism related genes, e. g. acyl-CoA dehydrogenase
and diacylglycerol kinase. In addition, in the two factors case selected NADH
dehydrogenase, which is involved in complex I of the electron transport chain. It is a
major site of ROS production and has been shown to play an important role in the
palmitate-induced toxicity [Srivastava 2006, Li 2006]. Therefore, more useful
information was extracted with multiple factors. The next paragraph describes how the
T IPS© framework would be extended to incorporate more than two metabolites.

To incorporate more than two metabolites or cellular processes such as the entire
metabolome proﬁle into the T IPS© framework, latent variables can be extracted from the
metabolome data with methods such as PCA, PLS [Grifﬁn 2004], and FDA. L1 and L2 in
Equation 6.2 would be substituted with the extracted latent variables that represent the

metabolome data. PCA and PLS extracts latent variables that account for most of the

155

variance in the data while FDA extracts latent variables that have the largest
discriminating capability. To incorporate nonlinearity effects, kernels may be applied to
the aforementioned methods. The number of latent variables may be determined based
upon the variance captured by the latent variables. For example, to capture 90% of the
variance in the raw data, we included enough latent variables to extract 90% (or more) of
the variance. In the network reconstruction stage by BN, the latent variables (rather than
simply LDH and TG) may be used as pseudo nodes to represent pathways, the biological
meaning of the pathways are then determined from the genes in the latent variables
[Grifﬁn 2004]. An alternative method is to select the metabolites with high weights to be
the latent variables that would then be used for the ﬁnal network reconstruction in the BN
step. This method is not limited to two factors as illustrated in equation 6.2, but may be

extended to multiple factors, L1, L2, ..., Ln.

6.4 Using a Dynamic Bayesian Network (DBN) to identify active
pathways from the dynamic proﬁles

6.4.1 Introduction

The current TIPS© model handles static data, i.e. data at a single time point. However,
cells continuously reprogram gene regulatory networks when they sense the changes in
the environmental conditions. In order to understand how cells are regulated in response
to environmental changes, time series data (dynamic data) are required. In our system,
gene expression data as well as metabolites over three consecutive days (24, 48, 72 hours
aﬁer FFA and TNFa exposure) have been proﬁled. It has been observed that cytotoxicity

was regulated by FFA and TNFa differently as a function of time [Srivastava 2006]. For

example, TNFa increased cytoxicity on day 1 and had no effects on day 2. However it

156

decreased cytotoxicity on day 3. This suggests that the gene regulatory network changes
over time. Uncovering this information would be valuable in helping to elucidate the
mechanism of toxicity and cell death. Models, such as DBN, can be applied to model
time series data. To apply DBN to identify the pathways that regulate cytotoxicity in
HepG2 cells, we ﬁrst applied a hierarchical framework introduced in section 6.2 to select

the genes relevant to cytotoxicity.

6.4.2 Materials and Methods

Dynamic Bayesian network analysis

We applied Banjo (http://www.cs.duke.edu/~amink/software/ba_mio/) from Alexander
Hartemink’s group to learn the DBN. Every node in the Bayesian network represents a
gene, a metabolic ﬂux or a phenotypic response. A ﬁrst order Markov DBN was used, i.e.
a node at a given time point can be inﬂuenced by itself or its parents in the immediate
previous time point. The search and score approach was used to ﬁnd a network structure
with the maximum score. Bayesian Dirichlet equivalence (BDe) was used to score a

network. Greedy search algorithm was used to search for the optimal network.

6.4.3 Results and Discussion

Network Reconstruction

157

 

Table 6.7: Experimental conditions for DBN learning

 

 

TNF-(nglml)
FFA 0 20 100
Hepcz medium (H)|H-0 H-20 H-100
BSA(B) B0 320 B100
Palmitate(P) P0 P20 P100
Oleate(0) 00 020 0100
Linoleate(L) L0 L20 L100

 

 

 

Based upon the hierarchial approach in section 6.2, we selected 80 genes, 3
metabolites (BOH, ACAC, TG) and one phenotype (LDH) for DBN reconstruction. The
genes, metabolites and phenotypic proﬁles are listed in Table 6.4. The network was
learned for each treatment, i.e. HepG2 medium (H, control), Bovine Serum Albumin (B),
Palmitate (P), Oleate (O), Linoleate (L) and their combinations with TNF—0r, separately,
based upon the dynamic proﬁle of the genes measured at 24, 48 and 72 hours after the
treatments. As summarized in Table 6.7, we learned the network for 15 conditions with
two replicates for each condition. The experiments were performed by Shireesh
Srivastava and the gene expression data were obtained at the Van Andel Institute. All
connections that appeared more than twice were combined together for network
reconstruction. The network is shown in Figure 6.3, with the numbers in the network
corresponding to the Nums in Table 6.4.

From the network shown in Figure 6.4 we can see that the phenotype is mainly
regulated by two categories of factors, oxidative stress related factors and TNF-0r signal

pathway related factors. The oxidative stress related factors include NADH

158

dehydrogenase, aldehyde dehydrogenase (ALDH), glutathione-S transferase (GST),
copper chaperone for superoxide dismutase (CCS) and CYP2E1. TNF-or related factors
include NF-KB, IKB, TRAF, TNF superfamily, capase 8, inhibitor of apoptosis protein
(lAP). These two categories interact with each other in the network. LDH was found
connected directly to genes such as copper chaperone for superoxide dismutase (CCS)
and biophosphoglycerate mutase (BPGM). CCS was connected to ALDH1A2, NADH
dehydrogenase, CYP2E1, and GSTMS. All these factors are relevant to cellular ROS and
lipid peroxidation and thus play a role in the palmitate-induced cytotoxicity. Excessive
accumulation of ROS leads to oxidative damage, e. g. peroxidizing membrane to produce
deleterious aldehydes such as malonedialdehyde and 4-hydroxy-2,3-(E)-nonenal (4-
HNE). ROS are produced during the electron transfer and oxidative phosphorylation
process. NADH dehydrogenase complex, or complex I of the electron transport chain, is
a main source of superoxide production [Moller 2001]. Superoxide dismutase (SOD)
protects cells from free radical damage by reducing superoxide anion radical into
hydrogen peroxide. 4-HNE has been found to elicit cytotoxic effects in diverse cells
[Hartley 1995]. The detoxiﬁcation pathways of 4-I-INE include oxidation of its carbonyl
group into carboxylic acid by ALDH or conjugation with glutathione by GSTs.
Cytochrome P-450 e.g. CYP2E1 sequentially transfers two electrons from NADPH to
molecular oxygen. In the course of electron transfer some of the activated oxygen is
released as superoxides or H202 [Moller 2001]. Excessive accumulation of ROS leads to
oxidative damage. Therefore, the connections among CCS, ALDH, NADH
dehydrogenase, GST, CYP2E1 suggest the involvement of ROS in the observed

cytotoxicity and possible detoxiﬁcation mechanisms. BPGM is an enzyme involved in

159

glycolysis to convert 3-phospho-D-glyceroyl phosphate to 2,3-bisphospho-D-glycerate
and it is connected to two apoptosis inhibitor genes of IAP2 and IAP3. The connections
identiﬁed between LDH and IAP and BPGM suggested the involvement of glycolysis
and apoptosis in the palmitate induced cytotoxicity. High glucose level has been reported
to synergize with saturated fatty acids to cause increased cell death through apoptosis in
pancreatic beta cells because elevated glucose level has inhibitory effects on the lipid
detoxiﬁcation via beta-oxidation [El-Assaad 2003]. In addition, high glucose metabolism
may increase superoxide production, which also induces apoptosis [Okuyama 2003].
Indeed the glucose level in the HepG2 culture medium is very high. Apoptosis was
initiated in the palmitate cultures as was evident by a signiﬁcantly higher caspase activity
detected in these cultures as compared to the control and unsaturated fatty acid cultures
[Srivastava 2006].

The activation of NF-KB by the TNF-0L signaling pathway was uncovered by
DBN. TNFRSFI connects to TRAF3, which then connects to IKB. IKB is connected to
NF- KB through GSTM4. TNF-0r is a cytokine that can be detected by the TNF receptor
which activates TRAF and IKK. Activation of IKK will phosphorate and degrade IKB to

release NF-KB. It is encouraging that the DBN model was able to reconstruct this

signaling pathway. The model also indicated that the NF-KB activation may be mediated
by genes such as GST and ALDH.

Discriminant and correlation analyses found the metabolic ﬂux, BOH, to be the
most relevant to the palmitate-induced cytotoxicity. BOH was found to be connected to
NADH dehydrogenase and SOD. BOH are produced from acetyl-CoA, which is a

metabolic intermediate of fatty acid oxidation. Fatty acid oxidation induces R08

160

 

 

 

production [Gill 2006, Sanya 2003]. NADH dehydrogenase complex, or complex I of the
electron transport chain, is a main site of superoxide production. Thus, the connection
between NADH dehydrogenase, SOD and BOH indicates the involvement of ROS
production in the palmitate-induced cytotoxicity. Indeed, experimentally inhibiting
NADH dehydrogenase in the palmitate cultures signiﬁcantly reduced the LDH release
[Srivastava 2006, Li 2006].

Therefore based upon the important genes identiﬁed by the hierarchical approach
(described in section 6.2), DBN was able to reconstruct a regulatory network from the
dynamic gene expression and metabolic proﬁles. The resulting network suggested
possible interactions between the identiﬁed genes and metabolic ﬂuxes, which indicated
the involvement of some pathways such as apoptosis, NF-KB and ROS in the palmitate
induced cytotoxicity. The inferred network structure may be used to construct an in silico
regulatory network to simulate how BOH or LDH release is dynamically regulated. Thus,
potential hypothesis and mechanisms involved in inducing cytoxicity may be identiﬁed

based upon the reconstructed network.

6.5 Future work

We discussed in this chapter the improvements to TIPS© including hierarchically
integrating phenotypic metabolic, and gene expression proﬁles with genomic function
information, inferring pathways that confer a phenotype using multiples metabolites and
dynamic modeling. Future work would be to integrate all these features into a dynamic
TIPS© ﬁamework. Brieﬂy, metabolites important to a phenotype can be identiﬁed with
the hierarchical approach using FDA. This method would be able to identify multiple
metabolites that may be important in characterizing the phenotype. GSEA can be used to

161

deﬁne the functronal gene groups and identify signiﬁcantly enriched ones Wthh can be
integrated With the multiple metabolites, identiﬁed in FDA by GA/MB PLS to select
important functronal gene sets for the several metabolites that are most relevant to a

phenotype Fmally, important gene sets may be subjected to DBN analysrs to learn the

regulatory network and reconstruct the pathways from time senes data

   

 

    
   
  
  
  
  

/\//’,/ /I/ , :\ \\\
‘1411115411391 11(6411 \‘
\/\/ \‘T' l Vj \ \
1 Y, : \ \
l

 

 

‘1 I
I
/ ‘1 (if; 1
. \1 // \ \J\ 1
“‘1 ‘ ‘ 1/ ‘ 1 ' \\
,1 \ 1 43 ,1,“ \‘f /X'\<1 ‘ 50) 1 \\ ‘\ 1
1 2,. :\\\ 1’ \\\ \ \.'\:\ \ ‘ .\
‘1 j‘\.1\;_1\/// \ \. \ 1:
1173,): 7603 '1‘ \\ :
X i 1 - 1
4.1 \_ 1' , 1 , \ :1
162114611 11 ‘1‘ 1
e54 \ : 1
0111911 (74"441 '1, 1:
::I:3‘&*< v/“K/ ’11 1X/;‘\ / 1:11
M571110111,,58/“111119/11
77* 5713;,31 1;: ’ \ \\ ‘\ 15W
{\37’1161116‘

"U42

Figure 6 3 DBN reconstrunction of the network. The nubers in the nodes correspond to the nums rn
Table 6.4.

162

CHAPTER 7 CONCLUSIONS

It is the objective of this research to develop approaches that can identify the pathways
involved in conferring a phenotype or cellular state. Once these pathways are elucidated,
targets may be identiﬁed and engineered to more accurately modulate or optimize the
cellular state. Microarray gene expression and metabolic ﬂux data were obtained to
characterize cellular states. In addition to the measurements, a priori knowledge about
the genes, metabolites and pathways available in literature and public databases were also
used. Systems approaches were developed in this thesis to integrate these multiple
sources of data to identify the genes and active pathways relevant to a particular
phenotype. T [PS© was developed in chapter 2 to chapter 4 to identify the active pathways
that regulate a phenotype. T IPSO applied GA/PLS and CICA to identify a subset of
relevant genes, and this subset of genes was subsequently subjected to BN analysis to
reconstruct active pathways. T IP50 was successfully applied to identify active pathways
relevant to palmitate and TNF-01 induced cytotoxicity in HepG2 cells. A dynamic SSM
was developed in chapter 5 to integrate gene expression and a priori knowledge of the
gene regulatory network structure to infer underlying transcription factor activity proﬁles.
SSM was successfully applied to infer transcription factor activities of E. C011 and yeast,
which were validated by the literature. Extensions to TIPS© were discussed in chapter 6
to incorporate functional information in the gene selection process, infer pathways
relevant to multiple metabolites, and from time series data using DBN analysis. In
summary, systems approaches were developed to integrate gene expression, metabolic

and phenotypic proﬁles, along with a priori information, to elucidate mechanisms and

163

pathways involved in conferring a particular phenotype. As a proof of concept, the
approaches developed were applied to HepG2 cells exposed to FFAs and TNF-01 to

identify genes and pathways that confer a cytotoxic vs. a cytoprotective phenotype.

164

 

Appendix Figure l

 

BIBLIOGRAPHY

Abreo K., Sella M., Alvarez-Hemandez X., Jain S. (2004) “Antioxidants prevent
aluminum-induced toxicity in cultured hepatocytes” J Inorg Biochem.
98(6):] 129—34.

Aeme B.L., Johnson A.L., Toyn J.H., Johnston L.H. (1998). “Swi5 Controls a Novel
Wave of Cyclin Synthesis in Late Mitosis” Mol. Biol. Cell 9, 945-956.

Akutsu T., Miyano S., Kuhara S. (1999) “Identiﬁcation of Genetic Networks from a
Small Number of Gene Expression Patterns Under the Boolean Network Model”
chiﬁc Symposium on Biocomputing, 4:17-28.

Al-Shahrour F., Minguez P., Vaquerizas J .M., Conde L., Dopazo J .(2005) “Babelomics: a
suite of web-tools for functional armotation and analysis of group of genes in
high-throughput experiments” Nucleic Acids Research. 33(Web Server issue),
W460-W464.

Alter 0., Brown P.O., Bostein D. (2000) “Singular value decomposition for genome-wide
expression data processing and modeling” PNAS. 97(18):]0101-10106.

Andersson C. A. and Bro R. (2000) “The N-way Toolbox for MATLAB”, Chemometrics
& Intelligent Laborgtory Systems. 52 (1):1-4

Basso K., Margolin A.A., Stolovitzky G., Klein U., Dalla-Favera R., Califano A.(2005)
“Reverse engineering of regulatory networks in human B cells” Nat Genet.
37(4):382-90.

Bangalore A.S., Shaffer R. E., Small G.W. (1996) “Genetic algorithm-based method for
selecting wavelengths and model size for use with partial least-squares regression:
application to near-infrared spectroscopy.” Anal. Chem, 68: 4200-4212.

Baetz K., Andrews B. (1999). “Regulation of Cell Cycle Transcription Factor Swi4
through Auto-Inhibition of DNA Binding.” Mol. Cell Biol. 19, 6729-6741.

Bar-Joseph Z., Gerber G., Gifford D., Jaakkola T., Simon 1. (2002). “A New Approach to
Analyzing Gene Expression Time Series Data.” 6th Annual International
Conference on Research in Computgtional Molecular Biology. pp. 39-48

Beal M.J., Falciani F., Ghahramani 2., Range] C., Wild, BL, (2005) “A Bayesian
approach to reconstructing genetic regulatory networks with hidden factors.”
Bioinforrnatics 21(3): 349-356.

Bligh E.G., Dyer W.J., (1959) “A rapid method of total lipid extraction and puriﬁcation”,
Can J Biochem Physiol. 37(8):911-7.

166

 

Bolon C., Gauthier C., Simonnet H. (1997) “Glycolysis inhibition by palmitate in renal
cells cultured in a two-chamber system.” Am J Physiol. 273(5 Pt 1):C1732-8.

Brenner D. A. (1998)“Signal transduction during liver regeneration”, J Gwenterol.
Hmatol. 13 Suppl: 893-95.

Brookes P. S. (2005)“Mitochondrial H(+) leak and ROS generation: an odd couple.” FE
Radic Biol Med. 38(1):]2-23.

Castillo E. F., Gutierrez J. M., and Hadi A. S. (1997) “Sensitivity analysis in discrete
Bayesian networ ”, IEEE Transactions on System; Man,421nd Cybernetics - Pa_rt
A: Systemsﬁ and Humans, 27:412-423.

Chan C., Berthiaume F., Washuzu J ., Toner M., and Yarmush M. L. (2002) “Metabolic
Pre-Conditioning of Cultured Cells in Physiological Levels of Insulin: Generating
Resistance to the Lipid Accumulating Effects of Plasma in Hepatocytes”
Biotechnology and Bioengineerigg, 78: 753-760.

Chan C., Berthiaun‘re F ., Lee K., and Yarmush M. L. (2003a) “Metabolic Flux Analysis
of Cultured Hepatocytes Exposed to Plasma”, Biotechnology and Bioengineering,
81: 33-49.

Chan C., Berthiaume F., Lee K., and Yarmush M. L. (2003b) “Effect of Hormones on
Liver-speciﬁc Function and Lipid Accumulation of Hepatocytes during Plasma
Exposure Analyzed by Metabolic Flux Analysis”, Metabolic Engineering, 5: 1-15.

Chan C., Hwang D. H., Stephanopoulos G. N., Yarmush M. L., and Stephanopoulos G.
(20030) “Application of Multivariate Analysis to Optimize Function of Cultured
Hepatocytes”, Biotechnology Prggress, 19: 580-598.

Chen J .L., Peacock E., Samady W., Turner S.M., Neese R.A., Hellerstein M.K., Murphy
E.J.(2005) “Physiologic and pharmacologic factors inﬂuencing glyceroneogenic

contribution to triacylglyceride glycerol measured by mass isotopomer
distribution analysis”, J Biol Chem. 280(27):25396-402.

Chen T., He H. L., Church GM. (1999) “Modeling Gene Expression with Differential
Equations”, Paciﬁc Symposium on Biocomputing, 4:29-40

Cheng J., Kelly J ., Bell DA. and Liu W. (2002) “Learning belief networks from data: An
information theory based approach”, Artiﬁcial Intelligence J ourn_al, 137: 43-90.

Ciccoli L., Ferrali M., Casini A. F., Comporti M. (1981) “Effect of reduced glutathione
on carbon tetrachloride induced fatty liver: various considerations” Boll Soc Ital
Biol Sper. 57(13):]463-9.

167

Cooper GP. (1990) “The computational complexity of probabilistic inference using
Bayesian belief networks”. Artiﬁcial Intelligence, 42(2-3):393-405

Cooper G. and Herskovits E. (1992) “A Bayesian method for the induction of
probabilistic networks from data”, Machine Learning, 9:309-347.

de Pablo M.A. (1999) “Palmitate induces apoptosis via a direct effect on mitochondria.”
Apoptosis 4, 81 (1999).

Deng X., Ito T., Carr B., Mumby M., May Jr, W.S. (1998) “Reversible Phosphorylation
of Bch following Interleukin 3 or Bryostatin 1 Is Mediated by Direct Interaction
with Protein Phosphatase 2A”. J. Biol. Chem. 273, 34157-34163.

Dentin R., Benhamed F., Pegorier J. P., Foufelle F., Viollet B., Vaulont S., Girard J.,
Postic C. (2005) “Polyunsaturated fatty acids suppress glycolytic and lipogenic
genes through the inhibition of ChREBP nuclear protein translocation.” J Clin
Invest. 1 15(10):2843-54.

di Bernardo D., Thompson M. J., Gardner T. S., Chobot S. E., Eastwood E. L., Wojtovich
A. P., Elliott S. J ., Schaus S. E., and Collins J .J . (2005) “Chemogenomic proﬁling
on a genome-wide scale using reverse-engineered gene networks.” Nature

Biotechnology 23(3):377-383.

Di Giovanni S., Mirabella M., Papacci M., Odoardi F., Silvestri G., Servidei S. (2001)
“Apoptosis and ROS detoxiﬁcation enzymes correlate with cytochrome c oxidase

deﬁciency in mitochondrial encephalomyopathies”, Mol Cell Neurosci.
17(4):696-705.

Ding W. X., Yin X. M.(2004) “Dissection of the multiple mechanisms of TNF-alpha-
induced apoptosis in liver injury”, J Cell Mol Med, 8(4): 445-54.

Dobrzyn P., Dobrzyn A., Miyazaki M., Cohen R, Asilmaz E., Hardie D.G., Friedman J.
M., Ntambi J. M. (2004) “Stearoyl-CoA desaturase ] deﬁciency increases fatty
acid oxidation by activating AMP-activated protein kinase in liver.” PNAS.

101(17):6409-6414.

Eisen M.B., Spellman P.T., Brown PO. and Botstein D. (1998) “Cluster analysis and
display of genome-wide expression patterns.” Proc. Natl. Acad. Sci. USA, 95(25):
14863-14868.

El-Assaad W., Buteau J ., Peyot M. L., Nolan C., Roduit R., Hardy S., Joly E., Dbaibo G.,
Rosenberg L., Prentki M. (2003) “Saturated fatty acids synergize with elevated
glucose to cause pancreatic beta-cell death.” Endocrinology. 144(9):4154-63.

Farmer W.R., and Liao J .C. (2000) “Improving lycopene production in Escherichia coli
by engineering metabolic control”, Nature biotechnology, 1825332537.

168

Felber J. P. and Golay A. (2002) "Pathways from obesity to diabetes." International
Journ_al of Obesity 26(Suppl. 2): $39-$45.

Fellenberg K., Hauser N.C., Brors B., Neutzner A., Hoheisel J .D., Vingron M. (200])
“Correspondence analysis applied to microarray data.” Proc. Natl. Acad. Sci.
USA, 98(19):]0781-10786.

Fleury C. (1997) “Uncoupling protein-2: a novel gene linked to obesity and
hyperinsulinemia.” Nature Genetics 15, 269-272.

Friedman N., Linial M., Nachman 1., and Pe' er D. (2000) “Using Bayesian networks to
analyze expression data” J. Comp. Bio.. 7:601-620.

Friedman N. (2004) “Inferring Cellular Networks Using Probabilistic Graphical Models.”
Science 303(5659):799-805.

 

Geladi P., and Kowalski B. P. (1986) “Partial least-squares regression: a tutorial.” Anal.
Chim. Acta. 185,].

 

Gill H., Wu G. (2006) “Non-alcoholic fatty liver disease and the metabolic syndrome:
Effects of weight loss and a review of popular diets. Are low carbohydrate diets
the answer.” World J Ga_stroente;o_l 12(3):345-353

Goldberg D.E, Deb K. (1989) “Genetic algorithms in search, optimization, and machine
learning” Addison-Wesley, Reading, MA.

Gomez E.O., Mendoza-Milla C., Ibarra-Sanchez M.J., Ventura-Gallegos J.L. (1996).
“Ceramide reproduces late appearance of oxidative stress during TNF-mediated
cell death in L929 cells”, Biochem Biophys Res Commun. 228(2): 505-9.

Grifﬁn J .L. (2004) “An integrated reverse functional genomic and metabolic approach to
understanding orotic acid-induced fatty liver.” Physiol. Genomics 17, 140-149.

Haddad J. (2002) “Oxygen-sensing mechanisms and the regulation of redox-responsive
transcription factors in development and path0physiology” Respirgtory Research
3, 26.

Haddad J.J. (2004) “On the antioxidant mechanisms of Bcl-2: a retrospective of NF-
[kappa]B signaling and oxidative stress.” Biochemical and Biophysical Research
Communicgtions 322, 355.

Hakarnada K., Hanai T., Honda H., and Kobayashi T., (2001) “Identifying genetic

network using experimental time series data by Boolean algorithm”, Genome
Informatics, 12:272-273.

169

Henrion M. (1988) “Propagating uncertainty in Bayesian networks by probabilistic logic
sampling”. In Uncertainty in Artiﬁcial Intellgience 2, pages 149-163, New York,
N. Y., Elsevier Science Publishing Company, Inc.

Hartley D. P., Ruth J. A., Petersen D. R. (1995) “The hepatocellular metabolism of 4-
hydroxynonenal by alcohol dehydrogenase, aldehyde dehydrogenase, and
glutathione S-transferase”Arch Biochem Biophys. 316(1): 197-205.

Hartemink A.J., Gifford D. K., Jaakkola T. S., and Young RA (2001) “Using Graphical
Models and Genomic Expression Data to Statistically Validate Models of Genetic
Regulatory Networ ” Paciﬁc Symposium on Biocomputing, 6:422-433.

Hayes J. D., Pulford D. J. (1995) “The glutathione S-transferase supergene family:
regulation of GST and the contribution of the isoenzymes to cancer
chemoprotection and drug resistance.” Crit Rev Biochem Mol Biol. 30(6):445-
600.

Heckerman D., Geiger D. and Chickering D.M. (1995) “Learning Bayesian networks: the

combination of knowledge and statistical data” ﬁrehine LeannrgLJoumJal.
20:197-243.

Heyninck K., Wullaert A., Beyaert R. (2003) “Nuclear factor-kappa B plays a central role

in tumour necrosis factor-mediated liver disease”, Biochem. Phannacol.
66(8):]409-15.

Holland J. (1975) “Adaptation in natural and artiﬁcial system.” The university of
Michigan press, Ann Arbor.

Holland M. (2002) “Transcript abundance in yeast varies over six orders of magnitude.”
J. Biol. Chem. 277, 14363-14366.

Houck C.R., Joines J.A., Kay MG. (1995), “A genetic algorithm for function
optimization: a Matlab implementation,” NCSU-IETR 95-09.

Hwang D. H., Stephanopoulos G., Chan C. (2004) “Inverse modeling using multi-block
PLS to determine the environmental conditions that provide optimal cellular
function,” Bioinformatics 20, 487-499.

Ideker T., Ozier O., Schwikowski B., Siege] Andrew F. (2002) “Discovering regulatory
and signalling circuits in molecular interaction networks.” Bioinformatics 18
Suppl l:S233-240.

Idris 1., Gray 8., Donnelly R. (200]) “Protein kinase C activation: isozyme-speciﬁc
effects on metabolism and cardiovascular complications in diabetes”
Diabetologig, 44: 659-673.

170

 

Ikuko M., Hiroto M., Yuki K., Tetsuya K., Sachiko S., Shinya T. (2003) “Acetaldehyde
Induces Granulocyte Macrophage Colony-Stimulating Factor Production in
Human Bronchi through Activation of Nuclear Factor-KB” Allergy and Asthma
Proceedings, 24 (5): 367-371

Izpisua J.C., Barber T., Cabo J., Hrelia S., Rossi C.A., Parenti C. G., Lerker G., Biagi
P.L., Bordoni A., Lenaz G. (1989) “Lipid composition, ﬂuidity and enzymatic
activities of rat liver plasma and mitochondiral membranes in dietary obese rats.”
Int J Obes. 13(4): 531-542.

 

J anicke R., Droge W. (1985) “Effect of L-ornithine on proliferative and cytotoxic T-cell
responses in allogeneic and syngeneic mixed leukocyte cultures” Cell Irnmunol.
92(2):359-65.

Jim R., Si L., Srivastava S., Li Z., Chan, C. (2006) “A Knowledge Driven Regression
Model for Gene Expression and Microarray Analysis”, EMBC, accepted.

J etten A. M., Suter U. (2000) “The peripheral myelin protein 22 and epithelial membrane
protein family”, Prog Nucleic Acid Res Mol Biol. 64297-129.

Ji J ., Zhang L., Wang P., Mu Y. M. (2006) “Saturated free fatty acid, palmitic acid,
induces apoptosis in fetal hepatocytes in culture”, Exp Toxicol Pathol. 56(6):369-
76.

J ia Z., Xu S. (2005) “Clustering expressed genes on the basis of their association with a
quantitative phenotype.” Genet Res. 86(3):]93-207.

Jump DB. (2004) “Fatty acid regulation of gene transcription”, Crit Rev Clin Lab Sci.
41(1):41-78.

Kao K.C., Yang Y.L., Boscolo R., Sabatti C., Roychowdhury V.P., Liao J.C., (2004)
“Transcriptome-based determination of multiple transcription regulator activities
in Escherichia coli by using network component analysis.” PNAS, 101(2): 641-
646.

Kao K.C., Tran L.M., Liao J .C., (2005) “A global regulatory role of gluconeogenic genes
in Escherichia coli revealed by transcriptome network analysis,” The Journal of
Biological Chemistry, Vol 280 (43):36079-36087

Kanthasamy A.G., Kitazawa M., Kanthasamy A., Anantharam V. (2003) “Role of
Proteolytic Activation of Protein Kinase C in Oxidative Stress-Induced
Apoptosis ”, Antioxid Redox Siggal. 5(5):609-620.

Kerszberg M. (2004) “Noise, delays, robustness, canalization and all that” Current
Opinion in Genetics and Development 14: 440-445

171

Kemer J., Hoppel C. (2000) “Fatty acid import into mitochondria.” Biochimica et
Biophysica Acta (BBA) - Molecular and Cell Biology of Lipids 1486, l).

Kilpatrick L. E., Lee J. Y., Haines K.M., Campbell D.E., Sullivan K.E., Korchak HM.
(2002) “A role for PKC-delta. and PI 3-kinase in TNF-alpha -mediated
antiapoptotic signaling in the human neutrophil.” Am J Physiol Cell Physiol

283(1):C48-57.

Kim BC. (2002) “Tumor Necrosis Factor Induces Apoptosis in Hepatoma Cells by
Increasing Ca2+ Release from the Endoplasmic Reticulum and Suppressing Bcl-2
Expression.” J. Biol. Chem. 277, 31381-31389.

Kim J. S. He L., and Lemasters J. J. (2003) “Mitochondrial permeability transition: 3
common pathway to necrosis and apoptosis”, Biochem Biophys Res Comm
304(3):463-70.

Kobayashi M. (1998) "Molecular mechanism of insulin resistance." Saishin Igaku 53(6):
1210-1216.

Kovacech B., Nasmyth K., Schuster T. (1996). “EGT2 Gene Transcription is Induced
Predominantly by Swi5 in Early G1.” Mol. Cell. Biol. 16, 3264—3274.

Leardi R. (2000) “Application of genetic algorithm-PLS for feature selection in spectral
data sets.” J. Chemometrics. 14: 643-655.

Lee T. 1., Rinaldi N. J ., Robert F., Odom D. T., Bar-Joseph Z., Gerber G. K., Hannett N.
M., Harbison C. T., Thompson C. M., Simon 1., Zeitlinger J., Jennings E. G.,
Murray H. L., Gordon D. B., Ren B., Wyrick J. J., Tagne J.-B., Volkert T. L.,
Fraenkel E., Gifford D. K., and Young R. A. (2002) “Transcriptional Regulatory
Networks in Saccharomyces cerevisiae”. Science, 298(5594): 799-804.

Lewandowski E. D., Kudej R. K., White L. T., O'Donnell J. M., Vatner S. F. (2002)
“Mitochondrial preference for short chain fatty acid oxidation during coronary
artery constriction.” Circulation. 105(3):367-72.

Liang S., Fuhrman S. and Somogyi R. (1998) “REVEAL, a general reverse engineering
algorithm for inference of genetic network architectures.” Paciﬁc Symposium on
BiocomputingLBAS -29.

Liao, J .C., Boscolo, R., Yang, Y.L., Tran, L.M., Sabatti, C., Roychowdhury, V.P.,
(2003), “Network component analysis: Reconstruction of regulatory signals in
biological systems. ” PNAS, 100(26):15522-15527.

Lieberrneister W. (2002) “Linear modes of gene expression determined by independent
component analysis.” Bioinformatics 18, 51-60 (2002).

172

Listenberger L. L., Han X., Lewis S. E., Cases 8., Farese R.V., Jr. Ory D. S., Schaffer
J .E. (2003) “Triglyceride accumulation protects against fatty acid-induced
lipotoxicity.” Proceedings of the National Academy of Sciences of the United
States of America 100(6):3077-3082.

Listenberger L.L., Ory D.S., Schaffer J .E. (2001) “Palmitate-induced apoptosis can occur
through a ceramide-independent pathway.” Journgl of Biological Chemistry 276,
14890-14895.

Li Z., Chan C. (2004a) “Integrating Gene Expression and Metabolic Profiles.” J Biol
Chem 279(26):27124-27137.

Li Z., Chan C. (2004b) “Inferring pathways and networks with a Bayesian framework”
FASEB J 18(6):746-748.

Li Z., Srivastava S., Mittal 8., Norton P., Resau J., Haab B., Chan C. (2006) “A
Hierarchical Approach to Identify Pathways that Confer Cytotoxicity in HepG2
Cells from Metabolic and Gene Expression Proﬁles”, in review.

Lin S. M., Liao X., McConnell P., Vata K., Carin L., Goldschmidt P. (2001) “Using
functional genomic units to corroborate user experiments with the Rosetta
compendium.” Methods of Microarray Data Analysis II, Papers from CAMDA
11, Durham, NC, United States, Oct 15-16, 2001 2002:123-137.

Loew L.M., Schaff J .C. (2001) “The virtual cell: a software environment for
computational cell biology.” Trends in Biotechnology, 19(10): 401-406.

Lopes J.A., Menezes J.C., Westerhuis J.A., Smilde A.K.. (2002) “Multiblock PLS
analysis of an industrial pharmaceutical process” Biotechnol Bioeng. 80(4):4l9-
27

Lu Z.H., Mu Y.M., Wang B.A., Li X.L.(2003) “Saturated free fatty acids, palmitic acid
and stearic acid, induce apoptosis by stimulation of ceramide generation in rat
testicular Leydig cell”, BiochemicL and Biophysigrl Research Communication;
303(4): 1002-1007.

MacKay V.L., Mai B., Waters L., Breeden LL. (2001). “Early Cell Cycle Box-Mediated
Transcription of CLN3 and SW14 Contributes to the Proper Timing of the Gl-to-
S Transition in Budding Yeast.” Mol. Cell Biol. 21, 4140-4148.

McAdams H. and Arkin A. (1997). "Stochastic mechanisms in gene expression." PNAS
94(3): 814-819.

McInemy C.J., Partridge J.F., Mikesell G.E., Greemer D.P., Breeden LL. (1997). “A

novel Mcml-dependent element in the SW14, CLN3, CDC6, and CDC47
promoters activates M/Gl-speciﬁc transcription.” Genes & Dev. 11, 1277-1288.

173

 

Monk N. A.M. (2003) “Oscillatory expression of Hesl, p53 and NF—kB driven by
transcriptional time delays,” Current Biology 13: 1409-1413

Murphy K. and Mian S. (1999). “Modelling gene expression data using Dynamic
Bayesian Networks.” Technical report, University of California, Berkeley.

Murphy K. (2001) “The Bayes net toolbox for matlab.” Computing Science and
Statistics: Proceedings of Interface. 33

MacGregor J. F., Jaeckle C., Kiparissides C., Koutoudi, M., (1994) “Process monitoring
and diagnosis by multiblock PLS methods” AIChE J ., 40, 826-838.

Maki Y., Tominaga D., Okarnoto M., Watanabe S., Eguchi Y. (2001) “Development of a
System for the Inference of Large Scale Genetic Networ ” Paciﬁc Symposium
on Biocomputing, 62446—458.

Magana M.M., Koo S.H., Towle H.C., Osborne T.F. (2000) “Different sterol regulatory
element-binding protein-1 isoforms utilize distinct co-regulatory factors to

activate the promoter for fatty acid synthase.” The ioumal of biologal chemistry,
275(7): 4726-4733.

Martens G.A., Cai Y., Hinke S., Stange G., Van de Casteele M., Pipeleers D. (2005)
“Glucose suppresses superoxide generation in metabolically responsive pancreatic
beta cells.” J Biol Chem. 280(21):20389-96.

Martin-Requero A., Cipres G., Rodriguez A., Ayuso M. S., and Parrilla R. (1992) “On
the mechanism of stimulation of ureagenesis by glueoneogenie substrates: role of
pyruvate carboxylase.” American J ou_m_al of Physiology 263: E493-E499

Martoglio A.M., Miskin J.W., Smith S.K., MacKay D.J.C. (2002) “A decomposition
model to track gene expression signatures: preview on observer-independent
classiﬁcation of ovarian cancer.” Bioinformatics 18, 1617-1624.

Meijer A. J ., Lamers W. H., Chamuleau R. A. F. M. (1990) "Nitrogen metabolism and
ornithine cycle ﬁmction." Physiol. Rev., 70: 701-748.

Miller C.W., Ntambi J .M. (1996) "Peroxisome proliferators induce mouse liver stearoyl-
CoA desaturase 1 gene expression." PNAS 93, 9443-9448.

Moller I. M. (2001) “Plant mitochondria and oxidative stress: Electron Transport,
NADPH Turnover, and Metabolism of Reactive Oxygen Species” Annu Rev
Plant Physiol Plant Mol Biol. 52:561-591.

Mootha V. K., Lindgren C. M., Eriksson K. F., Subrarnanian A., Sihag S., Lehar J.,
Puigserver P., Carlsson E., Ridderstrale M., Laurila E., Houstis N., Daly M. J.,

174

Patterson N., Mesirov J .P., Golub T.R., Tarnayo P., Spiegelman B., Lander E.S.,
Hirschhom J .N., Altshuler D., Groop LC. (2003) “PGC-lalpha-responsive genes

involved in oxidative phosphorylation are coordinately downregulated in human
diabetes” Nat Genet. 34(3):267-73.

Morris S. M., Jr. (1992) "Regulation of enzymes of urea and arginine synthesis." Annu.
Rev. Nutr. 12: 81-101.

 

Morris S. M., Jr. (2002) "Regulation of enzymes of the urea cycle and arginine
metabolism." Annu. Rev. Nu_tr., 22: 87-105.

Motz C., Martin H., Krimmer T., Rassow J. (2002) "Bel-2 and Porin Follow Different
Pathways of TOM-dependent Insertion into the Mitochondrial Outer Membrane."
J ourngl of Molecular Biology 323, 729-738.

Murphy K. and Mian S. (1999). “Modelling gene expression data using Dynamic
Bayesian Networks.” Technical report, University of California, Berkeley.

Murphy KR (2001) “The Bayesian Net Toolbox for Matlab”, Computing Science and
Statistics, 33. Available at
http://www.cs.berl_<eley.edu/~murphy/B§yes/usage.html.

Nachman 1., Regev A. (2004). “Infening quantitative models of regulatory networks
from expression data.” Bioinformatics 20(suppl_l): i248-256

Nagai H., Matsumaru K., Feng G., Kaplowitz N. (2002) “Reduced glutathione depletion
causes necrosis and sensitization to tumor necrosis factor-alpha-induced apoptosis
in cultured mouse hepatocytes”. ngtaol. 36(1): 55-64.

Nakao M., Bono H., Kawashima S., Kamiya T., Sato K., Goto S., Kanehisa M. (1999)
“Genome-scale Gene Expression Analysis and Pathway Reconstruction in
KEGG.” Genome Inform Ser Workshop Genome Inform. 10:94-103

Ni T. C. and. Savageau M. A (1996). "Application of biochemical systems theory to
metabolism in human red blood cells. Signal propagation and accuracy of
representation." Journal of biological chemistry 271(14): 7927-41.

Ni T. C. and Savageau M. A. (1996). "Model assessment and reﬁnement using strategies
from biochemical systems theory: application to metabolism in human red blood
cells." Jourﬂl of theoretical biology 179(4): 329-68.

Ntambi J.M. (1995) “The regulation of stearoyl-CoA desaturase (SCD).” Progress in
Lipid Research 34, 139.

175

Nystrom T., Larsson C., Gustafsson L., (1996) “Bacterial defense againt aging: role of
the Escherichia coli ArcA regulator in gene expression readjusted energy ﬂux and
survival during stasis”, The EMBO Joum_a_l 15(13): 3219-3228

Oehlen L.J., McKinney J .D., Cross RR. (1996). “Ste12 and Mcml Regulate Cell Cycle-
Dependent Transcription of FAR] .” Mol. Cell Biol. 16, 2830—2837.

 

Okuyama R., Fujiwara T., Ohsumi J. (2003) “High glucose potentiates palmitate-induced
NO-mediated cytotoxicity through generation of superoxide in clonal beta-cell
HIT-T15.” FEBS Lett. 545(2-3):219-23.

Olsen C., (1971) “An enzymatic ﬂuorimetric micromethod for the determination of
acetoacetate, beta-hydroxybutyrate, pyruvate and lactate”, Clin Chim Acta.
33(2):293-300.

Ong 1., Glasner J. and Page D. (2002) “Modelling regulatory pathways in E.coli from
time series expression proﬁles.” Bioinformatics 18, $241—$248.

Ontko J. A. (1972) “Metabolism of free fatty acids in isolated liver cells. Factors
affecting the partition between esteriﬁcation and oxidation” J Biol Chem.
247(6):]788-800.

Parke D.V. (1982) “Mechanisms of chemical toxicity--a unifying hypothesis.” Reggl
Toxicol Pharmacol. 2(4):267-86.

 

Paumen MB. (1997) “Direct Interaction of the Mitochondrial Membrane Protein
Camitine Palmitoyltransferase I with Bcl-2.” Biochemical and Biophysical
Research Communications 231, 523.

Pearl J. (2000) Causality: models. reasoning_and inference. Cambridge University Press,
New York.

 

Pe'er D., Regev A., Elidan G. and Friedman N. (2001) “Inferring subnetworks from
perturbed expression proﬁles”, Bioinformatics, 17, Sup 1.]: $215-$224.

Perrin B.E., Ralaivola L., Mazurie A., Bottani S., Mallet J ., and d’Alche Buc F. (2003).
“Gene networks inference using dynamic Bayesian networks.” Bioinformatics 19,
$138—$148.

Ramoni M.F., Sebastiani P., Kohane LS. (2002). “Cluster analysis of gene expression
dynamics.” Proc. Nat]. Acad. Sci. USA 99, 9121-9126.

Randle P. J., Garland P. B., Newsholme E. A., Hales C. N. (1965) “The glucose fatty acid

cycle in obesity and maturity onset diabetes mellitus”, Ann N Y Acad Sci.
131(1):324-33.

176

Range] C., Angus J., Ghahramani Z., Lioumi M., Sotheran E., Gaiba A., Wild D. L., and
Falciani F. (2004). “Modelling T cell activation using gene expression proﬁling
and state space models.” Bioinformatics 20, 1361—1372.

Re D.B., Naﬁa 1., Melon C., Shimamoto K., Goff L.K., Had-Aissouni L. (2006)
“Glutamate leakage from a compartmentalized intracellular metabolic pool and
activation of the lipoxygenase pathway mediate oxidative astrocyte death by
reversed glutamate transport”, Qlj_a. 54(1):47-57.

Rego A.C., Santos M.S., Oliveira CR. (1996) “Oxidative stress, hypoxia, and ischemia-
like conditions increase the release of endogenous amino acids by distinct
mechanisms in cultured retinal cells” J Nemchem. 66(6):2506-l 6.

Roder K., Wolf S.S., Sickinger S., and Schweizer M. (1999) “FIRE3 in the promoter of
the rat fatty acid synthase (FAS) gene binds the ubiquitous transcription factors
CBF and USF but does not mediate an insulin response in a rat hepatoma cell
line.” Eur J. Biochem, 260: 743-751.

Rodriguez C.(2001) “Differential induction of stearoyl-CoA desaturase and acyl-CoA
oxidase genes by ﬁbrates in HepG2 cells.” Biochemical Pharmacology 61, 357.

Rousset S., Bringuier A., Lardeux B., F eldmann G. (2003) “Apoptosis induced by tumor
necrosis factor a in human hepatoma cell lines”, m Symposium. 113: 303-13.

Sachs K., Perez 0., Pe'er D., Lauffenburger D.A., Nolan GP. (2005) “Causal Protein-
Signaling Networks Derived from Multiparameter Single-Cell Data.” Science
308, 523-529.

Satoh A. (2004) “PKC-{delta} and -{epsilon} regulate NF- {kappa}B activation induced
by cholecystokinin and TNF-{alpha} in pancreatic acinar cells.” Am J Physiol
Gastrointest Liver Physiol 287, G582-591.

Sakurai H. (2003) “Tumor Necrosis Factor-{alpha}-induced IKK Phosphorylation of NF-
{kappa}B p65 on Serine 536 Is Mediated through the TRAF2, TRAFS, and
TAK] Signaling Pathway.” J. Biol. Chem. 278, 36916-36923.

Saelens X., Kalai M., Vandenabeele P. (2001) ,,Translation Inhibition in Apoptosis.
CASPASE-DEPENDENT PKR ACTIVATION AND eIF2-alpha
PHOSPHORYLATION.” J Biol Chem 276(45):41620-41628.

Salgado H., Santos-Zavaleta A., Gama-Castro S., Millan-Zarate D., Diaz-Paredo E.,
Sanchez-Solano F ., Perez-Rueda E., Bonavides-Martinez C., Collado-Vides J.
(2001) “RegulonDB (version 4.0): transcriptional regulation, operon organization
and growth conditions in Escherichia coli K-12“, Nucleic Acid Research 29: 72-
74

177

Sanyal A. J., Campbell-Sargent C., Mirshahi F., Rizzo W. B., Contos M. J ., Sterling R.
K., Luketic V. A., Shiffrnan M. L. and Clore J. N. (2001) “Nonalcoholic
steatohepatitis: association of insulin resistance and mitochondrial abnormalities”,
Gastroenterolo gy. 120(5): 1 1 83-92.

Schroder J.M., Weber R., Weyhenmeyer S., Lammers-Reissing A., Meurers B.,
Reichmann H. (1991) “Adult onset lipid storage in gastric mucosa and skeletal
muscle ﬁbers associated with gastric pain, progressive muscle weakness and
partial deﬁciency of cytochrome C oxidase.” Pathol Res Pract. 187(1):85-95

Schilling C.H., Edwards J .S., and Palsson B.O., (1999) “Toward metabolic phenomics:
analysis of genomic data using ﬂux balances”, Biotechnol. Prog, 15:288-295.

Sega] E., Shapira M., Regev A., Pe'er D., Botstein D., Koller D., Friedman N. (2003)
“Module networks: identifying regulatory modules and their condition-speciﬁc
regulators from gene expression data.” ﬂame Genetics 34(2):]66-176

Smith SA. (2002) “Peroxisome proliferator-activated receptors and the regulation of
mammalian lipid metabolism”. Biochemical society transactions, 30: 1086-1090.

Soltoff SP. (2001) “Rottlerin Is a Mitochondria] Uncoupler That Decreases Cellular ATP
Levels and Indirectly Blocks Protein Kinase Cdelta Tyrosine Phosphorylation.” J.
Biol. Chem. 276, 37986-37992.

Spellman P. T., Sherlock G., Zhang M.Q., Iyer V.R., Anders K., Eisen M.B., Brown
P.O., Botstein D., Futcher B.(1998). “Comprehensive Identiﬁcation of Cell Cycle-

regulated Genes of the Yeast Saccharomyces cerevisiae by Microarray
Hybridization.” Mol. Biol. Cell 9, 3273-3297.

Spirtes P., Glyrnour C. and Scheines R. (1993) Causation, Prediction, and Search,
Springer-Verlag, New York.

Srivastava 8., Chan C. (2006) “Hydrogen peroxide and hydroxyl radical mediate
palmitate-induced cytotoxicity to hepatoma cells: relation to MPT", accepted.

Staal F. J ., Roederer M., Herzenberg L.A., Herzenberg LA. (1990) “Intracellular thiols
regulate activation of nuclear factor kappa B and transcription of human
immunodeﬁciency virus.” Proc Nat] Acad Sci U S A. 87(24):9943—9947

Steffen M., Petti A., Aach J ., D'Haeseleer P., Church G. (2002) “Automated modelling of
signal transduction networks.” BMC Bioinformatics, 3:34-44.

Subramanian A., Tamayo P., Mootha V.K., Mukherjee S., Ebert B.L., Gillette M.A.,
Paulovich A., Pomeroy S.L., Golub T.R., Lander E.S., Mesirov J .P. (2005) “Gene

set enrichment analysis: a knowledge-based approach for interpreting genome-
wide expression proﬁles” Proc Natl Acad Sci U S A.102(43):15545-50.

178

Suzuki J. (1996) “Learning Bayesian belief networks based on the MDL principle: An
efﬁcient algorithm using the branch and bound technique”, Proceedings of the
International Conference on Mﬁachine 1.631111%, Bari, Italy.

Takahashi Y., Campbell B.A., Hirata Y., Takayama T., Listowsky 1.(1993) “A basis for
differentiating among the multiple human Mu-glutathjone S-transferases and
molecular cloning of brain GSTM5”, J Biol Chem. 268(12):8893-8.

Taylor V., Suter U. (1996) “Epithelial membrane protein-2 and epithelial membrane
protein-3: two novel members of the peripheral myelin protein 22 gene family”,
Gene, 175(1-2): 115-20.

Tamatani M. (1999) “Tumor Necrosis Factor Induces Bel-2 and Bcl-x Expression
through NFkappa B Activation in Primary Hippocampal Neurons.” J. Biol. Chem.
274, 8531-8538.

Tanay A., Sharan R., Kupiec M., Shamir R. (2004) “Revealing modularity and
organization in the yeast molecular network by integrated analysis of highly

heterogeneous genomewide data.” Proceedings of the National Academy of
Sciences of the United States of Americg 101(9):2981-2986.

Tilg H., Diehl A.M. (2000) “Cytokines in alcoholic and nonalcoholic steatohepatitis”,
New Eng J Med. 343(20): 1467-1476.

Tilg H. (2001) "Cytokines and liver diseases." Can J Gastroenterol 15(10): 661-8.

Townsend A.J., Leone-Kabler S., Haynes R.L., Wu Y., Szweda L., Bunting K.D. (2001)
“Selective protection by stably transfected human ALDH3A1 (but not human
ALDHlAl) against toxicity of aliphatic aldehydes in V79 cells”, Chem Biol
Interact]30-132(1-3):261-73.

Trach V., Buschmans-Denkel E., Schaper W. (1986) “Relation between lipolysis and
glycolysis during ischemia in the isolated rat heart”, Basic Res Cardiol.
81(5):454-64.

Tran L.M., Brynildsen M.P., Kao K.C., Suen J.K., Liao J.C. (2005). “gNCA: A
ﬁamework for determining transcription factor activity based on transcriptome:
identiﬁability and numerical implementation.” Met Eng. 7, 128-141.

Valle Blazquez M., Luque 1., Collantes E., Aranda E., Solana R., Pena J., Munoz E.
(1997) “Cellular redox status inﬂuences both cytotoxic and NF-kappa B
activation in natural killer cells. Immunology.” 90(3):455-60.

179

Van Helden J ., Naim A., Mancuso R., Eldridge M., Wemisch L., unoen U., w UU'dK S. J.
(2000) “Representing and analyzing molecular and cellular ﬁrnction using the
computer.” Biol. Chem. 381(9-10): 921-935.

Wang H., Rish I., and Ma S. (2002) “Using Sensitivity Analysis for Selective Parameter
Update in Bayesian Network Learning,” 2002 AAAI Spring Symposium
Proceedings on Information Reﬁnement and Revision for Decision Making:
odeling for Diagnostics, Prognostics, and Prediction, Stanford, Palo Alto,

March 25-27, 2002.

Wang S.J., Chen J. J. (2004) “Sample Size for Identifying Differentially Expressed Genes
in Microarray Experiments.” Journil of Computatioan Biolggy Vol. 11, 714-726.

Wang Y., Liu C. L., Storey J.D., Tibshirani R.J., Hershlag D., Brown PO. (2002).
“Precision and functional speciﬁcity in mRNA decay” PNAS 99(9): 5860-5865.

Watada H. and Kawamori R.(2003) "Insulin resistance and NASH." BIO Clinica. 18(10):
874-879.

Wei Y., Wang D., Topczewski F., Pagliassotti M.J. (2006) “Saturated fatty acids induce
endoplasmic reticulum stress and apoptosis independently of ceramide in liver
cells”, Am J Physio] Endocrinol Metab.291(2):E275-81

Wold S., Ruhe A., Dunn W.J. (1984) “The ollinearity problem in linear regression. The
Partial Least Squares (PLS) approach to generalized inverses”, SIAM J ourgrl on

Scientiﬁc £11141 StaListical Computing, 5, 735-743

Xie, Z., and Askari, A. (2002) “Na+,K+-ATPase as a signal transducer.” Eur. J. Biochem,
269: 2434-2439.

Xu 2., Williams BR. (2000) “The B56a1pha regulatory subunit of protein phosphatase
2A is a target for regulation by double-stranded RNA-dependent protein kinase
PKR.” Molecular and cellular biology 20(14):5285-5299.

Yamaguchi M., Miyashita Y., Kumagai Y., Kojo S. (2004) “Change in liver and plasma
ceramides during D-galactosamine-induced acute hepatic injury by LC-MS/MS”,
Bioorg Med Chem Lett. 14(15): 4061-64.

Zhang B., Schmoyer D.(2004). "GOTree Machine (GOTM): a web-based platform for
interpreting sets of interesting genes using Gene Ontology hierarchies." BMC
Bioinformatics 5(1): 16.

 

180

 

IIIIIIIIIIIIIIIIIIIIIIIIIIIII

IIHIUHIIHHIIIfllllllllllllllltllllllllllll'llllll!