, i _._ —

PLACE IN RETURN BOX to remove this checkout from your record.
To AVOID FINES return on or before date due.
MAY BE RECALLED with earlier due date if requested.

 

DATE DUE DATE DUE DATE DUE

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

5/08 KzlProleoc&Pres/CIRCIDateDue.hdd

 

DECIPHERING CIS-REGULATORY TRANSCRIPTIONAL GRAMMAR IN
DROSOPHILA MELANOGASTER BY MATHEMATICAL MODELS

By'

Ahmet Ay

A DISSERTATION

Submitted to
Michigan State University
in partially fulﬁllment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY
Mathematics

2009

ABSTRACT

DECIPHERING CIS-REGULATORY TRANSCRIPTIONAL GRAMMAR TN
DROSOPHILA MELANOGASTER BY MATHEMATICAL MODELS

By
Ahmet Ay

Transcriptional regulatory information, represented by patterns of protein binding
sites on DNA, comprises a key portion of genetic coding. Despite the abundance of
genomic sequences now available, identifying and characterizing this information
remains a major challenge. Minor changes in protein binding sites can have profound
effects on gene expression, and such changes have been shown to underlie key aspects of
disease and evolution. Thus, a key goal in contemporary genomics is to develop a global
understanding of the transcriptional regulatory code, allowing prediction of gene output
based on DNA sequence information. Recent studies have focused on endogenous
transcriptional regulatory sequences; however distinct enhancers differ in many features,
including transcription factor activity, spacing and cooperativity, making it difﬁcult to
learn the effects of individual features and generalize them to other cis-regulatory
elements. We have pursued a unique “bottom up” approach to understand the mechanistic
processing of regulatory elements by the transcriptional machinery, using a well deﬁned
and characterized set of repressors and activators in Drosophila blastoderrn embryos. The
study is concentrated on a set of proteins known as short-range repressors such as Giant,
Krﬁppel, Knirps and Snail, which play central roles in development.

We have generated a large quantitative data set using ﬂuorescent Confocal Laser

Scanning Microscopy (CLSM) to determine the inputs (Giant, Krﬁppel and Knirps

protein levels) and outputs (lacZ mRNA levels) of the regulatory elements introduced
into Drosophila by transgenesis. In this study (Chapter 2) we present a semi-automatic
algorithm to process the image stacks from CLSM to correlate the protein levels of the
short-range repressors with lacZ mRNA produced by reporter genes using images of
Drosophila blastoderm embryos. We show that signals derived from CLSM are
proportional to actual mRNA levels. Our analysis reveals that a suggested parabolic form
of the background ﬂuorescence in confocal images of early Drosophila embryos is
evident most prominently in ﬂattened specimens, with intact embryos exhibiting a more
linear background. The data extraction described in this paper is primarily conceived for
analysis of synthetic reporter genes, but the techniques are generalizable for quantitative
analysis of other engineered or endogenous genes in embryos.

Using fractional occupancy based modeling on this data set (Chapter 3); we
identiﬁed quantitative values for parameters affecting transcriptional regulation in vivo,
and these parameters are used to build and test the model. We uncovered previously
unknown features that allow correct predictions of regulation by short-range repressors
on synthetic and endogenous elements. These features include a nonmonotonic distance
function for quenching, which implicates possible phasing effects, a modest contribution
for repressor—repressor cooperativity, and similarity in repression of disparate activators.
This work provides essential quantitative elements of a transcriptional grammar that will
allow extensive analysis of genomic information in Drosophila melanogaster and related
organisms. Extension of these predictive models should facilitate the development of
more sophisticated computational algorithms for the identiﬁcation of cis-regulatory

elements and the prediction of the quantitative output from novel regulatory elements.

Dedicated to my complaisant mother (Penbe Sahin),
my father (Bayram Sahin) and to my lovely wife (Ayten Ay)

ACKNOWLEDGEMENTS

So many people have touched my life throughout my time at Michigan State
University. I would ﬁrst and foremost like to thank my advisors, Dr. Chichia Chiu and
Dr. David N. Amosti, for their guidance, support, and encouragement over the last few
years. Thanks to my committee members, Dr. Shin-Han Shiu, Dr. Zhengfang Zhou and
Dr. Guowei Wei for their helpful comments about my research.

Where would I be without the fun, support, friendships, and frustrations that come
with being an Amosti group member? Thanks to all along the way who helped shape the
scientist and the person that I have become. Thank you (alphabetically, of course) to
Carlos, Jacob, Li, Liang, Lyle, Pankaj, Rupinder, Sandhya, Tess, Walid and Yang. I have
been fortunate to have a wonderful group of friends at Michigan State, without whom, I
am not sure I would have survived.

Not to be forgotten are the friends in my life that saw me through the process.
Their support was invaluable. You helped me keep a healthy perspective about what life
is really all about. Thanks so much. I would also like to extend a very special thanks to
my wife Ayten for all of her love, support and patience during these years. I truly could
not have done this without you.

Finally, and most importantly, I would like to say thank you to my parents. I
would not be where I am today without your encouragement, your love, and your

support. Thank you for always being there for me.

-A hmet-

TABLE OF CONTENTS

List of Tables ....................................................................................... vii
List of Figures ...................................................................................... vii
CHAPTER I
Introduction ............................................................................................ l
Thermodynamic Models .................................................................... 6
Differential Equation models .............................................................. 24
Boolean Models ............................................................................. 40
Parameter Estimation ....................................................................... 53
Model Selection ............................................................................. 57
References ................................................................................... 62
CHAPTER 11
Image Processing and Analysis for Quantifying Gene Expression from Early Drosophila
Embryos ............................................................................................... 67
Abstract ...................................................................................... 68
Introduction .................................................................................. 68
Materials and Methods .................................................................... 70
Results ........................................................................................ 77
Discussion ................................................................................... 99
References .................................................................................. 1 00
CHAPTER III
Deciphering a transcriptional grammar: modeling short-range repression in the
Drosophila embryo ................................................................................ 102
Abstract .................................................................................... 103
Introduction ................................................................................ 103
Materials and Methods .................................................................. 106
Results ...................................................................................... 115
Discussion ................................................................................. 1 53
References .................................................................................. l 59
CHAPTER IV
Conclusions and Future Directions ............................................................... 163
APPENDIX A
supplementary Material for Chapter III .......................................................... 180

vi

LIST OF TABLES

Table 111-1: Parameter descriptions for Scheme 2135

Table 111-2: Emotionally grouped sets of gene constructs, used for leave-sets-out analysis

shown in Figure III-7B ............................................................................ 145
Table A-1: Ex functions for the synthetic enhancers of the model .......................... 209
Table A-II: Parameter assignments for all the genes and schemes .......................... 212
Table A-III: Oligo and primer sequences used in this study ................................. 214
Table A-IV: Robustness of Evolutionary Strategy (ES) parameter estimation. . . . .216
Table A-V: Extension of model to Knirps and Kriippel short-range repressors. .. . . .....218

vii

LIST OF FIGURES

Images in this thesis/dissertation are presented in color.

Figure II-l: Parabolic and nonparabolic background forms. 79
Figure II-2: Proportionality of signal to mRNA levels... ................................................. 84
Figure Il-3: Proportionality of signal to Giant protein levels .................................. 87
Figure II-4: Sampling of integrated mRNA and Giant data ................................... 93

Figure lI-5: Gene regulatory representations from Giant regulated lacZ reporter
gene ..................................................................................................... 97

Figure III-1: Transformation of DNA sequence and protein information by gene
modeling ............................................................................................ 117

Figure III-2: Structures of genes assayed to determine context dependence of short-range
repressor activity, and representative in situ images showing lacZ

activity ............................................................................................... 122
Figure III-3: Representative [lacZ] vs. [Gt] plots .............................................. 128
Figure III-4: Parameters found by the ES parameter estimation technique for scheme 2 of
the model ........................................................................................... 133
Figure III-5. Parameters for scheme 2 with the constraint ................................... 137

Figure III-6: Parameters for scheme 2 with cooperativity parameters set to different
levels ................................................................................................... 138

Figure III-7. Validation of modeling by prediction of subsets of the data from parameters
derived from the remainder of the data ......................................................... 146

Figure III-8: Extension of the model to endogenous regulatory elements ................. 151

Figure A-l: One hundred parameter estimation runs are shown using the evolutionary
strategy parameter estimation method for all schemes ....................................... 188

Figure A-2: One hundred parameter estimation runs are shown using the genetic
algorithm method for all schemes ............................................................... 192

viii

Figure A-3: One hundred parameter estimation runs are shown using the simulated
annealing method for all schemes ............................................................... 196

Figure A-4: Robustness of the evolutionary strategy parameter estimation technique on
synthetic data with added noise. ................................................................ 200

Figure A-5: Results of parameter estimation for R, C1, C2 and quenching employing

different schemes .................................................................................. 204

Figure A-6: Comparison of average error induced by leave-one-out analysis, employing
nine different formulations of the model ...................................................... 208

Chapter I

INTRODUCTION

Recent improvements in experimental techniques in biological systems have lent
a new momentum to mathematical modeling in recent years. These changes have
impacted studies of gene regulation the essential biological process that comprises the
production of mRNA from the DNA and translation of mRNA into the protein. Gene
regulation is a highly regulated dynamic process in which subtle changes, such as an
increase or decrease of regulatory protein levels, can have profound effects. Such effects
are involved in many human diseases such as cancer, as well as population differences
and the evolution of morphological novelties. Despite an expanding knowledge based
knowledge base on the biochemistry of gene regulation in the last ﬁfty years, we still lack
a quantitative understanding of this process. Mathematical models of the gene regulation
offer an alternative approach for understanding the rules of this ﬁmdamental problem.

Most studies of gene regulation up to now have been experimental, constructing a
“parts list” for transcription rather than generating the whole picture. These studies have
provided an important but a qualitative picture of regulation instead of a physical and
mechanical view. Now large scale sources of biological information are being generated
providing a quantitative basis for systems biology studies. These include the complete
genome sequences for many organisms, identiﬁcation of many molecular factors
involved in the regulatory processes inside the nucleus, expression levels for many genes
measured at different time points, and in vivo occupancy of the DNA by DNA binding
proteins such as transcription factors and chromatin modiﬁers. However, the quantitative

understanding about how gene regulation works is far from comprehensive since the

created data usually gives only a snapshot of the system and obtaining this quantitative
understanding with only experimentation is challenging. Mathematical modeling suggests
an alternative path for this key problem. Today, with advances in molecular biology,
genetic manipulation and the availability of complete genome sequences, development of
new models that incorporate detailed dynamics of sets of biochemical interactions is
possible.

Mathematical modeling has been used for understanding biological systems for
decades (Turing, 1952). Although, these early models were breakthrough for their time
and generally able to validate the experimental observations but could not provide any
new insights on how the system works beyond which the experimenters had already
proposed to be true. Recently the interest in this ﬁeld exploded due to the new
experimental and computational power. The new experimental techniques as mentioned
above have provided a surge of biological data, which are difﬁcult to understand without
quantitative analysis and the improvements in the computational power, which is due to
progresses in computational techniques and technology, enabled the calculations to be
done which was not possible before. These improvements made the use of diverse
mathematical modeling methods to different biological problems in particular gene
regulation possible.

Systems biology provides a fresh perspective on gene regulation in biological
studies and requires close collaborations between experimentalists and modelers. To
carry out effective studies, the researchers who are involved in these collaborative studies
in gene regulation need an understanding of both biological and mathematical ﬁelds.

Here by focusing on the models that have been developed for eukaryotic systems in

particular yeast and Drosophila, we introduce the most common modeling approaches
and their applications and focusing on goals, challenges and future directions. We also
discuss how understanding of modeling can facilitate the understanding of eukaryotic
transcription.

In modeling gene expression there are two general approaches. In the ﬁrst
approach, using largely statistical treatments, expression levels for hundreds of genes are
analyzed. Here, gene expression levels under different experimental conditions or at
different time points are used to ﬁnd groups of genes that function together, new motifs
for transcription factors or, to deduce regulatory relations between those genes. The
ultimate goal of this approach is to construct the regulatory network that underlies the
given data. Although this approach cannot explain the ﬁne details of complex relations
between transcription factors, RNA polymerase and other regulatory proteins, it can be
very insightful on seeing the big picture, as it covers large fractions of genes in the
organism. The second approach, which oﬁen involves analytical methods, describes the
gene expression of a small number of genes, usually focusing on a detailed level of
transcription using mathematical models. The models might include terms relating to
binding of transcription factors and RNA polymerase to the DNA, cooperative and
inhibitive interactions between transcription factors, mRNA and protein degradation,
export to cytosol, and mRNA translation rate. For this approach we need an extensive
level of knowledge and some hypotheses to check the system. In this review we will
concentrate mainly on the second approach and discuss the use of major modeling
techniques of this approach on gene regulation with examples from eukaryotic systems.

We leave the discussion of the ﬁrst approach to the other reviews in the ﬁeld.

The models of the ﬁrst approach such as graph based probabilistic models give a
simple view of gene regulation; transcription factors bind to their target genes, and once
bound, the factors can either stimulate or inhibit transcription. Those models could be
used for analyzing the gene expression data coming from hundreds of genes, suggesting
new regulatory relations that has not been discovered from experiments yet, ﬁnding
feedback regulations in the system and comparing gene regulatory networks of different
species, which might reveal conserved relations. However these models cannot be used
to decipher the ﬁne details of the system such as enhancer architecture and network
dynamics. Neural, Boolean and Bayesian networks are examples of this approach and in
this review as mentioned above we will not concentrate on them (Heckerman 1998;
Friedman et a1, 2000; de Jong 2002). In this approach the gene expression levels,
regulatory factors like protein concentrations and experimental conditions might be used
as input to the model.

A variety of mathematical models have been applied for quantitative
understanding of gene regulation in eukaryotes, including thermodynamic models,
Boolean models, differential equation models, and stochastic models. Those models are
used to summarize the experimental data (Yuh et al, 2001), to infer new relations from
complex experimental data, guiding the researcher to new hypotheses to test (J aeger et al,
2004) and to ﬁnd properties of the system that are hard to measure experimentally and
can not be found without modeling (Fakhouri et al, 2009). These modeling approaches
could be primarily categorized into four ways: discrete vs. continuous and stochastic vs.
deterministic. In discrete models, time, state or space will assume a discrete set of values

and in continuous models continuous values. On the other hand, deterministic models

describe the system from only the available experimental data, but stochastic models use
probabilistic considerations to describe the system. All of these strategies have their
strong and weak points, which we will discuss in detail in the following sections. For
instance, a differential equation model, which is a continuous approach, serves as a good
approximation for the underlying biological system, however it usually cannot be solved
analytically, and numerical solutions are generally computationally expensive.

The choice of a model are usually depends on the system and problem under
consideration. Several criteria should be applied in selecting a modeling approach; the
model chosen should give new biological insights that could not be found with present
experimental techniques or clarify some of the known connections; it should not only
recapitulate what is already known. Although the aim of modeling is replicating the
physical interactions of molecules, the complexity of the biology or lack of data for most
processes obstructs us from doing that and cause most of the models in biology to be
phenomenological.

Mathematical models of gene regulation depend heavily on the availability of
good quality data, which until recently was largely available only in prokaryotic systems
and yeast. Some aspects of gene regulation learned from these systems are applicable to
more complex higher eukaryotic systems, but some features of gene regulation rules in
higher eukaryotes are unique to these systems, such as the extensive use of alternative
initiation sites and regulation at a distance. Drosophila has emerged as a model system
for understanding the gene regulation in higher eukaryotes due to its elegant genetics,
functional genomics (12 Drosophila genomes and genomes of other distant relatives are

annotated), wide spectrum of methods for manipulations of the transcription, ease and

affordability of experiments as compared to other multi-cellular eukaryotes, and well
established knowledge of the system due its long tradition. In addition, the results of the
Drosophila studies are transferable to other eukaryotes such as humans: of 287 known
human disease genes, 197 have homologs in Drosophila.

The Drosophila blastoderrn embryo provides an ideal setting for the analysis of
transcriptional enhancers; the cascade of maternally and zygotically supplied
transcription factors has been extensively investigated at a molecular level, and many
DNA regulatory elements have been identiﬁed and functionally dissected. In Drosophila,
genes with complex expression patterns are controlled by multiple enhancers, whose
modular function depends on the local action of repressor proteins (Small et al, 1993). As
a result of the availability of high quality data sets the blastodenn embryo has been used
for mathematical modeling of gene expression by several approaches, including reaction
diffusion, Boolean, and fractional occupancy modeling (Jaeger et al, 2006; Sénchez &
Thieffry, 2001; Segal et a1, 2008). Drosophila segmentation networks’ dependence on
transcription for regulating its gene expression, the availability of enormous experimental
knowledge on the system and available bioinformatics techniques for completing the
unknown parts of the system makes this system unique for modeling transcription in
higher eukaryotes.

In the following sections we will elucidate the applications of mentioned models
particularly to Drosophila melanogaster and yeast. Our analysis suggests that although
the mathematical models have been used to decipher gene regulation rules in higher

eukaryotes, successfully.

THERMODYNAMIC MODELS

A general feature of most transcriptional elements in prokaryotes and many
elements in eukaryotes is the activation or deactivation of a gene in response to binding
of sequence-speciﬁc transcription factors (TFs). That is the occupancy of a regulatory
element by TFs can provide a good proxy of the mRNA levels expressed by the
associated gene. Although this picture ignores additional processes such as chromatin
structure and modiﬁcation, DNA methylation, and recruitment of general transcription
machinery, the application of so called thermodynamic models permit the modeling of
gene expression as a function of TF binding to promoter or distal enhancer elements. An
underlying assmnption for these models is that the level of gene expression is
proportional to the equilibrium probability that the gene will be in an “active” state that is
proportional to the number of activators, and inversely proportional to the number of
repressors bound to the gene. The models seek to predict how different combinations of
binding sites on a regulatory region function together to give diverse temporal and spatial
expression outputs. These models are based on simple biophysical descriptions of DNA-
protein interactions and statistical physics, and enumerate all possible ‘states’ of an
enhancer based on potential transcription factor-DNA interactions. The probability of a
gene ﬁring is calculated as the fraction of the ‘successful’ states, i.e. those with
preponderance of activators bound (Bintu et al, 2005a; Janssens et al, 2006; Zinzen et al,
2006; Segal et al, 2008).

An important step in modeling transcriptional regulation by thermodynamic
models is connecting different states of enhancers to expression. All possible states of the
enhancer are listed, and then a statistical weight for each state is assigned. For a simple

case of one binding site, there will be just two states, bound and unbound. For an element

with four sites, there will be sixteen states etc. The statistical weight for a state is
calculated by using the concentration of transcription factors and binding afﬁnity of these
factors to their binding sites on the DNA. Thus for abundant proteins binding to high
afﬁnity sites, the weight will be much greater than cases where the TF is scarce or the
binding site is weak. After the weight for each state is calculated, the probability of each
state can be calculated by dividing the statistical weight of the state by the sum of the
statistical weight of all possible states. This calculation process can incorporate properties
known to affect transcription. For example, cooperative interactions between
transcription factors and inhibitory effects of repressors on activators can be explicitly
added to the model by assigning higher or lower weights. These cooperative and
inhibitory effects can furthermore be modeled to allow for distance effects. Competitive
binding can also be incorporated to disallow simultaneous occupation of the same
binding site by two different factors. Aﬁer binding states and probabilities are deﬁned,
the next step in thermodynamic modeling is calculating gene expression output from each
state. States with high activator occupancy are likely to induce high expression, while
repressor occupancy might result in low expression. As discussed below, one can model
gene expression output as proportional to the binding probability of the RNA polymerase
or weighted sum of the transcription factors (Bintu et al, 2005a, 2005b; Segal et al, 2008;
Fakhouri et al, 2009).

The theoretical underpinnings of thermodynamic modeling have been explored
ﬁrst and foremost in prokaryotic systems. Because the regulatory regions are generally
small, binding to few TF, simple prokaryotic systems provide a tractable setting for

quantitative studies and fractional occupancy modeling. The lac operon in E. coli and the

lysis/lysogeny switch of phage lambda are two examples that have been treated (Von
Hippel et al, 1974; Ackers et al, 1982; Shea & Ackers 1985; Vilar & Leibler, 2003);
reviewed in Buchler et al (2003) and Bintu et al (2005a). Additional promoters and
conﬁgurations are considered in Bintu et al (2005a & 2005b). Zhou and Su generalized
the results of Bintu et al (2005a) to derive a single formula calculating transcriptional
probability for all simple regulatory conﬁgurations. The model is available as a Python
module, ‘tCal’, which allows the user to easily build and conﬁgure transcription models
of target genes.

The mechanisms of gene regulation found exclusively in higher eukaryotes
suggest that TF binding to these regulatory regions may invoke distinct activities not
captured by thermodynamic modeling of prokaryotic elements. However, a recent study
by showed that, using thermodynamic modeling, many eukaryotic transcriptional
responses can be executed by simple combinatorial logic. For example the Boolean AND
gate can be generated by two TFs if the binding energies and factor concentrations are
such that individual binding is weak, but joint binding is strong (Buchler et al, 2003).
Larger groups of binding sites can encode a whole repertoire of more complicated logical
functions. Their study showed that even in the absence of complex protein-protein
interactions, by only changing the arrangement of binding sites, cooperativity of
transcription factors and afﬁnity of binding sites, complex interactions could be created
(Buchler et al, 2003). One possible limitation to implementing some schemes may be the
slow kinetics of assembling very large molecular complexes. Another prediction from
these studies was that some complex regulation requires looping between distant sites and

weak glue like interaction between proteins. The density of bacterial genome and lack of

chromatin boundaries may mitigate against extensive use of such long-range effects in
prokaryotes, whereas higher eukaryotes have mechanisms to prevent intergenic cross
talk.

Among all modeling approaches, thermodynamic models appear to provide the
highest potential to predict the output of different combinations of transcription factor
binding sites in eukaryotes. Recent studies involving yeast or Drosophz'la regulatory
elements illustrate the possibilities and limitations of this approach in higher eukaryotes
(Granek & Clark, 2005; Janssens et al, 2006; Zinzen et al, 2006; Segal et al, 2008 Gertz
& Cohen, 2009, Fakhouri et al, 2009). In these models, parameters include cooperativity
between proteins and the binding afﬁnity of transcription factors to the DNA. Such
models do not explicitly model other events such as chromatin modiﬁcations, RNA
polymerase phosphorylation and promoter release however. The success of
thermodynamic models suggest that those events which are downstream of the primary
DNA and protein interactions might therefore play minor roles in the relationship
between enhancer architecture and gene expression. Below, we discuss recent uses of
thermodynamics models to study eukaryotic systems, and consider successes and
limitations of these approaches.

A simplest method for transcription factor target detection is looking for single
consensus binding sites in overrepresented binding sites to check whether they are
regulating a group of genes. Such an approach has problems: motifs not the same with
consensus sequence but close to it is not taken into account and often regulated genes
have more than one binding site for a given factor. Granek & Clarke 2005 used

thermodynamic modeling for detection of transcription factor targets in yeast genome by

10

using sequences, PWMs and concentration of transcription factors. They designed a
transcription factor target detection algorithm (GOMER) which is unique in its ability to
model competitive and cooperative interactions between transcription factors in
comparison to machine learning methods. They applied their algorithm to yeast, but
clearly it could be applied to other regulatory proteins. They used GOMER to identify
Fkh2p and Mcmlp targets in controlling the expression of a set of cell-cycle regulated
genes in yeast, and analyzed the role of cooperativity in this process. They also used the
model to investigate the role of competition between the transcription factors, th80p
and Sumlp, in distinguishing between mitotic and meiotic programs of gene regulation.
Another place they used their algorithm is predicting genome-wide binding of
transcription factors, they guessed where the protein Raplp is binding in the genome and
then used chip-array to compare this model’s prediction.

An alternative application of thermodynamic modeling in Saccharomyces
cerevisiae involved intensive investigation of a Cir-regulatory grammar applying to just
the speciﬁc activator and repressor proteins, which are known to co-regulate genes in that
organism. Cohen and colleagues constructed a set of over 2800 promoters containing
three or four binding sites for these proteins. Quantitative output of each promoter was
assayed by means of the ﬂuorescent reporter, and the activities were ﬁt by a
thermodynamic model (Shea & Ackers, 1985). Their model could describe cooperativity
between transcription factor binding sites, effects of weak binding sites, and the large
amounts of variation in the gene expression due to different promoter architectures
correctly. They used their model on the whole Saccharomyces cerevisiae genome to

predict novel targets for the transcription factors they used. For example they found new

11

targets of Migl transcription targets, which was not identiﬁed bioinformatically
previously due to their low binding afﬁnity score.

At the other end of the spectrum, thermodynamic modeling has also been applied
to a single, complex regulatory region to discover the detailed functioning of the element.
Reinitz and colleagues modeled the activity of the promoter proximal 1.7 kb region the
Drosophila melanogaster of even-skipped (eve) gene, which is expressed in seven stripes
in the embryo. This region contains the modular enhancer directing blastoderrn
expression of stripe 2, as well as weak expression of stripe 7. After careful observation of
the expression directed by this fragment, the authors incorporated the spatial and
temporal expression levels of TFs regulating this gene into a thermodynamic model.
Using only experimentally determined binding sites they were unable to recreate the
expression patterns produced by reporter gene. However, when they included the likely
binding sites predicted by bioinformatics techniques the model was able to ﬁt the data.
This study indicated that widely dispersed binding sites may operate together to generate
enhancer—like outputs, suggesting that not all developmental elements exist as compact
modules. Smaller segments of DNA may play necessary but not sufﬁcient roles in driving
gene expression. They extended their analysis by testing in silico the effects of mutation
of speciﬁc binding sites or loss of speciﬁc transcription factors. The model was able to
reproduce the altered patterns induced by these mutations. One limitation to this model is
that it cannot be readily extended to other enhancer regions.

Segal and colleagues conducted a study that took advantage of high quality
quantitative data available in the Drosophila blastoderm embryo extending the approach

of Reinitz to 59 different enhancers. The model utilized spatial expression data for eight

12

transcription factors and expression of the target genes in mid-blastoderm embryos. Their
model incorporated parameters for concentration scaling, homotypic but not heterotypic
cooperative binding and expression contribution for each transcription factor. Unlike the
eve promoter study, this model made no attempt to incorporate the distance effect of
repressors, a critical feature of these proteins. Despite the simpliﬁcations of this model,
reasonable predictions are obtained for many of the enhancers. The study predicted that
weak binding sites make important contributions, as do homotypic interactions. This
study highlighted the possible importance of weak binding sites to enhancer activity;
these sites would provide a quantitatively signiﬁcant level of activity and buffering
against loss of single binding sites. This idea has yet to be validated, however. The
contribution of homotypic cooperativity was also noted; this interaction provides sharper
patterns at lower input concentrations. Their model generally predicts the expression
patterns of the earlier expressed gap genes well, but is less successful with later
expression patterns of pair-rule genes, possibly because of heterotypic cooperative
interactions and distance-dependent quenching are not considered (Zinzen et al, 2006).
They used ROC curves for evaluating the ability of their model to predict the expression
patterns of modules that were not used as training data. Their analysis shows that their
model is not over ﬁtting the data; when they randomly generate PSSMs, swap PSSMs,
permute the columns of PSSMs, they get worse results.

A distinct approach to thermodynamic modeling was taken by Papatsenko and
colleagues, who focused on gene regulatory rules relevant to neurogenic gene expression
in Drosophila melanogaster. The rho, vnd and W1 are mainly regulated by two

transcriptional activators Dorsal (DI) and Twist (Twi), and one repressor Snail (Sna).

13

Differences in the regulatory regions for these genes lead to slight differences in
expression patterns in dorsal and ventral regions. What was unique about this study is
that rather than basing their thermodynamic model on the actual DNA sequences, the
authors generated conceptual regulatory elements containing key core blocks of Dorsal-
Twist-Snail sites. Their model was able to reproduce the relative gene expressions for
rho, vnd and vn and suggested that the structural features such as cooperativity between
transcription factors and Dorsal-Twist-Snail (DTS) module numbers could explain the
expression differences. They concluded in their parameter comparisons that rho models
require 5-10 fold higher Dl-Twi cooperativity than vnd, as well as higher Twi-Twi
cooperativity, and vnd models require more DTS modules than rho as well as higher Sna-
Sna cooperativity. Phylogenetic comparisons were employed to validate these
conclusions. Comparing enhancer sequences from Drosophila species they noted that
spacing between factor binding sites is generally conserved, and the number of DTS
modules in vnd is always more than in rho so that features distinguishing vnd from rho
have been maintained. They extended their study on vnd, mutating Twist binding sites
and showing that their model can reproduce the experimental data. The actual function of
modeled DNA sequences is not directly tested; however, most of the results of this paper
were conﬁrmed by earlier qualitative studies (Szymanski et al, 1995; Ip et al, 1992).

A combination approach was employed in a recent study that focused on
modeling synthetic elements in silico using actual in vivo quantitative data obtained for
each of the constructs (Fakhouri et al, 2009). Here 27 synthetic enhancers were devised
to test features affecting in early Drosophila embryos. Levels of reporter gene activity

were measured by confocal laser scanning imaging of over 900 embryos, and quantitative

14

differences resulting from minor changes in enhancer structure were noted. To obtain a
ﬁne scale understanding of the systems, only speciﬁc features affecting repressors were
systematically explored. As explained above, earlier studies incorporating widely
disparate enhancer sequences may not produce sufﬁcient data to identify important
features of enhancer organization, such understanding is critical to describe how
extensively reorganized enhancers maintain similar function in some instances, or show
quantitatively distinct outputs in other cases (Ludwig et al, 2005; Crocker et al, 2008).
Amosti and colleagues limited the number of features that differed between the relatively
small number of reporter genes, enabling a model with a tractable number of parameters
and robust estimation of these parameters. This study identiﬁed nonlinear quenching
effects of short range repressors, similar quenching of different activators, and modest
levels of cooperativity between short range repressors. Signiﬁcantly, this type of
modeling provided insights that were not apparent from the analysis of individual
embryos. The study was extended to an endogenous enhancer, rho NEE, showing that
certain features derived from synthetic enhancers are directly applicable to real enhancers
and, highlighting several features of the architecture of this enhancer.

In considering these applications of thermodynamic models there are limitations
and challenges that face the modeler. Eukaryotic transcription could be divided into three
layers; binding of transcription factors, recruitment of cofactors and reduction of energy
barrier of transcription by cofactors. This process is represented by three steps in the
model of Janssens et al (2006): fractional occupancy of transcription factors and
correction of activator occupancies by short-range repressor quenching, recruitment of

cofactors (they use the adapter term in their study) and calculation of transcription rate by

15

Arrhenius mechanism. In the ﬁrst layer of their model transcription factors bind to the
DNA independently i.e. no cooperative binding, and occupancy of activators is corrected
by reduction due to short-range repressor quenching effects. In this step quality of
repressors is taken into account as free parameters, repressor quenching efﬁciency is
assumed to be decreasing monotonically; complete repression up to 50 bp, linear
decrease in repression 50 bp to 150 bp and complete loss of repression after 150 bp, and
repressors is assumed to be cooperating on quenching activators which is represented by
multiplication of their effects. The second layer of their model describes the cofactor
recruitment by transcription factors, which is only a crude simpliﬁcation of the process,
where each activator has a constant potential to recruit cofactors. They also incorporated
direct repression (reduction of RNA polymerase binding due to repressors) in an earlier
version of their model in the second layer by decrease in cofactor levels; which is not
incorporated by any of the models we mention here. Another simpliﬁcation taken by
them in this step is the assumption of activators recruit same cofactor, but biologically
each activator might be recruiting different cofactors. The third layer in their model
describes activation of transcription by Arrhenius mechanism in which cofactors lowers
the activation energy barrier. In this step due to Arrhenius mechanism formulation, model
gives cooperative effects between activators and nonlinear response to activation signal,
and an exponential increase as more cofactors are recruited. To solve the exponential
increase in transcription problem, a maximum threshold level is set, which does give a
limit on transcription but not in natural way like logistic functions. Zinzen et a1 (2006)
models only the ﬁrst layer of transcription similar to Janssens et al (2006), and assumes

that transcription level is assumed to be linearly correlated to the level of active states; in

16

his case the binding of activators but with at least one Dorsal activator and one Twist
activator bound but not Snail repressor. In contrast to Janssens et al (2006) they have
cooperative binding for transcription factors. Models of Granek & Clarke (2005) and
Gertz & Cohen (2009) are similar to Zinzen et a1 (2006)’s implementation with some
additional features such as weight functions for cooperativity. Segal et al (2008) modeled
the transcription in three layers in a similar fashion to Janssens et al (2006); occupancy of
transcription factors, summation of expression contributions of transcription factors and
calculation of transcription by a sigmoidal function. The ﬁrst layer of the model is similar
to Janssens et al (2006) with the incorporation of homotypic cooperativity to occupancy
calculations. In the second layer he parameterized expression contributions of different
transcription factors, summed the expression contributions to get activation potential for
each state. In the third layer he got the total transcription level by summing up the
product of probability of seeing states and their expression contributions, which is
calculated by using sigmoidal function. Model used in Fakhouri et al (2009) has the same
set up with Segal et a1 (2008), but incorporates short-range repression and heterotypic
cooperativity.

The gene regulation mechanism for eukaryotes is still not completely known and
perhaps not all regulation mechanisms are found yet. Although we can incorporate all of
the features known about gene regulation to the models quantitatively, we generally don’t
have enough data to test them. Here we will mention a few of those properties, which
should be incorporated to the modeling as the data emerge. All the modeling attempts we

mention here are only crude approximations to the truth and one might wonder how

17

realistic a model is, if it cannot mimic the complete picture of regulation. There are
several key points to be noted in that respect.

First, although thermodynamic models could be incorporated to differential
equation models, this incorporation is generally skipped and equilibrium assumption is
taken i.e. the probability of molecules binding to the DNA is its equilibrium probability.
Although the success of these models suggests that this is a reasonable simpliﬁcation, it
is still unclear whether these systems come to equilibrium and if they come how they
manage that. ’

Second, currently most of the thermodynamic model applications could not take
into account many structural features of the DNA in a realistic manner, especially for
eukaryotic systems. For example effects of nucleosomes on transcription factor binding,
distances between transcription factors, orientation of binding sites, closeness to
transcription start site, chromatin modiﬁcations, DNA looing and so on. Although there
are not much data available to add three dimensional picture of the DNA into the model,
it is also not clear how most of these properties could be added. The simple cases of this
such as looping could be incorporated into the model possibly by modifying statistical
weights of states. Perhaps, the experiments in puriﬁed in vitro systems might be used for
directly measuring the quantitative effects of these components such as chromatin
modiﬁers and these effects could be incorporated to thermodynamic models also.

Third, although thermodynamic models is advantageous due to their rational and
careful quantization of gene expression under different combinatorial states of
transcriptional factors, they sometimes give up the dynamics of the system, and only

concentrate on a snapshot of the system. These models could still reveal some useful

18

information about the system; however, we should note that steady state approach isn’t as
useful as the dynamic approach for modeling gene regulation due to the essence of
biology which changes in concentrations matter a lot.

Fourth, in spite of all modeling studies, understanding the functional
consequences of the changes in transcription are still a challenge. Although DNA based
models might be one of the key contributors for answering this question, a network view
of the system is needed for a clear explanation. The modeling attempts which help us to
understand the fundamental rules between TFs might be used for constructing those
network level models.

Fifth, eukaryotic enhancers in contrast to prokaryotes; might involve hundreds of
binding sites and for those binding sites we don’t know whether a certain binding site is
functional at a certain time or not. Generally it is assumed if the protein is present,
binding will occur and it would be functional, but clearly this is not the case for many
binding sites. The lack of knowledge in where the proteins are binding and if they are
binding are they functional is another key problem of modeling gene regulation,
especially in eukaryotes.

Sixth, proteins competitively bind to DNA if their binding sites are overlapping
and for some regulatory networks, competition is a key regulatory mechanism. Segal et al
(2008) argued that competition for binding is not a major factor in gap gene network
where, Zinzen et a1 (2008) argued the opposite in vNE genes. Although people use the
assumption of overlapping binding site for competitive binding, due to steric hindrance
constraints transcription factors with close binding sites might be also in competition for

binding. This entity is taken into account by Granek & Clarke (2005); their competition

19

term incorporates all potential competitors binding at any window that affects binding of
another protein. They program also allows the use of user deﬁned weight functions for
competitive binding.

Seventh, in thermodynamic modeling cooperativity is generally assumed at the
level of binding, however could be added for later steps of regulation also. It is generally
assumed that transcription factors bind to the DNA cooperatively, but neither proteins
which cooperate nor their mechanism are known. Lack og knowledge in the correct
cooperativity parameter values forces researchers to either estimate this parameter from
experimental data or to incorporate it to the model in a very simple way (Granek &
Clarke 2005; Zinzen et al, 2006; Segal et al, 2008; Fakhouri et al, 2009). Simplifying
assumptions taken include; cooperativity only occurs between neighboring and same type
of proteins (homotypic cooperativity), and it decreases by a distance dependent function,
such as normalized Gaussian function with mean 0 and standard deviation 50 (Zinzen et
al, 2006; Segal et al, 2008; Fakhouri et al, 2009). These assumptions deﬁnitely cannot
explain the complex cooperativity schemes. For example importance of heterotypic
cooperativity in gene regulation is shown previously by many experimental studies, so
although ignoring heterotypic cooperativity simpliﬁes the calculations it is not
biologically reasonable (Zinzen et al, 2006; Segal et al, 2008). For example Granek &
Clarke (2005) showed that heterotypic cooperativity Fkh2p-Mcmlp dramatically
increased their models ability to explain the regulation of forkhead-regulated genes in
contrast to homotypic cooperativity Fkh2p-Fkh2p which had only little effect. Granek &

Clarke (2005) also incorporated user deﬁned cooperativity weight functions to their

20

model which makes it easier to explore more elaborate models of cooperativity; however
the choice of weight ﬁrnction for eukaryotes is not an easy task.

Eighth, actual concentration levels of the proteins are not known for most of the
transcription factors in eukaryotic systems. People generally use confocal microscopy
data, which measures relative levels of proteins and mRNAs, but levels of different
proteins and mRNAs cannot be compared due to differences in the antibodies used or
light wavelength (Janssens et al, 2006; Zinzen et al, 2006; Segal et al, 2008, Fakhouri et
al, 2009). Since the thermodynamic models heavily depend on concentration levels, this
lack of knowledge causes a key problem for thermodynamic models. A way to approach
this problem is assigning free parameters for sealing concentration levels, which could be
estimated by parameter estimation techniques (Segal et al, 2008; Fakhouri et al, 2009).
Although this approach is used frequently, its applicability is still questionable due to
lack of their validation. The improvements in confocal microscope technology will
deﬁnitely improve the quality of the data, and help us to solve this dilemma.

Ninth, although transcription shows sigmoidal pattern is an assumption used by
many researchers frequently, it is not proved to be true generally. For a correct modeling
of gene regulation, this behavior should be tested further by experiments which compare
different activator and repressor levels to transcription levels.

Tenth, some of the transcription factors function in context dependent manner in
their activity. For example Hb is known to be a repressor on MSE3 and believed to be an
activator on MSE2 due to bicoid presence, but the context dependency of this protein is

not taken into account in the recent studies; Hb is taken as an activator in Janssens et al

21

(2006) and a repressor in Segal et al (2008). This property should also be incorporated to
the models as more experimental data becomes available.

Eleventh, transcriptional regulation could be changed either by changes on
regulatory regions or transcription factors. Since the effect of little changes on the
transcription factor might possibly affect many other transcription networks, evolution
generally takes the second option and does little changes on binding sites such as their
afﬁnity, composition and arrangement when there is a need to change the transcription
level of a certain gene. This makes detection of binding sites a key starting point for
modeling transcription; however transcription factors can tolerate high sequence
variability, which gives a high ﬂexibility to gene regulation and makes the detection of
binding sites a complicated task. The number of known binding sites experimentally is
limited and bioinformatics techniques do not guarantee detecting binding sites precisely,
but this makes the results of modeling studies prone to errors or over ﬁtting. In Janssens
er al (2006), use of 17 experimentally detected binding sites were not enough to ﬁt the
model to the gene expression data; they were only able to ﬁt their model when 17
bioinformatically detected binding sites were added.

Twelve, although many modeling studies treat the relationship between the
transcription factor and its target binding site as something that either does or does not
exist, biophysically this does not make sense and a quantitative description which assigns
a wide range of afﬁnities to the transcription factor and its target binding sites on the
DNA is needed. A theoretical outline for ﬁnding the relative binding afﬁnities of binding
sites is described by Berg & Von Hippel (1987). Their outline describes an approximate

relationship between the Position Weight Matrix (PWM) score which is inferred from

22

base frequencies in known binding sites and the relative afﬁnity of the binding site
assuming additivity of the binding energy for each base pair. Although, if we mutate each
bp in binding site one by one and check binding levels we can actually ﬁnd free energy
contributions for each bp, this is costly. Berg & von Hippel (1987) framework is thought
be a nice approximation due to the assumed connection between the information content
in the set of sequences and speciﬁcity of binding. This framework takes into account the
assumption that natural selection has given rise to a certain level of sequence speciﬁcity
for each TF and the sequences that give the same binding affinity are selected. From a
biophysical point of view this formalism is not satisfactory, evolutionary arguments
should not be invoked when the goal is to model the physical interactions between a
transcription factor and the DNA sequence. In addition, although the strength of TF-DNA
interaction varies with the local DNA sequence, it should not depend on the choice of a
background model representing the global characteristics of the DNA. Purely biophysical
approaches are needed to infer TF binding speciﬁcity; as an example, transcription factor
binding speciﬁcity could gain a lot from structural information on transcription factor-
DNA binding and this information might open up new possibilities for determining TF
sequence speciﬁcity.

Thirteenth, generally the binding afﬁnity is taken care of by use of PWMs or
totally ignored and each binding site for a transcription factor is assumed to have the
same binding afﬁnity. Both of these ways have problems; on one hand there is not
enough number of binding site data available for most of the transcription factors to
construct a nice and representative PWM, on the other hand assigning the same afﬁnity

for each binding sites of a transcription factor is an oversimpliﬁcation. In two recent

23

studies a third approach is taken and binding afﬁnities are optimized for each binding site
from the experimental data and they argued that this approach didn’t improve their results
signiﬁcantly compared to the classical approaches mentioned above (J anssens et al, 2006;
Zinzen et al, 2006). Zinzen et al (2006) also argued that a wide range of binding afﬁnity

values 7< log(Ka)< 11 could not account for the measured differences in the rho and

vnd expression patterns and they are not as important as cooperativity parameter for
thermodynamic models. The insensitivity of gene expression to changes in binding
afﬁnity due to model is probably a weak point of thermodynamic models.

Despite all its shortcomings, thermodynamic models still stay as the most
promising model for deciphering gene regulation on DNA level. Although we don’t have
sufﬁcient data for some of the gene regulation features mentioned here, mathematicians
should go forward and check the applicability of them on synthetic data and show new
ways to experimentalists to create reasonable data sets for deciphering gene regulation.
On the other hand, the incorporation of thermodynamic models to large gene regulatory
network studies will possibly increase their performances and this opportunity should be

also further investigated.

DIFFERENTIAL EQUATION MODELS

A dynamical system is a set of components that interact by explicit rules which
dictate how the states of the components in the system change. One example of dynamic
systems is regulatory networks in which the components are mRNAs and proteins, and
the state of these components is their concentration levels. The rules of this evolving
system of interacting molecules could be represented by differential equation models in

terms of the rate equations for expression of any variable in terms of the other variables.

24

These models could incorporate parameters such as the degradation rates of mRNAs and
proteins, or the ﬁring rate of promoters. lmportantly they could be used to reproduce
observed system behavior or perform in silico tests of the system parameters at levels
which are hard to get by experimentation, leading new hypotheses to check.

Differential equation models are based on the principal idea that there is a
sufﬁcient number of components that we can consider them as continuous quantities
which are spatially homogeneously distributed. Although these assumptions are
oversimpliﬁcations of the real problem, and usually do not hold true for biological
systems-differential equation models still provide many useﬁrl insights about complex

regulatory networks.

In general terms a differential equation model can be simpliﬁed to—j—= fl.(x),

131' S n where x =(xl,x2,...,xn) is a vector of concentrations of mRNAs, proteins or
other related molecules and fis are rate equations, which express the changes due to
transcription, degradation, translation, etc. The selection of an fi is a critical step in

modeling and one should balance complexity with detail. Complex models are good to
explain the system, but they will contain many parameters, which might be hard to

measure experimentally or estimate computationally. Furthermore if necessary, fis might

be chosen to take into account the time-delays and uncertainty in the system.
Differential equation models can be divided into two main groups; Ordinary
Differential Equation (ODE) models and Partial Differential Equation (PDE) models.
ODE and PDE models depend on one or multiple variables respectively. For instance, for

temporal changes only we use ODE models, for both temporal and spatial changes, we

25

use PDE models. In modeling gene regulation ODEs are the most common formalism.
They include time dependent variables such as protein and mRNA concentrations, and
constants for production and degradation rates for each variable in the system. Although
PDEs are more appropriate for modeling gene regulation, the requirement in
mathematical analysis and experimental data makes them less favorable.

ODEs are a well studied ﬁeld of mathematics; the ODE theory is well established
and many numerical methods and software tools for solving ODEs are freely available
and easy to implement. Although, ODE models are generally hard to solve analytically,
i.e. ﬁnding formulas that express the solutions as explicit functions, approximations of
the solutions can be found by using numerical methods. Usually the spatial structure is
not taken into account in ODE models, when it plays an important role in the system,
however, an extra parameter is written into the model so that it could be taken into
account. In this case, we consider the spatial structure as a collection of homogeneous
compartments, between which information can be transferred. ODE models are
insufficient for modeling gene regulation if continuous aspects of the geometry of the
system are important or the homogeneity assumption is not acceptable to the system.
PDEs should be used for these cases.

PDEs, like ODEs are well studied analytically and numerically. Unfortunately, its
theory is more complicated and computations are more demanding. Finding analytical
solutions are much harder, therefore numerical simulations are the main analysis tool for
biological systems. Those models could take into account the spatial aspects of the
cellular processes if necessary (Eldar et al, 2002), however, spatial heterogeneity of the

systems is generally neglected. We will see the applications of a classical example of a

26

PDE (the reaction diffusion equation) below, which describes the production, diffusion
and degradation of biological components.

Following standard methods of chemical reaction kinetics, one can obtain a set of
differential equations for any regulatory network, which could be solved numerically to a
high accuracy level using numerical methods such as Bulirsch—Stoer Method (Press et al,
1992). This standard modeling approach has been applied to many systems, ranging from
a few isolated components to entire cells. This modeling approach has been previously
applied in many prokaryotic regulatory networks such as lac operon of bacteria. The lac
operon consists of a number of genes and a small regulatory DNA region, which controls
the expression by binding to either repressor or RNA polymerase. The ﬁrst mathematical
model to study lac operon regulation was given by Goodwin in 1965 and Grifﬁth 1968.
However, the beauty of the problem attracted many other researchers in the past 50 years
(Nicolis & Prigogine, 1977; Santillan & Mackey 2001; Yildirim & Mackey 2003;
Mackey et al, 2004). For example, the model of Grifﬁth takes into account activation of
the genes, transcription of mRNA, degradation of lactose, synthesis of beta galactosidase
and perrnease. Later, the Nicolis and Prigogine model added more details about the
system, including the action of the repressor, inducer and enzyme synthesis. Yildirim and
Mackey’s model added delays in the system, such as transcription initiation and
translation. One wonders whether all these attempts could be generalized to other DNA
regions or not?

Although differential equations have been used abundantly by modelers for many
biological systems such as population dynamics, embryo patterning, infection dynamics

and gene regulation, the framework they provide is largely unknown to biologists. Here

27

we will brieﬂy describe the use of differential equation models as a tool for describing
and making predictions about temporal and spatial changes in eukaryotic regulatory
networks, and discuss their strengths and weaknesses. Use of differential equation based
models in eukaryotic regulatory networks is relatively new; however, the accumulated
data on molecular biology such as the segmentation network in Drosophila makes the use
of these models a preferable choice (von Dassow et al, 2000; Gregor et al, 2005; Jaeger
et al, 2004). We will also present several recent applications of these models to facilitate
testing hypotheses about systems level properties of complex eukaryotic systems.

Barkai and colleagues studied the network of proteins that pattern the dorsal
region of the Drosophila embryo, which is initiated by the graded activation of the bone
morphogenetic protein (BMP) pathway (Eldar et al, 2002). Their results indicate that the
BMP activation gradient is robust to changes in gene dosage. However, the mechanism of
robustness is different than the general design of robust networks, which involve
feedback loops to buffer against perturbations in the system. Their modeling of the
system suggested the transport of the Scw and Dpp, BMP class ligands, into the dorsal
midline by Sog, a BMP inhibitor, as the key event in robustness. They validated this
result experimentally for Dpp.

Three reaction diffusion equations that form their model are written for the
system. Their model could account for the formation of the BMP-Sog complex, diffusion
of Sog, BMP and BMP-Sog, and allowed for the cleavage of Sog by Tld, both when Sog
is free and when Sog is associated with BMP. They carried out 66000 simulations to ﬁnd
the robust networks where they changed parameters of their model such as the rate

constants and protein concentrations over four orders of magnitude and solved their

28

model numerically. They generated and solved numerically three perturbed networks
representing heterozygous situations by reducing the gene dosages of sog, tld or the dpp
(or scw) by a factor of two and compared their output with the initial, nonperturbed
network. They found that only 198 out of 66000 networks are robust to twofold changes
in three genes, which showed a unique sharp concentration gradient that peaked in the
dorsal midline. Analysis of the parameters showed that the robust networks could have a
wide range of possibilities for most parameters. However, there are two restrictions on
the design of the network; cleavage of Sog by Tld is facilitated by the formation of the
complex Sog—BMP and the BMP—Sog is broadly difﬁrsible, while free BMP is not.
Maternal morphogen bicoid (Bed) of Drosophila diffuses along the anterior-
posterior axis of the embryo and forms a gradient in the early blastoderm stage of the
Drosophila embryo. This gradient helps the formation of anterior-posterior patterning,
which assigns different fates to nuclei depending on the level of Bed they are exposed to.
Gregor and colleagues wanted to understand how it is possible to get similar anterior-
posterior patterning within different Drosophila species with different sizes of embryos
(Gregor et a1, 2005). In order to see how diffusion works for Bcd they injected an inert,
ﬂuorescently tagged molecule that mimicked the Bed at the anterior pole of the embryo
and measured the concentrations at different spatial points of the embryo over time. They
modeled this process by reaction diffusion model, which describes the change in protein
concentration pursuant to diffusion over space and time as well as decay due to protein
lifetime. In order to solve their model numerically over time, they discretized the embryo
into a three-dimensional grid. They ﬁt the experimental data with their model for

different species and found species-speciﬁc diffusion constants. In species with different

29

sized embryos the relative change in diffusion constants was minimal and also the
difference in developmental time scales of the species was small. These results led them
to hypothesize that the reason for the identical patterns over different Drosophila species
is due to species speciﬁc lifetime of Bed. However, this result has yet to be
experimentally veriﬁed.

Shape and stability of Bcd morphogen are always assumed to be the result of
localized production, diffusion and degradation. Recently, the diffusion rate of Bed
protein is reported but the rate of Bed production and degradation remains uncertain
(Gregor et a1, 2005). On the other hand, it has been shown by recent live-imaging
experiments that Bcd undergoes rapid nucleocytoplasmic shuttling and equilibrates
between the cytoplasmic and nuclear compartments (Gregor et al, 2007). However, the
equilibrium level changes in time due to the number of nuclei present to trap the Bed.
These recent observations led Shvartsman and colleagues to suggest a new differential
equation model to explain the exponential shape of Bed without degradation of Bed
protein; their model incorporates constant localized production at the anterior pole of the
embryo, diffusion and nucleocytoplasmic shuttling in the presence of the growing
number of nuclei instead of localized production, diffusion and degradation (features of
earlier reaction diffusion models) (Coppey et al, 2007). Their model predicts that nuclei
do not contribute signiﬁcantly to the shape of the Bed gradient; the Bcd gradient
establishes before the nuclei migrate to the periphery of the embryo (blastoderm stage)
and remains stable during subsequent nuclear divisions, i.e. nuclei can be viewed as
essentially inert sensors of the pre—established concentration from earlier time points. The

existence of the stable dynamics of the proﬁles of nuclear Bed is a robust feature of the

30

model; the parameters of the model do not have to be ﬁne tuned. Another prediction of
the model is that local defects in nuclear density should generate only local defects in the
proﬁle of nuclear Bcd. Although these predictions have not been tested yet, they could be
tested by careful measurement of Bed stability and analyzing mutants with late defects in
nuclear migration. Despite these nice predictions, their model cannot account for sealing
of the gradient with the size of the embryo similar to earlier reaction-diffusion models
(Gregory et al, 2005, 2007).

Anterior-posterior patterning in the early Drosophila embryo is one of the well
studied biological systems experimentally by many enhancer bashing and gene mutation
experiments. However, these studies are usually insufﬁcient to get a complete picture of
this patterning process. To get the complete picture experimentally, we need to create an
in vitro system which reconstitutes the underlying gene network from well deﬁned
ingredients. This reconstruction is almost impossible for right now, and for this reason
alternative approaches are needed. Therefore in recent years researchers have been trying
a new approach using mathematical modeling to recreate this network in silico and test
the hypothesis.

In several recent studies Reinitz and colleagues analyzed the Drosophila gap gene
regulatory circuit (one hour prior to cellularization in Drosophila) by using reaction-
diffusion models, which was used by the same group in an earlier study to analyze the
formation of stripes f expression of the pair-rule gene eve (Jaeger et al, 2004a; Jaeger et
al, 2004b; Reinitz & Sharp 1995). In their study they used spatial and temporal data of
wild type protein levels of the network, which constituted of the gap genes hunchback

(hb), Kruppel (Kr), knirps (kni) and giant (gt), maternal factors bicoid (bed) and caudal

31

(cad) and the zygotic gene tailless (tll). Their model provides a way to use this data to
infer how concentrations of products of a given gene change with time and how these
changes are inﬂuenced by the activating or repressing effects of the products of other
genes. Their model is based on three main ideas; protein concentrations are taken as state
variables of the network, chemical reaction kinetics are given by coarse grained rate
equations for protein concentrations, and least squares ﬁt is used to estimate the
parameters of the network. The optimized parameters suggested to them regulatory
relations in the network, which they compared to the literature and checked if their model
could successfully mimic the gap gene circuit and known mutations in the circuit. Their
model agreed with the earlier mutant and reporter studies, recommended that some of the
previously reported regulatory interactions are not necessary for getting the model to ﬁt
and suggested some new regulatory interactions such as activation of Kr by Cad.

They show that their model could reproduce gap gene expression at high accuracy
and found the mechanisms previously inferred from qualitative studies of mutant gene
expression data (Jaeger et al, 2004a, 2004b). Their analysis argues that threshold
dependent interpretation of maternal morphogen concentrations is not sufﬁcient to
explain shifts in gap domain boundary positions. They argue that maternal factors start
activation of gap genes but their ﬁne detail will be determined by gap genes i.e.
positioning of gap gene boundaries and maintenance of gap gene expression depend
mostly on gap-gap talks rather than maternal activators (Jaeger et al, 2004a, 2004b).
Their model also suggested diffusion as a non critical mechanism for observing gap gene
shiﬁs; if they don’t allow shifts in the gap gene networks, they still see the shifts in gap

gene expressions (Jaeger et al, 2004a).

32

The model predicts that synthesis is conﬁned to the anterior region of each
expression domain, which implies that there is an asymmetric distribution of gap gene
transcript in protein domains; transcript domains of Kr,'kni and gt are shifted anteriorly
with respect to their corresponding protein domains. They argue that these shifts are due
to protein synthesis domination anteriorly and protein decay domination posteriorly.
Shifts of anterior gap domains could be considered secondary effects of shifts of posterior
gap domains, i.e. shifts of anterior gap domain boundaries either follow the posterior
boundary shifts of more anterior gap genes or are due to sharpening of posterior
boundaries of anterior gt and hb (Jaeger et al, 2004a).

They predict activation of Kr by Cad and several other regulatory relations
(clarify evidence on the effects of Hb on Kr, Kr on kni, and Gt on kni) (Jaeger et al,
2004b). Unfortunately, most of the results from this study have not been validated yet. It
has also been noted that the model fails on null mutant experiments. They claim that this
failure is due to indeterminacy in quantitation of protein signal, or due to false early gap
gene regulation, which predicts high levels of gap genes that does not actually exist.

The periodic spatial pattern of segment polarity gene expression of Drosophila
melanogaster along the anterior-posterior axis of the embryo is maintained throughout
development, providing positional information for subsequent developmental events.
Segment polarity genes process and maintain their expression state through cross
regulatory interactions, which involves cell to cell interactions. Recently, von Dassow
and colleagues analyzed the segment polarity network by ODE models and checked
whether the known interactions in this network are enough to yield robustness (von

Dassow et al, 2000).

33

They developed a logical network interpretation from experimental results for the
reactions between the segment polarity genes and their products, including Engrailed
(EN), Wingless (WG), Hedgehog (HH), Patched (PTC) and Cubitus interruptus (CID).
They used this logical network for designing an ODE model that encodes this logic in a
set of 13 nonlinear ordinary differential equations, which incorporates synthesis, decay,
heterodimerization, cleavage and cell-to-cell trafﬁc. The model encoded 50 parameters
including binding rates, cooperativity coefﬁcients and half lives of proteins and mRNAs
where the real values are usually unknown. In their study, they tried to ﬁnd a set of
parameter values which the model exhibits the desired behavior of a segment polarity
network, given realistic initial conditions. Their model could not result in the
segmentation patterns observed with the known interactions among the segment polarity
genes and their products in the observed behavior of the embryo during and after the
segment polarity stage. Even after attempting a wide range of constrained parameters for
their model, they could not determine one parameter set that would maintain the initial
pattern of these genes’ expression stably over time.

They argued that the discrepancy seen is possibly due to lack of information about
the topology of the network rather than the parameters chosen. They redesigned the
protein-protein interaction network with some additional interactions, which was
suggested by their best approximate results; wg and en expresses alternately in every
other cell along lines of cells parallel to the anterior—posterior axis. They added two new
interactions, which are biologically reasonable to get the expected pattern; wg auto-
activation, and inhibition of en by Ci amino-terminal repressor fragment. The difference

between this new model and old model is the emergence of two positive feedback loops,

34

one for en and one for wg. With these links installed there are many parameter sets that
enable the model to reproduce the target behavior, i.e. the segment polarity network is
robust to parameter variation. Their model also suggests that the segment polarity
network requires few absolute demands on initial conditions, and it seems likely that the
evolutionary process could replace those inputs relatively easily which aids in the use of
the same module in different organisms. They also showed that robustness property is not
an artifact of the network topology; they analyzed models that include additional links
and components and observed that as long as they keep the core topology the same
conclusions hold. Although the results of this model haven’t been experimentally
validated yet, the approach itself was extremely valuable in guiding new experiments.
There are a few points to be noted in differential equation based models. First,
three-dimensional structure of biological systems should be taken into account for
realistic modeling. Although reducing the dimension of the system might simplify
calculations and be a reasonable ﬁrst approximation to the problem, models should be
extended as new data emerges. For example, Jaeger et al (2004a, 2004b) and Reinitz &
Sharp (1995) concentrated on a strip of cells from the middle of the Drosophila embryo
with the assumption that anterior-posterior and dorsal-ventral patterning networks are
independent in Drosophila melanogaster, but it is accepted that this assumption is not
entirely true. Recently, F owlkes et al (2006) and Luengo-Hendriks et al (2006) noted the
importance of nuclear movements on gap gene regulation, which could also be taken into
account in a three dimensional treatment of the problem. They are producing three-
dimensional data on Drosophila segmentation genes, which the modeling approach taken

by Reinitz and colleagues could be applied to (Fowlkes et al, 2006; Luengo-Hendriks et

35

al, 2006). Similarly, Coppey et a1 (2007) assumed in their model that nuclei are
uniformly distributed throughout the embryo which contradicts also to studies of F owlkes
et al (2006) and Luengo-Hendriks et al (2006).

Second, although differential equation models are more suitable than other
modeling approaches like Boolean models for the dynamic nature of biological systems,
the quality and quantity of data needed to construct a differential equation model makes
them difﬁcult to apply. If there is insufﬁcient data or the data of poor quality, use of these
models might result in inaccurate predictions. However, the inaccuracy could be resolved
by increasing quality and quantity of the data, and the differential equation models could
be solved numerically to any desired precision. For example, although Reinitz et al
(2004a, 2004b) used the same modeling and optimization techniques with Reinitz et al
(1995), they had a lower degree of variation in the distribution of parameters and the
error levels in the gap gene expression patterns were reduced to less than 5% by the
increase in the data quality.

Third, although we assume that cells and cell compartments are homogeneous, in
reality cell compartments are highly heterogeneous and compartmentalized structures.
This leads to a situation where the discrete nature of the molecular components cannot be
ignored, resulting in the stochastic behavior of biological systems. Although, differential
equation models are appropriate for simulating systems which satisfy continuum
approximation, they fail to describe stochastic interactions. Approaches that model
stochastic effects between individual molecules offer better descriptions of system

behavior in such cases, such as the Gillespie algorithm (Gillespie et al, 1977).

36

Fourth, differential equation models usually depend on lots of parameters that
quantify the interactions between molecules and ﬁnding these parameters experimentally
is not an easy task. Consequently, these models often become underdeterrnined and many
parameter values could work equally well. Therefore, producing quantitative predictions
for large biological systems by differential equations is not straightforward and this could
only be done for simple systems. On the other hand, the numerical techniques for solving
differential equations should be chosen carefully and the stability of the solutions should
also be checked. In a numerically stable algorithm, errors in the input lessen in
signiﬁcance as the algorithm executes, having little effect on the ﬁnal output, and in a
numerically unstable algorithm, errors in the input cause a considerably larger error in the
ﬁnal output. Finally, due to high computational needs these models do not scale well to
complex regulatory networks with hundreds of interacting molecules. However, since
they protect the overall picture of the biological systems, as opposed to other modeling
approaches such as Boolean models, the disadvantages in computation are compensated
for by the accuracy of the results. This problem might be potentially solved by the
improvements in computational techniques and technology.

Fifth, discrete time scale could be added to the biological systems, if necessary,
by using difference equations, which is the discrete form of differential equations where
the value of one variable at a certain time depends on the value of that variable at a
former time. Difference equations are more realistic than differential equations clue to the
discrete nature of biology. Although the use of discrete time in modeling is correct for
ecology models in which new organisms are born in synchrony, it is less appropriate

when no natural time step exists, such as in transcription where transcription time varies

37

across genes. On the other hand, the time delays in gene regulation due to involvement of
processes such as import and export to the nucleus or slowness of the processes such as
translation could be incorporated into the model using delay differential equations.

Sixth, there is very limited information about the actual in vivo processes, and for
this reason lots of assumptions are made about nonlinear biological processes such as
spatial organization of the system, interactions between molecules, and reaction rates. For
example, it is generally assumed that interactions among molecular species follow mass-
action kinetics, but mass-action kinetics may not be suitable for some reactions such as
conformational changes in large-scale macromolecular aggregates. On the other hand, in
these models the DNA level regulation of transcription is taken as a black box, and
usually approximated by a function of sigmoidal type, which may not be realistic since
every protein does not have to obey the same type of sigmoidal behavior; some proteins
may be extremely sensitive to increases in activation signal where others may not.

Seventh, it is believed that the regulatory networks are composed of modules,
which are assumed to be semi-independent in their behavior. For example, in a typical
Drosophila embryo it is assumed that the genetic regulatory networks are effectively
isolated from other developmental processes such as cell-cell interactions, morphogenetic
movements, and protein phosphorylation. Although differential equation models do a
pretty good job at describing the behavior of the modules, due to possible missing pieces
in the modules their use is limited; for example, changes in the modules such as addition
or subtraction of new proteins might have profound effects, which may not be pointed out
by differential equation models due to over ﬁtting to the earlier description of the module

(von Dassow, 2000).

38

Eighth, differential equation models usually ignore post-translational regulation
and take DNA level gene regulation as a black box. For this reason, these models can't
help us to understand the enhancer structure and organization, such as cooperative
binding, distances between transcription factors, orientation of transcription factors;
however, these models are useful for ﬁnding potential targets for DNA level studies.
Because of skipping DNA level regulation, the rules learned from one circuit is typically
not applicable to other circuits, since on the DNA level protein binding sites might be
reorganized and this reorganization might change the rules learned on protein level.
Although this is a weakness inherent in the studies that do not take into account the DNA
level information, it does not mean that this approach is useless-it might really help us to
see the bigger picture. However, if the rules on the DNA level are found they could be
connected to the protein level analysis and possibly used by other circuits.

Ninth, the ﬁtness of differential equation models are measured by cost functions,
which compare the model’s predictions to experimental data and visual inspection. There
are two approaches to this problem; ﬁnding parameters of the system by an optimization
method which minimizes the cost function or taking a parameter range for parameters
and trying all possible values. Usually for parameter estimation global optimization
techniques are used, which are known to give multiple parameter sets that can satisfy the
system equally well due to their stochastic nature. For this reason interpretation of the
parameters is not easy and clearly the most important part of modeling. For example,
Reinitz and colleagues used Parallel Lam Simulated Annealing for estimating parameters,
which resulted in many gene circuits which they checked for defects and used the circuits

which do not have any observable defect for further analysis (Jaeger et al, 2004a, 2004b).

39

Alternatively, Eldar et al (2002) choose 66000 different parameter combinations where
each parameter ranges over four orders, and they checked for the ones which give robust

pMad expression.

BOOLEAN MODELS

Biological processes often show switch-like behavior such as competence in
bacteria, apoptosis decision in cells and transcription of a gene. For this reason, Boolean
models which represent the regulatory relations as logic gates are one of the most studied
discrete approaches that could capture and describe this behavior. In this approach,
mRNAs and proteins in the network are assumed to be binary valued logical variables,
i.e. their states can be either on or off which is usually decided according to a threshold
value. The associations between the variables are described by Boolean or logical
functions, which provides a statement performing on the inputs, mRNAs or proteins that
have regulatory signal to the target, using the logic gates such as “and”, “or” and “not”,
and the output is on ( 1) or off (0). For instance a gene which is regulated by two
transcription factors; AND function implies that the gene is transcribed only if both
transcription factors are binding, OR function implies that the gene is transcribed even if
one of the transcription factors are binding, and NOT ﬁmction implies that the gene is not
transcribed if both of the transcription factors are bound.

Boolean networks can be used for simulating dynamic behavior in biological
systems by applying the Boolean functions in discrete time steps, usually the time
interval that is larger or equal to the duration of all biological processes in the system.
The updates in the network, which could be synchronous or asynchronous, cause the

biological system to evolve from state to state, where a state of the biological system is a

40

binary vector demonstration of the states of each variable (mRNA, protein, etc.) in the
system. In this approach each variable has two states-on (l) or off (0), and the dynamics
show how the variables change each others’ states over time. Due to the deterministic
nature of Boolean models, the initial state of the system completely determines the end
state of the system. If no difference occurs between transitions of states, then the system
is in a point attractor state (analogous to steady state in differential equations), and if
states of the system repeat periodically, then the system is in a cycling attractor state
(analogous to limit cycles in differential equations).

Boolean modeling could be used for any biological system, where interactions
between its elements are well described, to combine qualitative experimental
observations in a logical structure. Due to their simple nature, they circumvent the need
to know quantitative details about the reactions in the systems, which is not available for
many biological systems. This simpliﬁcation creates an advantage for Boolean models
over other modeling approaches (such as differential equation models which usually
include many unknown parameters), they become mathematically more tractable which
makes them easy to analyze analytically and implement computationally. For example
analysis of the steady states or limit cycles of the Boolean model is much easier than
differential equation models. However, despite their simplicity, they could still provide
qualitative insights into the fundamental nature of the underlying system, such as the role
of feedback loops in the network’s behavior. Also, since they are easy to analyze and
implement, they could be extended to large scale biological systems with thousands of

players.

41

Boolean network models could be used as an exploratory tool for systems where
the network structure is not clear. Many variants of the same network could be created,
and the simulations of the model could be compared to previous experimental ﬁndings
and the instincts of the modeler. As a result, Boolean models could be a way for the
modeler to understand the dynamics of the system and start modeling even without
knowing ﬁne details of the system.

Biological networks are usually robust despite intrinsic and extrinsic noise, for
example protein and mRNA levels are noisy with varying activity, and dynamic.
However, by adding stochasticity to Boolean models they could be extended and used for
analyzing the conditions under which the biological network is robust. On the other hand,
use of coarse-grained Boolean models to capture dynamicity in the networks seems
unrealistic. Recently several studies used these models based on very simple Boolean
functions to ﬁt sequences of gene expression patterns (Sanchez et al, 2001; Albert &
Othmer 2003). We will analyze these studies below for a further understanding of these
models.

Recently, Albert & Othmer (2003) have constructed and analyzed a Boolean
network model for segment polarity genes of Drosophila melanogaster. These genes
show stable expression patterns which supply the necessary information for the following
developmental processes. The expression of the segment polarity genes are reﬁned and
sustained throughout a number of developmental stages by regulatory interactions
between the elements of the network, which constitute not only cellular interactions but
also intercellular interactions. As mentioned in the previous section, analysis of this

network by nonlinear differential equation models suggested that the steady-states of this

42

network are not affected by choices in the kinetic parameters and determined mainly by
the type of regulatory interactions within elements of the network and topology of the
network. Albert & Othmer (2003) constructed a Boolean network model which
recapitulates the main conclusions of von Dassow et al (2000) and accurately predicts the
dynamics of this network. Their model is based on binary ON (1) and OFF (0)
representations of mRNA and protein levels in the network and the interactions between
elements of the network are deﬁned by logical functions. The Boolean network is
updated every unit time step to create the next state of the network. They also changed
their one step model to a two-step model, where they assumed that proteins degrade in
two steps, but mRNAs degrade in one step. However this alteration in the network did
not change the main conclusions of the model.

The model’s performance is measured by its prediction of spatial and temporal
gene expression levels of the network, which are present (1) or absent (0), rather than
absolute continuous levels of mRNAs and proteins. In their model the following
assumptions were made: the effect of transcriptional activators and inhibitors is never
additive, but rather, inhibitors are dominant, transcription and translation are ON/OF F
functions of the state, if transcription/translation is ON, mRNAs/proteins are synthesized
in one time step, mRNAs decay in one time step if not transcribed, transcription factors
and proteins decay in one time step if their mRNA is not present. In their modeling they
take into account only 12 cells i.e. 3 parasegment primordia and impose periodic
boundary conditions. 4 cell per parasegment primordium are used since when expression
of segment polarity genes begins, a given gene is expressed every four cells. They did not

add the nodes such as F2 and smo to the modeling since these are not regulated by other

43

nodes in the network. With all the elements they have, the total number of nodes becomes
180. They add more nodes which are composites of two or more, to make it easier to
implement. After this expansion, the number of nodes increases from 180 to 444, but this
simpliﬁes the network topology representation. They have a different functional topology
of the network for each cell of parasegment since some proteins don't exist in that
particular cell or on the neighboring cells. This boolean model is calculated again and
again for different time points and different places. They used the patterns of segment
polarity genes formed before stage 8 as initial states and the ﬁnal stable state is wild type
patterns maintained during stages 9-11. The modeling is done on only one parasegment,
for which we can ﬁnd all possible steady states of this network. They iterate the
dynamical system deﬁned by their Boolean model starting from the initial state described
above. They found that after only six time steps, the expression pattern stabilizes in a
time invariant spatial pattern.

They found 10 solutions of their model analytically which lead to six distinct
steady states. Three of those steady states are well known experimentally, corresponding
to wild type pattern and two mutant patterns with either no stripes or broadened stripes.
The existence of three additional states suggests that the network can produce some
patterns which are not needed for Drosophila melanogaster embryogenesis also. They
also identiﬁed the basin of attraction for each steady state by searching in the space of
potential initial conditions and observed that the network could correct itself for the
errors in the initial expression patterns. This property of error correcting is a signiﬁcant

robustness property of the segment polarity network.

44

Their model gives several insights into the design of the segment polarity
network. First of all, it suggests that the Wingless gene is a key element in the network,
and its initiation in the right pattern at the right time is vital. On the other hand, non-
initiation of engrailed and hedgehog could be rescued by the interactions in the segment
polarity network. In their study in order to reduce the number of assumptions taken they
assumed that the interactions in the network follow a single time step. They extend this
by using the assumption that decay of proteins is slower (twice of mRNA decay time).
They make this change to the network to make it more realistic. The number of
intermediate steps decrease for the two step model. This assumption provides a more
realistic modeling of the decay of proteins without changing the conclusions of the
model. The model of Albert & Othmer could be improved by taking more realistic
assumptions such as taking different time intervals for the decay for mRNAs and proteins
or considering a two dimensional array of cells instead of one.

As mentioned above the gene networks of embryonic segmentation in Drosophila
have been modeled by differential equation models or Boolean models (von Dassow et
al, 2000; Sanchez & Thieffry, 2001; Albert & Othmer 2003; Jaeger et al, 2004). The ﬁrst
modeling study which focused on the segment polarity gene network was done by von
Dassow et al (2000) as described in the differential equation modeling section, where
they developed a nonlinear differential equation model of the network. Their study
focused on ﬁve genes ci, en, hh, ptc and wg and their proteins. Their choice of network
topology failed to reproduce the wild type expression patterns of these genes, which they
extended by adding two more interactions. This extension resulted in a robust network

with respect to variations in the kinetic variables. The robustness of this network to

45

changes in variables, suggested that the t0pology of the network and the regulatory
interactions within the network are essential for robustness. Based on this observation,
Albert & Othmer (2003) used a Boolean model to reproduce the main characteristics of
the segmentation polarity network. In their study they showed that the network topology
and signatures of interactions in the network, whether an interaction is activating or
inhibiting, is enough to reproduce the essential features of the segment polarity network
dynamics. We should note here that although, Albert & Othmer (2003) modeled the
segment polarity network with a simple Boolean model, other networks might require
more comprehensive models which incorporate features such as asynchronous updating.

Although both studies come to the same conclusion, the networks they employed
are slightly different. The difference between their choices of network topology is due to
two opposing observations on en inhibition; Aza-Blanc et a1 (1997) suggests that en
inhibition is due to CIR and Cadigan et al (1994) suggests that this inhibition might be
due to transcription factors encoded by sloppy paired gene. Hence, in Albert & Othmer
(2003) they added sloppy pair to get the asymmetrical en activation instead of taking en
inhibition by CIR. On the other hand, the level of importance given to inhibition by these
studies is different. In Von Dassow et a1 (2000) inhibitory effects only reduce the level of
activation; however in Albert & Othmer (2003) inhibitory effects are dominant. This
difference resulted in a large number of patterns with very broad en and wg stripes for
even wild type initial gene expression patterns in von Dassow et a1 (2000).

Sanchez & Thieffry (2001) used a Boolean approach, similar to their earlier study
on the dorso-ventral patterning (Sanchez er al, 1997) to model the gap gene network in

Drosophila melanogaster. Up until now the experimental studies, especially mutation

46

analysis led the construction of a qualitative network, which determines the pattern
formation in Drosophila. This study extends these analyses with an addition to the
understanding of their dynamics and attractor states. Their model was able to simulate the
wild type, single and multiple mutant qualitative gap gene expression patterns and
describe the ways a gap gene regulatory network function to generate different patterns in
response to maternal factors; Bcd, Hb and Cad. Their modeling also sheds light on the
least number of functional levels associated with the maternal (Bch, Cad and Hb) and gap
genes (Gt, Hb, Kr and Kni), the most crucial interactions and regulatory circuits of the
gap gene network, the ordering of different regulatory interactions governed by each of
these products according to corresponding concentration scales and the importance of
gap-gap cross regulation in this network. For example, although cross-inhibition between
gap genes was suggested as a critical mechanism for creating gap gene expression
patterns, their network analysis suggested that cross inhibitory interactions between gt
and Kr constitutes a positive circuit which is critical for the whole gap gene network, but
not others (Rivera-Pomar & Jackle, 1996).

Their Boolean model takes the sum of the regulatory inputs to a target and
transforms it into logical parameters. How this transformation is done is a critical part of
the modeling. To select the speciﬁc values for parameters of the model, which takes care
of the transformation, they dynamically analyzed the gap gene system and accepted the
lowest parametric values that can generate the expression states compatible with known
wild type and mutant phenotypes. These parameters lead to the ﬁnding of the most
important cross regulatory interactions between the elements of the gap gene network. In

their Boolean modeling they also divided the embryo to four domains along the anterior-

47

posterior axis, depending on the concentration levels of maternal morphogens (Bcd, Cad
and Hb) and assigned the logical variables (maternal and gap genes) to different
functional threshold levels. They assigned three functional thresholds for Bicoid (Bed)
and Hunchback (Hb), two for Caudal (Cad) and Kruppel (Kr), and one for Giant (Gt) and
Knirps (Kni). In addition to that, if necessary, they ordered different functional
interactions where distinct functional concentrations of the same regulatory product are
involved. For example they assumed that Cad will activate kni at the ﬁrst threshold and gt
at the second threshold.

As discussed above in modeling dynamic biological processes differential
equation models are used not to lose the ﬁne details of the system. Those models use
details of the system including production, diffusion and degradation rates of regulatory
factors to develop a model that can typically be usually solved in a computationally
expensive way.

However, in most of the biological systems we lack the good quality quantitative
information on the molecular interactions between genes and proteins. Modelers in order
to circumvent this problem use Boolean models instead of differential equation models.
The Boolean model of Sanchez & Thieffry (2001) and the differential equation model of
Jaeger et a1 (2004) on the gap gene network show two sides of the coin here. Although,
Sanchez & Thieffry (2001) and Jaeger et al (2004) accomplish a comparable level of
analysis for gap gene network, there are some key differences worth mentioning, which
we will discuss further below.

A repressive feedback loop between kni and hb is reported in Jaeger et al

(2004) as essential for the gap gene network, but not in Sanchez & Thieffry (2001)

48

due to the fact that they did not take into consideration t1] and the posterior hb in
their analysis. On the other hand Sanchez & Thieffry (2001) suggest a dual role for
Hb in Kr regulation but Jaeger et al (2004) argues that the dual role of Hb is not
required for the proper expression of Kr. However this might be due to the fact that,
Jaeger et al (2004) takes the sum of the contributions from all the regulatory
proteins which possibly excludes all potential context sensitive interactions. Jaeger
et al (2004) suggests autoactivation as a critical component for sharpening gap
domain boundaries, however Sanchez & Thieffry (2001) could not ﬁnd it.

The differences in their results might also be due to the minimalist approach
of the Boolean modeling. In logical analysis functional thresholds are assigned for
continuous protein concentrations, which results in four discrete regions of
functional borders for the gene expression domain along the anterior-posterior axis,
however these borders are crude and perhaps do not match well with real
expression borders. For this reason modeling boundary sharpening in detail similar
to Jaeger et al (2004) is not possible for Sanchez & Thieffry (2001).

Another difference between these two approaches is the fact that the
approach of Jaeger et al (2004) is computationally much more expensive than
Sanchez & Thieffry (2001) despite the fact that it is applied for modeling a one
dimensional strip of nuclei. On the other hand the success of Jaeger et al (2004)'s
parameter estimation technique, which is the core part of their study, depends
heavily on the very high quality of the data on gene expression levels. However this

is not possible at this time.

49

Another application of a Boolean model is given by Yuh et' al (2001), where the
approach is used to understand the transcriptional regulation on the DNA level. In this
paper they analyzed the end016 gene of sea urchin in detail, which encodes a protein of
the embryonic and larval midgut. This enhancer is a relatively well studied i.e. regulatory
elements which control this gene spatially and temporally are known. The regulatory
organization of the endol6 gene has been created by many experimental studies which
launched the regulatory regions, their functions and interrelations between each other.
The end016 gene has a complex regulatory enhancer region, which helps it to react to
diverse spatial and temporal conditions in development. There are not many
developmentally regulated genes that have been analyzed in as much detail as end016.
Because of this detailed knowledge, the endol6 gene is usually used as an example to
show how developmental enhancers process regulatory information.

A small module (module A) in the regulatory region of the end016 gene has been
modeled by logic gates such as AND, OR, NOT to state the interactions between
regulatory regions (Yuh et al, 1998). Recently they extended their earlier study by
analysis of a nearby module (module B) and its interactions with module A. In this
enhancer, module A functions like a central processing unit, regulating the expression in
the early embryos and contributing to the expression in late embryos with module B.
There is a relatively large gap(~240 bp) between module A and module B. Either piece of
DNA, i.e., Module B or Module A, if associated with a reporter gene (they used Bp-
CAT) and injected into eggs, independently displays a speciﬁc and characteristic

transcriptional activity.

50

In their study they used mutational analysis to set the logical model. However
using mutation data usually results in a recapitulation of what is known rather than new
ﬁndings. They used quantitative measurements of the kinetic outputs of various
embryonic expression constructs that had been introduced into fertilized eggs. In this way
they could recognize regulatory functions. In their Boolean model they incorporated the
repressive contributions from other modules (DC, E, and F) to module A, a module B to
module A regulatory connection (nine different sequence speciﬁc transcription factors
interact within Modules B and A), synergistic contributions of module A to module B
and a control switch between module A and module B. Using their Boolean model they
predicted that there is an internal switch in the endol6 enhancer region gene which
moves the control from module A to module B. They conﬁrmed their prediction
experimentally and found the key players on module B which control this switch. In this
enhancer, module A starts the activity of the gene at earlier time points in development
(vegetal plate speciﬁcation), however, once the gut differentiation starts module B takes
control and becomes the primary operating unit. Module B generates and transmits its
regulatory input to module A, and interacts with module A to amplify the expression of
endol6. In Yuh et a1 (2001) they showed by a quantitative kinetic experiment that the
expression of a construct including only module B was at all time points four to ﬁve
times lower than a construct with module A and B together. Their model now could
explain the control of expression changes in the endol6 gene throughout embryogenesis.
Their model not only allowed them to summarize the interactions but also provided many

testable predictions and predicted variations in the regulatory elements.

51

One shortcoming of Boolean models is their inherent deterministic behavior
which may result in poor predictive power due to stochasticity in gene regulation and
noise in the experimental measurements. The noise in the biological data makes
application of Boolean models without probabilistic considerations impossible. One
possible approach to solve this problem is adding stochastic functions to parts of the
model where necessary.

Lack of knowledge in the network architecture is another problem for the
use of Boolean models. Although, Boolean models could be used for investigative
purposes, for biological systems with insufﬁcient knowledge in system parameters,
their success might be heavily affected by network architecture. It has been
previously reported by the knock out experiments in the yeast network of Li et a]
(2004) that a single change in the network changes the dynamical trajectory with a
50% probability.

Sanchez & Thieffry (2001) and Yuh et al (2001) used analysis of mutation
experiments to construct their models. However, this approach is simple, crude and
likely to replicate the explanation of the data they are based on instead of suggesting
new ways to explain it. Possibly if a new interaction is suggested by experiments in
the system it will change the results of those types of studies. 0n the other hand the
models, not based on assumptions coming from experimental data such as Jaeger et
al (2004), have the chance to suggest new ways to explain the data.

The simplicity of Boolean models comes with a price; the accuracy in the system
will be lost. For example, a Boolean variable has only two possible states, which is not a

good simpliﬁcation for many biological properties. If the system depends crucially on the

52

ﬁne details of the system such as reaction rates, timing of regulatory relations and
concentrations of mRNAs and proteins then these models will not do a satisfactory job at
describing the system. For example if a gene is negatively regulating its own production,
in Boolean modeling this would give a oscillatory behavior, but in reality such a process

will lead to steady state unless there is a signiﬁcant time delay.

PARAMETER ESTIMATION

Most biological modeling problems reduce to an inverse problem, where the
certain parameters in the model should be estimated. However, inverse problems are not
easy to solve, it is usually a challenge to ﬁnd a unique parameter set which satisﬁes all
the constraints of the system. On the other hand they are often computationally very
expensive. Nevertheless, there is no universal parameter estimation technique which
works well for all inverse problems; rather there are different parameter estimation
methods that are customized for different problem types. The choice of a suitable
parameter estimation technique and its validation is extremely important since this choice
determines whether the problem is solved fast or slow, or even solved at all. However,
this problem has not been treated in the biological modeling in particular transcriptional
modeling literature explicitly (Janssens et al, 2006; Zinzen et al, 2006; Segal et al, 2008).

To estimate the parameters, which ﬁt the model to experimental data, we must
ﬁrst identify an objective function. Objective function depends on variables of the model
and gives a quantitative measure of the performance of the model. Oﬁen sum of squares
of the residuals between the model’s prediction and experimental data is employed as the
objective function. The goal in parameter estimation is to ﬁnd values of the parameters

that optimize the objective. Often the parameters are restricted, or constrained, in some

53

way, usually expressed as equalities and inequalities. Parameter estimation algorithms
start with an initial guess and iteratively generate new estimates until they stop,
optimistically at a solution. How the iteration process works differentiate different
parameter estimation techniques and usually depends on the objective ﬁrnction and
constraints of the parameters. In biological models objective functions and the constraints
are often nonlinear, which might imply multimodality i.e. the objective function might
have many local and global optimums. The modelers are usually interested in ﬁnding the
global optimum solution among the set of all possible solutions, however this is not an
easy task.

Parameter ﬁtting approaches could be divided into two main groups as local and
global parameter estimation techniques. Local estimation techniques include the
conjugate gradient method, Newton’s method, simplex methods, and the Nelder-Mead
method. Within all local techniques two groups are particularly important; gradient based
approaches such as Newton’s method and the Levenberg-Marquardt method and direct
search methods such as the Nelder-Mead method. Gradient based approaches require the
calculation of the objective function’s derivative or at least its approximation by a ﬁnite
difference method. On the other hand direct search methods circumvent the need for
gradient calculation, which makes them the choice for problems with discontinuous,
nondifferentiable or nonlinear objective functions. This method is based on a comparison
of the objective function at the vertices of a simplex, which is updated at each step.

Use of local parameter estimation techniques for searching global optimums of
biological models is troublesome; the search usually ends up in a local optimum rather

than global (Mendes & Kell, 1998). When prior knowledge about the parameters is

54

available local techniques could be used, however for many biological problems this is
not the case. To overcome this problem, local parameter estimation techniques might be
used repeatedly with different initial starting points for parameters, however this
approach is not very efﬁcient. Mendes (2001) noted that gradient methods could not ﬁnd
the optimum parameter set from any random starting point for estimating 36 parameters
of a nonlinear biochemical dynamic model.

Global parameter estimation techniques include deterministic strategies such as
the branch and bound method, and interval optimization, and stochastic strategies such as
genetic algorithms, the simulated annealing and, evolutionary strategies. Deterministic
methods provide some guarantee of ﬁnding the global optimum, however ﬁnding the
global optimum is computationally very expensive. On the other hand stochastic models
are probabilistic approaches and give merely a weak guarantee on ﬁnding the global
optimum. However they can reach the vicinity of the real solution in a reasonable amount
of time.

Global parameter estimation techniques, especially stochastic strategies, are better
than their local counterparts in ﬁnding the global optimum of the system and have been
shown to be more suitable for biological systems (Mendes & Kell, 1998; Mendes 2001,
Moles et al, 2003). In a recent study, Banga and colleagues compared a balanced
selection of competitive parameter estimation algorithms, including several deterministic
and stochastic global parameter estimation techniques, for their efﬁciency and reliability.
They observed that the stochastic approach evolutionary strategy functioned best on their
continuous problem with a high number of unknowns (Moles er al, 2003). Slow

convergence is a problem for global methods, however many global methods have

55

parallel versions, which shorten the time needed to ﬁnish the job. In a recent study,
Mendes & Kell (1998) considered the parameter estimation in the mechanism of
irreversible inhibition of HIV proteinase, which has 20 parameters to estimate. They
suggested that within all the methods they tried the simulated annealing algorithm
functioned the best; however the solution obtained by the Levenberg-Marquardt was
comparable and the algorithm was 750 fold faster. To overcome this problem, hybrid
models, which combine global and local techniques, have been used frequently (Gursky
et al, 2004).

In several recent studies, we have realized once more the necessity of a study
which compares the performance of parameter estimation techniques on gene regulation
models. In Reinitz & Sharp (1995), the lam simulated annealing method is used where
each parameter estimation took approximately 1 week of CPU time on a Sparc 2 and in
Jaeger et al (2004a, 2004b), a parallel lam simulated annealing method is used where
each parameter estimation took between 8-160 hours on ten 2.4GHz Pentium P4 Xeon
processors (Lam & Delosme, 1988). The same study is repeated with the same
conclusions by more efﬁcient algorithms. Perkins et al (2006) used local search
procedures and particular characteristics of the gap gene system to estimate the
parameters in 1-2 days. Fomekong-Nanfack et al (2007) used the evolution strategy
approach (motivated by the earlier studies Moles et al (2003) and Runarsson & Yao
(2000)) and reduced the time needed 5-140 times compared to the parallel lam simulated
annealing method.

Parameter estimation techniques have been used in modeling extensively as

mentioned above, however there are not many studies in the literature which compare

56

parameter estimation techniques in the ﬁeld of modeling, particularly gene regulation
modeling. Mendes & Kell (1998) discussed the performance of the parameter estimation
techniques for ﬁnding global optimum in biochemical models and concluded that there is
no best algorithm which works well for all problems. In their study they recommended
the use of a set of diverse parameter estimation methods to attain the best possible
solution.

Modeling approaches will get more complicated soon to incorporate the new
genome scale data. To estimate the parameters of these models we need algorithms that
are robust (starting point independent), efﬁcient (not require much computational power
and storage) and accurate (not sensitive to errors in the data). Although we cannot attain

all of these goals at the same time, it is our job to pick the best for our needs.

MODEL SELECTION

Quantitative analysis of biological systems relies on the iterative process of
integration of experimentation, data processing and modeling. New experiments and
model development continues until an agreement is reached between the experimental
data and model predictions. Modelers usually require plenty of good quality data for this
iterative process; however the data collection from biological systems is usually limited,
which makes the modeling extremely challenging. For example, in biological systems
data is usually collected at a certain temporal state, i.e. it is only a snapshot of a dynamic
system. Although there are some biological systems which are relatively better known,
such as anterior-posterior patterning in Drosophila, even the current information in these

systems is not sufﬁcient to detail all the regulatory relations.

57

The models which are restricted by the limited amount of data could generally
reproduce the experimental data, but not provide any new insights. Currently, there are
widespread studies that generate genome, transcriptome and proteome data, which will
end the data limitation problem. The new challenge would be how to merge these
different “omic” data types. Recently several studies showed the power of integrating
insights from different data sets such as transcriptome and genome sequence (Tavazoie et
al, 1999; Segal et al, 2004). For example Tavazoie et a1 1999 used gene expression data
from microarrays to cluster co-expressed genes and searched upstream regions of these
genes to identify common regulatory motifs. They applied this technique for identifying
novel regulatory networks in S. cerevisiae and were able to identify 18 motifs in the
upstream sequences of genes in 12 clusters that are overrepresented in their cluster and
absent in the others. However, such a study is difﬁcult to extend to many biological
systems for now.

The noisy nature of biological data creates another challenge for modelers and
affects the success of the modeling efforts dramatically. Quantitative biological data is
usually noisy due to both inherent noisiness of the biological systems and the imprecision
of the data collection methods. Although the noise can be reduced by repetition of the
experiments and application of carefully chosen data processing steps, which estimate
and remove the noise from the data, it is not possible to eliminate the noise from the
biological data entirely. However, despite the noise, modeling such biological systems
might still provide some critical insights which might not be obtained without modeling.

Understanding the nature of noise in biological data sets is critical for the

selection of appropriate analysis instruments and modeling. The model chosen should be

58

able to deal with noisy data. As mentioned above some modeling problems reduce to an
inverse problem i.e. identifying parameters of the model which ﬁts the model to
experimental data. However, the noisy data makes this inverse problem ill-posed where
the solution lacks stability i.e. a few percent of noise can possibly lead to huge relative
errors in the parameters and unreliable solutions. This problem might be overcome partly
by approaches such as the Tikhonov regularization method.

A good model is consistent with the available data, reﬂects essential properties of
the system, helps answer speciﬁc questions about the system such as computing
properties difﬁcult to measure, and guides the researcher to design new experiments.
Before we start modeling we should have a good grasp of the system, we should know
properties of the system such as major molecular components, their interactions, layers of
interactions, and the physical geometry of the system. Modeling is a simpliﬁcation of
reality, and for this reason when we are modeling we should make a choice about the
level of detail we want in the model. A highly detailed model that may give a more
accurate representation of the system is not necessarily a good thing; the increase in
number of parameters boosts the possibility of overﬁtting to the data. After the level of
detail is decided, depending on the data available and questions we are interested in, we
can choose a model type. A biological system can be investigated with different
experimental methods and mathematical models. Although standardizing experimental
conditions and modeling approaches are necessary for integration of efforts of different
groups and comparability of results, it is hard to achieve. On the other hand, use of
diverse modeling approaches will promote creativity and possibly decode different

aspects of the problem. The choice of the model is largely dictated by the data available

59

and questions that are addressed. The model should be appropriate for the problem we are
solving. All of the models mentioned above have both advantages and disadvantages. For
example, a transcriptional network can be modeled by a discrete approach dynamic
Boolean model, where the time is discrete, each node of the network (mRNA, protein or
other component) has few states and the regulatory interactions between nodes are
described by logical functions (Sanchez & Thieffry, 2001; Yuh et al, 2001). A
transcription network can also be modeled by a continuous approach where each node of
the network are continuous functions of time and the evolution of the nodes are modeled
by differential equations, usually with mass action kinetics (von Dassow et al, 2000;
Jaeger et al, 2004). Although differential equation models provide more detail than
Boolean models the latter are simpler to handle, easier to expand to larger biological
systems and are computationally efﬁcient. However, Boolean models are time discrete
and can only provide qualitative results. On the other hand, both of these approaches take
the DNA level regulation as a black box and are not as useful as thermodynamic models
for explaining the enhancer architecture.

The era of systems biology has brought together people from different disciplines
for collaborative studies. For this reason, models proposed for biological problems should
be easy to understand by researchers from other disciplines for its diverse applications. A
model that is hard to code and implement may not be so useful since it is not easy for a
researcher without a quantitative background to imitate or use those models and codes.
However, a good model should not be computationally challenging and expensive.

Although, the improvements in the computing make the latter problem less signiﬁcant, it

60

is still a challenge. Although, these computational considerations don’t make a model

right or wrong; but rather affect its applicability and extendibility.

61

REFERENCES

Ackers GK, Johnson AD, Shea MA (1982) Quantitative model for gene regulation by it
phage repressor. Proc Natl Acad Sci USA 79(4): 1129-1133

Albert R, Othmer HG (2003) The topology of the regulatory interactions predicts the
expression pattern of the segment polarity genes in Drosophila melanogaster. J Theor
Biol 223(1): 1-18

Aza-Blanc P, Ramirez-Weber FA, Laget MP, Schwartz C, Kornberg TB (1997)
Proteolysis that is inhibited by hedgehog targets Cubitus interruptus protein to the
nucleus and converts it to a repressor. Cell 89(7): 1043-1053

Berg OG, von Hippel PH (1987) Selection of DNA binding sites by regulatory proteins.
Statistical-mechanical theory and application to operators and promoters. J Mol Biol 193:
723-750

Bintu L, Buchler NE, Garcia HG, Gerland U, Hwa T, Kondev J, Kuhlman T, Phillips R
(2005b) Transcriptional regulation by the numbers: applications. Curr Opin Genet Dev
15(2): 125-135

Bintu L, Buchler NE, Garcia HG, Gerland U, Hwa T, Kondev J, Phillips R (20053)
Transcriptional regulation by the numbers: models. Curr Opin Genet Dev 15(2): 116-124

Buchler NE, Gerland U, Hwa T (2003) On schemes of combinatorial transcription logic.
Proc Natl Acad Sci USA 100(9): 5136-5141

Cadigan KM, Grossniklaus U, Gehring WJ (1994) Localized expression of sloppy paired
protein maintains the polarity of Drosophila parasegments. Genes Dev 8: 899—913

Coppey M, Berezhkovskii AM, Kim Y, Boettiger AN, Shvartsman SY (2007) Modeling
the bicoid gradient: diffusion and reversible nuclear trapping of a stable protein. Dev Biol
312(2): 623-630

Crocker J, Tamori Y, Erives A (2008) Evolution acts on enhancer organization to ﬁne-
tune gradient threshold readouts. PLoS Biol 6: e263

de Jong H (2002) Modeling and simulation of genetic regulatory systems: a literature
review. J Comput Biol 9(1): 67-103

Eldar A, Dorfman R, Weiss D, Ashe H, Shilo BZ, Barkai N (2002) Robustness of the

BMP morphogen gradient in Drosophila embryonic patterning. Nature 419(6904): 304-
308

62

Fakhouri WD, Ay A, Sayal R, Dresch J, Dayringer E, Chiu C, Amosti DN Deciphering a
transcriptional grammar: modeling short-range repression in the Drosophila embryo (in
preparation).

Fomekong-Nanfack Y, Kaandorp JA, Blom J (2007) Efﬁcient parameter estimation for
spatio-temporal models of pattern formation: case study of Drosophila melanogaster.
Bioinformatics 23(24): 3356- 3363

Friedman N, Linial M, Nachman I, Pe’er D (2000) Using Bayesian networks to analyze
expression data J Comput Biol 7: 601-620

Gertz J, Cohen BA (2009) Environment-speciﬁc combinatorial cis-regulation in synthetic
promoters. Mol Syst Biol 5: 244

Gillespie DT (1977) Exact stochastic simulation of coupled chemical reactions. J Phys
Chem 81: 2340—2361

Goodwin B (1965) Oscillatory behaviour in enzymatic control process. Adv Enz Regul 3:
425-438

Granek JA, Clarke ND (2005) Explicit equilibrium modeling of transcription-factor
binding and gene regulation. Genome Biol 6(10): R87

Gregor T, Bialek W, de Ruyter van Steveninck RR, Tank DW, Wieschaus EF (2005)
Diffusion and scaling during early embryonic pattern formation. Proc Natl Acad Sci USA
102(51): 18403-18407

Gregor T, Wieschaus EF, McGregor AP, Bialek W, Tank DW (2007) Stability and
nuclear dynamics of the bicoid morphogen gradient. Cell 130(1): 141-152

Grifﬁth J S (1968) Mathematics of cellular control processes. 11. Positive feedback to one
gene. J Theor Biol 20: 209-216

Gursky V.V., Jaeger J., Kozlov K.N., Reinitz J., Samsonov AM. (2004). Pattern
formation and nuclear divisions are uncoupled in Drosophila segmentation: Comparison
of spatially discrete and continuous models. Physica D, 193: 286-302.

Ip YT, Park RE, Kosman D, Bier E, Levine M (1992) The dorsal gradient morphogen

regulates stripes of rhomboid expression in the presumptive neuroectoderrn of the
Drosophila embryo. Genes Dev 6: 1728-1739

Jaeger J, Blagov M, Kosman D, Kozlov KN, Manu, Myasnikova E, Surkova S, Vanario-
Alonso CE, Samsonova M, Sharp DH, Reinitz J (2004) Dynamical analysis of regulatory
interactions in the gap gene system of Drosophila melanogaster. Genetics 167(4): 1721-
I737

63

 

Jaeger J, Surkova S, Blagov M, Janssens H, Kosman D, Kozlov KN, Manu, Myasnikova
E, Vanario-Alonso CE, Samsonova M, Sharp DH, Reinitz J (2004) Dynamic control of
positional information in the early Drosophila blastoderm. Nature 430(6997): 368-371

Janssens H, Hou S, Jaeger J, Kim AR, Myasnikova E, Sharp D, Reinitz J (2006)
Quantitative and predictive model of transcriptional control of the Drosophila
melanogaster even-skipped gene. Nat Genet 38(10): 1159-1165

Keranen SV, Fowlkes CC, Luengo Hendriks CL, Sudar D, Knowles DW, Malik J, Biggin
MD (2006) Three-dimensional morphology and gene expression in the Drosophila
blastoderm at cellular resolution 11: dynamics. Genome Biol 7(12): R124

Lam J, Delosme J (1988) Performance of a New Annealing Schedule," DAC, 306 -311

Li F, Long T, Lu Y, Ouyang Q, Tang C (2004) The yeast cell-cycle network is robustly
designed. Proc Natl Acad Sci USA 101: 4781-4786

Ludwig MZ, Kreitman M (1995) Evolutionary dynamics of the enhancer region of even-
skipped in Drosophila. Mol Biol Evol 12: 1002-1011

Luengo Hendriks CL, Keranen SVE, F owlkes CC, Simirenko L, Weber GH, DePace AH,
Henriquez C, Kaszuba DW, Hamann B, Eisen MB, Malik J, Sudar D, Biggin MD, and
Knowles DW (2006) Three-dimensional morphology and gene expression in the
Drosophila blastoderm at cellular resolution 1: data acquisition pipeline. Genome Biol 7:
R123

Mackey MC, Santillan M, Yildirim N (2004) Modeling operon dynamics: the tryptophan
and lactose operons as paradigms. C R Biol 327(3): 211-224

Mendes P, Kell D (1998) Non-linear optimization of biochemical pathways: applications
to metabolic engineering and parameter estimation. Bioinformatics 14(10): 869-883

Mendes, P. and Kell, DB. (2001) MEG (Model Extender for Gepasi): a program for the
modelling of complex, heterogeneous, cellular systems. Bioinformatics, 17, 288-289.

Moles CG, Mendes P, Banga JR (2003) Parameter estimation in biochemical pathways: a
comparison of global optimization methods. Genome Res 13(11): 2467-2474

Nicolis G, Prigogine I (1977) Self-organization in nonequilibrium systems. From
dissipative structures to order through ﬂuctuations. Wiley, New York, USA

Perkins TJ, Jaeger J, Reinitz J, Glass L (2006) Reverse engineering the gap gene network
of rosophila melanogaster. PLoS Comput Biol 2(5): e51

Press WH, Teukolsky SA, Vetterling WT, F lannery BP (1992) Numerical Recipes in C.
Cambridge University Press, Cambridge, UK

64

Reinitz J, Sharp DH (1995) Mechanism of eve stripe formation. Mech Dev 49(1-2): 133-
58

Rivera-Pomar R, Jackle H (1996) From gradients to stripes in Drosophila embryogenesis:
ﬁlling in the Gaps. Trends Genet 12: 478-483

Runarsson TP, Yao X (2000) Stochastic Ranking for Constrained Evolutionary
Optimization IEEE Transactions on Evolutionary Computation 4(3): 274-283

Sénchez L, Thieffry D (2001) A logical analysis of the Drosophila gap-gene system. J
Theor Biol 211(2): 115-141

Sanchez L, van Helden J, Thiefﬁ'y D (1997) Establishement of the dorso-ventral pattern
during embryonic development of drosophila melanogasater: a logical analysis J Theor
Biol 189(4): 377-389

Santillan M, Mackey MC (2001) Dynamic behavior in mathematical models of the
tryptophan operon. Chaos 11(1): 261-268

Segal E, Raveh-Sadka T, Schroeder M, Unnerstall U, Gaul U (2008) Predicting
expression patterns from regulatory sequence in Drosophila segmentation. Nature
451(7178): 535-540

Segal E, Shapira M, Regev A, Pe'er D, Botstein D, Koller D, Friedman N (2003) Module
networks: identifying regulatory modules and their condition-speciﬁc regulators from
gene expression data. Nat Genet 34(2): 166-176

Shea MA, Ackers GK (1985) The OR control system of bacteriophage lambda. A
physical-chemical model for gene regulation. J Mol Biol 181(2): 211—230

Small S, Amosti DN, Levine M (1993) Spacing ensures autonomous expression of
different stripe enhancers in the even-skipped promoter. Development 119(3): 762-772

Szymanski P, Levine M (1995) Multiple modes of dorsal-bHLH transcriptional synergy
in the Drosophila embryo.EMBO J 14(10): 2229-2238.

Tavazoie S, Hughes JD, Campbell MJ, Cho RJ, Church GM (1999) Systematic
determination of genetic network architecture. Nat Genet 22(3): 281-285

Turing AM (1952) The chemical basis of morphogenesis. Philos Trans R Soc Lond B
237: 37-72

Vilar JM, Leibler S (2003) DNA looping and physical constraints on transcription
regulation. J Mol Biol 331(5): 981-989

65

von Dassow G, Meir E, Munro EM, Odell GM (2000) The segment polarity network is a
robust developmental module. Nature 406(6792): 188-192

Von Hippel PH, Revzin A, Gross CA, Wang AC (1974) Non-speciﬁc DNA binding of
genome regulating proteins as a biological control mechanism: 1. The lac operon:
equilibrium aspects. Proc Natl Acad Sci USA 71(12): 4808-4812

Yildirim N, Mackey MC (2003) Feedback regulation in the lactose operon: a

mathematical modeling study and comparison with experimental data Biophys J 84:
2841-2851

Yuh CH, Bolouri H, Davidson EH (1998) Genomic cis-regulatory logic: experimental
and computational analysis of a sea urchin gene. Science 279(5358): 1896-1902

Yuh CH, Bolouri H, Davidson EH (2001) Cis-regulatory logic in the endol6 gene:
switching from a speciﬁcation to a differentiation mode of control. Development 128(5):
617-629

Zinzen RP, Senger K, Levine M, Papatsenko D (2006) Computational models for
neurogenic gene expression in the Drosophila embryo. Curr Biol 16(13): 1358-1365

66

Chapter 11

Image Processing and Analysis for Quantifying Gene Expression from Early

Drosophila Embryos

Ahmet Ay, Walid D. Fakhouri, Chichia Chiu and David N. Amosti, (2008). Image

Processing and Analysis for Quantifying Gene Expression from Early Drosophila

Embryos. Tissue Engineering Part A. 14(9): 1517-1526.

67

ABSTRACT

Correlation of quantities of transcriptional activators and repressors with the
mRNA output of target genes is a central issue for modeling gene regulation. In
multicellular organisms, both spatial and temporal differences in gene expression must be
taken into account; this can be achieved by use of in situ hybridization followed by
Confocal Laser Scanning Microscopy (CLSM). Here we present a method to correlate the
protein levels of the short-range repressor Giant with lacZ mRNA produced by reporter
genes using images of Drosophila blastoderm embryos taken by CLSM. The image
stacks from CLSM are processed using a semi-automatic algorithm to produce
correlations between the repressor levels and lacZ mRNA reporter genes. We show that
signals derived from CLSM are proportional to actual mRNA levels. Our analysis reveals
that a suggested parabolic form of the background ﬂuorescence in confocal images of
early Drosophila embryos is evident most prominently in ﬂattened specimens, with intact
embryos exhibiting a more linear background. The data extraction described in this paper
is primarily conceived for analysis of synthetic reporter genes that are designed to
decipher cis-regulatory grammar, but the techniques are generalizable for quantitative

analysis of other engineered or endogenous genes in embryos.

INTRODUCTION

Complex patterns of gene expression underlie the development of multicellular
organisms, and capturing the spatial and temporal information of gene expression is
critical for modeling gene regulatory networks. Microarrays or qPCR reactions can

provide quantitative information, but these methods usually lack spatial information. On

‘68

the other hand, ﬂuorescent in situ hybridizations provide spatial information about gene
expression and have the possibility of providing quantitative information as well. Recent
studies have relied on such in situ techniques to quantitate the levels of nuclear proteins
in the Drosophila embryo for the purposes of modeling."2

Because of its extensively researched genetic network, the Drosophila embryo
provides one of the best characterized systems for modeling transcriptional regulation.
Transcription factors encoded by the maternal, gap and pair rule genes form a regulatory
network, whose interactions have been carefully described in molecular studies.3 A
comprehensive mathematical description of this system still eludes us, however, in recent
years several studies have modeled parts of this system.l’2’4’5 In most cases, confocal
images of early Drosophila embryos were used to provide data on levels of transcription
factors in the nucleus. One study reported a straightforward approach in which levels of
regulatory proteins in the nucleus were related to protein levels of downstream targets,
while other studies have focused on quantitative descriptions of mRNA levels in the
embryo."'8 A later study took a further step in correlating nuclear transcription factor
levels to mRNA levels of a target gene but did not fully explore parameters and methods
required for this analysis.4 Here, we report such a method that correlates the level of
transcription factor to level of reporter gene mRN A, as a basis for mathematical modeling
of gene regulatory elements.

Key to quantitative assessment of gene expression levels is information about the
proportionality of signal read out to the actual levels of mRNA and protein. This
relationship has been insufﬁciently tested; therefore we examine this issue using gene

dosage studies and independent mRNA measurements. Background (in this case,

69

nonspeciﬁc ﬂuorescence) is another central issue in many biological data analyses. A
simple approach is to apply uniform background subtraction from the data. It has been
previously noted that the background for ﬂuorescently stained Drosophila embryos can

9 Our study suggests that ﬂuorescence

be represented as a paraboloid function.
background from undistorted embryos is not very paraboloidal, but as samples are
ﬂattened a paraboloidal background becomes more evident.

This methodology appears to provide the correct basis for quantitative modeling
approaches that utilize empirically established gene regulatory surfaces to facilitate
parameter ﬁtting. Such quantitative approaches will provide the tools to discover

important regulatory information in genomic data sets, as well as lay a foundation for

design of engineered transcriptional elements.

MATERIALS AND METHODS
l. Immunoﬂuorescent in situ hybridization:

Embryos were collected and ﬁxed as previously described.'0 Immunoﬂuorescent
in situ hybridization was done essentially as previously described with some
modiﬁcations.”"5 All washes were done in 1.0 ml. Fixed embryos stored at —20°C in
methanol were brieﬂy washed six times with 100% ethanol, then with xylene for 1.0h.
About 50ul of embryos were transferred into individual microfuge tubes, washed four
times with methanol-phosphate buffer 0.1%-Tween80 ((PBT; 1.37M NaCl, 43 mM
NazHPO4, l4mM NaHzPO4),1:1, v/v ratio) and then with PBT four times, each for 3 min
with continuous rocking. Embryos were washed in PBT with hybridization solution (50%

formamide, 5X SSC (3M NaCl, 0.3M Na-citrate), 100ug/m1 sonicated salmon sperm

70

DNA, 50pg/ml heparin, 0.1% Tween80 (1:1, v/v ratio) for 10 min, and then brieﬂy in
100% hybridization solution for 2 min. New hybridization solution was added and the
tubes were placed for 1h in a water bath at 55 °C. Anti-sense RNA probes of digU labeled
lacZ were heated in 50 pl hybridization solution at 80°C for 3 min, directly placed on ice
for 1 min, then prehybridization solution was completely removed and the probe in 50p]
of hybridization solution was added to each tube, and tubes were incubated at 55°C for
18-20h. After incubation, lml of 55 °C hybridization solution was added to each tube and
all tubes were rocked at room temperature for 1min, hybridization solution was changed
and tubes were incubated for another 1h at 55 °C, followed by four washes with
hybridization solution for 15 min each at 55 °C and with hybridization solution and PBT
(1:1, v/v ratio) two times at room temperature for 15 min. Five more washes were done
with PBT for 10 min with rocking at room temperature. The embryos were washed with
a blocking solution 1.0% casein in maleic acid buffer (Western Blocking Reagent, Roche,
Indianapolis, IN) with PBT (1:1, v/v ratio). 0.5 ml PBT and blocking solution (1:1, v/v)
containing primary antibodies (2.2 pl of 1:250 dilution of mouse anti-DigU (Roche,
Indianapolis, IN) and 0.75 pl of 1:800 dilution of rabbit anti-Giant (antibodies are a gift
from Reinitz Lab”) was added and the tubes were rocked at 4 °C overnight. Tubes were
washed four times each with PBT for 15 min at room temperature. 0.4 ml of PBT and 0.5
% casein blocking reagent (1:1) containing 8.0 pl of secondary antibodies: goat anti-
mouse conjugated to Alexa 555 for detection of lacZ mRNA and chicken anti-rabbit
conjugated to Alexa 488 for detection of the Giant protein (Molecular Probes, Eugene,
OR ) preabsorbed against ﬁxed yw embryos were added to each vial, and the tubes were

covered with aluminum foil to protect them from light and incubated overnight at 4 °C.

71

Embryos were then washed with PBT four times at room temperature for 5 min with
rocking, washed in glycerol + PBT (7:3, vzv ratio) for 2h until the embryos settled to the
bottom of the tubes. The embryos were then resuspended in 0.4 ml glycerol + PBT (9:1
ratio) and 0.2 ml of Permaﬂuor Mountant Medium 434990 (Thermo Electron
Corporation, Pittsburgh, PA), mounted labeled slides and covered with large rectangular
Corning cover slips (No. 1.5, 24x50mm). The slides were protected from light and stored

ﬂat at room temperature until they were imaged.

Sequences for Reporter genes:

The reporter constructs analyzed here are synthetic, Twist and Dorsal activator
binding sites were previously characterized in lacZ reporter genes in the embryo. '3 High-
afﬁnity Giant binding sites derived from the Kruppel promoter were previously
characterized. '4 The 25bp neutral spacer used lacks high afﬁnity sites to blastoderm
transcriptional regulatory proteins. Two reporters are tested here, both containing a core
of two Twist sites 5’ of two Dorsal sites. A pair of Giant motifs is located either 31bp or
l3lbp 5’ of the activators. Binding sites are capitalized; Giant sites are italicized, Twist

sites are bold and Dorsal sites are underlined.

2gt.25.2T2D

5 ’-gaattc TA TGA C GCAA GA atgc gactc g TA T GA CGCAA GA ggatctggttagtaagctgtaaactggatc

 

cCATATGttgagCATATGtctagaGGGAI I I ICCCAaatcgaGGGAAAACCCAAccgcg
g-3’
2gt.125.2T2D

5 ’ -gaattc TA T GA CGCAA GA atgc gactc g TA TGA CGCAA GA g gatctggttagtaagctgtaaactggatct

72

ggttagtaagctgtaaactggatctggttagtaagctgtaaactggatctggttagtaagctgtaaactggatctggttagtaagctg
taaactggatccCATATGttgagCATATGtctagaGGGA I 1'] I CCCAaatcgaGGGAAAACCC

AAccgcgg -3’

2. Confocal Laser Scanning Microscopy (CLSM):

An inverted CLSM (Olympus, Flowview FVlOOO) was used for capturing the
confocal ﬂuorescent images. For each scan of mounted embryos, the same microscope
settings were employed to all images to simplify comparison of results. The argon laser
(488 nm) was set at 5.0% and helium-neon laser (543 nm) was set at 25%. Emitted
ﬂuorescence from Alexa 488 and 555 was detected through a dichroic 405/488/543 and a
BP505-525 ﬁlter for the green channel and a BP560-620 ﬁlter for the red channel. The
pinhole was set to 105 pm (1.0 Airy unit), and the PMT detector, gain and offset were
680, 1.0% and 6% for both green and red channels. The PMT detector was adjusted in
cases where the embryos showed saturation of signal intensities. Embryos were imaged at
a scan speed 6.51 s/scan, line ﬁlter equal 2 line Kalman ﬁlter, and 1.73 pm-thick Z-
sections for a total of 21-30 slices. CLSM image data were stored as two separate stacks
and projections of images for each channel. The section dimensions were 333pm in
length and width and 1.73 pm in depth. Fluorescence pixels were recorded as 12-bit

images and stored as Tiff ﬁles.

3. Determining proportionality of ﬂuorescence intensity:
To test whether immunoﬂuorescent signals were proportional to the actual levels
of protein or mRNA, we varied gene dosage of lacZ reporter or the Giant repressor,

assuming that heterozygotes express at half the level of homozygotes. Male transgenic

73,

ﬂies homozygous for a lacZ reporter gene regulated by the Giant repressor and Twist and
Dorsal activators were crossed with virgin yellow white females to produce heterozygous
embryos carrying a single copy of the lacZ gene. Homozygous lacZ reporter embryos
were also collected. Two to four hour old homozygous and heterozygous embryos were
dechorionated, ﬁxed and analyzed for lacZ expression using immunoﬂuorescent in situ
hybridization. For RTqPCR mRNA analysis, dechorionated embryos were frozen in
liquid nitrogen and stored at '80 °C until needed. For extraction of total RNA, frozen
embryos were dipped brieﬂy in liquid nitrogen and macerated with a pestle in 500 pl of
TriaZol reagent from Invitrogen (Carlsbad, CA, USA) in a 1.5 ml microfuge tube. The
extraction and subsequent DNase (Roche, USA) treatment were performed according to
the manufacturer’s directions. To test for the presence of any DNA contamination,
standard PCR was performed using primers speciﬁc for lacZ gene. RNA was
spectrophotometrically quantiﬁed and equivalent amounts of RNA from each sample
were used for the reverse transcriptase reaction using the Superscript III enzyme. The
RTqPCR speciﬁc primer set for lacZ (F: 5’-CTGGGATCTGCCATTGTCAGA; R: 5’-
TGGTGTGGGCCATAATTCAATT) was designed using Primer Express Software (ABI
7500 Prism). RTqPCR normalization and analysis were done as previously described
previously.”"6 Primer sets of actin (F: 5’-CGCGGTTACTCTTTCACCA; R: 5’-
GCCATCTCCTGCTCAAAGTC) and 288 rRNA (F: 5’-GATGCCGCGCTAGTTACAT;
R: 5’-GCTGCTCAACCACTTACAACAC) were used for normalization.

For analysis of Giant protein levels, a gt’m stock was obtained from the

Bloomington Stock Center.17 gt’m virgins were out-crossed with wild type male ﬂies to

74

Yll

obtain male hemizygous th” mutant embryos, gt and heterozygous female embryos

and wild-type female embryos.

4. Image processing prior to lacZ mRNA and Giant repressor measurement:

In this study, Image J (rbs.info.nih.gov) and MATLAB softwares (MathWorks)
were used. lmageJ software was used to extract the ﬂuorescent intensity levels of lacZ
and Giant from each confocal ﬂuorescent image. Software implementing the algorithms
described below was written in MATLAB and is available upon request. Approximately
18-21 non-overlapping 1.73 pm depth optical sections were taken from each embryo,
providing a 1024x1024 pixel 12 bit image representing the top half of the embryo. The
slices were projected by taking maximum intensity for each pixel along the z-axis and the
images were converted to 8 bit images. For ﬂatter embryos analyzed in a previous study,
the average of two adjacent sections was used but with our more three dimensional
imaging, this approach is not valid because for most sections a portion of the central part
of the embryo containing only yolk is included.6

To position images in a uniform horizontal manner, masks were used to rotate the
images as described in Janssens et‘ al.". A mask is created for the projected images by
using the threshold value that is taken from outside of the embryo. The convex hull of
the embryo boundary is taken, the inside of the convex hull is set to 255, and the outside
is set to 0. The rotated images were checked to compare the alignments of the two rotated
channels. Images were ﬂipped upside-down or left-right to align unifome on the
anterior-posterior and dorsal-ventral axes. To remove extraneous portions of the image
outside the embryo, the rotated images were cropped to a minimal canvas size. The

images were not changed to a uniform size in order to avoid interpolation of the data,

75

which might have varying effects for different embryos. Subsequent to these steps,
perimeter pixels of the embryo were found using the mask.

In the next step, embryo boundaries, lacZ mRNA data and Giant repressor protein
data were used to decide where to set boxes that are approximately the size of an average
“cell”. The embryos are at this point incompletely cellularized, but separate nuclei are
clearly discernible. Positioning of the boxes is explained in data collection section. The
Giant protein and lacZ mRNA levels were averaged inside those boxes and plotted in 2-

dimensional space.

5. Comparison of imaging to data from other databases:

A variety of methods have been applied to obtain spatially resolved mRNA and
protein information from Drosophila embryos. Our CLSM images of Giant were
obtained using antibody staining and one photon imaging, but embryos were not ﬂattened
as in a previous study.l Alternatively, mRNA quantitation from non deformed embryos
has been acquired by two photon imaging.8 We compared qualitative features of our
Giant images with these data sets to see if general trends were consistent. As an example,
we observe in some of the images lower apparent levels of Giant protein in the ventral
and dorsal regions of the embryo. Giant images were accessed from two other databases:
the Berkeley Drosophila Transcription Network Project (BDTNP) and the Database of
Segmentation Gene Expression in Drosophila (FlyEx). mRNA levels in actual embryos
ﬁom the BDTNP and protein levels from an average virtual embryo image frOm the
FlyEx were used.‘"18 Overall relative differences in intensities between anterior and
posterior levels of Giant protein and mRNA were observed in all three data sets, as well

as the relative lower levels of Giant in ventral regions of the anterior stripe (data not

76

shown). Despite differences in imaging, these methods appear to capture the same

essential features of this gene’s expression.

RESULTS
Background:

A critical issue in quantitative measurement of gene expression is background
subtraction. In a previous study, it was noted that the background coming from
nonspeciﬁc binding of a variety of primary and secondary antibodies to Drosophila
embryos can be approximated by a paraboloid.9 In that study, embryos were ﬂattened (by
the weight of a coverslip and using reduced amounts of mounting solution) so that a
signiﬁcant fraction of nuclei of the embryo could be captured in two 2pm sections. In our
system, we do not distort the embryo, thus we tested whether this parabolic background
relationship still applied. By assessing ﬂuorescence in two different channels in embryos
lacking a lacZ reporter gene, as well as in portions of embryos devoid of Giant protein
(45-55% egg length) we found that the background often did not show a paraboloid shape
(Fig. II-IA, B). One evident difference between our methods and that described in
Myasnikova et al.9 is the degree of ﬂatness of the embryos. We imaged embryos of
successively ﬂatter proportions and analyzed background levels as a function of ﬂatness
(Fig. II-lC-F). Embryo ﬂatness was judged by the number of 1.73pm slices required to
reach the center of the embryo. Embryos that were ﬂattened by a weighted coverslip had
radial thicknesses ranging from 14-23 pm, while rounder embryos had thicknesses of 24-
33pm. This level of ﬂattening is not as pronounced as that described in Myasnikova et
al.9 where almost the entire upper half of the embryo can be scanned in two 2pm slices,

but a clear trend emerged. The background has a distinct tendency to be more

77

paraboloidal with ﬂatter embryos, perhaps because unique light scattering properties of
ﬂat embryo sections contribute to a nonlinear background. Correlation between ﬂatness
of the embryos and curvature of the background intensity is measured by Pearson
correlation coefﬁcients. For lacZ images, the Pearson correlation coefﬁcients between
ﬂatness of the embryo and curvature of the background curve are -0.5 (p=0.006) (ventral-
dorsal) and -0.6 (p<0.001) (anterior-posterior). When this relationship is measured as the
correlation between embryo ﬂatness and the natural log of the curvature, the correlation
coefﬁcients are -0.6 (p<0.001) (ventral-dorsal) and -0.7 (p<0.001) (anterior-posterior).
The relationship between ﬂatness of the embryo and curvature of the background curve
can be explained better with nonlinear functions. The main point here is that parabolic
background cannot be assumed without taking the geometry of embryo into account. We
also noted that for some embryos, inhomogeneously distributed background was evident
in both channels (488 nm and 555 nm), suggesting that speciﬁc structural features of

embryos can affect background (data not shown).

78

Figure II-l: Parabolic and nonparabolic background forms.

Representative parabolic (A) and ﬂat (B) backgrounds. Data was taken from the middle
45-55% egg length of embryos . In general, we observed parabolic background for ﬂatter
embryos. Embryo cross sectional radii were measured and classiﬁed as ‘ﬂattened’ (14-23
pm) or ‘round’ (24—33 pm). Considerable variability in curvature was noted, but ﬂatter
embryos tended to exhibit higher curvature overall. Data were obtained for Giant imaging
ventral-dorsal (C), anterior-posterior (D) and lacZ imaging ventral-dorsal (E), anterior-
posterior (F). Parabolicity of background is measured by curvature of the parabola ﬁt to
curve, with apex generally at center of embryo cross section. For anterior-posterior Giant
background measurements, young gfm mutants expressing virtually no detectable Giant
protein were used. For lacZ background imaging embryos without the reporter gene were

utilized.

79

Intensi

Intensi

 

-50

i —?ARABOLAI

l

o
Ventral-Dorsal

 

 

 

50

 

100

 

35—~~~

25% ~

20
15L

 

1

 

 

~~ - » i —i=ARABOLA| ~

 

 

o
Ventral-Dorsal

80

100

Figure II-l: Continued.

 

 

 

 

 

CS

— i— _ _

4 3 2 1

359.332.: s: 9.33.50

a

0

Round

Flattened

 

 

 

 

 

 

 

D1

8. 6. A 2
0 O O 0

$505305 .5 9.3250

Round

Flattened

81

Figure II-l: Continued.

 

 

 

 

 

 

$596255 c: 25950

 

Round

lﬂaﬂened

 

 

 

 

 

 

 

5
2

2 5 1 5
1 0

$523305 .5 252.50

Round

lﬂaﬂened

82

Fluorescent quantitation of mRNA and protein:

Quantitative measurements of transcription factor levels and mRNA inputs are
essential for modeling gene regulation, yet surprisingly few studies relying on CLSM
have independently tested whether the signals thus obtained exhibit strongly nonlinear
effects. A quantitative study of Drosophila mRNA levels made a single correlation
between knirps mRNA and Knirps protein obtained by CLSM as a test for
proportionality, while other studies have quantitated levels of GFP protein as a proxy for

18"9 We concluded that more rigorous independent measurements

transcript levels per cel
are essential for testing the validity of our quantitation. In the ﬁrst case of mRNA
detection, we varied the gene dose of a lacZ reporter expressed in ventral regions of the
embryo and compared levels of mRNA by RTqPCR and CLSM. The RTqPCR results
showed that the relative amount of lacZ mRNA was 1.0 in heterozygotes compared to
1.92 in homozygotes (Fig. II-2A). In comparison, ﬂuorescent intensities for lacZ
heterozygous compared to homozygous embryos were 1 to 1.85 (Fig. II-2B). This result
suggests that both methods are responding to estimated twofold differences in mRNA in
a similar manner, and that the ﬂuorescent intensities do provide a reasonable proxy for
actual mRNA levels.

Regarding protein measurement, previous studies have analyzed relative
expression levels of a number of regulatory proteins in the Drosophila embryo by CLSM,
but to our knowledge the proportionality of these measurements to actual protein has not
been assessed.1 We tested the measurement of protein by comparing the values of Giant

expression in embryos derived from an outcross of thH heterozygotes to a wild type

strain (Fig. "-3). gt is located on chromosome 1, thus male hemizygotes carrying the

83

Figure II-2: Proportionality of signal to mRNA levels

The average relative amounts of lacZ mRNA in heterozygous and homozygous
transgenic embryos were measured by RTqPCR analysis (A) and by immunoﬂuorescent
in situ hybridization and confocal laser scanning microscopy (B). Quantitation of mRNA
levels by RTqPCR is representative of two biological assays; error bars indicate standard
deviation from ﬁve technical replicates. Fluorescent imaging of 39 embryos were

quantitated in (B).

84

 

 

 

 

 

 

5.
2

2 5. 1 5.
1 0
:oﬁaoixo ocoo 0520:

Homozygote

Heterozygote

 

 

 

 

 

 

5
2

2 5 1.. 5
1.. 0

3.2.9:. «caucuses: 02:22".

Homozygote

Heterozygote

85

mutant allele would be expected to express the lowest levels of the protein, and female
heterozygotes would have intermediate levels. The allele used, thH, has been reported
to express low levels of protein and consistent with this observation, no mid-blastoderm
embryos entirely lacked Giant expression.

Embryos were stage matched to allow direct comparison of protein levels.
Embryos with apparently normal intensity of Giant signal were observed alongside of
age-matched embryos with very low Giant levels (Fig. lI-3A, B); those differences are
not likely to be merely a function of overall staining efﬁciency because background
levels were fairly constant and such large differences are not usually seen when imaging
wild-type embryos. When background subtracted, normalized levels of Giant were
plotted for a set of 22 images, a roughly ten-fold range of signal intensities were observed
(Fig. II-3C). The values did not fall into three obviously discrete clusters, corresponding
to hemizygous null, heterozygous, and wild-type backgrounds, but were rather
continuously distributed with some degree of over representation at higher and lower
values. This effect may be due to nonlinearity of the ﬂuorescent readout, which would
produce some compression at one end of the spectrum, variability in detection or even
expression of Giant from embryo to embryo, or some combination of these effects. We
can conclude that this method does permit an apparent dynamic detection range of at least
ten fold sufﬁcient to capture the total dynamic variation reported for Giant in the
blastoderm embryo but clearly a more detailed comparison of signal proportionality is
warranted.1 In light of our mRNA results, it appears that ﬂuorescent detection can be an
appropriate proxy for in situ levels of biomolecules, but the proportionality of read out

must be established for each set of reagents.

86

Figure II-3: Proportionality of signal to Giant protein levels

Embryos with apparently normal intensity of Giant (A) and low intensity of Giant (B)
were observed for age-matched embryos. Normalized levels of 22 background-subtracted
Giant images were plotted, and a roughly ten-fold range of signal intensities were
observed (C). Overlapping data points were separated into two columns for clarity for the

anterior and posterior stripe.

87

Intensi

A

Ventral-Dorsal

 

250— -- -

100 ,_ .....

50%

 

150_.,. . ....... iﬁ. . , ._ .. .. _ . . . . .. _
t1

 

 

 

l i i 1 i l
100 150 200 250 300 350 400 450
Anterior-Posterior

88

Figure II-3: Continued.

FD

 

 

 

'5
(D
L-
O
D
_L
E
.0.-
C
d)
>
Anterior-Posterior
250l— ‘l' l l - i l I l I _
200i '
'tT) 150r- ' _
5
IE 100 . , . . : -
_ ‘ MVANWIWWW‘ “NV/Kim» :
50 lfwf‘fw . . , : WWWW~5~WWwWﬂ . ”WW
l 2 ‘ ' 1
0i l i l l i i i
0 50 100 1 50 200 250 300 350 400 450

Anterior-Posterior

89

Figure II-3: Continued.

C.

2.5

 

 

a:
.2
z
8 i
a.» 2 o. o.
H
5 o .9
E- z 09
a: 1.5. 3 ‘9
o
m
e :-
2 1‘ ’
'0- O .9
'a 9 o
o
:2 ’ o:
a 0.5- g: Q.
E ’ ’
§

0

 

 

Anterior Posterior

90

Methods for Data Collection:

In order to understand the functional relationship between transcriptional activator
and repressor levels and mRNA output we need to create gene regulatory “maps” that
describe the quantitative relationship between these elements.” We created a series of
lacZ reporter genes regulated by the Giant transcriptional repressor and the Dorsal and
Twist activators. These reporter genes are active in ventral regions of the embryo, and
expression is interrupted in areas where Giant is expressed, depending on the
arrangement of the binding sites (Fig. II-4A, B). We directly measure levels of the Giant
repressor protein as an input value. The spatially and temporally varying levels of this
protein provide in each embryo an entire set of values relevant to a gene regulatory map.
Activator levels can be similarly measured, and in the case of Dorsal and Twist, these
activators vary along the dorsal to ventral axis.2 We focus on gene modules that test
varying features of repressor binding sites, while holding Dorsal and Twist sites constant.
Dynamic and spatially heterogeneous protein levels of transcriptional regulators make it
imperative that we compare lacZ levels with corresponding levels of regulatory factors in

nearby nuclei from which the mRNA originates.

Background:

Background intensity for the Giant channel is calculated by averaging the data
from the middle (SO-60% egg length) of the embryo along the ventral-dorsal axis. In
cases where the curve shows linear behavior, the average of this middle stripe can be
subtracted from the whole embryo, and if it shows parabolic behavior, then the

background can be calculated and subtracted as previously described.9 Background

91

intensity for the lacZ channel can be calculated similarly by using the dorsal parts of the

embryo, where no lacZ is present.

Binning:

A pixel by pixel comparison between Giant, found in the nucleus, and the lacZ mRNA
channel is not applicable because the mRNA accumulates in the spatially separated
cytoplasm. In addition, we observe that lacZ mRNA accumulates in a punctuate pattern
(a function of the speciﬁc reporter, as other mRNAs show a smoother, nonnuclear
pattern, data not shown). This problem can be solved by measuring the approximate size
of a “cell” in confocal images and covering the region of interest by boxes of the size of

an average cell size. A 10x10 pixel box is the average cell size for our images.

Relevant areas of data collection:

We then collect a series of data points representing the lacZ gene output in
regions with similar levels of activators but varying levels of Giant protein. When not
regulated by Giant, the lacZ expression pattern is almost constant from approximately 20-
80% egg length (anterior-posterior) in the ventral part of the embryo (Fig. II-4A). When
Giant is effective at repressing the lacZ reporter, gaps in this pattern are seen in anterior
and posterior regions (Fig. II-4B). It has been reported that twist mRNA levels are not
uniform from anterior to posterior“ but this variation does not seem to be reﬂected in the
output of our reporters, which are activated by Twist and Dorsal in a fairly uniform
pattern in the mid-blastoderm stage. For this reason, in the regions that are exactly
parallel to the ventral boundary, we assume that the combined activator effect is not

changing for our synthetic enhancer constructs. We therefore measure correlated lacZ

92

Figure “-4: Sampling of integrated mRNA and Giant data

lacZ reporter constructs were introduced into Drosophila by gerrnline transformation and
mRNA detected by ﬂuorescent imaging. Dorsal and Twist activators drive lacZ gene
expression in ventral regions (A). When regulated by Giant, gaps in this pattern are
evident (B). A sampling mesh is imposed upon the background subtracted image and
values for Giant and lacZ are collected (C, D). The mesh size here is exaggerated for
clarity. Sampling proceeds from anterior region of anterior stripe to posterior region of

posterior stripe.

93

 

94

and Giant levels in “slices” that track along the ventral aspect of the embryo (Fig. II-4C,
D). Curved slices, 10 pixels in width, are taken from ventral part of the embryo along
anterior to posterior, following the boundary of the embryo. Each slice is divided into
10x10 pixel boxes along anterior to posterior to derive mean values for lacZ and Giant.

To accommodate images of embryos that are rotated around the anterior-posterior
axis to different degrees, exposing more or less of the ventral surface where Dorsal/Twist
are active, we vary the number of slices applied to each embryo so that our lacZ mRNA
and Giant measurements reﬂect areas with nearly constant activator levels. To determine
the number of slices used, we measure lacZ levels in central regions (SO-60% egg length)
that are not subject to Giant regulation and extend slices from dorsal to ventral until
activator levels are determined to be limiting. This is accomplished by measuring lacZ
expression in the 50-60% egg length portion of the embryo and averaging along the
anterior-posterior axis, producing a plot that describes how lacZ is changing from ventral
to dorsal. This data has a peak value at the ventral side of the embryo and decreases from
ventral to dorsal. The slices start from the peak level of lacZ expression and end at 50%
of the maximal lacZ expression.

Because the ventral lacZ expression is also affected by vector-mediated activation
in anterior regions, as well as Torso-mediated regulation of Dorsal” in the poles, our
slices start from a position of half of the maximal Giant intensity on the anterior side of
the major anterior stripe, to half of the maximal Giant intensity of the posterior side of the
posterior Giant stripe. The selected region still includes most of the region of Giant
expression and removes portions of the data subject to extraneous inﬂuences (Fig. Il-4C,

1)).

95

Normalization and regulatory plots:

The quantitation described above focuses on the lacZ levels as a function of the
Giant concentration (Fig. II-5). The Giant data is normalized by dividing the 2-4h old
embryo time interval studied here into eight segments as described in Jaeger et al. The
average levels of the protein in each interval are used to normalize the Giant signals. In
this way, comparisons of lacZ expression to relative Giant levels in a given embryo can
be combined with data from other embryos into larger data sets. Because Dorsal and
Twist activator proteins are relatively constant, each embryo can be normalized by
dividing by its maximum intensity value. These normalized expression levels of Giant
protein and lacZ mRNA are used to plot expression surfaces, showing the repressor
signal and activator signal versus output (F ig.II- 5). For the visualization shown here, the
data is presented as two dimensional plots, in which each box represents one slice from
the ventral region of the embryo. As expected, with increasing Giant levels or decreasing
activator levels, the lacZ signal is reduced. The resulting plot represents two dimensional
slices of a gene regulatory surface that is unique to the speciﬁc regulatory region of
interest.

Previous studies have demonstrated that the Giant protein can repress transgenes
regulated by Dorsal and Twist, but the characterization of this relationship has never gone
beyond a qualitative nature.” To characterize the quality of the data obtained from this
system and assess its utility for quantitative modeling we measured the correlation
between [Gt] and ln([lacZ]) using the Pearson correlation coefﬁcient. For the

2gt.25.2T2D construct showing regulation by Giant, correlation coefﬁcients for slice 1 to

96

Figure “-5: Gene regulatory representations from Giant regulated lacZ reporter
gene

To generate gene regulatory plots, data derived from slices (lacZ mRNA channel, A and
Giant protein channel, B) are plotted in a two dimensional space. Only two slices are
shown for clarity. The plots numbered 1-6, show how robust lacZ levels present in
ventral portions of the embryo are repressed in regions containing peak Giant levels such
as in plot 1-4. In more dorsally located regions (plot 5-6) limiting levels of activators
reduce lacZ expression regardless of Giant levels. The overlaid lines, obtained by ﬁtting
the data with a rational function show the trend of the relation between levels of Giant

and lacZ.

97

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

98

slice 6 are -0.74, -0.78, -0.82, -0.85, -0.65 and -0.67. The p-values for these correlation

levels are all less than 0.0001, indicating statistical signiﬁcance.

DISCUSSION

We describe a method to correlate the transcription factors and mRNA output for
gene modeling purposes. We show that the levels of lacZ mRNA, and potentially the
transcriptional repressor protein Giant, are proportional to ﬂuorescent intensities, a
critical basis for‘ quantitative modeling. Our analysis also reveals that a suggested
parabolic form of the background ﬂuorescence in confocal images of early Drosophila
embryos is evident most prominently in ﬂattened specimens, with intact embryos
exhibiting a more linear background. After appropriate background subtraction and
normalization, these data are amenable to representation of gene regulatory surfaces that
permit creation and validation of quantitative models of gene expression. In this way, we
have constructed the foundations for modeling a cis-regulatory ‘grammar’ that applies to
an important set of transcriptional regulators in the Dros0phila blastoderm embryo. More
generally, the image and data analysis techniques described in this paper can be
generalized to understand the quantitative function of endogenous genes or gene
networks in the Drosophila embryo or other well-characterized systems. Such an
approach may also prove useful for engineered transcriptional regulatory elements

employed for targeted expression of genes in therapeutic settings.

99

REFERENCES:

Jaeger J., Surkova S., Blagov M., Janssens H., Kosman D., Kozlov K.N., Manu,
Myasnikova E., Vanario-Alonso C.E., Samsonova M., Sharp DH, and Reinitz J.
Dynamic control of positional information in the early Drosophila blastoderm. Nature
430, 368, 2004.

Zinzen R.P., Senger K., Levine M., and Papatsenko D. Computational models for
neurogenic gene expression in the Drosophila embryo. Curr Biol. 16, 1358, 2006.

Rivera-Pomar, R. and Jackle, H. From gradients to stripes in Drosophila embryogenesis:
ﬁlling in the Gaps. Trends Genet 12, 478, 1996.

Janssens H., Hou S., Jaeger J., Kim A-R., Myasnikova E., Sharp D., and Reinitz J.
Quantitative and predictive model of transcriptional control of the Drosophila
melanogaster even skipped gene. Nat Genet 38, 1159, 2006.

Segal E., Raveh-Sadka T., Schroeder M., Unnerstall U., and Gaul U. Predicting
expression patterns from regulatory sequence in Drosophila Segmentation. Nature 451,
535, 2008.

Janssens H., Kosman D., Vanario-Alonso C. E., Jaeger J ., Samsonova M., and Reinitz J.
A high-throughput method for quantifying gene expression data from early Drosophila
embryos. Dev Genes Evol 215, 374, 2005.

Tomancak P., Beaton A., Weiszmann R., Kwan E., Shu S., Lewis S.E., Richards S.,
Ashbumer M., Hartenstein V., Celniker SE, and Rubin G.M. Systematic determination
of patterns of gene expression during Drosophila embryogenesis. Genome Biol 3, 88,
2002.

Luengo Hendriks C.L., Keranen S.V.E., Fowlkes C.C., Simirenko L., Weber G.H.,
DePace A.H., Henriquez C., Kaszuba D.W., Hamann B., Eisen M.B., Malik J., Sudar D.,
Biggin M.D., and Knowles D.W. Three-dimensional morphology and gene expression in
the Drosophila blastoderm at cellular resolution 1: data acquisition pipeline. Genome
Biol 7, R123, 2006.

Myasnikova E., Samsonova M., Kosman D., and Reinitz J. Removal of background
signal from in situ data on the expression of segmentation genes in Drosophila. Dev

Genes Evol 215, 320, 2005.

Small 8., Blair A., and Levine M. Regulation of even-skipped stripe 2 in the Drosophila
embryo. EMBO J 11, 3147, 1992.

Kosman D., Mizutani C. M., Lemons D., Cox W. G., McGinnis W., and Bier E.
Multiplex detection of RNA expression in Drosophila embryos. Science 305, 846, 2004.

100

Kosman K., Small S. and Reinitz J. Rapid preparation of a panel of polyclonal antibodies
to Drosophila segmentation proteins. Dev Genes Evol 208, 290, 1998.

Szymanski P. and Levine M. Multiple modes of dorsal-bHLH transcriptional synergy in
the Drosophila embryo. EMBO J 14, 2229, 1995.

Hewitt G.F., Strunk B.S., Margulies C., Priputin T., Wang X.D., Amey R., Pabst B.A.,
Kosman D. Reinitz J. and Amosti D.N. Transcriptional repression by the Drosophila
giant protein: cis element positioning provides an alternative means of interpreting an
effector gradient. Development 126, 1201, 1999

Huggett J., Dhehada K., Bustin S., and Zumla A. Real-time RT-PCR normalization;
strategies and considerations. Genes Immun 6, 279, 2005.

Vandesompele J., De Preter, K., Pattyn F ., Poppe B., Van Roy N., De Paepe A., and
Speleman F. Accurate normalization of real-time quantitative RT-PCR data by geometric
averaging of multiple internal control genes. Genome Biol 3, 34, 2002.

Eldon E., and Pirrotta V. Interactions of the Drosophila gap gene giant with maternal and
zygotic pattern-forming genes. Development 111, 367, 1991.

Myasnikova E., Samsonova A., Kozlov K., Samsonova M., and Reinitz, J. Registration of
the expression patterns of Drosophila segmentation genes by two independent methods.
Bioinformatics 17, 3, 2001.

Darnle S., Hanser B., Davidson E. H., and Fraser S.E. Confocal quantiﬁcation of cis-
regulatory reporter gene expression in living sea urchin. Dev Biol 299, 543, 2006.

Setty Y., Mayo A.E., Surette M.G., and Alon U. Detailed map of cis-regulatory input
function. PNAS 100, 7702, 2003.

Li X., MacArthur S., Bourgon R., Nix D., Pollard D.A., Iyer V.N., Hechmer A.,
Simirenko L., Stapleton M., Luengo Hendriks C.L., Chu H.C., Ogawa N., Inwood W.,
Sementchenko V., Beaton A., Weiszmann R., Celniker S.E., Knowles D.W., Gingeras T.,
Speed T.P., Eisen M.B., and Biggin M.D. Transcription factors bind thousands of active
and inactive regions in the Drosophila blastoderm. PLoS Biol 6, e27, 2008.

Rusch J., and Levine M. Regulation of the dorsal morphogen by the T011 and torso

signaling pathways: a receptor tyrosine kinase selectively masks! transcriptional
repression. Genes Dev 8, 1247, 1994.

101

Chapter III

Deciphering a transcriptional grammar: modeling short-range repression in the

Drosophila embryo

Walid D. F akhouri, Ahmet Ay, Rupinder Sayal, Jacqueline Dresch, Evan Dayringer,

Chichia Chiu and David N. Amosti submitted to Molecular Systems Biology

102

ABSTRACT

Systems biology seeks a genomic-level interpretation of transcriptional regulatory
information represented by patterns of protein binding sites. Obtaining this information
without direct experimentation is challenging; minor alterations in binding sites can have
profound effects on gene expression, and underlie key aspects of disease and evolution.
Quantitative modeling offers an alternative path to develop a global understanding of the
transcriptional regulatory code. Recent studies have focused on endogenous regulatory
sequences; however distinct enhancers differ in many features, making it difﬁcult to
generalize to other cis-regulatory elements. We applied a systematic approach to simpler
elements, and present here the ﬁrst quantitative analysis of short-range transcriptional
repressors, which play central roles in metazoan development. Our fractional occupancy-
based modeling uncovered unexpected features of these proteins’ activity that allow
accurate predictions of regulation by the Giant, Knirps, Kruppel and Snail repressors,
including modeling of an endogenous enhancer. This study provides essential elements of
a transcriptional grammar that will allow extensive analysis of genomic information in

Drosophila melanogaster and related organisms.

INTRODUCTION

The rapid increase in sequenced genomes has provided an extensive “parts lists”
of organisms, however deeper understanding of genomic information that includes gene
regulatory functions is critical to understanding the dynamic activity of biological
systems. Subtle changes in regulatory elements are often involved in hereditary diseases,
population differences and the evolution of morphological novelties (Carroll et al, 2001).

Comparative studies have demonstrated that regulatory regions can retain function over

103

large evolutionary distances, even though the DNA sequences are divergent and poorly
alignable (Ludwig & Kreitrnan, 1995; Hare et al, 2008). The ﬂexibility in arrangement of
binding sites is not unlimited, however. For instance, the effectiveness of short-range
transcriptional repressors that play key roles in Drosophila development is strongly
inﬂuenced by activator-repressor distances (Gray et al, 1994; Amosti et al, 1996a;
Kulkarni & Amosti, 2005).

The Drosophila blastoderm embryo provides an ideal setting for the analysis of
transcriptional enhancers; the cascade of maternally and zygotically supplied
transcription factors has been extensively investigated at a molecular level, and many
DNA regulatory elements have been identiﬁed and functionally dissected. In this system,
genes with complex expression patterns are controlled by multiple enhancers, whose
modular function depends on the local action of repressor proteins (Small et al, 1993).
The blastoderm embryo has been used for quantitative analysis of gene expression by
reaction diffusion, Boolean, and fractional occupancy modeling (Jaeger et a1, 2006;
Sénchez & Thieffry, 2001; Segal et al, 2008). Fractional occupancy models draw from
simple biophysical principles and statistical physics to predict the overall readout of
endogenous enhancers (Bintu et al, 2005a, 2005b). In these models, parameters include
the binding afﬁnity of transcription factors to the DNA and cooperativity between
proteins. Such models assume that gene regulation is dictated largely by the equilibrium
binding of transcription factors to the DNA, without explicitly modeling events such as
chromatin modiﬁcations and RNA polymerase phosphorylation.

Simple prokaryotic systems provide a tractable setting for quantitative studies,

and fractional occupancy models have been applied to the lac operon in E. coli and the

104

lysis/lysogeny switch of phage lambda (Von Hippel et al, 1974; Ackers et al, 1982; Shea
& Ackers 1985, Vilar & Leibler, 2003). Use of these models in eukaryotes is more
problematic, given the higher degree of enhancer complexity in eukaryotic systems, but
Drosophila enhancers have been treated by fractional occupancy models that account for
factor spacing and recruitment of co-regulators (Janssens et al, 2006; Zinzen et al, 2006).
These models can reproduce the behavior of speciﬁc enhancers, but a major limitation of
fractional occupancy modeling of endogenous enhancers is that models of a single
regulatory region, may not generally apply to other elements. In studies of multiple
enhancers, the parameter estimation has been difﬁcult, as the different architecture of
distinct enhancers, even those regulated by the same proteins, makes it difﬁcult to know
which parameters (number of bindings sites, relative arrangements etc.) are key to
determining the particular activity of an enhancer (Segal et al, 2008). As we describe
here, a more systematic approach is necessary to parse the contributions of individual
physical features to enhancer activity.

One particular area that has been inadequately explored is the key role played by
repressor proteins. Giant, Knirps, and Kriippel are regionally deployed short-range
repressor proteins that bind to and control the patterning of pair-rule genes such as even-
skipped. Previous studies showed that precise positioning of short-range repressors on an
enhancer can be used to generate the appropriate expression pattern in a morphogenetic
ﬁeld where the concentration of these repressors are used to set gene expression
thresholds (Hewitt et al, 1999; Clyde et al, 2003). Thus, the ﬂexibility of enhancer
architecture incorporating these proteins is constrained by some distance limitations. Our

previous study demonstrated that activator-repressor stoichiometry and arrangement of

105

binding sites also inﬂuence the overall readout of developmental enhancers (Kulkarni &
Amosti, 2005). To build tools able to accurately predict the function of novel enhancer
sequences, we recognized a need to quantitatively measure the speciﬁc contributions of
these factors to overall enhancer function. Here we describe the creation and quantitative
assessment of a well-deﬁned set of transcriptional regulatory modules in the Drosophila
embryo, in which individual aspects relating to repressor-activator spacing, stoichiometry
and arrangement are systematically explored. Using quantitative data from these genes,
we apply a fractional occupancy approach to model the interaction of short-range
repressors with endogenous transcriptional activators. We show that this approach can
correctly decipher the transcriptional regulatory code of endogenous enhancers, pointing
the way to a general approach for unlocking the transcriptional regulatory information of

genomes.

MATERIALS AND METHODS

Reporter genes

The binding motifs for the Giant, Krﬁppel and Knirps short-range repressors and
the Twist and Dorsal activators used in this study were characterized in previous studies
(Szymanski & Levine, 1995; Hewitt et al, 1999; Kulkarni & Amosti, 2005).
Regulatory modules were constructed in pBluescript KS(+) using the EcoRI, BamHI,
XbaI and SacII restriction sites, ampliﬁed by PCR using T3 and T7 primers, and
amplicons were digested with EcoRI and SacII and subcloned into the compatible sites of

C4PLZ (Wharton & Crews, 1993). Gene 1 contains two Giant binding sites inserted

106

between EcoRI and BamHI, two Twist sites inserted between BarnHI and XbaI, and two
Dorsal binding sites between XbaI and Sacll.

Gene 2 includes a 25bp spacer inserted between Giant and Twist sites using
BamHI. For genes 4, 6, 7 and 8, the same 25bp spacer was concatemerized and inserted
at BamHI. For genes 3 and 5, a 35bp or 60bp spacer was inserted at BamHI, between the
Giant and Twist binding sites. Gene 9 contains a single Giant binding site inserted
between EcoRI and BamHI. Gene 10 was constructed by digestion of the parent gene 1
pBluescript plasmid with EcoRI and insertion of the single Giant binding site, preserving
a single 5’ EcoRI site. For genes 12, 13, 14, 15 and 19 the same strategy was used to
insert an extra Giant binding site 3’ of the Dorsal sites using Sacll. For genes 13 and 15,
in which the binding sites are moved away from the basal promoter of lacZ reporter gene,
a 340bp spacer was ampliﬁed from the coding region of knirps gene and inserted into the
Spel of C4PLZ plasmid (Kulkarni & Amosti, 2005). A weaker Giant site was tested in
gene 20. The sequences for all oligos used are shown in Appendix A Table 111. All gene
cassettes were conﬁrmed by sequencing, and at least 5 transgenic lines of each gene were
analyzed by in situ hybridization for lacZ expression pattern. Lines showing enhancer
trapping were not included in the analysis. Fixed embryos from two to three transgenic

lines of each gene were used for confocal laser scanning microscopy (CLSM).

Image processing
A ﬁve step procedure was applied to all embryo images as described in Ay et al
(2008), involving binary image generation, rotation, outlier removal, background

subtraction and normalization. Non-speciﬁc signals were not observed in Kruppel and

107

Knirps channels, so no outlier removal was required for these channels. Background
intensity for the Giant channel was calculated by ﬁtting a parabola along the dorsal-
ventral axis to the average of the data from the middle (SO—60% egg length) of the
embryo along the anterior-posterior axis, a region lacking any Giant expression.
Background intensity for the Krtippel and Knirps channels was calculated and removed
similar to Giant channel, by using the data from the middle (75—85% and 25—35% egg
length respectively) of the embryo along the ventral—dorsal axis.

The Giant channel was normalized so that embryos in the same age group with
similar rotations have the same total average signal in anterior and posterior stripes. The
lacZ channel was normalized using the average signal in a region bounded by the 50-60%
egg length (anterior-posterior axis) and the peak to 60% of the peak lacZ signal (dorsal-
ventral axis). Kriippel and Knirps channels were normalized similarly to the Giant
channel. We also tested normalization scheme for Giant levels that used the mean of the
peak 0.25% values set to 153 of 255 by linear transformation (The value 153 was chosen
empirically to minimize the number of saturated pixels). The former scheme was more
effective in normalizing Giant levels, but the [lacZ] vs. [Gt] plots were not signiﬁcantly

altered, possibly due to the higher level of noise in the lacZ channel.

Data set

A total of 769 embryos bearing lacZ reporter gene constructs regulated by the
Giant repressor protein were analyzed and an additional 45 and 88 were analyzed for
genes regulated by Kruppel and Knirps respectively. Genes 6, 7, 8 and 18 were not used

in the quantitative modeling because no Giant repression was ever observed, and an

108

ectopic modulation of the reporter gene by unknown factors in a fraction of the imaged
embryos made this data especially noisy. All primary data are available on our server at:

www.bch.msu.edu/faculty/arnosti.htm

Parameter estimation

There are two main groups of approaches for parameter ﬁtting; local and global
parameter estimation techniques. Both groups might get stuck at local optima, but the
chance of ﬁnding the global optimum is greater with global parameter estimation
techniques. We compared three global techniques, namely evolutionary strategy
(Runarsson & Yao, 2005), genetic algorithm (The MathWorks, Inc.) and simulated
annealing (The MathWorks, Inc.) .These methods have been shown to be suitable for
parameter estimation in biological systems in several studies (Mendes & Kell, 1998;
Moles et al, 2003; Fomekong-Nanfack et al, 2007). We tested these estimation
techniques by running them on a synthetic data set produced by calculating our model
output for assigned parameters; such parameter estimation is a common technique for
validation (Appendix A Figures 1-4) (Moles et al, 2003). We further checked their
robustness on the synthetic data by introducing noise into the data. Strikingly different
results were obtained when these approaches were tested on the model. After 100
simulated runs, the genetic algorithm and simulated annealing techniques were unable to
accurately predict the correct parameters, even with no noise introduced into the data set
(Appendix A Figures 2-3). In contrast, the evolutionary strategy algorithm was able to
reproducibly predict the correct parameter set (Appendix A Figures 1 and 4). To

determine the robustness of the parameter estimation techniques to noise, the estimation

109

algorithms were also run on the synthetic data with noise of normal distribution. Not
surprisingly, the variance for genetic algorithm and simulated annealing algorithm was
high at all levels of noise. Even at a noise level of 30%, the variation for the evolutionary
strategy approach was lower than the other two methods (Appendix A Figure 4). It is
possible that genetic algorithm or simulated annealing may be capable of producing
better results if implemented differently, however if the choice of parameter estimation is
not examined carefully, a broad range of values may be produced, leading to the

conclusion that there are no optimal values.

Schemes

Distinct forms of our model were implemented in which the quenching
parameters were grouped in different ‘bins’ of distances. For distances more than 81bp
quenching efﬁciencies of the repressors are taken as 0, motivated by our genes 6-8, which

shows no repression.

Schemes 1, 2 and 8: q1(6bp), q2(28—41bp), q3(50—56bp), q 4(63—66bp), q5(78bp),

q6(6bp from 3' end of activators).

Schemes 3, 4 and 9: ql(6bp), q2(28—41bp), q3(50-53bp), q4(56—66bp), q5(78bp),

q6(6bp from 3' end of activators).

Schemes 5: ql(6bp), q2(28—3lbp), q3(4l—50bp), q4(53-—56bp), q5(63—66bp),

q6(78bp), q7(6bp from 3' end of activators).

110

Schemes 6: q1(6bp), q2(28—3lbp), q3(4lbp), q 4(50—56bp), q5(63—66bp), q6(78bp),

q7(6bp from 3' end of activators).

Schemes 7: ql(6bp), q2(28bp), (13(3lbp), q 4(4lbp), q5(50bp), q6(53bp), q7(56bp),
q8(63bp), q9(66bp), q10(78bp), q11(6bp from 3' end of activators).

We also tried different expressions of cooperativity. In schemes 1, 3, 5, 6 and 7
only one parameter is used for Giant-Giant cooperativity. In schemes 2, 4, 8 and 9 two
parameters are used for Giant-Giant cooperativity, where the ﬁrst parameter describes
cooperativity of Giant proteins with lObp distance and the second describes cooperativity
for 32bp distance. In schemes 2 and 4 we described the cooperativity of observing all
three Giant proteins on the DNA as the summation of the two cooperativity parameters,

and in schemes 8 and 9 as the multiplication of the two cooperativity parameters.

Derivation of the model for gene 1
We express efﬁciency of the activator group bound and not bound respectively

as E A and E N’ and efﬁciency of the Giant repressor as E Gt' We represent the

efﬁciency vector that represents efﬁciency for each state of activator set and Giant
repressors as E, the state vector of activator set and Giant repressors as F , the
regulatory function that transforms each efﬁciency vector input to transcription level as
T , and total steady state transcription level as Ex. The probability of each state of those
proteins on the DNA can be calculated. Because the activator binding sites do not vary
within the genes tested here, the Dorsal and Twist activators are not parameterized, and

are considered as one group. We set S AlA] equal to 100 with the assumption that in the

111

absence of Giant repressor protein, the activators are fully functional. A set of eight
equations describes all possible states of this gene. For simpliﬁcation in the following

formulas,

_ 2
Z—(l+ SAIAI)(1+2SR[GII+C(SRIth) ).

No activator and repressor bound: F N =%

. . S AIAI
Activator set rs bound: F A = T

SR[Gt|

 

Proximal Giant to the activator set is bound: F G t =
1

 

Distal Giant to the activator set is bound: F Gt 2

. . . . . S AIAISRIGII
Activator set and proxrmal Grant to the actrvator set rs bound: F A Gt = Z
1

 

. . . . . S AIAIS Rthl
Actrvator set and distal Grant to the activator set rs bound: F =
AGI2 Z

2
_ C(SRthl)

F _
thGt2 Z

Both Giant repressors are bound:

2
_ SAIAIC(SRIGII)

thGt2 Z

 

Activator set and both Giant repressors are bound: F A

112

Then the states vector of one activator set and two repressors can be written as:

F=|F ,F ,F ,F ,F ,F ,F ,F ]
N A Gr1 Gt2 Ath AGt2 thGt2 [‘thth2

In the expressions above we stated all the Boltzmann states of an enhancer with
one activator set and two repressor binding sites. We claim that the binding of the
repressors modulates the probability of states with activators and repressors
simultaneously bound by a quenching factor. For example binding of the activatorA and

repressor R simultaneously is reduced by a factor of (l—ql). We can modify the

1

probability of states, in the following way:

~

F = [F N’F A’F 0:1 I (3:2 ’F Ath (1“th AGt2 (1 "‘12” thGtz ’F AthGtza—qlxl "‘12”

Next we calculate the total efﬁciency of the enhancer when factors are bound on
the DNA. We model cooperativity between Giant sites, which might be for example due
to cooperative cofactor recruitment, with an additive function, so the total efﬁciency of

Giant repressors bound to the DNA at the same time is

_ Gt Gt Gt Gt . .
0,102 - W1 E Gt] + w2 E 012 where WI and w2 are the cooperatrvrty terms after

E

binding. The efﬁciency of one activator set and two repressors are expressed in the

following way; each term representing one state of the all possible states:

113

,E ,E .E .E .E .E l
A th Gt2 Ath A012 thGt2 («thth2

_ Gt Gt
‘ IEN’EA’Eth ’EGt2 ’EA + Eth ’EA + 50:2 "’1 Eth + “’2 50:2:

Gt Gt
EA + ”’1 Eth + ”’2 EGtZ]

E=[EN,E

Expression contributions from each state are added to obtain the total expression:

EszFiT(Ei)
1'

Gt =0, EGt =0 and

If we set the following simple assumptions E N = 0, E A =10, E
1 2

T(x) = ——15-——x— the total expression of the enhancer with 1 activator set and 2 repressor
l+e _

binding sites can be written as:

‘ 2
x z SAIAJ x 1+ (2-ql -qZ)SRIGt]+ C(l-ql)(1_q2)(5R[0,D
”SAM! 1+25RthI+C(SR[Gt])2

 

Expression functions (Ex) for all cases are shown in Appendix A Table 1. Further

details about the model are explained in the supplementary material (Appendix A).

Modeling endogenous enhancer sequences

We made the following assumptions to simplify the parameter estimation for
modeling of the rho NEE: ( 1) we model activity of the NEE in the mesoderrn, where
Dorsal and Twist levels are high, and Snail is present at uniform levels. We used values

for expression contribution of Dorsal and Twist as +5 each, and for Snail, -5. We also

114

carried out parameter estimation with values of +3 or +7 for activators and -3 or -7 for
Snail, and obtained essentially equivalent results. (2) We set quenching parameters to
those obtained from our modeling, as shown in Figure III-4, on the reasonable
assumption that these are functionally equivalent among short-range repressors, (3) To
reduce the number of possible parameters, we only included cooperative interactions
between factors that are nearest neighbors, and are located within 25 bp of each other. (4)
We allow that the relative effectiveness of repression with four Snail sites might be
higher than that seen with one or two, and stipulate ranges of repression in which
parameter space is investigated (Figure 111-8). (5) We set ranges for cooperativity and
scaling factors from 1-100. (6) For each transcription factor, we took the score of the
strongest site among all those that bind that transcription factor as a free parameter and
constrain the other values by treating the PWM score as a free energy of binding (Storrno,
2000). We used PWMs created from F lyReg database by Daniel A. Pollard, which are
available at: htt://www.ﬂyreg.org/. As an example, the two Twist sites differ considerably
in terms of their match to a consensus PWM, with Twist site #2 predicted to have a forty
seven-fold lower score than Twist site #1, although it still has a considerably higher score

than background sequences.

RESULTS

Gene modules
We set out to map regulatory surfaces of genes controlled by short-range
repressors; these surfaces show the functional relationship of activator/repressor input

and gene expression output (Figure III-l). Such regulatory surfaces reﬂect evolutionary

115

forces that shape gene output, as demonstrated for the lac operon (Setty et al, 2003;
Mayo et al, 2006). The design of the enhancers responding to short-range repressors
accommodates sensitive distance and binding site parameters within a ﬂexible design
framework (Clyde et al, 2003; Kulkarni & Amosti, 2005).

The output of a model of a particular conﬁguration of transcription factor binding
sites should lead to a regulatory surface that allows mapping of known values of
regulatory factors, such as Dorsal and Twist activator protein levels, and Giant repressor
protein levels, through this surface to produce an expected regulatory outcome (Figure
III-1).

To carry out this scheme on a practical level, we created a series of genes to test
in a systematic fashion the effect of parameters affecting repression. The quantitative
measurement of these genes was used to create a database suitable for quantitative
modeling, identiﬁcation of parameters related to repressor activity, and analysis of
endogenous regulatory elements (Figure III-1E). We used endogenous activators and
repressors that are active in the blastoderm embryo. A convenient juxtaposition of
anterior-posteriorly expressed repressor proteins Giant, Kriippel or Knirps are
superimposed on the patterns derived from activators working on the dorsal-ventral axis
to generate readouts as shown in Figure III-1. This design permits the simultaneous
monitoring of repressed and unrepressed states in a single embryo. Twenty-seven P-
element based genes d into the Drosophila gerrnline to produce stably integrated lacZ
reporters. We tested multiple lines for each; position effects had some effect on overall
expression levels, but not on relative repression effectiveness. As described below,

activator signals are normalized before parameter estimation and modeling, removing this

116

Figure III-l: Transformation of DNA sequence and protein information by gene
modeling. (A) An enhancer with three repressors (red squares) and four activators (green
circles) is modeled, to generate the gene expression surface shown in (B). The axes
represent normalized activator, repressor and gene activity levels. (C) A Drosophila
embryo with Giant repressor (red stripes) and Dorsal activator (green) staining is shown.
Each embryo provides a diversity of potential inputs to the regulatory element: the white
arrow points to a region where activator levels are high and repressor levels are low. The
black arrow points to a region where both activator and repressor levels are low. The
white triangle points to a region where activator and repressor levels are both high, and
the black triangle points to a region where repressor levels are high and activator levels
are low. (D) Output of regulatory element shown in (A), which mirrors values from (C)
being mapped through surface shown in (B). (E) Formal scheme of data collection,

analysis and modeling.

117

Gene Activity

Repressor

118

 

Activator

 

Continued.

Figure III-1

 

119

Figure III-1: Continued.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Synthetic Gene
Construction and Model Construction
Data Collection
Imaging and Data Evaluating Parameter
Analysis Estimation Technique
l 1 i l
I Parameter Estimation l
Modelin ,
._ Endogenogus Modeling Alternative

 

 

Enhancers 399168803

 

 

 

 

120

potential source of variability. Based on previous studies, we knew that spacing between
activators and repressors would be a critical element to model, thus a series of genes (1-8,
Figure III-2) tested variable distances between Giant repressor binding sites and the
nearest Twist activator sites. As revealed by conventional in situ staining, repression
effectiveness was markedly attenuated by this increase in spacing. Genes for which the
most proximal binding site for Giant was located at least 81 bp from the nearest Twist
site failed to show any repression (genes 6-8, Figure III-2). A gene containing a single
Giant binding site adjacent to the Twist activators was weakly repressed, consistent with
earlier reports (Hewitt et al, 1999), and this repression was also found to be distance-
dependent (genes 9, 16).

Increasing the number of binding sites to three (genes 10, 17, 18) appeared to
generate an especially effective repression context, one that was similarly susceptible to
distance effects; at this level of resolution, it was not clear whether the distance function
is appreciably different with different numbers of repressors. We also tested the effect of
arranging the repressors in a distinct pattern so that some sites were located 3’ of the
activator cluster, adjacent to the Dorsal activator sites. In this way, we were able to test
whether overall stoichiometry of repressors to activators was the sole determinant of
repression effectiveness when binding sites are close to the activators. We noted that
different distributions of two or three sites appeared to yield similar results, whether all
sites were located 5’ of the activator cluster, proximal to the Twist activator sites, or with
some of the Giant repressor sites located 3’ of the activator cluster, adjacent to Dorsal
(genes 12, l4, 19). Insertion of a 340 bp neutral spacer sequence between the

transcription factor cluster and the basal promoter did not change the pattern of gene

121

Figure III-2: Structures of genes assayed to determine context dependence of short-
range repressor activity, and representative in situ images showing lacZ activity. Mid

blastoderm embryos are oriented dorsal up, anterior to the left.

122

. 2Gt.2Tw.ZDI

. 2Gt.25.2Tw.ZD|

. 2Gt.35.2Tw.ZD|

. 2Gt.50.2Tw.ZD|

. 2Gt.60.2Tw.ZD|

. 2Gt.75.2Tw.2D|

. 2Gt.100.2Tw.2Dl

. 2Gt.125.2Tw.ZDI

. 1G1.2TW.ZDI

. 3Gt.2Tw.ZD|

. 2DI.2TW.ZGt.2Tw.ZDI

. 2Gt.2Tw.2D|.1Gt

. 2Gt.2Tw.2Dl.1Gt.34O

1Gt.2Tw.ZD|.1Gt

123

r, r;
l

1

(fit t”

/

( l
l

I

Figure III-2: Continued.

15. 1Gt.2Tw.2D|.1Gt.340
16. 1Gt.25.2Tw.ZDl
17. 3Gt.50.2Tw.2D|
18. 3Gt.75.2Tw.ZD|
19. 1Gt.50.2Tw.2D.1Gt
20. 2Gt(af).2Tw.2Dl
21. 1Kr.2Tw.2D|

22. 2Kr.2Tw.2D|

23. 3Kr.2Tw.2D|

24. 1Kni.2Tw.2D|

25. 2Kni.2Tw.2D|

26. 3Kni.2Tw.2Dl

27. 2Dl.2Tw.2Kni.2Tw.2Dl

124

expression, suggesting that the repressor is not acting directly on the basal promoter in
this context (genes 12 vs. 13; 14 vs. 15). Most blastoderm enhancers characterized for
these regulatory proteins are located some distance from transcriptional start sites, thus
the distance independence of these modules mimics the activity of endogenous
enhancers. We furthermore tested the effect of increasing the number of activators
located in the vicinity of the repressors (genes 11, 27) and found that repression
effectiveness was little compromised in the case of Giant, but appeared to be attenuated
in the case of the weaker Knirps repression. Weaker binding sites for Giant produced
attenuated repression, as expected (gene 20). Finally, a series of genes with increasing
numbers of binding sites for Knirps and Kriippel allowed for direct comparison of
repressor effectiveness and effects of stoichiometry (genes 21-26); as noted for Giant,
more sites were generally more effective, but overall repression effectiveness of Knirps
was lower. This difference may be attributed to weaker binding sites, lower absolute
levels of the protein, or protein activity, as discussed below. The quantitative analysis of

these genes was followed by quantitative measurements, described below.

Image Processing and Data Analysis

To simplify modeling, we initially restricted our measurements to the regions of
the embryos containing peak levels of the Dorsal and Twist activators, which were
identiﬁed as ventral regions expressing >60% of peak lacZ levels. To identify gene
responses to varying repressor levels, we generated correlated Giant protein/lacZ mRNA
plots (Figure III-3). This step involved a series of image processing procedures, as

described in Ay et al (2008). We ﬁrst identiﬁed and subtracted non speciﬁc signals

125

(“outliers”) observed in the Giant channel, then identiﬁed and subtracted background
from each embryo. Background intensities for the lacZ and Giant channels were
subtracted using average values from regions lacking activators and repressors. Next, we
normalized the Giant channel for similarly aged and oriented embryos. The lacZ channel
was normalized using the average signal in a region deﬁned by 50-60% egg length of the
embryo (anterior-posterior) and peak to 60% of the peak (dorsal-ventral).

Our data set comprises expression data from 20 lacZ reporter genes regulated by
Giant, 3 lacZ reporter genes regulated by Kruppel and 4 lacZ reporter gene constructs
regulated by Knirps. Over 900 blastoderm embryos were quantiﬁed to aid in
pararneterization of repressor and lacZ expression. Images were processed as described
above and [lacZ] vs. [repressor] (Giant, Kri'rppel or Knirps) plots were created. The
relative levels of gene expression as a function of repressor protein were plotted for
individual images and compiled into composite plots (Figure III-3) (Ay et al,2008).
These plots were used to infer cis-regulatory rules by fractional occupancy models as

described below.

Fractional Occupancy Modeling

Fractional occupancy models of transcriptional regulatory regions enumerate all
possible ‘states’ of an enhancer based on potential transcription factor-DNA interactions,
and then calculate the probability of a gene ﬁring as the fraction of the ‘successful’ states,
i.e. those with activators bound, and without excessive interference by repressors (Bintu
et al, 2005a; Janssens et al, 2006; Zinzen et al, 2006; Segal et al, 2008). To capture the

key role of short-range repressors on activator elements, we used a modiﬁed fractional

126

site occupancy model that explicitly accounts for distances between activators and short-
range repressors, as well as cooperativity and binding affinity of short-range repressors.
We allow for change in repression with distance but make no a priori assumptions about
how the repression efﬁciency changes.

For a general description of our model, we employ three parameter types: S R’ a

repressor scaling factor, indicating the potency of the repressor, C, representing

cooperativity between repressor proteins binding to sites that are close together, and q ,

representing the distance-dependent “quenching” efﬁciency of the short-range repressors.
In genes assayed here, the activator binding sites do not vary; therefore additional
parameters representing activator potency or binding cooperativity are not required. A
more sophisticated general model incorporating these features is described below for
endogenous sequences.

To apply this model to one of our genes, ZGt.2Tw.2Dl (gene 1), we express

normalized activator and Giant repressor concentrations respectively as [A] and [Gt],

activator and Giant repressor scaling factors as S A and S R (which represent binding

afﬁnity and concentration scaling combined into one scaling factor) (1 S S A’ R S 100),
and cooperativity between Giant repressor proteins for binding to DNA as C

(0.1 S C S 100). Quenching, the distance-dependent repression efﬁciency, is represented

by ql and quor the two Giant repressors in this gene(0Sql,q2 S1). As derived in

Materials and Methods, the expression of this gene when fully bound by activators and

repressors will be:

127

Figure III-3: Representative [lacZ] vs. [Gt] plots.

(A) Structures of three genes assayed (1, 9 and 10). (B-C) Representative embryos
imaged for Giant protein and lacZ reporter gene activity. (D) The data from multiple
confocal embryo images was processed and compiled to provide normalized reporter

gene [lacZ] vs. normalized repressor [Gt].

128

_.

F

_.

[zoen ‘

 

 

Q (\l

 

N

[20911

N

[20911 ’

 

 

 

129

 

 

z SAIA] x1+(2—q1—q2)SR[Gt]+C(l—qlxl'qzstlG’llz
1+ SAIAI 1+ ZSR[Gt] + C(SRIGIDZ

Comparable expressions are generated for each of the genes (Appendix A Table 1).

Parameter Estimation

Parameter estimation is a critical step in implementation of modeling. Here we
infer relationships of regulatory factors from the transcriptional output of reporter genes,
which involves solving an inverse problem that may not have a unique solution. For this
reason, no parameter estimation technique works well for all problems, so the choice of
the parameter estimation technique is critical. In the transcriptional modeling literature,
however, this facet has not been explicitly treated by modelers, and the choice of the
parameter estimation technique is often not validated (Janssens et al, 2006; Zinzen et al,
2006; Segal et al, 2008). We tested three popular global parameter estimation techniques,
evolutionary strategy (Runarsson & Yao, 2005), genetic algorithm (The MathWorks,
Inc.) and simulated annealing (The MathWorks, Inc.) and found that for our model the

evolutionary strategy produced superior results as described in Appendix A Figures 1-4.

Testing/Implementing Nine Forms of the Model

To analyze the quantitative data obtained from the embryos, we built nine forms
of the model featuring increasing complexity in terms of number of parameters used; the
models differ in their treatment of cooperativity and quenching distance. In the simpler
case, a single parameter represents cooperativity between adjacent Giant repressor

binding sites, as well as the interaction of all three sites involved in genes 10, 17 and 18.

130

Alternatively, we also employed a more complex treatment in which adjacent sites are ﬁt

to Cl and sites separated by intervening Giant sites are ﬁt to C2. Similarly, quenching

efﬁciency parameters of repressors can be deﬁned either as unique parameters for each
distance or as parameters for a range of distances, as described in Materials and Methods.

We show a pictorial description of the parameter assignments for scheme 2, a
simpler form, in Table 111-1. Appendix A Table II provides a pictorial description of the
parameter assignments for all schemes.

We compared the nine schemes as explained in model validation section below.
As judged by the error comparison, schemes 1-4, 8 and 9 work better than schemes 5-7 in
this data set, probably due to the smaller number of parameters (Appendix A Figure 6).
Here for further analysis we showed the results of scheme 2. The results of the schemes
1, 3-6, 8 and 9 were comparable, suggesting that conclusions drawn from scheme 2 are

representative (Appendix A Figure 5).

Model Predictions

Previously identiﬁed qualitative relationships about quenching and
cooperativity/activity provide the backdrop for this work; the quantitative relationships
presented here constitute the heart of this study, obtained after modeling our quantitative
data set. It was striking that certain qualitative and quantitative insights became apparent
only after analysis of the complete data set; these were not relationships that would
necessarily be evident by inspection of individually stained embryos in Figure III-2. First,
our model predicts rather modest levels of Giant-Giant cooperativity, greater than simply

additive but lower than previous estimates (Figure III-4A) (Segal et al, 2008).

131

Second, previous qualitative observations show that the effect of short-range
repressors decreases with distance, and is lost around 100-150bp. To our knowledge, our
study is the ﬁrst that analyzes distance dependency of the short-range repressors
systematically. Short-range repressor quenching efﬁciency is represented by several
parameters in the model as described previously.

We noted a general decrease in quenching efficiency with distance, consistent
with previous qualitative observations, but at (52-55) bp, relative efﬁciency is predicted
to increase, before dropping off with greater distance (Figure III-4B). This trend was
evident for multiple formulations of the model (Appendix A Figure 5), and persisted
when we carried out parameter estimation with subsets of the data (see below), indicating
that the non-monotonic behavior reﬂects a real biochemical property of the Giant
repressor. The change in this monotonic behavior may be a reﬂection of speciﬁc phasing
effects, perhaps relating to nucleosomal structure. The non-monotonic decline in
repression effectiveness was an unexpected result of our modeling and contrary to the
simple step functions or linear functions used in previous modeling efforts (Janssens et
al, 2006; Zinzen et al, 2006). Note that the reduction in repression efﬁciency at ~30bp
does not imply that gene 3 (2Gt.35.2Tw.2Dl) should have weak repression, because this
gene has an additional more distal binding site that also contributes to activity through

quenching and cooperativity.

Third, the repressor quenching efﬁciency parameters are similar whether the
repressor was located adjacent to the Twist or to the Dorsal activator site, which suggests
that short-range repressors have similar effects on different activators (Figure 111-48).

The short-range repression mechanism appears to involve chromatin modiﬁcation, which

132

 

Figure III-4: Parameters found by the ES parameter estimation technique for

scheme 2 of the model.

(A) Root mean square error, E, is shown on the left, with corresponding scale shown on

the left axis. Repressor scaling factor R (referred to as S R in fractional occupancy model

in Materials and Methods) and cooperativity C are shown in the central and right portions
respectively, with scale shown on the right axis. (B) Quenching efﬁciency parameters are
shown for increasing distances of repressors located 5’ of the activators on the left.
Quenching efficiency levels relative to Twist proximal (T) sites and Dorsal proximal (D)
sites are shown in the right panel. A non-monotonic decrease in quenching efficiency for

increasing distances is observed.

A B

 

 

 

0.2; - 30 1
0.16: 24 0.3-
0.12} 5.18 0.61 - i

i o I f. 0 O . ’
0.08i "-12 0.4:

l i 5 ’
0.04! . . *6 0.2. .

2 0 ;

E R or C2 0 20 4o 60 80 TW DI

bp

133

may allow for more promiscuous action on many types of transcription factors, rather
than a mechanism based on speciﬁc contacts between repressor and activator (Li Li,
unpublished data). This activator insensitivity is consistent with the action of short—range
repressors on a range of enhancers that bind diverse transcriptional enhancers (Gray et al,
1994; Kulkarni & Amosti, 2005). Parameters identiﬁed in this study are therefore likely
to be generally applicable to diverse settings.

We tested whether the nonlinear quenching is critical to obtaining reasonable
parameters by repeating our procedure with a constraint that required a monotonic
decrease for quenching efﬁciency. As shown in Figure III-5, this constraint produced
parameter sets that predicted repressor quenching efﬁciency would remain almost
constant between 6bp-77bp, which is not supported by this or previous studies. For
example, the 35bp increase in spacing between gene 2 and gene 5 has a measurable
effect. Therefore the non-monotonic decrease in quenching efﬁciency is likely to indicate
some actual biological property of the repressors and should be validated experimentally.

The recent fractional occupancy modeling of 44 endogenous Drosophila
enhancers identiﬁed potential cooperativity values that were somewhat greater than those
found here. We ran our parameter estimation algorithm with ﬁxed Giant cooperativity
values found in Segal et al (2008) and estimated the remaining parameter values in our
model. We observed that although the main conclusions of our study did not change, the
overall ﬁtting was slightly worse (Figure III-6). We extended this analysis by running our
parameter estimation algorithm with eight more choices of Giant cooperativity values.

Although we tested Giant cooperativity values ranging from 0 to 30, the root mean square

134

Table III-l: Parameter descriptions for Scheme 2.
In the ﬁrst column, 12 synthetic enhancers used for parameter estimation in this study are
listed. In the second column, parameter selections are shown. In the third column

structure of the synthetic enhancers are depicted.

135

 

Parameter

 

 

 

 

 

 

 

 

 

Gene Assignments Gene Structure

1. 2Gt.2Tw.ZDI Q1 =q1

QZiZ
2. 2Gt.25.2Tw.2Dl or=q2 Ct Q1

02=q3 Ar '\
3. 2Gt.35.2Tw.2Dl Q1 =q2
4. 2Gt.50.2Tw.2DI or =q3 02

02=q5
5. 2Gt.60.2Tw.2D| Q1 =q4

02:0
9. 1Gt.2Tw.2Dl Q1 =q1 Q1

_InmE“

16. 1Gt.25.2Tw.2Dl Q1=q2

QT =q1
10. 3Gt.2Tw.2Dl 02=q2

03=q3 C1 033

CM, m
17. 3Gt.50.2Tw.2Dl 02=q5

03:0 02

 

Q1=q1 C1 01 Q3

12. 2Gt.2Tw.ZDl.1 Gt Q2=q2 W’V‘
03=q6

 

v
Q2
Qt =q1
14. 1Gt.2Tw.2D|.1Gt Q2=q6
01 Q2

 

19. 1Gt.50.2Tw.2DI.1Gt Q2=q6

 

 

 

 

 

136

Figure III-5: Parameters for scheme 2 with the constraint that quenching efficiency

parameters decrease monotonically.

(A) Root mean square error E, repressor scaling factor R, and cooperativity C labeled as
in Figure 111-4. (E) Quenching efficiency parameters and relative quenching of Dorsal
and Twist sites. Under this constraint, the level of quenching efficiency changes very

little from 28-66bp, in contrast to observed trends (Figure III-2).

A B

 

 

 

0.2; > . 1 30 1
0.16; 24 0.3-
0.12i 18 0.6 ' .
I .o o .
0.08; . '12 0.4: o o o
F O
0.04i . ° 6 0.2.
l o
0 E R 01 C2 0 0 20 4o”‘e‘o' a‘o‘m‘ or

bp

137

Figure III-6: Parameters for scheme 2 with cooperativity parameters set to different
levels. (A, B) Parameters found in our study (circles) and parameters found by constraint
of cooperativity parameters to those from in Segal et al (2008) (diamonds). The increased
cooperativity value is compensated by a decreased repressor scaling factor R. (C) Root
mean square errors (RMSE) for cooperativity parameters (constrained to values between
0-30). Estimated cooperativity values from our model lie near the lowest point in this

curve.

138

 

 

 

 

 

 

 

 

 

 

 

0.2 30
0.16- 24
0.12- -18
0.08- I 12
0.04' 0 ’ g ‘6

O E F1 C1 C2 0
B

1
on
0.6- ‘ ~

0 . . 0
0.1
9

0.2 .

O r r i . i L

o 20 4o 60 80 Tle

bp

139

Figure III-6: Continued.

C

o

RMSE

o“.
o

 

 

”36 i a 1zieio21121132
Cooperativity

140

 

errors between predicted and observed values did not change drastically, with minimum

at cooperativity value 3.

Model Validation

The analysis described above involved identifying parameters using all data
available. An important question is whether such values are “over ﬁt”, and whether the
model and parameter estimation technique are robust, i.e. relatively insensitive to
contributions of individual portions of the data set. Robustness of the parameter
estimation technique was described above; here we
assess the model’s effectiveness at predicting subsets of the data. We tested whether
parameter estimation was markedly affected by removal of individual genes from the data
set (“leave-one-out” analysis) (Figure III-7A). We employed nine different forms of the
model to evaluate the effects including different assumptions of cooperativity and
quenching. We calculated the average of twelve leave-one-out prediction root mean
square errors for each scheme, and used these error values for comparison of schemes
(Pizarro et al, 2000). As judged by the error comparison, schemes 1-4, 8 and 9 work
better than schemes 5-7 in this data set, probably due to the smaller number of parameters
(Appendix A Figure 6). Leave-one-out analysis was extended by excluding nine separate,
speciﬁc groups of genes that share structural properties (Figure III-7B). The sets used for
this analysis are described in Table 111-2. The results of excluding individual genes or sets
of related genes suggest that genes that depend on fewer parameters, such as
th.2Tw.2Dl (gene 9), which has no contribution by repressor-repressor cooperativity,

may not be predicted well in our analysis. Thus, the contributions of certain classes of

141

 

gene can be great. Parameters found by leave-one-out analysis did not change much, but
the parameters found by leaving out speciﬁc sets of genes changed depending on the
genes chosen (Figure III-7A and B). The predictions for genes 1, 10 and 12 by the
parameters estimated from set 8, which excludes genes 1, 10 and 12, are shown in Figure
III-7C. We conclude that the set of gene modules tested here adequately sample enhancer
design to identify critical elements for repressor activity in a robust manner.

Each embryo, with its thousands of imaged nuclei representing different
levels of transcription factors, provides a matrix of input and output values that
should in theory suffice to describe the response of a gene construct. However,
variations in embryo age, staining, and orientation necessitate multiple images for
each gene. We obtained between 30 and 53 good quality images for each gene used
in our parameter estimation. To test whether this data set is sufficient, or additional
individual images would signiﬁcantly change the conclusions reached, we sampled
randomly 50% or 75% of the images from each reporter gene construct, and
repeated the parameter estimation. Reducing the data set by one quarter or even
one half does not change the value of estimated parameters drastically or the main
conclusions of the paper (Figure III-7D). This result suggests that our data set is
sufﬁciently complete, allowing us to draw signiﬁcant conclusions. In contrast, as
shown above, decreasing the number of genes rather than just the number of images

obtained for each gene, can affect our results drastically (Figure III-7A and B).

Extension of the model to other repressors and endogenous regulatory elements

142

Our modeling focused on repression mediated by Giant, which possesses
quenching properties similar to those of Snail, Kruppel, and Knirps (Gray et al, 1994;
Hewitt et al, 1999; Kulkarni & Arnosti 2005). To extend these ﬁndings to other short-
range repressors, Kruppel and Knirps were tested in parallel genes containing one, two,
or three binding sites (genes 21-26). As was evident from qualitative staining, both
proteins mediated repression, but Kruppel appeared to be a more effective repressor in
terms of completeness of reduction of lacZ activity. We measured Knirps or Kruppel
protein levels with antibodies as was done with Giant, and created [lacZ] vs. [repressor]
plots for parameter ﬁtting. The limited number of genes tested for these factors did not
exhaustively explore possible architectural features, thus making it difﬁcult to
differentiate effects of spacing, cooperativity, and relative activity. We judged distance
parameters most likely to be conserved between these different factors, based on
previously tested genes; therefore the modeling was carried out using quenching
parameters from Giant (Gray et al, 1994; Amosti et al, 1996a). Modeling was performed
to identify likely scaling factors and cooperativity constants. Using the same form of the
model used for Figure III-4, we found that cooperativity parameters were low (e.g.
Krtippel = 2; Knirps = 0.67), similar to those observed for Giant (Figure A-S; Table A-
V). The major difference between Krtippel and Knirps was the repressor scaling factor,
which was low in the case of Knirps (~l .4), and more robust for Krﬁppel (~30), similar to
that of Giant (~14). Differences in repression efﬁciency may be attributed to distinct
levels of cooperativity, but the model suggests that such homotypic interactions are of
minor importance. This prediction suggests that the higher effectiveness of Kriippel is

likely to due to greater potency of this protein on a molar basis, a more complete

143

occupancy of the binding sites due to their higher affinity, or higher concentrations of the
repressor. Further analysis will be required to separate these effects.

Dorsal and Twist activators were studied previously in the context of the
rhomboid (rho) neuroectoderrnal enhancer (NEE), where their activity was used to
identify properties of short-range repressors, including Snail. This protein is required to
block expression of rho in the mesoderm, resulting in two lateral stripes of expression in
the presumptive neuroectoderrn of the blastoderm embryo (Figure III-8A). Four Snail
binding sites are located within the 330 bp minimal NEE enhancer, and loss of these sites
strongly attenuates repression, permitting expression in the mesoderm (Gray et al, 1994).
A single Snail site (#2) is sufﬁcient to mediate repression, and similar repression is
effected by ectopic Snail, Kriippel, or Knirps sites introduced 5’ and 3’ of the Dorsal l
and 4 sites respectively, or even a single Snail site 3’ of the Dorsal 4 site (Figure III-8A)
(Gray et al, 1994; Amosti et al, 1996a).

As an extension of our analysis, we tested quenching parameters produced from
our model on this element, and carried out parameter estimation to determine values of
cooperativity and scaling factors. This modeling is more complex than that employed for
genes in Figure III-2, because we now consider scaling factors for each transcription
factor, not just for the repressor, and binding sites of different qualities are considered.
Position weight matrix (PWM) information was used to score Dorsal, Twist, and Snail
sites within the rho NEE. In addition, we consider cooperativity not just between
repressor sites, but also between activator sites, both of heterotypic (Dorsal-Twist) and

homotypic (Twist-Twist) nature. A further consideration is that information about these

144

Table III-2: Functionally grouped sets of gene constructs, used for leave-sets-out

analysis shown in Figure III-7B.

 

Set Excluded Genes
#

 

1 Genes with one or three Giant binding sites (9, 10, 12, 16 and
17)

 

2 Stoichiometry Genes (1, 9 and 10)

 

3 Genes with adjacent Giant binding sites in both 5’ and 3’ end of
activators (12, 14 and 19)

 

4 Genes with only one Giant binding site (9 and 16)

 

5 Genes with exactly three Giant binding sites (10, 12 and 17)

 

6 Genes with one Giant binding site at 5’ end of activators (9, 14,
16 and 19)

 

7 Genes with one Giant binding site adjacent to the 5’ end of
activators (9 and 14)

 

8 Genes with at least two Giant binding sites adjacent to the 5’ end
of activators (1, 10 and 12)

 

9 Genes with three Giant binding sites adjacent to the 5’ end of
activators (10 and 17)

 

 

 

 

145

Figure III-7: Validation of modeling by prediction of subsets of the data from

parameters derived from the remainder of the data.

(A) Leave-one-out analysis. Root mean square errors are calculated using parameters
found by 11 genes excepting the genes indicated, and all the genes. Relative RMSE
ratios, indicating greater errors for prediction of genes 2, 9 and 16, indicating their greater
contribution to the parameter constraints. (B) Leave-sets-out analysis for nine distinct sets
of genes deﬁned by their shared properties (Table 111-2). Root mean square errors are
calculated using parameters found from the reduced set and the entire set. Relative RMSE
ratios, indicating greater errors for prediction of sets 1, 2 and 4, indicating their greater
contribution to the parameter constraints. (C) Predictions for leaving out set 8. Gene 1, 10
and 12 are predicted by using parameters found from other 9 genes. Points represent
average values for [lacZ] vs. [Gt] data which was divided into 20 bins. (D) Parameter
estimation results are shown for different amounts of data 50%, 75% and 100%. The data

is cut randomly from each gene at the same percentage.

146

 

8*: _._--,s_q. ,__S. -_ﬁ._2__-,_,,21._,__--_- _Mm.--
.
.

.z. . .L.li ll iiill..lrl

4
03mm

 

3 2

mmim

1

w

m
L9
.1

_
2.7
.1
3
1
4
1
2
1
0
1

9
Construct

4

-_._ _ —_ “vama—Jm Wm-m~_.-..._ -.- ._- __.-

ll 7111123

 

4
03mm

tl.fri..iol.1.. I

3 :1

mmim

147

Figure 111-7: Continued.

 

 

C
Construct 1
1
KI'
0
g 0.5 ,
0.5
[Gt]
Construct 1 0
1.
a: 0.5
0.5
£th
Construct 12
1
ca 0.5

148

Figure III-7: Continued.

 

f

I
' I
I
i
F
I

.0

o
1

v

o I t
. 1

1

I
l
.

 

WE“ R

 

0 "W56 " 40 so

149

 

.. .. __ _..L_.__.____.__..A-- _ .-..

.I

-AM%~MM~30

24

18

 

312

 

 

c1

80

 

O
”#0 on

fw r51

rho NEE variants is qualitative; a single Snail site can repress but may not be as effective
as four Snail binding sites.

Simultaneous parameter estimation was carried out using the forms of the rho
NEE shown in Figure III-8A. We estimated levels of mesoderrnal repression to be greater
than 90% for the endogenous gene, 70-90% for genes carrying one Snail #2 binding site
or two ectopic binding sites located 5’ and 3’ of the element, and 50-70% for one Snail
site located 3’ of Dorsal #4. Evolutionary strategy parameter estimation was performed
multiple times to identify parameters for cooperativity and scaling factors, as well as the
predicted effect on expression within ranges speciﬁed above. Several striking outcomes
were evident from this exercise; ﬁrst, to ﬁnd optimal values, the model consistently
predicts that the wild-type rho NEE, containing four Snail sites, will have output at the
lowest end of the allowed range, close to zero, while the internal Snail site #2, or the two
ectopic ﬂanking Snail sites, generate values close to the'bottom of the allowed range, at
about 10% residual activity (Figure III-8A). The single ectopic Snail site 3’ of Dorsal #4
is predicted to mediate repression in the middle of the allowable range, about 40%
residual activity, consistent with published images (Gray et al, 1994). The scaling factor
for Dorsal (i.e. its overall activity) is considerably lower than that predicted for Twist,
while the scaling factor for Snail is similar to those of Kruppel and Giant (Figure III-8B).
Dorsal-Twist cooperativity values vary considerably, with DorsalZ—Twistl cooperativity
predicted to be lower than Twist2-Dorsal3, consistent with the closer spacing of the latter
two factors (Crocker et al, 2008). Twist-Twist cooperativity is also predicted to be high.
These relative differences in activator scaling factors and cooperativity values support

known features of the rho NEE; the low scaling factors for Dorsal sites are consistent

150

Figure III-8: Extension of the model to endogenous regulatory elements.

(A) The rhomboid gene is expressed in the blastoderm embryo in two lateral stripes (one
shown in focal plane), under control of the Dorsal and Twist activators. Ventral
expression is inhibited by the Snail short-range repressor, which is expressed in the
presumptive mesoderm. The cis-regulatory modules used for analysis are shown.
Different forms of rhomboid NEE enhancer are depicted, with varying number and
arrangements of Snail short-range repressor binding sites. Dorsal and Twist activators are
shown by large and small green circles respectively, and Snail repressors are shown by
red squares. On the right are the predicted repression levels caused by Snail binding sites
shown in each module based on parameter estimation using this group of enhancers. (B)
Predicted parameters for sealing factors for each transcription factor and cooperativity.

Average and standard deviation for twenty estimation runs are shown.

151

 

Scaling Factors
Dorsal: 1.2 :t 0.13
Twist : 75 i 18
Snail :54 i 6.5

152

MH

Cooperativity
Dorsal2-Twist1 :7 1- 1.3
Dorsal3-Twist2 : 74 :1: 16
Twist1-Twist2 :69 i 22
SnailZ-SnaiB :65 i 20

Mesodennal
Evasion

0.16%:0.01%

119610.996

11%iO.9%

41%il%

with the inability of individual Dorsal sites to mediate robust activation (Ip et al, 1992) ;
but in combination with Twist sites they add considerably to the output of the enhancer.
A single repressor binding site that is not close to most of the activator sites would in this
model still be able to impair enhancer function by initiating a chain-reaction collapse of
cooperative interactions. The native rho NEE does not appear to rely solely on this
mechanism, as most of the identiﬁed activator sites lie within a short distance of one of
the four Snail sites, suggesting a “belt-and-suspenders” redundant approach to repression.
It will be interesting to survey the entire set of enhancers targeted by short-range

repressors to determine if this feature is consistently observed in most elements.

DISCUSSION

In the last twenty years, essential features of the biochemistry of gene regulation
have come into focus, serving to highlight the considerable complexity and multifarious
activities of such cis-regulatory elements. We still lack a comprehensive picture of how a
transcriptional enhancer operates however. Quantitative models, based on aspects of the
system that are readily quantiﬁable, such as primary DNA structure of a regulatory
region, quantities of regulatory proteins, and transcript levels, appear to offer an
alternative route to learn about key features of regulatory systems. When combined with
biochemical and genomic information, such models may provide the “bridge” that will
allow deeper understanding of the function and evolution of cis regulatory elements,
which are the nexus of many biological processes.

In this study, by employing a reductionist analysis of short-range repression, we

sought to explore a relatively untouched, yet central aspect of gene regulation in

153

 

Drosophila. Previous qualitative studies highlighted the extreme distance-dependence of
short-range repressors, and comparative analysis has shown many instances of
evolutionary plasticity of regulatory regions controlled by these proteins (Gray et al,
1994; Ludwig & Kreitman, 1995; Hewitt et al, 1999; Hare et al, 2008). Knowing that
transcription factors inﬂuence each other in a local fashion permitted the identiﬁcation of
novel enhancers, based on the clustering of binding sites (Berman et al, 2002; Schroeder
et al, 2004). Yet these studies alone do not provide the basis for predicting evolutionary
changes that reshape transcriptional output, or predicting comparative activity of
enhancers controlled by similar groups of transcription factors. For example, the original
hypothesis that the affinity and or number of Bicoid binding sites dictates the output of
regulated genes has been replaced by an understanding that other, as—yet unknown
features, appear to play more decisive roles (Driever et al, 1989; Gao et al, 1996; Ochoa-
Espinosa et al, 2009).

Previous modeling studies focused on endogenous enhancers, which have
complex arrangements of transcription factor binding sites, thus even relatively subtle
changes made to these elements potentially inﬂuence a number of factors. This
complexity required an oversimpliﬁed treatment of quenching (arbitrary assumption of
linear decreases) and cooperativity (only homotypic interactions considered) (Janssens et
al, 2006; Segal et al, 2008). Alternatively, one study considered simpliﬁed regulatory
elements, but only modeled them in silico, without the experimental treatment we use
here (Zinzen et al, 2006). It is not clear that the earlier studies are sufﬁciently robust to
predict why extensively reshufﬂed enhancer sequences may retain similar function in

some instances, or show quantitatively distinct outputs in others (Ludwig et a1, 2005;

154

 

Crocker et al, 2008). Our studies are by their design shaped to detect quantitative
differences resulting from minor differences in repressor-activator spacing, for example,
and should be useful for such evolutionary studies. We limited the number of features
that differed between elements, allowing modeling with a tractable number of
parameters. We utilized a common block of Dorsal and Twist activator sites, allowing us
to focus on changes made in the number and arrangement of repressor sites; clearly,
differences in affinity, number and arrangement of activator sites also play decisive roles
in dictating transcriptional output, thus future modeling efforts will need to integrate
these elements as well. The tight focus on short-range repressors with the analysis of a
relatively small number of reporter genes provided sufficient data for robust estimation of
key parameters (Figure III—7). From our comparison of repression by other short-range
repressors, it is likely that the analysis of Giant can guide studies of other similarly-acting
repressors, including Kruppel, Knirps and Snail (Figure III-8).

Relating to transcriptional grammar, our study uncovered speciﬁc quantitative
features that appear to apply to short-range repressors in a general context. We identiﬁed
a complex nonlinear quenching relationship that suggests that within the range of
activity, Giant, and probably other short-range repressors, have an optimum distance of
action that may reﬂect steric constraints (Figure III-4). Multiple formulations of the
model generated very similar predictions, suggesting that this nonlinear distance function
is a real feature of the system (Appendix A Figure III-5). Consistent with this notion, a
previous study of transcription factor binding sites in Drosophila enhancers discovered

an overall preference of Kruppel sites to be found 17 bp from Bicoid activator sites,

155

which may be an indication that other short-range repressors also have preferred
distances for optimal activity (Makeev et a1, 2003).

The similar quenching efﬁciencies for repressors acting adjacent to Dorsal or
Twist activator sites was an additional signiﬁcant ﬁnding (Figure III-4). The similar
effect on disparate activator proteins indicates that the effects of short-range repression
are general, and are likely to be translatable to distinct contexts. Previous empirical tests
had already pointed in this direction; for example, insertion of ectopic binding sites for
Knirps and Kruppel into rho NEE sequences is sufﬁcient to induce repression, although
these proteins do not usually cross regulate (Gray et al, 1994; Amosti et al, 1996a). In
addition, systematic comparisons suggest that short-range repressors can counteract a
variety of transcriptional activation domains with similar efﬁciency, suggesting that
speciﬁc protein-protein contacts are not essential (Amosti et al, 1996b; Kulkarni &
Amosti 2005). This ﬂexibility plays well into a computational understanding of
transcriptional information in the ﬂy, because it is likely that the quantitative correlations
we observe in speciﬁc genes tested here are general properties of the repressors. The
application of our modeling to the rho NEE indicates that certain features can be directly
applied to more general contexts (Figure III—8).

In one area we found quantitative differences between parameters derived from
the synthetic gene modules and the endogenous regulatory regions. The importance of
homotypic cooperativity predicted for Snail sites in the context of the rho NEE was
overall much higher than that found for Giant, Kruppel and Knirps sites acting on the
synthetic gene constructs; this might be an example where the individual proteins do

exhibit different context dependencies perhaps because the proteins differ in level of

156

“stickiness”. Alternatively, the distance between the Snail sites in question, 23 bp, might
facilitate cooperative interactions much more than the closely apposed spacing used in
our genes, where steric interference may play an opposing role.

In revisiting previous incarnations of the endogenous rho NEE, we highlighted
several features of the architecture of this regulatory region. This enhancer appears to be
“over-engineered” in terms of the use of Snail to mediate repression; based on previous
experiments, it appears that even a single Snail site is sufﬁcient to mediate repression
(Gray et al, 1994). Consistent with this view, our short-range repressor parameters would
predict that for many values of the Snail repressor scaling factor, the enhancer should be
effectively inhibited by the set of endogenous binding sites (Figure III-8). How might we
understand this apparent redundancy? Such design may provide the correct dynamical
response, with a swift repression of rho at an early enough time where Snail levels are
still low, or it may ensure that gene output is robust to environmental and genetic noise.
To carry out this modeling, we made explicit assumptions about the relative repression
effectiveness of different conﬁgurations of this enhancer, containing one, two or four
Snail sites. These different thresholds of repression seem to be warranted, but it would be
clearly desirable to have quantitative values for the relative expression, similar to those
obtained for our synthetic gene modules, to better model the activity of these enhancers.

The rho NEE modeling also highlighted features of transcriptional activators.
Activator scaling factors for Dorsal were reproducibly lower than those of Twist, and this
was apparent for several different assumptions of expression level (Figure 111-8 and data
not shown). The relative differences in contribution to activation can be explained by

examination of the structure of the enhancer; contribution by the low intrinsic values of

157

Dorsal are ampliﬁed by strong cooperativity with Twist, setting up a chain of interacting
weak sites that together are highly active. Experimental evidence bears out these
conclusions; isolated Dorsal sites tested on reporter genes mediate relatively weak
activation, and a rho NEE lacking Twist sites, but containing four Dorsal sites, is
similarly compromised (Ip et al, 1992; Szymanski & Levine, 1995).

Our earlier study have suggested that many developmental enhancers, including
those regulated by short-range repressors, may possess a ﬂexible “billboard” design, in
which individual factors or small groups of proteins would independently communicate
with the promoter region, so that the net output of an enhancer would reﬂect the
cumulative set of contacts over a short time period (Kulkarni & Amosti, 2003). Such a
view of enhancers would account for the evolutionary plasticity observed in regulatory
sequences. No DNA-scaffolded superstructure, reﬂecting the formation of a unique three
dimensional complex, would be necessary in this scenario. Yet our modeling suggests
that the rho NEE might involve communication between relatively distant binding sites,
through sets of cooperative interactions. In this case, it is possible that such distant
interactions might be compatible with a ﬂexible structure, if many distinct conﬁgurations
of binding sites provide such a cooperative network. Current studies have indeed
highlighted potential frameworks involving Dorsal and interacting factors on same
classes of enhancer (Erives & Levine, 2004; Papatsenko & Levine, 2007). Application of
a transcriptional grammar integrating activities of activators and repressors is a critical

next step to illuminate enhancer design and evolution.

158

REFERENCES

Ackers GK, Johnson AD, Shea MA (1982) Quantitative model for gene regulation by )1
phage repressor. Proc Natl Acad Sci USA 79: 1129-1133

Amosti DN, Gray S, Barolo S, Zhou J, Levine M (1996a) The gap protein knirps
mediates both quenching and direct repression in the Drosophila embryo. EMBO J 15:
3659-3666

Amosti DN, Barolo S, Levine M, Small S (1996b) The eve stripe 2 enhancer employs
multiple modes of transcriptional synergy. Development 122: 205-214

Ay A, Fakhouri WD, Chiu C, Amosti DN (2008) Image processing and analysis for
quantifying gene expression from early Drosophila embryos. Tissue Eng Part A 14:
1517-1526

Berrnan BP, Nibu Y, Pfeiffer BD, Tomancak P, Celniker SE, Levine M, Rubin GM,
Eisen MB (2002) Exploiting transcription factor binding site clustering to identify cis-
regulatory modules involved in pattern formation in the Drosophila genome. Proc Natl
Acad Sci USA 2: 757-762

Bintu L, Buchler NE, Garcia HG, Gerland U, Hwa T, Kondev J, Phillips R (2005a)
Transcriptional regulation by the numbers: models. Curr Opin Genet Dev 15: 116-124

Bintu L, Buchler NE, Garcia HG, Gerland U, Hwa T, Kondev J, Kuhlman T, Phillips R
(2005b) Transcriptional regulation by the numbers: applications. Curr Opin Genet Dev
15: 125-135

Carroll SB, Grenier JK, Weatherbee SD (2001) From DNA to Diversity. Blackwell
Science, Malden, Massachusetts, USA

Clyde DE, Corado MS, Wu X, Paré A, Papatsenko D, Small S (2003) A self-organizing
system of repressor gradients establishes segmental complexity in Drosophila. Nature
426: 849-853

Crocker J, Tarnori Y, Erives A (2008) Evolution acts on enhancer organization to ﬁne-
tune gradient threshold readouts. PLoS Biol 6: e263

Driever W, Thoma G, Nﬁsslein-Volhard C (1989) Determination of spatial domains of
zygotic gene expression in the Drosophila embryo by the afﬁnity of binding sites for the
bicoid morphogen. Nature 340: 363-367

Erives A, Levine M (2004) Coordinate enhancers share common organizational features
in the Drosophila genome. Proc Natl Acad Sci USA 101: 3851-3856

159

 

Fomekong-Nanfack Y, Kaandorp JA, Blom J (2007) Efficient parameter estimation for
spatio-temporal models of pattern formation: case study of Drosophila melanogaster.
Bioinformatics 23: 3356-3363

Gao Q, Wang Y, F inkelstein R (1996) Orthodenticle regulation during embryonic head
development in Drosophila. Mech Dev 56: 3-15

Gray S, Szymanski P, Levine M (1994) Short-range repression permits multiple
enhancers to function autonomously within a complex promoter. Genes Dev 8: 1829-
1838

Hare EE, Peterson BK, Iyer VN, Meier R, Eisen MB (2008) Sepsid even-skipped
enhancers are functionally conserved in Drosophila despite lack of sequence
conservation. PLoS Genet 4: e1000106

Hewitt GF, Strunk BS, Margulies C, Priputin T, Wang XD, Amey R, Pabst BA, Kosman
D, Reinitz J, Amosti DN (1999) Transcriptional repression by the Drosophila giant
protein: cis element positioning provides an alternative means of interpreting an effector
gradient. Development 126: 1201-1210

Ip YT, Park RE, Kosman D, Bier E, Levine M (1992) The dorsal gradient morphogen
regulates stripes of rhomboid expression in the presumptive neuroectoderrn of the
Drosophila embryo. Genes Dev 6: 1728-1739

Jaeger J, Surkova S, Blagov M, Janssens H, Kosman D, Kozlov KN, Manu, Myasnikova
E, Vanario-Alonso CE, Samsonova M, Sharp DH, Reinitz J (2004) Dynamic control of
positional information in the early Drosophila blastoderm. Nature 430: 368-371

Janssens H, Hou S, Jaeger J, Kim AR, Myasnikova E, Sharp D, Reinitz J (2006)
Quantitative and predictive model of transcriptional control of the Drosophila
melanogaster even-skipped gene. Nat Genet 38: 1159-1165

Kulkarni MM, Amosti DN (2003) Information display by transcriptional enhancers.
Development 130: 6569-6575

Kulkarni MM, Amosti DN (2005) Cis-regulatory logic of short-range transcriptional
repression in Drosophila. Mol Cell Biol 9: 3411-3420

Ludwig MZ, Kreitman M (1995) Evolutionary dynamics of the enhancer region of even-
skipped in Drosophila. Mol Biol Evol 12: 1002-1011

Ludwig MZ, Palsson A, Alekseeva E, Bergman CM, Nathan J, Kreitman M (2005)
Functional evolution of a cis-regulatory module. PLoS Biol 3: e93

160

 

Makeev VJ, Lifanov AP, Nazina AG, Papatsenko DA (2003) Distance preferences in the
arrangement of binding motifs and hierarchical levels in organization of transcription
regulatory information. Nucleic Acids Res 31: 6016-6026

Mayo AE, Setty Y, Shavit S, Zaslaver A, Alon U (2006) Plasticity of the cis-regulatory
input function of a gene. PLoS Biol 4: e45

Mendes P, Kell D (1998) Non-linear optimization of biochemical pathways: applications
to metabolic engineering and parameter estimation. Bioinformatics 14: 869-883

Moles CG, Mendes P, Banga JR (2003) Parameter estimation in biochemical pathways: a
comparison of global optimization methods. Genome Res 13: 2467-2474

Ochoa-Espinosa A, Yu D, Tsirigos A, Strufﬁ P, Small S (2009) Anterior-posterior
positional information in the absence of a strong Bicoid gradient. Proc Natl Acad Sci
USA 106: 3823-3828

Papatsenko D, Levine M (2007) A rationale for the enhanceosome and other
evolutionarily constrained enhancers. Curr Biol 17: 95 5-957

Pizarro J, Guerrero E, Galindo, PL (2000) A statistical model selection strategy applied to
neural networks. Proceedings of the ESANN 2000: 55-60

Runarsson TP, Yao X (2005) Search Biases in Constrained Evolutionary Optimization.
IEEE Transactions on Systems, Man and Cybernetics Part C 35: 233-243

Sénchez L, Thieffry D (2001) A logical analysis of the Drosophila gap-gene system. J
Theor Biol 211: 115-141

Schroeder MD, Pearce M, Fak J, Fan H, Unnerstall U, Emberly E, Rajewsky N, Siggia
ED, Gaul U (2004) Transcriptional control in the segmentation gene network of
Drosophila. PLoS Biol 2: e271

Segal E, Raveh-Sadka T, Schroeder M, Unnerstall U, Gaul U (2008) Predicting
expression patterns from regulatory sequence in Drosophila segmentation. Nature 451:

535-540

Setty Y, Mayo AE, Surette MG, Alon U (2003) Detailed map of a cis-regulatory input
function. Proc Natl Acad Sci USA 100: 7702-7707

Shea MA, Ackers GK (1985) The OR control system of bacteriophage lambda. A
physical-chemical model for gene regulation. J Mol Biol 181: 211—230

Small S, Amosti DN, Levine M (1993) Spacing ensures autonomous expression of
different stripe enhancers in the even-skipped promoter. Development 119: 762-772

161

E.-

 

Stormo, GD (2000) DNA binding sites: representation and discovery. Bioinformatics 16:
16-23

Szymanski P, Levine M (1995) Multiple modes of dorsal-bHLH transcriptional synergy
in the Drosophila embryo. EMBO J 14: 2229-2238

Vilar JM, Leibler S (2003) DNA looping and physical constraints on transcription
regulation. J Mol Biol 331: 981-989

Von Hippel PH, Revzin A, Gross CA, Wang AC (1974) Non-speciﬁc DNA binding of
genome regulating proteins as a biological control mechanism: 1. The lac operon:

equilibrium aspects. Proc Natl Acad Sci USA 71: 4808-4812

Wharton KA Jr, Crews ST (1993) CNS midline enhancers of the Drosophila slit and Toll
genes. Mech Dev 40: 141-154

Zinzen RP, Senger K, Levine M, Papatsenko D (2006) Computational models for
neurogenic gene expression in the Drosophila embryo. Curr Biol 16: 1358-1365

Zinzen RP, Papatsenko D (2007) Enhancer responses to similarly distributed antagonistic
gradients in development. PLoS Comput Biol 3: e84

162

Chapter 4
Conclusions and Future Directions
CONCLUSION

Analysis of gene networks and genomic cis-regulatory elements holds the key to
understanding how genomes encode the properties of organisms. However, despite
intensive research, our understanding of the design principles underlying complex genetic
switches is currently at a rudimentary level. In the last twenty years, essential features of
the biochemistry of gene regulation have come into focus. For the purposes of developing
a comprehensive picture of how a transcriptional enhancer operates, however, such
studies have served more to highlight the considerable complexity and multifarious
activities of such cis regulatory elements, rather than to provide a quantitative basis for
understanding their activity. Quantitative models, based on aspects of the system that are
readily quantiﬁable, such as primary DNA structure of a regulatory region, quantities of
regulatory proteins, and transcript levels, appear to offer an alternative route to learn
about key features of regulatory systems. When combined with biochemical and
genomic information, such models may provide the “bridge” that will allow deeper
understanding of the function and evolution of cis regulatory elements, which are the
nexus of many biological processes.

In this work, we describe studies including image processing, data analysis and
mathematical modeling that shed light on how transcriptional regulatory information
encoded within the cis-regulatory enhancer sequence is interpreted by the cellular
machinery. In the image processing and data analysis section (Chapter 2) we describe a

method to correlate the transcription factors and mRNA output for gene modeling

163

 

purposes. We show that the levels of lacZ mRNA, and potentially the transcriptional
repressor protein Giant, are proportional to ﬂuorescent intensities, a critical basis for
quantitative modeling. Our analysis also reveals that a suggested parabolic form of the
background ﬂuorescence in confocal images of early Drosophila embryos is evident most
prominently in ﬂattened specimens, with intact embryos exhibiting a more linear
background. After appropriate background subtraction and normalization, these data are
amenable to representation of gene regulatory surfaces that permit creation and validation
of quantitative models of gene expression. In this way, we have constructed the
foundations for modeling a cis-regulatory ‘grarnmar’ that applies to an important set of
transcriptional regulators in the Drosophila blastoderm embryo. More generally, the
image and data analysis techniques described in this study can be generalized to
understand the quantitative function of endogenous genes or gene networks in the
Drosophila embryo or other well-characterized systems. Such an approach may also
prove useful for engineered transcriptional regulatory elements employed for targeted
expression of genes in therapeutic settings.

In the mathematical modeling section (Chapter 3), by employing a reductionist
analysis of short-range repression, we sought to explore a relatively untouched, yet
central aspect of gene expression from Drosophila. Previous qualitative studies played a
key role in highlighting the extreme distance-dependence of such repressors, and
comparative analysis has shown many instances of evolutionary plasticity of regulatory
regions controlled by such repressors (Gray et al, 1994; Ludwig & Kreitman, 1995;
Hewitt et al, 1999; Hare et al, 2008). Knowing that transcription factors inﬂuence each

other in a local fashion permitted the identiﬁcation of novel enhancers, based on the

164

 

clustering of binding sites (Berrnan et al, 2002; Schroeder et al, 2004). Yet these studies
alone do not provide the basis for predicting evolutionary changes that reshape
transcriptional output, or predicting comparative activity of enhancers controlled by
similar groups of transcription factors. For example, the original hypothesis that the
afﬁnity and or number of Bicoid binding sites dictates the output of regulated genes has
been replaced by an understanding that other, as-yet unknown features, appear to play
more decisive roles (Driever et al, 1989; Gao et al, 1996; Ochoa-Espinosa et al, 2009).
These previous studies largely relied on endogenous enhancers, which have
complex arrangements of transcription factor binding sites, thus even relatively discrete
changes made to these elements probably perturb a number of factors. Here, we limited
the number of features that differed between elements, allowing modeling with a
tractable number of parameters. We utilized a common block of Dorsal and Twist
activator sites, allowing us to focus on changes made in the number and arrangement of
repressor sites; clearly, differences in afﬁnity, number and arrangement of activator sites
also play decisive roles in dictating transcriptional output, thus future modeling efforts
will need to integrate these elements as well. The tight focus on short-range repressors
with the analysis of a relatively small number of reporter constructs provided data in
depth sufﬁcient to provide robust information on key parameters (Figure III-7). From our
comparison of repression by other short-range repressors, it is likely that the extensive
analysis of Giant provides good guidance for the activity of other similarly-acting
repressors; modeling of Kruppel, Knirps and Snail appear to be feasible on elements that
they are known to interact with (Figure III-8). To extend this type of analysis to other

transcription factors, it is likely that separate empirical studies will be essential to

165

 

understand the basic elements of their context effects. In one case, the Hairy long-range
repressor exerts repressive inﬂuences over greater than one kilobase, suggesting that the
constraints on placement of Hairy binding sites on target genes may be quite relaxed
compared to those applicable to short-range repressors, an issue that warrants further
investigation (Barolo & Levine, 1997). Studies of this sort may not unlock a
comprehensive atlas of cis regulatory relationships for all transcription factors, but by
providing in-depth insights on a one-by-one (or a class-by-class) basis, important
advances are feasible relating to the activity and evolution of speciﬁc classes of genes.
To date, similar model gene studies have provided the basis of our qualitative, empirical
understanding of transcription, thus attacking a quantitative problem in a discrete fashion,
one group of factors at a time, has strong precedents.

Fractional occupancy modeling has been applied to understand Drosophila
transcriptional regulation in several other studies. Segal and colleagues combined
expression data from 44 characterized blastoderm enhancers and 8 trans-acting factors to
identify parameters describing cooperativity and factor activity (Segal et al, 2008). This
exercise was fairly successful, judged by the ability of the obtained parameters to predict
an additional set of 26 enhancers that were not included in the modeling. This form of the
model did not consider quenching distances, however, (contributions of activators and
repressors are merely summed, with no consideration for relative spacing). In light of the
extensive use of short-range repression in these enhancers, this simpliﬁcation must be
considered a major shortcoming. The failure of this model to consider heterotypic
cooperativity is also likely to account for some of the limitations, as pointed out by these

researchers. Reinitz and colleagues focused in on a 1.7 kbp regulatory region driving

166

 

expression of the even-skipped gene; by considering possible binding sites for Caudal not
included in original DNaseI footprinting studies, this model predicted speciﬁc features of
expression that were subsequently quantitatively validated (Janssens et al, 2006). It is
not clear how generally applicable to other enhancers the derived parameters were from
this model — quenching distances of short-range repressors are estimated to be 1 for
distances less than 50 bp with a linear decrease to 0 for distances between 50 bp and 150
bp, for example. In the former study, the modeling might suffer from simpliﬁcation
required in considering a great number of enhancers that differ in a many different ways,
while in the latter study, a single complex element is target of focused studies, with the
possible limitation that any model would be liable to overﬁtting (although mutant forms
of the enhancer are also considered in the modeling). A third fractional occupancy
modeling study considered regulatory elements controlled by the same activators we
tested here, namely Dorsal and Twist (Zinzen et al, 2006). The overall objective of this
study was to understand the relative activity of enhancers of the neuroectoderrn that are
controlled by Dorsal, Twist and the Snail short-range repressor. In this model, parameters
for cooperativity, activity, and quenching were also considered, and a critical importance
of Dorsal-Twist cooperativity was identiﬁed, although not Twist-Twist cooperativity.
The quenching factor for Snail was again not explicitly measured, but was deﬁned as a
simple stepwise function equal to 1 for distances less than 100 and 0 for distances greater
than 100 bp. A major difference between this study and those of Segal et al (2008) and
Janssens et al (2006) is that the modeling was conducted in silico on synthetic gene
modules with different conﬁgurations of binding sites, similar to our approach, however

the output of these gene modules was never empirically tested. The main conclusions

167

from parameter estimation are therefore based on which sets of values might be likely to
provide informative results. Thus, our study provides a critical feedback between
experiment and quantitative modeling that is lacking in this earlier study. It is not clear
that the earlier studies are sufﬁciently robust to predict why extensively reshufﬂed
enhancer sequences from different retain similar function, or, in some cases, show
quantitatively distinct outputs (Ludwig et al, 2005; Crocker et al, 2008). Our studies are
by their design shaped to detect quantitative differences resulting from minor differences
in repressor-activator spacing, for example, and should be useful for such evolutionary
studies.

This study identiﬁed speciﬁc quantitative features that appear to apply to short-
range repressors in a general context. Our study identiﬁed a complex nonlinear
relationship that suggests that within the range of activity, Giant, and probably other
short-range repressors, have an optimum distance of action that may reﬂect steric
constraints (Figure III-4). Multiple formulations of the model generated very similar
predictions, suggesting that this nonlinear distance function is a real feature of the system
(Figure A-S). Consistent with this notion, a previous study of transcription factor binding
sites in Drosophila enhancers discovered an overall preference of Kruppel sites to be
found 17 bp from Bicoid activator sites, which may be an indication that other short-
range repressors also have preferred distances for optimal activity (Makeev et al, 2003).

Another signiﬁcant ﬁnding was the similar quenching efﬁciencies for repressors
acting adjacent to Dorsal or Twist activator sites (Figure 111-4). The similar effect on
disparate activator proteins indicates that the effects of short-range repression are general,

and are likely to be translatable to distinct contexts. Previous empirical tests had already

168

 

pointed in this direction; for example, insertion of ectopic binding sites for gap gene
repressors Giant, Knirps and Kruppel into rhomboid NEE sequences is sufﬁcient to
induce repression, although these proteins do not usually cross regulate (Gray et al, 1994;
Amosti et al, 1996a; Hewitt et al, 1999). In addition, systematic comparisons suggest
that short-range repressors can counteract a variety of transcriptional activation domains
with similar efﬁciency, suggesting that speciﬁc protein-protein contacts are not essential
(Amosti et al, 1996b; Kulkarni & Amosti 2005). The generality of repression suggests
that this class of transcription factor is able to be readily recruited to novel contexts, a
feature that may have played an instrumental role during the extensive remodeling of
transcriptional regulatory circuitry associated with the evolution of the derived syncytial
blastoderm of Drosophila. This ﬂexibility plays well into a computational understanding
of transcriptional information in the ﬂy, because it is likely that the quantitative
correlations we observe in speciﬁc genes tested here are general properties of the
repressors. The application of our modeling to the rho NEE indicates that certain
features can be directly applied to more general contexts (Figure 111-8). The prediction
that short-range repressors such as Snail and Giant will generally exhibit similar
efﬁciency acting on other activators, in other contexts, will be an important goal for
future application of this modeling.

In one area we found quantitative differences between parameters derived from
the synthetic gene modules and the endogenous regulatory regions. The importance of
homotypic cooperativity predicted for Snail sites in the context of the rho NEE was
overall much higher than that found for Giant, Kruppel and Knirps sites acting on the

synthetic gene constructs; this might be an example where the individual proteins do

169

 

exhibit different context dependencies perhaps because the proteins differ in level of
“stickiness”. Alternatively, the distance between the Snail sites in question, 23 bp, might
facilitate cooperative interactions much more than the closely apposed spacing used in
our constructs, where steric interference may play an opposing role.

In revisiting previous incarnations of the endogenous rho NEE, we highlighted
several features of the architecture of this regulatory region. This enhancer appears to be
“over-engineered” in terms of the use of Snail to mediate repression; based on previous
mutagenesis experiments, it appears that even a single Snail site located within the central
enhancer would be sufﬁcient to mediate repression (Gray et al, 1994). Consistent with
this view, our short-range repressor parameters would predict that for many values of the
Snail repressor scaling factor, the enhancer should be effectively inhibited by the set of
endogenous binding sites (Figure III-8). How might we understand this apparent
redundancy? Such design may provide the correct dynamical response, with a swift
repression of rho at an early enough time where Snail levels are still low, or it may ensure
that gene output is robust to environmental and genetic noise. We made explicit
assumptions about the relative repression effectiveness of different conﬁgurations of this
enhancer, containing one, two or four Snail sites. These different thresholds of repression
seem to be warranted, but it would be clearly desirable to have quantitative values for the
relative expression, similar to those obtained for our synthetic gene modules, to better
model the activity of these enhancers.

A second feature of this modeling provided an insight into transcriptional
activators that was not apparent from our synthetic modules, in which a single cluster of

Dorsal and Twist sites was used in different contexts. Activator scaling factors for

170

 

 

 

Dorsal were reproducibly lower than those of Twist, and this was apparent for several
different assumptions of expression level (Figure 111-8 and data not shown). The relative
differences in contribution to activation can be rationalized by examination of the
structure of the enhancer; contribution by the low intrinsic values of Dorsal are ampliﬁed
by strong cooperativity with Twist, setting up a chain of interacting weak sites that
together are highly active. Interference of part of this chain of regulatory interactions by
Snail would be sufﬁcient to bring the whole ediﬁce crashing down. Experimental
evidence bears out these conclusions; isolated Dorsal sites tested on reporter genes
mediate relatively weak activation, and a rho NEE lacking Twist sites, but containing
four Dorsal sites, is similarly compromised. The relative constrained placement of short-
range transcriptional repressors in some contexts should contrast with the expected
relaxed constraints for long-range repressors, such as Hairy, that are able to mediate
repression from distal sites. We expect that evolutionary constraint of binding sites for
this latter type of transcriptional regulator should be much lower, a prediction that will
require better identiﬁcation of physiologically active sites on endogenous target genes.
Our earlier study have suggested that many developmental enhancers, including
those regulated by short-range repressors, may possess a ﬂexible “billboard” design, in
which individual factors or small groups of proteins would independently communicate
with the promoter region, so that the net output of an enhancer would reﬂect the
cumulative set of contacts over a short time period (Kulkarni & Amosti, 2003). Such a
view of enhancers would account for the evolutionary plasticity observed in regulatory
sequences. No DNA-scaffolded superstructure, reﬂecting the formation of a unique three

dimensional complex, would be necessary in this scenario. Yet our modeling suggests

171

 

that the rho NEE might involve communication between relatively distant binding sites,
through sets of cooperative interactions. In this case, it is possible that such distant
interactions might be compatible with a ﬂexible structure, if many distinct conﬁgurations
of binding sites provide such a cooperative network. Current studies have indeed
highlighted potential frameworks involving Dorsal and interacting factors on same
classes of enhancer (Erives & Levine, 2004; Papatsenko & Levine, 2007). Application of
a transcriptional grammar integrating activities of activators and repressors is a critical

next step to illuminate enhancer design and evolution.

FUTURE DIRECTIONS

Comparative sequence analysis, coupled with the development of algorithms to
search genomic databases, has provided important tools for the identiﬁcation of gene
regulatory elements at a scale not previously possible, but this approach has been
partially successful at ﬁnding cis-regulatory motifs. In many cases, predicted regulatory
elements are non-functional possibly due to enhancer architecture, which are not taken
into account usually by the present techniques. It is necessary to factor in features of
enhancers such as binding site number, spacing, orientations and order, in order to
achieve better predictions in the interpretation of their biological function. Our study
(Chapter 3) takes a ﬁrst step towards providing such a paradigm for early Drosophila
developmental enhancers that are regulated by the short-range transcriptional repressors.
Our ﬁndings on enhancer grammar might be incorporated to present bioinformatics
techniques, which employ information about putative binding sites and evolutionary

conservations, to survey genomic regions. Such incorporation might provide powerful

172

predictive tools that will be useful for studies of population biology, such as those where
distinct gene expression proﬁles are linked to disease susceptibility and differences in
drug metabolism. A major goal now is to identify novel regulatory elements using all
sequenced Drosophila genomes and understand how evolutionary processes shape
endogenous enhancers, focusing on targets of short-range repressors that have not been
well characterized with respect to regulatory elements. Some candidates include A-P
patterning gene even-skipped and D-V patterning gene rhomboid, which show conserved
function in other Drosophila species with some small changes in enhancer structure.
Current approaches for identiﬁcation of candidate regulatory modules include
computational methods that look for clusters of transcription factor binding sites and
phylogenetic comparisons that identify evolutionarily conserved sequences (Berman et al,
2002; Markstein and Levine, 2002; Markstein et al., 2002; Bergman and Kreitman,
2001). However these methods give large number of false-positive and false-negative
results for enhancer prediction, possibly due to degenerate nature of transcription-factor-
binding sites. Thus, in many cases where cis-regulatory modules predicted by
computational methods appear suitable, something in the arrangement of sites which has
shown to be critical in our study renders the cis-element non-functional. In order to
achieve better predictions and eliminate false-positive and false-negative results,
computational methods should include structural features such as parameters for spacing
and position of binding motifs within cis-elements in the search algorithms. For example,
suppose that a computational search for novel cis-elements based on clustering of binding
sites for transcription factors was used to identify enhancers by Giant. Given the relative

number, afﬁnities, spacing and distribution of repressor and activator sites within the

173

module we would be better able to predict the possibility of Giant regulating that element
and how strong that regulation would be. The neurectoderrn enhancers (NEEs) such as
rho, vnd and sog active in the mid-blastoderm Drosophila embryo provide an excellent
test case for this idea. These enhancers are not cross-regulated by Giant, Kriippel, or
Knirps, i.e. these short-range repressors’ binding sites located within the critical NEE
enhancers cannot be of functional importance, by this criterion. Furthermore, we can
model the expression driven by NEEs in terms of transcriptional inputs from the expected
regulators (D1, Twi, Sna) as well as the “inappropriate” repressors (Gt, Kr, Kni), and
examine the effect of each putative cross-regulating site. Such an analysis will reveal the
presence of any inappropriate sites that are allowed in the NEEs because they are located
outside of the zone of inﬂuence imposed by short-range repressors.

By combining quantitative methods to expression levels of the transcription
factors and a downstream target gene over space and time we can build a predictive
modeling tool that will allow us not only the identiﬁcation of novel cis- elements that are
regulated by the same suite of transcription factors, in this case the short-range
repressors, but also to predict its expression pattern. Computational algorithms based on
binding site sequence data from transcription factor binding site databases, can be used to
scan the genome for clusters of short-range repressor binding sites together with binding
sites for other proteins that are known to work with them. Combining datasets from a
number of large-scale analyses might be used for preﬁltering the set of putative
regulatory elements obtained. Bioinformatic predictions of binding sites might be used to
build a predicted picture of the conﬁgurations of these regulatory elements, and expected

output of the endogenous regions might be obtained by fractional occupancy modeling. A

174

 

simpliﬁed example of how exactly such a predictive tool can be used is as follows: We
can identify the putative targets of short-range transcriptional repressors in the
Drosophila melanogaster genome and predict how these candidate genes will respond
over space and time. To understand the molecular function and design of these elements,
we can incorporate the identiﬁed parameters for short-range repression in the
bioinformatic analysis of known regulatory elements in D. melanogaster and in related
species, in particular the 12 sequenced ﬂy genomes. For example, we can apply our tool
to anterior-posterior and dorsal-ventral patterning enhancers, for which no quantitative
modeling approach has adequately considered the special nature of short-range repressors
up until now. Such an analysis would be a test for the predicted parameters in our study,
by which we should be able to accurately predict expression levels. In addition, we can
make predictions about the effect of ectopic binding sites introduced artiﬁcially into
regulatory sequences; and directly validate these by measurement of transgenes. By these
analyses we might identify previously hidden features of such elements, such as possible
redundancy and extra regulatory sites not found in simple “promoter-bashing”
experiments.

Mathematical models such as Boolean models and differential equation models
have been used to understand the gap gene regulatory network in early Drosophila
melanogaster embryos as mentioned in the introduction (Sanchez et al, 2001; Jaeger et
al, 2004a; Jaeger et al, 2004b). These models were in agreement with the earlier mutant
and reporter studies. In addition, the differential equation model was also in agreement
with the available spatial and temporal data of protein levels. However, experimental

analysis of this regulatory cascade point to at least one additional level of complexity that

175

needs to be included in such predictive models of gene regulatory networks, which is a
detailed understanding of the cis-regulatory substructure and its functional signiﬁcance as
we have deﬁned for the short range transcriptional repressors (Giant, Knirps and

Kruppel) in Chapter 3.

 

176

 

REFERENCES

Amosti DN, Gray S, Barolo S, Zhou J, Levine M (1996a) The gap protein knirps
mediates both quenching and direct repression in the Drosophila embryo. EMBO J 15:
3659-3666

Amosti DN, Barolo S, Levine M, Small S (1996b) The eve stripe 2 enhancer employs
multiple modes of transcriptional synergy. Development 122: 205-214

Barolo S, Levine M (1997). Hairy mediates dominant repression in the Drosophila
embryo. EMBO J. 16: 2883-2981

Berman BP, Nibu Y, Pfeiffer BD, Tomancak P, Celniker SE, Levine M, Rubin GM,
Eisen MB (2002) Exploiting transcription factor binding site clustering to identify cis-
regulatory modules involved in pattern formation in the Drosophila genome. Proc Natl
Acad Sci USA 2: 757-762

Bergman, C.M., Kreitman, M. (2001). Analysis of conserved noncoding DNA in
Drosophila reveals similar constraints in intergenic and intronic sequences. Genome Res.
11(8): 1335--l345.

Crocker J, Tamori Y, Erives A (2008) Evolution acts on enhancer organization to ﬁne-
tune gradient threshold readouts. PLoS Biol 6: e263

Driever W, Thoma G, Nﬁsslein-Volhard C (1989) Determination of spatial domains of
zygotic gene expression in the Drosophila embryo by the affinity of binding sites for the
bicoid morphogen. Nature 340: 363-367

Erives A, Levine M (2004) Coordinate enhancers share common organizational features
in the Drosophila genome. Proc Natl Acad Sci USA 101: 3851-3856

Gao Q, Wang Y, Finkelstein R (1996) Orthodenticle regulation during embryonic head
development in Drosophila. Mech Dev 56: 3-15

Gray S, Szymanski P, Levine M (1994) Short-range repression permits multiple
enhancers to function autonomously within a complex promoter. Genes Dev 8: 1829-
1838

Hare EE, Peterson BK, Iyer VN, Meier R, Eisen MB (2008) Sepsid even-skipped
enhancers are functionally conserved in Drosophila despite lack of sequence

conservation. PLoS Genet 4: e1000106

Hewitt GF, Strunk BS, Margulies C, Priputin T, Wang XD, Amey R, Pabst BA, Kosman
D, Reinitz J, Amosti DN (1999) Transcriptional repression by the Drosophila giant

177

protein: cis element positioning provides an alternative means of interpreting an effector
gradient. Development 126: 1201-1210

Jaeger J, Blagov M, Kosman D, Kozlov KN, Manu, Myasnikova E, Surkova S, Vanario-
Alonso CE, Samsonova M, Sharp DH, Reinitz J (2004) Dynamical analysis of regulatory
interactions in the gap gene system of Drosophila melanogaster. Genetics 167(4): 1721-
1737

Jaeger J, Surkova S, Blagov M, Janssens H, Kosman D, Kozlov KN, Manu, Myasnikova
E, Vanario-Alonso CE, Samsonova M, Sharp DH, Reinitz J (2004) Dynamic control of
positional information in the early Drosophila blastoderm. Nature 430(6997): 368-371

Janssens H, Hou S, Jaeger J, Kim AR, Myasnikova E, Sharp D, Reinitz J (2006)
Quantitative and predictive model of transcriptional control of the Drosophila
melanogaster even-skipped gene. Nat Genet 38: 1159-1165

Kulkarni MM, Amosti DN (2003) Information display by transcriptional enhancers.
Development 130: 6569-6575

Kulkarni MM, Amosti DN (2005) Cis-regulatory logic of short-range transcriptional
repression in Drosophila. Mol Cell Biol 9: 3411-3420

Ludwig MZ, Kreitman M (1995) Evolutionary dynamics of the enhancer region of even-
skipped in Drosophila. Mol Biol Evol 12: 1002-1011

Ludwig MZ, Palsson A, Alekseeva E, Bergman CM, Nathan J, Kreitman M (2005)
Functional evolution of a cis-regulatory module. PLoS Biol 3: e93

Makeev VJ, Lifanov AP, Nazina AG, Papatsenko DA (2003) Distance preferences in the
arrangement of binding motifs and hierarchical levels in organization of transcription
regulatory information. Nucleic Acids Res 31: 6016-6026

Markstein M and Levine M. (2002) Decoding cis-regulatory DNAs in the Drosophila
genome. Curr Opin Genet Dev. 12, 601-606.

Markstein M, Markstein P, Markstein V, and Levine M. (2002) Genome-wide analysis of

clustered Dorsal binding sites identiﬁes putative target genes in the Drosophila embryo.
PNAS 99, 763-768.

Ochoa-Espinosa A, Yu D, Tsirigos A, Strufﬁ P, Small S (2009) Anterior-posterior
positional information in the absence of a strong Bicoid gradient. Proc Natl Acad Sci

USA 106: 3823-3828

Papatsenko D, Levine M (2007) A rationale for the enhanceosome and other
evolutionarily constrained enhancers. Curr Biol 17: 955-957

178

Sénchez L, Thieffry D (2001) A logical analysis of the Drosophila gap-gene system. J
Theor Biolle: 115-141

Schroeder MD, Pearce M, Fak J, Fan H, Unnerstall U, Emberly E, Rajewsky N, Siggia
ED, Gaul U (2004) Transcriptional control in the segmentation gene network of

Drosophila. PLoS Biol 2: e271

Segal E, Raveh-Sadka T, Schroeder M, Unnerstall U, Gaul U (2008) Predicting
expression patterns from regulatory sequence in Drosophila segmentation. Nature 451:

535-540

Zinzen RP, Senger K, Levine M, Papatsenko D (2006) Computational models for
neurogenic gene expression in the Drosophila embryo. Curr Biol 16: 1358-1365

179

APPENDIX A
Supplementary Material for Chapter III
Modeling:
Here we describe a general fractional site occupancy model and then explain how it is
applied in our case. Consider an enhancer, with two activator and two repressor binding

sites bound by activators A1 and A2, and repressors R1 and R2 respectively. In our case,

the repressors are short-range repressors. Simultaneous binding of the repressors and

activators may result in a partially active state of the enhancer.

Different bound state of the enhancer, for example, two activators and one repressor,
represent distinct states. The probability of each state of those proteins on the DNA is

described by the following parameters.

[A1] and [A2]: Concentration of activator l and 2 respectively,
[R1] and [R2]: Concentration of repressor 1 and 2 respectively,

K and K A : Binding afﬁnity of activator 1 and 2 respectively,

“‘1 2

K R and K R : Binding afﬁnity of repressor 1 and 2 respectively,
1 2

C?2 and Clrz: Cooperativity of activators and repressors for binding to the DNA

respectively.
A .
q R] : Quenching efﬁciency of repressor Ri on the activator Aj.
i

FX: Probability of having X case.

For simpliﬁcation in the following formulas,

180

Z: (1+KA1M1HKA2M2HC012 KA1[A]]KA2[A2])(1+KR1[R1]+KR2[R2]+

 

 

C{2KR1[R 1K1: 1R2»
No activator and repressor bound to the DNA: F N =é—
KAI [All
Activator 1 is bound to the DNA: F A] = Z
K 2421/12]
Activator 2 is bound to the DNA: F A2 = T—
1er
Repressor 1 is bound to the DNA: F R = 1Z
1
[R21
Repressor 2 is bound to the DNA: F = —L——
R2 Z

C1612 KAIIA1 JKAZMZI
Activator l and 2 are bound to the DNA: F

A1A2 = Z

 

A1 [A1 ]K R [R1]
Activator l and repressor 1 are bound to the DNA: F = l

AlRl Z

 

KllKlRl
_AIA1R22

Activator 1 and repressor 2 are bound to the DNA: F _-

 

 

A1R2 Z
K A2 I A2 1K R [R] l
Activator 2 and repressor 1 are bound to the DNA: F = 1
f‘2R1 Z
K A21A21KR [R21
2

Activator 2 and repressor 2 are bound to the DNA: F

A2R2= Z

 

181

 

r
CIZKR [RllKRleZI
Repressor 1 and 2 are bound to the DNA: F

R1R2 Z

Activator l and 2, and repressor 1 are bound to the DNA:
CIZK A llA'IJKAZIAZIKRl [R

F _

A1A2R1 Z

Activator 1 and 2, and repressor 2 are bound to the DNA:

F C12R1A [AIRZHAZMZIK
A1A2R2 Z

Activator 1, repressor 1 and 2 are bound to the DNA:

KA11A11C1’2K R [R 1KR21R21
F

A1R1R2: Z

 

 

 

Activator 2, repressor 1 and 2 are bound to the DNA:
F KA21A21C1’2KR11R1KR21R21
A2R1R2: Z

Activator 1 and 2 and repressor 1 and 2 are bound to the DNA:

Cf2A1A 1KR21A21C12R1R1KR21R21
F
A1A2R1R2= Z

Then the states vector of two activators and two repressors can be written in the
following way.

 

=[F,,,FFFF,F

, R,F ,F ,F .

N A1 ”‘2 R1 R2 A1A2FA1R1F1R2 A2R1 A2R2 R1R2
’ ’ ’ ’ ]

FA1A2R1 FA1A2R2 FA1R1R2 FA2R1R2 FA1A2R1R2

182

 

In the expressions above we enumerate the Boltzmann states of an enhancer with two
activator and two repressor binding sites. We claim that the binding of the repressors
modulates the probability of states with activators and repressors simultaneously bound

by a quenching factor. For example, binding of the activator A}. and repressor Ri

A .
simultaneously is reduced by a factor of (1-qRJ ). We can modify the probability of
i

states, in the following way:

“FA: "1
=1FN [Hf/21,17 .1? .FR 2r 2,2,13,1le q>R1AIR11 qurrAlea— qRIZ.)

2
"2 "1 R1
(1- q ).F —q )(1- 2).F (l- q )(l- 61
FRsz R2 R2HR1R2R1 R1 qu A2R2 R2 R2
A
2111 A2)

:1 ’1
F (l— )(l— q ).F (l-q -q 1
AIRle qu R2 "2R1R2qR1 R2
A
A1111- ZAIXI— AZXI—qRZn

4R

FA1A2R1R2(1_ qR1 2 qu 2

A2)

With a description of states in hand, we calculate the total efﬁciency of the enhancer.

Let E A] and E A be efﬁciency of activators when they are bound to the DNA alone. We
2 _

can write the case of two activators bound to the DNA at the same time in the following

way. Here WI“ and w; are the weight constants.

E = waE +waE

“1’42 1"1 2A2

183

LetE and E R be efﬁciency of activators when they are bound to the DNA alone. We

R1 2

can write the case of two repressors bound to the DNA at the same time in the following

way. Here wlr and w; are the weight constants.

E =w’E +w_"E

R1R2 lR1 2Rz

Then the overall efﬁciency vector of two activators and two repressors can be written in

the following way:

E=lE .E .E .E .E .E .E ,E .E .E .E .
N A] A2 R2 21.2 1111 1R2 A211 A2112 11R.
E .E ,E .E ‘ .E l
A1A2R1 A1A2R2 A1R1R2 A2R1R2 A1A2R1R2
=[E,E,E,E,E,waE+waE,E+E,E+E,
”Al/‘2 11214221111111.

E +E ,E +E ,w’E +w’E ,waE +waE +E ,
A2 Rl A2 R2 1 R1 2 R2 1 Al 2 A2 RI
a a r r r r
wE+wE+E,E+wE+wE,E+wE+wE,
1 A1 2 R2 I R1 2 R2 A2 1 RI 2 R2
a a r r
w E +w E +w E +w E l
1 A1 2 A2 1 Rl 2 R2

Then we can obtain the total expression level of the enhancer as the sum of the

expression contribution of each state.

~

Ex: 2F i TREi)’ where T is a regulatory function used to calculate expression
i

contribution of each state. If we set the following simple assumptions E N = 0, E A1 = 10,

184

1
E =10, w“ = 0.5, w“ = 0.5, E = 0, E = 0, and T(x)= —, the general
A. I 2 R1 R2 1...s-x

framework described above reduces to a form employed by Zinzen et al (2006) and total

expression of the enhancer with 2 activator and 2 repressor binding sites can be written

as:

Esz +F +F

A1 A2 RRZ+FR1R1"“’R1 RR“ ,R

A
A1 )+F (1— 2)+
A

A2R1 qu
FRsza—qRZRF AZRIU1:1)(lqR12)H7A1:2R2(1-q212)(1—q2:
F41R1R2(l_qRixl_ qA‘z)+FAR1R<1 41210—qu?
FA1A2R1R2(1-42:)(1-q:1)(lq-CIR:12)(12A2)

)+

Cooperative Binding:

Incorporating cooperative binding for the general case is not a simple matter to
solve, since we might have all different cooperativity parameters for different protein
types. We can have simple assumptions like all neighboring activators (repressors) are
cooperating or all binding activators (repressors) are cooperating. The formulations of
these schemes for homotypic cooperativity are explained in Zinzen & Papatsenko (2007).

We can also assume that cooperative binding level changes by distance such as
proteins cooperate better if they are closer. This could be done by multiplying the
cooperativity parameter C by a distance dependent function as described in Segal et al

(2008), which used a Gaussian function with mean 0 and variance 50.

185

 

Combining fractional occupancy to expression:

The modeling of an enhancer and its fractional occupancy must also consider how
those states ultimately affect transcription levels. If we assume that occupancy of the
enhancer by the activators is the only indication for transcription and there is no need to
specify different weights to different states we could use a simple linear

function T (x) = x (Zinzen & Papatsenko, 2007). However, numerous experimental

studies have demonstrated nonlinearity in gene activation, which probably reflects the

thresholds imposed by chromatin and inherent limits on transcriptional initiation rates.

Thus a nonlinear function of sigmoidal type T(x): ———15—7 can be used, as described
l+e —

in Segal et a1 (2008), where b represents basal expression level. We used

T(x) = ———l—; for our analysis on synthetic enhancers and rhomboid NEE.
l+e _

Binding afﬁnity:

The set of synthetic gene modules used in this study did not explore the effects of
different qualities of binding sites to any extent, but for analysis of endogenous
regulatory regions, this consideration is paramount. In the rhomboid NEE study, for each
transcription factor, we took the score of the strongest site among all those that bind that
transcription factor as a free parameter and constrain the other values by treating the

PWM score as a free energy of binding (Stormo, 2000).

Parameter Estimation:

186

To ﬁt the model with the actual data we minimized the root mean square error

between the predicted and experimental data. Root mean square error is deﬁned in the

n
following way: RMSE=(_1. 2 (y (j)— ye(j))2)l/2

, where n is the number of data
"j=1 P

points, yp is the model’s prediction and ye is the experimental data (e.g. Figure 3). The

ﬁtting is done by optimizing the parameters that gives the minimum RMSE error. Global
parameter estimation algorithm, evolutionary strategy (ES) as described in Runarsson &
Yao (2005) is used for estimation. ES is a nature-inspired parameter estimation method
belonging to the class of evolutionary algorithms which uses mutation, and selection
applied to a population of individuals containing candidate solutions in order to evolve
iteratively better and better solutions. Three global parameter estimation techniques
(genetic algorithm (GA), simulated annealing (SA) and evolutionary strategy) were
checked for our model on synthetic data before we selected ES. The results of BS, GA
and SA are shown in the following ﬁgures (Figure A-1-3) on synthetic data. The
estimation is done for 9 different schemes and 100 runs and results are plotted. The ES
algorithm clearly functioned better than genetic algorithms and simulated annealing on
our model. For further testing of our parameter estimation algorithm, we added different
amounts of noise to the synthetic data ( 1% to 30%). Adding noise to the synthetic data

does not change the estimation results drastically (Figure A-4).

un

Figure A—l: One hundred parameter estimation runs are shown using the evolutionary
strategy parameter estimation method for all schemes (variant formulations of the model

employing different assumptions for cooperativity and quenching).

188

0.8;
0.6%
0.42
0.2:-

0.8
0.6
0.4]

0.23

 

 

 

 

 

 

3100
580
60
40
420

C1 02

100
so
60
:40
20

ere;

100
«80
‘60
‘40
20

c1 c2

189

0.85
0.6-
0.4;
0.2:

1%

p
0.8;
0.63
0.4

0.2,

0i. _

1.
0.8
0.6?
0.4
0.2-
0- e

' .

 

20 4o 60 80 TW DI

 

bp
D D
D
D
D
"2'0 _ 4‘0"}36" ‘ee" TWBT
W
D

 

20 4060 80 Two:
bp

Figure A-l: Continued.

D

 

 

 

 

 

 

 

 

 

1; l i '100 1'
E . -> > D
0.8; 80 0.8- >
0.6: 60 0.6' >
04 :40 0.4' P
l j ”
0.2: :20 0.2- >
0 >1 . . '. o o- l . . . . '
E R C1 02 20 40 60 80 Tw DI
E b"
1? V ' 100 1'
l D D D
0.8; 460 0.8 .
0.6l _60 0.6% >-
04? 40 0.4- ..
0.2l 20 0.2: >
D ; '
0% - ‘10 0: _ ._ > . _
E R 01 02 20 40 60 80 Tw DI
F '0"
12 .100 1' l
; "p >7 y
0.8i , £80 0.8. >
0.6: $60 0.6? >
0.4l 40 0.4; >-
0.2 =20 0.2 >4
D .
0 >. _ ’ 0 0 _ 4 > , _ V
E R 01 02 20 40 6o 80 Tw DI

bp

190

Figure A-l: Continued.

G

 

 

 

 

 

 

 

1' ' 2100 1
I , V: D': D
0.8 80 0.8» ,1.
0.6% 60 0,5 ' DO 0
. D3
04% 940 0.4- v:
i ; DO
02% $20 0.2- o .e
DC )3
01 > . " , o o- . . ‘ >0; . _
E R C1 02 20 40 60 80 Tw DI
H ”p
1% e ' 1100 1
! E D D p
0-85‘ 180 0.8; v
0.6l E60 06; >-
0.4§ f40 0.4i >-
02? 20 0.2;
. , _
0' . ’ ' 0 0 _ _ .
E R 01 02 20 40 60 80 Tw DI

I RR

 

 

 

1; ’ ‘ ‘ 5100 1:

1 ," D D
0.83; {80 0.8; >-
0-6? 60 0.6: .-
04 40 0.4 >
0.2; 20 0.2 >

. D

0% ' ’ -0 o A _ A

E R 01 02 20 40 60 80 Tw DI

bp

191

 

Figure A-2: One hundred parameter estimation runs are shown using the genetic

algorithm method for all schemes.

192

 

0.8'
0.6
0.4
0.2

0.8
0.6
0.4
0.2

0.8}
0.61

0.4
0.2

.l

 

 

 

 

 

 

C1 02

’C1'ECé _

’I
01 c2

1 OO
’80
'60
'40
1720

1 00
8O
60
4O
2O

1 00
80
‘60
4O
'20
0

193

0.8-1 1

emu—L...

 

é

.1- .

0.41 751 g

0.2. I" M >6
0

20 40 60 80 Tw Dl
bp

.11 112
. 1 1° .

I}

0.4
0.2
O

T 7 ‘20" '40'V’60 730 1.751“
bp

2:? l i H

0.4 " >5 .

0.2 . ' vi

20 40 60 80 Tw DI
bp

L'J

 

 

Figure A-2: Continued.

D

 

 

 

 

 

 

 

 

 

 

 

1- ' i 100 1: ' 1.; ‘ ' W;
'i 3 i q C? ’ 1
0.81 . :60 0.81 >8 1 , I I
0.61 60 0.62- >51 _ 9 i
041 _' 40 041 13 j
021 i c :20 0.29 '
v f , 4
0i J . - 'I 'l ;0 0' . , v. _. .,
E R C1 02 20 40 60 80 Tw DI
E bp
1; ‘ ' 100 1 ' ' :
1 Di Di D :
0,81 780 0.8 I. _. ! '
0.61 60 0.6
0.4g 40 0.41
0.2,i i .3 .20 0.2.
. ' D 1
0 J ‘ " o 01 . . _ . i
E R 01 02 20 4o 60 80 Tw 01
F ”p
1? ‘ :100 1g * . :. 1
1 Di 3 H >| D 1
0.81 @80 0.8% I '
0.61 60 0.6 i; ‘
0.4- 40 0.4- E
0.2 i _ 20 0.2- 1.- il
. p "’
0 J . 'l . 0 0
E Fl C1 02 20 40 60 80 W on

DD

194

 

Figure A-Z: Continued.

G

 

 

 

   

 

 

 

 

 

 

 

1 100 ' =
0.8; '80 1 b!
0.6 60
0.4; 40 3
0.21 i 20

1 D .

0 J . " 4 o - .' . A

R C1 C2 20 40 60 80 Tw 01
H bp

1 100 1 . '

. pi O '3 vi p
0.8, 380 0.8,. 1 .,
0.61 60 0.6: ' >I , 1;;

I ’ :-

04: 40 0.4 11>! B
0.2 I , , 20 0.2' >5
. D

0 vi 1 'i ’i O 0 . E .

E R C1 02 20 40 60 80 Tw DI
| hp

1, ' @100 1' ‘ ' ' g

C ' 1;, > p ;
0.8, :80 0.81.! 'E 1 I
0.6' 60 0.6 ' >1 .. f -
0~4 40 0.4 I ~
0.2 i 20 0.2 >

' D
0 b. 1 .i ’l ’0 0 . , . '

E Fl C1 02 20 40 60 80 Tw 01

bP

195

Figure A-3: One hundred parameter estimation runs are shown using the simulated

annealing method for all schemes.

196

 

0.8-
0.6
0.4*
0.2'

0.8‘
0.6;

0.4
0.2

"~ (11

" DJ) 2.

. 1);.

V

—J&Xih :1

-
_I’JZIDL‘JL .3-

V

 

 

Fl

9

ocm‘

 

 

 

" I [I comm-1n WET-(1)1337

E
v

C

O C" ‘

-
mum 0:1. 1 mm

V

 

 

100

'80

60
40
20

'100

80
60

‘40
,20

1 00
80

'60

4O

‘20

197

1 1

0.8’
0.6
0.4‘
0.2}

1
0.8
0.6
0.4
0.2

0

V

mmmmo .I-ZIC

 

 

Elihu”) Dr:

'7 2016' 6673? TV ’-

L‘C‘ﬂm sigma-r. -

V

V
,_
I,

‘ C(l VIC"
V

 

20 '4o’ 60' 80 w

Figure A-3: Continued.

D

01?
(16
04%

 

01!
Of;
ng

:
. o‘
l

0.2‘
. . .

 

01:
01;
02¢

 

 

 

 

IUD (NICE 1 I?

.‘D II: {IIJC’E'ZJUIVQ' ."O (I) -

MID EII'CCID

V
V
0‘

’cﬁ <52

Tilt (1313‘

2130030 Ch? '

3 1.

V
m

'CH <32

(ID 0 QB in

0; am: "NUDE? C’ '1

CH <32

100
‘80
T60
:40
12o

i100
Q30
:60
340
'20

‘100
$80
60
£40
:20

198

1% '
(18 i >i

013‘

021

02:
o:

5).

,l 2

2o 40 so 80

V
. 0. 0m MRmIﬁI-‘r O! C

—
.-

 

20 40 so 80

bp
1: ' .
0.8.1 i
0.67?‘1 _vE
S

<14
<12- *; >
0’ . 1 H
20 4o 60 80
bp

 

 

 

 

“IO!"

Tultﬁi

Tull]

fi‘Kd' -
<10<mnin

Tuttm

Figure A-3: Continued.

G

1.
0.81
0.6i
0.4:
0.2§

03
H

0.8;
0.6;
0.4.
0.2

0.8;
0.6
0.4
0.2;

 

 

 

m _

 

 

 

cm (1 O

V
{IT'IIIIO ‘11) C) ( 13117 3‘31: '1

C1 C2

:11 -

mmuo : 3r: :cm.»: a

V

C1 CZ

IIIIUTO 3 32-0 II"; 3K:

“'1'
-..L

V
mm: )3; ‘ .1}

C1 C2

TC- ‘

’1-
RA

01:34) HIKWXI

:0! PD ‘-

V

'E~.D I ‘1

CY.

(“1 i3?

311 IO? 2.?

V
mum-7

180
60
{40
920

.100
180
360
:40
20

gm
:80
:60
{40
.20

199

 

0.81
0.0- ‘
0.41
0.2

0.6;
0.4»
0.2'

‘1‘, .

I ‘1; ‘k . _
)

man: ”33);“

 

j
20 40 60 80 Tw 01

bp
.3

20 40 60 80 m DI

3
DP:
1

t

4

.

I

‘l

l

D
' (Mid' 1

 

 

20 40 60 80 Tw 01
bp

 

Figure A-4: Robustness of the evolutionary strategy parameter estimation technique on
synthetic data with added noise. (A) No noise, (B) 1% noise, (C) 3% noise, (D) 5% noise,

(E) 10% noise, (F) 20% noise and (G) 30% noise.

200

 

 

0.8-
0.65
0.4'
0.2}

011
or?
011

0.2

0.8%
0.6:
0.4'
0.2-

 

 

 

R

 

 

 

Ch <32

<31AC22

’0 ’0

CH <32

100
-80
:60
740
—20

7 1100

80
60
40
'20

i1 00
80
60
40
20

201

1.
0.8
0.6'
0.4'
0.2

0.

011
or;

0.4
0.2

D

20 40 60 80

‘20"”40"604“00"4

bp

VI

20 40 60 80
bp

 

 

 

iﬁﬁmtn'

Tw DI

Figure A-4: Continued.

D

 

 

 

>0

>0

>0

 

 

 

’0 .0

C1 02

A

pa ,'
C1 C2

.0 ,i
C1 C2

100
580
‘60
:40
E20

100
‘80
'60
:40
520

1 00
8O
60
140
20

202

1?”
>0
0.8'
0.6'
0.4'
0.2:-
0

0.8
0.6
0.4
0.2-

1,
1".
0.8
0.6
0.4
0.2'

O 3

>0

4

 

20 4060 80 Tw 01

bp

 

 

an

20 40 60 80 TW DI

20 4'0 60 80 Tw DI)

bP

Figure A-4: Continued.

G

1
0.81
0.6}

0.2;.

O .

 

>0

R

 

*. pi
C1 C2

1100
00
$60
.0
‘20

203

1,
"I
0.8'
0.6:
0.4.E
0.2%

O ,

 

2'0 40 60 80 m DI
bp

 

Figure A-5: Results of parameter estimation for R, C,, C2 and quenching employing

different schemes. (A) scheme 1, (B) scheme 2, (C) scheme 3, (D) scheme 4, (E) scheme

5, (F) scheme 6, (G) scheme 7, (H) scheme 8 and (I) scheme 9.

204

 

 

 

R

 

 

 

C1 C2

C1 C2

C1 C2

430
$24
{18

12

30
24
'18
12

O

30
3.8
:12

O

205

0.8
0.6-

0.4
0.2

0.8
0.6%
0.4:
0.2

1

0.8-
0.6:-
0.4-
0.2"

0 a

 

2040 60 80 Tw DI:
bP

 

20" 46310-0 ‘80- 1.181“
DP

 

2‘0 40 60 8‘0 Tw 01
bp

Figure A-5: Continued.

D

0.2
0.181
0.12
0.08;

0.2
0.18!
0.12
0.08:
0.04;

 

 

 

 

 

 

C1 C2

C1 CZ

C1 C2

30
24
18
12

30
24
18
412

O

30
24
‘18
12

206

0.8
0.6
0.4

0.2-

0.8

0.6;
0.4,-

0.2

1

0.8-
0.6,
0.41
0.2'

0

 

20 40 60 80 Tw 01
bp

 

20 40 60 80 Tw DI
bP

 

20 40 60 80 1w on
DD

 

 

Figure A-S: Continued.

G

 

 

 

 

 

 

 

 

 

 

0.2 - - A 30 1
0.16; . 24 0.8
0.12 18 0.6' ' . ' . °
2 .0 ° 0 . o
0.08 312 0.4E
0.04, , 8 0.2} '
; . F o ‘
i - . . 1 . A. _ 1 ; _.
0 E R C1 C2 0 O 20 40 60 80 Tw DI
H hp
0.2; - < 4 30 1
0.16} :24 0.8
0.123? 18 0.6 ° 7
1 C O
t . o o
0.08 12 0.4
‘ 1 ‘ O
0.04: o :5 0,2 0
O O .
O E R C1 C2 0 0 20 40 60 80 Tw DI.
I hp
0.2 , 30 1
0.16; 24 0.8:
0.12? =18 06?: °
’ To 0 o ‘
i 0 , 1 o
0.08§ :12 0.4-
C
0.04? . 6 0.2 '
i C . .
0 E R C1 C2 0 O 20 40 60 80 Tw 01

bp

207

Figure A-6: Comparison of average error induced by leave-one-out analysis, employing

nine different formulations of the model. Lower error levels were noted with schemes 1-

4, 8 and 9.

 

0.15

0.1* o o 0

RMSE

0.05'

 

 

 

1234567693
Scheme

208

Table A-1: Ex functions for the synthetic enhancers of the model.

209

 

mu. #9 n mwb new _ u Mb .mbn emu .Gu “MD a new m 95ch 5d
«9+ _9 n mwu ucm e u Mu _No H MC .6“ MC v ucm m mEmcom 5m

#18“ 218.212.212 ....
~91 Ge use 169 C169 91qu tam o m m $5285“.

 

 

 

 

 

 

2518me + NECEWXMWC + “mo + “we + 51% : €31 : uSmdmeom
x 1 Rm 538
A sagas 1 5% 1 a} 1 ammo + Gauges 1 _st 1 an .34»
m G N Go
imw 1:}15MC + 3? _x «T 3MB + 51$? 1 we 1 $1 a + a
53 i x $31 + _ 1 am umammé
51m} 1 s i if 539
_ e -
w n o m _. mEocow 5m
6 358.?
x S x osmomam
N62 1359 + :2 Q i €31 : gammam
x 1 am UNNQEN

manage}? 32e1§wo+§§§1 «1e: :3

 

53mm

 

 

210

 

 

NNCAACCC :iCCC CNI if:

 

 

X ”Rm #0 . . .Hm
NANAACC CAANM1 _CACN1 _CC+ACCCNAN,ANN1 _N1NC: ENC wawNNmCmNAN
_C1 CCC A: @528 Co“.
min: mCN CAM: NATE «32+: wCCiGCNA mm: A<C<m+A
X ”R.“

 

 

AN ACCCNCA1NNCCCANw1_CACN1NCCCNCC+NAACCCNAN1C1AANN _CANN1 CC EC»,

+AmN1CCAAN1_C+ANN1.ACAAN1_CNCCCLCCCNACANN1NN1AN1NC+¢

CSCNCNCNN

 

 

 

.825an ”3A 22:.

211

Table A-II: Parameter assignments for all the genes and schemes. Capital qs represent
the quenching efﬁciency of repressors shown in the gene structure and lower case qs
represent the assigned parameters, which are grouped into different ‘bins’ of distances as

explained in Materials and Methods.

212

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

ocuNo CauNo N Fume NouNo NcuNc £30 £30 CNuNo wcumo
316 NEE Ncﬂo $15 $15 Euro 315 mcﬂc Nana 2
U r1NQ\/a\ ouumo CcuNc :cuNc NauNo NouNo Ccumo CcuNo 938 233
315 315 315 815 315 Nan—o 615 SHE 630 3
No
UH; manna manna :cumo Naumc Sumo manna €18 canoe €18
{wo\Jc\,ﬁo. NcuNo NcuNo Nano Ncumc NcuNo NcuNo NauNo NcuNo NcuNo
Bio 815 315 315 330 315 315 315 315 N.
oumo oumo oumo 018 018 018 oumo oumo onmo
3 £18 muuNo opaumo Caumo oauNo £13 313 .313 game
315 815 Nana A815 315 $15 $15 maﬁa maﬁa t
6 ”ml manna manna A318 318 manna manna manna manna manna
no 6 NauNo Nanmc NcuNc Nanc NcuNo NcuNo Ncumo NcuNo Naumo
815 315 315 315 615 :30 315 315 :30 A:
Ugaul Naﬂo Ncﬂo maﬂo Nana Nana Naﬂo Nana Non—o Nana 2
(p0\\
315 315 315 315 815 315 Suﬁ 315 3qu m
018 018 38 018 013 018 $8 018 ouNo
1E0 315 mauve CACHE £30 315 $15 315 Nana m
318 318 oEnNc CunNc oano mcuNo mouNC mcumc mauNo
No Nana mcﬂo Euro Nana N515 Nona Nan—c €15 maﬁa e
318 $13 mcuNc mcuNo 318 Eng Ncumc Nanc Nauwo
/|\( Ncﬂc NauE vuﬂc muupo mauve Nana Nana Nun—c NuuNo n
_o 6 3an mano CuuNo $18 Nano maumc $13 $13 munmo
NEE Nana maﬁa Naupo Naupo Nana N515 Nun—c NEE N
NcuNo NcuNo NNuNo Nano Ncumo Nanmc NcuNc NauNc NunNo
815 Bio 315 315 315 315 315 615 315 N
2:826 230 am am Nm 8 mm om mm Nm 5 89C

 

 

 

 

 

 

 

 

 

 

 

 

213

Table A-III: Oligo and primer sequences used in this study.

 

DA782 / 783 (2
Twist)

5'-GATCCQAIAIQTTGAGQAIAT_GT
5'-CTAGACATATGCTCAACATATGG

 

 

DA784/ 785 (2 S'-CTAGAQ_QQAIIII_CQQAAATCGA

Dorsal) WCCGC
5’-GGTTGGGTTTTCCCTC GAT'I'I'GGGAAAATCC
CT

DA786/ 787 5’-GATCTGGTTAGTAAGCTGTAAACTG

L2 5bp spacer) 5'—GATCCAGTTTACAGCTTACTAACCA .

 

DA792/793 (2
Giant)

5'-AATTCLA_IGAQQ£AA§AATGCGACTCG
WC
5’-GATCCTCTTGCGTCATACGAGTCGCATTCTTG
CGTCATAG

 

 

 

 

 

 

 

DA1255/ 1256 5'-AATTCIAIGA§QQAA§AATGCGT

£1 Gt-EcoRl) 5’-AATTACGCATTCTTGCGTCATAG

DA1257/ 1258 5’-AATTCIAT_GA§_G_CAA§AG

(1 gt- S'-GATCCTCTTGCGTCATAG

EcoRl/ BamHD

DA(1259/ 1260) 5’-TGIAEA§§§AAQACCGC

(1 gt-Sacll) 5’-GGTCTTGCGTCATACAGA

DA140 3 / 1404 5’-AATTA§ATAI!IQTTGAGQAIAIGTCTAGA

(2 Dl-ZTW) WTCGATTWG
5’AATTCGGGAAAACCCAAAATCGAGGGAT'I‘TTCC
CATCTAGACATATGCTCAACATATGT

DA1405/ 1406 5'-GATCTGGTTAGTAAGCTGTAAACTGGATCTGG

(SObp spacer) TTAGTAAGCTGTAAACTG

 

5'-GATCCAGTTTACAGCTTACTAACCAGATCCAG
TTTACAGCTTACTAACCA

 

214

 

Table A-III: Continued.

 

 

 

 

 

 

 

 

 

 

 

 

10. DA1407/ 1408 5'-TGGTTAGTAAGCTGTAAACTGGATCTGGTTAG
(SObp spacer/th) TAAGCTGTAAACTGWCCGC
S'-GGTCTTGCGTCATACAGTTTACAGCTTACTAA
CCAGATCCAGTTTACAGCTTACTAACCAGC
11. DA1337/1338 5’-AATTCWG
[1 Krueppel) 5'-GATCCGCTTAACCCGTTTTG
12. DA1339/ 1340 5’-MTTCWGACCC
(2 K1”) WC
5’-GATCCGCTTAACCCGTTTTGGGTCGCTTAACC
CGTTTTG
13. DA1341/ 1342 5’-AATTCAAAA§_QG§IIAAQ§GACCC
(3 Kr) WGACCCW
CG
5'-
GATCCGCTTAACCCGTTTTGGGTCGCTTAACCCG
TTTTGGGTCGCTTAACCCGTTTTG
14. DA1352/1353 5'-ACATACTAGTAACCGCTTTAGTCCCGCCAG
(Kni.340bp.SpeI. 5'-ACATACTAGTTGTGCACGGAGCTCCGCGAG
Spacer)
15. DA1833 / 1834 5'-GATCTGGTTAGTAAGCTGTG
(20bp 5’-GATCCACAGCTTACTAACCA
spacer.BamHl)
16. DA1835/ 1836 5’-GATCTGGTTAGTAAGCTGTAAACTCTGGTTAG
(3 5bp TAG
spacer.BamHl) 5’-GATCCTACTAACCAGAGTTTACAGCTTACTAA
CCA
17. DA1837/ 1838 5'-AATTCT1_;G_C_GA§§_CAI_CAATGCGT
(Gt.[at).EcoRl) 5’-AATTACGCATTGATGCGTCGCAAG
18. DA1839/ 1840 S'—AATTCTIQ§_GA§§§AICAATGCGACTCGT
(2Gt.(at).EcoRl / Ba WC
ml-ll) 5’-GATCCTGATGCGTCGCAACGAGTCGCATTGAT
GCGTCGCAAG

 

215

 

 

Table A-IV: Robustness of Evolutionary Strategy (ES) parameter estimation. The
parameter estimation algorithms were run 100 times using the scheme 2 form of the
model on a synthetic data set with increasing amounts of noise. Parameter value
(Standard deviation). Results from parameter estimation using Genetic Algorithm (GA)

and Simulated Annealing (SA) are shown for comparison.

216

 

 

 

 

 

 

 

 

 

 

 

 

 

 

ANNACACC ANNACACC 88.9 68.3 88.9 ASACACC ANNSC ANACZCC AACC ad 8
~85 See ad ad See ad v85 83 ad

ANCACACC ANNACACC ANNACACC ASACC ACACACACC ANACACACC 32.9 :29 AACC No mo
83 88 82C 82C ACACNAC Nd NNNAC CNNAC Nd

AmNACACC ANNACACC ASACC ACACACACC ANACACACC AEACACC ACCACACC ANCACACC AACC NC 5
52C 82C NACNAC 83 82C 80o de CNNAC .3

28.9 ANNACACC :59 68.9 88.9 :39 A99 839 AACC 3C 8
80o and SC CAC N85 83 33d 86 ed

ANSACC ANACACACC ANACACACC ANACACACC ASACACC AACC ARACACC ANCACACC AACC mo No
83 82C 89¢ 83 83 2C 33 N56 no

ACSACC :69 68.9 ANACACACC AEACACC ASACACC ACNACACC AEACACC AACC mo 5
N85 N86 83 83 Name 83 N5. 33 ad

ANmNNC ANNoNC ANNNAC ANNZCC ASN.9 ANNACACC Ammo.mNC A53: AACC N No
:3 :4. 2.8 Ba 8d 83 A: S... 38 m

A8 A. AC ANCCACC ACNNACC A8 A .9 A A SC 832 ACNCNCNC 839 8C m 6
$3 an NSC NNAC.m 89m 53 N85. Nome m

$459 32.9 ASN.9 ANNSC ANCACACC ANNACACC ANS: ACNNNC A9 A: m
98.2 53 83 33C 83 8mm SQNN m3: 2

A 599 ASACACC AACC AoC AACC AACC 88.9 ANSACC AACC AC mmzm
A: 8 Bad Bad N56 N56 33C 83 N86 AC

A.\.ACNCwm $.8wa Aeoosmm A.\.mam 3.9mm Assam A.\.ACC<m A.\.ACC<.C 3.9mm .me

 

 

 

 

 

 

 

 

 

 

 

 

217

Table A-V: Extension of model to Knirps and Krﬁppel short-range repressors.
Parameter estimations were done 1000 times for each formulation of the model (schemes
1-9) to ﬁnd the R (repressor scaling factor) and C (cooperativity) parameters for Knirps
and Kruppel, ﬁxing quenching efﬁciency parameters to those of Giant. All schemes
generated similar cooperativity values for the two proteins, and dissimilar repressor

scaling factors. Parameter value (Standard deviation)

218

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

AmNCC AN: ASC ASACACC ANSC AmACC ANACC AACC
SAC N 8 mood A 8.0 mg NACACAC m
Ade A: ANNC ASACACC ANNACC ANNACC ANACC AACC
N00 3 5 wood 3 one A: NACACAC AC
AN: Am: :99 AmNACC AmNACC AACC
5 ACN 290 Ed 3 NACACAC N
A: AN: A809 ANNACC AACZCC AACC
3 mm 20.0 8C 3 80.0 N
Am. C A A C 38.8 ANNACC AEACC AACC
,3 8 wood I we NACACAC m
ANNACC Am: ANCC AACC ASACC ANACC ANNACC AACC
5.0 0N mm :3 no No 3 NACACAC N
ANACACC AZC ANACACACC ANNACC ANSC AACC
I E 20.0 Ed 3 NACACAC n
:08 ANmACC ANCC AACC ANNACC ANNACC ANSC AACC
NNAC N om SAC No.0 No.0 3 NACACAC N
A39 :5 ANACACACC ANNACC AmtoC AACC
I B BC 8.0 NA 80.0 N
N
No 5 m memmm No 6 m mommm 8.20m
.835. 3:5.

 

 

 

 

 

 

219

      

   

YL

m1111111|111II111111111111111111“

697