PLACE IN RETURN BOX to remove this checkout from your record.
TO AVOID FINES return on or before date due.
MAY BE RECALLED with earlier due date if requested.

 

DATE DUE DATE DUE DATE DUE

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

5/08 K:/Prolecc&Pres/ClRC/DateDuerindd

IMPROVING RISK ASSESSMENT OF TRANSGENE INVASION:
ASSESSING THE ROLE OF UNCERTAINTY IN PREDICTIONS

By

Ashok Ragavendran

A DISSERTATION

Submitted to
Michigan State University
in partial fullﬁllment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Fisheries and Wildlife
2009

.,.§i.

;. ‘3 {l " I
:Ab\\‘, ”I
T V

Alba)”; .‘

ABSTRACT

IMPROVING RISK ASSESSMENT OF TRAN SGENE INVASION:
ASSESSING THE ROLE OF UNCERTAINTY IN PREDICTIONS

By

Ashok Ragavendran

Genetically Engineered organisms (GEOs) have the potential to pose a threat to the envi-
ronment and it is imperative to develop scientiﬁcally sound methodologies for assessing
the associated ecological risks. The main emphasis of ecological assessment is for trans-
genic organisms that have wild relatives and are relatively difﬁcult to contain or have
high dispersal capabilities (e.g., plants, insects and aquatic organisms). Key to managing
environmental threats of GEOs lies in the evaluation of the possibility of transgene intro-
gression into natural populations and this is currently accomplished by using deterministic
models of varying degrees of statistical and computational complexity. Accounting for
uncertainty in these assessments is vital for quantifying risks to make realistic predic-
tions and also to gain insight for guiding further research and effective policy towards
mitigating potential hazards.

Chapter l of this dissertation addresses the issue of quantifying the effects of demo-
graphic stochasticity on predictions for transgene introgression. This work generalizes
existing methodology by incorporating demographic stochasticity using a Monte Carlo
approach. Results show that considerable variation arises in the prediction distributions
of the transgene frequency and extinction probabilities, and also that the effects of de-
mographic stochasticity vary among ﬁtness components. Most importantly, stochasticity
allows for increased persistence of the transgene beyond what the deterministic model
predicts, even for cases where the transgene would have been lost early.

In Chapter 2, another source of variability is addressed, which arises from uncertainty
in the estimated parameters on predictions. We estimate ﬁtness components from exper-

iments, using genetically modiﬁed Zebraﬁsh (Danio rerio), to be used as inputs into the

.‘U k...
r“ ’
‘5'. ..s -5
,.,. a

,1 \

‘_.‘-‘» N

v

, H
fr ’" b
tl‘ -

‘ a

J ‘
L\.‘-4.~
‘1‘, :15.-
v.. _S

w. n v
- .
‘_ ..... -

~

[JPSOn
I “£4

u‘.‘

k“

net ﬁtness component model. Applying bootstrap approaches we show that estimates of
ﬁtness components for the wildtype and transgenic genotypes present considerable varia-
tion, both in absolute and relative values, leading to extra variation in model predictions.
Results also show that even moderate variation in estimates of individual components
can generate large effects at the population level due to non-linearity and the interactions
among genotypes. The model predictions showed good agreement with results observed
in mesocosm experiments, indicating that realistic prediction in the case of GEO risk as-
sessment can be made using carefully constructed experiments and modeling approaches.
In Chapter 2 we also present a Copula methodology to incorporate dependencies among
ﬁtness components due to life-history trade-offs, to assess their effects on model predic-
tions. Results show that assuming independence among ﬁtness components can produce
overly conservative predictions. Further, correlations among absolute ﬁtness components
differ from that among the relative values of ﬁtness components, which are the ﬁnal de-
terminants of transgene ﬁtness.

Based on the simulations for the ﬁrst two research chapters, there arose a clear need to
formalize approaches for exploring the parameter space while reducing the computational
burden. In Chapter 3, a global sensitivity analysis was considered to examine the effect
of parameters on the model. Meta-modeling using a Bayesian Gaussian Process (BGP)
was employed to improve the efﬁciency of sensitivity analysis and thus reduce the overall
computational burden without sacriﬁcing model complexity. The predictions from the
BOP model are shown to provide satisfactory performance as an emulator. The choice
of a Bayesian approach is deliberate as this framework is flexible enough to incorporate
model extensions as well combine outputs across multiple models. Moreover, this choice

will facilitate integration with other areas in future.

This thesis is dedicated to my mother Bhavani Raghavendran and my sister Asha

ACKNOWLEDGEMENTS

First and foremost I would like to acknowledge my family, my mother, sister and
brother, who have made a lot of sacriﬁces during my entire education because of which I
am in this position today. I thank you for your faith and belief in me.

My term at MSU involved being part of two departments, Fisheries and Wildlife and
Animal Science. I have had many memorable interactions in both departments and I
would like to thank everybody who made a difference, both, on a academic and personal
level.

First, I want to thank my advisor, Dr. Guilherme J .M. Rosa. 1 am deeply grateful
for his support, encouragement, motivation and his commitment to me throughout. I also
thank my committee members Dr Jim Bence, Dr. Bryan Epperson, Dr. Kim Scribner,
Dr. Rob Tempelman and Dr. Scott Winterstein, a long list and I have never regretted
it. I can’t thank them enough for coming through time and time again with support,
encouragement and their willingness to discuss ideas anytime. I would like to especially
thank Dr Tempelman for having given me the wonderful opportunity for consulting, not
just for the ﬁnancial support it afforded, but for letting me strengthen my statistical skills
through this avanue. Dr Winterstein also contributed substantially to my ﬁnancial support
by providing teaching assistanships and summer funds and I thank him for his support.

In the FW department, I want to especially thank Dr. Bill Taylor, the past chair, for
support and encouragement when Guilherme moved to Wisconsin and Dr. Mike Jones,
the current chair, for support in the last year of my program. I also thank the FW dept
for providing a wonderful environment during my term here and a special thanks to the
wonderful support staff, especially Jill Cruth and Mary Witchell.

In the Animal Science department, I want to thank Dr. Cathy Ernst for her support
and encouragement in the last year and everybody from the PRP project. I would like to

especially thank Dr Juan Pedro Steibel who has been a constant ﬁgure in discussion and

F5“ 'I’xi .J

1"...- in
i

. u
p¢-tn' ”‘Y‘
’ In I

. . . .
... on

‘thi‘ ,3.
.uwu \a .-‘5

" v
.1," \ 1’)"
‘9 ~h—u\\
. I 5
Li:- a g ' ‘5-
I. I‘ .a
'u A \II‘

help, ﬁrst as my lab mate and later as a mentor in the PRP project, and I have learnt a lot
from him. I would like to thank people from the functional genomics group, especially
Drs Ron Bates,Ted Ferris and Paul Coussens.

I also gratefully acknowledge the support from my collaborators on the Risk Assess-
ment Project, Dr. Bill Muir and Dr. Rick Howard, and the USDA-BRAG for having
funded the project.

Finally, last but not least, I want to acknowledge on a personal note all the people in
my life who have contributed to this endeavour. I would like to thank Punidan Jeyasingh
and Sudeep Perumbakkam, my friends who have always been there for me throughout
and for the many stimulating intellectual discussions we have had. I am deeply grateful
to Chida Ramanathan and his family who have been a second family for me here. I also
want to acknowledge my friends from Chennai, the “Beach ” gang and the “Pondy” gang,
especially Nagarajan, Arun, Srikrishna and Kate, my love, for her support. Lastly, I want
to thank my friends/collaborators at MSU, Jeff Evans, Pat Forsythe and Sean MacEachem,

for the many good times here, academic and otherwise.

vi

List of It
List of f1
Introduc

Chapter 1:

1.! It‘-
13 .\'t
1..

1..

1.3 R,
1..

1..

1.4 D-
Chapterl;
31 It
33 .\1
'1

3..

S.
2.3 R
2

2..

:f
2.4 D
2.

2L

2.~

TABLE OF CONTENTS

List of Tables ................................... viii
List of Figures .................................. x
Introduction ................................... 1

Chapter 1: Demographic stochasticity alters predicted risk after a GEO inva-

sion ................................... 9
Introduction ................................. 9

1.2 Materials and Methods ........................... 12
1.2.1 Stochastic Net Fitness Component Model ............. 12

1.2.2 Monte Carlo Implementation .................... 14

1.3 Results .................................... 16
1.3.1 Fitness Components and Variance ................. 17

1.3.2 Allelic Frequency of the Transgene ................. 18

1.4 Discussion .................................. 19

Chapter 2: Uncertainty in estimates of model parameters in assessing the risk

of transgene invasion: Monte Carlo approaches ........... 30

2.1 Introduction ................................. 30
2.2 Model and Methods ............................. 35
2.2.1 Net Fitness Component Model ................... 35
2.2.2 Experimental Setup ......................... 36
2.2.3 Data Analysis ............................ 37

2.2.4 Monte Carlo Approach ....................... 39

2.3 Results .................................... 42
2.3.1 Independence among vital rates: Non-Parametric Bootstrap . . . 42
2.3.2 Copulas: Dependence among ﬁtness components ......... 44

2.3.3 Validation with Mesocosm Data .................. 45

2.4 Discussion .................................. 46
2.4.1 Dependence vs Independence .................... 47

2.4.2 Considerations of experimental methods .............. 48

2.4.3 Conclusions ............................. 50

Chapter 3: Probabilistic Sensitivity Analyses of Models for Ecological Risk
Assessment: A Case Study Using the Net Fitness Component Model

for Assessing Risk of Genetically Engineered Fish ......... 60
3. 1 Introduction ................................. 60
3.2 Materials and Methods ........................... 65

vii

'1)

iJIJIJIJ
l)———-

'1)
J)

'.oJ

'1)
-L- '

Concluding Rt”
ipptndlt‘tﬁ .

Bibliograph

3.2.1 Gaussian process modeling ..................... 65

3.2.2 The net ﬁtness component model .................. 66

3.2.3 Monte Carlo simulations ...................... 67

3.2.4 Bayesian Gaussian Process (BGP) Approach ........... 68

3.3 Results .................................... 70
3.3.] Global Sensitivity analysis: Monte Carlo approach ........ 70

3.3.2 Bayesian Gaussian Process (BGP) models ............. 71

3.3.2.1 BGP Predictions ..................... 71

3.3.2.2 Global Sensitivity analysis: Gaussian Process approach 72

3.3.3 A Case study: Predictions for a model of transgenic Zebraﬁsh . . 73

3.4 Discussion .................................. 74
Concluding Remarks ................................ 90
Appendices ...................................... 93
Bibliography ................................... 112

viii

Ill

'JJ

J

8'.

Id

1.1

1.2

1.3

2.1

2.2

2.3

LIST OF TABLES

Statistics and quantiles for the distribution of extinction times using a
quasi-extinction threshold of 100 individuals. The last column represents
the percentile of the distribution of extinction times wherein the value

from the deterministic model falls under ...................

Scenario names and the associated components of ﬁtness that were con-
sidered stochastic. The scenario names will be used for reference in the

text. .....................................

Parameter values used in both sets of runs. 83 generations for all Scenar-
ios in Run] including A82. Initial population size of the WW genotype
was 60,000 individuals in an 1:1 sex ratio (30,000 males and 30,000 fe-
males). 60 adult individuals of the ww genotype were introduced into the
population with a 1:] sex ratio in the 65 days age class for simulation 1

and at 10 weeks of age for Run2 .......................

Summary statistics of the bootstrap estimates of ﬁtnesses and /l (the ﬁnite
rate of increase) for the Wildtype (WT) and transgenic (TG) genotypes.
The TG (w allele) was dominant in our case and this assumption was made

for all ﬁtness components in the model. ..................

Summary statistics for the relative ﬁtnesses and A (the ﬁnite rate of in-
crease) of the WT and TG genotypes for all scenarios. The relative ﬁt-
nesses are based on the ratio TG/WT. Although, the average values are
the same for the Copula and the Non-parametric bootstrap approaches, the
distribution of the relative values are diiferent. The same joint distribution
of all the estimated components were used for the scenarios differing in

relative mating success of the WT males ...................

Summary statistics of the ﬁxation times for the WT genotypes. Fixation is
deﬁned as the time where the TG gene frequency < 0.001. The Scenarios
indicate the relative mating success of the male WT genotype and the

scenario suffixed with (C) was implemented using the Copula approach. .

23

25

52

53

i/J

3.1 The mean of the ﬁrst order Sensitivity indices (SI) and the total sensitiv-
ity indices (TI). The subsecript for the ﬁtness components represent the
genotypes: W- WT genotype and T - TG gneotype. The ﬁtness compo-
nents are as follows: J - Juvenile viability, A- adult viability, Fr- Fertility
of male, Fc- Fecundity of females and 8- age of sexual maturity. ..... 79

3.2 Summary statistics for the ﬁtted models GP-MC GP-LHS GP-FULL. Quan-
tiles repesent absolute deviations of predicted values from the true values.
The model represents the deviations from the ﬁtted values on the trans-
formed scale and the training set represent the deviations from the ﬁtted
values back transformed to the original scale ................ 80

3.3 Summary of Global sensitivity analysis for the case study. First order
(SI) and total (TI) sensitivity indices are presented for the FULL-data.
Uniform column represent uniform distributions for both the mating suc-
cessand dominance components, while the Dominace and mating success
columns represent the modiﬁcations of the uncertainty distributions. . . . 81

1.1

1.2

1.3

1.4

LIST OF FIGURES

Distribution of extinction times for scenarios F, M, J and F+M+A+J at the
quasi-extinction threshold of 100 individuals. A kernel density estimate
using a Gaussian kernel is used for smoothing. The vertical bar represents
the time to extinction in the deterministic model at the quasi—extinction
threshold (”det Table 1). Plots showing distributions for all scenarios are

available in the appendix (see Fig. A.1.4 in Appendix ...........

The evolution of variance over time for Scenarios F, M , J and F+M+A+J
in the total population size (Left Column Fig. 2a) and the transgene fre-
quency (right column Fig. 2b). Limit for the x-axis is chosen as the
generation where at least 100 populations were not extinct over all runs
for each scenario. a) The trajectory of the log transformed variance of the
total population size (solid line) compared to the log transformed sum of
the variances (dotted line) over the individual genotypes. b) The trajectory

of the log transformed variance of the transgene frequency. ........

Probability of ﬁxation of the transgene in all scenarios. Proportion of
populations in the 1000 runs (Runl) where the transgene attains ﬁxation
at the quasi-extinction threshold of 100 individuals for all scenarios.

The persistence probability deﬁned as the proportion out of 200 popula-
tions where pw 2 0.01. Contours of the persistence probability of the
transgene after 49 generations (Run2). The lines represent contours of the

persistence probability across each combination of ﬁtness components . .

xi

26

27

28

29

 

 

I J

l J

I; -

I J

‘L/t

I’J

2.1

2.2

2.3

2.4

2.5

Evolution of the TG gene frequency over time for the deterministic mod-
els based on the expected values as well s the average over all bootstrap
replications for the three different values of mating success of WT males
(1.16, 1.93 and 5 respectively). The trajectory for the average over the
bootstrap replications implementing the Copula approach overlays the tra-
jectory for the corresponding independent approach. Legend text should
be interpreted as follows: D - represents deterministic models, B - rep-
resents average over bootstrap replications, C - represents average over
bootstrap replication implementing dependencies; 1.16, 1.93 & 5 - the
numbers represent the mating success of the WT genotype in that sce—
narlo. ....................................

Distribution of the time to ﬁxation of the Wildtype (WT) genotype de-
ﬁned as the time the TG allele frequency is < 0.001 for the three different
mating success values of the WT genotype (1.16, 1.93 and 5 respectively).

Distribution of the ﬁxation times versus the distribution of relative ﬁtness
components and II. Contours represent regions of the joint distribution us-
ing 2D kernel density estimates. Contours in red indicate the distributions
in the scenario incorporating dependence using copula, while the contours
in black represent the corresponding bootstrap scenario (i.e relative WT
male mating success=1.93) ..........................

Pairwise scatterplots of the parameters showing the relationship between
parameters under the independent bootstrap and the copula approach. The
set of scatter plots above the diagonal show parameter combinations un-
der independence. The set of scatterplots below the diagonal show de-
pendence induced among the possible parameter combinations based on
the speciﬁed correlations, here -0.5 between viability and all others, 0.5
between fecundity, fertility and the age of sexual maturity and 0 between
fecundity ad ferility as they are among different sexes ............

Comparison of relative ﬁtness contours between bootstrap and copula
generated parameter distributions. Contours represent the joint distribu-
tion using a 2D kernel density estimate. Contours connected by solid lines
indicate the distributions in the scenario incorporating dependence using
copula, while the contours drawn with dotted lines represent the corre-
sponding bootstrap scenario (i.e relative WT male mating success=1.93).
The contour levels indicate probabilities from 0.1 to 1 with a 10% differ-
ence in probability among adjacent levels. .................

xii

55

56

57

58

I!

.V _

ﬁ“

3).

ﬁ‘.

 

2.6 Comparison of the trajectories of gene frequency of the validation exper-
iment from those generated from the model predictions, with the associ-
ated 99th percentiles of the prediction distribution at each generation. A)
Represents the prediction distribution when the WT has a relative mating
advantage of 1.16 B & D) the relative mating advantage of the WT =1.93
C) WT relative mating advantage =5 and D) Additionally, incorporates
dependence using a copula approach. Legend text in the plots are: repl &
rep2 - Mesocosm replicate,mean - Mean over 1000 bootstrap replicates,
0.5% & 99.5%- the 99% intervals over the bootstrap replications ...... 59

3.1 Global sensitivity analysis using the Monte carlo approach. A) and B)
First order (SI) and total sensitivity (TI) indices for the MC-data respec-
tively C) and D) SI and TI using the full variance from the MC2-data. E)
and F) SI and T1 for sensitivity analysis using the LHS-data. ....... 82

3.2 Bayesian gaussian process model predictions using the model ﬁtted for
the MC-data. A) Predictions from the GP-MC for the data used for model
ﬁtting after transformation B) Backtransformed predicted values to the
orginal scale for the ﬁtted values and the response C) Model predictions
from the test set of the MC-data D) Predictions for the LHS-data as the
test set E) Predictions using MC2-data as the test set F) observed response
versus backtransformed response ...................... 83

3.3 Bayesian gaussian process model predictions using the model ﬁtted for
the LHS-data. a) Predictions from the GP-LHS for the data used for model
ﬁtting after backtransforming predicted values b) Model predictions be-
fore backtransforrning the response to the original scale c) Predictions
using MC-data as the test set d) Predictions using the MC2-data as the
test set .................................... 84

3.4 Sensitivity analysis using the Bayesian gaussian process model for the
MC-data ................................... 85

3.5 Sensitivity analysis using the Bayesian gaussian process model for the
LHS-data .................................. 86

3.6 Prediction plots and Sensitivity analysis using the Bayesian gaussian pro-
cess model for the FULL-data ....................... 87

xiii

 

 

;, Sch~n

F111-

DJ
(7.1

SCU~1
Fl'Ll

\Utti

3.7 Sensitivity analysis using the Bayesian gaussian process model for the
FULL-data using different uncertainty distributions for the dominance pa-
rameter ................................... 88

3.8 Sensitivity analysis using the Bayesian gaussian process model for the
FULL-data using alternate uncertainty distributions for the WT mating
success parameter .............................. 89

A. 1 .1 Distribution of extinction times for all scenarios at a quasi-extinction
threshold of 100 individuals. A kernel density estimate using a Gaussian
kernel is used for smoothing. The vertical bars represent the time to ex-
tinction for the deterministic model. .................... 104

A. 1.2 The evolution of average total population size over time for all scenar-
ios and the quantiles of population size over time. The dashed blue line
represents the evolution from the deterministic model, while the green
lines represent the 5% and 95% quantiles respectively. The red line repre-
sents the average population size over 1000 runs at each time step. Popu-
lation numbers have been log transformed for better visualisation. . . . . 105

A. 1 .3 The evolution of average transgene frequency over time for all scenar-
ios and the quantiles of the transgene frequency at time step. The dashed
blue line represents the evolution from the deterministic model, while the
green lines represent the 5% and 95% quantiles respectively. The red line
represents the average population size over 1000 runs at each time step. . 106

A. '1 .4 The evolution of the coeffecient of variation(%CV) for the runs over
time for scenarios F, M, J and F+M+A+J. Limit for the x-axis is chosen
as the generation where at least 100 populations were not extinct. 1) The
y-axis represents the trajectory of the %CV for the total population size.
2) The y-axis for the right column represents the trajectory of the %CV of
the transgene gene frequency ......................... 107

A - 2.1 Marginal distribution of the bootstrap parameter estimates for the Wild-
type (WT) genotype and the transgenic type (TG). Marginal distributions
for the method using Copulas simulating dependence are labeled with a
”c” subscript.The The same set of parameters were used for three differ-
ent values of realtive mating success of the WT males ( 1.16.1.93 and 5
respectively). ................................ 109

xiv

‘ I".
ilir 1\
.‘. V
\2. a.

A.2.2 Comparison of the ﬁxation time as a function of ﬁtness components
for the Wildtype(WT) genotype. The dashed contour lines represent the
scenario incorporating dependence using the copula approach ....... 110

A23 Comparison of the ﬁxation time as a function of ﬁtness components
for the transgenic (TG) genotype. The dashed contour lines represent the
scenario incorporating dependence using the copula approach ....... 11 1

XV

 

[mmduciiiin

(iment rutc~ of 31.?
“.75 hihitilx haw "C
Iii)? t. lending to in;
giant settlogtt‘ii mt:
Lilik‘ impugn ctulv;
min in the cut
‘}\Iﬁi1‘t\. Ettlit‘gltd
sheen and their .1
sin ruling and pr!
Recently there hm
nitration in CCU!“
mi‘iﬂdllli) integr;
Mini and Hotim.

fundamental comr

none and SUICIN

use in ecological

implant} tBnrnt

Ethnology have a

ml for ecologica

.,_ 1
3:333

in Where a C0

Introduction

Current rates of global environmental change due to anthropogenic modiﬁcation of nat-
ural habitats have resulted in accelerated extinction of natural populations (Gaggiotti,
2003), leading to rapid changes in biotic interactions (Kareiva et a1., 1993) with conse-
quent ecological impacts. To formulate effective policy and regulations that can mitigate
these impacts ecological risk assessments have risen to prominence in the last decade,
especially in the context of new technologies that can potentially further stress natural
systems. Ecological risk assessment focuses on the relationships among one or more
stressors and their associated ecological effects and serves as a guide for regulatory deci-
sion making and provides insights into the possible exposures and concomitant hazards.
Recently, there has been an emphasis on making population-level risk an explicit con-
sideration in ecological risk assessments (Bamthouse and Sorenson, 2007), and more
importantly integrating genetics and population dynamics into model based approaches
(Nacci and Hoffman, 2008). Modern ecological risk assessments include modeling as a
fuI‘Ciarnental component to predict population trajectories and to quantify risks (Bamt-
hOUSe and Sorenson, 2007). Many types of population models have been developed for
use in ecological risk assessment with varying degrees of statistical and computational
Comp] exity (Bamthouse and Sorenson, 2007). In the past decade, advances in computing
teelirlOlogy have allowed for the emergence of complex simulation models as a primary
tool for ecological assessments. Simulations are recognized as a kind of experimental

s
ystel’h where a complex model formulation mimics the behavior of natural systems. This

 

 

‘-r
"t‘\' \l

1" ~““

,..~ ." \ \J
,..h

v" '

i213 ht.‘=i-"‘.~"

. .4 :L
’ik' EVENI-

n 1.‘
1653an C

in under .1
unit cut the
File; Tied
.‘rnps 1:307":
'Aherein pt?
NaCCl 1100
K6} to
{ti

tittmgen

Hate-t. 2'

in

in the time“

can help researchers investigate system behavior by manipulations that would be impos-
sible otherwise, either logistically or ethically (Peck, 2004).

Advances in the biotechnology realm combined with escalating demand for food
worldwide has fostered the growth of transgenic solutions for generating environmen-
tally friendly produce (Aemi, 2004) and is especially true in aquaculture wherein more
than 30 species are awaiting commercial release (Devlin et al., 2006). Transgenic tech-
nology is also gaining ground in livestock (Clark and Whitelaw, 2003) and crop produc-
tion (Pilson and Prendeville, 2004), but has not been widely received by consumers and
no transgenic animal has been approved for food consumption in the USA or elsewhere
primarily due to environmental concerns. The main emphasis of ecological assessment
is for transgenic organisms that have wild relatives and high dispersal capabilities (e.g.,
plants, insects and aquatic organisms; National Research Council, 2004). In this con-
text biological containment strategies will have to be employed if there exists potential
for hybridization with wild populations leading to introgression or speciation (National
Research Council, 2004). Clearly, potential environmental impacts of genetically engi—
neered organisms (GEOs) should be evaluated and regulated using a scientiﬁc approach
towards policy, wherefore research and assessment of impacts should capture the dynam-
ics under a spatio-temporal context relative to a comparator, usually the same organism
without the transgene (see reviews in Andow and Hilbeck, 2004; Andow and Zwahlen,
2006; Tiedje et al., 1989)). Scientiﬁc approaches for assessing the risks of transgenic
crops recognize the need to utilize an ecological basis (e.g. Andow and Hilbeck, 2004)
wherein population-level strategies employ the integration of genetics and demography
(Nacci and Hoffman, 2008).

Key to managing environmental threats of GEOs lies in evaluating the possibility

0f transgene introgression into natural populations and its resultant ecological impact
\Heclrick, 2001). The likelihood of invasion and the persistence of a transgene depends

on the ﬁtness of the transgenic type relative to that of wildtype genotypes (Tiedje et al.,

r41"

 

rte?”

 

.1
,A. I """
‘9th Difi‘n‘w

‘ ‘I \ I ",\ \
Tlfruii‘g'trk ii.\\ \

::.:pt.‘pzltti..~n. 1:13
ind :hcrcfurc c
1': inattenntcnfnl =
tffgxﬂiidilt‘ll'. for a
rising htcttc int."
ecg-v} stern d§ nanz'
Current math.
t'rgmnns are in

:25 . 'mhh. 1‘19—

11“ r ‘
‘ . .
-.' i. and \pdhlt

‘,

.1113; Rtthtcr at
it: maul} dm

n.- ; ' ‘
IntitldiiO-n “Hi

‘13 I,
1‘... mt decade
graditttne the 1

3T1

“E'R‘Nttn for
.anklfulmg ‘-
‘ai 'Mfdthcr

11011

dEBumtu
PCFUiLttton mt

imahilitv 1C

tenth the 4‘
WPUi'citton \t

xiit‘iience 0f

1989). The problem consists of two aspects: (a) prediction regarding the introgression of
the transgene into wild populations and the associated changes in genetic composition of
the population, and (b) predicting changes in ecological interactions following introgres-
sion and therefore ecosystem effects from changes in the wildtype population. Assessing
the environmental effects of GEOs requires information at multiple levels of biological
organization; for example, transgenic individuals may induce population-level effects by
altering biotic interactions (Devlin et al., 2004) which is shown to impact community and
ecosystem dynamics (Kareiva et al., 1993).

Current methodology for assessing the ecological risks and impacts of transgenic
organisms are based on a plethora of modeling frameworks ranging from verbal mod-
els (Knibb, 1997), reaction-diffusion methods (Soboleva et al., 2003), matrix popula-
tion models (Claessen et al., 2005; Garnier and Lecomte, 2006; Muir and Howard, 2001,
2002), and spatial transfer models coupling population dynamics and genetics (Meagher,
2003; Richter and Seppelt, 2004). A recurrent theme amongst these models is that they
are mainly developed for assessing the risk of transgenic crops and focus more on hy-
bridization with wild relatives of the modiﬁed crop species (Hails and Morley, 2005).
The last decade has also seen increased complexity in the modeling frameworks towards
predicting the fate of the transgene introgression into natural populations. Models of in—
trogression for transgenic crops evaluate gene flow based on pollen dispersal, for example,
incorporating exponential distance and directional effects of wind mediated pollen disper-
sal (Meagher, 2003) or transport equations of atmospheric physics coupled with popula-
tion dynamics and genetic models (Richter and Seppelt, 2004). Additionally, structured
pOpulation models combining dispersal (Garnier and Lecomte, 2006) and environmental
variability (Claessen et al., 2005) have also been employed in this context. More re-

cently the AMELIE modeling framework proposed for transgenic crops combines both
wvulation structure, stochasticity and dispersal (Kuparinen and Schurr, 2007). The con-

sequence of increased model complexity are twofold, the number of parameters in the

 

 

3

' A av

r351 thtfw *
1A, A

l
I" YT.) D'I"p~\;:d
.11 i» . A

. ., ; .
3535‘s 1 th‘ 11‘

' Vii ~10
I \ ll .
L1: mm he

*1: terrmtr. 2*.
1'1: pittl‘ﬂt‘
Cd\'~'~t‘ii. 1W
age-tritium
thin}. mini
mile fertility
in iinhtng inc
ewiuttunar}
tidal) entplt
cti‘nxemtinn
Wit to eVah
dd‘tt‘x‘ated it
A char-at
from Chance
innita] and
ﬁtinn 1lithtn
in V1121] rate.

model increases and the relationships among variables need not be linear. For example,
in the proposed AMELIE framework there are 29 parameters and this is the most sophis-
ticated model yet for risk assessment of transgenic crops.

Aquaculture is another area wherein there is a large and increasing diversity of GEOs
awaiting ﬁeld release (Devlin et al., 2006) and there is a relative paucity of modeling
frameworks in this area. Most species used in aquaculture have natural populations in
the wild (e.g. salmon) and therefore, in conjunction with their dispersal characteristics,
create real potential for the transgenes to invade natural systems and cause perturbation at
the community and ecosystem levels. For the case of GE ﬁsh, Muir and Howard (2001;
2002) proposed a generic modeling framework based on matrix population models (sensu.
Caswell, 1989) that incorporates genetics and demography. Their model is based on an
age-structured population model characterized by six net ﬁtness components (juvenile vi-
ability, adult viability, mating advantage, age at sexual maturity, female fecundity and
male fertility, and reproductive longevity). The utility of this modeling framework lies
in linking individual life history traits to population dynamics to provide insight into the
evolutionary fate of populations (Grant and Benton, 2000). Matrix population models are
widely employed in applied ecology to predict the fate of populations of economic and
conservation concern (Tuljapurkar, 1997). This approach provides a reasonable frame-
work to evaluate the probability of gene introgression in natural populations and has been
advocated as a tool for assessing risks of GE crops (Hails and Morley, 2005).

A characteristic feature of biological populations is the inherent variability arising
from chance events that individuals within the population experience inﬂuencing their
survival and reproduction. Sources of demographic variation can be divided into vari-
ation within individuals with the same vital rates (sensu. Pollard (1966)) and variation

in Vital rates among individuals (sensu. Fox and Kendall (2002)). Recent demographic
Studies of natural populations have underscored the importance of the role of variation,

both environmental and demographic, and is shown to be imperative in understanding the

 

 

:l.)" F. .r
‘tnbiiou‘tllﬂx ';

.A V ‘
I " 'l
(rein.n.\3‘

0“"‘ ' ‘ ‘
1 ﬂ ' I .'
5\u\'\ “‘“d I

IV A ' ill .1
Lertnuhid‘

:‘tilﬂt, \iﬂ“

he

or V' ‘ i ‘
i'i".li:\t‘ili

than: and E
aneﬂdufr

tm\80513\i
mgtxxlhit’ i

hix4h~
.mtanCh
n« ' '
.JtUVCd P”

lvvtgatcu

tn dcc'

NYC

‘hr I
I145 ion”

V

re
unhhonur

hatexhrnir

(3:1 ”10111.0

Ti '
r.pttlatton

inﬂuence of anthropogenic modiﬁcations on natural populations (Boyce etal., 2006). For
example, it has been shown that variation in vital rates can signicantly affect outcomes of
demographic models with important ramiﬁcations, such as estimates of population growth
rate (Tuljapurkar et al., 2003) and persistence (Engen et al., 2005b). Uncertainty in es-
timated model parameters is another source of variation for model predictions in using
population models to predict population fate. In the context of matrix population models,
uncertainty in vital rates has been implemented using two broad schemes of resampling.
One involves resampling of the projection matrices over time for a single trajectory of the
forecast and the other involves sampling a random set of projection matrices, each having
their individual trajectories (e.g. Life-stage Simulation Analysis (LSA), Wisdom et al.,
2000). Many methods have been developed to elicit the effect of individual parameters of
the projection matrix on the overall growth rate of populations, such as elasticity analysis
(Grant and Benton, 2000; Caswell, 1989), and utilized in forecasting for management and
conservation (Morris and Doak, 2002). Monte Carlo approaches are speciﬁcally used in
this context to evaluate the effect of uncertainty in parameter estimates, especially in sit-
uations with limited ﬁeld data or when analytical derivations of the model dynamics are
impossible or cumbersome (Tuljapurkar et al., 2003; Caswell, 1989).

It is also widely recognized that dependencies among the vital rates are also impor-
tant, especially for forecasting. Long standing considerations of these dependencies have
focused primarily on temporal correlations among the projection matrices (Tuljapurkar,
1990) accounting for environmentally driven correlation in vital rates, which is shown
to decrease extinction times (Inchausti and Halley, 2003). However, life history theory
has long recognized the importance of associations among vital rates in determining the
evolutionary fates of different genotypes in populations (Stearns, 1992). Recent studies

haVe shown that incorporating associations in the vital rates based on life-history theory
can modify evolutionary stable states (Koons et al., 2008) as well as modify estimates of

pOpulation growth rates (Ramula and Lehtila, 2005). Most approaches utilize an ad-hoc

 

1th attracting I
“mid". {WT-ii“:
:31: gt":rt'.ri‘.".z12
tree. .11“? 1“
if;i}\i\.i‘1.'.i
renew \ee 5.:
populatz. tit
the :hu.\ Llft‘ r
fern‘tpulntm
Siege-~trueture
:5 linear and n'
Enhance-him
recently heron
cemplex mode
niques can he i-
dpprout‘h to cut
P181'1()U\i}’.
‘1'”: hate utili/
3411391} ignoring
1“ Parameter Ext
”1036an cf '-
Pftsistence 0111-

U! . .

gr

ﬁtness of the [full

method of eliciting the role of correlations among the vital rates (e. g. Ramula and Lehtila,
2005; Wisdom et al., 2000), based on linear correlations to measure dependencies.

Using a modeling approach, the ﬁrst step towards understanding system behavior lies
in extracting the relationship between variation in parameters and the resultant variation in
model output, known as uncertainty analysis (Saltelli, 2000). The next step is to examine
the contributions of the parameters to the variation in the output and their relative impor-
tance, also known as sensitivity analysis. A wide array of methods exist for sensitivity
analysis , both local and global, covered by an extensive literature (for a comprehensive
review see Saltelli, 2000). Local sensitivities have been widely used for simple models
of population dynamics as they are derivative based, usually can be analytically derived
and thus are not computationally intensive. For example, elasticity analysis performed
for population viability analysis (Caswell, 1989; Morris and Doak, 2002, PVA; see) using
stage-structured models are local methods, however, this is simple only if the response
is linear and monotonic. Global sensitivity analysis (GSA) for ecological models, using
variance-based techniques (an ANOVA like decomposition of the output variance), has
recently become prominent and provide insight on the relative importance of factors in
complex model settings. An accessible introduction to GSA using variance based tech-
niques can be found in Saltelli et a1. (2008) as well as two recent reviews introducing this
approach to ecological applications (Cariboni et al., 2007; Fieberg and Jenkins, 2005).

Previously, models based evaluation of the risks associated with transgene introgres-
sion have utilized a deterministic framework for the the dynamics of the population,
largely ignoring stochastic variation intrinsic to the populations as well as uncertainty
in parameter estimates. Using matrix projection models with environmental stochastic-
ity, Claessen et a1. (2005) showed that extrinsic variation impacts the invasion risk and

Persistence of feral crops once transgene introgression has taken place. Prediction of
transgene introgression into natural populations is contingent on evaluating the relative

ﬁtness of the transgenic type and the selection dynamics in natural populations. Fitness

of the transgene is associated with pleiotropic effects on ﬁtness components which can
impact risk (Muir and Howard, 2001, 2002). Genotype x Environment (G x E) effects
(Devlin et al., 2006), when environments vary, can further compound the associated risk
and thus necessitates multiple model evaluations. Furthermore, accuracy of assessments
is compounded by the fact that transgenic phenotypic effect can only be deduced from lab-
oratory experiments, therefore creating uncertainties in the estimated parameters. Thus
constructing assessment models for transgene introgression within a predictive frame-
work necessitates research integrating the above facets of uncertainty in conjunction to
provide effective results for regulatory decision-making.

This dissertation research focuses on ﬁlling the lacunae in the existing methodology
for assessing risk of transgene introgression into natural populations. Speciﬁcally, this re-
search address the effects of demographic stochasticity and uncertainty in parameters on
predictions for transgene introgression using a modeling approach. Furthermore, this re-
search also provides some methodology improvements to existing modeling frameworks
so they can implement biological mechanisms in a realistic manner. Speciﬁcally, the fol-
lowing objectives are addressed : 1) Determine the role of demographic stochasticity in
modifying predictions of transgene introgression and the relative contributions of individ-
ual ﬁtness components to the resultant variation, 2) Determine the role of uncertainty in
estimating parameters on model predictions of transgene introgression and 3) Assess the
global effect of parameters on model behavior using sensitivity analysis and the efficiency
of sensitivity analysis through meta-modeling using a Bayesian Gaussian Process, to re-
duce the computational burden. Each Chapter in the thesis addresses a speciﬁc objective.
Chapter 1 and Chapter 2 developed from the void in the existing literature on the impact
of stochasticity for making predictions. While working on Chapter 2, it was clear that it

Was important to account for correlations among ﬁtness components to make predictions

realistic and biologically meaningful. However, current approaches are primarily ad-hoc

in their implementation of correlations and so we alternatively implemented a Copula ap-
proach (For an introduction to the theory of copulas see Nelsen, 2006) as formal method-
ology for incorporating correlations. Finally, based on conducting simulations for the ﬁrst
two research chapters, there arose a clear need to formalize approaches for running them
and thus reduce the overall computational burden without sacriﬁcing model complexity.
This led to the adaptation of meta-modeling using a Bayesian approach in Chapter 3. The
Bayesian framework is ﬂexible enough to incorporate model extensions as well combine
outputs across multiple models, e.g. using hierarchical modeling for population ecology
(Clark, 2003) or model averaging (BMA; Gelman, 2004). With increased computational
power Bayesian approaches are becoming popular for assessments in conservation (Wade,
2000), population modeling (Millar and Meyer, 2000), population genetics (Beaumont
and Rannala, 2004) and evolutionary biology (OHara et al., 2008) and this choice will
facilitate integration with those areas in future. Finally, as part of the research program
a stochastic re-implementation of the net ﬁtness component model(Muir and Howard,
2001) was also created, which will be deployed in the near future as software tool for

decision-makers.

 

 

Che

Den

pret

1.1 1

EV‘dia
57:3 Ere. ir
Tt'degen
Cit}? PmL
$me A
01' Ei‘f'tii
hﬂcrd\ a
513-. me
11511 em in
ipflttdeh
ihﬁ dint-1n

r‘i'igitttixm

Chapter 1

Demographic stochasticity alters

predicted risk after a GEO invasion

1.1 Introduction

Escalating demand for food worldwide has fostered transgenic solutions for generat-
ing environmentally friendly produce, this is especially true in aquaculture (Aemi, 2004).
Transgenic technology is gaining ground in livestock (Clark and Whitelaw, 2003) and
crop production (Pilson and Prendeville, 2004), but has not been widely received by con-
sumers and no transgenic animal has been approved for food consumption in the USA
or elsewhere primarily due to environmental concerns. The potential for environmental
hazards are higher with organisms that have wild relatives and high dispersal capabilities
(e.g., insects and aquatic organisms National Research Council, 2004). Clearly, poten-
tial environmental impacts of GEOs should be evaluated and regulated using a scientiﬁc
approach towards policy, wherefore research and assessment of impacts should capture
the dynamics under a spatio-temporal context relative to a comparator, usually the same

organism without the transgene (see reviews in Andow and Hilbeck, 2004; Andow and

 

 

I
”a ,.—q
i." _‘..“5\

J1: Iii?”
,fjﬂtlf‘; iii.
;mpq;‘1.\ 11

Pye'tit‘
tangent
hmtﬁb
mmtnt
et a]- 3111‘
.lleigher.
Iﬁhlb
dhumn
an age-stru
nhMym

Mhhmh

Zwahlen, 2006; Tiedje et al., 1989). Assessing the environmental effects of GEOs re-
quires information at multiple levels of biological organization; for example, transgenic
individuals may induce population-level effects by altering interactions among individu-
als (Devlin et al., 2004) that can impact community and ecosystem dynamics.

Key to managing environmental threats of GEOs lies in evaluating the possibility
of transgene introgression into natural populations and its resultant ecological impact
(Hedrick, 2001). The likelihood of invasion and the persistence of a transgene depends on
the ﬁtness of the transgenic type relative to that of wildtype genotypes (Tiedje et al., 1989).
The problem consists of two aspects: (a) prediction regarding the introgression of the
transgene into wild populations and the associated changes in genetic composition of the
population, and (b) predicting changes in ecological interactions following introgression
and therefore ecosystem effects from changes in the wildtype population. If it can be
shown that the probability of transgene introgression is null or very low then ecological
impacts need not be considered (Muir and Howard, 2004).

Previous studies have used a plethora of modeling approaches to evaluate the fate of a
transgene in populations (Hails and Morley, 2005). Models range in varying complexity
from verbal models (Knibb, 1997), reaction-diffusion methods (Soboleva et al., 2003),
matrix population models (Claessen et al., 2005; Gamier and Lecomte, 2006; Gamier
et al., 2006), and spatial transfer models coupled with population dynamics and genetics
(Meagher, 2003; Richter and Seppelt, 2004). For the case of GE ﬁsh, Muir and Howard
2001; 2002 found that the risk associated with transgene introgression depends on the
pleiotropic effects of a transgene on ﬁtness components. Their analysis was based on
an age-structured population model characterized by six net ﬁtness components (juvenile
viability, adult viability, mating advantage, age at sexual maturity, female fecundity and

male fertility, and reproductive longevity). They show that population extinction could

10

result when the transgene was inﬂuenced by both sexual selection through increased mat-
ing advantage of male transgenics and natural selection through reduced viability of ju-
venile transgenic offspring, the ”Trojan Gene” effect (Muir and Howard, 1999). Their
model falls under the aegis of matrix population models (Caswell, l989sensu. ), the util-
ity of which lies in linking individual life history traits to population dynamics (Grant and
Benton, 2000) and provides insight into the evolutionary fate of populations. Currently
these models are widely employed in applied ecology to predict the fate of populations of
economic and conservation concern (Tuljapurkar, 1997). This approach provides a rea-
sonable framework to evaluate the probability of gene introgression in natural populations
(Hails and Morley, 2005).

Muir and Howard’s model (Muir and Howard, 2001) can be considered an extension
of matrix population models to multiple genotypes in a deterministic setting. Real popu-
lations, however, consist of a discrete number of individuals that experience chance events
which inﬂuence their survival and reproduction, resulting in demographic stochasticity.
Earlier models of transgene introgression have been based primarily on a deterministic
approach. A notable exception is the use of matrix projection models with environmental
stochasticity (Claessen et al., 2005), which show that stochasticity impacts the invasion
risk and persistence of feral crops once transgene introgression has taken place. Sources
of demographic variation can be divided into variation within individuals with the same
vital rates (sensu. Pollard, 1966) and variation in vital rates among individuals (sensu. Fox
and Kendall, 2002). Most work on stochasticity in age-structured populations evaluates
the impacts of stochastic variation in vital rates (the parameters of the projection matrix)
(Tuljapurkar, 1997). However, almost all considerations of stochasticity on population dy-
namics largely ignore genetic variation within populations. Contrastingly, classical pop-
ulation genetics focuses mainly on the dynamics of allele frequencies and only implicitly
incorporates the effect of population size and age structure. Stochasticity in these models

is largely a function of random sampling of gametes from an ﬁnite pool i.e random genetic

ll

"c; \‘
.15
J‘ i
s. u...
o _,_l
‘w‘k..-

1"). J )1
uk; 3 u

1.2

1.2,]

Our n

7nd Hot.

drift and gloss over the effect of population dynamics (Ewens, 2004). Emerging studies
combining aspects of both, genetics and population dynamics, suggest that the interplay
can have signiﬁcant impacts. For example, demographic stochasticity can inﬂuence the
effective population size in age-structured populations (Engen et al., 2005a) and selection
for stable population growth rates can favor genotypes with lower variance in vital rates
(Shpak, 2007). Prediction of transgene introgression into natural pOpulations is contin-
gent on evaluating the relative ﬁtness of the transgenic type and the selection dynamics
in natural populations. Furthermore, accuracy of assessments is compounded by the fact
that transgenic phenotypic effect can only be deduced from laboratory experiments, there-
fore creating uncertainties in relation to Genotype X Environment (G x E) effects (Devlin
et al., 2006). Thus constructing assessment models for transgene introgression within a
stochastic framework necessitates understanding the role of stochasticity in both genetics
and demography.

In this study, we investigate the effect of demographic stochasticity, primarily within
individual variation in age structured models on the probability of ﬁxation of a trans-
gene using a Monte Carlo implementation. To this end we pose the following questions:
How does demographic stochasticity affect predictions of extinction times relative to a
purely deterministic model? What are the relative contributions of ﬁtness components to
stochastic variation, both individually and jointly? And ﬁnally, what is the impact on the

probability of ﬁxation and the dynamics of the transgene?

1.2 Materials and Methods

1.2.1 Stochastic Net Fitness Component Model

Our model is an extension of Muir and Howards Net Fitness component approach (Muir

and Howard, 2001). A brief description of the deterministic model follows, for more

12

details see Muir and Howard 2001 and see Appendix A.

We assume a diallelic model conferring three genotypes in the population (transgenic

homozygote ww, wildtype homozygote WW, and the heterozygote Ww). The dynamics of

the population of each genotype, comprised of a vector of age classes, can be formulated

as:

Reproduction
t + 1 WW WW WW
"0,WW:FI :f(N xN and

t+1
n0,ww

__ WW.
—Ft

Survival

~t+1

 

 

Nww x NWW)

 

-——-) ———>
and NWW x NWW)

f1

 

——>

 

 

t,

7‘ > ____,
1+ 1W 2 Fth .f(ﬁww xNWwJVWW X Nww

W ——-> —-) -——>
N W x NW and NW x NWW)

 

(1.1)

where the F ‘5 represents reproductive output for each genotype g (see Appendix A equa-

tion 5) at time t and are nonlinear functions of the reproducing individuals in all age

classes over all three genotypes; the Sg submatrices represent the age-speciﬁc survival

__)
parameters for each genotype g and ng represents the population vectors of each geno-

type. The model is not analytically tractable when there are multiple genotypes, except

under very simpliﬁed assumptions (Bodmer, 1965). Details on an comprehensive review

of the population genetic analysis of age-structured populations in the deterministic con-

text can be found in Charlesworth (1994).

The implementation of demographic stochasticity follows Pollard (1966), wherein the

13

 

 

uuuu

I:
ltl'.r:‘p' ll
'1'. "’

t u" l‘

1.2.2 .1

lie axe.
tion 13.1er
ﬁle 1.31 at
Zhlf\i‘:ttid (5
Melt 351.1
gngennd u
Iti’tlthttps.

T110 dill
ttfeleten dit
mentniinn oi

Steeexs. A1 '
_ a C

Ctilflt’ipond [r

8

number of individuals n a t’

within an age class a at time t, are random variables. The el-

ements of the projection Matrix A (i.e. age speciﬁc vital rates) are parameters underlying

8

a I over time and are therefore constant in all simula-
9

the distributions generating the n
tions. For example, the number of individuals surviving from age 1 (n1) to the next age
is a random draw from a binomial distribution, i.e. (n2 ~ Bin(n., s)), where s is the sur-
vival parameter (probability of survival) for all age 1 individuals. Similarly, stochasticity
related to the other components in the model are implemented using a Poisson process
for female fecundity, binomial processes adult and juvenile survival, and a multinomial
process for mating success (also see Appendix B for a more detailed explanation). Male

fertility and age of sexual maturity components were the only ﬁtness components that

were applied in a deterministic manner.

1.2.2 Monte Carlo Implementation

We used a Monte Carlo approach to understand the dynamics in the total popula-
tion based on demographic stochasticity applied to the three genotypes in the popula-
tion. A thousand runs were performed for each scenario (see paragraph below; Ta-
ble 1.2) and each run stopped only when the total population reached the extinction
threshold or reached a maximum of 20,000 time steps which corresponds to approxi-
mately 250 generations. The model was implemented in the C++ programming lan-
guage and all random number generators were used from the GSL libraries (Galassi et al.,
2003)(http://www.gnu.org/software/gsl/).

Two different sets of runs (hereafter Runl and Run2) were performed. Runl consist
of eleven different scenarios of which the ﬁrst four scenarios comprise stochastic imple-
mentation of each particular component singly (Scenarios F for Fecundity, M for Mating
Success, A and J for adult and juvenile viability respectively). The next ﬁve scenarios

correspond to stochastic implementations of combinations of components, to evaluate the

14

 

 

‘.‘,.t \ if
. , 't
kluv— ‘
I

q).— 1
kg... 5

‘3‘ fttiit"

Fri ‘
Mutt.

-. ’
Jib-’3‘ 11,

U “.9 . .'
..i..|.hd\“

joint contributions of each ﬁtness component to the overall variation (e.g. Fecundity and
Juvenile viability: Scenario F+J; detailed notation in Table 2). Scenario 10 (3Mill) is
an implementation of stochasticity in all components with a larger starting population
size of three million individuals for the WW genotype and Scenario 11 (A82) was with
a reduced age structure (weekly rates instead of daily). We used the same parameters
for the ﬁtness components for all scenarios based on a single combination of transgenic
juvenile viability and transgenic mating advantage (2-fold) that leads to extinction of the
total population (see Table 1.3 for paramter values used). Parameters for the simulations
were based on the values estimated for transgenic Medaka (Muir and Howard, 2001). A
numerical search algorithm was used to initially adjust the juvenile survival of the WW
genotype to start the initial population (N WW only) at a stationary stable age distribution
(i.e., AWW ~ 1). Adult male and female transgenic individuals (NWW) were introduced
into the population at 0.001% of the WW population size (60,000 individuals in a 1:1
sex ratio). Generation time was calculated as the average age of females to replace the
population (see Caswell, 1989, pg. 110) and all generation times reported are in terms
of the WW population only. We used quasi-extinction times deﬁned as the time to reach
a threshold population size to alleviate confounding between the effects of demographic
stochasticity and error propagation due to rounding.

The second set of runs (Run2) was performed over a two-dimensional grid of param-
eter values for juvenile survival and mating success of the ww genotype (Table 1.3). To
speed up the simulations we used a weekly instead of daily age structure and the other pa-
rameters were adjusted accordingly. For each combination of juvenile survival and mating

success, we ran 200 simulations to obtain the distributions at that particular combination.

15

 

iihennsn'
{if} ﬂic\tfa'
.epenji\«xa

Atnurc
ﬁme~nqxxi

thwtoqua~

tunanon an
i31h\eh.7
‘IS‘E'Iabie
{mnnfqud\
theiﬁfp, Cit
94189111 Cum
{11111 an} CUI‘

(hhm'q
%ﬁhhcsoi
orhlandiih
‘See Ft
Ma

0
t'r

A+J.F,
Fig. A.1.1 _~.

1.3 Results

We present population trajectories and time to extinction for only four of the scenarios:
fecundity (F), mating success (M), juvenile viability (J), and F+M+A(adult viability)+J,
in the main text (see Table 1.2 for scenario descriptions). We focus on these scenarios as
they illustrate the main patterns in the results, but do present plots for all scenarios in the
appendix (see Fig. A. 1.1), and highlight some speciﬁc results for other scenarios.

At a threshold population size of ~ 100 individuals, the distribution of quasi-extinction
times was skewed to the right (Fig. 1.1; see also the quantiles in Table 1.1). The average
time to quasi-extinction (hereafter #QE’ calculated over all the 1000 runs) was centered
on the quasi-extinction time from the deterministic model (hereafter pdet) for scenarios
involving mating success and fecundity only (scenarios F & M, Fig. 1.1). Additionally,
variation around pQ E for these components was low (%CV 1.3%, 0.6% and 1.26% re-
spectively, Table 1). In all other scenarios, the ,uQ E was lower than, but close to, the ”det
(:t:5% Table 1.1). However, pdet was in the upper tail (2 75’hpercentile) of the distribu-
tion of quasi extinction times from the stochastic simulations, except for scenario J where
the ﬂdet coincides with ,uQ E (Fig. 1.1, Table 1.1, [.ldet percentile). The viability com-
ponent contributed most variation in the time to extinction, speciﬁcally juvenile viability
and any combination including it.

Other scenarios have similar trends to those used above, e.g., the distribution and
statistics of (quasi) extinction times for scenario F+M were similar to those of scenario F
or M and likewise scenarios A82 and 3mill are qualitatively similar to scenario F+M+A+J
(See Fig. A.1.1). Combinations with juvenile viability, i.e., scenarios with +J: A+J,
M+A+J, F+A+J, had very similar distributions and moments to those of scenario J (See

Fig. A.1.1 scenarios J, A+J, M+A+J, F+A+J, M+F+A+J).

16

 

{5 \LTF'UTICK

P41 n.
in the La
errnzrihtitton
Yhen adult \‘i
wrench: 1‘01
it‘ll) simulati
Variance i.
later time inet
ifig. 1.21M.
pitpnlntion \i/c
E'T‘,1U‘~'enile I‘iaf
mmPUidliﬂn \i
””708 exponent
ImPOmtntiy,

tenantes of the

C1”diiiete'alter it i

1.3.1 Fitness Components and Variance

When one ﬁtness component was stochastic at a time, stochasticity in juvenile viability
(scenario J) led to the highest variance in trajectories of total population size and gene
frequencies, and stochasticity in fecundity (F) and mating success (M) led to the lowest
variances (Fig. 1.2a & Fig. 1.2b, See Fig. A. 1.2 & Fig. A.1.3 as well).

Stochasticity in fecundity and mating success components make signiﬁcant contribu-
tions only when the population size is very small (Fig. 1.23, Scenarios F & M). Ana-
lytically, it can be shown that stochasticity in these components contributes to variation
only when the number of reproducing individuals are very low (see Appendix C) and this
is supported by our simulations, singly (scenarios F & M) or in combination (scenario
F+M).

In the case of viability, as expected (see Appendix C), juvenile viability has a larger
contribution to variance in population size (Fig. 1.2a) and transgene frequency (Fig. 1.2b)
than adult viability. In some circumstances, adult age classes contributed little to the
variances; for example, in scenario A in which only two populations went extinct out of
1000 simulations (see Fig. A. 1 .2.

Variance in total population size and transgene frequency follows a convex trajectory
over time increasing to a peak initially and then declining in all scenarios (Fig. 1.2a
& Fig. 1.2b). Declining population size results in concurrent increase of the %CV in
population size for all scenarios; however, in scenarios with a stochastic implementation
of juvenile viability, either jointly or singly (scenarios J, F+A+J, M+A+J etc), the %CV
in population size was _>. 50% at half the time to extinction, and the rate of increase was
almost exponential (See Fig. A. 1 .4a & Fig. A.1.4b scenario J and scenario F+M+A+J).

Importantly, the variance in total population size is initially lower than the sum of
variances of the individual genotype populations (around 15-20 generations, Fig. 1.2a)

and thereafter it is greater than the sum of the variances. This implies that the covariance

17

 

tinge trans;
mutton in
than lIOT \t‘t‘
\u"iill\ [\Ct

An imp.
'he former. [
did Iilt‘ pro?“
Furthemn in
{ten (Pr: PM
.\1*A*J. F4.
fit} or matir.

pmNehee o

of the WW genotype with the other genotypes changes both in sign and magnitude over

time when the allelic diversity is highest at the transgenic locus.

1.3.2 Allelic Frequency of the Transgene

In the deterministic model, there was no ﬁxation of the transgene; the maximum trans-
gene frequency equalled 0.75, before the total population became extinct. Under stochas-
ticity, it is expected that the distribution of maximum transgene frequency (max(pww))
to be centered around the deterministic value. For all scenarios, the evolution of the av-
erage transgene frequency (pww) closely follows the deterministic trajectory. However,
variation in the transgene frequency over time was lower for scenarios F, M, and F+M,
than for scenarios that included stochasticity in those components, plus stochasticity in
viability (scenarios F+J, M+J, and F+M+J; Fig 1.2b).

An important distinction between the stochastic and deterministic models is that, in
the former, transgene frequencies can go to ﬁxation in individual populations (pww = 1),
and the probability of ﬁxation depends on the components that are stochastic (Fig. 1.4).
Furthermore, an important consequence of stochasticity is that the probability of ﬁxa-
tion (Pr(pWW = 1)) was high when viability components were stochastic (scenarios J,
M+A+J, F+M+A +J; Fig. 1.4). There was no ﬁxation of the transgene when only fecun-
dity or mating success components were stochastic (scenarios F, M, F+M; Fig. 1.4). The
persistence of the transgene was assessed as the proportion of simulations where the trans-
gene frequency (pww) is _>. 0.01 in 49 generations (Fig. 1.4). In the deterministic case
pww = 0.00176 (at w juvenile viability=0.65, ww mating success=5) in 49 generations,

i.e. the transgene frequency is lO-fold lower than the persistence threshold. Contrastingly,

in the stochastic case 1.5% of the 200 simulated populations had pww 2 0.01.

18

1.4 Discussion

The environmental risk associated with transgene introgression into natural populations

can be characterized using a simple probabilistic approach (Muir and Howard, 2002) as

Risk 0c Pr(HarmlHazard) >< Pr(Hazard) (1.2)

where Harm denotes the possible community impacts and Hazard denotes the possibility
of gene introgression and the symbol | represents the conditional operator. Risk can be
difﬁcult to characterize given the potential for impacts at multiple levels of biological
organization, from populations to communities, except under straightforward situations,
e.g. the Trojan Gene hypothesis (Muir and Howard, 1999), wherein population extinction
occurs. Alternatively, if the probability of transgene introgression (Pr(Hazard)) is zero
(or close to zero) then the Risk is also very small.

Our results show that that predictions from a purely deterministic model can severely
underestimate the possible risks of extinction (Fig. 1.1; Scenario F+M+A+J). Demo-
graphic stochasticity is usually considered to be more important than other forms of
stochastic variation for small population sizes, usually 5 100 individuals (Lande, 1993).
It has been shown that the variance due to demographic stochasticity can contribute to
uncertainty in estimates of extinction times in age-structured populations (Engen et al.,
2005b). This also applies to the multi-genotype case and is supported in our simulations
wherein the (quasi) extinction times from the deterministic model are usually in the upper
percentiles of the distribution (Table 1.1).

Transgene introgression acts as a perturbation and modiﬁes the variances in predic-
tions of population size (Fig. 1.2a) which depends on the covariances among genotypic

frequencies. Although we are unable to derive any explicit analytical explanation for

19

 

. ”’1': ‘7
\ 35“" I

ﬁe- 4‘ "TI-1"

L; QC“:“ : J

"1’” -IV>‘

RAF. '1“ \
\-

. i :5" IF. I
$1“de I;igl_' L

mn MT 1

l ;— T
N111" sr‘c‘\

imsmn and
tree han‘est
Muir. 20111 .'

We see 1|
:inn prohibit
31bpring trar
adult tramgm
This is it Chin
mlittle. it j.
from neppcn
titlixcnres ”'16

.r
hemograph V
g ,E e

A” imprin

PICtmme hi "h ..
c k

this phenomenon, it occurs under all scenarios and is independent of the form of age-

structure of the population or population size. Recent studies have reconsidered the role

of demographic variation in the context of predicting population dynamics, especially

when age structure (Engen et al., 2005b) or mating system (Legendre et al., 1999) are

taken into account. Demographic variances can range from 20-40% of the population

size (Saether et al., 2004) and can slow down biological invasions (Snyder, 2003), re-

duce stochastic growth rates (Engen et al., 2005a) and impact demographic parameters in
natural populations (e.g. avian populations (Saether and Engen, 2002) and Scandinavian
brown bear populations (Saether et al., 1998)). Effective population size (Ne) is shown to
be inversely related the demographic variance in age-structured populations (Engen et al.,
2005a). Therefore, our results suggest that transgene invasion into wildtype populations
has the potential to impact Ne and that demographic stochasticity can play an important
role in this process. This needs to be accounted for in risk prediction models of transgene
invasion and is especially valid within the context of using transgenic technology to re-
duce harvesting pressure on declining natural populations, such as the ﬁsheries industry
(Muir, 2004).

We see in our simulations that demographic stochasticity can impact predicted ﬁxa-
tion probabilities (Fig. 1.4), even in the absence of random sampling of alleles to form
Offspring (random genetic drift). The simulations are based on a one time introduction of
adult transgenic individuals and a low starting transgene allele frequency (pww = 0.001).
This is a conservative approach, both in the number and frequency of introduction. For
eXample, it is estimated that approximately 2 million farmed salmon escape every year
from net-pen aquaculture into north Atlantic populations (Naylor et al., 2005). This un-

der Scores the importance for utilizing a stochastic framework to account for the role of
demography even when assessing baseline risks of transgene introgression.
An important insight derived from our results is that demographic stochasticity can

pr on‘l<)l:e higher persistence of the transgene and make a signiﬁcant contribution to the

20

,a

' ﬂ
“n.3,? 11': 1

.C'si‘ier‘e ti

' j ‘ :hou
if \ldi‘tm‘ '

mg. et '41..

gipciattom l

Grier.- :. c
to CEO. cut
efeetixe app
19951. In Ct
axing Monte
i: presihie It
31314;. The n

genetics and

liDJit‘x dill:
73651an av
Eli' —

dil pinpula

nth can be it

19921E100 lht

116$ underl \‘ir

variance in predicted allele frequencies (Fig. 1.4). Demographic stochasticity elevates
persistence of the transgene even when the overall ﬁtness of the transgene is lower than
the wildtype and predicted to be lost quite early in the deterministic context. This poses
an additional risk at two levels; First, under the context of G x E effects this creates
a potential for selection pressures on the transgenic type to vary over space and time
based on dynamic environmental regimes. Current rates of global environmental change
due to human populations can bring about rapid changes in biotic interactions (Kareiva
et al., 1993) and elevated persistence of the transgene may necessitate modiﬁcation of the
estimated risk of impacts. Secondly, recent microevolutionary studies have questioned
the stability of the genetic variance-covariance matrix (G-matrix (Ferriere et al., 2004;
Jones et al., 2003)) and therefore, higher persistence of a transgene combined with the
pleiotropic effects has the potential to transform the adaptive landscape of introgressed
populations through the G-matrix.

Given the uncertainties associated with the evaluation of environmental risks attributed
to GEOs, concern has been expressed that experimentation and modeling may not be an
effective approach to make quantitative predictions and sound decisions (Kareiva et al.,
1996). In complex systems where experiments are impossible to conduct, simulations
using Monte Carlo techniques can provide an avenue of experimental evaluation making
it possible to derive insight into the plausible sets among competing hypothesis (Peck,
2004). The net ﬁtness component approach provides an useful framework to link both the
genetics and population dynamics and our results show that each ﬁtness component con—
tributes differently to demographic variances. Furthermore, it has been shown that there
are strong associations between demographic variance and life history characteristics (e. g
aVian populations (Saether et al., 2004)). Extensions incorporating life history trade-

Offs can be linked to the vital rates through the fecundity and survival functions (Roff,
1992) and therefore our modeling framework can also be useful in evaluating uncertain-

ties underlying the risk of transgene introgression based on life history characteristics.

21

In conclusion, we can say that demographic stochasticity makes a signiﬁcant contribu-
tion to uncertainty in predicting evolutionary trajectories in transgene introgression. This
certainly needs to be accounted for in evaluating the associated risks and the Net Fitness
component model provides a ﬂexible framework not only to incorporate various sources

of stochasticity but also additional life history information.

22

 

 

Ar]

F41
FrA’J
f+.\l+.~\*
NisAeJ
Small

AS:

i471? ii 81..

extinction 1hr.

distnhutinn at

 

 

Scenario Mean Sthev %CV Variance Qu. 5% Qu. 95% Pdet ”der
percentile
F 62.504 0.671 1.073 0.450 61.422 63.603 62.060 0.257
M 62.088 0.397 0.640 0.158 61.470 62.748 62.060 0.490
J 65.487 20.414 31.172 416.714 53.277 79.471 62.060 0.500
A+J 58.698 13.799 23.509 190.415 48.334 72.062 62.060 0.779
F+M 62.506 0.787 1.260 0.620 61.277 63.904 62.060 0.300
F+A+J 58.859 17.552 29.821 308.071 48.081 70.560 62.060 0.760
F+M+A+J 58.144 10.810 18.592 116.862 48.380 70.084 62.060 0.781
M+A+J 58.273 14.307 24.552 204.687 47.780 71.217 62.060 0.775
3mi11 83.473 9.087 10.886 82.573 75.373 93.139 87.349 0.822
AS2 65.333 18.971 29.037 359.889 57.004 74.641 63.662 0.501

 

 

 

 

 

 

 

 

 

 

Table 1.1: Statistics and quantiles for the distribution of extinction times using a quasi-
extinction threshold of 100 individuals. The last column represents the percentile of the
distribution of extinction times wherein the value from the deterministic model falls under.

23

Table 1.3: Seen.
ateehnxtie. The

 

Stochastic Scenarios
Components

 

Fecundity

Mating Success

Juvenile Viability

Adult Viability

Fecundity and F+
Mating Success

Adult and Juvenile A+J
Viability

Fecundity, Adult F+A+J
and Juvenile Viability

Mating Success, Adult M+A+J
and Juvenile Viability

Fecundity, Mating Success F+M+A+J
Adult and Juvenile Viability

z>s~le

F+M+A+J:Initial population 3Mill
3 million individuals
F+M+A+J: Weekly A32

Age-Structure

 

Table 1.2: Scenario names and the associated components of ﬁtness that were considered
stochastic. The scenario names will be used for reference in the text.

24

Railinciu
:r .tfl 1:1 \
Eff-01);? \\

f ' .T
“3‘1 88111.4".

 

__——i

Parameters
Simulation Genotype Juvenile Adult Female Male Age of Mating ﬁnal

 

 

set viability viability Fecundity Fertility maturity Success Age

WW 0.9324 0.95 8.8 1 64 d. 1 280 d.

Runl Ww 0.9292 0.95 8.8 l 64 d. 2 280 (1.

WW 0.9292 0.95 8.8 1 64 d. 2 280 (1.

WW 0.7432 0.698 8.8 1 9 wks 1 31 wks

Run2 Ww 0.61-0.75 0.698 8.8 1 9 wks 1-5 31 wks
by 0.01 by 0.2

ww 0.61-0.75 0.698 8.8 1 9 wks 1-5 31 wks
by 0.01 by 0.2

 

Table 1.3: Parameter values used in both sets of runs. 83 generations for all Scenarios in
Run 1 including AS2. Initial population size of the WW genotype was 60,000 individuals
in an I :1 sex ratio (30,000 males and 30,000 females). 60 adult individuals of the ww
genotype were introduced into the population with a 1:1 sex ratio in the 65 days age class
for simulation 1 and at 10 weeks of age for Run2.

25

(iIIIII‘Hly
(1 1 (l 2’ (1 ii (1 11 f)

()0

Ofﬂi

(104

donsny

Ofﬁ?

(700

Figure 1.1;
quU‘CXIin
sittr, heme]
the detemt:
di‘mbUtior

Scenario: F Scenarlo: M

 

 

 

 

 

 

 

 

 

 

 

 

 

 

0..
IIQ‘ T-
o co
v“ oi
o co
.é‘co 0”
'o w. :9
cl
,_ (‘1.
o" o
o o
o‘ - c . . d‘ . . . - .
61 62 63 64 61.0 61.5 62.0 62.5 63.0
0 Scenario: J Scenario: F+M+A+J
o.
o' 8.
o'
v .
°= s
2‘0 -‘
.a + o
g .
‘° 8, a.
0 c5
1
o o
o. o_.
o' e o

 

 

 

 

 

 

 

so 8'0 160 120 40 so 8'0 160 120
generation generation

Figure 1.1: Distribution of extinction times for scenarios F, M, J and F+M+A+J at the
ClililSi—extinction threshold of 100 individuals. A kernel density estimate using a Gaus-
sian kernel is used for smoothing. The vertical bar represents the time to extinction in
the deterministic model at the quasi-extinction threshold (”der Table 1). Plots showing
distributions for all scenarios are available in the appendix (see Fig. A. 1 .4 in Appendix

26

’7

16¢“.& e

AOCCCEC>

w-
Vaeno_

.0.

0 NJ.

...4Jn {ft .' .l..1
Anxzecrzu>v©nz

7

..
i

111 “Vh J
QOCGZG>an¥

:

. .tlhio

 

e.

we
3%

0.411. .1.
2..-. mw) ®O~O

ure 1.2

Em

»

x .n
a e a
) '1.“
r1 .n e
e ,a m
C r\ .6.an
n
,d he Kn.

 

 

    

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Tr? ' Scenario: F

8‘” “.1

Eco

9w 5}

8’N . .0

-d Scenario: F .7: /"‘
0 2'0 4'0 6'0 80 (T 20 4'0 6'0 at)

53

1g _

coo -- Lr').

5.9

is ‘ o

2;» '7“

g .

—0 Scenario: M ‘3 Scenario: M
0 1'0 2'0 3'0 4'0 50 6'0 0 1'0 2'0 3'0 40 50 6'0

1.0 O-

3;:-

8 Le

gr:

3.. a

8’

-0 Scenario: J . E?” j Scenario: J
0 2'0 4'0 00 0 2'0 4'0 6'0

6‘2-

0

.62-

Sun

0)

2° Scenario: F+M+A+J “3 Scenario: F+M+A+J
0 1'0 2'0 3'0 4'0 50 6'0 70 0 1'0 20 3'0 4'0 50 so 70

 

Figure 1.2: The evolution of variance over time for Scenarios F, M , J and F+M+A+J in
the total population size (Left Column Fig. 2a) and the transgene frequency (right column
Fig. 2b). Limit for the x-axis is chosen as the generation where at least 100 populations
were not extinct over all runs for each scenario. a) The trajectory of the log transformed
variance of the total population size (solid line) compared to the log transformed sum
of the variances (dotted line) over the individual genotypes. b) The trajectory of the log
transformed variance of the transgene frequency.

27

one

ure 1.3.:

.ttttttmvmw.mu

 

..cm..o - -| my OPAO
COEOQOLQ

11 8,6...

in the

110113.

hold (_

lhIC\

o._25

3mill

0.20

proportion
0 15

 

 

M+A+J
o J
0' F+A+J F+M+A+J
In
Q .
o
o
q .
0

Scenario

Figure 1.3: Probability of ﬁxation of the transgene in all scenarios. Proportion of popula-
tions in the 1000 runs (Runl) where the transgene attains ﬁxation at the quasi-extinction
threshold of 100 individuals for all scenarios.

28

.3323 32:22.

MuTum

mating success

 

0.62 0.64 0.66 0.68 0.70 0.72 0.74
Juvenile viability

Figure 1.4: The persistence probability deﬁned as the proportion out of 200 populations
where pww _>_ 0.01. Contours of the persistence probability 0f the transgene after 49
generations (Run2). The lines represent contours of the persistence probability across
each combination of ﬁtness components

29

Chapter 2

Uncertainty in estimates of model
parameters in assessing the risk of
transgene invasion: Monte Carlo

approaches

2.1 Introduction

Ecological risk assessement focuses on the relationships among one or more stressors
and their associated ecological effects. It serves as a guide for regulatory decision mak-
ing and provides insights into the possible exposures and concomitant hazards. Recently,
there has been an emphasis on making population-level risk an explicit consideration in
ecological risk assessments (Barnthouse and Sorenson, 2007), and more importantly on
integrating genetics and population dynamics into model based approaches (Nacci and
Hoffman, 2008). Many types of population models have been developed for use in eco-
logical risk assessment. For a recent review and classiﬁcation of such models see, for

example Barnthouse and Sorenson (2007). Risk assessment of the environmental hazards

30

of genetically engineered organisms (GEOs) is one such area wherein population-level
strategies need to be employed and this necessitates the integration of genetics and de-
mography. Scientiﬁc approaches for assessing the risks of transgenic crops recognize the
need to utilize an ecological basis (e.g. Andow and Hilbeck, 2004). The main emphasis
is on transgenic organisms with dispersal strategies wherefore biological containment is
imperative because they have wild relatives (e.g. plants, insects, aquatic organisms and
microbes) with potential for hybridization leading to introgression or speciation (National
Research Council, 2004).

Current methodology for assessing the ecological risks and impacts of transgenic
organisms are based on a plethora of modeling frameworks ranging from verbal mod-
els (Knibb, 1997), reaction-diffusion methods (Soboleva et al., 2003), matrix popula-
tion models (Claessen et al., 2005; Gamier and Lecomte, 2006; Muir and Howard, 2001,
2002), and spatial transfer models coupled with population dynamics and genetics (Richter
and Seppelt, 2004; Meagher, 2003). A recurrent theme amongst these models is that they
are mainly developed for assessing the risk of transgenic crops and focus on transgene
introgression into wild relatives of the modiﬁed crop species (Hails and Morley, 2005).
Aquaculture is another area wherein there is a large and increasing diversity of GEOs
awaiting ﬁeld release (Devlin et al., 2006) and there is a relative paucity of modeling
frameworks in this area. Most species used in aquaculture have natural populations in
the wild (e. g. salmon) and this, in conjunction with their dispersal characteristics, creates
real potential for the transgenes to invade natural systems and cause perturbation at the
community and ecosystem levels.

For the case of genetically engineered (GE) ﬁsh, Muir and Howard (2001; 2002) pro-
posed a generic modeling framework based on matrix population models (sensu.Caswell,
1989) that incorporates genetics and demography. Their model is based on an age-
structured population model characterized by six net ﬁtness components (juvenile via-

bility, adult viability, mating advantage, age at sexual maturity, female fecundity and

31

male fertility, and reproductive longevity). The utility of this modeling framework lies in
linking individual life history traits to population dynamics to provide insight into the evo-
lutionary fate of populations (Grant and Benton, 2000). Muir and Howard (2001) found
that the risk associated with transgene introgression depends on the pleiotropic effects of
a transgene on ﬁtness components. Additionally, they showed that population extinction
results when the transgene is selected through increased mating advantage of transgenic
males, even with lower relative viability of transgenic juveniles, the ”Trojan Gene” effect
(Muir and Howard, 1999). Currently, matrix population models are widely employed in
applied ecology to predict the fate of populations of economic and conservation concern
(Tuljapurkar, 1997) and thus are also advocated as a reasonable framework for assessing
risks of GE crops (Hails and Morley, 2005).

Hitherto, most models for the evaluation of the risks associated with transgene intro-
gression have utilized a deterministic framework largely ignoring any stochastic varia—
tion, both intrinsic and extrinsic, underlying the dynamics. However, recent advances in
demographic theory have underscored the importance of the role of variation, both en-
vironmental and demographic, and is expounded to be imperative in understanding the
inﬂuence of anthropogenic modiﬁcations on natural populations (Boyce et al., 2006). For
example, it has been shown that variation in vital rates can signicantly affect outcomes of
demographic models with important ramiﬁcations, such as estimates of population growth
rate (Tuljapurkar et al., 2003) and persistence (Engen et al., 2005b). This is also relevant
in the context of GEO risk assessment and it has been shown previously that stochasticity,
both demographic (e.g.see Chapter 1) and environmental (e.g. Claessen etal., 2005), can
inﬂuence persistence probabilities, gene introgression and extinction times.

Uncertainty of model parameters is another source of variation for model predictions
of GEO risk and modeling approaches for transgene invasion have made feeble attempts
to account for this in predictive assessments (e.g. Claessen et al., 2005; Kuparinen and

Schurr, 2007). In the context of matrix population models, uncertainty in vital rates has

32

 

~41 tr.
11 "w

bee.
25¢ Pit-1‘“
jll‘nf‘iie‘ will
pet-cries '0‘;
Th ll'lL‘kir?‘ i‘r'
racist-Cs 11.1)
1%.: ninrrix l
and Benton.
senatien I.\1
ce—rtext to ex
Zions vtrrh ill
Elfptﬁxsli‘ic 0
Along w.
sized that Lit
forecasting e
late iOL'cht‘
.ial‘urkar. 19‘.
fer tmlrnnnj
1111168 iiﬂt‘hul
'anized the j
fates of diiicr
iitlltl‘tns mild 7,:
, L,
based on life
i“ We“ modj

false '
' {e.g. Ra:

been implemented using two broad schemes of resampling. One involves resampling
of the projection matrices over time for a single trajectory of the forecast and the other
involves sampling a random set of projection matrices, each having their individual tra-
jectories (e.g. Life-stage Simulation Analysis (LSA), Wisdom et al., 2000). However,
this incorporates only the effect of environmental variation through the vital rates. Many
methods have been developed to elicit the effect of individual parameters of the projec-
tion matrix on the overall growth rate of populations, such as elasticity analysis (Grant
and Benton, 2000; Caswell, 1989), and utilized in forecasting for management and con-
servation (Morris and Doak, 2002). Monte Carlo approaches are speciﬁcally used in this
context to evaluate the effect of uncertainty in parameter estimates, especially in situa-
tions with limited ﬁeld data or when analytical derivations of the model dynamics are
impossible or cumbersome (Tuljapurkar et al., 2003; Caswell, 1989).

Along with uncertainty in the parameter estimation process, it has also been recog-
nized that dependencies among the vital rates are also important, especially under the
forecasting context. On one hand, long standing considerations of these dependencies
have focused primarily on temporal correlations among the projection matrices (Tul-
japurkar, 1990) as opposed to among the individual vital rates themselves. This accounts
for environmentally driven correlation in vital rates and is shown to decrease extinction
times (Inchausti and Halley, 2003). On the other hand, life history theory has long rec-
ognized the importance of associations among vital rates in determining the evolutionary
fates of different genotypes in populations (Stearns, 1992). In the context of matrix popu-
lations models, recent studies have shown that incorporating associations in the vital rates
based on life-history theory can modify evolutionary stable states (Koons et al., 2008)
as well modify estimates of population growth rates (Ramula and Lehtila, 2005). Most
approaches utilize an ad-hoc method of eliciting the role of correlations among the vital

rates (e.g. Ramula and Lehtila, 2005; Wisdom et al., 2000), based on linear correlations

33

to measure dependencies. Linear correlation is widely used, with its popularity stem-
ming from the ease of calculation. Although it is a natural scalar measure of dependence
in elliptical distributions (e.g. the multivariate normal and t distributions), it also has the
limitation that it is meaningful only on the linear scale. Therefore, using linear correlation
when random variables are not jointly elliptically distributed might prove very misleading
(Embrechts et al., 2003). While being largely unknown in the ecological literature, cop-
ulas have gained prominence for modeling dependence and are being extensively used in
the ﬁnancial literature (6. g in risk management Embrechts et al., 2003) and the hydrolog-
ical sciences (Genest and Favre, 2007). Copulas provide a natural statistical framework
for jointly modeling multivariate distributions that incorporate dependence without re-
strictions on the nature of the univariate marginal distributions. There are many excellent
reviews in this area (e.g. Genest and MacKay, 1986) and we shall not attempt to do so
here (For an introduction to the theory of copulas see Nelsen, 2006).

In this study, we evaluate the effect of uncertainty in parameter estimates on predic-
tions of transgene invasion into natural populations using Monte Carlo approaches. We
utilize data collected from laboratory experiments to estimate ﬁtness components for GE
zebraﬁsh (Danio rerio). We use a non-parametric bootstrap approach (Efron and Tibshi-
rani, 1998) to study effects of uncertainty in the estimated parameters on predictions of
transgene introgression. Evaluations are restricted to the uncertainty in estimated ﬁtness
components among the genotypes in the population and the concurrent changes in gene
frequencies over time. Since the experimental setup utilized allowed only independent
estimation of ﬁtness components, we propose an extension to the bootstrap methodology
using a copula based approach to assess the effect of possible dependencies among the
ﬁtness components. We examine if the combination of the two approaches can provide
a more realistic methodology for improving predictions in the risk assessment of GB or-
ganisms and therefore, more broadly be used as a general framework in applications of

structured population models for conservation.

34

2.2 Model and Methods

2.2.1 Net Fitness Component Model

Our model is based on Muir and Howard’s Net Fitness component approach (Muir and
Howard, 2001), and a brief description of its deterministic version is given below. For
further details see Muir and Howard (Muir and Howard, 2001).

We assume a diallelic model with three genotypes in the population (transgenic ho-
mozygote ww, wildtype homozygote WW, and the heterozygote Ww). The dynamics of
the population of each genotype, comprising of the vector of age classes, can be formu-

lated as:

Reproduction

 

7‘ > —> ———>.
nfﬁjw = FIW W : f (NWW x NWW and NWW x NWW)

 

> %
t+ 1w ___ Fth .f(ﬁww x NWw’NWW X Wu»

 

 

no, W .
and N WW x N WW)
, —" ———> > >
nt+1 = Fww if NWw waw and Nww waw
O,ww 1
Survival
. _, -r+1 . . - ___, .1
"WW SWW 0 0 "WW
—-> ——>
”WW 2 0 SWW 0 x "WW (2.1)
nww 0 0 SWW nww
J L -

 

 

 

 

 

 

where g = WW, Ww or w are the wildtype, heterozygote and transgenic genotypes re-
spectively; the F tg (g = WW, Ww or ww) represents reproductive output for each geno—
type g at time t and are nonlinear functions of the individuals of reprodutive age across

all three genotypes; the Sg submatrices represent the age-speciﬁc survival parameters for

35

each genotype g; and I}? represents the population vectors of each genotype. The model is
not analytically tractable when there are multiple genotypes, except under very simpliﬁed
assumptions (Bodmer, 1965). Comprehensive review of the population genetic analysis
of age-structured populations in the deterministic context can be found in Charlesworth
(1994). Hereafter the WW genotype will be referred to as the WT genotype and the ww
genotype as the TG genotype respectively.

2.2.2 Experimental Setup

The experimental setup is described in Muir et.al.(2006). Brieﬂy, ﬁtness components
were measured for zebraﬁsh individuals (Danio rerio) for the transgenic (TG) and the
Wildtype (WT) genotype in the laboratory as follows. Age of maturity was measured for
10 females as the day on which they started laying eggs. Fertility of males and fecundity
of females for the TG genotype was estimated from reciprocal crosses of WT males with
TG females and vice versa. A total of 20 mating pairs were used for each cross. From
an additional experiment, crosses of WT x WT matings (10 replicates) were available
and these data were used to estimate the fecundity and fertility components for the WT
genotypes. To estimate viability, 50 individuals (1:1 sex ratio) of each genotype (TG 2
reps; WT 4 reps) were followed from the fry stage and the number surviving in each
replicate tank counted at approximately 30 day intervals. The replicate was assumed to
have matured once females in each replicate started laying eggs. This experiment was
terminated once the females stopped laying eggs after 501 days. To estimate mating
success three different ratios of WTzTG (10:2, 5:5, 2:10) males were mated with 12 WT
females with two replicates for each ratio. The total number of eggs laid were collected

and the offspring genotypic proportions recorded after hatching.

Validation Experiment: Mesocosm: Two replicated mesocosms were set up with a

starting population size of 500 adults (1:1 sex ratio) with 5:1 ratio of WTzTG genotypes.

36

Each mesocosm was followed for 10 generations and the time to loss of the transgene, and
the frequency of WT to TG genotypes was recorded. The mesocosm protocol involved
collecting the fry from each replicate and randomly selecting individuals from each repli-
cation to start the next generation with the same starting population size. Thus effectively
we monitor the change in gene frequency over generations independent of population

regulation.

2.2.3 Data Analysis

For each ﬁtness component speciﬁc probability models underlying the data generating
mechanism were assumed and a maximum likelihood approach was used to estimate the
parameters. Standard distributions generally used for modeling applications in ecology
were considered as follows: Fecundity ~ P0isson(A); Fertility ~ Beta(a,,3); Age of
Sexual Maturity ~ L0gN0rmal(,u,o-2); Viability, both juvenile and adult, were assumed
to be a non-linear function of age with Gaussian errors, as detailed below. Where possible
analyses were conducted accounting for possible interactions among the genotypes, e.g.
the Age of Sexual Maturity component was analyzed separately for the WT x WT, WT x
TG and TGXTG combinations, using an ANOVA decomposition to test for any reciprocal
effects. A brief description of the analysis for each component follows below.

Age Of Sexual Maturity: Based on the empirical distribution of the data, we assumed
a Lognormal distribution for age of sexual maturity and analyzed the data with a linear
model after log transforming the response variable. The location and scale parameters for
the log-normal distribution were obtained by back-transformation using bias correction
(Beauchamp and Olson, 1973).

Fecundity: Fecundity of females was assumed to follow a Poisson distribution with
parameter A, denoting the expected number of eggs produced per female over the life-

time. To estimate A a generalized linear model with a log link function was speciﬁed.

37

Reciprocal effects of the genotype on fecundity was assessed by using the genotype of
the mate as a factor.

Fertility: Fertility of males was assumed to follow a Beta distribution with parameters
a and [3, where the average fertility is given by 3%}, the expectation from the Beta
distribution.

Viability: Viability was modeled as an age dependent mortality function of the number
of individuals entering a cohort using non-linear regression. We assumed a non-linear
model with Gaussian residuals, with the mortality over age as an exponential function of
cohort size and a constant mortality rate (z), which translates into N, = Noe—zt + ei,where
t = 1, 2, 3... are the age classes; z represents the mortality parameter, assumed constant
for all age intervals; and e,- ~ N(0, 0'2).

Mating Success: Since there were only two replicates for each combination of WTzTG
proportions, the relative mating advantage was calculated for each replicate and averaged
over the replicates for each combination. The average was compared to the expected pro-
portions using a chi-square test. For each replicate the expected proportion of offspring
genotypes was calculated as the product of the frequency of WT males and the number of
offspring. The WT genotype was chosen since it was the recessive genotype and therefore
the phenotype uniquely identiﬁes the genotype. The ratio of the expected proportion to
the observed proportion of WT offspring gives the relative mating advantage (disadvan-
tage) of the WT genotype. The underlying assumptions are that there are no segregation
distortion, probability of fertilization and hatching are the same for all genotypes.

All analysis of the experimental data were done using the statistical package R (Team,
2009). The parameters of the Beta distribution were estimated using a maximum likeli-
hood approach by the fitdstr function from the MASS library. For nonlinear regression

the nls function from the nlme package was used.

38

 

H-l

D‘-.

\un'Pam
art in ‘

\-

‘,7"
', 9 \CL‘
f.Fes

_ I
qr: ..
"a
5‘4
1 ‘7‘
Lil bk.‘ L
t.

TTA P0151“:

51‘ Cast“ ("Iii

Dependmd'
eenpﬁncnh
med a mum)
.13“ COmP‘I’n‘
using linear Ct
and Lehtila. I
11.31113117131- '
dependencies 2
Variables. The
are not rextricte
her of existing r
mere distributio
inra review .1- Br
random variable

19861. if F (X l '

2.2.4 Monte Carlo Approach

Non-Parametric Bootstrap: Independence among vital rates We implemented a
script in R to estimate the bootstrap distributions of the estimated parameters for the
ﬁtness components. For each ﬁtness component the data were sampled with replacement
and then analyzed with the corresponding statistical model to get parameter estimates for
each bootstrap sample. We used 1000 bootstrap samples for estimating the uncertainty
in the parameter estimates. Alternatively, for the viability component, since the available
dataset was fairly small across the life-span of the experiment we resampled the residuals
from the ﬁtted model to generate the bootstrap distributions (Efron and Tibshirani, 1998).
This bootstrap strategy may involve some extra model assumptions, thus it was used in

this case only because of the limited experimental samples size.For estimating viability.

Dependencies using Copulas To assess the effect of dependencies among the ﬁtness
components we also extended the bootstrap methodology using a copula approach. We
used a multivariate copula to incorporate dependencies among the joint distribution of ﬁt-
ness components. Previously dependencies have been accounted for in an ad-hoc fashion
using linear correlations, either through incorporating pair-wise correlations (e. g. Ramula
and Lehtila, 2005) or a transformation from the multivariate normal (e.g. Morris and
Doak, 2002). The copula approach provides a formal statistical methodology to estimate
dependencies as well generate samples from the joint distribution of correlated random
variables. The principal advantages of this approach are that the marginals distributions
are not restricted and can be from any family of distributions. Copulas encompass a num-
ber of existing multivariate distributions and provide a ﬂexible framework for generating
more distributions(see Genest and Favre, 2007; Nelsen, 2006; Embrechts et al., 2003,
for a review). Brieﬂy, a copula is a function that deﬁnes the joint distribution of a set of
random variables based on their univariate marginal distributions (Genest and MacKay,

1986). If F (X) and G(Y) are the marginal distributions of two random variables X and

39

l nth a jig:
51.1.;th of '.
'3: maﬁa.
We live-ti
$47.1?in j'fp:
GJL“lJTT Ct.

5.315015 u 1}

{1‘37“ 'Sit’c

there 1‘ repr
and 4 reprext
rexpeeri‘. e1}:
4151': I
The cnpt.

ing the unde
en11eeted is r
tion we \amp

05' FOI siml

mean and \a.
ten: hiked 0
Pics “1111 1le

(‘3.
l mpgnems.

p

ated in Our ex.

Y with a joint density H(X, Y), then we can rewrite the joint distribution H(X, Y) as a
function of the marginals H(X, Y) = C(F (X ), G(Y)), where the function C(-, -) represents
the copula.

We used the R package copula (Yan, 2007) for working with copulas. To generate
samples from the joint multivariate distribution of the ﬁtness components, we assumed a
Gaussian Copula for the Fertility, Fecundity, Age of Sexual Maturity and Viability com-
ponents with correlation structure as shown below, based on potential trade-offs in life
history (Stearns, 1992): ‘

1 P12 P13 P14
P21 1 P23 P24
P31 P32 1 P34

 

 

_ P41 P42 P43 1 ,
where p represents the correlation coeﬂicient among pair of ﬁtness components; 1, 2, 3
and 4 represent the Fertility, Fecundity, Viability and Age of Sexual Maturity components
respectively; and p12 = p21 = 0, p13 = p31 2 ——0.5, p14 2 p41 2 0.5, p23 = p32 2
—0.5, p24 = p42 2 0.5 and p34 :2 p43 = —0.5.

The copula approach allows us to generate replicate multivariate samples incorporat-
ing the underlying correlation structure, similar to a parametric bootstrap. The data we
collected is not amenable for directly estimating the copula parameters and for illustra-
tion we sampled from a multivariate distribution with absolute values of correlations set to
0.5. For simplicity, we assumed the viability parameters were normally distributed, with
mean and variance based on the estimates from the original data. All other parameters
were based on the estimated marginal distributions of the data. A 1000 bootstrap sam-
ples with 100 observations each were generated from the joint distribution of the ﬁtness
components, and the same statistical models as the section above were used to estimate
the ﬁtness parameters. Sample sizes were unequal among the ﬁtness components evalu-

ated in our experiments. Therefore, to reproduce the marginal distributions in the copula

4O

..‘ r. ,
1an] Akin L .5.

. .3
T‘I‘i') ‘3!»‘
\l .i s
1

Simulations:
2e: fine“ c...
away. The
rep for a pa
Lang-ans for e.
and error pr.
1":\It‘ati of .1.
faith {0 prU
3:35 stmctar:
Item Sigemd
the WPUlati.
.3131) lieu re
lation boner
:iiiilh. Our
Ads Unchdn‘
“Cﬁklg i‘L‘L‘L
eiabiliry to ‘
indj‘idual 3
Mail 1996
em 0f the r
daily milde]
daily mOdcl

0pmmizillior

approach that were similar to the bootstrap distributions of estimates, we used adjusted

sample sizes (~ 100) for all components.

Simulations For each set of bootstrap estimates of the ﬁtness components we ran the
net ﬁtness component model to generate distributions of the evolution of transgene fre-
quency. The estimated values for the mating success component was based only on two
reps for a particular WTzTG combination. Therefore we ran three different sets of simu-
lations for each estimated value of the WTzTG combination. To reduce model complexity
and error propagation we used a reduced age-structure, so the model represented weekly
instead of daily time steps. This necessitated re-parameterization of the estimated vital
rates to ensure that the dynamics of the reduced model closely follows that of the original
age structure. Under the theory of matrix population models (Caswell, 1989), the domi-
nant eigenvalue (/l) of the projection matrix is representative of the long-term dynamics of
the population and is based on the primitivity and irreducibility of the projection matrix.
Any new representation should try to mimic the dynamics of the original parameteri-
zation based on xi, the generation time and the age distribution (Yearsley and Fletcher,
2002). Our re-parameterization was carried out as follows: Estimated fertility of males
was unchanged and estimated fecundity of females was rescaled by a factor of 7 to reﬂect
weekly fecundity instead of daily fecundity. Adult viability was rescaled by raising the
viability to the 7th power, so that the overall survival of the adults over the adult stage of
individual genotypes in the population remained the same. We used a Genetic Algorithm
(Wall, 1996) to optimize the juvenile viability parameter such that the resultant Ii param-
eter of the reparameterized model and the generation times were equivalent to that of the
daily model, to reproduce the same dynamics. This resulted in the optimum WT juvenile
viability being lower than the TG juvenile viability in the weekly model relative to the
daily model. The juvenile viability was the parameter chosen to be modiﬁed during the

optimization since it was the worst estimated ﬁtness component due to small sample size

41

i.e. there were only 6 data points over 2 reps for the TG genotype and 12 points over 4
reps for the WT genotype to estimate this component, over ~ 90 and ~ 60 day intervals

respectively.

2.3 Results

Estimates of ﬁtness components for the Transgenic (TG) and the Wildtype (WT) geno-
types are given in Table 2.1. For all ﬁtness components the WT had higher relative ﬁtness
values and therefore in this case it is expected that the transgene will be lost from the
population. The mating success for each of the different proportions (WTzTG combina-
tions) were different as there was some frequency dependence in mate choice (Table 2.1).
Calculations of the estimated asymptotic growth rate (RWT 21.0422 and ’lTG =1.0213)
based on average parameter values indicated that WT was the favored allele. Additionally,
the WT males had greater relative mating advantage taking on three possible values (1.16,
1.93 and 5 times that of TG). The deterministic model based on the estimated values of
the ﬁtness components estimated the transgene lost (ﬁxation of the WT allele W) within
16.17, 6.68, 3.69 generations with increasing mating advantage of the WT males respec-
tively. All generation times are reported with respect to the WT genotype and calculated

based on the average time for a female to replace herself (Caswell pg 110).

2.3.1 Independence among vital rates: Non-Parametric Bootstrap

Distribution of Parameters The distribution of the bootstrap estimates of the ﬁtness
component parameters are given with the associated summary statistics (Table 2.1; Fig.
A.2.l). As mentioned above, not only does the WT genotype have higher ﬁtness on
average over all the individual components (Table 1), but also it has higher ﬁtness under
all combinations of the ﬁtness components. This is shown by the low probability of the

relative values (TG/WT) of the ﬁtness components being < 1 for all sets of the bootstrap

42

shew?» 1. Ti

:egatne b1.

extra tariat

tripinenn
the .‘lttminar
'ift’i' T116 .
~ 972 baxe
reiariiel} \n‘.
dwmies. T
‘i change in
3.31.
Transgene fr
gill?“ ill Figul
adl"antage of l
decreaxex in a
the WT 891ml;
5111016 [0 the lit
iiitl suﬁaCe 2%]

0 .
ﬁlters ing

estimates (Table 2). Since the WT genotype has a higher relative male mating advantage,
it is expected that the transgene will be lost under all conditions (Fig. 2.1), i.e there is
no possibility of a “Trojan Gene” effect (Muir and Howard, 1999). With the exception
of female fecundity, the distributions of the ﬁtness component estimates had a relatively
low variation around the expected values as seen by the %CV of the estimates (Table
2.1). Fecundity estimates for both WT and TG had a greater than expected variance
due to overdispersion relative to the Poisson distribution in the original data (results not
shown). This can be easily accommodated by using a different probability model, e.g. the
negative binomial, although in our case the nonparametric bootstrap method captures this
extra variation (Table 2.1; %CV for fecundity). Uncertainty in the estimates of the ﬁtness
components results in variation in the estimated As for each genotype,calculated from

the dominant eigenvalue of the projection matrix, and also their relative values, hereafter

 

’irel- The Arejs ranged from ~94%-99% (:TG respectively Table 2.2) compared to
~ 97% based on the averages for the individual genotypes. While the shift in ’irel is
relatively small (~ :l:3%) across all simulations, this induces substantial changes in system
dynamics. These changes can be seen by the large %CV of the ﬁxation times, relative to

% change in ’irel across all three scenarios of mating advantage for WT males (Table

2.3).

Transgene frequency The distribution of the time to ﬁxation for the WT genotype is
given in Figure 2.2 for the three different values of male mating success. As the mating
advantage of the WT genotype increases the average time to ﬁxation of the WT genotype
decreases in a nonlinear fashion. Concurrently, the variance in the time to ﬁxation of
the WT genotype also decreases. For both genotypes the ﬁxation times are relatively
stable to the variation across any individual ﬁtness component as seen by the relatively
ﬂat surface 2-dimensional kernel density estimates,except for the fecundity of the TG

genotype (Fig. A22 and A23). This is also the case for the relative values (Fig. 2.3)

43

Diarih

. '31."
37:.“
.

dppft‘i {it‘ h
ditty are it
This call PK
button 0i- "
scales iFig.
modify the
of their dist;
lation tRhtu.‘
scale the con
tit'e Values of
also induces t
Cti'mponents it
distension is ret
tune to the ind

‘ 40% reductit.

 

and is independent of the values of male mating success. The distribution of ﬁxation times
of the WT genotype show dependence only with respect to the distribution of the As of
the WT and TG genotypes and this relationship also holds for the ’irels (Fig. A.2.2, Fig.
A.2.3 & Fig. 2.3).

2.3.2 Copulas: Dependence among ﬁtness components

Distribution of Parameters By design the marginal distributions of the estimated pa-
rameters were elicited to be similar to that of the independent conﬁguration (Fig. A.2.l;
Table 2.1), however, the COpula induces correlations among the parameters that restricts
the multivariate distributions to a reduced subset of possible conﬁgurations (Fig. 2.4). As
expected, the pairwise joint distribution of the parameters are different under the copula
approach even though the marginal distributions are the same. Since fertility and fecun—
dity are for different sexes we did not simulate them to have any correlations, although
this can be easily accommodated in this approach. The relationship of the joint distri-
bution of relative ﬁtness parameters are not similar to their relationship at the absolute
scales (Fig. 2.5 vs Fig. 2.4). Including dependencies among the ﬁtness components can
modify the joint distribution of the relative ﬁtnesses as revealed by pairwise comparison
of their distributions (Fig. 2.5), which exhibit negative correlations (Spearman’s corre-
lation (Rho); Table 2.3b). For example, in our case we see that although in the absolute
scale the correlation between fertility and sexual maturity is positive (Fig. 2.4), the rela-
tive values of these ﬁtness components are negatively correlated (Fig. 2.5). Dependencies
also induces changes in the marginal distributions of the relative ﬁtness values for all
components with increased dispersion (Table 2.2), except for sexual maturity where the
dispersion is reduced. Additionally,dependencies restrict the possible values of A re ls, rel-
ative to the independence approach, wherein the distribution is more concentrated with

~ 40% reduction in the dispersion (Table 2.2).

Iran~2me Fl
.1 cijwﬁkin \1 ii

.‘::*:".i.”x‘11'~’fi"~‘ ‘

A

L‘Lllluf‘ll Oi. 1T1;
t‘ecundit} at"

then ed xx

Transgene Frequency The average time to ﬁxation for the WT genotype was similar
in comparison to the scenario under which there were no dependencies among the ﬁtness
components (WT mating success 1.93 vs 1.93(C) Table 2.3a). The trajectory of the av-
erage over the replicates with dependencies were similar to that of the average over the
independent replicates (Fig. 2.1). With dependencies among vital rates the distribution
for the ﬁxation times changes with a reduced dispersion relative to the independence ap-
proach (Fig. 2.2). Comparison across individual components show that the probability of
ﬁxation is not related to the any of the components in both the WT and the T G genotypes
(Fig. 82). In the case of the relative ﬁtness components, it can be seen that the distri-
bution of the ﬁxation times show linear trends as a function of the relative ﬁtness for the
fecundity and sexual maturity components (Fig. 2.3). In the case of ’lrelS’ the linear trend
observed with dependencies has a greater slope compared to the independence scenario
(Fig. 2.3). In all cases, the joint distribution of ﬁxation times and relative ﬁtness compo-
nents are more concentrated and show lower dispersion than the independence scenario
(Fig. 2.3). Finally, the distribution of ﬁxation times have reduced mean and variance as

well when dependencies are present among vital rates (Table 2.3a).

2.3.3 Validation with Mesocosm Data

The validation experiments had only two reps and in both reps the transgene frequency
drops to zero within 10 generations. We extracted the 99% prediction intervals for all
three scenarios of mating success for 10 generations, by calculating the generation time
for every bootstrap replication based on the vital rates for that particular replication. The
prediction intervals for transgene frequency with WT mating success=1.16 envelopes

both mesocosm realizations, while the prediction intervals envelopes only one replicate

45

when WT mating success=l.93 (Fig. 2.6). The scenario with the highest mating suc-
cess predicts reduction in transgene frequencies to be much faster than the observed re-
alizations. The results should be interpreted with some caution in the light of only two
stochastic realizations available from the experimental results. The 99% intervals can be
misleading if the density of the distribution is not symmetric, as in our case, and this is

can be visualized by the vioplots (Fig. 2.6).

2.4 Discussion

In Chapter 1 it was shown that demographic stochasticity, arising from variation at the in-
dividual level, can alter predictions of the risk associated with transgene invasion. Specif-
ically, for the “Trojan gene’ effect demographic stochasticity can reduce extinction times
relative to the deterministic model and more generally increase the persistence of the
transgene in the system. Here we consider the effect of uncertainty in estimated parame-
ters on model predictions applying the non—parametric bootstrap approach.

The bootstrap is a ﬂexible resampling approach, in implementation and application,
and its use is advocated to assess uncertainties in parameter estimation when using struc-
tured population models (Caswell, 1989; Wisdom et al., 2000; Morris and Doak, 2002).
Models of transgene invasion have focused on the invasibility of the transgene (e. g Sobol-
eva et al., 2003; Meagher, 2003; Gamier and Lecomte, 2006) and do not explicitly incor-
porate uncertainty associated with estimating parameters, due to a combination of lack
of data and model complexity. This has already come under criticism in the context of
model of population viability analysis (PVA) in conservation biology (Fieberg and Ellner,
2003). Our results, using the bootstrap procedure, show that uncertainty in estimation
of model parameters contributes to considerable variation in predictions of the loss of
the transgene (Fig. 2.1). Furthermore, bootstrapping is useful when evaluating the uncer-

tainty of functions of model outputs (e. g. ﬁxation time) which are analytically intractable.

46

This is especially relevant in the context of GEO invasion, where the overall population
growth rate (11), as well as genotype speciﬁc growth rates, (xlg) are non—linear functions
of the vital rates of all genotypes, unless simplifying assumptions are made regarding fre-
quency dependence (Bodmer, 1965; Charlesworth, 1994) or density dependence (Benton
and Grant, 1996). Using a bootstrap assessment of uncertainty in estimated vital rates, a
re-analysis of data in avian populations showed predictions of extinction risk were mod-
iﬁed (Saether and Engen, 2002). In our case, we see that variation in ﬁxation times of
the WT genotype are considerably higher in magnitude (Table 2.3), relative to variation

in the estimated parameters (Table 2.1).

2.4.1 Dependence vs Independence

There is substantial theory from a life history evolution perspective, which links vital
rates to ﬁtness and elucidates the mechanistic basis underlying covariation in vital rates
(Roff, 1992; Steams, 1992). It is acknowledged that covariation among vital rates is an
important determinant towards making realistic predictions and ignoring this can lead to
spurious conclusions (Coulson et al., 2005). Using the copula approach, our predictions
incorporating dependencies among ﬁtness components show that prediction intervals can
be over-estimated when dependencies are ignored (Fig. 2.6). Estimating covariation of
vital rates has been fraught with diﬂiculty and most work have utilized ad-hoc methods
to incorporate correlations, usually using the well known Pearson correlation coefﬁcient
(e.g. Ramula and Lehtila, 2005). For instance, Morris and Doak (2002) recommend uti-
lizing a multivariate Normal distribution incorporating correlations and transforming the
resultant Normal marginal distributions to the distributions of interest. However, it is well
known that correlations in the original scale are not an one-to-one transformation on the
underlying normal scale. The Pearson correlation coeﬂicient is appropriate only for ellip-

tical distributions such as the Normal since it is a linear operator (Embrechts et al., 2003).

47

Recently, there has been a re-emergence of a less known statistical approach to model de-
pendence among arbitrary distributions based on their univariate marginal distributions,
the copula approach. Our results using this approach also show that correlations among
the estimated vital rates can provide more realistic predictions by excluding regions of the
parameter space that are highly unlikely (Fig. 2.5). An additional observation that stems
from our results is that dependencies among the absolute values of ﬁtness components
induces correlation among relative ﬁtness values. In our case, this occurs in the opposite
direction to correlations on the absolute scale (Table 2.3b, Fig. 2.4 vs Fig. 2.5). While
this may be a statistical artifact arising from ratios of the two random variables, this mer-
its further study as the ﬁnal determinant of selection is based on relative ﬁtnesses and not

absolute values.

2.4.2 Considerations of experimental methods

A decade ago there was little faith in the concept of combining models and short-term
experiments to make predictions about transgene invasion into natural populations. The
main contributions from invasion theory was considered to be on guidance towards setting
up monitoring schemes and sampling programs that will be cost-effective in detecting a
problem invasion (Kareiva etal., 1996). Comparisons of the prediction distributions ver—
sus trajectories of the validation data show surprisingly good agreement (Fig. 2.6). This is
an optimistic outlook considering that there are only two replicates and also considering
that correlations can modify the predictions. However, the present results do provide the
promise that models combined with data can provide a realistic predictive framework for
risk assessment of GEO’s.

Although experimental data was available to estimate ﬁtness components, there were

pitfalls in our experimental design for the current study. Our experiments were good

48

enough to provide estimates of the marginal distributions of ﬁtness components as re-
ﬂected in the agreement of our model predictions with the validation data (Fig. 2.6). How-
ever, the current design is not amenable to estimating the underlying correlation structure
among vital rates and resulting in over or under estimates of the bounds when assessing
uncertainty. Incorporating the Copula approach allows for a rigorous statistical method-
ology that provides a framework to extend current modeling methodology by accounting
for covariation in vital rates and making predictions more realistic. This extension can
be of great utility in the context of GEO risk assessment due to the necessity for making
quantitative predictions of transgene invasion.

Data requirements for utilizing a Copula approach principally comprises of simultane-
ously collecting variables on the vital rates. The basic premise that there exists an unique
Copula for continuous multivariate joint distributions provides an avenue for statistical
estimation not only on the marginal distributions, but also on the underlying correlation
structure. Likelihood based methods already exist (e. g in the copula package) to estimate
the parameters for a copula based on the functional form of the copula and the marginal
distributions. The necessity of jointly measurements for data collection can be easily
accommodated when conducting laboratory experiments to estimate ﬁtness components.
For example, fertility (fecundity for females), sexual maturity and viability, in both adults
and juveniles, can be measured on the same replicates, by estimating them concurrently
for the entire cohort. While this would entail separate measurements for the two sexes
the net ﬁtness component model can be easily modiﬁed to accommodate the different sets
with the added bonus of increasing predictive accuracy. Once the joint measurements are
available, there are many methods currently available to estimate the joint distributions
using the copula approach. This is an active area of research which includes develop-
ment of methodology for making model choice based on different copulas. The proposed
modiﬁcations to designing experiments can also be applied in general to even population

viability assessments (PVA) using matrix population models.

49

2.4.3 Conclusions

Models to assess transgene introgression into natural populations need to incorporate eco-
logical principles in a biologically meaningful manner. Concurrently, variation is ubiq-
uitous in natural populations and contributes to uncertainty in the resultant estimates of
population level parameters. In this study we show that accounting for the uncertainties
using the non-parametric bootstrap can generate more realistic predictions of transgene
fate. We also illustrate extensions to modeling dependencies among ﬁtness components
using copulas that can enhance predictions by incorporating biological relationships. In
the event that data collected is not amenable to estimate the copula parameters, this can
still be a powerful tool to assess the effect of different types of correlation patterns e.g.
in sensitivity analysis. Additionally, demographic stochasticity can also be incorporated
in the model (see Chapter 1) and while this will make predictions even more uncertain,
this is certainly more realistic. Further extensions are possible such as linking forecasting
data on experimental drivers with elements of the transition matrices (Gotelli and Elli-
son, 2006), thus emulating long-term environmental change. Thus evaluation of uncer—
tainties can lead to methodological enhancements, in both data collection and modelling

approaches, paving the way for effective decision-making and policy.

50

Trans;
feeun.

enih:
\Eahih
Sexual
\laturi‘

efincrea
“JSJan
model

 

Wildtype 2.5% Per. lst Qu. Median Mean 3rd Qu. 97.5% Per. std. dev %CV

 

Fecundity 16.651 19.920 22.380 22.670 25.020 30.435 3.648 16.090
Fertility 0.465 0.474 0.478 0.478 0.483 0.491 0.007 1.416
Viability 0.991 0.992 0.992 0.992 0.992 0.992 0.000 0.032
Sexual 65.709 67.120 67.920 67.840 68.610 69.808 1.066 1.571
Maturity

xi 1.036 1.040 1.042 1.042 1.044 1.048 0.003 0.300

 

Mating Success repl: 1.16 rep2: 1.93 rep3: 5

 

Transgene 2.5% Per. 1st Qu. Median Mean 3rd Qu. 97.5% Per. std. dev %CV

 

Fecundity 3.840 5.616 6.779 6.838 7.892 10.536 1.723 25.205
Fertility 0.370 0.391 0.402 0.402 0.413 0.434 0.016 4.061
Viability 0.984 0.985 0.986 0.986 0.986 0.987 0.001 0.070
Sexual 90.970 94.710 96.430 96.370 98.180 101.500 2.709 2.811
Maturity

,1 1.009 1.019 1.020 1.020 1.022 1.024 0.004 0.358

 

 

Table 2.1: Summary statistics of the bootstrap estimates of ﬁtnesses and It (the ﬁnite rate
of increase) for the Wildtype (WT) and transgenic (TG) genotypes. The TC (w allele)
was dominant in our case and this assumption was made for all ﬁtness components in the
model.

51

-/ 112*". (‘7‘.1—4 ":1

7" M I W|h|;n

> .—

Tal‘lc 3.:
of the W
ratio TG
Para-men
311F116 illl‘

ing in re

 

Parameter
TG/W T

2.5%

1 st Qu. Median

Mean 3rd Qu.

97.5%

std.dev. %CV

 

Fertility
Viability
Fecundity
Sexual
Maturity
ll

0.7697
0.9922
0. 1601
0.6630

0.9662

0.8175
0.9932
0.2400
0.6901

0.9764

0.8403
0.9937
0.2991
0.7042

0.9789

0.8414
0.9937
0.3095
0.7046

0.9783

0.8655
0.9943
0.3674
0.7192

0.9810

0.9091
0.9952
0.5282
0.7510

0.9850

0.0364 4.3253
0.0008 0.0772
0.0929 30.0009
0.0224 3.1783

0.0046 0.4688

 

Copula

 

Fertility
Viability
Fecundity
Sexual
Maturity
4c

 

0.7645
0.9922
0. 1685
0.6678

0.9737

0.8133
0.9931
0.2454
0.6924

0.9773

0.8401
0.9937
0.3037
0.7064

0.8413
0.9937
0.3178
0.7070

0.8696
0.9942
0.3719
0.7208

0.9791 0.9791 0.9809

0.9242
0.9952
0.5655
0.7468

0.9844

0.0404 4.7988
0.0008 0.0783
0.1012 31.8354
0.0208 2.9450

0.0028 0.2867

 

Table 2.2: Summary statistics for the relative ﬁtnesses and 11 (the ﬁnite rate of increase)
of the WT and TG genotypes for all scenarios. The relative ﬁtnesses are based on the
ratio TG/WT. Although, the average values are the same for the Copula and the Non-
parametric bootstrap approaches, the distribution of the relative values are different. The
same joint distribution of all the estimated components were used for the scenarios differ-
ing in relative mating success of the WT males.

52

 

3a). Summary Statistics of Fixation times

 

 

 

WT Mating 2.5% lstQu. Median Mean 3rd Qu. 97.5% Std.dev %CV
Success

1.16 5.677 11.300 14.680 20.600 21.220 77.561 21.186 102.847
1.93 3.949 5.956 6.469 6.727 7.204 10.441 1.630 24.233
5 2.410 3.038 3.230 3.225 3.435 3.960 0.356 11.036
1.93(C) 5.324 6.036 6.480 6.764 7.148 9.965 1.227 18.145

 

 

3b). Correlations among relative Fitness Components

WT mating Success 1.93

Fertility Viability

Sexual Fecundity

 

Maturity
Fertility - -0.00921 0.00035 0.00088
Viability - - - 0.05483
Sexual maturity - - - -0.04022
Fecundity - - - -

 

WT mating Success 1.93 (C)

Fertility Viability

Sexual Fecundity
Maturity

 

 

Fertility — 04622499 -O.4597631 0.015746
Viability - - - —0.37283
Sexual maturity - - - 0385899
Fecundity - - - -

 

Table 2.3: Summary statistics of the ﬁxation times for the WT genotypes. Fixation is
deﬁned as the time where the TG gene frequency < 0.001. The Scenarios indicate the
relative mating success of the male WT genotype and the scenario sufﬁxed with (C) was
implemented using the Copula approach.

53

0.20 0.25 0.30

TG Gene frequency

0.00 0.05 0.10 0.15

I:lgur
b'tbed Oil
””66 ditl

e ”ale
apprCta—Ch
19.11 sh

(D
I.)

_ 0U
plememir
”1'- WT o

 

 

+ rep1
+ rep2

 

 

 

 

TG Gene frequency
0.00 0.05 0.10 0.15 0.20 0.25 0.30
l

 

 

 

generation

Figure 2.1: Evolution of the TG gene frequency over time for the deterministic models
based on the expected values as well 8 the average over all bootstrap replications for the
three different values of mating success of WT males (1.16, 1.93 and 5 respectively).
The trajectory for the average over the bootstrap replications implementing the Copula
approach overlays the trajectory for the corresponding independent approach. Legend
text should be interpreted as follows: D - represents deterministic models, B - represents
average over bootstrap replications, C - represents average over bootstrap replication im-
plementing dependencies; 1.16, 1.93 & 5 — the numbers represent the mating success of
the WT genotype in that scenario.

54

Density

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

WT Mat. Success 1.16 WT Mat. Success 1.93
LO
q _ v
o o" ‘
V
O _
o‘ <2 3
(‘0 O
Q _
° N _
3 _ o
O
8. — S r
O
8 _. c: _
o“ O
l I l l l I l l I l l l I
0 10 20 30 40 50 o 2 4 6 6 1o 12
WT Mat. Success 5 WT Mat. Success C 1.93
LO
0. _
0
v-' ‘ V. _
O
0'2 ..
o .
2 - g 4
g _.
0. _ . °- 4
O O
l l l l l l l l l l l
0 2 4 6 o 2 4 6 6 10 12
Fixation time of WT

Figure 2.2: Distribution of the time to ﬁxation of the Wildtype (WT) genotype deﬁned as
the time the TG allele frequency is < 0.001 for the three different mating success values
of the WT genotype (1.16, 1.93 and 5 respectively).

55

 

 

   

 

 

 

 

 

 

 

 

   

 

 

 

 

 

 

 

 

(D
o'
O I!)
0? Ed
go
o ‘32:-
'—
.é‘
go '2”
E00. :0
9° 8
LLN.
O
E S
<5 4 6 8 10 12 4 6 8 10 12
fixation fixation
e. Em
as 96
a? a
:30 ‘g
g trig /
.- 2 .
> _o
<33 5;;
° 8
o'
4 6 8 10 12 4 6 8 10 12
fixation fixation
O)
O)
o'
w
0)
go-
"rx
(00)
3:5
66
.10)
0'
l0
0)
d

 

 

 

4 6 8 10 1 2
fixation

Figure 2.3: Distribution of the ﬁxation times versus the distribution of relative ﬁtness
components and ,1. Contours represent regions of the joint distribution using 2D kernel
density estimates. Contours in red indicate the distributions in the scenario incorporat-
ing dependence using copula, while the contours in black represent the corresponding
bootstrap scenario (i.e relative WT male mating success=1 .93).

56

0.34 0.40 90 100 0.9976 0.9980
1 1 1 1 L 1 1 1 1 1 1 1

 

   

 

 

   
 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

l l l
e ‘ ' e
o' _ Fertility 0'
<1- ‘ <1-
C’? - 02.
A o
o
8 o
1— 1—
Age of Maturity
o o
on on
83 8
05 a)
0? a?
o Viability O
R a
C” c»
a; c:
o o'
_ “'2
8 1-
m Fecundity
o
O ; _
N 1 1 i

l ' l I l l I ‘ l
100 0.9977 0.9981 0.5 1.5

Figure 2.4: lPairwise scatterplots of the parameters showing the relationship between pa-
rameters under the independent bootstrap and the copula approach. The set of scatter
plots above the diagonal show parameter combinations under independence. The set of
scatterplots below the diagonal show dependence induced among the possible parame-
ter combinations based on the speciﬁed correlations, here -0.5 between viability and all
others, 0.5 between fecundity, fertility and the age of sexual maturity and 0 between fe-
cundity ad ferility as they are among different sexes.

57

 

 

   

 

 

 

 

 

 

 

 

   

 

 

 

 

 

 

L0
0)
0?
O 1.0
N
E. a.
28 P
30' 5
C0 0
"-383 E g
< 8‘) g
0?
O to
‘0.
O
0.70 0.75 0.80 0.85 0.90 0.95 0.70 0.75 0.80 0.85 0.90 0.95
fertility TG/WT fertility TG/WT
1.0
O)
0?
o E “:3
O
V (5
ed a
$00 ‘3 o
E 83. E '5
'5 ° ‘3
X
8 8
0?
0 L0
‘9.
O
0.1 0.2 0.3 0.4 0.5 0.6 0.1 0.2 0.3 0.4 0.5 0.6
fecundity TGNVT fecundity TG/WT

Figure 2.5: Comparison of relative ﬁtness contours between bootstrap and copula gen-
erated parameter distributions. Contours represent the joint distribution using a 2D ker-
nel density estimate. Contours connected by solid lines indicate the distributions in the
scenario incorporating dependence using copula, while the contours drawn with dotted
lines represent the corresponding bootstrap scenario (i.e relative WT male mating suc-
cess=l.93). The contour levels indicate probabilities from 0.1 to 1 with a 10% difference
in probability among adjacent levels.

58

 

 

WT mating success=1.16 WT Mating success=1.93

A

 

i
‘ + rep1
—0— rep2
-£I- mean

     
     

 

 

 

 

 

 

 

 

 

 

 

 

TG Gene frequency

0.00 0.05 0.10 0.15 0.20 0.25 0.30
TG Gene frequency

0.00 0.05 0.10 0.15 0.20 0.25 0.30

 

 

 

 

 

 

 

 

 

 

 

 

 

2 4 6 8 10
generation generation

0 , C

3% WT Mating success=5 ng Copula model

In IO

2 g! —o— rep1
5.0 5.0 + rep2
C N C N -EJ- mean
3 o' 3 c5 ......
gm gm ------
0 <2 1, o e
(5 o‘ t (D o’
l— .. 1—

10 ‘- in

Q s Q

0 . ~. 0

o é. o """""

o ~- 0 ------

c5 c5

2 4 6 8 10 2 4 6 8 10
generation generation

Figure 2.6: Comparison of the trajectories of gene frequency of the validation experiment
from those generated from the model predictions, with the associated 99th percentiles
of the prediction distribution at each generation. A) Represents the prediction distribu-
tion when the WT has a relative mating advantage of 1.16 B & D) the relative mating
advantage of the WT =1.93 C) WT relative mating advantage =5 and D) Additionally,
incorporates dependence using a copula approach. Legend text in the plots are: rep1 &
rep2 - Mesocosm replicate,mean - Mean over 1000 bootstrap replicates, 0.5% & 99.5%-
the 99% intervals over the bootstrap replications.

59

Chapter 3

Probabilistic Sensitivity Analyses of
Models for Ecological Risk Assessment:
A Case Study Using the Net Fitness
Component Model for Assessing Risk of

Genetically Engineered Fish

3.1 Introduction

Modern ecological risk assessments include modeling as a fundamental component to pre-
dict population trajectories and to quantify risks (Barnthouse and Sorenson, 2007). In the
past decade advances in computing technology has promoted the emergence of simulation
models as a primary tool utilized in this respect (e.g. Wisdom et al., 2000). Simulations
models can be seen as another kind of experimental system in ecology and evolution,
wherein a complex model formulation mimics the behavior of natural systems. This can

help researchers investigate systems behavior by manipulations that would be impossible

6O

 

otherwise, either logistically or ethically (Peck, 2004). For example, using simulations
based on digital organisms, with computer programs that self-replicate, mutate, compete
and evolve, it was shown how complex functions can originate by random mutation and
natural selection over approximately 15,000 generations (Lenski et al., 2003).
Population level ecological risk assessment of anthropogenic impacts in natural en-
vironments has advanced towards using a combination of modeling informed through
ﬁeld data. While these models can be complex, there is impetus towards integrating both
the genetics and demographic aspects through modeling approaches to better understand
these impacts (Nacci and Hoffman, 2008). Similarly, ecological risk assessment for ge-
netically engineered organisms (GEOs) is no different and employs models of varying
complexity towards quantiﬁcation of risks. GEOS that are considered to have potential
ecological impacts are species that have natural wild populations and are difﬁcult with
regards to containment of the transgenes (e.g. plants, ﬁsh and insects). The last decade
has seen the emergence of a variety of modeling approaches towards predicting the fate of
the transgene introgression into natural populations with varying degrees of complexity.
Models of introgression for transgenic crops evaluate gene ﬂow based on pollen dispersal,
for example, incorporating exponential distance and directional effects of wind mediated
pollen dispersal (Meagher, 2003) or transport equations of atmospheric physics coupled
with population dynamics and genetic models (Richter and Seppelt, 2004). Additionally,
structured population models combining dispersal (Gamier and Lecomte, 2006) and en-
vironmental variability (Claessen et al., 2005) have also been employed in this context.
Recently, AMELIE, a modeling framework combining both population structure, stochas-
ticity and dispersal has been proposed for transgenic crops (Kuparinen and Schurr, 2007).
Structured population models have also been used for assessing the risk of transgene in-
trogression in GE ﬁsh populations (Muir and Howard, 2001) and with stochasticity adding

further complexity (see Chapter 1).

61

The above models are examples of the increasingly complex models utilized for pre-
diction in assessing risks associated with potential environmental consequences of trans-
gene introgression into natural populations. The consequence of increased model com-
plexity are twofold: the number of parameters in the model increases and the relationships
among variables need not be linear. For example, in the proposed AMELIE (Kuparinen
and Schurr, 2007) framework, the most sophisticated yet for transgenic crops, there are
29 parameters, and our simpliﬁed model has 10 parameters. The ﬁrst step towards mak-
ing sense of system behavior under these circumstances lies in extracting the relation-
ship between variation in parameter values and the resultant variation in model output,
known as uncertainty analysis (Saltelli, 2000). The next step is to examine the contribu-
tions of the parameters to the variation in the output and their relative importance, also
known as sensitivity analysis. A wide array of methods exist for both local and global
sensitivity analysis, as well as an extensive literature and one recent comprehensive re-
view can be found in Saltelli (2000). Local sensitivities have been widely used for sim-
ple models, which usually can be analytically derived as they are derivative based, and
thus are not computationally intensive. For example, elasticity analysis performed for
population viability analysis (PVA, Caswell, 1989; Morris and Doak, 2002) using stage-
structured models are local methods, however, this is simple only if the response is linear
and monotonic. Global sensitivity analysis (GSA) has recently become prominent using
variance-based techniques (an ANOVA like decomposition of the output variance) and
provide insight on the relative importance of factors in complex model settings. Brieﬂy,
let y = f (x1 , x2 . . . xk) denote the output of a simulation model with x,- (i = 1, 2, . . .k) the
input variables, with an joint probability distribution giving rise to uncertainty in y. If the

input factors are orthogonal then the variance of y, V( Y), for a model with k factors can

62

be decomposed as:

V(Y)=Zv,-+Z Zovij+-"+V12...k

i j>z

where Vi = V(E(YlX,-)) , Vi j = V(E(Y|Xi,XJ-)) — Vi — V j and so on. A model with only
Vi

WY?

which explains the amount of variance due to factor 1'. If the model is completely additive

Vi terms is called additive and the ﬁrst order sensitivity indices are deﬁned as S ,- =

then ZiS i = 1, else we can use another measure the total sensitivity index TIl- deﬁned

as:

 

V(E(le_i))
Ti:(1- V(Y) )ZSi+Z.Sﬁ+ Z Sijk+"'
J¢t j¢i,k=ﬁi

which is the amount of variance explained by the factor X,- and all other interactions
involving that factor. Both sensitivity indices have a quite general applicability, since
they apply to a very wide range of non-linear models i.e. they are “model-free” and rely
on only a few assumptions, namely the output has to be square integrable and the variance
is an adequate measure of dispersion. An accessible introduction to GSA using variance
based techniques can be found in Saltelli et a]. (2008) as well as in two recent reviews
introducing this approach to ecological applications (Cariboni et al., 2007; Fieberg and
Jenkins, 2005).

Recent rapid technological advancements, both in computing and data generation (e. g.
genomics), has facilitated wider use of complex models for biological applications and
this is also true of applications in population ecology (e.g. PVA). However, increase in
model complexity results in increased computational time, both in terms of computational
burden as well the parameter combinations used. In the area of systems engineering com-
plex computer models are the rule rather than the exception, especially as experimental

evaluations are costly. For example aerospace applications (Gramacy and Lee, 2008)

63

involve a large number of parameters during the prototype development phase and sim-
ulations as experiments are heavily utilized. This ﬁeld is mature with formal methods
for the design and analysis of computer experiments (e.g. see Fang and Sudjianto, 2006;
Santner et al., 2003). Meta-modeling of the computer model is an approach which is
used to increase efficiency when using computer experiments, known as an emulator in
the literature. This involves using a surrogate model to simulate the computer model
itself and is done using a wide variety of statistical regression techniques, from polyno-
mial regression to spline smoothing methods (Fang and Sudjianto, 2006). The standard
Bayesian approach to meta-modeling in the literature is to use a stationary Gaussian Pro-
cess (GP)(Santner et al., 2003; Oakley and O’Hagan, 2004). The GP is conceptually
straightforward and it easily accommodates prior knowledge in the form of covariance
functions for the model output, as well as for input distributions (Oakley and O’Hagan,
2002), and returns estimates of predictive conﬁdence. Mathematically, GPs are equiva-
lent to many well known models (e. g. Bayesian linear models, spline models, large neural
networks; see Rasmussen and Williams, 2005) and have also seen use in geostatistics as
kriging and environmental modeling (e.g. Wikle et al., 1998). GP models may be easier
to handle and interpret than, for example, neural networks (Plate, 1999).

The Bayesian framework is advantageous in that it is ﬂexible to incorporate model
extensions as well combine outputs across multiple models, e.g. using hierarchical mod-
eling for population ecology (Clark, 2003) or Bayesian model averaging (BMA; Gelman,
2004). With increased computational power Bayesian approaches are becoming popular,
and also recommended due to their ﬂexibility, for assessments in conservation (Wade,
2000), population modeling (Millar and Meyer, 2000), population genetics (Beaumont
and Rannala, 2004) and evolutionary biology (OHara et al., 2008).

In this study we apply Bayesian Gaussian Process (BGP) models for sensitivity anal-

ysis of a risk assessment model for transgene invasion, the net ﬁtness component model

64

(Muir and Howard, 2001, 2002). We use experimental and simulation data from a previ-
ous study (see Chapter 2) to conduct GSA for the model. We use both the Monte Carlo
approach (Saltelli et al., 2008) and the BGP model approach (Gramacy, 2007; Oakley
and O’Hagan, 2004) for GSA analysis and evaluate the impact of sampling designs for
parameter selection, speciﬁcally random sampling and Latin hyper-cube sampling. Given
that the number of model evaluations are reduced using the BGP model, we also conduct
a GSA by extending the previous dataset with two additional variables, the dominance ef-
fect of the transgene and the mating success of the wildtype, as a case study. We had not
done this previously as there was insuﬂicient information from the experimental data for
those variables to estimate their underlying distribution in the population. We illustrate
the ﬂexibility of the BGP model by evaluating the effect of the uncertainty distributions

for these two variables using GSA of the model.

3.2 Materials and Methods

3.2.1 Gaussian process modeling

The model description closely follows that of Gramacy (2007). Let X be a input matrix of
k covariates with n observations and a single response variable Z. Then the hierarchical
generative model for the stationary GP is as below:

The model output Z(response) depend on multivariate inputs X (explanatory vari-

ables) as
2 - 2 2
ZlB,0' , K ~ Nn(F,B,0' K) 0' ~ [G(aa/2,q(,/2)

Where ,8 are linear trend coeffecients and 02K is the covariance of a mean-0 process,K

is an correlation matrix and ,B is independent of 02K in the prior. Using a hierarchical

65

approach the mean function can be modeled as

3102. r2.W,,80 ~ wawz, r2.W) r2 ~ 10(aT/2. (M2)

ﬂo ~ NmQ-laB) W“1 ~ W<<p(V“).p)

F = {1 X} can be used to add covariates to the mean function, W is an k x k matrix
representing the covariance of the mean function; N, [G and W are the normal, Inverse-
Gamma and Wishart distribution. Constants ,u, B, V, p, o0, (10,01, qr are treated as
known. The GP correlation function K is chosen from a seperable power family, thus
assuming anisotropic correlations, with ﬁxed power p0, but known range and nugget
parameters. Correlations take the form K(xi,xj) = K‘(xi,xj) + gdi, j where g is the
nugget, 6,31- is the Kronecker delta function and K * is a true correlation representation

from a parametric family as follows:

 

. ._ .Po
.ku xkjl

* — _

K (xi,xj|d) — exp{ E dk }

The implementation for the markov chain monte carlo sampling (MCMC) involves a
Metropolis within Gibbs approach. The derivation of the full conditionals and the imple-
mentation of the MCMC is given in detail in Gramacy and Lee (2008) and all parameters

except those within the K(-, -) function can be sampled using the Gibbs step.

3.2.2 The net ﬁtness component model

We used an age structured model with three genotypes (WW , Ww and ww), the Wild-
type, heterozygote and the transgenic homozygote, respectively. The details of the model

implementation is given in Chapter 1 and appendix 1. Hereafter the WW genotype will

66

be referred to as the WT genotype and the ww genotype will be referred to as the TG

genotype respectively for notational simplicity.

3.2.3 Monte Carlo simulations

We use three different sets of simulations to conduct GSA using the Monte Carlo ap-
proach. Previously, we used experimental data to evaluate uncertainty in estimates of
ﬁtness components (refer Chapter 2) and generated samples from the joint uncertainty
distributions of the estimated ﬁtness components. This was done using Monte Carlo ap-
proaches for each ﬁtness component independently by non-parametric bootstrap. We use
the data set from the non-parametric bootstrap for estimating sensitivity indices and this
data set will hereafter will be referred to as MC-data. The data set had already been
modiﬁed for a reduced age-structure using the genetic algorithm approach. The MC-data
consisted of 1000 samples and we split it into two sets of 500 samples each to estimate
the ﬁrst order sensitivity indices (hereafter SI) and the total sensitivity indices (hereafter
TI) using the Monte Carlo approach. Let A be the matrix of N1 x k inputs, where N1
represents the ﬁrst 500 samples, and B the matrix of the other 500 samples and let Ayi
represent the model output using the ith row of the A matrix, and similarly for Byi- Fi-
nally, we create a new matrix C(i) for each input variable xi which contains the columns
for that variable from A and all other columns from B and we denote the model output as

Cyi- Following Saltelli et al. (2008) we can estimate the S 1,- as follows:

. . . 2
._ Var(E(Y|Xl-) _ Ay-C y(') - foz _ (l/N)Z{-‘= 1A)’i Cyi -f0
1 ‘ V(Y) — _

 

 

 

.2 .2
A)’ 'A y-fo (”MEL 1Ayi A)? -f0

where the 81,- is the ﬁrst order sensitivity index for variable x,- and is expressed as a
fraction of V( Y) that is explained by that variable; Ay, By and Cy represent the vector of

outputs for matrices AB and C respectively. Thus the S I i’s range from 0-1. The Tll- for

67

an input 1' is calculated as follows from the evaluations of matrix B and C:

. (i)_ .2 .2
_ Var(E(Y|X_l-) = M yC f0 =(1/N)sz,-Cy,~—f0

T1,- = 1
V(Y)

 

 

 

x2 A2
)7; 'YA “f0 (”ME/137,437 -f0

where the Tli is the total sensitivity index for variable i and represents the sum of vari-
ances that includes the ﬁrst order and all higher order interactions that involve the variable.
If Tli is 0 then it implies that the variable is non-inﬂuential with regards to the output. If
it is close to the S 1i then it implies that the interactions with other variables are negligible
in their inﬂuence on the output and the role of the variable is mainly additive.

Since the TG genotype is dominant in our case, both the transgenic homozygotes and
heterozygotes have the same values for ﬁtness components, and thus there were only ten
input variables to the model. They were the ﬁve ﬁtness components for the WT and TG
genotypes, namely: fertility, fecundity, juvenile viability, adult viability and age of sexual
maturity. We had to do a total of 500 x 10 + 2 evaluations to estimate all the SI and TI,
and this will hereafter be referred to as MC2-data. Additionally, we used Latin hyper
cube sampling to generate a space ﬁlling design of 2 sets of 200 samples each,for the 10
input variables. The combined data set of 400 samples will hereafter be referred to as the
LHS-data. To sample from the marginal distributions of the ﬁtness components we used
an inverse transformation of the cumulative distribution function. We used this design to
estimate the SI and TI, using the same protocol above, resulting in a total of 200 x 10 + 2

simulation, hereafter referred to LHSZ—data.

3.2.4 Bayesian Gaussian Process (BGP) Approach

We used the BGP approach to ﬁt a meta model to the output distributions using the fol-
lowing samples: a random sample of 300 observations each from the MC-data and the

LHS-data. The models will hereafter be referred to as GP-MC and GP—LHS, respectively.

68

The data samples from the MC-data were used as the training samples and the remaining
samples used as a test set to evaluate the predictive performance based on residual mean
square error (rmse). We also used samples from the other data sets created as test sets to
evaluate the performance of the GP model predictions for both the GP-MC and GP-LHS
models. For example, we evaluated the predictions of the GP-MC with respect to the
MC2-data as well as the LHS-data.

As a case study we evaluated the risk of transgene introgression using the BGP ap-
proach. We created a third simulation data set which included the dominance of the TG
genotype and the mating success of the WT genotype as additional variables, hereafter
referred to as FULL-data. The dominance effect of the TG genotype was chosen from an
Uniform (0,1) distribution, and the mating success of the WT genotype from an Uniform
(1,5) distribution. Simulations were conducted and the GP model, hereafter referred to as
GP-FULL, was used for conducting global sensitivity analysis of the FULL-data.

Evaluations of global sensitivity indices, both ﬁrst order and total, is simple once the
GP model has been ﬁt and we calculated the S1 and T I for all data sets above for compar-
ison. Additionally, using the GP-FULL model we calculated the sensitivity indices using
two forms of uncertainty distributions of the dominance and the WT mating success vari-
ables. The ﬁrst is the default uniform distribution and the second form for the uncertainty
distributions used a scaled beta that reﬂects our prior information on the marginal dis—
tributions of those parameters. For both variables, we evaluated uncertainty distributions
with modes close to the two extremes of the range of the data. For the dominance effect of
the transgene, which ranged from 0-1, we used modes at 0.1 and 0.9 respectively. For the
WT mating success, we used modes at 1.5 and 4.0 respectively. Since the GP approach
used here is a fully Bayesian approach using MCMC, we also used diagnostics to assess
convergence of the MCMC for the ﬁnal models.

For all analysis we used the R statistical environment (Team, 2009) and the appropri-

ate packages. BGP model ﬁtting was done using the tgp package (Gramacy, 2007). The

69

tgp package is extremely useful and provides functions for sensitivity analyses which
lets us specify the uncertainty distribution for the matrices used to calculate the sensitiv-
ity indices. The sens functions were used to generate the global sensitivity indices, both
ﬁrst order and total. Additionally, the 1hs package was used to generate the Latin hyper
cube samples for the LHS-data. As the resulting values are on the unit hypercube, the
inverse CDF of the marginal distributions from the MC-data was used to obtain the input
values for LHS-data. We use the coda package to evaluate the convergence and serial

correlation of the MCMC.

3.3 Results

3.3.1 Global Sensitivity analysis: Monte Carlo approach

The variance in the output as well as the estimated ﬁrst order sensitivity indices (SI) and
total sensitivity indices (T1) are given in Table 3.1 for all analyses using the MC-data and
the LHS-data. For the MC-data the SI and TI indices are not within the admissible range
(0-1) (Figure 3.1A and B), which occurs because the variance in output is not correctly
estimated by the random sampling design (MC-data var(y) = 4407.818 vs MC2-data
var(y) = 54210.4). The joint distribution from the random sampling does not encompass
regions of parameter space that have very low probability and therefore would necessitate
a larger number of runs. Even using the estimates of variances from the MC2-data runs
does not improve results much. The Si for the MC2-data identiﬁes the juvenile viability of
the TG genotype as the most inﬂuential factor (Figure 3.1C). A space-ﬁlling design (e.g.
the Latin hyper cube LHS-data) samples the entire parameter space and gives a much
better estimate of the output variance (LHS-data var(y) = 349242). For the LHS-data,
the SI indices ranked age of sexual maturity of the WT genotype as the most inﬂuential

input followed by juvenile viability and fecundity of both genotypes (Figure 3.1E). The

70

TI indices indicate that the effect of all the components identiﬁed as inﬂuential by the
81 are not additive and that there are interactions among them (Figure 3.1B, D and F).
All the components show higher order interactions as measured by the difference TIi-Sli,
even factors that do not have any ﬁrst order effects. Adult viability and fertility of both
genotypes are only inﬂuential as interactions with the other components as measured by
the difference in TI and SI. The TI indices are relatively consistent for both the MC-data

and the LHS-data, however the SI indices vary among the two sampling designs.

3.3.2 Bayesian Gaussian Process (BGP) models

3.3.2.1 BGP Predictions

The GP approach is able to produce a near perfect ﬁt for the MC-data as well as the
LHS-data(rmse=0.0004 and rmse=0.00029 respectively) for the data on the transformed
scale. For the MC-data the GP-MC model produced good predictions for the test sets
from the MC-data and the MC2 data (rmse=25.3639, Figure 3.2C; rmse=56.4734, Figure
3.2E). The predictions are not very accurate for the LHS-data outside the range of training
samples (rmse=487.5 Figure3.2D). However, when the input parameters were within the
range spanned by the training samples the GP-MC predictions are accurate. For the LHS-
data the GP—LHS model produced good predictions for both test sets, MC-data and MC2-
data (Figure 3.3C and D) and the absolute deviations were < :40 units for about 90% of
the original values (Table 3.2). The GP-LHS model predictions are not very accurate at
the upper limit of the ﬁxation times, even in the case of the training set (Figure 3.3A and
B). This is mainly caused by the effect of the reduced age structure resulting in reduction
of age of sexual maturity for the WT genotype to only two values (9 and 10 weeks), with

a very low probability for the larger age.

71

3.3.2.2 Global Sensitivity analysis: Gaussian Process approach

SI indices for the GP-MC model and the GP-LHS models show that the juvenile viability
component, for both WT and TG genotypes, had the largest ﬁrst order effect with the TG
genotype having the highest effect (Figure 3.4; Table 3.1).

For the MC-data the GP-MC also ranks the fecundity of both genotypes and the sexual
maturity of the TG genotype higher than the other components (Table 3.1). Additionally,
we can estimate the main effects for the ﬁtness components at no extra cost and deduce
the relationship between individual ﬁtness components and the response (Figure 3.4). In-
creasing juvenile viability of the TG increases the time to ﬁxation of the WT, whereas
increase in WT fecundity results in the decrease of the ﬁxation time. Comparisons with
the Monte Carlo approach reveal that the method is unable to detect the smaller effects,
which is a function of both the sampling method used as well as the number of model eval-
uations. The Monte Carlo approach to calculating SI with the MC-data performs worse
even though it takes a total of 6000 model evaluations compared to the GP approach for
the same data which used 300 model evaluations as the test set. The total sensitivity
indices reveal that all components have higher order effects and that the response is sen-
sitive to combinations of the factors (Fig 3.4). This results is reﬂected in the Monte Carlo
approach, however the TI estimates are much higher in that case.

For the LHS-data the GP-MC also ranks the juvenile viability components, for both
the WT and TG genotypes, as the factor with the highest effect (Table 3.1; Figure 3.5).
Additionally, now there is a clear indication that the age of sexual maturity and the fe-
cundity components for both genotypes have ﬁrst order effects as well (Figure 3.5). The
main effects plots show the trends of the scaled response to the input. Fixation times
increase or decreases alongwith the important factor identiﬁed by SI. For example, the

slope of ﬁxation time is positive with respect to TG viability. As for the MC-data the TI

72

indices reﬂect that all components have higher order effects and the fertility and adult vi-
ability components are only important in interactions with the other components. Results
compare much more favorably to the Monte Carlo approach with regard to the SI indices
except for a difference in rank among the top three factors.

For evaluating the SI the GP methods provide more realistic estimates when the pa-
rameter space is not completely spanned, whereas the results are more consistent between
approaches when the converse is true. The Monte Carlo approach suffers more from sam-
ple size compared to the GP approach. The TI estimates show differences in both mag-
nitude and rank among the two methods, although one remarkable resemblance among

both approaches is the consistency of those estimates for both data sets.

3.3.3 A Case study: Predictions for a model of transgenic Zebraﬁsh

The GP-FU LL model approach produces reasonable ﬁt to the ﬁxation times for the FULL-
data in the transformed scale (rmse=3.428e_04; Figure 3.6A). On the original scale the
model is unable to ﬁt three extreme observations which leads to a higher than expected
rrnse (rmse=17l.942), although the ﬁt improves once those observations are removed
(rmse=20.717). This is the case for the test dataset as well (rmse=134.2; Fig 3.6B), where
there is noticeable improvement once observations > 2000 are dropped (rmse=20.7l7;
Figure 3.6C). Model predictions as measured by the mean absolute deviations from the
true vales are approximately within :t40 days for 90% of the observations on the back-
transformed scale (Table 3.1).

Sensitivity analysis of the model input variables reveals that mating success of the
WT genotype is the most important factor followed by the juvenile viability and fecun-
dity. However, all factors where the SI~O exhibit interaction effects as revealed by the
high difference in the TI-SI (Table 3.3). Changing the uncertainty distribution produced

changes in the model sensitivity to the input factors and gives new insight into model

73

behavior. Increasing the dominance of the transgene increased the SI of mating success
of the WT genotype, while the opposite trend is noticed for the fecundity and juvenile vi-
ability components. The TI values are reduced at both values of dominance for all ﬁtness
components (Table 3.3, Figure 3.7). Shifting the WT mating success distributions results
in increased SI values all components except for the mating success component itself
which decreases under both conditions (Table 3.3; Figure 3.8). In the case of T I, shifting
the WT mating success to lower values creates a corresponding reduction for most com-
ponents. However, shifting mating success to the upper extreme results in the decrease of
TI for most components, except for WT fecundity, fertility and juvenile viability which
show increased TI values (Table 3.3). A signiﬁcant result derived overall is that the WT
mating success fails to have any main effect on the model when the distribution is shifted

to the upper extreme (Figure 3.8 Main effects).

3.4 Discussion

Previously, we have shown that uncertainty arising from demographic stochasticity (see
Chapter 1) and estimation of model parameters (see Chapter 2) can inﬂuence predictions
of transgene introgression into natural populations. However, when parameters are un-
known or diﬂicult to estimate we had to make simplifying assumptions when using them
in modeling due to computational and analysis constraints. For example, the experimental
setup to estimate mating success in the previous study could not provide enough infor-
mation on the underlying distribution for that component. It is not feasible to explore
all possible candidate distributions using our simulation model and meta-modeling of
computer experiments provides an attractive alternative to explore the parameter space
in a reasonably eﬂicient manner. In this study we explore the utility of using Bayesian
Gaussian Process models for meta-modeling and illustrate their application for sensitivity

analysis of the model.

74

Ecological risk assessments are based on increasingly complex modeling of ecolog-
ical phenomena and model using a variety of statistical and computational approaches
(Barnthouse and Sorenson, 2007), the case of risk assessment for genetically engineered
organisms (GEOs) being one such example. Recent advances in computational power
have fostered the use of complex models for ecological level phenomena and the pos—
sibility of using computer simulations as experimental surrogates (Peck, 2004). While
the concept of using computationally demanding models are relatively new in ecologi-
cal applications, this is the norm in areas such as engineering design wherein a mature
literature exists towards design and analysis of simulations (Fang and Sudjianto, 2006;
Santner et al., 2003). An important consideration in these areas is the development of
a meta-model, an emulator that can be used to approximate the simulation model over
most regions of the parameter space and this is usually constructed using statistical data-
analytic methods. Emulators can therefore provide an eﬂicient method of exploring model
behavior and Bayesian Gaussian Process (BGP) models have recently become prominent
in this area (Oakley and O’Hagan, 2004; Santner et al., 2003). Furthermore, the ﬂexibility
of a Bayesian approach makes these models attractive for consideration in ecological risk
assessment. Our results, using the BGP approach for the net ﬁtness component model
of GEO risk assessment, show overall good performance as an emulator for prediction
of simulation output using the ﬁtted model (Figure 3.2, Figure 3.3, Figure 3.6 A, B and
C). While the BGP models consistently had difﬁculty in predicting the times to ﬁxation
of the WT genotype at the extremes of the distribution, this can be attributed to the fact
that this is a preliminary assessment of the model performance using the general setup of
the R package tgp. We can expect further improvements from including modiﬁcations to
our speciﬁc case and this is certainly an area for further research. Using the BGP model
provides substantial improvement in the computing time to explore the parametric space
as well as the scope for additional analysis such as sensitivity analysis.

A direct consequence of increased model complexity is that the number of variables

75

and parameters underlying the model increases substantially, as well as the functional re-
lationships among them. Model sensitivity analysis then provides insight as to what fac-
tors inﬂuence the output of interest and thus help narrow down important inputs (Saltelli,
2000). Most efforts for sensitivity analysis in ecological risk assessments have been
derivative based, using analytical approaches (Caswell, 2008, 1989), and thus are conﬁned
to local perturbations of the parameters. Global sensitivity analysis (GSA), speciﬁcally
variance based approaches, are attractive in that they can accommodate nonlinear/non-
additive effects and thus are “model free”, although they can be computationally expen-
sive since they use Monte Carlo methods for evaluations (Saltelli, 2000; Saltelli et al.,
2008). These methods are however gaining prominence in ecological applications (Cari-
boni et al., 2007; Fieberg and Jenkins, 2005) as they can facilitate comparisons across
studies and thus potential ability to learn collectively from multiple results. For example,
a recent review of sensitivity analysis for spatial PVA’s emphasized the need to standard-
ize methods by applying GSA and develop tools for the same (Naujokaitis-Lewis et al.,
2009). The Bayesian approach using BGP meta-modeling allows for calculation of the
GSA sensitivity measures and sensitivity analysis to be undertaken using far fewer num-
bers of model runs in comparison to Monte Carlo methods (Oakley and O’Hagan, 2004).
In our study, we initially used results from the previous simulations, based on a Monte
Carlo approach (Saltelli et al., 2008), to conduct GSA and our estimates were not within
admissible ranges (Figure 3.1), even for 500 runs of the model, whereas the BGP model
produced reasonable estimates even with this data set (Figure 3.1)). Using the recom-
mended Latin hyper cube sampling approach we arrived at reasonable estimates of sen-
sitivity indices with the Monte Carlo approach, although this required 2400 simulations.
The BGP approach provides very similar estimates using 300 simulations at a fraction of
the computational cost.

An additional advantage to using the BGP model is that of prior speciﬁcation on the

uncertainty distribution of the input variables (Oakley and O’Hagan, 2002). This lets us

76

use the ﬁtted model for evaluating the effect of modifying underlying distributions when
they are not available for the model parameters, without running further simulations. Pre-
vious studies for GEO risk assessment have utilized an uniform distribution within the
range of the parameter space to generate random inputs (Claessen et al., 2005; Kuparinen
and Schurr, 2007). For the case study of the net ﬁtness component model, we evalu-
ated two additional components for which we did not have experimental estimates of the
underlying distribution. Results show that the modifying uncertainty distributions can
result in modiﬁcation of model behavior and the resultant sensitivity measures, although
in a biologically meaningful manner, and thus provide additional insight on the effects
of factors. For example, shifting the WT mating success distribution towards the upper
extremes results in complete loss of importance for the component. This makes intuitive
sense biologically, for above a certain threshold of WT mating success values the only
deviation from complete loss of the transgene occurs if some other component is strong
enough to counteract this effect.

The main caveat of a fully Bayesian approach, as used here, is ensuring that there is
MCMC convergence.This can be problematic as inference on the BGP scales poorly with
the number of data points typically requiring computing time that grows with the cube
of the sample size. Storlie and Helton (2008) provide a comprehensive review of using
smoothing methods such as GAM’s and LOESS in the context of sensitivity analysis,
though they do not provide explicit methods to conduct global sensitivity analysis. The
Monte Carlo approach for sensitivity analysis is easy to implement but costly in the num-
ber of model runs required for analysis. More importantly, current methods of GSA are
restricted to orthogonal or independent inputs, as there is considerable difﬁculty in evalu-
ating the high dimensional integrals that arise in estimation when considering correlations
among inputs. We have shown in context of risk assessment of GEOs that correlations

among the ﬁtness components exist due to to life history trade offs and have important

77

ramiﬁcations for model predictions, and this is true more generally for most natural pop-
ulations.

As illustrated by our results, the BGP approach enjoys good emulator properties and
also provides means to conduct sensitvity and uncertainty analysis with signiﬁcant sav-
ings in computation time. The tgp package (Gramacy, 2007) also provides additional
functionality such as adaptive sampling and partitioning of the paramater space (Gra-
macy and Lee, 2008), which can be used for further analysis. Bayesian Gaussian Procss
methods also enjoy the Bayesian advantage of being able to specify priors and thus pos-
sible model extensions using data from other sources. For example, it is possible to add
an additional error term and model the noise as a second Gaussian Process, in addition to
the deterministic output (Goldberg et al., 1998), and perhaps provide an avenue to model
demographic stochasticity. Further extensions, within the Bayesian framework, can in-
clude combining environmental data using a time series approach for producing realistic

predictions of temporal trajectories (Gotelli and Ellison, 2006)

78

 

 

 

 

 

 

 

 

Analysis ] MC LHS GP-MC GP-LHS

Component 8.1. T1 S.I. T1 8.1. T1 8.1. T1

1W 0 1.0505 0.1117 0.9569 0.1237 0.6036 0.1212 0.5210
JT 0.311 0.9317 0.1552 0.9267 0.1987 0.7067 0.1661 0.6046
AW 0 1.0987 76—04 1.063 0.0308 0.3834 0.0230 0.3918
AT 0 1.0977 0.0018 1.0527 0.0281 0.3854 0.0277 0.3956
FrW 0 1.0988 86—04 1.0639 0.0255 0.3868 0.0244 0.3930
FcW 0 1.0979 0.1103 0.9546 0.0486 0.4420 0.0592 0.4397
FrT 0 1.0988 66—04 1.0618 0.0317 0.3916 0.0230 0.3975
FCT 0.0016 0.9767 0.0894 0.9658 0.0359 0.4286 0.0426 0.4357
SW 0 1.071 0.2175 0.8967 0.0313 0.5109 0.1015 0.5710
ST 0.0022 1.0264 0.0079 1.0223 0.0543 0.4441 0.0583 0.4501

 

Table 3.1: The mean of the ﬁrst order Sensitivity indices (SI) and the total sensitivity
indices (TI). The subsecript for the ﬁtness components represent the genotypes: W- WT
genotype and T - TG gneotype. The ﬁtness components are as follows: J - Juvenile
viability, A- adult viability, Fr— Fertility of male, Fc- Fecundity of females and S- age of

sexual maturity.

79

GP—MC prediction quantiles

 

ICSI SCI

25% 50% 75% 90%

 

model
training
LHS-data
MC2-data

 

6.5e—05 0.00015 0.00033 0.00056
0.4390 1.0892 2.5824 7.4562
2.0151 6.3550 21.8629 121.9078
0.6864 2.0021 5.9359 20.5598

 

GP-LHS prediction quantiles

 

ICSI SCI

25% 50% 75% 90%

 

model
training
MC-data
MC2-data

 

5.82e“5 0.00014 0.00026 0.00039
0.6382 1.5568 6.9838 38.3397
3.3608 8.3949 15.2161 33.9908
3.9021 8.8574 20.8094 39.6219

 

GP-FULL prediction quantiles

 

 

test set 25% 50% 75% 90%

model 0.0002 0.0004 0.0007 0.0010
training 2.1915 5.1806 10.6909 24.4652
test 4.4369 9.9096 22.3818 49.4247

 

 

Table 3.2: Summary statistics for the ﬁtted models GP—MC GP—LHS GP-FULL. Quan-
tiles repesent absolute deviations of predicted values from the true values. The model
represents the deviations from the ﬁtted values on the transformed scale and the training
set represent the deviations from the ﬁtted values back transformed to the original scale

80

 

 

Fitness Both Dominace Mating Success
Uniform mode=0.1 mode=0.9 mode=l.5 mode=4
SI TI SI TI SI TI SI TI SI TI
J W 0.089 0.405 0.109 0.377 0.115 0.367 0.127 0.405 0.188 0.525
J T 0.025 0.301 0.022 0.225 0.026 0.220 0.030 0.242 0.032 0.288
AW 0.023 0.299 0.024 0.230 0.024 0.216 0.029 0.243 0.031 0.294
AT 0.023 0.298 0.023 0.210 0.023 0.198 0.026 0.218 0.027 0.267
FrW 0.022 0.299 0.025 0.270 0.025 0.257 0.037 0.293 0.043 0.345
FcW 0.103 0.422 0.110 0.373 0.098 0.342 0.108 0.379 0.191 0.527
FrT 0.023 0.299 0.023 0.232 0.024 0.217 0.029 0.242 0.030 0.294
FcT 0.024 0.304 0.024 0.224 0.024 0.218 0.028 0.243 0.031 0.292
SW 0.048 0.358 0.031 0.238 0.039 0.240 0.035 0.259 0.044 0.321
ST 0.025 0.299 0.024 0.209 0.024 0.198 0.026 0.222 0.027 0.269
MW 0.364 0.693 0.330 0.642 0.364 0.680 0.255 0.558 0.046 0.309
Dom 0.039 0.330 0.044 0.262 0.024 0.199 0.047 0.285 0.071 0.366

 

 

 

 

 

 

Table 3.3: Summary of Global sensitivity analysis for the case study. First order (SI) and
total (TI) sensitivity indices are presented for the FULL-data. Uniform column represent
uniform distributions for both the mating successand dominance components, while the
Dominace and mating success columns represent the modiﬁcations of the uncertainty
distributions.

81

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

0 -o- -0-_._ _._ -o-
- oo
o A 1 o- B
sr‘ : _
‘ 9t.
81 : °
7 + + f 3- _
O- é*!_—* *?— '1' O— — J- — — —
J'W TSW' 'ct' A r F'rt' J'w WSW ‘ F'ct Tilt 'Frt'
Aw rw t t FCt Aw w t St t
mu— 8 -‘ — —_ — —
O
_ C T m D ._
9.- : O.- —
o E F —
9!- 1 7
o : 8 '0-
-i _L —
le I SW T FBI I At I F'rt I le I SW I F'Ct I At I FITt I
w rw t t 1 AW Frw t t F
:34 E - F -
- i o O o _
- 1 31
l

0.6
1
-._

[E]-
I-
HI}
1'
ﬂ}- -
m-
l-
1-
l'
”3..
090
.4...
+1?

 

 

 

 

 

 

0.0

Figure 3.1: Global sensitivity analysis using the Monte carlo approach. A) and B) First
order (SI) and total sensitivity (TI) indices for the MC-data respectively C) and D) SI and
TI using the full variance from the MC2-data. E) and F) SI and TI for sensitivity analysis
using the LHS-data.

82

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

2?: _ o
E - A g_ B
‘5 l\
“2 8- , o
9 L0 .8 O 0
g _ % 8‘ o
8 9 - 0
v O B. o
g 8“ ($34 0
§ _ 00 O O o
8 .
a 8- 8-
1 00 300 500 700 1 0 300 500 700
observed (backtransformed) observed (MC-data)
Oo O
o C °'
0- 0 o
a) O
0%0
E 8- .99 E
.9 “3 .2
8 8
a 8. a
‘1' o
O
o-
N o
I l l l T ﬁr 1o— l l l I l
0 1000 3000 5000 200 400 600 800
observed (LHS-data) observed (MCZ-data)

Figure 3.2: Bayesian gaussian process model predictions using the model ﬁtted for the
MC-data. A) Predictions from the GP—MC for the data used for model ﬁtting after trans-
formation B) Backtransformed predicted values to the orginal scale for the ﬁtted values
and the response C) Mode] predictions from the test set of the MC-data D) Predictions
for the LHS-data as the test set E) Predictions using MC2-data as the test set F) observed
response versus backtransformed response

83

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

s . s
8 A "5 8 _ B
O a) ‘_'
28’ 8 E
4—0 o _. .... N
C O C 0') —
.9 " .9 r:
‘5 ‘5
.3—5 8 é E5 _
8 8 ‘ 8 ‘-'
.13 °3 :5 o
8 8 <9 -
E. o ‘ a '-
l l l l 1 l l l l l
0 2000 4000 1.30 1.31 1.32 1.33
LHS-data untransformed LHS-data transformed
s 8 — . s .
.— ._ o
3 _ C o 8 ‘ D 6
E o 8 °
2.: 8 — tr: 8 _ o o
8 7 8 ‘° ° °
'4: - a: o
92 o 9 o ° °
= 8 - == 8 - .00
'o O '0 o
9 ‘ o O 9
O o 8 o 00 O
._ o ._ _
if» a - ° ° 8 N
0- 1 l I l 1 7 Q- 1 1 1 1 1
100 300 500 700 200 600 1 000
MC-data fixation time of WT M02-data fixation time of WT

Figure 3.3: Bayesian gaussian process model predictions using the model ﬁtted for the
LHS—data. a) Predictions from the GP-LHS for the data used for model ﬁtting after back—
transforming predicted values b) Model predictions before backtransforming the response

to the original scale c) Predictions using MC-data as the test set (1) Predictions using the
MC2-data as the test set

84

Main Effects

 

 

 

 

‘2
o
“2
$0
8
Q“!
30
b
".
o
o_
o

 

-0.4 0.0 0.2 0.4

 

1

.1

0.8

0.4

 

 

 

 

0.0

 

I--- JW
l---- Jt
I--- AW '
I—— A,
l—-- Frw F"
.— ch
I—-- Frt E
-’"‘" PC! 1
l--- Sw
I—-- S,

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

scaled input
1st order SensitivigLIndices Total Effect Sensitivity lndices
qu'iesesHs
O r I l l I .L l
_ Hm —
i 8, _ , s *
‘I' ' O l 'L —
: II o : —_ _‘C_F
BunaiiéitT -. 77C:T:::
tﬁamgagmg g- iiiiiiii
88.81.8438; .iw'A'w'F'rw' F'rt'3w'
Jt At FCw FCt St Jt At FCw FCt St

input variables

input variables

Figure 3.4: Sensitivity analysis using the Bayesian gaussian process model for the MC-

data

85

Main Effects

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

a I.-- Jw
o' I---- Jt
I--- AW
3 -‘—— At
5 i). '—'- Frw
8 O :— ch
9 I—-- Frt
m ..'o I -"" FCt
g — I l--- sw
,’ I—-- St
I I I I I
-0.4 0.0 0.2 0.4
scaled input
1st order Sensitivity Indices Total Effect Sensitivity lndices
1 ‘TTTTTTTTTT
8.- mazesasssis
° ° ' : : : ' : : :
I AIDw *
d- V’ ' —
d‘TI 6‘1;
' ' o I H-
. ° 3 $T ° 4- -t - ii
B . T ‘ T . : T l i
E.L Sgi ' i ' : : I .L : -L
g —i EGGBDE¢ BE 2 _ "' .L ‘L
I I fI I I I I I I I
Jw'w'rAwFrﬁsw' .I'wwAwFrF'rsw'
Jt At FCW FCt St Jt At FCw FCt St
input variables input variables

Figure 3.5: Sensitivity analysis using the Bayesian gaussian process model for the LHS-
data

86

 

 

 

 

 

8 o
E o /\
o C —
8 2 °
8
e -o a o
8 o o o
e 8—
0 co o
.8 J
'O
-9 o — o o
E N
O. l I l
0 1000 3000
observed time
1st order Sensitivity Indices
O
.—' 1

O
T

 

0.0 0.2 0.4 0.6 0.8
L

iiii‘isiI
@wdwwaadam
I I

5.3
‘9

Q

 

 

 

Jw

'A'W'F'w was ' Mu
Jt At rFCw FCt wSt th
input variables

predicted time
200 400 600 800 1000

1.0

0.0 0.2 0.4 0.6 0.8

 

 

 

 

o
B o
0
Q) o
0% o o
0%?) o 0 063
o
o O 0 0
(DO
0 o
oo o o o
I I I l I
200 600 1000

observed time

Total Effect Sensitivity Indices

 

ﬁ

1

 

 

 

J'W lAlW lF'rwI F'rtl

s'w'Mw'
Jt At FCw FCt St Dt
input variables

Figure 3.6: Prediction plots and Sensitivity analysis using the Bayesian gaussian process
model for the FULL-data

87

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

mode=0.1 1st order Sensitivity Indices Total Effect Sensitivity Indices
33 V: ; {éng’énif *
o o
90 .f .§$$$$+$$$ $
““4 géééééééééyé g. “tiff“: i:
0(3'0 0'4 0'3 Jw AwFer'rt S'W Mw' Jw AwFerrt Sw Mw
' dominance . Jt At chfct St Di Jt At FCVtFCt St Dt
mode=0.9 1st order Sensitivity indices Total Effect Sensitivity Indices
8‘ :7 00.. co i
got: oi oi
g a V2. ’ “L 9 o g? g
30. O a g $ O$ff$$$ffff f
a . tr . f $
can; éeééééi. o. fi'i'vi'ii'i'
0‘. . . . . . . o Jw AwFr WF'rtSw MW 0 Jw A wFerrtSw 'w'M
0'4 d%?nina(r)tfe 1‘0 thA tFvaIFCt St Dt th At FCvJ: Ct St Dt

Figure 3.7: Sensitivity analysis using the Bayesian gaussian process model for the FULL-
data using different uncertainty distributions for the dominance parameter

88

 

 

mode=1 .5

 

 

 

 

 

 

 

 

 

 

 

 

 

 

>~ _ .—
28-
(D
a i
90
L“ h
o‘r U ﬁr r‘r I I I
10 20_BD 4D
donunance
mode=4
>~$‘ I'd—1‘1
O
- F,
88‘ 1"
LL lad—ﬂ
o- . y T Y . .
25 35 45
dominance

1st order Sensitivity indices

 

 

 

 

tst order Sensltlvlty indices

00

CW

«.1

o. g ’ $+
o.. Si; éésééﬂ‘e
0 JW A\,.,Frw'FrtS.,\,MW

Jt At FCWFCt St Dt

 

J

 

Q01(k4 03‘

Total Effect Sensitivity lndlces

 

 

 

 

SO‘(Mll0ﬁl
cum—a

Jw A wFerrtSw
th AtFCfot St D!

Total Effect Sensitivity Indices

i

Mw'

 

 

 

 

l s

l. l’
?;= ié?éée=a$
J'wA AWFrW'F'rtSW Mw'

th AtFCwFCt St Di

 

0°"

on I

q $$?§T$??T$?;

Ojlé$$$léé$$$.
I i .I g

Q. "‘ "tl'

o v

 

JQV'AWF'rQFrt'S'w'Mw

$:AWFJ%RStDt

Figure 3.8: Sensitivity analysis using the Bayesian gaussian process model for the FULL-
data using alternate uncertainty distributions for the WT mating success parameter

89

Concluding Remarks

This dissertation work focused on improvement of methods to predict transgene intro-
gression into natural populations using modeling approaches. Models to assess transgene
introgression into natural populations need to incorporate ecological principles in a bi-
ologically meaningful manner by accounting for variation that generates uncertainty in
the predictions. Variation is ubiquitous in natural populations arising intrinsically due
to chance events, i.e. demographic stochasticity, resulting in variability in estimates of
population level parameters. The overarching objective of this research was to assess the
impact of different sources of variability on model predictions, thus providing realistic
bounds for predictions that can be subsequently used for policy and regulation.

Chapter 1 of this dissertation looked at the effect of demographic stochasticity on pre-
dictions, and the relative contributions of ﬁtness components to the resultant variability.
This was accomplished by conducting stochastic simulations of transgene introgression
for combinations of ﬁtness components contributing to demographic stochasticity. All
simulations were conducted using a stochastic re-implementation of the net ﬁtness com-
ponent model (Muir and Howard, 2001, 2002) and results were assessed for two sets
of predictions: the transgene frequency in the populations and the time to extinction
of the natural population. Results from this work show that demographic stochasticity
not only contributes to uncertainty in predictions, but also that it can reduce the average
time to extinction. More importantly, assessment of gene frequencies revealed that de-

mographic stochasticity can elevate persistence of the transgene far beyond predictions

9O

within a purely deterministic framework. Additionally, it was seen that stochasticity in
the juvenile viability component contributed the most to variability in predictions, while
stochasticity in the fecundity and mating success component did not have any substantial
inﬂuence. Thus this work highlights the importance of using a stochastic framework to-
wards making realistic predictions when using modeling approaches for risk assessment.

The results of Chapter 2 underscores the impact of uncertainties in estimated param-
eters on model predictions. Transgenes used for genetically engineering organisms can
be derived from sources that are phylogenetically disparate. This is the case for our
model organism, a genetically modiﬁed zebraﬁsh (the Glo-Fish), where the gene produc-
ing ﬁorescence is derived from a sea-anemone. This necessitates the evaluation of ﬁtness
components using laboratory experiments, at least for the transgenic individual, since the
possibility of the same genotype occurring under natural conditions is almost nil. Using
the non-parametric bootstrap to assess the uncertainty in parameter estimates and incor-
porate it into model predictions, our results here show that we can can generate more
realistic predictions of transgene fate as validated against a mesocosm experiment. Given
that the experimental settings used to estimate the ﬁtness components did not allow us
to study the interdependencies among them, we also implemented a copula approach to
assess the effect of such correlations. This application lent insights into the limitations of
the current experimental set-up and we proposed enhancements that can facilitate better
estimation. In the event that data collected is not amenable to estimate the copula param-
eters, this can still be a powerful tool to assess the effect of different types of correlation
patterns within a sensitivity analysis framework.

The direct simulation methods applied in Chapters 1 and 2 posed signiﬁcant compu-
tational burdens, which will increase with extensions to the current modeling framework.
For example, linking forecasting data on experimental drivers with elements of the transi-

tion matrices (Gotelli and Ellison, 2006) for emulating effects of long-term environmental

91

change. Therefore, to exhaustively evaluate model behavior we applied global sensitiv-
ity analysis methods, which are computationally demanding when based on Monte Carlo
methods. The results using these sensitivity measures provide measures on the impor-
tance of individual ﬁtness components on model behavior over different regions of the
parameter space, which can be used to direct future research. For example, the model
is not very sensitive to mating success of the wildtype genotype when the distribution of
mating success is 2 3. Thus further model evaluations in this region can be ﬁxed at a
single value for the mating success component. To alleviate this computational burden, in
chapter 3 we applied a meta-modeling approach using Bayesian Gaussian Process mod-
els, which can be used not only to derive sensitivity measures but also to explore differing
forms of uncertainty distributions of the parameters. Our results show that this approach
performs very satisfactorily and the choice of the Bayesian paradigm can be of immense
utility for further extending the model, e. g. to incorporate demographic stochasticity.
The results from this dissertation are not speciﬁc to the transgenic system and can
be broadly applied in areas of conservation and demography. The work on demographic
stochasticity highlights the need to examine evolutionary trajectories combining genetics
and population dynamics, for which the transgenic system was a suitable model as the
exact genetic modiﬁcation underlying the phenotype is known. This work used two less
known methodological aspects, the copula approach and the Bayesian Gaussian Process
meta-modeling, that are widely suited to modeling applications in ecology. Thus results
from this research can be used to enhance not only the predictive framework for trans-
gene introgression, but also to foster methodological developments in broader areas of

ecology.

92

Appendix A

Description of the model

Let ng t represents the number of individuals in a particular age class a (where a =
O, l, 2,. . . , m ) of genotype g (g 2 WW, Ww or w) at time t. The composition of the total
population in an age class at any given time t can be written as the sum of numbers across
genotypes N510?! = 2g N: t' The dynamics of the population vector of age classes
——9 -’—> —-’>

Nttotal(= 28 N?) , where N? represents the population vector of an individual genotype

g at time t, can be represented using a matrix approach

- 1
NWWE
—) ___.)t
-—)
Nyw
L .

 

 

where A8 is the projection matrix for genotype g. In the deterministic case, the projection
matrix A8 is composed of parameters for fecundity and survival (see SI Table 82 for

details on the parameters used in the Monte Carlo runs). In the simplest case, i.e., for a

93

-mm. In. in

single genotype with m age classes we can write,

 

 

0 0 f] fk -
"O
s] 0 0 O .
__, __, :
Nz+1=AXNt= 0 32 0 0 x . (A.2)
"m
L 0 0 0 Sm_1‘ ‘ A

 

 

where the fi’s denote the fecundity functions/parameters,k is the num ber of fecund age
classes and the si’s denote the survival parameters, IV, is the population vector at time t
(For further details on analysis of the dynamics of the population in the deterministic case
refer to Caswell (Caswell, 1989)). This can be extended as a partitioned matrix involving
the fecundity functions and the survival functions for all three genotypes as in equation
(2) (See Methods). As seen in equation (2), the survival functions can be partitioned
to form individual submatrices for each genotype (Sg) which are independent of each
other. However, the fecundity functions are nonlinear because they depend on all three

genotypes (both the male and female fractions) as underlined below for the WW genotype

 

 

 

 

F
WW _ WW WW WW WW
Nt+l ‘ Nf,t f, S’WWx WWPt ‘W
l WW WW WW Ww
Nf,t f erWXWWp +... +
WW WW WW WW
sO N0,t +"'+Sa,t Na,t (A.3)
where N??? is the population size of the wildtype at time t + l ,f3 is the fecundity of
females of genotype g, sf; represents the survival transitions from age classes, Sig] x g2
represents the proportion of offspring of genotype g produced by mating of females of
mg N51
genotype gj with males of genotype g2, and p‘tg = ——’§— represents the relative
28 mgN m, t

mating success of males of genotype g with mating advantage m (this assumes that mating

94

 

advantage can be predicted from marginal frequencies of males, i.e. does not interact with
genotype of female). Therefore from equations (2) and (3) it is seen that the individual
projections matrices for each genotype is not a linear function of the population for that
genotype alone, and the overall projection matrix A is a nonlinear function of population
size across all genotypes. When the transgenic genotype has a higher relative mating
advantage, the fecundity functions (i.e. the ﬁrst two lines of the R.H.S in equation (3))
are a decreasing function of time whereby the number of WW offspring produced from
the Ww f x WWm or Ww f x me matings are lower than the number of Ww and ww

offspring produced by the WWf x me or WWf X wwm matings.

95

Appendix B

Stochastic implementation for the

ﬁtness components

B.l Fecundity of females

The number of eggs laid by each female is assumed to follow a Poisson distribution with
parameter A, the expected number of eggs laid by an individual female. Assuming that
the eggs laid by each female in a time step are independent and identically distributed
(iid) random variables, for each time step the number of females that reproduce per geno—
type over all ages (Ni.

, rep
produced by the available females in a genotype g was simulated as a Poisson variable

= 2a n‘; a) were calculated and the total number of eggs

with parameter N? repA . It is easily veriﬁed that the sum of n iid Poisson variables with

parameter A is also Poisson, with parameter nA (Casella and Berger, I990)

B.2 Mating success of males

Relative mating success is deﬁned as the relative proportion of individuals that mate

weighted by the mating advantage that an individual of a particular genotype enjoys.

96

 

Let

 

fg _ N§ex
sex _ g
Zg N sex
denote the relative frequency of individuals of a particular genotype g in a speciﬁc sex.
Then the relative mating success M§e x is a function of the mating advantage m‘Ee x and
the relative frequency f5, x where
8 g
Mg _ msex X fsex !
sex — g F

g
28 msex x f sex

Assuming that the number of individuals of a particular sex that are available for repro-

duction at any time is Nfe p, sex, the realized number of individuals is the re-weighted
. . . . real, g g g .
number With the mating successes as weights, that IS Nre p sex ~ M sex x Nre p, sex, in
the deterministic context. In the stochastic model, after we calculate the relative mating
success M8 as given above, we assume the number of individuals of a particular sex that
actually mate as a random draw from the available Nggfral, where Ngggral = 2 g N§ex9

from a multinomial distribution with parameters given by the Mgex’s, i.e,

,real I t 1. WW W
N§ep, sex ~ Mulzuvsgxa ,Mm M3833. M3,?)

and the probability of mating is calculated as

 

real, g
g _ Nrep, sex
p sex — Nreal,g
zg rep, sex

All females are assumed to reproduce and mating occurs at random, i.e., the probability
of a female pairing with individual male from any genotype depends only on the relative

proportion of each genotype and their mating advantage.

97

B.3 Survival from age class a to a + 1

The event of an individual surviving from age a to a + 1 can be considered a Bernoulli
random variable with probability of survival sa. Assuming independence across individ-
uals and a constant survival parameter, the number of individuals surviving from age a to

a + l in one time step is a binomial random variable, i.e. Na + 1, t + 1 ~ Bin(Na, t; sa).

98

 

Appendix C

Analytical results on effect of
stochasticity in each ﬁtness component

on variation

C.1 Fecundity

For stochasticity in fecundity of feeadtotoc males, assuming that all eggs are fertile the

number of offspring entering the population at time t (N5 1‘) is a Poisson variable with

g _ g _ g
E(N0,t)—Var(NO,t)—Nf,rep,tA.

If Ng A >> O, the Poisson distribution tends to a Gaussian process as N8 ~
f,rep,t 0,t

_ g _ g . . . .
N ([1,0'), where ,u — N f, rep, tA and 0' — ’Nf, rep, tA’ and it is again Simple to see

that the %CV declines exponentially as the product Ng A increases. In our simu-

f rep,t

lations A = 8.8 and for N3 = 50 for an arbitrary time t we can calculate the 99%

f, rep, I
intervals as (387,496), which implies that the probability the number of offspring pro-

duced is smaller than 387 or greater than 496 is 0.0]. As Ng increases this bound

f, rep,t

99

decreases and so we can expect stochasticity in the fecundity component should make

signiﬁcant contributions only when the population size is very small

C.2 Mating Success

Given the way we calculate relative mating success (see Appendix B) for a genotype
g, the realized number of males is a random variable from a multinomial distribution
with parameters M,gn, the relative mating success, and, 2g N§e p m, the total number of

males present. Conditional on the parameters for the other genotypes, we know that the

realizations of males for genotype g1 , is a Binomial random variable i.e
real, 81 g1 g2 3
NrepJn ~ Bin[ZN§ep m;Mm lM ,Mg

with expectation and variance given by:

real, g
EN[rep,m 1]: MngNg rep,m
and

Var[N;:;f9mgl]= Mg](l-M81)ZNgrep,m

However, in the model the ﬁnal prolbability of mating with a male of genotype g by any
Nrea , gl

8

female is given by the p% =
28 N re p, m

, whereby taking expectations and variances we

100

have

 

 

 

Nrmeal, gl —1 l
g _ rea,gl
E[pm —E——-—— Z N—T— =ZN3 rep,m m] E[Nm ] (01)
g rep,m
and
Nrnfagl, g1 ‘2 real
Var[p§n]=Var——— :[ZNreP m] Var[Nm ’81] (C2)
ZgNg rep,m

 

 

From equations (4) and (5) we can see that the contribution of the mating success com-
ponent to the variance will be really low and therefore we can expect stochasticity in this
component to only inﬂuence the mean of the process unless the number of reproducing

males are very low (Fig. 1, Fig. 3, scenario M).

c.3 Viability

In the case of adult and juvenile viability, since we assume that Na + 1 t + 1 ~ Bin(Na I; sa),
where N61,; is also ~ Bin(Na _ 1, t _ 1;sa _ 1) and so on, the unconditional distribution

of N +1 1 +1is~Bin(Na_ ]t_1;sasa_1)with expectation
E[N a,+1t+1]=SaSa—1Na—1,t—1

and variance

varlNa+l,t+1]= Sasa+ l (11 —SaSa+ 1)Na— 1,1‘— 1'

Since Na _ 1, t _ 1 itself is a radom variable, we can extend this all the way to the the ﬁrst

age class at time t— a and therefore write the expectation and variance at a particular time

101

I for a particular age class a as:

a
E[Na+1,t+l]=[( n Sk]N0,t—a] (C3)

and

VarlNa+1,t+1]=[[ [g] Sk][1—kﬁ Sk

k=l =1

 

NO, I- _ a] (C.4)

where N0 t— a represents the number of offspring born at an earlier time t — a. From
equations (3),(6) and (7) we can see that the overall variance of the population at time

t + 1 for a genotype g is given by Var [N;g ] = 2?: 1 Var [N5 t + 1], assuming that

+ 1
the covariances among age-classes are independent. if the No, I _ a are not very different
across age classes, then contributions to variance from each age classes is based on the
product
a a
H sk [1 — n sk ((3.5)
k = l k = 1
That is the contribution of each age class to variance in population size depends on the
survivorship upto that particular age class. The variance is therefore a weighted sum of
the offspring produced from time t—a to time t— 1. Hence, the highest contributions to the
overall variance are from those ages where the expression in equation [8] is maximized,
which usually occurs in age classes in the middle of the age distribution. For example, in
the daily model
a a
H Sk l — n Sk
k = l k = l
is maximized when a = l l and is > 0.05 in the interval a : (2,43). Therefore, we expect

the most contribution to variance to occur from these earlier age classes.

102

 

Appendix D

Additional plots for Chapter 1

103

Scenario: F

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

  

 

 

 

 

 

Scenario: M
.é‘ccz: a?
901 22
8. 8
O 0..
°4'o ' 6'0 ' 83'100' 040 60 80 100
generation generation
Scenario: J Scenario: A+J
>.“' >N.:
2713* 5"“
C ‘ got
a . s . \
8 3
0'4'0 ' 60 ' 8'0 '100' 0'4'0 ' 6'0 ' 8'0 '100T
generation generation
Scenario: F+M Scenario: F+A+J
€05 é‘v
2o“ 28
e ‘ s
0.. _ - 8 -
C’40 60 80 100 040 60 80 100

generation generation

Scenario: F+M+A+J

L__

'4'0 ' 6'0 ' 8'0 '100'
generation

Scenario: M+A+J

 

 

I
A n A A

 

 

 

 

 

 

 

 

densnty
000 . 9-04 -

densny
000 004

'4'0 ' 60F 80 ‘ 100

 

 

 

 

 

 

 

 

 

 

 

 

generation
Scenario: 3mlll Scenario: A82

col 8

%d %9

C ‘ C

a) a)

'o 1 \ 'o .

O O

0.. - - . . . - - 0.. . . . . - - .

c>40 60 80 100 040 60 80 100
generation generation

Figure A.1.1: Distribution of extinction times for all scenarios at a quasi-extinction
threshold of 100 individuals. A kernel density estimate using a Gaussian kernel is used
for smoothing. The vertical bars represent the time to extinction for the deterministic

model.

104

' urn.
-u.
Lt

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

   

 

 

 

 

 

 

 

 

E E
.9 .9
E E m" \
a a ‘
g: 93*: So: M X.
c» 830' . . . . . .
9 —1o 20 30 4050 60
generation
’5 ’E
.9 , W§£333221j """"" 2,
E ooj """"""""""""""""""" (_U no
a ----------- a
o v- “““““ 0 Vi“
9,0- Sc: A Ea":
9 10 20 30 4o 50 60 9 10 20 30 4o 50 60
generation
’5 E
.9 .9 .
E E m- \
3 3 .
8' 8v 9
0;, 9, ~ SC: F+M ‘9,
83 839' . . . . i 1
— —10 20 3040 5060
generation
’8 E‘
.9 .9
E 5 no
a a -------
0 V o """"
9. Se: F+A+J “MW 9, Se: M+A+J W
C) . . . . . . U) . . 1 i . r
9 10 20 30 4o 50 so 9 10 20 30 4o 50 60
generation generation
’5 E‘
9 .9
:6 co 5 co
3 «.\ a .
O V; .i‘ ' 0 ~\ .......
96,301 Sc: F+M+A+J W 96 Sc: A82 --......~...;
9 100 200 300 400 9 1'0 20 3'0 4'0 5'0 6'0 7'0
generation generation
.9 531
9 . \_‘
D ..
8'11) .
eel Sc: 3MI|I
O) ‘ ﬁ . i . 1 .
9 10 20 30 4o 50 60
generation

Figure A. 1 .2: The evolution of average total population size over time for all scenarios and
the quantiles of population size over time. The dashed blue line represents the evolution
from the deterministic model, while the green lines represent the 5% and 95% quantiles
respectively. The red line represents the average population size over 1000 runs at each
time step. Population numbers have been log transformed for better visualisation.

105

 

 

 

 

 

 

     

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

          

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

     

 

d-cqj SCF/I’ '; SCIM “/5
93 O. - .
E 0. st.
9 2'0 3'0 4'0 5'0 6'0 7‘0 8'0 9 2‘0 3'0 70 . 5'0 6'0
generation ’ generation
670.. c0]
90. oi
E 0.2 0.2
° 9 2'0 3'0 4'0 5'0 6'0 7'0
generation
are ,0. SczF+A+J ......... #7:”: _________ _
90: ",,.—:____.__._“__m C ,-o-“‘:::: ---------
E O.“ x”\-...___ 9.1 """ . “N. .....
C’20 30 4o .50 60 7o 920 30 4o _50 60
generation generation
c.10- m- SC:M+A+J ”xi ___________
ed: 0'1
E of 0.2
c’2'0 30 4'0 5'0 6'0 0
generation
d-cql <91
20. O.
E 9i 0.2 """" ”"w...
0 2'0 3'0 4'0 50 Tao ° 20 3'0 4‘0 5'0 6'0 7'0
generation generation
. _ _ ../ ______
610.1 80.3%" --———-"" _______
20'
E o \ ~
0 r %—

 

 

 

 

2 30 4'0 5'0 0'0 7'0 8'0 9'0
generation

Figure A.1.3: The evolution of average transgene frequency over time for all scenarios
and the quantiles of the transgene frequency at time step. The dashed blue line represents
the evolution from the deterministic model, while the green lines represent the 5% and
95% quantiles respectively. The red line represents the average population size over 1000
runs at each time step.

106

 

 

 

 

 

 

 

c>/oCV
30 120
(D
9
Z
cyoCV
2 ‘l . 9 . 3
B
CD
0
g

 

9 .

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

0 1'0 2'0 30 4'0 5'0 60 0 1'0 2'0 3'0 4'0 50 6'0
generation generation
N
:53. Se: F v—'j Sc: F
5 ‘ 5:0:
o\° o\°o'
LO: .
o 10 20 30 4o 50 60 ° 0 1o 20 30 4o 50 60
generation generation
3. Sch o: 8ch
a ‘ 5‘5
09914 $3.
‘ 4
O . . . . . . . 1' o-. . . . . . .
o 10 20 30 4o 50 60 o 10 20 30 4o 50 60
generation generation
Sc: F+M+A+J SC: F+M+A+J

 

ci/oCV

0 . 4‘0‘ .LOO
%CV

10 :0. 1710

 

 

 

 

 

0 1'0 2'0 3'0 4'0 5'0 6'0 70 0 1'0 20 3'0 4'0 5'0 60 70
generation generation

Figure A.1.4: The evolution of the coeffecient of variation(%CV) for the runs over time

for scenarios F, M, .l and F+M+A+J. Limit for the x-axis is chosen as the generation

where at least 100 populations were not extinct. l) The y-axis represents the trajectory of

the %CV for the total population size. 2) The y-axis for the right column represents the
trajectory of the %CV of the transgene gene frequency.

107

Appendix E

Additional plots for Chapter 2

108

 

 

 

 

 

 

 

 

     

'Fecundit ‘ F3‘rtility" Sexual Maturit o Wability _
§_
8- 0'
do
m—
0
o—
«4
LO
53,- a_
05“ o
C!)—
3- 8-
0' s
N_ a)-
o?
O
1—— E_*
LD
('2...
O
FLO
f\
1 ._
JO)
llll Ill IIIIo‘IIII

 

 

 

 

 

 

 

 

 

 

 

 

 

TGTGcWTWTc TGTGcWTWTc TGTGcW'iWTc TGTGcW'iWTc

Figure A2. 1: Marginal distribution of the bootstrap parameter estimates for the Wildtype
(WT) genotype and the transgenic type (TG). Marginal distributions for the method using
Copulas simulating dependence are labeled with a ”c” subscript.The The same set of
parameters were used for three different values of realtive mating success of the WT
males ( 1.16.193 and 5 respectively).

109

 

8 10 12

Fixation Time

6

 

 

 

8 10 12

Fixation Time

6

 

 

 

0.46 0.47 0.48 0.49 0.50

Fertility WT

 

 

 

8 10 12

Fixation Time

6

 

 

1 5 20 25 30

Fecundity WT

 

Fixation Time
8 1 0 1 2

6

 

 

 

0.9910 0.9915 0.9920 0.9925

Viability WT

 

 

 

 

8 10 12

Fixation Time

6

 

 

 

 

1 .030 1 .040 1 .050

Lambda WT

llO

6566676869

Sexual Maturity WT

Figure A.2.2: Comparison of the ﬁxation time as a function of ﬁtness components for the
Wildtype(WT) genotype. The dashed contour lines represent the scenario incorporating
dependence using the copula approach

 

 

 

 

 

   

 

 

 

 

 

 

 

 

   

 

 

 

 

 

 

 

(\l N
0 $2
a) a)
E .9
*— co '— co
c c
.9 .9
‘5." 33'
x-r v
0.34 0.38 0.42 4 6 8 10 12 14
Fertilifv TG Fem mriifv TC.
9: 9:
0 52
(D 0
.E g
'- oo '- co
c c
.9 .9
E E?
x-r v
0.983 0.985 0.987 90 95 100
ViahiIifv TG Sexual Matt Irifv TG
(\I
o
a)
.9
F‘ co
c
.9
‘5:
LI: (0
V

 

 

 

 

1.010 1.015 1.020 1.025 1.030

Figure A.2.3: Comparison of the ﬁxation time as a function of ﬁtness components for the
transgenic (TG) genotype. The dashed contour lines represent the scenario incorporating
dependence using the copula approach

111

BIBLIOGRAPHY

Aemi, P., 2004. Risk, regulation and innovation: The case of aquaculture and
transgenic ﬁsh. Aquatic Sciences 66:327—341.

Andow, D. and A. Hilbeck, 2004. Science-based risk assessment for nontarget
effects of transgenic crops. BioScience 54:637—649.

Andow, D. and C. Zwahlen, 2006. Assessing environmental risks of transgenic
plants. Ecology Letters 9: 196 - 214.

Barnthouse, L. and M. Sorenson, editors, 2007. Population-Level Ecological Risk
Assessment. CRC, Boca Raton, FL.

Beauchamp, J. J. and J. S. Olson, 1973. Corrections for bias in regression estimates
after logarithmic transformation. Ecology 54:1403—1407.

Beaumont, M. and B. Rannala, 2004. The Bayesian revolution in genetics. Nature
Reviews Genetics 5:251-261 .

Benton, T. G. and A. Grant, 1996. How to keep ﬁt in the real world: Elasticity

analyses and selection pressures on life histories in a variable environment. The
American Naturalist 147:1 15—139.

Bodmer, W. F., 1965. Differential fertility in population genetics models. Genetics
51:411—424.

Boyce, M. S., C. V. Haridas, C. T. Lee, and T. N. S. D. W. Group, 2006. Demogra-
phy in an increasingly variable world. Trends Ecol Evol 21: 141—148.

Cariboni, J., D. Gatelli, R. Liska, and A. Saltelli, 2007. The role of sensitivity
analysis in ecological modelling,. Ecological Modelling 203:167—182. to read.

Casella, G. and R. L. Berger, 1990. Statistical Inference. Wadsworth & Brooks/Cole
Pub. Co., Paciﬁc Grove, Calif.

Caswell, H, 1989. Matrix Population Models: Construction, Analysis, and Inter-
pretation. Sinauer Associates, inc., sunderland Mass.

Caswell, H, 2008. Perturbation analysis of nonlinear matrix population models.
Demographic Research 18259-1 15.

112

 

Charlesworth, B., 1994. Evolution in Age-Structured Populations. Cambridge stud-
ies in mathematical biology. Cambridge University Press, Cambridge.

Claessen, D., C. Gilligan, P. Lutman, and F. van den Bosch, 2005. Which traits
promote persistence of feral GM crops? Part 1: implications of environmental
stochasticity. Oikos 110:20 — 29.

Clark, J. and B. Whitelaw, 2003. A future for transgenic livestock. Nature Reviews
Genetics 4:825 — 833.

Clark, J. S., 2003. Uncertainty and variability in demography and population
growth: A hierarchical approach. Ecology 84:1370—1381.

Coulson, T., J .-M. Gaillard, and M. Festa-Bianchet, 2005. Decomposing the varia-

tion in population growth into contributions from multiple demographic rates. Jour-
nal of Animal Ecology 74:789—801.

Devlin, R., M. D’Andrade, M. Uh, and C. Biagi, 2004. Population effects of growth
hormone transgenic Coho Salmon depend on food availability and genotype by
environment interactions. PNAS 10119303 — 9308.

Devlin, R., L. Sundstrom, and W. Muir, 2006. Interface of biotechnology and ecol—

ogy for environmental risk assessments of transgenic ﬁsh. Trends in biotechnology
24:89 — 97.

Efron, B. and R. Tibshirani, 1998. An Introduction to the Bootstrap. Chapman &
Hall/CRC, New York.

Embrechts, P., F. Lindskog, and A. J. McNeil, 2003. Modelling Dependence with
Copulas and Applications to Risk Management, chapter 8. Elsevier.

Engen, S., R. Lande, and B. Saether, 2005a. Effective size of a ﬂuctuating age-
structured population. Genetics 1701941 — 954.

Engen, 8., R. Lande, B. Saether, and H. Weimerskirch, 2005b. Extinction in re-
lation to demographic and environmental stochasticity in age-structured models.
Mathematical Biosciences 195:210 — 227.

Ewens, W., 2004. Mathematical Population Genetics. 2nd ed., Springer-Verlag,
New York.

113

Fang, K. and A. Sudjianto, 2006. Design and Modeling for Computer Experiments.
Chapman & Hall/CRC.

Ferriere, R., U. Dieckmann, and D. Couvet, 2004. Evolutionary Conservation Biol-
ogy. Cambridge studies in adaptive dynamics. Cambridge University Press, Cam-
bridge.

Fieberg, J. and S. P. Ellner, 2003. Using PVA for management despite uncertainty:
Effects of habitat, hatcheries, and harvest on Salmon. Ecology 84:1359—1369.

Fieberg, J. and K. J. Jenkins, 2005. Assessing uncertainty in ecological systems
using global sensitivity analyses: A case example of simulated wolf reintroduction
effects on elk. Ecological Modelling 1871259—280.

Fox, G. and B. Kendall, 2002. Demographic stochasticity and the variance reduc-
tion effect. Ecology 83: 1928—1934.

Gaggiotti, 0., 2003. Genetic threats to population persistence. Annales Zoologici
Fennici 40: 155—168.

Galassi, M., J. Davies, J. Theiler, B. Gough, G. Jungman, M. Booth, and F. Rossi,
2003. Gnu Scientiﬁc Library: Reference Manual. Network Theory Ltd.

Gamier, A., A. Deville, and J. Lecomte, 2006. Stochastic modelling of feral plant

populations with seed immigration and road verge management. Ecological mod-
elling 1971373—382.

Gamier, A. and J. Lecomte, 2006. Using a spatial and stage-structured invasion
model to assess the spread of feral populations of transgenic oilseed rape. Ecologi-
cal modelling 194:141 — 149.

Gelman, A., 2004. Bayesian Data Analysis. 2nd ed., Chapman & Hall/CRC, Boca
Raton, FL

Genest, C. and A. Favre, 2007. Everything you always wanted to know about copula
modeling but were afraid to ask. Journal Onydrologic Engineering 12:347—368.

Genest, C. and J. MacKay, 1986. The joy of copulas: Bivariate distributions with
uniform marginals. The American Statistician 40:280—283.

Goldberg, P., C. Williams, and C. Bishop, 1998. Regression with input-dependent
noise: A gaussian process treatment. pages 493—499 in M. Kearns, M. Jordan, &

114

T. Solla, editors. Advances in Neural Information Processing Systems MIT press,
Cambridge, MA.

Gotelli, N. J. and A. M. Ellison, 2006. Forecasting extinction risk with nonstation-
ary matrix models. Ecol Appl 16:51—61.

Gramacy, R., 2007. tgp: an R package for Bayesian nonstationary, semiparamet-
ric nonlinear regression and design by treed gaussian process models. Journal of
Statistical Software 19:6.

Gramacy, R. and H. Lee, 2008. Bayesian treed gaussian process models with an

application to computer modeling. Journal of the American Statistical Association
10321119—1130.

Grant, A. and T. G. Benton, 2000. Elasticity analysis for density-dependent popu-
lations in stochastic environments. Ecology 81:680—693.

Hails, R. and K. Morley, 2005. Genes invading new populations: A risk assessment
perspective. Trends in Ecology Gr Evolution 20:245 — 252.

Hedrick, P., 2001. Invasion of transgenes from salmon or other genetically modiﬁed
organisms into natural populations. Canadian Journal Of Fisheries And Aquatic
Sciences 58:841—844.

Inchausti, P. and J. Halley, 2003. On the relation between temporal variability and
persistence time in animal populations. Journal of Animal Ecology 72:899—908.

Jones, A., S. Arnold, and R. Burger, 2003. Stability of the g-matrix in a population
experiencing pleiotropic mutation, stabilizing selection, and genetic drift. Evolution
57:1747—1760.

Kareiva, P., 11. Parker, and M. Pascual, 1996. Can we use experiments and models in

predicting the invasiveness of genetically engineered organisms? Ecology 77: 1670—
1675.

Kareiva, P. M., J. G. Kingsolver, and R. B. Huey, 1993. Biotic Interactions and
Global Change. Sinauer Associates, Sunderland, Mass.

Knibb, W., 1997. Risk from genetically engineered and modiﬁed marine ﬁsh.
Transgenic Research 6:59—67.

115

Koons, D. N ., C. J. E. Metcalf, and S. Tuljapurkar, 2008. Evolution of delayed re-
production in uncertain environments: A life-history perspective. Am Nat 172:797—
805.

Kuparinen, A. and F. M. Schurr, 2007. A ﬂexible modelling framework linking the
spatio-temporal dynamics of plant genotypes and populations: Application to gene
ﬂow from transgenic forests. Ecological Modelling 202:476 — 486.

Lande, R., 1993. Risks of population extinction from demographic and environ-
mental stochasticity and random catastrophes. The American Naturalist 1422911—
927.

Legendre, S., J. Clobert, A. Moeller, and G. Sorci, 1999. Demographic stochasticity
and social mating system in the process of extinction of small populations: The case
of passerines introduced to New Zealand. American Naturalist 153:449—463.

Lenski, R. E., C. Ofria, R. T. Pennock, and C. Adami, 2003. The evolutionary
origin of complex features. Nature 423: 139—144.

Meagher, T., 2003. Using empirical data to model transgene dispersal. Philosophi-
cal Transactions: Biological Sciences 358:1 157—1 162.

Millar, R. B. and R. Meyer, 2000. Bayesian state-space modeling of age-structured
data: Fitting a model is just the beginning. Canadian Journal Of Fisheries And
Aquatic Sciences 57:43—50.

Morris, W. and D. Doak, 2002. Quantitative Conservation Biology: Theory and
Practice of Population Viability Analysis. Sinauer Associates.

Muir, W., 2004. The threats and beneﬁts of GM ﬁsh. Embo Reports 5:654—659.

Muir, W. and R. Howard, 1999. Possible ecological risks of transgenic organism
release when transgenes affect mating success: Sexual selection and the trojan gene
hypothesis. PNAS 96: 13853—13856.

Muir, W. and R. Howard, 2001. Fitness components and ecological risk of trans-
genic release: A model using Japanese Medaka (Oryzias latipes). American Natu-
ralist 158: 1—16.

Muir, W. and R. Howard, 2002. Assessment of possible ecological risks and haz-
ards of transgenic ﬁsh with implications for other sexually reproducing organisms.
Transgenic Research 11:101—1 14.

116

Muir, W. and R. Howard, 2004. Characterization of environmental risk of ge-
netically engineered (ge) organisms and their potential to control exotic invasive
species. Aquatic Sciences 66:414—420.

Nacci, D. and A. Hoffman, 2008. Genetic variation in population-level eco-
logical risk assessment, pages 93—113. in L.W. Barnthouse, W.R. Munns Jr &
M.T. Sorensen, editors. Population-Level Ecological Risk Assessment, CRC, Boca
Raton, FL, USA.

National Research Council, 2004. Biological conﬁnement of genetically engineered
organisms. National Academies Press.

Naujokaitis-Lewis, I. R., J. M. R. Curtis, P. Arcese, and J. Rosenfeld, 2009. Sensi—
tivity analyses of spatial population viability analysis models for species at risk and
habitat conservation planning. Conserv Biol 23:225—229.

Naylor, R., K. Hindar, I. A. Fleming, R. Goldburg, S. Williams, J. Volpe, F. Who-
riskey, J. Eagle, D. Kelso, and M. Mangel, 2005. Fugitive salmon: Assessing the
risks of escaped ﬁsh from net-pen aquaculture. BioScience 55:427—437.

Nelsen, R., 2006. An Introduction to Copulas. Springer Verlag.

Oakley, J. and A. O’Hagan, 2002. Bayesian inference for the uncertainty distribu-
tion of computer model outputs. Biometrika 89:769—784.

Oakley, J. E. and A. O’Hagan, 2004. Probabilistic sensitivity analysis of complex
models: A bayesian approach. Journal of the Royal Statistical Society. Series B
(Statistical Methodology) 66:751—769.

OHara, R. B., J. M. Cano, o. Ovaskainen, and C. A. J. S. Teplitsky, 2008. Bayesian
approaches in evolutionary quantitative genetics. Journal of Evolutionary Biology
21:949—957.

Peck, S., 2004. Simulation as experiment: A philosophical reassessment for bio-
logical modeling. Trends in Ecology Gr Evolution 19:530 — 534.

Pilson, D. and H. Prendeville, 2004. Ecological effects of transgenic crops and the
escape of transgenes into wild populations. Annual Review Of Ecology Evolution
And Systematics 35: 149 — 174.

Plate, T., 1999. Accuracy versus interpretability in ﬂexible modeling: Implement-
ing a tradeoff using gaussian process models. Behaviormetrika 26:29—50.

117

 

Pollard, J., 1966. On the use of the direct matrix product in analysing certain
stochastic population models. Biometrika 53:397—415.

Ramula, S. and K. Lehtila, 2005. Importance of correlations among matrix entries
in stochastic models in relation to number of transition matrices. Oikos 111:9-18.

Rasmussen, C. E. and C. K. 1. Williams, 2005. Gaussian Processes for Machine
Learning. The MIT Press, Cambridge, MA.

Richter, o. and R. Seppelt, 2004. Flow of genetic information through agricul-
tural ecosystems: A generic modelling framework with application to pesticide-

resistance weeds and genetically modiﬁed crops. Ecological Modelling 174:55 —
66.

Roff, D. A., 1992. The Evolution of Life Histories. Theory and Analysis. Chapman
& Hall, New York.

Saether, B., S. Engen, A. Moeller, H. Weimerskirch, M. Visser, W. Fiedler,
E. Matthysen, M. Lambrechts, A. Badyaev, P. Becker, et al., 2004. Life-history
variation predicts the effects of demographic stochasticity on avian population dy-
namics. American Naturalist 164:793—802.

Saether, B.-E. and S. Engen, 2002. Pattern of variation in avian population growth
rates. Philosophical Transactions of the Royal Society B: Biological Sciences
357:1185—1195.

Saether, B.-E., S. Engen, J. E. Swenson, O. Bakke, and F. Sandegre, 1998. Assess-
ing the viability of scandinavian brown bear, Ursus arctos, populations: The effects
of uncertain parameter estimates. Oikos 83:403—416.

Saltelli, A., 2000. Sensitivity Analysis. Wiley Series in Probability and Statistics.
Wiley.

Saltelli, A., M. Ratio, T. Andres, F. Campolongo, J. Cariboni, D. Gatelli,
M. Saisana, and S. Tarantola, 2008. Global Sensitivity Analysis: The Primer.
Wiley-Interscience.

Santner, T., B. Williams, and W. Notz, 2003. The Design and Analysis of Computer
Experiments. Springer.

Shpak, M., 2007. Selection against demographic variance in age structured popu-
lations. Genetics 107:.080747.

118

 

Snyder, R. E., 2003. How demographic stochasticity can slow biological invasions.
Ecology 84: 1333—1339.

Soboleva, T., P. Shorten, A. Pleasants, and A. Rae, 2003. Qualitative theory of the
spread of a new gene into a resident population. Ecological Modelling 163233 — 44.

Stearns, S. C., 1992. The Evolution of Life Histories. Oxford University Press,
Oxford.

Storlie, C. B. and J. C. Helton, 2008. Multiple predictor smoothing methods for

sensitivity analysis: Description of techniques. Reliability Engineering é} System
Safety 93:28 — 54.

Team, R. D. C., 2009. R: A Language and Environment for Statistical Computing.
R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0.

Tiedje, J ., R. Colwell, Y. Grossman, R. Hodson, R. Lenski, R. Mack, and P. Regal,
1989. The planned introduction of genetically engineered organisms - ecological
considerations and recommendations. Ecology 70:298—315.

Tuljapurkar, S., 1990. Population dynamics in variable environments: lecture notes
in biomathematics no. 85.

Tuljapurkar, S., 1997. Stochastic Matrix Models, pages 59—89. in S. Tuljapurkar
and H. Caswell, editors Structured-Population Models in Marine, Terrestrial, and
Freshwater Systems, Population and Community Biology Series. Chapman & Hall.

Tuljapurkar, S., C. C. Horvitz, and J. B. Pascarella, 2003. The many growth rates
and elasticities of populations in random environments. The American Naturalist
162:489502.

Wade, P. R., 2000. Bayesian methods in conservation biology. Conservation Biol-
ogy 14:1308—1316.

Wall, M., 1996. GAlib: A C++ Library of Genetic Algorithm Components, version
2.4, Documentation Revision B. Massachusetts Institute of Technology.

Wikle, C., L. Berliner, and C. N., 1998. Hierarchical bayesian space-time models.
Environmental and Ecological Statistics 5:1 17—154.

119

Wisdom, M. J., S. L. Mills, and D. F. Doak, 2000. Life stage simulation analy-

sis: Estimating vital-rate effects on population growth for conservation. Ecology
81:628—641.

Yan, J ., 2007. Enjoy the joy of copulas: with a package copula. Journal of Statisti-
cal Software 21: 1—2 1.

Yearsley, J. M. and D. Fletcher, 2002. Equivalence relationships between stage-
structured population models. Mathematical Biosciences 179:131—143.

120