This is to certify that the
dissertation entitled

POPULATION GENETICS OF BACTERIAL ADAPTATION:
EXPERIMENTS WITH ESCHERICHIA COLI AND A SIMULATION
MODEL

presented by

Robert James Woods

has been accepted towards fulﬁllment
of the requirements for the

DOCTOR OF PHILOSOPHY degree in Department of Zoology

 

 

 


Major Professor’s Signature

MSU is an Afﬁrmative Action/Equal Opportunity Institution

 


POPULATION GENETICS OF BACTERIAL ADAPTATION: EXPERIMENTS WITH
ESCHERICHIA COLI AND A SIMULATION MODEL

By

Robert James Woods

A DISSERTATION

Submitted to
Michigan State University
in partial fulﬁllment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY
Department of Zoology

2005

ABSTRACT

POPULATION GENETICS OF BACTERIAL ADAPTATION: EXPERIMENTS WITH
ESCHERICHIA COLI AND A SIMULATION MODEL

By
Robert James Woods

This dissertation presents empirical examples of the population genetics of
adaptation in microbial populations. Three of the four chapters of this dissertation deal
directly with the long-term evolution experiment with E. coli. The long-term experiment
is twelve initially identical populations that are allowed to evolve in identical
environments. This classic study in evolution sketches the entire process of evolution,
including the origin, fate and later consequences of variation that arises. As important as
its elegance is the longevity of the long-term experiment, both in the number of
generations and the time devoted to its study. It is this accumulated understanding that
has allowed novel discoveries to come to light, including those in this dissertation.

Chapter 1 uses four previously identiﬁed candidate beneﬁcial mutations in one
population to search for parallel genetic evolution in the remaining populations. Parallel
evolution is seen at the level of locus, and less frequently at the level of nucleotide.
Furthermore, different loci differ in the degree of parallelism they display. Four
independent tests, based on population genetics theory, show that this parallel molecular
evolution was driven by natural selection.

Chapter 2 uses a computer simulation model of large asexual populations to
investigate the population dynamics of adaptation. This model was used to quantify the
number of beneficial mutations that arise in asexual populations, the frequencies they
reach, and the phylogenetic structure among them. Three patterns emerged. First,
multiple independent beneficial mutations arise and compete with each other. Second,
when multiple lineages arise and coexist, each may pick up several mutations before one
eventually prevails. Third, as these coexisting lineages each pick up beneﬁcial mutations,
mean ﬁtness tends to increase in a step-like manner even without allelic ﬁxation events.
Finally, a hypothesis for indirect selection for evolvability is proposed and shown to work
with simulations.

Chapter 3 returns to the long-term E. coli experiment and focuses on a single
evolving population to show the patterns from Chapter 2. Six beneficial mutations were
tracked as they spread through this one population over 5000 generations. Multiple
independent beneficial mutations arose and competed with each other, several lineages
that eventually became extinct nonetheless accumulated several beneficial mutations, and
mean fitness increased in a step-like manner even without allelic fixation events.

Chapter 4 uses the same population to test the hypothesis of indirect selection for
evolvability that was developed in Chapter 2, which requires the clonal interference
dynamics that were shown to be present in Chapter 3. This hypothesis further requires
that the independent subpopulations differ in their ability to adapt. Representative clones
from an early time point, whose descendants went on to compete and further adapt before
one eventually displaced all others, were sampled from the population and used to found
new populations. The new populations founded with clones of the eventual winners
adapted more quickly and reached higher fitness than those founded with clones of the
eventual losers. Thus, the eventually successful genotype was the more evolvable one.

Dedicated to Andrea Lynn Woods and Eva Laura Woods


ACKNOWLEDGMENTS

I would like to thank my advisor Dr Richard Lenski for his support, both
intellectual and with funding. He gave me the freedom to follow my interests and develop
my own projects, while his guidance certainly saved me many months of time, and
immeasurably improved the quality of my work. Most importantly he set an example of
science at its very best, with elegant experiments, brilliant analysis, and thoughtful
writing.

I would also like to thank Dr Timothy Cooper. He shared his advice on
everything from PCR to paper writing. Dr Cooper also collaborated on the work
presented in Chapter 3 and gave many helpful comments on early drafts of Chapters 2
and 4. I also owe a special thanks to Dr Dominique Schneider, who collaborated on
Chapters 1 and 3, and who allowed me to work in his lab for a month.

Many of the people I have met during my graduate studies at MSU have made the
trip very enjoyable. Dr Richard Hill let me pretend to be a marine biologist 4 out of the
past 6 summers. He was also a source of encouragement as I planned my future career.
Dr Paco Moore was always available as a source of stimulating conversation, on things
related to evolution, and much much more. Dr Susanna Remold was the ﬁrst to show me
the wonders of experimental evolution with E. coli. She was also the ﬁrst to show me the
importance of statistical rigor in scientiﬁc research. I thank Dusan Misevic and Dr
Elizabeth Ostrowski for their support and friendship. I also thank the members of my
committee Dr Thomas Schmidt, Dr Donald Hall, and Dr Douglas Schemske for their
guidance and helpful comments on my thesis.

The Lenski Lab has seen a steady stream of brilliant and wonderful students,
postdocs and visiting scientists; each has helped to guide my development and shown by
example how science should be done. I would like especially to thank Zachary Blount,
Dr Christina Borland, Dr Timothy Cooper, Dr Vaughn Cooper, Dr Santiago Elena, Dr
Christopher Marx, Dule Misevic, Dr Paco Moore, Dr Charles Ofria, Dr Elizabeth
Ostrowski, Dr Susi Remold, Dr Daniel Rozen, Sean Sleight, Dr Kristina Stredwick, Dr
Greg Velicer, and Gabe Yedid.

My parents, Robert and Caren Woods, gave me unwavering support and
convinced me that I could do whatever I put my mind to. I cannot thank them enough.
Most of all I want to thank my wonderful wife Andrea Woods. It was her faith in me that
gave me the conﬁdence to strive for more. She has been by my side through thick and
thin. Last, but certainly not least, I must thank my little daughter Eva Laura for making
these past months, which have been filled with much stress and hard work, the happiest I
have ever had.


TABLE OF CONTENTS

LIST OF TABLES ................................................................................................. viii
LIST OF FIGURES ............................................................................................... ix

CHAPTER 1: TESTS OF PARALLEL MOLECULAR EVOLUTION IN A LONG-TERM
EXPERIMENT WITH ESCHERICHIA COLI ........................................... 1
Abstract ....................................................................................................... 1
Introduction ................................................................................................. 1
Materials and methods ................................................................................ 3
Results ......................................................................................................... 4
Discussion ................................................................................................... 15

CHAPTER 2: POPULATION DYNAMICS OF BENEFICIAL MUTATIONS AND
THE THEORY OF CLONAL INTERFERENCE .................................................. 19
Abstract ...................................................................................................... 19
Introduction ................................................................................................ 19
Methods ...................................................................................................... 24
Results ........................................................................................................ 26
Discussion ................................................................................................... 46

CHAPTER 3: WITHIN-POPULATION DYNAMICS OF ADAPTATION IN A LONG-TERM
EVOLUTION EXPERIMENT WITH ESCHERICHIA COLI ................... 51
Abstract ...................................................................................................... 51
Introduction ................................................................................................ 52
Methods ...................................................................................................... 54
Results ........................................................................................................ 62
Discussion ................................................................................................... 78

CHAPTER 4: VARIATION IN EVOLVABILITY PREDICTS EVENTUAL SUCCESS
IN AN EVOLUTION EXPERIMENT WITH ESCHERICHIA COLI .................... 91
Abstract ....................................................................................................... 91
Introduction ................................................................................................. 91
Materials and methods ................................................................................ 95
Results ......................................................................................................... 101
Discussion ................................................................................................... 111

REFERENCES ........................................................................................................ 116

APPENDIX 1: MATLAB script for simulation and graphing of Muller-style
plot. ......................................................................................................................... 124


LIST OF TABLES

Table 1. Mutations in or near the four candidate loci in the evolving populations
after 20,000 generations. ....................................................................................... 7

Table 2. Number of mutations in random and candidate genes in 12 E. coli
populations after 20,000 generations. .................................................................... 10

Table 3. Candidate and random genes differ in relative abundance of (A)
synonymous and non-synonymous point mutations, and (B) point mutations in
mutator and non-mutator populations. .................................................................... 12

Table 4. The average number of mutations at each distance from the PLOD per
mutation that ﬁxed. ............................................................................................... 39

Table 5. The number of unique lineages of length d that end a given number of
mutational steps from the PLOD in a single population. ....................................... 42

Table 6. The fraction of times, out of 1000 simulations at each combination of
N and S, that populations ﬁxed each of the ﬁve possible ﬁrst mutations. ............ 45

Table 7. Frequency of clones with various mutations in each generational sample. .. 64

Table 8. Fitness assays for 8 population samples and 61 isolated clones, with all
ﬁtness values relative to strain REL607. ............................................................... 68

Table 9. Analyses of variation in ﬁtness among genotypes and clones within
genotypes in population Ara-1. ............................................................................ 70

Table 10. Estimates of the ﬁrst divergence of the neutral allele from its initial
ratio. ....................................................................................................................... 105

Table 11. Statistical analysis of the fitness of EW derived clones relative to the EL
derived clones after 883 generations of additional evolution. ............................... 112


LIST OF FIGURES
Figure 1. Mutations after 20,000 generations in four candidate genes in 12
populations of E. coli. ............................................................................................ 6

Figure 2. Distribution of numbers of substitutions in the 12 populations for the
four candidate genes. ............................................................................................. 14

Figure 3. Reproduction of ﬁgure 1 from Muller (1932) that depicts the spread
of beneﬁcial mutations in asexual and sexual populations. .................................. 27

Figure 4. Results of a single simulation plotted in the style of Muller (1932). ....... 28

Figure 5. The average proportion of the population that had at least one beneﬁcial
mutation that does not ﬁx. ....................................................................................... 33

Figure 6. The number of beneﬁcial mutations that occurred, per mutation that
ﬁxed, over a range of mutation rates and population sizes ...................................... 35

Figure 7. Inverse cumulative distributions of the numbers of individuals of
genotypes that had at least one mutation that did not ﬁx. ...................................... 36

Figure 8. The number of beneﬁcial mutations that arise at various mutational
distances from the PLOD, scaled to the total number of mutations that ﬁx. .......... 41

Figure 9. A. Muller-style graph of changing frequencies of mutant genotypes in
population Ara-1 through time. ............................................................................... 63

Figure 10. Fitness of the population and clones of the major genotypes sampled
at (a) 500 generations and (b) 1000 generations, relative to the founding clone... 72

Figure 11. Changes in frequency of a neutral allele over ~880 generations of
evolution in ten additional populations. .................................................................. 76

Figure 12. Likelihood estimation of the number of contending mutations, n, that
make up the cohort of beneﬁcial mutations that displaces the ancestral population. 79

Figure 13. Results of a previous study of focal population Ara-1 showing the
step-like increases in ﬁtness. .................................................................................. 83

Figure 14. A test for frequency dependence of relative fitness of eventual
winners (EW) to the eventual losers (EL) drawn from population Ara-1 at 500
generations. ............................................................................................................. 103

Figure 15. Frequency of a neutral allele in evolving populations. .......................... 106


Figure 16. Relative ﬁtness of the Ara- EW derived clones relative to the Ara+ EL
derived clones following 883 generations of additional adaptation. ....................... 108

Figure 17. Relative ﬁtness of the Ara+ EW derived clones relative to the Ara- EL
derived clones following 883 generations of additional adaptation. ....................... 109

CHAPTER 1

TESTS OF PARALLEL MOLECULAR EVOLUTION IN A LONG-TERM
EXPERIMENT WITH ESCHERICHIA COLI

ABSTRACT

Evolutionary repeatability is difﬁcult to quantify because only a single outcome is
usually observed for any precise set of circumstances. We quantiﬁed the frequency of
parallel and divergent outcomes in twelve initially identical populations of Escherichia
coli propagated in identical environments for 20,000 generations. We sequenced four
candidate genes in which mutations of unknown effect were discovered in one
population, and compared their substitution pattern with 36 randomly chosen genes. Two
candidates had substitutions in all other populations, and the other two in several
populations. There were very few cases, however, in which the exact same mutations
were substituted. No random genes had any substitutions except in four populations that
evolved defects in DNA repair. Tests based on four different aspects of the pattern of
molecular evolution all indicate that adaptation by natural selection drove the parallel
changes in the candidate genes.

INTRODUCTION
One approach to studying parallel evolution is to identify cases in which multiple
lineages in nature have evolved similar adaptations. For example, studies of lizard
morphology and fish behavior show that certain phenotypes evolved repeatedly when
separate populations colonized similar environments (Losos et al., 1998; Rundel, 2000).
Some pathogens show parallel changes, including multiple HIV lineages with similar
mutations conferring drug resistance and Escherichia coli strains that independently
acquired similar virulence factors (Crandall et al., 1999; Reid et al., 2000). Despite
these and other compelling examples (Stewart et al., 1987; Conway Morris, 2003), it is
difficult to quantify repeatability because one cannot exclude subtle differences in
selective environments or founding genotypes as causes of any divergent outcomes in
non-experimental studies. In other words, it is difficult to determine the denominator that
provides the number of potential instances of parallel outcomes to compare with the
actual number observed. Carefully designed evolution experiments overcome this
problem. Moreover, replicate populations of microbes can be started from single
individuals, so that any parallel outcomes must depend on the independent origin, as well
as fate, of variants (Lenski and Travisano, 1994; Bull et al., 1997; Sniegowski, Gerrish
and Lenski, 1997; Rainey and Travisano, 1998; Burch and Chao, 1999; Ferea et al.,
1999; Wichman et al., 1999; Cooper and Lenski, 2000; Riehle, Bennett and Long, 2001;
Elena and Lenski, 2003; Lenski, 2004).

In a landmark study, Wichman et al. (1999) examined evolutionary repeatability
in φX174, a DNA virus with 11 genes and a 5.4-kb genome. Two populations were
propagated for 10 days on a novel host, and the viral genomes were sequenced before and
after the experiment. Of 29 mutations found, 14 were identical in the evolved
populations. However, it is unknown whether more complex genomes would show
similarly strong parallelism. Thus, we examine the extent of genetic parallelism in a
long-term experiment with 12 populations of E. coli (Lenski and Travisano, 1994;
Cooper and Lenski, 2000), an organism with ~4,300 genes and a genome of ~4,600 kb
(Blattner et al., 1997). Two cases of parallel genetic changes at the level of affected
genes (but not sites within those genes) have previously been reported in these
populations (Cooper et al., 2001; Cooper, Rozen and Lenski, 2003), but both were
discovered based on parallel phenotypic changes and thus may not be representative. In
this study, we use an approach that is independent of any phenotypic changes to address
the issue in a manner that is statistically unbiased with respect to potential parallelism,
without requiring full genome sequences for each population (which remain costly for
bacteria). Our approach is to compare the pattern of mutational substitutions in several
candidate genes, where previous research found mutations of unknown effect in a single
population (Schneider et al., 2000), with the pattern in many other genes chosen at
random (Lenski, Winkworth and Riley, 2003). Here we use ‘candidate’ to mean only
that a mutation has previously been found in that gene, not that the gene was suspected of
being a target for beneficial mutations.

MATERIALS AND METHODS

Twelve populations were started from the same ancestral strain of Escherichia
coli B, except that six populations were founded from an Ara- variant and six from an
Ara+ variant (Lenski et al., 1991). The Ara marker state indicates the ability to
metabolize the sugar arabinose, and is caused by a single nucleotide difference. It has
been shown to be neutral in the evolution environment (Lenski et al., 1991). The
populations are designated A-1 to A-6 and A+1 to A+6 according to this marker. These
populations were propagated in identical glucose-limited environments for 20,000 cell
generations (3,000 days), with population sizes fluctuating daily between ~5 × 10^6 and
~5 × 10^8 cells. The populations achieved similar, but not identical, fitness gains (Lenski et
al., 1991; Lenski and Travisano, 1994; Cooper and Lenski, 2000). Also, all 12
populations showed large increases in cell size (Lenski and Travisano, 1994), certain
catabolic abilities were lost in parallel (Cooper and Lenski, 2000), and global gene-
expression profiles showed similar changes in the two populations that were examined in
this regard (Cooper, Rozen and Lenski, 2003). Importantly, four populations became
‘mutators’ by evolving defects in DNA repair pathways, which caused approximately
100-fold increases in spontaneous mutation rates (Sniegowski, Gerrish and Lenski, 1997;
Cooper and Lenski, 2000). A comparison between mutator and non-mutator populations
underlies one of our tests on the repeatability of molecular evolution. An in-depth
overview of this long-term experiment can be found elsewhere (Lenski, 2004).
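
The generation count and the duration quoted above can be checked against the daily population bounds. The sketch below is illustrative only: it assumes growth by binary fission and treats the ratio of the two quoted population sizes as the daily fold increase, which is an inference from the numbers in the text and not a stated protocol detail.

```python
import math

# Daily population bounds quoted in the text (cells).
n_min = 5e6
n_max = 5e8

# Assumption: each generation doubles the population (binary fission),
# so the number of generations per day is log2 of the daily fold increase.
fold_increase = n_max / n_min
generations_per_day = math.log2(fold_increase)

days = 3000
total_generations = generations_per_day * days

print(f"fold increase per day:      {fold_increase:.0f}")
print(f"generations per day:        {generations_per_day:.2f}")
print(f"generations in {days} days: {total_generations:.0f}")
```

Under these assumptions the total comes out slightly below 20,000, in line with the rounded figure given in the text.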

An earlier study of two focal populations used RFLP-IS genomic ‘fingerprinting’ to find
four candidate beneficial mutations caused by insertions of IS150 elements (Schneider et
al., 2000). Each mutation was substituted in its respective population early in the
experiment, when fitness gains were most rapid. Three of the candidates were in
population A+1, including insertions in nadR, hokB/sokB, and upstream of pbpA-rodA.
The fourth candidate, an insertion in pykF, was in population A-1. For this study, these
four candidate genes (i.e., those bearing the candidate beneficial mutations in one or the
other focal population) were sequenced for clones sampled at generation 20,000 from all
12 populations and for the ancestor. The candidate genes varied in their length, with the
extent of sequencing shown in Figure 1.

RESULTS

We found 38 mutations in these genes, and 34 were unique to one population or
another. The ancestral nucleotide sequences for these four genes were deposited in
GenBank with accession numbers AY849930-AY849933. Overall, 7,150 bp were
sequenced for each evolved population. Across the 12 populations, some 40 mutations
were found and, in all cases, confirmed by re-sequencing. These mutations are shown in
Figure 1, and their precise locations and molecular details are provided in Table 1. Two
of them lie outside the candidate loci (in or near ych, a gene of unknown function
adjacent to hokB/sokB) and were excluded from our analyses. This exclusion had no
effect on the significance of any of our tests, as explained below.

Whereas the two virus populations studied by Wichman et al. (1999) shared
almost half their mutations, the 66 possible pairs of the 12 bacterial populations shared,
on average, only 2.1% of their mutations. Parallelism at the level of mutations is
evidently much less common in the bacteria than in the viruses. Turning to the level of
genes, the pattern is very different. Every population had exactly one non-synonymous
point mutation in both pykF (Figure 1A) and nadR (Figure 1B), except for the focal
populations that had the insertion mutations. One synonymous mutation was found in
pykF, and none in nadR. The two other candidate genes also yielded mutations, but not
in every population. Besides the focal population’s insertion upstream of pbpA-rodA,
five others had mutations in the upstream region, the pbpA gene, or both (Figure 1C).
For hokB/sokB, three other populations had insertions similar to the focal population (Fig.
1D). The many independent substitutions in the candidate genes suggest parallelism, but
it is necessary to demonstrate that the numbers are beyond those expected by chance.
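
The pairwise comparison can be illustrated with the data in Table 1. The text does not state how the shared fraction was calculated, so the sketch below assumes a Dice-style index (twice the number of identical mutations divided by the summed mutation counts of the two populations); the function name and data layout are hypothetical. Mutation counts include the focal insertions but not the two excluded mutations outside the candidate loci.

```python
from itertools import combinations

# Number of mutations in the candidate loci per population (from Table 1).
n_mutations = {
    "A-1": 3, "A-2": 6, "A-3": 3, "A-4": 3, "A-5": 4, "A-6": 2,
    "A+1": 4, "A+2": 2, "A+3": 3, "A+4": 2, "A+5": 2, "A+6": 4,
}

# Identical mutations (same position, same nucleotide change) found in more
# than one population, as listed in Table 1.
identical = {
    "position 901 G>T": {"A+1", "A+5", "A-5"},
    "position 902 A>G": {"A+2", "A+4"},
}


def shared_fraction(p, q):
    """Assumed Dice-style index: 2 * shared / (mutations in p + mutations in q)."""
    shared = sum(1 for pops in identical.values() if p in pops and q in pops)
    return 2 * shared / (n_mutations[p] + n_mutations[q])


pairs = list(combinations(n_mutations, 2))
mean_shared = sum(shared_fraction(p, q) for p, q in pairs) / len(pairs)

print(f"number of pairs: {len(pairs)}")
print(f"mean shared fraction: {100 * mean_shared:.1f}%")
```

With this assumed definition, only four of the 66 pairs share any mutation, and the average is close to the 2.1% reported above.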

Figure 1. Mutations after 20,000 generations in four candidate genes in 12 populations of
E. coli. Lighter regions indicate protein-coding sequences for and near (A) pykF, (B)
nadR, (C) pbpA, and (D) hokB/sokB. Long bars below indicate the range sequenced;
short bars show scale (200 bp). Each arrow marks a mutation; the number shows the
affected population. The mutations in and near ych are of unknown relevance and were
excluded from our analyses. * denotes an IS150 insertion and, for populations -1 and +1,
these were the original mutations used to identify the candidate genes. Δ is a 1-bp
deletion. § is a synonymous mutation; all others in coding regions are non-synonymous.

[Figure 1 image not reproduced: gene maps for panels A to D, with arrows marking the
position of each mutation and the affected population.]

Table 1. Mutations in or near the four candidate loci in the evolving populations after
20,000 generations. Positions are nucleotide positions relative to the start codon of the
indicated gene. Populations A-2, A-4, A+3, and A+6 evolved mutator phenotypes,
whereas all others retained the low ancestral mutation rate. Δ denotes a 1-bp deletion.
::IS150 indicates insertion of an IS150 element, often associated with a duplication of
several bp at the target site.

Gene     Population   Position   Mutation   Amino-acid change
pykF     A+6          208        A→C        Pro→Thr
pykF     A+3          209        A→C        Pro→Gln
pykF     A-3          379        G→A        Asp→Asn
pykF     A+4          483        T→Δ        Frame-shift
pykF     A+3          507        A→G        Synonymous
pykF     A-1          683        ::IS150    Insertion
pykF     A-2          790        A→T        Ile→Phe
pykF     A+1          901        G→T        Ala→Ser
pykF     A+5          901        G→T        Ala→Ser
pykF     A-5          901        G→T        Ala→Ser
pykF     A-6          901        G→A        Ala→Thr
pykF     A+2          1142       G→C        Gly→Ala
pykF     A-4          1385       C→T        Thr→Ile
nadR     A-5          74         T→A        Leu→Gln
nadR     A+6          77         A→C        Gln→Pro
nadR     A+3          103        A→G        Thr→Ala
nadR     A+1          169        ::IS150    Insertion
nadR     A-2          193        G→A        Glu→Lys
nadR     A-6          335        A→T        Asp→Val
nadR     A-1          889        G→A        Gly→Ser
nadR     A+2          902        A→G        Tyr→Cys
nadR     A+4          902        A→G        Tyr→Cys
nadR     A+5          913        G→A        Ala→Thr
nadR     A-4          961        A→G        Thr→Ala
nadR     A-3          1031       A→C        Tyr→Ser
pbpA     A+1          -904       ::IS150    Insertion
pbpA     A-2          -742       C→T        Non-coding
pbpA     A-5          -800       A→T        Non-coding
pbpA     A-2          -873       T→C        Non-coding
pbpA     A-2          141        T→G        Phe→Leu
pbpA     A+6          748        A→C        Ile→Leu
pbpA     A-4          761        T→C        Leu→Pro
pbpA     A-2          1280       G→A        Gly→Asp
pbpA     A-1          1411       A→T        Ile→Phe
[hokB]   A+6          397        T→G        Adjacent to ych
[hokB]   A-2          302        G          1-bp insert in ych
hokB     A+6          -69        ::IS150    Insertion
hokB     A-5          -70        ::IS150    Insertion
hokB     A+1          -85        ::IS150    Insertion
hokB     A-3          -147       ::IS150    Insertion

Perhaps most genes accumulated mutations after 20,000 generations, such that ﬁnding
mutations in a candidate gene in several or all populations is unremarkable.

To that end, we compared the results of sequencing the candidate genes with
sequences obtained, in a previous study, of randomly chosen gene regions (Lenski,
Winkworth and Riley, 2003). Thirty-six randomly chosen gene regions, of ~500-bp, were
sequenced in the ancestor and two clones randomly sampled from each of the 12 evolving
populations at 20,000 generations. The ancestral nucleotide sequences for these regions
were deposited in GenBank and given accession numbers AY625099-AY625134. The
total length sequenced from each clone was 18,374 bp. A total of 8 mutations was found
in the samples; the precise details of each mutation are provided elsewhere (Table 3 in
Lenski, Winkworth and Riley, 2003). In those cases where one or both clones had a
mutation in a particular gene, that region was sequenced for three more clones from the
affected population. Four mutations, including 3 in population A-4 and 1 in population
A+3, were present in all five clones. They are counted as substitutions in Table 2 of this
paper. The other four mutations were in population A+6, and all were polymorphic, with
two present at 80% and two others at 20%. For the analyses in this paper, A+6 is
considered to have two substitutions in the random genes, which corresponds to the
number of cases in which a mutation reached majority status (and also equals the
summed frequency across the four polymorphisms). Thus, we count six substitutions in
all in the random genes. Three of these substitutions were synonymous, and the other
three non-synonymous point mutations. Moreover, the substitutions in random genes
were all found in mutator populations. We performed four different statistical tests of the
hypothesis that natural selection drove parallel changes in the candidate genes.
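
The counting rule for substitutions in the random genes can be made explicit with a short sketch. It assumes that the quoted frequencies of 80% and 20% correspond to 4 and 1 of the five sequenced clones; the variable and function names are hypothetical.

```python
CLONES_SEQUENCED = 5  # two initial clones plus three more per affected region

# Number of clones (out of five) carrying each mutation found in the random genes.
carriers = {
    "A-4": [5, 5, 5],
    "A+3": [5],
    "A+6": [4, 4, 1, 1],  # assumed from the quoted 80% and 20% frequencies
}


def count_substitutions(clone_counts):
    """Count mutations carried by a majority of the sequenced clones."""
    return sum(1 for c in clone_counts if c > CLONES_SEQUENCED / 2)


substitutions = {pop: count_substitutions(c) for pop, c in carriers.items()}
total_substitutions = sum(substitutions.values())

# Alternative rule mentioned in the text: summed frequency of the polymorphisms.
summed_frequency_a6 = sum(carriers["A+6"]) / CLONES_SEQUENCED

print(substitutions)
print(f"total substitutions: {total_substitutions}")
print(f"summed frequency in A+6: {summed_frequency_a6}")
```

Both rules give two substitutions for A+6, so the total of six does not depend on which one is used.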

Table 2. Number of mutations in random and candidate genes in 12 E. coli populations
after 20,000 generations. Numbers of mutations and substitution rates are pooled across
36 random genes and 4 candidate genes. Populations A-2, A-4, A+3, and A+6 became
mutators; all others retained the ancestral mutation rate. *Excluding the mutations (one
in A-1, three in A+1) used to identify the candidate genes.

              Random genes (18,374 bp total)   Candidate genes (7,150 bp total)
Population    Number of     Rate per           Number of     Rate per
              mutations     1000 bp            mutations     1000 bp
A-1           0             0.000              3             0.420
              "             "                  2*            0.280
A-2           0             0.000              6             0.839
A-3           0             0.000              3             0.420
A-4           3             0.163              3             0.420
A-5           0             0.000              4             0.559
A-6           0             0.000              2             0.280
A+1           0             0.000              4             0.559
              "             "                  1*            0.140
A+2           0             0.000              2             0.280
A+3           1             0.054              3             0.420
A+4           0             0.000              2             0.280
A+5           0             0.000              2             0.280
A+6           2             0.109              4             0.559

First, we compared overall substitution rates for the candidate and random genes,
with an expectation of a higher rate in the candidates because they accumulated
substitutions by selection as well as drift. All 12 populations had higher substitution rates
in the candidate genes (Table 2), which is highly unlikely by chance (sign test, p =
0.0002). This result is unaffected by excluding the insertions used to identify the
candidates in populations A-1 and A+1; both had substitutions in one or more genes
whose candidacy was identified in the other population, and neither had any substitutions
in the random genes.
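
The rates in Table 2 and the sign test can be reproduced from the mutation counts. This is a minimal sketch, assuming a one-tailed sign test in which each population is equally likely, under the null hypothesis, to show the higher rate in either gene set; the helper name is hypothetical.

```python
import math

# (random-gene mutations, candidate-gene mutations) per population, from Table 2.
counts = {
    "A-1": (0, 3), "A-2": (0, 6), "A-3": (0, 3), "A-4": (3, 3),
    "A-5": (0, 4), "A-6": (0, 2), "A+1": (0, 4), "A+2": (0, 2),
    "A+3": (1, 3), "A+4": (0, 2), "A+5": (0, 2), "A+6": (2, 4),
}
RANDOM_BP = 18_374     # bp sequenced per population in the random genes
CANDIDATE_BP = 7_150   # bp sequenced per population in the candidate genes


def rate_per_kb(n_mut, length_bp):
    """Mutations per 1000 bp sequenced."""
    return 1000 * n_mut / length_bp


# Count populations whose candidate-gene rate exceeds their random-gene rate.
higher = sum(
    rate_per_kb(cand, CANDIDATE_BP) > rate_per_kb(rand, RANDOM_BP)
    for rand, cand in counts.values()
)

# One-tailed sign test: probability of at least `higher` successes out of n
# when each outcome has probability 1/2.
n = len(counts)
p_value = sum(math.comb(n, k) for k in range(higher, n + 1)) / 2**n

print(f"populations with higher candidate rate: {higher} of {n}")
print(f"sign test p = {p_value:.4f}")
```

Because even population A-4, with three mutations in each gene set, has the higher rate per 1000 bp in the candidate genes, all 12 comparisons point the same way.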

Second, if mutations in the candidate genes were beneﬁcial, then we expect an
excess of non-synonymous substitutions. For this statistical test, shown in Table 3A, only
point mutations in protein-coding regions were included, with 27 such mutations in the
candidate genes and 6 in the random genes. Three of six point mutations in random genes
were synonymous, but only 1 of 27 was synonymous in the candidate loci (Table 3A).
This difference is signiﬁcant (Fisher’s exact test, p = 0.0136) and supports the hypothesis
that the mutations substituted in the candidate genes were beneﬁcial. An additional 3
non-point mutations occurred in the protein-coding regions of the candidate genes
including 2 IS insertions and 1 one-bp deletion; these mutations could be regarded as
non-synonymous because they change the resulting amino-acid sequence. If these 3
additional mutations are included in the analysis, the outcome remains signiﬁcant (p =
0.0104).

Third, theory predicts that the substitution rate for neutral mutations should scale
with the mutation rate (Kimura, 1983), whereas the substitution rate for beneficial
mutations is subject to diminishing returns in large asexual populations (Gerrish and
Lenski, 1998; de Visser et al., 1999; Wilke, 2004). Four populations became mutators
and had mutation rates ~100-fold higher than the other populations. Only half the 30
point mutations in candidate genes were in mutator populations, whereas all six
mutations in random genes were in mutators (Table 3B). This difference is significant in
the direction expected if candidates experienced selection favoring new alleles (Fisher’s
exact test, p = 0.0279). The statistical test shown in Table 3B includes 30 point
mutations in the candidate genes, and 6 point mutations in the random genes. There were
8 additional mutations in candidate genes including 7 IS insertions and 1 one-bp deletion;
1 IS insertion was in a mutator population, and all others were in non-mutators. If these 8
additional mutations are included in this analysis, the outcome remains significant (p =
0.0211).

A            Synonymous    Non-synonymous
Candidate    1             26
Random       3             3

B            Mutator       Non-mutator
Candidate    15            15
Random       6             0

Table 3. Candidate and random genes differ in relative abundance of (A) synonymous
and non-synonymous point mutations, and (B) point mutations in mutator and non-mutator
populations.
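
The p-values attached to Table 3 can be reproduced from the counts alone. The sketch below is not the original analysis; it assumes the reported values are one-sided hypergeometric tail probabilities (Fisher's exact test in the predicted direction), and the function name `hypergeom_tail` is introduced here purely for illustration.

```python
from math import comb


def hypergeom_tail(k_min, draws, successes, total):
    """P(X >= k_min) when `draws` items are taken without replacement from
    `total` items, of which `successes` are of the type being counted."""
    upper = min(draws, successes)
    favourable = sum(
        comb(successes, k) * comb(total - successes, draws - k)
        for k in range(k_min, upper + 1)
    )
    return favourable / comb(total, draws)


# Table 3A: 33 point mutations, 4 synonymous; 3 of the 6 random-gene
# mutations are synonymous.
p_3a = hypergeom_tail(k_min=3, draws=6, successes=4, total=33)

# Table 3A with the 3 non-point candidate mutations counted as non-synonymous.
p_3a_plus = hypergeom_tail(k_min=3, draws=6, successes=4, total=36)

# Table 3B: 36 point mutations, 21 in mutators; all 6 random-gene
# mutations are in mutators.
p_3b = hypergeom_tail(k_min=6, draws=6, successes=21, total=36)

print(f"{p_3a:.4f} {p_3a_plus:.4f} {p_3b:.4f}")
```

Under the one-sided assumption these three values round to the 0.0136, 0.0104 and 0.0279 quoted in the text.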

Fourth, if mutations in the candidates were neutral, then the numbers of
substitutions in the populations should follow a Poisson distribution (if all populations
had the same mutation rate) or be clumped in certain populations (given heterogeneity in
mutation rates). By contrast, if the mutations were beneficial, and if different mutations
in the gene conferred the same benefit, then one would expect a more uniform
distribution of mutations. Unlike the first three tests, this test is independent of the forces
affecting the random genes. The distributions are the most uniform possible given the
numbers of mutations in two of the candidate genes (Figure 2). For nadR, there were 12
substitutions, with each population having exactly one; the likelihood of this distribution
by chance is 12!/12^12 < 0.0001. For pykF, the chance of 11 populations having one
mutation and one having two mutations is 0.0004. Moreover, these calculations are very
conservative because the four mutator populations should push strongly away from
uniformity, making the observed distributions even more unexpected. The other two
candidates do not deviate significantly from the Poisson distribution, but that may reflect
the smaller numbers of mutations in those genes and the conservative nature of this test.
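
The nadR figure can be checked with a short calculation. The sketch below assumes that, under neutrality with equal mutation rates, each substitution falls independently and with equal probability in any of the 12 populations; the function name `prob_all_distinct` is hypothetical.

```python
from math import factorial


def prob_all_distinct(mutations, populations):
    """Probability that `mutations` independent, equally likely placements
    land in distinct populations (at most one mutation per population)."""
    if mutations > populations:
        return 0.0
    ordered_ways = factorial(populations) // factorial(populations - mutations)
    return ordered_ways / populations ** mutations


# nadR: 12 substitutions, exactly one in each of the 12 populations.
p_nadR = prob_all_distinct(12, 12)
print(f"{p_nadR:.2e}")
```

The result is 12!/12^12, which is below the 0.0001 bound given in the text.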

We chose both candidate and non-candidate genes a priori, as explained. The
two mutations found in or near the ych gene do not fit into either category, and they were
excluded from our analyses on that basis alone. It is possible that ych is related to
hokB/sokB, given its proximity and unknown functionality, but it is also possible these
loci have nothing to do with each other. If we included ych with the random genes, it
would not weaken any of our four tests and would, in fact, strengthen one of them. The
first test (Table 2) compares the density of mutations found in random and candidate
genes; adding one random mutation each to A-2 and A+6 would not change the fact that
12 of 12 populations have higher density in the candidate genes. The second test (Table
3A) is unaffected, because neither ych mutation counts as synonymous or non-synonymous;
one was outside the coding region and the other was not a point mutation.

Figure 2. Distribution of numbers of substitutions in the 12 populations for the four
candidate genes. Observed distributions are shaded; Poisson distributions with the same
mean as the observed distribution are shown in outline.

[Figure 2 panels: one histogram per candidate gene (only the pbpA-rodA panel title is legible in this copy). x-axis: Mutations (0, 1, 2, 3+); y-axis: Populations.]

The third test (Table 3B) compares the four mutator and eight non-mutator populations;
both ych mutations were found in mutator populations, and including ych as a random
gene would strengthen that already significant result. Finally, the fourth test (Figure 2)
would be unaffected because non-candidate genes do not enter into the analysis. If we
instead included ych as part of the hokB/sokB candidate genes, the third test would be
slightly weakened but remain significant, while the other tests would not be affected.

DISCUSSION

The four tests individually and collectively support the hypothesis that parallel
evolution in the candidate genes was driven by natural selection favoring the mutant
alleles. The ﬁrst two tests are also consistent with the alternative hypothesis that the
candidate genes had relaxed selective constraints and could thus accumulate mutations
without adverse effects. The third and fourth tests clearly reject this alternative because,
if it were true, mutator populations should accumulate more substitutions and numbers
would not be uniform across populations. Another alternative is that the candidate loci
contain ‘hot spots’ with mutation rates much higher than the genome average. This
alternative also runs counter to the test comparing mutator and non-mutator populations,
unless one further supposes that the sites are independent of the DNA repair pathways
that are defective in the mutators; but substitutions in the candidates included transitions
and transversions as well as the original insertions, whereas substitutions in the random
genes occurred only in the mutators and all had signatures reﬂecting defects in DNA

repair (Lenski, Winkworth and Riley, 2003). The several IS insertions in hokB/sokB


suggest an increased rate of those mutations at that locus but such a bias, if it exists, does

not contradict the possibility that the substitutions were adaptive (Cooper et al., 2001).

The four tests collectively provide compelling evidence that the mutations that
were substituted in the four candidate genes are beneﬁcial in the environment employed
in the evolution experiment. We do not know, however, the functional bases for their
beneficial effects. At first glance, the fact that the four candidate genes were first
identified by IS-element insertions in the focal populations might suggest that the
beneficial mutations are knockouts. However, a more nuanced view is appropriate for
several reasons. First, most mutations found by sequencing these genes in the other
populations are unlikely to act as knockouts, with the probable exception of the several IS
insertions in hokB/sokB and one frameshift mutation in pykF (Table 3). Thus, even if the
originally discovered mutation in a gene were a knockout, some of the other populations
may have substituted mutations with more subtle effects. Second, the original IS
insertion affecting pbpA-rodA was not in the reading frame but, instead, in the upstream
regulatory region, where IS elements can exert more subtle effects on gene expression
(Mahillon and Chandler, 1998). Third, in the case of nadR, the affected protein is bi-
functional with both repressor and transport domains (Penfound and Foster, 1999;
Schneider et al., 2000). A knockout of one function could leave the other function intact;
and, in the case of the repressor function, even a knockout would cause elevated

expression of the de-repressed genes.

The following hypotheses suggest how mutations in the candidate genes might
enhance fitness in the long-term experiment (Schneider et al., 2000). The pykF gene
encodes one of two pyruvate kinases that catalyze the conversion of
phosphoenolpyruvate (PEP) and ADP into pyruvate and ATP. PEP is also used by the
phosphotransferase system (PTS) to transport glucose into the cell. By slowing the
conversion of PEP to pyruvate, mutations in pykF might make more PEP
available to drive the PTS-mediated uptake of glucose, which is limiting in the
environment of the long-term evolution experiment. As noted, nadR encodes a bi-
functional protein that is involved in several aspects of NAD metabolism, itself a key
metabolite in many different pathways. Several genes involved in NAD synthesis and
recycling are repressed by the NadR protein, and mutations in nadR might increase their
expression and the resulting intracellular concentration of NAD. The evolved bacteria
have higher maximum growth rates as well as shorter lags following the daily transfer
into fresh medium (Vasi, Travisano and Lenski, 1994), and increased levels of NAD
might be required to achieve one or both of these advantages. The hokB/sokB locus is
one of several loci in E. coli related to the hok-sok locus of plasmid R1; hok encodes a
toxin and sok an anti-sense RNA that blocks translation of the toxin. Together, these
activities kill bacteria that lose the plasmid, a function that might beneﬁt the plasmid but
is obviously harmful to the bacteria. Inactivation of hokB/sokB would therefore beneﬁt
the bacteria (in the absence of plasmids), and indeed other copies of hok/sok loci in E.
coli contain IS insertions (Pedersen and Gerdes, 1999; Schneider et al., 2002). The
pbpA-rodA operon encodes two essential proteins involved with peptidoglycan synthesis
and coupling of cell-wall synthesis to the cell cycle (Begg and Donachie, 1998). All 12
populations evolved much larger cells (Lenski and Travisano, 1994), which may require
more peptidoglycan synthesis or changes in the timing of its synthesis in relation to the

cell cycle.


Our results demonstrate that evolution in these E. coli populations was often
parallel at the level of genes, but only rarely were substitutions identical at the base-pair
level. This latter point contrasts with results obtained in replicate populations of virus
φX174, where almost half the substitutions were identical (Wichman et al., 1999). We
also observed variation in the extent of parallelism among the bacterial candidate genes,
with pbpA-rodA and hokB/sokB showing less evolutionary repeatability than nadR and pykF.
Future work, including additional generations and genetic manipulations, may reveal
whether these gene-level differences between populations will eventually disappear or,
alternatively, whether epistatic interactions among the mutations will sustain their
divergence indefinitely.


CHAPTER 2

POPULATION DYNAMICS OF BENEFICIAL MUTATIONS AND THE THEORY
OF CLONAL INTERFERENCE

ABSTRACT

When multiple beneficial mutations arise independently and co-occur in an
asexual population, one and typically only one will persist in the long run. The
competition between asexual clones has been termed clonal interference. Here I use
simulations to investigate the population dynamics of adaptation under conditions of
large population size and high beneficial mutation rates. Multiple lineages often take
multiple adaptive steps before one out-competes all others. Even in such situations mean
ﬁtness typically increases in a step-like manner. This process increases the diversity
within evolving populations and allows populations to explore many genetic possibilities
that do not become ﬁxed in the population. This process also affects the probability of
ﬁxation of alleles that differ in their subsequent potential to adapt, but not their

immediate ﬁtness.

INTRODUCTION
Adaptation in the absence of genetic recombination requires mutations that ﬁx to
occur along a single line of descent. Beneﬁcial mutations that occur elsewhere in the
population will eventually be eliminated, but may, in the meantime, slow the rate of
ﬁxation of the mutations that do ﬁx (Fisher 1930; Muller 1932; Crow and Kimura 1965;

Gerrish and Lenski 1998; Campos and de Oliveira 2004; Wilke 2004). Muller (1932)


described this process thus: “Without sexual reproduction, the various favorable
mutations that occur must simply compete with each other, and either divide the ﬁeld
among themselves or crowd each other out till but the best adapted for the given
condition remains” (Muller 1932, p. 121). Sexual populations, on the other hand, can
recombine independently arising beneﬁcial mutations into the same genome. The
resulting increase in the rate of adaptation for sexual populations is often called the

Fisher-Muller hypothesis for the advantage of sex (Fisher, 1930; Muller, 1932).

Describing the within-population dynamics of adaptation in asexual populations is
a particularly difﬁcult problem. In part, the difﬁculty is due to the many unknown
parameters, such as the fraction of mutations that are beneﬁcial, the distribution of
mutational effects, and the manner in which mutations interact with each other (epistasis).
In addition, the fixation probability of an allele is dependent both on the genetic
background into which it happens to fall, and on competition with other genotypes
present in the population. For cases of complete linkage this competition between clones
has been termed clonal interference (Gerrish and Lenski, 1998).

The concept of clonal interference was deﬁned by Gerrish and Lenski (1998) as
the “phenomenon, whereby the fate of an original beneﬁcial mutation is altered by the
appearance of a superior alternative mutation” (Gerrish and Lenski, 1998, p. 128). Both
the etymological and the conceptual foundations for ‘clonal interference’ were
established by Muller (1932, 1964), for example when he wrote “. . .mutual interference
with one another’s multiplication exerted by two or more different lines of advantageous
mutants in asexual populations” (Muller 1964, p. 5). Recently, the term clonal

interference has been used in this general way, without temporal or numerical limits


(Miralles et al., 1999; Campos and de Oliveira, 2004; Wilke 2004). Likewise, the term
‘theory of clonal interference’ has been used to refer to the body of conceptual work that
is aimed at understanding the dynamics involved (Wilke 2004).

Recent analytical models have advanced our understanding of adaptation in
asexual populations. Wilke (2004) presented a closed form equation for the rate of
adaptation in large asexual populations, building on previous work (Gerrish and Lenski,
1998; Orr 2000). Wilke’s model gives the rate of substitution, the expected size of
mutations that will reach ﬁxation, and the overall rate of adaptation (change in ﬁtness
over time) for large asexual populations. Gerrish (2001) also derived the temporal
variance in the rate of substitution in large asexual populations. Hermisson and Pennings
(2005) developed a single locus model to look at the within-population dynamics by
following multiple independently derived copies of a selected allele. They concluded that
high mutation rates or large selective coefﬁcients could lead to multiple copies of an
allele sweeping a population together, a phenomenon they called a “soft sweep”
(Hermisson and Pennings, 2005).

Empirical evidence for the within-population dynamics of adaptation has come
from experimental evolution with bacteria. Atwood, Schneider and Ryan (1951a,b)
presented what they called the ‘theory of periodic selection’ using experimental
populations of Escherichia coli. Here occasional beneﬁcial mutations sweep a
population, eliminating nearly all genetic variation and resetting the genetic background
of the population. It is clear from work of Atwood et al. (1951a,b) that they perceived
the selective sweeps as discrete events, with a single beneﬁcial mutation dominating the

population at each time point.


More recent experiments indicate that the population dynamics may be more
complex. Imhof and Schlotterer (2001) used a rapidly mutating neutral locus to track
multiple lineages within a population of E. coli, showing that many independently
arising beneﬁcial mutations coexist transiently within a single evolving population.
Notley-McRobb and Ferenci (2000) studied the adaptive population dynamics in
continuous culture of E. coli. They showed that at least 13 different mutations at the mgl
locus spread through one population together, creating a phenotypic sweep, but not a
genetic sweep (hence an example of Hermisson and Pennings’ “soft sweep” (2005)).
Likewise, a later adaptive step in a similar population included at least 15 mlc alleles
(Notley-McRobb and Ferenci 2000). However, it is not clear from their data whether
independent subpopulations coexisted through multiple adaptive steps. The mutations
seen at both the mgl and mlc loci in that study were knockout mutations having high
mutation rates, and given the large population sizes used, were expected to occur in ~10“
bacteria at each round of replication. Also, in two serial transfer long-term E. coli
cultures Papadopoulos et al. (1999) showed that many different insertion sequence
associated mutations coexisted, at least some of which were likely to be beneﬁcial
(Schneider et al. 2000, Chapter 1).

Two experiments have shown that, within a single population, multiple
subpopulations may accumulate multiple beneficial mutations before one ﬁnally out-
competes all others. Mao et al. (1997) showed that, in populations challenged with a
series of hard selection regimes, multiple lineages coexisted through multiple steps. In
that experiment, genotypes with increased mutation rates steadily displaced the genotypes

with lower mutation rates across multiple adaptive steps. Shaver et al. (2002) looked at


three long-term E. coli populations that evolved increased mutation rates. They saw that,
while the mutator phenotype was polymorphic in the population, it became progressively
more ﬁt through time, suggesting that the lineage having the mutator phenotype acquired
several additional beneﬁcial mutations before it ﬁxed in the population.

Tenaillon et al. (1999) carried out a simulation study of asexual populations.
They simulated asexual populations over a large range of population sizes, mutation
rates, and adaptive landscapes (i.e., the number of mutations, the effect of each mutation,
and the variation in effects were varied) to address the effect of these factors on the
fixation of mutator phenotypes. Importantly, they suggest that a mutator phenotype (i.e.,
an increased overall genomic mutation rate) is likely to become common when beneficial
mutations are common, and that the mutator phenotype is likely to fix because it can
accumulate several mutations faster than the non-mutator population (Tenaillon et al.,
1999).

The studies described above provide evidence that within-population dynamics of
asexual populations are complex, and have implications for selection acting indirectly on
phenotypes that generate more or less selectable variation (i.e., evolvability).
Nonetheless, a detailed understanding of the within-population dynamics, described
initially by Fisher and Muller, is still missing. For instance, how many independent
beneﬁcial mutations will arise in an asexual population before the ancestral genotype is
eliminated through selection? How long will the various subpopulations coexist and
compete with each other before one out-competes all others? What, if any, consequences
will there be for the types of traits that are selected by this clonal interference regime,

especially with respect to evolvability? With a lack of appropriate analytical models we


must turn instead to numerical simulations of evolving populations to explore the

behavior of asexual populations in a variety of different situations.

METHODS

Simulations. Populations consisted of N individuals and were propagated in
discrete generations with stochastic reproduction and mutation. An inﬁnite sites
approach was taken, such that back mutations were not considered and no two mutations
were identical. With this approach a genotype is a genetically and phylogenetically
homogeneous and distinct group. Reproduction was achieved by choosing the size of
each genotype, $n_i$, at the next time step from a Poisson distribution with mean
$n_i(t+1) = n_i(t)\,(w_i/\bar{w})$, where $w_i$ is the fitness of genotype $i$ and $\bar{w}$ is the mean
population fitness; a further adjustment is made so the expected total number of
individuals each generation is held constant at N. Genotypes with more than 1000
individuals were replicated deterministically with the same expectation, in order to
decrease the time of the simulations. All mutations that were introduced were beneficial;
therefore, these simulations do not consider the additional complexities of deleterious
mutations. Thus this model is more reasonable for organisms with a low genomic
mutation rate (Gerrish and Lenski, 1998), such as Escherichia coli (Lenski, Winkworth
and Riley, 2003). A genotype with $n_i$ individuals would generate m mutant individuals,
where m is chosen from a Poisson distribution with mean $n_i U$, and U is the
beneficial mutation rate per individual per generation. Mutant individuals inherited all of

the mutations of their parent genotype, plus one new one. The entire phylogeny of


genotypes was saved, which allowed for the assessment of the most recent common
ancestor (MRCA). The source code is available upon request from the author.
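
The update rules described above can be sketched in a few lines of Python. This is an illustration written for this text, not the author's source code. Several details are assumptions: the order of reproduction and mutation within a generation, the multiplicative combination of fitness effects, the exponential distribution of effects with mean `mean_s`, the normal approximation used for large Poisson means, and every function and field name.

```python
import math
import random

DETERMINISTIC_THRESHOLD = 1000  # from the text: larger genotypes are not sampled


def poisson(lam, rng):
    """Poisson draw: Knuth's method, with a rounded normal approximation
    for large means (the approximation is an assumption of this sketch)."""
    if lam <= 0:
        return 0
    if lam > 30:
        return max(0, round(rng.gauss(lam, math.sqrt(lam))))
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def next_generation(genotypes, N, U, mean_s, rng):
    """Advance one discrete generation in place. Each genotype is a dict with
    'n' (count), 'w' (fitness) and 'parent' (index of its parent, or None)."""
    total = sum(g["n"] for g in genotypes)
    w_bar = sum(g["n"] * g["w"] for g in genotypes) / total
    for i in range(len(genotypes)):  # mutants appended below are not revisited
        g = genotypes[i]
        # Expected size n_i * (w_i / w_bar), rescaled so the total stays near N.
        expected = g["n"] * (g["w"] / w_bar) * (N / total)
        if g["n"] > DETERMINISTIC_THRESHOLD:
            g["n"] = round(expected)
        else:
            g["n"] = poisson(expected, rng)
        # Each mutant founds a new genotype (infinite sites: no two are alike).
        mutants = min(g["n"], poisson(g["n"] * U, rng))
        g["n"] -= mutants
        for _ in range(mutants):
            s = rng.expovariate(1.0 / mean_s)
            genotypes.append({"n": 1, "w": g["w"] * (1.0 + s), "parent": i})
    return genotypes
```

Starting from a single genotype, for example `[{"n": 1000, "w": 1.0, "parent": None}]`, and calling `next_generation` repeatedly yields a list whose `parent` fields record the full phylogeny, including extinct genotypes.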

An additional program was written to run in MATLAB. This program was
considerably simplified by only maintaining the present population. It was still in discrete
time and used the same rules for mutation and reproduction. This simpliﬁed version was
used only for the case study described below (Figure 4), because it permitted graphing.
The code is presented in Appendix 1.

The time required for these simulations is primarily determined by the mutation
supply (S), which equals the product of beneficial mutation rate and population size.
When the mutation supply is very low (e.g. <10”) it takes a long time for the population
to adapt. When the mutation supply is very large (e.g. >10^2) the large number of
genotypes created increases the computational time because each genotype is replicated
independently. As a consequence of these considerations the simulations were run over a
range of mutation supply rates instead of mutation rates. The same mutation supply,
achieved by different combinations of mutation rate and population size, can give quite
different results.

The range of mutation rates and population sizes results in very different rates of
adaptation and very different time scales of interest. To accommodate this fact, I kept
track of the shifts in the most recent common ancestor (MRCA) of the population (i.e.,
the MRCA of the population at a given generation is different from the MRCA of the
population at the preceding generation). Each simulation started with N individuals of a
single genotype. To avoid artifacts that might be introduced by the homogeneous initial

conditions, I began calculating the various statistics, which will be described below, from
the third time the MRCA shifted and ended on the 13th. Keeping time in number of shifts
in MRCA ensures that, on the low end of mutation rates, multiple substitutions are indeed
measured. On the high end of mutation rates, it ensures that there are true allelic fixation
events. Following the 13th shift in the MRCA, the population simulations were allowed to
continue until every mutation that arose in that window of time was either eliminated or
fixed.
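
Because the full phylogeny is stored, the MRCA at any generation can be found by walking parent pointers. The sketch below shows one way to do this; the data layout (a mapping from genotype identifier to parent identifier, with `None` for the founder) and the function names are assumptions made for illustration.

```python
def lineage(parent, genotype):
    """Return the genotype and all of its ancestors, ordered founder first."""
    path = []
    while genotype is not None:
        path.append(genotype)
        genotype = parent[genotype]
    return path[::-1]


def mrca(parent, extant):
    """Most recent genotype shared by the lineages of all extant genotypes."""
    paths = [lineage(parent, g) for g in extant]
    common = None
    for nodes in zip(*paths):  # walk all lineages from the founder downward
        if all(n == nodes[0] for n in nodes):
            common = nodes[0]
        else:
            break
    return common


# Toy phylogeny: 0 is the founder; 1 and 2 descend from 0; 3 and 4 from 1; 5 from 3.
parent = {0: None, 1: 0, 2: 0, 3: 1, 4: 1, 5: 3}
print(mrca(parent, [5, 4]), mrca(parent, [3, 2]))
```

A shift in the MRCA is then simply a generation at which `mrca` returns a different genotype than it did one generation earlier.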

RESULTS

A case study. In Muller’s seminal paper (Muller 1932) he introduced a ﬁgure
(copied below as Figure 3) that is particularly useful for understanding the spread of
clones in asexual populations. In that ﬁgure, time is represented along one dimension and
the widths of shaded regions, in the orthogonal dimension, show the sizes of various
subpopulations. However, it is evident from close examination that Muller’s ﬁgure is an
illustrative schema rather than an actual simulation. By contrast, Figure 4 represents an
actual simulation (the MATLAB simulation). The population size in this simulation was
10^7. Beneficial mutations happened stochastically at a rate of 10^-5 per individual per
generation. The ﬁtness effects of the mutations were drawn from an exponential
distribution with a mean of 0.025 (i.e., a 2.5% advantage).

A close examination of Figure 4 illustrates several of the main points of this
paper, which will subsequently be backed up with extensive simulations. First, multiple
single mutations arose on the ancestral genetic background and achieved significant

proportions in the population before any one had a chance to sweep the population. I will


Figure 3. Reproduction of figure 1 from Muller (1932) that depicts the spread of
beneficial mutations in asexual and sexual populations.

[Figure 3 image: "Evolutionary spread of advantageous mutations", with two panels, "In asexual reproduction" and "In sexual reproduction".]

DIAGRAM 1. Showing the method of spreading of advantageous mutations
in asexual and sexual organisms, respectively. Time is here the vertical
dimension, progressing downwards. In the horizontal dimension a
given population, stationary in total numbers, is represented. Sections of
the population bearing advantageous mutant genes are darkened,
proportionally to the number of such genes. In asexual organisms these genes
compete and hinder one another's spread; in sexual organisms they spread
through one another. See, however, qualifications in text (p. 121),
explaining limitations of a diagram in only two dimensions. The diagram is
simplified in a number of other ways as well. For example, all mutants
represented are shown as spreading at nearly the same rate, if they do
spread, and this rate is shown as about the same regardless of the extent
to which they have entered into combination with one another.

Figure 4. Results of a single simulation plotted in the style of Muller (1932). In this
simulation N = 10^7, U = 10^-5, and mean s = 0.025 (data from the MATLAB simulation).
Each shaded region represents a clone with a beneficial mutation (colored according to
fitness), and the vertical dimension indicates the frequency in the population. Each
subpopulation is placed within the clonal lineage that gave rise to it. The solid blue line
indicates the average fitness of the population. Note the step-like increases in average
fitness even in the absence of any allelic fixation. The scale bar to the right indicates the
fitness represented by various shades. Images in this dissertation are presented in color.

[Figure 4 plot: x-axis, time (generations), ticks at 100 to 500; average fitness axis; color scale bar for fitness.]


refer to this group of genotypes collectively as a cohort. The single mutations confer
different ﬁtness effects. The one conferring the highest ﬁtness is increasing its
proportion relative to the others, whereas all of them are increasing relative to the
common ancestor.

The dynamics of the increase in frequency of beneﬁcial mutations helps to
explain the presence of multiple subpopulations. Much of the time a beneﬁcial allele is
polymorphic in a population it is at very low frequency. For example, a mutation with
5% fitness advantage sweeping a population without interference will take, on average,
217 generations to go from a frequency of one in 10^6 to one in 20, but just 117
generations to go from 5% to 95% (Haldane, 1924). This lag, between the time a
beneficial mutation occurs and the time until it appreciably inﬂuences the mean
population fitness, creates a window when the various single mutant subpopulations are
each growing almost independently of each other and the mean population ﬁtness is still
very close to the ﬁtness of the common ancestral genotype. Only after one or more of the
subpopulations causes a substantial decline in the ancestral genotype do the various
single mutations compete appreciably amongst themselves.
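
The sweep times quoted above follow from the standard deterministic approximation in which the odds of a beneficial allele, x/(1 - x), grow by a factor of e^s per generation. The sketch below assumes that continuous-time approximation, s = 0.05, and a starting frequency of one in 10^6, which is the value that reproduces the 217 generations given in the text; the function name is hypothetical.

```python
import math


def generations_between(freq_start, freq_end, s):
    """Generations needed for an allele with selective advantage s to rise
    from freq_start to freq_end under deterministic logistic growth."""
    def odds(x):
        return x / (1.0 - x)
    return math.log(odds(freq_end) / odds(freq_start)) / s


rare_phase = generations_between(1e-6, 0.05, s=0.05)    # one in 10^6 to one in 20
common_phase = generations_between(0.05, 0.95, s=0.05)  # 5% to 95%
print(f"{rare_phase:.1f} {common_phase:.1f}")
```

Most of the sweep is therefore spent at frequencies too low to move mean fitness appreciably, which is the lag described in the paragraph above.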

The second point to be made from Figure 4 is that the most ﬁt single mutation
may not ﬁx before double mutations become common, nor even will it necessarily prevail
in the long run. It is evident from Figure 4 that subpopulations with two mutations arose
and became common before the most ﬁt of the single mutation genotypes eliminated all
others. Hence multiple lineages co-existed through multiple adaptive steps.

Stair-step adaptation. When the beneﬁcial mutation rate is low, mean population

fitness is expected to increase in a step-like manner as rare beneﬁcial mutations sweep


through the population (Lenski et al., 1991). As more mutations with variable ﬁtness
arise at various times and on different genetic backgrounds, increases in mean population
ﬁtness might be expected to become more irregular. In Figure 4, however, the ﬁtness
increases still take on a stair-step appearance (see also Gerrish and Lenski 1998). This is
not an intuitive result. With a high beneﬁcial mutation rate there may be many
subpopulations present and in competition with each other. These subpopulations have
different ﬁtness values, and may be actively displacing each other during the nearly ﬂat
part of the step. In Figure 4 the average ﬁtness indicates there is little adaptation between
400 and 500 generations. Yet we can see that during that time there were six major
subpopulations, and the more ﬁt were actively displacing the less ﬁt. The differences in
fitness among genotypes within a cohort, however, are subtle compared to the differences
between cohorts.

The step-like increase in average ﬁtness occurs because the ﬁtness difference
between successive cohorts of genotypes carrying beneﬁcial mutations is large relative to
the difference within each cohort. One explanation for this may come from the statistics
of orders (Arnold et al. 1992). Looking within a single cohort, only the few most ﬁt of
the mutations that arise in that cohort will be competitive. When mutations are drawn
from an exponential distribution, the expected spacing among the top mutations does not
change as more mutations are sampled (i.e., the expected distance between the best and
the second best is the same no matter how many mutations are sampled). The mean
fitness effect of these top mutations, though, increases logarithmically with the number of mutations sampled (Arnold et al. 1992). As
a result, the strength of selection within a cohort remains the same while the strength of

selection between cohorts increases with increasing mutation supply.
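
The order-statistics argument can be made concrete with the exact expectations for exponential samples. For n independent draws with mean β, the expected largest value is β(1 + 1/2 + ... + 1/n), and the expected gap between the largest and second largest is β regardless of n. The function below is a small illustration of that standard result; its name and arguments are introduced here for illustration only.

```python
def expected_order_statistic(n, rank_from_top, mean):
    """Expected value of the rank_from_top-th largest of n exponential draws
    with the given mean (rank_from_top = 1 is the largest)."""
    j = n - rank_from_top + 1  # position counted from the smallest draw
    # E[X_(j)] = mean * sum_{i=1..j} 1 / (n - i + 1)
    return mean * sum(1.0 / (n - i + 1) for i in range(1, j + 1))


mean_effect = 0.01
for n in (10, 100, 1000):
    best = expected_order_statistic(n, 1, mean_effect)
    second = expected_order_statistic(n, 2, mean_effect)
    print(n, round(best, 4), round(best - second, 4))
```

The gap column stays at the mean effect while the best value grows roughly with the logarithm of n, matching the contrast between selection within and between cohorts.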


Simulations over a broad parameter range. I now move from the case study to a
systematic evaluation across a range of population sizes (N) and beneficial mutation
supply rates (S). It has been shown previously that clonal interference increases with
increasing population size, increasing mutation rate, and decreasing size of mutational
effects (Muller 1964; Crow and Kimura 1965; Gerrish and Lenski 1998; Campos and de
Oliveira 2004; de Oliveira and Campos 2004; Wilke 2004). I extend these results by
examining the effect of these parameters on within-population dynamics. I will do this
by focusing on the ‘side branches,’ that is, the subpopulations containing mutations that
do not ﬁx. Figure 4 showed that multiple subpopulations could accumulate multiple
beneﬁcial mutations before any one genotype eventually out-competes all others. I will
examine the number of lineages, the number of adaptive steps through which they co-
exist, and the proportion of the total population that these side branches take up.

One set of simulations is presented in detail. Generalizations about other
parameters will be mentioned in a following section. This set of simulations covers
population sizes from 10^4 to 10^9, and mutation supply rates (the product of mutation rate
and population size) from 10^-3 to 10^1 beneficial mutations per population per generation.
Note that the use of mutation supply rate allows for the simple calculation of mutation
rate (U), as U = S/N. However, since the same range of S was used for each N, a
different range of U was used for each N (see Methods). The fitness effects of mutations
were drawn from a gamma distribution with a shape parameter of 1 (making it equivalent
to an exponential distribution) and a scale parameter of 0.01. Thus, the mean benefit of
mutations was 1%. For each combination of mutation supply and population size 1000

simulations were performed.
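
The parameter grid implied by this design can be laid out as follows. The sketch assumes that N and S were varied in steps of one order of magnitude, which the text does not state, and that the lower bound of N is 10^4; the variable names are illustrative only.

```python
# Assumed grid: one value per order of magnitude (not confirmed by the text).
population_sizes = [10.0 ** e for e in range(4, 10)]  # N = 10^4 ... 10^9
supply_rates = [10.0 ** e for e in range(-3, 2)]      # S = 10^-3 ... 10^1

# Beneficial mutation rate per individual per generation, U = S / N.
grid = {(N, S): S / N for N in population_sizes for S in supply_rates}

print(len(grid), min(grid.values()), max(grid.values()))
```

Under these assumptions the same five supply rates map onto a different range of U for each population size, from about 10^-12 (largest N, smallest S) to 10^-3 (smallest N, largest S).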


Effect of N and S on the number of interfering mutations. I am interested in the
proportion of the population that has at least one beneﬁcial mutation that does not ﬁx. In
one sense, these additional mutations constitute a load on the population. The mutations
that do not ﬁx are slowing down the rate of ﬁxation of the mutations that will eventually
ﬁx by increasing the mean ﬁtness of their competition. On the other hand, they are
temporarily increasing the rate of increase in ﬁtness of the population as a whole, by
adding more beneﬁcial alleles. Furthermore, at any given time it is impossible to know
which genotype will be the ancestor to the future population. Each of the new beneﬁcial
mutations may therefore also be increasing the rate of adaptation by increasing the
number of genotypes available as genetic backgrounds for further adaptation.

In order to measure the proportion of individuals in the population with at least
one mutation that does not fix, I first found the persisting line of descent, or PLOD. The
PLOD is the lineage of genotypes that can be traced from the founding genotype to the
most recent common ancestor of the final population. For the period of time that I kept
the data on each simulation (i.e., from the 3rd to the 13th shift in the MRCA) I calculated
the number of individuals that were of a genotype not on the PLOD and divided it by the
total number of individuals extant in the same window of time. These data are plotted in
Figure 5. At the lowest mutation supply there are very few individuals that have
genotypes not on the PLOD. As the mutation supply increases the percent of the
population having non-fixing mutations increases. At a mutation supply of 10^-2 several
percent of the population have mutations that will not fix. In the most extreme parameter
combination (N = 10^4, S = 10), over 50% of the population contained at least one
beneficial mutation that did not fix. Recall that for a given beneficial mutation supply rate
the simulations with different population sizes also had different mutation rates per
individual; the combination of mutation rate and population size used to achieve a given
mutation supply made a difference. At low mutation supply, larger populations have
more of their population with mutations that do not fix. Conversely, at high mutation
supply, smaller populations have more of their population with mutations that do not fix.

Figure 5. The average proportion of the population that had at least one beneficial
mutation that does not fix. With increasing beneficial mutation supply, more of the
population has a beneficial mutation that does not fix for every population size (10^4-10^9).

[Figure 5 plot: population off PLOD (y-axis) against mutation supply (x-axis, 0.001 to 10).]

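
The off-PLOD proportion described in this section can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the code used for the simulations: the phylogeny is assumed to be stored as a parent map, the census as one dictionary of genotype counts per generation, and all names and the toy data are hypothetical.

```python
def trace_plod(parent, mrca):
    """Return the set of genotypes on the line of descent from the founder to `mrca`.

    `parent` maps each genotype to its parent genotype (None for the founder).
    """
    plod = set()
    node = mrca
    while node is not None:
        plod.add(node)
        node = parent[node]
    return plod


def off_plod_proportion(parent, mrca, counts_by_generation):
    """Individuals with a genotype off the PLOD, divided by all individuals
    that existed over the same window of generations."""
    plod = trace_plod(parent, mrca)
    off = 0
    total = 0
    for counts in counts_by_generation:
        for genotype, n in counts.items():
            total += n
            if genotype not in plod:
                off += n
    return off / total


# Toy phylogeny (hypothetical): A is the founder, D is the MRCA of the final
# population, and C is a side branch that never fixes.
parent = {"A": None, "B": "A", "C": "A", "D": "B"}
census = [
    {"A": 90, "B": 5, "C": 5},
    {"B": 60, "C": 30, "D": 10},
]
proportion = off_plod_proportion(parent, "D", census)  # 35 of 200 individuals
```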
The proportion of the population that had at least one mutation that did not ﬁx
was made up of a number of different genotypes. Figure 6 shows the number of
mutations not on the PLOD for different population sizes and mutation supply rates. The
number of alternate genotypes was scaled to the number of mutations that ﬁxed to allow
comparisons across the entire parameter range. At the same mutation supply, larger
populations generated more genotypes per mutation that ﬁxed than smaller populations.
Most of these genotypes were inconsequential because they were eliminated by
chance at small sizes (Fisher, 1922; Haldane, 1927). It is of interest to know, of the
genotypes that were generated, how many individuals each contained. I calculated the
cumulative population size of each genotype by summing the number of individuals of
that genotype over all generations it existed. The inverse cumulative distribution
function (ICDF), which is also called a survival function, was then plotted for the
cumulative size of all genotypes that have at least one mutation that did not fix. These
curves are presented in Figure 7, where each curve is the average of 1000 simulated
populations with identical parameters. A linear relationship here on a log-log plot would
indicate that a power law describes the association. Genotypes reaching a small
cumulative number of individuals (e.g., 1-1000) fit a linear relationship well. These
mutations represent those that were most influenced by drift while at low frequency and
are eventually lost by chance. All population sizes and mutation rates had nearly
identical shape in this part of the curve. This was expected because the effect of drift on
small subpopulations is independent of the total population size (N), as long as N is much
larger than the subpopulation. Also, all of these simulations used identical distributions
of mutational effect, and the chance of surviving drift at low population size is known to
be a function of size of mutation effects (Haldane, 1927).

Figure 6. The number of beneficial mutations that occurred, per mutation that fixed, over
a range of mutation rates and population sizes. Bar shading indicates population size.

[Figure 6 plot: mutations per mutation that fixes (y-axis) against mutation supply (x-axis, 0.001 to 10), with bars shaded by population size from 10^4 to 10^9.]

Figure 7. Inverse cumulative distributions of the numbers of individuals of genotypes that
had at least one mutation that did not fix. Data are scaled to the number of mutations that
did fix. Each panel contains the data from one beneficial mutation supply rate and gives
the average of 1000 simulations at each of 6 population sizes (10^4 to 10^9). In every panel
curves are ordered, with the lowest population size on the bottom and the highest on top.

[Figure 7 plots: five log-log panels, one per mutation supply rate (S = 10^-3, 10^-2, 10^-1, 1 and 10); x-axis: cumulative size of genotype (10^0 to 10^12).]
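
The inverse cumulative distribution used for Figure 7 can be sketched as below. This is a hedged illustration only: it normalizes by the number of genotypes (giving a fraction), whereas the figure is scaled to the number of mutations that fixed, and it counts genotypes with cumulative size greater than or equal to each threshold. The function names and the toy data are assumptions.

```python
from bisect import bisect_left


def cumulative_size(counts_over_time):
    """Sum the number of individuals of one genotype over all generations it existed."""
    return sum(counts_over_time)


def survival_function(sizes, thresholds):
    """For each threshold x, return the fraction of genotypes whose
    cumulative size is at least x."""
    ordered = sorted(sizes)
    n = len(ordered)
    # bisect_left gives the number of genotypes strictly smaller than x.
    return [(n - bisect_left(ordered, x)) / n for x in thresholds]


# Toy cumulative sizes (hypothetical): most genotypes stay tiny, a few grow large.
sizes = [1, 1, 2, 3, 10, 50, 1000, 20000]
thresholds = [1, 10, 1000, 10**5]
fractions = survival_function(sizes, thresholds)
```

Plotting the fractions against the thresholds on log-log axes gives the kind of curve described in the text, where a straight segment would indicate power-law behavior.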

However, the slopes of the curves change at high cumulative numbers in Figure 7.
Of all the genotypes that occurred, the ones that make up this latter part of the curve are
the ones that escaped stochastic loss and thus were interfering with mutations that did
eventually ﬁx. The point at which the slope changes is an indicator of how many
mutations are interfering. When mutation supply was low (e.g., 10“), the change in slope
happened at a low value because few mutations both escaped drift and failed to ﬁx, but
the slope change was greater, as mutations that escaped stochastic loss had less
interference from other mutations. At higher mutation supply there are many more
mutations that failed to ﬁx, but the latter part of the curve (for sizes >1000) was steeper,
indicating that their survival was limited more by clonal interference.

To summarize Figures 5, 6 and 7, within a given mutation supply, (i) larger
populations had more mutations that did not fix, (ii) in larger populations more
individuals had alternative mutations, and (iii) in larger populations these mutations
tended to reach higher total numbers in the population. When we compare a range of
beneficial mutation supply rates we see that (iv) at lower mutation supply fewer
mutations interfered, but those that did tended to reach larger sizes, and (v) the percent of
the population that had mutations that did not fix interacts in a nonlinear way with
population size. More generally, it can be seen that very many alternative mutations that
did not fix arose and escaped stochastic loss.

Clonal interference and phylogenetic structure. In this section I look at the same
set of simulations, and add the phylogenetic information on the genotypes that was
collected. In doing this we can see not only the number of alternate genotypes, but also
their relationships to one another and the mutations that ﬁxed. These simulations
represent evolution in a “single niche” world (i.e., there is one dimension to ﬁtness).
Therefore, the phylogenies were dominated by a single line of descent (the PLOD), but as
we saw in Figure 4, clonal interference dynamics can create side branches in the
phylogenies that may accumulate several beneﬁcial mutations before they go extinct.

For this set of simulations, the numbers of mutations that were at each distance
from the PLOD are presented in Table 4. Again the data have been scaled to the number
of mutations that ﬁxed. At the lowest mutation supply rate (0.001), nearly all of the
alternate genotypes were lost by chance while rare. However, as displayed in Figure 7,
even in this parameter range it was possible to have mutations that achieved high levels
that did not ﬁx. At the lowest mutation supply ~52 mutations were lost by chance for
every one mutation that ﬁxed. Given that the average selective advantage of alleles in
these simulations is 0.01, this is in agreement with the theoretical estimate by Haldane
(1927) that the probability of ﬁxation is twice the selective advantage.

[Table 4: the average number of mutations at each mutational distance from the PLOD, scaled to the number of mutations that fixed, for each combination of population size and mutation supply rate, with the average number of entirely unique lineages in parentheses. The table body is not recoverable from the source.]

With a mutation supply of 0.1, side branches of two or three mutational steps were
common. This finding was surprising given that, with ~98% of the mutations being lost
to chance, 1000 generations are expected to pass for every two mutations that survived
genetic drift. But the stochastic timing of mutations, and the inherent lag between the
arrival of a mutation and the time it affected the mean fitness, ensured that these side
branches occurred. With higher mutation supply rates the side branches became even
longer and more common.
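
The two back-of-envelope figures used in this section (roughly 50 mutations arising per mutation that fixes, and two surviving mutations per 1000 generations at S = 0.1) follow from Haldane's approximation. The short calculation below is a sketch that assumes the approximation 2s applies to the mean benefit, which holds because 2s is linear in s; the names are illustrative.

```python
def haldane_survival_probability(s):
    """Haldane's (1927) approximation: a beneficial mutation with small
    advantage s escapes stochastic loss with probability about 2s."""
    return 2.0 * s


MEAN_BENEFIT = 0.01  # mean of the exponential distribution of benefits
p_survive = haldane_survival_probability(MEAN_BENEFIT)  # 0.02, so ~98% are lost

# Expected number of mutations lost by chance for each one that survives (about 49).
lost_per_survivor = (1.0 - p_survive) / p_survive

# At a mutation supply of S = 0.1 per generation, the expected number of
# mutations that survive drift in 1000 generations (about 2).
SUPPLY = 0.1
survivors_per_1000_generations = SUPPLY * p_survive * 1000
```

The expected ratio of about 49 lost mutations per surviving mutation is close to the ~52 observed at the lowest mutation supply.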

Figure 8 shows these data for the simulations that had a mutation supply rate of
10. The nearly linear relationship on the semilog plot indicates that the number of
genotypes that arose at a given distance from the PLOD decayed roughly exponentially
with distance from the PLOD. Here again, within a given S, the combination of mutation
rate and population size made a difference. Larger populations produced more genotypes
one and two mutations away from the PLOD, but smaller populations produced more
genotypes farther from the PLOD.
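
One way to quantify a roughly exponential decay of this kind is a least-squares line through the logarithm of the counts. The sketch below is an illustration, not the analysis performed for Figure 8: it uses the genotype counts at one to five steps off the PLOD from the single simulation in Table 5 (N = 10^7, S = 10) rather than the 1000-run averages, and the function name is an assumption.

```python
import math


def semilog_fit(distances, counts):
    """Least-squares line through (distance, log10(count)).

    Returns (slope, intercept). A straight line on a semilog plot
    corresponds to exponential decay with distance.
    """
    ys = [math.log10(c) for c in counts]
    n = len(distances)
    mean_x = sum(distances) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Genotypes at 1..5 mutational steps off the PLOD (first column of Table 5).
distances = [1, 2, 3, 4, 5]
counts = [40560, 9047, 2070, 694, 172]
slope, intercept = semilog_fit(distances, counts)

# Factor by which the number of genotypes falls with each additional step.
fold_drop_per_step = 10 ** (-slope)
```

For these five values the fitted slope is about -0.6 on a log10 scale, that is, roughly a fourfold drop in the number of genotypes per additional step.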

The genotypes at a given distance from the PLOD were not necessarily
phylogenetically independent. For example, a particularly successful genotype four
mutations away from the PLOD could have given rise to all of the genotypes five
mutations away within a single simulation. To examine this effect, for each class I
calculated a “uniqueness” value, which indicates the number of phylogenetically unique
lineages of a given length (d) that ended at a given distance from the PLOD. For one
simulation the data are presented in Table 5. In that simulation N = 10^7 and S = 10 (i.e.,
the beneficial mutation rate per individual, U, was 10^-6). In this case all 2070 genotypes 3
mutations off the PLOD converged to just 27 genotypes 2 mutations off the PLOD, and
further converged to 9 genotypes one mutation off the PLOD. Values one cell below the
diagonal give the number of entirely unique branches off the PLOD of each length. In
Table 4 the average values of the entirely unique lineages are included in parentheses for
all parameter combinations (again averaged over all 1000 runs and scaled to the number
of mutations that fixed).

Figure 8. The number of beneficial mutations that arise at various mutational distances
from the PLOD, scaled to the total number of mutations that fix. The PLOD is the
persisting line of descent and is found by tracing the lineage from the MRCA of the final
population to the founding genotype. The data plotted are the average over 1000
simulations at each population size (10^4-10^9) with a beneficial mutation supply rate (S) of
10.

[Figure 8 plot: semilog plot of mutations per mutation that fixes (y-axis, 0.0001 to 10000) against steps off the PLOD (x-axis, 1 to 13), with one curve per population size from N = 10^4 to N = 10^9.]

Steps from                      d
PLOD            1      2      3      4      5
0              17
1           40560
2            9047    150
3            2070     27      9
4             694     11      2      2
5             172      6      2      1      1

Table 5. The number of unique lineages of length d that end a given number of
mutational steps from the PLOD in a single population. In this simulation N = 10^7 and S
= 10. The PLOD is the single lineage that leads from the founding genotype to the
MRCA of the final population. The numbers one cell below the diagonal (40560, 150, 9,
2 and 1) indicate the number of unique lineages that occurred in this simulation of each
length.
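
The uniqueness values in Table 5 can be computed along the following lines. This is a minimal sketch under assumptions: the phylogeny is taken to be a parent map, the PLOD a set of genotypes, and a lineage of length d ending at a given distance is identified by the ancestor d - 1 steps back from the tip. The names and the toy phylogeny are hypothetical, not taken from the simulation code.

```python
def depth_off_plod(genotype, parent, plod):
    """Number of mutational steps from `genotype` back to the PLOD."""
    steps = 0
    while genotype not in plod:
        genotype = parent[genotype]
        steps += 1
    return steps


def ancestor(genotype, parent, steps_back):
    """Walk `steps_back` generations of the phylogeny toward the founder."""
    for _ in range(steps_back):
        genotype = parent[genotype]
    return genotype


def unique_lineages(parent, plod, distance, d):
    """Count phylogenetically unique lineages of length d that end
    `distance` steps off the PLOD (requires 1 <= d <= distance)."""
    tips = [g for g in parent
            if g not in plod and depth_off_plod(g, parent, plod) == distance]
    return len({ancestor(g, parent, d - 1) for g in tips})


# Toy phylogeny (hypothetical): P0 -> P1 is the PLOD; a and b start side branches.
plod = {"P0", "P1"}
parent = {
    "P0": None, "P1": "P0",
    "a": "P0", "b": "P0",
    "a1": "a", "a2": "a", "b1": "b",
    "a1x": "a1", "a1y": "a1", "a2x": "a2",
}
row_three = [unique_lineages(parent, plod, 3, d) for d in (1, 2, 3)]
```

In the toy example the three genotypes three steps off the PLOD trace back to two genotypes at two steps and a single genotype at one step, the same kind of convergence as the 2070, 27 and 9 in row 3 of Table 5.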

The mutations that did not fix were often clustered in genetic space (Table 4).
For example, in the simulations where N = 10^7 and S = 10 (U = 10^-6), an average of 6.6
genotypes ﬁve mutations off the PLOD were created for every one mutation that ﬁxed.
These genotypes were, however, highly clustered. Of these, there were on average only
0.0314 unique phylogenetic branches leading to genotypes ﬁve mutations off the PLOD.
Evidently, side branches ﬁve mutations off the PLOD were extremely infrequent, but
when they did occur they were part of a subpopulation that contained many beneﬁcial
mutations.

The role of the distribution of mutation effects. In addition to the simulations
described here, further simulations were carried out with identical parameter ranges
except that they had different distributions of mutation effects. The results were
consistent with expectations from population genetics theory. For example, changing the
shape parameter of the gamma distribution to 2 while retaining the same mean allows
a greater number of mutations to escape drift (data not shown). Also, using a constant
mutation effect size greatly increased the degree of interference, the number of alternate
genotypes per mutation that fixed, and the length of the side branches (data not shown).

A model for within-population selection on evolvability. The extended
interference dynamic described above may also inﬂuence the way in which traits that
generate beneﬁcial variation will be affected by selection. It has been shown that there
are two modes by which the frequency of an allele can change in a population. One is
selection between subpopulations within a cohort, where the subpopulations with high
fitness out-compete the subpopulations with lower ﬁtness. The second mode is for the
trait to be differentially represented in the next cohort, which opens the door for indirect
selection on evolvability. Genotypes that produce more beneﬁcial mutations or
mutations with larger beneﬁts can contribute disproportionately to the next cohort.
Recall that alleles may be polymorphic through several cohorts before they ﬁx. Fixation
will be inﬂuenced, therefore, not only by an allele’s ﬁtness advantage in the genetic
background in which it occurred, but also by its propensity to generate more or more ﬁt
beneﬁcial mutations (i.e., its evolvability).

To test this idea another set of simulations was performed on a finite and
completely defined fitness landscape. The simulations began with a homogeneous
population of the ancestral genotype. This ancestral genotype had five mutations
available, which each had identical fitness values, but differed in the number of
subsequent beneficial mutations (zero, one, two, three or four) available. Thus, these five
mutations differ only in their evolvability, where evolvability is defined as the capacity to
generate heritable phenotypic variation (Kirschner and Gerhart, 1998). To avoid
confounding factors, back mutations were precluded. Also, for genotypes having at least
one beneficial mutation available, the total beneficial mutation rates were made identical.
This second provision inflated the beneficial mutation rate of the less evolvable
genotypes and made them even, in this respect, with the more evolvable genotypes. This
situation is unlikely in nature because genotypes with more beneficial mutations available
will probably also have higher total beneficial mutation rates. This provision made the
results conservative and ensured that any differences were attributable to the number of
alleles alone. Additional simulations were done without this provision, and the effects,
which are described below, were even more pronounced (data not shown).

Table 6 shows the results of these simulations. As expected, at low mutation
supply (0.001 and 0.01) populations had roughly equal probabilities of ﬁxing each of the
ﬁrst mutations. As the mutation supply increased, mutations that allowed more
subsequent adaptation became more likely to ﬁx in the population. With a mutation
supply rate of 0.1, the mutation allowing only the shortest adaptive trajectory became less
likely to ﬁx. With a beneﬁcial mutation supply rate of 1.0 the populations became more
likely to take the longer adaptive trajectories, and with a supply rate of 10 the longest
path was taken approximately 60% of the time. In these simulations there were no back
mutations, so once a mutation had ﬁxed the population was committed to that path.
Therefore, genotypes five adaptive mutations away from the ancestor must have arisen
before the population fully committed to its first adaptive step.

Table 6. The fraction of times, out of 1000 simulations at each combination of N and S,
that populations fixed each of the five possible first mutations. The five mutations have
the same immediate fitness advantage (10%), but differed in the number of subsequent
mutations available. Numbers in bold are significantly above 0.2 (p < 0.025) and those in
italic are significantly less than 0.2 (p < 0.025), assuming a binomial distribution.

Population       Mutation      Subsequently available beneficial mutations
size (N)         supply (S)    0        1        2        3        4

10,000           0.001         0.205    0.185    0.180    0.234    0.196
                 0.01          0.196    0.185    0.230    0.196    0.193
                 0.1           0.124    0.222    0.215    0.237    0.202
                 1             0.000    0.121    0.246    0.298    0.335
                 10            0.000    0.000    0.060    0.313    0.627

100,000          0.001         0.199    0.215    0.210    0.159    0.217
                 0.01          0.197    0.194    0.207    0.219    0.183
                 0.1           0.124    0.205    0.214    0.235    0.222
                 1             0.000    0.079    0.244    0.330    0.347
                 10            0.000    0.003    0.087    0.314    0.596

1,000,000        0.001         0.212    0.180    0.197    0.209    0.202
                 0.01          0.190    0.211    0.189    0.203    0.207
                 0.1           0.090    0.214    0.237    0.225    0.234
                 1             0.000    0.064    0.248    0.328    0.360
                 10            0.000    0.006    0.074    0.314    0.606

10,000,000       0.001         0.205    0.217    0.199    0.196    0.183
                 0.01          0.188    0.186    0.212    0.207    0.207
                 0.1           0.071    0.232    0.212    0.239    0.246
                 1             0.000    0.050    0.227    0.359    0.364
                 10            0.000    0.002    0.076    0.340    0.582

100,000,000      0.001         0.184    0.193    0.201    0.213    0.209
                 0.01          0.177    0.192    0.211    0.222    0.198
                 0.1           0.063    0.216    0.245    0.232    0.244
                 1             0.000    0.049    0.230    0.320    0.401
                 10            0.000    0.002    0.070    0.311    0.617

1,000,000,000    0.001         0.199    0.194    0.184    0.209    0.214
                 0.01          0.167    0.217    0.207    0.212    0.197
                 0.1           0.043    0.222    0.230    0.249    0.256
                 1             0.000    0.041    0.235    0.336    0.388
                 10            0.000    0.001    0.058    0.324    0.617
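
The significance criterion in the caption of Table 6 can be illustrated with an exact binomial tail calculation. This is a sketch under assumptions: the caption states only that a binomial distribution was assumed, so the use of exact one-sided tails, and all function names, are illustrative choices rather than the procedure actually used.

```python
from math import exp, lgamma, log


def binomial_pmf(k, n, p):
    """P[X = k] for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1.0 - p))
    return exp(log_pmf)


def tail_probabilities(fixations, runs=1000, p=0.2):
    """Return (P[X <= fixations], P[X >= fixations]) under the null hypothesis
    that each of the five first mutations fixes with probability 0.2."""
    lower = sum(binomial_pmf(k, runs, p) for k in range(0, fixations + 1))
    upper = sum(binomial_pmf(k, runs, p) for k in range(fixations, runs + 1))
    return lower, upper


# Fractions from Table 6 converted back to counts out of 1000 runs.
lower_159, upper_159 = tail_probabilities(159)  # fraction 0.159
lower_205, upper_205 = tail_probabilities(205)  # fraction 0.205
lower_234, upper_234 = tail_probabilities(234)  # fraction 0.234
```

Under this criterion a fraction counts as significantly above 0.2 when its upper tail is below 0.025, and as significantly below 0.2 when its lower tail is below 0.025.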

 


Throughout the range of population sizes, 10^4 to 10^9, there were no apparent
differences in the results. Evidently, for these simulations, the ability of selection to act
indirectly on evolvability is dependent largely on the mutation supply rate, and it does not
matter which combination of population size and mutation rate gave rise to it. This
finding is in contrast to the earlier results looking at the distributions of size and distance
from the PLOD of genotypes.

DISCUSSION

This investigation into the population dynamics of adaptation in large asexual
populations allows us to draw three major conclusions. First, when there is a high
beneﬁcial mutation supply rate, many independent subpopulations may arise and
compete with each other. Each of these subpopulations may also produce additional
beneficial mutations before one out-competes all others. Second, as the population
evolves, fitness may increase in a step-like manner, even when there is no allelic ﬁxation
event. Independent subpopulations may coexist through several steps in ﬁtness. Third,
as the number of independent subpopulations increases and they coexist through multiple
adaptive steps, the ﬁxation probability of a mutation depends not only on the ﬁtness
beneﬁt that it confers, but also on its opportunity for further adaptation, relative to other
contending beneﬁcial mutations.

The simulations presented here covered only a fraction of the possible parameter
combinations that could have been chosen. In these simulations, the role of deleterious
mutations, which can also be important (Peck, 1994; Orr 2000; Johnson and
Barton, 2002; Bachtrog and Gordo, 2004), was ignored completely. The population
dynamics have proven to be interesting and complex even without considering
deleterious mutations. The results presented here are most appropriately related to
organisms with low overall mutation rates, as the effect of deleterious mutations will be
minimized. The way in which the conclusions of this study will be influenced by
deleterious mutations remains an open and interesting question. Also, cases of
intermediate rates of recombination were ignored. The cases described here, with no
recombination whatsoever, serve to describe the boundary condition of a more general
theory that can incorporate the full range of recombination rates.

The exponential distribution of mutation effects was used because the limited data
available suggest that this is a reasonable assumption (Imhof and Schlotterer 2001;
Rozen et al. 2002). Unfortunately, these data are inadequate to discriminate between an
exponential and other potential distributions (Rozen et al. 2002). Fortunately, a large
number of statistical distributions have exponential-like tails (Gumbel 1958). Of course,
in any real population the set of beneficial mutations available will be a consequence of
the environment and the genotypes present, and will not be a realization of any abstract
statistical distribution. Additionally, the distribution will be made up of a finite set of
mutations; the assumed tail behavior, which plays a prominent role in several of the
results of this paper, might be unfounded. This assumption has been used widely in
theoretical treatments of adaptation (e.g., Gillespie 1984; Gerrish and Lenski 1998; Orr
2000; Orr 2002; Orr 2003; Wilke 2004). Importantly, the conclusions of this paper seem
to be robust to the precise distribution of mutational effects assumed. Any actual
distribution for which the collection of most fit mutations is more clustered than an
exponential distribution will tend to cause the competing subpopulations to be more
similar in fitness, extending the amount of time they will compete with each other and
therefore increasing the strength of each of the previously stated conclusions.

Clonal interference allows for exploration of an adaptive landscape, by promoting
the coexistence of independently evolving subpopulations. However, this exploration has
important limitations. First, it accesses only those mutations that are beneﬁcial on the
background in which they occur. For the parameters considered here, there is probably
little time for neutral mutations that may interact epistatically to increase to a high
enough frequency to pick up additional beneﬁcial mutations before they are eliminated
by the continuous supply of beneﬁcial mutations. In other words, the expected waiting
time for neutral mutations to matter may be considerably longer than the spacing between
successive beneﬁcial mutations (Christiansen et al., 1998; Van Nimwegen and
Crutchﬁeld, 2000). Therefore, this process of exploration of genetic space predominantly
looks up (toward higher ﬁtness), rarely to the side, and almost never down (i.e., through
adaptive valleys). Second, in these dynamics the genetic space sampled tends to be very
clustered. Even in situations of high beneﬁcial mutation supply, there are sweeps that go
unchallenged, so that no alternatives are sampled. Additionally, the various
subpopulations can be highly variable in size. Therefore, the extent of exploration of
genotypes around those subpopulations will also be highly variable.

Considering these limitations on the exploration of genetic space it is perhaps
surprising that the indirect selection for evolvability can be so effective. In the set of
simulations shown in Table 6, where alleles differed only in their ability to generate
additional beneficial mutations, the more evolvable alleles had a significantly higher
probability of fixation under conditions of a high mutation supply rate. This outcome
was likely aided by the fact that the mutations were identical in fitness, which minimized
the variation-purging effect of within-cohort competition.

The population dynamics described here may help to explain several reports of
increased mutation rates in real populations (Mao et al., 1997; Sniegowski et al., 1997;
Oliver et al., 2000; Notley-McRobb et al., 2002; Shaver et al. 2002). Sniegowski et al.
(2000) argued that “this ‘clonal interference’ effect constrains the adaptive usefulness of
a high mutation rate to situations in which beneficial mutations are extremely
infrequent.” In fact, theory predicts (Gerrish and Lenski 1998; Wilke 2004) and
experiments show (de Visser et al. 1999) that under conditions of high beneficial
mutation supply, further increases in mutation rate have a minimal effect on the rate of
adaptation. It is therefore paradoxical that mutator phenotypes have been observed to
evolve in very large populations with a high supply of beneficial mutations (Sniegowski
et al., 1997; Tenaillon et al., 1999; Notley-McRobb et al., 2002; Shaver et al., 2002).
The population dynamics described here suggest a possible explanation. With a high
beneficial mutation supply rate, the fixation of a lineage typically occurs over several
adaptive steps. At each of these steps the mutator, or any trait that increases evolvability,
can increase its proportion in the population by contributing more genotypes to the next
cohort (see also Tenaillon et al., 1999; de Visser and Rozen, 2005).

The work presented here has examined adaptation in large asexual populations. It
has also identified several questions that deserve further investigation. Specifically, does
selection for evolvability, in the manner described here, actually explain the evolution of
increased mutation rates? Are there real cases of alleles that allow more adaptive steps
being selected in the manner described? Also, our increased understanding of the
population dynamics caused by clonal interference may lead to new or more precise
predictions about the differences between sexual and asexual populations and the
potential evolutionary advantage of sexual and asexual reproduction.

CHAPTER 3

WITHIN-POPULATION DYNAMICS OF ADAPTATION IN A LONG-TERM
EVOLUTION EXPERIMENT WITH ESCHERICHIA COLI

ABSTRACT

In asexual populations, independently arising beneﬁcial mutations cannot
recombine into the same genetic background. Instead, they compete with each other, and
typically only one persists in the long run. The resulting population dynamics have
important implications for the rate of evolution, the regularity with which mutations ﬁx,
and the characteristics of mutations that will ﬁx. Much of our understanding of this
process is theoretical, being based on analytical models or computer simulations. Here
we present a detailed study of the within-population dynamics of an experimental
population of Escherichia coli as it adapts to a novel environment during 5,000
generations. Six beneﬁcial mutations were known to ﬁx in this population over that time.
The frequency of each of these mutations was tracked over time, and the ﬁtness of clones
— with and without the known mutations — was assayed. These data reveal that multiple
independent beneﬁcial mutations arise and compete with each other. Furthermore, these
independent subpopulations sometimes accumulate multiple beneﬁcial mutations before
one subpopulation out-competes all others. This extended coexistence persists even
through previously identified step-like increases in fitness. Finally, the generality of this
process is shown with 10 additional populations that evolved for ~880 generations,
wherein multiple subpopulations again took multiple adaptive steps before one
subpopulation out-competed all others. The implications of our findings for the evolution
of evolvability and the evolution of sex are discussed.

INTRODUCTION

The lack of genetic recombination can cause complex adaptive dynamics in
asexual populations. Early evolution experiments with Escherichia coli provided the
basis of the theory of periodic selection (Atwood, Schneider and Ryan, 1951a, b), which
describes the purging of genetic variation as rare beneﬁcial mutations sweep through a
population. In such populations, adaptation will generally proceed by steps in ﬁtness
corresponding to the ﬁxation of individual favorable mutations (Lenski et al., 1991;
Elena, Cooper and Lenski, 1996). In the intervening periods, the population is waiting
for the next beneﬁcial mutation to arise, neutral and deleterious variation moves toward
an equilibrium, and average ﬁtness remains approximately constant. Under these
conditions, where beneﬁcial mutations are rare, this model works well and gives intuitive
predictions for rate of adaptation with various mutation rates and population sizes (Crow,
1965; Gerrish and Lenski, 1998).

When beneficial mutations become common enough for multiple beneﬁcial
mutations to arise and escape stochastic loss on the same genetic background, these
mutations compete with each other. The ensuing population dynamic is called clonal
interference. Several important consequences follow. First, the rate of adaptation is
slowed signiﬁcantly relative to a population that has free recombination (Fisher, 1930;
Muller, 1932; Muller, 1964; Crow, 1965; Felsenstein, 1974). Second, there is a law of
diminishing returns on the rate of adaptation with increasing mutation rate (Crow and
Kimura, 1965; Gerrish and Lenski, 1998; de Visser et al., 1999; Orr, 2000; Wilke, 2004).
Third, these conditions influence the types of mutations that fix. Specifically, as clonal
interference increases so does the average size of the beneficial effect of mutations that fix
(Gerrish and Lenski, 1998; Rozen, de Visser and Gerrish, 2002; Wilke, 2004), although
this pattern may not always hold when deleterious mutations occur at a high rate
(Campos and de Oliveira, 2004; de Oliveira and Campos, 2004). Fourth, the temporal
ﬁxation of beneﬁcial mutations may be more regular than otherwise expected (Lenski et
al., 1991; Gerrish, 2001). Finally, clonal interference may facilitate the ﬁxation of those
traits that affect evolvability by inﬂuencing the generation of heritable variation
(Tenaillon et al., 1999; Chapter 2). These evolvability traits may include mutator
genotypes (Tenaillon et al., 1999) and alleles that inﬂuence subsequent adaptive
evolution by having more positive or fewer negative epistatic interactions (Chapter 2).
Several experimental studies have looked at the within-population dynamics of
evolving microbial populations. Imhof and Schlotterer tracked a rapidly mutating micro-
satellite locus in adapting populations of E. coli in order to measure the ﬁtness effects of
the mutant genotypes (Imhof and Schlotterer, 2001). In doing so, they were able to see
multiple subpopulations competing for fixation (Imhof and Schlotterer, 2001). Notley-McRobb
and Ferenci examined the genetics and population dynamics of E. coli adapting
in chemostats (Notley-McRobb and Ferenci, 2000; Notley-McRobb, Seeto and Ferenci,
2002). In large populations and for beneficial mutations that occur at a sufficiently high
rate, they showed that many independent beneficial mutants can sweep through the
population together, creating a phenotypic sweep without an allelic fixation (Notley-McRobb
and Ferenci, 2000). They further showed that this dynamic was sometimes
associated with a transient increase in a mutator phenotype (Notley-McRobb, Seeto and
Ferenci, 2002).

The study presented here examines a single population in the long-term evolution
experiment with E. coli, and it describes the within-population dynamics over the ﬁrst
5,000 generations when adaptive evolution was most rapid. This particular population
has previously served as a focal population for several
intensive phenotypic and genetic analyses. The temporal trajectory of ﬁtness has been
shown to increase in a step-like fashion (Lenski and Travisano, 1994). Increases in cell
size, a trait that is correlated with ﬁtness, closely matched the increases in ﬁtness (Elena,
Cooper and Lenski, 1996). Six mutations have been identiﬁed and shown to be favorable
in the evolution environment (Schneider et al., 2000; Cooper et al., 2001; Cooper, Rozen
and Lenski, 2003; Stanek and Lenski, in prep; Chapter 1). In this study, we use this
genetic information, along with the availability of frozen population samples from
preceding generations, to track the within-population dynamics by following the
frequency of individual beneﬁcial mutations as they spread through the population. Also,
we measure the ﬁtness of clones sampled from the population with each of the genotypes
identiﬁed at each time point, including those genotypes with and without subsequently
fixed beneﬁcial mutations. These data yield a much richer and more complete picture of

the within-population dynamics of adaptation than has been previously available.

METHODS
Evolution experiment. The focal population under study, called Ara-1, is one of
twelve populations in a long-term evolution experiment described elsewhere (Lenski et
al., 1991; Lenski and Travisano, 1994; Lenski, 2004). The population was founded with a
single clone of Escherichia coli B, and was propagated in serial batch culture in a
glucose-limited environment (Davis minimal media supplemented with 25 µg/mL
glucose). Each day, 1% of the population has been transferred to fresh media, allowing
~6.64 generations as the population consumes the available resources. A sample of the
population was obtained every 500 generations throughout the evolution experiment and
has been stored at -80°C; each sample is broadly representative of the entire population at
the corresponding generation because each sample contains >1 mL of the population
culture from the 99% not transferred to the next day’s culture.
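
The figure of ~6.64 generations per day follows from the 100-fold daily dilution: the population must double log2(100) times to return to its pre-transfer density. A minimal sketch of that arithmetic, assuming regrowth to the same final density each day (the function name is illustrative, not from the original methods):

```python
import math


def generations_per_transfer(dilution_factor: float) -> float:
    """Number of doublings needed to regrow after a given fold dilution.

    Assumes the culture returns to the same final density every cycle.
    """
    return math.log2(dilution_factor)


# A 1% daily transfer is a 100-fold dilution.
daily_generations = generations_per_transfer(100)
print(round(daily_generations, 2))
```

Under the same assumption, three daily transfers correspond to roughly 20 generations.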

Isolation of clones and estimation of mutation frequency. Six mutations have
been identified and shown to be beneﬁcial (Table 7). PCR-based detection assays were
developed to distinguish each mutant allele from the ancestral allele. Beneﬁcial
mutations in the ribose operon (rbs) were large deletions and are detectable by
polymerase chain reaction fragment length polymorphisms (PCRFLP). The mutation in
pykF was an insertion of an IS150 element (Schneider et al., 2000), and was likewise
detectable by PCRFLP. The mutations in topA, spoT and ppr each created or
destroyed a different restriction enzyme recognition site. In these cases, restriction
fragment length polymorphisms (RFLP) were used to detect the mutations. The mutation
in glmUS created neither a PCRFLP nor an RFLP. For this gene, a primer set was
designed that would only amplify when the mutant allele was present. Because this
method depends on optimization of PCR conditions, and is inherently more prone to error
than the other methods, each clone was checked at least three times. All assays were run
with a positive (known mutant genotype) and a negative (ancestral genotype) control.

To estimate the frequency of genotypes over time, 48 clones were picked from
the frozen population samples at generations 500, 1,500, 2,000, 2,500, 3,000, 4,000, and
5,000, and 95 clones were picked from the population at 1,000 generations (Table 8).
Clones were obtained by streaking the population sample onto minimal glucose plates
and picking colonies at random after 24 hours of growth at 37°C. The clones picked
were inoculated into 1 mL LB liquid media and allowed to grow overnight at 37°C. The
following day glycerol was added to achieve a 12% (v/v) ratio, and clones were then
stored at -80°C. Lack of amplification or failure to obtain an unambiguous signal caused
several clones to be eliminated from the analysis. Table 8 indicates the number of clones
used from each set. Not all mutations were checked in all clones; rather, the following
rules were followed: (i) if a mutation was present in all clones from one generation, then
all clones from later generations are assumed to have the mutation; and (ii) if a mutation
was not present in any clones from one generation, then all clones from previous
generations are assumed not to have the mutation. Bear in mind that all mutations in
this study are ones known to have been absent in the ancestor and eventually ﬁxed in the
population. Table 8 indicates precisely when each allele was assayed.
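
The two parsimony rules above can be sketched as a small routine that fills in unassayed samples. This is only an illustration: the function name and data layout are hypothetical, and the pattern of assayed and unassayed samples in the example is an assumption chosen to match the asterisks in Table 8, not a record of the actual assay schedule.

```python
def infer_unassayed(counts, totals):
    """Fill unassayed samples (None) using the two parsimony rules.

    counts: mutant-clone counts per sample in time order, None if not assayed.
    totals: number of clones scored in each sample, in the same order.
    Rule (i): fixed in an earlier assayed sample -> assume fixed later.
    Rule (ii): absent from a later assayed sample -> assume absent earlier.
    """
    inferred = list(counts)
    for i, count in enumerate(counts):
        if count is not None:
            continue
        fixed_earlier = any(
            counts[j] is not None and counts[j] == totals[j] for j in range(i)
        )
        absent_later = any(
            counts[j] is not None and counts[j] == 0
            for j in range(i + 1, len(counts))
        )
        if fixed_earlier:
            inferred[i] = totals[i]
        elif absent_later:
            inferred[i] = 0
    return inferred


# Clone totals for generations 500 ... 5000, as read from Table 7.
totals = [42, 89, 48, 48, 48, 46, 40, 48]
# Assumed assay patterns (None marks a sample taken to be unassayed).
rbs_fix = infer_unassayed([26, 51, 48, 48, None, None, None, None], totals)
ppr_fix = infer_unassayed([None, None, None, 0, 0, 3, 36, 48], totals)
print(rbs_fix)
print(ppr_fix)
```

A sample that satisfies neither rule is left as None, so gaps are never filled without support from an assayed sample.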

A subset of these clones was used for ﬁtness assays. Three representative clones
of each genotype from each generational sample were chosen. If there were fewer than
three of a given genotype available at a given time point, then all available clones of that
genotype were chosen. Here, “genotype” represents only the information we have from
the allele detection procedures as described above. Hence, clones we identify as
having the same genotype are only identical at these loci and as testable by these methods.
For example, it is possible that similar but not identical deletions in the ribose operon are
scored as identical due to limitations of the PCRFLP assay. Also, the RFLP detection
assays and the allele-specific PCR assays only detect mutations at their respective
recognition sites. For this reason, they cannot tell us if there is a mutation elsewhere in
the gene. Most importantly, none of the methods can detect mutations at other loci. It is
quite possible, and indeed we will show by fitness assays, that clones with the same
“genotype” as detected by these methods in fact differ at other loci where mutations
contribute to adaptation. Table 8 lists the genotypes of all clones assayed for ﬁtness.
Alleles at each locus are indicated in parentheses as follows: the allele present in the
ancestor is designated anc, the allele that will eventually be ﬁxed is designated ﬁx, and
any other alleles that are found are labeled alphabetically (A, B, ...) in the order in which
they appear.

Construction of the Muller-style plot. Figure 9 presents a plot to describe the
spread of genotypes through time and was created from the allele-frequency data in the
style of Muller (1932). In this plot, each color uniquely indicates a different genotype
and the width of a shaded region indicates the frequency of that genotype in each
generation. For each shaded region, the earliest occurrence of that genotype is placed
within the color of the genotype that gave rise to it by mutation (parsimoniously
assuming the fewest mutations possible). Frequency data were collected at generations
500, 1,000, 1,500, 2,000, 2,500, 3,000, 4,000 and 5,000. The plot also assumes that
genotype frequencies changed linearly between sample time points, which clearly does
not reﬂect reality but does give a reasonable picture of the spread of genotypes and the
coexistence of their subpopulations through time.
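
The linear-change assumption used for the plot can be illustrated with a short interpolation routine. This is a sketch under that stated assumption only; the function name is hypothetical, and the example uses the ppr(fix) counts as read from Table 7 for generations 2,500 to 5,000.

```python
def interpolate_frequency(generations, frequencies, g):
    """Linearly interpolate a genotype frequency at generation g.

    generations must be sorted; frequencies are the observed values at
    those sample points.
    """
    points = list(zip(generations, frequencies))
    for (g0, f0), (g1, f1) in zip(points, points[1:]):
        if g0 <= g <= g1:
            return f0 + (f1 - f0) * (g - g0) / (g1 - g0)
    raise ValueError("generation lies outside the sampled range")


sample_generations = [2500, 3000, 4000, 5000]
ppr_frequency = [0 / 48, 3 / 46, 36 / 40, 48 / 48]

# Estimated frequency halfway between the 3,000 and 4,000 generation samples.
print(interpolate_frequency(sample_generations, ppr_frequency, 3500))
```

Because selective sweeps follow roughly sigmoidal trajectories, values between sample points are best read as a drawing aid rather than as frequency estimates.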

Fitness assays. The standard ﬁtness assay for the long-term E. coli evolution
experiment was used, and it is described in more detail elsewhere (Lenski et al., 1991).
In all cases, fitness was measured against strain REL607, a genotype identical to the
ancestral strain used to found the population except that it contains a neutral mutation in
the arabinose operon. This mutation allows the cells to utilize the sugar L(+) arabinose
and causes a readily detectable phenotype (pink colored colonies) when mixed samples
are grown on tetrazolium arabinose (TA) indicator agar. Colonies of cells that cannot use
arabinose appear red on TA plates. The phenotype of being able to use arabinose is
labeled Ara+ and the phenotype of not being able to use arabinose is labeled Ara-.

Fitness assays were performed as follows. A particular clone (or population
sample) and REL607 were inoculated separately into ﬂasks containing LB liquid medium
from the freezer stocks and allowed to grow for 24 hours at 37°C. These cultures were
then diluted 100-fold, and 100 µL was transferred to 9.9 mL DM25 media. The cultures
were again allowed to grow separately for 24 hours at 37°C. The clone (or population
sample) to be assayed and REL607 were then mixed at a 1:1 volumetric ratio by
transferring 50 µL from each culture to 9.9 mL fresh DM25. A sample was taken from this
mixture and plated onto TA agar to estimate the initial number of each type. The
mixture was allowed to grow for 24 hours at 37°C, at which point a second sample of the
mixture was plated onto TA agar to estimate the final number of each type. Realized
Malthusian parameters (m) for each of the two competitors were calculated as
$m = \ln(N_f / N_i)$, where $N_f$ is the number of individuals present at the final count and $N_i$ is
the number present at the initial count, and where both counts reflect changes in overall
density based on the dilutions used for plating. Relative fitness of one competitor to the
other is then simply the ratio of their realized Malthusian parameters. Fitness estimates
were made in four blocks. Each block consisted of two assays of every clone, four assays
of every mixed population, and six assays of the ancestor.
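
The fitness calculation described above can be sketched as follows. The cell densities in the example are hypothetical values chosen for illustration, not data from this study, and the function names are not from the original methods.

```python
import math


def malthusian_parameter(n_initial: float, n_final: float) -> float:
    """Realized Malthusian parameter m = ln(N_f / N_i) over one assay cycle."""
    return math.log(n_final / n_initial)


def relative_fitness(test_initial, test_final, ref_initial, ref_final):
    """Ratio of the realized Malthusian parameters of the two competitors."""
    return (malthusian_parameter(test_initial, test_final)
            / malthusian_parameter(ref_initial, ref_final))


# Hypothetical densities (cells per mL), already corrected for plating dilutions.
w = relative_fitness(2.5e5, 3.2e7, 2.5e5, 1.8e7)
print(round(w, 3))
```

A value above 1 indicates that the test competitor grew faster than the reference over the assay; two competitors with identical growth yield exactly 1.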

Statistical analyses. The purpose in collecting the fitness data was twofold.
First, we were interested in the competitive dynamics among the coexisting genotypes.
Second, temporal changes in the ﬁtness of genotypes deﬁned on the basis of known
mutations may tell us when clones of a particular genotype have acquired additional
beneﬁcial mutations that have not been identiﬁed. These issues were addressed by
analyzing the data from each time point separately in a mixed general linear model.
Genotype was treated as a ﬁxed effect, and clone was treated as a random effect nested
within genotype. The experimental block in which each ﬁtness estimate was obtained
was entered into the statistical model as a random factor. Additionally, for the three
generational time points in which more than two genotypes were present, one planned
contrast was performed. This contrast is a two-tailed test for a difference between the
genotype that was most closely related to the eventual winners and all other genotypes
present. This contrast is reported in Table 9 as E.W. as shorthand for the eventual
winner. The MIXED procedure in SAS (version 8.0) was used. The MIXED procedure
is a generalization of the standard linear model that allows the data to exhibit internal
correlation and nonconstant variation (SAS Institute Inc, 1999), both of which may exist
in our data set. Also, the maximum likelihood methodology in MIXED is more robust to
imbalance in the data structure than standard ANOVA methods. Imbalance occurred in
our data set whenever fewer than three clones of a given genotype were present in a
particular generation. The restricted maximum likelihood method (REML) was used to
estimate three models:

(A) In the full model, each fitness estimate, $y_{ijkl}$, is the sum of the mean of genotype i, $\mu_i$,
the effect of the jth randomly selected clone within genotype i, $c_{j(i)}$ (~iid $N(0, \sigma_c^2)$),
the effect of the kth block, $b_k$ (~iid $N(0, \sigma_b^2)$), and the error from the lth measurement,
$e_{ijkl}$ (~iid $N(0, \sigma_e^2)$),

$y_{ijkl} = \mu_i + c_{j(i)} + b_k + e_{ijkl}$;

(B) In the first reduced model, the effect of clone (nested in genotype) is dropped from
the full model,

$y_{ijkl} = \mu_i + b_k + e_{ijkl}$;

(C) In the second reduced model, the block effect is also dropped,

$y_{ijkl} = \mu_i + e_{ijkl}$.

For each model a likelihood value is generated, which indicates the probability of
the data given the model. When one model is a subset of another, the difference in the -2
log likelihood values of the two models is expected to fit a $\chi^2$ distribution with degrees of
freedom equal to the difference in the number of parameters in the two models (Self and
Liang, 1987; Littell, 1996). Thus, tests of the factor clone were made by comparing the
full model (A) to the model identical but for the factor clone (B). This comparison tests
the null hypothesis that the variance attributable to clone is 0 (i.e., $H_0$: $\sigma_c^2 = 0$, $H_a$: $\sigma_c^2 > 0$).
Finally, since $\sigma_c^2$ is bounded to be non-negative, the relevant probability is half that
inferred from the $\chi^2$ value (Littell, 1996). By identical reasoning, the test of the block
effect (i.e., $\sigma_b^2 > 0$) is one-half the probability of a $\chi^2$ greater than the difference in -2 log
likelihoods of model C and model B.
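
The halved likelihood ratio test for a single variance component can be sketched with the standard library alone, using the identity that the upper tail of a chi-square distribution with one degree of freedom equals erfc(sqrt(x/2)). The function names are illustrative; the analyses themselves were run in SAS, and this sketch only reproduces the final probability step.

```python
import math


def chi2_upper_tail_df1(x: float) -> float:
    """P(chi-square with 1 df > x), via the complementary error function."""
    return math.erfc(math.sqrt(max(x, 0.0) / 2.0))


def variance_component_p(delta_neg2_loglik: float) -> float:
    """One-sided p-value for H0: variance component = 0.

    delta_neg2_loglik is the difference in -2 log likelihood between the
    reduced and the full model; the chi-square probability is halved
    because the variance is bounded at zero.
    """
    return 0.5 * chi2_upper_tail_df1(delta_neg2_loglik)


# Test statistics reported for generation 500 in Table 9 (clone, block).
print(round(variance_component_p(1.2), 4))
print(round(variance_component_p(10.5), 4))
```

A test statistic of 0, which arises when the estimated variance component sits on its boundary, gives a probability of exactly 0.5 under this convention.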

The ﬁxed effect of genotype and the planned E.W. contrast were both tested with
F statistics using the type III sums of squares from the most complex model with significant
support (A, B, or C). Degrees of freedom from the Satterthwaite approximation were
used to adjust for imbalances in the data (Satterthwaite, 1946; Littell, 1996).

Subsidiary evolution experiment. Ten additional populations were evolved under
conditions identical to those used for Ara-1. These populations each started with a 1:1
ratio of clones REL606 and REL607. Thus, the founding populations were genetically
homogenous except for a neutral marker, which determines the ability to use arabinose,
with each allele present at an initial frequency of 0.5. A sample of each population,
containing about 400 colony forming units (CFU) was plated onto TA agar every three
days (~20 generations). As described above, Ara+ colonies appear pink while Ara-
colonies are red, and the number of each type was recorded for each sample. This
experiment follows closely the methodology of a previous experiment designed to
capture the ﬁrst adaptive mutational step (Rozen, de Visser and Gerrish, 2002), but it ran
more than twice as long as that earlier experiment to allow multiple adaptive steps.

The expected mean time to fixation by a process of random drift alone is given by
$-4N[p\log(p) + (1 - p)\log(1 - p)]$ (Kimura and Ohta, 1969). The effective population size
in these experiments, taking into account the bottleneck during the daily transfers, is
about $3 \times 10^7$ cells (Lenski et al., 1991). Given an initial p = 0.5, fixation by random drift
would require >90 million generations, and very long periods would be needed to
produce any measurable change in frequency. Therefore, the substantial shifts observed
between 100 and 900 generations reflect the effects of selection on beneficial mutations
that arise in one or both genetic backgrounds.
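
The drift calculation above can be checked numerically. Two assumptions are made in this sketch: the logarithm is taken to be the natural logarithm, and the effective population size is set to 3.3 x 10^7, a value consistent with "about 3 x 10^7" that reproduces the stated figure of more than 90 million generations. Neither value is given explicitly in the text.

```python
import math


def mean_drift_time(n_e: float, p: float) -> float:
    """Expected time under drift alone, -4N[p ln p + (1 - p) ln(1 - p)]."""
    return -4.0 * n_e * (p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


ASSUMED_NE = 3.3e7  # assumption, see lead-in
generations = mean_drift_time(ASSUMED_NE, 0.5)
print(f"{generations:.3g} generations")
```

With N set to exactly 3 x 10^7 the same expression gives roughly 83 million generations, so the qualitative conclusion does not depend on the precise value used.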

RESULTS

Order of beneﬁcial mutations. Figure 9 and Table 7 show the frequencies of each
known mutation in each generational sample tested in the focal population. These data
document the timing and order in which these beneﬁcial mutations arose and were
substituted in the population. The ﬁrst beneﬁcial mutation identiﬁed thus far in that
population, among those that eventually ﬁx, is a deletion in the ribose operon (rbs(ﬁx)).
Onto this genetic background, beneficial mutations arose, in order, in topA, spoT, glmUS,
pykF, and ppr. That the ribose operon mutation preceded the topA mutation is evident
from the existence of clones with the rbs(ﬁx) mutation that do not have the topA(ﬁx)
mutation. Likewise, there was a clone that had the spoT(ﬁx) mutation that did not have
the glmUS(ﬁx) mutation, implying that the glmUS mutation arose in a background that
carried the spoT mutation. In all cases, clones with a given mutation also had each of the
preceding mutations.

Beneﬁcial mutations that do not ﬁx. In addition to the mutations that eventually
fixed in the population, we found two mutations in the ribose operon at 500 and 1000
generations that did not reach ﬁxation. We did not know beforehand to look for these
mutations. They were found because the PCRFLP assay, used to distinguish the rbs(anc)
allele from the rbs(ﬁx) allele, also indicated that other fragment lengths were present.
Deletions in the ribose operon occur at an elevated rate and are mediated by an IS150
element that is located directly upstream of the operon, and they also confer a small

Figure 9. A. Muller-style graph of changing frequencies of mutant genotypes in
population Ara-1 through time. The height of the shaded regions indicates relative
frequencies estimated from samples taken in generations 0, 500, 1000, 1500, 2000, 2500,
3000, 4000, and 5000 with linear interpolation between these time points. Genotypes are
labeled by the gene that contains the latest known beneficial mutation. B. Mean fitness
relative to the ancestor of the population samples taken from the same generations. Error
bars indicate 95% confidence intervals. (Data from Table 8.) Images in this dissertation
appear in color.

 

[Figure 9 image not reproduced. Legend: not any, rbs(A), rbs(B), rbs(fix), topA, spoT, glmUS, pykF, ppr. Horizontal axes: generations, 0 to 5000.]

Table 7. Frequency of clones with various mutations in each generational sample. The
total number of clones tested is shown across the top. Cells shaded grey were not tested
but the numbers shown are based on parsimonious assumptions that mutations had not
yet arisen (entry equals zero) or had already fixed (entry equals total). The assay column
indicates the method used to test for the presence of each particular mutation. PCRFLP
stands for polymerase chain reaction fragment length polymorphism. RFLP stands for
restriction fragment length polymorphism; the restriction enzyme that was used is shown
in parentheses. The mutation in glmUS was detected using a primer pair in which the 3'
end of one primer matched the mutation such that a PCR product should be produced
only when the mutant allele is present. See Methods section for further details. See
references listed for full descriptions of the mutations and how they were determined to be
beneficial.

generation   500  1000  1500  2000  2500  3000  4000  5000  assay                reference
Total         42    89    48    48    48    46    40    48
rbs(anc)       2     1     0     0     0     0     0     0  PCRFLP
rbs(A)        14    32     0     0     0     0     0     0  PCRFLP               This chapter
rbs(B)         0     5     0     0     0     0     0     0  PCRFLP               This chapter
rbs(fix)      26    51    48    48    48    46    40    48  PCRFLP               Cooper et al., 2001
topA(fix)     22    44    48    48    48    46    40    48  RFLP (NmuCI)         Crozat et al., 2004
spoT(fix)      0    34    48    48    48    46    40    48  RFLP (Hin 41)        Cooper et al., 2003
glmUS(fix)     0    33    48    48    48    46    40    48  Allele specific PCR  Stanek and Lenski, in prep.
pykF(fix)      0     0     6     6    45    45    40    48  PCRFLP               Schneider et al., 2000
ppr(fix)       0     0     0     0     0     3    36    48  RFLP (MboI)          Chapter 1

 

ﬁtness advantage in the glucose-limited environment of the evolution experiment
(Cooper et al., 2001).

Following completion of the allele detection and ﬁtness assays, it became clear
that beneficial mutations arose in the population that did not ﬁx, and that have not yet
been identiﬁed. For example, if the clones from the 1000-generation sample that have
either the rbs(A) or the rbs(B) allele contained no other beneﬁcial mutations, then we
would expect their ﬁtness advantage to be only about 1-2% relative to the ancestor
(Cooper et al., 2001). Their fitness advantages are, in fact, much greater, indicating they
have additional beneﬁcial mutations that have not yet been discovered in genetic analyses
of this population. Therefore, we made two targeted attempts to identify some of these
other beneficial mutations that must have been present but did not fix.

In the first effort to identify these unknown mutations, we examined two clones
from generation 1000, one having the rbs(A) allele (clone REL10651) and the other
having the rbs(B) allele (clone REL10654). Three genes, pykF, nadR and spoT, were
chosen as good candidates because mutations in these genes eventually ﬁx in this
population and because similar mutations arose in independent populations under the
same conditions (Cooper, Rozen and Lenski, 2003; Chapter 1). Sequencing did not
reveal mutations in any of these genes in REL10654. One non-synonymous mutation
was found in pykF in REL10651, but its nadR and spoT sequences were identical to the
ancestor. This mutation in pykF eliminates a recognition site for the restriction enzyme
NlaIII, and this fact was used both to double check the accuracy of the sequencing data
and to test for the presence of the mutation in the other clones having the rbs(A) allele.
This RFLP assay revealed that all three clones sampled from generation 1000, which
have the rbs(A) allele and for which we have fitness estimates, possess the same pykF
mutation. Two 500-generation clones that have the rbs(A) allele were checked and did
not have the mutation in pykF. This subpopulation evidently continued to evolve
between generations 500 and 1000.

Our second attempt to ﬁnd beneficial mutations that did not ﬁx examined a clone,
REL10625, sampled from generation 1500 that did not have the pykF mutation that
eventually fixed in this population. The presence of an unidentified beneficial mutation
was strongly suspected in this case because the three clones without the pykF(fix) allele
were just as fit as the three clones with the pykF(fix) allele (see below). This fact further
suggested we look at the pykF gene for the missing beneficial mutation. Sequencing the
pykF gene in clone REL10625 revealed a 7-bp deletion in the reading frame causing a
frame shift.

Lack of recombination. The E. coli B strain used to start population Ara-1 was
thought to have no means of genetic recombination between separate cells. E. coli is not
naturally competent, and the ancestral B strain carries no plasmids and seems not to
harbor any functional lysogenic phage. Consistent with this view, we found no evidence
for recombination among clones of any of the alleles assayed in this work. Speciﬁcally,
there were 177 clones assayed when alleles at multiple loci were polymorphic in the
population. Each of these clones could potentially have revealed a putative
recombination event, yet in no case was a genotype seen that was inconsistent with an
absence of recombination.
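
One way to formalize "inconsistent with an absence of recombination" is the four-gamete test, sketched below. This is a swapped-in illustration and not necessarily the check used in this study: for any pair of biallelic loci, strictly clonal descent without recurrent mutation can produce at most three of the four possible allele combinations. The genotypes in the example encode presence (1) or absence (0) of the eventually fixed allele at four loci, patterned on the 1000-generation clones in Table 8; the final "recombinant" genotype is hypothetical.

```python
from itertools import combinations


def four_gamete_violations(genotypes, loci):
    """Return locus pairs at which all four allele combinations occur.

    genotypes: iterable of tuples of 0/1 alleles, one entry per locus.
    loci: locus names, in the same order as the tuple entries.
    """
    violations = []
    for a, b in combinations(range(len(loci)), 2):
        observed = {(g[a], g[b]) for g in genotypes}
        if len(observed) == 4:
            violations.append((loci[a], loci[b]))
    return violations


loci = ["rbs", "topA", "spoT", "glmUS"]
clonal = [(1, 1, 1, 1), (1, 1, 1, 0), (1, 1, 0, 0), (1, 0, 0, 0), (0, 0, 0, 0)]
print(four_gamete_violations(clonal, loci))

# A hypothetical clone carrying topA(fix) without rbs(fix) would be flagged.
print(four_gamete_violations(clonal + [(0, 1, 0, 0)], loci))
```

A nested series of genotypes, in which each mutation occurs only on the background of its predecessors, never triggers the test.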

Dynamics of adaptation in population Ara-1. We can now combine the allele
frequency data with the fitness data, along with some previously acquired information, to
explain the dynamics of adaptation in population Ara-1 over the first 5,000 generations of
the long-term experiment. The resulting picture is not complete, but it does clearly show
that clonal interference was intense. Three important patterns will be described in the
data. First, numerous beneficial mutations arise and reach a high frequency in the
population, but only some of them eventually fix in the population as a whole.
Second, some sub-populations may acquire multiple beneﬁcial mutations, yet still fail to
be successful in the long run. Finally, these sub-populations are genetically different
from the eventual winner either at the level of the locus or the particular allele within a
locus.

At generation 500 there were four distinguishable genotypes that had, as follows,
the rbs(A) allele, the rbs(ﬁx) allele, both the rbs(ﬁx) allele and the topA(ﬁx), or no
known mutations. Fitness estimates for the clones having both the rbs(ﬁx) and topA(ﬁx)
alleles are consistent with the expectation obtained from the multiplicative combination
of their separately estimated effects. That is, relative ﬁtness values of 1.014 for ribose
(Cooper et al., 2001) and 1.133 for topA (Crozat et al., 2004), respectively, are expected
to produce a combined fitness of 1.148; the least squares mean for this genotype from
the ANOVA based on clones from generation 500 is 1.136, with the corresponding 95%
conﬁdence interval ranging from 1.104 to 1.167. On the other hand, the clones with no
known mutations, or known mutations only in the ribose operon, have ﬁtness much
greater than expected. This discrepancy indicates that the clones must have beneﬁcial
mutations that have not been identified. It is not presently known how many other
mutations account for the additional ﬁtness advantage. Also, in the ANOVA based on

the 500-generation clones, the factor genotype is significant (p = 0.0018). The planned

Table 8. Fitness assays for 8 population samples and 61 isolated clones, with all ﬁtness
values relative to strain REL607. REL607 is identical to the ancestral strain except that it
has the selectively neutral Ara+ marker. * indicates that the allele state is assumed; see
text and Table 7 for details. Adding and subtracting the parenthetical +/- value gives the
95% conﬁdence interval for the ﬁtness estimate. For all genes, “anc” indicates the
ancestral allele, “ﬁx” indicates the allele that eventually ﬁxed in the population, and other
alleles are named in alphabetical order (A, B, ...). The clone column indicates whether
the test material was a clone or a population sample (including all the genetic variation in
the population when it was sampled).

 

time clone REL label relative fitness +/- rbs topA spoT glmUS pykF ppr
0 yes 606 anc anc anc anc anc anc
500 no 762 1.195 (0.033)
1000 no 964 1.225 (0.041)
1500 no 1068 1.336 (0.039)
2000 no 1164 1.352 (0.034)
2500 no 1282 1.347 (0.042)
3000 no 1483 1.387 (0.053)
4000 no 1890 1.437 (0.046)
5000 no 2179 1.458 (0.055)
500 yes 10632 1.127 (0.037) fix ﬁx anc anc *anc *anc
500 yes 10633 1.150 (0.041) ﬁx ﬁx anc anc *anc *anc
500 yes 10634 1.137 (0.024) ﬁx ﬁx anc anc *anc *anc
500 yes 10635 1.135 (0.041) ﬁx anc anc anc *anc *anc
500 yes 10636 1.197 (0.037) ﬁx anc anc anc *anc *anc
500 yes 10637 1.265 (0.106) fix anc anc anc *anc *anc
500 yes 10638 1.204 (0.034) A anc anc anc anc *anc
500 yes 10639 1.205 (0.064) A anc anc anc *anc *anc
500 yes 10640 1.182 (0.081) A anc anc anc anc *anc
500 yes 10641 1.228 (0.074) anc anc anc anc *anc *anc
500 yes 10642 1.230 (0.083) anc anc anc anc *anc *anc
1000 yes 10643 1.284 (0.090) ﬁx ﬁx ﬁx ﬁx anc *anc
1000 yes 10644 1.243 (0.067) ﬁx ﬁx ﬁx ﬁx anc *anc
1000 yes 10645 1.261 (0.053) ﬁx ﬁx fix ﬁx anc *anc
1000 yes 10646 1.207 (0.059) fix ﬁx fix anc anc *anc
1000 yes 10647 1.219 (0.052) fix fix anc anc anc *anc
1000 yes 10648 1.208 (0.031) ﬁx fix anc anc anc *anc
1000 yes 10649 1.252 (0.065) ﬁx ﬁx anc anc anc *anc
1000 yes 10650 1.220 (0.038) ﬁx anc anc anc anc *anc
1000 yes 10651 1.227 (0.029) A anc anc anc A *anc
1000 yes 10652 1.230 (0.057) A anc anc anc A *anc
1000 yes 10653 1.248 (0.041) A anc anc anc A *anc
1000 yes 10654 1.263 (0.063) B anc anc anc anc *anc
1000 yes 10655 1.282 (0.074) B anc anc anc anc *anc
1000 yes 10656 1.271 (0.044) B anc anc anc anc *anc
1000 yes 10657 1.228 (0.053) anc anc anc anc anc *anc
1500 yes 10624 1.329 (0.068) ﬁx ﬁx ﬁx ﬁx ﬁx *anc
1500 yes 10658 1.305 (0.064) ﬁx ﬁx ﬁx ﬁx ﬁx *anc
1500 yes 10659 1.314 (0.047) ﬁx ﬁx ﬁx ﬁx ﬁx *anc
1500 yes 10625 1.336 (0.049) ﬁx ﬁx ﬁx ﬁx anc *anc
1500 yes 10660 1.347 (0.073) ﬁx ﬁx ﬁx ﬁx anc *anc
1500 yes 10661 1.297 (0.087) ﬁx ﬁx ﬁx ﬁx anc *anc
2000 yes 10662 1.359 (0.086) ﬁx *ﬁx *ﬁx *ﬁx ﬁx anc
2000 yes 10663 1.352 (0.082) ﬁx *ﬁx *ﬁx *ﬁx ﬁx anc
2000 yes 10664 1.292 (0.035) ﬁx *ﬁx *ﬁx *ﬁx ﬁx anc
2000 yes 10665 1.393 (0.139) ﬁx *ﬁx *ﬁx *ﬁx anc anc
2000 yes 10666 1.369 (0.058) ﬁx *ﬁx *ﬁx *ﬁx anc anc
2000 yes 10667 1.272 (0.106) fix *fix *fix *fix anc anc
2500 yes 10668 1.411 (0.072) *ﬁx *ﬁx *ﬁx *ﬁx ﬁx anc
2500 yes 10669 1.381 (0.070) *ﬁx *ﬁx *ﬁx *ﬁx ﬁx anc
2500 yes 10670 1.285 (0.050) *ﬁx *ﬁx *ﬁx *ﬁx ﬁx anc
2500 yes 10671 1.326 (0.042) *ﬁx *ﬁx *ﬁx *ﬁx anc anc
2500 yes 10672 1.327 (0.042) *ﬁx *ﬁx *ﬁx *ﬁx anc anc
2500 yes 10673 1.225 (0.214) *ﬁx *ﬁx *ﬁx *ﬁx anc anc
3000 yes 10674 1.367 (0.058) *fix *ﬁx *fix *ﬁx ﬁx ﬁx
3000 yes 10675 1.392 (0.053) *ﬁx *ﬁx *ﬁx *ﬁx ﬁx ﬁx
3000 yes 10676 1.429 (0.082) *ﬁx *ﬁx *ﬁx *ﬁx ﬁx ﬁx
3000 yes 10677 1.417 (0.080) *fix *ﬁx *ﬁx *ﬁx ﬁx anc
3000 yes 10678 1.331 (0.053) *fix *fix *fix *fix fix anc
3000 yes 10679 1.402 (0.042) *ﬁx *ﬁx *ﬁx *ﬁx ﬁx anc
3000 yes 10680 1.378 (0.055) *ﬁx *ﬁx *ﬁx *ﬁx anc anc
4000 yes 10681 1.482 (0.086) *ﬁx *ﬁx *fix *ﬁx ﬁx ﬁx
4000 yes 10682 1.442 (0.066) *ﬁx *ﬁx *ﬁx *ﬁx ﬁx ﬁx
4000 yes 10683 1.426 (0.051) *ﬁx *ﬁx *ﬁx *ﬁx ﬁx ﬁx
4000 yes 10684 1.403 (0.051) *ﬁx *ﬁx *ﬁx *ﬁx ﬁx anc
4000 yes 10685 1.448 (0.056) *ﬁx *ﬁx *ﬁx *ﬁx ﬁx anc
4000 yes 10686 1.424 (0.079) *ﬁx *ﬁx *ﬁx *ﬁx ﬁx anc
5000 yes 10687 1.417 (0.056) *ﬁx *ﬁx *ﬁx *ﬁx ﬁx ﬁx
5000 yes 10688 1.477 (0.070) *ﬁx *ﬁx *ﬁx *ﬁx ﬁx ﬁx
5000 yes 10689 1.406 (0.106) *ﬁx *ﬁx *ﬁx *ﬁx ﬁx ﬁx

 


Table 9. Analyses of variation in ﬁtness among genotypes and clones within genotypes in
population Ara-1. Results were obtained using the MIXED procedure. The signiﬁcance
of the random factors, clone and block, were evaluated using a likelihood ratio test
(LRT). The signiﬁcance of the ﬁxed factor genotype, and of the contrast between the
eventual winner and other genotypes, E.W., were evaluated by partial F tests using the
most complicated model with support (model C for generations 2000 and 3000, and

model B for all other generations). See Statistical Analyses in the Methods section for
further details.

time  factor    test  df       test statistic  p
500   clone     LRT   1        1.2             0.1367
      block     LRT   1        10.5            0.0006
      genotype  F     3, 80    5.45            0.0018
      E.W.      F     1, 80    15.77           0.0002
1000  clone     LRT   1        0               0.5000
      block     LRT   1        10              0.0008
      genotype  F     6, 108   0.92            0.4866
      E.W.      F     1, 108   1.49            0.2247
1500  clone     LRT   1        0               0.5000
      block     LRT   1        7.2             0.0036
      genotype  F     1, 42    0.31            0.5809
2000  clone     LRT   1        0.1             0.3759
      block     LRT   1        0.2             0.3274
      genotype  F     1, 44    0.04            0.8381
2500  clone     LRT   1        0.5             0.2398
      block     LRT   1        5.6             0.0090
      genotype  F     1, 41    1.87            0.1794
3000  clone     LRT   1        0.7             0.2014
      block     LRT   1        2.1             0.0736
      genotype  F     2, 52    0.14            0.8674
      E.W.      F     1, 52    0.29            0.5956
4000  clone     LRT   1        0               0.5000
      block     LRT   1        8.9             0.0014
      genotype  F     1, 39.1  0.78            0.3820
5000  clone     LRT   1        0               0.5000
      block     LRT   1        0               0.5000

contrast of the genotype most similar to the eventual winner to the other genotypes
indicates that the genotype with both the rbs(ﬁx) and topA(ﬁx) mutations is, in fact,
different from the rest of the population (p = 0.0002). Surprisingly, however, that
genotype was, at generation 500, the least fit (Figure 10A).
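
The multiplicative expectation for the rbs(fix) topA(fix) genotype, discussed above, can be checked with a few lines of arithmetic. The sketch simply multiplies the two separately estimated relative fitness values quoted in the text and asks whether the product falls inside the reported 95% confidence interval; the function name is illustrative.

```python
import math


def multiplicative_expectation(effects):
    """Expected relative fitness if the individual effects combine multiplicatively."""
    return math.prod(effects)


expected = multiplicative_expectation([1.014, 1.133])  # rbs and topA effects
ci_low, ci_high = 1.104, 1.167  # reported 95% CI around the observed mean of 1.136
print(ci_low <= expected <= ci_high)
```

The same comparison, applied to clones whose measured fitness far exceeds the product of their known effects, is what points to additional unidentified beneficial mutations.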

The 500-generation genotype that is the ancestor to the eventual winner was still
present at 1000 generations (Table 8). It also gave rise to two other sub-populations that
were present at 1000 generations, each of which harbors one or two alleles that
eventually fixed in the population. One of these new sub-populations has a mutation in
spoT as well as the rbs(ﬁx) and topA(ﬁx) alleles. The other new sub-population has
mutations in glmUS along with the rbs(ﬁx), topA(ﬁx), and spoT(ﬁx) alleles. There was
also one clone at 1000 generations that has the rbs(ﬁx) allele but none of the other alleles
that were eventually ﬁxed. The 1000-generation clones that have only the rbs(ﬁx) and
topA(ﬁx) alleles are signiﬁcantly more ﬁt than clones with those same mutations at 500
generations, again indicating the presence of other unidentiﬁed beneﬁcial mutations.
Two other genotypes appear in the 1000-generation sample that were not present at
generation 500, each containing alleles that do not eventually ﬁx. One of these contains
another distinct mutation in rbs, designated the rbs(B) allele. The other contains a
mutation in pkyF, designated the pku (A) allele, on the background that also contains the
rbs(A) allele; this genotype therefore carries two mutations that do not ﬁx, both in loci
that acquire other mutations that do eventually ﬁx in this population.

By generation 1000, one or more beneficial mutations had arisen at five
different loci (rbs, topA, spoT, glmUS, and pykF), yet none of these mutations had
fixed in the population (Fig. 1B), despite a gain in mean fitness of more than 20% (Table
8). Four different sub-populations existed that shared no known mutations: those
with at least one mutation that will fix (at the rbs, topA, and spoT loci), those with rbs(A)
and pykF(A), those with rbs(B) only, and those with no known mutations. In total,
seven distinct genotypes were identified. Despite this extensive genetic diversity, there
was no significant variation in fitness among these genotypes, nor among clones within
genotypes, at 1000 generations (Table 9; Figure 10B). This similarity in fitness among
genotypes with and without the several known beneficial mutations indicates, once again,
that many clones carry one or more unknown beneficial mutations.

Figure 10. Fitness of the population and clones of the major genotypes sampled at (A)
500 generations and (B) 1000 generations, relative to the founding clone. The label
'mixed pop.' indicates the entire population with all the genetic variation that it
contained. All other bars represent the fitness of a single clone. The clones are grouped
and colored according to the genotype assigned by the allele detection assays. Group
labels indicate the gene with the latest known beneficial mutation; for example, all topA
clones also have the rbs(fix) allele. Error bars show 95% confidence intervals. There
was significant variation among the genotypes at generation 500 but not generation 1000
(Table 9). Images in this dissertation appear in color.

[Figure 10 image not reproduced. Ordinate in both panels: relative fitness. Panel A
groups: mixed pop., topA, rbs(fix), rbs(A), not any. Panel B groups: mixed pop., glmUS,
spoT, topA, rbs(fix), rbs(A), rbs(B), not any.]

Between 1000 and 1500 generations, four known beneficial mutations finally
became fixed in the population: rbs(fix), topA(fix), spoT(fix), and glmUS(fix). Also, a
mutation arose in pykF that would later fix; this pykF(fix) allele had reached a frequency of
about 12.5% at generation 1500 (Table 7). Thus, we see clearly that the population was
swept not by a single beneficial allele but by a linked set of several beneficial mutations,
which evidently accumulated as a consequence of clonal interference that prevented any
single beneficial mutation from sweeping through the population on its own. Once again,
the ANOVA based on the fitness values estimated for the 1500-generation clones reveals
no differences among the genotypes (with and without the pykF allele that
later fixed), nor among the clones within each genotype (Table 9). This analysis implies
that the genotypes that do not have pykF(fix) have an unidentified beneficial mutation.
Further evidence for the equivalence in fitness of the competing genotypes with and
without pykF(fix) lies in the fact that 500 generations later, at generation 2000, the
frequency of the pykF(fix) allele remained at 12.5% (Table 7). Subsequent sequencing
showed that at least part of the population was equally matched because it had a different
mutation in pykF than the one that eventually fixed.

At generation 2500, the pykF(fix) allele finally became numerically dominant
(Table 7). This finding suggests that an unidentified beneficial mutation had arisen on
this genetic background and driven the frequency increase. It should be noted that this is
the only time we have unequivocal evidence for a mutation that will eventually fix that
has not been identified, although it is possible that another beneficial mutation went
undetected with the earliest set of mutations. The subpopulation that does not have the
pykF(fix) allele was still present, indicating either that it was being replaced or that it, too,
had acquired an additional beneficial mutation. The lack of significant variation in fitness
(Table 9) lends support to the latter interpretation. Also, members of this subpopulation
were present, albeit at low frequency, at 3000 generations, indicating that it may have
acquired two beneficial mutations since the time it diverged from the lineage that will
eventually win. By generation 4000, the pykF(fix) allele had finally fixed in the
population after appearing sometime before generation 1500. The final beneficial
mutation tracked in this study lies in ppr. The ppr(fix) allele was first seen at 3000
generations, had become common by generation 4000, and was found in all clones
screened at 5000 generations.

Subsidiary evolution experiment. The results from the allele-detection assays in
population Ara-1 indicate that the ancestral genotype was displaced not by a single
beneficial allele, but instead by a cohort of multiple independently derived and distinct
beneficial mutations. The subpopulations carrying these mutations then coexisted through
several adaptive steps. It is of interest to know whether this was an unusual occurrence or
an outcome to be expected in this system. To that end, we allowed ten additional
populations to evolve under the same conditions and with the same starting genotype as
population Ara-1, except that a neutral allele was present at an initial frequency of 0.5.
The frequency of that neutral allele was measured every 3 days (~20 generations), and the
resulting data are shown in Figure 11. We attempted to get three types of information
from these populations. First, we estimated the time at which the first beneficial mutation,
or cohort of beneficial mutations, swept through the population. Second, we looked for the
presence of multiple competing subpopulations during the sweep. Finally, we used the
data to estimate the number of competing subpopulations that persisted through the first
step.

To estimate the time at which the first beneficial mutation or cohort began to
eliminate the ancestral genotype, we identified the time at which the ratio of Ara- to Ara+
cells first deviated significantly from 1. Our measurement of that ratio is a sample from a
binomial distribution where the expected value is the actual ratio in the population. We
concluded that a significant shift had occurred when an observed ratio was unlikely to
have been drawn from a population with a 1:1 ratio, using a conservative probability
cutoff of p < 0.001. On average 433 colonies were counted from each population at each
time point. Populations 1-10 first deviated from the ancestral ratio at generations 200,
200, 140, 260, 200, 200, 180, 180, 260, and 180, respectively. The mean time is 200
generations, which agrees well with the timing of the first step in population Ara-1 based
on changes in fitness and cell morphology (Lenski and Travisano, 1994; Elena, Cooper
and Lenski, 1996).
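The deviation criterion described above can be sketched as follows. The text does not say which binomial test was used, so this illustration assumes an exact two-sided test against a 1:1 ratio; the function names and the example colony counts (260 versus 173, and 230 versus 203, each totalling the average of 433 colonies) are hypothetical.

```python
import math


def two_sided_binomial_p(successes: int, trials: int) -> float:
    """Exact two-sided p-value for a binomial count under a 1:1 expectation."""
    lower = sum(math.comb(trials, i) for i in range(0, successes + 1))
    upper = sum(math.comb(trials, i) for i in range(successes, trials + 1))
    # The null distribution is symmetric at p = 0.5, so double the smaller tail.
    return min(1.0, 2 * min(lower, upper) / 2 ** trials)


def deviates_from_even(ara_minus: int, ara_plus: int, alpha: float = 0.001) -> bool:
    """True when the Ara-/Ara+ colony counts are unlikely under a 1:1 ratio."""
    return two_sided_binomial_p(ara_minus, ara_minus + ara_plus) < alpha


# Generations at which populations 1-10 first deviated, as listed in the text.
first_deviation = [200, 200, 140, 260, 200, 200, 180, 180, 260, 180]
mean_time = sum(first_deviation) / len(first_deviation)

print(mean_time)                     # mean time of first deviation
print(deviates_from_even(260, 173))  # hypothetical counts, strong skew
print(deviates_from_even(230, 203))  # hypothetical counts, mild skew
```

With roughly 433 colonies, a count has to sit about 34 or more colonies away from an even split before it crosses the p < 0.001 cutoff, which is why the criterion is described as conservative.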

Figure 11. Changes in frequency of a neutral allele over ~880 generations of evolution in
ten additional populations. An average of 433 colonies were counted from each
population every 20 generations (3 days). The ordinate indicates the log10 ratio of
Ara-/Ara+ colonies, and the abscissa indicates time in generations. Notice the near
constancy of the ratio for the first 100 generations or so, and the substantial shifts
thereafter; such dynamics are hallmarks of selection acting on beneficial mutations that
happen in one or both marked genetic backgrounds.

[Figure 11 image not reproduced. Axes: log(red/pink) colony ratio (ordinate) against
generations (abscissa).]

Next we asked whether the ancestral population was being eliminated by one or
multiple subpopulations. Visual inspection of Figure 11 suggests that multiple
subpopulations were replacing the ancestor in at least nine out of the ten populations.
Following the initial shift away from 1, the ratio then either levels out or reverses
direction at least once in nine populations. These kinds of dynamics are possible only if
subpopulations with beneficial mutations have independently arisen on each of the two
ancestral backgrounds (Ara- and Ara+). In fact, at least four of the ten populations still
had not fixed one or the other Ara marker allele by 880 generations, demonstrating that
no beneficial mutation had swept to fixation. By generation 1000, population Ara-1 was
already approaching its third adaptive step (Lenski and Travisano, 1994); yet it still had
not fixed any known mutation, even though five loci were polymorphic for beneficial
mutations (Table 8).

We would like to estimate the number of subpopulations harboring independent
beneficial mutations that compete in these populations. To that end, we used a maximum
likelihood approach to estimate this number during the initial adaptive step. We define a
contending beneficial mutation as one that reaches a high enough frequency to be
detected with our sampling (at least one in ~433). The number of contending mutations
that make up the first adaptive step in a population is designated n, and we want to
calculate its likelihood function given that nine out of ten populations remained
polymorphic through the first adaptive step. For one population, the probability of
remaining polymorphic is one minus the probability of becoming fixed, which can only
happen if all n contending mutations occur in the same genetic background (either Ara+
or Ara-). Therefore, for a single population, the probability of remaining polymorphic is
$1 - (f^n + (1 - f)^n)$, where f is the frequency of the neutral allele (equal to 0.5 in our
experiment). Expanding this equation to the probability of nine of ten populations
remaining polymorphic, the likelihood function for n is

\[ L(n) = \binom{10}{9} \left(1 - \left(f^n + (1 - f)^n\right)\right)^{9} \left(f^n + (1 - f)^n\right). \]

Figure 12 plots this function for n from 1 to 20. The 95% confidence interval was
estimated using a likelihood ratio test (Sokal and Rohlf, 1995). The exact probability
distribution for this case is unknown but can be approximated by a $\chi^2$ distribution with
one degree of freedom (Wilks, 1938; Self and Liang, 1987; Sokal and Rohlf, 1995).
Using this approximation, likelihoods falling below 0.0305 are significantly less likely, at
the p < 0.05 level, than the maximally likely n. Therefore, our best estimate is that, on
average, about four contending subpopulations take part in the first adaptive step, with
the 95% confidence interval from three to nine.
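The likelihood calculation can be reproduced in a few lines. The sketch below evaluates the likelihood function for n from 1 to 20 with f = 0.5, finds the most likely n, and lists the values of n whose likelihood is at or above the 0.0305 cutoff quoted in the text. The cutoff is taken as given, not re-derived, and the function names are hypothetical.

```python
from math import comb


def prob_fixed(n: int, f: float = 0.5) -> float:
    """Probability that all n contending mutations share one marker background."""
    return f ** n + (1 - f) ** n


def likelihood(n: int, polymorphic: int = 9, total: int = 10, f: float = 0.5) -> float:
    """Binomial likelihood of n given how many populations stayed polymorphic."""
    q = prob_fixed(n, f)
    return comb(total, polymorphic) * (1 - q) ** polymorphic * q ** (total - polymorphic)


values = {n: likelihood(n) for n in range(1, 21)}
best_n = max(values, key=values.get)

cutoff = 0.0305  # likelihood threshold quoted in the text
interval = [n for n, value in values.items() if value >= cutoff]

print(best_n, round(values[best_n], 4))  # 4 0.3758
print(interval[0], interval[-1])         # 3 9
```

The maximum falls at n = 4, and the values of n at or above the cutoff run from three to nine, matching the estimate and interval reported above.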

DISCUSSION

In this study, we examined in detail the dynamics of substitution of known
beneﬁcial mutations in focal population Ara-1 of the long-term E. coli evolution
experiment. We also examined the dynamics of adaptation in ten other populations in
order to evaluate whether the complex patterns observed in the focal population were
typical. In the next two sections, we summarize speciﬁc conclusions from these analyses.
We then discuss more general implications of our ﬁndings for understanding the
dynamics of adaptation in large asexual populations.

Figure 12. Likelihood estimation of the number of contending mutations, n, that make up
the cohort of beneficial mutations that displaces the ancestral population. Calculations
are based on the outcome that nine out of ten populations remained polymorphic for the
neutral allele through the first adaptive step (Figure 11). The likelihood of each n was
calculated given this outcome. A likelihood ratio test indicates that likelihood values less
than 0.0305 are significantly less likely than the maximally likely value of n at the p <
0.05 level. Thus, the best estimate for the number of contenders is four, with the 95%
confidence interval ranging from three to nine.

[Figure 12 image not reproduced. Axes: likelihood (ordinate) against number of
contenders, n (abscissa, 0 to 20).]

Dynamics of substitution of beneficial mutations in population Ara-1. Genetic
tests were performed on >400 clones sampled during the first 5,000 generations of this
focal population of the long-term E. coli evolution experiment. These data document the
rise and eventual fixation of beneficial mutations at six loci. For four of these loci (rbs,
topA, spoT, and glmUS), competition assays with isogenic constructs have confirmed the
beneficial effects of mutations (Cooper et al., 2001; Crozat et al., 2004; Cooper et al.,
2003; Stanek and Lenski, unpublished data). For the other two loci (pykF and ppr),
parallel substitutions of similar mutations in all or several of the replicate populations
indicate their beneficial nature (Chapter 1). The order in which the beneficial alleles
arose was rbs, topA, spoT, glmUS, pykF, and ppr, with first appearances in generational
samples 500, 500, 1000, 1000, 1500, and 3000, respectively. In those cases where
multiple alleles were first detected in the same sample, their order of appearance was
deduced from the existence of clones that had one but not both alleles.

The alleles did not fix independently; instead, they were substituted in more or
less discrete sets, with rbs, topA, spoT, and glmUS being fixed between generations 1000
and 1500, followed by pykF between generations 3000 and 4000 and by ppr between
generations 4000 and 5000. Another beneficial mutation, as yet unidentified, was also
substituted between the mutations in pykF and ppr. With respect to the initial set of
beneficial alleles, it is impossible to say whether their fixations were simultaneous or
only nearly so, as this distinction depends on the order in which the competing lineages
with some or all of the ancestral alleles were eliminated; our study lacked the sampling
intensity necessary to resolve that point.

In any case, the existence of such sets of linked beneficial mutations depends on
the phenomenon of clonal interference, which has important effects on the dynamics of
adaptation in large asexual populations (Muller 1932; Gerrish and Lenski 1998). Clonal
interference occurs when multiple beneficial mutations arise in different asexual lineages,
such that the lineages compete with one another and impede the progress of any of them
to fixation. If the production of further beneficial mutations were somehow stopped after
two or more lineages had arisen, each with a single beneficial mutation, then eventually
the single most fit mutation would prevail. However, its time to achieve fixation would
be slowed down relative to the time necessary if it competed only against the ancestral
type, and the magnitude of the delay could be substantial if one of the other lineages had
a beneficial mutation with nearly the same fitness effect. Thus, for example, it
would take approximately 10 times as long for a lineage carrying a mutation with a 10%
advantage to be fixed in the presence of another lineage carrying a mutation with a 9%
advantage as it would in the absence of that other lineage; this difference arises because
the eventual winner's advantage is only 1% against that other lineage. But the generation
of new beneficial mutations does not stop at some arbitrary point. In fact, the likelihood
that a second beneficial mutation arises in a lineage that already has one such mutation is
substantially greater with clonal interference, which increases the total number of cell
generations before any beneficial mutation is fixed. In the hypothetical case above, and
now allowing secondary beneficial mutations, the eventual winner between the lineages
that acquired the mutations with 10% and 9% benefits would likely depend on which
lineage produced the next mutation with a substantial benefit before the other was driven
to extinction. And if both lineages produced secondary mutations with similar benefits, then
the winner might depend on which one produced the best tertiary beneficial mutation.
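The tenfold delay in the hypothetical case can be illustrated with a simple deterministic two-type model, in which the log-odds of the favored lineage's frequency rise linearly at a rate equal to its advantage over its competitor. This is a simplifying assumption made here for illustration: it ignores drift, new mutations, and the simultaneous presence of the ancestor. The starting and ending frequencies are hypothetical and do not affect the ratio.

```python
import math


def logit(frequency: float) -> float:
    """Log-odds of a frequency strictly between 0 and 1."""
    return math.log(frequency / (1 - frequency))


def generations_to_sweep(advantage: float, start: float = 1e-6, end: float = 1 - 1e-6) -> float:
    """Generations needed for the log-odds to climb from start to end.

    Assumes the log-odds increase by `advantage` per generation.
    """
    return (logit(end) - logit(start)) / advantage


# The winner carries a 10% advantage over the ancestor.
against_ancestor = generations_to_sweep(0.10)
# Against a lineage with a 9% advantage, the net advantage is only 1%.
against_contender = generations_to_sweep(0.10 - 0.09)

slowdown = against_contender / against_ancestor
print(round(slowdown, 2))
```

Under this model the sweep time scales inversely with the net advantage, so cutting the effective advantage from 10% to 1% lengthens the sweep about tenfold whatever endpoints are chosen.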

It is precisely this scenario, in which several beneficial mutations are assembled
in a single lineage before all of them are eventually fixed, that happened in population
Ara-1. In addition to the dynamics of the beneficial alleles, which are fully consistent
with this conclusion, other lines of evidence can be explained only by this scenario. In
particular, clones lacking the beneficial mutations that were eventually substituted were
often as fit as, or more fit than, clones from the same generation that had those mutations. A
particularly striking example was seen at generation 500; some clones that had neither the
rbs(fix) allele nor the topA(fix) allele were more fit than other clones that carried both
(Table 8). Although the former clones were indistinguishable from the ancestor based on
the genetic tests we used, they were some 20% more fit than the ancestor, thus proving
that they had other unknown beneficial mutations. In addition to qualitatively similar
results from other generations, we also identified three beneficial alleles that arose but
were eventually lost; all three were in genes where other beneficial mutations later fixed.
Two of the lost beneficial mutations were in rbs, and the other was in pykF. Moreover,
the lost pykF(A) allele arose in the same lineage that carried the lost rbs(A) allele, and
this lineage had a fitness at generation 1000 that was almost as great as the lineage that
carried the four beneficial mutations that eventually fixed in rbs, topA, spoT, and glmUS.
Thus, two beneficial mutations, and probably more, accumulated even in some lineages
that were ultimately excluded by the eventual winner.

Dynamics of beneficial substitutions in relation to dynamics of fitness. In this
section, we examine the correspondence between the dynamics of beneficial substitutions
documented in this chapter and the dynamics of fitness and cell morphology reported in
previous studies performed on this same focal population. Lenski and Travisano (1994),
using single clones saved every 100 generations, showed that fitness increased in a
step-like manner in this population, while Elena et al. (1996) showed a corresponding pattern
for average cell volume. The two datasets showed similar timing of the step-like changes
in their respective phenotypes. The fitness dataset is reproduced here as Figure 13.

Figure 13. Results of a previous study of focal population Ara-1 showing the step-like
increases in fitness. Clones were sampled every 100 generations over the first 2000
generations of evolution, and their fitness values were measured relative to the marked
ancestral strain using the same procedures as used for all of the fitness values reported in
the current study. The solid line shows the fit of a step model to the data, which was
significantly better than various simpler models. Figure reproduced from Lenski and
Travisano (1994).

[Figure 13 image not reproduced. Axes: fitness (ordinate, 0.9 to 1.4) against time in
generations (abscissa, 0 to 2000).]

The originally proposed explanation for the dynamics was that each step reﬂected
a sequential and rapid fixation of a rare beneﬁcial mutation. The ﬁndings of this study
disprove that explanation. Instead, our analyses demonstrate that several subpopulations
produced similarly beneﬁcial mutations, with no lineage able to displace all the others
until it had acquired multiple mutations. Therefore, each step-like shift encompasses
several more or less simultaneous transitions in multiple subpopulations. At ﬁrst glance,
it may seem rather surprising that such complex dynamics would preserve a step-like
appearance. However, these step-like dynamics reﬂect only the mean behavior of the
population; mathematical analyses and numerical simulations have demonstrated that
such dynamics are rather general in evolving asexual populations, even when clonal
interference gives rise to processes more complex than temporally isolated sequential
sweeps (Gerrish and Lenski, 1998; Chapter 2).

The step-like increases in fitness, coupled with coexistence of multiple lineages
through the steps, imply that the among-genotype variation in fitness (at least among the
main contenders) should be low within a generational sample relative to the change in
fitness across steps. Consistent with this view, the fitness data show an increase of ~35%
over the first 2000 generations (Table 8), while most of the analyses of variance show
little or no variation in fitness between co-occurring genotypes and clones within
genotypes (Table 9; Figure 10). The only time point at which we detected significant
variation in fitness was the 500-generation sample, where we observed significant
variation among the co-occurring genotypes. Fisher's fundamental theorem of natural
selection states that the rate of adaptation should be proportional to the variation in fitness
in a population (Fisher, 1930), and at 500 generations the population was adapting most
rapidly (Lenski et al., 1991; Lenski and Travisano, 1994). Moreover, the 500-generation
sample appears to have captured the beginning of a step up in mean fitness. The fitness
of the least fit genotype was ~1.14 (Table 8; Figure 10), consistent with the fitness
step that ended around 500 generations (Figure 13; Lenski and Travisano 1994), whereas
other genotypes had fitness values that averaged ~1.21 (Table 8; Figure 10), consistent
with the subsequent step (Figure 13; Lenski and Travisano 1994).

Dynamics of adaptation in ten replicate populations. The results obtained with
the focal Ara-1 population demonstrate the importance of clonal interference, which gave
rise to complex dynamics of substitution of beneﬁcial mutations. In particular, several
beneﬁcial mutations were incorporated into the lineage that eventually prevailed before it
was able to exclude other contending lineages that had also produced multiple beneﬁcial
mutations. Although these dynamics are fully consistent with previous work examining
the dynamics of ﬁtness in the same focal population, the interpretation is considerably
more complex than was suggested by the phenotypic dynamics alone. Therefore, we
sought to determine whether these complications were an idiosyncratic feature of that
particular population or, alternatively, whether such complications would be typical of
other populations under the same conditions.

The analyses of the mutational dynamics and clonal estimates of relative fitness
were extremely intensive for the focal population, which made it impossible to perform
comparable analyses for the other long-term populations. Therefore, we designed a new
evolution experiment specifically to address whether qualitatively similar effects would
occur in other populations under the same selective conditions and population size. We
started 10 new populations, each with a 50:50 mixture of two neutral marker states. If the
results obtained with focal population Ara-1 were atypical, and instead single beneficial
mutations were responsible for most selective sweeps, then we would expect one or the
other marker state to sweep to fixation. On the other hand, if the results in population
Ara-1 were typical, then we would expect to see evidence for strong clonal interference among
multiple contending beneficial mutations. In that case, the markers should change from
their initial frequencies, but neither state should sweep rapidly to fixation. Consistent
with the latter hypothesis, in almost all populations the markers remained polymorphic
long after beneficial mutations had arisen that displaced their relative frequencies from
the initial 50:50 ratio (Figure 11). This outcome confirms that focal population Ara-1
was typical in having cohorts of several contending beneficial mutations, which caused
clonal interference that, in turn, required the accumulation of multiple beneficial
mutations in a lineage before they could collectively be substituted in the evolving
population.

Implications for the phenomenon of periodic selection. These dynamics, in
which cohorts of several beneficial mutations contend for fixation and multiple beneficial
mutations must accumulate in the winning lineage before any of them can be substituted,
affect the genetic variation that exists during the adaptive process. According to the
classic model of periodic selection in asexual populations, neutral and deleterious
mutations accumulate between selective sweeps by beneficial mutations, but these
variants are purged as a single beneficial mutation (derived from a single cell that is, in
most cases, on an otherwise unmutated background) replaces all other genotypes
(Atwood, Schneider and Ryan, 1951a, b). However, under the conditions in our
experiments, this model does not hold, at least not in its original form. Instead, there are
a number of sub-populations present at all times, each with different sets of contending
mutations. Although each sub-population traces its ancestry to a single individual, each
adaptive step typically involves several subpopulations. Thus, there is a higher chance
that a neutral mutation will survive an adaptive step because it can occur in any of several
individuals, instead of just one. Still, one would expect that most of the neutral and
deleterious variation will be purged, as in the classic periodic selection experiments. An
important future research objective is to understand precisely how these dynamics affect
the expected frequency distributions of neutral and non-neutral genetic mutations.

Implications for indirect selection on evolvability. When asexual lineages have
to accumulate multiple beneficial mutations in order to out-compete other lineages with
contending mutations, this situation may allow indirect selection for genotypes with
greater adaptive potential. Variation in evolvability could reflect differences either in
genetic architecture or mutation rates. In the first case, existing mutations might interact
epistatically with potential subsequent mutations such that the fitness benefit of the latter
mutations depends on the presence of the former mutations (e.g., Lenski et al., 2003b). In
the second case, mutations in certain genes affect processes of DNA repair such that rates
of mutation are increased either locally (Moxon et al., 1994) or globally (Sniegowski et
al., 1997; Matic et al., 1997; Oliver et al., 2000). In either case, if a mutation affects the
likelihood of additional beneficial mutations occurring, then this fact will influence its
own probability of eventual fixation.

In fact, there are reasons to think that some of the mutations that were
polymorphic for extended periods of time in focal population Ara-1 may exhibit epistatic
interactions that influence their fate. Both spoT and topA are global regulators, and
mutations in those genes affect the expression of many other genes (Pruss and Drlica,
1989; Steck et al., 1993; Cashel, 1996; Cooper et al., 2003). Also, in the case of spoT,
many of the other long-term populations substituted mutations in spoT, but four had not
done so even after 20,000 generations (Cooper et al., 2003). The spoT mutation that
fixed in population Ara-1 conferred a 9% advantage when it was moved into the ancestral
genetic background, so it is surprising that similar mutations would not have fixed in all
the populations after 20,000 generations. The populations had slowed significantly in
their rate of adaptation after 10,000 generations, and therefore mutations of such large
effect would have had little competition if they occurred. And such mutations should
have occurred repeatedly in each population. With a base-pair mutation rate of about 1.4
x 10^-10 (Lenski et al., 2003), at least eight different mutations within spoT able to give
similar advantages (Cooper et al., 2003), and an effective population size of ~3.3 x 10^7
(Lenski et al., 1991), we expect that favorable mutations in spoT occurred every 26
generations or so, and one of them should survive stochastic drift loss about every 146
generations, given a probability of surviving drift loss of approximately 2s or 18%
(Haldane 1927; Lenski et al., 1991; Johnson and Gerrish 2002). Moreover, when the
spoT(fix) allele from population Ara-1 was moved into a clone sampled at generation
2000 from another population that did not fix a mutation in spoT even after 20,000
generations, that allele no longer conferred any fitness advantage (Cooper et al., 2003),
which confirms the importance of epistatic interactions in determining the fate of
mutations.
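The mutation-supply arithmetic in this paragraph can be laid out explicitly. The sketch below takes the base-pair mutation rate as about 1.4e-10 per generation, the exponent that makes the quoted intervals come out, together with eight beneficial sites, an effective population size of 3.3e7, and s = 0.09. The variable names are hypothetical. With the rate rounded to 1.4e-10 the results come to roughly 27 and 150 generations. The 26 and 146 generations quoted in the text would follow from a slightly less rounded rate, which is an assumption here.

```python
# Inputs as quoted in the text; the mutation rate is the rounded value.
mutation_rate_per_bp = 1.4e-10      # per base pair per generation
beneficial_sites = 8                # distinct spoT mutations with similar benefit
effective_population_size = 3.3e7
selective_advantage = 0.09          # s, the 9% advantage of the spoT mutation

# Expected number of new beneficial spoT mutations per generation.
supply_per_generation = (
    mutation_rate_per_bp * beneficial_sites * effective_population_size
)
generations_between_mutations = 1 / supply_per_generation

# Haldane's approximation: a new beneficial mutation escapes drift with
# probability of about 2s.
survival_probability = 2 * selective_advantage
generations_between_survivors = generations_between_mutations / survival_probability

print(round(generations_between_mutations, 1))
print(round(survival_probability, 2))
print(round(generations_between_survivors, 1))
```

On either reading of the rate, a surviving beneficial spoT mutation is expected well over a hundred times in 20,000 generations, which is the basis for calling its absence from four populations surprising.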

Implications for the advantage of recombination. The Fisher-Muller hypothesis is
a prominent explanation for the advantage of genetic recombination (Fisher 1930; Muller
1932, 1964; Crow 1965; Felsenstein 1974). According to this hypothesis, sexual
populations can adapt more quickly than asexual ones because sex allows mutations that
arise in different genomes to be recombined into a single genome, whereas asexual
populations must wait for each mutation to arise sequentially on a single genetic
background. Several recent experimental evolution studies support this hypothesis by
showing that sexual populations do adapt more quickly than otherwise identical asexual
populations (Colegrave 2002; Goddard, Godfray and Burt, 2005; Grimberg and Zeyl,
2005). Our data on allele frequencies also lend support to this hypothesis by showing
that beneficial alleles were indeed prevented from spreading by competition with other
beneficial alleles. One important caveat to the Fisher-Muller hypothesis is that the alleles
that are competing must be different, such that bringing them together through
recombination would be beneficial. The pykF(A) allele, which was eliminated, preceded
the pykF(fix) allele that was eventually fixed, and therefore it is likely that its
recombination into the lineage that won would have accelerated the population's overall
rate of adaptation. By contrast, most contending sub-populations at generation 500 had
equivalent mutations in the rbs operon, and therefore recombination involving that gene
may not have accelerated the overall rate of adaptation. The rbs mutations are a special case,
however, because they occur at an unusually high mutation rate owing to an adjacent
genetic element that produces mutations at that site (Cooper et al., 2001). More
generally, the strong clonal interference documented in our study indicates that asexual
reproduction can indeed limit the overall rate of adaptive evolution (Fisher 1930; Muller
1932, 1964; Crow 1965; Felsenstein 1974; Gerrish and Lenski 1998; de Visser et al.,
1999).

CHAPTER 4

VARIATION IN EVOLVABILITY PREDICTS EVENTUAL SUCCESS IN AN
EVOLUTION EXPERIMENT WITH ESCHERICHIA COLI

ABSTRACT

Explanations for the evolution of evolvability often rely on ill-deﬁned group or
species level selection. Here a model for selection on evolvability within a single
population is tested, which has the potential to explain many traits. This model applies to
large asexual populations. If this model is to explain selection for evolvability, then (i)
independent subpopulations must coexist through multiple adaptive steps and (ii) the
subpopulations that are competing must differ in their evolvability. The ﬁrst condition
was described in an experimental Escherichia coli population. Here, the second condition
is tested using four clones that were sampled from that population. Two of these clones
were of the genotype that eventually prevailed. The other two were from genotypes that,
although they competed for hundreds of generations and continued to adapt, eventually
lost. Ten replicate populations were started from each clone and evolved for 883
additional generations. Populations founded from the eventual winners began to adapt
sooner and reached higher fitness. These results are consistent with the hypothesis of
indirect selection for evolvability.

INTRODUCTION
The evolution of evolvability, that is, the capacity to produce heritable variation
(Kirschner and Gerhart, 1998), poses a unique challenge to evolutionary biologists.
Evolvability cannot be selected in the usual Darwinian sense because it does not directly
affect an organism's fitness, yet traits that influence evolvability are common in nature.
For example, chaperone proteins (Rutherford and Lindquist, 1998; Queitsch, Sangster and
Lindquist, 2002; Fares et al., 2002), the [PSI+] prion in yeast (True and Lindquist, 2000),
the modular nature of metazoan development (Kirschner and Gerhart, 1998; Halder,
Callaerts and Gehring, 1995), abundant retrotransposons in mammalian genomes
(Kazazian, 2000; Gould, 2002 p1273), codon bias in viral antigens (Plotkin and Dushoff,
2003), contingency loci in microbial pathogens (Moxon et al., 1994), increased mutation
rates during times of stress (Bjedov et al., 2003), and increased genomic mutation rates
in pathogenic Escherichia coli (Matic et al., 1997) and Pseudomonas aeruginosa (Oliver
et al., 2000), have all been suggested to contribute to evolvability.

Evolvability can evolve when it is pleiotropically associated with some other trait
that is experiencing direct selection. For example, the codon bias in DNA regions that
encode antigens in inﬂuenza A is well suited to maximize the speed of adaptation.
However, this pattern may have been generated by past selection related to the direct
ﬁtness advantage of beneﬁcial mutations (Plotkin and Dushoff, 2003). Likewise,
chaperone proteins usually have the direct effect of lessening the deleterious effects of
misfolded proteins caused by high temperatures or mutation, but they also therefore
maintain variation that might be unveiled and used for adaptation to changing
environments (Rutherford and Lindquist, 1998; Queitsch, Sangster and Lindquist, 2002;
Fares et al., 2002). However, once a trait that increases evolvability exists, the advantage
to the group having it, relative to other groups, becomes straightforward. Several authors
have proposed that evolvability may give a selective advantage to groups or species
(Jablonski, 1987; Dawkins, 1989; Gould, 2002 pp 1270-1295; Lindquist, 2003). Even
Dawkins, who has strongly argued for the prominence of gene-level selection in most
situations (Dawkins, 1976), has suggested that evolvability may be an exceptional case of
species-level selection (Dawkins, 1989). However, arguments for group selection must
be made with caution (Williams, 1966).

Therefore, a concrete model for the evolution of evolvability within a single
population has been developed that applies to large asexual populations. Microbial
populations that have little recombination have produced some of the clearest examples
of traits that apparently function to increase evolvability (Moxon et al., 1994; Matic et al.,
1997; Oliver et al., 2000). Several recent studies of computer simulations (Tenaillon et
al., 1999; Chapter 2) and experimental microbial populations (Mao et al., 1997; Chapter
3) have yielded the following model of selection for evolvability within an asexual
population. When favorable mutations are common, such as when organisms experience
a change in their environment, several independent mutations may arise and escape
stochastic loss before any one can sweep through the population (Muller, 1932; Gerrish
and Lenski, 1998; Wilke, 2004). Collectively, the genotypes that have these beneﬁcial
mutations will displace the ancestral genotype. But without recombination to bring these
mutations together into a single genome, these beneﬁcial mutations will then compete
with each other. At this point, if no more beneficial mutations arose, the most fit among
these single-mutant genotypes would eventually take over the population. However, when
further beneficial mutations are possible, several genotypes that have different first
beneficial mutations will produce second beneficial mutations before the most fit single
mutation dominates the population. Therefore, eventual success for a beneficial mutation
may depend not only on the direct fitness advantage it confers, but also on the
opportunity it allows for additional adaptation, relative to the other contending mutations.
These independent subpopulations may, in fact, compete over many adaptive steps before
one eventually out-competes all others (Chapters 2 and 3). A lineage that has the ability
to generate more beneﬁcial mutations or beneﬁcial mutations that confer a greater
selective advantage (i.e., has increased evolvability) can increase its proportion in the
population at each adaptive step.

With this model, evolvability confers an advantage to a lineage over time. It is
the very fact that the lineage can produce more favorable mutations that gives it this
advantage. Thus, selection is for evolvability. However, this selection is indirect because
the increased representation of a lineage is always due to the ﬁtness of individuals that
make up that lineage. The ﬁtness of individuals is a product of the beneﬁcial mutations
they harbor, not their ability to generate future beneﬁcial mutations.

The dynamics of clonal interference, which this model relies upon, have been
previously described in detail in an experimental population of Escherichia coli (Chapter
3). In that population, multiple subpopulations coexisted through multiple step-like
increases in ﬁtness before one subpopulation eventually out-competed all others. It was
further shown that independent subpopulations arose at an early time point; the
descendants of these genotypes subsequently competed for hundreds of generations and
accumulated several additional mutations before the eventual winners (henceforth
abbreviated as EW) displaced the eventual losers (henceforth EL). It is possible that the
lineage that eventually won did so, in part, because it had a higher propensity to evolve.
In this case it would be an example of selection on evolvability as described above.
Alternatively, the winner may have been the lineage that, simply by chance, picked up
more beneficial mutations. The experiments reported here are designed to distinguish
between these two possibilities. From that early time point, we picked two representative
clones of the genotype whose descendants would eventually take over the population,
and two representative clones of the genotypes whose descendants would compete, and
further adapt, yet eventually become extinct. We allowed many replicate populations
founded from these clones to evolve independently and measured their abilities to adapt,
which we then compared.

MATERIALS AND METHODS

The long-term lines and population Ara-1. Twelve populations of E. coli were
propagated through serial batch transfer in a long-term evolution experiment (Lenski et
al., 1991; Lenski and Travisano, 1994; Lenski, 2004; Chapter 3). The populations were
grown in 10 ml Davis minimal medium supplemented with 25 µg/ml of glucose (DM25), at
37°C in 50-ml Erlenmeyer flasks rotating at 120 revolutions per minute. Daily transfers
of 1% of each population into fresh medium resulted in ~6.64 generations/day. One of
these twelve populations, called Ara-1, has been the focus of a number of previous
studies (Lenski et al., 1991; Lenski and Travisano, 1994; Elena, Cooper and Lenski, 1996;
Cooper, Rozen and Lenski, 2003; Papadopoulos et al., 1999; Schneider et al., 2000;
Crozat et al., 2004; Chapter 3), and it is the focus of this study as well.
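
The figure of ~6.64 generations per day follows from the 100-fold daily dilution: a population that regrows to its previous density must double log2(100) times. The short sketch below illustrates that arithmetic; the function name is hypothetical and is not taken from the original methods.

```python
import math

def generations_per_transfer(dilution_factor):
    """Doublings needed to regrow to the pre-dilution density (illustrative helper)."""
    return math.log2(dilution_factor)

# A 1% daily transfer is a 100-fold dilution.
daily = generations_per_transfer(100)
print(round(daily, 2))      # 6.64 generations per day
print(round(3 * daily, 1))  # 19.9, i.e. the ~20 generations spanned by 3 days
```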

Eventual winners and eventual losers. A previous study (Chapter 3) demonstrated
that, in population Ara-1, multiple subpopulations coexisted through multiple adaptive
steps. From a sample collected at generation 500 from this population, four genetically
distinct subpopulations were identified. One of these had two mutations that would
eventually fix in the population, including a large deletion in the rbs operon (Cooper et
al., 2001) and a point mutation in the topA gene (Crozat et al., 2004). Because these
mutations eventually fix in the population, this genotype is called the "eventual winner"
(EW). Two representative clones of the EW genotype were picked (designated EW1 and
EW2). Two other genotypes that were present at 500 generations had no mutations
known to fix in the population, yet their fitness clearly indicated they contained beneficial
mutations (Chapter 3). Because these mutations were eventually eliminated from the
population, we refer to the genotypes having these mutations as the "eventual losers"
(EL). Two representative clones of the EL genotypes were chosen from the sample taken
at generation 500. One of these, EL1, contained no known mutations. A second clone,
EL2, had a deletion in rbs that did not fix. In population Ara-1 the descendants of the EL
and EW genotypes were still present and competing with each other 500 generations
later, even though the descendants of the EW genotype had picked up two additional
beneficial mutations. The descendants of the EW genotype eventually took over the
population between 1000 and 1500 generations. (A fourth genotype identified in Ara-1 at
500 generations had the deletion in rbs that would fix but not the topA mutation that
would fix; it was not included in this study.)

Ara+ revertants. Clones that differed from EW1, EW2, EL1, and EL2 at only a
neutral locus were created as follows. All of the clones from population Ara-1 were
unable to use the sugar L(+) arabinose, a phenotype called Ara-. Mutations at this locus
that confer the Ara+ phenotype are known to be selectively neutral in DM25, but they give
a readily visible phenotype when grown on tetrazolium arabinose (TA) indicator agar. Ara-
colonies appear red while those that are Ara+ appear light pink. Mutants that were capable of
using arabinose were selected by plating >$10^{10}$ cells onto minimal medium agar plates
supplemented with arabinose. Colonies that grew on these plates contained a spontaneous
mutation in araA, which allowed for the metabolism of arabinose. All Ara+ revertants
were shown by a restriction fragment length polymorphism assay to contain identical
nucleotide substitutions. Given these procedures and the very low genomic mutation rate
in the ancestral strain (Lenski, Winkworth and Riley, 2003) it is unlikely that these
genotypes contained any mutations other than the one responsible for the Ara+
phenotype.

Fitness assays. The standard fitness assay for this system was used (Lenski et al.,
1991). For each assay, two clones were inoculated separately into flasks containing LB
liquid medium from the freezer stocks and allowed to grow for 24 hours at 37°C. These
cultures were then diluted 100 fold, and 0.1 ml was transferred to 9.9 ml of DM25
medium where the cultures were again allowed to grow separately for 24 hours at 37°C.
This step allows for physiological acclimation to the competition conditions. The two
clones were then mixed at a 1:1 volumetric ratio (unless otherwise noted) and a combined
0.1 ml was added to 9.9 ml fresh DM25 medium. A sample was taken from this mixture
and plated onto TA agar to estimate the initial population density of each type ($N_i$). The
mixture was propagated for one or more days with daily 1:100 dilutions into fresh DM25
medium. At the end of the experiment a second sample of the mixture was plated onto
TA agar to estimate the final density of each type ($N_f$). The Malthusian parameter ($m$) is
the realized growth rate of a clone over the test period. It is calculated as
$m = \ln[100^{t} \cdot N_f/N_i]/t$, where $t$ is the number of days, and where both counts
reflect changes in overall density based on the dilutions used for plating. The relative
fitness of one clone to another is the ratio of their Malthusian parameters.
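
As a minimal sketch of the calculation just described, the Malthusian parameter and the relative fitness can be computed from plate-count estimates as follows. The function names and the colony densities in the example are hypothetical; only the formula itself comes from the text.

```python
import math

def malthusian(n_initial, n_final, days, dilution=100.0):
    """Realized growth rate per day: ln[dilution**days * N_f / N_i] / days."""
    return math.log(dilution ** days * n_final / n_initial) / days

def relative_fitness(a_initial, a_final, b_initial, b_final, days, dilution=100.0):
    """Fitness of clone A relative to clone B: ratio of their Malthusian parameters."""
    m_a = malthusian(a_initial, a_final, days, dilution)
    m_b = malthusian(b_initial, b_final, days, dilution)
    return m_a / m_b

# Hypothetical one-day competition: clone A falls slightly, clone B rises slightly.
w = relative_fitness(1.0e5, 0.9e5, 1.0e5, 1.1e5, days=1)
print(round(w, 3))  # below 1, so A is less fit than B in this made-up example
```

Because the fitness is a ratio of growth rates, it is dimensionless, and a value of 1 means that the two clones grew equally well over the assay.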

Additional evolution. To test the inherent evolvability of the four clones (EW1,
EW2, EL1, and EL2) ten replicate populations were founded with each clone and their
subsequent adaptation was quantified. Each population was started with a 1:1 ratio of a
clone and its cognate Ara+ revertant. The evolution environment was identical to that
experienced by population Ara-1 during the long-term evolution experiment. These
populations were labeled alphabetically within each ancestral clone (EW1a, EW1b, and so
on). The duration of the experiment was somewhat arbitrary. However, 883 generations
approximates the amount of time until the eventual winners overtook the eventual losers
in population Ara-1, which can be seen from the steps in fitness (Lenski and Travisano,
1994) and cell size (Elena, Cooper and Lenski, 1996) and the changes in allele
frequencies (Chapter 3).

Timing the rise of the first beneficial mutation. Every 3 days (~20 generations) a
sample, containing ~400 colony forming units, from each population was plated onto
tetrazolium arabinose (TA) indicator agar plates. Any substantial deviation from the
initial ratio indicated the presence of a new adaptive mutation (Chao and Cox, 1983;
Rozen et al., 2002; Chapter 3). We can be confident of this inference because the neutral
allele at a frequency of 0.5 is unlikely to vary significantly due to random drift alone given
the large population size and relatively few generations. The expected mean time to fixation,
or loss, by drift alone for a mutation initially at a frequency of 0.5 and
$N_e \approx 3.3 \times 10^{7}$ is about 90 million generations (Kimura and Ohta, 1969).
No detectable shift would be measured after only 900 generations. Therefore, we can infer
that at least one beneficial mutation has arisen to a high enough frequency to be detected
based on the change in frequency of the linked neutral marker.
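
The order of magnitude quoted above can be checked with the standard diffusion result for the mean time until a neutral allele is fixed or lost, -4 N_e [p ln p + (1 - p) ln(1 - p)]. The sketch below assumes that this diploid-form expression is the one intended, since it reproduces the figure of about 90 million generations; the function name is hypothetical.

```python
import math

def mean_time_to_fixation_or_loss(p, n_e):
    """Expected generations until a neutral allele at frequency p is fixed or lost.

    Assumes the diploid diffusion approximation: -4*Ne*[p ln p + (1-p) ln(1-p)].
    """
    return -4.0 * n_e * (p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

t_drift = mean_time_to_fixation_or_loss(p=0.5, n_e=3.3e7)
print(f"{t_drift:.2e}")  # on the order of 9e7 generations

# The experiment spans fewer than 900 generations, a tiny fraction of that time.
print(900 / t_drift)
```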

The following protocol was used to decide on the time point at which a shift in
neutral allele frequency was ﬁrst detected. First, for each population we calculated the
initial frequency of the neutral allele by averaging over the ﬁrst 5 time points (0, 20, 40,
60, and 80 generations), during which time the frequency was constant. This averaging
gives a good estimate of the initial frequency, which was expected to be 0.5 and in most
cases was very close. Second, for each population and at each time point we calculated
the probability that the observed ratio of Ara+ to Ara- could have been drawn from a
population with a mean equal to the initial value in that population. The frequency was
tentatively determined to have signiﬁcantly changed when the null hypothesis was
rejected at p < 0.05, which required about a 5% shift in allele frequencies. However, the
large number of samples taken suggests that deviations of p < 0.05 would also sometimes
occur by chance. Therefore, we additionally required that the samples must have differed
from the initial ratio in two consecutive time points in order to accept a tentative
determination as valid.
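
The decision rule above can be sketched in a few lines. The text does not name the exact test used at each time point, so this example assumes a two-sided normal approximation to the binomial; the function names and the colony counts are hypothetical.

```python
import math

def two_sided_p(count_plus, n_colonies, p0):
    """Normal-approximation p-value for observing count_plus Ara+ colonies out of
    n_colonies when the true frequency is p0 (assumed test, not from the text)."""
    se = math.sqrt(p0 * (1.0 - p0) / n_colonies)
    z = (count_plus / n_colonies - p0) / se
    return math.erfc(abs(z) / math.sqrt(2.0))

def first_shift(generations, plus_counts, totals, n_baseline=5, alpha=0.05):
    """Return the first of two consecutive time points that both differ from the
    baseline frequency at p < alpha, or None if no such pair exists."""
    # Baseline: average marker frequency over the first n_baseline samples.
    baseline = sum(c / n for c, n in zip(plus_counts[:n_baseline],
                                         totals[:n_baseline])) / n_baseline
    significant = [two_sided_p(c, n, baseline) < alpha
                   for c, n in zip(plus_counts, totals)]
    for i in range(len(significant) - 1):
        if significant[i] and significant[i + 1]:
            return generations[i]
    return None

# Made-up counts of Ara+ colonies out of 400 per plate, sampled every 20 generations.
gens = list(range(0, 200, 20))
plus = [200, 198, 203, 199, 200, 201, 225, 199, 230, 245]
print(first_shift(gens, plus, [400] * 10))  # 160: the lone deviation at 120 is ignored
```

With about 400 colonies per sample and a baseline of 0.5, the standard error of the frequency is 0.025, so a shift of roughly 0.05 is needed to reach p < 0.05, in line with the 5% figure given above.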

Fitness of EW-derived clones relative to EL-derived clones. Comparison of
relative fitness between the evolved eventual winners and the evolved eventual losers was
carried out as follows. From each of the 40 populations, a single clone was chosen at
random from the 883-generation sample. At that time, some populations had fixed the
Ara- phenotype, some had fixed the Ara+ phenotype, and others remained polymorphic.
Thus, the clones picked were a mixture of Ara+ and Ara-. Each of the Ara+ clones
derived from an eventual winner was competed against each of the Ara- clones derived
from an eventual loser, and vice versa. Of the 20 evolved clones derived from EW1 and
EW2, 12 were Ara- and 8 were Ara+. Of the evolved EL clones, 12 were Ara- and 8
were Ara+. Each possible EW-EL pair with opposite marker types was competed
once, although two of the measures did not yield usable results owing to procedural
errors. Thus, there were 190 (12 x 8 + 8 x 12 - 2) fitness assays. For the statistical analysis,
these data were treated as two independent data sets.

A mixed general linear model was ﬁt to the two data sets independently using the
MIXED procedure in SAS. Each data point is the ﬁtness of one EW-derived clone
relative to one EL-derived clone. In the model the intercept was treated as a ﬁxed effect
and derived clone was treated as a random effect. The Satterthwaite approximation was
used for the degrees of freedom to correct for the two missing data points. The model
estimated an intercept and a standard error around the intercept. Hypotheses concerning
the intercept were tested with a t-test (Sokal and Rohlf, 1995), and degrees of freedom
from the Satterthwaite approximation. Random effects were tested with a likelihood ratio
test (LRT), in which the -2 log likelihood of the full model is
subtracted from the -2 log likelihood of a reduced model. This difference is compared to
a $\chi^2$ distribution with degrees of freedom equal to the difference in the number of
parameters in the two models (Littell et al., 1996; Chapter 3). The full model, which
contained the intercept, the EW-derived clone, and the EL-derived clone, was compared
to reduced models in which the term for either the EW-derived clone or the EL-derived
clone was removed.
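
The likelihood ratio test described above reduces to a subtraction and a chi-square tail probability. The sketch below is not the SAS analysis itself: it assumes one degree of freedom per removed random effect and uses the closed form of the chi-square tail for that case. The -2 log likelihood values are those reported in Table 11, and the function name is hypothetical.

```python
import math

def lrt_one_df(neg2ll_full, neg2ll_reduced):
    """Likelihood ratio statistic and its p-value against chi-square with 1 d.f.

    For 1 d.f., P(chi2 > x) = erfc(sqrt(x / 2)); the single degree of freedom
    is an assumption (one variance component removed per reduced model).
    """
    statistic = neg2ll_reduced - neg2ll_full
    p_value = math.erfc(math.sqrt(max(statistic, 0.0) / 2.0))
    return statistic, p_value

# EW Ara+ : EL Ara- data set, values taken from Table 11.
for label, reduced in [("EW derived clone", -306.4), ("EL derived clone", -313.0)]:
    stat, p = lrt_one_df(-328.1, reduced)
    print(label, round(stat, 1), f"{p:.1e}")
```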

Two of the evolving populations experienced pipetting errors. First, in population
EL2h only the Ara- clone was present in the founding population. Therefore, this
population was not informative for the timing of adaptation, but it was still used for the
fitness assay among 883-generation clones. The second error occurred when a 0.1 ml
aliquot from Ele was mistakenly transferred into EW1d along with EW1d's normal
daily 0.1 ml transfer at day 84, resulting in a population that was half EW1d and half
Ele. This mixed population was then propagated for several days before the mistake
was detected, at which point the decision was made to continue to transfer the mixed
population as EW1d. Evidently, the EW-derived subpopulation out-competed the
EL-derived subpopulation. The clone chosen from that population at 883 generations
contained a deletion in rbs that demonstrated it had evolved from the EW clone.
Therefore, it was still used in the competition assays among the 883-generation clones as
the representative clone from EW1d. The results, described below, indicate that
population EW1d had already begun to adapt before this occurred, so it was informative
for the timing of the first shift of the neutral allele.

RESULTS
Relative fitness of EW to EL, including a test for frequency-dependent effects.
Previous data indicated that the EW clones were significantly less fit than the EL clones
at 500 generations (Chapter 3). These fitness measurements were made by competing the
eventual winners and the eventual losers each against the same reference genotype. The
significant fitness difference implied that the EL clones were, at that point, enjoying a
temporary fitness advantage. However, it is also possible that competitive interactions are
nontransitive, such that the relative fitness to the reference clone does not indicate their
fitness relative to one another (Paquin and Adams, 1983; Kerr et al., 2002; Kirkup and
Riley, 2004). Additionally, it is possible that the competing subpopulations evolved
frequency-dependent interactions, which can maintain genetic diversity (Rozen and
Lenski, 2000).

To measure the fitness of the eventual winners relative to the eventual losers, and
to exclude the possibility that they coexisted through frequency-dependent interactions,
we performed competitions between the eventual winners and the eventual losers. Each
EW clone was competed against each Ara+ revertant of the EL clones, and each EW Ara+
revertant was competed against each EL clone. These competitions were performed
across 6 initial frequencies that encompass the observed frequencies of the
subpopulations at 500 generations in Ara-1 (EW:EL of 1:24, 1:4, 1:1, 4:1, 24:1 and
124:1) in ﬁtness assays that lasted 7 days (~45 generations). Thus, there were eight data
points at each initial frequency, with two ﬁtness estimates for each possible EW-EL pair
at each frequency.

The results of these fitness comparisons are presented in Figure 14. A least-squares
regression of the combined data indicates that the slope is not significantly
different from zero (95% confidence interval from -0.00010 to 0.00004). Thus, no
frequency-dependent interaction was detected. The intercept was, however, significantly
less than one. The two representative EW clones from 500 generations had, on average, a
fitness of 0.938 (95% confidence interval from 0.9344 to 0.9425) relative to the two
representative EL clones.

Figure 14. Relative fitness of eventual winners (EW) to eventual losers (EL) sampled
from population Ara-1 at 500 generations. Each of the two representative EW clones was
competed against the two representative EL clones twice (once with each marker
combination) at 6 initial volumetric ratios (EW:EL of 1:24, 1:4, 1:1, 4:1, 24:1, and 124:1)
for seven days. The horizontal line is the mean of the combined data; the regression of
fitness on initial log ratio was non-significant.

[Figure 14 plot: relative fitness (y-axis, 0.90 to 0.98) against log ratio (EW:EL) (x-axis, -2 to 2.5) for the pairings EW1:EL1, EW1:EL2, EW2:EL1, and EW2:EL2.]

Subsequent evolution. The evolutionary potential of the EW and EL genotypes
was tested with ten populations founded from each representative clone (EW1, EW2,
EL1, and EL2) that were allowed to evolve independently for ~883 generations. Each of
these populations was initially homogeneous except for a neutral mutation, the frequency
of which was tracked through time. Using these 40 populations we compared the
evolvability of the EW and EL clones in two ways. First, we identified the time at which
beneficial mutations were first observed to displace the founding genotype. Second, we
compared the fitness levels achieved by the EW- and EL-derived clones.

Figure 15 shows the change in the relative frequency of the neutral markers in
each of the evolving populations over the 883-generation experiment. These data
collectively represent counts of more than 700,000 colonies. Shifts in the frequency of
the neutral marker away from the 0.5 initial value indicate the spread of beneficial
mutations in one or both marker types. For each population, the first time at which the
marker significantly deviates from its initial value is indicated in Table 10, using the
procedure described in the Materials and Methods. An independent test of the validity of
this procedure is available as follows. Spurious results are equally likely to over-estimate
the frequency as under-estimate the frequency, whereas if the allele frequency has truly
shifted then it should differ from the initial value in the same direction in the two
consecutive samples. In fact, in all 39 populations for which we used this protocol the
first two consecutive samples that significantly differed from the initial ratio did so in the
same direction. We can strongly reject the null hypothesis that the results were due solely
to chance (sign test, $p = 1.8 \times 10^{-11}$). A Kruskal-Wallis test (Sokal and Rohlf, 1995)
for the effect of founding genotype on the timing of this shift revealed that the differences
among clones were significant (Adjusted H = 14.68, d.f. = 3, p = 0.0021). Moreover,
comparison in this respect of each pair of ancestors showed that EW1 ≈ EW2 < EL1 ≈
EL2, using p < 0.05 to establish dissimilarity. Overall, the neutral allele began to
shift 169 generations sooner in the populations derived from eventual winners than in
those derived from eventual losers.

Table 10. Estimates of the time of first divergence of the neutral allele from its initial
ratio. This time was determined as the first of two consecutive measurements that each
differed from the initial frequency at the p < 0.05 level. Entries are generations.

Population    EW1    EW2    EL1    EL2
a             280    180    400    500
b             460    200    480    720
c             220    160    420    180
d             160    240    620    520
e             340    440    380    400
f             240    240    360    360
g             120    260    300    180
h             260    240    420    --
i             340    260    380    380
j             240    260    660    440
mean          266    248    442    408.9

Figure 15. Frequency of a neutral marker in evolving populations. Ten populations were
founded from each of four genotypes: EW1, EW2, EL1 and EL2. Initially each
population was isogenic except for the neutral allele at a frequency of ~0.5. Shifts in the
neutral allele indicate beneficial mutations have arisen and are spreading in the
population. The dashed line indicates the estimate of the average time at which the allele
significantly deviated from the initial ratio (see Table 10).

[Figure 15 plot: x-axis, time (generations), 0 to 800.]
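
The per-clone means in Table 10 and the 169-generation difference quoted above can be reproduced directly from the tabulated values. The short sketch below does only that arithmetic; the variable names are hypothetical, and population EL2h is omitted because it was not informative for timing.

```python
from statistics import mean

# Generation of first significant marker shift, copied from Table 10.
shift_generation = {
    "EW1": [280, 460, 220, 160, 340, 240, 120, 260, 340, 240],
    "EW2": [180, 200, 160, 240, 440, 240, 260, 240, 260, 260],
    "EL1": [400, 480, 420, 620, 380, 360, 300, 420, 380, 660],
    "EL2": [500, 720, 180, 520, 400, 360, 180, 380, 440],  # EL2h excluded
}

for clone, times in shift_generation.items():
    print(clone, round(mean(times), 1))

# Pool the populations by ancestral type and compare the means.
ew_times = shift_generation["EW1"] + shift_generation["EW2"]
el_times = shift_generation["EL1"] + shift_generation["EL2"]
lead = mean(el_times) - mean(ew_times)
print(round(lead))  # 169 generations
```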

Next we can assess the difference in the fitness increase over 883 generations
between the EW- and EL-derived populations. To do this we picked a single clone from
each population at 883 generations. These clones were a mix of Ara+ and Ara- marker
states. To compare their fitness, each Ara+ clone from an EW population was competed
against each Ara- clone from an EL population, and vice versa. This procedure resulted in
two independent data sets.

There are two hypotheses of interest concerning the fitness of clones derived from
EW populations relative to those from EL populations. The first predicts that the EW clones
adapted faster than the EL clones. We know that the EW clones started with a relative
fitness of 0.938 compared to the EL clones. Therefore, if the fitness of the EW-derived
clones relative to the EL-derived clones is, on average, greater than this initial deficit we
can conclude that they did adapt faster. A second hypothesis predicts that the populations
founded with EW clones increased to a greater final fitness than the populations founded
from EL clones. Under this second hypothesis, the average fitness of EW-derived clones
relative to EL-derived clones must be greater than 1. Both hypotheses use one-tailed
tests.

The two fitness data sets are presented in Figure 16 and Figure 17. For the data set
of the Ara- EW-derived clones competed against the Ara+ EL-derived clones, the mixed
model estimated an intercept of 1.0205, with a standard error of 0.0116. Thus, the mean
fitness is greater than both 0.938 (t = 7.11, d.f. = 14.8, p < 0.0001) and 1 (t = 1.76, d.f. =
14.8, p = 0.049). In the second set of comparisons, in which the Ara+ EW-derived clones
were competed against the Ara- EL-derived clones, the results were remarkably similar.
The intercept was estimated to be 1.0212, with a standard error of 0.0116. Again, the
mean fitness is greater than both 0.938 (t = 7.17, d.f. = 14.1, p < 0.0001) and 1 (t = 1.82,
d.f. = 14.1, p = 0.045). Taken together, these two independent data sets show that the EW
clones have the inherent ability to adapt faster than the representative EL clones.
Moreover, this difference in rate of adaptation was sufficient to allow the evolved EW
clones to overcome their initial fitness disadvantage to become fitter after 883
generations of independent evolution.

Figure 16. Relative fitness of the Ara- EW-derived clones relative to the Ara+ EL-derived
clones following 883 generations of additional adaptation. Each point is the result of a
single fitness assay. The two dashed lines indicate the two null hypotheses of interest:
that the derived EW clones have surpassed the initial fitness difference, 0.938; and that
the EW-derived clones have evolved to a higher fitness than the EL-derived clones, 1.

[Figure 16 plot: relative fitness (EW:EL) on the y-axis against EW derived clone on the x-axis; each series is one EL-derived competitor clone.]

Figure 17. Relative fitness of the Ara+ EW-derived clones relative to the Ara- EL-derived
clones following 883 generations of additional adaptation. Each point is the result of a
single fitness assay. The two dashed lines indicate the two null hypotheses of interest:
that the derived EW clones have surpassed the initial fitness difference, 0.938; and that
the EW-derived clones have evolved to a higher fitness than the EL-derived clones, 1.

[Figure 17 plot: relative fitness (EW:EL) on the y-axis against EW derived clone on the x-axis; each series is one EL-derived competitor clone.]
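
The t statistics reported above are the distance of each estimated intercept from the null value, measured in standard errors. The sketch below recomputes them from the rounded estimates given in the text, so small differences from the reported values are expected; the p-values are not recomputed because they depend on the fractional Satterthwaite degrees of freedom. The function name is hypothetical.

```python
def t_statistic(estimate, null_value, standard_error):
    """Distance of the estimate from the null value in standard-error units."""
    return (estimate - null_value) / standard_error

data_sets = {
    "EW Ara- : EL Ara+": (1.0205, 0.0116),
    "EW Ara+ : EL Ara-": (1.0212, 0.0116),
}

for label, (intercept, se) in data_sets.items():
    t_initial = t_statistic(intercept, 0.938, se)  # null: no gain on the initial deficit
    t_parity = t_statistic(intercept, 1.0, se)     # null: EW no fitter than EL
    print(label, round(t_initial, 2), round(t_parity, 2))
```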

The results reported above support the contention that the EW clones were more
evolvable than the EL clones. Nevertheless, it could be argued that the statistical analysis
is not ideal, as the model was run without a term for the particular ancestral clone (either
EW1 or EW2 for the ancestor of the EW-derived clones and either EL1 or EL2 for the
EL-derived clones). This approach was used because the a priori comparison of interest
concerned the intercept. Including the ancestral genotype would rob that comparison of
all but one degree of freedom in the statistical model. A better experimental design would
have included more replication at the level of ancestral clone. Strictly speaking, then, the
probability values reported above apply only to the representative EW and EL clones and
cannot be extended to the general pool of eventual winners and eventual losers present in
population Ara-1 at 500 generations. However, a test of the effect of ancestral clones
(through a likelihood ratio test of full and reduced models) found that ancestral clone was
a non-significant effect for both the EL and EW competitors, in both data sets.

We also tested for variation in competitive ability among the independently
evolved clones within an ancestral type, either EW or EL, using a likelihood ratio test in
which the full model was compared to a model in which one term was removed (Littell et
al., 1996; Chapter 3). The effects of derived EL clone and derived EW clone were both
highly significant (Table 11). This result indicates that, although there was a general
tendency for the EW clones to evolve to a higher fitness than the EL clones, independent
populations had significantly heterogeneous outcomes.

DISCUSSION

The genotypes that were the eventual winners in population Ara-1 evolved more
quickly, and to a higher ﬁnal ﬁtness than the eventual losers, even though they started
with a somewhat lower ﬁtness. Therefore, these clones show that the trait of evolvability
can vary and, moreover, that increased evolvability arose spontaneously within a single
population. The greater evolvability of the eventual winners could be due to a higher
beneﬁcial mutation rate, to beneﬁcial mutations of larger effect, or both. In any case, the
data support the hypothesis that the lineage that eventually prevailed, from among those
present in this population at generation 500, did so, at least in part, because it was
inherently more capable of adapting.

Previous studies indicated that population Ara-1 was in the midst of a step up in
ﬁtness at 500 generations. This interpretation is consistent with the ﬁndings reported here
that genotype EW was less fit than the eventual losers. The fact that the EW genotype
was decreasing in frequency at 500 generations implies that it must have made up a larger
proportion of the population previously. Thus, in population Ara-1 the eventual winners
had a head start, but this was counterbalanced by the fact that they were at a selective
disadvantage. It is also quite possible that, at 500 generations in Ara-1, the EW
genotype had already given rise to one or more additional beneficial mutations, which
were still at too low a frequency to be detected.

 

 

 

 

EW Ara+ : EL Ara-      -2 log likelihood    LRT       P
Full model             -328.100
EW derived clone       -306.400             21.700    <0.0001
EL derived clone       -313.000             15.100    0.0001

EW Ara- : EL Ara+      -2 log likelihood    LRT       P
Full model             -361.100
EW derived clone       -350.400             10.700    0.0011
EL derived clone       -310.700             50.400    <0.0001

Table 11. Statistical analysis of the fitness of EW-derived clones relative to the
EL-derived clones after 883 generations of additional evolution. The full model contained the
intercept, the EW-derived clone, and the EL-derived clone. It was compared to reduced
models in which the term for either the EW-derived clone or the EL-derived clone was
removed. LRT is the likelihood ratio test (Littell et al., 1996).
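The P values in Table 11 can be checked by hand. The sketch below is only an illustration, not the SAS analysis used for this chapter. It assumes that each LRT statistic is the difference between the tabulated -2 log likelihood values of the reduced and full models, and that the statistic is referred to a chi-square distribution with one degree of freedom (an assumption, although it is consistent with the tabulated P values). The function name lrt_p_value is hypothetical.

```python
import math


def lrt_p_value(full_neg2ll, reduced_neg2ll):
    """Return (LRT statistic, P) for a reduced model with one term removed.

    Assumption: the statistic follows a chi-square distribution with 1 degree
    of freedom, whose upper tail is P(X > x) = erfc(sqrt(x / 2)).
    """
    lrt = reduced_neg2ll - full_neg2ll
    p_value = math.erfc(math.sqrt(lrt / 2.0))
    return lrt, p_value


# -2 log likelihood values copied from Table 11: (full model, reduced model).
TABLE_11 = {
    "EW Ara+ : EL Ara-, EW derived clone removed": (-328.1, -306.4),
    "EW Ara+ : EL Ara-, EL derived clone removed": (-328.1, -313.0),
    "EW Ara- : EL Ara+, EW derived clone removed": (-361.1, -350.4),
    "EW Ara- : EL Ara+, EL derived clone removed": (-361.1, -310.7),
}

if __name__ == "__main__":
    for label, (full, reduced) in TABLE_11.items():
        lrt, p = lrt_p_value(full, reduced)
        print(f"{label}: LRT = {lrt:.1f}, P = {p:.2g}")
```

Using the closed form for one degree of freedom keeps the check free of any statistics library; a test with more than one removed term would need a general chi-square tail function instead.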

Although their greater evolvability may have tipped the odds in favor of the
eventual winners, chance probably also played an important role in deciding which
lineage eventually prevailed. We see in Figures 3 and 4 that some of the clones derived
from eventual winners were more fit than every eventual-loser-derived clone with which
they competed (e.g., EW1b and EW1j). On the other hand, several of the
eventual-winner-derived clones tended to lose to eventual-loser-derived clones (e.g., EW1f and
EW2b). The variation in outcomes is also evident from the statistical analysis, which
indicated significant variation in competitive ability due to both the EW- and EL-derived
clones (Table 11). In other words, if one wagered on the eventual winners present at 500
generations prevailing in the long run, it would be a good bet, but not a sure bet.

At present, the molecular basis of the greater evolvability of the eventual winners
is unknown. The eventual winners contained two known beneficial mutations. One is a
deletion in rbs (Cooper et al., 2001). It does not seem likely that this deletion is the
source of the differential evolvability, because clone EL2 also has an rbs deletion. The
second known beneficial mutation in the eventual winners is in topA (Crozat et al., 2004).
The product of topA is topoisomerase I, which controls DNA supercoiling. Studies of
DNA supercoiling have shown that it may affect the expression of many other genes
(Jovanovich and Lebowitz, 1987; Pruss and Drlica, 1989; Steck et al., 1993; Gmuender et
al., 2001). A possibility that deserves further study is that the topA mutation
conferred an increase in evolvability through epistatic interactions with mutations at other
loci.

Conversely, the difference in evolvability could be due to one or more mutations
in the eventual losers that limit their potential for evolution. In one sense, this is certainly
true. Because EL1 and EL2 were more fit, they had already “used up” more of the
adaptive possibilities. In this trivial sense, every beneficial mutation that becomes fixed
lowers the evolvability to some extent because further mutations at that locus, which had
been beneficial before the fixation event, are neutral or deleterious afterwards. The fact
that populations founded with EW clones were able not only to catch up in fitness, but to
surpass the EL-derived populations, suggests that this is not the only explanation for the
difference in their capacity to adapt.

It is also possible that the genetic difference between the EL clones and the EW
clones that causes the difference in evolvability is a neutral or slightly deleterious
mutation that was fixed by drift or hitchhiking. However, this explanation is unlikely for
the following reasons. It has been estimated that the ancestral genotype used to found
population Ara-1 had a mutation rate of 1.44 × 10^-10 per base pair per generation (Lenski,
Winkworth and Riley, 2003). Given the genome size of 4.64 × 10^6 base pairs (Blattner et
al., 1997), in a clone picked at 500 generations we would expect, on average, only 0.33
mutations. Furthermore, any single mutation would need to be in either both EW clones
or both EL clones. The most recent common ancestor of either pair necessarily occurred
some time previous to generation 500, thus further shortening the time available for this
non-selected mutation to arise. Finally, this low number is for all mutations. It is
unknown how many mutations affect evolvability, but this is presumably only a tiny
fraction of all mutations.
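The expected number of mutations quoted above is the product of the mutation rate, the genome size and the number of generations. The short sketch below repeats that arithmetic. The second function goes one step beyond the text: it assumes, purely for illustration, that mutations along a line of descent arrive as a Poisson process, which gives the chance that a clone carries no mutation at all. Both function names are hypothetical.

```python
import math


def expected_mutations(rate_per_bp_per_generation, genome_size_bp, generations):
    """Expected number of mutations accumulated along one line of descent."""
    return rate_per_bp_per_generation * genome_size_bp * generations


def probability_no_mutation(expected):
    """Poisson probability of zero mutations (the Poisson model is an assumption)."""
    return math.exp(-expected)


if __name__ == "__main__":
    # Values quoted in the text: rate, genome size, generations.
    expected = expected_mutations(1.44e-10, 4.64e6, 500)
    print(f"expected mutations per clone: {expected:.2f}")
    print(f"probability of no mutation: {probability_no_mutation(expected):.2f}")
```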

It seems likely, therefore, that the difference in evolvability between the EW and
EL clones was caused by a difference in the mutations that were adaptive at the
organismal level. Thus, the consequent effect on evolvability initially spread as a
pleiotropic “side effect.” The clonal interference population dynamic, however, limits
the ability of a single mutation to fix on the basis of its organismal-level selective
advantage alone (Chapter 2). This limitation occurs because competing beneficial
mutations that confer comparable fitness effects are likely to arise in other lineages. The
eventual winner among these genotypes will be the one that generates more, or more fit,
additional mutations over time. If there is no difference among the competing clonal
lineages in their inherent ability to do this, then the identity of the eventual winner will
be due to chance. But if these genotypes differ in their propensity to generate additional
beneficial mutations, then those with greater propensity will significantly increase their
likelihood of fixing. The results presented here are consistent with the latter scenario.
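The contrast between the two scenarios can be illustrated with a toy simulation. The sketch below is not the simulation model of Chapter 2 or the script in Appendix 1; it is a minimal Wright-Fisher illustration in which two asexual lineages start at equal frequency and equal fitness and differ only in their supply of further beneficial mutations. All function names and parameter values (population size, selection coefficient, mutation probabilities) are hypothetical and chosen only to keep the run short.

```python
import random


def final_fraction_a(mu_a, mu_b, s=0.2, pop_size=200, generations=300, rng=None):
    """Run one replicate and return the final fraction of lineage A.

    Each offspring gains one further beneficial mutation with probability
    mu_a (lineage A) or mu_b (lineage B); every mutation multiplies fitness
    by (1 + s). These rules are illustrative assumptions.
    """
    rng = rng or random.Random()
    mutation_prob = {"A": mu_a, "B": mu_b}
    # Map (lineage, number of beneficial mutations) -> number of individuals.
    counts = {("A", 0): pop_size // 2, ("B", 0): pop_size - pop_size // 2}
    for _ in range(generations):
        classes = list(counts)
        # Selection and drift: parents are drawn in proportion to count * fitness.
        weights = [counts[c] * (1.0 + s) ** c[1] for c in classes]
        parents = rng.choices(classes, weights=weights, k=pop_size)
        counts = {}
        for lineage, n_mut in parents:
            if rng.random() < mutation_prob[lineage]:
                n_mut += 1
            counts[(lineage, n_mut)] = counts.get((lineage, n_mut), 0) + 1
        if len({lineage for lineage, _ in counts}) == 1:
            break  # one lineage has fixed
    n_a = sum(n for (lineage, _), n in counts.items() if lineage == "A")
    return n_a / pop_size


def tally_winners(mu_a, mu_b, replicates=50, seed=1):
    """Count the replicates in which lineage A or lineage B reached fixation."""
    rng = random.Random(seed)
    wins = {"A": 0, "B": 0, "undecided": 0}
    for _ in range(replicates):
        fraction = final_fraction_a(mu_a, mu_b, rng=rng)
        if fraction == 1.0:
            wins["A"] += 1
        elif fraction == 0.0:
            wins["B"] += 1
        else:
            wins["undecided"] += 1
    return wins


if __name__ == "__main__":
    # Lineage A is given a tenfold higher supply of beneficial mutations.
    print(tally_winners(mu_a=0.01, mu_b=0.001))
```

With a fixed seed the tally is reproducible. Comparing it with a run in which mu_a equals mu_b gives a sense of how much of the outcome can be attributed to chance alone.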

REFERENCES

Arnold, B. C., N. Balakrishnan, H. N. Nagaraja. 1992. A First Course in Order Statistics.
John Wiley & Sons, Inc., New York

Atwood, K. C., L. K. Schneider and F. J. Ryan. 1951a. Periodic selection in Escherichia
coli. Proceedings of the National Academy of Sciences of the United States of America
37: 146-155

Atwood, K. C., L. K. Schneider and F. J. Ryan. 1951b. Selective mechanisms in bacteria.
Cold Spring Harbor Symposia on Quantitative Biology 16:345-355

Bachtrog, D. and I. Gordo. 2004. Adaptive evolution of asexual populations under
Muller's ratchet. Evolution 58: 1403-1413

Begg, K. J. and Donachie, W. D. 1998. Division planes alternate in spherical cells of
Escherichia coli. Journal of Bacteriology 180: 2564-2567

Bjedov, I., O. Tenaillon, B. Gérard, V. Souza, E. Denamur, M. Radman, F. Taddei and
I. Matic. 2003. Stress-induced mutagenesis in bacteria. Science 300: 1404-1409

Blattner, F. R., G. Plunkett, C. A. Bloch, N. T. Perna, V. Burland, M. Riley, J.
Collado-Vides, J. D. Glasner, C. K. Rode, G. F. Mayhew, J. Gregor, N. W. Davis, H. A.
Kirkpatrick, M. A. Goeden, D. J. Rose, B. Mau and Y. Shao. 1997. The complete genome
sequence of Escherichia coli K-12. Science 277: 1453-1474

Bull, J. J., M. R. Badgett, H. A. Wichman, J. P. Huelsenbeck, D. M. Hillis, A. Gulati, C.
Ho and I. J. Molineux. 1997. Exceptional convergent evolution in a virus. Genetics
147: 1497-1507

Burch, C. L. and Chao, L. 1999. Evolution by small steps and rugged landscapes in the
RNA virus φ6. Genetics 151: 921-927

Campos, P. R. A. and V. M. de Oliveira. 2004. Mutational effects on the clonal
interference phenomenon. Evolution 58:932-937

Cashel, M., D. R. Gentry, V. J. Hernandez, and D. Vinella. 1996. The stringent response.
Pp. 1458-1496 in Escherichia coli and Salmonella: Cellular and Molecular Biology. F.
C. Neidhardt ed. ASM Press, Washington, DC.

Chao, L. and E. C. Cox. 1983. Competition between high and low mutating strains of
Escherichia coli. Evolution 37: 125-134

Christiansen, F. B., S. P. Otto, A. Bergman and M. W. Feldman. 1998. Waiting with and
without recombination: The time to production of a double mutant. Theoretical
Population Biology 53: 199-215

Colegrave, N. 2002. Sex releases the speed limit on evolution. Nature 420:664-666

Conway Morris, S. 2003. Life's Solution. Cambridge University Press, Cambridge

Cooper, T. F., Rozen, D. E. and Lenski, R. E. 2003. Parallel changes in gene expression
after 20,000 generations of evolution in E. coli. Proceedings of the National Academy of
Sciences of the United States of America 100: 1072-1077

Cooper, V. S. and Lenski, R. E. 2000. The population genetics of ecological
specialization in evolving Escherichia coli populations. Nature 407: 736-739

Cooper, V. S., Schneider, D., Blot, M. and Lenski, R. E. 2001. Mechanisms causing rapid
and parallel losses of ribose catabolism in evolving populations of Escherichia coli B.
Journal of Bacteriology 183: 2834-2841

Crandall, K. A., Kelsey, C. R., Imamichi, H., Lane, H. C. and Salzman, N. P. 1999.
Parallel evolution of drug resistance in HIV: failure of nonsynonymous/synonymous
substitution rate ratio to detect selection. Molecular Biology and Evolution 16: 372-382

Crow, J. and M. Kimura. 1965. Evolution in sexual and asexual populations. American
Naturalist 99:439-450

Crozat, E., N. Philippe, R. E. Lenski, J. Geiselmann and D. Schneider. 2004. Long-term
experimental evolution in Escherichia coli. XII. DNA topology as a key target of
selection. Genetics 169:523-53

Dawkins, R. 1976. The Selﬁsh Gene. Oxford University Press, Oxford

Dawkins, R. 1989. The evolution of evolvability. Artificial life: the proceedings of an
interdisciplinary workshop on the synthesis and simulation of living systems. C. G.
Langton ed. Addison-Wesley, Reading, MA. Volume 6 pp. 201-220

de Oliveira, V. M. and P. R. A. Campos. 2004. Dynamics of fixation of advantageous
mutations. Physica A: Statistical Mechanics and its Applications 337:546-554

de Visser, J., C. W. Zeyl, P. J. Gerrish, J. L. Blanchard and R. E. Lenski. 1999.
Diminishing returns from mutation supply rate in asexual populations. Science 283:404-406

de Visser, J. A. G. M., and D. E. Rozen. 2005. Limits to adaptation in asexual
populations. Journal of Evolutionary Biology 18:779-788

Elena, S. F., V. S. Cooper and R. E. Lenski. 1996. Punctuated evolution caused by
selection of rare beneficial mutations. Science 272:1802-1804

Elena, S. F. and Lenski, R. E. 2003. Evolution experiments with microorganisms: the
dynamics and genetic bases of adaptation. Nature Reviews Genetics 4: 457-469

Felsenstein, J. 1974. The evolutionary advantage of recombination. Genetics 78:737-756

Ferea, T. L., Botstein, D., Brown, P. O. and Rosenzweig, R. F. 1999. Systematic
changes in gene expression patterns following adaptive evolution in yeast. Proceedings of
the National Academy of Sciences of the United States of America 96: 9721-9726

Fisher, R. 1922. On the dominance ratio. Proceedings of the Royal Society of Edinburgh
42: 321-431

Fisher, R. 1930. The Genetical Theory of Natural Selection. Clarendon Press, Oxford

Gerrish, P. J. 2001. The rhythm of microbial adaptation. Nature 413:299-302

Gerrish, P. J. and R. E. Lenski. 1998. The fate of competing beneﬁcial mutations in an
asexual population. Genetica 103:127-144

Gillespie, J. H. 1984. Molecular evolution over the mutational landscape. Evolution
38:1116-1129

Gmuender, H., K. Kuratli, K. Di Padova, C. P. Gray, W. Keck and S. Evers. 2001. Gene
expression changes triggered by exposure of Haemophilus influenzae to novobiocin or
ciprofloxacin: Combined transcription and translation analysis. Genome Research 11: 28-42

Goddard, M. R., H. C. J. Godfray and A. Burt. 2005. Sex increases the efﬁcacy of natural
selection in experimental yeast populations. Nature 434:636-640

Gould, S. J. 2002. The Structure of Evolutionary Theory. Harvard University Press,
Cambridge, MA

Grimberg, B. and C. Zeyl. 2005. The effects of sex and mutation rate on adaptation in test
tubes and to mouse hosts by Saccharomyces cerevisiae. Evolution 59:431-438

Gumbel, E. J. 1958. Statistics of Extremes. Columbia University Press, New York

Haldane, J. B. S. 1924. A mathematical theory of natural and artificial selection. Part I.
Transactions of the Cambridge Philosophical Society 23: 19-24

Haldane, J. B. S. 1927. A mathematical theory of natural and artificial selection. Part V:
Selection and Mutation. Proceedings of the Cambridge Philosophical Society 23:838-844

Halder, G., P. Callaerts and W. J. Gehring. 1995. Induction of ectopic eyes by targeted
expression of the eyeless gene in Drosophila. Science 267: 1788-1792

Hermisson, J. and P. S. Pennings. 2005. Soft sweeps: molecular population genetics of
adaptation from standing genetic variation. Genetics 169: 2335-2352

Imhof, M. and C. Schlotterer. 2001. Fitness effects of advantageous mutations in
evolving Escherichia coli populations. Proceedings of the National Academy of Sciences
of the United States of America 98:1113-1117

Jablonski, D. 1987. Heritability at the species level - analysis of geographic ranges of
cretaceous mollusks. Science 238: 360-363.

Jovanovich, S. B. and J. Lebowitz. 1987. Estimation of the effect of coumermycin A1 on
Salmonella typhimurium promoters by using random operon fusions. Journal of
Bacteriology 169: 4431-4435

Johnson, T. and N. H. Barton. 2002. The effect of deleterious alleles on adaptation in
asexual populations. Genetics 162:395-411

Johnson, T. and P. J. Gerrish. 2002. The ﬁxation probability of a beneﬁcial allele in a
population dividing by binary ﬁssion. Genetica 115:283-287

Kazazian, H. H. 2000. Retrotransposons shape the mammalian genome. Science 289:
1152-1153

Kerr, B., M. Riley, M. Feldman and B. J. M. Bohannan. 2002. Local dispersal promotes
biodiversity in a real life game of rock-paper-scissors. Nature 418: 171-17

Kimura, M. 1983. The Neutral Theory of Molecular Evolution. Cambridge University
Press, Cambridge

Kimura, M. and T. Ohta. 1969. The average number of generations until fixation of a
mutant gene in a finite population. Genetics 61:763-771

Kirkup, B. C. and M. A. Riley. 2004. Antibiotic-mediated antagonism leads to a bacterial
game of rock-paper-scissors in vivo. Nature 428: 412-414

Kirschner, M. and J. Gerhart. 1998. Evolvability. Proceedings of the National Academy
of Sciences of the United States of America 95:8420-8427

Lenski, R. E., Rose, M. R., Simpson, S. C. and Tadler, S. C. 1991. Long-term
experimental evolution in Escherichia coli. I. Adaptation and divergence during 2,000
generations. American Naturalist 138: 1315-1341

Lenski, R. E. and Travisano, M. 1994. Dynamics of adaptation and diversification: a
10,000-generation experiment with bacterial populations. Proceedings of the National
Academy of Sciences of the United States of America 91: 6808-6814

Lenski, R. E., Winkworth, C. L. and Riley, M. A. 2003. Rates of DNA sequence
evolution in experimental populations of Escherichia coli during 20,000 generations.
Journal of Molecular Evolution 56: 498-508

Lenski, R. E. 2004. Phenotypic and genomic evolution during a 20,000-generation
experiment with the bacterium Escherichia coli. Plant Breeding Reviews 24: 225-265

Lenski, R. E., C. Ofria, R. T. Pennock, and C. Adami. 2003b. The evolutionary origin of
complex features. Nature 423: 139-144

Littell, R. C., G. A. Milliken, W. W. Stroup and R. D. Wolﬁnger. 1996. SAS System for
Mixed Models. SAS Institute Inc., Cary, NC

Losos, J. B., Jackman, T. R., Larson, A., de Queiroz, K. and Rodriguez-Schettino, L.
1998. Contingency and determinism in replicated adaptive radiations of island lizards.
Science 279: 2115-2118

Mahillon, J. and Chandler, M. 1998. Insertion sequences. Microbiology and Molecular
Biology Reviews. 62: 725-774

Mao, E. F., L. Lane, J. Lee and J. H. Miller. 1997. Proliferation of mutators in a cell
population. Journal of Bacteriology 179:417-422

Matic, I., M. Radman, F. Taddei, B. Picard, C. Doit, E. Bingen, E. Denamur and J. Elion.
1997. Highly variable mutation rates in commensal and pathogenic Escherichia coli.
Science 277:1833-1834

Miralles, R., P. J. Gerrish, A. E. S. Moya and S. F. Elena. 1999. Clonal Interference and
the evolution of RNA Viruses. Science 285:1745-1747

Moxon, E. R., P. B. Rainey, M. A. Nowak and R. E. Lenski. 1994. Adaptive evolution of
highly mutable loci in pathogenic bacteria. Current Biology 4:24-33

Muller, H. J. 1932. Some genetic aspects of sex. American Naturalist 66:118-138

Muller, H. J. 1964. The relation of recombination to mutational advance. Mutation
Research 1:2-9

Notley-McRobb, L. and T. Ferenci. 2000. Experimental analysis of molecular events
during mutational periodic selections in bacterial evolution. Genetics 156:1493-1501

Notley-McRobb, L., S. Seeto and T. Ferenci. 2002. Enrichment and elimination of mutY
mutators in Escherichia coli populations. Genetics 162: 1055-1062

Oliver, A., R. Canton, P. Campo, F. Baquero and J. Blazquez. 2000. High frequency of
hypermutable Pseudomonas aeruginosa in cystic ﬁbrosis lung infection. Science
288:1251-1253

Orr, H. A. 1998. The population genetics of adaptation: The distribution of factors ﬁxed
during adaptive evolution. Evolution 52:935-949

Orr, H. A. 1999. The evolutionary genetics of adaptation: a simulation study. Genetical
Research 74:207-214

Orr, H. A. 2000. The rate of adaptation in asexuals. Genetics 155:961-968

Orr, H. A. 2002. The population genetics of adaptation: The adaptation of DNA
sequences. Evolution 56:1317-1330

Orr, H. A. 2003. The distribution of ﬁtness effects among beneﬁcial mutations. Genetics
163:1519-1526

Papadopoulos, D., D. Schneider, J. Meier-Eiss, W. Arber, R. E. Lenski and M. Blot.
1999. Genomic evolution during a 10,000-generation experiment with bacteria.
Proceedings of the National Academy of Sciences of the United States of America
96:3807-3812

Paquin, C. E. and J. Adams. 1983. Relative ﬁtness can decrease in evolving asexual
populations of S. cerevisiae. Nature 306: 368-371

Peck, J. R. 1994. A ruby in the rubbish: beneficial mutations, deleterious mutations and
the evolution of sex. Genetics 137:597-606

Penfound, T. and Foster, J. W. 1999. NAD-dependent DNA-binding activity of the
bifunctional NadR regulator of Salmonella typhimurium. Journal of Bacteriology 181:
648-655

Plotkin, J. B., and J. Dushoff. 2003. Codon bias and frequency-dependent selection on the
hemagglutinin epitopes of Influenza A virus. Proceedings of the National Academy of
Sciences 100: 7152-7157

Pruss, G. J. and K. Drlica. 1989. DNA supercoiling and prokaryotic transcription. Cell
56:521-523

Queitsch, C., T. A. Sangster, and S. Lindquist. 2002. Hsp90 as a capacitor of phenotypic
variation. Nature 417: 618-624

Rainey, P. B. and Travisano, M. 1998. Adaptive radiation in a heterogeneous
environment. Nature 394: 69-72

Rozen, D. E., J. A. G. M. de Visser and P. J. Gerrish. 2002. Fitness effects of ﬁxed
beneﬁcial mutations in microbial populations. Current Biology 12: 1040-1045

Reid, S. D., Herbelin, C. J., Bumbaugh, A. C., Selander, R. K. and Whittam, T. S. 2000.
Parallel evolution of virulence in pathogenic Escherichia coli. Nature 406: 64-67

Riehle, M. M., Bennett, A. F. and Long, A. D. 2001. Genetic architecture of thermal
adaptation in Escherichia coli. Proceedings of the National Academy of Sciences of the
United States of America 98: 525-530

Rundle, H. D., Nagel, L., Boughman, J. W. and Schluter, D. 2000. Natural selection and
parallel speciation in sympatric sticklebacks. Science 287: 306-308

Rutherford, S. L. 2003. Between genotype and phenotype: protein chaperones and
evolvability. Nature Reviews Genetics 4:263-74

Rutherford, S. L. and S. Lindquist. 1998. Hsp90 as a capacitor for morphological
evolution. Nature 396:336-342

SAS Institute Inc. 1999. SAS/STAT User's Guide, Version 8. SAS Institute, Cary NC

Schneider, D., E. Duperchy, J. Depeyrot, E. Coursange, R. E. Lenski and M. Blot. 2002.
Genomic comparisons among Escherichia coli strains B, K-12, and O157:H7 using IS
elements as molecular markers. BMC Microbiology 2:18 doi:10.1186/1471-2180-2-18

Schneider, D., Duperchy, E., Coursange, E., Lenski, R. E. and Blot, M. 2000. Long-term
experimental evolution in Escherichia coli. IX. Characterization of insertion sequence-
mediated mutations and rearrangements. Genetics 156: 477-488

Shaver, A. C., P. G. Dombrowski, J. Y. Sweeney, T. Treis, R. M. Zappala and P. D.
Sniegowski. 2002. Fitness evolution and the rise of mutator alleles in experimental
Escherichia coli populations. Genetics 162:557-566

Sniegowski, P. D., P. J. Gerrish, T. Johnson and A. Shaver. 2000. The evolution of
mutation rates: separating causes from consequences. Bioessays 22: 1057-1066

Sniegowski, P. D., P. J. Gerrish and R. E. Lenski. 1997. Evolution of high mutation rates
in experimental populations of Escherichia coli. Nature 387:703-705

Sokal, R. R. and F. J. Rohlf. 1995. Biometry. W. H. Freeman and Company, New York

Steck, T. R., R. J. Franco, J. Y. Wang and K. Drlica. 1993. Topoisomerase mutations
affect the relative abundance of many Escherichia coli proteins. Molecular Microbiology
10:473-481

Stewart, C. B., Schilling, J. W. and Wilson, A. C. 1987. Adaptive evolution in the
stomach lysozymes of foregut fermenters. Nature 330: 401-404

Tenaillon, O., F. Taddei, M. Radman and I. Matic. 2001. Second-order selection in
bacterial evolution: selection acting on mutation and recombination rates in the course of
adaptation. Research in Microbiology 152:11-16

Tenaillon, O., B. Toupance, H. Le Nagard, F. Taddei and B. Godelle. 1999. Mutators,
population size, adaptive landscape and the adaptation of asexual populations of bacteria.
Genetics 152:485-493

True, H. L. and S. L. Lindquist. 2000. A yeast prion provides a mechanism for genetic
variation and phenotypic diversity. Nature 407: 477-83

Van Nimwegen, E. and J. P. Crutchﬁeld. 2000. Metastable evolutionary dynamics:
Crossing ﬁtness barriers or escaping via neutral paths? Bulletin of Mathematical Biology
62:799-848

Vasi, F., Travisano, M. and Lenski, R. E. 1994. Long-term experimental evolution in
Escherichia coli. II. Changes in life-history traits during adaptation to a seasonal
environment. American Naturalist 144: 432-456

Pedersen, K. and Gerdes, K. 1999. Multiple hok genes on the chromosome of
Escherichia coli. Molecular Microbiology 32: 1090-1102

Wichman, H. A., Badgett, M. R., Scott, L. A., Boulianne, C. M. and Bull, J. J. 1999.
Different trajectories of parallel evolution during viral adaptation. Science 285: 422-424

Wilke, C. O. 2004. The speed of adaptation in large asexual populations. Genetics 167:
2045-2053

Wilks, S. S. 1938. The large-sample distribution of the likelihood ratio for testing
composite hypotheses. Annals of Mathematical Statistics 9:60-62

Williams, G. C. 1966. Adaptation and Natural Selection. Princeton University Press.

Appendix 1. MATLAB script for simulation and graphing of Muller-style plot.

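% Simulate an asexual population that acquires beneficial mutations and
% draw each generation as one column of a Muller-style plot.
clear F N popnow poptime Graph t sz counter i j k l

N = 1*10^7;      % population size
T = 600;         % number of generations
U = 1*10^-6;     % beneficial mutation rate
S = .025;        % mean effect of a beneficial mutation

figure
hold on
axis([1 T 0 N])
set(gca, 'ytick', [])
set(gca, 'fontsize', 16);

N = [N];         % number of individuals in each genotype class
tot = sum(N);
tot2 = sum(N);
F = [1];         % fitness of each genotype class
popnow = N;
poptime = popnow;

% Restore saved random-number states so that a run can be repeated.
% Uncomment the next two lines on a first run to save the states.
%rndstate = rand('state');
%rndnstate = randn('state');
rand('state', rndstate)
randn('state', rndnstate)

for t = 1:T;
    time(1,t) = t;
    t;
    avfit = sum(N.*F)/tot;
    lambda = N.*F./avfit;          % expected size of each class after selection
    fit(1,t) = avfit;
    N = poissrnd(lambda);          % Poisson-distributed reproduction
    sizeN = size(N);
    gens = sizeN(1,2);
    i = 1;
    l = 1;
    counter = 0;
    % Gaps between successive mutants are exponentially distributed; k is the
    % position of the next mutant, counted through the whole population.
    k = int32(exprnd(1/U)) + 1;
    while k<=tot2;
        % Find the class l that contains individual k, deleting empty classes.
        while sum(N(:,1:l))<k;
            if N(1,l) == 0
                N(:,l) = [];
                F(1,l) = [];
                l = l-1;
            end
            l = l+1;
        end
        s = exprnd(S);             % effect of the new mutation
        sz = N(1,l);
        test = sum(N(1,1:(l)));
        j = double(k-sum(N(1,1:(l-1))));
        gens = size(N);
        % Split class l around the mutant and insert the mutant's fitness.
        N = [N(1,1:(l-1)),(j-1),1,(sz-j),N(1,(l+1):gens(1,2))];
        F = [F(1,1:(l)),F(l)*(1+s),F(1,(l):gens(1,2))];
        k = k + int32(exprnd(1/U));
        l = 1;
        tot2 = sum(N);
    end

    % Draw this generation, shading each class by its fitness.
    Graph = cumsum(N);
    Graph = [Graph' Graph'];
    Graph = [0 0;Graph];
    sizeG = size(Graph);
    C = [F' F';0 0];
    x = 1:1:sizeG(1,1);
    x = x./x;
    x = [x'*t x'*(t+1)];
    pcolor(x,Graph,C)
    shading flat
end

colorbar
set(colorbar,'box','off');
set(colorbar,'FontName', 'Arial');
colormap(gray)
a = get(gca,'Clim');
a(1,1) = 1;
set(gca,'Clim',a);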
