Table 1. Mutations in or near the four candidate loci in the evolving populations after 20,000 generations. Positions are nucleotide positions relative to the start codon of the indicated gene. Populations A-2, A-4, A+3, and A+6 evolved mutator phenotypes, whereas all others retained the low ancestral mutation rate. Δ denotes a 1-bp deletion. ::IS150 indicates insertion of an IS150 element, often associated with a duplication of several bp at the target site.

Gene     Population  Position  Mutation  Amino-acid change
pykF     A+6         208       A->C      Pro -> Thr
pykF     A+3         209       A->C      Pro -> Gln
pykF     A-3         379       G->A      Asp -> Asn
pykF     A+4         483       T->Δ      Frame-shift
pykF     A+3         507       A->G      Synonymous
pykF     A-1         683       ::IS150   Insertion
pykF     A-2         790       A->T      Ile -> Phe
pykF     A+1         901       G->T      Ala -> Ser
pykF     A+5         901       G->T      Ala -> Ser
pykF     A-5         901       G->T      Ala -> Ser
pykF     A-6         901       G->A      Ala -> Thr
pykF     A+2         1142      G->C      Gly -> Ala
pykF     A-4         1385      C->T      Thr -> Ile
nadR     A-5         74        T->A      Leu -> Gln
nadR     A+6         77        A->C      Gln -> Pro
nadR     A+3         103       A->G      Thr -> Ala
nadR     A+1         169       ::IS150   Insertion
nadR     A-2         193       G->A      Glu -> Lys
nadR     A-6         335       A->T      Asp -> Val
nadR     A-1         889       G->A      Gly -> Ser
nadR     A+2         902       A->G      Tyr -> Cys
nadR     A+4         902       A->G      Tyr -> Cys
nadR     A+5         913       G->A      Ala -> Thr
nadR     A-4         961       A->G      Thr -> Ala
nadR     A-3         1031      A->C      Tyr -> Ser
pbpA     A+1         -904      ::IS150   Insertion
pbpA     A-2         -742      C->T      Non-coding
pbpA     A-5         -800      A->T      Non-coding
pbpA     A-2         -873      T->C      Non-coding
pbpA     A-2         141       T->G      Phe -> Leu
pbpA     A+6         748       A->C      Ile -> Leu
pbpA     A-4         761       T->C      Leu -> Pro
pbpA     A-2         1280      G->A      Gly -> Asp
pbpA     A-1         1411      A->T      Ile -> Phe
[hokB]   A+6         397       T->G      Adjacent to ych
[hokB]   A-2         302       G         1-bp insert in ych
hokB     A+6         -69       ::IS150   Insertion
hokB     A-5         -70       ::IS150   Insertion
hokB     A+1         -85       ::IS150   Insertion
hokB     A-3         -147      ::IS150   Insertion

Perhaps most genes accumulated mutations after 20,000 generations, such that finding mutations in a candidate gene in
several or all populations is unremarkable. To address this possibility, we compared the results of sequencing the candidate genes with sequences of randomly chosen gene regions obtained in a previous study (Lenski, Winkworth and Riley, 2003). Thirty-six randomly chosen gene regions of ~500 bp each were sequenced in the ancestor and in two clones randomly sampled from each of the 12 evolving populations at 20,000 generations. The ancestral nucleotide sequences for these regions were deposited in GenBank and given accession numbers AY625099-AY625134. The total length sequenced from each clone was 18,374 bp. A total of 8 mutations was found in the samples; the precise details of each mutation are provided elsewhere (Table 3 in Lenski, Winkworth and Riley, 2003). In those cases where one or both clones had a mutation in a particular gene, that region was sequenced for three more clones from the affected population. Four mutations, including 3 in population A-4 and 1 in population A+3, were present in all five clones. They are counted as substitutions in Table 2 of this paper. The other four mutations were in population A+6, and all were polymorphic, with two present at 80% and two others at 20%. For the analyses in this paper, A+6 is considered to have two substitutions in the random genes, which corresponds to the number of cases in which a mutation reached majority status (and also equals the summed frequency across the four polymorphisms). Thus, we count six substitutions in all in the random genes. Three of these substitutions were synonymous, and the other three were non-synonymous point mutations. Moreover, the substitutions in random genes were all found in mutator populations.

We performed four different statistical tests of the hypothesis that natural selection drove parallel changes in the candidate genes.

Table 2. Number of mutations in random and candidate genes in 12 E. coli populations after 20,000 generations.
Numbers of mutations and substitution rates are pooled across 36 random genes and 4 candidate genes. Populations A-2, A-4, A+3, and A+6 became mutators; all others retained the ancestral mutation rate. *Excluding the mutations (one in A-1, three in A+1) used to identify the candidate genes.

             Random genes (18,374 bp total)   Candidate genes (7,150 bp total)
Population   Number of      Rate per          Number of      Rate per
             mutations      1000 bp           mutations      1000 bp
A-1          0              0.000             3              0.420
             "              "                 2*             0.280
A-2          0              0.000             6              0.839
A-3          0              0.000             3              0.420
A-4          3              0.163             3              0.420
A-5          0              0.000             4              0.559
A-6          0              0.000             2              0.280
A+1          0              0.000             4              0.559
             "              "                 1*             0.140
A+2          0              0.000             2              0.280
A+3          1              0.054             3              0.420
A+4          0              0.000             2              0.280
A+5          0              0.000             2              0.280
A+6          2              0.109             4              0.559

First, we compared overall substitution rates for the candidate and random genes, with an expectation of a higher rate in the candidates because they accumulated substitutions by selection as well as drift. All 12 populations had higher substitution rates in the candidate genes (Table 2), which is highly unlikely by chance (sign test, p = 0.0002). This result is unaffected by excluding the insertions used to identify the candidates in populations A-1 and A+1; both had substitutions in one or more genes whose candidacy was identified in the other population, and neither had any substitutions in the random genes.

Second, if mutations in the candidate genes were beneficial, then we expect an excess of non-synonymous substitutions. For this statistical test, shown in Table 3A, only point mutations in protein-coding regions were included, with 27 such mutations in the candidate genes and 6 in the random genes. Three of six point mutations in random genes were synonymous, but only 1 of 27 was synonymous in the candidate loci (Table 3A). This difference is significant (Fisher’s exact test, p = 0.0136) and supports the hypothesis that the mutations substituted in the candidate genes were beneficial.
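As a check on the first two tests, the reported p-values can be recomputed directly from the counts in Tables 2 and 3A. The sketch below is illustrative and is not the analysis code used for this study: it assumes one-sided tests, and the function names sign_test_p and fisher_one_sided are hypothetical.

```python
from math import comb

def sign_test_p(successes, trials):
    """One-sided sign test: P(at least `successes` of `trials`) under a fair coin."""
    return sum(comb(trials, k) for k in range(successes, trials + 1)) / 2 ** trials

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns the hypergeometric probability that the top-left cell is
    a or smaller, with all row and column totals held fixed.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)
    lowest = max(0, col1 - row2)  # smallest top-left count the margins allow
    tail = sum(comb(row1, k) * comb(row2, col1 - k) for k in range(lowest, a + 1))
    return tail / total

# Test 1: all 12 populations have the higher rate in the candidate genes.
p_sign = sign_test_p(12, 12)
# Test 2 (Table 3A): candidate 1 synonymous, 26 non-synonymous; random 3 and 3.
p_3a = fisher_one_sided(1, 26, 3, 3)
print(f"sign test p = {p_sign:.4f}, Fisher p = {p_3a:.4f}")
```

Under these assumptions the two values round to 0.0002 and 0.0136, in agreement with the figures quoted above.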
An additional 3 non-point mutations occurred in the protein-coding regions of the candidate genes, including 2 IS insertions and 1 one-bp deletion; these mutations could be regarded as non-synonymous because they change the resulting amino-acid sequence. If these 3 additional mutations are included in the analysis, the outcome remains significant (p = 0.0104).

Third, theory predicts that the substitution rate for neutral mutations should scale with the mutation rate (Kimura, 1983), whereas the substitution rate for beneficial mutations is subject to diminishing returns in large asexual populations (Gerrish and Lenski, 1998; de Visser et al., 1999; Wilke, 2004). Four populations became mutators and had mutation rates ~100-fold higher than the other populations. Only half the 30 point mutations in candidate genes were in mutator populations, whereas all six mutations in random genes were in mutators (Table 3B). This difference is significant in the direction expected if candidates experienced selection favoring new alleles (Fisher’s exact test, p = 0.0279). The statistical test shown in Table 3B includes 30 point mutations in the candidate genes and 6 point mutations in the random genes. There were 8 additional mutations in candidate genes, including 7 IS insertions and 1 one-bp deletion; 1 IS insertion was in a mutator population, and all others were in non-mutators. If these 8 additional mutations are included in this analysis, the outcome remains significant (p = 0.0211).

A            Synonymous   Non-synonymous
Candidate    1            26
Random       3            3

B            Mutator      Non-mutator
Candidate    15           15
Random       6            0

Table 3.
Candidate and random genes differ in relative abundance of (A) synonymous and non-synonymous point mutations, and (B) point mutations in mutator and non-mutator populations.

Fourth, if mutations in the candidates were neutral, then the numbers of substitutions in the populations should follow a Poisson distribution (if all populations had the same mutation rate) or be clumped in certain populations (given heterogeneity in mutation rates). By contrast, if the mutations were beneficial, and if different mutations in the gene conferred the same benefit, then one would expect a more uniform distribution of mutations. Unlike the first three tests, this test is independent of the forces affecting the random genes. The distributions are the most uniform possible given the numbers of mutations in two of the candidate genes (Figure 2). For nadR, there were 12 substitutions, with each population having exactly one; the likelihood of this distribution by chance is $12!/12^{12} \approx 5.4 \times 10^{-5}$, which is < 0.0001. For pykF, the chance of 11 populations having one mutation and one having two mutations is 0.0004. Moreover, these calculations are very conservative because the four mutator populations should push strongly away from uniformity, making the observed distributions even more unexpected. The other two candidates do not deviate significantly from the Poisson distribution, but that may reflect the smaller numbers of mutations in those genes and the conservative nature of this test.

We chose both candidate and non-candidate genes a priori, as explained. The two mutations found in or near the ych gene do not fit into either category, and they were excluded from our analyses on that basis alone. It is possible that ych is related to hokB/sokB, given its proximity and unknown function, but it is also possible that these loci have nothing to do with each other. If we included ych with the random genes, it would not weaken any of our four tests and would, in fact, strengthen one of them.
The first test (Table 2) compares the density of mutations found in random and candidate genes; adding one random mutation each to A-2 and A+6 would not change the fact that 12 of 12 populations have higher density in the candidate genes. The second test (Table 3A) is unaffected, because neither ych mutation counts as synonymous or non-synonymous; one was outside the coding region and the other was not a point mutation.

Figure 2. Distribution of numbers of substitutions in the 12 populations for the four candidate genes. Observed distributions are shaded; Poisson distributions with the same mean as the observed distribution are shown in outline.
[Figure 2 panels, including pbpA-rodA: number of populations (vertical axis) against number of mutations per population (horizontal axis: 0, 1, 2, 3+).]

The third test (Table 3B) compares the four mutator and eight non-mutator populations; both ych mutations were found in mutator populations, and including ych as a random gene would strengthen that already significant result. Finally, the fourth test (Figure 2) would be unaffected because non-candidate genes do not enter into the analysis. If we instead included ych as part of the hokB/sokB candidate genes, the third test would be slightly weakened but remain significant, while the other tests would not be affected.

DISCUSSION

The four tests individually and collectively support the hypothesis that parallel evolution in the candidate genes was driven by natural selection favoring the mutant alleles. The first two tests are also consistent with the alternative hypothesis that the candidate genes had relaxed selective constraints and could thus accumulate mutations without adverse effects. The third and fourth tests clearly reject this alternative because, if it were true, mutator populations should accumulate more substitutions and numbers would not be uniform across populations. Another alternative is that the candidate loci contain ‘hot spots’ with mutation rates much higher than the genome average.
This alternative also runs counter to the test comparing mutator and non-mutator populations, unless one further supposes that the sites are independent of the DNA repair pathways that are defective in the mutators; but substitutions in the candidates included transitions and transversions as well as the original insertions, whereas substitutions in the random genes occurred only in the mutators and all had signatures reflecting defects in DNA repair (Lenski, Winkworth and Riley, 2003). The several IS insertions in hokB/sokB suggest an increased rate of those mutations at that locus, but such a bias, if it exists, does not contradict the possibility that the substitutions were adaptive (Cooper et al., 2001).

The four tests collectively provide compelling evidence that the mutations that were substituted in the four candidate genes are beneficial in the environment employed in the evolution experiment. We do not know, however, the functional bases for their beneficial effects. At first glance, the fact that the four candidate genes were first identified by IS-element insertions in the focal populations might suggest that the beneficial mutations are knockouts. However, a more nuanced view is appropriate for several reasons. First, most mutations found by sequencing these genes in the other populations are unlikely to act as knockouts, with the probable exception of the several IS insertions in hokB/sokB and one frameshift mutation in pykF (Table 3). Thus, even if the originally discovered mutation in a gene were a knockout, some of the other populations may have substituted mutations with more subtle effects. Second, the original IS insertion affecting pbpA-rodA was not in the reading frame but, instead, in the upstream regulatory region, where IS elements can exert more subtle effects on gene expression (Mahillon and Chandler, 1998).
Third, in the case of nadR, the affected protein is bifunctional with both repressor and transport domains (Penfound and Foster, 1999; Schneider et al., 2000). A knockout of one function could leave the other function intact; and, in the case of the repressor function, even a knockout would cause elevated expression of the de-repressed genes.

The following hypotheses suggest how mutations in the candidate genes might enhance fitness in the long-term experiment (Schneider et al., 2000). The pykF gene encodes one of two pyruvate kinases that catalyze the conversion of phosphoenolpyruvate (PEP) and ADP into pyruvate and ATP. PEP is also used by the phosphotransferase system (PTS) to transport glucose into the cell. By slowing the former conversion of PEP to pyruvate, mutations in pykF might make more PEP available to drive the PTS-mediated uptake of glucose, which is limiting in the environment of the long-term evolution experiment. As noted, nadR encodes a bifunctional protein that is involved in several aspects of NAD metabolism, itself a key metabolite in many different pathways. Several genes involved in NAD synthesis and recycling are repressed by the NadR protein, and mutations in nadR might increase their expression and the resulting intracellular concentration of NAD. The evolved bacteria have higher maximum growth rates as well as shorter lags following the daily transfer into fresh medium (Vasi, Travisano and Lenski, 1994), and increased levels of NAD might be required to achieve one or both of these advantages. The hokB/sokB locus is one of several loci in E. coli related to the hok-sok locus of plasmid R1; hok encodes a toxin and sok an anti-sense RNA that blocks translation of the toxin. Together, these activities kill bacteria that lose the plasmid, a function that might benefit the plasmid but is obviously harmful to the bacteria.
Inactivation of hokB/sokB would therefore benefit the bacteria (in the absence of plasmids), and indeed other copies of hok/sok loci in E. coli contain IS insertions (Pedersen and Gerdes, 1999; Schneider et al., 2002). The pbpA-rodA operon encodes two essential proteins involved with peptidoglycan synthesis and coupling of cell-wall synthesis to the cell cycle (Begg and Donachie, 1998). All 12 populations evolved much larger cells (Lenski and Travisano, 1994), which may require more peptidoglycan synthesis or changes in the timing of its synthesis in relation to the cell cycle.

Our results demonstrate that evolution in these E. coli populations was often parallel at the level of genes, but only rarely were substitutions identical at the base-pair level. This latter point contrasts with results obtained in replicate populations of virus φX174, where almost half the substitutions were identical (Wichman et al., 1999). We also observed variation in the extent of parallelism among the bacterial candidate genes, with pbpA and hokB/sokB showing less evolutionary repeatability than nadR and pykF. Future work, including additional generations and genetic manipulations, may reveal whether these gene-level differences between populations will eventually disappear or, alternatively, whether epistatic interactions among the mutations will sustain their divergence indefinitely.

CHAPTER 2

POPULATION DYNAMICS OF BENEFICIAL MUTATIONS AND THE THEORY OF CLONAL INTERFERENCE

ABSTRACT

When multiple beneficial mutations arise independently and co-occur in an asexual population, one and typically only one will persist in the long run. The competition between asexual clones has been termed clonal interference. Here I use simulations to investigate the population dynamics of adaptation under conditions of large population size and high beneficial mutation rates. Multiple lineages often take multiple adaptive steps before one out-competes all others.
Even in such situations mean fitness typically increases in a step-like manner. This process increases the diversity within evolving populations and allows populations to explore many genetic possibilities that do not become fixed in the population. This process also affects the probability of fixation of alleles that differ in their subsequent potential to adapt, but not in their immediate fitness.

INTRODUCTION

In the absence of genetic recombination, adaptation requires that the mutations that fix occur along a single line of descent. Beneficial mutations that occur elsewhere in the population will eventually be eliminated, but may, in the meantime, slow the rate of fixation of the mutations that do fix (Fisher 1930; Muller 1932; Crow and Kimura 1965; Gerrish and Lenski 1998; Campos and de Oliveira 2004; Wilke 2004). Muller (1932) described this process thus: “Without sexual reproduction, the various favorable mutations that occur must simply compete with each other, and either divide the field among themselves or crowd each other out till but the best adapted for the given condition remains” (Muller 1932, p. 121). Sexual populations, on the other hand, can recombine independently arising beneficial mutations into the same genome. The resulting increase in the rate of adaptation for sexual populations is often called the Fisher-Muller hypothesis for the advantage of sex (Fisher, 1930; Muller, 1932).

Describing the within-population dynamics of adaptation in asexual populations is a particularly difficult problem. In part, the difficulty is due to the many unknown parameters, such as the fraction of mutations that are beneficial, the distribution of mutational effects, and the manner in which mutations interact with each other (epistasis). In addition, the fixation probability of an allele depends both on the genetic background into which it happens to fall and on competition with other genotypes present in the population.
For cases of complete linkage this competition between clones has been termed clonal interference (Gerrish and Lenski, 1998). The concept of clonal interference was defined by Gerrish and Lenski (1998) as the “phenomenon, whereby the fate of an original beneficial mutation is altered by the appearance of a superior alternative mutation” (Gerrish and Lenski, 1998, p. 128). Both the etymological and the conceptual foundations for ‘clonal interference’ were established by Muller (1932, 1964), for example when he wrote “. . .mutual interference with one another’s multiplication exerted by two or more different lines of advantageous mutants in asexual populations” (Muller 1964, p. 5). Recently, the term clonal interference has been used in this general way, without temporal or numerical limits (Miralles et al., 1999; Campos and de Oliveira, 2004; Wilke 2004). Likewise, the term ‘theory of clonal interference’ has been used to refer to the body of conceptual work that is aimed at understanding the dynamics involved (Wilke 2004).

Recent analytical models have advanced our understanding of adaptation in asexual populations. Wilke (2004) presented a closed-form equation for the rate of adaptation in large asexual populations, building on previous work (Gerrish and Lenski, 1998; Orr 2000). Wilke’s model gives the rate of substitution, the expected size of mutations that will reach fixation, and the overall rate of adaptation (change in fitness over time) for large asexual populations. Gerrish (2001) also derived the temporal variance in the rate of substitution in large asexual populations. Hermisson and Pennings (2005) developed a single-locus model to look at the within-population dynamics by following multiple independently derived copies of a selected allele.
They concluded that high mutation rates or large selective coefficients could lead to multiple copies of an allele sweeping a population together, a phenomenon they called a “soft sweep” (Hermisson and Pennings, 2005).

Empirical evidence for the within-population dynamics of adaptation has come from experimental evolution with bacteria. Atwood, Schneider and Ryan (1951a,b) presented what they called the ‘theory of periodic selection’ using experimental populations of Escherichia coli. Here occasional beneficial mutations sweep a population, eliminating nearly all genetic variation and resetting the genetic background of the population. It is clear from the work of Atwood et al. (1951a,b) that they perceived the selective sweeps as discrete events, with a single beneficial mutation dominating the population at each time point.

More recent experiments indicate that the population dynamics may be more complex. Imhof and Schlotterer (2001) used a rapidly mutating neutral locus to track multiple lineages within a population of E. coli, showing that many independently arising beneficial mutations coexist transiently within a single evolving population. Notley-McRobb and Ferenci (2000) studied the adaptive population dynamics in continuous culture of E. coli. They showed that at least 13 different mutations at the mgl locus spread through one population together, creating a phenotypic sweep, but not a genetic sweep (hence an example of Hermisson and Pennings’ “soft sweep” (2005)). Likewise, a later adaptive step in a similar population included at least 15 mlc alleles (Notley-McRobb and Ferenci 2000). However, it is not clear from their data whether independent subpopulations coexisted through multiple adaptive steps. The mutations seen at both the mgl and mlc loci in that study were knockout mutations having high mutation rates, and given the large population sizes used, were expected to occur in ~10“ bacteria at each round of replication.
Also, in two serial-transfer long-term E. coli cultures, Papadopoulos et al. (1999) showed that many different insertion-sequence-associated mutations coexisted, at least some of which were likely to be beneficial (Schneider et al. 2000, Chapter 1).

Two experiments have shown that, within a single population, multiple subpopulations may accumulate multiple beneficial mutations before one finally out-competes all others. Mao et al. (1997) showed that, in populations challenged with a series of hard selection regimes, multiple lineages coexisted through multiple steps. In that experiment, genotypes with increased mutation rates steadily displaced the genotypes with lower mutation rates across multiple adaptive steps. Shaver et al. (2002) looked at three long-term E. coli populations that evolved increased mutation rates. They saw that, while the mutator phenotype was polymorphic in the population, it became progressively more fit through time, suggesting that the lineage having the mutator phenotype acquired several additional beneficial mutations before it fixed in the population.

Tenaillon et al. (1999) carried out a simulation study of asexual populations. They simulated asexual populations over a large range of population sizes, mutation rates, and adaptive landscapes (i.e., the number of mutations, the effect of each mutation, and the variation in effects were varied) to address the effect of these factors on the fixation of mutator phenotypes. Importantly, they suggest that a mutator phenotype (i.e., an increased overall genomic mutation rate) is likely to become common when beneficial mutations are common, and that the mutator phenotype is likely to fix because it can accumulate several mutations faster than the non-mutator population (Tenaillon et al., 1999).
The studies described above provide evidence that within-population dynamics of asexual populations are complex, and have implications for selection acting indirectly on phenotypes that generate more or less selectable variation (i.e., evolvability). Nonetheless, a detailed understanding of the within-population dynamics, described initially by Fisher and Muller, is still missing. For instance, how many independent beneficial mutations will arise in an asexual population before the ancestral genotype is eliminated through selection? How long will the various subpopulations coexist and compete with each other before one out-competes all others? What, if any, consequences will there be for the types of traits that are selected by this clonal interference regime, especially with respect to evolvability? With a lack of appropriate analytical models we must turn instead to numerical simulations of evolving populations to explore the behavior of asexual populations in a variety of different situations.

METHODS

Simulations. Populations consisted of N individuals and were propagated in discrete generations with stochastic reproduction and mutation. An infinite sites approach was taken, such that back mutations were not considered and no two mutations were identical. With this approach a genotype is a genetically and phylogenetically homogeneous and distinct group. Reproduction was achieved by choosing the size of each genotype, $n_i$, at the next time step from a Poisson distribution with mean $n_i(t+1) = n_i(t)\,(\omega_i/\bar{\omega})$, where $\omega_i$ is the fitness of genotype $i$ and $\bar{\omega}$ is the mean population fitness; a further adjustment is made so the expected total number of individuals each generation is held constant at N. Genotypes with more than 1000 individuals were replicated deterministically with the same expectation, in order to decrease the time of the simulations.
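The reproduction rule just described can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the simulation program used for this chapter: the names poisson, next_generation, and DETERMINISTIC_ABOVE are hypothetical, deterministic replication is taken to mean rounding the expectation to the nearest integer, and the Poisson sampler is a simple standard-library stand-in.

```python
import math
import random

DETERMINISTIC_ABOVE = 1000  # genotypes larger than this skip the Poisson draw

def poisson(lam, rng):
    """Draw a Poisson variate by Knuth's method, in chunks to avoid underflow."""
    total = 0
    while lam > 0:
        chunk = min(lam, 50.0)
        limit, p = math.exp(-chunk), rng.random()
        while p > limit:  # count uniform draws until the product falls to exp(-chunk)
            total += 1
            p *= rng.random()
        lam -= chunk
    return total

def next_generation(sizes, fitness, pop_size, rng):
    """Return the genotype sizes one generation later."""
    current_total = sum(sizes)
    mean_w = sum(n * w for n, w in zip(sizes, fitness)) / current_total
    # Assumed form of the "further adjustment": rescale expectations to sum to pop_size.
    scale = pop_size / current_total
    new_sizes = []
    for n, w in zip(sizes, fitness):
        expected = n * (w / mean_w) * scale
        if n > DETERMINISTIC_ABOVE:
            new_sizes.append(round(expected))          # deterministic replication
        else:
            new_sizes.append(poisson(expected, rng))   # stochastic replication
    return new_sizes

rng = random.Random(1)
print(next_generation([6000, 4000], [1.0, 1.1], 10000, rng))
```

Both genotypes in the usage line exceed the threshold, so the result is deterministic: the fitter genotype grows from 4000 to 4231 individuals while the other shrinks to 5769.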
All mutations that were introduced were beneficial; therefore, these simulations do not consider the additional complexities of deleterious mutations. Thus this model is more reasonable for organisms with a low genomic mutation rate (Gerrish and Lenski, 1998), such as Escherichia coli (Lenski, Winkworth and Riley, 2003). A genotype with $n_i$ individuals would generate $m$ mutant individuals, where $m$ is chosen from a Poisson distribution with a mean $\bar{m} = n_i \times U$, and $U$ is the beneficial mutation rate per individual per generation. Mutant individuals inherited all of the mutations of their parent genotype, plus one new one. The entire phylogeny of genotypes was saved, which allowed for the assessment of the most recent common ancestor (MRCA). The source code is available upon request from the author.

An additional program was written to run in MATLAB. This program was considerably simplified by only maintaining the present population. It was still in discrete time and used the same rules for mutation and reproduction. This simplified version was used only for the case study described below (Figure 4), because it permitted graphing. The code is presented in Appendix 1.

The time required for these simulations is primarily determined by the mutation supply (S), which equals the product of beneficial mutation rate and population size. When the mutation supply is very low (e.g. <10”) it takes a long time for the population to adapt. When the mutation supply is very large (e.g. >$10^{2}$) the large number of genotypes created increases the computational time because each genotype is replicated independently. As a consequence of these considerations the simulations were run over a range of mutation supply rates instead of mutation rates. The same mutation supply, achieved by different combinations of mutation rate and population size, can give quite different results.
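The bookkeeping implied by a saved phylogeny can be illustrated as follows. This is a sketch only: the dictionary layout and the function names lineage, mrca, and add_mutant are assumptions made for illustration and are not taken from the author's program.

```python
def lineage(parent, genotype):
    """Return the path from the root genotype down to `genotype`."""
    path = []
    while genotype is not None:
        path.append(genotype)
        genotype = parent[genotype]
    return path[::-1]

def mrca(parent, extant):
    """Most recent common ancestor of all genotypes listed in `extant`."""
    paths = [lineage(parent, g) for g in extant]
    ancestor = None
    for step in zip(*paths):    # walk down from the root in parallel
        if len(set(step)) > 1:  # the lineages diverge at this depth
            break
        ancestor = step[0]
    return ancestor

def add_mutant(parent, mutations, parent_id, new_id):
    """Register a mutant that carries all parental mutations plus one new one."""
    parent[new_id] = parent_id
    mutations[new_id] = mutations[parent_id] + 1

# Genotype 0 is the ancestor; 1 and 2 arise from it; 3 and 4 arise from 1.
parent, mutations = {0: None}, {0: 0}
for parent_id, new_id in [(0, 1), (0, 2), (1, 3), (1, 4)]:
    add_mutant(parent, mutations, parent_id, new_id)

print(mrca(parent, [2, 3, 4]), mrca(parent, [3, 4]))
```

While genotype 2 is still present the MRCA is the ancestor (0); once only genotypes 3 and 4 remain it shifts to genotype 1, which is the kind of event counted as a shift in the MRCA.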
The range of mutation rates and population sizes results in very different rates of adaptation and very different time scales of interest. To accommodate this fact, I kept track of the shifts in the most recent common ancestor (MRCA) of the population (i.e., the MRCA of the population at a given generation is different from the MRCA of the population at the preceding generation). Each simulation started with N individuals of a single genotype. To avoid artifacts that might be introduced by the homogeneous initial conditions, I began calculating the various statistics, which will be described below, from the third time the MRCA shifted and ended on the 13th. Keeping time in number of shifts in MRCA ensures that, on the low end of mutation rates, multiple substitutions are indeed measured. On the high end of mutation rates, it ensures that there are true allelic fixation events. Following the 13th shift in the MRCA, the population simulations were allowed to continue until every mutation that arose in that window of time was either eliminated or fixed.

RESULTS

A case study. In his seminal paper, Muller (1932) introduced a figure (copied below as Figure 3) that is particularly useful for understanding the spread of clones in asexual populations. In that figure, time is represented along one dimension and the widths of shaded regions, in the orthogonal dimension, show the sizes of various subpopulations. However, it is evident from close examination that Muller’s figure is an illustrative schema rather than an actual simulation. By contrast, Figure 4 represents an actual simulation (the MATLAB simulation). The population size in this simulation was $10^{7}$. Beneficial mutations happened stochastically at a rate of $10^{-5}$ per individual per generation. The fitness effects of the mutations were drawn from an exponential distribution with a mean of 0.025 (i.e., a 2.5% advantage).
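To give a sense of scale for these parameters, the short sketch below computes the implied mutation supply and draws a sample of fitness effects. It is illustrative only: the constant and function names are hypothetical, and the mutation rate is read here as 10^-5 per individual per generation, which is an assumption about the intended value.

```python
import random

POP_SIZE = 10**7       # N in the case study
MUTATION_RATE = 1e-5   # U, assumed value of the beneficial mutation rate
MEAN_EFFECT = 0.025    # mean selective advantage (2.5%)

# Expected number of new beneficial mutants arising each generation.
mutation_supply = POP_SIZE * MUTATION_RATE

def draw_effects(count, rng):
    """Draw `count` selective advantages from an exponential distribution."""
    return [rng.expovariate(1.0 / MEAN_EFFECT) for _ in range(count)]

rng = random.Random(0)
effects = draw_effects(10_000, rng)
print(f"supply = {mutation_supply:.0f} per generation, "
      f"sample mean effect = {sum(effects) / len(effects):.4f}")
```

With these values roughly 100 new beneficial mutants arise every generation, which is why many lineages are present at once in the case study.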
A close examination of Figure 4 illustrates several of the main points of this paper, which will subsequently be backed up with extensive simulations. First, multiple single mutations arose on the ancestral genetic background and achieved significant proportions in the population before any one had a chance to sweep the population. I will

Figure 3. Reproduction of figure 1 from Muller (1932) that depicts the spread of beneficial mutations in asexual and sexual populations.
[Text reproduced with the figure: EVOLUTIONARY SPREAD OF ADVANTAGEOUS MUTATIONS IN ASEXUAL REPRODUCTION; IN SEXUAL REPRODUCTION. DIAGRAM 1. Showing the method of spreading of advantageous mutations in asexual and sexual organisms, respectively. Time is here the vertical dimension, progressing downwards. In the horizontal dimension a given population, stationary in total numbers, is represented. Sections of the population bearing advantageous mutant genes are darkened, proportionally to the number of such genes. In asexual organisms these genes compete and hinder one another’s spread; in sexual organisms they spread through one another. See, however, qualifications in text (p. 121), explaining limitations of a diagram in only two dimensions. The diagram is simplified in a number of other ways as well. For example, all mutants represented are shown as spreading at nearly the same rate, if they do spread, and this rate is shown as about the same regardless of the extent to which they have entered into combination with one another.]

Figure 4. Results of a single simulation plotted in the style of Muller (1932). In this simulation $N = 10^{7}$, $U = 10^{-5}$, and mean $s = 0.025$ (data from the MATLAB simulation). Each shaded region represents a clone with a beneficial mutation (colored according to fitness), and the vertical dimension indicates the frequency in the population. Each subpopulation is placed within the clonal lineage that gave rise to it. The solid blue line indicates the average fitness of the population.
Note the step-like increases in average fitness even in the absence of any allelic fixation. The scale bar to the right indicates the fitness represented by various shades. Images in this dissertation are presented in color. _a ._L —L ._A .5 ' '01 '0) 'xi '00 r '- average fitness _L l\) (A) A 1- 181'” 1.1 - s a I, 1 g ‘ 0-9‘ l 0.8 . . ' l 100 200 300 400 500 time (generations) 28 refer to this group of genotypes collectively as a cohort. The single mutations confer different fitness effects. The one conferring the highest fitness is increasing its proportion relative to the others, whereas all of them are increasing relative to the common ancestor. The dynamics of the increase in frequency of beneficial mutations helps to explain the presence of multiple subpopulations. Much of the time a beneficial allele is polymorphic in a population it is at very low frequency. For example, a mutation with 5% fitness advantage sweeping a population without interference will take, on average, 217 generations to go from a frequency of one in 10‘5 to one in 20, but just 117 generations to go from 5% to 95% (Haldane, 1924). This lag, between the time a beneficial mutation occurs and the time until it appreciably influences the mean population fitness, creates a window when the various single mutant subpopulations are each growing almost independently of each other and the mean population fitness is still very close to the fitness of the common ancestral genotype. Only after one or more of the subpopulations causes a substantial decline in the ancestral genotype do the various single mutations compete appreciably amongst themselves. The second point to be made from Figure 4 is that the most fit single mutation may not fix before double mutations become common, nor even will it necessarily prevail in the long run. It is evident from Figure 4 that subpopulations with two mutations arose and became common before the most fit of the single mutation genotypes eliminated all others. 
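The sweep times attributed to Haldane (1924) in this subsection can be checked with a short calculation. The sketch below assumes deterministic, continuous-time logistic growth, in which the log-odds of the mutant's frequency increase by s per generation. It also assumes a starting frequency of one in 10^6, the value that reproduces the quoted 217 generations. The helper name `sweep_time` is hypothetical.

```python
import math

def sweep_time(p0, p1, s):
    """Generations needed to go from frequency p0 to p1 with advantage s,
    assuming the log-odds of the frequency rise linearly at rate s."""
    def log_odds(p):
        return math.log(p / (1.0 - p))
    return (log_odds(p1) - log_odds(p0)) / s

rare_phase = sweep_time(1e-6, 0.05, 0.05)    # one in 10^6 up to one in 20
common_phase = sweep_time(0.05, 0.95, 0.05)  # 5% up to 95%
print(round(rare_phase, 1), round(common_phase, 1))
```

Under these assumptions the rare phase takes roughly 217 generations and the common phase a little under 118. The mutation therefore spends about two thirds of its sweep at frequencies below 5%, which is the lag discussed in the text.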
Multiple lineages thus co-existed through multiple adaptive steps.

Stair-step adaptation. When the beneficial mutation rate is low, mean population fitness is expected to increase in a step-like manner as rare beneficial mutations sweep through the population (Lenski et al., 1991). As more mutations with variable fitness arise at various times and on different genetic backgrounds, increases in mean population fitness might be expected to become more irregular. In Figure 4, however, the fitness increases still take on a stair-step appearance (see also Gerrish and Lenski 1998). This is not an intuitive result. With a high beneficial mutation rate there may be many subpopulations present and in competition with each other. These subpopulations have different fitness values, and may be actively displacing each other during the nearly flat part of the step. In Figure 4 the average fitness indicates that there is little adaptation between 400 and 500 generations. Yet we can see that during that time there were six major subpopulations, and the more fit were actively displacing the less fit. The differences in fitness among genotypes within a cohort, however, are subtle compared to the differences between cohorts.

The step-like increase in average fitness occurs because the fitness difference between successive cohorts of genotypes carrying beneficial mutations is large relative to the difference within each cohort. One explanation for this may come from the statistics of orders (Arnold et al. 1992). Looking within a single cohort, only the few most fit of the mutations that arise in that cohort will be competitive. When mutations are drawn from an exponential distribution, the expected spacing among the top mutations does not change as more mutations are sampled (i.e., the expected distance between the best and the second best is the same no matter how many mutations are sampled).
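The order-statistics property invoked here can be written out as a worked statement. It is a standard result for exponential samples and not something derived in this chapter. Assume n mutational effects X_1, ..., X_n drawn independently from an exponential distribution with mean \bar{s}, and write X_{(n-1)} and X_{(n)} for the second-largest and largest draws. The symbols n, \bar{s} and X_{(k)} are notation introduced only for this illustration.

```latex
% Ordered effects: X_{(1)} \le X_{(2)} \le \dots \le X_{(n)}
\begin{align}
  \mathrm{E}\!\left[X_{(n)} - X_{(n-1)}\right] &= \bar{s}
      && \text{(gap between best and second best; independent of } n\text{)} \\
  \mathrm{E}\!\left[X_{(n)}\right] &= \bar{s}\sum_{k=1}^{n}\frac{1}{k}
      \;\approx\; \bar{s}\,(\ln n + 0.577)
      && \text{(best effect; grows with } \ln n\text{)}
\end{align}
```

For example, with \bar{s} = 0.01 the expected best of 10 mutations is about 0.029 and the expected best of 1000 is about 0.075, while the expected gap to the runner-up stays at 0.01 in both cases.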
The mean fitness effect of these top mutations, though, increases logarithmically (Arnold et al. 1992). As a result, the strength of selection within a cohort remains the same while the strength of selection between cohorts increases with increasing mutation supply.

Simulations over a broad parameter range. I now move from the case study to a systematic evaluation across a range of population sizes (N) and beneficial mutation supply rates (S). It has been shown previously that clonal interference increases with increasing population size, increasing mutation rate, and decreasing size of mutational effects (Muller 1964; Crow and Kimura 1965; Gerrish and Lenski 1998; Campos and de Oliveira 2004; de Oliveira and Campos 2004; Wilke 2004). I extend these results by examining the effect of these parameters on within-population dynamics. I will do this by focusing on the 'side branches,' that is, the subpopulations containing mutations that do not fix. Figure 4 showed that multiple subpopulations could accumulate multiple beneficial mutations before any one genotype eventually out-competes all others. I will examine the number of lineages, the number of adaptive steps through which they co-exist, and the proportion of the total population that these side branches take up.

One set of simulations is presented in detail. Generalizations about other parameters will be mentioned in a following section. This set of simulations covers population sizes from 10^4 to 10^9, and mutation supply rates (the product of mutation rate and population size) from 10^-3 to 10^1 beneficial mutations per population per generation. Note that the use of mutation supply rate allows for the simple calculation of mutation rate (U), as U = S/N. However, since the same range of S was used for each N, a different range of U was used for each N (see Methods).
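The relationship U = S/N can be made concrete by enumerating the parameter grid. The sketch below assumes decade steps in both N and S, as the figure axes suggest. The variable names are hypothetical.

```python
# Population sizes N = 10^4 ... 10^9 and supply rates S = 10^-3 ... 10^1,
# assuming one value per decade.
population_sizes = [10**k for k in range(4, 10)]
supply_rates = [10.0**k for k in range(-3, 2)]

# Per-individual beneficial mutation rate implied by each (N, S) pair.
mutation_rate = {(n, s): s / n for n in population_sizes for s in supply_rates}

for n in population_sizes:
    rates = [mutation_rate[(n, s)] for s in supply_rates]
    print(f"N = {n:.0e}: U from {min(rates):.0e} to {max(rates):.0e}")
```

Under this assumption there are 30 parameter combinations, and U ranges from 10^-12 (N = 10^9, S = 10^-3) to 10^-3 (N = 10^4, S = 10).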
The fitness effects of mutations were drawn from a gamma distribution with a shape parameter of 1 (making it equivalent to an exponential distribution) and a scale parameter of 0.01. Thus, the mean benefit of mutations was 1%. For each combination of mutation supply and population size, 1000 simulations were performed.

Effect of N and S on the number of interfering mutations. I am interested in the proportion of the population that has at least one beneficial mutation that does not fix. In one sense, these additional mutations constitute a load on the population. The mutations that do not fix are slowing down the rate of fixation of the mutations that will eventually fix by increasing the mean fitness of their competition. On the other hand, they are temporarily increasing the rate of increase in fitness of the population as a whole, by adding more beneficial alleles. Furthermore, at any given time it is impossible to know which genotype will be the ancestor to the future population. Each of the new beneficial mutations may therefore also be increasing the rate of adaptation by increasing the number of genotypes available as genetic backgrounds for further adaptation.

In order to measure the proportion of individuals in the population with at least one mutation that does not fix, I first found the persisting line of descent, or PLOD. The PLOD is the lineage of genotypes that can be traced from the founding genotype to the most recent common ancestor of the final population. For the period of time that I kept the data on each simulation (i.e., from the 3rd to the 13th shift in the MRCA), I calculated the number of individuals that were of a genotype not on the PLOD and divided it by the total number of individuals extant in the same window of time. These data are plotted in Figure 5. At the lowest mutation supply there are very few individuals that have genotypes not on the PLOD.
As the mutation supply increases, the percent of the population having non-fixing mutations increases. At a mutation supply of 10^-2, several percent of the population have mutations that will not fix. In the most extreme parameter combination (N = 10^4, S = 10), over 50% of the population contained at least one beneficial mutation that did not fix. Recall that for a given beneficial mutation supply rate the simulations with different population sizes also had different mutation rates per individual; the combination of mutation rate and population size used to achieve a given mutation supply made a difference. At low mutation supply, larger populations have more of their population with mutations that do not fix. Conversely, at high mutation supply, smaller populations have more of their population with mutations that do not fix.

Figure 5. The average proportion of the population that had at least one beneficial mutation that does not fix. With increasing beneficial mutation supply, more of the population has a beneficial mutation that does not fix for every population size (10^4 to 10^9).
[Figure 5 plot. Horizontal axis: mutation supply (0.001 to 10). Vertical axis: population off PLOD.]

The proportion of the population that had at least one mutation that did not fix was made up of a number of different genotypes. Figure 6 shows the number of mutations not on the PLOD for different population sizes and mutation supply rates. The number of alternate genotypes was scaled to the number of mutations that fixed to allow comparisons across the entire parameter range. At the same mutation supply, larger populations generated more genotypes per mutation that fixed than smaller populations. Most of these genotypes were inconsequential because they were eliminated by chance at small sizes (Fisher, 1922; Haldane, 1927). It is of interest to know, of the genotypes that were generated, how many individuals each contained.
I calculated the cumulative population size of each genotype by summing the number of individuals of that genotype over all generations it existed. The inverse cumulative distribution function (ICDF), which is also called a survival function, was then plotted for the cumulative size of all genotypes that have at least one mutation that did not fix. These curves are presented in Figure 7, where each curve is the average of 1000 simulated populations with identical parameters.

Figure 6. The number of beneficial mutations that occurred, per mutation that fixed, over a range of mutation rates and population sizes. Bar shading indicates population size.
[Figure 6 plot. Horizontal axis: mutation supply (0.001 to 10). Vertical axis: mutations/mutation that fixes. Legend: population sizes 10^4 to 10^9.]

Figure 7. Inverse cumulative distributions of the numbers of individuals of genotypes that had at least one mutation that did not fix. Data are scaled to the number of mutations that did fix. Each panel contains the data from one beneficial mutation supply rate and gives the average of 1000 simulations at each of 6 population sizes (10^4 to 10^9). In every panel curves are ordered, with the lowest population size on the bottom and the highest on top.
[Figure 7 plots. Panels: S = 10^-3, S = 10^-2, S = 10^-1, S = 1, S = 10. Horizontal axis in each panel: cumulative size of genotype (10^0 to 10^12, logarithmic). Vertical axis: logarithmic.]

A linear relationship here on a log-log plot would indicate that a power law describes the association. Genotypes reaching a small cumulative number of individuals (e.g., 1 to 1000) fit a linear relationship well. These mutations represent those that were most influenced by drift while at low frequency and are eventually lost by chance. The curves for all population sizes and mutation rates had nearly identical shapes in this part of the curve. This was expected because the effect of drift on small subpopulations is independent of the total population size (N), as long as N is much larger than the subpopulation. Also, all of these simulations used identical distributions of mutational effect, and the chance of surviving drift at low population size is known to be a function of the size of mutation effects (Haldane, 1927).

However, the slopes of the curves change at high cumulative numbers in Figure 7. Of all the genotypes that occurred, the ones that make up this latter part of the curve are the ones that escaped stochastic loss and thus were interfering with mutations that did eventually fix. The point at which the slope changes is an indicator of how many mutations are interfering. When mutation supply was low (e.g., 10^-3), the change in slope happened at a low value because few mutations both escaped drift and failed to fix, but the slope change was greater, as mutations that escaped stochastic loss had less interference from other mutations. At higher mutation supply there were many more mutations that failed to fix, but the latter part of the curve (for sizes >1000) was steeper, indicating that their survival was limited more by clonal interference.
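The construction behind Figure 7 can be sketched in a few lines. The sketch assumes the simulation output is available as one mapping per generation from a genotype identifier to its number of individuals; this data layout, and the names `cumulative_sizes` and `survival_function`, are hypothetical. Unlike Figure 7, the sketch normalizes by the number of genotypes and not by the number of mutations that fixed.

```python
from collections import Counter

def cumulative_sizes(history):
    """Sum each genotype's count over every generation in which it existed."""
    totals = Counter()
    for generation in history:
        totals.update(generation)  # adds counts genotype by genotype
    return totals

def survival_function(sizes):
    """Return (size, fraction of genotypes with cumulative size >= size)."""
    ordered = sorted(sizes)
    total = len(ordered)
    points = []
    for i, size in enumerate(ordered):
        if i == 0 or size != ordered[i - 1]:  # one point per distinct size
            points.append((size, (total - i) / total))
    return points

history = [{"a": 1, "b": 3}, {"a": 2, "c": 1}, {"a": 7}]
totals = cumulative_sizes(history)
print(survival_function(totals.values()))
```

Plotting the logarithm of the fraction against the logarithm of the size gives curves of the kind shown in Figure 7; a straight segment on such a plot corresponds to a power law.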
To summarize Figures 5, 6 and 7, within a given mutation supply, (i) larger populations had more mutations that did not fix, (ii) in larger populations more individuals had alternative mutations, and (iii) in larger populations these mutations tended to reach higher total numbers in the population. When we compare a range of beneficial mutation supply rates we see that (iv) at lower mutation supply fewer mutations interfered, but those that did tended to reach larger sizes, and (v) the percent of the population that had mutations that did not fix interacts in a nonlinear way with population size. More generally, it can be seen that very many alternative mutations that did not fix arose and escaped stochastic loss.

Clonal interference and phylogenetic structure. In this section I look at the same set of simulations, and add the phylogenetic information on the genotypes that was collected. In doing this we can see not only the number of alternate genotypes, but also their relationships to one another and to the mutations that fixed. These simulations represent evolution in a "single niche" world (i.e., there is one dimension to fitness). Therefore, the phylogenies were dominated by a single line of descent (the PLOD), but as we saw in Figure 4, clonal interference dynamics can create side branches in the phylogenies that may accumulate several beneficial mutations before they go extinct.

For this set of simulations, the numbers of mutations that were at each distance from the PLOD are presented in Table 4. Again the data have been scaled to the number of mutations that fixed. At the lowest mutation supply rate (0.001), nearly all of the alternate genotypes were lost by chance while rare. However, as displayed in Figure 7, even in this parameter range it was possible to have mutations that achieved high levels but did not fix. At the lowest mutation supply, ~52 mutations were lost by chance for every one mutation that fixed.
Given that the average selective advantage of alleles in these simulations is 0.01, this is in agreement with the theoretical estimate by Haldane (1927) that the probability of fixation is twice the selective advantage. With a mutation supply of 0.1, side branches of two or three mutational steps were common. This finding was surprising given that, with ~98% of the mutations being lost to chance, 1000 generations are expected to pass for every two mutations that survived genetic drift. But the stochastic timing of mutations, and the inherent lag between the arrival of a mutation and the time it affected the mean fitness, ensured that these side branches occurred.

[Table 4. Number of mutations at each mutational distance from the PLOD, scaled to the number of mutations that fixed, for each combination of population size and mutation supply rate, with the average number of entirely unique lineages in parentheses. The table values are illegible in the source.]

With higher mutation supply rates the side branches became even longer and more common. Figure 8 shows these data for the simulations that had a mutation supply rate of 10. The nearly linear relationship on the semilog plot indicates that the number of genotypes that arose at a given distance from the PLOD decayed roughly exponentially with distance from the PLOD. Here again, within a given S, the combination of mutation rate and population size made a difference. Larger populations produced more genotypes one and two mutations away from the PLOD, but the smaller populations produced more genotypes further from the PLOD.

The genotypes at a given distance from the PLOD were not necessarily phylogenetically independent. For example, a particularly successful genotype four mutations away from the PLOD could have given rise to all of the genotypes five mutations away within a single simulation. To examine this effect, for each class I calculated a "uniqueness" value, which indicates the number of phylogenetically unique lineages of a given length (d) that ended at a given distance from the PLOD. For one simulation the data are presented in Table 5. In that simulation N = 10^7 and S = 10 (i.e., the beneficial mutation rate per individual, U, was 10^-6). In this case all 2070 genotypes 3 mutations off the PLOD converged to just 27 genotypes 2 mutations off the PLOD, and further converged to 9 genotypes one mutation off the PLOD.
Values one cell below the diagonal give the number of entirely unique branches off the PLOD of each length. In Table 4 the average values of the entirely unique lineages are included in parentheses for all parameter combinations (again averaged over all 1000 runs and scaled to the number of mutations that fixed).

Figure 8. The number of beneficial mutations that arise at various mutational distances from the PLOD, scaled to the total number of mutations that fix. The PLOD is the persisting line of descent and is found by tracing the lineage from the MRCA of the final population to the founding genotype. The data plotted are the average over 1000 simulations at each population size (10^4 to 10^9) with a beneficial mutation supply rate (S) of 10.
[Figure 8 plot, S = 10. Horizontal axis: steps off PLOD (1 to 13). Vertical axis: mutations/mutation that fixes (logarithmic, 0.0001 to 10000). Legend: N = 10^4, 10^5, 10^6, 10^7, 10^8, 10^9.]

Steps from                    d
PLOD             1      2      3      4      5
0               17
1            40560
2             9047    150
3             2070     27      9
4              694     11      2      2
5              172      6      2      1      1

Table 5. The number of unique lineages of length d that end a given number of mutational steps from the PLOD in a single population. In this simulation N = 10^7 and S = 10. The PLOD is the single lineage that leads from the founding genotype to the MRCA of the final population. The numbers one cell below the diagonal indicate the number of unique lineages of each length that occurred in this simulation.

The mutations that did not fix were often clustered in genetic space (Table 4). For example, in the simulations where N = 10^7 and S = 10 (U = 10^-6), an average of 6.6 genotypes five mutations off the PLOD were created for every one mutation that fixed. These genotypes were, however, highly clustered. Of these, there were on average only 0.0314 unique phylogenetic branches leading to genotypes five mutations off the PLOD.
Evidently, side branches five mutations off the PLOD were extremely infrequent, but when they did occur they were part of a subpopulation that contained many beneficial mutations.

The role of the distribution of mutation effects. In addition to the simulations described here, further simulations were carried out with identical parameter ranges except that they had different distributions of mutation effects. The results were consistent with expectations from population genetics theory. For example, changing the shape parameter of the gamma distribution to 2, while retaining the same mean, allows a greater number of mutations to escape drift (data not shown). Also, using a constant mutation effect size greatly increased the degree of interference, the number of alternate genotypes per mutation that fixed, and the length of the side branches (data not shown).

A model for within-population selection on evolvability. The extended interference dynamic described above may also influence the way in which traits that generate beneficial variation will be affected by selection. It has been shown above that there are two modes by which the frequency of an allele can change in a population. One is selection between subpopulations within a cohort, where the subpopulations with high fitness out-compete the subpopulations with lower fitness. The second mode is for the trait to be differentially represented in the next cohort, which opens the door for indirect selection on evolvability. Genotypes that produce more beneficial mutations, or mutations with larger benefits, can contribute disproportionately to the next cohort. Recall that alleles may be polymorphic through several cohorts before they fix. Fixation will be influenced, therefore, not only by an allele's fitness advantage in the genetic background in which it occurred, but also by its propensity to generate more, or more fit, beneficial mutations (i.e., its evolvability).
To test this idea, another set of simulations was performed on a finite and completely defined fitness landscape. The simulations began with a homogeneous population of the ancestral genotype. This ancestral genotype had five mutations available, which each had identical fitness values, but differed in the number of subsequent beneficial mutations (zero, one, two, three or four) available. Thus, these five mutations differ only in their evolvability, where evolvability is defined as the capacity to generate heritable phenotypic variation (Kirschner and Gerhart, 1998). To avoid confounding factors, back mutations were precluded. Also, for genotypes having at least one beneficial mutation available, the total beneficial mutation rates were made identical. This second provision inflated the beneficial mutation rate of the less evolvable genotypes and made them even, in this respect, with the more evolvable genotypes. This situation is unlikely in nature because genotypes with more beneficial mutations available will probably also have higher total beneficial mutation rates. This provision made the results conservative and ensured that any differences were attributable to the number of alleles alone. Additional simulations were done without this provision, and the effects, which are described below, were even more pronounced (data not shown).

Table 6 shows the results of these simulations. As expected, at low mutation supply (0.001 and 0.01) populations had roughly equal probabilities of fixing each of the first mutations. As the mutation supply increased, mutations that allowed more subsequent adaptation became more likely to fix in the population. With a mutation supply rate of 0.1, the mutation allowing only the shortest adaptive trajectory became less likely to fix.
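The significance criterion used in Table 6 (a departure from the null fraction of 0.2 among 1000 simulations, assuming a binomial distribution) can be illustrated with an exact tail calculation. The sketch below is one way such a test could be carried out. It assumes one-sided exact binomial tails and converts each reported fraction to a count out of 1000. The source does not say whether an exact or an approximate test was used, and the function names are hypothetical.

```python
import math

def binom_pmf(k, n, p):
    """Binomial probability mass, computed in log space for stability."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1.0 - p))
    return math.exp(log_pmf)

def upper_tail(k, n=1000, p=0.2):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

def lower_tail(k, n=1000, p=0.2):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(0, k + 1))

# Example fractions taken from Table 6, expressed as counts out of 1000.
print(upper_tail(234) < 0.025)  # 0.234: is it significantly above 0.2?
print(lower_tail(124) < 0.025)  # 0.124: is it significantly below 0.2?
```

Under these assumptions a fraction of 0.234 lies in the upper 2.5% tail and 0.124 in the lower 2.5% tail, whereas a value such as 0.205 is well within the range expected by chance.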
With a beneficial mutation supply rate of 1.0 the populations became more likely to take the longer adaptive trajectories, and with a supply rate of 10 the longest path was taken approximately 60% of the time. In these simulations there were no back mutations, so once a mutation had fixed the population was committed to that path. Therefore, genotypes five adaptive mutations away from the ancestor must have arisen before the population fully committed to its first adaptive step.

Table 6. The fraction of times, out of 1000 simulations at each combination of N and S, that populations fixed each of the five possible first mutations. The five mutations have the same immediate fitness advantage (10%), but differ in the number of subsequent mutations available. Numbers in bold are significantly above 0.2 (p < 0.025) and those in italic are significantly less than 0.2 (p < 0.025), assuming a binomial distribution.

Population       Mutation      Subsequently available beneficial mutations
size (N)         supply (S)    0        1        2        3        4
10,000           0.001         0.205    0.185    0.180    0.234    0.196
                 0.01          0.196    0.185    0.230    0.196    0.193
                 0.1           0.124    0.222    0.215    0.237    0.202
                 1             0.000    0.121    0.246    0.298    0.335
                 10            0.000    0.000    0.060    0.313    0.627
100,000          0.001         0.199    0.215    0.210    0.159    0.217
                 0.01          0.197    0.194    0.207    0.219    0.183
                 0.1           0.124    0.205    0.214    0.235    0.222
                 1             0.000    0.079    0.244    0.330    0.347
                 10            0.000    0.003    0.087    0.314    0.596
1,000,000        0.001         0.212    0.180    0.197    0.209    0.202
                 0.01          0.190    0.211    0.189    0.203    0.207
                 0.1           0.090    0.214    0.237    0.225    0.234
                 1             0.000    0.064    0.248    0.328    0.360
                 10            0.000    0.006    0.074    0.314    0.606
10,000,000       0.001         0.205    0.217    0.199    0.196    0.183
                 0.01          0.188    0.186    0.212    0.207    0.207
                 0.1           0.071    0.232    0.212    0.239    0.246
                 1             0.000    0.050    0.227    0.359    0.364
                 10            0.000    0.002    0.076    0.340    0.582
100,000,000      0.001         0.184    0.193    0.201    0.213    0.209
                 0.01          0.177    0.192    0.211    0.222    0.198
                 0.1           0.063    0.216    0.245    0.232    0.244
                 1             0.000    0.049    0.230    0.320    0.401
                 10            0.000    0.002    0.070    0.311    0.617
1,000,000,000    0.001         0.199    0.194    0.184    0.209    0.214
                 0.01          0.167    0.217    0.207    0.212    0.197
                 0.1           0.043    0.222    0.230    0.249    0.256
                 1             0.000    0.041    0.235    0.336    0.388
                 10            0.000    0.001    0.058    0.324    0.617

Throughout the range of population sizes, 10^4 to 10^9, there were no apparent differences in the results. Evidently, for these simulations, the ability of selection to act indirectly on evolvability is dependent largely on the mutation supply rate, and it does not matter which combination of population size and mutation rate gave rise to it. This finding is in contrast to the earlier results looking at the distributions of size and distance from the PLOD of genotypes.

DISCUSSION

This investigation into the population dynamics of adaptation in large asexual populations allows us to draw three major conclusions. First, when there is a high beneficial mutation supply rate, many independent subpopulations may arise and compete with each other. Each of these subpopulations may also produce additional beneficial mutations before one out-competes all others. Second, as the population evolves, fitness may increase in a step-like manner, even when there is no allelic fixation event. Independent subpopulations may coexist through several steps in fitness. Third, as the number of independent subpopulations increases and they coexist through multiple adaptive steps, the fixation probability of a mutation depends not only on the fitness benefit that it confers, but also on its opportunity for further adaptation, relative to other contending beneficial mutations.

The simulations presented here covered only a fraction of the possible parameter combinations that could have been chosen. In these simulations, the role of deleterious mutation, which can also play an important role (Peck, 1994; Orr 2000; Johnson and Barton, 2002; Bachtrog and Gordo, 2004), was ignored completely.
The population dynamics have proven to be interesting and complex even without considering deleterious mutations. The results presented here are most appropriately related to organisms with low overall mutation rates, as the effect of deleterious mutations will be minimized. The way in which the conclusions of this study will be influenced by deleterious mutations remains an open and interesting question. Also, cases of intermediate rates of recombination were ignored. The cases described here, with no recombination whatsoever, serve to describe the boundary condition of a more general theory that can incorporate the full range of recombination rates.

The exponential distribution of mutation effects was used because the limited data available suggest that this is a reasonable assumption (Imhof and Schlotterer 2001; Rozen et al. 2002). Unfortunately, these data are inadequate to discriminate between an exponential and other potential distributions (Rozen et al. 2002). Fortunately, a large number of statistical distributions have exponential-like tails (Gumbel 1958). Of course, in any real population the set of beneficial mutations available will be a consequence of the environment and the genotypes present, and will not be a realization of any abstract statistical distribution. Additionally, the distribution will be made up of a finite set of mutations; the assumed tail behavior, which plays a prominent role in several of the results of this paper, might be unfounded. This assumption has been used widely in theoretical treatments of adaptation (e.g., Gillespie 1984; Gerrish and Lenski 1998; Orr 2000; Orr 2002; Orr 2003; Wilke 2004). Importantly, the conclusions of this paper seem to be robust to the precise distribution of mutational effects assumed.
Any actual distribution for which the collection of most fit mutations is more clustered than an exponential distribution will tend to cause the competing subpopulations to be more similar in fitness, extending the amount of time they will compete with each other and thereby increasing the strength of each of the previously stated conclusions.

Clonal interference allows for exploration of an adaptive landscape, by promoting the coexistence of independently evolving subpopulations. However, this exploration has important limitations. First, it accesses only those mutations that are beneficial on the background in which they occur. For the parameters considered here, there is probably little time for neutral mutations that may interact epistatically to increase to a high enough frequency to pick up additional beneficial mutations before they are eliminated by the continuous supply of beneficial mutations. In other words, the expected waiting time for neutral mutations to matter may be considerably longer than the spacing between successive beneficial mutations (Christiansen et al., 1998; Van Nimwegen and Crutchfield, 2000). Therefore, this process of exploration of genetic space predominantly looks up (toward higher fitness), rarely to the side, and almost never down (i.e., through adaptive valleys). Second, in these dynamics the genetic space sampled tends to be very clustered. Even in situations of high beneficial mutation supply, there are sweeps that go unchallenged, so that no alternatives are sampled. Additionally, the various subpopulations can be highly variable in size. Therefore, the extent of exploration of genotypes around those subpopulations will also be highly variable.

Considering these limitations on the exploration of genetic space, it is perhaps surprising that the indirect selection for evolvability can be so effective.
In the set of simulations shown in Table 6, where alleles differed only in their ability to generate additional beneficial mutations, the more evolvable alleles had a significantly higher probability of fixation under conditions of a high mutation supply rate. This outcome was likely aided by the fact that the mutations were identical in fitness, which minimized the variation-purging effect of within-cohort competition.

The population dynamics described here may help to explain several reports of increased mutation rates in real populations (Mao et al., 1997; Sniegowski et al., 1997; Oliver et al., 2000; Notley-McRobb et al., 2002; Shaver et al., 2002). Sniegowski et al. (2000) argued that "this 'clonal interference' effect constrains the adaptive usefulness of a high mutation rate to situations in which beneficial mutations are extremely infrequent." In fact, theory predicts (Gerrish and Lenski 1998; Wilke 2004) and experiments show (de Visser et al. 1999) that under conditions of high beneficial mutation supply, further increases in mutation rate have a minimal effect on the rate of adaptation. It is therefore paradoxical that mutator phenotypes have been observed to evolve in very large populations with a high supply of beneficial mutations (Sniegowski et al., 1997; Tenaillon et al., 1999; Notley-McRobb et al., 2002; Shaver et al., 2002). The population dynamics described here suggest a possible explanation. With a high beneficial mutation supply rate, the fixation of a lineage typically occurs over several adaptive steps. At each of these steps the mutator, or any trait that increases evolvability, can increase its proportion in the population by contributing more genotypes to the next cohort (see also Tenaillon et al., 1999; de Visser and Rozen, 2005).

The work presented here has examined adaptation in large asexual populations. It has also identified several questions that deserve further investigation.
Specifically, does selection for evolvability, in the manner described here, actually explain the evolution of increased mutation rates? Are there real cases of alleles that allow more adaptive steps being selected in the manner described? Also, our increased understanding of the population dynamics caused by clonal interference may lead to new or more precise predictions about the differences between sexual and asexual populations and the potential evolutionary advantage of sexual and asexual reproduction.

CHAPTER 3

WITHIN-POPULATION DYNAMICS OF ADAPTATION IN A LONG-TERM EVOLUTION EXPERIMENT WITH ESCHERICHIA COLI

ABSTRACT

In asexual populations, independently arising beneficial mutations cannot recombine into the same genetic background. Instead, they compete with each other, and typically only one persists in the long run. The resulting population dynamics have important implications for the rate of evolution, the regularity with which mutations fix, and the characteristics of mutations that will fix. Much of our understanding of this process is theoretical, being based on analytical models or computer simulations. Here we present a detailed study of the within-population dynamics of an experimental population of Escherichia coli as it adapts to a novel environment during 5,000 generations. Six beneficial mutations were known to fix in this population over that time. The frequency of each of these mutations was tracked over time, and the fitness of clones (with and without the known mutations) was assayed. These data reveal that multiple independent beneficial mutations arise and compete with each other. Furthermore, these independent subpopulations sometimes accumulate multiple beneficial mutations before one subpopulation out-competes all others. This extended coexistence persists even through previously identified step-like increases in fitness.
Finally, the generality of this process is shown with 10 additional populations that evolved for ~880 generations, wherein multiple subpopulations again took multiple adaptive steps before one subpopulation out-competed all others. The implications of our findings for the evolution of evolvability and the evolution of sex are discussed.

INTRODUCTION

The lack of genetic recombination can cause complex adaptive dynamics in asexual populations. Early evolution experiments with Escherichia coli provided the basis of the theory of periodic selection (Atwood, Schneider and Ryan, 1951a, b), which describes the purging of genetic variation as rare beneficial mutations sweep through a population. In such populations, adaptation will generally proceed by steps in fitness corresponding to the fixation of individual favorable mutations (Lenski et al., 1991; Elena, Cooper and Lenski, 1996). In the intervening periods, the population is waiting for the next beneficial mutation to arise, neutral and deleterious variation moves toward an equilibrium, and average fitness remains approximately constant. Under these conditions, where beneficial mutations are rare, this model works well and gives intuitive predictions for the rate of adaptation with various mutation rates and population sizes (Crow, 1965; Gerrish and Lenski, 1998).

When beneficial mutations become common enough for multiple beneficial mutations to arise and escape stochastic loss on the same genetic background, these mutations compete with each other. The ensuing population dynamic is called clonal interference. Several important consequences follow. First, the rate of adaptation is slowed significantly relative to a population that has free recombination (Fisher, 1930; Muller, 1932; Muller, 1964; Crow, 1965; Felsenstein, 1974).
Second, there is a law of diminishing returns on the rate of adaptation with increasing mutation rate (Crow and Kimura, 1965; Gerrish and Lenski, 1998; de Visser et al., 1999; Orr, 2000; Wilke, 2004). Third, these conditions influence the types of mutations that fix. Specifically, as clonal interference increases, so does the average beneficial effect of the mutations that fix (Gerrish and Lenski, 1998; Rozen, de Visser and Gerrish, 2002; Wilke, 2004), although this pattern may not always hold when deleterious mutations occur at a high rate (Campos and de Oliveira, 2004; de Oliveira and Campos, 2004). Fourth, the fixation of beneficial mutations may be more regular in time than otherwise expected (Lenski et al., 1991; Gerrish, 2001). Finally, clonal interference may facilitate the fixation of those traits that affect evolvability by influencing the generation of heritable variation (Tenaillon et al., 1999; Chapter 2). These evolvability traits may include mutator genotypes (Tenaillon et al., 1999) and alleles that influence subsequent adaptive evolution by having more positive or fewer negative epistatic interactions (Chapter 2).

Several experimental studies have looked at the within-population dynamics of evolving microbial populations. Imhof and Schlotterer tracked a rapidly mutating microsatellite locus in adapting populations of E. coli in order to measure the fitness effects of the mutant genotypes (Imhof and Schlotterer, 2001). In doing so, they were able to see multiple subpopulations competing for fixation (Imhof and Schlotterer, 2001). Notley-McRobb and Ferenci examined the genetics and population dynamics of E. coli adapting in chemostats (Notley-McRobb and Ferenci, 2000; Notley-McRobb, Seeto and Ferenci, 2002).
In large populations and for beneficial mutations that occur at a sufficiently high rate, they showed that many independent beneficial mutants can sweep through the population together, creating a phenotypic sweep without an allelic fixation (Notley-McRobb and Ferenci, 2000). They further showed that this dynamic was sometimes associated with a transient increase in a mutator phenotype (Notley-McRobb, Seeto and Ferenci, 2002).

The study presented here examines a single population in the long-term evolution experiment with E. coli, and it describes the within-population dynamics over the first 5,000 generations, when adaptive evolution was most rapid. This particular population has previously served as a focal population for several intensive phenotypic and genetic analyses. The temporal trajectory of fitness has been shown to increase in a step-like fashion (Lenski and Travisano, 1994). Increases in cell size, a trait that is correlated with fitness, closely matched the increases in fitness (Elena, Cooper and Lenski, 1996). Six mutations have been identified and shown to be favorable in the evolution environment (Schneider et al., 2000; Cooper et al., 2001; Cooper, Rozen and Lenski, 2003; Stanek and Lenski, in prep; Chapter 1). In this study, we use this genetic information, along with the availability of frozen population samples from preceding generations, to track the within-population dynamics by following the frequency of individual beneficial mutations as they spread through the population. Also, we measure the fitness of clones sampled from the population with each of the genotypes identified at each time point, including those genotypes with and without subsequently fixed beneficial mutations. These data yield a much richer and more complete picture of the within-population dynamics of adaptation than has been previously available.

METHODS

Evolution experiment.
The focal population under study, called Ara-1, is one of twelve populations in a long-term evolution experiment described elsewhere (Lenski et al., 1991; Lenski and Travisano, 1994; Lenski, 2004). The population was founded with a single clone of Escherichia coli B, and was propagated in serial batch culture in a glucose-limited environment (Davis minimal media supplemented with 25 µg/mL glucose). Each day, 1% of the population has been transferred to fresh media, allowing ~6.64 generations as the population consumes the available resources. A sample of the population was obtained every 500 generations throughout the evolution experiment and has been stored at -80°C; each sample is broadly representative of the entire population at the corresponding generation because each sample contains >1 mL of the population culture from the 99% not transferred to the next day’s culture.

Isolation of clones and estimation of mutation frequency. Six mutations have been identified and shown to be beneficial (Table 7). PCR-based detection assays were developed to distinguish each mutant allele from the ancestral allele. Beneficial mutations in the ribose operon (rbs) were large deletions and are detectable by polymerase chain reaction fragment length polymorphisms (PCRFLP). The mutation in pku was an insertion of an IS150 element (Schneider et al., 2000), and was likewise detectable by PCRFLP. The mutations in topA, spoT and ppr each created or destroyed a different restriction enzyme recognition site. In these cases, restriction fragment length polymorphisms (RFLP) were used to detect the mutations. The mutation in glmUS created neither a PCRFLP nor an RFLP. For this gene, a primer set was designed that would only amplify when the mutant allele was present. Because this method depends on optimization of PCR conditions, and is inherently more prone to error than the other methods, each clone was checked at least three times.
All assays were run with a positive (known mutant genotype) and a negative (ancestral genotype) control.

To estimate the frequency of genotypes over time, 48 clones were picked from the frozen population samples at generations 500, 1,500, 2,000, 2,500, 3,000, 4,000, and 5,000, and 95 clones were picked from the population at 1,000 generations (Table 8). Clones were obtained by streaking the population sample onto minimal glucose plates and picking colonies at random after 24 hours of growth at 37°C. The clones picked were inoculated into 1 mL LB liquid media and allowed to grow overnight at 37°C. The following day glycerol was added to achieve a 12% (v/v) ratio, and clones were then stored at -80°C. Lack of amplification or failure to give an unambiguous signal caused several clones to be eliminated from the analysis. Table 8 indicates the number of clones used from each set.

Not all mutations were checked in all clones. Instead, the following rules were applied: (i) if a mutation was present in all clones from one generation, then all clones from later generations were assumed to have the mutation; and (ii) if a mutation was not present in any clones from one generation, then all clones from previous generations were assumed not to have the mutation. Bear in mind that all mutations in this study are ones known to have been absent in the ancestor and eventually fixed in the population. Table 8 indicates precisely when each allele was assayed.

A subset of these clones was used for fitness assays. Three representative clones of each genotype from each generational sample were chosen. If there were fewer than three of a given genotype available at a given time point, then all available clones of that genotype were chosen. Here, “genotype” represents only the information we have from the allele detection procedures as described above. Hence, clones we identify as having the same genotype are only identical at these loci and as testable by these methods.
For example, it is possible that similar but not identical deletions in the ribose operon are scored as identical due to limitations of the PCRFLP assay. Also, the RFLP detection assays and the allele-specific PCR assays only detect mutations at their respective recognition sites. For this reason, they cannot tell us if there is a mutation elsewhere in the gene. Most importantly, none of the methods can detect mutations at other loci. It is quite possible (and indeed we will show by fitness assays) that clones with the same “genotype” as detected by these methods in fact differ at other loci where mutations contribute to adaptation. Table 8 lists the genotypes of all clones assayed for fitness. Alleles at each locus are indicated in parentheses as follows: the allele present in the ancestor is designated anc, the allele that will eventually be fixed is designated fix, and any other alleles that are found are labeled alphabetically (A, B, ...) in the order in which they appear.

Construction of the Muller-style plot. Figure 9 presents a plot to describe the spread of genotypes through time and was created from the allele-frequency data in the style of Muller (1932). In this plot, each color uniquely indicates a different genotype and the width of a shaded region indicates the frequency of that genotype in each generation. For each shaded region, the earliest occurrence of that genotype is placed within the color of the genotype that gave rise to it by mutation (parsimoniously assuming the fewest mutations possible). Frequency data were collected at generations 500, 1,000, 1,500, 2,000, 2,500, 3,000, 4,000 and 5,000. The plot also assumes that genotype frequencies changed linearly between sample time points, which clearly does not reflect reality but does give a reasonable picture of the spread of genotypes and the coexistence of their subpopulations through time.

Fitness assays. The standard fitness assay for the long-term E. coli evolution experiment was used, and it is described in more detail elsewhere (Lenski et al., 1991). In all cases, fitness was measured against strain REL607, a genotype identical to the ancestral strain used to found the population except that it contains a neutral mutation in the arabinose operon. This mutation allows the cells to utilize the sugar L(+) arabinose and causes a readily detectable phenotype (pink colonies) when mixed samples are grown on tetrazolium arabinose (TA) indicator agar. Colonies of cells that cannot use arabinose appear red on TA plates. The phenotype of being able to use arabinose is labeled Ara+ and the phenotype of not being able to use arabinose is labeled Ara-.

Fitness assays were performed as follows. A particular clone (or population sample) and REL607 were inoculated separately from the freezer stocks into flasks containing LB liquid medium and allowed to grow for 24 hours at 37°C. These cultures were then diluted 100-fold, and 100 µL was transferred to 9.9 mL DM25 media. The cultures were again allowed to grow separately for 24 hours at 37°C. The clone (or population sample) to be assayed and REL607 were then mixed at a 1:1 volumetric ratio by transferring 50 µL from each culture to 9.9 mL fresh DM25. A sample was taken from this mixture and plated onto TA agar to estimate the initial number of each type. The mixture was allowed to grow for 24 hours at 37°C, at which point a second sample of the mixture was plated onto TA agar to estimate the final number of each type. Realized Malthusian parameters ($m$) for each of the two competitors were calculated as $m = \ln(N_f / N_i)$, where $N_f$ is the number of individuals present at the final count and $N_i$ is the number present at the initial count, and where both counts reflect changes in overall density based on the dilutions used for plating. Relative fitness of one competitor to the other is then simply the ratio of their realized Malthusian parameters.
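As a minimal sketch of the calculation just described, the following Python computes the realized Malthusian parameter of each competitor and takes their ratio. The function names and the densities in the usage lines are hypothetical and serve only as an illustration; the densities are assumed to have been corrected already for the dilutions used in plating.

```python
import math


def malthusian_parameter(n_initial: float, n_final: float) -> float:
    """Realized Malthusian parameter m = ln(N_f / N_i).

    Both densities must be on the same scale, i.e. already corrected
    for the plating dilutions (an assumption of this sketch).
    """
    if n_initial <= 0 or n_final <= 0:
        raise ValueError("densities must be positive")
    return math.log(n_final / n_initial)


def relative_fitness(clone_i: float, clone_f: float,
                     ref_i: float, ref_f: float) -> float:
    """Fitness of the assayed clone relative to the reference competitor,
    defined as the ratio of their realized Malthusian parameters."""
    return (malthusian_parameter(clone_i, clone_f)
            / malthusian_parameter(ref_i, ref_f))


# Hypothetical densities (cells per mL), not data from this study.
w = relative_fitness(clone_i=5.0e5, clone_f=6.0e7, ref_i=5.0e5, ref_f=4.0e7)
print(f"relative fitness = {w:.3f}")
```

A value above 1 indicates that the assayed clone grew faster than the reference competitor over the day of competition, and a value of exactly 1 indicates equal growth.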
Fitness estimates were made in four blocks. Each block consisted of two assays of every clone, four assays of every mixed population, and six assays of the ancestor.

Statistical analyses. The purpose in collecting the fitness data was twofold. First, we were interested in the competitive dynamics among the coexisting genotypes. Second, temporal changes in the fitness of genotypes defined on the basis of known mutations may tell us when clones of a particular genotype have acquired additional beneficial mutations that have not been identified. These issues were addressed by analyzing the data from each time point separately in a mixed general linear model. Genotype was treated as a fixed effect, and clone was treated as a random effect nested within genotype. The experimental block in which each fitness estimate was obtained was entered into the statistical model as a random factor. Additionally, for the three generational time points in which more than two genotypes were present, one planned contrast was performed. This contrast is a two-tailed test for a difference between the genotype that was most closely related to the eventual winner and all other genotypes present. This contrast is reported in Table 9 as E.W., shorthand for the eventual winner.

The MIXED procedure in SAS (version 8.0) was used. The MIXED procedure is a generalization of the standard linear model that allows the data to exhibit internal correlation and nonconstant variation (SAS Institute Inc, 1999), both of which may exist in our data set. Also, the maximum likelihood methodology in MIXED is more robust to imbalance in the data structure than standard ANOVA methods. Imbalance occurred in our data set whenever fewer than three clones of a given genotype were present in a particular generation.
The restricted maximum likelihood method (REML) was used to estimate three models:

(A) In the full model, each fitness estimate, $y_{ijkl}$, is the sum of the mean of genotype $i$, the effect of the $j$th randomly selected clone ($\sim$ iid $N(0, \sigma_c)$), the effect of the $k$th block ($\sim$ iid $N(0, \sigma_b)$), and the error from the $l$th measurement ($\sim$ iid $N(0, \sigma_e)$),

$y_{ijkl} = \mu_i + c_{j(i)} + b_k + e_{ijkl}$;

(B) In the first reduced model, the effect of clone (nested in genotype) is dropped from the full model,

$y_{ijkl} = \mu_i + b_k + e_{ijkl}$;

(C) In the second reduced model, the block effect is also dropped,

$y_{ijkl} = \mu_i + e_{ijkl}$.

For each model a likelihood value is generated, which indicates the probability of the data given the model. When one model is a subset of another, the difference in the -2 log likelihood values of the two models is expected to fit a $\chi^2$ distribution with degrees of freedom equal to the difference in the number of parameters in the two models (Self and Liang, 1987; Littell, 1996). Thus, tests of the factor clone were made by comparing the full model (A) to the model identical but for the factor clone (B). This comparison tests the null hypothesis that the variance attributable to clone is 0 (i.e., $H_0$: $\sigma_c = 0$, $H_a$: $\sigma_c > 0$). Finally, since $\sigma_c$ is bounded to be non-negative, the relevant probability is half that inferred from the $\chi^2$ value (Littell, 1996). By identical reasoning, the test of the block effect (i.e., $\sigma_b > 0$) is one-half the probability of a $\chi^2$ greater than the difference in -2 log likelihoods of model C and model B. The fixed effect of genotype and the planned E.W. contrast were both tested with F statistics using the type III sums of squares from the most complex model with significant support (A, B, or C). Degrees of freedom from the Satterthwaite approximation were used to adjust for imbalances in the data (Satterthwaite, 1946; Littell, 1996).

Subsidiary evolution experiment. Ten additional populations were evolved under conditions identical to those used for Ara-1.
These populations each started with a 1:1 ratio of clones REL606 and REL607. Thus, the founding populations were genetically homogeneous except for a neutral marker, which determines the ability to use arabinose, with each allele present at an initial frequency of 0.5. A sample of each population, containing about 400 colony forming units (CFU), was plated onto TA agar every three days (~20 generations). As described above, Ara+ colonies appear pink while Ara- colonies are red, and the number of each type was recorded for each sample. This experiment follows closely the methodology of a previous experiment designed to capture the first adaptive mutational step (Rozen, de Visser and Gerrish, 2002), but it ran more than twice as long as that earlier experiment to allow multiple adaptive steps.

The expected mean time to fixation by a process of random drift alone is given by $-4N[p \log(p) + (1 - p)\log(1 - p)]$ (Kimura and Ohta, 1969). The effective population size in these experiments, taking into account the bottleneck during the daily transfers, is about $3 \times 10^7$ cells (Lenski et al., 1991). Given an initial $p = 0.5$, fixation by random drift would require >90 million generations, and very long periods would be needed to produce any measurable change in frequency. Therefore, the substantial shifts observed between 100 and 900 generations reflect the effects of selection on beneficial mutations that arise in one or both genetic backgrounds.

RESULTS

Order of beneficial mutations. Figure 9 and Table 7 show the frequencies of each known mutation in each generational sample tested in the focal population. These data document the timing and order in which these beneficial mutations arose and were substituted in the population. The first beneficial mutation identified thus far in that population, among those that eventually fix, is a deletion in the ribose operon (rbs(fix)).
Onto this genetic background, beneficial mutations arose, in order, in topA, spoT, glmUS, pku, and ppr. That the ribose operon mutation preceded the topA mutation is evident from the existence of clones with the rbs(fix) mutation that do not have the topA(fix) mutation. Likewise, there was a clone that had the spoT(fix) mutation that did not have the glmUS(fix) mutation, implying that the glmUS mutation arose in a background that carried the spoT mutation. In all cases, clones with a given mutation also had each of the preceding mutations.

Figure 9. A. Muller-style graph of changing frequencies of mutant genotypes in population Ara-1 through time. The height of the shaded regions indicates relative frequencies estimated from samples taken in generations 0, 500, 1000, 1500, 2000, 2500, 3000, 4000, and 5000 with linear interpolation between these time points. Genotypes are labeled by the gene that contains the latest known beneficial mutation. B. Mean fitness relative to the ancestor of the population samples taken from the same generations. Error bars indicate 95% confidence intervals. (Data from Table 8.) Images in this dissertation appear in color.
[Figure 9 legend, genotypes: not any, rbs(A), rbs(B), rbs(fix), topA, spoT, glmUS, pku, ppr. Horizontal axis: generations, 0 to 5000.]

Table 7. Frequency of clones with various mutations in each generational sample. The total number of clones tested is shown across the top. Cells shaded grey were not tested but the numbers shown are based on parsimonious assumptions that mutations had not yet arisen (entry equals zero) or had already fixed (entry equals total). The assay column indicates the method used to test for the presence of each particular mutation. PCRFLP stands for polymerase chain reaction fragment length polymorphism. RFLP stands for restriction fragment length polymorphism; the restriction enzyme that was used is shown in parentheses. The mutation in glmUS was detected using a primer pair in which the 3’ end of one primer matched the mutation such that a PCR product should be produced only when the mutant allele is present. See Methods section for further details. See references listed for full descriptions of the mutations and how they were determined to be beneficial.

Generation     500  1000  1500  2000  2500  3000  4000  5000   Assay                 Reference
Total           42    89    48    48    48    46    40    48
rbs(anc)         2     1     0     0     0     0     0     0   PCRFLP
rbs(A)          14    32     0     0     0     0     0     0   PCRFLP                This chapter
rbs(B)           0     5     0     0     0     0     0     0   PCRFLP                This chapter
rbs(fix)        26    51    48    48    48    46    40    48   PCRFLP                Cooper et al., 2001
topA(fix)       22    44    48    48    48    46    40    48   RFLP (NmuCI)          Crozat et al., 2004
spoT(fix)        0    34    48    48    48    46    40    48   RFLP (Hin4I)          Cooper et al., 2003
glmUS(fix)       0    33    48    48    48    46    40    48   Allele-specific PCR   Stanek and Lenski, in prep.
pku(fix)         0     0     6     6    45    45    40    48   PCRFLP                Schneider et al., 2000
ppr(fix)         0     0     0     0     0     3    36    48   RFLP (MboI)           Chapter 1

Beneficial mutations that do not fix. In addition to the mutations that eventually fixed in the population, we found two mutations in the ribose operon at 500 and 1000 generations that did not reach fixation. We did not know beforehand to look for these mutations. They were found because the PCRFLP assay, used to distinguish the rbs(anc) allele from the rbs(fix) allele, also indicated that other fragment lengths were present. Deletions in the ribose operon occur at an elevated rate and are mediated by an IS150 element that is located directly upstream of the operon, and they also confer a small fitness advantage in the glucose-limited environment of the evolution experiment (Cooper et al., 2001). Following completion of the allele detection and fitness assays, it became clear that beneficial mutations arose in the population that did not fix, and that have not yet been identified.
For example, if the clones from the 1000-generation sample that have either the rbs(A) or the rbs(B) allele contained no other beneficial mutations, then we would expect their fitness advantage to be only about 1-2% relative to the ancestor (Cooper et al., 2001). Their fitness advantages are, in fact, much greater, indicating they have additional beneficial mutations that have not yet been discovered in genetic analyses of this population. Therefore, we made two targeted attempts to identify some of these other beneficial mutations that must have been present but did not fix.

In the first effort to identify these unknown mutations, we examined two clones from generation 1000, one having the rbs(A) allele (clone REL10651) and the other having the rbs(B) allele (clone REL10654). Three genes, pku, nadR and spoT, were chosen as good candidates because mutations in these genes eventually fix in this population and because similar mutations arose in independent populations under the same conditions (Cooper, Rozen and Lenski, 2003; Chapter 1). Sequencing did not reveal mutations in any of these genes in REL10654. One non-synonymous mutation was found in pku in REL10651, but its nadR and spoT sequences were identical to the ancestor. This mutation in pku eliminates a recognition site for the restriction enzyme NlaIII, and this fact was used both to double-check the accuracy of the sequencing data and to test for the presence of the mutation in the other clones having the rbs(A) allele. This RFLP assay revealed that all three clones sampled from generation 1000 that have the rbs(A) allele, and for which we have fitness estimates, possess the same pku mutation. Two 500-generation clones that have the rbs(A) allele were checked and did not have the mutation in pku. This subpopulation evidently continued to evolve between generations 500 and 1000.
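To make the size of the fitness discrepancy discussed above concrete, the short Python sketch below compares the upper end of the 1-2% advantage expected for a ribose deletion alone with the fitness values listed in Table 8 for the generation-1000 clones carrying rbs(A) or rbs(B). The variable names are illustrative, and the simple means used here ignore the block structure and the confidence intervals of the formal analysis.

```python
from statistics import mean

# Relative fitness of generation-1000 clones, copied from Table 8.
observed = {
    "rbs(A)": [1.227, 1.230, 1.248],  # REL10651, REL10652, REL10653
    "rbs(B)": [1.263, 1.282, 1.271],  # REL10654, REL10655, REL10656
}

# Upper end of the 1-2% advantage expected for a ribose deletion alone.
EXPECTED_RBS_ONLY = 1.02

# Fitness left unexplained by the ribose deletion, per allele.
excess = {allele: mean(values) - EXPECTED_RBS_ONLY
          for allele, values in observed.items()}

for allele, gap in excess.items():
    print(f"{allele}: mean fitness {mean(observed[allele]):.3f}, "
          f"unexplained excess {gap:.3f}")
```

For the rbs(A) clones, some of this excess may be attributable to the pku mutation described above; for the rbs(B) clones, no candidate mutation was found in the three genes sequenced.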
Our second attempt to find beneficial mutations that did not fix examined a clone, REL10625, sampled from generation 1500 that did not have the pku mutation that eventually fixed in this population. The presence of an unidentified beneficial mutation was strongly suspected in this case because the three clones without the pku(fix) allele were just as fit as the three clones with the pku(fix) allele (see below). This fact further suggested we look at the pku gene for the missing beneficial mutation. Sequencing the pku gene in clone REL10625 revealed a 7-bp deletion in the reading frame, causing a frame shift.

Lack of recombination. The E. coli B strain used to start population Ara-1 was thought to have no means of genetic recombination between separate cells. E. coli is not naturally competent, and the ancestral B strain carries no plasmids and seems not to harbor any functional lysogenic phage. Consistent with this view, we found no evidence for recombination among clones of any of the alleles assayed in this work. Specifically, there were 177 clones assayed when alleles at multiple loci were polymorphic in the population. Each of these clones could potentially have revealed a putative recombination event, yet in no case was a genotype seen that was inconsistent with an absence of recombination.

Dynamics of adaptation in population Ara-1. We can now combine the allele frequency data with the fitness data, along with some previously acquired information, to explain the dynamics of adaptation in population Ara-1 over the first 5,000 generations of the long-term experiment. The resulting picture is not complete, but it does clearly show that clonal interference was intense. Three important patterns will be described in the data. First, numerous beneficial mutations arise and reach a high frequency in the population, but only some of them eventually fix in the population as a whole.
Second, some sub-populations may acquire multiple beneficial mutations, yet still fail to be successful in the long run. Finally, these sub-populations are genetically different from the eventual winner either at the level of the locus or the particular allele within a locus. At generation 500 there were four distinguishable genotypes that had, as follows, the rbs(A) allele, the rbs(fix) allele, both the rbs(fix) allele and the topA(fix), or no known mutations. Fitness estimates for the clones having both the rbs(fix) and topA(fix) alleles are consistent with the expectation obtained from the multiplicative combination of their separately estimated effects. That is, relative fitness values of 1.014 for ribose (Cooper et al., 2001) and 1.133 for topA (Crozat et al., 2004), respectively, are expected to produced a combined fitness of 1.148; the least squares mean for this genotype from the ANOVA based on clones from generation 500 is 1.136, with the corresponding 95% confidence interval ranging from 1.104 to 1.167. On the other hand, the clones with no known mutations, or known mutations only in the ribose operon, have fitness much greater than expected. This discrepancy indicates that the clones must have beneficial mutations that have not been identified. It is not presently known how many other mutations account for the additional fitness advantage. Also, in the ANOVA based on the SOC-generation clones, the factor genotype is significant (p = 0.0018). The planned 67 Table 8. Fitness assays for 8 population samples and 61 isolated clones, with all fitness values relative to strain REL607. REL607 is identical to the ancestral strain except that it has the selectively neutral Ara+ marker. * indicates that the allele state is assumed; see text and Table 7 for details. Adding and subtracting the parenthetical +/- value gives the 95% confidence interval for the fitness estimate. 
For all genes, “anc” indicates the ancestral allele, “fix” indicates the allele that eventually fixed in the population, and other alleles are named in alphabetical order (A, B, ...). The clone column indicates whether the test material was a clone or a population sample (including all the genetic variation in the population when it was sampled).

              REL     relative
time   clone  label   fitness   +/-      rbs    topA   spoT   glmUS  pku    ppr
0      yes    606                        anc    anc    anc    anc    anc    anc
500    no     762     1.195     (0.033)
1000   no     964     1.225     (0.041)
1500   no     1068    1.336     (0.039)
2000   no     1164    1.352     (0.034)
2500   no     1282    1.347     (0.042)
3000   no     1483    1.387     (0.053)
4000   no     1890    1.437     (0.046)
5000   no     2179    1.458     (0.055)
500    yes    10632   1.127     (0.037)  fix    fix    anc    anc    *anc   *anc
500    yes    10633   1.150     (0.041)  fix    fix    anc    anc    *anc   *anc
500    yes    10634   1.137     (0.024)  fix    fix    anc    anc    *anc   *anc
500    yes    10635   1.135     (0.041)  fix    anc    anc    anc    *anc   *anc
500    yes    10636   1.197     (0.037)  fix    anc    anc    anc    *anc   *anc
500    yes    10637   1.265     (0.106)  fix    anc    anc    anc    *anc   *anc
500    yes    10638   1.204     (0.034)  A      anc    anc    anc    anc    *anc
500    yes    10639   1.205     (0.064)  A      anc    anc    anc    *anc   *anc
500    yes    10640   1.182     (0.081)  A      anc    anc    anc    anc    *anc
500    yes    10641   1.228     (0.074)  anc    anc    anc    anc    *anc   *anc
500    yes    10642   1.230     (0.083)  anc    anc    anc    anc    *anc   *anc
1000   yes    10643   1.284     (0.090)  fix    fix    fix    fix    anc    *anc
1000   yes    10644   1.243     (0.067)  fix    fix    fix    fix    anc    *anc
1000   yes    10645   1.261     (0.053)  fix    fix    fix    fix    anc    *anc
1000   yes    10646   1.207     (0.059)  fix    fix    fix    anc    anc    *anc
1000   yes    10647   1.219     (0.052)  fix    fix    anc    anc    anc    *anc
1000   yes    10648   1.208     (0.031)  fix    fix    anc    anc    anc    *anc
1000   yes    10649   1.252     (0.065)  fix    fix    anc    anc    anc    *anc
1000   yes    10650   1.220     (0.038)  fix    anc    anc    anc    anc    *anc
1000   yes    10651   1.227     (0.029)  A      anc    anc    anc    A      *anc
1000   yes    10652   1.230     (0.057)  A      anc    anc    anc    A      *anc
1000   yes    10653   1.248     (0.041)  A      anc    anc    anc    A      *anc
1000   yes    10654   1.263     (0.063)  B      anc    anc    anc    anc    *anc
1000   yes    10655   1.282     (0.074)  B      anc    anc    anc    anc    *anc
1000   yes    10656   1.271     (0.044)  B      anc    anc    anc    anc    *anc
1000   yes    10657   1.228     (0.053)  anc    anc    anc    anc    anc    *anc
1500   yes    10624   1.329     (0.068)  fix    fix    fix    fix    fix    *anc
1500   yes    10658   1.305     (0.064)  fix    fix    fix    fix    fix    *anc
1500   yes    10659   1.314     (0.047)  fix    fix    fix    fix    fix    *anc
1500   yes    10625   1.336     (0.049)  fix    fix    fix    fix    anc    *anc
1500   yes    10660   1.347     (0.073)  fix    fix    fix    fix    anc    *anc
1500   yes    10661   1.297     (0.087)  fix    fix    fix    fix    anc    *anc
2000   yes    10662   1.359     (0.086)  fix    *fix   *fix   *fix   fix    anc
2000   yes    10663   1.352     (0.082)  fix    *fix   *fix   *fix   fix    anc
2000   yes    10664   1.292     (0.035)  fix    *fix   *fix   *fix   fix    anc
2000   yes    10665   1.393     (0.139)  fix    *fix   *fix   *fix   anc    anc
2000   yes    10666   1.369     (0.058)  fix    *fix   *fix   *fix   anc    anc
2000   yes    10667   1.272     (0.106)  fix    *fix   *fix   *fix   anc    anc
2500   yes    10668   1.411     (0.072)  *fix   *fix   *fix   *fix   fix    anc
2500   yes    10669   1.381     (0.070)  *fix   *fix   *fix   *fix   fix    anc
2500   yes    10670   1.285     (0.050)  *fix   *fix   *fix   *fix   fix    anc
2500   yes    10671   1.326     (0.042)  *fix   *fix   *fix   *fix   anc    anc
2500   yes    10672   1.327     (0.042)  *fix   *fix   *fix   *fix   anc    anc
2500   yes    10673   1.225     (0.214)  *fix   *fix   *fix   *fix   anc    anc
3000   yes    10674   1.367     (0.058)  *fix   *fix   *fix   *fix   fix    fix
3000   yes    10675   1.392     (0.053)  *fix   *fix   *fix   *fix   fix    fix
3000   yes    10676   1.429     (0.082)  *fix   *fix   *fix   *fix   fix    fix
3000   yes    10677   1.417     (0.080)  *fix   *fix   *fix   *fix   fix    anc
3000   yes    10678   1.331     (0.053)  *fix   *fix   *fix   *fix   fix    anc
3000   yes    10679   1.402     (0.042)  *fix   *fix   *fix   *fix   fix    anc
3000   yes    10680   1.378     (0.055)  *fix   *fix   *fix   *fix   anc    anc
4000   yes    10681   1.482     (0.086)  *fix   *fix   *fix   *fix   fix    fix
4000   yes    10682   1.442     (0.066)  *fix   *fix   *fix   *fix   fix    fix
4000   yes    10683   1.426     (0.051)  *fix   *fix   *fix   *fix   fix    fix
4000   yes    10684   1.403     (0.051)  *fix   *fix   *fix   *fix   fix    anc
4000   yes    10685   1.448     (0.056)  *fix   *fix   *fix   *fix   fix    anc
4000   yes    10686   1.424     (0.079)  *fix   *fix   *fix   *fix   fix    anc
5000   yes    10687   1.417     (0.056)  *fix   *fix   *fix   *fix   fix    fix
5000   yes    10688   1.477     (0.070)  *fix   *fix   *fix   *fix   fix    fix
5000   yes    10689   1.406     (0.106)  *fix   *fix   *fix   *fix   fix    fix

Table 9.
Analyses of variation in fitness among genotypes and clones within genotypes in population Ara-1. Results were obtained using the MIXED procedure. The significance of the random factors, clone and block, was evaluated using a likelihood ratio test (LRT). The significance of the fixed factor genotype, and of the contrast between the eventual winner and other genotypes, E.W., was evaluated by partial F tests using the most complicated model with support (model C for generations 2000 and 3000, and model B for all other generations). See Statistical Analyses in the Methods section for further details.

                                   test
time   factor     test   df        statistic   p
500    clone      LRT    1         1.2         0.1367
       block      LRT    1         10.5        0.0006
       genotype   F      3, 80     5.45        0.0018
       E.W.       F      1, 80     15.77       0.0002
1000   clone      LRT    1         0           0.5000
       block      LRT    1         10          0.0008
       genotype   F      6, 108    0.92        0.4866
       E.W.       F      1, 108    1.49        0.2247
1500   clone      LRT    1         0           0.5000
       block      LRT    1         7.2         0.0036
       genotype   F      1, 42     0.31        0.5809
2000   clone      LRT    1         0.1         0.3759
       block      LRT    1         0.2         0.3274
       genotype   F      1, 44     0.04        0.8381
2500   clone      LRT    1         0.5         0.2398
       block      LRT    1         5.6         0.0090
       genotype   F      1, 41     1.87        0.1794
3000   clone      LRT    1         0.7         0.2014
       block      LRT    1         2.1         0.0736
       genotype   F      2, 52     0.14        0.8674
       E.W.       F      1, 52     0.29        0.5956
4000   clone      LRT    1         0           0.5000
       block      LRT    1         8.9         0.0014
       genotype   F      1, 39.1   0.78        0.3820
5000   clone      LRT    1         0           0.5000
       block      LRT    1         0           0.5000

contrast of the genotype most similar to the eventual winner to the other genotypes indicates that the genotype with both the rbs(fix) and topA(fix) mutations is, in fact, different from the rest of the population (p = 0.0002). Surprisingly, however, that genotype was, at generation 500, the least fit (Figure 10A). The 500-generation genotype that is the ancestor to the eventual winner was still present at 1000 generations (Table 8). It also gave rise to two other sub-populations that were present at 1000 generations, each of which harbors one or two alleles that eventually fixed in the population.
One of these new sub-populations has a mutation in spoT as well as the rbs(fix) and topA(fix) alleles. The other new sub-population has mutations in glmUS along with the rbs(fix), topA(fix), and spoT(fix) alleles. There was also one clone at 1000 generations that has the rbs(fix) allele but none of the other alleles that were eventually fixed. The 1000-generation clones that have only the rbs(fix) and topA(fix) alleles are significantly more fit than clones with those same mutations at 500 generations, again indicating the presence of other unidentified beneficial mutations. Two other genotypes appear in the 1000-generation sample that were not present at generation 500, each containing alleles that do not eventually fix. One of these contains another distinct mutation in rbs, designated the rbs(B) allele. The other contains a mutation in pkyF, designated the pku(A) allele, on the background that also contains the rbs(A) allele; this genotype therefore carries two mutations that do not fix, both in loci that acquire other mutations that do eventually fix in this population. By generation 1000, one or more beneficial mutations had arisen at five different loci (rbs, topA, spoT, glmUS, and pku), yet none of these mutations had yet fixed in the population (Fig. 1B), despite a gain in mean fitness of more than 20% (Table 8). Four different sub-populations exist that share no known mutations: those with at least one mutation that will fix (at the rbs, topA, and spoT loci), those with rbs(A) and pku(A), those with rbs(B) only, and those with no known mutations. In total, seven distinct genotypes were identified. Despite this extensive genetic diversity, there was no significant variation in fitness among these genotypes, nor among clones within genotypes, at 1000 generations (Table 9; Fig. 2B). This similarity in fitness among genotypes with and without the several known beneficial mutations indicates, once again, that many clones carry one or more unknown beneficial mutations.

Figure 10. Fitness of the population and clones of the major genotypes sampled at (A) 500 generations and (B) 1000 generations, relative to the founding clone. The label ‘mixed pop.’ indicates the entire population with all the genetic variation that it contained. All other bars represent the fitness of a single clone. The clones are grouped and colored according to the genotype assigned by the allele detection assays. Group labels indicate the gene with the latest known beneficial mutation; for example, all topA clones also have the rbs(fix) allele. Error bars show 95% confidence intervals. There was significant variation among the genotypes at generation 500 but not generation 1000 (Table 9). Images in this dissertation appear in color.

[Figure 10 image not reproduced. Both panels plot relative fitness. Groups in panel A: mixed pop., topA, rbs(fix), rbs(A), not any. Groups in panel B: mixed pop., glmUS, spoT, topA, rbs(fix), rbs(A), rbs(B), not any.]

Between 1000 and 1500 generations, four known beneficial mutations finally became fixed in the population: rbs(fix), topA(fix), spoT(fix), and glmUS(fix). Also, a mutation arose in pku that would later fix; this pku(fix) allele had reached a frequency of about 12.5% at generation 1500 (Table 7). Thus, we see clearly that the population was swept not by a single beneficial allele but by a linked set of several beneficial mutations, which evidently accumulated as a consequence of clonal interference that prevented any single beneficial mutation from sweeping through the population on its own. Once again, the ANOVA based on the fitness values estimated for the 1500-generation clones reveals no differences between the different genotypes (with and without the pku allele that later fixed), nor among the clones within each genotype (Table 9). This analysis implies that the genotypes that do not have pku(fix) have an unidentified beneficial mutation.
Further evidence for the equivalence in fitness of the competing genotypes with and without pku(fix) lies in the fact that 500 generations later, at generation 2000, the frequency of the pku(fix) allele remains at 12.5% (Table 7). Subsequent sequencing showed that at least part of the population is equally matched because it has a different mutation in pku than the one that eventually fixed. At generation 2500, the pku(fix) allele finally became numerically dominant (Table 7). This finding suggests that an unidentified beneficial mutation has arisen on this genetic background and driven the frequency increase. It should be noted that this is the only time we have unequivocal evidence for a mutation that will eventually fix that has not been identified, although it is possible that another beneficial mutation went undetected with the earliest set of mutations. The subpopulation that does not have the pku(fix) allele is still present, indicating either that it is being replaced or that it, too, has acquired an additional beneficial mutation. The lack of significant variation in fitness (Table 9) lends support to the latter interpretation. Also, members of this subpopulation are present, albeit at low frequency, at 3000 generations, indicating that it may have acquired two beneficial mutations since the time it diverged from the lineage that will eventually win. By generation 4000, the pku(fix) allele had finally fixed in the population after appearing sometime before generation 1500. The final beneficial mutation tracked in this study lies in ppr. The ppr(fix) allele was first seen at 3000 generations, had become common by generation 4000, and was found in all clones screened at 5000 generations.

Subsidiary evolution experiment.
The results from the allele-detection assays in population Ara-1 indicate that the ancestral genotype was displaced not by a single beneficial allele, but instead by a cohort of multiple independently derived and distinct beneficial mutations. These subpopulations then coexisted through several adaptive steps. It is of interest to know whether this was an unusual occurrence or an outcome to be expected in this system. To that end, we allowed ten additional populations to evolve under the same conditions and with the same starting genotype as population Ara-1, except that a neutral allele was present at an initial frequency of 0.5. The frequency of that neutral allele was measured every 3 days (~20 generations), and the resulting data are shown in Figure 11. We attempted to get three types of information from these populations. First, we estimated the time at which the first beneficial mutation, or cohort of beneficial mutations, swept through the population. Second, we looked for the presence of multiple competing subpopulations during the sweep. Finally, we used the data to estimate the number of competing subpopulations that persisted through the first step.

To estimate the time at which the first beneficial mutation or cohort began to eliminate the ancestral genotype, we identified the time at which the ratio of Ara- to Ara+ cells first deviates significantly from 1. Our measurement of that ratio is a sample from a binomial distribution where the expected value is the actual ratio in the population. We concluded that a significant shift had occurred when an observed ratio was unlikely to have been drawn from a population with a 1:1 ratio, using a conservative probability cutoff of p < 0.001. On average 433 colonies were counted from each population at each time point. Populations 1-10 first deviated from the ancestral ratio at generations 200, 200, 140, 260, 200, 200, 180, 180, 260, and 180, respectively.
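The significance criterion described above can be illustrated with a short Python sketch. It assumes an exact two-sided binomial test against a 1:1 ratio; the text specifies only the binomial model and the p < 0.001 cutoff, so the choice of an exact two-sided test, the function names, and the colony counts used below are illustrative assumptions rather than the procedure or data of this study.

```python
from math import comb


def two_sided_p_equal_ratio(ara_minus: int, total: int) -> float:
    """Exact two-sided binomial p-value for the null hypothesis of a 1:1 ratio.

    With success probability 0.5 the binomial distribution is symmetric, so the
    two-sided p-value is twice the probability of the smaller tail (capped at 1).
    """
    smaller = min(ara_minus, total - ara_minus)
    tail = sum(comb(total, i) for i in range(smaller + 1))
    return min(1.0, 2 * tail / 2 ** total)


def deviates_from_equal_ratio(ara_minus: int, total: int, cutoff: float = 0.001) -> bool:
    """True when the observed count is unlikely under a 1:1 ratio at the given cutoff."""
    return two_sided_p_equal_ratio(ara_minus, total) < cutoff


if __name__ == "__main__":
    # Hypothetical counts out of 433 colonies (the average sample size in the text).
    for count in (216, 195, 175):
        p = two_sided_p_equal_ratio(count, 433)
        print(count, 433 - count, p, deviates_from_equal_ratio(count, 433))
```

With roughly 433 colonies per sample, a cutoff this strict requires a fairly large departure from equal counts before a shift is called, which makes the estimated times of first deviation conservative.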
The mean time is 200 generations, which agrees well with the timing of the first step in population Ara-1 based on changes in fitness and cell morphology (Lenski and Travisano, 1994; Elena, Cooper and Lenski, 1996).

Figure 11. Changes in frequency of a neutral allele over ~880 generations of evolution in ten additional populations. An average of 433 colonies were counted from each population every 20 generations (3 days). The ordinate indicates the log10 ratio of Ara-/Ara+ colonies, and the abscissa indicates time in generations. Notice the near constancy of the ratio for the first 100 generations or so, and the substantial shifts thereafter; such dynamics are hallmarks of selection acting on beneficial mutations that happen in one or both marked genetic backgrounds.

[Figure 11 image not reproduced. Ordinate label: log(red/pink); abscissa label: generations.]

Next we asked whether the ancestral population was being eliminated by one or multiple subpopulations. Visual inspection of Figure 11 suggests that multiple subpopulations are replacing the ancestor in at least nine out of the ten populations. Following the initial shift away from 1, the ratio then either levels out or reverses direction at least once in nine populations. These kinds of dynamics are possible only if subpopulations with beneficial mutations have independently arisen on each of the two ancestral backgrounds (Ara- and Ara+). In fact, at least four of the ten populations still have not fixed one or the other Ara marker allele by 880 generations, demonstrating that no beneficial mutation had swept to fixation. By generation 1000, population Ara-1 was already approaching its third adaptive step (Lenski and Travisano, 1994); yet it still had not fixed any known mutation, even though five loci were polymorphic for beneficial mutations (Table 8).

We would like to estimate the number of subpopulations harboring independent beneficial mutations that compete in these populations.
To that end, we used a maximum likelihood approach to estimate this number during the initial adaptive step. We define a contending beneficial mutation as one that reaches a high enough frequency to be detected with our sampling (at least one in ~433). The number of contending mutations that make up the first adaptive step in a population is designated n, and we want to calculate its likelihood function given that nine out of ten populations remained polymorphic through the first adaptive step. For one population, the probability of remaining polymorphic is one minus the probability of becoming fixed, which can only happen if all n contending mutations occur in the same genetic background (either Ara+ or Ara-). Therefore, for a single population, the probability of remaining polymorphic is

$1 - \left(f^{n} + (1-f)^{n}\right)$,

where f is the frequency of the neutral allele (equal to 0.5 in our experiment). Expanding this equation to the probability of nine of ten populations remaining polymorphic, the likelihood function for n is

$L(n) = \binom{10}{9}\left(1 - \left(f^{n} + (1-f)^{n}\right)\right)^{9}\left(f^{n} + (1-f)^{n}\right)$.

Figure 12 plots this function for n from 1 to 20. The 95% confidence interval was estimated using a likelihood ratio test (Sokal and Rohlf, 1995). The exact probability distribution for this case is unknown but can be approximated by a $\chi^{2}$ distribution with one degree of freedom (Wilks, 1938; Self and Liang, 1987; Sokal and Rohlf, 1995). Using this approximation, likelihoods falling below 0.0305 are significantly less likely, at the p < 0.05 level, than the maximally likely n. Therefore, our best estimate is that, on average, about four contending subpopulations take part in the first adaptive step, with the 95% confidence interval from three to nine.

DISCUSSION

In this study, we examined in detail the dynamics of substitution of known beneficial mutations in focal population Ara-1 of the long-term E. coli evolution experiment.
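The likelihood function for n defined at the end of the Results can be evaluated directly. The Python sketch below does so for n from 1 to 20 with f = 0.5 and nine of ten populations remaining polymorphic. The cutoff of 0.0305 is taken as quoted in the text rather than re-derived, and the variable and function names are illustrative assumptions.

```python
from math import comb

F_NEUTRAL = 0.5        # initial frequency of the neutral marker allele
N_POPULATIONS = 10     # replicate populations in the subsidiary experiment
N_POLYMORPHIC = 9      # populations that stayed polymorphic through the first step
CUTOFF = 0.0305        # likelihood threshold quoted in the text (not re-derived here)


def prob_marker_fixed(n: int, f: float = F_NEUTRAL) -> float:
    """Probability that all n contending mutations fall on the same marker background."""
    return f ** n + (1 - f) ** n


def likelihood(n: int, f: float = F_NEUTRAL) -> float:
    """Binomial likelihood of n given 9 polymorphic and 1 fixed population out of 10."""
    q = prob_marker_fixed(n, f)
    n_fixed = N_POPULATIONS - N_POLYMORPHIC
    return comb(N_POPULATIONS, N_POLYMORPHIC) * (1 - q) ** N_POLYMORPHIC * q ** n_fixed


values = {n: likelihood(n) for n in range(1, 21)}
best_n = max(values, key=values.get)
supported = [n for n, value in values.items() if value >= CUTOFF]

if __name__ == "__main__":
    for n, value in values.items():
        print(f"n = {n:2d}  L(n) = {value:.4f}")
    print("most likely n:", best_n)
    print("n at or above the cutoff:", supported)
```

Under these assumptions the likelihood peaks at n = 4, and the values of n whose likelihood is at or above the quoted cutoff run from 3 to 9, in agreement with the estimate and interval reported in the text.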
We also examined the dynamics of adaptation in ten other populations in order to evaluate whether the complex patterns observed in the focal population were typical. In the next two sections, we summarize specific conclusions from these analyses. We then discuss more general implications of our findings for understanding the dynamics of adaptation in large asexual populations.

Figure 12. Likelihood estimation of the number of contending mutations, n, that make up the cohort of beneficial mutations that displaces the ancestral population. Calculations are based on the outcome that nine out of ten populations remained polymorphic for the neutral allele through the first adaptive step (Figure 11). The likelihood of each n was calculated given this outcome. A likelihood ratio test indicates that likelihood values less than 0.0305 are significantly less likely than the maximally likely value of n at the p < 0.05 level. Thus, the best estimate for the number of contenders is four, with the 95% confidence interval ranging from three to nine.

[Figure 12 image not reproduced. Ordinate: likelihood; abscissa: number of contenders (n), from 0 to 20.]

Dynamics of substitution of beneficial mutations in population Ara-1. Genetic tests were performed on >400 clones sampled during the first 5,000 generations of this focal population of the long-term E. coli evolution experiment. These data document the rise and eventual fixation of beneficial mutations at six loci. For four of these loci (rbs, topA, spoT, and glmUS), competition assays with isogenic constructs have confirmed the beneficial effects of mutations (Cooper et al., 2001; Crozat et al., 2004; Cooper et al., 2003; Stanek and Lenski, unpublished data). For the other two loci (pku and ppr), parallel substitutions of similar mutations in all or several of the replicate populations indicate their beneficial nature (Chapter 1).
The order in which the beneficial alleles arose was rbs, topA, spoT, glmUS, pku, and ppr with first appearances in generational samples 500, 500, 1000, 1000, 1500, and 3000, respectively. In those cases where multiple alleles were first detected in the same sample, their order of appearance was deduced from the existence of clones that had one but not both alleles. The alleles did not fix independently but instead they were substituted in more or less discrete sets, with rbs, topA, spoT, and glmUS being fixed between generations 1000 and 1500, followed by pku between generations 3000 and 4000 and by ppr between generations 4000 and 5000. Another beneficial mutation, as yet unidentified, was also substituted between the mutations in pku and ppr. With respect to the initial set of beneficial alleles, it is impossible to say whether their fixations were simultaneous or only nearly so, as this distinction depends on the order in which the competing lineages with some or all of the ancestral alleles were eliminated; our study lacked the sampling intensity necessary to resolve that point.

In any case, the existence of such sets of linked beneficial mutations depends on the phenomenon of clonal interference, which has important effects on the dynamics of adaptation in large asexual populations (Muller 1932; Gerrish and Lenski 1998). Clonal interference occurs when multiple beneficial mutations arise in different asexual lineages, such that the lineages compete with one another and impede the progress of any of them to fixation. If the production of further beneficial mutations was somehow stopped after two or more lineages had arisen, each with single beneficial mutations, then eventually the single most fit mutation would prevail.
However, its time to achieve fixation would be slowed down relative to the time necessary if it competed only against the ancestral type, and the magnitude of the delay could be substantial if one of the other lineages had a beneficial mutation with nearly the same beneficial fitness effect. Thus, for example, it would take approximately 10 times as long for a lineage carrying a mutation with a 10% advantage to be fixed in the presence of another lineage carrying a mutation with a 9% advantage as it would in the absence of that other lineage; this difference arises because the eventual winner’s advantage is only 1% against that other lineage. But the generation of new beneficial mutations does not stop at some arbitrary point. In fact, the likelihood that a second beneficial mutation arises in a lineage that already has one such mutation is substantially greater with clonal interference, which increases the total number of cell generations before any beneficial mutation is fixed. In the hypothetical case above, and now allowing secondary beneficial mutations, the eventual winner between the lineages that acquired the mutations with 10% and 9% benefits would likely depend on which lineage produced the next mutation with a substantial benefit before the other was driven extinct. And if both lineages produced secondary mutations with similar benefits, then the winner might depend on which one produced the best tertiary beneficial mutation.

It is precisely this scenario, in which several beneficial mutations are assembled in a single lineage before all of them are eventually fixed, that happened in population Ara-1. In addition to the dynamics of the beneficial alleles, which are fully consistent with this conclusion, other lines of evidence can be explained only by this scenario. In particular, clones lacking the beneficial mutations that were eventually substituted were often as or more fit than clones from the same generation that had those mutations.
A particularly striking example was seen at generation 500; some clones that had neither the rbs(fix) allele nor the topA(fix) allele were more fit than other clones that carried both (Table 8). Although the former clones were indistinguishable from the ancestor based on the genetic tests we used, they were some 20% more fit than the ancestor, thus proving that they had other unknown beneficial mutations. In addition to qualitatively similar results from other generations, we also identified three beneficial alleles that arose but were eventually lost; all three were in genes where other beneficial mutations later fixed. Two of the lost beneficial mutations were in rbs, and the other was in pku. Moreover, the lost pku(A) allele arose in the same lineage that carried the lost rbs(A) allele, and this lineage had a fitness at generation 1000 that was almost as great as the lineage that carried the four beneficial mutations that eventually fixed in rbs, topA, spoT, and glmUS. Thus, two beneficial mutations, and probably more, accumulated even in some lineages that were ultimately excluded by the eventual winner.

Dynamics of beneficial substitutions in relation to dynamics of fitness. In this section, we examine the correspondence between the dynamics of beneficial substitutions documented in this chapter and the dynamics of fitness and cell morphology reported in previous studies performed on this same focal population. Lenski and Travisano (1994), using single clones saved every 100 generations, showed that fitness increased in a step-like manner in this population, while Elena et al. (1996) showed a corresponding pattern for average cell volume. The two datasets showed similar timing of the step-like changes in their respective phenotypes. The fitness dataset is reproduced here as Figure 13.

Figure 13. Results of a previous study of focal population Ara-1 showing the step-like increases in fitness. Clones were sampled every 100 generations over the first 2000 generations of evolution, and their fitness values were measured relative to the marked ancestral strain using the same procedures as used for all of the fitness values reported in the current study. The solid line shows the fit of a step model to the data, which was significantly better than various simpler models. Figure reproduced from Lenski and Travisano (1994).

[Figure 13 image not reproduced. Ordinate ticks run from 0.9 to 1.4; abscissa: TIME (generations), from 0 to 2000.]

The originally proposed explanation for the dynamics was that each step reflected a sequential and rapid fixation of a rare beneficial mutation. The findings of this study disprove that explanation. Instead, our analyses demonstrate that several subpopulations produced similarly beneficial mutations, with no lineage able to displace all the others until it had acquired multiple mutations. Therefore, each step-like shift encompasses several more or less simultaneous transitions in multiple subpopulations. At first glance, it may seem rather surprising that such complex dynamics would preserve a step-like appearance. However, these step-like dynamics reflect only the mean behavior of the population; mathematical analyses and numerical simulations have demonstrated that such dynamics are rather general in evolving asexual populations, even when clonal interference gives rise to processes more complex than temporally isolated sequential sweeps (Gerrish and Lenski, 1998; Chapter 2).

The step-like increases in fitness, coupled with coexistence of multiple lineages through the steps, imply that the among-genotype variation in fitness (at least among the main contenders) should be low within a generational sample relative to the change in fitness across steps.
Consistent with this view, the fitness data show an increase of ~35% over the first 2000 generations (Table 8), while most of the analyses of variance show little or no variation in fitness between co-occurring genotypes and clones within genotypes (Table 9; Figure 10). The only time point at which we detected significant variation in fitness was in the 500-generation sample, where we observed significant variation among the co-occurring genotypes. Fisher’s fundamental theorem of natural selection states that the rate of adaptation should be proportional to the variation in fitness in a population (Fisher, 1930), and at 500 generations the population was adapting most rapidly (Lenski et al., 1991; Lenski and Travisano, 1994). Moreover, the 500-generation sample appears to have captured the beginning of a step up in mean fitness. The fitness of the least fit genotype was ~1.14 (Table 8; Figure 10), consistent with the fitness step that ended around 500 generations (Figure 13; Lenski and Travisano 1994), whereas other genotypes had fitness values that averaged ~1.21 (Table 8; Figure 10), consistent with the subsequent step (Figure 13; Lenski and Travisano 1994).

Dynamics of adaptation in ten replicate populations. The results obtained with the focal Ara-1 population demonstrate the importance of clonal interference, which gave rise to complex dynamics of substitution of beneficial mutations. In particular, several beneficial mutations were incorporated into the lineage that eventually prevailed before it was able to exclude other contending lineages that had also produced multiple beneficial mutations. Although these dynamics are fully consistent with previous work examining the dynamics of fitness in the same focal population, the interpretation is considerably more complex than was suggested by the phenotypic dynamics alone.
Therefore, we sought to determine whether these complications were an idiosyncratic feature of that particular population or, alternatively, whether such complications would be typical of other populations under the same conditions. The analyses of the mutational dynamics and clonal estimates of relative fitness were extremely intensive for the focal population, which made it impossible to perform comparable analyses for the other long-term populations. Therefore, we designed a new evolution experiment specifically to address whether qualitatively similar effects would occur in other populations under the same selective conditions and population size. We started 10 new populations, each with a 50:50 mixture of two neutral marker states. If the results obtained with focal population Ara-1 were atypical, and instead single beneficial mutations were responsible for most selective sweeps, then we would expect one or the other marker state to sweep to fixation. On the other hand, if the results in population Ara-1 were typical, then we expect to see evidence for strong clonal interference among multiple contending beneficial mutations. In that case, the markers should change from their initial frequencies, but neither state should sweep rapidly to fixation. Consistent with the latter hypothesis, in almost all populations the markers remained polymorphic long after beneficial mutations had arisen that displaced their relative frequencies from the initial 50:50 ratio (Figure 11). This outcome confirms that focal population Ara-1 was typical in having cohorts of several contending beneficial mutations, which caused clonal interference that, in turn, required the accumulation of multiple beneficial mutations in a lineage before they could collectively be substituted in the evolving population.

Implications for the phenomenon of periodic selection.
These dynamics, in which cohorts of several beneficial mutations contend for fixation and multiple beneficial mutations must accumulate in the winning lineage before any of them can be substituted, impact the genetic variation that exists during the adaptive process. According to the classic model of periodic selection in asexual populations, neutral and deleterious mutations accumulate between selective sweeps by beneficial mutations, but these variants are purged as a single beneficial mutation (derived from a single cell that is, in most cases, on an otherwise unmutated background) replaces all other genotypes (Atwood, Schneider and Ryan, 1951a, b). However, under the conditions in our experiments, this model does not hold, at least not in its original form. Instead, there are a number of sub-populations present at all times, each with different sets of contending mutations. Although each sub-population traces its ancestry to a single individual, each adaptive step typically involves several subpopulations. Thus, there is a higher chance that a neutral mutation will survive an adaptive step because it can occur in any of several individuals, instead of just one. Still, one would expect that most of the neutral and deleterious variation will be purged, as in the classic periodic selection experiments. An important future research objective is to understand precisely how these dynamics affect the expected frequency distributions of neutral and non-neutral genetic mutations.

Implications for indirect selection on evolvability. When asexual lineages have to accumulate multiple beneficial mutations in order to out-compete other lineages with contending mutations, this situation may allow indirect selection for genotypes with greater adaptive potential. Variation in evolvability could reflect differences either in genetic architecture or mutation rates.
In the first case, existing mutations might interact epistatically with potential subsequent mutations such that the fitness benefit of the latter mutations depends on the presence of the former mutations (e.g., Lenski et al., 2003b). In the second case, mutations in certain genes affect processes of DNA repair such that rates of mutation are increased either locally (Moxon et al., 1994) or globally (Sniegowski et al., 1997; Matic et al., 1997; Oliver et al., 2000). In either case, if a mutation affects the likelihood of additional beneficial mutations occurring, then this fact will influence its own probability of eventual fixation.

In fact, there are reasons to think that some of the mutations that were polymorphic for extended periods of time in focal population Ara-1 may exhibit epistatic interactions that influence their fate. Both spoT and topA are global regulators, and mutations in those genes affect the expression of many other genes (Pruss and Drlica, 1989; Steck et al., 1993; Cashel, 1996; Cooper et al., 2003). Also, in the case of spoT, many of the other long-term populations substituted mutations in spoT, but four had not done so even after 20,000 generations (Cooper et al., 2003). The spoT mutation that fixed in population Ara-1 conferred a 9% advantage when it was moved into the ancestral genetic background, so it is surprising that similar mutations would not have fixed in all the populations after 20,000 generations. The populations had slowed significantly in their rate of adaptation after 10,000 generations, and therefore mutations of such large effect would have had little competition if they occurred. And such mutations should have occurred repeatedly in each population.
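A rough calculation shows why such mutations are expected to recur. The Python sketch below multiplies a per-base-pair mutation rate by the number of target sites and the effective population size, then applies the approximation that a new beneficial mutation survives drift with probability 2s. The parameter values are the ones this section attributes to Lenski et al. (2003), Cooper et al. (2003), and Lenski et al. (1991); the mutation rate is entered as 1.4 x 10^-10 per base pair per generation, which is an assumption about a rounded value, and the function names are illustrative.

```python
# Parameter values as quoted in this section; the exponent of the mutation
# rate is an assumption (a rounded value of 1.4e-10 per bp per generation).
MUTATION_RATE_PER_BP = 1.4e-10
TARGET_SITES_IN_SPOT = 8
EFFECTIVE_POPULATION_SIZE = 3.3e7
SELECTIVE_ADVANTAGE = 0.09


def generations_per_new_mutation(mu: float, sites: int, n_e: float) -> float:
    """Expected generations between new mutations at any of the target sites."""
    return 1.0 / (mu * sites * n_e)


def generations_per_surviving_mutation(mu: float, sites: int, n_e: float, s: float) -> float:
    """Expected generations between mutations that escape drift (probability ~2s)."""
    survival_probability = 2.0 * s
    return generations_per_new_mutation(mu, sites, n_e) / survival_probability


if __name__ == "__main__":
    arise = generations_per_new_mutation(
        MUTATION_RATE_PER_BP, TARGET_SITES_IN_SPOT, EFFECTIVE_POPULATION_SIZE)
    survive = generations_per_surviving_mutation(
        MUTATION_RATE_PER_BP, TARGET_SITES_IN_SPOT, EFFECTIVE_POPULATION_SIZE,
        SELECTIVE_ADVANTAGE)
    print(f"one new favorable spoT mutation about every {arise:.0f} generations")
    print(f"one surviving mutation about every {survive:.0f} generations")
```

With these rounded inputs the sketch gives roughly 27 and 150 generations, close to the figures of 26 and 146 generations stated in the text; the small gap is what one would expect if the original calculation used a less heavily rounded mutation rate.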
With a base-pair mutation rate of about 1.4 × 10^-10 (Lenski et al., 2003), at least eight different mutations within spoT able to give similar advantages (Cooper et al., 2003), and an effective population size of ~3.3 × 10^7 (Lenski et al., 1991), we expect that favorable mutations in spoT occurred every 26 generations or so, and one of them should survive stochastic loss by drift about every 146 generations, given a probability of surviving drift of approximately 2s, or 18% (Haldane 1927; Lenski et al., 1991; Johnson and Gerrish 2002). Moreover, when the spoT(fix) allele from population Ara-1 was moved into a clone sampled at generation 2000 from another population that did not fix a mutation in spoT even after 20,000 generations, that allele no longer conferred any fitness advantage (Cooper et al., 2003), which confirms the importance of epistatic interactions in determining the fate of mutations.

Implications for the advantage of recombination. The Fisher-Muller hypothesis is a prominent explanation for the advantage of genetic recombination (Fisher 1930; Muller 1932, 1964; Crow 1965; Felsenstein 1974). According to this hypothesis, sexual populations can adapt more quickly than asexual ones because sex allows mutations that arise in different genomes to be recombined into a single genome, whereas asexual populations must wait for each mutation to arise sequentially on a single genetic background. Several recent experimental evolution studies support this hypothesis by showing that sexual populations do adapt more quickly than otherwise identical asexual populations (Colegrave 2002; Goddard, Godfray and Burt, 2005; Grimberg and Zeyl, 2005). Our data on allele frequencies also lend support to this hypothesis by showing that beneficial alleles were indeed prevented from spreading by competition with other beneficial alleles.
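The waiting-time estimate for favorable spoT mutations given earlier in this section can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the variable names are hypothetical, the exponent of the mutation rate is assumed to be -10 (the value consistent with the 26-generation figure), and the supply of mutations is approximated as mutation rate × number of target sites × effective population size.

```python
# Back-of-envelope supply of beneficial spoT mutations (illustrative sketch).
# All names are hypothetical; inputs are the rounded values quoted in the text.

mutation_rate = 1.4e-10   # per base pair per generation (exponent assumed to be -10)
target_sites = 8          # distinct spoT mutations with a similar benefit
effective_size = 3.3e7    # effective population size
selection_coeff = 0.09    # fitness advantage of the spoT allele

# Expected number of new favorable spoT mutations per generation.
supply_per_generation = mutation_rate * target_sites * effective_size

# Mean wait, in generations, between successive favorable mutations.
generations_between_mutations = 1.0 / supply_per_generation

# Haldane's approximation: a new beneficial mutation escapes drift with probability ~2s.
survival_probability = 2.0 * selection_coeff

# Mean wait between favorable mutations that also survive drift.
generations_between_survivors = generations_between_mutations / survival_probability

print(f"one favorable mutation every {generations_between_mutations:.0f} generations")
print(f"one surviving mutation every {generations_between_survivors:.0f} generations")
```

With these rounded inputs the two waits come out near 27 and 150 generations, in line with the figures of about 26 and 146 quoted in the text; the small gap presumably reflects rounding of the mutation rate.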
One important caveat to the Fisher-Muller hypothesis is that the alleles that are competing must be different, such that bringing them together through recombination would be beneficial. The pykF(A) allele, which was eliminated, preceded the pykF(fix) allele that was eventually fixed, and therefore it is likely that its recombination into the lineage that won would have accelerated the population's overall rate of adaptation. By contrast, most contending subpopulations at generation 500 had equivalent mutations in the rbs operon, and therefore recombination involving that gene may not have accelerated the overall rate of adaptation. The rbs mutations are a special case, however, because they occur at an unusually high mutation rate owing to an adjacent genetic element that produces mutations at that site (Cooper et al., 2001). More generally, the strong clonal interference documented in our study indicates that asexual reproduction can indeed limit the overall rate of adaptive evolution (Fisher 1930; Muller 1932, 1964; Crow 1965; Felsenstein 1974; Gerrish and Lenski 1998; de Visser et al., 1999).

CHAPTER 4

VARIATION IN EVOLVABILITY PREDICTS EVENTUAL SUCCESS IN AN EVOLUTION EXPERIMENT WITH ESCHERICHIA COLI

ABSTRACT

Explanations for the evolution of evolvability often rely on ill-defined group-level or species-level selection. Here a model for selection on evolvability within a single population is tested, which has the potential to explain many traits. This model applies to large asexual populations. If this model is to explain selection for evolvability, then (i) independent subpopulations must coexist through multiple adaptive steps and (ii) the subpopulations that are competing must differ in their evolvability. The first condition was described in an experimental Escherichia coli population. Here, the second condition is tested using four clones that were sampled from that population. Two of these clones were of the genotype that eventually prevailed.
The other two were from genotypes that, although they competed for hundreds of generations and continued to adapt, eventually lost. Ten replicate populations were started from each clone and evolved for 883 additional generations. Populations founded from the eventual winners began to adapt sooner and reached higher fitness. These results are consistent with the hypothesis of indirect selection for evolvability.

INTRODUCTION

The evolution of evolvability, that is, the capacity to produce heritable variation (Kirschner and Gerhart, 1998), poses a unique challenge to evolutionary biologists. Evolvability cannot be selected in the usual Darwinian sense because it does not directly affect an organism's fitness, yet traits that influence evolvability are common in nature. For example, chaperone proteins (Rutherford and Lindquist, 1998; Queitsch, Sangster and Lindquist, 2002; Fares et al., 2002), the [PSI+] prion in yeast (True and Lindquist, 2000), the modular nature of metazoan development (Kirschner and Gerhart, 1998; Halder, Callaerts and Gehring, 1995), abundant retrotransposons in mammalian genomes (Kazazian, 2000; Gould, 2002, p. 1273), codon bias in viral antigens (Plotkin and Dushoff, 2003), contingency loci in microbial pathogens (Moxon et al., 1994), increased mutation rates during times of stress (Bjedov et al., 2003), and increased genomic mutation rates in pathogenic Escherichia coli (Matic et al., 1997) and Pseudomonas aeruginosa (Oliver et al., 2000), have all been suggested to contribute to evolvability.

Evolvability can evolve when it is pleiotropically associated with some other trait that is experiencing direct selection. For example, the codon bias in DNA regions that encode antigens in influenza A is well suited to maximize the speed of adaptation. However, this pattern may have been generated by past selection related to the direct fitness advantage of beneficial mutations (Plotkin and Dushoff, 2003).
Likewise, chaperone proteins usually have the direct effect of lessening the deleterious effects of misfolded proteins caused by high temperatures or mutation, but in doing so they also maintain variation that might be unveiled and used for adaptation to changing environments (Rutherford and Lindquist, 1998; Queitsch, Sangster and Lindquist, 2002; Fares et al., 2002).

However, once a trait that increases evolvability exists, the advantage to the group having it, relative to other groups, becomes straightforward. Several authors have proposed that evolvability may give a selective advantage to groups or species (Jablonski, 1987; Dawkins, 1989; Gould, 2002, pp. 1270-1295; Lindquist, 2003). Even Dawkins, who has strongly argued for the prominence of gene-level selection in most situations (Dawkins, 1976), has suggested that evolvability may be an exceptional case of species-level selection (Dawkins, 1989). However, arguments for group selection must be made with caution (Williams, 1966). Therefore, a concrete model for the evolution of evolvability within a single population has been developed that applies to large asexual populations. Microbial populations that have little recombination have produced some of the clearest examples of traits that apparently function to increase evolvability (Moxon et al., 1994; Matic et al., 1997; Oliver et al., 2000).

Several recent studies of computer simulations (Tenaillon et al., 1999; Chapter 2) and experimental microbial populations (Mao et al., 1997; Chapter 3) have yielded the following model of selection for evolvability within an asexual population. When favorable mutations are common, such as when organisms experience a change in their environment, several independent mutations may arise and escape stochastic loss before any one can sweep through the population (Muller, 1932; Gerrish and Lenski, 1998; Wilke, 2004). Collectively, the genotypes that have these beneficial mutations will displace the ancestral genotype.
But without recombination to bring these mutations together into a single genome, these beneficial mutations will then compete with each other. At this point, if no more beneficial mutations arose, the most fit among these single-mutant genotypes would eventually take over the population. However, when further beneficial mutations are possible, several genotypes that have different first beneficial mutations will produce second beneficial mutations before the most fit single mutation dominates the population. Therefore, eventual success for a beneficial mutation may depend not only on the direct fitness advantage it confers, but also on the opportunity it allows for additional adaptation, relative to the other contending mutations. These independent subpopulations may, in fact, compete over many adaptive steps before one eventually out-competes all others (Chapters 2 and 3). A lineage that has the ability to generate more beneficial mutations, or beneficial mutations that confer a greater selective advantage (i.e., a lineage with increased evolvability), can increase its proportion in the population at each adaptive step.

With this model, evolvability confers an advantage to a lineage over time. It is the very fact that the lineage can produce more favorable mutations that gives it this advantage. Thus, selection is for evolvability. However, this selection is indirect because the increased representation of a lineage is always due to the fitness of individuals that make up that lineage. The fitness of individuals is a product of the beneficial mutations they harbor, not their ability to generate future beneficial mutations.

The dynamics of clonal interference, which this model relies upon, have been previously described in detail in an experimental population of Escherichia coli (Chapter 3). In that population, multiple subpopulations coexisted through multiple step-like increases in fitness before one subpopulation eventually out-competed all others.
It was further shown that independent subpopulations arose at an early time point; the descendants of these genotypes subsequently competed for hundreds of generations and accumulated several additional mutations before the eventual winners (henceforth abbreviated as EW) displaced the eventual losers (henceforth EL). It is possible that the lineage that eventually won did so, in part, because it had a higher propensity to evolve. In this case it would be an example of selection on evolvability as described above. Alternatively, the winner may have been the lineage that, simply by chance, picked up more beneficial mutations. The experiments reported here are designed to distinguish between these two possibilities. From that early time point, we picked two representative clones of the genotype whose descendants would eventually take over the population, and two representative clones of the genotypes whose descendants would compete, and further adapt, yet eventually become extinct. We allowed many replicate populations founded from these clones to evolve independently and measured their abilities to adapt, which we then compared.

MATERIALS AND METHODS

The long-term lines and population Ara-1. Twelve populations of E. coli were propagated through serial batch transfer in a long-term evolution experiment (Lenski et al., 1991; Lenski and Travisano, 1994; Lenski, 2004; Chapter 3). The populations were grown in 10 ml of Davis minimal medium supplemented with 25 µg/ml of glucose (DM25), at 37°C in 50-ml Erlenmeyer flasks rotating at 120 revolutions per minute. Daily transfers of 1% of each population into fresh medium resulted in ~6.64 generations/day. One of these twelve populations, called Ara-1, has been the focus of a number of previous studies (Lenski et al., 1991; Lenski and Travisano, 1994; Elena, Cooper and Lenski, 1996; Cooper, Rozen and Lenski, 2003; Papadopoulos et al., 1999; Schneider et al., 2000; Crozat et al., 2004; Chapter 3), and it is the focus of this study as well.

Eventual winners and eventual losers. A previous study (Chapter 3) demonstrated that, in population Ara-1, multiple subpopulations coexisted through multiple adaptive steps. From a sample collected at generation 500 from this population, four genetically distinct subpopulations were identified. One of these had two mutations that would eventually fix in the population, including a large deletion in the rbs operon (Cooper et al., 2001) and a point mutation in the topA gene (Crozat et al., 2004). Because these mutations eventually fixed in the population, this genotype is called the "eventual winner" (EW). Two representative clones of the EW genotype were picked (designated EW1 and EW2). Two other genotypes that were present at 500 generations had no mutations known to fix in the population, yet their fitness clearly indicated they contained beneficial mutations (Chapter 3). Because these mutations were eventually eliminated from the population, we refer to the genotypes having these mutations as the "eventual losers" (EL). Two representative clones of the EL genotypes were chosen from the sample taken at generation 500. One of these, EL1, contained no known mutations. A second clone, EL2, had a deletion in rbs that did not fix. In population Ara-1 the descendants of the EL and EW genotypes were still present and competing with each other 500 generations later, even though the descendants of the EW genotype had picked up two additional beneficial mutations. The descendants of the EW genotype eventually took over the population between 1000 and 1500 generations. (A fourth genotype identified in Ara-1 at 500 generations had the deletion in rbs that would fix but not the topA mutation that would fix; it was not included in this study.)
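The generation counts used throughout this chapter follow from the 100-fold daily dilution: a culture that regrows 100-fold each day undergoes log2(100) doublings. The short sketch below works through that conversion; the function name is hypothetical, and it assumes that every culture regrows to the same final density each day.

```python
import math

def generations_per_transfer(dilution_factor: float) -> float:
    """Doublings needed to regrow after a dilution of the given factor."""
    return math.log2(dilution_factor)

# A 1% daily transfer is a 100-fold dilution.
per_day = generations_per_transfer(100)
print(f"{per_day:.2f} generations per day")

# Convert between days of propagation and generations.
days_for_883 = 883 / per_day
print(f"883 generations take about {days_for_883:.0f} days")
print(f"3 days correspond to about {3 * per_day:.0f} generations")
```

This recovers the ~6.64 generations per day stated above and the ~20 generations between marker samples taken every 3 days.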
Ara+ revertants. Clones that differed from EW1, EW2, EL1, and EL2 at only a neutral locus were created as follows. All of the clones from population Ara-1 were unable to use the sugar L(+) arabinose, a phenotype called Ara-. Mutations at this locus that are Ara+ are known to be selectively neutral in DM25, but they give a readily visible phenotype when grown on tetrazolium arabinose (TA) indicator agar. Ara- colonies appear red while those that are Ara+ appear light pink. Mutants that were capable of using arabinose were selected by plating >10^10 cells onto minimal medium agar plates supplemented with arabinose. Colonies that grew on these plates contained a spontaneous mutation in araA, which allowed for the metabolism of arabinose. All Ara+ revertants were shown by a restriction fragment length polymorphism assay to contain identical nucleotide substitutions. Given these procedures and the very low genomic mutation rate in the ancestral strain (Lenski, Winkworth and Riley, 2003), it is unlikely that these genotypes contained any mutations other than the one responsible for the Ara+ phenotype.

Fitness assays. The standard fitness assay for this system was used (Lenski et al., 1991). For each assay, two clones were inoculated separately from the freezer stocks into flasks containing LB liquid medium and allowed to grow for 24 hours at 37°C. These cultures were then diluted 100-fold, and 0.1 ml was transferred to 9.9 ml of DM25 medium, where the cultures were again allowed to grow separately for 24 hours at 37°C. This step allows for physiological acclimation to the competition conditions. The two clones were then mixed at a 1:1 volumetric ratio (unless otherwise noted) and a combined 0.1 ml was added to 9.9 ml of fresh DM25 medium. A sample was taken from this mixture and plated onto TA agar to estimate the initial population density of each type (N_i). The mixture was propagated for one or more days with daily 1:100 dilutions into fresh DM25 medium.
At the end of the experiment a second sample of the mixture was plated onto TA agar to estimate the final density of each type (N_f). The Malthusian parameter (m) is the realized growth rate of a clone over the test period. It is calculated as $m = \ln(100^{t} \cdot N_f / N_i)/t$, where N_i and N_f are the initial and final densities, t is the number of days, and both counts reflect changes in overall density based on the dilutions used for plating. The relative fitness of one clone to another is the ratio of their Malthusian parameters.

Additional evolution. To test the inherent evolvability of the four clones (EW1, EW2, EL1, and EL2), ten replicate populations were founded with each clone and their subsequent adaptation was quantified. Each population was started with a 1:1 ratio of a clone and its cognate Ara+ revertant. The evolution environment was identical to that experienced by population Ara-1 during the long-term evolution experiment. These populations were labeled alphabetically within each ancestral clone (EW1a, EW1b, and so on). The duration of the experiment was somewhat arbitrary. However, 883 generations approximates the amount of time until the eventual winners overtook the eventual losers in population Ara-1, which can be seen from the steps in fitness (Lenski and Travisano, 1994) and cell size (Elena, Cooper and Lenski, 1996) and the changes in allele frequencies (Chapter 3).

Timing the rise of the first beneficial mutation. Every 3 days (~20 generations) a sample containing ~400 colony-forming units from each population was plated onto tetrazolium arabinose (TA) indicator agar plates. Any substantial deviation from the initial ratio indicated the presence of a new adaptive mutation (Chao and Cox, 1983; Rozen et al., 2002; Chapter 3). We can be sure of this inference because the neutral allele at a frequency of 0.5 is unlikely to vary significantly due to random drift alone given the large population size and relatively few generations.
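To put a number on how slowly drift acts at this population size, the diffusion approximation of Kimura and Ohta (1969) gives the mean time until a neutral allele at initial frequency p is either fixed or lost as t = -4 Ne [p ln p + (1 - p) ln(1 - p)] generations. The sketch below evaluates this expression; the function name is hypothetical, and the form with the factor 4 Ne is assumed because it is the one that reproduces the figure quoted in the text.

```python
import math

def mean_time_to_fixation_or_loss(p: float, effective_size: float) -> float:
    """Kimura-Ohta mean absorption time (generations) for a neutral allele at frequency p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return -4.0 * effective_size * (p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

# Neutral marker starting at 0.5 in a population with Ne of about 3.3e7.
t_bar = mean_time_to_fixation_or_loss(0.5, 3.3e7)
print(f"{t_bar:.2e} generations")
```

At p = 0.5 the expression reduces to 4 Ne ln 2, or roughly 2.8 Ne, which is on the order of 90 million generations for this population.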
The expected mean time to fixation, or loss, by drift alone for a mutation initially at a frequency of 0.5 and Ne ~3.3 × 10^7 is about 90 million generations (Kimura and Ohta, 1969). No detectable shift would be measured after only 900 generations. Therefore, we can infer from a change in the frequency of the linked neutral marker that at least one beneficial mutation has risen to a high enough frequency to be detected.

The following protocol was used to decide on the time point at which a shift in neutral allele frequency was first detected. First, for each population we calculated the initial frequency of the neutral allele by averaging over the first 5 time points (0, 20, 40, 60, and 80 generations), during which time the frequency was constant. This averaging gives a good estimate of the initial frequency, which was expected to be 0.5 and in most cases was very close. Second, for each population and at each time point we calculated the probability that the observed ratio of Ara+ to Ara- could have been drawn from a population with a mean equal to the initial value in that population. The frequency was tentatively determined to have significantly changed when the null hypothesis was rejected at p < 0.05, which required about a 5% shift in allele frequencies. However, the large number of samples taken means that deviations at p < 0.05 would also sometimes occur by chance. Therefore, we additionally required that the samples differ from the initial ratio at two consecutive time points in order to accept a tentative determination as valid.

Fitness of EW-derived clones relative to EL-derived clones. Comparison of relative fitness between the evolved eventual winners and the evolved eventual losers was carried out as follows. From each of the 40 populations, a single clone was chosen at random from the 883-generation sample.
At generation 883, some populations had fixed the Ara- phenotype, some had fixed the Ara+ phenotype, and others remained polymorphic. Thus, the clones picked were a mixture of Ara+ and Ara-. Each of the Ara+ clones derived from an eventual winner was competed against each of the Ara- clones derived from an eventual loser, and vice versa. Of the 20 evolved clones derived from EW1 and EW2, 12 were Ara- and 8 were Ara+. Of the 20 evolved EL clones, 12 were Ara- and 8 were Ara+. Each possible EW-EL pair with opposite marker types was competed once, although two of the measures did not yield usable results owing to procedural errors. Thus, there were 190 (12 × 8 + 8 × 12 - 2) fitness assays.

For the statistical analysis, these data were treated as two independent data sets. A mixed general linear model was fit to the two data sets independently using the MIXED procedure in SAS. Each data point is the fitness of one EW-derived clone relative to one EL-derived clone. In the model the intercept was treated as a fixed effect and derived clone was treated as a random effect. The Satterthwaite approximation was used for the degrees of freedom to correct for the two missing data points. The model estimated an intercept and a standard error around the intercept. Hypotheses concerning the intercept were tested with a t-test (Sokal and Rohlf, 1995), with degrees of freedom from the Satterthwaite approximation. Random effects were tested with a likelihood ratio test (LRT), in which the -2 log likelihood of the full model is subtracted from the -2 log likelihood of a reduced model. This difference is compared to a χ² distribution with degrees of freedom equal to the difference in the number of parameters in the two models (Littell et al., 1996; Chapter 3).
The full model, which contained the intercept, the EW-derived clone, and the EL-derived clone, was compared to reduced models in which the term for either the EW-derived clone or the EL-derived clone was removed.

Two of the evolving populations experienced pipetting errors. First, in population EL2h only the Ara- clone was present in the founding population. Therefore, this population was not informative for the timing of adaptation, but it was still used for the fitness assay among 883-generation clones. The second error occurred when a 0.1 ml aliquot from Ele was mistakenly transferred into EW1d along with EW1d's normal daily 0.1 ml transfer at day 84, resulting in a population that was half EW1d and half Ele. This mixed population was then propagated for several days before the mistake was detected, at which point the decision was made to continue to transfer the mixed population as EW1d. Evidently, the EW-derived subpopulation out-competed the EL-derived subpopulation. The clone chosen from that population at 883 generations contained a deletion in rbs, which demonstrated that it had evolved from the EW clone. Therefore, it was still used in the competition assays among the 883-generation clones as the representative clone from EW1d. The results, described below, indicate that population EW1d had already begun to adapt before this error occurred, so it was informative for the timing of the first shift of the neutral allele.

RESULTS

Relative fitness of EW to EL, including a test for frequency-dependent effects. Previous data indicated that the EW clones were significantly less fit than the EL clones at 500 generations (Chapter 3). These fitness measurements were made by competing the eventual winners and the eventual losers each against the same reference genotype. The significant fitness difference implied that the EL clones were, at that point, enjoying a temporary fitness advantage.
However, it is also possible that competitive interactions are nontransitive, such that fitness relative to the reference clone does not indicate their fitness relative to one another (Paquin and Adams, 1983; Kerr et al., 2002; Kirkup and Riley, 2004). Additionally, it is possible that the competing subpopulations evolved frequency-dependent interactions, which can maintain genetic diversity (Rozen and Lenski, 2000). To measure the fitness of the eventual winners relative to the eventual losers, and to exclude the possibility that they coexisted through frequency-dependent interactions, we performed competitions between the eventual winners and the eventual losers. Each EW clone was competed against each Ara+ revertant of the EL clones, and each EW Ara+ revertant was competed against each EL clone. These competitions were performed across 6 initial frequencies that encompass the observed frequencies of the subpopulations at 500 generations in Ara-1 (EW:EL of 1:24, 1:4, 1:1, 4:1, 24:1 and 124:1) in fitness assays that lasted 7 days (~45 generations). Thus, there were eight data points at each initial frequency, with two fitness estimates for each possible EW-EL pair at each frequency.

The results of these fitness comparisons are presented in Figure 14. A least-squares regression of the combined data indicates that the slope is not significantly different from zero (95% confidence interval from -0.00010 to 0.00004). Thus, no frequency-dependent interaction was detected. The intercept was, however, significantly less than one. The two representative EW clones from 500 generations had, on average, a fitness of 0.938 (95% confidence interval from 0.9344 to 0.9425) relative to the two representative EL clones.

Figure 14. Relative fitness of eventual winners (EW) to eventual losers (EL) sampled from population Ara-1 at 500 generations. Each of the two representative EW clones was competed against the two representative EL clones twice (once with each marker combination) at 6 initial volumetric ratios (EW:EL of 1:24, 1:4, 1:1, 4:1, 24:1, and 124:1) for seven days. The horizontal line is the mean of the combined data; the regression of fitness on initial log ratio was non-significant. [Plot: relative fitness against log ratio (EW:EL), with separate symbols for the pairs EW1:EL1, EW1:EL2, EW2:EL1, and EW2:EL2.]

Subsequent evolution. The evolutionary potential of the EW and EL genotypes was tested with ten populations founded from each representative clone (EW1, EW2, EL1, and EL2) that were allowed to evolve independently for ~883 generations. Each of these populations was initially homogeneous except for a neutral mutation, the frequency of which was tracked through time. Using these 40 populations we compared the evolvability of the EW and EL clones in two ways. First, we identified the time at which beneficial mutations were first observed to displace the founding genotype. Second, we compared the fitness levels achieved by the EW- and EL-derived clones.

Figure 15 shows the change in the relative frequency of the neutral markers in each of the evolving populations over the 883-generation experiment. These data collectively represent counts of more than 700,000 colonies. Shifts in the frequency of the neutral marker away from the 0.5 initial value indicate the spread of beneficial mutations in one or both marker types. For each population, the first time at which the marker significantly deviates from its initial value is indicated in Table 10, using the procedure described in the Materials and Methods. An independent test of the validity of this procedure is also available.
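The detection rule from the Materials and Methods can be sketched as follows. This is a minimal illustration, not the analysis code used for Table 10: the function names and the example colony counts are hypothetical, and a two-sided normal approximation to the binomial stands in for whichever test was actually applied.

```python
import math

def two_sided_p(ara_plus: int, total: int, p0: float) -> float:
    """Normal-approximation p-value for an observed Ara+ fraction against baseline p0."""
    z = (ara_plus / total - p0) / math.sqrt(p0 * (1.0 - p0) / total)
    return math.erfc(abs(z) / math.sqrt(2.0))

def first_shift(generations, ara_plus_counts, totals, n_baseline=5, alpha=0.05):
    """Return the first generation of two consecutive significant samples, or None."""
    # Baseline frequency: mean of the observed frequencies at the first time points.
    freqs = [a / n for a, n in zip(ara_plus_counts, totals)]
    p0 = sum(freqs[:n_baseline]) / n_baseline
    # Flag every sample that differs from the baseline at the chosen level.
    significant = [two_sided_p(a, n, p0) < alpha
                   for a, n in zip(ara_plus_counts, totals)]
    # Accept a shift only when the next sample is significant as well.
    for i in range(len(significant) - 1):
        if significant[i] and significant[i + 1]:
            return generations[i]
    return None

# Hypothetical series: ~400 colonies counted every 20 generations.
gens = list(range(0, 200, 20))
counts = [200, 204, 197, 201, 198, 205, 210, 226, 240, 260]
shift_generation = first_shift(gens, counts, [400] * len(counts))
print(shift_generation)
```

With about 400 colonies per sample and a baseline of 0.5, the standard error of the observed frequency is 0.025, so a deviation of roughly 0.05 is needed to reach p < 0.05. That is consistent with the shift of about 5% stated in the Materials and Methods.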
Spurious results are equally likely to over-estimate the frequency as to under-estimate it, whereas if the allele frequency has truly shifted then it should differ from the initial value in the same direction in the two consecutive samples. In fact, in all 39 populations for which we used this protocol the first two consecutive samples that significantly differed from the initial ratio did so in the same direction. We can strongly reject the null hypothesis that the results were due solely to chance (sign test, p = 1.8 × 10^-11).

A Kruskal-Wallis test (Sokal and Rohlf, 1995) for the effect of founding genotype on the timing of this shift revealed that the differences among clones were significant (adjusted H = 14.68, d.f. = 3, p = 0.0021). Moreover, comparison in this respect of each pair of ancestors showed that EW1 ≈ EW2 < EL1 ≈ EL2, using p < 0.05 to establish dissimilarity. Overall, the neutral allele began to shift 169 generations sooner in the populations derived from eventual winners than in those derived from eventual losers.

Table 10. Estimates of the time of first divergence of the neutral allele from its initial ratio. This time was determined as the first of two consecutive measurements that each differed from the initial frequency at the p < 0.05 level. Entries are generations.

Population   EW1    EW2    EL1    EL2
a            280    180    400    500
b            460    200    480    720
c            220    160    420    180
d            160    240    620    520
e            340    440    380    400
f            240    240    360    360
g            120    260    300    180
h            260    240    420    --
i            340    260    380    380
j            240    260    660    440
mean         266    248    442    408.9

Figure 15. Frequency of a neutral marker in evolving populations. Ten populations were founded from each of four genotypes: EW1, EW2, EL1 and EL2. Initially each population was isogenic except for the neutral allele at a frequency of ~0.5. Shifts in the neutral allele indicate that beneficial mutations have arisen and are spreading in the population. The dashed line indicates the estimate of the average time at which the allele significantly deviated from the initial ratio (see Table 10). [Plot: marker frequency against time (generations), 0 to 800.]

Next we can assess the difference in the fitness increase over 883 generations between the EW- and EL-derived populations. To do this we picked a single clone from each population at 883 generations. These clones were a mix of Ara+ and Ara- marker states. To compare their fitness, each Ara+ clone from an EW population was competed against each Ara- clone from an EL population, and vice versa. This procedure resulted in two independent data sets.

There are two hypotheses of interest concerning the fitness of clones derived from EW populations relative to those from EL populations. The first predicts that the EW clones adapted faster than the EL clones. We know that the EW clones started with a relative fitness of 0.938 compared to the EL clones. Therefore, if the fitness of the EW-derived clones relative to the EL-derived clones is, on average, greater than this initial value, we can conclude that they did adapt faster. A second hypothesis predicts that the populations founded with EW clones increased to a greater final fitness than the populations founded from EL clones. Under this second hypothesis, the average fitness of EW-derived clones relative to EL-derived clones must be greater than 1. Both hypotheses use one-tailed tests.

Figure 16. Relative fitness of the Ara- EW-derived clones relative to the Ara+ EL-derived clones following 883 generations of additional adaptation. Each point is the result of a single fitness assay. The two dashed lines indicate the reference values for the two hypotheses of interest: 0.938, the initial fitness difference that the derived EW clones must surpass to have adapted faster; and 1, which they must surpass to have evolved to a higher fitness than the EL-derived clones. [Plot: relative fitness (EW:EL) for each EW-derived clone, with one line per EL-derived clone.]

Figure 17. Relative fitness of the Ara+ EW-derived clones relative to the Ara- EL-derived clones following 883 generations of additional adaptation. Each point is the result of a single fitness assay. The two dashed lines indicate the reference values for the two hypotheses of interest: 0.938, the initial fitness difference that the derived EW clones must surpass to have adapted faster; and 1, which they must surpass to have evolved to a higher fitness than the EL-derived clones. [Plot: relative fitness (EW:EL) for each EW-derived clone, with one line per EL-derived clone.]

The two fitness data sets are presented in Figure 16 and Figure 17. For the data set of the Ara- EW-derived clones competed against the Ara+ EL-derived clones, the mixed model estimated an intercept of 1.0205, with a standard error of 0.0116. Thus, the mean fitness is greater than both 0.938 (t = 7.11, d.f. = 14.8, p < 0.0001) and 1 (t = 1.76, d.f. = 14.8, p = 0.049). In the second set of comparisons, in which the Ara+ EW-derived clones were competed against the Ara- EL-derived clones, the results were remarkably similar. The intercept was estimated to be 1.0212, with a standard error of 0.0116. Again, the mean fitness is greater than both 0.938 (t = 7.17, d.f. = 14.1, p < 0.0001) and 1 (t = 1.82, d.f. = 14.1, p = 0.045). Taken together, these two independent data sets show that the EW clones have the inherent ability to adapt faster than the representative EL clones.
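The t statistics reported for the two data sets follow from the estimated intercepts and their standard errors as t = (estimate - null value) / SE. The sketch below simply redoes that arithmetic with the rounded values given in the text; the names are hypothetical, and neither the Satterthwaite degrees of freedom nor the p-values are recomputed.

```python
def t_statistic(estimate: float, null_value: float, std_error: float) -> float:
    """t value for testing an estimated intercept against a fixed null value."""
    return (estimate - null_value) / std_error

# (intercept, standard error) for the two data sets, as reported in the text.
intercepts = {
    "EW Ara- vs EL Ara+": (1.0205, 0.0116),
    "EW Ara+ vs EL Ara-": (1.0212, 0.0116),
}

results = {}
for label, (estimate, std_error) in intercepts.items():
    t_vs_initial = t_statistic(estimate, 0.938, std_error)  # adapted faster?
    t_vs_equal = t_statistic(estimate, 1.0, std_error)      # higher final fitness?
    results[label] = (t_vs_initial, t_vs_equal)
    print(f"{label}: t = {t_vs_initial:.2f} against 0.938, "
          f"t = {t_vs_equal:.2f} against 1")
```

The recomputed values agree with the reported t statistics to within the rounding of the inputs.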
Moreover, this difference in rate of adaptation was sufficient to allow the evolved EW clones to overcome their initial fitness disadvantage and become fitter than the EL clones after 883 generations of independent evolution.

The results reported above support the contention that the EW clones were more evolvable than the EL clones. Nevertheless, it could be argued that the statistical analysis is not ideal, as the model was run without a term for the particular ancestral clone (either EW1 or EW2 for the ancestor of the EW-derived clones and either EL1 or EL2 for the EL-derived clones). This approach was used because the a priori comparison of interest concerned the intercept. Including the ancestral genotype would rob that comparison of all but one degree of freedom in the statistical model. A better experimental design would have included more replication at the level of ancestral clone. Strictly speaking, then, the probability values reported above apply only to the representative EW and EL clones and cannot be extended to the general pool of eventual winners and eventual losers present in population Ara-1 at 500 generations. However, a test of the effect of ancestral clone (a likelihood ratio test of full and reduced models) found that ancestral clone was a non-significant effect for both the EL and EW competitors, in both data sets.

We also tested for variation in competitive ability among the independently evolved clones within an ancestral type, either EW or EL, using a likelihood ratio test in which the full model was compared to a model in which one term was removed (Littell et al., 1996; Chapter 3). The effects of derived EL clone and derived EW clone were both highly significant (Table 11). This result indicates that, although there was a general tendency for the EW clones to evolve to a higher fitness than the EL clones, independent populations had significantly heterogeneous outcomes.
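The test statistics reported in this section can be rechecked from the published summary values. The following is a minimal Python sketch, not the SAS mixed-model analysis used in the study. It assumes that each one-tailed t statistic is (intercept - null value) / standard error, and that each likelihood ratio statistic in Table 11 is the difference in -2 log likelihood between the reduced and full models, referred to a chi-square distribution with 1 d.f. The 1 d.f. reference distribution is an assumption, chosen because it reproduces the tabulated p-values. The function names are hypothetical.

```python
import math

def t_statistic(estimate, null_value, std_error):
    """One-sample t statistic for H0: mean == null_value."""
    return (estimate - null_value) / std_error

def lrt_p_value(m2ll_full, m2ll_reduced):
    """Return (LRT statistic, p-value), assuming a chi-square with 1 d.f."""
    stat = m2ll_reduced - m2ll_full
    # Survival function of chi-square(1): P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))

# Ara- EW-derived vs. Ara+ EL-derived clones: intercept 1.0205, SE 0.0116.
t_faster = t_statistic(1.0205, 0.938, 0.0116)  # compare with the reported 7.11
t_fitter = t_statistic(1.0205, 1.0, 0.0116)    # compare with the reported 1.76

# Table 11, EW Ara+ : EL Ara- data set, EL-derived clone term removed.
stat_el, p_el = lrt_p_value(-328.100, -313.000)

print(f"t vs 0.938 = {t_faster:.2f}, t vs 1 = {t_fitter:.2f}")
print(f"LRT = {stat_el:.1f}, p = {p_el:.4f}")
```

Small differences from the reported t values are expected, because the published intercepts and standard errors are themselves rounded. The degrees of freedom (14.8 and 14.1) come from the mixed model and cannot be recovered from the summary values alone.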
DISCUSSION

The genotypes that were the eventual winners in population Ara-1 evolved more quickly, and to a higher final fitness, than the eventual losers, even though they started with a somewhat lower fitness. Therefore, these clones show that the trait of evolvability can vary and, moreover, that increased evolvability arose spontaneously within a single population. The greater evolvability of the eventual winners could be due to a higher beneficial mutation rate, to beneficial mutations of larger effect, or both. In any case, the data support the hypothesis that the lineage that eventually prevailed, from among those present in this population at generation 500, did so, at least in part, because it was inherently more capable of adapting.

Previous studies indicated that population Ara-1 was in the midst of a step up in fitness at 500 generations. This interpretation is consistent with the findings reported here that genotype EW was less fit than the eventual losers. The fact that the EW genotype was decreasing in frequency at 500 generations implies that it must have made up a larger proportion of the population previously. Thus, in population Ara-1 the eventual winners had a head start, but this was counterbalanced by the fact that they were at a selective disadvantage. It is also quite possible that, at 500 generations in Ara-1, the EW genotype had already given rise to one or more additional beneficial mutations, which were still at too low a frequency to be detected.

EW Ara+ : EL Ara-
Model                   -2 log likelihood       LRT          p
Full model                  -328.100
EW-derived clone            -306.400          21.700      <0.0001
EL-derived clone            -313.000          15.100       0.0001

EW Ara- : EL Ara+
Model                   -2 log likelihood       LRT          p
Full model                  -361.100
EW-derived clone            -350.400          10.700       0.0011
EL-derived clone            -310.700          50.400      <0.0001

Table 11. Statistical analysis of the fitness of EW-derived clones relative to the EL-derived clones after 883 generations of additional evolution. The full model contained the intercept, the EW-derived clone, and the EL-derived clone. It was compared to reduced models in which the term for either the EW-derived clone or the EL-derived clone was removed. LRT is the likelihood ratio test (Littell et al., 1996).

Although their greater evolvability may have tipped the odds in favor of the eventual winners, chance probably also played an important role in deciding which lineage eventually prevailed. We see in Figures 16 and 17 that some of the clones derived from eventual winners were more fit than every eventual-loser-derived clone with which they competed (e.g., EW1b and EW1j). On the other hand, several of the eventual-winner-derived clones tended to lose to eventual-loser-derived clones (e.g., EW1f and EW2b). The variation in outcomes is also evident from the statistical analysis, which indicated significant variation in competitive ability due to both the EW- and EL-derived clones (Table 11). In other words, if one wagered on the eventual winners present at 500 generations prevailing in the long run, it would be a good bet, but not a sure bet.

At present, the molecular basis of the greater evolvability of the eventual winners is unknown. The eventual winner contained two known beneficial mutations. One is a deletion in rbs (Cooper et al., 2001). It does not seem likely that this deletion is the source of the differential evolvability, because clone EL2 also has an rbs deletion. The second known beneficial mutation in the eventual winners is in topA (Crozat et al., 2004). The product of topA is topoisomerase I, which controls DNA supercoiling. Studies of the effect of DNA supercoiling have shown that it may affect the expression of many other genes (Jovanovich and Lebowitz, 1987; Pruss and Drlica, 1989; Steck et al., 1993; Gmuender et al., 2001).
A possibility that deserves further study is that the topA mutation conferred an increase in evolvability through epistatic interactions with mutations at other loci.

Conversely, the difference in evolvability could be due to one or more mutations in the eventual losers that limit their potential for evolution. In one sense, this is certainly true. Because EL1 and EL2 were more fit, they had already "used up" more of the adaptive possibilities. In this trivial sense, every beneficial mutation that becomes fixed lowers the evolvability to some extent, because further mutations at that locus, which had been beneficial before the fixation event, are neutral or deleterious afterwards. The fact that populations founded with EW clones were able not only to catch up in fitness, but to surpass the EL-derived populations, suggests that this is not the only explanation for the difference in their capacity to adapt.

It is also possible that the genetic difference between the EL clones and the EW clones that causes the difference in evolvability is a neutral or slightly deleterious mutation that was fixed by drift or hitchhiking. However, this explanation is unlikely for the following reason. It has been estimated that the ancestral genotype used to found population Ara-1 had a mutation rate of 1.44 × 10⁻¹⁰ per base pair per generation (Lenski, Winkworth and Riley, 2003). Given the genome size of 4.64 × 10⁶ base pairs (Blattner et al., 1997), in a clone picked at 500 generations we would expect, on average, only 0.33 mutations. Furthermore, any single mutation would need to be in either both EW clones or both EL clones. The most recent common ancestor of either pair necessarily existed some time before generation 500, thus further shortening the time available for this non-selected mutation to arise. Finally, this low number is for all mutations. It is unknown how many mutations affect evolvability, but this is presumably only a tiny fraction of all mutations.
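The expectation of 0.33 mutations follows directly from the mutation rate, genome size, and number of generations quoted above. A minimal Python sketch of the arithmetic is given here. The Poisson step, which gives the chance that a clone carries at least one mutation, is an added assumption (mutations arising independently at a constant rate) and is not part of the original analysis. The variable names are illustrative.

```python
import math

mutation_rate = 1.44e-10  # per base pair per generation (Lenski, Winkworth and Riley, 2003)
genome_size = 4.64e6      # base pairs (Blattner et al., 1997)
generations = 500

# Expected number of mutations accumulated along one lineage.
expected_mutations = mutation_rate * genome_size * generations

# Assumption: mutation counts are Poisson distributed with this mean.
p_at_least_one = 1.0 - math.exp(-expected_mutations)

print(f"expected mutations per clone: {expected_mutations:.2f}")
print(f"P(at least one mutation): {p_at_least_one:.2f}")
```

Under these assumptions, most clones sampled at 500 generations would carry no mutation of any kind, which is the basis for regarding a non-selected evolvability mutation as unlikely.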
It seems likely, therefore, that the difference in evolvability between the EW and EL clones was caused by a difference in the mutations that were adaptive at the organismal level. Thus, the consequent effect on evolvability initially spread as a pleiotropic "side effect." The clonal interference population dynamic, however, limits the ability of a single mutation to fix on the basis of its organismal-level selective advantage alone (Chapter 2). This limitation occurs because competing beneficial mutations that confer comparable fitness effects are likely to arise in other lineages. The eventual winner among these genotypes will be the one that generates more, or more fit, additional mutations over time. If there is no difference among the competing clonal lineages in their inherent ability to do this, then the identity of the eventual winner will be due to chance. But if these genotypes differ in their propensity to generate additional beneficial mutations, then those with greater propensity will significantly increase their likelihood of fixing. The results presented here are consistent with the latter scenario.

REFERENCES

Arnold, B. C., N. Balakrishnan and H. N. Nagaraja. 1992. A First Course in Order Statistics. John Wiley & Sons, Inc., New York.
Atwood, K. C., L. K. Schneider and F. J. Ryan. 1951a. Periodic selection in Escherichia coli. Proceedings of the National Academy of Sciences of the United States of America 37:146-155.
Atwood, K. C., L. K. Schneider and F. J. Ryan. 1951b. Selective mechanisms in bacteria. Cold Spring Harbor Symposia on Quantitative Biology 16:345-355.
Bachtrog, D. and I. Gordo. 2004. Adaptive evolution of asexual populations under Muller's ratchet. Evolution 58:1403-1413.
Begg, K. J. and Donachie, W. D. 1998. Division planes alternate in spherical cells of Escherichia coli. Journal of Bacteriology 180:2564-2567.
Bjedov, I., O. Tenaillon, B. Gérard, V. Souza, E. Denamur, M. Radman, F. Taddei and I. Matic. 2003.
Stress-induced mutagenesis in bacteria. Science 300:1404-1409.
Bull, J. J., M. R. Badgett, H. A. Wichman, J. P. Huelsenbeck, D. M. Hillis, A. Gulati, C. Ho and I. J. Molineux. 1997. Exceptional convergent evolution in a virus. Genetics 147:1497-1507.
Burch, C. L. and Chao, L. 1999. Evolution by small steps and rugged landscapes in the RNA virus φ6. Genetics 151:921-927.
Blattner, F. R., G. Plunkett, C. A. Bloch, N. T. Perna, V. Burland, M. Riley, J. Collado-Vides, J. D. Glasner, C. K. Rode, G. F. Mayhew, J. Gregor, N. W. Davis, H. A. Kirkpatrick, M. A. Goeden, D. J. Rose, B. Mau and Y. Shao. 1997. The complete genome sequence of Escherichia coli K-12. Science 277:1453-1474.
Campos, P. R. A. and V. M. de Oliveira. 2004. Mutational effects on the clonal interference phenomenon. Evolution 58:932-937.
Cashel, M., D. R. Gentry, V. J. Hernandez and D. Vinella. 1996. The stringent response. Pp. 1458-1496 in Escherichia coli and Salmonella: Cellular and Molecular Biology. F. C. Neidhardt ed. ASM Press, Washington, DC.
Chao, L. and E. C. Cox. 1983. Competition between high and low mutating strains of Escherichia coli. Evolution 37:125-134.
Christiansen, F. B., S. P. Otto, A. Bergman and M. W. Feldman. 1998. Waiting with and without recombination: The time to production of a double mutant. Theoretical Population Biology 53:199-215.
Colegrave, N. 2002. Sex releases the speed limit on evolution. Nature 420:664-666.
Conway Morris, S. 2003. Life's Solution. Cambridge University Press, Cambridge.
Cooper, T. F., Rozen, D. E. and Lenski, R. E. 2003. Parallel changes in gene expression after 20,000 generations of evolution in E. coli. Proceedings of the National Academy of Sciences of the United States of America 100:1072-1077.
Cooper, V. S. and Lenski, R. E. 2000. The population genetics of ecological specialization in evolving Escherichia coli populations. Nature 407:736-739.
Cooper, V. S., Schneider, D., Blot, M. and Lenski, R. E. 2001.
Mechanisms causing rapid and parallel losses of ribose catabolism in evolving populations of Escherichia coli B. Journal of Bacteriology 183:2834-2841.
Crandall, K. A., Kelsey, C. R., Imamichi, H., Lane, H. C. and Salzman, N. P. 1999. Parallel evolution of drug resistance in HIV: failure of nonsynonymous/synonymous substitution rate ratio to detect selection. Molecular Biology and Evolution 16:372-382.
Crow, J. and M. Kimura. 1965. Evolution in sexual and asexual populations. American Naturalist 99:439-450.
Crozat, E., N. Philippe, R. E. Lenski, J. Geiselmann and D. Schneider. 2004. Long-term experimental evolution in Escherichia coli. XII. DNA topology as a key target of selection. Genetics 169:523-53.
Dawkins, R. 1976. The Selfish Gene. Oxford University Press, Oxford.
Dawkins, R. 1989. The evolution of evolvability. Artificial life: the proceedings of an interdisciplinary workshop on the synthesis and simulation of living systems. C. G. Langton ed. Addison-Wesley, Reading, MA. Volume 6 pp. 201-220.
de Oliveira, V. M. and P. R. A. Campos. 2004. Dynamics of fixation of advantageous mutations. Physica A-Statistical Mechanics and Its Applications 337:546-554.
de Visser, J., C. W. Zeyl, P. J. Gerrish, J. L. Blanchard and R. E. Lenski. 1999. Diminishing returns from mutation supply rate in asexual populations. Science 283:404-406.
de Visser, J. A. G. M. and D. E. Rozen. 2005. Limits to adaptation in asexual populations. Journal of Evolutionary Biology 18:779-788.
Elena, S. F., V. S. Cooper and R. E. Lenski. 1996. Punctuated evolution caused by selection of rare beneficial mutations. Science 272:1802-1804.
Elena, S. F. and Lenski, R. E. 2003. Evolution experiments with microorganisms: the dynamics and genetic bases of adaptation. Nature Reviews Genetics 4:457-469.
Felsenstein, J. 1974. The evolutionary advantage of recombination. Genetics 78:737-756.
Ferea, T. L., Botstein, D., Brown, P. O. and Rosenzweig, R. F. 1999.
Systematic changes in gene expression patterns following adaptive evolution in yeast. Proceedings of the National Academy of Sciences of the United States of America 96:9721-9726.
Fisher, R. 1922. On the dominance ratio. Proceedings of the Royal Society of Edinburgh 42:321-431.
Fisher, R. 1930. The Genetical Theory of Natural Selection. Clarendon, Oxford.
Gerrish, P. J. 2001. The rhythm of microbial adaptation. Nature 413:299-302.
Gerrish, P. J. and R. E. Lenski. 1998. The fate of competing beneficial mutations in an asexual population. Genetica 103:127-144.
Gillespie, J. H. 1984. Molecular evolution over the mutational landscape. Evolution 38:1116-1129.
Gmuender, H., K. Kuratli, K. Di Padova, C. P. Gray, W. Keck and S. Evers. 2001. Gene expression changes triggered by exposure of Haemophilus influenzae to novobiocin or ciprofloxacin: Combined transcription and translation analysis. Genome Research 11:28-42.
Goddard, M. R., H. C. J. Godfray and A. Burt. 2005. Sex increases the efficacy of natural selection in experimental yeast populations. Nature 434:636-640.
Gould, S. J. 2002. The Structure of Evolutionary Theory. Harvard University Press, Cambridge, MA.
Grimberg, B. and C. Zeyl. 2005. The effects of sex and mutation rate on adaptation in test tubes and to mouse hosts by Saccharomyces cerevisiae. Evolution 59:431-438.
Gumbel, E. J. 1958. Statistics of Extremes. Columbia University Press, New York.
Halder, G., P. Callaerts and W. J. Gehring. 1995. Induction of ectopic eyes by targeted expression of the eyeless gene in Drosophila. Science 267:1788-1792.
Haldane, J. B. S. 1924. A mathematical theory of natural and artificial selection. Part I. Transactions of the Cambridge Philosophical Society 23:19-24.
Haldane, J. B. S. 1927. A mathematical theory of natural and artificial selection. Part V: Selection and Mutation. Proceedings of the Cambridge Philosophical Society 23:838-844.
Hermisson, J. and P. S. Pennings. 2005.
Soft sweeps: molecular population genetics of adaptation from standing genetic variation. Genetics 169:2335-2352.
Imhof, M. and C. Schlotterer. 2001. Fitness effects of advantageous mutations in evolving Escherichia coli populations. Proceedings of the National Academy of Sciences of the United States of America 98:1113-1117.
Jablonski, D. 1987. Heritability at the species level - analysis of geographic ranges of Cretaceous mollusks. Science 238:360-363.
Jovanovich, S. B. and J. Lebowitz. 1987. Estimation of the effect of coumermycin A1 on Salmonella typhimurium promoters by using random operon fusions. Journal of Bacteriology 169:4431-4435.
Johnson, T. and N. H. Barton. 2002. The effect of deleterious alleles on adaptation in asexual populations. Genetics 162:395-411.
Johnson, T. and P. J. Gerrish. 2002. The fixation probability of a beneficial allele in a population dividing by binary fission. Genetica 115:283-287.
Kazazian, H. H. 2000. Retrotransposons shape the mammalian genome. Science 289:1152-1153.
Kerr, B., M. Riley, M. Feldman and B. J. M. Bohannan. 2002. Local dispersal promotes biodiversity in a real life game of rock-paper-scissors. Nature 418:171-17.
Kimura, M. 1983. The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge.
Kimura, M. and T. Ohta. 1969. The average number of generations until fixation of a mutant gene in a finite population. Genetics 61:763-771.
Kirkup, B. C. and M. A. Riley. 2004. Antibiotic-mediated antagonism leads to a bacterial game of rock-paper-scissors in vivo. Nature 428:412-414.
Kirschner, M. and J. Gerhart. 1998. Evolvability. Proceedings of the National Academy of Sciences of the United States of America 95:8420-8427.
Lenski, R. E., Rose, M. R., Simpson, S. C. and Tadler, S. C. 1991. Long-term experimental evolution in Escherichia coli. I. Adaptation and divergence during 2,000 generations. American Naturalist 138:1315-1341.
Lenski, R. E. and Travisano, M. 1994.
Dynamics of adaptation and diversification: a 10,000-generation experiment with bacterial populations. Proceedings of the National Academy of Sciences of the United States of America 91:6808-6814.
Lenski, R. E., Winkworth, C. L. and Riley, M. A. 2003. Rates of DNA sequence evolution in experimental populations of Escherichia coli during 20,000 generations. Journal of Molecular Evolution 56:498-508.
Lenski, R. E. 2004. Phenotypic and genomic evolution during a 20,000-generation experiment with the bacterium Escherichia coli. Plant Breeding Reviews 24:225-265.
Lenski, R. E., C. Ofria, R. T. Pennock and C. Adami. 2003b. The evolutionary origin of complex features. Nature 423:139-144.
Littell, R. C., G. A. Milliken, W. W. Stroup and R. D. Wolfinger. 1996. SAS System for Mixed Models. SAS Institute Inc., Cary, NC.
Losos, J. B., Jackman, T. R., Larson, A., de Queiroz, K. and Rodriguez-Schettino, L. 1998. Contingency and determinism in replicated adaptive radiations of island lizards. Science 279:2115-2118.
Mahillon, J. and Chandler, M. 1998. Insertion sequences. Microbiology and Molecular Biology Reviews 62:725-774.
Mao, E. F., L. Lane, J. Lee and J. H. Miller. 1997. Proliferation of mutators in a cell population. Journal of Bacteriology 179:417-422.
Matic, I., M. Radman, F. Taddei, B. Picard, C. Doit, E. Bingen, E. Denamur and J. Elion. 1997. Highly variable mutation rates in commensal and pathogenic Escherichia coli. Science 277:1833-1834.
Miralles, R., P. J. Gerrish, A. E. S. Moya and S. F. Elena. 1999. Clonal interference and the evolution of RNA viruses. Science 285:1745-1747.
Moxon, E. R., P. B. Rainey, M. A. Nowak and R. E. Lenski. 1994. Adaptive evolution of highly mutable loci in pathogenic bacteria. Current Biology 4:24-33.
Muller, H. J. 1932. Some genetic aspects of sex. American Naturalist 66:118-138.
Muller, H. J. 1964. The relation of recombination to mutational advance. Mutation Research 1:2-9.
Notley-McRobb, L. and T. Ferenci. 2000.
Experimental analysis of molecular events during mutational periodic selections in bacterial evolution. Genetics 156:1493-1501.
Notley-McRobb, L., S. Seeto and T. Ferenci. 2002. Enrichment and elimination of mutY mutators in Escherichia coli populations. Genetics 162:1055-1062.
Oliver, A., R. Canton, P. Campo, F. Baquero and J. Blazquez. 2000. High frequency of hypermutable Pseudomonas aeruginosa in cystic fibrosis lung infection. Science 288:1251-1253.
Orr, H. A. 1998. The population genetics of adaptation: The distribution of factors fixed during adaptive evolution. Evolution 52:935-949.
Orr, H. A. 1999. The evolutionary genetics of adaptation: a simulation study. Genetical Research 74:207-214.
Orr, H. A. 2000. The rate of adaptation in asexuals. Genetics 155:961-968.
Orr, H. A. 2002. The population genetics of adaptation: The adaptation of DNA sequences. Evolution 56:1317-1330.
Orr, H. A. 2003. The distribution of fitness effects among beneficial mutations. Genetics 163:1519-1526.
Papadopoulos, D., D. Schneider, J. Meier-Eiss, W. Arber, R. E. Lenski and M. Blot. 1999. Genomic evolution during a 10,000-generation experiment with bacteria. Proceedings of the National Academy of Sciences of the United States of America 96:3807-3812.
Paquin, C. E. and J. Adams. 1983. Relative fitness can decrease in evolving asexual populations of S. cerevisiae. Nature 306:368-371.
Peck, J. R. 1994. A ruby in the rubbish: beneficial mutations, deleterious mutations and the evolution of sex. Genetics 137:597-606.
Penfound, T. and Foster, J. W. 1999. NAD-dependent DNA-binding activity of the bifunctional NadR regulator of Salmonella typhimurium. Journal of Bacteriology 181:648-655.
Plotkin, J. B. and J. Dushoff. 2003. Codon bias and frequency-dependent selection on the hemagglutinin epitopes of Influenza A virus. Proceedings of the National Academy of Sciences 100:7152-7157.
Pruss, G. J. and K. Drlica. 1989. DNA supercoiling and prokaryotic transcription. Cell 56:521-523.
Queitsch, C., T.
A. Sangster and S. Lindquist. 2002. Hsp90 as a capacitor of phenotypic variation. Nature 417:618-624.
Rainey, P. B. and Travisano, M. 1998. Adaptive radiation in a heterogeneous environment. Nature 394:69-72.
Rozen, D. E., J. A. G. M. de Visser and P. J. Gerrish. 2002. Fitness effects of fixed beneficial mutations in microbial populations. Current Biology 12:1040-1045.
Reid, S. D., Herbelin, C. J., Bumbaugh, A. C., Selander, R. K. and Whittam, T. S. 2000. Parallel evolution of virulence in pathogenic Escherichia coli. Nature 406:64-67.
Riehle, M. M., Bennett, A. F. and Long, A. D. 2001. Genetic architecture of thermal adaptation in Escherichia coli. Proceedings of the National Academy of Sciences of the United States of America 98:525-530.
Rundle, H. D., Nagel, L., Boughman, J. W. and Schluter, D. 2000. Natural selection and parallel speciation in sympatric sticklebacks. Science 287:306-308.
Rutherford, S. L. 2003. Between genotype and phenotype: protein chaperones and evolvability. Nature Reviews Genetics 4:263-74.
Rutherford, S. L. and S. Lindquist. 1998. Hsp90 as a capacitor for morphological evolution. Nature 396:336-342.
SAS Institute Inc. 1999. SAS/STAT User's Guide, Version 8. SAS Institute, Cary, NC.
Schneider, D., E. Duperchy, J. Depeyrot, E. Coursange, R. E. Lenski and M. Blot. 2002. Genomic comparisons among Escherichia coli strains B, K-12, and O157:H7 using IS elements as molecular markers. BMC Microbiology 2:18 doi:10.1186/1471-2180-2-18.
Schneider, D., Duperchy, E., Coursange, E., Lenski, R. E. and Blot, M. 2000. Long-term experimental evolution in Escherichia coli. IX. Characterization of insertion sequence-mediated mutations and rearrangements. Genetics 156:477-488.
Shaver, A. C., P. G. Dombrowski, J. Y. Sweeney, T. Treis, R. M. Zappala and P. D. Sniegowski. 2002. Fitness evolution and the rise of mutator alleles in experimental Escherichia coli populations. Genetics 162:557-566.
Sniegowski, P. D., P. J. Gerrish, T. Johnson and A. Shaver. 2000.
The evolution of mutation rates: separating causes from consequences. Bioessays 22:1057-1066.
Sniegowski, P. D., P. J. Gerrish and R. E. Lenski. 1997. Evolution of high mutation rates in experimental populations of Escherichia coli. Nature 387:703-705.
Sokal, R. R. and F. J. Rohlf. 1995. Biometry. W. H. Freeman and Company, New York.
Steck, T. R., R. J. Franco, J. Y. Wang and K. Drlica. 1993. Topoisomerase mutations affect the relative abundance of many Escherichia coli proteins. Molecular Microbiology 10:473-481.
Stewart, C. B., Schilling, J. W. and Wilson, A. C. 1987. Adaptive evolution in the stomach lysozymes of foregut fermenters. Nature 330:401-404.
Tenaillon, O., F. Taddei, M. Radman and I. Matic. 2001. Second-order selection in bacterial evolution: selection acting on mutation and recombination rates in the course of adaptation. Research in Microbiology 152:11-16.
Tenaillon, O., B. Toupance, H. Le Nagard, F. Taddei and B. Godelle. 1999. Mutators, population size, adaptive landscape and the adaptation of asexual populations of bacteria. Genetics 152:485-493.
True, H. L. and S. L. Lindquist. 2000. A yeast prion provides a mechanism for genetic variation and phenotypic diversity. Nature 407:477-83.
Van Nimwegen, E. and J. P. Crutchfield. 2000. Metastable evolutionary dynamics: Crossing fitness barriers or escaping via neutral paths? Bulletin of Mathematical Biology 62:799-848.
Vasi, F., Travisano, M. and Lenski, R. E. 1994. Long-term experimental evolution in Escherichia coli. II. Changes in life-history traits during adaptation to a seasonal environment. American Naturalist 144:432-456.
Pedersen, K. and Gerdes, K. 1999. Multiple hok genes on the chromosome of Escherichia coli. Molecular Microbiology 32:1090-1102.
Wichman, H. A., Badgett, M. R., Scott, L. A., Boulianne, C. M. and Bull, J. J. 1999. Different trajectories of parallel evolution during viral adaptation. Science 285:422-424.
Wilke, C. O. 2004. The speed of adaptation in large asexual populations.
Genetics 167:2045-2053.
Wilks, S. S. 1938. The large-sample distribution of the likelihood ratio for testing composite hypotheses. Annals of Mathematical Statistics 9:60-62.
Williams, G. C. 1966. Adaptation and Natural Selection. Princeton University Press.

Appendix 1. MATLAB script for simulation and graphing of Muller-style plot.

clear F N popnow poptime Graph tsz l = H; counter i j k I end N=1*10"7; l=l+1; T = 600; end U = 1*100-6; S = .025; s = expmd(S); $2 = N(1,l); figure test = sum(N(l,1:(l))); hold on j = double(k-sum(N(l,1 :(l-l)))); axis ([1 T 0 N]) gens = size(N); set(gca, 'ytick', [D N = [N(1,1:(1-1)),(j-1),1,(sz- set(gca,'fontsize', l6); j),N( 1 ,(1+ 1 ):gens( l ,2))]; N = [N]; F = tot = sum(N); [F (l,1:(l)),F(l)*(l+s),F(1,(l):gens(l,2))]; tot2 = sum(N); k = k + int32(expmd(1/U)); F =11]; l= 1; popnow = N; tot2 = sum(N); piptime = popnow; %mdstate = rand('state'); %mdnstate = randn('state'); rand('state', rndstate) end Graph = cumsum(N); Graph = [Graph' Graph']; Graph = [0 0;Graph]; sizeG = size(Graph); randn('state', mdnstate) C = [F' F';0 0]; x = l:1:sizeG(l,l); for t = 1:T; x = x./x; time(l,t) = t; x = [x'*t x'*(t+l)]; t; pcolor(x,Graph,C) avfit = sum(N.*F)/tot; shading flat lambda = N.*F./avfit; end fn( l ,t) = avfit; N = poissmd(lambda); colorbar sizeN = size(N); set(colorbar,'box','off‘); gens = sizeN(l,2); set(colorbar,'FontName', 'Arial’); ' = 1; colormap(gray) l = 1; a = get(gca,'Clim'); counter=0; a(l,l)= l; k = int32(expmd(1/U)) + 1; set(gca,'Clim',a); while k<=tot2; while sum(N(:,l:l))