Michigan State University

This is to certify that the dissertation entitled DYNAMICS OF MUTATION AND SELECTION IN ASEXUAL POPULATIONS presented by Philip J. Gerrish has been accepted towards fulfillment of the requirements for the Ph.D. degree in Zoology.

DYNAMICS OF MUTATION AND SELECTION IN ASEXUAL POPULATIONS

By

Philip J. Gerrish

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Zoology

1998

ABSTRACT

DYNAMICS OF MUTATION AND SELECTION IN ASEXUAL POPULATIONS

By

Philip J. Gerrish

The dynamics of asexual populations are characterized by strong associations between alleles at different loci due to complete linkage. One consequence of complete linkage is the phenomenon known as hitchhiking, whereby a specified mutant allele is driven to high frequency or even fixation by a beneficial mutation to which it is linked. If the specified mutant allele confers a change in the mutation rate of the organism, then the fixation of that allele affects the mean mutation rate of the population. Observations of high mutation rates in evolving E. coli populations are consistent with the hypothesis that mutator alleles were fixed in these populations by hitchhiking.
Theoretical exploration of the hitchhiking mechanism reveals its general significance for asexual evolution. I conclude that (i) mutation rates found in asexual populations are more likely to be determined by sporadic hitchhiking than by evolutionary "fine-tuning" as previous theoretical models would suggest, and (ii) there exists a most probable increase in mutation rate due to hitchhiking that is both positive and finite, and its value is typically significant.

Another consequence of complete linkage is the phenomenon called “clonal interference”, whereby the progression of one beneficial mutation to fixation is hindered or even prevented by competition with alternative beneficial mutations. From theoretical exploration of clonal interference, I derive several fundamental population-genetic parameters, as well as the probability that a beneficial mutation will transiently achieve polymorphic frequency. After treating the case in which an unlimited number of beneficial mutations are available, I treat the case in which that number is limited. From these developments, I solve the inverse problem of estimating the following parameters from fitness data of evolving E. coli populations: (i) the beneficial mutation rate, (ii) the distribution of mutational effects, and (iii) the number of available beneficial mutations for the case in which this number is limited.
Salient conclusions from both theoretical treatments are (i) adaptive evolution of asexual populations is characteristically punctuated with short bursts of rapid change followed by long periods of stasis regardless of population size or mutation rate, (ii) in identical environments, the trajectory of adaptive evolution of large, parallel populations is highly repeatable due to clonal interference, (iii) the rate of fitness improvement is an increasing function of both mutation rate and population size, but the function is decelerating so that the rate of adaptive evolution is constrained by a “speed limit”, and (iv) with significant probability, clonal interference may transiently maintain fitness variants at polymorphic frequencies.

ACKNOWLEDGMENTS

First and foremost, I thank my committee members: I thank my advisor, Dr. Richard Lenski, for admirable instruction, guidance and support as well as his vital role in the conception of ideas presented here; I thank Dr. Paul Sniegowski for his crystal clear explanations of population genetics as well as his vital role in the conception of ideas presented here; I thank Dr. V. Mandrekar for superior instruction in stochastic processes; and I thank Dr. Judy Mongold and Dr. Don Hall for helpful comments and conversations. I especially thank Danny Rozen for important conceptual contributions, and Drs. Alex White and Alejandra Sorto for help with mathematical and statistical analyses. I am grateful to Dr. Francois Taddei for sharing a manuscript prior to publication and to Dr. Cliff Zeyl for permission to cite unpublished data. I thank Dr. Brendan Bohannan, Lynette Ekunwe and Phyllis Frank for technical assistance, and Drs. Tom Cebula, Fred Adler, Frank Stewart, Warren Ewens, Andrew Leigh Brown, Santiago Elena, Mike Travisano, John Gerrish, Larry Segerlind, Arjan DeVisser, and Paco Moore for helpful discussions and comments. Thanks to Drs. H. Maki and J.E. LeClerc for plasmids.
Thanks to my present supervisor, Dr. Marcia Kalish, for patience and support. Thanks to Lucy Gerrish for moral support and good humor, Irene Gerrish for playing dinosaur, and Dr. Theophilus Okosun for plenty of beer.

TABLE OF CONTENTS

LIST OF FIGURES
LIST OF MATHEMATICAL SYMBOLS
  Chapter 2
  Chapter 3
  Chapter 4
INTRODUCTION
  Fundamental issues and roadmap to the dissertation
    Chapter 1
    Chapter 2
    Chapter 3
    Chapter 4
Chapter 1: EVOLUTION OF HIGH MUTATION RATES IN EXPERIMENTAL POPULATIONS OF ESCHERICHIA COLI
  Abstract
  Introduction and Findings
  Methods
    Experimental System
    Mutation rate measurements
    Time of Origin and Persistence of
    Complementation Tests
    Analysis of Fluctuation Test Data
  Discussion
Chapter 2: THE FATE OF COMPETING BENEFICIAL MUTATIONS IN AN ASEXUAL POPULATION
  Abstract
  Introduction
  Clonal interference and fixation
    Clonal interference among beneficial mutations
      Some definitions
      The expected number of interfering mutations
    Fixation probability of a beneficial mutation
    Expected rate of substitution
    Expected selection coefficient of successful mutations
    Effect of clonal interference on rate of fitness increase
    Estimation of parameters: an empirical example
  Transiently common mutations
    Clonal interference - a general model
    Probability of transiently polymorphic beneficial mutations
    Probability of transiently common mutations: the leapfrog
  Discussion
    Summary of results
    Assumptions of the models
    Model validation by simulation
    Inclusion of the double mutant
    Implications for the evolution of reproductive strategies
    Implications for the general nature of adaptive evolution
    Evidence for transiently common beneficial mutations in microbial populations
    A suggestion for further research
Chapter 3: THE ORDER OF FIXATION OF BENEFICIAL MUTATIONS IN ASEXUAL POPULATIONS
  Abstract
  Introduction
  Probabilities of fixation orderings
    Order of fixation of two beneficial mutations
    Probability that a superior mutation is the first of many
    Probability that fixation ordering is from largest to smallest s
    General solution for any ordering of fixations
  Fitness trajectories I: parallel populations share a common set of available beneficial mutations
    Expectation and variance
    Reducing the number of parameters
  Fitness trajectories II: parallel populations share a common distribution of beneficial mutational effects
    The non-ordered region
    The ordered region
  Discussion
    Assumptions of the models
    Estimating parameters
    Ordered substitutions in nature, with an application to HIV
Chapter 4: HITCHHIKING OF DELETERIOUS MUTATIONS WITH SPECIAL ATTENTION TO MUTATOR ALLELES
  Abstract
  Introduction
  Theoretical developments
    The hitchhiking rate
    An effective mutation-selection balance
    Probability of fixation of double-mutations, Pr{fix}
    Converting per-capita rate to populational probability
  An application
    Hitchhiking rate of mutator mutations
    Estimating rate of mutation to mutator for evolving E. coli populations
  Discussion
    Assumptions
    Comparison with simulation
    Effect of population size on hitchhiking rate
    Effect of selective disadvantage on hitchhiking rate
    Effect of mutator strength on hitchhiking rate
APPENDIX 1.1: FLUCTUATION TEST ANALYSIS PROGRAM
  Theory
  Obtaining and using the program
APPENDIX 2.1: PROBABILITY OF SURVIVING DRIFT
APPENDIX 2.2: n-GENOTYPE LOGISTIC SYSTEM WITH MUTATION
  General solution
  Application of boundary conditions due to mutation
  Notation for the 3-genotype case
APPENDIX 2.3: EXPECTED NUMBER OF CANDIDATE REPLICATIONS
APPENDIX 2.4: FUNCTIONS EMPLOYING THE RECTANGULAR DISTRIBUTION
APPENDIX 3.1: ADAPTIVE SUBSTITUTIONS IN ASEXUAL POPULATIONS ARE ACCURATELY MODELED AS INSTANTANEOUS REPLACEMENTS
APPENDIX 3.2: ANALYTICAL INTEGRATION OF EQUATIONS
APPENDIX 4.1: EFFECTIVE MUTATION-SELECTION BALANCE
APPENDIX 4.2: EFFECTIVE POPULATION NUMBER UNDER A SERIAL TRANSFER REGIME
LIST OF REFERENCES

LIST OF FIGURES

Figure 1.1. Rates of mutation in 12 Ara- and Ara+ experimental populations (A-1 to A-6 and A+1 to A+6) at 10,000 generations and in their common ancestors REL606 (Ara-) and REL607 (Ara+).

Figure 1.2. Time of appearance and evolutionary persistence of mutator phenotypes in experimental populations Ara-2, Ara-4, and Ara+3.

Figure 1.3. Mutation rates to nalidixic acid resistance in 10,000-generation isolates from populations Ara-2, Ara-4 and Ara+3 and ancestor REL606 transformed with plasmids bearing wild-type alleles of seven known general mutator loci.

Figure 2.1. The probability that a given beneficial mutation with selection coefficient, s, achieves fixation.

Figure 2.2. The probability of fixation of an arbitrarily chosen beneficial mutation is a decreasing function of both beneficial mutation rate, μ, and population size, N.

Figure 2.3. The substitution rate of a population is an increasing function of its beneficial mutation rate.

Figure 2.4. The expected selection coefficient of substitutions is an increasing function of population size, N.

Figure 2.5. The rate of fitness improvement is an increasing function of both population size, N, and beneficial mutation rate, μ.

Figure 2.6. The probability that an arbitrarily chosen beneficial mutation transiently achieves polymorphic frequency is plotted against log population size for various beneficial mutation rates.

Figure 2.7. The leapfrog phenomenon illustrated phylogenetically.

Figure 2.8. The leapfrog phenomenon illustrated dynamically.

Figure 2.9. A simulation of competition among numerous beneficial mutations.

Figure 2.10. The probability of fixation of an arbitrarily chosen beneficial mutation is plotted against the beneficial mutation rate, μ, for various population sizes.

Figure 3.1. Probability that a superior mutation (s_S = 0.1) is fixed before an inferior mutation (s_I = 0.08) as a function of both population number, N (for which μ_S = μ_I = 3 x 10^-4), and overall beneficial mutation rate, μ (for which N = 3.3 x 10^7).

Figure 3.2. Probability of fixation ordering from largest to smallest selective advantage as a function of both population number, N (for which μ = 6 x 10^-4), and overall beneficial mutation rate, μ (for which N = 3.3 x 10^7).

Figure 3.3. Evolutionary trajectories as a function of population size.

Figure 3.3 (cont.)

Figure 3.3 (cont.)

Figure 3.4. Least-squares fit of expected fitness trajectory to fitness data from evolving E. coli population, the Ara-1 line from Lenski & Travisano (1994).

Figure 3.5. Estimated probability (solid line) and maximum probability (dotted line) that the inferior AZT-resistance mutation, K70R, is fixed before the superior mutation, T215Y/F, as a function of effective population number.

Figure 4.1. The dynamics of mutation-selection balance.

Figure 4.2. Panel A shows the per-capita hitchhiking rate of a specified deleterious mutation as a function of population number, N. Panel B shows the corresponding probability that a population fixes the deleterious mutation by hitchhiking within a time interval of 10,000 generations.

Figure 4.3. Panel A shows the per-capita hitchhiking rate of a specified deleterious mutation as a function of its selective disadvantage, s_D. Panel B shows the corresponding probability that a population fixes the deleterious mutation by hitchhiking within a time interval of 10,000 generations.

Figure 4.4. Panel A shows the per-capita hitchhiking rate of a mutator allele as a function of its strength, m. Panel B shows the corresponding probability that a population fixes the mutator allele by hitchhiking within a time interval of 10,000 generations.

Figure 4.5. The per-capita recruitment rate of double-mutations (beneficial mutations on mutator background) as a function of mutator strength, m.

Figure 4.6. Probability of fixation of a double-mutation (beneficial mutation on mutator background) as a function of mutator strength, m.

Figure 4.7. Frequency of a deleterious mutation during a substitutional event on both wildtype and beneficial backgrounds.
LIST OF MATHEMATICAL SYMBOLS

Chapter 2:

x(t) - number of wildtype individuals at time t
y(t) - number of individuals carrying the beneficial mutation in question at time t
z(t) - number of individuals carrying an alternative beneficial mutation at time t
μ - beneficial mutation rate
N - total population number
s - selection coefficient
[illegible] - time to fixation
[illegible] - exponential parameter for distribution of beneficial mutational effects
π(s) - probability that a beneficial mutation of selective advantage s survives drift (referred to as the “survival function”)
[illegible] - expected number of “interfering mutations” as defined in the text
Pr{fix} - probability of fixation
[illegible] - rate of adaptive substitution
<.> - expected value
f - frequency
[illegible] - number of “candidate replications” as defined in the text
[illegible] - expected number of superior mutations that would prevent a given beneficial mutation from attaining frequency f
[illegible] - time at which the appearance and survival of a superior mutation of selective advantage s_2 ensures that the beneficial mutation in question attains a frequency of exactly f
[illegible] - expected number of superior mutations preventing the fixation of a beneficial mutation that has achieved a frequency of at least f

Chapter 3:

K - the linear constant in the survival function, such that π(s) ≈ Ks

The case of only two available beneficial mutations:

I - denotes inferior beneficial mutation
S - denotes superior beneficial mutation
Pr{I;S} - probability that the inferior mutation is fixed before the superior mutation
Pr{S;I} - probability that the superior mutation is fixed before the inferior mutation
[illegible] - rate of appearance and survival of the inferior mutation
[illegible] - rate of appearance and survival of the superior mutation

The case of many available beneficial mutations:

[illegible] - number of available beneficial mutations
[illegible] - probability that the largest-effect mutation is fixed before all others
[illegible] - rate of appearance of the beneficial mutation with the jth largest selective advantage
ζ(j) - permuting function mapping place in the fixation ordering onto rank by selective advantage
φ(ζ) - probability of the fixation ordering specified by the permuting function ζ
[illegible] - time between the (j-1)th fixation and the appearance of the jth beneficial mutation to be fixed
[illegible] - expected time until the ith fixation, given fixation ordering ζ
[illegible] - population fitness after the ith fixation, given fixation ordering ζ
w(t|ζ) - population fitness at time t, given fixation ordering ζ
E(w(t)) - expected population fitness at time t
var(w(t)) - variance of population fitness among replicate populations at time t
I_j - the set of all available beneficial mutations both inferior to and subsequent to the jth beneficial mutation fixed
S_j - the set of all available beneficial mutations both superior to and subsequent to the jth beneficial mutation fixed
[illegible] - cardinality of set I_j
[illegible] - cardinality of set S_j

Chapter 4:

[illegible] - mutation-selection balance
[illegible] - equilibrium mutation-selection balance
[illegible] - “effective mutation-selection balance” as defined in the text
[illegible] - overall beneficial mutation rate on the wildtype background
[illegible] - overall beneficial mutation rate on the specified deleterious background
[illegible] - rate of mutation from wildtype to the specified deleterious mutation
s_D - selective disadvantage of the specified deleterious mutation
Pr{fix} - probability of fixation
h - per-capita hitchhiking rate of the specified deleterious mutation
x(t) - number of wildtype individuals at time t
y(t) - number of individuals carrying only the specified deleterious mutation at time t
z(t) - number of individuals carrying both beneficial and deleterious mutations at time t
m - factor by which a specified mutator allele
    elevates the general mutation rate, e.g. μ' = mμ
[illegible] - overall deleterious mutation rate of the wildtype

INTRODUCTION

Fundamental issues and roadmap to the dissertation

Adaptive evolution occurs by the action of selection on genetic variants in a population. In sexual populations, different fitness variants may recombine, thereby allowing selection to act at different loci with a certain degree of independence. In asexual populations, however, fitness variants compete and the fittest variant eventually displaces all others. Thus the fate of a particular allele at a specified locus is determined as much by selection at other loci to which it is linked as by its own selective value.

One consequence of asexuality (or, more generally, genetic linkage) is a phenomenon known as hitchhiking (Maynard Smith & Haigh, 1974; Berg, 1995), whereby a specified mutant allele is driven to high frequency or even fixation by a successful beneficial mutation to which it is linked. A second consequence of asexuality is the phenomenon called “clonal interference”, whereby the progression of one beneficial mutation to fixation is hindered or even prevented by competition with alternative beneficial mutations.

This dissertation explores the effects of both hitchhiking and clonal interference on the adaptive dynamics of asexual populations. Chapter 1 is an empirical study of mutation rate evolution in bacterial populations and invokes hitchhiking as an explanation for observed trends.
Chapters 2 and 3 are theoretical explorations of clonal interference under different assumptions about the availability of beneficial mutations. Chapter 4 ties the theory developed in chapters 2 and 3 to the empirical observations of Chapter 1.

Chapter 1

Mutations may arise in the genes responsible for accurate DNA synthesis, proofreading or repair. An individual is said to carry a mutator allele if any of these genes is impaired or disabled by such a mutation; a mutator allele therefore confers an elevated mutation rate. An individual is said to carry an antimutator allele if the function of any of these genes is enhanced by a mutation; thus, an antimutator allele confers a diminished mutation rate. Because mutator and antimutator alleles are recurrently produced in a population, their fate and hence the mutation rate of the population may be determined by selection.

Chapter 1 examines mutation rate evolution in laboratory populations of E. coli (Sniegowski et al., 1997). This study began as a test of the hypothesis that mutation rates in a constant environment should decrease with time. This hypothesis is grounded in the logic that, as a population adapts to a constant environment, there should be (i) weaker selection for mutator alleles as the available beneficial mutations are expended and (ii) stronger selection for antimutator alleles as the remaining available mutations become almost exclusively neutral or deleterious. This line of reasoning is in accordance with conventional theory (Kimura, 1960; Kimura, 1967; Leigh, 1970; Leigh, 1973; Gillespie, 1981; Ishii et al., 1989) and predicts that evolution should reduce mutation rates in a constant environment.
To test this hypothesis, Sniegowski et al. (1997) performed fluctuation assays (Luria & Delbrück, 1943) to estimate the mutation rates of twelve E. coli populations that had been evolving independently in a constant environment for 10,000 generations (Lenski & Travisano, 1994). Estimation of mutation rates from fluctuation assays is notoriously difficult because of its mathematical complexity (Lea & Coulson, 1949; Stewart et al., 1990; Sarkar, 1991). Building on recent developments (Ma et al., 1992; Stewart, 1994; Jaegger & Sarkar, 1995), I derive the analytical Newton series for the maximum likelihood estimator of mutation rates from fluctuation assays, as well as the analytical variance, and have incorporated these into a computer program. Instructions for obtaining and using this program, as well as the novel mathematical derivations, are given in Appendix 1.1.

When mutation rates were estimated for each of the twelve E. coli populations, none had a systematically lower mutation rate than the ancestor. Surprisingly, three of the twelve populations had mutation rates that were about two orders of magnitude higher than the ancestral rate. This result was cause for rejection of the original hypothesis and for critical review of conventional mutation rate evolution theory. In re-examining this theory, Paul Sniegowski (personal communication) proposed that a major flaw was the implicit assumption of infinite population size. A consequence of this flaw is the unsubstantiated prediction that adaptive evolution should "fine-tune" mutation rates.

To explain their observations, Sniegowski et al.
(1997) hypothesized instead that either (i) mutation rates in asexual populations may be determined by chance hitchhiking events, whereby mutator alleles are driven to fixation by beneficial mutations that they produce, or (ii) mutator alleles may provide a direct fitness benefit if reduced fidelity allows for an increased replication rate. Chapters 1 and 4 explore and support the plausibility of (i). While preliminary results suggest rejection of (ii) for the E. coli populations, the validity of this rejection rests on future empirical studies.

Chapter 2

Chapter 2 is a theoretical exploration of the asexual adaptive dynamics that result from competitive interactions among beneficial mutations (Gerrish & Lenski, 1998). In asexual populations, clones that carry alternative beneficial mutations compete with one another and, thereby, interfere with the expected progression of a given mutation to fixation (Fisher, 1930; Muller, 1932; Muller, 1964; Crow & Kimura, 1965; Haigh, 1978). This phenomenon has been called “clonal interference”. Intuitively, it seemed this phenomenon would significantly affect fundamental population-genetic parameters. Richard Lenski (personal communication) speculated that this phenomenon might also have a curious effect on phylogenetic and fitness dynamics and on the relationship between these. He reasoned that a beneficial mutation may rise to polymorphic frequency, or even become the majority genotype, before being competitively displaced by a superior, alternative mutation. Such a phenomenon, which he termed the “leapfrog”, would present strange phylogenetic dynamics in which ancestor ab is succeeded by mutant Ab, and Ab is then succeeded not by AB but by aB (the two successive mutants being more closely related to the ancestor than they are to each other).
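
The succession ab, then Ab, then aB can be illustrated with a small deterministic sketch. This is not the model developed in Chapter 2: it ignores genetic drift and the double mutant AB, uses discrete generations, and every parameter value (population size, the two selection coefficients, the generation at which aB arises) is a hypothetical choice made only for illustration.

```python
def leapfrog_trajectory(pop_size=1e6, s_first=0.10, s_second=0.15,
                        arrival_second=100, generations=1000):
    """Deterministic genotype frequencies for ancestor ab and mutants Ab, aB.

    Assumptions (illustrative, not from the dissertation): no drift,
    no double mutant, discrete generations, and each mutant starts as
    a single individual (frequency 1 / pop_size).
    """
    # Ab is present as one individual from generation 0.
    freq = {"ab": 1.0 - 1.0 / pop_size, "Ab": 1.0 / pop_size, "aB": 0.0}
    fitness = {"ab": 1.0, "Ab": 1.0 + s_first, "aB": 1.0 + s_second}
    history = []
    for gen in range(generations + 1):
        if gen == arrival_second:
            # One ancestral (ab) individual mutates to the superior genotype aB.
            freq["ab"] -= 1.0 / pop_size
            freq["aB"] += 1.0 / pop_size
        history.append(dict(freq))
        # Selection: weight each genotype by fitness, then renormalize.
        mean_fitness = sum(freq[g] * fitness[g] for g in freq)
        freq = {g: freq[g] * fitness[g] / mean_fitness for g in freq}
    return history


if __name__ == "__main__":
    trajectory = leapfrog_trajectory()
    for gen in range(0, 1001, 100):
        row = trajectory[gen]
        print(gen, round(row["ab"], 4), round(row["Ab"], 4), round(row["aB"], 4))
```

With these hypothetical values, Ab becomes the majority genotype before aB displaces it, which is the pattern the text calls the leapfrog. Whether such parameter combinations are biologically reasonable is the question the chapter addresses analytically.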
From the fitness standpoint, the leapfrog phenomenon would give the appearance of two fixations when, in fact, only one mutation is actually fixed. The question remained, however, under what conditions the leapfrog phenomenon was likely to occur and whether such conditions were biologically reasonable.

From theoretical exploration of clonal interference, I analytically derive several fundamental population-genetic parameters such as fixation probability, substitution rate and rate of fitness increase. Also, I derive the probability that a beneficial mutation transiently achieves polymorphic frequency or majority status (the “leapfrog”). Employing fitness data from an evolving E. coli population, I solve the inverse problem to estimate (i) the beneficial mutation rate in this population and (ii) the distribution of mutational effects.

Chapter 3

An assumption made throughout Chapter 2 is that the population in question is evolving in a slowly changing environment, such that opportunities for further improvement are never depleted. In other words, the number of available beneficial mutations is infinite. In a constant environment, however, the set of available beneficial mutations is eventually depleted. This set must therefore be considered finite.

Chapter 3 explores the adaptive dynamics of asexual populations in a constant environment, i.e., when the number of available beneficial mutations is finite. Given a finite set of available beneficial mutations, the course of adaptive evolution is determined by both the timing and ordering of fixations. In large, asexual populations, the ordering of fixations is affected by competition among beneficial mutations. Taking into account such clonal interference, I derive the probability of any specified fixation ordering.
From this, I derive dynamic equations for fitness expectation and variance among replicate populations, given (i) a particular set of beneficial mutations, or given (ii) only a common distribution of mutational effects. Using fitness data from an evolving E. coli population, I solve the inverse problem to determine (i) the beneficial mutation rate in this population, (ii) the distribution of beneficial mutational effects, and (iii) the number of available beneficial mutations.

Chapter 4

Chapter 4 employs the theory developed in Chapter 2 to model the hitchhiking mechanism that was proposed as an explanation for the observations of Chapter 1 (Sniegowski et al., 1997). In order for any allele to hitchhike to fixation, it must first be linked to a beneficial mutation. Beneficial mutations appear in linkage with a specified deleterious mutation at a certain rate. As long as this rate is greater than zero, the subpopulation carrying the specified deleterious mutation will eventually produce a beneficial mutation. If this beneficial mutation is subsequently fixed in the population, then the specified deleterious mutation to which it is linked will hitchhike to fixation. The likelihood that the beneficial mutation will be fixed is reduced, however, by (i) the selective disadvantage of the deleterious mutation to which it is linked, (ii) genetic drift, and (iii) clonal interference from both wildtype and mutant subpopulations.

Taking all of this into account, I derive the probability that a particular deleterious allele will hitchhike to fixation within a given time period. The probability that a mutator allele hitchhikes to fixation is then presented as a special case. Results corroborate the observations of Chapter 1 and suggest that hitchhiking is of general importance to the evolution of mutation rates in asexual populations.
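
As a rough illustration of how a per-capita hitchhiking rate might translate into a probability of fixation within a given time period, the sketch below treats successful hitchhiking events as a Poisson process with constant rate. This is an assumption made here for illustration only; the conversion actually used in Chapter 4 may differ. The function name and the interpretation of the rate (events per individual per generation) are likewise hypothetical.

```python
import math


def prob_hitchhike_fixation(per_capita_rate, pop_size, generations):
    """Probability of at least one successful hitchhiking event.

    Assumes (illustratively) that events arrive as a Poisson process with
    constant rate per_capita_rate * pop_size per generation.
    """
    expected_events = per_capita_rate * pop_size * generations
    # 1 - exp(-x), computed with expm1 for accuracy when x is very small.
    return -math.expm1(-expected_events)


if __name__ == "__main__":
    # Hypothetical values: the rate is chosen so that the expected number
    # of events over 10,000 generations equals ln 2.
    rate = math.log(2.0) / (3.3e7 * 10_000)
    print(prob_hitchhike_fixation(rate, 3.3e7, 10_000))
```

Under this assumption the probability saturates at 1 as population size or elapsed time grows, so a per-capita rate that looks negligible can still correspond to a substantial probability over 10,000 generations in a large population.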
Chapter 1

EVOLUTION OF HIGH MUTATION RATES IN EXPERIMENTAL POPULATIONS OF ESCHERICHIA COLI¹

Abstract

Most mutations are likely to be deleterious, and hence the spontaneous mutation rate is generally held at a very low value (Drake, 1991). Nonetheless, evolutionary theory predicts that high mutation rates can evolve under certain circumstances (Leigh, 1970; Ishii et al., 1989; Taddei et al., 1997). Empirical observations have heretofore been limited to short-term studies of the fates of mutator strains deliberately introduced into laboratory populations of E. coli (Cox & Gibson, 1974; Chao & Cox, 1983; Trobner & Piechocki, 1984) and to the effects of intense selective events on mutator frequencies in E. coli (Mao et al., 1997). Here we report a new phenomenon: the rise of spontaneously originated mutators in populations of E. coli undergoing long-term adaptation to a novel environment. Our results corroborate computer simulations of mutator evolution in adapting clonal populations (Taddei et al., 1997) and may shed light on recent observations that associate high mutation rates with emerging pathogens (LeClerc et al., 1996) and with certain cancers (Modrich et al., 1995).

¹ This chapter was written originally as a paper (Sniegowski et al., 1997). The “we” in this chapter refers to P.D. Sniegowski, P.J. Gerrish, and R.E. Lenski.

Introduction and Findings

Lenski and collaborators (Lenski et al., 1991) established twelve replicate experimental populations of a clonal strain of E. coli in order to study evolution directly in the laboratory. Because these populations were founded from a single ancestral clone, mutation provided their only source of genetic variation.
The glucose-limited environment in which these populations were propagated was essentially novel at the outset of the experiment and provided considerable scope for adaptive evolution. Substantial evolutionary increases in fitness and changes in certain other phenotypic features in these populations have been documented elsewhere (Lenski et al., 1991; Lenski & Travisano, 1994; Vasi et al., 1994; Travisano & Lenski, 1996; Elena et al., 1996). We measured mutation rates in the common ancestral strain and in the twelve experimental populations after they had been evolving for 10,000 generations. A majority of the populations retained the ancestral mutation rate; however, three populations (designated Ara-2, Ara-4 and Ara+3) displayed mutation rates that were between one and two orders of magnitude higher than those in the ancestor (Figure 1.1). Figure 1.2 illustrates when mutator phenotypes arose during the evolution of populations Ara-2, Ara-4 and Ara+3, as determined by screening for the mutator phenotype in isolates stored periodically throughout the experiment. Once a lineage displayed a mutator phenotype, all subsequent isolates from that lineage also tested as mutators through 10,000 generations. This observation indicates that mutators rose to and remained at high frequencies in these populations. To ascertain a genetic basis for the observed mutator phenotypes, we transformed 10,000-generation clonal isolates from populations Ara-2, Ara-4 and Ara+3 and an isolate from the ancestral strain REL606 with multicopy plasmids bearing wild-type alleles of seven known general mutator loci.
The ancestral mutation rate was fully restored in the Ara-2, Ara-4 and Ara+3 strains only by the presence of wild-type alleles of genes in the methyl-directed mismatch repair pathway (Modrich, 1991) (Figure 1.3). The mutation rate in the ancestral strain, REL606, was not significantly affected by the plasmids (with a single exception; see Figure 1.3). In the Ara+3 strain the mutS+ allele alone restored the ancestral mutation rate. In the Ara-2 and Ara-4 strains the ancestral rate was completely restored by uvrD+, and the data also indicated a partial effect of mutL+. In fact, recent studies have suggested a mechanistic interaction between the mutL and uvrD gene products, such that a defect in one may be complemented by increased production of the other in some cases (P. Modrich, personal communication).

Methods

Experimental System

The common ancestral strain was REL606, an Ara- clone of E. coli B obtained by B. R. Levin from S. Lederberg (Lederberg, 1966). A spontaneous Ara+ revertant was selected from REL606 and designated REL607. Six clones each from REL606 and REL607 were used to found the twelve experimental populations, which were propagated at 37°C by daily 100-fold dilutions into 10 ml of fresh Davis minimal medium (Carlton & Brown, 1981) supplemented with glucose at 25 µg/ml. Population sizes fluctuated between approximately 5 × 10^6 cells after dilution and 5 × 10^8 cells at stationary phase. Isolates from each population were stored at -80°C at 100-generation intervals during the first 2500 generations of the experiment and at 500-generation intervals thereafter.
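The transfer regime above fixes how many generations of binary fission accrue per day. The short sketch below works through that arithmetic from the stated dilution factor and stationary-phase population size. It assumes each culture regrows fully to its pre-dilution density every day; the function and variable names are illustrative only.

```python
import math

# Values stated in the text above.
DILUTION_FACTOR = 100        # daily 100-fold dilution
STATIONARY_CELLS = 5e8       # approximate population at stationary phase


def doublings_per_cycle(dilution_factor):
    """Binary fissions needed to regrow to the pre-dilution density."""
    return math.log2(dilution_factor)


def cells_after_dilution(stationary_cells, dilution_factor):
    """Population size immediately after transfer."""
    return stationary_cells / dilution_factor


# Assumption: complete regrowth between transfers.
generations_per_day = doublings_per_cycle(DILUTION_FACTOR)
bottleneck = cells_after_dilution(STATIONARY_CELLS, DILUTION_FACTOR)
days_for_10000 = 10_000 / generations_per_day

print(f"{generations_per_day:.2f} generations per day")
print(f"{bottleneck:.0e} cells after dilution")
print(f"{days_for_10000:.0f} daily transfers for 10,000 generations")
```

The post-dilution size computed here matches the lower bound of the population-size range quoted in the text.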
Mutation Rate Measurements

Fluctuation tests (Luria & Delbrück, 1943) to estimate a given rate of mutation were conducted simultaneously on all strains to provide controlled comparison. Prior to testing, strains were revived and regrown for three days in the original experimental medium to reestablish the physiological conditions that had prevailed in the evolving populations. In general, 24 or more cultures of a given strain were grown from inocula of approximately 1,000 cells for each fluctuation test. For a given mutation rate measurement, all strains to be compared were grown in aliquots from the same batch of Davis minimal medium. Cultures were grown to stationary phase before selective plating. (An additional 24 hours in stationary phase had no discernible effect upon the mutation rate in pilot studies.) Final population sizes were estimated by growing and randomly harvesting three extra cultures for each strain and measuring cell densities using a Coulter particle counter; the means were used in mutation rate calculations. Ara+ mutants were enumerated on Davis minimal agar medium supplemented with arabinose at 10 µg/ml; Nal^r mutants were enumerated on LB agar medium (Miller, 1992) containing 20 µg/ml of nalidixic acid; T5^r mutants were enumerated by plating cultures with excess phage T5 in soft agar.

Time of Origin and Persistence of Mutator Phenotypes

Mutator screens were conducted at approximately 500-generation intervals over the 10,000 generations of the experiment. Overnight cultures in 5 ml of Davis minimal medium (supplemented with glucose at 1,000 µg/ml) were assayed for Nal^r mutant numbers at stationary phase. Expected distributions of mutants, based on mutator and ancestral mutation rates, were used to generate criteria by which isolates could be scored as mutator or nonmutator with >95% confidence.
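A scoring criterion of the kind described above can be sketched as follows. This is an illustration rather than the procedure actually used: it computes the Luria-Delbrück distribution of mutant counts with the recursion commonly attributed to Ma, Sandri and Sarkar, then looks for a single-culture cutoff that keeps both misclassification rates at or below 5%. The two values of `m` (expected mutations per culture) are hypothetical placeholders, chosen only to differ by two orders of magnitude.

```python
import math


def luria_delbruck_pmf(m, n_max):
    """P(0..n_max mutants in a culture) given m expected mutations per culture.

    Recursion: p_0 = exp(-m), p_n = (m / n) * sum_{i<n} p_i / (n - i + 1).
    """
    p = [math.exp(-m)]
    for n in range(1, n_max + 1):
        p.append(m / n * sum(p[i] / (n - i + 1) for i in range(n)))
    return p


def scoring_threshold(m_ancestor, m_mutator, n_max=500, error=0.05):
    """Smallest mutant count that separates the two classes at the given error.

    Returns (cutoff, false_mutator_rate, missed_mutator_rate), or None if no
    cutoff up to n_max satisfies both error bounds.
    """
    pa = luria_delbruck_pmf(m_ancestor, n_max)
    pm = luria_delbruck_pmf(m_mutator, n_max)
    for cutoff in range(1, n_max + 1):
        false_mutator = 1.0 - sum(pa[:cutoff])   # ancestral culture with >= cutoff mutants
        missed_mutator = sum(pm[:cutoff])        # mutator culture with < cutoff mutants
        if false_mutator <= error and missed_mutator <= error:
            return cutoff, false_mutator, missed_mutator
    return None


# Hypothetical values of m, for illustration only.
M_ANCESTOR = 0.5
M_MUTATOR = 50.0
result = scoring_threshold(M_ANCESTOR, M_MUTATOR)
print(result)
```

Because the Luria-Delbrück distribution has a heavy upper tail, the cutoff must sit well above the ancestral mean to keep false positives rare, which is why a large separation between the two rates makes the screen workable.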
Complementation Tests

Strains were transformed with plasmids carrying wildtype alleles for mutH (plasmid pGW1899), mutL (pGW1842), mutS (pGW1811), uvrD (pGT26), mutT (pSK25), dnaQ (pMM5) and dnaE (pMK9) according to a standard protocol (Sambrook et al., 1989). The plasmid labels correspond to the following sources: pGW from Pang et al. (1985), pGT from Taucher-Scholz & Hoffman-Berling (1983), pSK from Bhatnagar & Bessman (1988), pMM from Horiuchi et al. (1981), and pMK from Schaaper & Cornacchio (1992). Fluctuation tests were conducted as described above, except that all strains were propagated in LB medium (containing 60 µg/ml of ampicillin where the strain was plasmid-bearing) and five parallel cultures were used per fluctuation test.

Analysis of Fluctuation Test Data

I wrote a computer program employing a recent Luria-Delbrück distribution-generating algorithm (Ma et al., 1992) to calculate maximum likelihood mutation rates from fluctuation test data. Instructions for obtaining and using this program are given in Appendix 1.1. Approximate 95% confidence intervals for the mutation rates illustrated in Figure 1.1 were calculated using the formulae of Stewart (1994). Approximate 95% confidence limits for the mutation rates illustrated in Figure 1.3 are based upon the theoretical variance of the maximum likelihood estimate of ln m, assuming normality, where m is the expected number of mutations per culture (see Appendix 1.1).

Discussion

Because most mutations are deleterious, mutator alleles are likely to have negative average effects on fitness.
In an evolving clonal population, however, a deleterious mutator can rise to high frequency (hitchhike) in association with an adaptive mutation provided that the selective cost of the mutator does not outweigh the selective benefit of the adaptive mutation. Hitchhiking of mutators with adaptive mutations was demonstrated previously in chemostat populations of E. coli (Chao & Cox, 1983), but these results left open the question of whether such events would be likely to occur in natural populations: when a mutator was introduced above a relatively high threshold number relative to the wild type, the mutator population always acquired an adaptive mutation before the wild-type population did and the mutator rose to high frequency. However, when the mutator was introduced in lower and more realistic numbers, the wild-type population always acquired a beneficial mutation first and displaced the mutator. In the evolution experiment we have described here, as in natural populations, mutator alleles must have arisen by mutation. This raises the question of how mutators reached high frequencies in three of our twelve populations. Computer simulations (Taddei et al., 1997) and stochastic analytical modeling (Gerrish, Sniegowski and Mandrekar, unpublished; see Chapter 4) suggest that rare mutators may occasionally hitchhike to high frequencies in finite asexual populations as a consequence of chance associations with adaptive mutations. We view such chance hitchhiking events as the likely explanation for our results. Sufficient time may not have existed to observe them in earlier studies of competition between mutator and wild-type strains (Cox & Gibson, 1974; Chao & Cox, 1983; Trobner & Piechocki, 1984), which were carried out for only a few tens or hundreds of
generations, as opposed to the 10,000 generations in our study. Although we cannot formally rule out the possibility that certain mutations in mismatch repair might increase fitness directly, it is much more likely that the evolved mutators we observed are deleterious or at best effectively neutral. Indeed, known mutator alleles of mutS, mutL and uvrD do not increase cell fitness under our experimental conditions (C. Zeyl, unpublished data). A prediction of the simulations conducted by Taddei et al. (1997) is that mutators will hitchhike in some, but not all, finite asexual populations undergoing adaptation. Our experimental observations are consistent with this prediction. A further prediction is that fitness will be higher on average in mutator populations than in those that retain the wild-type mutation rate. However, this fitness effect is very small and subtle: approximately half of the mutator populations in the Taddei et al. simulations did not show higher fitness, and the observed increases in fitness over contemporaneous wild-type populations were slight (approximately 1%) in the remainder (see Taddei et al., 1997, their Figure 3B). We tested for a relationship between mutability and fitness in our experimental populations in several ways, none of which gave statistical significance. For example, we tested for a correlation between mutability ranks (as calculated from data obtained in screens for mutator phenotypes: see Methods) and relative fitness ranks averaged across the duration of the experiment in all twelve populations. The results were inconclusive (r = 0.28, n = 12, one-tailed P = 0.19). It is clear from our experiments and the simulations of Taddei et al. that increased mutation rate and not increased fitness is the more striking consequence of the hitchhiking process.
Our finding that asexual populations can evolve high mutation rates in a relatively benign environment may shed light on recent observations associating mismatch repair mutators with certain cancers (Modrich, 1995) and with pathogenicity in E. coli and Salmonella (LeClerc et al., 1996). As in our experimental populations, high mutation rates may evolve as a stochastic byproduct of adaptation in clonal tumour lineages and in populations of asexual pathogens (Nowell, 1974; Moxon et al., 1994). The potential for faster evolution once high mutation rates have evolved may have important health implications.

[Figure 1.1: three panels (A to C) of mutation rate by population; x-axis: Population.]

Figure 1.1. Rates of mutation in 12 Ara- and Ara+ experimental populations (A-1 to A-6 and A+1 to A+6) at 10,000 generations and in their common ancestors REL606 (Ara-) and REL607 (Ara+). Error bars give approximate 95% confidence intervals. (A) Reversion to Ara+ in isolates from the seven Ara- populations. (B) Mutation to nalidixic acid resistance. (C) Mutation to bacteriophage T5 resistance.

[Figure 1.2: bars for populations Ara+3, Ara-4 and Ara-2, shaded as wildtype or mutator; x-axis: Generation, 0 to 10,000.]

Figure 1.2. Time of appearance and evolutionary persistence of mutator phenotypes in experimental populations Ara-2, Ara-4, and Ara+3.

[Figure 1.3: four panels (Ara+3, Ara-4, Ara-2, REL606); x-axis: plasmid-free control and the seven plasmid-borne loci; y-axis: log10 mutation rate.]

Figure 1.3.
Mutation rates to nalidixic acid resistance in 10,000-generation isolates from populations Ara-2, Ara-4 and Ara+3 and ancestor REL606 transformed with plasmids bearing wild-type alleles of seven known general mutator loci. Controls shown are plasmid-free. Error bars give approximate 95% confidence limits. Only an upper confidence limit is shown for the case in which no mutants were obtained in any culture. For visual comparison, the dashed horizontal line and dotted horizontal lines in each panel illustrate the mutation rate and approximate 95% confidence limits measured in the ancestral control.

Chapter 2

THE FATE OF COMPETING BENEFICIAL MUTATIONS IN AN ASEXUAL POPULATION¹

Abstract

In sexual populations, beneficial mutations that occur in different lineages may be recombined into a single lineage. In asexual populations, however, clones that carry such alternative beneficial mutations compete with one another and, thereby, interfere with the expected progression of a given mutation to fixation. From theoretical exploration of such "clonal interference", we have derived (1) a fixation probability for beneficial mutations, (2) an expected substitution rate, (3) an expected coefficient of selection for realized substitutions, (4) an expected rate of fitness increase, (5) the probability that a beneficial mutation transiently achieves polymorphic frequency (≥ 1%), and (6) the probability that a beneficial mutation transiently achieves majority status. Based on (2) and (3), we were able to estimate the beneficial mutation rate and the distribution of mutational effects from changes in mean fitness in an evolving E. coli population.

¹ This chapter was written originally as a paper (Gerrish & Lenski, 1998). The “we” in this chapter refers to P.J. Gerrish and R.E. Lenski.
Introduction

Asexual populations adapt to their environment by the occurrence and subsequent rise in frequency of beneficial mutations. Without recombination, a population must incorporate beneficial mutations in a sequential manner (Fisher, 1930; Muller, 1932, 1964; Crow & Kimura, 1965). The time required for fixation of a beneficial mutation may be considerable if the population is large; however, the mutation remains at low frequency for much of this time (Lenski et al., 1991). While the mutation is at low frequency, another beneficial mutation may arise on the ancestral background. If two such beneficial mutations occur in a sexual population, then the two novel genotypes can recombine to form a fitter double-mutant (assuming no negative gene interactions). In an asexual population, however, these two novel genotypes compete with one another. Such competition between beneficial mutations slows the spread of, and may even eliminate, the first mutation. Such "clonal interference" between beneficial mutations has many important consequences for the dynamics of evolution in asexual populations.

The idea that progression of a beneficial mutation to fixation may be impeded by competing beneficial mutations was articulated by Muller (1932, 1964) in the context of discussions on the evolutionary advantage of sex. Almost in passing, a brief theoretical treatment was later given by Haigh (1978), in which he proposed a discrete-time model of competing beneficial mutations. Employing a different approach, we give a full theoretical treatment of the phenomenon of competing beneficial mutations and its consequences.

The body of this paper is presented in two main parts. In the first part, a probability of fixation is derived which incorporates the effect of competition between beneficial mutations, and some consequences of this derivation are then explored. The dynamics of fixation are such that a relatively simple derivation suffices.
In the second part, the probability is derived that a beneficial mutation achieves a frequency greater than or equal to some specified frequency, $f$. From this, the probability that a beneficial mutation becomes transiently polymorphic (0.01 […]

$$\langle c \rangle = \mu N \Pr\{\mathrm{fix} \mid \alpha, \mu, N\}, \tag{5}$$

where $\langle\,\cdot\,\rangle$ denotes the expected value. As shown in Figure 2.3, a very large change in beneficial mutation rate (several orders of magnitude) has little effect on the substitution rate of the population, especially when the population is large. This constraint may be thought of as a "law of diminishing returns," where the investment is the number of beneficial mutations produced by a population and the returns are adaptive substitutions.

The expected selection coefficient of successful mutations

Figure 2.1 showed that there is some critical value of $s$ below which the probability of fixation of a beneficial mutation is essentially zero. A beneficial mutation whose selective advantage is small is not likely to become fixed because it must compete with many superior mutations. On the other hand, a beneficial mutation whose advantage is large is less likely to be produced. Hence, there must be some intermediate selection coefficient that balances the fixation advantage of large $s$ with the more frequent occurrence of small $s$. This balance corresponds to the expected selection coefficient of successful mutations.

Let $p(s) = K \pi(s)\, e^{-\lambda(s,\alpha,\mu,N) - \alpha s}$, where $K$ is a normalizing constant such that $\int_0^\infty p(s)\, ds = 1$. Then $p(s)$ is the probability density that a beneficial mutation of selective advantage $s$ will be (i) produced and (ii) fixed. Therefore, the expected value for the selection coefficient of successful mutations is

$$\langle s \rangle = \int_0^\infty s\, p(s)\, ds. \tag{6}$$

Figure 2.4 reveals that this expectation is essentially constant for $\mu N < 0.01$ and increases approximately linearly with the log of population size when $\mu N > 0.1$.
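Equations (5) and (6) can be evaluated numerically once the drift-survival probability and the interference parameter are specified. Neither definition survives in the text above, so the sketch below rests on two loudly stated assumptions: that the probability of surviving drift is π(s) = 4s (capped at 1), and that λ(s, α, μ, N) = (μ N ln N / s) e^(-αs) π(s + 1/α). Both forms, and all function names, are illustrative guesses rather than the author's stated definitions.

```python
import math


def pi_drift(s):
    """ASSUMED probability that a mutation of advantage s survives drift."""
    return min(1.0, 4.0 * s)


def lam(s, alpha, mu, n):
    """ASSUMED expected number of interfering superior mutations."""
    return (mu * n * math.log(n) / s) * math.exp(-alpha * s) * pi_drift(s + 1.0 / alpha)


def integrate(func, upper, steps=20000):
    """Midpoint rule on (0, upper); never evaluates func at s = 0."""
    h = upper / steps
    return h * sum(func((i + 0.5) * h) for i in range(steps))


def summary(alpha, mu, n):
    """Return (Pr{fix}, substitution rate, mean s of successful mutations)."""
    upper = 40.0 / alpha  # the exponential tail beyond this point is negligible

    def weight(s):
        return pi_drift(s) * math.exp(-lam(s, alpha, mu, n) - alpha * s)

    norm = integrate(weight, upper)
    pr_fix = alpha * norm                      # fixation probability
    rate = mu * n * pr_fix                     # equation (5)
    s_mean = integrate(lambda s: s * weight(s), upper) / norm  # equation (6)
    return pr_fix, rate, s_mean


ALPHA, N = 35.0, 3.3e7
low = summary(ALPHA, 2.0e-9, N)
high = summary(ALPHA, 2.0e-7, N)   # 100-fold higher beneficial mutation rate
print("Pr{fix}, rate, <s> at low mu :", low)
print("Pr{fix}, rate, <s> at high mu:", high)
```

Comparing the two parameter sets illustrates the "diminishing returns" described above: a 100-fold increase in the beneficial mutation rate yields much less than a 100-fold increase in the substitution rate, because the fixation probability falls as interference grows.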
Effect of clonal interference on rate of fitness increase

At this point, sufficient information has been provided to determine how clonal interference between beneficial mutations affects the rate of adaptive evolution. Having derived (i) the rate at which substitutions occur and (ii) the expected selective advantage conferred by substitutions, we now calculate the expected rate of fitness increase simply as the product of (i) and (ii):

$$\frac{dw}{dt} = \alpha \mu N \int_0^\infty s\, \pi(s)\, e^{-\lambda(s,\alpha,\mu,N) - \alpha s}\, ds. \tag{7}$$

Equation (7) is plotted against $\log_{10} N$ in Figure 2.5 for different mutation rates. It appears that $dw/dt$ approaches a maximum value for increasing $N$. The same is true for $\mu$. Indeed, that a maximum value exists can be shown mathematically. The implication is that there exists a sort of "speed limit" for asexual evolution imposed by clonal interference.

Estimation of parameters: an empirical example

The previous developments show some characteristic consequences of clonal interference; yet, these developments remain at the level of sweeping generalities until we find the region of parameter space in which biological reality lies. We demonstrate here that the parameters $\alpha$ and $\mu$ may, in fact, be estimated empirically.

Equations (5) and (6) govern the expected rate of substitution and the expected selection coefficient of substitutions, respectively, both being functions of $\alpha$, $\mu$, and $N$. If $N$ is known, then the resulting two equations contain two unknowns and are linearly independent:

$$\begin{aligned} \langle c(\alpha,\mu) \rangle - c_{obs} &= 0 \\ \langle s(\alpha,\mu) \rangle - s_{obs} &= 0 \end{aligned} \tag{8}$$

The parameters $\alpha$ and $\mu$ may, therefore, be determined from this pair of equations given observed values for the substitution rate, $c_{obs}$, and the selection coefficient of substitutions, $s_{obs}$. It is possible to obtain such values by tracking the fitness trajectory of an evolving population (Lenski et al., 1991; Lenski & Travisano, 1994).
The average time between periodic selection events gives the reciprocal of the substitution rate estimate; the average fitness increase caused by periodic selection events gives an estimate for the selection coefficient of substitutions.

As an example, we estimate $\alpha$ and $\mu$ using the fitness trajectory observed for an evolving Escherichia coli population (Lenski et al., 1991; Lenski & Travisano, 1994). This example serves two purposes: (i) it demonstrates the estimation procedure, and (ii) it puts us in the "biological ball-park" of parameter space. Lenski and colleagues serially propagated several E. coli populations for ten thousand generations of binary fission in a constant environment. (A particularly nice feature of working with bacteria is that samples of the evolving populations may be frozen and later "resurrected" for comparison with samples from earlier or later times. In this way, one may track the evolution of populations over time by competing the evolved populations against the ancestor to estimate their relative fitness.) That calculation of generation number implies a discrete-time formulation of population growth, whereas the mathematics in this paper employ a continuous-time formulation. In the following estimation of parameters, we adjust the number of generations by a factor of ln 2 (= 0.693) to reflect this difference. During the first 2000 generations of binary fission (≈1400 natural generations), they intensively assayed fitness for one population (Lenski & Travisano, 1994). The observed fitness trajectory was characteristically punctuated with sudden fitness increases followed by long periods of stasis. This general pattern is in accordance with the results of previous sections: that due to clonal interference, the substitution of a beneficial mutation is a rare, isolated event, and that the fitness increases due to substitutions are large.
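The adjustment from generations of binary fission to continuous-time ("natural") generations described above is a single multiplication by ln 2. The snippet below shows that step for the 2000-generation assay period; the function name is illustrative only.

```python
import math


def natural_generations(doublings):
    """Convert generations of binary fission to continuous-time generations."""
    return doublings * math.log(2)


assayed_period = natural_generations(2000)
print(f"{assayed_period:.0f} natural generations")
```

The result rounds to the figure of roughly 1400 natural generations used in the text.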
Based on three sudden fitness increases during ~1400 natural generations (Lenski & Travisano, 1994), the average substitution rate is estimated as $c_{obs} = 0.002$ substitutions per generation; the average fitness increase resulting from a substitution is $s_{obs} = 0.1$. The effective population size with respect to the substitution of beneficial mutations, and given the serial transfer regime, was determined to be $3.3 \times 10^{7}$ (Lenski et al., 1991). We have estimated parameters $\alpha$ and $\mu$ from these data by finding the point of intersection between the solution curves of equations (8). The solution for this system of equations is $\alpha = 35$ and $\mu = 2.0 \times 10^{-9}$ beneficial mutations per replication. Given that the genomic mutation rate of E. coli is approximately $3 \times 10^{-3}$ mutations per replication (Drake, 1991), one can infer that the proportion of mutations that are beneficial is roughly one in a million.

We emphasize that these estimates depend on (i) the assumption of an exponential distribution of beneficial mutational effects, and (ii) the assumption that $\alpha$ and $\mu$ remain constant even as mean fitness increases. The empirical fitness trajectories referred to in this section show a decreasing rate of increase, suggesting that assumption (ii) is false if the environment is constant. (See Assumptions of the models.)

Transiently common mutations

Clonal interference - a general model

Suppose that, while one beneficial mutation grows in number, a second beneficial mutation appears that is superior to the first. The population is now composed of three genotypes of interest: the ancestor and two competing beneficial mutations. If the first beneficial mutation is not close to fixation, then its growth is unaffected by the growth of the second, superior mutation until the latter has become sufficiently abundant to affect the mean fitness of the population noticeably.
When the superior mutation attains sufficient number, the growth of the original mutation is retarded until, at some point, it reaches a maximum frequency and then begins to decline. We are interested in the probability that the frequency at which this maximum occurs is greater than or equal to some frequency, $f$.

To determine the probability that any particular beneficial mutation achieves a frequency of at least $f$, we begin by computing the time, $t_z$, at which a superior mutation with selective advantage $s_z$ must have appeared to ensure that the original mutation achieves a maximum frequency of exactly $f$. Then we calculate the probability that no such superior mutation occurs in the interval $(0, t_z)$; this is the probability that the original mutation achieves a maximum frequency of at least $f$. (Note that $t_z$ is itself a function of the selective advantage, $s_z$, of a given superior mutation.) To facilitate presentation of this development, we introduce the term candidate replication to refer to any replication event which, if it were to produce a superior mutation, would prevent the original mutation from attaining frequency $f$.

Consider a three-genotype system with ancestor $x$, original beneficial mutant $y$, and alternative superior mutant $z$; the deterministic solution for the dynamics of such a system is derived in Appendix 2.2. The time, $t_{max}$, at which beneficial mutation, $y$, reaches maximum number is a function of the time of occurrence, $t_z$, of an alternative mutation, $z$, which is superior to $y$, i.e., $t_{max} = t_{max}(t_z)$. The time, $t_z$, is that which satisfies $y(t_{max}(t_z)) = fN$, where $t_{max}(t_z)$ is such that $dy/dt = 0$. If the superior mutation, $z$, were to occur before time $t_z$, then the original mutation, $y$, would not achieve frequency $f$. We can, therefore, calculate the probability that no superior mutation occurs in the interval $(0, t_z)$ by determining the expected number of such mutations in this interval and assuming that they are Poisson distributed.
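The dependence of the peak frequency of $y$ on the arrival time of $z$ can be illustrated with a small deterministic calculation. The sketch below does not reproduce the solution of Appendix 2.2, which is not shown here; it assumes the standard continuous-time selection model, in which genotype frequencies are proportional to initial number times exp(advantage × time), and it assumes that $y$ and $z$ each start from a single individual. All names and parameter values are illustrative.

```python
import math


def peak_frequency_of_y(s_y, s_z, t_z, n=3.3e7, t_end=600.0, dt=0.05):
    """Maximum frequency reached by mutant y when superior mutant z arises at t_z.

    ASSUMED model: ancestor x has relative growth rate 0 and starts at n;
    y starts as one individual at time 0; z starts as one individual at t_z.
    """
    best = 0.0
    for step in range(int(t_end / dt) + 1):
        t = step * dt
        x = n
        y = math.exp(s_y * t)
        z = math.exp(s_z * (t - t_z)) if t >= t_z else 0.0
        best = max(best, y / (x + y + z))
    return best


S_Y = 0.1
S_Z = S_Y + 1.0 / 35.0      # a superior mutation, using alpha = 35 from the text
early = peak_frequency_of_y(S_Y, S_Z, t_z=50.0)
late = peak_frequency_of_y(S_Y, S_Z, t_z=150.0)
print(f"peak frequency of y if z arises at t=50 : {early:.3f}")
print(f"peak frequency of y if z arises at t=150: {late:.3f}")
```

The later the superior mutation arises, the higher the original mutation climbs before it is displaced, which is why the calculation above reduces to finding the latest arrival time $t_z$ that still caps $y$ at frequency $f$.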
The first step in determining the expected number of superior mutations interfering with the original mutation is to calculate how many candidate replications take place, i.e., the number of ancestral replications in the interval $(0, t_z)$, or $R = \int_0^{t_z} x(t)\, dt$. But $t_z$ is a function of the selective advantage, $s_z$, of the superior mutation. $R$ is closely approximated by evaluating $t_z(s_z)$ at the expected value for $s_z$ conditional on it being greater than $s_y$, i.e., $t_z = t_z(\langle s_z \mid s_z > s_y \rangle)$, where $\langle s_z \mid s_z > s_y \rangle = s_y + 1/\alpha$ is the expected selection coefficient of a superior mutation.

The expected number of beneficial mutations in the interval $(0, t_z)$ is $\mu R$, where $\mu$ is the per-replication rate at which beneficial mutations are produced. Of these $\mu R$ beneficial mutations, only the fraction $e^{-\alpha s_y}$ will be competitively superior to $y$, the original beneficial mutation. And of these $\mu R e^{-\alpha s_y}$ superior mutations, only another fraction $\pi(s_y + 1/\alpha)$ will survive drift. Therefore, the expected number of beneficial mutations that occur in the interval $(0, t_z)$, that are superior to $y$, and that survive drift is $\psi = \mu R\, e^{-\alpha s_y}\, \pi(s_y + 1/\alpha)$. Because this expectation is a function of $s_y$ but not $s_z$, we simplify our notation at this point by letting $s = s_y$.

The analytical solution for $R$, the number of candidate replications, is derived in Appendix 2.3. The resulting expected number of superior mutations that would prevent a given beneficial mutation from attaining frequency $f$ is

$$\psi(s,\alpha,\mu,N,f) = \frac{\mu N}{s} \ln(N/\chi)\, e^{-\alpha s}\, \pi(s + 1/\alpha), \tag{9}$$

where $\chi \approx 1 + \frac{1}{\alpha s + 1}\left(f^{-1} - 1\right)(\alpha s N)^{\alpha s/(\alpha s + 1)}$. Thus, the probability that a given beneficial mutation achieves a maximum frequency of at least $f$ is

$$\pi(s)\, e^{-\psi(s,\alpha,\mu,N,f)}, \tag{10}$$

where the effect of drift is incorporated by $\pi(s)$. It is important to point out that equation (9) incorporates an approximation that is essentially an equality for $f < 0.95$, but which introduces significant error for $f > 0.99$. (See Appendix 2.3 for details.)
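Equations (9) and (10) are straightforward to evaluate for given parameter values. The sketch below does so under stated assumptions: the drift-survival probability is taken to be π(s) = 4s (capped at 1), which is not defined in the text above, and the expressions for ψ and χ follow one reading of equation (9), with a factor 1/s arising from the integral for R. The clamp of ψ at zero is an added safeguard so that the Poisson parameter is never negative. Function names are illustrative only.

```python
import math


def pi_drift(s):
    """ASSUMED probability that a mutation of advantage s survives drift."""
    return min(1.0, 4.0 * s)


def chi(s, alpha, n, f):
    """Auxiliary quantity in equation (9), as read here."""
    a_s = alpha * s
    return 1.0 + (1.0 / f - 1.0) * (a_s * n) ** (a_s / (a_s + 1.0)) / (a_s + 1.0)


def psi(s, alpha, mu, n, f):
    """Expected number of superior mutations that keep y below frequency f."""
    value = (
        (mu * n / s)
        * math.log(n / chi(s, alpha, n, f))
        * math.exp(-alpha * s)
        * pi_drift(s + 1.0 / alpha)
    )
    return max(value, 0.0)  # a Poisson parameter cannot be negative


def prob_reach(s, alpha, mu, n, f):
    """Equation (10): probability of reaching a maximum frequency of at least f."""
    return pi_drift(s) * math.exp(-psi(s, alpha, mu, n, f))


ALPHA, MU, N, S = 35.0, 2.0e-9, 3.3e7, 0.1
p_polymorphic = prob_reach(S, ALPHA, MU, N, 0.01)
p_majority = prob_reach(S, ALPHA, MU, N, 0.5)
print("P(reach f >= 0.01):", p_polymorphic)
print("P(reach f >= 0.50):", p_majority)
```

As expected, reaching majority status is less probable than reaching 1% frequency, because a higher target frequency leaves a longer window in which a superior mutation can intervene.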
A technical difficulty with equation (10) is that there is no guarantee that $\psi(s,\alpha,\mu,N,f)$ is non-negative, whereas a fundamental assumption of the Poisson process is that the Poisson parameter be non-negative. To remedy this problem, we impose the condition $\psi(s,\alpha,\mu,N,f) \equiv \max\{\psi(s,\alpha,\mu,N,f), 0\}$. Otherwise, a negative Poisson parameter may arise if superior mutation $z$ must appear before original beneficial mutation $y$ to ensure that the latter attains maximum frequency $f$, i.e., $t_z$ is negative. In this case, the probability that a given beneficial mutation achieves a maximum frequency of at least $f$ is equal to one, because an assumption of our analysis is that the superior mutation $z$ does not appear before original mutation $y$. We have shown that this assumption does not introduce much error (see Assumptions of the models).

Probability of transiently polymorphic beneficial mutations

In this section, our objective is to determine with what probability one might expect a beneficial mutation to rise temporarily to polymorphic frequency. We define polymorphic frequency as any frequency greater than or equal to 0.01. In the Clonal interference and fixation section, we were only concerned with whether or not a beneficial mutation became fixed in a population, i.e., whether or not $f \ge \frac{N-1}{N}$. Now, we examine the probability that the frequency, $f$, of a beneficial mutation exceeds 0.01 yet never reaches $\frac{N-1}{N}$. This is the probability that a mutation will be transiently polymorphic.

Given that a beneficial mutation survives drift, the probability that it will achieve polymorphic frequency is $e^{-\psi(s,\alpha,\mu,N,0.01)}$. Given that the same mutation achieves polymorphic frequency, the probability that it does not reach fixation is computed as the probability that at least one superior mutation appears in the interval $(t_z, t_f)$. The expected number of superior mutations appearing in this interval is:

$$\gamma(s,\alpha,\mu,N,f) = \frac{\mu N}{s} \ln(\chi)\, e^{-\alpha s}\, \pi(s + 1/\alpha), \tag{11}$$

where $\chi$ is as defined in equation (9).
Therefore, given that a mutation with selective advantage, $s$, has achieved polymorphic frequency, the probability that it does not reach fixation is $1 - e^{-\gamma(s,\alpha,\mu,N,0.01)}$. The probability that a mutation will be transiently polymorphic is the product of (i) the probability that the mutation survives drift, (ii) the probability that the mutation achieves polymorphic frequency given (i), and (iii) the probability that the mutation does not reach fixation given (ii). Therefore, the probability that any arbitrarily chosen beneficial mutation transiently achieves polymorphic frequency is

$$\Pr\{\text{poly} \mid \alpha, \mu, N\} = \alpha \int_0^{\infty} \pi(s)\; e^{-\psi(s,\alpha,\mu,N,0.01) - \alpha s} \left(1 - e^{-\gamma(s,\alpha,\mu,N,0.01)}\right) ds. \qquad (12)$$

This equation is plotted in Figure 2.6. Given a certain population size, there is an intermediate value of the beneficial mutation rate at which the probability is greatest that an arbitrarily chosen beneficial mutation will transiently achieve polymorphic frequency. Likewise, given a certain beneficial mutation rate, there is an intermediate population size that maximizes the probability that an arbitrary beneficial mutation will be transiently polymorphic. This result seems reasonable, because an increased recruitment rate of beneficial mutations, $\mu N$, increases the probability that a superior mutation occurs before a given beneficial mutation can reach polymorphic frequency (i.e., increases clonal interference). By lowering $\mu N$, on the other hand, one reduces the probability that a superior mutation occurs later, hence increasing the probability that a beneficial mutation, which has already achieved polymorphic frequency, will go to fixation (i.e., is not transient). Given the parameters estimated previously for an evolving E. coli population ($\alpha = 35$, $\mu = 2.0 \times 10^{-9}$, $N = 3.3 \times 10^{7}$), the probability that an arbitrarily chosen beneficial mutation becomes transiently polymorphic is approximately 0.034. With $\mu N \approx 0.07$, a beneficial mutation would have occurred every 15 generations or so.
Of these, about 1 in 30 would become transiently polymorphic. Hence, one would expect about three transient polymorphisms ($f > 0.01$) in 1400 natural generations. This number is roughly comparable to the number of periodic selection events that were observed. This correspondence suggests that each beneficial mutation that went to fixation displaced not only its "parent" genotype but also a "sibling" genotype that had achieved some success. Surprisingly, these estimates do not rely heavily on the assumption that beneficial mutations are exponentially distributed. Calculations based on an alternative rectangular distribution show that the probability that a beneficial mutation transiently achieves polymorphic frequency is approximately 0.05. Thus, by assuming a rectangular distribution, one might expect about four transient polymorphisms in 1400 natural generations. The fact that assuming such very different distributions results in less than a two-fold difference in estimates suggests that these results are fairly robust. The section Assumptions of the models gives a more complete discussion of this test of robustness.

Probability of transiently common mutations: the leapfrog

In a slight variation of the previous section, we will now examine the probability that a beneficial mutation achieves a frequency of 0.5 but is not fixed. We devote a separate section to this special case because of the strange dynamics it would present to an observer of a population in which it occurred. In this case, a mutant Ab reaches majority status before being supplanted by a superior mutant, aB, where both mutants are derived directly from the same ancestor ab. At the genetic level, this appears as a "leapfrog" episode in which (i) Ab replaces ab as the most common genotype and thereafter aB replaces Ab as the most common genotype, even though (ii) aB is more closely related to ab than to Ab (Figure 2.7).
If one were to sample this population at times $t_1$, $t_2$, and $t_3$, as indicated in Figure 2.8, then one would observe that the sample from $t_3$ is more closely related at the genetic level to the sample taken at $t_1$ than to that taken at $t_2$. Following the derivation of equation (12), the probability that an arbitrarily chosen beneficial mutation transiently achieves a frequency of 0.5 or more is

$$\Pr\{\text{leapfrog} \mid \alpha, \mu, N\} = \alpha \int_0^{\infty} \pi(s)\; e^{-\psi(s,\alpha,\mu,N,0.5) - \alpha s} \left(1 - e^{-\gamma(s,\alpha,\mu,N,0.5)}\right) ds. \qquad (13)$$

Using the parameters previously estimated from an evolving E. coli population, and following the same logic as described at the end of the previous subsection, a beneficial mutation would occur every 15 generations or so. About one in every 55 of these mutations would be subject to the "leapfrog" effect, which should thus occur every 800 generations or so. Therefore, it is quite possible that some of the three periodic selection events observed during the experiment of 1400 natural generations were complicated by this effect. Whether empirical data would resolve the leapfrog as one or two periodic selection events would depend, in part, on how close in time the relevant genotypes became numerically dominant. If a leapfrog was resolved as two distinct periodic selection events, then the descendants after 1400 natural generations should differ from the founding ancestral genotype by fewer than the three beneficial mutations that would be expected under the presumption that each periodic selection event was caused by the sequential substitution of an additional mutation (Figure 2.8). Figure 2.9a shows a numerical simulation of the E. coli populations using the empirically estimated parameters. The resulting trajectory for mean fitness, shown in Figure 2.9b, illustrates that a single leapfrog episode may indeed give the appearance of two periodic selection events. But rather than implying the successive fixation of two beneficial mutations, only a single substitution has actually occurred.
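The back-of-envelope arithmetic behind the counts quoted in the two preceding sections can be retraced in a few lines. The sketch below uses only numbers stated in the text (the estimated mutation rate and population size, the 1400-generation window, and the probabilities 0.034 and 1 in 55); the variable and function names are illustrative, not taken from the source.

```python
# Parameter estimates quoted in the text for the evolving E. coli population.
MU = 2.0e-9         # beneficial mutation rate per replication
N = 3.3e7           # population size
GENERATIONS = 1400  # length of the observation window, in natural generations

# Recruitment rate of beneficial mutations per generation (about 0.07).
recruitment_rate = MU * N

# Mean waiting time between beneficial mutations (about 15 generations).
waiting_time = 1.0 / recruitment_rate


def expected_events(prob_per_mutation, generations=GENERATIONS):
    """Expected number of beneficial mutations in the window that meet a
    criterion satisfied with probability prob_per_mutation."""
    return prob_per_mutation * recruitment_rate * generations


# Transient polymorphisms (f > 0.01): probability of about 0.034 per mutation.
transient_polymorphisms = expected_events(0.034)

# Leapfrog episodes: about 1 in 55 mutations, so one every 55 waiting times.
leapfrog_interval = 55 * waiting_time

if __name__ == "__main__":
    print(f"mu*N = {recruitment_rate:.3f}")
    print(f"waiting time = {waiting_time:.1f} generations")
    print(f"expected transient polymorphisms = {transient_polymorphisms:.2f}")
    print(f"generations between leapfrogs = {leapfrog_interval:.0f}")
```

The computed values agree with the rounded figures in the text: roughly three transient polymorphisms in 1400 generations, and a leapfrog episode every 800 generations or so.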
Discussion

Summary of results

Competition between clones that carry different beneficial mutations may be very important for the evolutionary dynamics of asexual populations. The prevalence of such "clonal interference" among beneficial mutations increases dramatically with population size and with mutation rate. The following points summarize some of the most salient consequences of clonal interference.

(1) The fixation probability of a given beneficial mutation is a decreasing function of both population size and mutation rate.

(2) Substitutions appear as discrete, rare events, no matter how frequently beneficial mutations arise. If a beneficial mutation is to overcome clonal interference and become fixed, then it must confer a substantial selective advantage. The advantage that is required for a reasonable probability of fixation is an increasing function of population size and mutation rate.

(3) The rate of fitness increase is an increasing function of both population size and mutation rate, but it is only weakly dependent on these parameters when their product is not small.

(4) Using observable trajectories for the mean fitness of evolving asexual populations, it is possible to estimate both the beneficial mutation rate and the distribution of beneficial mutational effects. We obtained such estimates for an evolving laboratory population of Escherichia coli.

(5) Beneficial mutations that become transiently abundant, but which do not go to fixation, may be quite common in asexual populations.

(6) Some of these transient polymorphisms may give rise to a "leapfrog" effect, in which the majority genotype at some point in time is less closely related to the immediately preceding majority genotype than to an earlier genotype. Parameter estimates obtained for the evolving laboratory population of E. coli are consistent with this effect being an important feature of asexual dynamical systems.
Assumptions of the models

The models presented here assume that the general form of the distribution of beneficial mutational effects is that of an exponential distribution. Kimura (1979) employs the more general gamma distribution to describe the distribution of deleterious mutational effects. Elena et al. (1997) have shown that a compound gamma-rectangular distribution fits well to experimental data from transposon-induced mutations in E. coli. Intuitively, the exponential distribution seems a good choice for beneficial mutational effects, because it is reasonable to suppose that there are many more beneficial mutations of small effect than of large effect. Fisher (1930) reasoned that most mutations of large effect are deleterious as a geometrical consequence of the high dimensionality of fitness landscapes. He argued that the ratio of deleterious to beneficial mutations increases with mutational effect (i.e., phenotypic difference between mutant and non-mutant), because a large radius in phenotypic space is very likely to circumscribe potential improvements, whereas a small radius stands a better chance of being tangent to an improvement. That this effect increases with the dimensionality of the fitness landscape is an easily demonstrable fact of geometry. A convincing argument for the use of the exponential distribution in particular comes from extreme value theory (see Gillespie, 1991, p. 262). Suppose that $M$ fitness alleles are present in a population such that $W_{[1]} > W_{[2]} > W_{[3]} > \dots > W_{[M]}$ (where $W$ denotes fitness). If the population is in dynamic equilibrium, then the fittest of these $M$ alleles greatly outnumbers the other $M-1$ alleles, which are held at some low frequency by mutation-selection balance. A fitness mutation results in a genotype whose fitness is drawn at random from some unknown parent distribution. Now, imagine that a novel fitness mutation appears that is beneficial (i.e., fitter than the current fittest genotype).
If we denote the fitness of this mutation by $W_{[0]}$, then $W_{[0]} > W_{[1]}$ and the selection coefficient of this novel mutation is $s = \frac{W_{[0]}}{W_{[1]}} - 1$. Gillespie (1991) shows that this $s$ is exponentially distributed in the limit as $M \to \infty$, regardless of the shape of the unknown parent distribution. (The reason for this result has to do with the fact that $W_{[0]}$ and $W_{[1]}$ are the two largest fitnesses; they are extreme values of the parent distribution.) In other words, in the limit of infinite fitness alleles, the distribution of $s$ is necessarily exponential. To evaluate the sensitivity of our analysis to the assumption that selection coefficients are exponentially distributed, we replaced the exponential density with a rectangular density, i.e.,

$$p(s) = \begin{cases} 1/s_{\max}, & s \le s_{\max} \\ 0, & s > s_{\max} \end{cases}$$

See Appendix 2.4 for results of these derivations. Following the logic employed in Estimation of parameters, we have estimated the parameters $s_{\max}$ and $\mu$ to be 0.12 and $5 \times 10^{-10}$, respectively, for an evolving laboratory population of E. coli. Note that the beneficial mutation rate estimate, $\mu$, is of the same order of magnitude as was obtained assuming exponentially distributed selection coefficients. Figure 2.10 further shows that replacing the exponential with a rectangular density changes the resulting fixation probabilities only slightly. The slight discrepancy when $\mu N$ is very small, such that clonal interference is unimportant, arises because the average selection coefficient (and hence $4s$) is slightly higher for the rectangular than for the exponential distribution. The probabilities of transient polymorphisms (either $f > 0.01$ or $f > 0.5$) are consistently higher when a rectangular density is assumed, although the discrepancies are small. In view of these results, our analyses appear to be reasonably robust with respect to the form of the distribution of selection coefficients.
A second assumption of our analyses is that neither the beneficial mutation rate nor the distribution of selection coefficients changes over time. But in a constant environment, a population becomes better adapted with time, leaving progressively less room for further improvement. It is likely that a well adapted population has (i) a lower overall rate of beneficial mutation, (ii) a smaller average effect of beneficial mutations, or both. Consequently, $\mu = \mu(w)$ may be a decreasing function of fitness, while $\alpha = \alpha(w)$ may increase with fitness. These parameters are therefore constant only when $w$ is constant. This condition may be met in an environment that changes just fast enough to counter adaptation of a population.

A third assumption made in these models is that the progress of a given beneficial mutation is unaffected by the presence of inferior beneficial mutations. By definition, inferior beneficial mutations cannot themselves competitively exclude a given beneficial mutation. However, the selective advantage of a given beneficial mutation will be lower relative to these inferior beneficial mutations than relative to the ancestral genotype, and so inferior beneficial mutations may prolong the time that is required for fixation of a given beneficial mutation. As a consequence, there may be a longer interval in which a superior beneficial mutation could appear that would prevent fixation of the original beneficial mutation. To address this possible complication, the probabilistic models were made fully dynamic by considering all beneficial mutations since the most recent substitution. When the dust settled, the results were essentially unchanged from those that we have presented. For example, the fixation probability of an arbitrarily chosen beneficial mutation was changed by a factor of $\frac{\alpha}{\alpha + 1}$, which is inconsequential since $\alpha$ is generally large.
An assumption made in estimating the parameters $\alpha$ and $\mu$ from observed fitness trajectories is that the sudden jumps observed in these trajectories are, in fact, fixation events. Based on the results of The leapfrog, however, this assumption is questionable; if a leapfrog event were to occur, then it would give the appearance of two such fixation events (Figure 2.9). Thus, of the three observed jumps in fitness during 1400 natural generations (Lenski & Travisano, 1994), for example, perhaps only two were actual fixations and one was the result of a leapfrog episode. If this were the case, then our estimates of $\alpha$ and $\mu$ would be incorrect. To evaluate the degree to which these estimates may be in error, we changed the assumption that observed fitness jumps represent fixations and assumed instead that these jumps represent beneficial mutations that achieved a frequency of $f > 0.5$. To that end, we employed the derivations of Clonal interference - a general model. This change of assumption did not appreciably affect the estimates ($\alpha = 29$, $\mu = 1.6 \times 10^{-9}$), indicating that our initial assumption, at least in this case, did not introduce much error.

Model validation by simulation

A general result of the section, Clonal interference and fixation, is that trajectories of population mean fitness are characteristically punctuated, with sudden jumps followed by long periods of stasis, regardless of the mutation recruitment rate, $\mu N$. To assess this general prediction qualitatively, we simulated the occurrence of, and competition among, many different beneficial mutations whose selection coefficients were drawn at random from an exponential distribution. Figure 2.9 demonstrates that, despite fierce competition among numerous beneficial mutations ($\mu N = 0.1$), the population mean fitness is not appreciably affected until a fitness variant achieves high frequency.
(These results also lend support to the assumption that mutations inferior to the currently dominant variant play a negligible role in clonal interference.) To test the models quantitatively, we ran repeated simulations. The probabilistic predictions for (i) the probability of fixation, (ii) the expected fitness increase conferred by a substitution, (iii) the expected substitution rate, and (iv) the probability of transiently achieving polymorphic frequency, all agreed well with a large number of fully stochastic simulations. We emphasize that these simulations allow for the more realistic situation in which a mutant may acquire further beneficial mutations at any time after its appearance.

Inclusion of the double mutant

To this point, we have emphasized competition between three genotypes, the progenitor (ab) and two mutants that carry different beneficial mutations (Ab and aB). However, a fourth genotype should eventually appear that has both beneficial mutations (AB). If the effects of the two beneficial mutations on fitness are additive, then the double mutant will eventually take over the population. A full treatment of the dynamics involving this fourth genotype is beyond the scope of this paper. For now, we address only one particular issue. If a leapfrog event is to be manifest, then genotypes Ab and aB must each achieve majority status before AB does; otherwise, the dynamics will appear as a sequential substitution (Figure 2.7). The probability of occurrence of the leapfrog must, therefore, incorporate the probability that sequential substitution does not occur. We have conservatively estimated this n . . n(s)u ‘ . probability as exp - e f'th) dt ; this factor was 0 incorporated into the integrand of equation (13) and found to have a negligible effect (probabilities were reduced by no more than five percent for a wide range of parameters). Therefore, we neglected this factor in our earlier developments in order to keep things as simple as possible.
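The three-genotype competition that underlies the leapfrog (ab, Ab, and aB, ignoring the double mutant AB as in the main development) can be illustrated with a small deterministic sketch. It is not the model of equations (25) and (26): it assumes discrete generations with frequency-weighted selection, uses the selection coefficients quoted in the caption of Figure 2.8 ($s_y = 0.09$, $s_z = 0.13$) and the estimated population size, and introduces aB at a hypothetical generation `t_intro` chosen for illustration.

```python
def leapfrog_majorities(s_y=0.09, s_z=0.13, N=3.3e7, t_intro=100, generations=600):
    """Return the sequence of most-common genotypes over time.

    Assumptions (illustrative, not from the source): discrete generations,
    deterministic selection, each mutant starting at frequency 1/N, and the
    double mutant AB ignored.
    """
    fitness = {"ab": 1.0, "Ab": 1.0 + s_y, "aB": 1.0 + s_z}
    # Ab arises at generation 0 as a single individual.
    freq = {"ab": 1.0 - 1.0 / N, "Ab": 1.0 / N, "aB": 0.0}
    majorities = []
    for t in range(generations):
        if t == t_intro:
            # aB arises from the ancestor ab as a single individual.
            freq["ab"] -= 1.0 / N
            freq["aB"] += 1.0 / N
        leader = max(freq, key=freq.get)
        if not majorities or majorities[-1] != leader:
            majorities.append(leader)
        # One generation of selection: reweight by fitness and renormalize.
        mean_w = sum(freq[g] * fitness[g] for g in freq)
        freq = {g: freq[g] * fitness[g] / mean_w for g in freq}
    return majorities


if __name__ == "__main__":
    print(leapfrog_majorities())           # aB arrives late: Ab leads for a while
    print(leapfrog_majorities(t_intro=0))  # aB arrives at once: Ab never leads
```

When aB arises late enough, Ab becomes the most common genotype before aB overtakes it, which is the leapfrog pattern; when aB arises at the same time as Ab, it excludes Ab outright and no leapfrog is seen.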
Implications for the evolution of reproductive strategies

Muller (1964) briefly alludes to the concept of clonal interference while making a case for the evolutionary advantage of sex. Muller argued that adaptive evolution of asexual populations is inefficient, because the fraction of beneficial mutations that are lost due to competition with alternative beneficial mutations may be substantial in a large population. Recombination would remedy such inefficiency, which suggested an evolutionary advantage for sex. This argument was restated and explored analytically by Crow & Kimura (1965), to which Maynard Smith (1968) responded by pointing out that Muller's original argument relied on the erroneous assumption that mutations were unique events, such that each could occur only once. In a counter-example, Maynard Smith demonstrated that models of sexual and asexual systems yielded the same rate of adaptive evolution when mutations were treated as recurrent events. For a nice summary of this controversy and further developments on this topic, see Felsenstein (1974, 1988). Much recent work has focused on how fixation probabilities are affected by variance in fitness at background loci and the degree of linkage to these loci (Barton, 1993, 1995; Keightley, 1991; Pamilo et al., 1987; Peck et al., 1997). Barton (1994) derived the conditional probability of fixation of a beneficial mutation given that a single substitution occurs or that substitutions occur at a given rate. He explored the dynamics of this probability under varying degrees of recombination. We believe that the models presented here may contribute to understanding the evolution of sex by giving an explicit expression for the unconditional probability of fixation of a beneficial mutation, in the limit as recombination rate goes to zero. Another part of a population's reproductive strategy, namely its mutation rate, may also be affected by the clonal interference phenomenon.
Much work has been done to determine whether and how natural selection may adjust mutation rates. A high mutation rate may confer an evolutionary advantage, for example, if it increases the rate of substitution of beneficial alleles. This advantage, however, must overcome the disadvantage of a parallel increase in deleterious mutations. Leigh (1970) demonstrated theoretically that elevated mutation rates can evolve in asexual populations that experience oscillating selection on some locus. Since then, much work has supported the notion that evolutionary elevation of mutation rates is at least possible, and perhaps likely, in changing environments (Gillespie, 1981; Ishii et al., 1989). In light of the developments presented in this paper, however, it seems that the strength of selection to elevate mutation rates (above some minimal value set by the physiological cost of fidelity) may be smaller than the established theory would indicate, especially when populations are large. As we have shown, an increase in mutation rate hardly changes the rate of adaptation of large populations because of clonal interference (Figure 2.5). To gain an appreciable increase in the rate of adaptation for a large population would, therefore, require a disproportionate increase in mutation rate. Such a large increase in mutation rate, however, would undoubtedly have a detrimental effect due to the greatly increased production of deleterious alleles. Consequently, it seems reasonable to suggest that selection for elevated mutation rates should be weak in large populations.

Implications for the general nature of adaptive evolution

Three especially interesting consequences of the results obtained here concern the general nature of adaptive evolution in asexual populations.
The first is that one should expect the trajectory for mean fitness of any asexual population to be punctuated with short bursts of rapid, significant increase followed by long periods of stasis, regardless of the size of the population or its mutation rate. This result contradicts the intuitive, but erroneous, view that discrete bouts of periodic selection (in which individual mutations sweep to fixation) should overlap, thus giving the appearance of continuity, when the mutation recruitment rate, $\mu N$, is sufficiently high. A second intriguing implication is that there exists a "speed limit" on the rate of adaptive evolution in asexual populations. As shown in Figure 2.5, the rate of improvement in a population's mean fitness decelerates with increasing $\mu$ and $N$. This result reflects intensified clonal interference as well as the longer time required for selection to proceed to fixation in large populations. A third important consequence is closely related to the second: the rate of adaptive evolution is clearly not always limited by mutation rate. In fact, because of clonal interference, the rate of adaptive evolution is only weakly dependent on mutation rate and population size unless $\mu N$ is small ($\mu N < 0.1$ for $\alpha = 35$).

Evidence for transiently common beneficial mutations in microbial populations

One of the intriguing consequences of asexuality is that beneficial mutations may become quite common temporarily but eventually go extinct as superior mutations arise (Figures 2.8 & 2.9). In principle, it should be possible to find evidence for this effect in natural populations of asexual organisms. A complication arises, however, in that a beneficial mutation may also become transiently common, but then disappear, if the environment changes so that the mutation is no longer favored. For example, Holmes et al.
(1992) followed the molecular evolution of a population of the human immunodeficiency virus (HIV) within a single infected patient, and their data show several instances of transiently common mutations. In particular, they monitored changes in the RNA sequence encoding the third hypervariable loop of gp120 (V3) throughout the asymptomatic phase of infection (7 years) of a single hemophiliac patient. All 12 viral sequences that were obtained immediately after infection were identical, and these were denoted as sequence A. In year three, a set of related sequences, denoted C1 through C5, were numerically dominant (11 of 15), but in year seven they and their descendants comprised only a small fraction of the population (2 of 13). By contrast, sequence E1 was present only as a small minority (1 of 15) after three years, but after seven years it and its descendants were numerically dominant (10 of 13). Within the E1 clade, a subset of derived sequences, denoted E2 through E5, became numerically dominant after five years (19 of 23). However, neither they nor their descendants were represented by even a single sequence in years six (0 of 15) and seven (0 of 13). Thus, it is evident that certain mutations became transiently common, only to decline subsequently in frequency. Moreover, the data show the "leapfrog" effect in which the majority type at one point in time is not descended from the majority type that immediately preceded it. Holmes et al. (1992, p. 4838) recognized the importance of these dynamics when they said that changes in the viral population, "instead of being the sequential replacement of one antigenically distinct variant by another, may involve a complex interaction between the different, and competing, evolutionary lineages present in the plasma." The one caveat to our interpretation of these data, however, is that the host immune system responds to the viruses, and so the HIV population is evolving in a changing environment.
Thus, for example, sequences E2 through E5 may not have been driven extinct solely by an intrinsically superior mutation, but instead they may have become selectively disadvantaged after they were targeted by the immune system. This scenario is supported by the fact that the V3 loop is a principal target of the immune system. But even with the added complication of the changing immune environment, asexuality can have important dynamical consequences for HIV and other pathogens. In particular, the "leapfrog" effect necessarily increases the genetic distance between successive majority types (Figures 2.7 & 2.8), and so it may actually facilitate a pathogen's evasion of the host immune response. An unambiguous demonstration of the "leapfrog" effect will require data from an asexual organism living in a constant environment. To that end, we are now using molecular methods to determine the phylogeny of clones sampled over time from an experimental population of E. coli, as it evolved for thousands of generations in a defined laboratory environment (Lenski et al., 1991; Lenski & Travisano, 1994; Elena et al., 1996). If the "leapfrog" phenomenon is important, then we expect to see a clade become numerically dominant, only to be driven extinct by the emergence of another, even more successful clade that is derived from its ancestral base (rather than from the formerly dominant clade).

A suggestion for further research

Clonal interference is not the only dynamic that inhibits the progression of beneficial mutations to fixation in an asexual population. A similar inhibition may be caused by Muller's ratchet (Muller, 1964; Haigh, 1978), in which deleterious mutations will tend to accumulate in small asexual populations. As shown by Manning & Thompson (1984) and by Peck (1994), the fate of a beneficial mutation is determined as much by the selective disadvantage of any deleterious mutations with which it is linked as by its own selective advantage.
In asexual organisms, the entire genome in which a beneficial mutation occurs will remain linked to that mutation and will hitchhike to fixation if that is the fate of the mutation. Therefore, a beneficial mutation that spreads to fixation presents a severe population bottleneck in which only a single genome is sampled, thus exacerbating the effect of Muller's ratchet. Consequently, a beneficial mutation may only be considered advantageous if its benefit more than compensates for the drastic reduction in effective population size caused by its fixation and the associated acceleration of Muller's ratchet. Haigh (1978) modeled the effect of deleterious mutations on population fitness; Manning & Thompson (1984) and Peck (1994) modeled the effect of deleterious mutations on the fate of beneficial mutations; and the models presented here provide a quantitative account of how beneficial mutations affect one another. A logical next step would be to integrate these models into a single theoretical treatment of mutation in asexual populations. Such a synthesis would be a valuable contribution toward a general understanding of evolutionary dynamics in asexual systems.

[Figure 2.1: horizontal axis, selection coefficient, $s$.]

Figure 2.1. The probability that a given beneficial mutation with selection coefficient, $s$, achieves fixation. Equation (3) with $\alpha = 35$, $\mu = 2.0 \times 10^{-9}$, and $N = 3.3 \times 10^{7}$.

[Figure 2.2: curves for $N = 10^5$ through $N = 10^9$; horizontal axis, $\log_{10}$ (beneficial mutation rate, $\mu$).]

Figure 2.2. The probability of fixation of an arbitrarily chosen beneficial mutation is a decreasing function of both beneficial mutation rate, $\mu$, and population size, $N$ (Equation (4)).
The exponential parameter, $\alpha$, appears only to shift the curves down vertically without changing their shape.

[Figure 2.3: curves for $N = 10^5$ and $N = 10^9$; horizontal axis, $\log_{10}$ (beneficial mutation rate, $\mu$).]

Figure 2.3. The substitution rate of a population is an increasing function of its beneficial mutation rate (Equation (5)). When the population size is large, however, a large change in beneficial mutation rate hardly affects the substitution rate.

[Figure 2.4: vertical axis, expected selection coefficient.]

Figure 2.4. The expected selection coefficient of substitutions is an increasing function of population size, $N$; Equation (6) with $\alpha = 35$ and $\mu = 2.0 \times 10^{-9}$. The solid line indicates the expected value; the dashed lines indicate numerically determined 95% predictive confidence limits.

[Figure 2.5: vertical axis, $\log_{10}\, dw/dt$.]

Figure 2.5. The rate of fitness improvement, $\frac{dw}{dt}$, is an increasing function of both population size, $N$, and beneficial mutation rate, $\mu$. Equation (7) with $\alpha = 35$. This rate of increase decelerates substantially, however, due to increased clonal interference when $\mu N > 0.1$ (i.e., when more than one beneficial mutation is produced on average every ten generations).

[Figure 2.6: curves for several beneficial mutation rates; horizontal axis, $\log_{10} N$.]

Figure 2.6. The probability that an arbitrarily chosen beneficial mutation transiently achieves polymorphic frequency is plotted against log population size for various beneficial mutation rates. Equation (12) with $\alpha = 35$. For a given beneficial mutation rate, there is an intermediate population size at which the probability of achieving polymorphic frequency is a maximum.

[Figure 2.7: sequential substitution, ab → Ab → AB; leapfrog, ab → Ab and ab → aB → AB; horizontal axis, time.]

Figure 2.7. The leapfrog phenomenon illustrated phylogenetically.
The phylogeny of majority genotypes is compared with that of sequential substitution.

[Figure 2.8: trajectories of genotypes ab, Ab, and aB on a logarithmic vertical axis; horizontal axis, time (generations), with sampling times $t_1$, $t_2$, and $t_3$ marked.]

Figure 2.8. The leapfrog phenomenon illustrated dynamically. Genotype ab is displaced by mutant Ab, which is later displaced by alternative mutant aB. Equations (25) and (26) with $s_y = 0.09$, $s_z = 0.13$. Note that genotypes sampled at time $t_3$ are more closely related to those sampled at $t_1$ than to those sampled at $t_2$.

[Figure 2.9: panel (a), vertical axis, frequency (logarithmic); panel (b), vertical axis, mean relative fitness; horizontal axis, time (generations).]

Figure 2.9. (a) A simulation of competition among numerous beneficial mutations. The heavy lines represent genotypes that achieve majority status. Note that a leapfrog event has occurred in this particular simulation. Selection coefficients were drawn at random from an exponential distribution. Parameters used are $\alpha = 35$, $\mu = 2.0 \times 10^{-9}$, $N = 3.3 \times 10^{7}$. (b) The mean fitness trajectory of the population simulated in panel (a). Note that the leapfrog phenomenon gives the appearance of two distinct periodic selection events.

[Figure 2.10: vertical axis, $\log_{10} \Pr\{\text{fix}\}$; horizontal axis, $\log_{10}$ beneficial mutation rate.]

Figure 2.10. The probability of fixation of an arbitrarily chosen beneficial mutation is plotted against the beneficial mutation rate, $\mu$, for various population sizes. The solid lines indicate probabilities assuming an exponential distribution of beneficial mutational effects (Eq. (4) with $\alpha = 35$, as estimated from the E. coli populations). The dashed lines indicate probabilities which assume a rectangular distribution of beneficial mutational effects (Eq. (33) with $s_{\max} = 0.12$, as estimated from the E. coli populations). The discrepancies resulting from the different distributional assumptions are small.
Chapter 3

THE ORDER OF FIXATION OF BENEFICIAL MUTATIONS IN ASEXUAL POPULATIONS¹

Abstract

In a constant environment, an organism's opportunities for further improvement are slowly depleted. To model adaptive evolution in such an environment, one should consider the number of available beneficial mutations to be finite. Given a finite set of beneficial mutations, the course of adaptive evolution is determined by both the timing and the ordering of fixations. In large, asexual populations, the ordering of fixations is affected by competition among beneficial mutations. Taking into account such competition, we derive the probability of any specified fixation ordering. The trajectories of fitness expectation and variance among replicate populations are then derived, given (i) a particular set of beneficial mutations, or given (ii) only a common distribution of mutational effects. Using fitness data from evolving E. coli populations, we solve the inverse problem to determine (i) the number of available beneficial mutations, (ii) the distribution of beneficial mutational effects, and (iii) the beneficial mutation rate.

¹ This chapter is written in the format of a paper. The "we" in this chapter refers to P.J. Gerrish, R.E. Lenski, and V. Mandrekar.

Introduction

In a continually changing environment, an organism's potential adaptations are never depleted. To model the adaptation of an organism in a changing environment, therefore, it is reasonable to assume the number of available beneficial mutations to be infinite (Gerrish & Lenski, 1998). When an organism's environment is constant, however, the number of possible genetic improvements is slowly depleted as the organism adapts. To model the adaptation of an organism in a constant environment, therefore, the number of available beneficial mutations must be considered finite.
Given that a finite set of beneficial mutations is available to an organism, all available beneficial mutations are eventually fixed in the population. Whenever one of these beneficial mutations is fixed, the population fitness increases. Because each beneficial mutation is unique, each confers a different increase in population fitness. Thus, the adaptive dynamics of a population are determined as much by the order in which the available beneficial mutations are fixed as by the timing of the fixations.

If there were only two different mutations that would increase the fitness of a particular asexual organism in some environment, and if one of these beneficial mutations occurred in one individual while the other occurred in another individual, then the progeny of the two individuals would compete. The lineage carrying the superior mutation would competitively exclude both the wildtype and the inferior beneficial mutation (Muller, 1932; Muller, 1964; Pamilo et al., 1987). Eventually, the inferior mutation would appear on the background of the superior mutation and the double mutant would be fixed. Let there be two beneficial mutations, A and B, and let a and b denote the corresponding wildtype alleles. Let A be the fitter of the two alleles, i.e., $w_A > w_B$. If genotypes Ab and aB are both present in the population, then aB will be excluded, and the order of fixation of the genotypes will be ab → Ab → AB.

In small populations, the time interval between occurrence of beneficial mutations is typically long enough to ensure that both alternative beneficial mutations are not simultaneously present in the population. Thus, the two possible fixation orderings of the beneficial mutations (ab → aB → AB, and ab → Ab → AB) are approximately equiprobable in small populations, their probabilities depending primarily on the relative rates of appearance and survival of A and B.
In large populations, however, the time interval between occurrence of beneficial mutations is short, resulting in a high probability that alternative beneficial mutations are simultaneously present in the population. The simultaneous presence of the two alternative beneficial mutations promotes the ordering ab → Ab → AB. In general, large populations (and/or high mutation rates) make probable the simultaneous presence of alternative beneficial mutations, which in turn promotes the ordering of fixations from largest to smallest effect mutations.

In what follows, we derive the probabilities of different fixation orderings. Because each ordering of fixations corresponds to a particular fitness trajectory, each trajectory is therefore assigned a probability, from which the aggregate expected trajectory may be computed. We first treat the case in which a set of beneficial mutations is given, such that the set of selection coefficients is known. Then we demonstrate how the number of unknown parameters may be reduced either by extracting the most probable set of selection coefficients from a parent distribution or by taking the limit as selection coefficients converge. Finally, we treat the fully probabilistic case in which, instead of assuming that different populations fix the same set of beneficial mutations, we assume only a distribution for beneficial mutational effects. In the discussion section, we employ these developments to estimate parameters of the model for evolving populations of Escherichia coli by regressing their fitness trajectories; parameters to be estimated are: (i) the number of available beneficial mutations, (ii) the distribution of beneficial mutational effects, and (iii) the beneficial mutation rate.

Probabilities of fixation orderings

Order of fixation of two beneficial mutations

Let subscripts S and I refer respectively to the superior and inferior of two available beneficial mutations.
Suppose the inferior mutation appears and survives drift before the superior mutation does so. Then the inferior mutation will be fixed before the superior mutation only if the superior mutation does not appear and survive drift before the inferior mutation is fixed. Let $t_I$ denote the time of appearance of an inferior mutation which subsequently survives drift. Then the total time before fixation of the inferior mutation is $t_I + \frac{\ln N}{s_I}$ (see Appendix 3.1), where $s_I$ is the selective advantage of the inferior mutation over the wildtype, and $N$ is population number (assumed constant). Thus, the expected number of times the superior mutation occurs and survives drift before fixation of the inferior mutation is $r_S\left(t_I + \frac{\ln N}{s_I}\right)$, where $r_S = \mu_S \pi(s_S) N$ is the rate at which the superior mutation appears and subsequently survives drift; $\mu_S$ is the rate of mutation to the superior mutation, $s_S$ is the selective advantage of the superior mutation over the wildtype, and $\pi(u)$ is a linear function of $u$ giving the probability that a beneficial mutation with selective advantage $u$ survives the effects of drift (Haldane, 1927; Otto & Whitlock, 1997; Gerrish & Lenski, 1998). Given $t_I$, the probability that inferior mutation I is fixed before superior mutation S appears and survives drift is the zero-class of the Poisson distribution, $\exp\left\{-r_S\left(t_I + \frac{\ln N}{s_I}\right)\right\}$. Let Pr{I;S} denote the probability that I is fixed before S. (This is unconventional notation and should not be confused with a conventional notation in which S would denote a condition.) If the probability density function (p.d.f.) of $t_I$ is denoted by $g(t_I)$, then the probability that I is fixed before S is given by:

$$\Pr\{I;S\} = \int_0^\infty \exp\left\{-r_S\left(t_I + \frac{\ln N}{s_I}\right)\right\} g(t_I)\, dt_I . \tag{14}$$

The rate at which the inferior mutation appears and subsequently survives drift is $r_I = \mu_I \pi(s_I) N$. This rate is time independent. Therefore, the waiting time p.d.f.
for such an event is the homogeneous exponential: $g(t_I) = r_I e^{-r_I t_I}$. Upon integration, equation (14) becomes

$$\Pr\{I;S\} = \left[1 + \frac{\mu_S \pi(s_S)}{\mu_I \pi(s_I)}\right]^{-1} \exp\left\{-\mu_S \pi(s_S)\, N\, \frac{\ln N}{s_I}\right\} . \tag{15}$$

For later convenience, let $\mu$ denote the overall beneficial mutation rate, such that $\mu = \mu_I + \mu_S$. In some of the subsequent figures, we assume individual mutation rates to be equal, implying in this case $\mu_I = \mu_S = \mu/2$.

We now derive the probability that S is fixed before I. The superior mutation will be fixed first only if its appearance and subsequent survival precede the fixation of the inferior mutation. Let $t_S$ denote the time of occurrence of a superior mutation which subsequently survives drift. The inferior mutation cannot be fixed in the interval $(0, t_S)$ if the superior mutation is to precede. The time required for fixation of the inferior mutation is $\frac{\ln N}{s_I}$. Therefore, the superior mutation will be fixed first only if an inferior mutation does not appear and survive drift before time $\left(t_S - \frac{\ln N}{s_I}\right)^{+}$, where the notation $(u)^{+}$ is defined here as $\max(u, 0)$. The expected number of inferior mutations appearing in this interval is $r_I\left(t_S - \frac{\ln N}{s_I}\right)^{+}$. Therefore, the unconditional probability that superior mutation S is fixed before inferior mutation I is given by

$$\Pr\{S;I\} = \int_0^\infty \exp\left\{-r_I\left(t_S - \frac{\ln N}{s_I}\right)^{+}\right\} g(t_S)\, dt_S , \tag{16}$$

where $g(t_S) = r_S e^{-r_S t_S}$, and $r_S = \mu_S \pi(s_S) N$. As expected, the probability that the superior mutation is fixed first is simply the complement of the probability that the inferior is fixed first: $\Pr\{S;I\} = 1 - \Pr\{I;S\}$.

Note that $\lim_{N\to\infty} \Pr\{S;I\} = 1$, confirming our claim that large population number promotes the ordering of fixation from largest to smallest effect mutations (Figure 3.1). Moreover,

$$\lim_{N\to 1} \Pr\{S;I\} = \frac{\mu_S \pi(s_S)}{\mu_I \pi(s_I) + \mu_S \pi(s_S)} ,$$

confirming our claim that the ordering of fixations in small populations is determined primarily by the relative rates of appearance and survival of beneficial mutations.
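As a numerical cross-check of equations (14) to (16), the following sketch evaluates the closed form (15) and compares it with a direct midpoint-rule integration of (16). It assumes a drift-survival function of the form $\pi(s) = Ks$ with $K = 2$ (Haldane's approximation; the value of $K$ is an assumption made here), and the function names and parameter values are illustrative only.

```python
import math

def rates(mu_I, mu_S, s_I, s_S, N, K=2.0):
    """Rates r_I, r_S at which each mutation appears and survives drift.

    Assumes pi(s) = K*s; K = 2 is an assumed value, not taken from the text.
    """
    return mu_I * K * s_I * N, mu_S * K * s_S * N

def pr_inferior_first(mu_I, mu_S, s_I, s_S, N, K=2.0):
    """Closed form of equation (15): Pr{I;S}."""
    r_I, r_S = rates(mu_I, mu_S, s_I, s_S, N, K)
    return r_I / (r_I + r_S) * math.exp(-r_S * math.log(N) / s_I)

def pr_superior_first_numeric(mu_I, mu_S, s_I, s_S, N, K=2.0, steps=20000):
    """Midpoint-rule integration of equation (16): Pr{S;I}."""
    r_I, r_S = rates(mu_I, mu_S, s_I, s_S, N, K)
    t_fix = math.log(N) / s_I                 # fixation time of the inferior mutation
    upper = t_fix + 40.0 / (r_I + r_S)        # beyond this the integrand is negligible
    h = upper / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        survive = math.exp(-r_I * max(t - t_fix, 0.0))   # no inferior mutation in time
        total += survive * r_S * math.exp(-r_S * t)      # times the p.d.f. g(t_S)
    return total * h

# Illustrative (hypothetical) parameter values.
p_IS = pr_inferior_first(1e-9, 1e-9, 0.05, 0.10, 1e6)
p_SI = pr_superior_first_numeric(1e-9, 1e-9, 0.05, 0.10, 1e6)
print(p_IS, p_SI, p_IS + p_SI)
```

Under these assumptions the two probabilities should sum to one up to quadrature error, which is the complement relation stated in the text.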
One limit of special interest to the empiricist is the limit in which the two selection coefficients converge. This limit is of interest because it is often the case that little is known about the selection coefficients other than the relation $s_S > s_I$. And it is conveniently the case that selection coefficients cancel out in this limit, rendering the probability independent of either selection coefficient. The limit as selection coefficients converge yields a maximum value for Pr{I;S}. More precisely, define $\varepsilon$ as the positive constant which satisfies $s_S = s_I + \varepsilon$. Then, given only that $s_S > s_I$, the maximum probability, Pr{I;S}, is found by taking the limit of (15) as $\varepsilon \to 0$.

A second limit of interest is that in which the mutation rate ratio, $\mu_I/\mu_S$, gets large. This limit is of interest because it again yields a maximum probability, Pr{I;S}, this time eliminating dependence on $\mu_I$. In the cases where selection coefficients as well as the mutation-rate ratio are unknown (see HIV example in Discussion), it may be convenient to examine the double limit,

$$\sup\{\Pr\{I;S\} \mid \mu_S, N\} = \lim_{\substack{\varepsilon \to 0 \\ \mu_I/\mu_S \to \infty}} \Pr\{I;S\} = e^{-\mu_S K N \ln N} , \tag{17}$$

where $K$ is the linear constant in the drift survival function, and we assume the intercept of this function to equal zero, i.e., $\pi(s) = Ks$. (This assumption is reasonable when $N$ is large, because $\pi(0) = 1/N \approx 0$; see Crow & Kimura, 1970.) Note that (17) does not depend on either selection coefficient and gives a maximum value for all cases in which $s_S > s_I$ and $\mu_I$ is unknown. A useful inequality may be derived by noting that the probability given by (17) is greater than 0.05 when

$$\mu_S K N \ln N < 3 . \tag{18}$$

The utility of this inequality is demonstrated in the discussion section.

Probability that a superior mutation is the first of many

In the case of many available beneficial mutations, the probability that the superior mutation will be fixed first is an extension of the previous developments.
Let the set of $M$ available beneficial mutations be ordered from largest to smallest effect such that $s_1 > s_2 > s_3 > \cdots > s_M$. We assume that these mutational effects are additive, so that the rank-order of selection coefficients does not depend on which mutations may have been fixed previously in the population. The probability that the superior mutation (mutation 1) is the first to be fixed is the probability that none of the $M-1$ inferior mutations are fixed first:

$$\Pr\{1;2,3,\ldots,M\} = \int_0^\infty \exp\left\{-\sum_{i=2}^{M} r_i\left(t_1 - \frac{\ln N}{s_i}\right)^{+}\right\} g(t_1)\, dt_1 , \tag{19}$$

where $g(t_1) = r_1 e^{-r_1 t_1}$, and $r_i = \mu_i \pi(s_i) N$.

Probability that fixation ordering is from largest to smallest s

Fixations appear in order from largest to smallest $s$ when the largest of the $M$ available beneficial mutations is fixed first, and the largest of the $M-1$ remaining mutations is fixed next, etc. Given the ordered set of available beneficial mutations as described above and extending this logic, the probability that the mutations will be fixed in order from largest to smallest selective advantage is, intuitively, $\Pr\{1;2,\ldots,M\} \cdot \Pr\{2;3,\ldots,M\} \cdots \Pr\{M-1;M\}$, or:

$$\prod_{i=1}^{M-1} \int_0^\infty \exp\left\{-\sum_{j=i+1}^{M} r_j\left(t_i - \frac{\ln N}{s_j}\right)^{+}\right\} g(t_i)\, dt_i , \tag{20}$$

where $g(t_i) = r_i e^{-r_i t_i}$, and $r_i = \mu_i \pi(s_i) N$. Above a certain population number or beneficial mutation rate, this probability of rank-ordered fixations is equal to one (Figure 3.2).

General solution for any ordering of fixations

The previous developments may be generalized to derive an expression for the probability of any specified ordering of fixations. Let $\zeta\{\cdot\}$ denote a permuting function. A permuting function specifies an ordering. For example, $\zeta\{3\} = 1$ would imply that the third mutation to be fixed is the mutation of largest selective advantage. Put differently, the function $\zeta\{\cdot\}$ maps place in the fixation ordering onto rank by selective advantage. (Note, there are $M!$ such functions, corresponding to all possible fixation orderings.)
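The integrals in equations (19) and (20) have piecewise-exponential integrands, so they can be evaluated exactly between successive breakpoints $\ln N / s_j$. The sketch below does this for the rank-ordered probability of equation (20). It is an illustration under the assumption $\pi(s) = Ks$ with $K = 2$ (an assumed value), not a reproduction of the analytical integration in Appendix 3.2; all function names are hypothetical.

```python
import math

def prob_fixed_before_inferiors(r_first, inferiors):
    """Exact value of the integral in equation (19) for one candidate mutation.

    r_first   -- rate at which the candidate appears and survives drift
    inferiors -- list of (r_j, tau_j) with tau_j = ln(N)/s_j for each inferior mutation
    """
    lam, c, a, total = r_first, 0.0, 0.0, 0.0
    for r_j, tau in sorted(inferiors, key=lambda p: p[1]):
        # On [a, tau] the integrand equals r_first * exp(c - lam * t).
        total += r_first * (math.exp(c - lam * a) - math.exp(c - lam * tau)) / lam
        lam += r_j          # from tau onward, mutation j also constrains t
        c += r_j * tau
        a = tau
    total += r_first * math.exp(c - lam * a) / lam   # tail from a to infinity
    return total

def prob_rank_ordered(mu, s, N, K=2.0):
    """Equation (20); s must be sorted from largest to smallest.

    Assumes pi(s) = K*s with K = 2 (an assumption, not a value from the text).
    """
    ln_n = math.log(N)
    r = [m * K * x * N for m, x in zip(mu, s)]
    p = 1.0
    for i in range(len(s) - 1):
        inferiors = [(r[j], ln_n / s[j]) for j in range(i + 1, len(s))]
        p *= prob_fixed_before_inferiors(r[i], inferiors)
    return p

# Illustrative (hypothetical) values: three mutations with equal mutation rates.
small = prob_rank_ordered([1e-9] * 3, [0.3, 0.2, 0.1], 10)
large = prob_rank_ordered([1e-9] * 3, [0.10, 0.07, 0.05], 1e10)
print(small, large)
```

With these inputs the small-population value is governed by the relative rates alone, while the large-population value approaches one, in line with the behaviour described for Figure 3.2.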
Define $I_j$ to be the set of all beneficial mutations both inferior to and subsequent to the $j$th mutation fixed, given the fixation ordering, $\zeta$. Define $S_j$ to be the set of all beneficial mutations both superior to and subsequent to the $j$th mutation fixed. Then, the probability of any specified fixation ordering, $\zeta$, is given by:

$$\phi(\zeta) = \prod_{i=1}^{M} \int_0^\infty \exp\left\{-\sum_{j \in I_i} r_j\left(t_i - \frac{\ln N}{s_j}\right)^{+} - \sum_{j \in S_i} r_j\left(t_i + \frac{\ln N}{s_{\zeta\{i\}}}\right)\right\} g(t_i)\, dt_i , \tag{21}$$

where $g(t_i) = r_{\zeta\{i\}}\, e^{-r_{\zeta\{i\}} t_i}$. Note that

$$\lim_{N\to\infty} \phi(\zeta) = \begin{cases} 1, & S_i = \emptyset \text{ for all } i, \\ 0, & \text{otherwise}, \end{cases}$$

again confirming our claim that large populations promote rank-ordered fixations. Moreover,

$$\lim_{N\to 1} \phi(\zeta) = \prod_{i=1}^{M} \mu_{\zeta\{i\}} \pi(s_{\zeta\{i\}}) \left[\sum_{j=i}^{M} \mu_{\zeta\{j\}} \pi(s_{\zeta\{j\}})\right]^{-1} ,$$

again confirming our claim that fixation ordering in small populations is determined primarily by the relative rates of appearance and survival of beneficial mutations. Indeed, if all such rates, $\mu_i \pi(s_i)$, are equal, then this probability reduces to

$$\lim_{N\to 1} \phi(\zeta) = \prod_{i=1}^{M} \frac{1}{M-i+1} = \frac{1}{M!} ,$$

which agrees with intuition. The analytical integration of (21) is given in Appendix 3.2.

Fitness trajectories I: parallel populations share a common set of available beneficial mutations

How repeatable is evolution? Albeit on a miniature scale, evolutionary biologists are beginning to address this question experimentally by studying parallel microbial populations evolving in identical environments (e.g., Lenski & Travisano, 1994). We address this question theoretically for the case of adaptive evolution of replicate asexual populations. In the previous section, we derived the probability of any specified ordering of fixations. In this section, we derive the fitness trajectories that correspond to the various fixation orderings. From this, we compute trajectories of fitness expectation and variance among populations.

In identical environments, it is reasonable to assume that genetically identical clones will have the same set of potential adaptive improvements.
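The small-population limit of the ordering probability can be checked by brute-force enumeration. In the sketch below, each mutation is assigned a relative rate proportional to $\mu_i \pi(s_i)$ (the common factor $N$ cancels), and the probability of an ordering is the product, over fixations, of the current mutation's rate divided by the summed rates of all mutations not yet fixed. The function name and the example rates are hypothetical.

```python
from itertools import permutations
from math import factorial, isclose

def small_population_ordering_prob(order, rate):
    """Small-population limit of phi(zeta).

    order -- tuple of mutation indices in the order they are fixed
    rate  -- rate[i] is proportional to mu_i * pi(s_i) for mutation i
    """
    p = 1.0
    for i in range(len(order)):
        remaining = sum(rate[j] for j in order[i:])   # mutations not yet fixed
        p *= rate[order[i]] / remaining
    return p

# Hypothetical relative rates for three mutations.
rate = [0.6, 0.3, 0.1]
probs = {o: small_population_ordering_prob(o, rate)
         for o in permutations(range(3))}

# With equal rates every ordering is equally likely: 1 / M!.
equal = [small_population_ordering_prob(o, [1.0, 1.0, 1.0])
         for o in permutations(range(3))]
print(probs[(0, 1, 2)], sum(probs.values()), equal[0])
```

The probabilities over all $M!$ orderings sum to one in this limit, and the equal-rate case recovers $1/M!$.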
Hence, in this section, we assume that parallel populations share a common set of available beneficial mutations.

Expectation and variance

Given some fixation ordering, $\zeta$, the time until fixation of the $i$th beneficial mutation is $T_i = \sum_{j=1}^{i}\left(t_j + \frac{\ln N}{s_{\zeta\{j\}}}\right)$. The time $t_j$ is a waiting time between the $(j-1)$th fixation and the appearance and survival of the $j$th mutation; this time plus the time required for fixation of the $j$th mutation, $\frac{\ln N}{s_{\zeta\{j\}}}$, equals the total time between the $(j-1)$th and $j$th fixations. The $t_j$ depend on fixation ordering, $\zeta$. If, for example, the $(j+1)$th mutation to be fixed is superior to the $j$th mutation, given fixation ordering $\zeta$, then the expected waiting time until occurrence and survival of the $j$th mutation is shortened because of the condition that it must be fixed before the appearance and survival of the superior $(j+1)$th mutation. Mathematically, given that the $j$th mutation appears and subsequently survives drift at time $t_j$, the probability that the $(j+1)$th mutation appears after time $t_j + \frac{\ln N}{s_{\zeta\{j\}}}$ is $\exp\left\{-r_{\zeta\{j+1\}}\left(t_j + \frac{\ln N}{s_{\zeta\{j\}}}\right)\right\}$. The condition that a subsequent fixation, say the $(j+2)$th fixation, fixes an inferior mutation will shorten the expected waiting time slightly, because the waiting time for the $j$th mutation must not be so long as to allow the prior fixation of the inferior $(j+2)$th mutation. Mathematically, the $(j+2)$th mutation appears and survives drift after time $\left(t_j - \frac{\ln N}{s_{\zeta\{j+2\}}}\right)^{+}$ with probability $\exp\left\{-r_{\zeta\{j+2\}}\left(t_j - \frac{\ln N}{s_{\zeta\{j+2\}}}\right)^{+}\right\}$. To generalize, let $S_j$ denote the set of all beneficial mutations both superior to and subsequent to the $j$th mutation fixed, and let $I_j$ denote the set of all beneficial mutations both inferior to and subsequent to the $j$th mutation fixed. Then, given fixation ordering $\zeta$, time $t_j$ has expectation

$$E(t_j \mid \zeta) = \int_0^\infty t\, g(t) \exp\left\{-\sum_{k \in S_j} r_k\left(t + \frac{\ln N}{s_{\zeta\{j\}}}\right) - \sum_{k \in I_j} r_k\left(t - \frac{\ln N}{s_k}\right)^{+}\right\} dt , \tag{22}$$

where $g(t) = r_{\zeta\{j\}}\, e^{-r_{\zeta\{j\}} t}$.
Thus, the accumulated expected time until fixation of the $i$th mutation is

$$E(T_i \mid \zeta) = \sum_{j=1}^{i}\left[E(t_j \mid \zeta) + \frac{\ln N}{s_{\zeta\{j\}}}\right] .$$

Population fitness after the $i$th fixation is defined as $w_i = 1 + \sum_{j=1}^{i} s_{\zeta\{j\}}$. Therefore, the function describing the trajectory of population fitness over time is

$$w(t \mid \zeta) = w_i , \tag{23}$$

where $i = \{k : t \in [E(T_k \mid \zeta), E(T_{k+1} \mid \zeta))\}$.

Given the set of beneficial mutations available to an organism, one can determine the expected fitness of the population at time $t$ by weighting each of the possible fitnesses at that time by its probability and summing over all possible trajectories. The set of possible trajectories has a one-to-one correspondence with the set of fixation orderings. To compute expected fitness at time $t$, one must multiply the population fitness (given a particular fixation ordering) by the probability of that fixation ordering and sum over all possible fixation orderings: [...]

[...]

$$p_{S_{[1]}}(s) = M\,[F(s)]^{M-1} f(s) , \tag{30}$$

where $F(s)$ and $f(s)$ are, respectively, the cumulative distribution function (c.d.f.) and p.d.f. for the selective advantage of beneficial mutations. If we assume, as we have in the previous derivations, that selective advantage $S$ of beneficial mutations is an exponentially distributed random variable, then (30) becomes

$$p_{S_{[1]}}(s) = M\,(1 - e^{-\alpha s})^{M-1}\, \alpha e^{-\alpha s} . \tag{31}$$

This gives the p.d.f. for $W_1$, since $W_1 = 1 + S_{[1]}$. Derivation of the p.d.f. for $W_i = 1 + \sum_{j=1}^{i} S_{[j]}$ does not yield the gamma distribution (as in the non-ordered region) because the $S_{[j]}$ are not independent. In this case, the p.d.f. is found by implementing a convenient feature of the exponential distribution, namely, that $S_{[i]}$ and the differences $S_{[j]} - S_{[j+1]}$, $j = 1, 2, \ldots, i-1$, are independent random variables. Thus, for computing the density of population fitness upon fixation of the $i$th beneficial mutation, the following theorem obtains.

Theorem. For $i = 1, 2, \ldots, M$, let $X_{[j]}$ be the order statistics drawn from the exponential parent density, $\alpha e^{-\alpha x}$, such that $X_{[1]} > X_{[2]} > \cdots > X_{[M]}$. Define $Z_i = \sum_{j=1}^{i} X_{[j]}$. Then
the probability density function of $Z_i$ is

$$\varphi_i(z) = \begin{cases} M\,(1 - e^{-\alpha z})^{M-1}\, \alpha e^{-\alpha z} , & i = 1, \\ [\ldots] , & i > 1, \end{cases}$$

where $\gamma = [\ldots]$.

This theorem is readily proved by induction. Thus, the density for fitness after the $i$th fixation is $\varphi_i(w - 1)$, with $\varphi_i$ as given by the theorem.

As in the non-ordered case, the time to fixation of the $i$th beneficial mutation is given by the sum, $T_i = \sum_{j=1}^{i}\left(T_{M_j} + T_{F_j}\right)$. And as before, the waiting time is decomposed into two parts, (i) $T_M = \sum_{j=1}^{i} T_{M_j}$ and (ii) $T_F = \sum_{j=1}^{i} T_{F_j}$. Part (i) is the same as in the non-ordered case. In part (ii), the same transformation of variables, $T_{F_j} = \frac{\ln N}{S_{[j]}}$, applies; however, the transformed variables in the ordered case are not independent. Again, the lack of independence is remedied by the convenient feature of the exponential distribution described above. The p.d.f. for $T_F$ is

$$f_{T_F}(y_1) = \int_{\frac{i-1}{i} y_1}^{y_1} \int_{\frac{i-2}{i-1} y_2}^{y_2} \cdots \int_{\frac{2}{3} y_{i-2}}^{y_{i-2}} \int_{\frac{1}{2} y_{i-1}}^{y_{i-1}} \gamma(y_1, y_2, \ldots, y_i)\, dy_i\, dy_{i-1} \cdots dy_3\, dy_2 , \tag{33}$$

where

$$\gamma = \frac{M!}{(M-i)!}\, (\alpha \ln N)^{i} \left[1 - \exp\left\{\frac{-\alpha \ln N}{y_i}\right\}\right]^{M-i} \prod_{j=1}^{i} (y_j - y_{j+1})^{-2} \exp\left\{\frac{-\alpha \ln N}{y_j - y_{j+1}}\right\} ,$$

and $y_{i+1} = 0$. As in the non-ordered case, the p.d.f. for the sum of the two parts, $T_i = \sum_{j=1}^{i}\left(T_{M_j} + T_{F_j}\right)$, is given by their convolution,

$$h_{T_i}(t) = g_{T_M}(t) * f_{T_F}(t) = \mathcal{F}^{-1}\left(\mathcal{F}\{g_{T_M}(t)\}\, \mathcal{F}\{f_{T_F}(t)\}\right) . \tag{34}$$

In this section, I have presented work in progress. It is hoped that some features of the fitness p.d.f. derived for the ordered region will enable one to distinguish it from the fitness p.d.f. derived for the non-ordered region. If such distinctive features are discovered between sums of order statistics (ordered region) vs. sums of i.i.d. random variables (non-ordered region), it may be possible to determine whether the evolution of a meta-population is selection limited (corresponding to the ordered region) or mutation limited (corresponding to the non-ordered region), simply based upon the fitness distribution.
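The way the independence property enters the ordered-region calculation can be seen from a short algebraic identity, offered here as a sketch rather than as the argument used in the proof of the theorem. The symbol $D_j$ for the spacings is introduced only for this illustration.

```latex
% Spacings between successive order statistics (notation introduced for this sketch):
%   D_j = X_{[j]} - X_{[j+1]},  j = 1, ..., i-1.
\begin{align*}
\sum_{j=1}^{i-1} j\,D_j
  &= \sum_{j=1}^{i-1} j\,X_{[j]} - \sum_{j=2}^{i} (j-1)\,X_{[j]}
   = \sum_{j=1}^{i-1} X_{[j]} - (i-1)\,X_{[i]}, \\
Z_i = \sum_{j=1}^{i} X_{[j]}
  &= i\,X_{[i]} + \sum_{j=1}^{i-1} j\,D_j .
\end{align*}
```

Because $X_{[i]}$ and the differences $D_1, \ldots, D_{i-1}$ are independent for an exponential parent, $Z_i$ is a sum of independent terms, so its density can be built up by successive convolutions even though the order statistics themselves are dependent.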
Discussion

Previous work by Johnson et al. (1996) explored the case in which a single beneficial mutation is available to a population. They computed the variance in mean fitness among replicate populations and discovered the existence of two distinct domains: (i) When the product of beneficial mutation rate and population number, $\mu N$, is above a critical value, adaptive evolution is repeatable; they called this the "coincident-event" domain. (ii) When $\mu N$ is small, adaptive evolution is not repeatable; they called this the "isolated-event" domain. Qualitatively, we have reached the same conclusion by our discovery of the distinct ordered and non-ordered regions of parameter space, the main difference being that we have treated the case of two or more available beneficial mutations.

Assumptions of the models

The expected fitness trajectory given by equation (24) was derived by assuming that fitness jumps suddenly at each expected time of fixation of a beneficial mutation. (These sudden jumps in fitness explain the jagged trajectories in Figure 3.3.) While this assumption is justified in Appendix 3.1 and in Lenski et al. (1991) for the expected fitness trajectory, it is not justified for the trajectory of fitness variance. The fitness variance given by equation (25) is the variance among replicate populations in which fixations always occur at their expected time. Equation (25) does not account for the variance in timing of occurrence of beneficial mutations. While this timing variance may be large, the corresponding fitness variance is typically much smaller than the fitness variance due to the different possible fixation orderings.

The expected fitness trajectory given by equation (24) also assumes that selection coefficients are additive. That is, the analysis does not allow any epistatic interactions, whereby the selection coefficients of certain mutations might change in magnitude or even sign.
This absence of epistatic effects in conjunction with the finite number of available beneficial mutations allows the roughly monotonic deceleration of the fitness trajectory to its final plateau (Figure 3.3, panels A, C, E). If there were epistasis, then the trajectory could suddenly re-accelerate after, for example, the fixation of a beneficial mutation of small effect which resulted in certain other mutations becoming much more beneficial than they were against the former genetic background.

The absence of epistasis ensures the eventual convergence of all populations to the same fitness plateau, such that the variance in fitness among populations eventually goes to zero (Figure 3.3, panels B, D, F). With strong epistasis, in which the sign of certain mutational effects varies according to genetic background, different populations can become stuck indefinitely on fitness peaks of different heights (Wright, 1982). That is, the initial fixation order of beneficial mutations will vary among finite populations, which may then open up (or close off) alternative adaptive routes. Therefore, sustained fitness variation among replicate populations, even while they individually approach fitness plateaus, implies a rugged adaptive landscape with strong epistasis (Lenski et al. 1991; Lenski & Travisano, 1994). In future work, we intend to examine formally the dynamical consequences of epistatic interactions among beneficial (and conditionally beneficial) mutations in finite populations.

The previous section, entitled Fitness trajectories II, relies heavily on the assumption that the parent distribution of beneficial mutational effects is the exponential. The notion that this distribution should be some monotonic decreasing function was proposed by Fisher (1930). His claim was based on certain assumptions about both the structure and the dimensionality of fitness landscapes.
More recently, Gillespie (1991) elegantly defended the use of the exponential distribution for beneficial mutational effects, basing his argument solely on the statistical theory of extreme values. Despite the sound logic of these arguments, however, very little biological data are available to test this assumption.

Estimating parameters

For ten thousand generations of bacterial evolution, Lenski et al. (e.g. Lenski & Travisano, 1994) have recorded fitnesses of twelve E. coli populations relative to a common ancestor. We have performed a least-squares regression of their average fitness data to a simplified variation of equation (24) (Figure 3.4). A simplification was necessary because the number of terms in the sum of equation (24) is $M!$; hence, exploration of parameter space for moderately large $M$ is computationally prohibitive. Instead of summing over all possible fixation orderings, our simplified equation sums only two terms: (i) the fitness trajectory in which fixations are rank-ordered times the probability of rank-ordering, and (ii) the expected fitness trajectory in which all orderings are equiprobable times the complement of the probability of rank-ordering. The expected fitness trajectory in which all orderings are equiprobable is given by:

$$E[w(t \mid \text{no order})] = w_i , \quad \text{where } i = \{k : t \in [t_k, t_{k+1})\} , \tag{35}$$

where $w_i = 1 + i/\alpha$ is the fitness after the $i$th fixation, and

$$t_i = \sum_{j=1}^{i}\left(\frac{j}{\mu \pi(1/\alpha) N} + \alpha \ln N\right) = \frac{i(i+1)}{2 \mu \pi(1/\alpha) N} + i \alpha \ln N$$

is the timing of the $i$th fixation; $\mu$ is the overall beneficial mutation rate at time zero, and $\alpha$ is the exponential parameter.

This simplification is systematically biased toward extreme unpredictability, because it assumes that fixation orderings are either perfectly rank-ordered or completely unpredictable. This assumption neglects to favor "well-ordered" fixations (e.g. 124356) over "poorly-ordered" fixations (e.g. 516324), but instead treats both as equiprobable.
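The step trajectory of equation (35) can be sketched as follows. The code assumes the reading of the timing formula $t_i = i(i+1)/(2\mu\pi(1/\alpha)N) + i\alpha\ln N$, a drift-survival function $\pi(s) = Ks$ with $K = 2$ (an assumed value), and illustrative parameter values; the function names are hypothetical.

```python
import math

def fixation_time(i, mu, alpha, N, K=2.0):
    """Timing of the i-th fixation under equation (35).

    Assumes pi(s) = K*s with K = 2 (an assumption, not a value from the text).
    """
    rate = mu * K * (1.0 / alpha) * N          # mu * pi(1/alpha) * N
    return i * (i + 1) / (2.0 * rate) + i * alpha * math.log(N)

def expected_fitness_no_order(t, mu, alpha, N, M, K=2.0):
    """Step trajectory: fitness is w_i = 1 + i/alpha for t in [t_i, t_{i+1})."""
    i = 0
    while i < M and fixation_time(i + 1, mu, alpha, N, K) <= t:
        i += 1                                  # count fixations completed by time t
    return 1.0 + i / alpha

# Illustrative (hypothetical) parameter values.
mu, alpha, N, M = 5e-9, 20.0, 3.3e7, 8
for t in (0.0, 500.0, 10000.0):
    print(t, expected_fitness_no_order(t, mu, alpha, N, M))
```

Because only $M$ mutations are available, the trajectory plateaus at $1 + M/\alpha$ once the last fixation time has passed.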
In spite of this bias, however, tests of our simplified approach, for the computationally feasible case of small $M$, yielded a close approximation to the exact solution of equation (24) when there is a moderate to strong tendency toward rank-ordering of fixations. Previous estimates of $\mu$ and $\alpha$ for the E. coli populations (Gerrish & Lenski, 1998) gave us a priori reason to believe that there was a strong tendency toward rank-ordering of fixations, justifying the use of our simplified approach for these populations.

Least-squares regression of the fitness data from the Ara-1 population (Lenski & Travisano, 1994) gave: exponential parameter, $\alpha = 20$; beneficial mutation rate, $\mu = 5 \times 10^{-9}$; and number of available mutations, $M = 8$. Two of these parameters correspond reasonably well with previous estimates, $\alpha = 35$ and $\mu = 2 \times 10^{-9}$, from a model in which an infinite number of available beneficial mutations is assumed (Gerrish & Lenski, 1998). That the infinite-mutations model estimates a larger value for $\alpha$ reflects the fact that the beneficial mutation rate does not change when there is no limit to the number of available beneficial mutations. In the present model, the beneficial mutation rate effectively decreases over time because the finite number of available beneficial mutations are slowly expended. Thus, to achieve a similar fitness increase, the average selective advantage of each mutation must be greater for the present model, in which there is a finite number of available beneficial mutations. This would explain why the present model estimates a lower $\alpha$ (average selective advantage $= 1/\alpha$). The parameter $M$ has not previously been estimated.

In future work, I will develop a modification of the estimation procedure described here that will take into account all "well-ordered" fixations. I envision a routine that computes the probabilities of all fixation orderings in which zero or only two fixations do not appear in rank-order.
In addition, a bootstrap routine could then sample other permutations at random. Such a routine will be used not only to estimate parameters with greater accuracy, but also to compute the corresponding trajectory of fitness variance. Comparison of this theoretical variance with observed fitness variance among E. coli populations could be useful in addressing the question of whether these populations are following the same evolutionary pathway (Lenski et al., 1991).

Ordered substitutions in nature, with an application to HIV

Some recent studies with bacteria and viruses have shown that certain pairs of mutations are fixed in evolving populations in a more or less predictable order (Hall, 1988; Mittler & Lenski, 1992; Cunningham et al., 1997). Three different hypotheses can, in principle, explain this predictability. First, mutation A may be much more advantageous than B. In that case, A deterministically out-competes B if they co-occur, as is likely in a sufficiently large population. Also, A has a correspondingly higher probability of escaping initial loss due to drift than does B. Second, mutation A may occur at a higher rate than B so that, in a small population, A is more likely to arise and be fixed first. Third, the mutations may interact epistatically, such that B is advantageous only in the presence of A. In that case, B will be fixed in the population only after A has been fixed. The theory that we have developed here provides a mathematical framework for evaluating the first hypothesis. Moreover, it can be readily extended to also incorporate the second hypothesis, at least for the case of two mutations. In the following paragraphs, we apply this framework to the case of resistance mutations in HIV (human immunodeficiency virus).

The effective population number of virions in an HIV-infected person has been a matter of recent debate (Mascolini, 1997; Leigh Brown & Richman, 1997).
The outcome of this debate is of central importance to the understanding of in vivo evolution of HIV. On one side of the debate, Coffin (1995) argues that, with respect to population dynamics, in vivo populations of HIV are effectively infinite and therefore behave deterministically. On the other side, Leigh Brown (1997) contends that in vivo HIV has the stochastic behavior of a small, certainly finite population.

Based on molecular data and an algorithm developed by Kuhner et al. (1995), Leigh Brown (1997) calculated that the effective population number of virions in an HIV-infected patient was on the order of one thousand. This number is five to eight orders of magnitude smaller than the actual number of virions present in an HIV-infected person (Ho et al., 1989; Piatak et al., 1993). To support this calculation, he pointed to the fact that mutations conferring resistance to anti-retroviral drugs do not always appear in order from largest to smallest effect. He reasoned that, if the effective population number were very large as modelers had previously assumed, one would expect resistance mutations to always appear in order from largest to smallest effect. Yet, despite the sound logic of this argument, his calculation was met with some criticism (Mascolini, 1997).

Here, we evaluate Leigh Brown's claim by mathematically formalizing his argument concerning the ordering of resistance mutations. Based on known parameters for two resistance mutations (Boucher et al., 1992), the ordering of their fixation in vivo, and an estimate of the in vivo point mutation rate (Mansky & Temin, 1995), we estimate the probability of the observed ordering as a function of effective population number. (Effective population number is simply the parameter $N$ in our previous developments.) Then, employing the double limit given by equation (17), we compute the maximum possible probability of the observed ordering.
In 4 out of 18 patients receiving AZT treatment for HIV infection, Boucher et al. (1992) observed that an inferior resistance mutation, K70R, appeared to be fixed before a superior resistance mutation, T215Y/F [1]. (In several cases, the appearance of K70R was transient; we did not count these transient appearances as fixations.) The 50 percent inhibitory concentrations ($IC_{50}$) for wildtype, mutant K70R, and mutant T215Y/F are 0.006, 0.01, and 0.15 uM AZT, respectively. Employing a saturation function, we compute selection coefficients of mutants by

$$ s = \frac{IC_{50}(M)\,[C + IC_{50}(W)]}{IC_{50}(W)\,[C + IC_{50}(M)]} - 1 , $$

where $C$ denotes AZT concentration in vivo, and $W$ and $M$ denote wildtype and mutant, respectively. Using equation (15), the solid line in Figure 3.5 plots the probability of the observed fixation ordering against effective population number. It suggests that such an ordering would never occur if effective population number were greater than ~ 3000.

[1] A. Leigh Brown (personal communication) has pointed out that a caveat to our interpretation of these data is that both mutations may be present in the population prior to drug treatment. But such pre-existence (of any consequence) is unlikely if the effective population number is small. So, in this sense, our argument appears circular. Nevertheless, the observed fixation ordering could be explained (without invoking unusual epistatic interactions or extraordinarily large differences in mutation-selection balance) only if the T215Y/F mutation does not pre-exist in the population.

One source of error in the above calculations is due to uncertainty in the conversion of the $IC_{50}$ to selection coefficients. The accuracy of the resulting selection coefficient relies on (i) the accuracy of the $IC_{50}$, (ii) the dubious assumption that these values determined in vitro are the same in vivo, and (iii) both the accuracy and constancy of the in vivo concentration of AZT, $C$.
This latter concentration is highly variable throughout the patient's body and over time, thus introducing considerable uncertainty in estimates of the selection coefficients. To circumvent this difficulty, we appeal to equation (17) for the maximum probability, Pr{I,S}, given only that $s_S > s_I$.

A second source of uncertainty in our calculations is introduced by the assumption of equal mutation rates. Indeed, that K70R appears first may be explained, at least in part, by invoking a higher mutation rate for K70R than for T215Y/F. This explanation may be ruled out, however, by calculating the maximum probability given by the limit as $\mu_{K70R}$ gets large; this limit is employed in equation (17).

The dotted line in Figure 3.5 employs equation (17) to plot the largest possible probability of observing K70R before T215Y/F against effective population number. By largest possible probability, we mean the maximum probability given simply that $s_{T215Y/F} > s_{K70R}$ and that $\mu_{K70R}$ is unknown. This figure demonstrates that these two simple conditions, together with the observation of K70R before T215Y/F, imply that the effective population number is less than about 5000. This observation is confirmed by numerically solving the inequality given by equation (18) for $N$.

Based on the Poisson distribution and the estimated probability (solid line of Figure 3.5), the observations of Boucher et al. statistically support the hypothesis that the effective population number is less than ~ 300. Based on our maximum probability (dotted line), their observations support an effective population number of anything less than ~ 4000. This number corresponds to an average production of one mutation per nucleotide per ~ 10 generations ($[4000 \text{ replications} \times 3 \times 10^{-5} \text{ mutations per nucleotide per replication}]^{-1}$; see Mansky & Temin, 1995).
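The two pieces of arithmetic used in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are ours, the saturation function is taken to have the form $w = IC_{50}/(C + IC_{50})$, and the in vivo AZT concentration passed to the conversion is a free input, since the text stresses that its value is poorly known.

```python
def selection_coefficient(ic50_mutant, ic50_wildtype, conc):
    """Selection coefficient of a resistant mutant relative to wildtype.

    Assumes fitness under drug follows a saturation function,
    w = IC50 / (conc + IC50), so that s = w_mutant / w_wildtype - 1.
    All three arguments must be in the same units (here uM AZT).
    """
    numerator = ic50_mutant * (conc + ic50_wildtype)
    denominator = ic50_wildtype * (conc + ic50_mutant)
    return numerator / denominator - 1.0


def generations_per_mutation(n_eff, mu):
    """Expected generations between mutations at one nucleotide site,
    given n_eff replications per generation and per-site rate mu."""
    return 1.0 / (n_eff * mu)


# IC50 values reported in the text (uM AZT).
IC50_WILDTYPE = 0.006
IC50_K70R = 0.01
IC50_T215 = 0.15

# The concentrations below are hypothetical, chosen only to show the trend.
for conc in (0.001, 0.01, 0.1):
    s_inferior = selection_coefficient(IC50_K70R, IC50_WILDTYPE, conc)
    s_superior = selection_coefficient(IC50_T215, IC50_WILDTYPE, conc)
    print(conc, round(s_inferior, 3), round(s_superior, 3))

print(generations_per_mutation(4000, 3e-5))
```

Under this form of the saturation function, both selection coefficients vanish without drug and approach $IC_{50}(M)/IC_{50}(W) - 1$ at high concentration, so T215Y/F remains the superior mutation at every positive concentration; only the magnitudes depend on $C$.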
Thus, our observations generally support the notion advocated by Leigh Brown that the effective population number of virions in an HIV-infected person is small enough that it must be considered finite.

[Figure 3.1: two panels. Vertical axis: probability of superior before inferior, Pr{S,I}. Horizontal axes: Log[population number, N] and Log[beneficial mutation rate, $\mu$].]

Figure 3.1. Probability that a superior mutation ($s_S = 0.1$) is fixed before an inferior mutation ($s_I = 0.08$) as a function of both population number, $N$ (for which $\mu_S = \mu_I = 3 \times 10^{[\text{exponent illegible}]}$), and overall beneficial mutation rate, $\mu$ (for which $N = 3.3 \times 10^{7}$).

[Figure 3.2: two panels. Horizontal axes: Log[population number, N] and Log[beneficial mutation rate, $\mu$].]

Figure 3.2. Probability of fixation ordering from largest to smallest selective advantage as a function of both population number, $N$ (for which $\mu = 6 \times 10^{[\text{exponent illegible}]}$), and overall beneficial mutation rate, $\mu$ (for which $N = 3.3 \times 10^{7}$). The number of available beneficial mutations is $M = 7$, and the corresponding set of selection coefficients is {.08, .05, .04, .03, .02, .01, .005}. The individual beneficial mutation rates are $\mu_i = \mu/M$.

[Figure 3.3, panel (A) N = 1E6: expected fitness (top) and standard deviation (bottom) versus generations, 0 to 10000.]

Figure 3.3. Evolutionary trajectories as a function of population size. The top and bottom panels in each pair show population fitness and its standard deviation. (A) $N = 10^{6}$, (B) $N = 10^{7}$, (C) $N = 10^{8}$. The number of available beneficial mutations is $M = 5$, and the corresponding set of selection coefficients is {.12, .08, .07, .05, .02}.
The overall beneficial mutation rate is $\mu = 6 \times 10^{[\text{exponent illegible}]}$, and individual mutation rates are $\mu_i = \mu/M$.

[Figure 3.3 (cont.), panels (B) N = 1E7 and (C) N = 1E8: expected fitness (top) and standard deviation (bottom) versus generations, 0 to 10000.]

[Figure 3.4: fitness versus time (generations).]

Figure 3.4. Least-squares fit of expected fitness trajectory to fitness data from an evolving E. coli population, the Ara-1 line from Lenski & Travisano (1994). Effective population number is $N = 3.3 \times 10^{7}$. The generation number on the x-axis corresponds to a discrete-time model of population growth by binary fission, whereas the mathematical developments in this paper employ a continuous-time formulation. For purposes of parameter estimation, we have therefore multiplied the generation number by a factor of ln 2 to adjust for this difference. Estimated parameters are $M = 8$, $\alpha = 20$, and $\mu = 5 \times 10^{[\text{exponent illegible}]}$.

[Figure 3.5: probability versus Log[effective population number, $N_e$].]

Figure 3.5. Estimated probability (solid line) and maximum probability (dotted line) that the inferior AZT-resistance mutation, K70R, is fixed before the superior mutation, T215Y/F, as a function of effective population number. We used a conservative value for the mutation rate, $\mu_{T215Y/F} = 10^{-5}$ (see Mansky & Temin, 1995).
To compute estimated probabilities, we employed equation (15) and assumed (i) that the two mutations conferred in vivo selective advantages over the wildtype of $s_{K70R} = 0.3$ and $s_{T215Y/F} = 1.5$ (converted from $IC_{50}$ values reported by Boucher et al., 1992), and (ii) that $\mu_{K70R} = \mu_{T215Y/F}$. To compute maximum probabilities, we employed equation (17), which relies only on the condition that $s_{T215Y/F} > s_{K70R}$ and an estimate for $\mu_{T215Y/F}$.

Chapter 4

HITCHHIKING OF DELETERIOUS MUTATIONS WITH SPECIAL ATTENTION TO MUTATOR ALLELES [1]

Abstract

In asexual populations, the fate of a particular allele is determined as much by the selective values of alleles at other loci as by its own selective value. A deleterious allele may achieve high frequency or even fixation in a population as a result of linkage with a beneficial mutation. For a specified deleterious mutation, we derive the rate at which such "hitchhiking" takes place, taking into account competitive interactions among beneficial mutations. As a special case, we explore the hitchhiking of mutator alleles. We find that there exists a most probable mutator strength that is positive and typically of significant effect, suggesting a sort of "quantum" behavior in which a population's mutation rate either corresponds to the wildtype rate or is elevated significantly above that rate. This characteristic has been observed in both natural and laboratory populations. Our results indicate that hitchhiking is generally an important mechanism in the evolution of mutation rates in asexual populations.

[1] This chapter is written in the format of a paper. The "we" in this chapter refers to P.J. Gerrish, P.D. Sniegowski and R.E. Lenski.

Introduction

A neutral or deleterious mutation may rise to high frequency or even fixation in a population as a result of being linked to a beneficial mutation. This process is known as hitchhiking. Maynard Smith & Haigh (1974) and Kaplan et al.
(1989) have explored the consequences of hitchhiking on heterozygosity for both selectively neutral alleles and selectively maintained polymorphisms. Among other things, they found that hitchhiking reduced average heterozygosity by an amount determined in part by the degree of linkage between the beneficial mutation and the locus in question.

In asexual organisms, there is complete linkage among all loci. For the asexual case, Berg (1995) derives a stochastic model of hitchhiking in which beneficial mutations are implicitly assumed to be very rare events such that they do not interfere with each other's progression to fixation; he explores implications of this model for neutral and nearly neutral variation. In large populations, however, the fate of a given allele is affected not only by selection at linked loci but also by competition with other fitness variants in the population, each with its own complex array of linked fitness alleles. Taking into account such competition, we explore the dynamics of hitchhiking, focusing our attention on the fate of a single specified deleterious allele. We derive the per-capita rate at which the specified deleterious allele is fixed by hitchhiking and the consequent probability that it is fixed within a given time interval.

In obligately asexual organisms, linkage simply means that alleles are present in the same genome. There are three ways in which a deleterious mutation and a beneficial mutation may appear on the same genome: the deleterious mutation may occur first, the beneficial mutation may occur first, or the two may occur simultaneously (in the same generation). If the beneficial mutation occurs first, then the deleterious mutation will not be fixed because it will compete with fitter organisms that carry the beneficial but not the deleterious mutation.
The deleterious mutation may appear early in the growth of the beneficial mutation, in which case it will initially rise to high frequency with the beneficial mutation; however, it will subsequently decrease in frequency and eventually stabilize at some frequency determined by the balance between mutation and selection.

In light of the above, a deleterious mutation can be fixed by selection only if it is in the background upon which a beneficial mutation appears. That is, the deleterious mutation must occur before, or simultaneously with, the beneficial mutation. Thus, the rate at which a specified deleterious mutation hitchhikes to fixation is reduced to the rate at which successful beneficial mutations are produced in the specified deleterious subpopulation.

Theoretical developments

The hitchhiking rate

For convenience, we refer to any beneficial mutation that is produced on a specified deleterious background (in that order) as a double-mutation. The frequency of a deleterious mutation in a population is increased by recurrent mutation and decreased by selection. The resulting mutation-selection balance determines the frequency of the deleterious mutation, which we denote by $\phi$. Let $\mu'_B$ denote the beneficial mutation rate in this deleterious subpopulation. Then, the per-capita recruitment rate of double-mutations is $\phi \mu'_B$. We allow the beneficial mutation rate in the deleterious subpopulation to differ from the corresponding rate in the wildtype subpopulation (hence the prime), because one class of deleterious mutations that is of special interest causes a general increase in mutation rate. Let Pr{fix} denote the probability that any given beneficial mutation produced by the deleterious subpopulation achieves fixation in the population. Then, the per-capita rate of fixation of such double-mutations is

$$ h = \phi \, \mu'_B \, \Pr\{\text{fix}\} . \qquad (36) $$

This is, in other words, the per-capita hitchhiking rate of the deleterious mutation in question.
To assemble the pieces of equation (36), we first derive an effective mutation-selection balance, $\phi_e$, and compare it with the commonly used equilibrium mutation-selection balance, $\hat{\phi}$. Then we derive the fixation probability, Pr{fix}, by considering that a beneficial mutation may be lost as a result of either drift or competitive exclusion by a superior beneficial mutation. All derivations in this section are general for deleterious mutations.

An effective mutation-selection balance

The specified deleterious mutation appears at per-capita rate $\mu_D$, and selection acts to remove this mutation at per-capita rate $s_D \phi$, where $s_D$ is the coefficient of selection against the specified deleterious mutation. Therefore, the dynamic equation for mutation-selection balance is

$$ \frac{d\phi}{dt} = (\mu_D - s_D \phi)(1 - \phi) , $$

where $(1 - \phi)$ imposes logistic frequency dependence and ensures that $\phi \le 1$. Because mutation-selection balance is typically a low frequency (such that $1 - \phi \approx 1$), this equation is well approximated by

$$ \frac{d\phi}{dt} = \mu_D - s_D \phi . $$

The equilibrium solution to this dynamic equation is $\hat{\phi} = \mu_D / s_D$, which is the conventional expression for mutation-selection balance.

Evolving populations, however, are not at equilibrium. Hitchhiking of deleterious alleles only occurs in adaptively evolving populations. In such populations, genetic variation is periodically purged by adaptive substitutions. In Appendix 4.1, we derive an effective mutation-selection balance, $\phi_e$, which employs the above dynamic equation and takes into account such periodic homogenizing of a population. In subsequent figures, we plot curves employing both equilibrium ($\phi = \hat{\phi}$) and effective ($\phi = \phi_e$) mutation-selection balance.

Figure 4.1 illustrates the importance of accounting for the mutation-selection dynamics. It uses mathematical developments and equations that are presented in Appendix 4.1, but the basic points can be readily understood graphically.
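The approach to mutation-selection balance after a purge can be illustrated numerically. The sketch below compares the closed-form solution of the linear approximation, $\phi(t) = (\mu_D/s_D)(1 - e^{-s_D t})$ with $\phi(0) = 0$, against a simple Euler integration of the full logistic equation. The function names, the step size, and the parameter values are our own illustrative choices and are not taken from Appendix 4.1.

```python
import math


def phi_linear(t, mu_d, s_d):
    """Closed-form solution of d(phi)/dt = mu_d - s_d * phi with phi(0) = 0."""
    return (mu_d / s_d) * (1.0 - math.exp(-s_d * t))


def phi_logistic_euler(t_end, mu_d, s_d, dt=0.01):
    """Euler integration of d(phi)/dt = (mu_d - s_d * phi) * (1 - phi),
    starting from phi = 0 (as just after an adaptive substitution)."""
    phi = 0.0
    for _ in range(int(round(t_end / dt))):
        phi += dt * (mu_d - s_d * phi) * (1.0 - phi)
    return phi


# Hypothetical parameter values, for illustration only.
MU_D = 3e-6
for s_d in (0.03, 0.003):
    equilibrium = MU_D / s_d
    at_500 = phi_linear(500.0, MU_D, s_d)
    full = phi_logistic_euler(500.0, MU_D, s_d)
    print(s_d, equilibrium, at_500, full)
```

With these values the more deleterious mutation has the lower equilibrium but is essentially at equilibrium after 500 generations, while the less deleterious one is still well short of it; the full logistic equation and its linear approximation agree closely because the frequencies stay far below one.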
Panel A shows, for two different values of $s_D$, the dynamics of a deleterious mutation as it approaches its equilibrium without any interruption. Notice that the more deleterious mutation not only has a lower equilibrium frequency, but also that it approaches that equilibrium much faster. However, the on-going substitution of beneficial mutations will perturb these dynamics away from their approach to the equilibrium, periodically re-setting the deleterious mutation frequency to near zero. Panels B and C show this effect for two different values of the substitution rate, $\sigma$, the inverse of which approximates the median time between fixation of successive beneficial mutations. As can be seen, the discrepancy between the equilibrium (dashed line) and effective (solid line) mutation-selection balance becomes greater as the substitution rate increases. The discrepancy also becomes greater at lower values of $s_D$ because, as shown in panel A, the equilibrium frequency is higher and the approach to equilibrium slower, leaving correspondingly more time for a given rate of adaptive substitutions to depress the frequency of the deleterious mutation substantially below its equilibrium. Thus, the discrepancy between the equilibrium and effective mutation-selection balance is a decreasing function of $s_D$ and an increasing function of $\sigma$.

Probability of fixation of double-mutations, Pr{fix}

If a beneficial mutation on a deleterious background is to achieve fixation, it must (i) confer an advantage that outweighs the disadvantage of the deleterious background, (ii) survive the effects of drift in the first few generations of growth, and (iii) outcompete any alternative beneficial mutations that may arise on the wildtype or deleterious backgrounds. Condition (iii) may be rephrased to state that a beneficial mutation will be fixed only if it does not encounter an interfering mutation, defined as follows.
If a superior mutation appears and itself survives drift, then it is called an interfering mutation since it interferes with, indeed prevents, the fixation of the original beneficial mutation.

Because loss by drift occurs in the first few generations of growth whereas loss by clonal interference (competition among alternative beneficial mutations) occurs in later generations of growth, we can make the assumption that these two processes are independent. Thus, we compute the probability of fixation of a beneficial mutation on a deleterious background as the product of the probabilities that (i) this double-mutation survives drift and (ii) no interfering mutation is encountered.

Suppose a population consists of two homogeneous subpopulations, wildtype and deleterious, until the appearance of a beneficial mutation on the deleterious background. Before the beneficial mutation appears, the size of the deleterious subpopulation is determined by the dynamic balance between mutation and selection. Let $x(t)$ denote the number of wildtype individuals at time $t$, let $y(t)$ denote the number of individuals carrying only the deleterious mutation, and let $z(t)$ denote the number of individuals carrying both beneficial and deleterious mutations. Let $s_B$ and $s_D$ denote, respectively, the selection coefficients for the beneficial mutation and against the deleterious mutation, such that, relative to the wildtype, the fitness of an individual carrying both mutations is $1 + s_B - s_D$ (i.e., both $s_B$ and $s_D$ assume positive values). Define $t_f$ as the time to virtual fixation of the double-mutation. Start time with the appearance of a beneficial mutation on the deleterious background, i.e., $z(0) = 1$. Then the total number of alternative beneficial mutations is the number produced on either the wildtype or deleterious background during the time interval $(0, t_f)$.
On the wildtype background, this number is

$$ \mu_B \int_0^{t_f} x(t)\, dt = \frac{\mu_B}{s_B - s_D}\,(1 - \phi)\, N \ln N , \qquad (37) $$

where $\mu_B$ denotes the rate of beneficial mutation in the wildtype subpopulation, and the dynamics of $x(t)$ are assumed to be logistic, which carries the assumption of constant population size (see Crow & Kimura, 1970). Likewise, the number of alternative beneficial mutations produced on the deleterious background between the times of appearance and fixation of the original beneficial mutation is

$$ \mu'_B \int_0^{t_f} y(t)\, dt = \frac{\mu'_B}{s_B - s_D}\, \phi\, N \ln N . \qquad (38) $$

The total expected number of alternative beneficial mutations is the sum of equations (37) and (38).

We now determine what fraction of these alternative mutations are interfering mutations. To proceed requires an assumption about how selection coefficients of beneficial mutations are distributed. For reasons discussed in Gillespie (1991, p. 262), we have chosen the exponential distribution. Let $\alpha$ denote the parameter of this distribution. Then, given some function, $\pi(s)$, describing the probability of surviving drift (see Appendix 4.1), a beneficial mutation chosen at random from the wildtype subpopulation (i) is superior to the double-mutant, and (ii) survives drift, with probability $\int_{s_B - s_D}^{\infty} \pi(s)\, \alpha e^{-\alpha s}\, ds$. When $\pi(s)$ is a linear function of $s$ (which we assume in the following developments), this integral reduces to $\pi(s_B - s_D + 1/\alpha)\, e^{-\alpha(s_B - s_D)}$. The second factor in this product is the probability that an arbitrarily chosen beneficial mutation is superior to the double-mutant in question, given the exponential distribution of mutational effects. The first factor is the expected probability that an arbitrarily chosen superior mutation survives drift.

The probability that a beneficial mutation chosen at random from the deleterious subpopulation is (i) superior to the original mutation and (ii) survives drift is $\int_{s_B}^{\infty} \pi(s - s_D)\, \alpha e^{-\alpha s}\, ds$, or $\pi(s_B - s_D + 1/\alpha)\, e^{-\alpha s_B}$ when $\pi(s)$ is linear. Note the change in
the lower integration limit from $s_B - s_D$ for the wildtype subpopulation to $s_B$ for the deleterious subpopulation. This is because a mutation occurring on the deleterious background must have a selection coefficient greater than $s_B$ if it is to be superior to the double-mutation, whereas a mutation occurring on the wildtype background needs a selection coefficient only greater than $s_B - s_D$.

Interfering mutations are those which (i) occur in the interval $(0, t_f)$, (ii) survive the effects of drift, and (iii) are superior to the double-mutation. Therefore, the number of interfering mutations produced by the wildtype subpopulation is

$$ \lambda = \frac{\mu_B}{s_B - s_D}\,(1 - \phi)\, N \ln(N)\; \pi(s_B - s_D + 1/\alpha)\; e^{-\alpha(s_B - s_D)} . \qquad (39) $$

Likewise, the number of interfering mutations produced by the deleterious subpopulation is

$$ \lambda' = \frac{\mu'_B}{s_B - s_D}\, \phi\, N \ln(N)\; \pi(s_B - s_D + 1/\alpha)\; e^{-\alpha s_B} . \qquad (40) $$

The total number of interfering mutations in the population, therefore, is simply the sum $\lambda + \lambda'$. The number of interfering mutations is Poisson distributed, so the conditional probability of fixation of a beneficial mutation with net selective advantage, $s_B - s_D > 0$, is equal to the probability that (i) it survives drift and (ii) no interfering mutation appears:

$$ \Pr\{\text{fix} \mid s_B\} = \pi(s_B - s_D)\; e^{-(\lambda + \lambda')} . \qquad (41) $$

Given our assumption that $s_B$ is exponentially distributed with parameter $\alpha$, the expected probability that a beneficial mutation on the deleterious background achieves fixation is

$$ \Pr\{\text{fix}\} = \alpha \int_{s_D}^{\infty} \pi(s_B - s_D)\; e^{-(\lambda + \lambda') - \alpha s_B}\; ds_B . \qquad (42) $$

The lower limit of integration reflects the obvious condition that a beneficial mutation is defined as one that has a net selective advantage, i.e., $s_B - s_D > 0$. Substituting equation (42) into (36) gives the final expression for the per-capita hitchhiking rate of a deleterious mutation:

$$ h = \phi\, \mu'_B\, \alpha \int_{s_D}^{\infty} \pi(s_B - s_D)\; e^{-(\lambda + \lambda') - \alpha s_B}\; ds_B . \qquad (43) $$
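The fixation probability and hitchhiking rate derived above can be evaluated by simple numerical quadrature. The sketch below is a minimal illustration, not the procedure used to produce the figures: it assumes a linear drift-survival function $\pi(s) = c\,s$ with a hypothetical slope `c` (the actual function is given in Appendix 4.1), uses a midpoint rule truncated where the exponential tail is negligible, and all function and parameter names are our own.

```python
import math


def pr_fix(s_d, alpha, n_pop, phi, mu_b, mu_b_prime, c=2.0, steps=20000):
    """Expected fixation probability of a double-mutation: the integral of
    pi(s_B - s_D) * exp(-(lam + lam_prime)) over exponentially distributed
    s_B > s_D, with pi(s) = c * s assumed (hypothetical slope c)."""
    upper = s_d + 40.0 / alpha          # tail beyond this point is negligible
    width = (upper - s_d) / steps
    total = 0.0
    for i in range(steps):
        s_b = s_d + (i + 0.5) * width   # midpoint avoids s_b == s_d
        net = s_b - s_d                 # net advantage of the double-mutation
        supply = n_pop * math.log(n_pop) / net
        survive = c * (net + 1.0 / alpha)
        # Interfering mutations from the wildtype and deleterious backgrounds.
        lam = mu_b * (1.0 - phi) * supply * survive * math.exp(-alpha * net)
        lam_prime = mu_b_prime * phi * supply * survive * math.exp(-alpha * s_b)
        total += c * net * math.exp(-(lam + lam_prime) - alpha * s_b) * width
    return alpha * total


def hitchhiking_rate(s_d, alpha, n_pop, phi, mu_b, mu_b_prime, c=2.0):
    """Per-capita hitchhiking rate h = phi * mu_b_prime * Pr{fix}."""
    return phi * mu_b_prime * pr_fix(s_d, alpha, n_pop, phi, mu_b, mu_b_prime, c)


# Hypothetical parameter values, for illustration only.
for n_pop in (1e4, 1e6, 1e8, 1e10):
    print(n_pop, hitchhiking_rate(0.01, 35.0, n_pop, 1e-4, 1e-9, 1e-8))
```

A useful sanity check is the limit without clonal interference: when no interfering mutations are expected, the integral has the closed form $(c/\alpha)\,e^{-\alpha s_D}$, which the quadrature should reproduce.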
Converting per-capita rate to populational probability

To make sense of a per-capita hitchhiking rate, it is helpful to convert it into the corresponding probability that a hitchhiking event occurs in a population within a given time interval. A hitchhiking event is said to have occurred in a population if the specified deleterious mutation (i) has produced a successful beneficial mutation, and (ii) has achieved fixation as a result of linkage with this beneficial mutation. Given per-capita hitchhiking rate, $h$, the populational probability that the specified deleterious mutation produces a successful beneficial mutation in the interval $(0, T)$ is $1 - e^{-hNT}$. After the successful beneficial mutation is produced, a certain time, $T_f$, elapses until the resulting double-mutation is fixed. Only then has the hitchhiking event taken place. Thus, the populational probability that the specified deleterious mutation hitchhikes to fixation in the interval $(0, T)$ is

$$ \Pr\{\text{hitchhike} \mid T\} = \begin{cases} 1 - e^{-hN(T - T_f)} , & T \ge T_f \\ 0 , & T < T_f . \end{cases} \qquad (44) $$

[...] for comparison, we do not employ our effective mutation-selection balance because their simulations do not appear to take into account the periodic purging of genetic variation caused by substitutions [2]. Inserting their simulation parameters (…, $\mu_B = 10^{-8}$, and $\mu_D = 5 \times 10^{-7}$) into our analytical model, we computed the probability that such a population would fix a mutator by hitchhiking within a time span of 20,000 generations.

[2] This is evidenced by their Figure 3, in which mutator frequencies never drop much below mutation-selection balance. Given parameters used by Taddei et al., their Figure 3 should show occasional downward spikes of between one and two orders of magnitude if the variation-purging effect of substitutions in the wildtype subpopulation were allowed by their simulations.

Out of 100 simulations, Taddei et al. found that 19 fixed a mutator in 20,000 generations when m = 10, and 7 fixed a mutator when m = 100.
Employing their assumption of mutation-selection equilibrium ($\phi = \hat{\phi}$), our model gives probabilities of 0.223 and 0.073 for m = 10 and m = 100, respectively, which shows good agreement with their findings. However, when we correctly account for the variation-purging effect of substitutions by employing the effective mutation-selection balance ($\phi = \phi_e$), probabilities are 0.040 and 0.057 for m = 10 and m = 100, respectively. We restate that the fundamental differences between our analytical model and the simulations are: (i) the analytical model assumes an unlimited number of available beneficial mutations whereas the simulations assume that number to be finite, and (ii) the simulations assume mutation-selection equilibrium whereas our analytical model takes into account the dynamics of mutation-selection balance.

Effect of population size on hitchhiking rate

Panel A of Figure 4.2 plots per-capita hitchhiking rate, $h$, against population number, $N$, for a mutator; it shows that $h$ remains constant below about $N = 10^{6}$ and decreases steadily above that value. This figure elucidates the effect of increased clonal interference on per-capita hitchhiking rate. In small populations, clonal interference is highly improbable. Thus, the principal determinant of $h$ in small populations is the per-capita probability of occurrence of a beneficial mutation on the specified deleterious background. Because it is per-capita, this probability is independent of $N$, explaining why $h$ is essentially independent of $N$ below a certain population size. Above this population size, the decline in $h$ reflects the fact that clonal interference is probable and intensifies with increasing $N$, thus reducing the probability that any given beneficial mutation (including those occurring on the specified deleterious background) will achieve fixation.
Panel B of Figure 4.2 plots the corresponding probability that a population fixes the specified deleterious mutation by hitchhiking within a time interval of 10,000 generations against population size, $N$. Hitchhiking probability increases with population number, but this increase decelerates at large $N$ because of intensifying clonal interference.

Effect of selective disadvantage on hitchhiking rate

Panel A of Figure 4.3 shows the per-capita hitchhiking rate as a function of the strength of selection against the specified deleterious mutation, $s_D$. Interestingly, this rate is essentially independent of $s_D$ when $s_D$ is small. This is most easily understood by considering that the initial rise in the deleterious mutation's frequency after a substitution is determined much more by recurrent mutation than by selection against the mutation. It is not until the frequency of the mutation approaches its equilibrium value that selection begins to significantly affect its trajectory. Because very slightly deleterious mutations have relatively high equilibrium frequencies, there is a good chance that a substitution in the population will occur before selection becomes important to the trajectories of such mutations. In other words, in the time interval between substitutions, slightly deleterious mutations may behave as neutral mutations, in which case their probability of hitchhiking is essentially that of a neutral mutation. (In population-genetic language, substitutions reduce the effective population size, $N_e$, and any mutation whose selective disadvantage is less than $N_e^{-1}$, in a haploid population, is effectively neutral.)

Panel B of Figure 4.3 plots the corresponding probability that a population fixes the specified deleterious mutation by hitchhiking within a time interval of 10,000 generations as a function of $s_D$. A trend similar to that in panel A is observed.
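The near-neutral behavior of slightly deleterious mutations between substitutions can be made concrete with a small calculation. Under the linearized mutation-selection dynamics with the frequency reset to zero at a substitution, the frequency reached after an interval $T$ is $(\mu_D/s_D)(1 - e^{-s_D T})$, whereas a strictly neutral mutation would reach $\mu_D T$. Their ratio depends only on $x = s_D T$. The sketch below computes that ratio; the function name and the interval of 1000 generations are hypothetical choices for illustration.

```python
import math


def fraction_of_neutral_accumulation(s_d, interval):
    """Ratio of the frequency reached after `interval` generations to the
    frequency mu_D * interval that a neutral mutation would reach.
    Equals (1 - exp(-x)) / x with x = s_d * interval."""
    x = s_d * interval
    if x == 0.0:
        return 1.0
    return -math.expm1(-x) / x


# Hypothetical interval between substitutions, in generations.
INTERVAL = 1000.0
for s_d in (1e-5, 1e-4, 1e-3, 1e-2, 3e-2):
    print(s_d, fraction_of_neutral_accumulation(s_d, INTERVAL))
```

When $s_D T$ is much smaller than one the ratio is close to one, so selection has had almost no effect by the time of the next purge; when $s_D T$ is large the ratio falls roughly as $1/(s_D T)$, which is the regime in which the equilibrium expression becomes adequate.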
Effect of mutator strength on hitchhiking rate

Panel A of Figure 4.4 shows how the strength of a mutator affects its own per-capita hitchhiking rate. Recall that a mutator allele elevates the general mutation rate, which includes the beneficial mutation rate, by a factor, $m$ (i.e., $\mu'_B = m \mu_B$). Thus, equation (36) becomes $h = m \mu_B \phi \Pr\{\text{fix}\}$, where $\mu_B$ denotes wildtype beneficial mutation rate. This equation is the product of (i) the per-capita recruitment rate, $m \mu_B \phi$, of double-mutations, and (ii) the probability of fixation, Pr{fix}, of a double-mutation. Factor (i) is plotted in Figure 4.5, and factor (ii) is plotted in Figure 4.6. Thus, Figure 4.4A, which plots hitchhiking rate against mutator strength, may be understood simply as the product of Figures 4.5 and 4.6. Our aim in the next several paragraphs is therefore to explain the trends in Figures 4.5 and 4.6 in order to understand Figure 4.4.

We begin by explaining the dashed line in Figure 4.5. This shows that the invalid equilibrium assumption ($\phi = \hat{\phi}$) renders the double-mutation recruitment rate essentially independent of mutator strength over the region of biological interest. This may be understood as follows. The rate at which a mutator subpopulation produces beneficial mutations is directly proportional to the number of individuals carrying the mutator allele and hence to the mutation-selection balance of that allele. As derived in the subsection entitled Hitchhiking rate of mutators, the coefficient of selection against a mutator is $1 - e^{-U(m-1)}$. Thus, if deleterious mutations were always maintained at their equilibrium frequency, $\hat{\phi} = \mu_D / s_D$, then the population would produce beneficial mutations on the mutator background at a per-capita rate of $m \mu_B \mu_D \left[ 1 - e^{U(1-m)} \right]^{-1}$. In other words, a weaker mutator allows for higher equilibrium mutation-selection balance but has a lower per-capita mutation rate.
These two factors influencing the double-mutation recruitment rate, $m \mu_B \hat{\phi}$, directly cancel each other out, rendering this rate essentially independent of mutator strength. Note that this rate is also independent of population size, $N$.

When the dynamics of mutation-selection balance are properly accounted for by employing $\phi = \phi_e$, then the double-mutation recruitment rate is not independent of mutator strength. In fact, the solid line in Figure 4.5 shows this rate to be a monotonically increasing function of mutator strength. This is explained as follows. After an adaptive substitution takes place, the frequency of mutators in the population is very low. It takes some time for a mutator allele to approach its equilibrium frequency. And the higher that equilibrium frequency, the longer it takes to approach (Figure 4.1A). This can be seen in the dynamic solution to the mutation-selection balance equation, $\phi(t) = \frac{\mu_D}{s_D}\left(1 - e^{-s_D t}\right)$. Weak mutators have small $s_D$ and therefore require long times to approach their high equilibrium frequency. If adaptive substitutions occur in the population at a given rate, then mutator frequency drops with a certain periodicity determined by this rate. Since stronger mutators recuperate their lower equilibrium frequency more rapidly, their effective mutation-selection balance is less affected by the periodic purging of genetic variation caused by adaptive substitutions. Thus, their effective recruitment rate of double-mutations is higher. Hence the trend of monotonically increasing recruitment rate with increasing mutator strength, as shown by the solid line in Figure 4.5.

Figure 4.6 shows that probability of fixation, Pr{fix}, decreases monotonically with mutator strength. The stronger the mutator, the more deleterious it is. Therefore, the net fitness of a beneficial mutation that is linked to a mutator is lower for stronger mutators.
Hence, the probability of fixation of such double-mutations decreases monotonically with increasing mutator strength.

Figure 4.4 shows that, when the dynamics of mutation-selection balance are properly accounted for, there exists an intermediate mutator strength that maximizes the hitchhiking rate. (Panel B of Figure 4.4 reveals that the same trend is observed at the population level.) This observation reflects a balance between the increased double-mutation recruitment rate of stronger mutators (Figure 4.5), which increases hitchhiking rate, and the decreased probability of fixation of stronger mutators (Figure 4.6), which decreases hitchhiking rate.

Observations of bacterial populations in nature and in laboratories seem to suggest that mutation rates either correspond to a wildtype rate or are elevated from this rate by one, two or occasionally three orders of magnitude (Sniegowski et al., 1997; LeClerc et al., 1996; Mao et al., 1997). Of course, this observation is complicated by the fact that strong mutators are more likely to be noticed and their mutation rate is more likely to be statistically distinguishable from the wildtype. Yet, it seems that the strength of the evidence outweighs this complication. Fragility of the genetic mechanisms involved in DNA synthesis, proofreading and repair may offer one explanation for this observation (Cox, 1976; Miller, 1996). Our work suggests another explanation, which is based solely on the adaptive dynamics of asexual populations. If a mutator is to be observed, it must achieve high frequency in the population. Hitchhiking is a plausible mechanism by which a mutator may achieve high frequency (Sniegowski et al., 1997). Our work shows that a mutator's hitchhiking rate is maximized when its strength is at some intermediate value. Therefore, given that a mutator is observed, it is most likely to be a mutator of intermediate strength.
[Figure 4.1, panels A to C. Axes: frequency versus generations.]

Figure 4.1. The dynamics of mutation-selection balance. The two solid lines in panel A show the frequency trajectories of two selectively distinct deleterious mutations, starting with a frequency of zero, as after an adaptive substitution, and asymptotically approaching their equilibrium frequency. (These lines plot the dynamic equation, $\phi(t) = \frac{u_D}{s_D}\left(1 - e^{-s_D t}\right)$.) Dashed lines show the corresponding equilibrium frequencies ($\hat{\phi} = u_D / s_D$). Parameters used are $u_D = 3 \times 10^{?}$ (exponent illegible in the source) and $s_D = 0.03$ and $0.003$.

Figure 4.1 (continued). Panels B and C plot the frequency trajectories of the same deleterious mutations, but here adaptive substitutions occur at rates $\sigma = 0.0009$ and $\sigma = 0.003$, respectively. These substitution rates are determined from equation (88) using parameters $N = 3 \times 10^{7}$, $\alpha = 35$, and $u_B = 4 \times 10^{?}$ and $2 \times 10^{?}$ (exponents illegible in the source). The solid horizontal lines represent effective mutation-selection balance, as calculated by equation (87).

[Figure 4.2, panels A and B. Horizontal axis: population number, N.]

Figure 4.2. Panel A shows the per-capita hitchhiking rate of a specified deleterious mutation as a function of population number, $N$. Panel B shows the corresponding probability that a population fixes the deleterious mutation by hitchhiking within a time interval of 10,000 generations. The dashed lines show the result of incorrectly assuming mutation-selection equilibrium ($\phi = \hat{\phi}$). Parameters used are $\alpha = 35$, $u_B = 2 \times 10^{?}$, $u_D = 2 \times 10^{?}$ (exponents illegible in the source), and $s_D = 0.03$.

[Figure 4.3, panels A and B. Horizontal axis: selective disadvantage, s_D.]

Figure 4.3. Panel A shows the per-capita hitchhiking rate of a specified deleterious mutation as a function of its selective disadvantage, $s_D$. Panel B shows the corresponding probability that a population fixes the deleterious mutation by hitchhiking within a time interval of 10,000 generations. The dashed lines show the result of incorrectly assuming mutation-selection equilibrium ($\phi = \hat{\phi}$). Parameters used are $\alpha = 35$, $u_B = 2.0 \times 10^{-9}$, $u_D = 2 \times 10^{?}$ (exponent illegible in the source), and $N = 3 \times 10^{7}$.

[Figure 4.4, panels A and B. Horizontal axis: mutator strength, m.]

Figure 4.4. Panel A shows the per-capita hitchhiking rate of a mutator allele as a function of its strength, $m$. Panel B shows the corresponding probability that a population fixes the mutator allele by hitchhiking within a time interval of 10,000 generations. The dashed line shows the result of incorrectly assuming mutation-selection equilibrium ($\phi = \hat{\phi}$). Parameters used are $\alpha = 35$, $u_B = 2.0 \times 10^{?}$, $u_D = 2 \times 10^{?}$ (exponents illegible in the source), $u_\delta = 0.0003$, and $N = 3 \times 10^{7}$.

[Figure 4.5. Horizontal axis: mutator strength, m.]

Figure 4.5. The per-capita recruitment rate of double-mutations (beneficial mutations on mutator background) as a function of mutator strength, $m$. The dashed line shows the result of incorrectly assuming mutation-selection equilibrium ($\phi = \hat{\phi}$). Parameters used are $N = 3 \times 10^{7}$, $\alpha = 35$, $u_B = 2.0 \times 10^{?}$, $u_D = 2 \times 10^{?}$ (exponents illegible in the source), and $u_\delta = 0.0003$.

[Figure 4.6. Horizontal axis: mutator strength, m.]

Figure 4.6. Probability of fixation of a double-mutation (beneficial mutation on mutator background) as a function of mutator strength, $m$. Parameters used are $N = 3 \times 10^{7}$, $\alpha = 35$, $u_B = 2.0 \times 10^{?}$, $u_D = 2 \times 10^{?}$ (exponents illegible in the source), and $u_\delta = 0.0003$.

[Figure 4.7. Axes: frequency versus generations. Curves: deleterious on wildtype, deleterious on beneficial; exact and approximate solutions.]

Figure 4.7. Frequency of a deleterious mutation during a substitutional event on both wildtype and beneficial backgrounds. The beneficial mutation appears at time $t = 0$ and achieves frequency 0.5 at time $t = 173$ (vertical dotted line). The exact solution is given by equation (85); the approximate solution is given by equation (86). Parameters used are $N = 3 \times 10^{7}$, $u_D = 5 \times 10^{?}$ (exponent illegible in the source), $s_B = 0.1$, and $s_D = 0.005$.

APPENDICES

APPENDIX 1.1 MUTATION RATE ESTIMATION PROGRAM

Theory

The computer program FT.EXE was written to estimate mutation rates from fluctuation test data (Luria & Delbrück, 1943). Briefly, fluctuation tests report the numbers of mutants that have accumulated by spontaneous mutation and subsequent replication during exponential growth of a population (Chapter 1). Computation of the expected distribution is notoriously difficult (Stewart et al., 1990). The probability generating function for this distribution was derived by Lea & Coulson (1949).
From this generating function, an algorithm for generating the corresponding distribution was derived independently by Gurland (1958) and Ma et al. (1992). (See also Gurland, 1963; Sarkar et al., 1992; Jaeger & Sarkar, 1995.) The program FT.EXE employs this algorithm for computing the distribution of numbers of mutants in a fluctuation test as follows.

Let $\mu$ denote the mutation rate of the wildtype to the selected mutation, let $N$ denote the final population number, and let $i$ denote the number of mutants in the final population. Then the probability, $p_i$, that $i$ mutants are present in the final population is given by

$$p_0 = e^{-m}, \qquad p_i = \frac{m}{i} \sum_{j=0}^{i-1} \frac{p_j}{i-j+1}, \tag{46}$$

where $m = \mu N$ is the expected number of mutations (not to be confused with mutants).

To estimate $m$, the program FT.EXE employs the Newton series (derived below) for the recurrence relation given by (46). The program also implements the empirical formulas for 95% confidence limits derived by Stewart (1994). For small sample sizes, however, Stewart's formulas have no solution. To make estimates from small sample sizes, I derive (below) the analytical variance based on the maximum likelihood surface. As an alternative to Stewart's confidence limits, the program also gives 95% confidence limits based on computation of this analytical variance. Given this variance, one can base statistical inference on either the assumption of normality or the implementation of Chebyshev's inequality (Feller, 1968).

Let $S$ denote the sample set, let $m = \mu N$ denote the expected number of mutations, and let $u = \ln m$. Given $u$, $p_j(u)$ denotes the probability of $j$ mutants in the final population. Then, the log-likelihood function is

$$L(u) = \sum_{j \in S} \ln p_j(u). \tag{47}$$

From this, the estimate of $u$ is computed by solution of

$$L'(u) = 0, \tag{48}$$

where the prime indicates the derivative with respect to $u$. The variance of $u$ is given by

$$\mathrm{Var}(u) = -\left[L''(u)\right]^{-1} \tag{49}$$

(see Hogg & Craig, 1995).
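As an illustration of recurrence (46) and the log-likelihood (47), the following is a minimal Python sketch, not the FT.EXE source. It assumes the recurrence in the form $p_0 = e^{-m}$, $p_i = \frac{m}{i}\sum_{j<i} p_j/(i-j+1)$, and it maximizes $L(u)$ with a golden-section search over $u = \ln m$ rather than the Newton iteration used by the program. The function names, the search bounds, and the mutant counts are hypothetical and chosen only for the example.

```python
import math

def ld_distribution(m, n_max):
    """Probabilities p_0..p_n_max of observing i mutants, from recurrence (46)."""
    p = [math.exp(-m)]
    for i in range(1, n_max + 1):
        p.append((m / i) * sum(p[j] / (i - j + 1) for j in range(i)))
    return p

def log_likelihood(u, counts):
    """Log-likelihood L(u) of equation (47), with u = ln m."""
    p = ld_distribution(math.exp(u), max(counts))
    return sum(math.log(p[c]) for c in counts)

def estimate_m(counts, m_lo=1e-3, m_hi=50.0, tol=1e-9):
    """Maximize L(u) by golden-section search (assumes a single peak in the bounds)."""
    a, b = math.log(m_lo), math.log(m_hi)
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = log_likelihood(c, counts), log_likelihood(d, counts)
    while b - a > tol:
        if fc > fd:                      # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = log_likelihood(c, counts)
        else:                            # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = log_likelihood(d, counts)
    return math.exp(0.5 * (a + b))

# Hypothetical mutant counts from ten parallel cultures (not data from this work).
counts = [0, 1, 0, 3, 2, 0, 12, 1, 0, 5]
m_hat = estimate_m(counts)
print("estimated number of mutations m:", m_hat)
```

Dividing the estimate of $m$ by the final population number $N$ would then give the mutation rate estimate, as in the back-transformation step of the program.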
To compute the variance, we have

$$L''(u) = \sum_{j \in S} \left[ \frac{p_j''(u)}{p_j(u)} - \left( \frac{p_j'(u)}{p_j(u)} \right)^{2} \right]. \tag{50}$$

The derivatives necessary for computing (48) and (50) are given by

$$P_i(u) = \frac{e^{u}}{i}\, A \sum_{j=0}^{i-1} \frac{P_j(u)}{i-j+1}, \tag{51}$$

where

$$A = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 2 & 1 \end{pmatrix}, \qquad P_k(u) = \begin{pmatrix} p_k(u) \\ p_k'(u) \\ p_k''(u) \end{pmatrix}, \qquad \text{and} \qquad P_0(u) = \begin{pmatrix} e^{-e^{u}} \\ -e^{u} e^{-e^{u}} \\ (e^{u} - 1)\, e^{u} e^{-e^{u}} \end{pmatrix}.$$

To estimate $u$, the program solves (48) numerically by Newton's method. This is achieved by iterating on $r$:

$$u_{r+1} = u_r - \frac{L'(u_r)}{L''(u_r)}. \tag{52}$$

Once an estimate for $u$ is obtained, this estimate is used to compute the variance by (49). From here, the program assumes that $u$ is normally distributed to compute confidence limits. (Stewart (1994) suggests that $u$ is more nearly normal than $m$.) However, the program may be easily modified to implement Chebyshev's inequality in order to avoid assumptions about how $u$ is distributed. Lastly, the estimate of the number of mutations, $m$, as well as confidence limits on this estimate are obtained by back-transformation, $m = e^{u}$. From here, the mutation rate estimate and confidence limits are obtained from $\mu = m/N$.

Obtaining and using the program

The compiled program, FT.EXE, as well as the source code, FT.BAS (or FT.TXT for ASCII format), may be obtained at the following ftp site:

ftp://ftps.cdc.gov/pub/MuRates

The username and password needed to log on to this site are "ncidftp" and "12emerG" (case sensitive), respectively. Also at that site are: (i) FTDOC.TXT, documentation on how to use the program, (ii) FORMAT.TXT, an outline of the input format, and (iii) SAMPLE.DAT, an example data file.

APPENDIX 2.1 PROBABILITY OF SURVIVING DRIFT

In the first few generations of growth, a beneficial mutation may be lost by random sampling events, or drift. Haldane (1927) derived the probability of surviving drift for a single beneficial mutation.
His derivation made use of a result from the theory of branching processes, which states that the probability of extinction (i.e., not surviving drift) is obtained by solving the equation $f(\theta) = \theta$, where $f(\theta)$ is the probability generating function for the number of offspring (see Ewens, 1969, p. 79). A simple assumption for multicellular, sexual organisms is that this function generates a Poisson distribution, in which case the probability of survival of a beneficial mutation approximates $2s$.

Our analyses, however, are based on the fundamental assumption of no recombination. We may further restrict our analysis to a particular kind of asexual organism, namely asexual bacteria. Bacteria reproduce by binary fission, and so we derive the generating function as follows. Our assumption of a constant population size (see below) implies a sampling event every generation. Thus, a bacterium that divides before sampling will leave zero, one, or two offspring after sampling. In the case of bacteria, therefore, the probability generating function for the number of offspring is

$$f(\theta) = (1 - c/2)^{2} + c\,(1 - c/2)\,\theta + (c/2)^{2}\,\theta^{2}, \tag{53}$$

where $c$ is the expected number of offspring after division and sampling. Thus, the probabilities of passing zero, one, and two offspring to the next generation are, respectively, $(1 - c/2)^{2}$, $c(1 - c/2)$, and $(c/2)^{2}$. The selective advantage of the mutant is $s = \ln c$ by definition, or approximately $s \approx c - 1$ when $s$ is small. Let $\pi(s)$ denote the probability that a beneficial mutant survives drift. Then, by substituting $1 + s$ for $c$ in (53) and solving the equation $f(1 - \pi(s)) = 1 - \pi(s)$, we obtain $\pi(s) = \frac{4s}{(1+s)^{2}}$, which is approximately $4s$ for small $s$. All derivations in this dissertation employ the general notation, $\pi(s)$, whereas all computations implement the approximation, $\pi(s) \approx 4s$.
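The closed form $\pi(s) = 4s/(1+s)^2$ can be checked numerically against the branching-process definition. The sketch below is illustrative only: it assumes the binary-fission generating function (53) with $c = 1 + s$, and finds the extinction probability as the smallest non-negative root of $f(\theta) = \theta$ by iterating $\theta \leftarrow f(\theta)$ from zero. The function names are hypothetical.

```python
def offspring_pgf(theta, c):
    """Generating function (53) for a bacterium leaving 0, 1 or 2 offspring."""
    return (1 - c / 2) ** 2 + c * (1 - c / 2) * theta + (c / 2) ** 2 * theta ** 2

def survival_probability(s, max_iter=1_000_000, tol=1e-15):
    """pi(s): one minus the smallest non-negative root of f(theta) = theta."""
    c = 1.0 + s                      # approximation c = 1 + s used in the text
    theta = 0.0
    for _ in range(max_iter):
        nxt = offspring_pgf(theta, c)
        if abs(nxt - theta) < tol:   # iteration has converged
            break
        theta = nxt
    return 1.0 - theta

for s in (0.01, 0.05, 0.2):
    print(s, survival_probability(s), 4 * s / (1 + s) ** 2, 4 * s)
```

The last two columns show how closely the approximation $4s$ tracks the exact expression when $s$ is small.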
APPENDIX 2.2 n-GENOTYPE LOGISTIC SYSTEM WITH MUTATION

General solution

Logistic dynamics of an $n$-genotype system are modeled by assuming that (i) total population size is constant, i.e., $\sum_{i=1}^{n} x_i = N$, where $x_i$ is the number of individuals of genotype $i$, and (ii) the differences in Malthusian parameters are constant:

$$m_i - m_1 = s_i, \qquad i = 2, 3, \ldots, n, \tag{54}$$

where $m_i = \frac{1}{x_i}\frac{dx_i}{dt}$ and $m_1 = \frac{1}{x_1}\frac{dx_1}{dt} = -\left(N - \sum_{j=2}^{n} x_j\right)^{-1} \sum_{j=2}^{n} \frac{dx_j}{dt}$. Equation (54) may, therefore, be rewritten as:

$$\frac{1}{x_i}\frac{dx_i}{dt} + \left(N - \sum_{j=2}^{n} x_j\right)^{-1} \sum_{j=2}^{n} \frac{dx_j}{dt} = s_i, \qquad i = 2, 3, \ldots, n. \tag{55}$$

This system of $n-1$ equations can be rearranged as follows:

$$\frac{dx_i}{dt} = x_i \left( s_i - \frac{1}{N} \sum_{j=2}^{n} s_j x_j \right), \tag{56}$$

where $i = 2, 3, \ldots, n$. While this system of equations is non-linear, its symmetry makes an analytical solution possible. The key to its solution is the transformation, $X_i = \ln x_i - s_i t$. The system of equations now becomes:

$$\frac{dX_i}{dt} = -\frac{1}{N} \sum_{j=2}^{n} s_j\, e^{X_j + s_j t}, \tag{57}$$

where $i = 2, 3, \ldots, n$. Thus, the time derivatives of all transformed variables are equal:

$$\frac{dX_i}{dt} - \frac{dX_j}{dt} = 0, \tag{58}$$

where $i, j = 2, 3, \ldots, n$. Integration of (58) yields $X_i - X_j = k_{ij}$, where $k_{ij}$ is a constant of integration that is determined from initial conditions:

$$k_{ij} = X_i(0) - X_j(0) = \ln x_i(0) - \ln x_j(0), \tag{59}$$

where $i, j = 2, 3, \ldots, n$. Thus, the system of equations is uncoupled by substituting $X_j$ in (57) with $X_i - k_{ij}$, which yields:

$$\frac{dX_i}{dt} = -\frac{1}{N} \sum_{j=2}^{n} s_j\, e^{X_i - k_{ij} + s_j t}, \tag{60}$$

where $i = 2, 3, \ldots, n$. From solution and subsequent back-transformation of equation (60), the analytical solution of an $n$-genotype logistic system is obtained:

$$x_i(t) = x_i(0)\, e^{s_i t} \left[ 1 + \frac{1}{N} \sum_{j=2}^{n} x_j(0) \left( e^{s_j t} - 1 \right) \right]^{-1}, \tag{61}$$

where $i = 2, 3, \ldots, n$, and

$$x_1(t) = N - \sum_{i=2}^{n} x_i(t). \tag{62}$$

Application of boundary conditions due to mutation

If genotype $i$ appears by mutation at time $\tau_i$, then boundary conditions are $x_i(\tau_i) = 1$.
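As a consistency check on the general solution, the closed form $x_i(t) = x_i(0)\,e^{s_i t}\big[1 + \tfrac{1}{N}\sum_j x_j(0)(e^{s_j t}-1)\big]^{-1}$ can be compared with direct numerical integration of $dx_i/dt = x_i\big(s_i - \tfrac{1}{N}\sum_j s_j x_j\big)$. The following Python sketch does this with a fixed-step fourth-order Runge-Kutta scheme. It is illustrative only: the parameter values are hypothetical, and for simplicity both mutant genotypes are taken to be present as single individuals at $t = 0$ rather than appearing at later times.

```python
import math

def closed_form(x0, s, N, t):
    """Closed-form solution for the mutant genotypes i = 2..n at time t."""
    denom = 1.0 + sum(xj * (math.exp(sj * t) - 1.0) for xj, sj in zip(x0, s)) / N
    return [xi * math.exp(si * t) / denom for xi, si in zip(x0, s)]

def rhs(x, s, N):
    """Right-hand side dx_i/dt = x_i (s_i - (1/N) sum_j s_j x_j)."""
    mean_s = sum(sj * xj for sj, xj in zip(s, x)) / N
    return [xi * (si - mean_s) for xi, si in zip(x, s)]

def integrate_rk4(x0, s, N, t_end, steps):
    """Fixed-step classical Runge-Kutta integration from t = 0 to t_end."""
    x, h = list(x0), t_end / steps
    for _ in range(steps):
        k1 = rhs(x, s, N)
        k2 = rhs([xi + 0.5 * h * k for xi, k in zip(x, k1)], s, N)
        k3 = rhs([xi + 0.5 * h * k for xi, k in zip(x, k2)], s, N)
        k4 = rhs([xi + h * k for xi, k in zip(x, k3)], s, N)
        x = [xi + h * (a + 2 * b + 2 * c + d) / 6.0
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
    return x

# Hypothetical example: two mutant genotypes, each starting from one individual.
N = 1.0e6
s = [0.05, 0.08]
x0 = [1.0, 1.0]
exact = closed_form(x0, s, N, 400.0)
numeric = integrate_rk4(x0, s, N, 400.0, 8000)
print(exact, numeric)
```

In this example the fitter genotype approaches fixation while the less fit one rises transiently and then declines, which is the behavior exploited in the treatment of clonal interference.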
From these, the initial conditions are determined; they are

$$\mathbf{x}_0 = R^{-1} \mathbf{N}, \tag{63}$$

where $\mathbf{x}_0$ is a vector whose elements are $x_i(0)$, $i = 2, 3, \ldots, n$, $R$ is an $(n-1) \times (n-1)$ matrix, and $\mathbf{N}$ is a vector whose $n-1$ elements are the constant $N$. Substituting $x_i(\tau_i) = 1$ into (61) shows that the elements of $R$ are $r_{ij} = \delta_{ij} N e^{s_i \tau_i} - \left(e^{s_j \tau_i} - 1\right)$, $i, j = 2, 3, \ldots, n$, where $\delta_{ij}$ is the Kronecker delta.

Notation for the 3-genotype case

The developments in this appendix use a more general notation than is used in the rest of the paper, where $x_1$ is simply denoted by $x$, $x_2$ is denoted by $y$, and $x_3$ is denoted by $z$. This 3-genotype case has the particular solution,

$$y(t) = \frac{y(0)\, e^{s_y t}}{1 + \frac{y(0)}{N}\left(e^{s_y t} - 1\right) + \frac{z(0)}{N}\left(e^{s_z t} - 1\right)}, \qquad z(t) = \frac{z(0)\, e^{s_z t}}{1 + \frac{y(0)}{N}\left(e^{s_y t} - 1\right) + \frac{z(0)}{N}\left(e^{s_z t} - 1\right)}. \tag{64}$$

The initial conditions are determined from the boundary conditions, $y(0) = 1$ and $z(t_z) = 1$; they are

$$y(0) = 1, \qquad z(0) = \frac{e^{s_y t_z} + N - 1}{(N - 1)\, e^{s_z t_z} + 1}. \tag{65}$$

APPENDIX 2.3 EXPECTED NUMBER OF CANDIDATE REPLICATIONS

Here we derive the expected number of replications that may generate superior mutations that prevent a given beneficial mutation from attaining some frequency, $f$. We have called these candidate replications, denoted by $R$, in the subsection, Clonal interference - a general model. The crucial step in the derivation of $R$ is finding an expression for the time, $t_z$, at which a superior mutation must appear if the original mutation is to attain a maximum frequency of exactly $f$. The time, $t_{max}$, at which $y$ reaches a maximum number is determined from $\frac{dy}{dt} = 0$; it is

[equation (66): illegible in the source]

[The text that follows, including equations (67) and (68), is illegible in the source. It ends by stating that the result is well approximated by evaluating $R$ at $s_z = E(s_z \mid s_z > s_y) = s_y + \frac{1}{\alpha}$, to derive the expected number of candidate replications:]

[equation (69): illegible in the source]

where $\tilde{t}_z$ is simply $t_z$ evaluated at $s_z = s_y + \frac{1}{\alpha}$.
The approximation made in equation (67) is, for our purposes, essentially an equality when $e^{s_y \tilde{t}_z} \ll N$. If we combine this condition with equation (68), then the approximation works well only when the frequency $f$ meets the following condition:

[equation (70), an upper bound on $f$: illegible in the source]

If we let $s_z = E(s_z \mid s_z > s_y) = s_y + \frac{1}{\alpha}$, and if we simplify the notation so that $s = s_y$, then the above condition becomes

[equation (71), an upper bound on $f$: illegible in the source]

This upper bound on $f$ reaches a minimum value when its derivative with respect to $s$ is zero, so that an overall bound below which the approximation works well is obtained by solving for the value of $s$ that satisfies $\ln(\alpha s N) = 1 + \frac{1}{\alpha s}$ and using that value in equation (71). In general, the approximation is valid when $f < 0.95$ provided that $N$ is greater than $10^{4}$. For the purposes of this paper, the approximation is essentially an equality because we are concerned only with the cases $f = 0.01$ and $f = 0.5$, for which the approximation works extremely well. We compute fixation probabilities, i.e., the boundary case $f > \frac{N-1}{N}$, using the simpler derivations in Clonal interference and fixation.

APPENDIX 2.4 FUNCTIONS EMPLOYING THE RECTANGULAR DISTRIBUTION

We present here the results only of the derivations in which a rectangular distribution replaces the exponential distribution of beneficial mutational effects.

The probability of fixation of an arbitrarily chosen beneficial mutation is:

$$\Pr\{\mathrm{fix} \mid s_{max}, \mu, N\} = \frac{1}{s_{max}} \int_{0}^{s_{max}} \pi(s)\, e^{-\lambda_R(s,\, s_{max},\, \mu,\, N)}\, ds, \tag{72}$$

where $\lambda_R(s, s_{max}, \mu, N)$ is given by [expression illegible in the source] (assuming that $\pi(u)$ is approximately linear).

The expected rate of substitution of beneficial mutations is:

$$\mu N \Pr\{\mathrm{fix} \mid s_{max}, \mu, N\}. \tag{73}$$

The expected selection coefficient of successful mutations is:

$$\frac{\int_{0}^{s_{max}} s\, \pi(s)\, e^{-\lambda_R(s,\, s_{max},\, \mu,\, N)}\, ds}{\int_{0}^{s_{max}} \pi(s)\, e^{-\lambda_R(s,\, s_{max},\, \mu,\, N)}\, ds}, \tag{74}$$

where $\lambda_R(s, s_{max}, \mu, N)$ is as defined above for equation (72).
The expected number of superior mutations in the first of the two time intervals considered is:

[equation (75): illegible in the source]

The expected number of superior mutations in the second interval is:

[equation (76), including the definition of the auxiliary quantity $\chi_R$: illegible in the source]

The probability that an arbitrarily chosen beneficial mutation transiently achieves polymorphic frequency ($f > 0.01$) is:

[equation (77): illegible in the source]

Finally, the probability that an arbitrarily chosen beneficial mutation transiently achieves majority status is obtained by replacing 0.01 in equation (77) with 0.5.

APPENDIX 3.1 ADAPTIVE SUBSTITUTIONS IN ASEXUAL POPULATIONS ARE ACCURATELY MODELED AS INSTANTANEOUS REPLACEMENTS

Conveniently, the continuous process of adaptive substitution in asexual populations is well approximated by a discrete process of instantaneous replacement. Lenski et al. (1991) suggested the use of such an approximation. They reasoned that in a large population the frequency of a substituting variant would remain very low for a considerable time and then rise sharply, thus approximating a step function. Define the time of adaptive substitution, $t^{*}$, as the time at which the frequency of a beneficial mutation achieves 0.5. Then, the process of adaptive substitution may be approximated by a simplified scenario in which the substituting variant remains at a frequency of zero until time $t^{*}$ and assumes a frequency of one thereafter. That we have defined $t^{*}$ appropriately is evidenced by the symmetry of logistic growth.

More generally, let $t_i^{*}$ denote the time of the $i$th adaptive substitution in an evolving population. Then populational processes are well approximated by assuming the population to be numerically dominated by a single "wildtype" variant throughout the interval $(t_{i-1}^{*}, t_i^{*})$, for every $i$.
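The adequacy of the step-function approximation can be checked numerically by counting wildtype replications (population size multiplied by the time-integrated wildtype frequency) under both descriptions. The sketch below is illustrative only. It assumes logistic growth of the beneficial mutation, $dp/dt = s\,p\,(1-p)$ with $p(0) = 1/N$, integrates $1 - p(t)$ up to the time at which $p = (N-1)/N$, and compares the result with $N t^{*}$, where $t^{*}$ is the time at which $p = 0.5$. The function names are hypothetical; the parameter values $N = 3 \times 10^{7}$ and $s = 0.1$ are those quoted for Figure 4.7.

```python
import math

def wildtype_replications_continuous(N, s, steps=200_000):
    """N times the integral of 1 - p(t) from 0 to t_f, by the trapezoid rule."""
    t_f = 2.0 * math.log(N - 1.0) / s          # time at which p = (N - 1)/N

    def wildtype_frequency(t):                 # 1 - p(t) for logistic growth
        e = (N - 1.0) * math.exp(-s * t)
        return e / (1.0 + e)

    h = t_f / steps
    total = 0.5 * (wildtype_frequency(0.0) + wildtype_frequency(t_f))
    total += sum(wildtype_frequency(k * h) for k in range(1, steps))
    return N * h * total

def wildtype_replications_step(N, s):
    """N * t_star for the step function that jumps from 0 to 1 at p = 0.5."""
    t_star = math.log(N - 1.0) / s             # approximately ln(N)/s for large N
    return N * t_star

N, s = 3.0e7, 0.1
print(wildtype_replications_continuous(N, s), wildtype_replications_step(N, s))
```

For large $N$ the difference between $\ln(N-1)$ and $\ln N$ is negligible, so the substitution time is effectively $\ln(N)/s$.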
Suppose a beneficial mutation appears in a population at time $t = 0$ and subsequently spreads to fixation. Let $p$ denote the frequency of the beneficial allele. Then, if the unit of time is generations, the dynamic equation for the growth of the beneficial mutation is $dp/dt = s\,p\,(1 - p)$, where $s$ is the selection coefficient, and $p(0) = 1/N$. Define the time until fixation, $t_f$, as that which satisfies $p(t_f) = (N-1)/N$. The time of substitution, $t^{*}$, as defined above, is that which satisfies $p(t^{*}) = 0.5$; it is $t^{*} = \ln(N)/s$. Thus, according to the continuous substitution process, the total number of wildtype replications after time $t = 0$ is given by

$$N \int_{0}^{t_f} \left[1 - p(t)\right] dt = \frac{N \ln(N)}{s} = N t^{*}.$$

The discrete approximation to the above substitution process is given by the equation,

$$p(t) = \begin{cases} 0, & t \le t^{*} \\ 1, & t > t^{*}, \end{cases}$$

where $t^{*}$ is as defined above. According to this approximation, the number of wildtype replications after time $t = 0$ is simply $N t^{*}$, which is exactly equal to the number obtained for the continuous process. We conclude that (i) the time of substitution, $t^{*}$, is appropriately defined and (ii) the continuous process of substitution is closely approximated by a discrete process of instantaneous replacement at time $t^{*}$.

APPENDIX 3.2 ANALYTICAL INTEGRATION OF EQUATIONS (21) AND (22)

To analytically integrate equation (21), a more precise notation is essential. Given fixation ordering, C, the $i$th mutation fixed will be followed by a number of fixations of mutations that are superior to the $i$th mutation. We have denoted the set of such superior mutations as $S_i$. Let the subscript $S$ indicate membership in set $S_i$, such that $s_S(k)$ denotes the $k$th member of this set. Now, order this set of selection coefficients such that $\{s_S(1) > s_S(2) > \ldots > s_S(\#S_i)\}$, where $\#S_i$ denotes the cardinality of set $S_i$. The $i$th mutation fixed will also be followed by a number of fixations of mutations that are inferior to the $i$th mutation. We have denoted the set of such inferior mutations as $I_i$.
Let the subscript $I$ denote membership in set $I_i$, such that $s_I(k)$ denotes the $k$th member of this set. Now, order this set of selection coefficients such that $\{s_I(1) > s_I(2) > \ldots > s_I(\#I_i)\}$, where $\#I_i$ denotes the cardinality of set $I_i$. With this new notation, the integral in equation (21) may be rewritten as a sum of integrals over the successive intervals $\left(0, \frac{\ln N}{s_I(1)}\right)$, $\left(\frac{\ln N}{s_I(1)}, \frac{\ln N}{s_I(2)}\right)$, $\ldots$, $\left(\frac{\ln N}{s_I(\#I_i)}, \infty\right)$:

[equation (78): illegible in the source]

[The text that follows, including equations (79) to (81), which give the analytical integral and its limit as selection coefficients converge for the rank-ordered and not rank-ordered cases, is illegible in the source.]

Equation (22) may be integrated analytically by decomposition of the integral into intervals as before. The appropriate decomposition is achieved by simply multiplying each integrand of (78) by $t_i$. Then, analytical integration yields

[equation (82), an expression for $E(T_i \mid C)$: illegible in the source]

where $R = r_i + \sum_{k=1}^{\#S_i} r_S(k) + \sum_{k=1}^{\#I_i} r_I(k)$, $s_I(0) = \infty$ and $s_I(\#I_i + 1) = 0$. In the limit as selection coefficients converge, this expected time is reduced to

[equation (83): illegible in the source]

where $\bar{s}$ is the average selection coefficient (to which all $s_j$ converge).

APPENDIX 4.1 EFFECTIVE MUTATION-SELECTION BALANCE

Let $\phi(t)$ denote the frequency of a given deleterious mutation, where $t = 0$ at the time of the most recent adaptive substitution (see below for precise definition).
Given that the time until the next adaptive substitution is $t_s$, the average mutation-selection balance during the time interval between the most recent and the next substitution is $\frac{1}{t_s} \int_{0}^{t_s} \phi(t)\, dt$. Let $\sigma$ denote the expected rate of adaptive substitutions. Then, to a first approximation, we define effective mutation-selection balance as

$$\phi_e = \sigma \int_{0}^{1/\sigma} \phi(t)\, dt. \tag{84}$$

Suppose a beneficial mutation appears on the wildtype background at time $t = 0$. As the number of individuals carrying this beneficial mutation grows, they produce at rate $u_D$ a deleterious mutation, which has selective disadvantage $s_D$. Assuming constant population size, $N$, the frequency, $\phi$, of this deleterious mutation on the beneficial background is given by the dynamic equation,

$$\frac{d\phi}{dt} = \left( \left[ (N-1)\, e^{-s_B t} + 1 \right]^{-1} u_D - s_D \phi \right) (1 - \phi), \tag{85}$$

with initial condition $\phi(0) = 0$, reflecting the fact that the beneficial mutation occurs on the wildtype background. It is commonly the case that $\phi \ll 1$, for which the above equation is well approximated by

$$\frac{d\phi}{dt} = \begin{cases} u_D - s_D \phi, & t \ge t_{1/2} \\ 0, & t < t_{1/2}, \end{cases} \tag{86}$$

where $t_{1/2} = \frac{\ln N}{s_B}$ is the time necessary for the beneficial mutation to attain a frequency of 0.5, and again this equation has initial condition $\phi(0) = 0$. For convenience, we now shift our time axis such that the time at which the beneficial mutation achieves frequency 0.5 is zero, i.e., $t_{1/2} = 0$. We define this time as the time of substitution. That equation (85) is well approximated by (86) is good evidence that the time of substitution, as we have defined it, marks a "resetting" of genetic variability in the population. Figure 4.7 plots the frequency of a deleterious mutation in a population during a substitutional event. The vertical dotted line marks the time of substitution as defined here (the $t = 0$ axis in the shifted coordinate system). Note the close agreement between exact and approximate solutions.
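The definition of effective mutation-selection balance lends itself to a direct numerical check. Measured from the time of substitution, the approximate dynamics $d\phi/dt = u_D - s_D\phi$ with $\phi(0) = 0$ have the solution $\phi(t) = \frac{u_D}{s_D}(1 - e^{-s_D t})$, and averaging this over an interval of length $1/\sigma$ gives $\frac{u_D}{s_D}\big[1 - \frac{\sigma}{s_D}(1 - e^{-s_D/\sigma})\big]$. The Python sketch below compares that closed form with a trapezoid-rule evaluation of $\sigma\int_0^{1/\sigma}\phi(t)\,dt$. It is illustrative only: the function names are hypothetical, the value of $u_D$ is an assumed placeholder, and the values of $s_D$ and $\sigma$ are those quoted in the caption of Figure 4.1.

```python
import math

def phi(t, u_d, s_d):
    """Frequency of the deleterious mutation t generations after a substitution."""
    return (u_d / s_d) * (1.0 - math.exp(-s_d * t))

def effective_balance_numeric(u_d, s_d, sigma, steps=100_000):
    """sigma * integral of phi(t) over (0, 1/sigma), by the trapezoid rule."""
    T = 1.0 / sigma
    h = T / steps
    total = 0.5 * (phi(0.0, u_d, s_d) + phi(T, u_d, s_d))
    total += sum(phi(k * h, u_d, s_d) for k in range(1, steps))
    return sigma * h * total

def effective_balance_closed(u_d, s_d, sigma):
    """Closed-form value of the same time average."""
    return (u_d / s_d) * (1.0 - (sigma / s_d) * (1.0 - math.exp(-s_d / sigma)))

u_d = 3.0e-5                       # assumed value, for illustration only
for s_d in (0.03, 0.003):
    for sigma in (0.0009, 0.003):
        print(s_d, sigma,
              effective_balance_numeric(u_d, s_d, sigma),
              effective_balance_closed(u_d, s_d, sigma),
              u_d / s_d)           # equilibrium frequency, for comparison
```

The last column is the equilibrium frequency $u_D/s_D$; the effective balance falls further below it when $s_D$ is small relative to $\sigma$, which is the effect described for weak mutators in the main text.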
In our shifted coordinate system, equation (86) becomes $\frac{d\phi}{dt} = u_D - s_D \phi$. When the initial condition, $\phi(0) = 0$, is applied, the solution is $\phi(t) = \frac{u_D}{s_D}\left(1 - e^{-s_D t}\right)$. This solution is plotted in Figure 4.1A for two different values of $s_D$; note that as $t \to \infty$, $\phi \to \frac{u_D}{s_D} = \hat{\phi}$. Given that the time between two substitutions is $t_s$, and taking this time to be its expectation, $1/\sigma$, the effective mutation-selection balance during that time is

$$\phi_e = \sigma \int_{0}^{1/\sigma} \phi(t)\, dt = \frac{u_D}{s_D} \left[ 1 - \frac{\sigma}{s_D} \left( 1 - e^{-s_D/\sigma} \right) \right], \tag{87}$$

where $\sigma$ is the rate of adaptive substitutions as determined by Gerrish & Lenski (1997):

$$\sigma = \alpha\, u_B N \int_{0}^{\infty} \pi(s) \exp\left\{ -\frac{u_B}{s}\, N \ln(N)\, e^{-\alpha s}\, \pi\!\left(s + \tfrac{1}{\alpha}\right) - \alpha s \right\} ds. \tag{88}$$

This expression makes the assumption that selection coefficients for beneficial mutations are exponentially distributed with parameter $\alpha$. The function $\pi(s)$ describes the probability that a beneficial mutation of selective advantage, $s$, is not lost by drift. Assuming constant total population size and Poisson-distributed offspring, this function is approximately $\pi(s) \approx 2s$ (Haldane, 1927). For bacteria, which reproduce by binary fission, $\pi(s) \approx 4s$ (Gerrish & Lenski, 1998).

APPENDIX 4.2 EFFECTIVE POPULATION NUMBER UNDER A SERIAL TRANSFER REGIME

I derive the effective population number with respect to non-neutral mutations. First, I find the probability of fixation [1] of a beneficial mutation in a population of constant number. Then, I find this probability for a population that is subject to periodic bottlenecks and grows exponentially between these bottlenecks, as in a serial transfer regime. Equating these two probabilities gives an expression for effective population number. Conveniently, the same equations apply for deleterious mutations as well, implying that this effective population number is general for non-neutral mutations.

[1] By probability of fixation, we mean the probability that the mutant gene in question is not lost by drift. In an asexual system, the mutant gene may be lost as a result of competition with alternative beneficial mutant genes (Haigh, 1978; Gerrish & Lenski, 1997); such competition, however, becomes important only when frequencies become relatively high, at which point stochastic effects are negligible. Because effective population size is a stochastic equivalent, calculations here do not take such competition between beneficial mutations into account.

Employing a diffusion approximation, the Kolmogorov backward equation is solved to find the ultimate probability of fixation $u(p)$ given starting frequency $p$. (The Kolmogorov backward equation is derived in Chapter 8 of Crow & Kimura (1970).) Let $u(p,t)$ denote the probability that a mutant gene becomes fixed by the $t$th generation, given that its starting frequency is $p$. Then, the diffusion approximation for this probability is given by

$$\frac{\partial u(p,t)}{\partial t} = \mu\, \frac{\partial u(p,t)}{\partial p} + \frac{\sigma^{2}}{2}\, \frac{\partial^{2} u(p,t)}{\partial p^{2}}. \tag{89}$$

We are interested in the ultimate probability of fixation (the probability that the mutant gene ever achieves fixation), given by $u(p) = \lim_{t \to \infty} u(p,t)$, for which $\partial u / \partial t = 0$ and which therefore satisfies

$$\mu\, \frac{du(p)}{dp} + \frac{\sigma^{2}}{2}\, \frac{d^{2} u(p)}{dp^{2}} = 0, \tag{90}$$

with boundary conditions, $u(0) = 0$ and $u(1) = 1$. The solution which satisfies these boundary conditions is

$$u(p) = \frac{\int_{0}^{p} G(x)\, dx}{\int_{0}^{1} G(x)\, dx}, \tag{91}$$

where $G(x) = \exp\left\{ -\int \frac{2\mu}{\sigma^{2}}\, dx \right\}$. Let $x_t$ denote the frequency of the mutant gene at time $t$, and let $\delta x_t$ denote the change in $x_t$ between generations $t$ and $t+1$, i.e., $x_{t+1} = x_t + \delta x_t$. Then $\mu$ and $\sigma^{2}$ may be understood as the expectations $E(\delta x_t)$ and $E[(\delta x_t)^{2}]$, respectively. Here $\mu$ and $\sigma^{2}$ denote the mean and variance of the change in frequency, not a mutation rate or a substitution rate. Given that the system is asexual and that the mutant gene in question has selective advantage $s$, the change in mean frequency is simply $\mu = N_e x s / N_e = s x$.

To calculate $\sigma^{2}$, consideration must be given to both the mode of reproduction as well as the predominant mode of selection. At this point, I restrict the derivation to the case of reproduction by binary fission; conveniently, however, the result approximates that for other modes of reproduction, implying generality.
If selection occurs mainly by differential growth, then the change in variance is $\sigma^{2} = 2 N_e x (1+s) \cdot \frac{1}{2}\left(1 - \frac{1}{2}\right) / N_e^{2} = \frac{x(1+s)}{2 N_e}$, the binomial variance in which $2 N_e x (1+s)$ offspring are sampled with probability $\frac{1}{2}$. If selection occurs mainly by differential death (or differential sampling), then the change in variance is $\sigma^{2} = 2 N_e x \left(\frac{1+s}{2}\right)\left(\frac{1-s}{2}\right) / N_e^{2} = \frac{x(1-s^{2})}{2 N_e}$, the binomial variance in which $2 N_e x$ offspring are sampled with probability $(1+s)/2$. Thus, $\frac{2\mu}{\sigma^{2}} = 4 N_e f(s)$, where $f(s) = \frac{s}{1+s}$ for differential growth or $f(s) = \frac{s}{1-s^{2}}$ for differential death, and (91) becomes

$$u(p) = \frac{1 - \exp\{-4 N_e f(s)\, p\}}{1 - \exp\{-4 N_e f(s)\}}. \tag{92}$$

Given ploidy number, $\phi$, the starting frequency of an individual mutant gene is $p = 1/(\phi N)$, where $N$ is the actual number of individuals present in the population at the time of mutation. When this starting frequency is inserted into (92), the probability of fixation of an individual mutant gene is closely approximated by

$$u \approx \frac{4 s N_e}{\phi N}, \tag{93}$$

regardless of the choice of $f(s)$. (The sexual case, assuming Poisson-distributed offspring, yields the same approximation; see Crow & Kimura, 1970.) Rearrangement of this equation gives an expression for effective population number, $N_e = \frac{u \phi N}{4 s}$.

If the actual population number, $N_t$, fluctuates over time, as in a serial transfer regime, then the effective population number will also fluctuate. To leave the effective population number as a function of time, however, would defeat the purpose of having an effective population number, which is to simplify the math. Thus, we employ the geometric mean effective population number, $\ln(\bar{N}_e) = \frac{1}{\tau} \int_{0}^{\tau} \ln\left( \frac{u \phi N_t}{4 s} \right) dt$, where $\tau$ is the period of time between serial transfers.

If the population grows exponentially between transfers, then $N_t = N_0 e^{rt}$, where $r$ is the exponential growth parameter. For continuous growth of bacteria, $r = 1$; for discrete growth of bacteria, $r = \ln 2$. It remains to derive the probability of fixation, $u$.
Let $v$ denote the probability that the beneficial mutation in question does not survive the effects of random sampling, such that $u = 1 - v$. The number of offspring of the beneficial mutant that make it through the first population bottleneck (due to the first transfer dilution) is a Poisson random variable. The Poisson parameter of this random variable is the product of (i) the number of offspring of the beneficial mutant just before dilution, and (ii) the dilution factor, $D$. Given that the beneficial mutant appears at time $t$, factor (i) is $e^{r(1+s)(\tau - t)}$. Thus, the Poisson parameter for the number of offspring of the beneficial mutant that are sampled at the first dilution is $e^{r(1+s)(\tau - t)} D$. For convenience, I break this parameter into two factors, $A = e^{r(1+s)\tau} D$ and $\gamma_t = e^{-r(1+s)t}$.

The total probability of loss of the beneficial mutation is then given by the following logic. Either zero offspring are transferred at the first dilution, or one offspring is transferred and lost in subsequent dilutions, or two offspring are transferred and both lineages are lost in subsequent dilutions, and so on. The probability that zero offspring are transferred at the first dilution is $e^{-A\gamma_t}$. The probability that one offspring is transferred at the first dilution is $A\gamma_t e^{-A\gamma_t}$. Let $x$ denote the probability that one lineage is lost in subsequent dilutions. Then the probability that one offspring is transferred at the first dilution and its lineage is lost in subsequent dilutions is $A\gamma_t e^{-A\gamma_t} x$. Likewise, the probability that two offspring are transferred at the first dilution and both lineages are lost in subsequent dilutions is $\tfrac{1}{2}(A\gamma_t)^2 e^{-A\gamma_t} x^2$. The same logic applies for any number of offspring transferred at the first dilution. Thus the total probability of loss is given by the sum,

\[ v = \sum_{i=0}^{\infty} \frac{(A\gamma_t)^i}{i!} e^{-A\gamma_t} x^i = e^{A\gamma_t (x-1)} . \]
(94)

The remaining unknown is $x$, the probability that a single lineage is lost in subsequent dilutions.
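The sum in (94) can be checked numerically for any trial value of $x$. The sketch below compares a truncated series against the closed form; the parameter values (a 100-fold dilution, $s = 0.1$, time measured in generations) and the trial value of $x$ are hypothetical, and the function names are introduced here only for illustration.

```python
import math

def poisson_factor_A(r, s, tau, D):
    """A = exp(r (1+s) tau) * D, the time-independent factor."""
    return math.exp(r * (1.0 + s) * tau) * D

def gamma_t(r, s, t):
    """gamma_t = exp(-r (1+s) t), the factor set by the time of mutation."""
    return math.exp(-r * (1.0 + s) * t)

def loss_probability_series(A, g, x, terms=200):
    """Left-hand side of (94): Poisson-weighted sum of x**i, truncated."""
    lam = A * g
    term = math.exp(-lam)          # i = 0 term
    total = term
    for i in range(1, terms):
        term *= lam * x / i        # builds (lam*x)**i / i! * exp(-lam)
        total += term
    return total

def loss_probability_closed(A, g, x):
    """Right-hand side of (94): exp(A * gamma_t * (x - 1))."""
    return math.exp(A * g * (x - 1.0))

# Hypothetical values: 100-fold dilution, time in generations (r = ln 2)
r, s, D = math.log(2.0), 0.1, 0.01
tau = math.log2(100.0)             # generations needed to regrow 100-fold
A = poisson_factor_A(r, s, tau, D)
g = gamma_t(r, s, t=2.0)           # mutant arising 2 generations after transfer
x_trial = 0.5                      # placeholder; x has not yet been determined
v_series = loss_probability_series(A, g, x_trial)
v_closed = loss_probability_closed(A, g, x_trial)
```

Setting `x_trial` to 1 makes every transferred lineage certain to be lost, so both expressions return 1, which is a quick sanity check on the bookkeeping.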
If one beneficial mutant is transferred at the first dilution, then the number of its offspring present just before the next dilution is $e^{r(1+s)\tau}$, such that the Poisson parameter for the number of its offspring that make it through that dilution is $e^{r(1+s)\tau} D$, or simply $A$. Each of those offspring that make it through that dilution will have the same Poisson distribution of offspring after the next dilution, etc. This is called a branching process (see p. 58 in Bailey, 1964). A result of branching process theory is that the probability of extinction is given by the smallest positive root of the equation $g(x) = x$, where $g(x)$ is the probability generating function for the number of offspring produced by an individual in one "generation". If we define one "generation" as starting just after one dilution and ending just after the next, then the probability generating function for the number of offspring left by one individual after one "generation" is

\[ g(x) = \sum_{i=0}^{\infty} \frac{A^i}{i!} e^{-A} x^i = e^{A(x-1)} . \]

Thus, the lineage started by one beneficial mutant transferred at the first dilution is lost in a subsequent dilution with probability $x$, the smallest positive root of the equation $e^{A(x-1)} = x$. When this probability is determined, then the total probability of survival may be calculated: $u = 1 - v = 1 - e^{A\gamma_t(x-1)} = 1 - x^{\gamma_t}$. From here, the effective population number may be calculated:

\[ \bar{N}_e = \exp\left\{ \frac{1}{\tau} \int_0^\tau \ln\left[ \frac{\phi N_0 e^{rt}}{4s} \left( 1 - x^{\gamma_t} \right) \right] dt \right\} . \]
(95)

When $|A\gamma_t(x-1)|$ is small, then the following approximations are satisfactory for intermediate values of $s$. The probability of survival may be approximated by $u = 1 - e^{A\gamma_t(x-1)} \approx A\gamma_t(1-x)$ (from a first order expansion), where $x$ is approximately $x \approx 1 - 2rs\tau$ (from a second order expansion of the equation $e^{A(x-1)} = x$). Insertion of these two approximations into (95) yields the following simplified expression:

\[ \bar{N}_e \approx \tfrac{1}{2} \phi N_0 r \tau . \]
(96)

If $t$ is given in generations, this expression is further simplified (with $r = \ln 2$, so that $r/2 \approx 1/3$):

\[ \bar{N}_e \approx \tfrac{1}{3} \phi N_0 \tau . \]
(97)

LIST OF REFERENCES

Bailey, N.T.J., 1964. The Elements of Stochastic Processes. John Wiley & Sons, Inc., New York.
Barton, N.H., 1993. The probability of fixation of a favoured allele in a subdivided population. Genet. Res. 62: 149-157.
Barton, N.H., 1994. The reduction in fixation probability caused by substitutions at linked loci. Genet. Res. 64: 199-208.
Barton, N.H., 1995. Linkage and the limits to natural selection. Genetics 140: 821-841.
Berg, O.G., 1995. Periodic selection and hitchhiking in a bacterial population. J. Theor. Biol. 173: 307-320.
Bhatnagar, S.K. & M.J. Bessman, 1988. Studies on the mutator gene, mutT, of Escherichia coli. Molecular cloning of the gene, purification of the gene product, and identification of a novel nucleoside triphosphatase. J. Biol. Chem. 263: 8953-8957.
Boucher, C.A.B., E. O'Sullivan, J.W. Mulder, C. Ramautarsing, P. Kellam, G. Darby, J.M.A. Lange, J. Goudsmit & B.A. Larder, 1992. Ordered appearance of zidovudine resistance mutations during treatment of 18 human immunodeficiency virus-positive subjects. Journal of Infectious Diseases 165: 105-110.
Carlton, B.C. & B.J. Brown, 1981. Manual of Methods for General Bacteriology, ed. P. Gerhardt. American Society for Microbiology, Washington, D.C., p. 222-242.
Chao, L. & E.C. Cox, 1983. Competition between high and low mutating strains of Escherichia coli. Evolution 37: 125-134.
Coffin, J.M., 1995. HIV population dynamics in vivo: implications for genetic variation, pathogenesis, and therapy. Science 267: 483-489.
Cox, E.C., 1976. Bacterial mutator genes and the control of spontaneous mutation. Annu. Rev. Genet. 10: 135-156.
Cox, E.C. & T.C. Gibson, 1974. Selection for high mutation rates in chemostats. Genetics 77: 169-84.
Crow, J.F. & M. Kimura, 1965. Evolution in sexual and asexual populations. Am. Nat. 99: 439-450.
Crow, J.F. & M. Kimura, 1970. An Introduction to Population Genetics Theory. New York: Harper & Row.
Cunningham, C.W., K. Jeng, J. Husti, M. Badgett, I.J. Molineux, D.M. Hillis & J.J. Bull, 1997. Parallel molecular evolution of deletions and nonsense mutations in bacteriophage T7. Molecular Biology & Evolution 14: 113-116.
Drake, J.W., 1991. A constant rate of spontaneous mutation in DNA-based microbes. Proc. Natl. Acad. Sci. USA 88: 7160-7164.
Drake, J.W., 1991. Spontaneous mutation. Annu. Rev. Genet. 25: 125-46.
Elena, S.F., L. Ekunwe, N. Hajela, S.A. Oden & R.E. Lenski, 1998. Distribution of fitness effects caused by random insertion mutations in Escherichia coli. Genetica, in press.
Elena, S.F., V.S. Cooper & R.E. Lenski, 1996. Punctuated evolution caused by selection of rare beneficial mutations. Science 272: 1802-1804.
Ewens, W.J., 1979. Mathematical Population Genetics. New York: Springer-Verlag.
Ewens, W.J., 1969. Population Genetics. London: Methuen Press.
Feller, W., 1968. An Introduction to Probability Theory and Its Application. New York: John Wiley & Sons.
Felsenstein, J., 1988. Sex and the evolution of recombination, pp. 74-86 in The Evolution of Sex, edited by R.E. Michod and B.R. Levin. Sunderland, Mass.: Sinauer Associates.
Felsenstein, J., 1974. The evolutionary advantage of recombination. Genetics 78: 737-756.
Fisher, R.A., 1930. The Genetical Theory of Natural Selection. Oxford: Oxford Univ. Press.
Gerrish, P.J. & R.E. Lenski, 1998. The fate of competing beneficial mutations in an asexual population. Genetica (in press).
Gillespie, J.H., 1991. The Causes of Molecular Evolution. Oxford: Oxford Univ. Press.
Gillespie, J.H., 1981. Mutation rate modification in a random environment. Evolution 35: 468-476.
Gurland, J., 1958. Biometrics 14: 229-249.
Gurland, J., 1963. A method of estimation for some generalized Poisson distributions. International Symposium on Classical and Contagious Discrete Distributions, McGill, Montreal.
Haigh, J., 1978. The accumulation of deleterious genes in a population -- Muller's ratchet. Theor. Pop. Biol. 14: 251-267.
Haldane, J.B.S., 1927. The mathematical theory of natural and artificial selection. Proc. Camb. Phil. Soc. 23: 838-844.
Hall, B.G., 1988. Adaptive evolution that requires multiple spontaneous mutations. I. Mutations involving an insertion sequence. Genetics 120: 887-897.
Ho, D.D., T. Moudgil & M. Alam, 1989. Quantitation of human immunodeficiency virus type 1 in the blood of infected persons. New England Journal of Medicine 321: 1621-1625.
Hogg, R.V. & A.T. Craig, 1995. Introduction to mathematical statistics. New Jersey: Prentice Hall.
Holmes, E.C., L.Q. Zhang, P. Simmonds, C.A. Ludlam & A.J.L. Brown, 1992. Convergent and divergent sequence evolution in the surface envelope glycoprotein of human immunodeficiency virus type 1 within a single infected patient. Proc. Natl. Acad. Sci. USA 89: 4835-4839.
Horiuchi, T., H. Maki, M. Maruyama & M. Sekiguchi, 1981. Identification of the dnaQ gene product and location of the structural gene for RNAse H of Escherichia coli by cloning of the genes. Proc. Natl. Acad. Sci. USA 78: 3770-3774.
Ishii, K., H. Matsuda, Y. Iwasa & A. Sasaki, 1989. Evolutionary stable mutation rate in a periodically changing environment. Genetics 121: 163-174.
Jaeger, G. & S. Sarkar, 1995. On the distribution of bacterial mutants: the effects of differential fitness of mutants and non-mutants. Genetica 96: 217-223.
Johnson, P., R.E. Lenski & F. Hoppensteadt, 1995. Theoretical analysis of divergence in mean fitness between initially identical populations. Proceedings of the Royal Society, London B 259: 125-130.
Kaplan, N.L., R.R. Hudson & C.H. Langley, 1989. The "hitchhiking" effect revisited. Genetics 123: 887-889.
Keightley, P.D., 1991. Genetic variance and fixation probabilities at quantitative trait loci in mutation-selection balance. Genet. Res. 58: 139-144.
Kimura, M., 1960. Optimum mutation rate and degree of dominance as determined by the principle of minimum genetic load. J. Genet. 57: 21-34.
Kimura, M., 1967. On the evolutionary adjustment of spontaneous mutation rates. Genet. Res. 9: 23-34.
Kimura, M., 1979. Model of effectively neutral mutations in which selective constraint is incorporated. Proc. Natl. Acad. Sci. USA 76: 3440-3444.
Kuhner, M.K., J. Yamato & J. Felsenstein, 1995. Estimating effective population size and mutation rate from sequence data using Metropolis-Hastings sampling. Genetics 140: 1421-1430.
Lea, D.E. & C.A. Coulson, 1949. The distribution of the numbers of mutants in bacterial populations. J. Genetics 49: 264-285.
LeClerc, J.E., B. Li, W.L. Payne & T. Cebula, 1996. High mutation frequencies among Escherichia coli and Salmonella pathogens. Science 274: 1208-1211.
Lederberg, S., 1966. Genetics of host-controlled restriction and modification of deoxyribonucleic acid in Escherichia coli. J. Bacteriol. 91: 1029-1036.
Leigh, E.G., 1970. Natural selection and mutability. Am. Nat. 104: 301-305.
Leigh, E.G., 1973. The evolution of mutation rates. Genetics 73 (suppl.): s1-s18.
Leigh Brown, A.J. & D.D. Richman, 1997. HIV-1: gambling on the evolution of drug resistance? Nature Medicine 3: 268-271.
Leigh Brown, A.J., 1997. Analysis of HIV-1 env gene sequences reveals evidence for a low effective number in the viral population. Proc. Natl. Acad. Sci. USA 94: 1862-1865.
Lenski, R.E. & M. Travisano, 1994. Dynamics of adaptation and diversification: a 10,000-generation experiment with bacterial populations. Proc. Natl. Acad. Sci. USA 91: 6808-6814.
Lenski, R.E., M.R. Rose, S.C. Simpson & S.C. Tadler, 1991. Long-term experimental evolution in Escherichia coli. I. Adaptation and divergence during 2000 generations. Am. Nat. 138: 1315-1341.
Luria, S.E. & M. Delbrück, 1943. Mutations of bacteria from virus sensitivity to virus resistance. Genetics 28: 491-511.
Ma, W.T., G.H. Sandri & S. Sarkar, 1992. Analysis of the Luria-Delbruck distribution using discrete convolution powers. J. Appl. Prob. 29: 255-267.
Manning, J.T. & D.J. Thompson, 1984. Muller's ratchet accumulation of favourable mutations. Acta Biotheor. 33: 219-225.
Mansky, L.M. & H.M. Temin, 1995. Lower in vivo mutation rate of human immunodeficiency virus type 1 than that predicted from the fidelity of purified reverse transcriptase. J. Virol. 69: 5087-5094.
Mao, E.F., L. Lane, J. Lee & J.H. Miller, 1997. Proliferation of mutators in a cell population. J. Bacteriol. 179: 417-422.
Mascolini, M., 1997. Of clades and quasispecies: making sense of the HIV population census. Journal of the International Association of Physicians in AIDS Care 3: 13-25.
Maynard Smith, J. & J. Haigh, 1974. The hitch-hiking effect of a favourable gene. Genet. Res. 23: 23-35.
Maynard Smith, J., 1968. Evolution in sexual and asexual populations. Am. Nat. 102: 469-473.
Miller, J.H., 1992. A Short Course in Bacterial Genetics. Cold Spring Harbor Lab. Press, Plainview, NY.
Miller, J.H., 1996. Spontaneous mutators in bacteria: insights into pathways of mutagenesis and repair. Annu. Rev. Microbiol. 50: 625-643.
Mittler, J.E. & R.E. Lenski, 1992. Experimental evidence for an alternative to directed mutation in the bgl operon. Nature 356: 446-448.
Modrich, P., 1995. Mismatch repair, genetic stability and tumour avoidance. Phil. Trans. Roy. Soc. Lond. Ser. B. 347: 89-95.
Modrich, P., 1991. Mechanisms and biological effects of mismatch repair. Ann. Rev. Genet. 25: 229-253.
Moxon, E.R., P.B. Rainey, M.A. Nowak & R.E. Lenski, 1994. Adaptive evolution of highly mutable loci in pathogenic bacteria. Current Biology 4: 24-33.
Muller, H.J., 1932. Some genetic aspects of sex. Am. Nat. 8: 118-138.
Muller, H.J., 1964. The relation of recombination to mutational advance. Mutat. Res. 1: 2-9.
Nowell, P.C., 1974. The clonal evolution of tumor cell populations. Science 194: 23-28.
Otto, S.P. & M.C. Whitlock, 1997. The probability of fixation in populations of changing size. Genetics 146: 723-733.
Pamilo, P., M. Nei & W. Li, 1987. Accumulation of mutations in sexual and asexual populations. Genet. Res. 49: 135-146.
Pang, P.P., A.S. Lundberg & G.C. Walker, 1985. Identification and characterization of the mutL and mutS gene products of Salmonella typhimurium LT2. J. Bacteriol. 163: 1007-1015.
Peck, J.R., 1994. A ruby in the rubbish: beneficial mutations, deleterious mutations and the evolution of sex. Genetics 137: 597-606.
Peck, J.R., G. Barreau & S.C. Heath, 1997. Imperfect genes, Fisherian mutation and the evolution of sex. Genetics 145: 1171-1199.
Piatak, M., M.S. Saag, L.C. Yang, S.J. Clark, J.C. Kappes, K.C. Luk, B.H. Hahn, G.M. Shaw & J.D. Lifson, 1993. High levels of HIV-1 in plasma during all stages of infection determined by competitive PCR. Science 259: 1749-1755.
Sambrook, J., E.F. Fritsch & T. Maniatis, 1989. Molecular Cloning: A Laboratory Manual, 2nd Ed. Cold Spring Harbor Laboratory Press, Plainview, NY.
Sarkar, S., 1991. Haldane's solution of the Luria-Delbruck distribution. Genetics 127: 257-261.
Sarkar, S., W.T. Ma & G.v.H. Sandri, 1992. On fluctuation analysis: a new, simple and efficient method for computing the expected number of mutants. Genetica 85: 173-179.
Schaaper, R.M. & R. Cornacchio, 1992. An Escherichia coli dnaE mutation with suppressor activity toward mutator mutD5. J. Bacteriol. 174: 1974-1982.
Sniegowski, P.D., P.J. Gerrish & R.E. Lenski, 1997. Evolution of high mutation rates in experimental populations of E. coli. Nature 387: 703-705.
Stewart, F.M., D.M. Gordon & B.R. Levin, 1990. Fluctuation analysis: the probability distribution of the number of mutants under different conditions. Genetics 124: 175-185.
Stewart, F.M., 1994. Fluctuation tests: how reliable are the estimates of mutation rates? Genetics 137: 1139-1146.
Taddei, F., M. Radman, J. Maynard Smith, B. Toupance, P.H. Gouyon & B. Godelle, 1997. Role of mutator alleles in adaptive evolution. Nature 387: 700-702.
Taucher-Scholz, G. & H. Hoffman-Berling, 1983. Identification of the gene for DNA helicase II of Escherichia coli. Eur. J. Biochem. 137: 573-580.
Travisano, M. & R.E. Lenski, 1996. Long-term experimental evolution in Escherichia coli. IV. Targets of selection and the specificity of adaptation. Genetics 143: 15-26.
Tröbner, W. & R. Piechocki, 1984. Competition between isogenic mutS and mut+ populations of Escherichia coli K12 in continuously growing cultures. Mol. Gen. Genet. 198: 175-176.
Vasi, F., M. Travisano & R.E. Lenski, 1994. Long-term experimental evolution in Escherichia coli. II. Changes in life-history traits during adaptation to a seasonal environment. Am. Nat. 144: 432-456.
Wright, S., 1982. Character change, speciation, and the higher taxa. Evolution 36: 427-443.