"a ..‘ yin—«‘4 wu ~ ‘1‘ ; L x .. '4 ' 1‘" if?" (I 1' Ia." . .f‘ a" .4- 3: , £353 :2. 53a“ ‘” ~’ "3131.1 '4. m. , 2% __ w -v %y t. n . -5... fig” 1 . 1‘! 3 v. a... s ‘ #233? 1331’! ~¥ 1 5x512- u.- $3.: mflw” ’ ,. : a I 3‘ i3. 2 ‘ L3,; «2" .n: #007 This is to certify that the dissertation entitled COMPARATIVE MOLECULAR EVOLUTIONARY ANALYSIS OF VIRULENCE LOCI IN PATHOGENIC ESCHERICHIA COLI presented by David William Lacher has been accepted towards fulfillment of the requirements for the Ph.D. degree in Genetics Major Professor’s Signature flw‘I [(9)2001 Date MSU is an affirmative-action, equal-opportunity employer -4 ...-.-.—.-.---.-.-.—.--.-.-.-.-«-.—--.-.—-.-.-.-.-.—,—.—.-.-.-.-.-.-»-.-.---.-.-.-.-.—.-.-.- _._,_,_,- -.-.-c- LIBRARY Michigan State UHI‘VBI‘EQW PLACE IN RETURN BOX to remove this checkout from your record. To AVOID FINES return on or before date due. MAY BE RECALLED with earlier due date if requested. DATE DUE DATE DUE DATE DUE 6/07 p:/CIRC/DateDue.indd-p.1 COMPARATIVE MOLECULAR EVOLUTIONARY ANALYSIS OF VIRULENCE LOCI IN PATHOGENIC ESCHERICHIA COLI By David William Lacher A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Graduate Program in Genetics 2007 ABSTRACT COMPARATIVE MOLECULAR EVOLUTIONARY ANALYSIS OF VIRULENCE LOCI IN PATHOGENIC ESCHERICHIA COLI By David William Lacher Escherichia coli is a diverse species of Gram-negative bacteria, some strains of which are pathogenic. The early stages of E. coli pathogenesis often involve bacterial attachment mediated by the expression of surface proteins. It has been hypothesized that pathogens alter their surface proteins in order to evade detection by their host's immune system. Therefore, it is likely that natural selection is acting to generate new allelic variants. It is the goal of this research to examine the allelic diversity of genes that encode a variety of surface structures in different classes of pathogenic E. 
coli (pathotypes). The specific aims are to: 1) develop a method to quickly and accurately subtype a highly polymorphic locus responsible for the hallmark phenotype of the attaching and effacing E. coli, 2) characterize enteropathogenic E. coli (EPEC) through multilocus sequence typing (MLST) and restriction fragment length polymorphism (RFLP) analyses, 3) assess the level of genetic polymorphism in a region of the operon encoding the type 1 fimbriae of E. coli, and 4) examine various genes encoding surface structures for the actions of positive selection and recombination. To address specific aim 1, a new method to quickly and accurately type the eae locus was developed. This new technique addresses the limitations of existing typing schemes and was applied to a set of E. coli capable of the attaching and effacing phenotype. For specific aim 2, a system was designed to detect and identify the alleles of three EPEC virulence genes. The distribution of these virulence gene alleles was assessed in a collection of strains representing a variety of EPEC serotypes and then compared to a phylogenetic framework generated from MLST analysis of conserved housekeeping loci. To address specific aim 3, a segment of DNA encompassing a regulatory region of the type 1 fimbrial operon was sequenced in various types of pathogenic E. coli. This region contains an invertible genetic element responsible for the phase variability of the type 1 fimbriae and has been shown to be inactive in some strains. For specific aim 4, a collection of allelic sequences was assembled for genes encoding five different surface proteins from several E. coli pathotypes. These genes were analyzed for evidence of positive selection and homologous recombination. The results of this work will give us a better understanding of how different types of pathogenic E. coli have evolved and may have important public health implications. 
Copyright by DAVID WILLIAM LACHER 2007

ACKNOWLEDGEMENTS

I wish to thank my mentor, Thomas S. Whittam, for his guidance and support. I thank my committee members, Michael Bagdasarian, Richard Lenski, and Vincent Young, for their suggestions and guidance. I would also like to thank my fellow lab members for providing suggestions and support. I thank Galeb Abu-Ali, Teresa Bergholz, Alyssa Bumbaugh, Susan Francisco, Sara Kienzle, Shannon Manning, Adam Nelson, Suzanne Nelson, Lindsey Ouellette, Weihong Qi, James Riordan, Hans Steinsland, Cheryl Tarr, Sivapriya Kailasan Vanajaa, Seth Walk, and Lukas Wick.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

Chapter 1: Literature Review
    NATURAL SELECTION AND RECOMBINATION IN GENE EVOLUTION
    ATTACHING AND EFFACING E. COLI
    ENTEROPATHOGENIC E. COLI
    SHIGA TOXIN-PRODUCING E. COLI
    O157:H7 AND TYPE 1 FIMBRIAE
    CURLI FIMBRIAE
    PURPOSE

Chapter 2: Allelic subtyping of the intimin locus (eae) of pathogenic Escherichia coli by fluorescent RFLP
    SUMMARY
    INTRODUCTION
    MATERIALS AND METHODS
        Bacterial strains and DNA isolation
        PCR and eae-escD primer design
        RFLP development
        PCR clean-up and restriction enzyme digestion
        fRFLP
        DNA sequencing
    RESULTS
    DISCUSSION
    ACKNOWLEDGEMENTS

Chapter 3: Molecular evolution of typical enteropathogenic Escherichia coli
    SUMMARY
    INTRODUCTION
    MATERIALS AND METHODS
        Strains
        MLST
        eae RFLP
        bfpA PCR and DNA sequencing
        bfpA PCR and RFLP
        perA PCR and DNA sequencing
        perA PCR and RFLP
        fliC typing
        Phylogenetic analyses
    RESULTS
        MLST analysis
        Allelic variation in eae
        Allelic variation in bfpA
        Allelic variation in perA
        Association between EAF types and STs
    DISCUSSION
        Common EPEC clones
        Other EPEC clones
        Virulence gene distribution
        bfpA-negative EPEC
        Relationship between typical and atypical EPEC
    ACKNOWLEDGEMENTS

Chapter 4: Sequence variation within the type 1 fimbrial phase switch of pathogenic Escherichia coli
    SUMMARY
    INTRODUCTION
    MATERIALS AND METHODS
        Strains
        MLST
        fliC typing
        PCR
        DNA sequencing
        Phylogenetic analyses
    RESULTS
        Variation within the fim switch
        Clonal analysis
    DISCUSSION
        Sequence polymorphism within the fim switch
        Stepwise evolution of O157:H7 and the fim switch deletion
    ACKNOWLEDGEMENTS

Chapter 5: Positive selection and recombination in surface protein-encoding genes
    SUMMARY
    INTRODUCTION
    MATERIALS AND METHODS
        Strains
        MLST
        fliC typing
        PCR primer design
        PCR
        DNA sequencing
        Phylogenetic analyses
        Recombination analyses
        Selection analyses
    RESULTS
        MLST analysis
        Allelic variation in bfpA
        Allelic variation in csgA
        Allelic variation in eae
        Intracellular region of eae
        Extracellular region of eae
        Allelic variation in espA
        Allelic variation in fimA
        Selection and hydrophobicity
    DISCUSSION
        bfpA
        csgA
        eae
        espA
        fimA
    ACKNOWLEDGEMENTS

Chapter 6: Summary and Synthesis
    COMPATIBILITY ANALYSIS COMBINED WITH PHYLOGENETIC NETWORKS
    INFERRED EVOLUTION OF EAF PLASMIDS AND EPEC CLONES
    FUTURE CONSIDERATIONS

References

LIST OF TABLES

Table 1. MseI digestion of eae-escD PCR product
Table 2. AseI / DdeI / SalI digestion of eae-escD PCR product
Table 3. Allelic variation in eae based on RFLP
Table 4. Reference strains for defined intimin alleles and patterns determined by fRFLP and confirmed by DNA sequencing
Table 5. Summary of EPEC strains investigated
Table 6. Sequence variation among alleles of 7 MLST genes from EPEC strains
Table 7. Expected restriction fragment length polymorphisms of bfpA PCR amplicons
Table 8. Expected restriction fragment length polymorphisms of perA PCR amplicons
Table 9. EAF plasmid types and distribution among EPEC clones
Table 10. Summary of strains investigated for fim switch sequence variation
Table 11. Variation among alleles of 7 MLST genes in the fim switch strains
Table 12. Summary of strains investigated for allelic variation in at least one of 5 surface protein-encoding genes
Table 13. Sequence variation among alleles of 7 MLST genes
Table 14. Summary of recombination and selection analyses for 7 MLST genes
Table 15. Summary of recombination and selection analyses for 5 surface protein-encoding genes
Table 16. Summary of combined compatibility analyses
Table 17. EAF plasmid changes within four common EPEC clonal groups

LIST OF FIGURES

Figure 1. Location of PCR primers and predicted MseI restriction sites for 4 major intimin alleles associated with human disease
Figure 2. Fragment sizes of DNA resulting from in silico MseI digestion of the eae-escD amplicon
Figure 3. Accuracy of fragment sizes estimated by RFLP
Figure 4. Phylogenetic relationships of 21 EPEC sequence types
Figure 5. Phylogenetic network of 21 EPEC sequence types
Figure 6. Eleven bfpA alleles cluster into two major groups
Figure 7. Twenty perA alleles cluster into four major groups
Figure 8. Nucleotide polymorphism within the fim switch
Figure 9. Phylogenetic relationships of 41 observed sequence types
Figure 10. Phylogenetic network of 18 genotypes belonging to the EHEC 1 clonal group
Figure 11. Revised stepwise evolution model of E. coli O157:H7
Figure 12. Distribution of allelic data among sequence types
Figure 13. Selection and recombination in bfpA
Figure 14. Selection and recombination in csgA
Figure 15. Selection and recombination in the periplasmic and transmembrane domains of eae
Figure 16. Selection and recombination in the extracellular domains of eae
Figure 17. Selection in espA
Figure 18. Selection and recombination in fimA
Figure 19. Phylogenetic network for csgA and its associated sequence types
Figure 20. Phylogenetic network for fimA and its associated sequence types
Figure 21. Conflicting evolutionary histories for two regions of eae
Figure 22. Phylogenetic network for eae
Figure 23. Phylogenetic networks for espA combined with eae
CHAPTER 1

LITERATURE REVIEW

Escherichia coli is a diverse species of Gram-negative bacteria. Most strains are nonpathogenic and do not harm their host, but some may cause a variety of harmful intestinal and extra-intestinal infections. A common theme in the pathogenesis of the different types of E. coli is bacterial attachment mediated by the expression of surface proteins. A wide range of surface proteins are expressed by these strains, some of which are ubiquitous while others are specific to certain pathogenic types (pathotypes). It has been hypothesized that these surface proteins must alter their three-dimensional structure or surface epitopes in order to confer an evolutionary advantage and evade detection by antibodies generated from previous antigen exposure. Therefore, it is likely that natural selection drives repeated amino acid replacements in surface proteins by acting upon the gene or genes that encode them.

Natural selection and recombination in gene evolution.

Natural selection is the evolutionary mechanism by which the relative genotype frequencies in a population change according to their relative fitness. There are two types of natural selection, positive and negative. These types of selection can be detected at the DNA sequence level by comparing the rates of synonymous (silent) and nonsynonymous (amino acid changing) nucleotide substitutions. According to the neutral theory of molecular evolution (68), synonymous and nonsynonymous substitutions should accumulate at approximately the same rate in the absence of selection. Most coding genes show an excess of synonymous substitutions (19). This is indicative of negative selection acting to maintain the existing function and structure of the encoded protein. Under this type of selective pressure, mutants have a lower fitness than their parental genotype(s) and their frequency decreases in the following generations.
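The distinction between synonymous and nonsynonymous changes described above can be illustrated with a minimal Python sketch. It is not one of the estimation methods cited in this chapter: it only tallies the codons that differ between two aligned, in-frame sequences and checks whether the encoded amino acid changes, without normalizing by the number of synonymous and nonsynonymous sites. The function name and the two short sequences are hypothetical, and the sketch assumes the standard genetic code and sequences free of gaps or ambiguous bases.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (stop codons shown as "*").
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_differences(seq_a, seq_b):
    """Return (synonymous, nonsynonymous) counts of codons that differ.

    Illustrative helper (hypothetical name): a raw tally of differing
    codons, not a per-site substitution rate.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame, and of equal length")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        # Same amino acid: silent change; different amino acid: replacement.
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


# Hypothetical 3-codon alleles: GCT -> GCC keeps Ala; AAA -> AGA changes Lys to Arg.
print(count_codon_differences("ATGGCTAAA", "ATGGCCAGA"))  # (1, 1)
```

A tally of this kind treats every differing codon as a single event, which is why the published methods instead estimate substitutions per synonymous and per nonsynonymous site.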
Positive selection, in contrast, favors amino acid changes and is characterized by a greater rate of nonsynonymous than synonymous substitution. Under positive selection, a newly produced mutant has a higher fitness than the average in the population, and its frequency increases in the following generations. Methods have been developed for estimating the numbers of synonymous and nonsynonymous substitutions (22, 44, 57, 86, 87, 99, 103). These methods calculate the substitution rate for an entire gene by computing the average number of substitutions over a particular length of codons. A limitation of these methods is that they could potentially fail to identify genes under positive selection if high nonsynonymous substitution rates occur at only a few codons.

More recently, methods have been developed to detect positive selection at single amino acid sites. These approaches have been classified into three categories: counting methods, fixed effects models, and random effects models (73). Counting methods reconstruct the ancestral sequences and then estimate the number of nonsynonymous and synonymous changes that have occurred at each codon throughout the evolutionary history of the data set (106, 107, 142, 143). Fixed effects models estimate the ratio of nonsynonymous to synonymous substitutions on a site-by-site basis (142, 159). Like counting methods, fixed effects models make no assumption regarding the distribution of substitution rates across sites. Random effects models fit a distribution of substitution rates across sites and then infer the rate at which individual sites evolve (51, 108, 158).

Another mechanism that affects gene evolution is homologous recombination. In this process, genetic information is exchanged between related segments of DNA. Numerous methods have been developed to test for the occurrence and boundaries of recombinational events. Drouin et al.
(33) have classified these techniques into four general categories: similarity, phylogenetic, compatibility, and nucleotide substitution distribution methods. Similarity methods infer recombination when synonymous substitutions at variable genes (or regions of genes) exceed those at conserved genes (98, 110). Phylogenetic methods infer recombination when the phylogenies from different parts of a genome or gene result in conflicting topologies (50, 91, 154). Recombination can also be detected under these methods when orthologous genes do not reflect the evolutionary relationships of their species. Compatibility methods test for phylogenetic incongruence on a site-by-site basis using only informative sites but do not require a phylogeny for the sequences being analyzed to be known in advance (58, 59). A site is said to be parsimoniously informative when there are at least two different nucleotides, each represented at least twice, at that position in the set of sequences under investigation. These informative sites are defined as compatible when their evolutionary histories are congruent with a single tree topology. Recombination is detected when a region of incompatible sites is found. In the final category, nucleotide substitution distribution, sequences are analyzed for a random distribution of substitutions along the sequences (130, 139). Recombination is inferred if a significant clustering of substitutions is detected.

Attaching and effacing E. coli.

Strains of attaching and effacing E. coli (AEEC) are capable of intimately attaching to intestinal epithelial cells (38, 62). Once the bacterium attaches to the host cell, it manipulates the host cytoskeleton to form pedestal structures beneath the bacterium and, in the process, effaces the microvilli of the intestinal mucosa (21, 49).
This phenotype, termed attaching/effacement, is due to the locus of enterocyte effacement (LEE), a ~35-kb pathogenicity island that is inserted into the chromosome of AEEC (36, 93, 120). The LEE island contains approximately 40 genes, including those encoding the components of a type III secretion system (TTSS) (36, 120). The TTSS acts as a molecular syringe to inject bacterial effector proteins, many of which are also LEE-encoded, into the host cell. A main component of the LEE-encoded TTSS is EspA, which enables initial attachment of the bacterium to the host cell (21) and acts as a conduit for bacterial effector proteins to enter the enterocyte (25, 67, 71, 134). Since EspA filaments are exposed to the host immune system, they may experience selection pressures that could lead to an increase in espA polymorphism. Another important locus on the LEE island is eae, a highly polymorphic gene that encodes the intimin adhesin. Intimin plays a crucial role in the attaching/effacing phenotype, and studies have indicated that different groups of pathogenic E. coli possess different eae alleles. This has been used to help classify strains into pathogenic types, so a reliable intimin typing scheme is necessary.

Enteropathogenic E. coli.

Enteropathogenic E. coli (EPEC) infections are a leading cause of infantile diarrhea in developing nations (63, 82, 83). EPEC are capable of attaching/effacement and are therefore members of the AEEC pathotype. Typical EPEC strains are differentiated from other types of pathogenic E. coli by their ability to form microcolonies on the surface of epithelial cells (6, 100, 141). This phenotype, termed localized adherence, is due to the presence of a large (~70 kb) virulence plasmid called the EPEC adherence factor (EAF) plasmid (29). The EAF plasmid encodes the bundle-forming pilus (BFP), a member of the type IV family of fimbriae.
An operon of 14 genes is necessary for expression of the BFP (31, 140), and the major structural subunit, termed bundlin, is encoded by bfpA. Another important locus on the EAF plasmid is the plasmid-encoded regulator (Per), consisting of three genes (perA, perB, and perC). PerA shows homology to members of the AraC family of transcriptional activators (45), whereas PerB and PerC show no significant homology to any known prokaryotic proteins (45) and their exact role in EPEC pathogenesis is still under investigation. Most typical EPEC strains fall into one of two phylogenetically distinct groups, designated EPEC 1 and EPEC 2 (156). Little is known about the allelic distributions of bfpA, perA, and eae among EPEC 1, EPEC 2, and other clonal lineages of EPEC.

Shiga toxin-producing E. coli.

Strains of Shiga toxin-producing E. coli (STEC) are defined by their ability to produce one or more variants of Shiga toxin. These potent cytotoxins inhibit protein synthesis, resulting in cell death, and are usually encoded by bacteriophages. The main clinical manifestations of STEC infection are hemorrhagic colitis (HC) and hemolytic-uremic syndrome (HUS). HC is a distinctive gastrointestinal illness characterized by severe abdominal pain with cramps, watery diarrhea followed by grossly bloody diarrhea, and little or no fever (125). In some cases, infection with STEC can progress to HUS, which is characterized by acute renal failure, decreased platelet count, and hemolytic anemia (75). Enterohemorrhagic E. coli (EHEC) belong to both the AEEC and STEC pathotypes. These strains are capable of attaching/effacement, express one or more Shiga toxins, cause HC and HUS, and possess a ~60-MDa plasmid (82, 83). Like EPEC, most EHEC strains fall into one of two phylogenetically distinct groups, designated EHEC 1 and EHEC 2 (124). In many parts of the world, the EHEC serotype most frequently associated with severe disease is O157:H7 (64).

O157:H7 and type 1 fimbriae.
Type 1 fimbriae are filamentous structures composed primarily of the structural subunit FimA (69) and are expressed on the surface of most clinical E. coli isolates (5). These fimbriae bind to mannose-containing receptors on epithelial cells (109), have been shown to be required for colonization of the urinary tract (23, 53, 65, 132), and may play a role in the colonization of the intestinal tract (76, 77). E. coli O157:H7 is unusual in that it does not express type 1 fimbriae. Genetic analysis has revealed that these strains possess a 16-bp deletion within the invertible genetic element (known as the fim switch) that controls fimbrial expression (85). A study by Roe et al. (127) found that the deletion was responsible for the lack of type 1 fimbrial expression in O157:H7. Their results demonstrated that the fim switch was permanently locked in the "off" orientation so that transcription of fimA did not occur. It is unknown if this deletion is unique to O157:H7 or if it is also present in other closely related strains. Another area of underdeveloped research concerns polymorphism in fimA. Extensive fimA sequence variation has been described for strains isolated from a wide range of animal hosts (119), but a similar analysis in a diverse set of human isolates has yet to be performed. Recombination also appears to play a role in fimA allelic diversification. Evidence for horizontal transfer of multiple fimA alleles has been reported in a set of closely related extraintestinal strains (155), but it remains unclear if recombination is responsible for fimA sequence variation in other classes of strains.

Curli fimbriae.

E. coli express thin, coiled surface structures, designated curli, that mediate bacterial binding to a variety of extracellular matrix and serum proteins (113, 114, 136). Research also suggests that curli may play a role in the development of biofilms on inert surfaces (121, 152).
Curli fibers are encoded by two divergently transcribed chromosomal operons, csgBAC and csgDEFG. The assembly of the fibers is unique and involves extracellular self-assembly of the curlin subunit (CsgA), dependent on a specific nucleator protein (CsgB) (47, 48). The csgD gene encodes the transcriptional regulator of curli production (47), and Uhlich et al. (148, 149) identified mutations within the promoter region of csgD responsible for enhanced curli expression. However, an extensive study of the genetic polymorphism in csgA has not been performed. Natural selection could be acting upon csgA to generate allelic variants that form more effective biofilms, thereby enhancing bacterial survival in the external environment.

Purpose. The primary objective of this research is to examine the allelic diversity of genes that encode a variety of surface proteins in different classes of pathogenic E. coli. This diversity will be uncovered primarily through direct sequencing of the genes under investigation. Once an allelic database has been assembled, a more rapid and cost-effective typing method will be assessed for some of the loci by examining additional isolates to determine the level of variation that can be resolved by digestion with one or more restriction enzymes. The observed allelic variation in the surface protein-encoding genes will be compared to a phylogenetic framework generated from polymorphisms within conserved housekeeping genes. The questions to be addressed are: How does the level of sequence polymorphism observed within the surface protein-encoding genes compare to that seen within the conserved housekeeping genes? Do the sequence data indicate that these two categories of loci are experiencing similar or different selective pressures? Is there evidence of recombination both within the surface protein-encoding genes and between these loci and the chromosomal backbone represented by the housekeeping loci?
It is expected that the answers to these questions will provide insights into the evolution of pathogenic E. coli at the level of individual genes as well as at the strain level.

CHAPTER 2

ALLELIC SUBTYPING OF THE INTIMIN LOCUS (EAE) OF PATHOGENIC ESCHERICHIA COLI BY FLUORESCENT RFLP

SUMMARY

Intimin is a highly polymorphic protein encoded by the eae gene and plays a crucial role in the attaching-effacing phenotype of diarrheagenic Escherichia coli and related pathogens. A method to quickly and accurately uncover allelic variation at the eae locus was developed through the use of fluorescent RFLP (fRFLP). Application of fRFLP to 151 eae-positive strains (including the newly described Escherichia albertii) revealed 26 different fRFLP types that correspond to 20 of the 28 previously described eae alleles. Two sequence variants of the γ, ι, κ, and ζ alleles and three variants of ε were also observed. In addition to being reliable and accurate, the method can be easily adapted to accommodate new eae allelic sequences as they become known.

INTRODUCTION

Strains of attaching and effacing E. coli (AEEC) are capable of intimately attaching to intestinal epithelial cells (38, 62). Once the bacteria attach to the host cells, they manipulate the host cytoskeleton to form pedestal structures beneath the bacterial cells and, in the process, efface the microvilli of the intestinal mucosa (21, 49). This process creates a characteristic intestinal histopathology, termed attaching and effacing (AE) lesions. The ability of pathogenic E. coli to form AE lesions is encoded in a pathogenicity island referred to as the locus of enterocyte effacement or LEE (93, 94). The LEE island is ~35 kb in length and can be inserted into one of several chromosomal locations in AEEC clonal groups (36, 120, 137, 157). The LEE island comprises ~40 genes, including those that encode the structural components of a type III secretion system (TTSS) (36, 120).
The TTSS acts as a molecular delivery system to translocate bacterial effector proteins, many of which are also LEE-encoded, into the host cell (26).

Intimin plays a crucial role in the AE phenotype and is encoded by eae, one of the most highly polymorphic genes of the LEE island (20). Intimin is composed of six domains: a periplasmic domain, a transmembrane domain, three extracellular immunoglobulin-like domains, and an extracellular lectin-like domain (89). Much of the transmembrane domain is homologous to the invasins of pathogenic Yersinia, and this part of the molecule has been termed the central conserved domain (95). The extracellular domains of intimin interact with cellular receptors (135), including Tir, the translocated intimin receptor that is encoded in LEE and moves to the eukaryotic cell via the LEE-encoded TTSS (67). An analysis of 27 intimin alleles from GenBank indicates that the four C-terminal extracellular domains contain more than 75% of the nucleotide variation present within those sequences (unpublished data).

The AE strains represent a variety of enteric pathotypes of both humans and animals, including both typical and atypical enteropathogenic E. coli (EPEC), enterohemorrhagic E. coli (EHEC), Escherichia albertii, and Citrobacter rodentium (28, 153). Among the AEEC, there is a striking association of different eae alleles with specific clones or clonal groups of pathogenic E. coli (2, 144, 157). This association of eae alleles with pathogenic lineages has been used to help classify strains into pathotypes, so a rapid and reliable intimin typing scheme is a valuable tool.

Here a method to identify allelic variants of the eae locus through the use of fluorescent RFLP (fRFLP) (80) is described. In this method, the entire highly variable 3' half of eae is amplified in a standard PCR reaction using primers that are located in the conserved central domain of eae and in the conserved downstream gene escD.
The exact size of the amplicon depends on the specific eae allele, but is typically about 2 kb in length. The PCR amplicon is then digested with one or more restriction enzymes that leave a 5' overhang that acts as a template for the incorporation of a fluorescent dye-terminator nucleotide from a standard cycle sequencing kit. The multiple labeled restriction fragments are then separated on a capillary-based sequencer and their sizes estimated to within a few base pairs in length. This fRFLP system provides a rapid method for uncovering allelic variation in eae and for classifying eae subtypes based on complex restriction digests.

MATERIALS AND METHODS

Bacterial strains and DNA isolation. The strains in this study included 144 AEEC representing 38 O-serogroups, many of which include EPEC serotypes, and 7 strains from the Escherichia albertii and Shigella boydii 13 clonal lineage (55, 56). All 151 AE strains were grown overnight at 37°C in 10 ml of LB broth with moderate shaking. Genomic DNA was isolated using the Puregene DNA isolation kit (Gentra Systems Inc., Minneapolis, MN). DNA concentrations were determined using a NanoDrop ND-1000 spectrophotometer (NanoDrop Technologies Inc., Rockland, DE). Working template concentrations of 50 ng/µl were used for PCR.

PCR and eae-escD primer design. PCR primers were designed in the central conserved domain of eae (eae-F1: 5'-ACT CCG ATT CCT CTG GTG AC-3') and the conserved downstream gene escD (escD-R1: 5'-GTA TCA ACA TCT CCC GCC CA-3') based on available sequences for the α, β, γ, ε, ζ, and θ intimin alleles. The eae-F1 and escD-R1 primers are located at positions 25986-26005 and 27918-27937, respectively, of the complete LEE sequence from strain E2348/69 (GenBank accession number AF022236).
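The primer coordinates above imply an expected amplicon size for the E2348/69 reference sequence, and the arithmetic can be written out as a small Python sketch. The function name is hypothetical and the calculation assumes the reported positions are 1-based and inclusive; it is an illustration, not part of the original protocol.

```python
def amplicon_length(fwd_start: int, rev_end: int) -> int:
    """Length of a PCR product spanning two inclusive, 1-based coordinates."""
    return rev_end - fwd_start + 1

# Coordinates reported for strain E2348/69 (GenBank AF022236).
EAE_F1 = (25986, 26005)   # eae-F1 primer positions
ESCD_R1 = (27918, 27937)  # escD-R1 primer positions

# Both primers are 20 nt long, matching their stated sequences.
primer_lengths = [end - start + 1 for start, end in (EAE_F1, ESCD_R1)]

# The product runs from the first base of eae-F1 to the last base of escD-R1.
expected_bp = amplicon_length(EAE_F1[0], ESCD_R1[1])
print(primer_lengths, expected_bp)
```

Under these assumptions the product is 1952 bp, in line with the statement that the amplicon is typically about 2 kb.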
Each 25-µl reaction contained 2.5 µl 10X buffer II (Applied Biosystems, Foster City, CA), 2.5 µl 2 mM dNTP, 2.0 µl 25 mM MgCl2, 0.5 µl 10 µM eae-F1 primer, 0.5 µl 10 µM escD-R1 primer, 1.5 units AmpliTaq Gold (Applied Biosystems), 1 µl 50 ng/µl genomic DNA template, and 15.7 µl ddH2O. Amplification utilized an initial denaturing step at 94°C for 10 min., followed by 35 cycles of 92°C for 1 min., 55°C for 1 min., and 72°C for 2 min. A final step of 72°C for 5 min. was used to complete any partially extended product. PCR products (5 µl) were visualized on ethidium bromide-stained 0.8% agarose gels by illumination with UV light.

fRFLP development. In silico digestions were performed to find suitable restriction endonucleases for fRFLP. To be considered for fRFLP, a restriction enzyme had to 1) leave a 5' overhang, 2) mainly produce fragments in the range of the labeled size standard (60 to 640 bp), and 3) produce fragments that could be labeled with fluorescent tags different from that of the size standard. Out of 532 restriction enzymes tested in silico, MseI was chosen for fRFLP because it meets all of the above criteria and was found to have the greatest power to discriminate known eae alleles (Figures 1 and 2, Table 1). To confirm the eae allele assignments based on the MseI findings, a second restriction digestion was designed. Multiple enzymes were needed to give a similar amount of pattern variation as seen with MseI. For simplicity, restriction enzymes were selected that use the same reaction conditions (reaction buffer, incubation temperature, and labeled ddNTP). Three sets of triple-enzyme digests were found. A combination of AseI, DdeI, and SalI was chosen over the others because of its lower enzyme cost and because the ddNTP used is the same as that for the MseI digest (Table 2). The two digests (MseI and AseI / DdeI / SalI) were used to subtype the intimin alleles from a diverse set of 151 AE strains.
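The in silico screening step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the software used in the study (which is not named): the function names and the toy sequence are hypothetical, and fragment sizes are approximated from top-strand cut positions without modelling the labeled strand. The recognition sites are the standard ones for MseI (T^TAA), AseI (AT^TAAT), DdeI (C^TNAG), and SalI (G^TCGAC).

```python
import re

# Recognition pattern and number of top-strand bases before the cut.
ENZYMES = {
    "MseI": ("TTAA", 1),        # T^TAA
    "AseI": ("ATTAAT", 2),      # AT^TAAT
    "DdeI": ("CT[ACGT]AG", 1),  # C^TNAG
    "SalI": ("GTCGAC", 1),      # G^TCGAC
}

def cut_positions(seq, enzymes):
    """Sorted top-strand cut positions for one or more enzymes."""
    seq = seq.upper()
    cuts = set()
    for name in enzymes:
        pattern, offset = ENZYMES[name]
        # A lookahead finds overlapping recognition sites as well.
        for match in re.finditer(f"(?={pattern})", seq):
            cuts.add(match.start() + offset)
    return sorted(cuts)

def fragment_lengths(seq, enzymes, min_len=60, max_len=640):
    """Fragment lengths that fall inside the size-standard range."""
    bounds = [0] + cut_positions(seq, enzymes) + [len(seq)]
    lengths = [b - a for a, b in zip(bounds, bounds[1:])]
    return sorted(n for n in lengths if min_len <= n <= max_len)

# Toy amplicon (hypothetical): two MseI sites in a 208-nt sequence.
toy = "G" * 70 + "TTAA" + "C" * 100 + "TTAA" + "G" * 30
print(cut_positions(toy, ["MseI"]))
print(fragment_lengths(toy, ["MseI"]))
```

Passing several enzyme names, for example `["AseI", "DdeI", "SalI"]`, pools their cut sites and so mimics a combined digest. Comparing the resulting fragment lists between alleles is one simple way to judge how well an enzyme set discriminates them.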
PCR clean-up and restriction enzyme digestion. Samples that were positive for the eae-escD PCR were treated with ExoSAP-IT (USB Corporation, Cleveland, OH) to remove unincorporated dNTPs and PCR primers (5 µl of PCR product and 2 µl of ExoSAP-IT). All restriction endonucleases, reaction buffers, and bovine serum albumin (BSA) solutions were obtained from New England Biolabs Inc. (Ipswich, MA). MseI digests were set up so that each 30-µl reaction contained 7.0 µl ExoSAP-treated PCR product, 0.3 µl 100X BSA, 0.5 µl 10 U/µl MseI, 3.0 µl 10X NEB2 buffer, and 19.2 µl ddH2O. AseI / DdeI / SalI digests were set up so that each 30-µl reaction contained 7.0 µl ExoSAP-treated PCR product, 0.3 µl 100X BSA, 0.5 µl 10 U/µl AseI, 0.5 µl 10 U/µl DdeI, 0.25 µl 20 U/µl SalI, 3.0 µl 10X NEB3 buffer, and 18.45 µl ddH2O. Reactions were incubated overnight at 37°C.

[Figure: gene map showing cesT, eae, and escD; caption not recovered.]

[Table 1: body not recovered.]
(a) Fragments outside the range of the size standard (<60 nt or >640 nt) are not shown.
(b) Variants that are indistinguishable by fRFLP with the MseI digest.

Table 2. AseI / DdeI / SalI digestion of eae-escD PCR product.

fRFLP pattern   Amplicon size (bp)   Predicted fragment sizes (nt) (a)
α1.1            1952                 106, 110, 121, 138, 213, 221, 278, 285, 428
α2.1            1952                 106, 110, 138, 221, 278, 285, 334, 428
β1.1            1950                 110, 114, 138, 213, 384, 408, 583
β2.1            1940                 106, 110, 114, 138, 164, 204, 213, 244, 260, 278
γ1.1            1984                 106, 110, 138, 213, 246, 278
γ1.2            1984                 106, 110, 138, 198, 246, 278
ε1.1            2036                 60, 106, 110, 138, 141, 141, 142, 213, 244, 267, 384
ε1.2            2142                 60, 110, 138, 141, 141, 142, 212, 213, 244, 267, 384
ε1.3            2142                 60, 110, 138, 141, 212, 213, 244, 267, 283, 384
ε2.1            2137                 106, 110, 123, 138, 144, 213, 244, 278, 334, 348
ζ1.1 (b)        1946                 106, 110, 115, 138, 198, 237, 278
ζ1.2 (b)        ~1946                106, 110, 115, 138, 198, 237, 278
η1.1            2140                 110, 138, 141, 151, 183, 244, 262, 267, 597
θ1.1            2011                 106, 110, 120, 138, 222, 278, 357
ι1.1 (b)        1824                 65, 106, 110, 138, 213, 238, 240, 278, 344
ι1.2 (b)        1824                 65, 106, 110, 138, 213, 238, 240, 278, 344
ι2.1            1831                 73, 91, 110, 138, 239, 246, 285, 597
κ1.1            1939                 85, 106, 110, 138, 171, 198, 213, 278, 278, 310
κ1.2            2716                 85, 106, 110, 138, 171, 174, 198, 213, 278, 278, 330, 583
λ1.1            2665                 100, 106, 109, 110, 131, 138, 198, 278, 325, 481, 584
μ1.1            2006                 106, 110, 120, 138, 144, 206, 213, 278, 639
ν1.1            1944                 110, 121, 138, 582
ξ1.1            2141                 110, 138, 141, 151, 192, 213, 244, 263, 267, 384
ο1.1            ~2020                85, 110, 136, 138, 158, 200, 278, 304, 611
ρ1.1            2013                 106, 110, 138, 165, 196, 218, 278
σ1.1            1942                 110, 138, 213, 384, 473, 624

(a) Fragments outside the range of the size standard (<60 nt or >640 nt) are not shown.
(b) Variants that are indistinguishable by fRFLP with the AseI / DdeI / SalI digest.

fRFLP. fRFLP was performed using the CEQ DTCS standard kit (Beckman Coulter Inc., Fullerton, CA). Each reaction contained 2.0 µl of unpurified restriction enzyme digest, 1.5 µl 10X reaction buffer, 0.1 µl ddUTP, 0.1 µl Taq DNA polymerase, and 11.3 µl ddH2O. Samples were incubated at 60°C for 1 hour, purified with Sephadex G-50 Fine columns (Amersham Pharmacia Biotech Inc., Piscataway, NJ), dried under vacuum centrifugation (Savant Instruments Inc., Holbrook, NY), and suspended in 10 µl of deionized formamide. Of this, 2 µl were mixed with 0.6 µl CEQ DNA Size Standard 600 (Beckman Coulter Inc., Fullerton, CA) and 39.4 µl deionized formamide, and run on a CEQ2000XL (Beckman Coulter Inc.) using a capillary temperature of 50°C, a denature step at 90°C for 2 min., injection at 2.0 kV for 30 sec., and separation at 4.8 kV for 65.0 min. (MseI digest) or 90 min. (AseI / DdeI / SalI digest). Fragment sizes were determined with the CEQ2000XL software, version 4.3.9.

DNA sequencing. The complete eae gene was sequenced in at least one representative strain for each fRFLP pattern identified. The 5' half of eae was amplified using primers cesT-F9 (5'-TCA GGG AAT AAC ATT AGA AA-3') and eae-R3 (5'-TCT TGT GCG CTT TGG CTT-3') using the same PCR conditions described above. PCR products were purified using the QIAquick PCR purification kit (QIAGEN Inc., Valencia, CA) and quantified with a NanoDrop ND-1000 spectrophotometer.
Cycle sequencing reactions contained 6.0 µl CEQ DTCS Quick Start premix (Beckman Coulter Inc.), 1.5 µl 20 µM primer, approximately 180 ng of cesT/eae product or 250 ng of eae/escD product, and ddH2O to 15 µl. Amplification utilized an initial denaturing step at 94°C for 1 min., followed by 35 cycles of 96°C for 30 sec., 52°C for 30 sec., and 60°C for 2 min. Upon completion of cycle sequencing, samples were purified with Sephadex G-50 Fine columns, dried under vacuum centrifugation, suspended in 40 µl of deionized formamide, and sequenced using a Beckman CEQ2000XL DNA sequencer. Samples were analyzed using the CEQ2000XL software and then exported for further analysis with the SeqMan and MegAlign modules of the Lasergene software (DNASTAR Inc., Madison, WI). Internal sequencing primers were designed as new sequence data were generated.

RESULTS

The ability of the eae fRFLP method to resolve intimin alleles was first tested on 53 strains based on the expected patterns from the in silico digestion of available eae sequences from GenBank. This initial study included 34 AE strains, most of which had previously been known to have the α, β, or γ intimin alleles, which are most often found in strains associated with human infection (Table 3). Nineteen additional AE strains were selected and tested by the fRFLP method because their intimin allele had been determined by other researchers or by previous work in the Whittam laboratory based on DNA sequencing or conventional RFLP (Table 4). The intimin alleles of all 53 strains were confirmed by both the MseI and AseI / DdeI / SalI digests.

To further evaluate the new method, 98 eae-positive strains for which no allelic subtyping data existed were tested. These strains were originally recovered from two separate populations: a pediatric population in Seattle, Washington (27) and a cohort study of childhood diarrheal disease in Guinea-Bissau, West Africa (150).
Among these 98 eae-positive strains, most strains (85%) exhibited known digestion patterns with both MseI and the triple digest, and therefore could be easily subtyped to an allele. The 15 strains for which the eae allele could not be determined had one of seven previously unobserved digestion patterns. A representative of each pattern was sequenced and the alleles were either previously described or variants of previously described alleles (ε1.3, ε2.1, κ1.2, λ1.1, ν1.1, ρ1.1, and ξ1.1).

Among the 151 strains examined, there were a total of 24 different fRFLP patterns observed for the MseI digests (Table 1, Figure 2). The AseI / DdeI / SalI digest also resolved 24 distinct fRFLP patterns (Table 2). For two alleles, ε1.2/ε1.3 and γ1.1/γ1.2, variants are resolved by the AseI / DdeI / SalI digests and are indistinguishable with MseI digestion. In contrast, two variants that are resolved by MseI (ι1.1/ι1.2 and ζ1.1/ζ1.2) are indistinguishable with the triple digest. Combined there are 26 distinct fRFLP patterns (Table 4). In all, the new fRFLP method is able to identify 20 of the 28 previously described alleles, and differentiated two new variants of the γ, ι, κ, and ζ alleles, and three new variants of ε.

To gauge the accuracy of the fragment size estimation, the expected fragment sizes based on in silico digestions were compared to those observed from the capillary sequencer. Overall the fragment scoring was accurate, with most fragments over 100 nt in length less than 2% different in size from their expected values (Figure 3). Examination of the plot reveals that for fragments less than ~100 nt and greater than ~550 nt, the estimated fragment size deviates from the expected (Figure 3). For the 83 fragments observed in the MseI digest, the average deviation from expected size is 1.76%. The triple digest was more accurate, with an average deviation of 1.01% across the 70 distinct fragments.

Table 3. Allelic variation in eae based on fRFLP.

Strain     Strain name    Serotype (a)   eae fRFLP pattern   Source and pathotype
TW00588    DEC 1a         O55:H6         α1.1                Human EPEC
TW07884    E851/71        O142:H6        α1.1                Human EPEC
TW07923    RN587/1        O157:[h45]     α1.1                Human EPEC
TW04262    TB269C         O145:[h34]     α2.1                Human atypical EPEC
TW01120    B170           O111:[h2]      β1.1                Human EPEC
TW05355    13180-25       O111:H11       β1.1                EHEC from food
TW00148    3448-87        O114:H2        β1.1                Human EPEC
TW00389    29315          O119:H2        β1.1                Human EPEC
TW07099    LT119-80       O119:H2        β1.1                Human EPEC
TW01266    C342-62        O126:H2        β1.1                Human EPEC
TW07896    E56/54         O128:H2        β1.1                Human EPEC
TW01664    DEC 10i        O145:H16       β1.1                Human EHEC
TW05149    BCL73          O145:[h-]      β1.1                Bovine STEC
TW07860    314-S          O145:[h16]     β1.1                Bovine STEC
TW08894    02-3422        O145:[h2]      β1.1                Rabbit EPEC
TW09153    IH 16          O145:[h-]      β1.1                Human STEC
TW07924    Z188-93        O110:H6        β2.1                Avian EPEC
TW01225    1396/69        O119:H6        β2.1                Human EPEC
TW03293    ECOR-37        O-:[h7]        γ1.1                Marmoset atypical EPEC
TW00947    DEC 5d         O55:H7         γ1.1                Human atypical EPEC
TW03064    B6820-C1       O145:[h28]     γ1.1                Bovine STEC
TW07596    GS G5578620    O145:[h28]     γ1.1                Human STEC
TW07865    IHIT0304       O145:H28       γ1.1                Bovine STEC
TW08087    MT#66          O145:[h28]     γ1.1                Human STEC
TW09356    4865/96        O145:[h28]     γ1.1                Human STEC
TW08881    3556-77        Boydii 13      γ1.2                Human atypical B13
TW08887    3557-77        Boydii 13      γ1.2                Human atypical B13
TW08889    3053-94        Boydii 13      γ1.2                Human atypical B13
TW07618    98ST607        O110:H28       ζ1.1                Human STEC
TW00964    75-83          O145:[h25]     ζ1.1                Human STEC
TW05307    LT055-43       O55:H7         θ1.1                Human atypical EPEC
TW07960    DA-34          O103:[h25]     θ1.1                Human STEC
TW00970    DEC 8b         O111:H8        θ1.1                Human EHEC
TW07888    010-311082     O76:H51        μ1.1                Human EPEC

(a) Lower case H-types in brackets were inferred from fliC allele.

Table 4. Reference strains for defined intimin alleles and patterns determined by fRFLP and confirmed by DNA sequencing.

Pattern no.   Intimin allele   eae fRFLP pattern   Reference strain         Species and serotype (a)
1             α                α1.1                TW06375 (E2348/69)       E. coli O127:H6
2             α2               α2.1                TW01270 (C712-65)        E. coli O125:H6
3             β                β1.1                TW07862 (413/89-1)       E. coli O26:[h11]
4             β2               β2.1                TW07894 (0659-79)        E. coli O119:H6
5             γ                γ1.1                TW08264 (Sakai)          E. coli O157:H7
6             γ                γ1.2                TW08888 (3092-94)        S. boydii type 13
7             ε                ε1.1                TW08101 (MT#80)          E. coli O103:H2
8             ε                ε1.2                TW08023 (MT#2)           E. coli O121:H19
9 (b)         ε                ε1.3                TW10363 (83F4)           E. coli O-:[h8]
10 (b)        ε2               ε2.1                TW10371 (98B3)           E. coli O116:[h9]
11            ζ                ζ1.1                TW07863 (537/89)         E. coli O84:[h2]
12            ζ                ζ1.2                TW04892 (921)            E. coli O111:H9
13            η                η1.1                TW07892 (012-050982)     E. coli O142:[h21]
14            θ                θ1.1                TW01387 (CL-37)          E. coli O111:H8
15            ι                ι1.1                TW01933 (125259)         E. coli O55:[h34]
16            ι                ι1.2                TW04174 (TB227C)         E. coli O86:[h8]
17            ι2               ι2.1                TW08839 (C-425)          S. boydii type 13
18            κ                κ1.1                TW06584 (C295-53)        E. coli O86:H34
19 (b)        κ                κ1.2                TW10337 (64B4)           E. coli O49:[h10]
20 (b)        λ                λ1.1                TW10327 (57A1)           E. coli O33:[h34]
21            μ                μ1.1                TW08260 (MA551/1)        E. coli O55:[h51]
22 (b)        ν                ν1.1                TW10376 (106A5)          E. albertii
23 (b)        ξ                ξ1.1                TW10334 (60A3)           E. coli O5:[h2]
24            ο                ο1.1                TW07627 (Albert 19982)   E. albertii
25 (b)        ρ                ρ1.1                TW10366 (9314)           E. coli O21:[h5]
26            σ                σ1.1                TW08933 (K-1)            S. boydii type 7

(a) Lower case H-types in brackets were inferred from fliC allele.
(b) fRFLP pattern discovered in AEEC strains from Guinea-Bissau.

Figure 3. Accuracy of fragment sizes estimated by fRFLP. The percentage observed difference from the expected fragment size is plotted against the expected fragment size (nt) for the MseI (n = 83) and AseI / DdeI / SalI (n = 70) digests. The lines mark the percentage deviation for 1, 2, and 5 nt, respectively.

DISCUSSION

Current eae typing schemes either focus on allele-specific PCR amplification (2, 9, 61, 115, 118, 123, 160) or conventional RFLP analysis (60, 118, 122, 133). Sequence analysis of intimin alleles has revealed that many eae alleles are mosaics with segments having different evolutionary histories (95, 144).
Therefore, allele-specific PCR amplification can lead to erroneous typing results. For example, the p allele would be erroneously typed as γ by the Reid method (123) because the γ allele primer is located in a region that is shared between these two alleles (data not shown). In addition, as new alleles are discovered, new primers are necessary to amplify these alleles and the subtyping scheme becomes more complex.

Conventional RFLP analysis also has its limitations. The 5' half of eae is relatively conserved among the different alleles, so there may not be sufficient variation in the amplicons to accurately and reliably differentiate the alleles. For example, EPEC strain 1396/69 (O119:H6) was typed by Jenkins et al. (60) as possessing the 7 allele of eae instead of the β2 allele. In silico analysis revealed that the β2, 7, and )1. alleles of eae are virtually indistinguishable under the Jenkins system. Another limitation of conventional RFLP is that a system based on the highly variable 3' half of the gene may be difficult to score, since small differences in the banding patterns of different alleles may not be easily discernible under standard electrophoretic conditions.

The fRFLP method described here addresses many of the limitations of the existing eae typing methods. A drawback of the devised method, however, is that it requires the location of escD relative to eae to be conserved. If this location changes such that escD is either upstream or far downstream of eae, PCR amplification will not occur and the strain will be nontypeable by this method. However, most of the known alleles for eae have been amplified and observed, so this situation does not appear to be common. The new subtyping method has been tested against a diverse panel of 87 eae-positive isolates originally recovered from children in West Africa and also uncovered seven additional variants. The remaining alleles (β3, ε3, ε4, η2, π, 0) still need to be tested.
Another limitation is that some of the eae alleles in GenBank do not contain their associated downstream escD sequence, so the expected amplicon size and fRFLP profile cannot be fully determined in all cases. When new fRFLP patterns are observed, eae needs to be completely sequenced to verify the allele, but then the new pattern can be added to the fRFLP database for future reference.

ACKNOWLEDGEMENTS

I would like to thank Lindsey Ouellette, Dr. Weihong Qi, and Dr. Hans Steinsland for technical assistance. This work appears in the August 2006 issue of FEMS Microbiology Letters (261:80-87). This project has been funded in part with Federal funds from the NIAID, NIH, DHHS, under NIH Research Contract # N01-AI-30058.

CHAPTER 3

MOLECULAR EVOLUTION OF TYPICAL ENTEROPATHOGENIC ESCHERICHIA COLI

SUMMARY

Enteropathogenic Escherichia coli (EPEC) infections are a leading cause of infantile diarrhea in developing nations. Typical EPEC are differentiated from other types of pathogenic E. coli by two distinctive phenotypes: attaching effacement and localized adherence. The genes specifying these phenotypes are found on the locus of enterocyte effacement (LEE) and the EPEC adherence factor (EAF) plasmid. To describe how typical EPEC have evolved, a diverse collection of strains was characterized by performing multilocus sequence typing (MLST) and restriction fragment length polymorphism (RFLP) analysis of three virulence genes (eae, bfpA, and perA) to assess allelic variation. Among 129 strains representing 20 O-serogroups, 21 clonal genotypes were identified using MLST. RFLP analysis resolved 9 eae, 9 bfpA, and 4 perA alleles. Each bfpA allele was associated with only one perA allele class, suggesting that recombination has not played a large role in shuffling the bfpA and perA loci between separate EAF plasmids.
The distribution of eae alleles among typical EPEC strains is more concordant with the clonal relationships than the distribution of the EAF plasmid types. These results provide further support for the hypothesis that the EPEC pathotype has evolved multiple times within E. coli through separate acquisitions of the LEE island and EAF plasmid.

INTRODUCTION

Enteropathogenic E. coli (EPEC) infections are a leading cause of infantile diarrhea in developing nations (63, 83). A key characteristic of EPEC strains is the ability to intimately attach to intestinal epithelial cells and create attaching and effacing (AE) lesions (38). The AE phenotype is specified by genes of the locus of enterocyte effacement (LEE), a ~35-kb pathogenicity island located in the bacterial chromosome (36, 93). The LEE island comprises approximately 40 genes and encodes the components of a type III secretion system, various effector molecules, and the intimin adhesin (36, 66, 151). Intimin plays a crucial role in AE lesion formation (21) and is encoded by the highly polymorphic eae gene (2, 160). To date, more than 25 major allelic variants of eae have been described (79).

Most typical EPEC strains fall into one of two phylogenetically distinct groups or clonal lineages, designated EPEC 1 and EPEC 2 (156), and differ from atypical EPEC and other types of pathogenic E. coli by their ability to form microcolonies on the surface of intestinal epithelial cells (6). This phenotype, termed localized adherence (LA), correlates with the presence of a large virulence plasmid called the EPEC adherence factor (EAF) plasmid (29). The EAF plasmids from different EPEC strains show considerable variation in size (~70 to 110 kb) (15, 102, 145) and, presumably, gene content. Comparison of the complete EAF plasmid sequences from two prototypical EPEC strains (O127:H6 EPEC 1 strain E2348/69 and O111:NM EPEC 2 strain B171) indicates that the EPEC 2 plasmid of B171 carries fewer genes (80 vs.
115 open reading frames) and a greater percentage of intact or partial insertion sequence elements (33% vs. 19%) than the pMAR7 plasmid of EPEC 1 strain E2348/69 (15, 145). Nevertheless, certain parts of the plasmid show a high degree of sequence conservation among typical EPEC strains (101), particularly in the region encoding the bundle-forming pilus (BFP), a type IV fimbria whose production is associated with the LA phenotype. An operon of 14 genes is necessary for expression of the BFP (31, 140), with bfpA encoding the major structural subunit (bundlin). Sequence comparisons of 9 bfpA alleles have provided compelling evidence for the action of positive selection at the molecular level (10, 11). A second locus on the EAF plasmid implicated in the full virulence of EPEC is the plasmid-encoded regulator (Per), consisting of three genes (perA, perB, and perC). Per has been shown to activate genes within the bfp operon (146) and the LEE pathogenicity island (35, 96).

Little is known about the allelic distributions of eae, bfpA, and perA among the EPEC 1, EPEC 2, and other clonal lineages of typical EPEC. In this study a diverse collection of 129 EPEC, including strains of the classical EPEC serotypes (117), was characterized through multilocus sequence typing (MLST) and restriction fragment length polymorphism (RFLP) analysis to elucidate the extent to which horizontal transfer of the LEE island and EAF plasmid has contributed to the evolution and diversification of EPEC clones.

MATERIALS & METHODS

Strains. A collection of 95 EPEC strains was assembled based on serotype or their inclusion in one of two studies examining bfpA allelic variation (10, 11). These strains represent a variety of serotypes originally isolated between 1947 and 1998 from different regions around the world and were obtained from a number of sources, including the Centers for Disease Control and Prevention, Dr. Alejandro Cravioto, Prof. Helge Karch, Drs. Frits and Ida Ørskov, Dr.
Phillip I. Tarr, and Dr. Luis Trabulsi (Table 5). An additional 34 eae+ bfpA+ strains were selected from a cohort study of childhood diarrheal disease in Guinea-Bissau, West Africa (150) (Table 5). Each strain was grown overnight at 37°C in 10 ml of Luria-Bertani (LB) broth with moderate shaking. Genomic DNA was isolated using the Puregene DNA isolation kit (Gentra Systems Inc., Minneapolis, MN). DNA concentrations were determined using a NanoDrop ND-1000 spectrophotometer (NanoDrop Technologies Inc., Rockland, DE), and samples were diluted to 25 ng/µl for PCR.

Table 5. Summary of EPEC strains investigated.

O-serogroup   No. of strains   Year(s) of isolation   Location(s)
Classical
O55           21               1947-1998              Brazil, Congo, Dutch Guiana, France, Germany, Guinea-Bissau, Mexico, Scotland, USA
O86           10               1950-1997              Brazil, Bulgaria, Denmark, Germany, Guinea-Bissau, UK, USA
O111          16               1947-1996              Austria, Brazil, Dutch Guiana, Germany, Mexico, Peru, Scotland, UK, USA
O114          4                1969-1997              Guinea-Bissau, UK, USA
O119          28               1960-1998              Brazil, Chile, Guinea-Bissau, Mexico, Peru, UK, USA
O127          4                1969-1997              Guinea-Bissau, UK
O142          11               <1960-1997             Brazil, Canada, Guinea-Bissau, Indonesia, Peru, Portugal, Scotland, USA
Other
O2            1                1997                   Guinea-Bissau
O33           2                1997                   Guinea-Bissau
O34           1                1997                   Guinea-Bissau
O49           2                1997                   Guinea-Bissau
O51           1                1997                   Guinea-Bissau
O73           1                1997                   Guinea-Bissau
O76           1                1982                   Peru
O110          1                1993                   Germany
O126          3                1962-1964              Egypt, Iran, Pakistan
O128          15               1953-1991              Denmark, Germany, Pakistan, UK, USA
O157          3                1983-1998              Brazil, USA
OX9           1                1997                   Guinea-Bissau
O-            3                1996-1997              Guinea-Bissau

MLST. Multilocus sequence typing (MLST) was performed on 7 conserved housekeeping genes (aspC, clpX, fadD, icdA, lysP, mdh, and uidA). A detailed protocol of the MLST procedure, including allelic type and sequence type (ST) assignment methods, can be found at the EcMLST website (http://www.shigatox.net/mlst). Sequences were concatenated for phylogenetic analyses.

eae fRFLP. Allelic variation in eae was resolved using fluorescent RFLP (fRFLP) as described previously (79).

bfpA PCR and DNA sequencing. PCR primers were designed to target conserved flanking regions of bfpA based on 9 available allelic sequences (10, 11). Each 25-µl reaction contained 2.5 µl 10X buffer II (Applied Biosystems, Foster City, CA), 2.5 µl 2 mM dNTP, 2.0 µl 25 mM MgCl2, 0.5 µl 10 µM bfpA_-52F primer (5'-AGA TTA TTC CGT GAC CTA TT-3'), 0.5 µl 10 µM bfpA_9R primer (5'-TGT CCT CAC ATA TAC CTC CC-3'), 1.5 U AmpliTaq Gold (Applied Biosystems), 1 µl 25 ng/µl genomic DNA template, and 15.7 µl ddH2O. Amplification of the approximately 700-bp fragment utilized an initial denaturing step at 94°C for 10 min, followed by 35 cycles of 92°C for 1 min, 52°C for 1 min, and 72°C for 30 s. A final step of 72°C for 5 min was used to complete any partially extended product. PCR products (5 µl) were visualized on ethidium bromide-stained 1.5% agarose gels by illumination with UV light, purified using the QIAquick PCR purification kit (QIAGEN Inc., Valencia, CA), and quantified. Cycle sequencing reactions contained 4.0 µl CEQ DTCS Quick Start premix (Beckman Coulter Inc., Fullerton, CA), 1.0 µl 20 µM bfpA_-52F or bfpA_9R primer, approximately 70 ng of bfpA PCR product, and ddH2O to a final volume of 10 µl.
Amplification utilized an initial denaturing step at 94°C for 1 min, followed by 35 cycles of 96°C for 30 s, 52°C for 30 s, and 60°C for 2 min. Upon completion of cycle sequencing, samples were purified with Sephadex G-50 Fine columns (Amersham Pharmacia Biotech Inc., Piscataway, NJ), dried under vacuum centrifugation (Savant Instruments Inc., Holbrook, NY), suspended in 40 µl of deionized formamide, and run on a CEQ2000XL (Beckman Coulter Inc.). Samples were analyzed using the CEQ2000XL software and then exported for further analysis with the SeqMan module of the Lasergene software (DNASTAR Inc., Madison, WI).

bfpA PCR and RFLP. PCR conditions were identical to those described above for bfpA except primers bfpA_114F (5'-GTC TGC GTC TGA TTC CAA TA-3') and bfpA_521R (5'-TCA GCA GGA GTA ATA GC-3') were used to amplify a 408-414 bp internal fragment of the gene. Prior to digestion, each bfpA PCR product was purified using the QIAquick PCR purification kit. Three different restriction enzyme digests were used. Digestion with AluI and with BfaI was performed in separate 30-µl reactions with 10 U of enzyme, 3.0 µl 10X reaction buffer, and 26.0 µl purified PCR product, and samples were incubated overnight at 37°C, while digestion with 10 U of BstNI was performed in 30-µl reactions with 3.0 µl 10X reaction buffer, 0.3 µl 100X BSA, and 25.7 µl purified PCR product followed by an overnight incubation at 60°C. All restriction enzymes were obtained from New England BioLabs Inc. (Ipswich, MA) and the reaction buffer provided with each enzyme was used. After incubation, 15 µl of the digests were visualized on ethidium bromide-stained 1.5% agarose gels by illumination with UV light.

perA PCR and DNA sequencing. Primers were designed to target the conserved flanking and internal regions of perA based on 15 available sequences (45, 111, 146).
PCR conditions were similar to those described above for bfpA except primers perA_-24F (5'-AAC AAA CGC GCA TGA AGG TG-3') and perB_222R (5'-TTC GCC GGT GAT GTG GTC T-3'), a 58°C annealing temperature, and a 1 min extension time were used. The resulting PCR products (approximately 1.1 kb) were purified and quantified as described above. Cycle sequencing reactions were similar to those used for bfpA, except that 120 ng of perA PCR product; primers perA_-24F, perA_539F (5'-AAA ACT GGA AAC TAG GCG ATG TCA-3'), perA_562R (5'-TGA CAT CGC CTA GTT TCC AGT TTT-3'), and perB_222R; and a 58°C annealing temperature were used.

perA PCR and RFLP. PCR conditions were similar to those described for bfpA except primers perA_-24F and perA_562R were used at an annealing temperature of 58°C. Digestion with DdeI and with Sau96I was performed in separate 30-µl reactions with 10 U of enzyme, 3.0 µl 10X reaction buffer, 20.0 µl unpurified PCR product, and ddH2O to volume; samples were incubated overnight at 37°C, and visualized on ethidium bromide-stained 1.5% agarose gels.

fliC typing. Strains that were nonmotile or lacked flagellar serotype data were typed for the fliC locus. The entire fliC gene was amplified with primers fliC_1F (5'-ATG GCA CAA GTC ATT AAT ACC AA-3') and fliC_1497R (5'-TTA ACC CTG CAG CAG AGA CA-3') using the same PCR conditions described for bfpA except for an annealing temperature of 55°C and an extension time of 2 min. Amplicons (approximately 2 kb) were either digested with 5 U of DdeI under conditions similar to those described for perA or sequenced to determine the allele. H-types that were determined by fliC sequencing or RFLP are denoted with a lower case 'h' and are enclosed in square brackets.

Phylogenetic analyses. Sequences were aligned with the ClustalW algorithm using the MegAlign module of the Lasergene software.
Neighbor-joining trees were constructed using the Kimura 2-parameter model of nucleotide substitution with the MEGA3 software (78) and the inferred phylogenies were each tested with 500 bootstrap replications. Phylogenetic network analysis was conducted with the SplitsTree 4 (54) program using the neighbor-net algorithm (18) and untransformed distances (p distance). The Φw recombination test (17) as implemented by SplitsTree 4 was used to distinguish recurrent mutation from recombination in generating genotypic diversity.

RESULTS

MLST analysis. PCR amplification and sequencing of the 7 MLST loci in 129 EPEC strains was successful in most (90%) cases. The notable exception was uidA, which failed to amplify in 13 strains, including a single O114:H2 (380/69) strain and 12 strains, 11 of which are serotype O55:[h51]. The 12 uidA-negative strains were identical to each other at the 6 remaining MLST loci. PCR amplification with primers located in the genes flanking the uidA locus produced a truncated amplicon suggesting that these strains belong to a clonal genotype that has lost most of the uidA gene (data not shown). For phylogenetic analysis, the sequenced internal fragments of the 7 housekeeping genes were concatenated to yield 3,732 nucleotides. The uidA locus was treated as missing data and replaced with alignment gaps in the fully concatenated sequence for the 13 uidA-negative strains. MLST analysis resolved an average of 25.4 variable nucleotide sites per locus, which defined a number of alleles, ranging from 7 to 12, at the 7 MLST genes (Table 6). The distinct combinations of alleles across the 7 MLST loci were used to define 21 multilocus genotypes or sequence types (STs) among the 129 EPEC strains. Classification of the strains based on the bootstrap analysis indicates that most (77%) of the strains belong to one of four main clonal groups, designated EPEC 1, EPEC 2, EPEC 3, and EPEC 4 (Figure 4).
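The definition of sequence types as distinct combinations of alleles across the 7 loci can be sketched in a few lines. This is an illustration only: the strain names and allele numbers are hypothetical, the ST numbering is arbitrary rather than the EcMLST numbering, and treating a missing locus as its own state is an assumption of the sketch.

```python
# Loci in the order used for the concatenated alignment.
LOCI = ("aspC", "clpX", "fadD", "icdA", "lysP", "mdh", "uidA")


def assign_sequence_types(profiles):
    """Give every distinct 7-locus allele combination its own ST label.

    profiles maps a strain name to a dict of locus -> allele number.
    A missing locus is stored as None and treated as its own state
    (an assumption made for this sketch).
    """
    st_by_profile = {}
    st_by_strain = {}
    for strain, alleles in profiles.items():
        profile = tuple(alleles.get(locus) for locus in LOCI)
        if profile not in st_by_profile:
            st_by_profile[profile] = len(st_by_profile) + 1  # arbitrary numbering
        st_by_strain[strain] = st_by_profile[profile]
    return st_by_strain


# Hypothetical strains and allele numbers, for illustration only.
example = {
    "strain_A": dict(zip(LOCI, (1, 1, 1, 1, 1, 1, 1))),
    "strain_B": dict(zip(LOCI, (1, 1, 1, 1, 1, 1, 1))),
    "strain_C": dict(zip(LOCI, (1, 2, 1, 1, 1, 1, 1))),
    "strain_D": dict(zip(LOCI, (3, 2, 1, 1, 1, 1, None))),
}
sequence_types = assign_sequence_types(example)
```

Here strain_A and strain_B share one ST, while a difference at a single locus (strain_C) is enough to define a new one.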
Table 6. Sequence variation among alleles of 7 MLST genes from EPEC strains.

Locus     # of sites   # of variable sites   # of alleles
aspC      513          21                    9
clpX      567          34                    11
fadD      483          32                    11
icdA      567          30                    8
lysP      477          10                    8
mdh       549          25                    12
uidA      576          26                    7
Average   533.1        25.4                  9.4

Figure 4. Phylogenetic relationships of 21 EPEC sequence types. An unrooted phylogenetic tree was constructed by the neighbor-joining algorithm based on the Kimura 2-parameter model of nucleotide substitution. The four main clonal groups are indicated by gray boxes. The sequence type (ST) and number of isolates are given at the branch tips. Bootstrap values greater than 50% based on 500 replications are given at the internal nodes. The distributions of eae alleles and EAF plasmid types are shown on the right (see Table 9 for plasmid type definitions).

With the exception of EPEC 3 strains, which were all O86:H34 (or nonmotile relatives), the EPEC groups based on the classification of STs included strains of various O-types. There were strains representing 3 O-types in EPEC 1 (O55, O127, and O142), 5 O-types in EPEC 2 (O111, O114, O119, O126, and O128), and 2 O-types in EPEC 4 (O110 and O119). H-types (or the inferred H-type from the fliC allele) were conserved among strains of each group: EPEC 1 strains were H6, EPEC 2 were H2, EPEC 3 were H34, and EPEC 4 were H6. These four clonal groups were represented among both the worldwide and Guinea-Bissau strains. The 21 STs differed on average at 1.4% and 0.2% of the nucleotide and amino acid sites, respectively.
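The two distance measures named in the methods, the untransformed p distance and the Kimura 2-parameter distance, and an average pairwise difference of the kind reported above can be sketched as follows. This is not the MEGA3 or SplitsTree implementation: the function names are illustrative, and skipping gapped or ambiguous sites pair by pair is an assumption of the sketch. Each pair of sequences is assumed to share at least one comparable site.

```python
import math
from itertools import combinations

PURINES = {"A", "G"}


def site_counts(a, b):
    """Count compared sites, transitions, and transversions in two aligned sequences."""
    n = ts = tv = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # pairwise deletion of gaps and ambiguous sites (an assumption)
        n += 1
        if x != y:
            if (x in PURINES) == (y in PURINES):
                ts += 1  # purine<->purine or pyrimidine<->pyrimidine
            else:
                tv += 1
    return n, ts, tv


def p_distance(a, b):
    """Proportion of compared sites that differ."""
    n, ts, tv = site_counts(a, b)
    return (ts + tv) / n


def k2p_distance(a, b):
    """Kimura 2-parameter distance: -1/2 ln((1 - 2P - Q) * sqrt(1 - 2Q))."""
    n, ts, tv = site_counts(a, b)
    p, q = ts / n, tv / n
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def mean_pairwise(seqs, dist=p_distance):
    """Average distance over all unordered pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(dist(a, b) for a, b in pairs) / len(pairs)
```

For closely related sequences such as these STs the two measures are nearly identical, since the Kimura correction only becomes substantial as multiple substitutions per site accumulate.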
ST-20 was the most common multilocus genotype (22.5%) followed by ST-13 (19.4%), and ST-1 (14.0%) (Figure 4). The splits network (Figure 5) reveals several parallel paths indicative of the presence of phylogenetic incompatibilities in the divergence of EPEC clones. Such incompatibilities could arise from recurrent mutation or recombination in the MLST loci. The Φw test was used to detect recombination since it has been shown to discriminate between recurrent mutation and recombination in a variety of circumstances (17). In application to the concatenated sequences of the 21 STs, there were 129 informative sites and the Φw test found statistically significant evidence of recombination (p < 0.001). The four main clonal groups, however, are separated and intact. Three of the four groups occur at the end of long branches without evidence of multiple paths, suggesting that recombination occurred early in the divergence of EPEC genotypes. With an EPEC phylogenetic framework in place, the allelic distributions of eae, bfpA, and perA were assessed.

Figure 5. Phylogenetic network of 21 EPEC sequence types. Phylogenetic (splits) network is based on the neighbor-net algorithm using a p distance matrix. The four main clonal groups are indicated by gray ellipses. The sequence type (ST) and number of isolates are given at the branch tips.

Allelic variation in eae. The eae locus was subtyped by fRFLP (79) and 9 alleles (α, β, β2, δ, η, ι, κ, λ, and μ) were observed among the 129 EPEC strains (Figure 4). As shown previously (2), the EPEC 1 and EPEC 2 clonal groups possess the α and β alleles of eae, respectively. Furthermore, strains of the same ST had identical eae alleles as resolved by RFLP. The α-eae allele had the widest distribution among the EPEC clones, being found in strains with O51:[h49], O73:[h34], O142:[h34], and O157:[h45] serotypes in addition to the EPEC 1 group (O55:H6, O127:H6, and O142:H6). The rarest eae alleles were δ, η, ι, and λ, and combined, these alleles account for less than 5% of the strains examined.

Allelic variation in bfpA. The entire bfpA gene was amplified and sequenced in 15 strains with STs for which bfpA allelic data were not previously available (STs 4, 5, 7, 8, 11, 15, 16, 17, and 18). Comparative sequence analysis revealed the existence of a tenth allele of bfpA, which was designated β7.1 (Figure 6). A minor variant of this allele that differed by a single synonymous substitution was designated β7.2. Using the identified bfpA sequences, an RFLP-based typing system was devised to subtype bfpA alleles based on new PCR primers designed to target the conserved internal regions of the gene. Three restriction enzymes (AluI, BfaI, and BstNI) were identified which, when used separately, produced digestion patterns that combined could resolve 9 bfpA alleles (Table 7). In silico analysis with over 500 restriction endonucleases failed to identify an enzyme that could easily distinguish the β1 and β7 alleles. However, β1 and β7 bfpA strains can be easily differentiated based on their perA allele (see below). PCR amplification of bfpA was successful in all but 21 isolates. Of the bfpA-negative strains, O128:H2 was the most common serotype with 13 isolates. RFLP analysis of the 108 bfpA-positive strains showed that the α1 (n=23) and α3 (n=24) alleles were the most common. The α2 (n=15) and β5 (n=17) alleles were also frequently identified, but the β2 (n=1), β3 (n=4), β4 (n=7), and β6 (n=2) alleles were rarely observed. Fourteen strains were classified as β1/β7 by RFLP; DNA sequencing confirmed 11 as β1 and 3 as β7.
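The in silico digestion behind an RFLP typing scheme of this kind can be sketched as a search for recognition sites followed by a fragment-size calculation. The sketch below is not the software used in the study. The recognition sequences (AluI AG^CT, BfaI C^TAG, BstNI CC^WGG) are the standard ones for these enzymes, while the function names, the toy sequence, and the 60-bp visibility cutoff are assumptions chosen for illustration.

```python
import re

# Recognition site (as a regex) and the offset within the site after which
# the top strand is cut. W in CC^WGG stands for A or T.
ENZYMES = {
    "AluI": ("AGCT", 2),       # AG^CT
    "BfaI": ("CTAG", 1),       # C^TAG
    "BstNI": ("CC[AT]GG", 2),  # CC^WGG
}


def fragment_sizes(seq, enzyme):
    """Sorted fragment lengths from a complete digest of a linear sequence."""
    pattern, offset = ENZYMES[enzyme]
    seq = seq.upper()
    # A lookahead is used so that overlapping sites are all reported.
    cuts = [m.start() + offset for m in re.finditer(f"(?={pattern})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return sorted(end - start for start, end in zip(bounds, bounds[1:]))


def visible(sizes, min_bp=60):
    """Drop fragments assumed too small to see on a standard agarose gel.

    The 60-bp cutoff is a hypothetical value chosen for this sketch.
    """
    return [size for size in sizes if size >= min_bp]


# Toy 25-bp sequence with one site for each enzyme (not a real bfpA amplicon).
toy = "AAAAGCTTTTTCTAGGGGCCAGGAA"
```

Applying visible() to two predicted patterns shows why alleles whose digests differ only in small fragments, or by a few base pairs in a large fragment, are hard to separate by gel electrophoresis alone.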
The α1 and α2 alleles of bfpA, as well as β4 and β5, differ only by one nonsynonymous nucleotide substitution. The two sets of closely related bfpA alleles were found in divergent EPEC lineages: EPEC 1 contains α1 and β5 whereas EPEC 2 has α2 and β4 (Figure 4). In addition, multiple bfpA alleles were found within the same sequence type: α3, β2, and β5 within ST-13 and α2, β1, and β4 within ST-20 (Figure 4).

Figure 6. Eleven bfpA alleles cluster into two major groups. A phylogenetic tree constructed by the neighbor-joining algorithm based on the Kimura 2-parameter model of nucleotide substitution is shown on the left. Bootstrap values based on 500 replications are given at the internal nodes. To the right is a graph of the locations of the 39 polymorphic amino acid sites (195 total), which are marked as vertical lines that indicate differences from the consensus of all 11 alleles.

Table 7. Expected restriction fragment length polymorphisms of bfpA PCR amplicons.

              Digestion pattern (bp) a
bfpA allele   AluI                    BfaI        BstNI
α1            408                     408         155, 253
α2            408                     75, 333     155, 253
α3            408                     75, 333     408
β1            _16_, _53_, 113, 231    _39_, 375   414
β2            _16_, 170, 228          414         414
β3            _16_, _54_, 344         _39_, 375   414
β4            _16_, 177, 215          _39_, 369   408
β5            _16_, _54_, 123, 215    _39_, 369   408
β6            _16_, 392               _39_, 369   408
β7            _16_, _54_, 113, 225    _39_, 369   408

a Underlined fragments (marked here with underscores) are not detectable under standard electrophoretic conditions.

Allelic variation in perA. The entire perA gene was amplified and sequenced in 33 strains representing a diverse set of EPEC STs and bfpA alleles. One strain, 2309-77 (O111:H2), resulted in a PCR product approximately 1 kb larger than expected.
DNA sequencing revealed the presence of a 1055-bp IS element inserted into perA at position 414 with significant similarity to IS102 (86%) and IS903 (84%). Sequence analysis also identified 8 strains that contained one or more frameshifts within mononucleotide repeats in perA that presumably inactivate the gene. The variability of these frameshifts among closely related alleles indicates their relatively recent occurrence, as there has not been sufficient time for the inactivated alleles to accumulate further mutations. These frameshifts were corrected and the IS element sequence was excised in silico prior to allele assignment and phylogenetic analyses. The 33 sequences yielded 20 alleles, which clustered into 4 groups based on phylogenetic sequence analysis, and in keeping with the nomenclature for eae and bfpA, these 4 allele classes were designated α, β, γ, and δ (Figure 7). As with bfpA, each distinct translated perA sequence was given an allele designation, resulting in 11 major α types, 3 β, 2 γ, and a single δ. Two of the α alleles had variants resulting from synonymous substitutions and each variant was given its own subtype designation (α1.1, α1.2, α1.3, α5.1, and α5.2).

Figure 7. Twenty perA alleles cluster into four major groups. A phylogenetic tree constructed by the neighbor-joining algorithm based on the Kimura 2-parameter model of nucleotide substitution is shown on the left. Bootstrap values for the major groups based on 500 replications are given at the internal nodes.
To the right is a graph of the locations of the 46 polymorphic amino acid sites (274 total), which are marked as vertical lines that indicate differences from the consensus of all 20 alleles.

Based on the sequence data, an RFLP method using DdeI and Sau96I was designed to detect the 4 perA allele classes (Table 8). PCR amplification of perA was successful in all but 17 isolates. As with the bfpA-negative strains, O128:H2 was the most common serotype among the perA-negative isolates (n=12). RFLP analysis of the 112 perA-positive strains showed that the α allele was the most common (n=74), followed by β (n=29), γ (n=8), and δ (n=1).

Table 8. Expected restriction fragment length polymorphisms of perA PCR amplicons.

                    Digestion pattern (bp) a
perA allele class   DdeI                 Sau96I
α                   _13_, 74, 82, 417    586
β                   _13_, 74, 499        586
γ                   82, 87, 417          174, 186, 226
δ                   87, 499              226, 360

a Underlined fragments (marked here with underscores) are not detectable under standard electrophoretic conditions.

Association between EAF types and STs. By combining the bfpA and perA allelic data, each bfpA allele was associated with only one perA allele class, resulting in 11 distinct EAF plasmid types, which were designated EAF type 1 to 11 (Table 9). EAF plasmid types 4 and 8 appear to be the most promiscuous, being found in 5 and 4 clonal groups, respectively. Interestingly, the EAF type represented by the fully sequenced plasmid from O111:NM EPEC 2 strain B171 (145) is among the least promiscuous, being found in only one serotype (O111:H2) of a single sequence type (ST-20).

Table 9. EAF plasmid types and distribution among EPEC clones.

EAF type   bfpA allele   perA allele   # of isolates   # of STs   # of clonal groups
1          α1            α             23              4          2
2          α2            α             15              1          1
3          α3            β             24              4          3
4          β1            α             11              6          5
5          β2            δ             1               1          1
6          β3            γ             4               1          1
7          β4            α             7               3          3
8          β5            α             18              4          4
9          β6            β             2               1          1
10         β7            β             3               2          2
11         -             γ             4               1          1
-          -             -             17              4          3

DISCUSSION

Common EPEC clones.
This is the first study to take a comprehensive look at the evolution of typical EPEC by combining clonal relatedness based on MLST with the allelic distributions of three important virulence factors. Previous clonal studies of EPEC have focused primarily on two main groups, EPEC 1 and EPEC 2, which were first described and defined based on the genetic relatedness of strains as determined by multilocus enzyme electrophoresis (MLEE) (156). EPEC 1 was described to comprise strains with O55:H6, O86:H34, O127:H6, and O142:H6 serotypes, while EPEC 2 included O111:H2, O114:H2, O126:H2, and O128:H2. O119:H6 strains have also been regarded as members of EPEC 1 because they share a number of genetic traits with the group (H6 flagellar antigen, eae+, and EAF+) (147) even though MLEE places them just outside of EPEC 1 (156). However, there appear to be sufficient genetic differences to warrant the removal of O86:H34 and O119:H6 from EPEC 1 and they have been reclassified as EPEC 3 and EPEC 4, respectively (Figure 4). The data indicate that EPEC 1 strains (O55:H6, O127:H6, and O142:H6) all possess α-eae, whereas the κ and β2 eae alleles are associated with EPEC 3 and EPEC 4, respectively. O86:H34 strains have also been shown to possess cytolethal distending toxin, while EAF+ O55:H6, O119:H6, O127:H6, and O142:H6 strains are negative (46). O119:H6 strains, on the other hand, are negative for espC, which encodes an enterotoxin, whereas EPEC 1 strains are positive (97).

Other EPEC clones. Thirty of the 129 strains (23%) that were analyzed did not belong to any of the above-mentioned major lineages, and most of these had unusual serotypes for EPEC. Of these serotypes, only O2:[h2], O49:[h10], and O51:[h49] have previously been reported to possess eae and/or express the AE phenotype (3, 7, 16), and only O33:[h34], O142:[h34], and O157:[h45] have been previously described as eae+ bfpA+, and therefore classified as typical EPEC (7, 40, 43, 90, 112, 138).
Literature searches on the remaining serotypes, including O34:[h45], O73:[h34], O76:H51, O86:[h8], O142:[h21], OX9:[h7], O-:[h7], and O-:[h34], failed to find any association with typical EPEC. An interesting finding of this study was the prevalence of the O55:[h51] clone among the strains isolated in Guinea-Bissau. Strains with this serotype have previously been described as relatively minor members of the O55 serogroup and have been reportedly isolated in only South America (8, 126). It is possible that O55:[h51] strains are simply common among children in Guinea-Bissau or their higher prevalence may be due to sampling bias. Alternatively, the abundance of these strains among the West African isolates could indicate that O55:[h51] is an emerging clone, which is increasing in frequency and spreading geographically, possibly because of a distinct combination of μ intimin and EAF type 8.

Virulence gene distribution. Recombination appears to have played a role in the initial generation of the EAF plasmid types. The highly divergent α1/α2 and β4/β5 bfpA alleles are all associated with α-perA, whereas the closely related β1 and β7 bfpA alleles are each found with different perA alleles. However, since bfpA and perA are in complete linkage disequilibrium (each bfpA allele is associated with only one perA allele class), it does not appear that recombination has played a large role in assorting allele combinations. The results also indicate varying degrees of promiscuity among the different EAF types (Table 9). Recently the complete sequence of a derivative of the wild-type EAF plasmid (pMAR7) from prototypical EPEC 1 strain E2348/69 was determined (15). In comparison to the EAF of O111:NM strain B171, the primary difference is the presence of the tra locus in pMAR7 (EPEC 1, EAF type 1) and its absence from pB171 (EPEC 2, EAF type 2).
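The statement that each bfpA allele is associated with only one perA allele class can be checked mechanically. The sketch below uses the EAF type rows as read from Table 9; the function names are illustrative, and the same check would be more informative if run on per-strain typing calls, which are not reproduced here.

```python
from collections import defaultdict

# (EAF type, bfpA allele, perA allele class, isolates) as read from Table 9.
# None marks EAF type 11, which lacks bfpA.
TABLE_9 = [
    (1, "α1", "α", 23), (2, "α2", "α", 15), (3, "α3", "β", 24),
    (4, "β1", "α", 11), (5, "β2", "δ", 1), (6, "β3", "γ", 4),
    (7, "β4", "α", 7), (8, "β5", "α", 18), (9, "β6", "β", 2),
    (10, "β7", "β", 3), (11, None, "γ", 4),
]


def classes_per_bfpA(rows):
    """Collect the set of perA classes observed with each bfpA allele."""
    seen = defaultdict(set)
    for _eaf_type, bfpA, perA, _isolates in rows:
        if bfpA is not None:
            seen[bfpA].add(perA)
    return dict(seen)


def isolates_per_perA(rows):
    """Total number of isolates carrying each perA allele class."""
    totals = defaultdict(int)
    for _eaf_type, _bfpA, perA, isolates in rows:
        totals[perA] += isolates
    return dict(totals)


# True when no bfpA allele is found with more than one perA class.
completely_linked = all(len(c) == 1 for c in classes_per_bfpA(TABLE_9).values())
```

The per-class totals computed this way agree with the RFLP counts reported for the 112 perA-positive strains (74 α, 29 β, 8 γ, and 1 δ), which is a useful consistency check on the table.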
The tra genes, which are responsible for conjugal transfer in plasmids F and R100, were found to have varying degrees of conservation among the EAF plasmids from different EPEC strains (15). This finding has intriguing implications for the results presented here. In addition to EAF type 2, plasmid types 5, 6, 9, and 11 were all found in only a single ST. Like pB171, these plasmids may lack the entire tra locus or possess defective conjugation machinery, thereby preventing their transfer to other EPEC clones. In contrast to the distribution of the EAF plasmid types, the distribution of eae alleles among EPEC strains is more consistent with their clonal relationships. This suggests that the EAF plasmid is more mobile than the LEE island.

bfpA-negative EPEC. EAF type 11 is unusual in that it was the only plasmid type that did not contain bfpA according to the PCR screening. Bortolini and colleagues (14) reported a similar finding when they described EAF+ O119:H2 and O128:H2 strains in which most (~13 kb) of the bfp operon had been deleted and replaced with an IS66-like element. Since the deletion encompasses the 3' end of bfpA, this could explain why plasmid type 11 is bfpA-negative. To confirm this, new primers were designed to target the 5' end of bfpA and the IS66-like element. All EAF type 11 strains yielded the expected amplicon, indicating that this plasmid type possesses a similar bfp operon structure as that described by Bortolini and colleagues (14). Aside from the EAF type 11 strains, 17 additional isolates were bfpA-negative. These strains, however, were also perA-negative, suggesting that they did not possess the EAF plasmid. Of the EAF-negative isolates, most (82%) were part of the EPEC 2 clonal group, with the O128:H2 serotype being the most common. This finding was not unexpected as O128:H2 strains are often reported to be EAF-negative and, therefore, classified as atypical EPEC (147).

Relationship between typical and atypical EPEC.
The findings presented here show that at least some atypical EPEC strains, such as those that lost both bfpA and perA (and presumably the whole EAF plasmid), evolved from typical EPEC, rather than typical EPEC evolving from atypical EPEC by acquisition of the plasmid. It has been shown that typical EPEC can lose the EAF plasmid at a surprisingly high rate during passage through adult volunteers (30, 84), so there appears to be selective pressure to lose the plasmid and convert from typical to atypical EPEC. This outcome is interesting given the recent reports of atypical EPEC in human clinical isolates, some of which belong to typical EPEC O-serogroups (4, 7, 105).

ACKNOWLEDGEMENTS

I would like to thank Lindsey Ouellette, Dr. Weihong Qi, and Dr. Hans Steinsland for technical assistance. This work appears in the second January 2007 issue of the Journal of Bacteriology (189:342-350). This project has been funded in part with Federal funds from the NIAID, NIH, DHHS, under NIH Research Contract # N01-AI-30058.

CHAPTER 4

SEQUENCE VARIATION WITHIN THE TYPE 1 FIMBRIAL PHASE SWITCH OF PATHOGENIC ESCHERICHIA COLI

SUMMARY

Strains of Escherichia coli O157:H7 do not express the type 1 fimbriae encoded by the fim operon because of a 16-bp deletion within the fim switch. This regulatory switch element is an invertible piece of DNA that is responsible for the phase variation phenotype of the fimbriae. The entire fim switch was sequenced in 129 strains representing 22 O-serogroups to assess the amount of variation within this element and to determine when this deletion occurred during the evolution of O157:H7. Sequence analysis of 32 fim switch alleles revealed that the inverted repeats, promoter sites, and the third binding site for the leucine-responsive regulatory protein (Lrp) were well conserved. In contrast, the first and second Lrp sites and the integration host factor (IHF) binding site were more variable.
Phylogenetic analyses indicated that the fim switch deletion occurred at a recent stage in the evolution of E. coli O157:H7: only β-glucuronidase-negative, sorbitol-negative (GUD-, SOR-) O157:H7 and O157:NM strains possess the deletion, whereas GUD+, SOR+ O157:H- strains and GUD+, SOR- O157:H7 contain an intact fim switch. These observations suggest that the deletion in the fim switch occurred after the loss of GUD expression.

INTRODUCTION

Type 1 fimbriae are filamentous structures expressed on the surface of most clinical E. coli isolates (5). These fimbriae bind to mannose-containing receptors on epithelial cells (109), have been shown to be required for colonization of the urinary tract (23, 53, 65, 132), and may play a role in the colonization of the intestinal tract (76, 77). Type 1 fimbriae are encoded by the fim operon, a cluster of nine genes required for biosynthesis (52, 70, 116), and are composed primarily of the structural subunit FimA (69). Expression of these fimbriae is under the control of the fim switch, an invertible genetic element that contains the promoter for fimA (1). Transcription of fimA only occurs when the fim switch is in the "on" orientation. Inversion of the fim switch is mediated by two site-specific recombinases, FimB and FimE, which are encoded upstream of the switch (41, 92). In addition to these recombinases, two cofactors, the integration host factor (IHF) and the leucine-responsive regulatory protein (Lrp), have positive regulatory roles in the inversion process (12, 13, 32, 34, 42, 128). It has been hypothesized that these proteins organize the structure of the nucleoprotein complex so that the site-specific recombination event can occur.

E. coli strains with the O157:H7 serotype are unable to express type 1 fimbriae because the fim switch appears to be permanently locked in the "off" orientation so that transcription of fimA does not occur (85, 127). DNA sequencing has revealed that E.
coli O157:H7 has a 16-bp deletion within the fim switch, which prevents inversion of the element (85, 127). It has been hypothesized that since the end of the deletion is within the IHF binding site, it may prevent IHF from binding to and bending the DNA at this position (127). In addition, the deletion may interfere with strict spatial requirements necessary for recombination between the switch's inverted repeats (81, 127).

Feng et al. (37) have proposed a model for the stepwise evolution of E. coli O157:H7. Under this model, enterohemorrhagic E. coli (EHEC) O157:H7 strains evolved from enteropathogenic E. coli (EPEC) O55:H7 strains. To determine if the deletion within the fim switch is compatible with this model, the entire fim switch was sequenced in 25 O157:H7, 35 non-motile O157, and 8 O55:H7 strains. An additional 61 strains representing other types of pathogenic E. coli were sequenced to determine if the 16-bp deletion is unique to O157:H7 and to assess the level of variation within the switch.

MATERIALS & METHODS

Strains. A collection of 129 strains representing 22 O-serogroups was assembled (Table 10). These strains were originally isolated between 1947 and 2003 from different regions around the world. Each strain was grown overnight at 37°C in 10 ml of Luria-Bertani (LB) broth with moderate shaking. Genomic DNA was isolated using the Puregene DNA isolation kit (Gentra Systems Inc., Minneapolis, MN). DNA concentrations were determined using a NanoDrop ND-1000 spectrophotometer (NanoDrop Technologies Inc., Rockland, DE), and samples were diluted to 25 ng/µl for PCR.

MLST. Multilocus sequence typing (MLST) was performed on 7 conserved housekeeping genes (aspC, clpX, fadD, icdA, lysP, mdh, and uidA) as described in Chapter 3.

fliC typing. Strains that were nonmotile or lacked flagellar serotype data were typed for the fliC locus as described in Chapter 3.

PCR. To select for the "off" orientation, the fim switch was amplified in two pieces.
Primers fimsw-F6 (5'-TGC CGG ATT ATG GGA AAG A-3') and off-R1 (5'-ATT TGG GGC CAT TTT GAC TC-3') were used to amplify the 5' end of the fim switch and primers off-F5 (5'-GTT TCT GTG GCT CGA CGC ATC T-3') and fimsw-R1 (5'-GGA CAG AGC CGA CAG AAC AA-3') amplified the 3' end. These primers were used for both PCR amplification and DNA sequencing of the fim switch (see below). Each 25-µl reaction contained 2.5 µl 10X buffer II (Applied Biosystems, Foster City, CA), 2.5 µl 2 mM dNTP, 2.0 µl 25 mM MgCl2, 0.5 µl 10 µM forward primer, 0.5 µl 10 µM reverse primer, 1.5 units AmpliTaq Gold (Applied Biosystems), 1 µl 25 ng/µl genomic DNA template, and 15.7 µl ddH2O. Amplification utilized an initial denaturing step at 94°C for 10 min., followed by 35 cycles of 92°C for 1 min., 55°C for 1 min., and 72°C for 30 sec. A final step of 72°C for 5 min. was used to complete any partially extended product. PCR products (5 µl) were visualized on ethidium bromide-stained 1.5% agarose gels by illumination with UV light.

Table 10. Summary of strains investigated for fim switch sequence variation.

Serogroup   Flagellar type(s) a           # of isolates   fim switch deletion
O6          H1, H31                       2               no
O26         H11, [h11]                    3               no
O55         H6, H7, [h34]                 12              no
O86         H34, [h34]                    5               no
O91         H21                           1               no
O103        [h25]                         1               no
O104        H21                           1               no
O110        H6                            1               no
O111        H2, [h2], H8, H9, [h11]       12              no
O113        H21                           1               no
O114        H2                            2               no
O118        H16                           1               no
O119        H6                            5               no
O121        H19                           1               no
O126        H2                            1               no
O127        H6                            1               no
O128        H2                            3               no
O142        H6, [h21]                     5               no
O157        H7, [h7], [h42], [h45]        65              no (36), yes (29)
O173        [h16]                         2               no
O174        [h21]                         1               no
O-          [h7], H18, H48                3               no

a Lower case H-types in square brackets were inferred from the fliC allele.

DNA sequencing. PCR products were purified using the QIAquick PCR purification kit (QIAGEN Inc., Valencia, CA) and quantified. Cycle sequencing reactions contained 4.0 µl CEQ DTCS Quick Start premix (Beckman Coulter Inc.), 1.0 µl 20 µM primer, approximately 40 ng of purified fim switch product, and ddH2O to 10 µl.
Amplification utilized an initial denaturing step at 94°C for 1 min., followed by 35 cycles of 96°C for 30 sec., 55°C for 30 sec., and 60°C for 2 min. Upon completion of cycle sequencing, samples were purified with Sephadex G-50 Fine columns, dried under vacuum centrifugation, suspended in 40 µl of deionized formamide, and sequenced using a Beckman CEQ2000XL DNA sequencer. Samples were analyzed using the CEQ2000XL software and then exported for further analysis with the SeqMan module of the Lasergene software (DNASTAR Inc., Madison, WI).

Phylogenetic Analyses. Sequences were aligned with the ClustalW algorithm using the MegAlign module of the Lasergene software. A neighbor-joining tree of the concatenated MLST sequences was constructed using the Kimura 2-parameter model of nucleotide substitution with the MEGA3 software (78). The inferred phylogeny was tested with 500 bootstrap replications. Phylogenetic network analysis was conducted with the SplitsTree 4 (54) program using the neighbor-net algorithm (18) and untransformed distances (p distance).

RESULTS

Variation within the fim switch. PCR amplification and sequencing of the fim switch was successful in all cases. A total of 61 polymorphic sites are present within the fim switch, resulting in 32 alleles (Figure 8). Of the regions within the switch with functional importance, the -10 and -35 promoter sites, the Lrp-3 binding site, and the right inverted repeat were completely conserved across all 32 alleles. The left inverted repeat was conserved in all but a single O157:H7 strain, which contained a C → T transition. The IHF binding site and the remaining two Lrp binding sites (Lrp-1 and Lrp-2) were more variable. With the exception of the strains possessing the 16-bp deletion, the size of the switch was fairly consistent with a length of 314±1 bp. The 16-bp deletion was found to be associated only with strains expressing the O157 antigen.
Examination of the flagellar data revealed that the deletion was unique to O157:H7 (n=20) and O157:[h7] (n=9) strains. However, 5 O157:H7 and 26 O157:[h7] strains possess an intact fim switch.

Clonal analysis. Since the presence of the fim switch deletion was variable among the O157:H7 and O157:[h7] isolates, MLST was performed to assess the clonal relatedness of the strains investigated. The sequenced internal fragments of the 7 housekeeping genes were concatenated to yield 3,732 nucleotides for comparative analysis. MLST analysis resolved an average of 35.7 variable nucleotide sites per locus, which defined between 12 and 24 alleles at each of the 7 housekeeping genes (Table 11). The distinct combinations of alleles across the 7 MLST loci were used to define 41 multilocus genotypes or sequence types (STs) among the 129 strains (Figure 9). Strains closely related to O157:H7 had one of 12 STs that cluster together at the end of a

[Figure 8: consensus sequence of the fim switch; the rows of variant nucleotides and the region labels (IRL, -10, -35, IHF, Lrp-3, Lrp-1, Lrp-2, IRR) are not reproduced]
TTGGGGCCAAA-CTGTCCATATCATAAATAAGTTACGTATTTTTTCTCAAGCATAAAAAT [60]
ATTAAAAAACGACAAAAAGCATCTAACTGTTTGATATGTAAATTATTTCTCTTGTAAATT [120]
AATTTCACATCACCTCCGCTATATGTAAAGCTAACGTTTCTGTGGCTCGACGCATCTT§§ [180]
TCATTCTTCTCTCCAAAAACCACCTCATGCAATATAAAAATCTATAAATAAAGATAACAA [240]
TAGAATATTAAGCCAACAAATAAACTGAAAAAGTTTGTGCGCGATGCTTTCCTCTATGAG [300]
TCAAAATGGCCCCAA [315]

Figure 8. Nucleotide polymorphism within the fim switch. The consensus sequence determined from 32 fim switch alleles is given for the "off" orientation. Mutations from the consensus are below the sequence, with substitutions found in a single isolate in lower case. Regions with known functional importance are in bold and labeled above the sequence (IRL, inverted repeat left; -10 and -35, promoter sites; IHF, integration host factor binding site; Lrp, leucine-responsive regulatory protein binding site; IRR, inverted repeat right).
The location of the O157:H7 deletion between sites 67 and 82 is in the shaded box. Nucleotide positions are given at the end of each line.

Table 11. Variation among alleles of 7 MLST genes in the fim switch strains.

Locus       # of sites   # of variable sites   # of alleles
aspC        513          26                    12
clpX        567          39                    16
fadD        483          61                    17
icd         567          29                    17
lysP        477          17                    14
mdh         549          35                    24
uidA        576          43                    18
MLST avg.   533.1        35.7                  16.9

[Figure 9: unrooted neighbor-joining tree; gray box labeled "EHEC 1 clonal group (O55:H7, O157:H7, O157:NM)"; scale bar, 0.002 substitutions/site]

Figure 9. Phylogenetic relationships of 41 observed sequence types. An unrooted phylogenetic tree was constructed by the neighbor-joining algorithm based on the Kimura 2-parameter model of nucleotide substitution. The gray box indicates the 12 sequence types representing strains with O55:H7, O157:H7, and O157:NM serotypes belonging to the EHEC 1 clonal group. Bootstrap values greater than 75% based on 500 replications are given at the internal nodes.

long internal branch with 100% bootstrap support. In addition to O157:H7, the clonal group contains isolates with O157:[h7], O55:H7, and O-:[h7] serotypes. Reid et al. (124) have designated this cluster of closely related strains the EHEC 1 clonal group. When these STs were combined with the fim switch allelic data, 18 genotypes were resolved within the EHEC 1 clonal group (Figure 10). Phylogenetic splits network analysis, which does not force the data into a bifurcating tree, revealed that the genotypes with the fim switch deletion cluster together, strongly suggesting that the deletion occurred only once during the diversification of the clonal group.

[Figure 10: splits network of genotypes 74/10, 77/10, 73/10, 75/7, 73/9, 76/6, 73/8, 75/6, 237/6, 73/6, 65/5, 66/1, 69/1, 275/1, 65/2, 66/3, 68/1, and 67/4; scale bar, 0.0002 substitutions/site]

Figure 10. Phylogenetic network of 18 genotypes belonging to the EHEC 1 clonal group.
The splits network is based on the neighbor-net algorithm using a p distance matrix. Unique combinations of sequence type (ST) and fim switch allele were used to define the genotypes, which are denoted as ST/fim switch. Serotypes common to each genotype are also shown. Genotypes with the fim switch deletion are enclosed in the gray box. The 9 O157:[h7] isolates with the fim switch deletion are all genotype 66/1.

DISCUSSION

Sequence polymorphism within the fim switch. The work presented here is the first to examine the level of DNA sequence variation within the fim switch from a diverse set of strains. Previous work on the switch has been primarily focused on identifying binding sites for cofactors (IHF and Lrp) involved in the inversion process (13, 42, 128). The results of these studies have made it possible to examine the switch for differing selective pressures by classifying the sites within the switch as either "domain" or "independent." The domain sites consisted of the two inverted repeats, both promoter sites, and the IHF and Lrp binding sites; all other sites were classified as independent. When sites within the switch were placed into one of these two categories, an intriguing pattern emerged. Substitution rates involving transitions were similar between the two categories of sites (0.0199 for domain, 0.0170 for independent). However, a greater difference between the two categories was observed for the transversion rate (0.0107 for domain, 0.0155 for independent), suggesting that there may be some selective disadvantage towards transversions within the domain sites.

Stepwise evolution of O157:H7 and the fim switch deletion. A stepwise evolution model has been proposed for the evolution of E. coli O157:H7 (37). This model is based on multilocus enzyme electrophoresis and was created using a parsimony approach. The fim switch deletion appears to follow the stepwise evolution model and occurred only once during the evolution of O157:H7.
The fim switch deletion occurred relatively recently in the evolutionary history of E. coli O157:H7; only GUD-, SOR- O157:H7 and O157:NM strains possess the 16-bp deletion. This implies that the fim switch deletion occurred after the loss of GUD expression. The stepwise evolution model was updated using sequence types resolved by MLST as the foundation rather than electrophoretic types as determined by MLEE (Figure 11). While the original model had a bifurcating split after the unobserved GUD+, SOR+, Stx2+ O157:H7 phenotype (A3 in Figure 11), the updated model now has a trifurcation after this point. The discovery of a GUD+, SOR+, Stx2- O157:H7 strain provides further support for the existence of the A3 phenotype. As the contribution of O157 strains without the typical diagnostic GUD- and SOR- phenotypes to the cases of EHEC disease becomes known, a strain with the elusive A3 phenotype may finally be observed.

The biological relevance of the loss of type 1 fimbrial expression in O157:H7 remains unclear. However, an intriguing hypothesis has been put forth in a recent study by Low et al. (88). They measured the expression of 15 fimbrial gene clusters in O157:H7 and found that most (n=11) were not expressed under the variety of conditions examined. The authors concluded that the limited collection of expressed fimbriae may be an important part of O157:H7's biology. Low et al. hypothesize that O157:H7's niche at the terminal rectum of cattle is possibly due to limited adherence to other sites in the gastrointestinal tract. This hypothesis, however, remains to be tested.
[Figure 11: diagram of the stepwise model leading from an ancestral EPEC-like strain through ancestors A1 to A7, with the observed sequence types ST 65, 66, 67, 68, 69, 73, 74, 75, 76, 77, 237, and 275]

Figure 11. Revised stepwise evolution model of E. coli O157:H7. Phenotypes of ancestors A1-A7 are shown; changes predicted to have occurred are in bold (G, GUD; S, SOR; 1, Stx1; 2, Stx2). Sequence types (ST) observed among the strains investigated are shown. A strain with the traits of ancestor A3 (shaded square) has not been reported.

ACKNOWLEDGEMENTS

I would like to thank Lindsey Ouellette for technical support and Susan Francisco for her assistance in the sequencing of the EPEC isolates examined. This project has been funded in part with Federal funds from the NIAID, NIH, DHHS, under NIH Research Contract # N01-AI-30058.

CHAPTER 5

POSITIVE SELECTION AND RECOMBINATION IN SURFACE PROTEIN-ENCODING GENES

SUMMARY

Escherichia coli is a diverse species of Gram-negative bacteria. Most strains are nonpathogenic and do not harm their host, but some may cause a variety of harmful intestinal and extra-intestinal infections. A common theme in the pathogenesis of the different types of E. coli is bacterial attachment mediated by the expression of surface proteins. A wide range of surface proteins are expressed by these strains, some of which are ubiquitous while others are specific to certain pathogenic types (pathotypes). The primary objective of this study is to examine the allelic diversity of 5 surface protein-encoding genes (bfpA, csgA, eae, espA, and fimA) for the actions of positive selection and recombination.
Allelic sequences were obtained for at least one of the 5 loci from a collection of 324 strains representing 44 O-serogroups of E. coli as well as the newly described E. albertii. Sequence analysis identified 11 bfpA, 12 csgA, 20 eae, 31 espA, and 32 fimA alleles in the strains investigated. Comparison to the allelic sequences of 7 conserved housekeeping loci in the same strains revealed higher levels of sequence variation in the 5 surface protein-encoding genes. Analysis of the housekeeping loci suggests that these genes are under weak negative selection, with most (76%) of the codons examined evolving neutrally and no significant intragenic recombination events identified. In contrast, evidence of positive selection and/or recombination was detected in all 5 of the surface protein-encoding genes. These results support the hypothesis that surface proteins alter their three-dimensional structure or surface epitopes in order to confer an evolutionary advantage.

INTRODUCTION

Escherichia coli is a diverse species of Gram-negative bacteria, some strains of which are pathogenic. The early stages of E. coli pathogenesis often involve bacterial attachment mediated by the expression of surface proteins. It has been hypothesized that pathogens alter their surface proteins in order to evade detection by their host's immune system. Therefore, it is likely that natural selection is acting to generate new allelic variants. Furthermore, homologous recombination may be acting to generate new allelic variants. This could take place through the exchange of whole genes or gene segments. Several methods have been developed to test for the occurrence and boundaries of recombinational events. The most commonly used methods examine the sequences for conflicting phylogenetic signals (58, 74) or a non-random distribution of nucleotide substitutions (130). There are two types of natural selection, positive and negative.
These types of selection can be detected at the DNA sequence level by comparing the rates of synonymous (silent) and nonsynonymous (amino acid changing) nucleotide substitutions. Numerous methods have been developed for estimating the numbers of synonymous and nonsynonymous substitutions (22, 44, 57, 86, 87, 99, 103). These methods calculate the substitution rate for an entire gene by computing the average number of substitutions over a particular length of codons. More recently, methods have been developed to detect positive selection at single amino acid sites (51, 73, 106-108, 142, 143, 158, 159).

It is the goal of this research to examine the allelic diversity of genes that encode a variety of surface structures in different classes of pathogenic E. coli (pathotypes). Five genes encoding surface proteins involved in bacterial attachment during different types of infection or survival in the external environment were examined for evidence of positive selection and homologous recombination. The genes chosen for this analysis are: 1) bfpA, the major structural subunit of the bundle-forming pilus of enteropathogenic E. coli (EPEC); 2) csgA, the major structural subunit of the curli fimbriae involved in biofilm formation; 3) eae, the intimin protein of the attaching/effacing phenotype; 4) espA, the filamentous extension on the type III secretion system needle of attaching and effacing E. coli (AEEC); and 5) fimA, the major structural subunit of the type 1 fimbriae. The level of sequence polymorphism was assessed in each locus and compared to the amount of variation present within 7 genes with conserved housekeeping function. A similar approach was taken for both the selection and recombination analyses. Overall values were calculated for dN, dS, and phylogenetic compatibility, followed by regional and site-by-site examination of each locus.

MATERIALS AND METHODS

Strains. A collection of 324 strains representing 44 O-serogroups of E.
coli as well as the newly described E. albertii was assembled (Table 12). These strains were originally isolated between 1947 and 2004 from 29 countries around the world. Each strain was grown overnight at 37°C in 10 ml of Luria-Bertani (LB) broth with moderate shaking. Genomic DNA was isolated using the Puregene DNA isolation kit (Gentra Systems Inc., Minneapolis, MN). DNA concentrations were determined using a NanoDrop ND-1000 spectrophotometer (NanoDrop Technologies Inc., Rockland, DE), and samples were diluted to 25 ng/µl for PCR.

MLST. Multilocus sequence typing (MLST) was performed on 7 conserved housekeeping genes (aspC, clpX, fadD, icd, lysP, mdh, and uidA) as described in Chapter 3.

fliC typing. Strains that were nonmotile or lacked flagellar serotype data were typed for the fliC locus as described in Chapter 3.

PCR primer design. DNA sequences of the regions encompassing each of the genes (bfpA, csgA, eae, espA, and fimA) were obtained from GenBank and aligned. All primers were designed to target conserved regions, were synthesized by Integrated DNA Technologies, Inc., and were stored at a concentration of 100 µM in ddH2O.

PCR. The eae and bfpA genes were amplified and sequenced as described in Chapters 2 and 3, respectively. For csgA, each 25-µl reaction contained 2.5 µl 10X buffer II (Applied Biosystems, Foster City, CA), 2.5 µl 2 mM dNTP, 2.0 µl 25 mM MgCl2, 0.5 µl 10 µM csgA_-70F primer (5'-CAA ATG GCT ATT CGC GTG AC-3'), 0.5 µl 10 µM csgA_542R primer (5'-GTG CCG CAA GGA GTA ATA AC-3'), 1.5 U

Table 12. Summary of strains investigated for allelic variation in at least one of 5 surface protein-encoding genes.

Columns: Species & serogroup; # of strains; # of STs; allelic data a for bfpA, csgA, eae, espA, and fimA.

Rows (first part): E. coli O2, E. coli O5, E. coli O6, E. coli O20, E. coli O21, E. coli O26, E. coli O33, E. coli O34, E. coli O49, E. coli O51, E. coli O55, E. coli O73, E. coli O76, E. coli O84, E. coli O85, E. coli O86, E. coli O88, E. coli O91, E. coli O101, E. coli O103, E. coli O104, E. coli O108, E.
coli O109, E. coli O110

Table 12, continued (values are listed in column order: # of strains, # of STs, then the allelic data columns; blank cells are omitted)

E. coli O111         24   7    15   9    21   5    12
E. coli O113         1    1    1
E. coli O114         5    3    4    2    4    2
E. coli O116         1    1    1    1
E. coli O118         1    1
E. coli O119         29   4    23   2    28   5    5
E. coli O121         5    3    2    2
E. coli O125         2    4    1
E. coli O126         3    2    1    3    1
E. coli O127         5    2    4    1    4    1    1
E. coli O128         20   6    2    5    15   1    3
E. coli O142         12   5    11   3    11   3    5
E. coli O145         12   7    12   2
E. coli O153         1    1    1
E. coli O157         67   13   3    13   46   14   65
E. coli O173         2    1    2
E. coli O174         1    1    1
E. coli OX9          1    1    1    1
E. coli O-           9    9    3    2    7    2    3
E. coli O? b         7    3    7
E. albertii          12   7    5    10   7    4
S. boydii type 13    4    3    4    1
total                324  103  108  70   269  91   133

a Number of strains for which allelic data was determined. Blanks indicate no allelic data was obtained because the strain was either negative or not tested for the locus.
b O-type not determined.

AmpliTaq Gold (Applied Biosystems), 1 µl 25 ng/µl genomic DNA template, and 15.7 µl ddH2O. Amplification of the approximately 600-bp fragment utilized an initial denaturing step at 94°C for 10 min, followed by 35 cycles of 92°C for 1 min, 55°C for 1 min, and 72°C for 30 s. A final step of 72°C for 5 min was used to complete any partially extended product. For espA, an approximately 950-bp fragment was amplified under conditions similar to those described for csgA with the exceptions of primer sepL-F887 (5'-AGA GCC CTT CTC GGG TAT CG-3'), primer espD-R126 (5'-GGC CGT GGA TTT AAC CAG TTG TAA-3'), an annealing temperature of 53°C, and an extension time of 1 min. For fimA, an approximately 700-bp fragment was amplified under conditions similar to those described for csgA with the exceptions of primers fimA-F6 (5'-ACT GCC CAT GTC GAT TTA GAA-3') and fimA-R8 (5'-GAG CAA ACA TTG GCA GCA AC-3').
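The csgA reaction volumes listed above can be checked and scaled with a few lines of arithmetic. The following is a minimal sketch rather than part of the protocol: the AmpliTaq Gold stock concentration of 5 U/µl, the 10% pipetting overage, and all function and variable names are assumptions introduced here for illustration.

```python
# Per-reaction volumes (µl) for the 25-µl csgA PCR, as listed in the text.
# The polymerase is specified in units, so its volume depends on the stock.
PER_REACTION_UL = {
    "10X buffer II": 2.5,
    "2 mM dNTP": 2.5,
    "25 mM MgCl2": 2.0,
    "10 µM forward primer": 0.5,
    "10 µM reverse primer": 0.5,
    "25 ng/µl genomic DNA template": 1.0,
    "ddH2O": 15.7,
}
POLYMERASE_UNITS = 1.5          # from the text
ASSUMED_STOCK_U_PER_UL = 5.0    # ASSUMPTION: not stated in the text


def reaction_volumes(stock_u_per_ul=ASSUMED_STOCK_U_PER_UL):
    """Return all per-reaction volumes, including the polymerase volume."""
    volumes = dict(PER_REACTION_UL)
    volumes["AmpliTaq Gold"] = POLYMERASE_UNITS / stock_u_per_ul
    return volumes


def master_mix(n_reactions, overage=0.10, stock_u_per_ul=ASSUMED_STOCK_U_PER_UL):
    """Scale shared reagents for n reactions plus a pipetting overage.

    The template is excluded because it is added to each tube separately.
    """
    scale = n_reactions * (1.0 + overage)
    return {
        name: round(volume * scale, 2)
        for name, volume in reaction_volumes(stock_u_per_ul).items()
        if "template" not in name
    }


if __name__ == "__main__":
    total = sum(reaction_volumes().values())
    print(f"total per-reaction volume: {total:.1f} µl")
    for name, volume in master_mix(10).items():
        print(f"{name}: {volume} µl")
```

Under the assumed stock concentration, 1.5 U corresponds to 0.3 µl, and the listed components then sum to the stated 25 µl.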
PCR products (5 µl) were visualized on ethidium bromide-stained 1.5% agarose gels by illumination with UV light, purified using the QIAquick PCR purification kit (QIAGEN Inc., Valencia, CA), and quantified.

DNA sequencing. For csgA, cycle sequencing reactions contained 4.0 µl CEQ DTCS Quick Start premix (Beckman Coulter Inc., Fullerton, CA), 1.0 µl 20 µM csgA_-70F or csgA_542R primer, approximately 60 ng of purified csgA PCR product, and ddH2O to a final volume of 10 µl. Amplification utilized an initial denaturing step at 94°C for 1 min, followed by 35 cycles of 96°C for 30 s, 55°C for 30 s, and 60°C for 2 min. Upon completion of cycle sequencing, samples were purified with Sephadex G-50 Fine columns (Amersham Pharmacia Biotech Inc., Piscataway, NJ), dried under vacuum centrifugation (Savant Instruments Inc., Holbrook, NY), suspended in 40 µl of deionized formamide, and run on a CEQ2000XL (Beckman Coulter Inc.). For espA, sequencing conditions were similar to those described for csgA with the exceptions of primers sepL-F887, espD-R126, espA-F205c (5'-GAG GCA TCT AAR GMG TCA AC-3'), espA-F246 (5'-GGA TGC CAA GAT CGC TGA AGT T-3'), espA-R413 (5'-GCT TTT ACG GTT TGC AGG TCA C-3'), and espA-R426c (5'-AAT AGC NGC YTT CAC YGT TTG-3'); an annealing temperature of 53°C; and approximately 100 ng of purified espA PCR product. For fimA, sequencing conditions were similar to those described for csgA with the exception of primers fimA-F6 and fimA-R8 and approximately 70 ng of purified fimA PCR product. Samples were analyzed using the CEQ2000XL software and then exported for further analysis with the SeqMan module of the Lasergene software (DNASTAR Inc., Madison, WI).

Phylogenetic analyses. Sequences were aligned with the ClustalW algorithm using the MegAlign module of the Lasergene software.
Neighbor-joining trees were constructed using the Kimura 2-parameter model of nucleotide substitution with the MEGA3 software (78), and the inferred phylogenies were each tested with 500 bootstrap replications.

Recombination analyses. Putative regions of recombination were identified through the construction of compatibility matrices of parsimoniously informative sites using the program Reticulate (58). The significance of each matrix was evaluated by a Monte Carlo approach in which the original matrix was compared to 1000 random matrices of the gene's informative sites. Allelic sequences were examined for evidence of gene conversion by performing Sawyer's test using the program GENECONV (Version 1.81) (131). Calculations were based on 10,000 permutations, and global fragments with Bonferroni-corrected Karlin-Altschul p-values ≤ 0.05 were considered significant. Allelic sequences were also fit to a nucleotide substitution model, and potential recombinational breakpoints were identified using the genetic algorithm recombination detection (GARD) method (74) as implemented on the Datamonkey website (http://www.datamonkey.org/GARD) using the β-Γ model with 3 rate classes. For loci in which GARD identified a possible recombinational breakpoint, the regions of the gene on either side of the breakpoint were analyzed separately for the action of selection on individual codons (see below).

Selection analyses. The number of synonymous substitutions per synonymous site (dS) and the number of nonsynonymous substitutions per nonsynonymous site (dN) were estimated by the modified Nei-Gojobori method using MEGA3 (78). The proportions of polymorphic synonymous (pS) and nonsynonymous (pN) sites were calculated by the method of Nei and Gojobori (103). Variation in the functional constraints of different regions of each gene was examined by tabulating pS and pN in a sliding-window analysis of 10 or 30 codons using the program PSWIN.
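The sliding-window tabulation described above can be illustrated with a short sketch. It is not the PSWIN program, whose internals are not described here. It assumes that per-codon counts of synonymous and nonsynonymous differences and sites have already been obtained (for example, by the Nei-Gojobori method), and every name in it is hypothetical.

```python
def sliding_window(syn_diffs, nonsyn_diffs, syn_sites, nonsyn_sites, window=10):
    """Tabulate pS, pN, and pN - pS in overlapping windows of codons.

    Each argument is a per-codon list (ASSUMED to be precomputed):
      syn_diffs / nonsyn_diffs : synonymous / nonsynonymous differences
      syn_sites / nonsyn_sites : synonymous / nonsynonymous sites
    Returns a list of (first_codon, pS, pN, pN - pS), with codons numbered from 1.
    """
    n = len(syn_diffs)
    if not (len(nonsyn_diffs) == len(syn_sites) == len(nonsyn_sites) == n):
        raise ValueError("all per-codon lists must have the same length")
    if window < 1 or window > n:
        raise ValueError("window must be between 1 and the number of codons")

    rows = []
    for start in range(n - window + 1):
        end = start + window
        s_total = sum(syn_sites[start:end])
        n_total = sum(nonsyn_sites[start:end])
        # Proportion of differences per site within the window; 0.0 if no sites.
        p_s = sum(syn_diffs[start:end]) / s_total if s_total else 0.0
        p_n = sum(nonsyn_diffs[start:end]) / n_total if n_total else 0.0
        rows.append((start + 1, p_s, p_n, p_n - p_s))
    return rows


if __name__ == "__main__":
    # Toy input for four codons with a 2-codon window (illustrative values only).
    for row in sliding_window([1, 0, 0, 0], [0, 0, 1, 1],
                              [1, 1, 1, 1], [2, 2, 2, 2], window=2):
        print(row)
```

Windows in which pN − pS is positive mark regions where nonsynonymous polymorphism exceeds synonymous polymorphism, which is the pattern examined in the sliding-window figures.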
Allelic sequences were fit to a nucleotide substitution model using the Datamonkey website (http://www.datamonkey.org), and then the single likelihood ancestor counting (SLAC) and random effects likelihood (REL) methods were used to fit a codon model to detect selection on individual codons (72). Codons under significant positive or negative selection were identified by p-values ≤ 0.01 for SLAC and posterior probabilities ≥ 0.99 for REL.

RESULTS

MLST analysis. PCR amplification and sequencing of the 7 MLST loci in 324 strains was successful in most (92%) cases. The notable exception was uidA, which failed to amplify in 25 strains, including a single O114:H2 and a O76:H51 strain, all 11 O55:[h51] strains, and all 12 E. albertii strains. One of the E. albertii strains was also PCR-negative for lysP. The uidA and lysP loci were treated as missing data and replaced with alignment gaps in the fully concatenated sequence for these 25 strains. For phylogenetic analysis, the sequenced internal fragments of the 7 housekeeping genes were concatenated to yield 3,732 nucleotides. MLST analysis resolved an average of 78.4 variable nucleotide sites per locus, which defined between 30 and 45 alleles at each of the 7 MLST genes (Table 13). The distinct combinations of alleles across the 7 MLST loci were used to define 103 multilocus genotypes or sequence types (STs) among the 324 strains (Figure 12). The synonymous rate of substitution (dS) ranged from a low of 5.52% for uidA to a high of 22.66% for fadD, with an average of 11.33 synonymous substitutions per 100 synonymous sites (Table 13). The nonsynonymous rate (dN) per 100 nonsynonymous sites was generally two orders of magnitude lower than dS, ranging from 0.07% for clpX to 0.58% for uidA.
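As a simple arithmetic check on these rates, the sketch below recomputes the column averages and the dN/dS ratio for each locus from the dS and dN values reported in Table 13. The locus names clpX and icd follow the reading adopted here, and the function names are illustrative rather than taken from any software used in the study.

```python
# dS x 100 and dN x 100 for the 7 MLST loci, transcribed from Table 13.
TABLE_13 = {
    "aspC": (12.80, 0.30),
    "clpX": (9.55, 0.07),
    "fadD": (22.66, 0.35),
    "icd": (8.13, 0.16),
    "lysP": (10.23, 0.41),
    "mdh": (10.43, 0.12),
    "uidA": (5.52, 0.58),
}


def column_means(table):
    """Return the mean dS and mean dN across loci."""
    ds_values = [ds for ds, _ in table.values()]
    dn_values = [dn for _, dn in table.values()]
    return sum(ds_values) / len(ds_values), sum(dn_values) / len(dn_values)


def dn_ds_ratios(table):
    """Return dN/dS for each locus; values below 1 indicate an excess of
    synonymous over nonsynonymous change."""
    return {locus: dn / ds for locus, (ds, dn) in table.items()}


if __name__ == "__main__":
    mean_ds, mean_dn = column_means(TABLE_13)
    print(f"mean dS x 100 = {mean_ds:.2f}, mean dN x 100 = {mean_dn:.2f}")
    for locus, ratio in sorted(dn_ds_ratios(TABLE_13).items(), key=lambda kv: kv[1]):
        print(f"{locus}: dN/dS = {ratio:.3f}")
```

The recomputed means agree with the averages reported in the table, and every locus has a dN/dS ratio well below 1.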
Tests for natural selection operating on the allelic variation at each MLST locus based on the SLAC and REL methods revealed that, on average, most (76.0%) of the codons analyzed are evolving neutrally, with no significant difference between the levels of synonymous and nonsynonymous substitutions (Table 14). The proportion of negatively selected codons ranged from a low of 2.9% for aspC to a high of 49.1% for fadD, with an average of 24.0%. No codons in any of the MLST loci were

Table 13. Sequence variation among alleles of 7 MLST genes.

Locus   # of sites   # of variable sites   # of alleles   dS x 100   dN x 100
aspC    513          80                    37             12.80      0.30
clpX    567          73                    37             9.55       0.07
fadD    483          114                   36             22.66      0.35
icd     567          59                    39             8.13       0.16
lysP    477          70                    30             10.23      0.41
mdh     549          86                    45             10.43      0.12
uidA    576          67                    40             5.52       0.58
Avg.    533.1        78.4                  37.7           11.33      0.28

Table 14. Summary of recombination and selection analyses for 7 MLST genes.

Locus   Overall compatibility   Global fragments a   GARD breakpoints   Neutral codons   Negative codons   Positive codons
aspC    94.4%                   0                    0                  97.1%            2.9%              0.0%
clpX    77.4%                   0                    0                  70.4%            29.6%             0.0%
fadD    81.3%                   0                    0                  50.9%            49.1%             0.0%
icd     70.0%                   0                    0                  78.8%            21.2%             0.0%
lysP    92.6%                   0                    0                  76.7%            23.3%             0.0%
mdh     94.9%                   0                    0                  77.0%            23.0%             0.0%
uidA    83.3%                   0                    0                  81.3%            18.8%             0.0%
Avg.    84.8%                   0                    0                  76.0%            24.0%             0.0%

a Number of significant global inner fragments identified by Sawyer's test.

[Figure 12: six tree panels; A, all STs; B, bfpA; C, csgA; D, eae; E, espA; F, fimA; scale bar, 0.005 substitutions/site]

Figure 12. Distribution of allelic data among sequence types (STs). A) 500 STs showing the diversity of E. coli and E. albertii. The 103 STs investigated are marked with gray circles. The branch between E. coli and E. albertii is represented by the dashed line (length = 0.063 substitutions/site).
Panels B through F show the distributions of allelic data obtained for bfpA (B, 22 STs), csgA (C, 39 STs), eae (D, 74 STs), espA (E, 50 STs), and fimA (F, 44 STs) among the 103 STs indicated in panel A.

found to be under significant positive selection. Thus, low values of dN/dS at the MLST loci reflect weak negative selection over many codons. Compatibility analysis indicated that, on average, 84.8% of the pairwise comparisons of parsimoniously informative sites in the 7 MLST genes have compatible phylogenies, suggesting that recombination or parallel mutation may have played a small role in the allelic diversification at these loci. The results of Sawyer's test and GARD appear to support the possibility of parallel mutation, since neither method found evidence of significant intragenic recombination in any of the 7 MLST loci (Table 14).

Allelic variation in bfpA. Of the 324 strains investigated, bfpA allelic data was obtained for 108 isolates through a combination of DNA sequencing and RFLP analysis. Sequence analysis revealed the presence of 11 distinct alleles of bfpA, which are defined by 99 polymorphic nucleotide sites within the 588-bp gene. Phylogenetic compatibility analysis with 72 parsimoniously informative sites resulted in an overall compatibility of 92.6%. No globally significant inner fragments were detected by Sawyer's test; however, GARD identified a potential recombinational breakpoint at codon 150. In comparison to the MLST loci from the same 108 strains, the synonymous rate for bfpA was 9.22%, slightly above the range (3.44 to 8.56%) and greater than the mean (5.41%) for the 7 MLST genes. The nonsynonymous rate of 6.38% for bfpA was more than 40 times greater than the average dN across all 7 MLST genes (0.15%). The proportions of nonsynonymous and synonymous codon changes (pN and pS, respectively) were calculated within a 10-codon sliding window to determine if regions of bfpA are experiencing different selective pressures (Figure 13A).
The level of selective constraint appears to vary along the length of bfpA: for most of the gene pS > pN, but codons 145-170 are distinct in that pN > pS. When the regions on either side of the recombinational breakpoint identified by GARD were analyzed separately for the action of selection on individual codons, 19 codons were identified as being under significant negative selection, while 7 codons appear to be under positive selection (Figure 13B).

[Figure 13: panels A and B; x-axis, codon (0 to 180); breakpoint marked at codon 150]

Figure 13. Selection and recombination in bfpA. A) Sliding window analysis depicting changes in pS (black line), pN (gray line), and pN − pS (heavy black line) along the length of bfpA using a 10-codon window. B) Codons under significant positive (triangles, n=7) or negative (squares, n=19) selection are plotted using the dN − dS values as determined by the SLAC method. Sites found to be significant by SLAC and REL are shaded black, while sites significant under REL only are gray. Codons evolving neutrally are indicated by the white circles. The recombination breakpoint identified by GARD at codon 150 is indicated by the dashed line.

Allelic variation in csgA. Of the 324 strains investigated, csgA allelic data was obtained for 70 isolates. Sequence analysis revealed the presence of 12 distinct alleles, which are defined by 73 polymorphic nucleotide sites within the 459-bp gene. One of the E. albertii strains investigated (K-1) possessed an unusual csgA allele. This allele, designated as allele 12, was unique in that it is 135 bp shorter than the other alleles due to a large in-frame deletion. In addition to the deletion, allele 12 also possesses a premature stop codon because of a C → T transition at position 418 in the alignment.
Since some of the methods require the removal of alignment gaps, allele 12 was excluded from the recombination and selection analyses because the removal of the 135 bp from the other alleles may lead to inaccurate results. Phylogenetic compatibility analysis with 41 parsimoniously informative sites within the remaining 11 alleles resulted in an overall compatibility of 82.7%. Two globally significant inner fragments were detected by Sawyer's test, both of which were located between codons 16 and 57, and GARD identified a potential recombinational breakpoint within the same region of the gene at codon 53. In comparison to the MLST loci from 69 of the 70 csgA strains (E. albertii strain K-1 was excluded), the synonymous rate for csgA was 20.02%, within the range (5.19 to 28.65%) but greater than the mean (12.65%) for the 7 MLST genes. The nonsynonymous rate of 1.54% for csgA was approximately 5 times greater than the average dN across all 7 MLST genes (0.30%). Sliding window analysis revealed that pS > pN for most of the gene, but a small region centered near codon 20 was distinct in that pN > pS (Figure 14A). This region along with 3 others was characterized by an increase in pN above zero. The codon selection analyses identified 42 codons under significant negative selection and 2 codons under positive selection (Figure 14B).

Allelic variation in eae. Of the 324 strains investigated, eae allelic data was obtained for 269 isolates through a combination of DNA sequencing and fluorescent RFLP analysis (79). Among these isolates, 21 of the 28 major allelic variants of eae were observed, and representative sequences of each allele were chosen for subsequent analyses. Sequence analysis of the 21 eae alleles identified 1009 polymorphic nucleotide sites within the ~2.8-kb gene. Of these polymorphic sites, 773 (76.6%) are located in the 3' end of the gene, which encodes the extracellular domains of intimin.
Because of this large difference in the distribution of polymorphic sites, eae was divided into intracellular (periplasmic and transmembrane) and extracellular regions. In comparison to the MLST loci from the same 269 strains, the synonymous rate for the intracellular portion of the 21 eae alleles was 12.57%, within the range (5.31 to 25.57%) and slightly greater than the mean (12.43%) for the 7 MLST genes. The nonsynonymous rate of 1.93% was approximately 6 times greater than the average dN across all 7 MLST genes (0.31%). As expected, the extracellular region of eae was quite different from the periplasmic and transmembrane domains. With a synonymous rate of 61.39% and a nonsynonymous rate of 26.18%, the extracellular domains possess roughly 5 times as many synonymous and approximately 85 times as many nonsynonymous substitutions per site as the average values for the 7 MLST loci.

[Figure 14: panels A and B; x-axis, codon (0 to 140); breakpoint marked at codon 53]

Figure 14. Selection and recombination in csgA. A) Sliding window analysis depicting changes in pS (black line), pN (gray line), and pN − pS (heavy black line) along the length of csgA using a 10-codon window. B) Codons under significant positive (triangles, n=2) or negative (squares, n=42) selection are plotted using the dN − dS values as determined by the SLAC method. Sites found to be significant by SLAC and REL are shaded black, while sites significant under REL only are gray. Codons evolving neutrally are indicated by the white circles. The recombination breakpoint identified by GARD at codon 53 is indicated by the dashed line.

As mentioned previously, 21 of the 28 major allelic variants of eae were observed among the strains investigated.
When all 28 variants were analyzed, an additional 116 polymorphic sites were identified; all selection and recombination analyses were performed on these 28 allelic variants. Phylogenetic compatibility analysis with 860 parsimoniously informative sites from the 28 alleles resulted in an overall compatibility of 29.8%. When eae was divided into the intracellular and extracellular regions, the periplasmic and transmembrane portion had an observed compatibility of 71.3% while the extracellular domains had a compatibility of 30.4%. Since the compatibility analysis indicates strong support for parallel mutation and/or recombination within the extracellular domains as opposed to the intracellular region of the gene, the regional division of eae was maintained in the remaining recombination and selection analyses.

Intracellular region of eae. Eleven globally significant inner fragments were detected within the intracellular region of eae by Sawyer's test, all of which were located either between codons 107 and 345 or between codons 352 and 507. GARD identified two potential recombinational breakpoints, one at codon 173 and a second at codon 351. Sliding window analysis indicated that pS is greater than pN over the entire region (Figure 15A). A slight increase in pN was observed over the first 175 codons, which corresponds to the periplasmic domain of intimin. Codon selection analysis identified 119 negatively selected codons and 2 positively selected codons (Figure 15B).

Figure 15. Selection and recombination in the periplasmic (PPD) and transmembrane (TMD) domains of eae. A) Sliding window analysis depicting changes in pS (black line), pN (gray line), and pN - pS (heavy black line) along the length of eae PPD/TMD using a 30-codon window. B) Codons under significant positive (triangles, n=2) or negative (squares, n=119) selection are plotted using the dN - dS values as determined by the SLAC method. Sites found to be significant by SLAC and REL are shaded black, while sites significant under REL only are gray. Codons evolving neutrally are indicated by the white circles. The recombination breakpoints identified by GARD at codons 173 and 351 are indicated by the dashed lines.

Extracellular region of eae. Sawyer's test identified 237 significant global inner fragments ranging in length from 51 to 973 bp. These fragments tended to cluster into one of three overlapping regions within the extracellular domains (codons 551 to 697, codons 648 to 885, and codons 849 to 947). GARD identified potential recombinational breakpoints at codons 598, 656, 689, 743, and 829. Sliding window analysis indicated that, similar to the intracellular region of eae, pS is greater than pN over the entire extracellular region (Figure 16A). An increase in both pS and pN was observed at approximately codon 675, which corresponds to the immunoglobulin-like D1 domain of intimin. Codon selection analysis identified 213 negatively selected codons and 8 positively selected codons (Figure 16B). Six of the 8 positively selected codons are located within the lectin-like D3 domain of intimin, which interacts with the translocated intimin receptor (Tir). In addition to the 8 positively selected sites, the D3 domain features a pair of cysteine residues located at positions 865 and 948 in the alignment. These two cysteine residues are under significant negative selection (Figure 16B), consistent with a role in stabilizing the affinity of the intimin-Tir interaction, as proposed from the structural model of Luo et al. (89).

Allelic variation in espA.
Of the 324 strains investigated, espA allelic data was obtained for 91 isolates. Sequence analysis revealed the presence of 31 distinct alleles, which are defined by 241 polymorphic nucleotide sites within the 579-bp gene. Phylogenetic compatibility analysis with 214 parsimoniously informative sites resulted in an overall compatibility of 74.5%. Neither Sawyer's test nor GARD found evidence of significant intragenic recombination, suggesting that the phylogenetic incompatibilities could be due to parallel mutation or ancient recombination events that have since been obscured by subsequent mutation.

Figure 16. Selection and recombination in the extracellular domains (ECD) of eae. A) Sliding window analysis depicting changes in pS (black line), pN (gray line), and pN - pS (heavy black line) along the length of eae ECD using a 30-codon window. B) Codons under significant positive (triangles, n=8) or negative (squares, n=213) selection are plotted using the dN - dS values as determined by the SLAC method. Sites found to be significant by SLAC and REL are shaded black, while sites significant under REL only are gray. Codons evolving neutrally are indicated by the white circles. The recombination breakpoints identified by GARD at codons 598, 656, 689, 743, and 829 are indicated by the dashed lines. Cysteine residues are located at codons 865 and 948.

In comparison to the MLST loci from the same 91 strains, the synonymous rate for espA was 39.22%, which is greater than the range (5.20 to 26.64%) and mean (13.10%) for the 7 MLST genes. The nonsynonymous rate of 11.93% for espA was more than 30 times greater than the average dN across all 7 MLST genes (0.35%). Sliding window analysis revealed that pS > pN for most of the gene, but a small region before codon 20 was distinct in that pN > pS.
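The sliding window comparisons of pS and pN reported for each gene can be illustrated with a minimal sketch. It assumes that per-codon values of pS and pN have already been estimated by some other means; the function names `sliding_means` and `windows_with_excess_pn` are hypothetical and are not taken from the software used in this study.

```python
def sliding_means(values, window):
    """Return the mean of every run of `window` consecutive per-codon values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    total = sum(values[:window])
    means = [total / window]
    for i in range(window, len(values)):
        # Slide one codon to the right: add the new value, drop the oldest.
        total += values[i] - values[i - window]
        means.append(total / window)
    return means


def windows_with_excess_pn(p_s, p_n, window):
    """List (window start index, pN - pS) for windows in which pN > pS."""
    s_means = sliding_means(p_s, window)
    n_means = sliding_means(p_n, window)
    return [(i, n - s) for i, (s, n) in enumerate(zip(s_means, n_means)) if n > s]


if __name__ == "__main__":
    # Toy per-codon values, for illustration only (not data from this study).
    toy_ps = [0.0, 0.0, 0.5, 0.5]
    toy_pn = [0.25, 0.25, 0.0, 0.0]
    print(windows_with_excess_pn(toy_ps, toy_pn, window=2))
```

A window is flagged only when the windowed pN exceeds the windowed pS, which corresponds to the regions described in the text where pN - pS rises above zero.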
In addition, a broad region between codons 80 and 140 was characterized by a greater increase in pN when compared to the rest of the gene (Figure 17A). When the alleles were analyzed for the action of selection on individual codons, 81 codons were found to be under significant negative selection and 6 codons under positive selection. Of the positively selected sites, 5 are within the broad region with increased pN identified in the sliding window analysis (Figure 17B).

Figure 17. Selection in espA. A) Sliding window analysis depicting changes in pS (black line), pN (gray line), and pN - pS (heavy black line) along the length of espA using a 10-codon window. B) Codons under significant positive (triangles, n=6) or negative (squares, n=81) selection are plotted using the dN - dS values as determined by the SLAC method. Sites found to be significant by SLAC and REL are shaded black, while sites significant under REL only are gray. Codons evolving neutrally are indicated by the white circles. No recombination breakpoints were identified by GARD.

Allelic variation in fimA. Of the 324 strains investigated, fimA allelic data was obtained for 133 isolates. Sequence analysis revealed the presence of 32 distinct alleles, which are defined by 156 polymorphic nucleotide sites within the 555-bp gene. Phylogenetic compatibility analysis with 134 parsimoniously informative sites resulted in an overall compatibility of 69.9%. Nineteen globally significant inner fragments were detected by Sawyer's test, all of which were located either between codons 1 and 74 or between codons 111 and 172. GARD identified a potential recombinational breakpoint within the second region of the gene at codon 112. In comparison to the MLST loci from the same 133 strains, the synonymous rate for fimA was 19.99%, within the range (5.73 to 23.21%) but greater than the mean (10.51%) for the 7 MLST genes. The nonsynonymous rate of 4.84% for fimA was almost 20 times greater than the average dN across all 7 MLST genes (0.26%). Sliding window analysis revealed that, with the exception of a small region centered at codon 24 where pN is slightly greater than pS, the difference between pN and pS is less than zero over the entire length of the gene (Figure 18A). However, five regions of the gene were characterized by an increase in pN above zero. The codon selection analysis identified 60 negatively selected codons and 2 positively selected codons (Figure 18B).

Figure 18. Selection and recombination in fimA. A) Sliding window analysis depicting changes in pS (black line), pN (gray line), and pN - pS (heavy black line) along the length of fimA using a 10-codon window. B) Codons under significant positive (triangles, n=2) or negative (squares, n=60) selection are plotted using the dN - dS values as determined by the SLAC method. Sites found to be significant by SLAC and REL are shaded black, while sites significant under REL only are gray. Codons evolving neutrally are indicated by the white circles. The recombination breakpoint identified by GARD at codon 112 is indicated by the dashed line.

Selection and hydrophobicity. Amino acid hydrophobicity was assessed to determine if any significant differences exist in the amino acids found at codons evolving under different selective pressures (neutral, positive, or negative). For the surface protein-encoding genes, significantly more hydrophilic residues were found at sites under positive selection (χ²=3.898, df=1, p=0.048) than expected from the observed proportions over all sites. Similarly, significantly more hydrophobic residues were found at sites under negative selection (χ²=9.039, df=1, p=0.003) than expected. No significant difference was observed for the neutrally evolving sites (χ²=2.030, df=1, p=0.154). Analysis of the housekeeping loci used in MLST revealed no significant differences for either the negatively selected sites (χ²=0.015, df=1, p=0.901) or neutral sites (χ²=0.004, df=1, p=0.947).

DISCUSSION

In the work presented here, five genes encoding surface proteins involved in the attachment of E. coli during different types of infection or survival in the external environment were examined for evidence of positive selection and homologous recombination. The genes investigated were quite variable in the amount of recombination detected in the allelic sequences, ranging from no recombination (espA) to extensive recombination (eae) (Table 15). With the exception of csgA, structural protein models have been reported for the genes analyzed, allowing inferences to be drawn concerning the residues under positive selection.

bfpA. Blank et al. (11) reported a theoretical three-dimensional structure model for the bundlin encoded by the α1 bfpA allele. All 6 positively selected residues located in the 3' half of the gene map to the convex face of the pilin head, which is predicted to be located on the surface of the assembled pilus. The remaining positively selected residue is located along the edge of the pilin head and may also be surface-exposed. Interestingly, the two cysteine residues involved in disulfide bridge formation and stability of the mature bundlin protein appear to be evolving neutrally.
However, since the codons are completely conserved across all the alleles, the lack of any synonymous substitutions may reflect a limitation of the codon selection method used here to distinguish between negative and neutral selection in codons that lack sequence variation.

Table 15. Summary of recombination and selection analyses.

Locus         Overall         Global      GARD          Neutral   Negative   Positive
              compatibility   regions a   breakpoints   codons    codons     codons
bfpA          92.6%           0           1             86.4%     9.9%       3.7%
csgA          82.7%           1           1             70.9%     27.8%      1.3%
eae PPD/TMD   71.3%           2           2             77.9%     21.8%      0.4%
eae ECD       30.4%           3           5             41.2%     56.6%      2.1%
espA          74.5%           0           0             54.0%     42.9%      3.2%
fimA          69.9%           2           1             65.7%     33.1%      1.1%

a Number of regions within the gene in which the significant global inner fragments identified by Sawyer's test tended to cluster.

csgA. For csgA, the pattern of codon selection was quite different from the other genes analyzed. Little variation was observed among the allelic sequences, with the level of sequence polymorphism more similar to the MLST loci than the other surface protein-encoding genes. The two codons identified as being under positive selection are completely conserved among the E. coli alleles examined, with the sequence differences at these positions due to the E. albertii strains. The two residues may be indicative of weak positive selection operating between the two species. However, analysis of the allelic sequences used here with the addition of csgA sequences from Citrobacter sp. (GenBank accession numbers AJ515700 and AJ515701) and Salmonella enterica (GenBank accession numbers NC_003197 and NC_003198) classified these two sites as evolving neutrally. The conflicting findings indicate that the selection results must be carefully inspected before conclusions about their importance can be drawn.

eae. Luo et al. (89) described the crystal structure of the extracellular domains of α intimin and focused on the interaction between the D3 domain of intimin and the translocated intimin receptor (Tir).
Of particular interest is the positively charged intimin tip (residues 909-914 in the alignment used here) that directly interacts with Tir through hydrogen bonding. Five of the 6 residues within the tip have positive dN - dS values, with the codons at positions 910 and 913 under significant positive selection according to the more conservative SLAC method. This is somewhat surprising as one might expect this region to be under negative selection so as to preserve the interaction with Tir. Variation within the tip may explain reports of intimin binding to receptors other than Tir (39). In addition to the D3-Tir interactions, Luo et al. also described an 8-residue linker region between the transmembrane domain and the D0 domain. They hypothesized that, since it contains a glycine near each end (positions 552 and 559), the region may function as a flexible hinge because of the conformational variability of these residues. Interestingly, these two glycines are completely conserved across all 28 eae alleles investigated. The glycine at position 552 is under significant negative selection according to the REL method, while the codon at position 559 is classified as neutral since it lacks any sequence variation among the alleles. This supports the hypothesis that the region serves some functional importance, especially given the level of sequence polymorphism observed in this part of the gene.

espA. In 2003 Neves and colleagues (104) demonstrated that there was no immunological cross-reactivity between EspA filaments from EPEC 1 (serotype O127:H6) and EHEC 1 (serotype O157:H7) strains. More recently, Crepin et al. (24) expanded upon this finding by elucidating the molecular basis of the antigenic polymorphism in these two distinct alleles of EspA. They identified a short hypervariable region of the protein located between residues 123 and 129 that when deleted did not affect filament biogenesis and function.
In addition, peptide insertions into the hypervariable region were tolerated and displayed on the surface of the filament. By exchanging this surface-exposed hypervariable domain between an O127:H6 EPEC 1 strain and an O157:H7 EHEC 1 strain, they were able to swap the antigenic specificity of the EspA filaments. The hypervariable region described by Crepin et al. was easily identified in the espA alleles examined in this study. Of the 7 amino acids within this domain, 4 were found to be under significant positive selection, thereby adding further evidence to the immunological importance of this region. Another positively selected codon was identified at position 105 that, while not part of the dispensable hypervariable domain, may be surface-exposed given its close proximity to the region.

fimA. A 2001 study by Peek and colleagues (119) examined fimA sequences obtained from E. coli strains isolated from a broad range of host species for evidence of positive selection and recombination. They also performed a structural analysis of fimA based on homology to the pilin domain of fimH, the gene encoding the adhesive tip of the type 1 fimbriae. The authors identified 19 sites under positive selection within their fimA sequences by using the likelihood method of Nielsen and Yang (108). However, since this method does not allow for variation in synonymous substitution rates across sites, the validity of the Nielsen-Yang approach was recently called into question by Kosakovsky Pond and Frost (73). They discovered that if variation in both synonymous and nonsynonymous substitution rates is not taken into account, the results obtained by the Nielsen-Yang approach could be misleading. The REL method used here was developed by Kosakovsky Pond and Frost to address this issue in the Nielsen-Yang approach by allowing both dS and dN to vary across sites independently. Reanalysis of the Peek et al.
data using the methods employed here (recombination detection by GARD, followed by codon selection analysis with both SLAC and REL) identified 9 codons as having a positive value for dN - dS. All of these sites were described by Peek et al. as being under positive selection; however, none of the values were significant in the reanalysis. This suggests that site-to-site variation in the synonymous substitution rate could be responsible for the positive selection results reported by Peek et al.

ACKNOWLEDGEMENTS

I would like to thank Lindsey Ouellette for performing MLST on many of the non-EPEC strains. I would also like to thank Susan Francisco, Suzanne Henson, and Sara Kienzle for their assistance in the fimA, espA, and csgA sequencing projects, respectively. This project has been funded in part with Federal funds from the NIAID, NIH, DHHS, under NIH Research Contract # N01-AI-30058.

CHAPTER 6

SUMMARY AND SYNTHESIS

The overall purpose of the research presented in this dissertation is to examine the allelic diversity of genes that encode a variety of surface structures in different classes of pathogenic E. coli. Much of the work presented thus far has been primarily focused on the evolution of particular virulence factors. By combining this allelic data with a phylogenetic framework obtained from MLST analysis, insights into strain evolution may be achieved.

Compatibility analysis combined with phylogenetic networks. Phylogenetic incompatibilities can arise from either recombination or parallel mutation, both of which can obscure the evolutionary history of genetic information passed by vertical transmission within bacterial populations. To address this issue, phylogenetic compatibility analysis was performed as described in Chapter 5, with the modification that the least compatible sites were sequentially removed until only the set of sites with complete compatibility was achieved (Table 16).
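The sequential removal of the least compatible sites can be sketched as follows. This is a minimal illustration under stated assumptions, not the program used in this study: each site is assumed to be binary, pairwise compatibility is assessed with the four-gamete test, and ties between equally incompatible sites are broken by position. The names `compatible` and `prune_to_compatible` are hypothetical.

```python
from itertools import combinations


def compatible(col_a, col_b):
    """Four-gamete test (assumed here): two binary sites are compatible unless
    all four combinations of their states occur among the alleles."""
    return len(set(zip(col_a, col_b))) < 4


def prune_to_compatible(columns):
    """Repeatedly drop the site with the most incompatibilities until every
    remaining pair of sites is compatible. Returns the indices that are kept."""
    kept = list(range(len(columns)))
    while kept:
        conflicts = {i: 0 for i in kept}
        for i, j in combinations(kept, 2):
            if not compatible(columns[i], columns[j]):
                conflicts[i] += 1
                conflicts[j] += 1
        # Ties are broken by position (lowest index first); this is an assumption.
        worst = max(kept, key=lambda i: conflicts[i])
        if conflicts[worst] == 0:
            break
        kept.remove(worst)
    return kept


if __name__ == "__main__":
    # Each string is one alignment column read across four toy alleles.
    toy_columns = ["AAGG", "AAGG", "AGAG"]
    print(prune_to_compatible(toy_columns))
```

In the toy input the third column shows all four state combinations with each of the other two, so it is the first and only site removed.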
This set of 100% compatible sites should more accurately represent the phylogenetic relationships of the loci under investigation. For the two chromosomally encoded loci (csgA and fimA), this modified compatibility analysis was also performed on the sequence types resolved by MLST. Phylogenetic network analysis (see Chapter 3) was then performed using the compatible csgA or fimA sites with the MLST sites from the associated STs to identify potential lateral transfer events involving either of these two genes into a new chromosomal background. This type of analysis is the first of its kind to be performed with either of these two genes.

Table 16. Summary of combined compatibility analyses.

Locus a      # of    # of         # of         # of         Fraction       Fraction for
             sites   BI sites b   BC sites c   BS sites d   compatible e   network analysis f
csgA         456     33           25           28           75.8%          11.6%
csgA STs     3156    316          225          35           71.2%          8.2%
eae PPD-D0   1962    234          81           164          34.6%          12.5%
eae D1-D3    816     184          43           54           23.4%          11.9%
espA         570     137          77           24           56.2%          17.7%
fimA         549     97           51           16           52.6%          12.2%
fimA STs     3156    305          218          60           71.5%          8.8%

a ST, sequence type; PPD-D0, the portion of eae that encodes the periplasmic, transmembrane, and D0 domains; D1-D3, the portion of eae that encodes the D1, D2, and D3 domains.
b BI, binary informative: sites possessing two nucleotides, each of which is found in at least two alleles.
c BC, binary compatible: the set of BI sites that are 100% compatible with each other.
d BS, binary singleton: sites possessing two nucleotides, one of which is found in only a single allele.
e The percentage of BI sites that are 100% compatible.
f The percentage of the total sites that are 100% compatible or binary singletons.

The E. coli csgA sequences cluster into one of three groups with E. albertii as an outgroup (Figure 19). Of particular interest is the parallel path involving E. albertii. MLST analysis places the EHEC 1 clonal group as more basal, while the csgA data places E. albertii closer to the cluster containing the EPEC 1, 3, and 4 clonal groups. In addition, the csgA analysis with sequences from Citrobacter and Salmonella places the E. albertii allele within the diversity of the E. coli sequences. Taken together, these findings suggest a lateral transfer event involving csgA between E. coli and E. albertii since the sequences are more similar than would be expected based on the MLST data.

Figure 19. Phylogenetic network for csgA and its associated sequence types (STs). Three main clusters of E. coli are indicated by the gray ovals. Clonal groups and other relevant pathotypes found within each cluster are given. Multiple paths are indicative of phylogenetic incompatibilities between the STs and csgA. AEEC, attaching and effacing E. coli; EHEC, enterohemorrhagic E. coli; EPEC, enteropathogenic E. coli; STEC, Shiga toxin-producing E. coli; UPEC, uropathogenic E. coli.

The results of the fimA comparisons are even more intriguing since numerous transfer events are apparent (Figure 20). With the exception of the EHEC 2 group, each of the two major lineages of EHEC and EPEC has experienced a lateral transfer event involving fimA. Since only two sites within fimA were found to be under significant positive selection, recombination and not point mutation appears to be the driving force in generating allelic diversity within clonal groups.

A slightly different approach was used for eae and espA since both are part of the LEE pathogenicity island, a known mobile genetic element. Instead of comparing each of these genes to their associated sequence types, they were compared to each other to identify potential recombination events resulting in the creation of new combinations of eae and espA alleles.
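The two percentage columns of Table 16 follow directly from the site counts, as defined in the table footnotes: the fraction compatible is BC/BI, and the fraction used for network analysis is (BC + BS)/total sites. The short sketch below recomputes both columns from the counts given in the table; the dictionary and function names are hypothetical and are used only for this illustration.

```python
# Site counts copied from Table 16: (total sites, BI sites, BC sites, BS sites).
TABLE_16 = {
    "csgA": (456, 33, 25, 28),
    "csgA STs": (3156, 316, 225, 35),
    "eae PPD-D0": (1962, 234, 81, 164),
    "eae D1-D3": (816, 184, 43, 54),
    "espA": (570, 137, 77, 24),
    "fimA": (549, 97, 51, 16),
    "fimA STs": (3156, 305, 218, 60),
}


def table_16_fractions(total, bi, bc, bs):
    """Return (fraction compatible, fraction for network analysis) in percent."""
    fraction_compatible = 100.0 * bc / bi          # footnote e: BC / BI
    fraction_network = 100.0 * (bc + bs) / total   # footnote f: (BC + BS) / total
    return round(fraction_compatible, 1), round(fraction_network, 1)


if __name__ == "__main__":
    for locus, counts in TABLE_16.items():
        print(locus, table_16_fractions(*counts))
```

For csgA, for example, 25 of 33 BI sites are mutually compatible (75.8%), and 25 + 28 = 53 of 456 sites (11.6%) are available for the network analysis.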
The mosaic nature of eae has been previously described (95, 144), but over 20 additional major allelic variants of eae have been reported since these studies, so an updated analysis was warranted. In the previous analyses of eae presented here, the gene was partitioned on the basis of domain location (intracellular vs. extracellular). This division was also seen in the compatibility analysis, but the boundary was somewhat different. Closer inspection of the compatibility analysis indicates that the D0 domain is slightly more compatible with the periplasmic and transmembrane domains (41.4%) than it is with the other extracellular domains (37.1%).

Figure 20. Phylogenetic network for fimA and its associated sequence types (STs). The locations of the two major clonal lineages of the enterohemorrhagic and enteropathogenic E. coli are indicated by the gray ovals. Multiple paths are indicative of phylogenetic incompatibilities between the STs and fimA. EHEC, enterohemorrhagic E. coli; EPEC, enteropathogenic E. coli.

The resulting set of 100% compatible sites from each of these two regions revealed two very different evolutionary histories of eae (Figure 21). With the exception of G-eae, the 5' region sequences from the alleles fell into one of three groups with the Citrobacter rodentium sequence as an outgroup. Analysis of the 100% compatible sites from the 3' region revealed a different set of relationships, with most of the sequences belonging to one of four major allelic families. These allelic families have been designated α, β, γ, and ε after the first reported member of each group. The extent of recombination between the two regions of eae is quite apparent in the network analysis (Figure 22). Since the 5' region is primarily intracellular, it is likely to be more representative of the evolutionary history of eae in the absence of selective pressure from the immune system.
Each of the three groups contains members from different allele families; for example, Group 2 is composed of members of the β, γ, and ε allele families. This indicates that recombination has played a large role in generating antigenic variation.

Comparison of the set of 100% compatible espA sites to both eae regions further supports the hypothesis of extracellular domain exchange within eae. On the basis of location within the LEE island alone, one might expect espA to be more compatible with the 3' eae region given the shorter distance between them. However, the data indicate the contrary: espA is much more compatible with the 5' eae region (88.6%) than the 3' region (58.3%) (Figure 23). The three major groups of alleles identified in the 5' eae region are intact in the espA network analysis. In contrast, the relationships within each of the four allele families from the 3' eae region are broken, with members of each family dispersed over the network. By replacing the extracellular domains from one allele family with those from another, a strain could gain a selective advantage in expressing new surface epitopes in an immunologically naïve host.

Figure 21. Conflicting evolutionary histories for two regions of eae. These minimum evolution trees were constructed using the number of nucleotide differences within the set of 100% compatible and binary singleton sites for the periplasmic, transmembrane, and D0 domains (A) and for the D1, D2, and D3 domains (B) of eae. Group or allele family designations are indicated by the square brackets. Bootstrap values for relevant clusters based on 500 replications are given at the internal nodes. C. rod., Citrobacter rodentium.

Figure 22. Phylogenetic network for eae. Extensive recombination between the 5' (periplasmic, transmembrane, and D0 domains) and 3' (D1, D2, and D3 domains) regions of eae is indicated by the multiple paths within the network.

Figure 23. Phylogenetic networks for espA combined with eae. Phylogenetic incompatibilities identified between espA and the 5' eae region (A) and 3' eae region (B) are indicated by the multiple paths within the networks. Combinations of eae and espA are indicated by the black circles and labeled with their corresponding eae allele. Not all eae alleles are represented since they were not observed among the strains investigated.

Inferred evolution of EAF plasmids and EPEC clones. The results of the EPEC study demonstrate the highly promiscuous nature of the EAF plasmid, so a different approach was used to investigate strain evolution. Based on the MLST and EAF plasmid type data, ancestral or primitive clonal types within the 4 main EPEC groups can be inferred under the parsimony principle, that is, positing a simple evolution model based on minimizing the number of evolutionary genetic events. The principal events in the evolutionary change of an ancestral EPEC clone are EAF plasmid recombination, plasmid replacement, and plasmid loss. Recombination of the EAF plasmid is defined as a change in bfpA allele since each bfpA allele is associated with a single perA allele class. Plasmid replacement, presumably resulting from the horizontal transfer of an EAF plasmid, is believed to have occurred when both bfpA and perA differ from the primitive condition. Plasmid loss is inferred when an isolate is PCR negative for both bfpA and perA.
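The three definitions above amount to a small set of decision rules, which can be sketched as follows. This is only an illustration of the rules as stated in the text: the function name, the use of `None` for a PCR-negative locus, and the "unchanged" and "unclassified" outcomes are assumptions introduced for the example.

```python
def classify_plasmid_change(ancestral, observed):
    """Classify an isolate relative to the inferred ancestral EAF type.

    Both arguments are (bfpA allele, perA class) pairs; None marks a locus
    that is PCR negative (a convention assumed for this sketch).
    """
    bfp_a, per_a = observed
    if bfp_a is None and per_a is None:
        return "loss"            # PCR negative for both bfpA and perA
    if observed == ancestral:
        return "unchanged"       # same EAF type as the inferred ancestor
    if bfp_a != ancestral[0] and per_a != ancestral[1]:
        return "replacement"     # both bfpA and perA differ from the ancestor
    if bfp_a != ancestral[0]:
        return "recombination"   # change in bfpA allele only
    return "unclassified"        # perA differs alone; not covered by the text


if __name__ == "__main__":
    ancestor = ("alpha1", "alpha")  # hypothetical labels for an ancestral type
    print(classify_plasmid_change(ancestor, ("beta5", "alpha")))
    print(classify_plasmid_change(ancestor, ("beta3", "gamma")))
    print(classify_plasmid_change(ancestor, (None, None)))
```

Loss is tested first because an isolate lacking both loci cannot be compared allele by allele with the ancestral type.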
Under the parsimony principle, the types of genetic events underlying the evolution of each of the EPEC groups have been deduced (Table 17). The EPEC 3 group is the most homogeneous, with only plasmid loss being inferred. Two of the three possible plasmid changes were found in the EPEC 1 and EPEC 4 clonal groups. For EPEC 1, the inferred ancestral type possessed EAF type 1 (α1-bfpA, α-perA). Plasmid loss was not observed, but plasmid recombination (EAF type 8: β5-bfpA, α-perA) and replacement (EAF type 6: β3-bfpA, γ-perA) have occurred. For EPEC 4, plasmid recombination was not observed, but plasmid loss and replacement were detected. EAF type 3 (α3-bfpA, β-perA) was replaced with EAF type 5 (β2-bfpA, δ-perA) and with EAF type 8 (β5-bfpA, α-perA). EPEC 2 is the most variable clonal group. With an inferred ancestral state of EAF type 4 (β1-bfpA, α-perA), all three types of plasmid changes were observed. Two different recombination events involving EAF type 2 (α2-bfpA, α-perA) and EAF type 7 (β4-bfpA, α-perA), plasmid replacement with EAF type 11 (bfpA', γ-perA), and plasmid loss have shaped the diversity of this clonal group.

Table 17. EAF plasmid changes within four common EPEC clonal groups.

Clonal   Recombination             Plasmid replacement        Plasmid loss
group    (change in bfpA)          (change in bfpA & perA)
EPEC 1   EAF 1 to EAF 8            EAF 1 to EAF 6             not observed
EPEC 2   EAF 4 to EAF 2 or EAF 7   EAF 4 to EAF 11            observed
EPEC 3   not observed              not observed               observed
EPEC 4   not observed              EAF 3 to EAF 5 or EAF 8    observed

Future considerations. The evolution of EPEC appears to be a dynamic process involving repeated acquisition of the LEE island and transfer of the EAF plasmid. The work presented here is the first to classify EAF plasmids into types based on bfpA and perA allelic data, and 11 distinct plasmid types were identified among the EPEC strains investigated. Nevertheless, it remains unclear what level of conservation exists among plasmids of the same EAF type.
Given the number of IS elements present within the two fully sequenced EAF plasmids (15, 145), there could be considerable variation within each plasmid type, and further characterization of the EAF types is warranted. Three insertion sites of the LEE island have been identified within the AEEC, all of which are either within or near tRNA genes (selC, pheU, and pheV) (129, 137, 157). These studies have primarily focused on strains with the α, β, γ, ε, ζ, and θ eae alleles, but a survey of the remaining alleles has not been performed. If additional insertion sites are uncovered, further insight into the number of transfer events of the LEE island into E. coli may be gained. The further characterization of pathogenic strains will improve our understanding of the processes that underlie microbial evolution. The identification of unique genetic determinants in these strains may then be used to facilitate the detection of specific epidemic clones during outbreaks of disease.

REFERENCES

1. Abraham, J. M., C. S. Freitag, J. R. Clements, and B. I. Eisenstein. 1985. An invertible element of DNA controls phase variation of type 1 fimbriae of Escherichia coli. Proc Natl Acad Sci U S A 82:5724-7.
2. Adu-Bobie, J., G. Frankel, C. Bain, A. G. Goncalves, L. R. Trabulsi, G. Douce, S. Knutton, and G. Dougan. 1998. Detection of intimins α, β, γ, and δ, four intimin derivatives expressed by attaching and effacing microbial pathogens. J Clin Microbiol 36:662-8.
3. Albert, M. J., K. Alam, M. Ansaruzzaman, J. Montanaro, M. Islam, S. M. Faruque, K. Haider, K. Bettelheim, and S. Tzipori. 1991. Localized adherence and attaching-effacing properties of nonenteropathogenic serotypes of Escherichia coli. Infect Immun 59:1864-8.
4. Alikhani, M. Y., A. Mirsalehian, and M. M. Aslani. 2006. Detection of typical and atypical enteropathogenic Escherichia coli (EPEC) in Iranian children with and without diarrhoea. J Med Microbiol 55:1159-63.
5. Beachey, E. H. 1980. Bacterial adherence.
Chapman and Hall, London; New York.
6. Bieber, D., S. W. Ramer, C. Y. Wu, W. J. Murray, T. Tobe, R. Fernandez, and G. K. Schoolnik. 1998. Type IV pili, transient bacterial aggregates, and virulence of enteropathogenic Escherichia coli. Science 280:2114-8.
7. Blanco, M., J. E. Blanco, G. Dahbi, M. P. Alonso, A. Mora, M. A. Coira, C. Madrid, A. Juarez, M. I. Bernardez, E. A. Gonzalez, and J. Blanco. 2006. Identification of two new intimin types in atypical enteropathogenic Escherichia coli. Int Microbiol 9:103-10.
8. Blanco, M., J. E. Blanco, G. Dahbi, A. Mora, M. P. Alonso, G. Varela, M. P. Gadea, F. Schelotto, E. A. Gonzalez, and J. Blanco. 2006. Typing of intimin (eae) genes from enteropathogenic Escherichia coli (EPEC) isolated from children with diarrhoea in Montevideo, Uruguay: identification of two novel intimin variants (μB and ξR/β2B). J Med Microbiol 55:1165-74.
9. Blanco, M., J. E. Blanco, A. Mora, J. Rey, J. M. Alonso, M. Hermoso, J. Hermoso, M. P. Alonso, G. Dahbi, E. A. Gonzalez, M. I. Bernardez, and J. Blanco. 2003. Serotypes, virulence genes, and intimin types of Shiga toxin (verotoxin)-producing Escherichia coli isolates from healthy sheep in Spain. J Clin Microbiol 41:1351-6.
10. Blank, T. E., D. W. Lacher, I. C. Scaletsky, H. Zhong, T. S. Whittam, and M. S. Donnenberg. 2003. Enteropathogenic Escherichia coli O157 strains from Brazil. Emerg Infect Dis 9:113-5.
11. Blank, T. E., H. Zhong, A. L. Bell, T. S. Whittam, and M. S. Donnenberg. 2000. Molecular variation among type IV pilin (bfpA) genes from diverse enteropathogenic Escherichia coli strains. Infect Immun 68:7028-7038.
12. Blomfield, I. C., P. J. Calie, K. J. Eberhardt, M. S. McClain, and B. I. Eisenstein. 1993. Lrp stimulates phase variation of type 1 fimbriation in Escherichia coli K-12. J Bacteriol 175:27-36.
13. Blomfield, I. C., D. H. Kulasekara, and B. I. Eisenstein. 1997.
Integration host factor stimulates both FimB- and FimE-mediated site-specific DNA inversion that controls phase variation of type 1 fimbriae expression in Escherichia coli. Mol Microbiol 23:705-17.
14. Bortolini, M. R., L. R. Trabulsi, R. Keller, G. Frankel, and V. Sperandio. 1999. Lack of expression of bundle-forming pili in some clinical isolates of enteropathogenic Escherichia coli (EPEC) is due to a conserved large deletion in the bfp operon. FEMS Microbiol Lett 179:169-74.
15. Brinkley, C., V. Burland, R. Keller, D. J. Rose, A. T. Boutin, S. A. Klink, F. R. Blattner, and J. B. Kaper. 2006. Nucleotide sequence analysis of the enteropathogenic Escherichia coli adherence factor plasmid pMAR7. Infect Immun 74:5408-13.
16. Broes, A., R. Drolet, M. Jacques, J. M. Fairbrother, and W. M. Johnson. 1988. Natural infection with an attaching and effacing Escherichia coli in a diarrheic puppy. Can J Vet Res 52:280-2.
17. Bruen, T. C., H. Philippe, and D. Bryant. 2006. A simple and robust statistical test for detecting the presence of recombination. Genetics 172:2665-81.
18. Bryant, D., and V. Moulton. 2004. Neighbor-net: an agglomerative method for the construction of phylogenetic networks. Mol Biol Evol 21:255-65.
19. Bush, R. M. 2001. Predicting adaptive evolution. Nat Rev Genet 2:387-92.
20. Castillo, A., L. E. Eguiarte, and V. Souza. 2005. A genomic population genetics analysis of the pathogenic enterocyte effacement island in Escherichia coli: the search for the unit of selection. Proc Natl Acad Sci U S A 102:1542-7.
21. Cleary, J., L. C. Lai, R. K. Shaw, A. Straatman-Iwanowska, M. S. Donnenberg, G. Frankel, and S. Knutton. 2004. Enteropathogenic Escherichia coli (EPEC) adhesion to intestinal epithelial cells: role of bundle-forming pili (BFP), EspA filaments and intimin. Microbiology 150:527-38.
22. Comeron, J. M. 1995. A method for estimating the numbers of synonymous and nonsynonymous substitutions per site. J Mol Evol 41:1152-9.
23. Connell, I., W.
Agace, P. Klemm, M. Schembri, S. Marild, and C. Svanborg. 1996. Type 1 fimbrial expression enhances Escherichia coli virulence for the urinary tract. Proc Natl Acad Sci U S A 93:9827-32.
24. Crepin, V. F., R. Shaw, S. Knutton, and G. Frankel. 2005. Molecular basis of antigenic polymorphism of EspA filaments: development of a peptide display technology. J Mol Biol 350:42-52.
25. Daniell, S. J., E. Kocsis, E. Morris, S. Knutton, F. P. Booy, and G. Frankel. 2003. 3D structure of EspA filaments from enteropathogenic Escherichia coli. Mol Microbiol 49:301-8.
26. Deng, W., Y. Li, P. R. Hardwidge, E. A. Frey, R. A. Pfuetzner, S. Lee, S. Gruenheid, N. C. Strynadka, J. L. Puente, and B. B. Finlay. 2005. Regulation of type III secretion hierarchy of translocators and effectors in attaching and effacing bacterial pathogens. Infect Immun 73:2135-46.
27. Denno, D. M., J. R. Stapp, D. R. Boster, X. Qin, C. R. Clausen, K. H. Del Beccaro, D. L. Swerdlow, C. R. Braden, and P. I. Tarr. 2005. Etiology of diarrhea in pediatric outpatient settings. Pediatr Infect Dis J 24:142-8.
28. Donnenberg, M. S. 2002. Introduction. In M. S. Donnenberg (ed.), Escherichia coli: Virulence mechanisms of a versatile pathogen. Academic Press, New York, NY.
29. Donnenberg, M. S., J. A. Giron, J. P. Nataro, and J. B. Kaper. 1992. A plasmid-encoded type IV fimbrial gene of enteropathogenic Escherichia coli associated with localized adherence. Mol Microbiol 6:3427-37.
30. Donnenberg, M. S., C. O. Tacket, S. P. James, G. Losonsky, J. P. Nataro, S. S. Wasserman, J. B. Kaper, and M. M. Levine. 1993. Role of the eaeA gene in experimental enteropathogenic Escherichia coli infection. J Clin Invest 92:1412-7.
31. Donnenberg, M. S., H. Z. Zhang, and K. D. Stone. 1997. Biogenesis of the bundle-forming pilus of enteropathogenic Escherichia coli: reconstitution of fimbriae in recombinant E. coli and role of DsbA in pilin stability--a review. Gene 192:33-8.
32. Dorman, C. J., and C. F. Higgins. 1987.
Fimbrial phase variation in Escherichia coli: dependence on integration host factor and homologies with other site-specific recombinases. J Bacteriol 169:3840-3.
33. Drouin, G., F. Prat, M. Ell, and G. D. Clarke. 1999. Detecting and characterizing gene conversions between multigene family members. Mol Biol Evol 16:1369-90.
34. Eisenstein, B. I., D. S. Sweet, V. Vaughn, and D. I. Friedman. 1987. Integration host factor is required for the DNA inversion that controls phase variation in Escherichia coli. Proc Natl Acad Sci U S A 84:6506-10.
35. Elliott, S. J., V. Sperandio, J. A. Giron, S. Shin, J. L. Mellies, L. Wainwright, S. W. Hutcheson, T. K. McDaniel, and J. B. Kaper. 2000. The locus of enterocyte effacement (LEE)-encoded regulator controls expression of both LEE- and non-LEE-encoded virulence factors in enteropathogenic and enterohemorrhagic Escherichia coli. Infect Immun 68:6115-6126.
36. Elliott, S. J., L. A. Wainwright, T. K. McDaniel, K. G. Jarvis, Y. K. Deng, L. C. Lai, B. P. McNamara, M. S. Donnenberg, and J. B. Kaper. 1998. The complete sequence of the locus of enterocyte effacement (LEE) from enteropathogenic Escherichia coli E2348/69. Mol Microbiol 28:1-4.
37. Feng, P., K. A. Lampel, H. Karch, and T. S. Whittam. 1998. Genotypic and phenotypic changes in the emergence of Escherichia coli O157:H7. J Infect Dis 177:1750-3.
38. Finlay, B. B., I. Rosenshine, M. S. Donnenberg, and J. B. Kaper. 1992. Cytoskeletal composition of attaching and effacing lesions associated with enteropathogenic Escherichia coli adherence to HeLa cells. Infect Immun 60:2541-3.
39. Frankel, G., A. D. Phillips, L. R. Trabulsi, S. Knutton, G. Dougan, and S. Matthews. 2001. Intimin and the host cell--is it bound to end in Tir(s)? Trends Microbiol 9:214-8.
40. Franzolin, M. R., R. C. Alves, R. Keller, T. A. Gomes, L. Beutin, M. L. Barreto, C. Milroy, A. Strina, H. Ribeiro, and L. R. Trabulsi. 2005. Prevalence of diarrheagenic Escherichia coli in children with diarrhea in Salvador, Bahia, Brazil.
Mem Inst Oswaldo Cruz 100:359-63.
41. Gally, D. L., J. Leathart, and I. C. Blomfield. 1996. Interaction of FimB and FimE with the fim switch that controls the phase variation of type 1 fimbriae in Escherichia coli K-12. Mol Microbiol 21:725-38.
42. Gally, D. L., T. J. Rucker, and I. C. Blomfield. 1994. The leucine-responsive regulatory protein binds to the fim switch to control phase variation of type 1 fimbrial expression in Escherichia coli K-12. J Bacteriol 176:5665-72.
43. Ghilardi, A. C., T. A. Gomes, W. P. Elias, and L. R. Trabulsi. 2003. Virulence factors of Escherichia coli strains belonging to serogroups O127 and O142. Epidemiol Infect 131:815-21.
44. Goldman, N., and Z. Yang. 1994. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol Biol Evol 11:725-36.
45. Gomez-Duarte, O. G., and J. B. Kaper. 1995. A plasmid-encoded regulatory region activates chromosomal eaeA expression in enteropathogenic Escherichia coli. Infect Immun 63:1767-76.
46. Guth, B. E., R. Giraldi, T. A. Gomes, and L. R. Marques. 1994. Survey of cytotoxin production among Escherichia coli strains characterized as enteropathogenic (EPEC) by serotyping and presence of EPEC adherence factor (EAF) sequences. Can J Microbiol 40:341-4.
47. Hammar, M., A. Arnqvist, Z. Bian, A. Olsen, and S. Normark. 1995. Expression of two csg operons is required for production of fibronectin- and congo red-binding curli polymers in Escherichia coli K-12. Mol Microbiol 18:661-70.
48. Hammar, M., Z. Bian, and S. Normark. 1996. Nucleator-dependent intercellular assembly of adhesive curli organelles in Escherichia coli. Proc Natl Acad Sci U S A 93:6562-6.
49. Hartland, E. L., S. J. Daniell, R. M. Delahay, B. C. Neves, T. Wallis, R. K. Shaw, C. Hale, S. Knutton, and G. Frankel. 2000. The type III protein translocation system of enteropathogenic Escherichia coli involves EspA-EspB protein interactions. Mol Microbiol 35:1483-92.
50. Hein, J. 1990.
Reconstructing evolution of sequences subject to recombination using parsimony. Math Biosci 98:185-200.
51. Huelsenbeck, J. P., and K. A. Dyer. 2004. Bayesian estimation of positively selected sites. J Mol Evol 58:661-72.
52. Hull, R. A., R. E. Gill, P. Hsu, B. H. Minshew, and S. Falkow. 1981. Construction and expression of recombinant plasmids encoding type 1 or D-mannose-resistant pili from a urinary tract infection Escherichia coli isolate. Infect Immun 33:933-8.
53. Hultgren, S. J., T. N. Porter, A. J. Schaeffer, and J. L. Duncan. 1985. Role of type 1 pili and effects of phase variation on lower urinary tract infections produced by Escherichia coli. Infect Immun 50:370-7.
54. Huson, D. H., and D. Bryant. 2006. Application of phylogenetic networks in evolutionary studies. Mol Biol Evol 23:254-67.
55. Huys, G., M. Cnockaert, J. M. Janda, and J. Swings. 2003. Escherichia albertii sp. nov., a diarrhoeagenic species isolated from stool specimens of Bangladeshi children. Int J Syst Evol Microbiol 53:807-10.
56. Hyma, K. E., D. W. Lacher, A. M. Nelson, A. C. Bumbaugh, J. M. Janda, N. A. Strockbine, V. B. Young, and T. S. Whittam. 2005. Evolutionary genetics of a new pathogenic Escherichia species: Escherichia albertii and related Shigella boydii strains. J Bacteriol 187:619-28.
57. Ina, Y. 1995. New methods for estimating the numbers of synonymous and nonsynonymous substitutions. J Mol Evol 40:190-226.
58. Jakobsen, I. B., and S. Easteal. 1996. A program for calculating and displaying compatibility matrices as an aid in determining reticulate evolution in molecular sequences. Comput Appl Biosci 12:291-5.
59. Jakobsen, I. B., S. R. Wilson, and S. Easteal. 1997. The partition matrix: exploring variable phylogenetic signals along nucleotide sequence alignments. Mol Biol Evol 14:474-84.
60. Jenkins, C., A. J. Lawson, T. Cheasty, G. A. Willshaw, P. Wright, G. Dougan, G. Frankel, and H. R. Smith. 2003.
Subtyping intimin genes from enteropathogenic Escherichia coli associated with outbreaks and sporadic cases in the United Kingdom and Eire. Mol Cell Probes 17:149-56.
61. Jores, J., K. Zehmke, J. Eichberg, L. Rumer, and L. H. Wieler. 2003. Description of a novel intimin variant (type zeta) in the bovine O84:NM verotoxin-producing Escherichia coli strain 537/89 and the diagnostic value of intimin typing. Exp Biol Med (Maywood) 228:370-6.
62. Kaper, J. B. 1998. EPEC delivers the goods. Trends Microbiol 6:169-72; discussion 172-3.
63. Kaper, J. B. 1994. Molecular pathogenesis of enteropathogenic Escherichia coli, p. 173-195. In V. L. Miller, J. B. Kaper, D. A. Portnoy, and R. R. Isberg (ed.), Molecular Genetics of Bacterial Pathogenesis. American Society for Microbiology, Washington, DC.
64. Kaper, J. B., and A. D. O'Brien. 1998. Escherichia coli O157:H7 and other shiga toxin-producing E. coli strains. ASM Press, Washington, DC.
65. Keith, B. R., L. Maurer, P. A. Spears, and P. E. Orndorff. 1986. Receptor-binding function of type 1 pili effects bladder colonization by a clinical isolate of Escherichia coli. Infect Immun 53:693-6.
66. Kenny, B. 2002. Mechanism of action of EPEC type III effector molecules. Int J Med Microbiol 291:469-77.
67. Kenny, B., R. DeVinney, M. Stein, D. J. Reinscheid, E. A. Frey, and B. B. Finlay. 1997. Enteropathogenic E. coli (EPEC) transfers its receptor for intimate adherence into mammalian cells. Cell 91:511-20.
68. Kimura, M. 1979. The neutral theory of molecular evolution. Sci Am 241:98-100, 102, 108 passim.
69. Klemm, P. 1984. The fimA gene encoding the type-1 fimbrial subunit of Escherichia coli. Nucleotide sequence and primary structure of the protein. Eur J Biochem 143:395-9.
70. Klemm, P., and G. Christiansen. 1987. Three fim genes required for the regulation of length and mediation of adhesion of Escherichia coli type 1 fimbriae. Mol Gen Genet 208:439-45.
71. Knutton, S., I. Rosenshine, M. J. Pallen, I. Nisan, B. C.
Neves, C. Bain, C. Wolff, G. Dougan, and G. Frankel. 1998. A novel EspA-associated surface organelle of enteropathogenic Escherichia coli involved in protein translocation into epithelial cells. Embo J 17:2166-76.
72. Kosakovsky Pond, S. L., and S. D. Frost. 2005. Datamonkey: rapid detection of selective pressure on individual sites of codon alignments. Bioinformatics 21:2531-3.
73. Kosakovsky Pond, S. L., and S. D. Frost. 2005. Not so different after all: a comparison of methods for detecting amino acid sites under selection. Mol Biol Evol 22:1208-22.
74. Kosakovsky Pond, S. L., D. Posada, M. B. Gravenor, C. H. Woelk, and S. D. Frost. 2006. Automated phylogenetic detection of recombination using a genetic algorithm. Mol Biol Evol 23:1891-901.
75. Koster, F., J. Levin, L. Walker, K. S. Tung, R. H. Gilman, M. M. Rahaman, M. A. Majid, S. Islam, and R. C. Williams, Jr. 1978. Hemolytic-uremic syndrome after shigellosis. Relation to endotoxemia and circulating immune complexes. N Engl J Med 298:927-33.
76. Krogfelt, K. A. 1991. Bacterial adhesion: genetics, biogenesis, and role in pathogenesis of fimbrial adhesins of Escherichia coli. Rev Infect Dis 13:721-35.
77. Krogfelt, K. A., B. A. McCormick, R. L. Burghoff, D. C. Laux, and P. S. Cohen. 1991. Expression of Escherichia coli F-18 type 1 fimbriae in the streptomycin-treated mouse large intestine. Infect Immun 59:1567-8.
78. Kumar, S., K. Tamura, and M. Nei. 2004. MEGA3: Integrated software for Molecular Evolutionary Genetics Analysis and sequence alignment. Brief Bioinform 5:150-63.
79. Lacher, D. W., H. Steinsland, and T. S. Whittam. 2006. Allelic subtyping of the intimin locus (eae) of pathogenic Escherichia coli by fluorescent RFLP. FEMS Microbiol Lett 261:80-7.
80. Lazzaro, B. P., B. K. Sceurman, S. L. Carney, and A. G. Clark. 2002. fRFLP and fAFLP: medium-throughput genotyping by fluorescently post-labeling restriction digestion. Biotechniques 33:539-40, 542, 545-6.
81. Leathart, J. B., and D. L.
Gally. 1998. Regulation of type 1 fimbrial expression in uropathogenic Escherichia coli: heterogeneity of expression through sequence changes in the fim switch region. Mol Microbiol 28:371-81.
82. Levine, M. M. 1987. Escherichia coli that cause diarrhea: enterotoxigenic, enteropathogenic, enteroinvasive, enterohemorrhagic, and enteroadherent. J Infect Dis 155:377-89.
83. Levine, M. M., and R. Edelman. 1984. Enteropathogenic Escherichia coli of classic serotypes associated with infant diarrhea: epidemiology and pathogenesis. Epidemiol Rev 6:31-51.
84. Levine, M. M., J. P. Nataro, H. Karch, M. M. Baldini, J. B. Kaper, R. E. Black, M. L. Clements, and A. D. O'Brien. 1985. The diarrheal response of humans to some classic serotypes of enteropathogenic Escherichia coli is dependent on a plasmid encoding an enteroadhesiveness factor. J Infect Dis 152:550-9.
85. Li, B., W. H. Koch, and T. A. Cebula. 1997. Detection and characterization of the fimA gene of Escherichia coli O157:H7. Mol Cell Probes 11:397-406.
86. Li, W. H. 1993. Unbiased estimation of the rates of synonymous and nonsynonymous substitution. J Mol Evol 36:96-9.
87. Li, W. H., C. I. Wu, and C. C. Luo. 1985. A new method for estimating synonymous and nonsynonymous rates of nucleotide substitution considering the relative likelihood of nucleotide and codon changes. Mol Biol Evol 2:150-74.
88. Low, A. S., N. Holden, T. Rosser, A. J. Roe, C. Constantinidou, J. L. Hobman, D. G. Smith, J. C. Low, and D. L. Gally. 2006. Analysis of fimbrial gene clusters and their expression in enterohaemorrhagic Escherichia coli O157:H7. Environ Microbiol 8:1033-47.
89. Luo, Y., E. A. Frey, R. A. Pfuetzner, A. L. Creagh, D. G. Knoechel, C. A. Haynes, B. B. Finlay, and N. C. Strynadka. 2000. Crystal structure of enteropathogenic Escherichia coli intimin-receptor complex. Nature 405:1073-7.
90. Makino, S., H. Asakura, T. Shirahata, T. Ikeda, K. Takeshi, K. Arai, M. Nagasawa, T. Abe, and T. Sadamoto. 1999.
Molecular epidemiological study of a mass outbreak caused by enteropathogenic Escherichia coli O157:H45. Microbiol Immunol 43:381-4.
91. Maynard Smith, J., and N. H. Smith. 1998. Detecting recombination from gene trees. Mol Biol Evol 15:590-9.
92. McClain, M. S., I. C. Blomfield, and B. I. Eisenstein. 1991. Roles of fimB and fimE in site-specific DNA inversion associated with phase variation of type 1 fimbriae in Escherichia coli. J Bacteriol 173:5308-14.
93. McDaniel, T. K., K. G. Jarvis, M. S. Donnenberg, and J. B. Kaper. 1995. A genetic locus of enterocyte effacement conserved among diverse enterobacterial pathogens. Proc Natl Acad Sci U S A 92:1664-8.
94. McDaniel, T. K., and J. B. Kaper. 1997. A cloned pathogenicity island from enteropathogenic Escherichia coli confers the attaching and effacing phenotype on E. coli K-12. Mol Microbiol 23:399-407.
95. McGraw, E. A., J. Li, R. K. Selander, and T. S. Whittam. 1999. Molecular evolution and mosaic structure of α, β, and γ intimins of pathogenic Escherichia coli. Mol Biol Evol 16:12-22.
96. Mellies, J. L., S. J. Elliott, V. Sperandio, M. S. Donnenberg, and J. B. Kaper. 1999. The Per regulon of enteropathogenic Escherichia coli: identification of a regulatory cascade and a novel transcriptional activator, the locus of enterocyte effacement (LEE)-encoded regulator (Ler). Mol Microbiol 33:296-306.
97. Mellies, J. L., F. Navarro-Garcia, I. Okeke, J. Frederickson, J. P. Nataro, and J. B. Kaper. 2001. espC pathogenicity island of enteropathogenic Escherichia coli encodes an enterotoxin. Infect Immun 69:315-24.
98. Menotti-Raymond, M., W. T. Starmer, and D. T. Sullivan. 1991. Characterization of the structure and evolution of the Adh region of Drosophila hydei. Genetics 127:355-66.
99. Miyata, T., and T. Yasunaga. 1980.
Molecular evolution of mRNA: a method for estimating evolutionary rates of synonymous and amino acid substitutions from homologous nucleotide sequences and its application. J Mol Evol 16:23-36.
100. Nataro, J. P., and J. B. Kaper. 1998. Diarrheagenic Escherichia coli. Clin Microbiol Rev 11:142-201.
101. Nataro, J. P., K. O. Maher, P. Mackie, and J. B. Kaper. 1987. Characterization of plasmids encoding the adherence factor of enteropathogenic Escherichia coli. Infect Immun 55:2370-7.
102. Nataro, J. P., I. C. Scaletsky, J. B. Kaper, M. M. Levine, and L. R. Trabulsi. 1985. Plasmid-mediated factors conferring diffuse and localized adherence of enteropathogenic Escherichia coli. Infect Immun 48:378-83.
103. Nei, M., and T. Gojobori. 1986. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol Biol Evol 3:418-26.
104. Neves, B. C., R. K. Shaw, G. Frankel, and S. Knutton. 2003. Polymorphisms within EspA filaments of enteropathogenic and enterohemorrhagic Escherichia coli. Infect Immun 71:2262-5.
105. Nguyen, R. N., L. S. Taylor, M. Tauschek, and R. M. Robins-Browne. 2006. Atypical enteropathogenic Escherichia coli infection and prolonged diarrhea in children. Emerg Infect Dis 12:597-603.
106. Nielsen, R. 2002. Mapping mutations on phylogenies. Syst Biol 51:729-39.
107. Nielsen, R., and J. P. Huelsenbeck. 2002. Detecting positively selected amino acid sites using posterior predictive P-values. Pac Symp Biocomput:576-88.
108. Nielsen, R., and Z. Yang. 1998. Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics 148:929-36.
109. Ofek, I., and E. H. Beachey. 1978. Mannose binding and epithelial cell adherence of Escherichia coli. Infect Immun 22:247-54.
110. Ohta, T., and C. J. Basten. 1992. Gene conversion generates hypervariability at the variable regions of kallikreins and their inhibitors. Mol Phylogenet Evol 1:87-90.
111. Okeke, I. N., J. A.
Borneman, S. Shin, J. L. Mellies, L. E. Quinn, and J. B. Kaper. 2001. Comparative sequence analysis of the plasmid-encoded regulator of enteropathogenic Escherichia coli strains. Infect Immun 69:5553-64.
112. Okeke, I. N., A. Lamikanra, H. Steinruck, and J. B. Kaper. 2000. Characterization of Escherichia coli strains from cases of childhood diarrhea in provincial southwestern Nigeria. J Clin Microbiol 38:7-12.
113. Olsen, A., A. Arnqvist, M. Hammar, and S. Normark. 1993. Environmental regulation of curli production in Escherichia coli. Infect Agents Dis 2:272-4.
114. Olsen, A., A. Jonsson, and S. Normark. 1989. Fibronectin binding mediated by a novel class of surface organelles on Escherichia coli. Nature 338:652-5.
115. Orden, J. A., M. Yuste, D. Cid, T. Piacesi, S. Martinez, J. A. Ruiz-Santa-Quiteria, and R. De la Fuente. 2003. Typing of the eae and espB genes of attaching and effacing Escherichia coli isolates from ruminants. Vet Microbiol 96:203-15.
116. Orndorff, P. E., and S. Falkow. 1984. Organization and expression of genes responsible for type 1 piliation in Escherichia coli. J Bacteriol 159:736-44.
117. Orskov, F., T. S. Whittam, A. Cravioto, and I. Orskov. 1990. Clonal relationships among classic enteropathogenic Escherichia coli (EPEC) belong to different O groups. J Infect Dis 162:76-81.
118. Oswald, E., H. Schmidt, S. Morabito, H. Karch, O. Marches, and A. Caprioli. 2000. Typing of intimin genes in human and animal enterohemorrhagic and enteropathogenic Escherichia coli: characterization of a new intimin variant. Infect Immun 68:64-71.
119. Peek, A. S., V. Souza, L. E. Eguiarte, and B. S. Gaut. 2001. The interaction of protein structure, selection, and recombination on the evolution of the type-1 fimbrial major subunit (fimA) from Escherichia coli. J Mol Evol 52:193-204.
120. Perna, N. T., G. F. Mayhew, G. Posfai, S. Elliott, M. S. Donnenberg, J. B. Kaper, and F. R. Blattner. 1998. Molecular evolution of a pathogenicity island from enterohemorrhagic Escherichia coli O157:H7.
Infect Immun 66:3810-7.
121. Prigent-Combaret, C., G. Prensier, T. T. Le Thi, O. Vidal, P. Lejeune, and C. Dorel. 2000. Developmental pathway for biofilm formation in curli-producing Escherichia coli strains: role of flagella, curli and colanic acid. Environ Microbiol 2:450-64.
122. Ramachandran, V., K. Brett, M. A. Hornitzky, M. Dowton, K. A. Bettelheim, M. J. Walker, and S. P. Djordjevic. 2003. Distribution of intimin subtypes among Escherichia coli isolates from ruminant and human sources. J Clin Microbiol 41:5022-32.
123. Reid, S. D., D. J. Betting, and T. S. Whittam. 1999. Molecular detection and identification of intimin alleles in pathogenic Escherichia coli by multiplex PCR. J Clin Microbiol 37:2719-22.
124. Reid, S. D., C. J. Herbelin, A. C. Bumbaugh, R. K. Selander, and T. S. Whittam. 2000. Parallel evolution of virulence in pathogenic Escherichia coli. Nature 406:64-7.
125. Riley, L. W., R. S. Remis, S. D. Helgerson, H. B. McGee, J. G. Wells, B. R. Davis, R. J. Hebert, E. S. Olcott, L. M. Johnson, N. T. Hargrett, P. A. Blake, and M. L. Cohen. 1983. Hemorrhagic colitis associated with a rare Escherichia coli serotype. N Engl J Med 308:681-5.
126. Rodrigues, J., I. C. Scaletsky, L. C. Campos, T. A. Gomes, T. S. Whittam, and L. R. Trabulsi. 1996. Clonal structure and virulence factors in strains of Escherichia coli of the classic serogroup O55. Infect Immun 64:2680-6.
127. Roe, A. J., C. Currie, D. G. Smith, and D. L. Gally. 2001. Analysis of type 1 fimbriae expression in verotoxigenic Escherichia coli: a comparison between serotypes O157 and O26. Microbiology 147:145-52.
128. Roesch, P. L., and I. C. Blomfield. 1998. Leucine alters the interaction of the leucine-responsive regulatory protein (Lrp) with the fim switch to stimulate site-specific recombination in Escherichia coli. Mol Microbiol 27:751-61.
129. Rumer, L., J. Jores, P. Kirsch, Y. Cavignac, K. Zehmke, and L. H. Wieler. 2003.
Dissemination of pheU- and pheV-located genomic islands among enteropathogenic (EPEC) and enterohemorrhagic (EHEC) E. coli and their possible role in the horizontal transfer of the locus of enterocyte effacement (LEE). Int J Med Microbiol 292:463-75.
130. Sawyer, S. 1989. Statistical tests for detecting gene conversion. Mol Biol Evol 6:526-38.
131. Sawyer, S. A. 1999. GENECONV: A computer package for the statistical detection of gene conversion. Distributed by the author, Department of Mathematics, Washington University in St. Louis, available at http://www.math.wustl.edu/~sawyer.
132. Schaeffer, A. J., W. R. Schwan, S. J. Hultgren, and J. L. Duncan. 1987. Relationship of type 1 pilus expression in Escherichia coli to ascending urinary tract infections in mice. Infect Immun 55:373-80.
133. Schmidt, H., H. Russmann, and H. Karch. 1993. Virulence determinants in nontoxinogenic Escherichia coli O157 strains that cause infantile diarrhea. Infect Immun 61:4894-8.
134. Shaw, R. K., S. Daniell, F. Ebel, G. Frankel, and S. Knutton. 2001. EspA filament-mediated protein translocation into red blood cells. Cell Microbiol 3:213-22.
135. Sinclair, J. F., E. A. Dean-Nystrom, and A. D. O'Brien. 2006. The established intimin receptor Tir and the putative eucaryotic intimin receptors nucleolin and β1 integrin localize at or near the site of enterohemorrhagic Escherichia coli O157:H7 adherence to enterocytes in vivo. Infect Immun 74:1255-65.
136. Sjobring, U., G. Pohl, and A. Olsen. 1994. Plasminogen, absorbed by Escherichia coli expressing curli or by Salmonella enteritidis expressing thin aggregative fimbriae, can be activated by simultaneously captured tissue-type plasminogen activator (t-PA). Mol Microbiol 14:443-52.
137. Sperandio, V., J. B. Kaper, M. R. Bortolini, B. C. Neves, R. Keller, and L. R. Trabulsi. 1998.
Characterization of the locus of enterocyte effacement (LEE) in different enteropathogenic Escherichia coli (EPEC) and Shiga-toxin producing Escherichia coli (STEC) serotypes. FEMS Microbiol Lett 164:133-9.
138. Stephan, R., N. Borel, C. Zweifel, M. Blanco, and J. E. Blanco. 2004. First isolation and further characterization of enteropathogenic Escherichia coli (EPEC) O157:H45 strains from cattle. BMC Microbiol 4:10.
139. Stephens, J. C. 1985. Statistical methods of DNA sequence analysis: detection of intragenic recombination or gene conversion. Mol Biol Evol 2:539-56.
140. Stone, K. D., H. Z. Zhang, L. K. Carlson, and M. S. Donnenberg. 1996. A cluster of fourteen genes from enteropathogenic Escherichia coli is sufficient for the biogenesis of a type IV pilus. Mol Microbiol 20:325-37.
141. Strom, M. S., and S. Lory. 1993. Structure-function and biogenesis of the type IV pili. Annu Rev Microbiol 47:565-96.
142. Suzuki, Y. 2004. New methods for detecting positive selection at single amino acid sites. J Mol Evol 59:11-9.
143. Suzuki, Y., and T. Gojobori. 1999. A method for detecting positive selection at single amino acid sites. Mol Biol Evol 16:1315-28.
144. Tarr, C. L., and T. S. Whittam. 2002. Molecular evolution of the intimin gene in O111 clones of pathogenic Escherichia coli. J Bacteriol 184:479-87.
145. Tobe, T., T. Hayashi, C. G. Han, G. K. Schoolnik, E. Ohtsubo, and C. Sasakawa. 1999. Complete DNA sequence and structural analysis of the enteropathogenic Escherichia coli adherence factor plasmid. Infect Immun 67:5455-62.
146. Tobe, T., G. K. Schoolnik, I. Sohel, V. H. Bustamante, and J. L. Puente. 1996. Cloning and characterization of bfpTVW, genes required for the transcriptional activation of bfpA in enteropathogenic Escherichia coli. Mol Microbiol 21:963-75.
147. Trabulsi, L. R., R. Keller, and T. A. Tardelli Gomes. 2002. Typical and atypical enteropathogenic Escherichia coli. Emerg Infect Dis 8:508-13.
148. Uhlich, G. A., J. E. Keen, and R. O. Elder.
2001. Mutations in the csgD promoter associated with variations in curli expression in certain strains of Escherichia coli O157:H7. Appl Environ Microbiol 67:2367-70.
149. Uhlich, G. A., J. E. Keen, and R. O. Elder. 2002. Variations in the csgD promoter of Escherichia coli O157:H7 associated with increased virulence in mice and increased invasion of HEp-2 cells. Infect Immun 70:395-9.
150. Valentiner-Branth, P., H. Steinsland, T. K. Fischer, M. Perch, F. Scheutz, F. Dias, P. Aaby, K. Molbak, and H. Sommerfelt. 2003. Cohort study of Guinean children: incidence, pathogenicity, conferred protection, and attributable risk for enteropathogens during the first 2 years of life. J Clin Microbiol 41:4238-45.
151. Vallance, B. A., and B. B. Finlay. 2000. Exploitation of host cells by enteropathogenic Escherichia coli. Proc Natl Acad Sci U S A 97:8799-806.
152. Vidal, O., R. Longin, C. Prigent-Combaret, C. Dorel, M. Hooreman, and P. Lejeune. 1998. Isolation of an Escherichia coli K-12 mutant strain able to form biofilms on inert surfaces: involvement of a new ompR allele that increases curli expression. J Bacteriol 180:2442-9.
153. Wales, A. D., M. J. Woodward, and G. R. Pearson. 2005. Attaching-effacing bacteria in animals. J Comp Pathol 132:1-26.
154. Weiller, G. F. 1998. Phylogenetic profiles: a graphical method for detecting genetic recombinations in homologous sequences. Mol Biol Evol 15:326-35.
155. Weissman, S. J., S. Chattopadhyay, P. Aprikian, M. Obata-Yasuoka, Y. Yarova-Yarovaya, A. Stapleton, W. Ba-Thein, D. Dykhuizen, J. R. Johnson, and E. V. Sokurenko. 2006. Clonal analysis reveals high rate of structural mutations in fimbrial adhesins of extraintestinal pathogenic Escherichia coli. Mol Microbiol 59:975-88.
156. Whittam, T. S., and E. A. McGraw. 1996. Clonal analysis of EPEC serogroups. Reviews in Microbiology 27 (Suppl. #1):7-16.
157. Wieler, L. H., T. K. McDaniel, T. S. Whittam, and J. B. Kaper. 1997.
Insertion site of the locus of enterocyte effacement in enteropathogenic and enterohemorrhagic Escherichia coli differs in relation to the clonal phylogeny of the strains. FEMS Microbiol Lett 156:49-53.
158. Yang, Z., R. Nielsen, N. Goldman, and A. M. Pedersen. 2000. Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics 155:431-49.
159. Yang, Z., and W. J. Swanson. 2002. Codon-substitution models to detect adaptive evolution that account for heterogeneous selective pressures among site classes. Mol Biol Evol 19:49-57.
160. Zhang, W. L., B. Kohler, E. Oswald, L. Beutin, H. Karch, S. Morabito, A. Caprioli, S. Suerbaum, and H. Schmidt. 2002. Genetic diversity of intimin genes of attaching and effacing Escherichia coli strains. J Clin Microbiol 40:4486-92.