This is to certify that the thesis entitled

RELEASE FROM ANTAGONISTIC PLEIOTROPY AND COEVOLUTION FOLLOWING GENE DUPLICATION IN FUNGAL MITOCHONDRIAL HEAT SHOCK PROTEINS

presented by Krista Gudrais Reitenga

has been accepted towards fulfillment of the requirements for the Master of Science degree in Microbiology & Molecular Genetics.

RELEASE FROM ANTAGONISTIC PLEIOTROPY AND COEVOLUTION FOLLOWING GENE DUPLICATION IN FUNGAL MITOCHONDRIAL HEAT SHOCK PROTEINS

By

Krista Gudrais Reitenga

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

Microbiology & Molecular Genetics

2009

ABSTRACT

SSC1 is a gene that encodes a multifunctional mitochondrial heat shock protein and that gave rise to SSQ1 by gene duplication in a subset of yeasts. In contrast to the multiple chaperone functions carried out by most heat shock proteins, Ssq1p is specialized in Fe/S cluster assembly. Ssc1p and Ssq1p both participate in the formation of Fe/S clusters and require interaction with Jac1p.
Biochemical experiments and genetic manipulation of Saccharomyces cerevisiae have provided evidence that Ssq1p and Jac1p may have coevolved to optimize a specialized interaction. Together, these factors present a unique opportunity to understand how natural selection shapes the functional coevolution of gene duplicates. We hypothesized that the divergence of SSC1 and SSQ1 resulted in the coevolution of the JAC1-SSQ1 pair. Here, we report that, in the presence of a rapidly evolving SSQ1, the average rate of JAC1 evolution has decreased. Our results also support a burst of adaptive evolution in SSQ1 immediately following its origin. Additionally, both SSC1 and SSQ1 exhibit elevated rates of evolution when co-occurring. Taken together, the signatures of ancestral and present-day selection point to a release from antagonistic pleiotropy that facilitated coevolution between JAC1 and SSQ1. This study offers detailed evidence that the duplication of multifunctional genes allows interacting proteins to coevolve to optimize a paired function.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
SECTION I. BACKGROUND
  INTRODUCTION
  Hsp70s: Characteristic Features
  Hsp70 Phylogenetic Distribution and Gene Family Evolution
  Yeast Mitochondrial Hsp70s
  SSC1: A Multifunctional Mitochondrial Hsp70
  Iron-Sulfur Cluster Assembly
  SSQ1: The Mitochondrial Hsp70 Iron-Sulfur Cluster Specialist
  J-protein Co-chaperones
  JAC1: The Mitochondrial J-protein Iron-Sulfur Cluster Specialist
  Patterns and Mechanisms of Gene Evolution Following Duplication
  Detecting Signatures of Selection by Evolutionary Rate Comparisons
    Correlated Evolution vs. Co-Adaptation
    Site-Specific Models
    Branch-Specific Models
    Branch-Site Models
    Clade Models
    Tools for Evolutionary Rate Analysis
    Methodological Limitations
SECTION II. EXPERIMENTAL STUDY
  HYPOTHESES AND PREDICTIONS
  METHODS
    Fungal Taxa and Gene Sequence Alignments
    Cladogram Construction for PAML Input Trees
      Data Partitioning
      Maximum Parsimony
      Maximum Likelihood
      Bayesian Inference
      Constructing Composite Input Tree Topologies
    mtHsp70 Clade Model Rate Comparisons
    Site-Specific Rate Tests
    Branch-Site Test
  RESULTS
    SSC1 evolution accelerated in the presence of SSQ1
    SSQ1 has evolved at a faster rate than SSC1
    JAC1 evolution has decelerated in the presence of SSQ1
    SSQ1 has evolved under positive selection
  DISCUSSION
    Future Directions
APPENDICES
  Appendix A: Fungal Mitochondrial Heat Shock Protein Coding Region DNA Sequence Sources
  Appendix B: Fungal Mitochondrial Heat Shock Protein Multiple Sequence Alignments
  Appendix C: Fungal Mitochondrial Heat Shock Protein Phylogenetic Gene Tree Input Topologies for codeml Evolutionary Rate Analysis
  Appendix D: Evolutionary Rate Test Specifications Used in Control Files Used to Run codeml of PAML
  Appendix E: Likelihood Ratio Tests of codeml Evolutionary Rate Analyses
REFERENCES

LIST OF TABLES

Table 1: Evolutionary Rate (ω) Estimation Under the Branch-Site Model
Table A1: SSC1 Sequence Sources
Table A2: SSQ1 Sequence Sources
Table A3: JAC1 Sequence Sources
Table E1: Likelihood Ratio Test Comparison of SSC1 Clade Model Test Outputs
Table E2: Likelihood Ratio Test Comparison of SSC1 and SSQ1 Clade Model Test Outputs
Table E3: Likelihood Ratio Test Comparison of JAC1 Site-Specific Model Test Outputs
Table E4: Likelihood Ratio Test Comparison of SSQ1 Branch-Site Model Test Outputs

LIST OF FIGURES

Figure 1: A simplified cladogram representing the evolutionary relationships among selected fungi in relation to mtHsp70 gene duplication events
Figure 2: Summary of mitochondrial heat shock protein (mtHsp) distribution among fungal clades
Figure 3: Average within-clade pair-wise sequence divergence of JAC1
Figure 4: Summary of data partitions and phylogenetic tree construction for evolutionary rate analysis
Figure 5: A priori defined lineages used for clade and branch-site model input trees
Figure 6: Comparison of SSC1 codon evolution from taxa encoding SSQ1 and taxa lacking SSQ1
Figure 7: Comparison of SSC1 and SSQ1 codon evolution
Figure 8: Site-specific ω estimations for JAC1 from clades encoding SSQ1
Figure 9: Site-specific ω estimations for JAC1 from clades lacking SSQ1
Figure 10: Comparison of ancestral SSQ1 codon evolution to SSQ1 and SSC1 evolution within all other lineages
Figure 11: Comparison of posterior probabilities of placement of sites into a divergent rate class by the branch-site model, among input tree topologies
Figure 12: Amino acid sequence of Ssq1 encoded by Saccharomyces cerevisiae YJM789 showing sites inferred to exhibit relaxed selective constraint and ancestral positive selection
Figure B1: SSC1 amino acid multiple sequence alignment
Figure B2: SSC1 and SSQ1 combined amino acid multiple sequence alignment
Figure B3: Saccharomyces clade JAC1 amino acid multiple sequence alignment
Figure B4: Candida clade JAC1 amino acid multiple sequence alignment
Figure B5: Fusarium clade JAC1 amino acid multiple sequence alignment
Figure B6: Aspergillus clade JAC1 amino acid multiple sequence alignment
Figure C1: SSC1 Bayesian Inference Tree 1
Figure C2: SSC1 Bayesian Inference Tree 2
Figure C3: SSC1 Bayesian Inference Tree 3
Figure C4: SSC1 Bayesian Inference Tree 4
Figure C5: SSC1 Maximum Likelihood Tree 1
Figure C6: SSC1 Maximum Likelihood Tree 2
Figure C7: SSC1 Maximum Likelihood Tree 3
Figure C8: SSC1 Maximum Parsimony Tree 1
Figure C9: SSC1 Maximum Parsimony Tree 2
Figure C10: SSC1 Maximum Parsimony Tree 3
Figure C11: SSC1 Maximum Parsimony Tree 4
Figure C12: SSC1 and SSQ1 Bayesian Inference Tree 1
Figure C13: SSC1 and SSQ1 Bayesian Inference Tree 2
Figure C14: SSC1 and SSQ1 Bayesian Inference Tree 3
Figure C15: SSC1 and SSQ1 Bayesian Inference Tree 4
Figure C16: SSC1 and SSQ1 Bayesian Inference Tree 5
Figure C17: SSC1 and SSQ1 Bayesian Inference Tree 6
Figure C18: SSC1 and SSQ1 Maximum Likelihood Tree
Figure C19: SSC1 and SSQ1 Maximum Parsimony Tree 1
Figure C20: SSC1 and SSQ1 Maximum Parsimony Tree 2
Figure C21: JAC1 Saccharomyces Bayesian Inference/Maximum Parsimony Tree
Figure C22: JAC1 Saccharomyces Maximum Likelihood Tree
Figure C23: JAC1 Candida Bayesian Inference Tree
Figure C24: JAC1 Candida Maximum Likelihood Tree
Figure C25: JAC1 Candida Maximum Parsimony Tree
Figure C26: JAC1 Fusarium Bayesian Inference/Maximum Likelihood/Maximum Parsimony Tree
Figure C27: JAC1 Aspergillus Bayesian Inference Tree
Figure C28: JAC1 Aspergillus Maximum Likelihood Tree
Figure C29: JAC1 Aspergillus Maximum Parsimony Tree

SECTION I. BACKGROUND

INTRODUCTION

Coevolution has long been appreciated as a mechanism that operates between groups of organisms, with the potential to create ecological mutualisms and initiate adaptive arms races. However, coevolution is a pervasive phenomenon that extends beyond macroscopic interactions, such as those between flowers and their pollinators or between hosts and their parasites. Phenotypes that determine ecological fitness are the result of complex biochemical pathways. Coevolution, therefore, also takes place among the molecules within organisms and, at times, may even be responsible for species-level interdependencies and competitive strategies. Through molecular coevolution, proteins can exert a selective influence over interacting partners or other components of a biochemical pathway to favor molecular cooperation or antagonism, and may become specialists or generalists as a result. The evolutionary success of organisms therefore hinges upon the fitness advantages conferred by their molecular components. Additionally, molecular coevolution may influence genetic interactions, which can, among other things, lead to congenital diseases and contribute to the process of speciation. Coevolution thus merits careful study to advance our understanding of many fundamental aspects of biology.

Heat shock proteins (Hsps) constitute a group of proteins that are of central importance to nearly all organisms.
A great deal of data has been amassed concerning the biochemical and genetic properties of Hsps, leading to a detailed understanding of the many known functions of these proteins. Although highly conserved and slowly evolving, one class of Hsps exhibits dynamic variation in gene copy number. In one interesting case, gene amplification has led to the specialization of an Hsp in Fe/S cluster assembly, an essential pathway whose biochemical mechanisms are only now being deciphered. Furthermore, the multiple functions carried out by Hsps necessitate interaction with a wide variety of protein partners and create ample potential for Hsps to exert a reciprocal influence on other constituents of their networks. Combined, these characteristics of Hsps present a unique opportunity to study how changes in gene copy number affect the coevolution of interacting partners within an essential biochemical pathway.

Hsp70s: Characteristic Features

So named for their discovery (Ritossa 1962) as a group of proteins that exhibited increased abundance in cells following heat stress, heat shock proteins of the 70 kiloDalton (kDa) class (Hsp70s) represent a multi-gene family of protein chaperones with a nearly ubiquitous distribution within the tree of life. Homologs have been found throughout the Bacteria and Eukarya, as well as in some representatives of the Archaea (the absence of Hsp70s from other archaeal taxa has been reported by Gribaldo et al. 1999). Hsp70s are known to participate in an array of indispensable functions associated with the folding, transport, and degradation of a wide variety of polypeptides. Hsp70s may perform housekeeping functions constitutively under many physiological conditions or be transcriptionally up-regulated in response to environmental stresses in order to protect the integrity of the cell's polypeptide components (Boorstein et al. 1994). Since their first identification in heat-stressed Drosophila cells in the 1970s (Tissieres et al. 1974; Bukau and Horwich 1998), many other stimuli have been shown to trigger increased synthesis of Hsps, including exposure to ethanol, anoxic conditions, heavy metal ions, and ultraviolet light (Lindquist and Craig 1988).

The Hsp70s have an extremely slow rate of evolution and share a common tri-domain protein structure across all three domains of life. The canonical form comprises a 44 kDa amino-terminal ATPase domain, an 18 kDa peptide-binding domain (Wang et al. 1993), and a 10 kDa carboxy-terminal domain of variable amino acid composition. Hydrolysis of adenosine-5'-triphosphate (ATP) regulates a conformational change within the Hsp70 substrate-binding pocket and the consequent binding and release of hydrophobic regions of the substrate polypeptide (Bukau and Horwich 1998). The functions of the proteins comprising the Hsp70 family are so well conserved that, when expressed in a mammalian cell, a fruit fly Hsp70 is able to provide protection against heat stress (Pelham 1984).

Hsp70 Phylogenetic Distribution and Gene Family Evolution

Although many Hsp70 homologs have retained equivalent functional abilities across divergent organismal taxa, the number of Hsp70 genes encoded within a genome is plastic, a dynamic rife with evolutionary and ecological potential. Comparative sequence analyses have revealed that the eukaryotic Hsp70 genes, all encoded within the nuclear genome, constitute four phylogenetically distinct clades. Each clade is characterized by a common intracellular localization of its protein products: the mitochondria, endoplasmic reticulum, plastids, or cytoplasm (Boorstein et al. 1994). Nearly all eukaryotes contain at least three Hsp70 gene copies; the budding yeast Saccharomyces cerevisiae possesses 9 cytosolic (cyt), 3 mitochondrial (mt), and 2 endoplasmic reticulum (er) isoforms of Hsp70. However, the number of paralogs
encoded by different eukaryotic genomes can vary widely, as exemplified by the 10 Hsp70 genes found in the nematode Caenorhabditis elegans and the 19 found in the closely related species C. briggsae (Nikolaidis and Nei 2004). An early gene duplication event prior to the radiation of eukaryotic species gave rise to the cytHsp70s and erHsp70s. The genes encoding the Hsp70s of the mitochondria and plastids are likely of bacterial origin. After establishment of the bacterial endosymbionts that are hypothesized to have given rise to the mitochondria and plastids in an ancestral eukaryote, lateral transfer of the Hsp70 genes from the organellar genomes to the nuclear chromosomes is thought to have occurred (Muhlenhoff and Lill 2000).

Gene duplication is known to play an important role in the amplification of Hsp70 genes. Duplication is likely facilitated by the inverted and tandem cytHsp70 gene pair arrangements common to the genomes of Caenorhabditis nematodes (Nikolaidis and Nei 2004), mosquito (Benedict et al. 1993), rat (Walter et al. 1994), fruit fly (Bettencourt and Feder 2002), fugu (Lim and Brenner 1999), and human (Tavaria et al. 1996). A biological cost-benefit balance may govern the cytHsp70 copy number within genomes. On the cost side, cells may sustain deleterious effects on growth that impose an upper limit on the optimum Hsp70 expression level, whether through the cost of replicating additional Hsp70 genes, the energy required for additional translation, or a toxic effect of Hsp70 expression above a certain threshold. Conversely, an increase in Hsp70 expression may offer the benefit of an enhanced ability to survive environmental stresses. A correlation between Hsp70 expression level and degree of thermotolerance has been well documented in Drosophila (Feder et al. 1996).
Thermotolerance, and survival in the face of other environmental stressors through Hsp70 buffering, therefore constitute ecologically relevant phenotypes on which natural selection may act.

Examination of two cytHsp70 paralog clusters from Drosophila revealed that gene conversion between and among groups of physically clustered genes is likely to be a frequent event that contributes to the homogenization of Hsp70 copies within a group (Bettencourt and Feder 2002). Gene conversion maintains sequence similarity while concurrently enabling a subgroup of cytHsp70s to diverge in a concerted manner by spreading new mutations among copies. Gene conversion among cytHsp70s has also been reported in the nematodes (Nikolaidis and Nei 2004) and is suspected to occur within angiosperm plants (Renner and Waters 2007). Alternatively, the lack of divergence among a group of Hsp70s may be due to slow evolutionary rates. The bias toward synonymous changes among the mutations exhibited by paralogs implies a large role for purifying selection. Acting in conjunction with gene homogenization, purifying selection disfavors the spread of deleterious changes among Hsp70 paralogs (Bettencourt and Feder 2002). As a consequence, Hsp70s localized to the same cellular compartment in distantly related organisms tend to share greater sequence similarity than Hsp70s from different cellular compartments within the same organism (Nikolaidis and Nei 2004).

Unlike the mechanisms of convergent evolution that characterize many cytHsp70s, the mt- and erHsp70s show evidence of divergent evolution. Diversifying selection is a mechanism that drives divergent evolution and is facilitated by independent gene duplication and loss events among lineages (Ota and Nei 1994). The birth and death of paralogs is a feature of the mt- and erHsp70s.
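The inference above, that a bias toward synonymous changes signals purifying selection, is conventionally quantified as the ratio ω = dN/dS of nonsynonymous to synonymous substitutions per site: ω < 1 suggests purifying selection, ω ≈ 1 neutrality, and ω > 1 positive selection. The Python sketch below illustrates only the arithmetic, under simplifying assumptions: the counts of differences and of nonsynonymous and synonymous sites are supplied by hand (the numbers used are hypothetical, not data from this study), and each proportion is corrected with the Jukes-Cantor formula in the style of counting methods such as Nei-Gojobori. It is not the codon-model likelihood approach implemented in codeml of PAML, and the helper names are illustrative.

```python
import math


def jukes_cantor(p: float) -> float:
    """Correct an observed proportion of differing sites for multiple hits."""
    if not 0.0 <= p < 0.75:
        raise ValueError("Jukes-Cantor correction requires 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def omega(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float) -> float:
    """Return dN/dS for one sequence pair from difference and site counts."""
    d_n = jukes_cantor(nonsyn_diffs / nonsyn_sites)  # nonsynonymous distance
    d_s = jukes_cantor(syn_diffs / syn_sites)        # synonymous distance
    if d_s == 0.0:
        raise ValueError("dS is zero; omega is undefined")
    return d_n / d_s


def interpret(w: float) -> str:
    """Map omega onto the conventional selection regimes (no significance test)."""
    if w < 1.0:
        return "purifying selection"
    if w > 1.0:
        return "positive selection"
    return "neutral evolution"


# Hypothetical paralog pair: few amino acid changes, many silent ones.
w = omega(nonsyn_diffs=10, nonsyn_sites=300, syn_diffs=30, syn_sites=100)
print(f"omega = {w:.3f} -> {interpret(w)}")
```

In practice, whether ω departs significantly from 1 is assessed statistically, for example with likelihood ratio tests between nested codon models, rather than by reading off a single point estimate as this sketch does.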
While many divergent eukaryotes, including Drosophila, nematodes, and the marine diatom Thalassiosira pseudonana, encode a single mtHsp70, the mtHsp70s have undergone duplication in other eukaryotic lineages, with Saccharomyces cerevisiae possessing 3, Arabidopsis thaliana 2, and Plasmodium falciparum genomes 1 (Renner and Waters 2007). Congruent with the hypothesis that eukaryotic mitochondria and plastids originated from ancient bacterial endosymbionts, mtHsp70 genes display the greatest similarity to bacterial Hsp70 homologs from representatives of the α-Proteobacteria, whereas the plastid Hsp70 genes most closely resemble the heat shock protein genes of cyanobacteria (Boorstein et al. 1994). Within the Bacteria, some organisms also encode multiple Hsp70s (referred to in bacteria as dnaK or heat shock cognate, hsc), with 3 homologs in the Escherichia coli genome (Itoh et al. 1999) and in the cyanobacterium Synechococcus (Ward-Rainey et al. 1997). Bacterial Hsp70s have been shown to display paralog-specific localization patterns. In the case of Synechococcus, dnaK3 localizes specifically to the cytosolic thylakoid membrane of an oxygen-producing photosynthetic system (Nimura et al. 1996), analogous to the plastid-specific organellar localization observed in some eukaryotes.

In contrast to the ever-present status of Hsp70 in the Eukarya and Bacteria, the detection of gene homologs within the Archaea has been patchy, with presence reported in some taxa (Macario et al. 1991; Gupta and Singh 1992, 1994) but absence of recognizable homologs in others (Lange et al. 1997). These observations have given rise to controversy surrounding the origin of the archaeal Hsp70 and challenge the reliability of Hsp70 as a phylogenetic marker with respect to the three domains of life. The alternative hypotheses of lateral acquisition in a subset of lineages (Philippe et al.
The alternative hypotheses of lateral acquisition in a subset of lineages (Philippe et a1. Yea Pr,— r.l 1999) and differential gene loss (Gupta 1999) have also been proposed. Yeast Mitochondrial Hsp705 The plasticity of gene copy number within the Hsp70 gene family has produced a particularly interesting outcome within the yeast thsp70s. Gene duplication has given rise to a functionally specialized protein that can be readily studied in the experimentally tractable model eukaryote, Saccharomyces cerevisiae. S. cerevisiae encodes three thsp7OS: Ssclp, the most abundant Hsp70 that functions within the organelle, plus Ssqlp and EcmlOp, two constitutively present forms of rarer abundance. Included in all three yeast thsp70 sequences is a leader sequence that targets the protein products for. import into the mitochondria, where they function in the matrix (Craig 1989). In an event independent of the whole genome duplication estimated to have occurred about 150 million years ago in yeast (Langkjaer et a1. 2003), SSQ1 arose from SSCl by gene duplication prior to the most recent common ancestor of S. cerevisiae and Candida albicans (see Figure 1). Additionally, the paralog SSQ1 has been identified in all descendent fungal taxa studied (Schilke et a1. 2006). The duplication of SSCl is in agreement with the observation that slowly evolving genes in S. cerevisiae tend to duplicate, with subsequent retention of paralogs, more frequently than fast evolving genes (Davis and Petrov 2004). Later, ECMlO, a third yeast thsp70, was generated during the whole genome duplication believed to have occurred in the most recent progenitor of the clade that includes S.cerevisiae and S. castellii (Kellis et a1. 2004) (see Figure 1). While ECMlO now shares 82% amino acid sequence identity with SSCl of S. cerevisiae (Baumann et a1. 
2000), SSQ1 has undergone greater divergence, particularly within the substrate-binding domain, sharing an overall amino acid identity of only 52% with SSC1 (Schilke et al. 2006). Each yeast mtHsp70 is located on a separate nuclear chromosome, a feature that could result in different mutation rates and efficiencies of natural selection acting on the three mtHsp70 genes. The genomic context within which the mtHsp70 genes reside can therefore influence the evolutionary rates of these genes independently of their respective protein structures and functions (Pal et al. 2006).

[Figure 1 cladogram. Taxa shown: Schizosaccharomyces pombe, Aspergillus nidulans, Neosartorya fischeri, Neurospora crassa, Fusarium verticillioides, Yarrowia lipolytica, Debaryomyces hansenii, Candida albicans, Saccharomyces castellii, Candida glabrata, Saccharomyces cerevisiae.]

Figure 1: A simplified cladogram representing the evolutionary relationships among selected fungi in relation to mtHsp70 gene duplication events. This cladogram is a modified version of that constructed by Fitzpatrick et al. (2006) using maximum likelihood to infer the organismal relationships among fungi based on a concatenated alignment of 153 universally distributed fungal genes. All branches shown were supported with a bootstrap value of 100. The gray star indicates the lineage within which an mtHsp70 gene duplication gave rise to SSQ1. 'WGD' indicates the lineage within which a whole genome duplication took place, giving rise to ECM10.

SSC1: A Multifunctional Mitochondrial Hsp70

Ssc1p is a constitutively expressed, essential protein that functions as the major molecular chaperone within the matrix of the yeast mitochondrion and interacts with a myriad of different peptides. The constitutive chaperone tasks of Ssc1p involve the peptide chain folding, unfolding, and translocation necessary for mitochondrial biogenesis.
About 10% of the Hsp70 protein present in the mitochondria acts as a component of the pre-protein translocase of the inner membrane (TIM) complex. As a TIM constituent, Ssc1p cyclically binds and releases polypeptides to assist the pumping of nuclear-encoded peptide chains across the inner membrane of the mitochondrion. Subsequently, Ssc1p facilitates folding of the chains into their native conformation as they emerge into the matrix (Neupert 1997). Because many proteins translated in the cytosol become folded prior to their import across the mitochondrial membranes, unfolding them into linear peptide chains suitable for translocation through the TIM complex is also critical; Ssc1p performs this function as well, via interaction with a substrate peptide's N-terminal pre-protein signal sequence (Lim et al. 2001). Ssc1p can also be found associated with mitochondrial ribosomes, where it folds newly synthesized peptides into their native conformation as they emerge during translation.

Under conditions of heat stress, Ssc1p protects the cell from the toxic effects of protein denaturation and aggregation within mitochondria. For instance, Ssc1p is responsible for maintaining Var1p, a subunit of mitochondrial ribosomes, in a soluble form to prevent aggregation or misfolding prior to ribosome assembly, a danger that increases during heat shock (Herrmann et al. 1994). Further, Ssc1p plays a role in the synthesis of mitochondrial DNA as a partner in the Hsp70-Hsp78 mitochondrial bichaperone system. In yeast, this system is critical to the maintenance and restoration of a particularly thermosensitive enzyme, Mip1p, the mtDNA polymerase, during severe thermal stress.
The Hsp70-Hsp78 bichaperone is known to localize within protein-mtDNA complexes called nucleoids, where it may act to quickly refold Mip1p within the nucleoid scaffold, thereby protecting and reactivating the mtDNA polymerase. Reactivation of Mip1p is more efficient than importing newly synthesized Mip1p into the mitochondrion (Germaniuk et al. 2002).

In addition to these classical roles as a chaperone, Ssc1p is involved in Fe/S cluster biosynthesis, a function that was encoded by the ancestral mtHsp70 prior to the creation of SSQ1 (Schilke et al. 2006). The process of Fe/S cluster assembly is tightly linked to the mitochondria in eukaryotes and has been appreciated only in recent years as dependent on an enzyme-mediated biochemical pathway (Zheng et al. 1993).

Iron-Sulfur Cluster Assembly

From a broad perspective, it is no overstatement to characterize Fe/S clusters as ubiquitous chemical structures that enable biochemical reactions essential to the processes that drive Earth's ecology, since these units make photosynthesis, cellular respiration, and nitrogen fixation possible. Serving as inorganic cofactors for a variety of proteins, Fe/S clusters participate in substrate binding and mediate many catalytic mechanisms via oxidation and reduction within enzymes. Fe/S cluster proteins are thus necessary for the citric acid cycle, haem biosynthesis, DNA repair, protein synthesis, and purine metabolism (Rouault and Tong 2005). Additionally, Fe/S clusters have been shown to sense oxidative stress and intracellular iron concentrations in order to mediate cellular responses, sometimes as part of Fe/S cluster-containing transcription factors (Kiley and Beinert 2003).
Although many details remain to be clarified, the general mechanism of Fe/S cluster assembly involves an initial step in which sulfur and iron are structurally coordinated into a cluster on a scaffold protein, followed by transfer of the metallocluster to a substrate apoprotein. While enzymatic abstraction from cysteine residues is known to supply the sulfur for Fe/S cluster biogenesis, the source of the iron has yet to be elucidated (Lill and Muhlenhoff 2008). Several roles have been proposed for Hsp70 chaperones in Fe/S cluster assembly, though none has been proven experimentally. Hypothesized Hsp70 functions include assisting the transfer of assembled Fe/S clusters from the scaffold protein to the recipient apoprotein, or binding to Fe/S assembly proteins and/or substrate apoproteins to prevent inappropriate oxidation of the cysteine residues that serve as ligands coordinating the Fe/S structure (Muhlenhoff and Lill 2000). What is certain is that this multi-step pathway offers ample opportunity for SSC1, SSQ1, and their co-chaperone JAC1 to interact with many other proteins.

As a testament to the essentiality of Fe/S clusters, three different pathways dedicated to Fe/S cluster biogenesis have arisen across the tree of life: the nitrogen fixation (NIF), iron-sulfur cluster (ISC), and sulfur utilization factor (SUF) pathways. The NIF pathway consists of a set of genes highly conserved in nitrogen-fixing (diazotrophic) bacteria and is devoted to forming Fe/S clusters exclusively for the maturation of the nitrogenase enzyme. The more general ISC pathway genes interact to assemble Fe/S prosthetic groups onto a variety of apoproteins (Zheng et al. 1998). This second system is utilized by a much broader distribution of organisms and is strongly conserved throughout the Bacteria, particularly within the α-proteobacteria. The α-proteobacterial ancestor of mitochondria is hypothesized to have bestowed an intact ISC biosynthesis system on the Eukarya, where it has been preserved from yeast to humans (Lill and Muhlenhoff 2008). While the Archaea encode many proteins that rely on Fe/S clusters for their functions, this domain of life lacks homologs of both the NIF and ISC assembly systems. Instead, these microbes encode homologs of some genes of the third Fe/S cluster assembly pathway, SUF. The SUF operon encodes a redundant pathway discovered in Escherichia coli when a small degree of Fe/S enzyme activity was retained following deletion of the bacterium's ISC operon (Takahashi and Tokumoto 2002). Later, in contrast to the housekeeping function of the ISC pathway, the SUF pathway was found to be required by E. coli under conditions of Fe starvation and oxidative stress (Outten et al. 2004). SUF homologs have also been identified within plastid genomes. Additionally, the SUF system may have served as the origin of the scaffold protein of the ISC pathway in some bacteria (Takahashi and Tokumoto 2002). Though the NIF, ISC, and SUF pathways function independently, similarities among the systems abound, and these have facilitated the identification of the functional components that perform analogous tasks in Fe/S biogenesis within yeast mitochondria. Altogether, at least 15 proteins have been implicated as Fe/S cluster assembly proteins that cooperate in the mitochondrial matrix (Lill and Muhlenhoff 2008).
Though many of the proteins that require Fe/S clusters function within mitochondria, some cytosolic proteins also contain Fe/S clusters and are believed to receive clusters exported from the mitochondria, since Fe/S cluster biogenesis has not been demonstrated to occur in the cytosol. The reducing chemical conditions and lower partial pressure of O2 within the mitochondrial matrix relative to the cytosol may have favored the establishment of the Fe/S cluster biogenesis pathway within this organelle (Muhlenhoff and Lill 2000).

SSQ1: The Mitochondrial Hsp70 Iron-Sulfur Cluster Specialist

SSQ1 has become specialized in the assembly of Fe/S clusters, but at the price of losing the multifunctionality displayed by its paralog SSC1. Recent genetic and in vitro biochemical experiments support the functional specialization of SSQ1. When SSQ1 was deleted from the S. cerevisiae genome, mutants accumulated iron within the mitochondrial matrix, with a concurrent reduction in the levels of Fe/S cluster-containing enzymes and in protection against oxidative agents (Voisine et al. 2000). To investigate this phenotype further, another study used an assay that follows the conversion of ferredoxin, a mitochondrial protein that requires an Fe/S cluster for enzymatic function, from its apo-form to its holo-form within isolated mitochondria. In mitochondria extracted from an S. cerevisiae SSQ1 deletion strain, the majority of ferredoxin failed to mature into the holoenzyme, and the ferredoxin that did reach the holoenzyme state matured with reduced kinetics (Lutz et al. 2001). The interaction of Ssq1p with known components of the Fe/S cluster assembly pathway has also been tested: investigators observed efficient binding of a purified fragment containing the peptide-binding domain of Ssq1p to a peptide fragment of the scaffold protein involved in Fe/S cluster formation (Schilke et al. 2006).
These results suggest that Ssq1p is important for Fe/S biogenesis and can physically interact with a key component of the pathway. Consistent with the hypothesis of specialization and the concomitant loss of ancestral mtHsp70 function after duplication, Ssq1p was found to bind only weakly to peptides known to be bound by Ssc1p from both pre- and post-duplication yeasts (Andrew et al. 2006). Additionally, Ssq1p was not detected when co-immunoprecipitation with Tim44p, a component of the inner mitochondrial membrane peptide translocase, was attempted. This indicates that Ssq1p does not bind Tim44p, in contrast to Ssc1p, which acts as a subunit of the TIM complex (Lutz et al. 2001). Ssq1p has thus lost the ability to bind the variety of peptide substrates that require Hsp70s for general protein translocation and folding. In accordance with this finding, over-expression of SSQ1 cannot complement an SSC1 null mutation (Schilke et al. 1996), consistent with the loss of general chaperone function by SSQ1. These results support the conclusion that Ssq1p is no longer a generalist mtHsp70.

SSQ1 is dispensable for yeast survival because of partial functional overlap with SSC1. The iron accumulation, reduced levels of Fe/S cluster-containing enzymes, and reduced protection against oxidative agents observed in SSQ1 deletion mutants can be partially rescued by over-expression of SSC1 (Voisine et al. 2000). Furthermore, because SSC1 and SSQ1 both appear to participate in Fe/S cluster biogenesis by interacting with the same conserved motif of Isu, the Fe/S cluster biogenesis scaffold protein (Schilke et al. 2006), SSC1 is likely to assist Fe/S cluster formation in fungi lacking SSQ1.
Because of this overlap, Ssc1p and Ssq1p compete for the nucleotide exchange factor Mge1p, which allows ADP and Pi to be released from the mtHsp70s and is present in limiting amounts. The greater abundance of Ssc1p relative to Ssq1p in the mitochondrial matrix may limit the amount of Mge1p available to recycle Ssq1p to its active form. As a result, the reduced proportion of activated Ssq1p may be sufficient to carry out only a restricted task load compared to Ssc1p (Schmidt et al. 2001). The role of Ssq1p in Fe/S cluster formation is analogous to the specialized task of the Hsp70 HscA in bacteria, and it appears that, after arising independently in a subset of yeasts, SSQ1 has undergone functional evolution. In the process, Ssq1p has acquired an obligatory interaction with the yeast orthologs of the cluster assembly scaffold protein and of the co-chaperone proteins with which bacterial HscA interacts (Schilke et al. 2006). A specialized Hsp70 committed to Fe/S cluster biogenesis therefore appears to have arisen independently twice in the course of evolution: once in the bacteria and once in the yeasts. The initial discoveries of these specialized Hsp70s in E. coli and S. cerevisiae appear to have been serendipitous; SSQ1 homologs remain undetected in many eukaryotes, including humans (Schilke et al. 2006). Given that most eukaryotes use a multifunctional mtHsp70 in the Fe/S cluster biogenesis pathway, the advantage of dedicating a separate mtHsp70 exclusively to this process in yeasts remains to be established.
J-protein Co-chaperones

J-domain protein co-chaperones belong to the 40 kDa heat shock protein (Hsp40) family and engage in an obligate physical interaction with Hsp70s. J-proteins are required to stimulate the activity and mediate the function of Hsp70s; thus, J-protein isoforms are active in all cellular compartments containing Hsp70s. Although the J-proteins are a disparate group with little conservation of gene sequence or protein structural organization among members, all J-proteins share a defining feature called the J-domain, named for its sequence similarity to the E. coli DnaJ protein. All J-domains contain a histidine-proline-aspartic acid (HPD) motif essential for stimulating the ATPase activity of the Hsp70 partner (Cheetham and Caplan 1998). Both general and specialist J-proteins exist in yeast: several unique J-proteins assist either general Hsp70 functions or specialized Hsp70 roles, depending on the specific J-protein/Hsp70 interaction. Sahi and Craig (2007) demonstrated this distinction in S. cerevisiae. The deleterious growth effect caused by the absence of the J-protein Ydj1p was rescued by expressing the J-domain fragments of several different J-protein co-chaperones, indicating that Ydj1p is a generalist J-protein. Such general J-proteins may thus indiscriminately stimulate the ATPase domain common to all Hsp70s. When the specialist J-proteins Cwc23p, Sis1p, Jjj1p, and Jjj3p were deleted from the yeast genome, however, the deleterious phenotype could not be rescued by expression of any other gene. Thus, in contrast to the generalist Ydj1p, the J-domain fragment of the specialist Jjj3p alone could not replace the function of full-length Jjj3p (Sahi and Craig 2007).
For Jjj3p, an additional zinc finger domain was shown to be required for the J-protein's specialized role as a component of the diphthamide biosynthesis pathway. Some specialist J-proteins form an exclusive Hsp70 partnership to perform a single function, as in the case of the chaperone/co-chaperone pair Ssz1p and Zuo1p, which associates with translating ribosomes to fold newly synthesized peptides. The J-protein Zuo1p interacts solely with the Hsp70 Ssz1p, and Ssz1p does not pair with any other J-protein, despite the co-occurrence of several other types of J-proteins. In some cases, J-proteins have been shown to bind substrate peptides themselves, independent of forming a complex with an Hsp70. For E. coli DnaJ, a zinc finger-like region and the carboxy-terminal region are required for ligand binding (Han and Christen 2003). J-proteins with a ligand-binding function may deliver substrates to the Hsp70 or recruit the Hsp70 to a peptide. Similar substrate-binding features in specialized yeast J-proteins may also localize a J-protein to a particular site within the cell, sequestering it and rendering it unavailable to function in place of other J-proteins, thus conferring specificity (Sahi and Craig 2007).

JAC1: The Mitochondrial J-protein Iron-Sulfur Cluster Specialist

JAC1 is an essential gene that encodes one of 22 J-proteins in the S. cerevisiae genome. The Jac1p protein contains an N-terminal mitochondrial signal sequence and is imported into the mitochondrial matrix, where it serves as a specialized co-chaperone in Fe/S cluster generation (Voisine et al. 2001). Its task is to bind the Fe/S cluster assembly scaffold protein Isup, deliver it to an mtHsp70, and stabilize the Isu1p-Hsp70 interaction (Andrew et al. 2006).
Jac1p is the only known J-protein capable of interacting with the mtHsp70 Ssq1p, and together this J-protein/Hsp70 pair has become specialized in the yeast Fe/S cluster assembly pathway. However, because JAC1 and SSC1 orthologs have been preserved together from bacteria to humans as components of the ISC Fe/S cluster formation pathway, their products retain the ability to cooperate in yeast. This explains why the effects of deleting SSQ1 from the S. cerevisiae genome can be compensated for by over-expression of JAC1 (Andrew et al. 2006). Several pieces of evidence demonstrate that Jac1p and Ssq1p are a functional pair specialized in Fe/S cluster biogenesis and are consistent with the coevolution of the two proteins. When mitochondria of JAC1 mutant S. cerevisiae strains were isolated and manipulated to contain normal Fe concentrations, the activity of Fe/S cluster enzymes was reduced. Further, the JAC1 mutation created in that study showed a negative genetic interaction with deletion of SSQ1: the double mutants could not be recovered (Voisine et al. 2001). Additionally, Andrew and colleagues (2006) demonstrated that both Jac1p and Ssq1p bind the C-terminal domain of Isup, indicating that the two proteins interact with a common component of the Fe/S cluster biogenesis pathway. Importantly, although Jac1p can also pair with the more abundant mtHsp70 Ssc1p, in vitro it stimulates the ATPase activity of Ssq1p to a greater degree than it stimulates Ssc1p from either pre- or post-duplication yeasts (Schilke et al. 2006). Recently, new insight has been gained into the genetic basis of the differences that evolved at the JAC1 locus and are responsible for the more efficient stimulation of Ssq1p ATPase activity; these differences involve shortening of the J-domain (Marszalek, unpublished). JAC1 from S. cerevisiae was engineered to include a section of the J-domain from the pre-duplication yeast Y. lipolytica. The elongated J-domain more closely resembled JAC1 sequences from yeasts that encode Ssc1p but lack Ssq1p. The chimeric protein stimulated Ssc1p in S. cerevisiae more effectively than native Jac1p did. Therefore, the portion of the J-domain lost in yeasts encoding Ssq1p may be important for interaction with mtHsp70s, and the increased affinity of Jac1p for Ssq1p compared to Ssc1p may have been due to this J-domain modification. The functional specialization of the Jac1p-Ssq1p pair emerged through the sequence of evolutionary events that followed the duplication of an ancient, multifunctional mtHsp70. In turn, the divergence of the paralogs SSC1 and SSQ1 may have shaped the evolution of JAC1 and molded this J-protein into an Fe/S cluster assembly specialist as well.

Patterns and Mechanisms of Gene Evolution Following Duplication

Because SSQ1 and SSC1 originated from a gene duplication event in a yeast lineage, it is important to understand how the presence of paralogous genes within a genome can affect evolutionary divergence. Gene duplication plays a prominent role in molecular evolution as a mechanism that generates the genetic material needed for the genomic variation responsible for biological diversity. When the ancestral, multifunctional mtHsp70 gene duplicated, it created the potential for novel adaptations unattainable in the single-copy state. However, the evolutionary fate of gene duplicates depends on two distinct processes: (1) initial retention of the duplicate within a population, and (2) one of several alternative modes of paralog divergence.
While many models exist to describe the modes of gene duplicate evolution, those described here have come to the forefront of research (Hurles 2004). Neofunctionalization and nonfunctionalization are two models of gene duplicate evolution first put forth by Ohno (1970) to describe the resolution of functionally redundant paralogs. Common to both models is the assumption that a gene duplication event has no effect on organismal fitness, because immediately after duplication the paralogs are equivalent, with each gene copy capable of fulfilling all functions of the ancestral gene equally well. The gene copies are expected to be interchangeable while both paralogs retain high sequence identity, which frees the new duplicate from selective constraint. The duplicate is therefore free to accumulate mutations that would have been forbidden in the ancestral single-copy state, because any loss of function it sustains would be rescued by the redundant copy. Under this premise, the neofunctionalization and nonfunctionalization models both predict an asymmetry in evolutionary rates between paralogous genes, with one copy subject to purifying selection to retain ancestral functions and the other exhibiting accelerated substitution due to relaxed constraint. Nonfunctionalization occurs when the period of relaxed constraint on the duplicated gene copy results in the accumulation of deleterious mutations that degrade all functions of the ancestral gene without creating new ones. These mutations may occur within the protein-coding region, the regulatory region, or both, and eventually lead to pseudogenization. This is likely to be the most common fate of gene duplicates (Li 1980).
Once a gene has sustained a null mutation and is no longer functional, it is selectively eliminated from the genome, leaving the non-mutated paralog permanently preserved. Neofunctionalization describes a scenario in which, during the initial period of relaxed selection on the duplicate gene, mutations are acquired in the coding or regulatory sequence that give the encoded protein a novel function. Such mutations are thought to be rare relative to those leading to nonfunctionalization. Positive selection to optimize the novel function of the neofunctionalized paralog is then followed by the reassertion of selective constraint to preserve the new function. If neofunctionalization results in the loss of an ancestral gene function, this process, too, leads to retention of the non-mutated paralog.

Under a third model of paralog resolution, known as subfunctionalization, the tasks of a multifunctional ancestral gene become partitioned between the two duplicate genes. Duplication-Degeneration-Complementation (DDC) (Force et al. 1999) is one process by which subfunctionalization is thought to occur: degenerative mutations facilitate the preservation of both paralogs, which become dedicated to complementary subsets of modular ancestral functions. DDC assumes that the ancestral gene has distinct functions attributable to independent, modular regions of the gene. Following duplication, both paralogs acquire complementary loss-of-function mutations during a period of relaxed selection, such that expression of both paralogs is necessary to reconstitute the full repertoire of functions encoded by the ancestral gene. Escape from adaptive conflict is yet another alternative model of paralog divergence.
The premise of this model is that, if an ancestral gene gains an additional novel utility that is in adaptive conflict with its first function, the creation of a duplicate gene could confer an immediate fitness advantage by freeing the ancestral gene from antagonistic pleiotropy. Through divergent selection, each paralog would have the opportunity to specialize in at least one of the ancestral functions to a greater degree than was possible in the ancestral gene. If the ancestral gene was constrained by competing phenotypes conferred by a single gene, the functional partitioning between duplicates, or the rise of a novel function after duplication, could proceed in a non-neutral manner, driven by an adaptive advantage. Strong evidence for gene duplicate evolution by escape from adaptive conflict has been reported for the regulatory divergence of paralogs in the S. cerevisiae galactose utilization pathway (Hittinger and Carroll 2007). GAL1 and GAL3 arose as duplicates of a bifunctional ancestral gene and encode the galactokinase enzyme Gal1p and the co-inducer Gal3p, respectively. While they once shared a common promoter in the bifunctional ancestral gene, near-complete subfunctionalization of the upstream promoters between the descendant paralogs has resulted in stringent transcriptional regulation of GAL1, in contrast to a more modest GAL3 transcriptional response to induction. The authors swapped the promoter sequences of GAL1 and GAL3, replaced the native paralog promoters with a bifunctional GAL1/GAL3 promoter found in another yeast species, and evaluated the fitness consequences of these changes. Switching the GAL1 and GAL3 promoters was detrimental, indicating that each promoter had diverged to optimize the expression of GAL1 and GAL3 individually.
While the promoter of the bifunctional gene performed well in driving the expression of GAL3 and maintaining yeast fitness, regulation of GAL1 by the bifunctional promoter reduced basal expression and decreased yeast fitness. The spacing of transcriptional activator binding sites within the bifunctional gene's promoter was then altered to mimic the binding-site arrangement of the GAL1 promoter. The manipulated bifunctional promoter improved expression control of the galactokinase function in response to galactose. Adaptive conflict was therefore proposed to have compromised the optimization of galactokinase expression in the bifunctional ancestral gene with its single promoter; only after duplication and promoter divergence was galactokinase expression brought under tighter regulation. Recently, an additional example of biochemical evidence for gene evolution via escape from adaptive conflict was presented in a study of a set of genes in a pigment biosynthetic pathway in the morning glory, Ipomoea purpurea (Des Marais and Rausher 2008). The dihydroflavonol-4-reductase (DFR) gene, responsible for the chemical reduction of flavonoid precursors of anthocyanin, has given rise to three gene copies, DFR-A, DFR-B, and DFR-C, through two gene duplication events. Biochemical assays were conducted for the enzymatic reduction of five substrates (three commonly and two rarely reduced by DFR) by the DFR copies encoded by both pre- and post-duplication species. Compared with the pre-duplication DFR enzymes, post-duplication DFR-A and DFR-C showed severe reductions in their capacity to act on any of the five substrates tested, whereas post-duplication DFR-B showed an increased capacity to reduce all of them. The authors therefore concluded that the function of the ancestral gene was improved in the DFR-B copy.
The release of the adaptive constraint imposed by antagonistic pleiotropy on the ancestral DFR following gene duplication was consistent with comparative DNA sequence-based evidence of adaptive molecular evolution. While the above models of paralog evolution offer gene optimization through specialization, or the acquisition of novel roles, as long-term fitness advantages of gene duplication, short-term benefits must also exist to account for the retention of paralogs immediately after duplication. The presumed selective neutrality of gene duplication that is fundamental to Ohno's models offers no such short-term benefit. The interim retention of duplicate genes on the path to neofunctionalization, subfunctionalization through DDC, or escape from adaptive conflict also requires a fitness advantage. Kondrashov et al. (2002) have suggested that gene duplication itself may be a mechanism of adaptation, hypothesizing that survival in the face of environmental stress may demand an increase in protein and/or RNA dosage that can be achieved immediately through an increase in gene copy number. An environmentally determined optimum copy number may thus exist for each gene under a given set of conditions (Kondrashov et al. 2002). Several studies performed with yeast suggest that environmental conditions may influence gene copy number. For example, when a population of S. cerevisiae was experimentally propagated for 450 generations in glucose-limited media, it evolved a higher cell yield per unit of glucose than the ancestral strain through a glucose transport system with enhanced glucose affinity, which resulted from multiple tandem duplications of two hexose transport genes (Brown et al. 1998).
The amplification of genes within the Hsp70 gene family may similarly be driven in yeasts as a mechanism to tolerate variation in heat, pH, ethanol, and other conditions, facilitating the exploration of new environments. The role that selection plays in determining the evolutionary fates of gene duplicates, from initial retention in the genome to degeneration, neofunctionalization, subfunctionalization, or intermediate states of paralog divergence, distinguishes the patterns of gene evolution following duplication described above. Though the previously discussed modes may not be mutually exclusive, and no single model of paralog evolution may serve as a general mechanism applicable to all gene duplication events, characterizing the direction and strength of past and present selective forces acting on paralogs is proving key to elucidating molecular evolutionary outcomes.

Detecting Signatures of Selection by Evolutionary Rate Comparisons

At the level of DNA, nucleotide mutations arise stochastically, and their fixation within a population ultimately depends either on natural selection acting on the fitness conferred by the resulting phenotype or on random genetic drift. Within protein-coding DNA, signatures of selection can be identified by comparing the proportions of nonsynonymous and synonymous nucleotide changes that have occurred through time. Nonsynonymous nucleotide substitutions are those that result in the substitution of an amino acid, changing both the DNA codon and the translated peptide sequence. Synonymous nucleotide substitutions, on the other hand, are those that do not alter the amino acid of the corresponding protein.
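The distinction between synonymous and nonsynonymous substitutions can be made concrete with a codon lookup. The Python sketch below assumes the standard nuclear genetic code (NCBI translation table 1), which applies to nuclear genes such as SSC1, SSQ1, and JAC1 but not to genes encoded in yeast mtDNA; yeasts of the CTG clade, which read CUG as serine, would also need an adjusted table. The names CODON_TABLE and classify_change are illustrative and do not come from any particular library.

```python
BASES = "TCAG"
# Amino acids for codons in the order TTT, TTC, TTA, TTG, TCT, ..., GGG ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_change(codon_a: str, codon_b: str) -> str:
    """Classify a single-nucleotide difference between two codons.

    A change to or from a stop codon ("*") is counted as nonsynonymous here.
    """
    a = codon_a.upper().replace("U", "T")
    b = codon_b.upper().replace("U", "T")
    if len(a) != 3 or len(b) != 3:
        raise ValueError("codons must be three nucleotides long")
    if sum(x != y for x, y in zip(a, b)) != 1:
        raise ValueError("this sketch handles codons differing at exactly one position")
    return "synonymous" if CODON_TABLE[a] == CODON_TABLE[b] else "nonsynonymous"


if __name__ == "__main__":
    print(classify_change("CTT", "CTC"))  # synonymous (Leu -> Leu)
    print(classify_change("GAA", "GAT"))  # nonsynonymous (Glu -> Asp)
```

The third codon position of CTT can change freely without altering leucine, whereas the same kind of change in GAA converts glutamate to aspartate, which is why third positions contribute most of the synonymous sites in a gene.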
Synonymous mutations exist because of the degeneracy of the genetic code, which allows some amino acids to be specified by several different nucleotide triplets. Synonymous changes are presumed to be invisible to selection acting on protein phenotypes and are therefore assumed to represent the locus-specific background level of mutations fixed by neutral processes such as population bottlenecks or genetic hitchhiking. In the context of sequence evolution, the number of nonsynonymous differences per nonsynonymous site between two genes gives the nonsynonymous substitution rate, $d_N$, while the number of synonymous differences per synonymous site gives the synonymous substitution rate, $d_S$. Expressing these two rates as the ratio $\omega = d_N / d_S$ provides a gauge of the direction and strength of selection. An $\omega$ value of less than 1 indicates that the fixation of mutations resulting in amino acid changes is more restricted than the basal mutation level. Therefore, $\omega < 1$ is indicative of negative or purifying selection, which reduces the rate of fixation of deleterious mutations. When $\omega$ is greater than 1, it may be inferred that positive selection is responsible for the greater-than-expected fixation of amino acid changes. An $\omega = 1$, resulting from equal rates of nonsynonymous and synonymous change, points to neutral evolution of codons, with amino acid substitutions neither selected for nor against. Therefore, the greater the deviation of $\omega$ from 1, the greater the influence of selection. While evidence of negative selection may identify DNA sequence coding for protein regions that require strict structural conservation, uncovering signatures of positive selection is of particular importance in the search for evidence of adaptive evolution.
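To show how $d_N$, $d_S$, and $\omega$ can be obtained from a pair of aligned coding sequences, the Python sketch below follows the site-counting logic of the Nei-Gojobori method with a Jukes-Cantor correction for multiple hits. It is a simplified illustration under stated assumptions, not the estimation procedure used in this study: it assumes the standard genetic code, skips stop codons, leaves mutations to stop codons out of the site counts, and skips codons that differ at more than one position instead of averaging over mutational pathways. The function names and the toy sequences are hypothetical, and the interpretation step ignores statistical significance, which real analyses must test.

```python
import math

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def synonymous_sites(codon: str) -> float:
    """Synonymous sites in one sense codon: per position, the share of
    non-stop point mutations that keep the same amino acid."""
    aa = CODON_TABLE[codon]
    sites = 0.0
    for pos in range(3):
        syn = total = 0
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant_aa = CODON_TABLE[codon[:pos] + base + codon[pos + 1:]]
            if mutant_aa == "*":
                continue  # mutations to stop codons are left out
            total += 1
            if mutant_aa == aa:
                syn += 1
        if total:
            sites += syn / total
    return sites


def jukes_cantor(p: float) -> float:
    """Correct a proportion of differing sites for multiple substitutions."""
    if p == 0:
        return 0.0
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(seq_a: str, seq_b: str) -> tuple[float, float, float]:
    """Return (dN, dS, omega) for two aligned, gap-free coding sequences."""
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must be aligned codon by codon")
    s_sites = n_sites = 0.0
    s_diffs = n_diffs = 0
    for i in range(0, len(seq_a), 3):
        ca, cb = seq_a[i:i + 3], seq_b[i:i + 3]
        if "*" in (CODON_TABLE[ca], CODON_TABLE[cb]):
            continue  # skip stop codons
        n_pos = sum(x != y for x, y in zip(ca, cb))
        if n_pos > 1:
            continue  # simplification: multi-difference codons are not path-averaged
        s = (synonymous_sites(ca) + synonymous_sites(cb)) / 2
        s_sites += s
        n_sites += 3 - s
        if n_pos == 1:
            if CODON_TABLE[ca] == CODON_TABLE[cb]:
                s_diffs += 1
            else:
                n_diffs += 1
    if s_sites == 0 or n_sites == 0:
        raise ValueError("no synonymous or no nonsynonymous sites to compare")
    d_n = jukes_cantor(n_diffs / n_sites)
    d_s = jukes_cantor(s_diffs / s_sites)
    omega = d_n / d_s if d_s > 0 else math.nan
    return d_n, d_s, omega


def interpret(omega: float) -> str:
    """Naive reading of omega; a real analysis would also test significance."""
    if math.isnan(omega):
        return "undefined (no synonymous divergence)"
    if omega < 1:
        return "purifying (negative) selection"
    if omega > 1:
        return "positive selection"
    return "neutral evolution"


if __name__ == "__main__":
    # Hypothetical pair differing by one synonymous change (CTG -> CTA, both Leu).
    d_n, d_s, omega = dn_ds("CTGCTGCTGCTGATG", "CTACTGCTGCTGATG")
    print(f"dN={d_n:.4f} dS={d_s:.4f} omega={omega:.2f} -> {interpret(omega)}")
```

Because $\omega$ is a ratio, it is undefined when no synonymous differences are observed, which is one reason very closely related sequences are poorly suited to this kind of comparison.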
Calculating an average $\omega$ for an entire protein-coding gene through pairwise sequence comparison detects positive or negative selection only when it acts across the gene as a whole, and such gene-wide evidence of positive selection is found in only a very small proportion of gene sequences. For example, in a large-scale study of 3,595 groups of homologs comprising 24,832 unique sequences, only 17 gene groups (0.45% of the total) emerged as candidates for positive selection (Endo et al. 1996). Such estimates of the prevalence and scope of positive selection, however, may be misleading. Because gene-wide mean $\omega$ values mask site-to-site heterogeneity in the action of natural selection, they may not accurately represent the strength and direction of the selective pressures experienced by a gene. Strict interpretations of gene-wide average $\omega$ values may overlook the high $\omega$ at which a few sites of a gene are evolving, veiled by the low $\omega$ values that characterize the majority of sites. In support of this reasoning, Yang and Swanson (2002) used several models to estimate the number of codons subject to positive selection in two sets of gene sequence alignments: 192 human class I MHC glycoprotein alleles, and abalone sperm lysin genes from 25 different species. For both groups of genes, when $\omega$ was averaged across all sites, neither the class I MHC glycoproteins nor the sperm lysin genes yielded an $\omega$ value above 1. In contrast, when a model permitting $\omega$ to vary from codon to codon was applied, a number of sites emerged as likely targets of positive selection, with $\omega$ significantly greater than 1.
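The masking effect described above can be illustrated with a few lines of arithmetic. In codon models that divide sites into discrete classes, each with its own $\omega$, the average $\omega$ across sites is the proportion-weighted mean of the class values, so a small class with $\omega > 1$ can sit inside a gene whose average is well below 1. The Python sketch below uses hypothetical site-class values, not estimates from Yang and Swanson (2002) or any real alignment. It also shows, under the assumption of nested models differing by two parameters (as in comparisons of the M7 and M8 site models of PAML's codeml program), how a likelihood ratio statistic is compared with a chi-square distribution with 2 degrees of freedom; the log-likelihoods are likewise made up, and the function names are illustrative.

```python
import math


def mean_omega(site_classes: list[tuple[float, float]]) -> float:
    """Proportion-weighted mean omega over (proportion, omega) site classes."""
    if abs(sum(p for p, _ in site_classes) - 1.0) > 1e-9:
        raise ValueError("site-class proportions must sum to 1")
    return sum(p * w for p, w in site_classes)


def positively_selected_fraction(site_classes: list[tuple[float, float]]) -> float:
    """Share of sites that fall in classes with omega > 1."""
    return sum(p for p, w in site_classes if w > 1)


def lrt_df2(lnl_null: float, lnl_alt: float) -> tuple[float, float]:
    """LRT statistic 2 * (lnL_alt - lnL_null) and its chi-square (2 df) p-value."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # For 2 degrees of freedom the chi-square upper tail is exactly exp(-x / 2).
    return stat, math.exp(-stat / 2.0)


if __name__ == "__main__":
    hypothetical = [(0.85, 0.05), (0.10, 1.0), (0.05, 4.0)]
    print(mean_omega(hypothetical))                    # gene-wide average well below 1
    print(positively_selected_fraction(hypothetical))  # yet 5% of sites have omega = 4
    print(lrt_df2(-1000.0, -995.0))                    # made-up log-likelihoods
```

With these values the average $\omega$ is about 0.34, which a gene-wide analysis would read as purifying selection, even though one site class in twenty evolves at $\omega = 4$.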
This result held whether the sites of each gene alignment were partitioned a priori into two groups with different evolutionary rates (based on functional information about the encoded structures), or whether sites of different evolutionary rate classes were assumed to be distributed randomly across the sequence. The results of this study highlight the need to account for evolutionary rate heterogeneity to uncover patterns of selection. Several models have been developed to identify differential rates of molecular evolution that can result from selective pressures unique to particular sites and organismal lineages.

Correlated Evolution vs. Co-Adaptation

Differential selective pressures acting on individual sites may be particularly relevant to detecting the coevolution of interacting protein partners. Hakes et al. (2007) have suggested that a distinction must be made between correlated evolution and co-adaptation among protein sites to describe coevolution more specifically. Correlated evolution is the concurrent change among interface residues of interacting proteins that is not necessarily driven by selective forces arising from the protein-protein interaction itself. Co-adaptation, in contrast, is driven by selection to maintain the functional and structural integrity of the protein pair and so preserve their cooperative abilities. It results in compensatory change between interacting protein partners. A compensatory mutation may be fixed in response to an amino acid substitution in a region of either protein that serves as the point of contact with the other partner. Of the proteins investigated by Hakes et al. (2007), an average of only 13% of the total protein sequence corresponded to exposed residues directly involved in specific binding at the interface with an interacting protein.
Because such interface patches comprise only a minority of residues, only a small part of each protein may experience selective pressure from an interacting partner for inter-protein compensatory change. Therefore, correlated increases in the evolutionary rate of the whole protein sequences of interaction partners do not constitute conclusive evidence for co-adaptation. Instead, correlated evolution among physically interacting proteins that is detected by whole-gene evolutionary rate analysis may point to other targets of selection, unrelated to the interaction of residues at binding surfaces. For instance, gene expression is known to heavily influence the rate of gene evolution (reviewed in Pal et al. 2006). Cooperative proteins often depend on specific stoichiometric ratios of active partners within the cell for efficient interaction. Selection for a change in the expression of one protein can therefore lead to selective pressure for a corresponding expression change in the other. The resulting expression levels may then cause evolutionary rate changes across the whole protein sequence of both partners. These changes would be detected as correlated evolution without any underlying adaptation that optimizes inter-protein residue binding. Therefore, evolutionary rate models that account for site-to-site differences are more likely to identify compensatory mutations resulting from co-adaptation. In addition to acting in a targeted manner within a protein-coding gene, adaptive coevolution has been shown to occur in episodic bursts (Messier and Stewart 1997).
Pairwise calculation of whole-gene ω ratios effectively averages any strong but transient periods of positive selection over the phylogenetic history of two sequences. It may therefore miss evidence of divergent adaptive evolution when lineage-specific information is not taken into account. Events at particular points in a phylogenetic history, such as gene duplication or environmental changes affecting ecological niches, may impose divergent selective pressures on two protein-coding sequences. Divergent selection can therefore cause the ω values of a gene to differ among clades, reflecting different selective pressures acting on different branches of a phylogeny.

Site-Specific Models

Site-specific models define the codon as the unit of evolution and employ a codon-based substitution model to describe site-specific variation in evolutionary rate. The codon substitution model uses all of the information encoded in the DNA at the nucleotide level. It improves upon nucleotide substitution models in its representation of molecular evolution by recognizing that amino acids are encoded as nucleotide triplets. Importantly, considering the amino acid sequence that results from a sequence of codons allows synonymous and nonsynonymous mutations to be differentiated (Goldman and Yang 1994). Codon models rely on several simplifying assumptions. First, the codon model assumes that the DNA sequences under study are protein-coding and does not consider untranslated sequences such as introns. Second, codons that signal translational termination are excluded from the codons that can result from substitution. These stop codons most often generate a truncated protein and are generally not tolerated within organisms (Nielsen and Yang 1998).
Lastly, only one of the three nucleotide positions of a codon is assumed to change per mutation event (Goldman and Yang 1994). For example, an AGG to CGA change would require more than one step under the codon model of evolution. The site-specific model of codon substitution assigns each codon of a multiple sequence alignment a probability of falling within each of a predefined number of evolutionary rate categories. Fitting nested models with increasing numbers of rate categories allows statistical tests to determine the optimum number of rate categories. The evolutionary rate for each rate category is estimated from the data. Each codon can then be assigned to a particular rate class. It can be categorized as evolving under positive or negative selection, and at what magnitude, according to whether its ω lies above or below 1 and by how much. Maximum likelihood estimation of site-specific rates of evolution can be conducted using fixed-sites models or random-sites models. Fixed-sites models use structural and functional information about a protein of interest to identify, a priori, specific amino acids predicted to be under equal selective pressures. Random-sites models, in contrast, make no prior assumptions about the evolutionary rate of any particular site. Yang and Swanson (2002) analyzed the site-specific rates of evolution of class I MHC and sperm lysin genes. For both gene data sets, the estimated proportion of codons in each evolutionary rate class and the estimated values of ω were highly consistent between fixed- and random-sites models. The authors demonstrated that partitioning codons into rate classes prior to ω estimation is not necessary, because the random-sites model was just as powerful. For the fixed-sites models, the residues were classified into
evolutionary rate groups corresponding to the functional or structural regions of the proteins believed to be under unique selective pressures.

Branch-Specific Models

Several models have also been developed to examine lineage-specific heterogeneity in ω. The simplest lineage-specific model includes only one ω parameter, which assumes the same gene-wide average ω for every branch of a phylogenetic tree. The number of ω values across lineages may be increased to test whether a gene on one lineage identified a priori (with ω1) is evolving at an overall rate that differs significantly from the homogeneous rate (ω0) characterizing the gene on all other branches of the tree. The number of estimated branch-specific ω parameters may be increased until maximum model complexity is reached with the “free-ratio” model, in which an independent gene-wide ω value is estimated for each branch of the tree. An important distinction of the branch-specific model compared with the site-specific model is the ability to examine internal branches of a phylogenetic tree. Sequences are generally unavailable for the ancestral lineages represented by internal branches, so the ability to detect positive selection in ancestral sequences makes this model powerful. A branch-specific test allows one to correlate a phylogenetic branch with known historical events, such as ecological shifts or a gene duplication, and to hypothesize the source of increased selection.
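The nesting of branch models can be made concrete by counting ω parameters. An unrooted, fully bifurcating tree of n sequences has 2n - 3 branches. The one-ratio model estimates a single ω. A two-ratio model estimates one ω for the foreground lineage chosen a priori and one for all other branches. The free-ratio model estimates one ω per branch. The difference in ω parameter counts gives the degrees of freedom for a likelihood ratio test between two nested branch models. This sketch counts only ω parameters, assumes an unrooted tree, and uses hypothetical helper names.

```python
def branch_count(n_taxa: int) -> int:
    """Number of branches in an unrooted, fully bifurcating tree."""
    if n_taxa < 3:
        raise ValueError("an unrooted tree needs at least three taxa")
    return 2 * n_taxa - 3


def omega_parameters(model: str, n_taxa: int) -> int:
    """Number of free omega parameters under the branch models described in the text."""
    if model == "one-ratio":
        return 1
    if model == "two-ratio":
        return 2
    if model == "free-ratio":
        return branch_count(n_taxa)
    raise ValueError(f"unknown model: {model}")


def lrt_degrees_of_freedom(null_model: str, alt_model: str, n_taxa: int) -> int:
    """Extra omega parameters in the alternative model relative to the null."""
    return omega_parameters(alt_model, n_taxa) - omega_parameters(null_model, n_taxa)


# Example: a tree of seven sequences
print(lrt_degrees_of_freedom("one-ratio", "two-ratio", 7))
print(lrt_degrees_of_freedom("one-ratio", "free-ratio", 7))
```

For seven sequences the tree has 11 branches, so the comparison of free-ratio with one-ratio has 10 degrees of freedom, while the comparison of two-ratio with one-ratio has 1.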
Yang (1998) used a branch-specific test to examine a lysozyme gene in primates. The gene in the ancestral primate lineage leading to the Hominoid species group had a higher overall nonsynonymous to synonymous substitution rate ratio than the other ancestral and present-day Colobine, Cercopithecine, Hominoid, and New World Monkey primate lineages examined. Furthermore, the average ω of the lysozyme gene inferred along the branch leading to the Hominoids was greater than 1. This indicates that the lysozyme gene was likely under divergent positive selection during this period of primate phylogenetic history and rejects a strictly neutral mechanism of evolution.

Branch-Site Models

The principles of site- and branch-specific estimation of ω have also been combined to design models that test hypotheses of gene evolution with even more specificity. These methods allow one to gather evidence for hypotheses concerning particular points in evolutionary history. For example, they could identify an instance in which an ancestral gene duplication changed the selective pressures acting on the gene under investigation. A ‘branch-site’ test detects individual codons that may have evolved under positive selection along specified branches. The branch specified a priori as that hypothesized to be under positive selection is denoted the “foreground” branch. It is compared with all other branches of the tree, the “background” branches, with respect to its site-specific ω distribution.
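The site classes of the branch-site test can be tabulated directly. The sketch below follows branch-site model A as described by Zhang et al. (2005), with the parameter values assumed and purely illustrative. Class 0 has 0 < ω0 < 1 and class 1 has ω = 1 on all branches. Classes 2a and 2b carry ω2 on the foreground branch but ω0 or 1, respectively, on background branches. Classes 2a and 2b split the remaining probability in the ratio p0 : p1. Under the null model ω2 is fixed at 1, and under the alternative it is estimated subject to ω2 ≥ 1.

```python
from dataclasses import dataclass


@dataclass
class SiteClass:
    name: str
    proportion: float
    background_omega: float
    foreground_omega: float


def branch_site_model_a(p0: float, p1: float, omega0: float, omega2: float) -> list:
    """Site classes of branch-site model A (Zhang et al. 2005).

    p0, p1: proportions of the conserved (0 < omega0 < 1) and neutral (omega = 1) classes.
    omega2: foreground omega of classes 2a and 2b (fixed at 1 under the null model).
    """
    if not 0.0 < omega0 < 1.0:
        raise ValueError("omega0 must lie strictly between 0 and 1")
    if omega2 < 1.0:
        raise ValueError("omega2 must be at least 1")
    if p0 <= 0.0 or p1 < 0.0 or p0 + p1 > 1.0:
        raise ValueError("need p0 > 0, p1 >= 0 and p0 + p1 <= 1")
    rest = 1.0 - p0 - p1
    return [
        SiteClass("0", p0, omega0, omega0),
        SiteClass("1", p1, 1.0, 1.0),
        SiteClass("2a", rest * p0 / (p0 + p1), omega0, omega2),
        SiteClass("2b", rest * p1 / (p0 + p1), 1.0, omega2),
    ]


# Hypothetical parameter values for illustration only
for c in branch_site_model_a(p0=0.7, p1=0.2, omega0=0.05, omega2=4.0):
    print(f"class {c.name}: p = {c.proportion:.3f}, background = {c.background_omega}, "
          f"foreground = {c.foreground_omega}")
```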
The detection of positive selection along the specified branch relies on rejecting the null hypothesis that ω on the foreground branch is fixed at 1, the value expected under neutral evolution. When the null hypothesis is rejected, codons are identified along the foreground branch whose ω is greater than that of the background branch sequences and also greater than 1. To reduce the rate of false positives, the positively selected foreground sites are divided into two categories. These are 1) a class in which the ω of background sites is constrained to 0 < ω0 < 1, and 2) a class in which the ω of background sites is fixed at 1. This technique gives a more accurate estimate of the ω of background branches, against which the foreground sites are compared for evidence of positive selection (Zhang et al. 2005).

Clade Models

In addition to the branch-site model, the clade model allows evolutionary rates to be compared simultaneously among codons within a gene and among branches of a phylogeny. The clade model rate test combines patterns of substitution rate heterogeneity across a gene sequence with lineage-dependent rate disparities. However, clade models differ from branch-site models in two important respects. First, the sequences under analysis must represent at least two clades, where a clade is defined as a group that includes all of the taxa descended from a common ancestor (a condition described as monophyly). Second, clade models do not require an ω > 1 to detect a significant difference in evolutionary rates between foreground and background branches. Statistical comparisons of nested clade models can reveal evolutionary rate accelerations or decelerations, which represent a potential relaxation or increase of selective constraint, respectively. A clade chosen a priori is compared with all other clades on the tree with respect to its site-specific ω distribution.
These two clades are often called the “foreground” and “background” clades, respectively. Ultimately, the analysis identifies individual codons that evolve at a different rate in one lineage than equivalent codons in another lineage. Individual amino acids may therefore be examined as candidates responsible for functional divergence within a protein. The clade model was first used to test for divergence in selective pressure between the ε and γ globin genes (Bielawski and Yang 2003). These paralogs encode subunits of the oxygen-binding protein hemoglobin in placental mammals. Following the gene duplication that created the ε and γ globins, selection is thought to be responsible for the divergence in their expression patterns, which led to delayed, post-embryonic γ globin expression in the simian primate lineage. In contrast, ε globin has maintained the ancestral expression pattern and remains confined to the embryonic life stage in all placental mammals. Under the clade model, approximately 16% of the codons common to the ε and γ globins were found to be evolving under divergent selective pressures. The ε globin codons in this rate category were evolving under very strong purifying selection (ω = 0.008), and the corresponding γ globin codons were evolving under weak purifying selection (ω = 0.79). The twelve codons that made up the class of divergently evolving sites between the ε and γ globin clades were subsequently mapped onto three-dimensional globin structures. Most of the encoded residues proved to belong to major structural and functional features of hemoglobin, including the region responsible for oxygen affinity.
The authors concluded that most globin sites evolve at similar rates in the ε and γ globin clades and display substantial selective constraint. By contrast, the twelve codons of the divergent ω category encode residues likely to have been important for the expansion of γ globin expression into the fetal developmental stage following gene duplication.

Tools for Evolutionary Rate Analysis

One popular tool for modeling the heterogeneous nature of molecular evolutionary rates is the package of computer programs known collectively as PAML, or Phylogenetic Analysis by Maximum Likelihood. Among other functions, PAML implements maximum likelihood methods in a phylogenetic context to estimate synonymous and nonsynonymous substitution rates. Given a sequence alignment and a phylogenetic tree topology, the estimates can then be used to test hypotheses of site- and lineage-specific ω variation. Included within PAML is codeml, a program that can perform the site-specific, branch-specific, branch-site, and clade model tests. The user supplies three inputs. These are a multiple sequence alignment file, a tree topology that describes a hypothesis of evolutionary relationships among the input sequences, and a control file that specifies the model with either initial or fixed parameter values. A strength of PAML is its ability to optimize parameters that describe trends unique to individual data sets of protein-coding sequences by numerically maximizing the log likelihood. The likelihood score reflects the probability of observing a set of data given a particular model of evolution and phylogenetic tree. The parameters describing patterns of sequence change, on which the model and tree depend, are optimized simultaneously within the likelihood calculation.
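As an illustration of the control file, the sketch below builds the text of two codeml control files for the branch-site test. One is the alternative model (ω2 estimated) and the other the null model (ω2 fixed at 1). The option names (seqfile, treefile, outfile, runmode, seqtype, CodonFreq, model, NSsites, fix_kappa, kappa, fix_omega, omega) are standard codeml.ctl keys. Branch-site model A corresponds to model = 2 with NSsites = 2. The file names are hypothetical placeholders, and the values should be checked against the PAML manual for the version in use. The foreground branch is labelled in the tree file itself, by appending #1 to that branch, which is not shown here.

```python
from pathlib import Path

# Standard codeml.ctl keys; the file names are hypothetical placeholders.
BASE_SETTINGS = {
    "seqfile": "ssq1_codons.phy",
    "treefile": "ssq1_foreground.tree",
    "runmode": 0,     # use the user-supplied tree topology
    "seqtype": 1,     # codon sequences
    "CodonFreq": 2,   # F3x4 equilibrium codon frequencies
    "model": 2,       # branch-site model A: model = 2 ...
    "NSsites": 2,     # ... together with NSsites = 2
    "fix_kappa": 0,   # estimate kappa
    "kappa": 2,       # initial kappa value
}

# Settings that distinguish the alternative and null branch-site runs
BRANCH_SITE_RUNS = {
    "alternative": {"outfile": "alt.mlc", "fix_omega": 0, "omega": 1.5},  # omega2 estimated, 1.5 is a starting value
    "null": {"outfile": "null.mlc", "fix_omega": 1, "omega": 1},          # omega2 fixed at 1
}


def control_text(settings: dict) -> str:
    """Render settings as right-aligned 'key = value' lines."""
    width = max(len(key) for key in settings)
    return "\n".join(f"{key.rjust(width)} = {value}" for key, value in settings.items()) + "\n"


def write_branch_site_pair(directory: Path) -> list:
    """Write alternative.ctl and null.ctl into directory and return their paths."""
    directory.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, extra in BRANCH_SITE_RUNS.items():
        path = directory / f"{name}.ctl"
        path.write_text(control_text({**BASE_SETTINGS, **extra}))
        paths.append(path)
    return paths
```

Each written file would then be passed to codeml as its control-file argument. The two resulting log likelihoods are the inputs to the likelihood ratio test.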
Optimized parameters include the transition/transversion rate ratio (κ), the genetic distances among sequences used to infer branch lengths (t), and the nonsynonymous to synonymous substitution rate ratio (ω). Equilibrium codon frequencies influence the optimization of κ, t, and ω, and PAML therefore calculates them from the nucleotide or codon composition of the sequence alignment. Multiple local maxima may occur in the likelihood surface (Suzuki and Nei 2001). It is therefore important to run PAML from several different initial parameter values to ensure that the likelihood space is sufficiently explored. Different codon frequency calculation methods should also be tried, to ensure that the parameters are optimized robustly and yield the greatest likelihood score. Ignoring codon bias has been observed to influence ω estimates even more strongly than ignoring κ, since codon bias is an influential source of unequal substitution rates among codons (Bielawski and Yang 2004b). Additionally, replicate PAML tests should be performed using alternative input tree topologies if several topologies have strong statistical support. Because the “true” phylogeny of a set of sequences cannot be known, it is important to show that the results of evolutionary rate analysis do not depend on any one tree topology and that the test outputs agree on a common conclusion (Bielawski and Yang 2004a). Outputs obtained from multiple runs can subsequently be compared by their log likelihood scores in a likelihood ratio test, which evaluates the difference between a pair of nested models with different numbers of parameters. In this “goodness-of-fit” test, the simpler model represents the null hypothesis.
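The likelihood ratio test can be sketched in a few lines. The sketch assumes hypothetical log likelihoods from two nested runs and computes the statistic as twice their difference. It then takes the upper-tail probability of a χ² distribution whose degrees of freedom equal the number of extra parameters. To stay within the Python standard library, the tail probability uses erfc for odd degrees of freedom and exp for even ones, plus the standard recurrence in the degrees of freedom, rather than a statistics package.

```python
import math


def chi2_survival(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df.

    Starts from Q(1) = erfc(sqrt(x/2)) or Q(2) = exp(-x/2) and applies
    Q(k + 2) = Q(k) + (x/2)**(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0.0:
        return 1.0
    if df % 2:
        q, k = math.erfc(math.sqrt(x / 2.0)), 1
    else:
        q, k = math.exp(-x / 2.0), 2
    while k < df:
        q += (x / 2.0) ** (k / 2.0) * math.exp(-x / 2.0) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int):
    """Return (2 * delta lnL, p-value) for nested models."""
    statistic = 2.0 * (lnl_alt - lnl_null)
    return statistic, chi2_survival(statistic, df)


# Hypothetical log likelihoods from a null and an alternative run differing by one parameter
stat, p = likelihood_ratio_test(lnl_null=-2500.00, lnl_alt=-2496.08, df=1)
print(f"2 delta lnL = {stat:.2f}, p = {p:.4f}")
```

A negative statistic, which can arise when the optimizer fails to reach the alternative model's maximum, returns p = 1 here. That outcome is a signal to rerun the optimization from other starting values.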
To perform a likelihood ratio test, one first calculates the likelihood ratio test statistic, which is twice the log likelihood difference between the competing models. This statistic is assumed to approximately follow a χ² distribution, with degrees of freedom equal to the number of additional parameters in the more complex model relative to the simpler model. The χ² distribution therefore gives a critical value for the test statistic. The null hypothesis is not rejected if the statistic does not exceed that critical value (Bielawski and Yang 2004a).

Methodological Limitations

Interpreting the role of selection on a gene through ω estimates for protein-coding regions has the potential to be misleading. For one, the calculation of dS ignores cases in which a nucleotide substitution that does not change the encoded amino acid nonetheless confers a fitness difference. The value of dS may therefore be wrongly taken as the neutral substitution rate. For example, iso-accepting tRNAs with different anticodons may be unequally abundant in the cellular tRNA pool. Sequences that use different nucleotide triplets for the same amino acid may then differ in translational efficiency. Synonymous substitutions may also violate the assumption of neutrality when a nucleotide is shared between genes, as in the overlapping reading frames of many viral genomes (Diamond et al. 1989). There, a mutation that is synonymous in one frame is nonsynonymous in the other. Moreover, nucleotide changes may affect the stability of DNA or RNA molecules if a substitution disrupts secondary structure by eliminating a crucial hydrogen bond. Hammerhead ribozymes, for instance, rely on stem-loop features for recognition, binding, and subsequent cleavage of substrates (Tuschl and Eckstein 1993).
In addition, the alignment of DNA and amino acid sequences is assumed to be error-free, such that each nucleotide within a ‘column’ corresponds to the same codon position in all other genes. However, the “true” alignment of a group of sequences is unknown, and even computer programs using sophisticated alignment algorithms can only infer sequence relationships. Assessment of alignments of simulated DNA sequence data has shown that computer-generated alignments recognize homologous sites less reliably as the length of sequences containing insertions and deletions increases (Nuin et al. 2006). Similarly, the estimation of site- or lineage-specific ω values relies on the topology of a cladogram that describes the ancestral origins and relationships of the sequences in question. However, cladograms represent hypothesized phylogenetic relationships, and the true phylogenetic history of a set of gene sequences can never be known with certainty. Furthermore, factors other than positive selection can cause an ω > 1. For instance, the severe reduction in population size caused by a population bottleneck can decrease the effectiveness of purifying selection. Deleterious mutations that would otherwise be eliminated can then rise to fixation by drift. In some cases, the random nature of mutation may result in the absence of synonymous substitutions, so that a codon may show ω > 1 simply because of chance. Likelihood tests of evolutionary rate heterogeneity do not yet allow such alternative explanations for ω > 1 to be considered statistically (reviewed in Hughes 2007). Finally, and perhaps most importantly, recovering molecular signatures of the direction and intensity of selection is not sufficient to draw conclusions about the phenotypes and fitness effects of the observed mutations.
Instead, evolutionary rate analysis should be used as a springboard for formulating hypotheses that directly investigate (i.e., biochemically, at the molecular level) the fitness costs and benefits to organisms of the products of genes evolving at elevated or decelerated rates. Ultimately, the goal of this line of research should be to uncover the ecological origins of the evolutionary forces that lead to adaptation.

SECTION II: EXPERIMENTAL STUDY

HYPOTHESES AND PREDICTIONS

We sought to investigate the evolutionary patterns of the mitochondrial heat shock proteins involved in Fe/S cluster biogenesis: the paralogous genes SSC1 and SSQ1, plus their interacting J-protein partner, JAC1. Motivation for this study comes from the observation that SSQ1 is an example of a heat shock protein that has become specialized in a particular sub-function of its ancestral gene. Interestingly, that sub-function is an unusual one for chaperones. In this study, we analyzed sequences from four monophyletic fungal groups of comparable within-clade relatedness. Two of these groups diverged from a common ancestor prior to the gene duplication that created SSQ1 (Aspergillus and Fusarium), and two diverged after the duplication event (Saccharomyces and Candida) (see Figure 2). JAC1 is present in each of these clades. This gave us the opportunity to investigate, through extensive comparative analyses of the rate of gene sequence evolution, how the duplication of the ancestral mtHsp70 has influenced the evolutionary paths of SSC1, SSQ1, and JAC1.

Clade            SSC1   SSQ1   JAC1
Saccharomyces     +      +      +
Candida           +      +      +
Fusarium          +      -      +
Aspergillus       +      -      +

Figure 2: Summary of mitochondrial heat shock protein (mtHsp) distribution among fungal clades. The monophyletic fungal groups examined were the Saccharomyces, Candida, Fusarium, and Aspergillus clades.
All four clades encode the multifunctional mtHsp70 SSC1, as well as its interacting mitochondrial J-protein co-chaperone, encoded by JAC1. The “+” and “-” symbols represent the presence or absence of a gene within a clade, respectively. In the lineage indicated by the star, prior to the divergence of the Saccharomyces and Candida clades, the ancestral mtHsp70 underwent a gene duplication. This duplication gave rise to the gene for the specialized mtHsp70 SSQ1 carried by Saccharomyces and Candida taxa.

Our objective was to elucidate molecular patterns of selection to test the following hypotheses:

Hypothesis 1: Selective constraint has been relaxed in SSC1 in the presence of its paralog, SSQ1.

H0: The rate of SSC1 evolution in clades encoding the paralog SSQ1 is equal to the rate of SSC1 evolution in clades that lack SSQ1.

Inability to reject the null would suggest that the SSC1 and SSQ1 paralogs are equivalent and, therefore, functionally redundant. This outcome seems unlikely, because evidence indicates that SSC1 and SSQ1 cannot replace one another, which rules out the functional equivalence of the encoded proteins.

H1: The rate of SSC1 evolution in clades encoding the paralog SSQ1 is not equal to the rate of SSC1 evolution in clades that lack SSQ1.

If SSC1 evolves faster in clades encoding SSQ1 than in clades lacking it, this result would be consistent with a relaxation of selective constraint following duplication of the ancestral mtHsp70 gene. Our a priori prediction is that the rate of SSC1 evolution will be elevated in clades encoding SSQ1 relative to clades that lack SSQ1. Biochemical evidence for the increased affinity of Jac1p for Ssq1p suggests that Ssq1p may be capable of fulfilling the role of Ssc1p in Fe/S cluster biogenesis. Thus, SSQ1 may be able to compensate for any loss-of-function mutations affecting SSC1 at sites important for Fe/S cluster biogenesis.
SSQ1 would negate the need for SSC1 to maintain sites used in the Fe/S cluster assembly pathway, allowing a greater proportion of mutations to be fixed at these sites in SSC1. Alternatively, the rate of SSC1 evolution in clades encoding SSQ1 could be decreased relative to the rate of SSC1 evolution in clades that lack SSQ1. This result would be consistent with an increase in selective constraint on SSC1 when co-occurring with SSQ1. Possible sources of such increased constraint are difficult to identify, because no evidence suggests that SSC1 has attained a novel function or adaptive peak since the mtHsp70 duplication event.

Hypothesis 2: SSQ1 is under less selective constraint than SSC1 because SSQ1 has fewer encoded functions to maintain.

H0: SSQ1 and SSC1 evolve at equal rates.

Inability to reject the null would be consistent with the conclusion that SSQ1 and SSC1 are not under current divergent selective pressures. One possible explanation is that the functional divergence of Ssq1p and Ssc1p occurred in an ancestral lineage, while current selective pressures in extant taxa now act with the same direction and magnitude on each paralog. However, the functionally divergent paralogs interact with different groups of substrates and therefore have different sources of possible coevolutionary influence. Thus, it seems unlikely that SSC1 and SSQ1 would be evolving at equal rates.

H1: SSQ1 and SSC1 evolve at unequal rates.

If our a priori prediction that SSQ1 is evolving at an elevated rate compared with SSC1 is observed, support for the alternative hypothesis would be consistent with the functional specialization of Ssq1p in Fe/S cluster biogenesis. According to biochemical experiments, Jac1p stimulates the ATPase activity of Ssq1p to a greater extent than that of Ssc1p from both pre- and post-duplication species.
Therefore, mutations must have been fixed in SSQ1 since the duplication event to afford functional distinction from SSC1. Additionally, Ssq1p has a diminished functional repertoire, so SSQ1 must have undergone fixation of mutations that result in loss of function. On the other hand, Ssc1p has not been shown to have gained any novel functions since the creation of SSQ1. Fe/S cluster biogenesis constitutes only one of the many roles encoded by SSC1. Any loss of performance in Fe/S cluster assembly sustained by SSC1 is therefore predicted to have involved a small number of mutations. In contrast, the number of mutations likely to have occurred in SSQ1 to degrade its many lost roles in protein folding and translocation would be comparatively large. Therefore, more mutations are likely to have occurred in SSQ1 than in SSC1 since the time of duplication.

Alternatively, SSQ1 could be evolving at a decreased rate compared with SSC1. An increase in selective constraint on SSQ1 would be a possible explanation for this result. However, fewer functions have been ascribed to Ssq1p than to Ssc1p. This outcome would therefore indicate the need to further investigate the functions of Ssq1p, to identify additional sources of constraint acting on SSQ1 compared with SSC1.

Hypothesis 3: The rate of JAC1 evolution is positively correlated with the rate of SSQ1 evolution because JAC1 and SSQ1 are coevolving.

H0: The rate of JAC1 evolution in clades encoding SSQ1 is equal to the rate of JAC1 evolution in clades that lack SSQ1.

Inability to reject the null would be consistent with the absence of an influence by SSQ1 on the direction and magnitude of selection acting on JAC1. This outcome does not seem likely, because Jac1p has been shown to produce different magnitudes of ATPase stimulation for Ssq1p and Ssc1p. Therefore, the selective pressures exerted by Ssq1p and Ssc1p on Jac1p are probably not equivalent.
Alternatively, the increased efficiency of the Jac1p-Ssq1p interaction could be due to changes at very few sites in JAC1. It could also be entirely independent of JAC1 evolution, resulting from the specialization of Ssq1p alone.

H1: The rate of JAC1 evolution in clades encoding SSQ1 is not equal to the rate of JAC1 evolution in clades that lack SSQ1.

Evidence to support this hypothesis would be consistent with the coevolution of JAC1 with a duplicate mtHsp70 under increased or decreased selective constraint relative to the ancestral, pre-duplication mtHsp70. Our a priori prediction is that JAC1 evolves at an increased rate in the presence of SSQ1, compared with JAC1 from clades that lack SSQ1. Jac1p must stimulate the ATPase activity of an mtHsp70. Evolution of JAC1 would therefore be necessary to accommodate any changes in an mtHsp70 that might hinder the ability of Jac1p to physically interact with it. Molecular coevolution of JAC1 with SSQ1 could account for the specialized interaction that allows Jac1p to stimulate Ssq1p to a greater extent than Ssc1p. Furthermore, because SSQ1 is a duplicate gene, it is expected to be under relaxed selective constraint compared with mtHsp70s in the single-gene state. Therefore, if SSQ1 is evolving at a faster rate, JAC1 evolution would be expected to accelerate when co-occurring with SSQ1, so that the two proteins retain the ability to physically interact. This assumes that the faster rate of SSQ1 evolution is due to changes at sites critical to interaction with JAC1. SSQ1-JAC1 coevolution may have been instigated by initial changes in either SSQ1 or JAC1. In either case, the exertion of reciprocal selective pressures could result in correlated rate acceleration of SSQ1 and JAC1. Thus, JAC1 would be observed to evolve faster in clades encoding SSQ1 than in clades lacking SSQ1.
Conversely, if the evolutionary rate of SSQ1 is observed to be slower than that of SSC1, we predict that JAC1 evolution will be decelerated in the presence of SSQ1. A negative correlation between JAC1 and SSQ1 evolution would be indicative of antagonistic coevolution. A coevolutionary relationship of this nature could result if either JAC1 or SSQ1 constrains the evolution of the other, such as if the proteins had reached an adaptive peak in their interaction. Alternatively, another factor (perhaps an unidentified component of the Fe/S cluster biosynthesis pathway) could increase selective constraint on JAC1 or SSQ1 while releasing constraint on the other.

Hypothesis 4: SSQ1 has undergone adaptive evolution to optimize the Ssq1p-Jac1p interaction important for Fe/S cluster biogenesis.

H0: SSQ1 has not evolved under positive selection.

Inability to reject the null would be consistent with a scenario in which the mutations that confer the functional differences between Ssq1p and Ssc1p were fixed through a relaxation of selection and/or genetic drift. A possible evolutionary history to account for the divergence of Ssq1p without positive selection would include relaxation of constraint at sites required for protein folding, translocation, and stress responses. A relaxation of selective constraint at those sites would allow deleterious mutations to accumulate and degrade the encoded functions. The increased ATPase activity of Ssq1p in the presence of Jac1p could have arisen through the random fixation of beneficial mutations with weak fitness effects. Alternatively, the increased ATPase activity could be due to evolution within JAC1 alone.

H1: SSQ1 has evolved under positive selection.

Evidence to support this hypothesis would be consistent with a period of adaptive evolution in the history of SSQ1.
The premise for this proposal is that SSQ1 has become functionally specialized since its divergence from SSC1. Ssq1p shows increased ATPase activity in the presence of Jac1p, which improves on an ancestral function important for Fe/S cluster assembly. To improve upon the ancestral function, SSQ1 must have acquired mutations beneficial to the Jac1p-Ssq1p interaction, potentially at sites critical to physical contact between the two proteins. More efficient stimulation of Ssq1p ATPase activity by Jac1p might increase fitness, perhaps by improving Fe/S cluster biogenesis. If so, positive selection could drive the new SSQ1 allele to fixation.

METHODS

Fungal Taxa and Gene Sequence Alignments

Gene sequences were retrieved from seven to eight fungal species in each of four monophyletic clades. Taxa in two of these clades, the Saccharomyces and Candida groups, encode the duplicate Hsp70, SSQ1. The Aspergillus and Fusarium clades lack SSQ1. SSC1, SSQ1, and JAC1 coding region sequences (exons only) for the Saccharomyces clade came from the genomes of Saccharomyces cerevisiae RM11, Saccharomyces cerevisiae YJM789, Saccharomyces paradoxus, Saccharomyces mikatae, Saccharomyces bayanus, Saccharomyces castellii, and Candida glabrata. The same sequences for the Candida clade came from the genomes of Candida lusitaniae, Candida guilliermondii, Debaryomyces hansenii, Candida parapsilosis, Candida tropicalis, Candida dubliniensis, and Candida albicans. SSC1 and JAC1 sequences from the Aspergillus nidulans, Aspergillus niger, Aspergillus terreus, Aspergillus oryzae, Aspergillus flavus, Aspergillus clavatus, Aspergillus fumigatus, and Neosartorya fischeri genomes make up the Aspergillus clade. Sequences from the Podospora anserina, Trichoderma reesei, Fusarium solani, Fusarium graminearum, Fusarium verticillioides, and Fusarium oxysporum genomes make up the Fusarium clade.
The complete genome of each of the above fungal species has been sequenced. The genomes are available through sequence databases curated by the Saccharomyces Genome Database, the Broad Institute, the Joint Genome Institute, the Wellcome Trust Sanger Institute, and Génoscope. Nucleotide and protein BLAST searches were performed to identify orthologs. Reciprocal best BLAST hits were used to confirm orthology and to rule out paralogy of SSC1 and SSQ1 sequences. The sources and genome coordinates of all sequences are presented in Appendix A.

JAC1 is a fast-evolving gene that differs by more than 80% at the nucleotide level between fungal clades. As a result, JAC1 sequences from all four clades are too divergent to be aligned with confidence in a single multiple alignment. JAC1 sequence alignment, gene tree construction, and rate analyses were therefore carried out separately for each of the four fungal clades. However, average JAC1 sequence divergence and the number of taxa are similar across clades, which makes comparison of JAC1 sequences among the clades possible (see Figure 3). Within each clade, any two JAC1 nucleotide sequences differ by approximately 25% to 35%.

[Figure 3 graphic: bar chart of average within-clade pairwise JAC1 sequence divergence (uncorrected p-distances) for the Saccharomyces and Candida clades (encoding SSQ1) and the Aspergillus and Fusarium clades (lacking SSQ1).]

Figure 3: Average within-clade pairwise sequence divergence of JAC1. The chart shows the average JAC1 divergence between any two JAC1 sequences from taxa in the same clade, calculated as the uncorrected p-distance. P-distances are expressed as percent sequence dissimilarity. Error bars represent standard deviations.

Multiple alignments of translated amino acid sequences were performed in CLUSTAL W (Thompson et al. 1994) with default gap penalties. Gaps were then removed by manual trimming.
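The divergence measure plotted in Figure 3 can be sketched in a few lines. This minimal illustration assumes gap-free aligned sequences, as produced by the gap trimming described above, and takes the standard deviation over all sequence pairs within a clade. The function names and the example taxa are hypothetical, and the exact calculation behind Figure 3 may differ.

```python
from itertools import combinations
from statistics import mean, stdev


def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected p-distance: proportion of aligned sites that differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return diffs / len(seq_a)


def within_clade_divergence(alignment: dict[str, str]) -> tuple[float, float]:
    """Mean and standard deviation of p-distances over all pairs of taxa
    in one clade (assumes at least three sequences and no gaps)."""
    dists = [p_distance(alignment[x], alignment[y])
             for x, y in combinations(sorted(alignment), 2)]
    return mean(dists), stdev(dists)


# Hypothetical toy clade, for illustration only.
clade = {"taxonA": "ATGCTA", "taxonB": "ATGCTT", "taxonC": "ATCCTT"}
print(within_clade_divergence(clade))
```

Multiplying the mean by 100 expresses it as percent dissimilarity, the unit used in the Figure 3 caption.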
The number of amino acid sites in each clade's JAC1 alignment is: Saccharomyces, 177; Candida, 167; Fusarium, 189; and Aspergillus, 185. The mtHsp70 genes, in contrast, are conserved enough that sequences from all taxa could be aligned together. The SSC1 alignment includes 603 amino acid sites from all four fungal clades. The combined SSC1 and SSQ1 alignment includes 580 amino acid sites, covering SSC1 from all four clades and SSQ1 from the Saccharomyces and Candida clades. All amino acid alignments are shown in Appendix B.

Cladogram Construction for PAML Input Trees

Data Partitioning

Figure 4 summarizes gene tree construction graphically. Partitioned analysis fits each subset of the data with its own evolutionary model. This has been shown to fit heterogeneous data better than an unpartitioned analysis, and it may also yield support for alternative tree topologies (DeBry 1999). Partitioning therefore accommodates evolutionary heterogeneity among subsets of the sequences. In protein-coding regions, the first and second codon positions are expected to evolve more slowly than third positions, because most substitutions at third positions are synonymous. At first positions, by contrast, most substitutions are nonsynonymous, and at second positions all substitutions are nonsynonymous. Selective constraint is therefore expected to be weakest at third positions, intermediate at first positions, and strongest at second positions. The fastest rate of change is thus expected at third positions. This gives third positions the greatest power to resolve relationships among closely related or slowly evolving sequences. In such cases few changes will have accumulated among taxa, and first and second positions may not contain enough variation to resolve evolutionary histories.
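The expected ordering of selective constraint by codon position can be checked directly against the standard genetic code. The sketch below is an illustration only and was not part of the analyses described here. It enumerates every single-nucleotide change from each sense codon and reports, for each codon position, the fraction of changes that are synonymous. Changes that create a stop codon count as nonsynonymous.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order at each position;
# '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_fraction_by_position() -> list[float]:
    """For codon positions 1-3, the fraction of all single-nucleotide changes
    from sense codons that leave the encoded amino acid unchanged."""
    fractions = []
    for pos in range(3):
        synonymous = total = 0
        for codon, aa in CODE.items():
            if aa == "*":
                continue  # start only from sense codons
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                total += 1
                # A change to a stop codon ('*') never equals aa, so it is
                # counted as nonsynonymous.
                synonymous += CODE[mutant] == aa
        fractions.append(synonymous / total)
    return fractions


print(synonymous_fraction_by_position())
```

Under this counting, roughly 4% of first-position changes, none of the second-position changes, and about 69% of third-position changes are synonymous. This matches the ordering of constraint assumed in the partitioning rationale above.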
First and second positions are often more useful for resolving deep branches of a phylogenetic tree, where present-day sequences may be very divergent. As divergence increases, third positions become more likely to be saturated with homoplasies. At that point they no longer provide a reliable signal for distinguishing basal relationships.

Tree construction from translated amino acid sequences is another way to achieve robust branch resolution when evolutionary rates vary within genes. A model of amino acid substitution may be more appropriate than a nucleotide substitution model, and it usefully complements cladograms built under nucleotide models. Some information in the DNA is lost at the amino acid level because the genetic code is degenerate. In exchange, modeling amino acid substitutions frees the analysis from biases in base composition and mutation that are more prevalent in nucleotide sequences. For example, unlike amino acid sequences, nucleotide sequences are often shaped by structural constraints. These constraints can favor particular sequences that form hairpin or loop regulatory features in the transcribed RNA. Selection for codon bias is another form of nucleotide compositional bias. Moreover, peptide sequences have far more character states than DNA sequences (20 amino acids versus four nucleotide bases). Amino acid sites are therefore less prone to mutations that revert them to their ancestral state. Finally, amino acids evolve more slowly than nucleotides, because synonymous substitutions leave the amino acid unchanged and many amino acid replacements require more than one nucleotide substitution.
Together, reduced homoplasy and a slower evolutionary rate at the amino acid level may give better phylogenetic resolution of distantly related taxa or fast-evolving genes than nucleotides can. For this reason, different subsets of the sequence data were analyzed individually here. Unrooted gene trees were constructed from the following partitions: first, second, and third codon positions individually; first and second positions together; all codon positions; and amino acids.

Maximum Parsimony

Constructing the tree topologies used to estimate evolutionary rates is a critical first step, and several methods of inference can be used. Ideally, a set of properly aligned sequences would yield the same phylogeny, one that accurately represents the "true" evolutionary history, whichever method is used. In practice, each method of tree construction has its own strengths and pitfalls, and the choice of method can influence the outcome. It is therefore prudent to use more than one method in parallel and, where possible, to carry alternative trees forward into subsequent analyses. This study used maximum parsimony (MP), maximum likelihood (ML), and Bayesian inference (BI).

MP is a character-based method. It seeks the tree topologies that minimize the number of evolutionary transitions needed to explain the distribution of characters among taxa (Hennig 1966). A tree search algorithm evaluates topologies according to the minimum number of steps each requires. Convergent evolution, parallel evolution, or reversal of a character to its ancestral state may make two sequences appear more closely related than they actually are.
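The parsimony criterion described above can be made concrete with Fitch's algorithm. It counts the minimum number of changes a single character requires on a fixed tree. This sketch is illustrative only. PAUP* also performs the heuristic search over topologies (TBR branch swapping), which is not reproduced here, and the nested-tuple tree format is an assumption made for the example. The Fitch count does not depend on where an unrooted tree is rooted, so any rooting can be used.

```python
def fitch_score(tree, states):
    """Minimum number of state changes for one character on a binary tree.

    tree   -- nested 2-tuples of leaf names, e.g. (("A", "B"), ("C", "D"))
    states -- dict mapping each leaf name to its character state
    """
    def visit(node):
        if isinstance(node, str):              # leaf: its observed state
            return {states[node]}, 0
        (left_set, left_cost), (right_set, right_cost) = map(visit, node)
        cost = left_cost + right_cost
        common = left_set & right_set
        if common:
            return common, cost                # no change needed at this node
        return left_set | right_set, cost + 1  # one change required

    return visit(tree)[1]


def tree_length(tree, alignment):
    """Parsimony length of a tree: the Fitch score summed over all sites."""
    n_sites = len(next(iter(alignment.values())))
    return sum(fitch_score(tree, {taxon: seq[i] for taxon, seq in alignment.items()})
               for i in range(n_sites))


# Hypothetical four-taxon alignment and two candidate topologies.
aln = {"A": "AAG", "B": "AAG", "C": "GTG", "D": "GTT"}
print(tree_length((("A", "B"), ("C", "D")), aln))  # 3 steps
print(tree_length((("A", "C"), ("B", "D")), aln))  # 5 steps
```

Under equal character weighting, the first topology is preferred because it requires fewer steps.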
These processes are all sources of homoplasy. Opportunities for homoplasy increase with time since divergence from a common ancestor, and the most parsimonious tree is assumed to minimize them. However, MP tends to group highly divergent sequences together erroneously, particularly when the sequences are distantly related or have evolved very rapidly. This problem is known as long-branch attraction (Felsenstein 1978).

MP trees were constructed in PAUP* v4.0b10 (Swofford 2000) with a heuristic search using the tree-bisection-reconnection (TBR) branch-swapping algorithm and equal weighting of all characters. TBR starts with an initial topology, breaks the tree into two subtrees, and then reconnects the halves at all possible positions. Here, the initial topology was generated by random stepwise addition of sequences, and the heuristic search used replicate random addition sequences. One hundred bootstrap replicates, with alignment sites resampled with replacement, were performed to assess statistical support for the branches of each topology.

Maximum Likelihood

Another commonly used method for inferring phylogenies is the ML method introduced by Felsenstein (1981). ML tree construction can combine many different models of sequence substitution with the statistical power of optimizing a likelihood function. This lets ML distinguish homoplasy from synapomorphy more efficiently than MP, which gives greater accuracy when inferring the phylogeny of very divergent taxa or of sequences with very different evolutionary rates. ML considers all possible pathways of sequence change for a given data set to identify the hypothesis most likely to have produced the data. The likelihood calculation optimizes the tree topology, branch lengths, and model parameters simultaneously.
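The bootstrap replicates mentioned above resample alignment columns with replacement. Below is a minimal sketch of that resampling. It assumes a dictionary alignment of equal-length sequences. `infer_splits` is a hypothetical stand-in for a full tree-building step such as the PAUP* or PhyML searches described here.

```python
import random
from collections import Counter


def bootstrap_replicate(alignment, rng):
    """One nonparametric bootstrap pseudo-alignment: alignment columns are
    drawn with replacement until the original length is reached."""
    taxa = list(alignment)
    n_sites = len(alignment[taxa[0]])
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {t: "".join(alignment[t][c] for c in cols) for t in taxa}


def bootstrap_support(alignment, infer_splits, n_reps=100, seed=1):
    """Percentage of replicates in which each split (clade) is recovered.

    infer_splits -- hypothetical caller-supplied function that builds a tree
    from an alignment and returns its splits as a set of frozensets of taxa.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_reps):
        counts.update(infer_splits(bootstrap_replicate(alignment, rng)))
    return {split: 100 * n / n_reps for split, n in counts.items()}
```

Splits recovered in at least 90% of replicates would meet the support threshold used later for building composite trees.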
Once these parameters have been optimized to maximize the likelihood, the best evolutionary model and tree (by the ML criterion) have been found. This is analogous to reaching a peak in a multidimensional parameter landscape. Parameter values at the peak are estimated from the data, so the investigator does not need to specify them a priori (Holder and Lewis 2003). However, ML calculations are computationally intensive. They may also suggest an incorrect evolutionary relationship if an inappropriate substitution model is chosen (Huelsenbeck and Crandall 1997).

Maximum likelihood trees were inferred in PhyML v2.4.4 (Guindon and Gascuel 2003) under the general time-reversible (GTR) model of nucleotide substitution. The GTR model estimates a separate equilibrium frequency for each nucleotide base in the sequence set. It also estimates a separate relative substitution rate (exchangeability) for each of the six pairs of nucleotides. The model is time-reversible: each pair's exchangeability is the same in both directions. At equilibrium, therefore, substitutions from G to T, for instance, occur as often as substitutions from T to G. Two further parameters account for site-to-site variation in evolutionary patterns: the proportion of invariant sites and the gamma shape parameter, which describes how substitution rates are distributed among sites. All model parameters were estimated from the data, with four discrete categories in the gamma rate distribution. For each amino acid alignment, we used the substitution model that ProtTest v1.4 (Abascal et al. 2005) identified as best by likelihood ratio test:

- Saccharomyces JAC1: WAG
- Candida JAC1: RtREV
- Fusarium JAC1: WAG
- Aspergillus JAC1: JTT
- SSC1: RtREV
- SSC1 and SSQ1 combined: RtREV

A neighbor-joining tree generated in PhyML served as the starting tree for the tree search.
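The sketch below makes the reversibility property concrete. It builds a GTR instantaneous rate matrix from base frequencies and six exchangeabilities, assuming the common parameterization q_ij = r_ij × π_j, with rates normalized to one expected substitution per unit time. The invariant-sites and gamma components are omitted, and the parameter values in the example are hypothetical.

```python
BASES = "ACGT"


def gtr_rate_matrix(freqs, exch):
    """Instantaneous rate matrix Q of the GTR model (as nested dicts).

    freqs -- equilibrium base frequencies pi, keyed by base
    exch  -- six symmetric exchangeabilities r keyed by unordered base pair
    Off-diagonal q_ij = r_ij * pi_j; each row sums to zero; Q is scaled so
    the expected substitution rate at equilibrium is 1.
    """
    q = {i: {} for i in BASES}
    for i in BASES:
        for j in BASES:
            if i != j:
                r = exch[(i, j)] if (i, j) in exch else exch[(j, i)]
                q[i][j] = r * freqs[j]
        q[i][i] = -sum(q[i][j] for j in BASES if j != i)
    scale = -sum(freqs[i] * q[i][i] for i in BASES)
    return {i: {j: q[i][j] / scale for j in BASES} for i in BASES}


# Hypothetical parameter values for illustration.
pi = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
r = {("A", "C"): 1.0, ("A", "G"): 4.0, ("A", "T"): 0.5,
     ("C", "G"): 0.8, ("C", "T"): 4.0, ("G", "T"): 1.0}
Q = gtr_rate_matrix(pi, r)
print(Q["G"]["T"], Q["T"]["G"])                            # rates differ
print(pi["G"] * Q["G"]["T"], pi["T"] * Q["T"]["G"])        # fluxes are equal
```

When G and T have unequal frequencies, the instantaneous rates q_GT and q_TG differ, yet π_G q_GT = π_T q_TG. This balance of fluxes is the sense in which the model is reversible.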
A hill-climbing algorithm was then used to maximize the likelihood. One hundred bootstrap replicates were performed to assess the statistical robustness of the trees. These replicates produced bootstrap consensus trees, which were used to assess clade support for both the MP and ML methods.

Bayesian Inference

Like ML, BI can incorporate many different molecular evolutionary models in the search for the best tree. However, BI samples from the posterior probability distribution to identify the most probable phylogenetic tree for a given data set. This requires the investigator to assign prior probabilities to all parameters, i.e., predictions made before the data are examined. Some consider this a potential source of bias and a disadvantage of BI (Felsenstein 2003). In phylogenetic analysis, prior probabilities are usually given an uninformative or “flat” distribution, which treats all possible trees as equally likely hypotheses until the data are examined. BI evaluates the likelihood of a hypothesis to calculate its posterior probability, but it does not optimize parameters as ML does. Instead, a Markov chain Monte Carlo (MCMC) algorithm estimates the posterior probability distribution of hypotheses. The algorithm constructs chains that move through a multidimensional space of hypotheses and periodically sample the current state. A proposed move is accepted or rejected according to the ratio of the posterior densities of the new and current states. Moves toward higher posterior probability are favored, but downhill moves remain possible. Each “link” in the chain, or location in tree space, is termed a “generation.” The goal is for the chain to reach stationarity, after which it samples trees and parameter values in proportion to their posterior probability. Separate chains running in parallel should then converge on similar posterior probability values.
Parallel chains may periodically swap their states, that is, the tree and evolutionary model each is evaluating. This keeps chains from becoming stuck in local regions of high posterior probability density and allows more efficient exploration of the hypothesis landscape (reviewed in Holder and Lewis 2003). Heated chains move more freely across peaks and valleys in the landscape, which makes them useful partners in these swaps. The cold chain is more restricted in its movement, and its samples form the output of a run. Chains often start far from the regions of greatest posterior probability density in tree space. For this reason, a proportion of the first generations (termed the “burn-in”) is discarded before posterior probability distributions are evaluated.

Bayesian inference trees were constructed in MrBayes v3.1.2 (Ronquist and Huelsenbeck 2003). Nucleotide analyses used the same substitution model described for ML tree construction, and amino acid analyses used mixed-model optimization. The default flat prior probability densities were used for all parameters. Two independent MCMC runs were started, each with three heated chains and one cold chain. The chains ran for 1,000,000 generations each and were sampled once every 100 generations. The first 2,500 trees were discarded as burn-in. Likelihood scores were used to assess convergence of chain parameters and trees, both within one run and between parallel runs. A chain was assumed to have reached stationarity in parameter space once the cold chain's likelihood scores stopped increasing and fluctuated within a narrow range.
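The interplay of heated and cold chains, state swapping, sampling frequency, and burn-in can be illustrated with a toy Metropolis-coupled MCMC. The toy samples a one-dimensional bimodal density rather than trees. It is a hypothetical sketch: the target density, proposal width, heating constant, and 25% burn-in fraction are all assumptions chosen for illustration. The incremental heating formula is of the kind MrBayes uses, but this is not a reproduction of its implementation. With the settings reported above (1,000,000 generations sampled every 100 generations), a run yields on the order of 10,000 samples.

```python
import math
import random


def log_target(x):
    """Toy bimodal log-density with unit-variance peaks at -3 and +3,
    standing in for a rugged posterior over trees and model parameters."""
    a = -0.5 * (x - 3.0) ** 2
    b = -0.5 * (x + 3.0) ** 2
    m = max(a, b)  # log-sum-exp avoids underflow far from the peaks
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def accept(log_ratio, rng):
    """Metropolis rule: always accept uphill moves, sometimes downhill ones."""
    return rng.random() < math.exp(min(0.0, log_ratio))


def mc3(n_gens=20000, n_chains=4, temp=0.2, sample_freq=10,
        burnin_frac=0.25, seed=7):
    """Metropolis-coupled MCMC with incremental heating: chain k targets the
    density raised to beta_k = 1 / (1 + k * temp); chain 0 is the cold chain."""
    rng = random.Random(seed)
    betas = [1.0 / (1.0 + k * temp) for k in range(n_chains)]
    states = [0.0] * n_chains
    samples = []
    for gen in range(1, n_gens + 1):
        # Within-chain update with a symmetric normal proposal.
        for k, beta in enumerate(betas):
            proposal = states[k] + rng.gauss(0.0, 1.0)
            if accept(beta * (log_target(proposal) - log_target(states[k])), rng):
                states[k] = proposal
        # Propose swapping the states of two randomly chosen chains.
        i, j = rng.sample(range(n_chains), 2)
        log_swap = (betas[i] - betas[j]) * (log_target(states[j]) - log_target(states[i]))
        if accept(log_swap, rng):
            states[i], states[j] = states[j], states[i]
        # Record only the cold chain, every sample_freq generations.
        if gen % sample_freq == 0:
            samples.append(states[0])
    burnin = int(len(samples) * burnin_frac)
    return samples[burnin:]  # discard the burn-in samples
```

In this toy, 20,000 generations sampled every 10 generations yield 2,000 cold-chain samples, of which the first 500 are discarded as burn-in.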
Plots of generation versus log posterior probability were also produced for each run. Stationarity was detected visually as the absence of any increasing or decreasing trend in posterior probability.

Constructing Composite Input Tree Topologies

For each method of tree construction, the most parsimonious or most likely trees (as appropriate to the method) were examined visually for branch resolution on bootstrap consensus trees. When MP returned more than one most parsimonious tree, the bootstrap consensus made it unnecessary to examine multiple MP trees for each data partition. Branches with bootstrap support of ≥ 90% or posterior probability of ≥ 0.9 were considered sufficiently well supported. If a branch could not be resolved with one data partition but could be with another, it was inserted manually to produce the best-resolved composite tree. If relationships could not be resolved with high statistical support by any combination of partitions, the branches were collapsed into polytomies. The number of unique composite tree topologies obtained by each phylogenetic inference method for each gene alignment was:

- Saccharomyces JAC1: 2
- Candida JAC1: 3
- Fusarium JAC1: 1
- Aspergillus JAC1: 3
- SSC1: 11
- SSC1 and SSQ1 combined: 9

All composite tree topologies used as input trees for codeml are displayed in Appendix C.

[Figure 4 graphic: for each method (Maximum Parsimony in PAUP*, Maximum Likelihood in PhyML, Bayesian Inference in MrBayes), partitioned sequence data (1st, 2nd, and 3rd codon positions; 1st and 2nd positions; all positions; amino acids) pass through branch support assessment to yield composite input tree topologies for analysis in codeml.]

Figure 4: Summary of data partitions and phylogenetic tree construction for evolutionary rate analysis. JAC1, SSC1, and combined SSC1 and SSQ1 gene trees were inferred by three methods of tree construction: Maximum Parsimony, Maximum Likelihood, and Bayesian Inference. Six partitions of each sequence alignment were used individually for tree construction: all three nucleotide positions of each codon, the 1st and 2nd positions only, each position individually, and the amino acid residues. Every tree from every data partition and method was assessed for branch support. The goal was to generate all possible unique, strongly supported composite topologies for analysis with the codeml program of the PAML package.

mtHsp70 Clade Model Rate Comparisons

Hypothesis 1 concerns the role of selective constraint in the evolution of SSC1 since SSQ1 arose. To test it, a clade model test examined how the presence of SSQ1 influences the rate of SSC1 evolution. The rate of SSC1 codon evolution in taxa possessing SSQ1 (Saccharomyces and Candida clades) was compared with the rate in taxa lacking SSQ1 (Aspergillus and Fusarium clades). The foreground clade (SSC1 from taxa that also carry SSQ1) was separated from the background clade (SSC1 from taxa lacking SSQ1). For each of the eleven input trees, the split was placed at the node representing the most recent common ancestor of the Aspergillus and Fusarium subclades (see Figure 5A).
We identified a program bug in the implementation of model = 3 in PAML v4.0 (Bielawski 2008), so clade model analyses were run in PAML v3.0 (Yang 1997). Tests were conducted under three clade models that differed in the number of predefined rate categories. The null model had one rate category, and the two alternative models had two or three rate categories. Likelihood ratio tests were then performed to determine the most appropriate rate model (Appendix E). For all clade models, initial ω and κ values of 0.5, 1.0, and 1.5 were tested. In addition, all codeml tests were run with two methods of codon frequency estimation:

1) the F3×4 model, in which codon frequencies are estimated from the base frequencies observed at the 1st, 2nd, and 3rd codon positions in the data; and
2) a table of codon frequencies observed in the data.

Changing the initial ω and κ values or the codon frequency model does not change the number of model parameters. The effect of these settings therefore cannot be evaluated by likelihood ratio test. However, initial values of 0.5 and 1.0 tended to give slightly higher likelihood scores than an initial value of 1.5. Initial values of 0.5 and 1.0 gave very similar, and often identical, likelihood scores. The observed codon frequency table always gave the highest likelihood scores of all codeml tests. The output with the highest likelihood score is reported in the RESULTS.

Hypothesis 2 states that SSQ1 may be under weaker selective constraint than SSC1. To test it, clade model rate analyses were also performed as described above to compare the rate of SSQ1 sequence evolution with that of SSC1.
Cladograms containing all taxa in a single tree, with gene sequences from both mtHsp70s, were divided into a foreground clade of SSQ1 and a background clade of SSC1 (see Figure 5B).

Site-Specific Rate Tests

Hypothesis 3 asks whether the mtHsp70 gene duplication altered the rate of JAC1 evolution in the presence of SSQ1. To test it, JAC1 site-specific rate tests were conducted. JAC1 sequences from each of the four fungal clades were evaluated separately with a site-specific model of gene evolution in the codeml program of PAML v4.0 (Yang 2007). The site-specific model estimates ω for a predefined number of rate categories and then assigns each codon to its most likely category. We looked for evidence of increased or decreased selective constraint on individual amino acids in JAC1 from clades where JAC1 co-occurs with both Hsp70 paralogs, SSC1 and SSQ1. These were compared with JAC1 sequences from clades possessing only SSC1. Each JAC1 cladogram was analyzed with models having either three or ten rate categories. High and low initial values (1.3 and 0.3) of ω and κ were tested, and every analysis converged under both starting values for both parameters. Codon frequency models were varied as described above for the clade model rate analyses. Likelihood ratio tests determined the number of ω categories that best modeled site-specific rates for each JAC1 clade (see Appendix E). For the Aspergillus clade, ten rate categories fit the data significantly better when either the BI or ML input trees were used. In all other cases, the simpler three-category model was superior.
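Assigning each codon to its "most likely category" is an empirical Bayes calculation. Each category's prior proportion is multiplied by the likelihood of that codon's data under the category's ω, and the products are normalized into posterior probabilities. The sketch below assumes the per-category site likelihoods are already available, as codeml computes them internally. The input values in the example are hypothetical. This is a naive empirical Bayes illustration, not a reimplementation of codeml.

```python
def assign_site_classes(proportions, omegas, site_likelihoods):
    """Empirical Bayes assignment of codons to omega categories.

    proportions      -- prior proportion p_k of sites in each category
    omegas           -- omega value of each category
    site_likelihoods -- per codon, the likelihood of that codon's data under
                        each category (hypothetical input in this sketch)
    Returns, per codon: (most probable category, its posterior probability,
    posterior mean omega).
    """
    results = []
    for lik in site_likelihoods:
        joint = [p * f for p, f in zip(proportions, lik)]
        total = sum(joint)
        post = [j / total for j in joint]
        best = max(range(len(post)), key=post.__getitem__)
        mean_omega = sum(w * p for w, p in zip(omegas, post))
        results.append((best, post[best], mean_omega))
    return results


# Hypothetical three-category model and two codons.
print(assign_site_classes([0.6, 0.3, 0.1], [0.01, 0.1, 0.5],
                          [[1e-5, 1e-5, 1e-5], [1e-6, 1e-6, 5e-5]]))
```

The first codon fits all categories equally well, so it falls into the most common category with posterior probability 0.6. The second codon is assigned to the fastest category.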
Branch-Site Test

Hypothesis 4 proposes that positive selection played a historical role in adapting SSQ1 to its specialization in Fe/S cluster biogenesis. To test it, we used a branch-site test to analyze the codon-specific selection pressures acting on the ancestral SSQ1. The foreground branch was defined as the ancestral SSQ1 that existed immediately after the mtHsp70 gene duplication (see Figure 5C). We expected sites along this branch to show evidence of positive selection. The model placed each codon into one of four ω rate categories, with the restrictions on ω shown in Table 1. In two of the classes, ω was constant between ancestral and descendant sites. In the other two, ω differed between ancestral and descendant SSQ1 sites:

- Common rate class 1: ω was estimated within the range 0 < ω < 1.
- Common rate class 2: the proportion of sites with ω = 1, shared between ancestral and descendant sites, was estimated.
- Divergent rate class 1: under the alternative model of positive selection, ω at ancestral SSQ1 sites was free to vary with ω > 1, while descendant SSQ1 sites were held at 0 < ω < 1.
- Divergent rate class 2: background ω was held at 1.0.

Posterior probabilities for site classes were calculated by the Bayes empirical Bayes (BEB) method (Zhang and Yang 2005). The branch-site test used the same eleven combined SSC1 and SSQ1 gene trees as the clade model analyses, with an F3×4 codon frequency model and κ and ω estimated from the data. These results were compared by likelihood ratio test with a null model that fixed ω = 1 for all divergent-rate sites on the ancestral SSQ1 branch.
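The sketch below shows how the four site classes in Table 1 share the codons and how the test statistic is formed. It assumes the parameterization of PAML's branch-site model A, in which the two divergent classes split the remaining proportion of sites in the ratio of the two common classes. That this matches the exact model run here is an assumption. The log-likelihood values in the example are hypothetical.

```python
import math


def branch_site_proportions(p0, p1):
    """Proportions of the four site classes (assumed model A parameterization):
    the divergent classes share the remainder 1 - p0 - p1 in the ratio p0 : p1."""
    remainder = 1.0 - p0 - p1
    return {
        "common_1": p0,                             # 0 < omega0 < 1 on all branches
        "common_2": p1,                             # omega = 1 on all branches
        "divergent_1": remainder * p0 / (p0 + p1),  # background omega0, foreground omega2
        "divergent_2": remainder * p1 / (p0 + p1),  # background 1, foreground omega2
    }


def branch_site_lrt(lnl_null, lnl_alt):
    """LRT of the alternative (foreground omega2 estimated, > 1) against the
    null (omega2 fixed at 1): 2 * delta lnL against chi-square with 1 df."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square(1) upper tail
    return stat, p_value


print(branch_site_proportions(0.6, 0.3))
print(branch_site_lrt(-1000.0, -998.0))  # hypothetical log-likelihoods
```

The null value ω2 = 1 lies on the boundary of the alternative's parameter space. Some analyses therefore halve the χ²₁ p-value (a 50:50 mixture of a point mass at zero and χ²₁); the plain χ²₁ version shown here is the more conservative choice.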
Table 1: Evolutionary Rate (ω) Estimation Under the Branch-Site Model

Evolutionary rate class | Descendant lineages (background) | Ancestral SSQ1 (foreground), H0 | Ancestral SSQ1 (foreground), H1
Common rate class 1 | 0 < ω < 1 | 0 < ω < 1 | 0 < ω < 1
Common rate class 2 | ω = 1 | ω = 1 | ω = 1
Divergent rate class 1 | 0 < ω < 1 | ω = 1 | ω > 1
Divergent rate class 2 | ω = 1 | ω = 1 | ω > 1

Figure 5: A priori defined lineages used for clade and branch-site model input trees. These simplified schematic trees show the phylogenetic relationships among SSC1 fungal sequences (dark gray) and SSQ1 fungal sequences (light gray). Dotted boxes enclose the foreground clades in trees A and B. The clade model was used to test for divergent selection pressures on SSC1 between pre- and post-mtHsp70-duplication clades (A), and between SSQ1 and SSC1 (B). The branch-site test was used to look for evidence of positive selection along the highlighted ancestral SSQ1 branch (C). In tree C, a starred thick gray line marks the foreground lineage. This lineage represents the ancestral SSQ1 sequence that existed after the mitochondrial heat shock protein 70 (mtHsp70) gene duplication and before the Saccharomyces and Candida SSQ1 clades diverged.

RESULTS

SSC1 evolution accelerated in the presence of SSQ1

The clade model test examined whether altered selective constraint affected the evolutionary rate of SSC1 in the presence of SSQ1 (Hypothesis 1). Rates of codon evolution were compared between SSC1 DNA sequences from fungal clades that differ in whether they carry the paralog SSQ1. The Candida and Saccharomyces clades encode SSQ1, and the Aspergillus and Fusarium clades lack it.
This test had two aims. The first was to identify the proportion of SSC1 codons that evolve at different rates depending on whether SSC1 co-occurs with SSQ1 (foreground clade) or evolves in its absence (background clade). The second was to determine the ω of those differentially evolving sites. The results presented were obtained with SSC1 ML Tree 2, the input tree that gave the highest likelihood score. All tree topologies tested gave similar results, so the results are robust to tree topology. More than half (61.6%) of the sites in all SSC1 genes had an ω of 0.001 (common rate class 1), and just under a third (28.8%) had an ω of 0.038 (common rate class 2). These classes were the same whether or not the duplicate gene was present (Figure 6). However, about 9.6% of SSC1 codons evolve at a different rate depending on whether SSQ1 is present (Figure 6). In clades lacking SSQ1, codons in this divergent rate class have an ω of 0.107. In taxa possessing SSQ1, the same SSC1 codons evolved more than twice as fast, with ω = 0.284 (Figure 6).

[Figure 6 graphic: bar graph of average ω for SSC1 codons in the divergent rate class, comparing clades lacking SSQ1 with clades encoding SSQ1; pie graph of the distribution of SSC1 codons among three evolutionary rate classes (common rate class 1, ω = 0.001, 61.6%; common rate class 2, ω = 0.038, 28.8%; divergent rate class).]

Figure 6: Comparison of SSC1 codon evolution in taxa encoding SSQ1 and taxa lacking SSQ1. The pie graph shows the distribution of SSC1 codon evolutionary rates. Common rate classes contain codons that evolve at the same rate in SSC1 from all taxa. Codons in the divergent rate class evolve at one of two rates, depending on whether SSQ1 is present or absent. The largest proportion (61.6%) of SSC1 codons belong to common rate class 1, with ω = 0.001.
The second largest proportion (28.8%) of SSC1 codons belong to common rate class 2, with ω = 0.038. The smallest proportion (9.6%) of SSC1 codons were placed in the divergent rate class. The bar graph compares evolutionary rates of SSC1 from clades lacking SSQ1 and from clades encoding SSQ1. Codons in the divergent rate class evolve with ω = 0.284 in clades encoding SSQ1 and with ω = 0.107 in clades lacking SSQ1.

SSQ1 has evolved at a faster rate than SSC1

The clade model test was also used to examine the rate of SSQ1 evolution relative to SSC1. The aim was to determine whether selective constraint on SSQ1 has increased or decreased (Hypothesis 2). All SSQ1 sequences, which form a monophyletic group, were designated the foreground clade, and all SSC1 sequences, which form another monophyletic group, the background clade. The test then measured the magnitude and direction of selection on the codons that evolve at different rates in SSC1 and SSQ1. The results presented were obtained with SSC1 and SSQ1 combined BI Tree 4, the input tree that gave the most likely clade model output. SSQ1 sequences contain a subset of sites that evolve faster than those of SSC1 (Figure 7). Most sites shared by SSC1 and SSQ1 evolve at equal (slow or intermediate) relative rates: about 43.5% have ω = 0.002 and about 40.8% have ω = 0.031 (Figure 7). Approximately 15.6% of codons were estimated to have a differential rate ratio, about 0.209 in SSQ1 and about 0.077 in SSC1. These codons therefore evolve nearly three times as fast in SSQ1 as in SSC1 (Figure 7). The divergent rate class, 15.6% of SSC1 and SSQ1 codons, comprises 82 codons in total. The encoded amino acids are highlighted in the Ssq1p amino acid sequence in Figure 12.
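The fold differences quoted in these results follow directly from the ω estimates of the divergent rate classes. The short check below uses the values reported in the text.

```python
def fold_change(faster, slower):
    """How many times faster the divergent-class codons evolve (ratio of omegas)."""
    return faster / slower


# Divergent-rate-class omega estimates as reported in the text.
comparisons = {
    "SSC1 with SSQ1 vs. SSC1 without SSQ1": (0.284, 0.107),
    "SSQ1 vs. SSC1": (0.209, 0.077),
}
for label, (fast, slow) in comparisons.items():
    print(f"{label}: {fold_change(fast, slow):.2f}-fold")
```

This prints ratios of 2.65 and 2.71, consistent with "more than twice as fast" and "nearly three times as fast."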
[Figure 7: pie chart of the distribution of SSC1 and SSQ1 codons among three evolutionary rate classes; bar chart of the average ω for SSC1 and SSQ1 codons in the divergent rate class]

Figure 7: Comparison of SSC1 and SSQ1 codon evolution. The pie graph depicts the distribution of SSC1 and SSQ1 codon evolutionary rates. Common rate classes comprise codons common to SSC1 and SSQ1 that evolve at the same rate in all taxa. Codons of the divergent rate class are those common to SSC1 and SSQ1 from all taxa that show two different rates of evolution, one unique to each paralog. The largest proportion (43.3%) of SSC1 and SSQ1 codons belong to common rate class 1, with ω = 0.002. A nearly equal proportion (41.6%) belong to common rate class 2, with ω = 0.031. The smallest proportion (15.1%) were placed into the divergent rate class. The bar graph depicts the difference in evolutionary rates between SSC1 and SSQ1. The SSQ1 codons of the divergent rate class evolve with ω = 0.208, while the SSC1 codons of the divergent rate class evolve with ω = 0.082.

For both mtHsp70 comparative analyses, likelihood ratio tests showed that the clade model grouping all codons into one of three rate categories fit the data significantly better than a model with only two rate categories. Likelihood ratio test results are presented in Appendix E. The null model, with all codons constrained to have evolved at equal rates, was also rejected in every instance by likelihood ratio tests. Statistical support for the clade model with three ω categories held across all tree topologies examined (11 input trees for SSC1, and eight input trees for SSC1 and SSQ1 combined).
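The likelihood ratio tests used above to compare nested models follow a standard recipe: twice the difference in log-likelihood between the richer model and the nested model is compared against a chi-square distribution whose degrees of freedom equal the difference in the number of free parameters. The Python sketch below computes that test with the standard library only. The function names are hypothetical, the degrees of freedom for the specific comparisons reported here are not stated in this section, and the log-likelihoods in the usage lines are made-up numbers (the actual values are in Appendix E).

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df.

    Uses the recurrence Q(k + 2) = Q(k) + (x/2)**(k/2) * exp(-x/2) / Gamma(k/2 + 1),
    starting from Q(1) = erfc(sqrt(x/2)) and Q(2) = exp(-x/2).
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    k = 2 if df % 2 == 0 else 1
    q = math.exp(-half) if k == 2 else math.erfc(math.sqrt(half))
    while k < df:
        # Add the next term of the recurrence, computed in log space for stability.
        q += math.exp((k / 2) * math.log(half) - half - math.lgamma(k / 2 + 1))
        k += 2
    return q


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int) -> tuple[float, float]:
    """Return (2 * delta lnL, p-value) for a nested model comparison."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, chi2_sf(stat, df)


# Made-up log-likelihoods, for illustration only.
stat, p = likelihood_ratio_test(lnl_null=-10234.5, lnl_alt=-10221.3, df=2)
print(f"2*delta lnL = {stat:.1f}, p = {p:.2e}")
```

The closed-form recurrence avoids a SciPy dependency; with SciPy installed, `scipy.stats.chi2.sf` gives the same tail probability.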
Results of the clade model tests indicated that SSC1 evolved at an elevated rate when co-occurring with the duplicate gene, and that SSQ1 evolved faster than SSC1.

JAC1 evolution has decelerated in the presence of SSQ1

A site-specific model was used to examine the direction and strength of selection acting on individual codons of JAC1 among the Candida, Saccharomyces, Aspergillus, and Fusarium fungal clades. Our purpose was to assess possible trends in JAC1 evolution in clades possessing duplicate mtHsp70s compared to clades lacking the duplicate mtHsp70 (Hypothesis 3). Figures 8 and 9 show the distribution of codon evolutionary rates across the JAC1 sequences. The alternative tree topologies tested closely agree in the magnitude and location of elevated codon rates for the Saccharomyces and Candida clades. For the Aspergillus clade, alternative, strongly supported tree topologies produced some variation in the magnitude, but not the location, of elevated codon rates. Only one JAC1 tree topology was used in the analysis of Fusarium clade sequences because the topologies generated by each phylogenetic inference method were identical. In the clades containing the duplicate gene, SSQ1, JAC1 shows similar ω values across the gene sequence, rarely rising above 0.1 (Figure 8). In contrast, in sequences from fungi lacking SSQ1, the average ω of JAC1 is greater (Figure 9). The average ω across the JAC1 sequence and the corresponding standard error for each clade were as follows: Saccharomyces, 0.0546 ± 0.0028; Candida, 0.0348 ± 0.0024; Aspergillus, 0.0711 ± 0.0061; and Fusarium, 0.0812 ± 0.0056. The variance of ω values estimated for JAC1 was also greater in the clades lacking SSQ1 than in the clades co-occurring with SSQ1 (Saccharomyces, 0.0014; Candida, 0.0010; Aspergillus, 0.0070; Fusarium, 0.0059).
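The per-clade summaries reported above (mean ω ± standard error, and variance across codons) can be computed from a list of per-codon ω estimates as sketched below. Because the text does not say how the standard errors were obtained, this sketch assumes they are the standard error of the mean across codons (sample standard deviation divided by the square root of the number of codons). The function name and the example values are hypothetical and are not data from this study. The `n_zero` field counts codons with an estimate of exactly 0, which is the check applied to the JAC1 estimates.

```python
import math
from statistics import mean, variance


def summarize_site_omegas(omegas: list[float]) -> dict[str, float]:
    """Summary statistics for per-codon omega estimates from one clade."""
    if len(omegas) < 2:
        raise ValueError("need at least two codons")
    var = variance(omegas)  # sample variance, n - 1 denominator
    return {
        "mean": mean(omegas),
        "variance": var,
        # Assumed definition: standard error of the mean across codons.
        "se": math.sqrt(var / len(omegas)),
        # Codons whose estimate is exactly zero (no nonsynonymous change inferred).
        "n_zero": float(sum(1 for w in omegas if w == 0.0)),
    }


# Hypothetical per-codon estimates, not data from this study.
example = [0.02, 0.05, 0.01, 0.12, 0.04, 0.08]
summary = summarize_site_omegas(example)
print({name: round(value, 4) for name, value in summary.items()})
```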
Additionally, none of the JAC1 site-specific analyses produced ω estimates of 0, ruling out the possibility that nonsynonymous mutations were absent at any particular site across the sequences of a clade. Thus, the results of our codon-specific rate analysis of JAC1 from four fungal clades opposed our prediction: the rate of evolution of JAC1, the J-protein co-chaperone specialized in Fe/S cluster assembly, slowed down following the duplication of the mtHsp70.

[Figure 8: site-specific ω (y-axis, 0 to 0.4) plotted against codon position (x-axis). (A) Saccharomyces clade JAC1; (B) Candida clade JAC1]

Figure 8: Site-specific ω estimates for JAC1 from clades encoding SSQ1. Evolutionary rates (ω) for JAC1 codons from the Saccharomyces and Candida clades are shown as a function of codon position within the gene sequence. Codon numbers represent column positions within trimmed nucleotide sequence alignments. Results from each input tree topology are shown: (A) Saccharomyces clade, MP/BI tree in dark gray and ML tree in black; (B) Candida clade, MP tree in dark gray, ML tree in black, and BI tree in light gray.

[Figure 9: site-specific ω (y-axis, 0 to 0.4) plotted against codon position (x-axis). (A) Fusarium clade JAC1; (B) Aspergillus clade JAC1]

Figure 9: Site-specific ω estimates for JAC1 from clades lacking SSQ1.
Evolutionary rates (ω) for JAC1 codons from the Fusarium and Aspergillus clades are shown as a function of codon position within the gene sequence. Codon numbers represent column positions within trimmed nucleotide sequence alignments. Results from each input tree topology are shown: (A) Fusarium clade, MP/ML/BI tree; (B) Aspergillus clade, MP tree in black, BI tree in light gray, and ML tree in dark gray.

SSQ1 has evolved under positive selection

Because JAC1 is evolving slowly in the presence of SSQ1, we suspected that JAC1 and SSQ1 have reached an optimal coevolutionary state among the extant taxa. This raises the possibility that adaptive coevolution occurred between SSQ1 and JAC1 (Hypothesis 4). Ideally, we would test for evidence of positive selection along the ancestral branch of JAC1 corresponding to the lineage in which SSQ1 arose. However, such a JAC1 branch-site test would require a single phylogenetic tree incorporating sequences from all fungal clades, in order to reconstruct ancestral states at critical points in evolutionary history. Because we were unable to generate the necessary multiple sequence alignment, this tree could not be inferred. Such tests are, however, possible with SSQ1. We therefore conducted a branch-site test to detect evidence of positive selection affecting sites along the tree branch giving rise to SSQ1. Sites with constant evolutionary rates in both the ancestral SSQ1 branch (the inferred sequence of the foreground branch) and all other sequences (background branches) were grouped into two categories (Figure 10). Most codons (83.1%) were estimated to evolve at ω = 0.034, representing common rate class 1, and 4.4% exhibited ω = 1.000, representing common rate class 2. These rate categories were thus constant regardless of whether the sequence was that of the ancestral SSQ1 gene or a background gene (Figure 10).
For 11.8% of codons, ω was estimated at 1.994 within the ancestral SSQ1 and at 0.034 for all other genes; these codons were designated divergent rate class 1 (Figure 10). A very small fraction of sites (0.6%) were placed into divergent rate class 2, which evolved at ω = 1.994 in the ancestral gene, while the same codons evolved at ω = 1.000 in derived sequences. This suggests that 12.4% of ancestral SSQ1 codons, comprising codons from both divergent rate classes 1 and 2, were subjected to positive selection immediately following the SSC1 gene duplication. Although the posterior probabilities associated with the placement of each codon into a given rate category varied with tree topology, five of the nine tree topologies agreed on four candidate sites for the initial fixation of adaptive mutations following the birth of SSQ1. These four codons, corresponding to amino acids His315, Lys317, Glu338, and Leu346 of the raw SSQ1 sequence from S. cerevisiae YJM789, were assigned a posterior probability of ≥ 0.90 of having an ω of approximately 2 by at least five of the tree topologies tested (Figures 11 and 12). Several other residues were given a high probability of having undergone positive selection in ancestral SSQ1 by some tree topologies (Figure 11). The results obtained using tree topology BI5, however, identified a different set of residues with high probabilities of belonging to a rate category with ω > 1, and did not support evolution under positive selection for the residues shown in Figure 11. The source of this anomaly is unclear, given that the BI5 tree topology does not deviate substantially from the other topologies used. All likelihood ratio tests rejected the null model of neutral evolution, indicating that the branch-site model incorporating sites under positive selection fits early SSQ1 evolution significantly better.
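The four-class structure reported for the branch-site test matches the site classes of PAML codeml's branch-site model A: class 0 (ω0 on all branches), class 1 (ω1 fixed at 1 on all branches), and classes 2a and 2b, which take ω2 on the foreground branch but ω0 or 1, respectively, on background branches. Assuming that model, the proportions of the two divergent classes are not free parameters but follow from p0 and p1, as in the sketch below (the names `SiteClass` and `model_a_site_classes` are hypothetical). With p0 = 0.831 and p1 = 0.044 it gives about 11.9% and 0.6%, close to the 11.8% and 0.6% reported above; the small differences are attributable to rounding of p0 and p1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteClass:
    label: str
    proportion: float
    omega_background: float
    omega_foreground: float


def model_a_site_classes(p0: float, p1: float, omega0: float, omega2: float) -> list[SiteClass]:
    """Site classes of a branch-site model A fit; omega1 is fixed at 1."""
    if p0 <= 0 or p1 <= 0 or p0 + p1 >= 1:
        raise ValueError("require p0 > 0, p1 > 0 and p0 + p1 < 1")
    rest = 1.0 - p0 - p1
    # Classes 2a and 2b split the remaining mass in the ratio p0 : p1.
    p2a = rest * p0 / (p0 + p1)
    p2b = rest * p1 / (p0 + p1)
    return [
        SiteClass("0 / common rate class 1", p0, omega0, omega0),
        SiteClass("1 / common rate class 2", p1, 1.0, 1.0),
        SiteClass("2a / divergent rate class 1", p2a, omega0, omega2),
        SiteClass("2b / divergent rate class 2", p2b, 1.0, omega2),
    ]


classes = model_a_site_classes(p0=0.831, p1=0.044, omega0=0.034, omega2=1.994)
for c in classes:
    print(f"{c.label}: {c.proportion:.3f} "
          f"(background omega {c.omega_background}, foreground omega {c.omega_foreground})")

# Share of codons allowed omega > 1 on the foreground (ancestral SSQ1) branch.
foreground_positive = sum(c.proportion for c in classes if c.omega_foreground > 1)
print(f"foreground omega > 1: {foreground_positive:.3f}")
```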
The branch-site test thus detected evidence of positive selection within the ancestral SSQ1 lineage immediately following gene duplication, rejecting neutral evolution. Together, the two divergent rate classes suggest that adaptive evolution in SSQ1 decelerated in descendant gene sequences after a burst of stronger selection immediately following the inception of SSQ1.

[Figure 10: pie chart of the distribution of ancestral and descendant codons among four evolutionary rate classes; bar charts of the average ω for codons in divergent rate classes 1 and 2, ancestral SSQ1 vs. all other lineages]

Figure 10: Comparison of ancestral SSQ1 codon evolution with SSQ1 and SSC1 evolution within all other lineages. The pie graph depicts the distribution of SSQ1 codon evolutionary rates. Common rate classes comprise codons that evolve at a constant rate. Codons of the divergent rate classes are those common to ancestral and present-day descendant lineages that show, for each divergent rate class, two different rates of evolution unique to the ancestral and descendant lineages. The largest proportion (83.1%) of codons belong to common rate class 1, with ω = 0.034. A much smaller proportion (4.4%) belong to common rate class 2, with ω = 1.000. A proportion of 11.8% of codons were placed into divergent rate class 1. The bottom bar graph depicts the difference in evolutionary rates between ancestral SSQ1 codons and those of descendant sequences in divergent rate class 1. The ancestral SSQ1 codons of divergent rate class 1 evolve with ω = 1.994, while SSC1 and SSQ1 codons of divergent rate class 1 from all other lineages of the tree evolve with ω = 0.034.
The top bar graph depicts the difference in evolutionary rates between ancestral SSQ1 and descendant codons of divergent rate class 2. The ancestral SSQ1 codons of divergent rate class 2 evolve with ω = 1.994, while the present-day descendant codons of divergent rate class 2 evolve with ω = 1.000.

[Figure 11: table of posterior probability classes (*, **, ***, or -) for eight Ssq1p residues under nine input tree topologies (BI 1 to BI 6, ML, MP 1, MP 2). Residues, with alignment position / raw sequence position: Lys 171/224, Asn 219/274, His 256/315, Lys 258/317, Glu 279/338, Leu 287/346, Arg 327/386, Tyr 579/649. Under BI 5, no listed residue reached a posterior probability of 0.70]

Figure 11: Comparison of posterior probabilities of placement of sites into a divergent rate class by the branch-site model, among input tree topologies. All residues assigned to divergent rate class 1 or 2, with ω > 1 and a posterior probability of at least 0.9 in at least one of the tested tree topologies, are shown. Posterior probabilities of placement in a divergent rate class of 0.70-0.79 (*), 0.80-0.89 (**), and 0.90-0.99 (***) are shown; residues with a posterior probability below 0.70 are indicated by (-). The His, Lys, Glu, and Leu residues shaded in gray are those residues of ancestral SSQ1 inferred to have evolved under positive selection, because at least five of the nine tree topologies tested assigned them a posterior probability of 0.90-0.99 of evolving with ω > 1.

[Figure 12: Ssq1p amino acid sequence of Saccharomyces cerevisiae YJM789, numbered in tens from 10 to 580, with dots marking divergent-rate-class residues and arrows marking the four positively selected sites]
Figure 12: Amino acid sequence of Ssq1p encoded by Saccharomyces cerevisiae YJM789, showing sites inferred to exhibit relaxed selective constraint and ancestral positive selection. The 82 amino acids indicated with dots are those identified as belonging to the divergent rate class in the SSC1 and SSQ1 clade model test (see Figure 7), which evolve at an accelerated rate compared to SSC1. The four residues indicated by arrows correspond to the sites identified by the branch-site model test as having evolved under positive selection in the ancestral SSQ1, immediately following the mitochondrial heat shock protein 70 gene duplication. The Ssq1p sequence shown is from the trimmed SSC1 and SSQ1 combined sequence alignment.
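The rule used to call the four candidate sites (a posterior probability of at least 0.90 of belonging to an ω > 1 class under at least five of the nine input topologies) is a simple consensus filter across trees. A minimal sketch, with a hypothetical function name and made-up posterior probabilities for unnamed toy sites rather than the values summarized in Figure 11:

```python
def consensus_sites(posteriors: dict[str, dict[str, float]],
                    threshold: float = 0.90, min_trees: int = 5) -> list[str]:
    """Sites whose posterior probability of omega > 1 is at least `threshold`
    under at least `min_trees` input tree topologies."""
    support: dict[str, int] = {}
    for per_site in posteriors.values():  # one entry per input topology
        for site, prob in per_site.items():
            if prob >= threshold:
                support[site] = support.get(site, 0) + 1
    return sorted(site for site, n in support.items() if n >= min_trees)


# Made-up posterior probabilities for three toy sites under six topologies.
toy = {
    "T1": {"siteA": 0.95, "siteB": 0.92, "siteC": 0.81},
    "T2": {"siteA": 0.93, "siteB": 0.85, "siteC": 0.84},
    "T3": {"siteA": 0.91, "siteB": 0.91, "siteC": 0.79},
    "T4": {"siteA": 0.96, "siteB": 0.88, "siteC": 0.82},
    "T5": {"siteA": 0.40, "siteB": 0.30, "siteC": 0.20},
    "T6": {"siteA": 0.94, "siteB": 0.87, "siteC": 0.86},
}
print(consensus_sites(toy))               # siteA passes the five-tree rule
print(consensus_sites(toy, min_trees=2))  # a looser rule also admits siteB
```

Raising `min_trees` trades sensitivity for robustness to a single anomalous topology, such as the BI5 tree discussed above.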
DISCUSSION

Molecular coevolution among interacting proteins can have fitness consequences for crucial enzymatic pathways, and it can be initiated by the ubiquitous genetic phenomenon of gene duplication. The evolutionary rate analyses presented here for the fungal mtHsp70 paralogs SSC1 and SSQ1 and the interacting J-protein co-chaperone JAC1, together with previous observations concerning the functions of the encoded proteins, provide evidence consistent with release from antagonistic pleiotropy following a gene duplication event. Subsequent subfunctionalization has facilitated the coevolution of SSQ1 and JAC1 to optimize a J-protein co-chaperone-mtHsp70 chaperone interaction dedicated to the Fe/S cluster biogenesis pathway in yeast.

The mtHsp70 paralogs investigated here show a history of selection similar to that inferred for morning glory dihydroflavonol-4-reductase (DFR) duplicate genes. PAML rate analyses of the anthocyanin biosynthesis pathway DFR genes were consistent with paralog divergence via escape from adaptive conflict (Des Marais and Rausher 2008). Clade- and branch-specific ω estimates of codon rate evolution for each of the three DFR copies indicated an ancestral single-copy DFR that was subjected to purifying selection, followed by a relaxation of selective constraint after gene duplication. Evidence for positive selection was observed within the lineage immediately following the second duplication. Positive selection early in the history of the paralogs from the most recent DFR duplication potentially enabled a burst of adaptive mutation fixation within these paralogs.
Combining these results with biochemical evidence that one DFR paralog had optimized an ancestral subfunction, and that the paralogs had lost the ability to perform other ancestral functions, the authors concluded that antagonistic pleiotropy had enforced selective constraint that prevented full optimization of all ancestral DFR functions in the single-copy form. By analogy to the DFR study, one of the fungal mtHsp70s, SSQ1, was found to have undergone positive selection in its ancestral sequence shortly after the gene duplication event from which it arose. Like DFR-B, SSQ1 became specialized in a role performed by the pre-duplication gene, and may even have evolved to outperform its paralog, SSC1, in terms of increased affinity for Jac1p and greater ATPase activation. SSQ1 shows biochemical evidence of improved ATPase activity in response to JAC1 stimulation, with the potential to improve the efficiency of Fe/S cluster biogenesis, an ancestral pre-duplication function. Concomitantly, SSQ1 can no longer perform the ancestral mtHsp70 functions of protein folding and translocation, nor does it protect cellular integrity from environmental stresses. The functional evolution of SSQ1 thus fits the criteria for a case of subfunctionalization. Furthermore, JAC1 evolution resulting in the loss of J-domain residues important for Ssc1p ATPase activation has occurred in yeasts encoding SSQ1. An alteration of the J-domain of Jac1p may therefore have been necessary for improved affinity to Ssq1p, and may have been evolutionarily favored only in the presence of an mtHsp70 specialized in Fe/S cluster biogenesis. Coevolution of JAC1 with SSQ1 would thus have been a consequence of mtHsp70 paralog evolution following escape from adaptive conflict. However, evidence is lacking to meet the more stringent criteria for SSC1 and SSQ1 evolution by escape from adaptive conflict.
There is no direct proof that a novel function arose in the pre-duplication mtHsp70 and reduced the ability of the ancestral protein to perform any of its other tasks. Such proof would require biochemical characterization of the protein translated from an ancestral gene reconstruction. Additionally, future investigation of Ecm10p functions, and of the selective forces acting on this third yeast mtHsp70 duplicate, could bolster the case for adaptive conflict in the pre-duplication mtHsp70 if ECM10 has also optimized an ancestral SSC1 function. Finally, it remains to be determined whether a more efficient Jac1p-Ssc1p interaction optimizes the Fe/S cluster assembly pathway to increase yeast fitness.

The functional specialization of SSQ1 also resembles the subfunctionalization that optimized GAL1 and GAL3 functions after release from antagonistic pleiotropy by gene duplication (Hittinger and Carroll 2007). While promoter divergence produced the evolved phenotypes of differential control over GAL1 and GAL3 transcription, regulatory evolution of the mtHsp70s was not examined in this study. However, previous observations of lower SSQ1 expression than SSC1 expression within S. cerevisiae mitochondria suggest that SSQ1 and SSC1 have also undergone regulatory divergence. Another possibility is that the specialized function of SSQ1 hinges on a mutation analogous to a GAL1 Ser-Ala dipeptide that was found to be sufficient for galactokinase activity when added to the active site of GAL3, the co-inducer of the galactose uptake pathway. Because deleting the dipeptide from GAL1, and from a pre-duplication GAL1/GAL3 bifunctional protein, did not improve the co-inducer function of the encoded proteins, the Ser-Ala mutation of GAL1 could not be identified as a source of adaptive conflict. The effect of the Ser-Ala mutation on galactokinase function depended on the background of residues present at other sites within GAL1.
Similarly, mutations may have arisen in SSQ1 that now contribute to functional specialization but were fixed as compensatory mutations, secondary to mutations fixed as a direct result of release from antagonistic pleiotropy. It is reasonable to hypothesize that the opportunity for functional specialization of proteins such as pigment biosynthesis enzymes, galactose pathway components, or mtHsp70s may extend, through coevolution, to molecules that participate in a common biological pathway. The release of SSQ1 from antagonistic pleiotropy has influenced the evolution of JAC1, the J-protein partner also specialized in this pathway. JAC1 coevolution with the mtHsp70 paralogs has allowed its interaction with SSQ1 to become more efficient, while decreasing its efficiency of ATPase stimulation in SSC1.

Support for Hypothesis 1: Selective constraint has been relaxed in SSC1 in the presence of its paralog, SSQ1.

An equal rate of SSC1 evolution in the presence and absence of SSQ1 was rejected: SSC1 evolved faster in the presence of its paralog, SSQ1. This result is consistent with the conclusions of Scannell and Wolfe (2008), who found that recent paralogs tend to evolve at an increased rate compared to singleton genes. Here, we suggest that the functional specialization of SSQ1 has relieved SSC1 of the Fe/S cluster biogenesis task, thereby relaxing the selective constraint acting on SSC1 for this particular function. The availability of the specialized SSQ1:JAC1 pair could have rendered the SSC1:JAC1 cooperation less important, allowing a greater proportion of nonsynonymous codon changes to be tolerated in SSC1, particularly at sites encoding residues that contribute to interaction with JAC1 or to other, unidentified aspects of Fe/S cluster biogenesis.
In the absence of SSQ1, however, antagonistic pleiotropy would continue to impose evolutionary constraint on SSC1, because SSC1 would be required to perform Fe/S cluster formation in addition to protein import and folding. While there is not yet evidence that SSC1 has improved any other pre-duplication mtHsp70 function in the presence of SSQ1, escape from adaptive conflict might allow SSC1 to perform a chaperone task, such as peptide translocation across the inner mitochondrial membrane, with greater efficiency if optimization is permitted in the presence of paralogs. This seems plausible if the relaxation of selective constraint on SSC1 among clades that harbor duplicate genes persists for tens of millions of years (Scannell and Wolfe 2008). An extended period of relaxed constraint could allow many mutations to be fixed by drift, which together could produce an altered phenotype.

Support for Hypothesis 2: SSQ1 is under less selective constraint than SSC1 because SSQ1 has fewer encoded functions to maintain.

Evolution of SSQ1 and SSC1 at equal rates was rejected: a comparison of the average rates of codon evolution showed that SSQ1 evolved faster than SSC1. This rate asymmetry is consistent with other published analyses of evolutionary rate asymmetry in paralogs (Conant and Wagner 2003; Zhang et al. 2003). An examination of gene duplicates created by a whole-genome duplication in yeast revealed that the gene with the more dramatic evolutionary rate increase immediately following duplication remained the faster-evolving member of the paralog pair. It is therefore likely that SSQ1 will continue to evolve with a greater ω than SSC1.
Although no sites under positive selection (ω greater than 1) were identified in extant taxa, the faster rate of SSQ1 evolution compared to SSC1 is interpreted as a result of relaxed constraint, depressed expression level, or both. The increased rate of SSQ1 evolution could be due to a relaxation of selective constraint, independent of gene expression, that allowed specialization in a single function. Relaxed constraint on SSQ1, compared to the ancestral single-copy mtHsp70, likely resulted initially from the ability of SSC1 and SSQ1 to complement one another reciprocally, thereby providing robustness against deleterious mutations. For example, if a mutation in SSC1 diminished its function as an Fe/S cluster biogenesis chaperone, SSQ1 would have been able to restore this function. We propose that SSQ1 would then have been free to optimize its efficiency in Fe/S cluster assembly in the presence of SSC1, which could functionally replace SSQ1 for any of the many subfunctions that may have been compromised during Fe/S cluster assembly optimization. As a result, disproportionately many sites in the protein may now be under relaxed selection and thus evolve faster than in the multifunctional SSC1. Alternatively, expression divergence between SSQ1 and SSC1 alone is a viable explanation for the faster rate of SSQ1 evolution. This line of reasoning is supported by the study of Drummond et al. (2005), which concluded that gene expression level is the single greatest determinant of protein evolution, explaining more than half of the variation in nucleotide substitution rates of genes in S. cerevisiae. Although gene length, dispensability, and recombination have also been suggested as predictors of gene evolutionary rates, these variables appear to play a minor role.
In addition, expression levels have been shown to influence these factors, often confounding efforts to establish them as direct causes. Drummond et al. (2005) showed that genes with lower expression tend to evolve faster, and offered an explanation for this observation that is independent of selection on protein function. Errors during mRNA translation are known to produce misfolded and toxic protein products that impose fitness costs on a cell by disrupting metabolic processes (Bucciantini et al. 2002). It was therefore proposed that selection acts to increase the translational accuracy of a sequence (i.e., by using codons read by the most abundant tRNAs) and to increase the robustness of a sequence to translational errors. Favoring amino acid sequences that fold into functional proteins despite missense errors increases translational robustness (Drummond et al. 2005). A subsequent study (Drummond and Wilke 2008) identified protein misfolding costs as the underlying selective pressure responsible for the covariation of evolutionary rates, codon preference, and gene expression within and between genes, observed in model organisms ranging in complexity from E. coli to humans. The authors showed that translational accuracy, translational robustness, the synthesis of full-length peptides, and the tendency to fold properly all correlate positively with gene expression level. The cost of protein misfolding thus explains the selective constraint that produces a greater proportion of optimal codons at conserved sites in the most highly expressed genes. When a gene is expressed at a higher level, as SSC1 is, translation occurs more frequently, increasing the number of opportunities for detrimental errors, so that accuracy and robustness have a greater influence on the cell's overall fitness.
Therefore, by selecting against protein sequences with toxic characteristics (such as a propensity for aggregation) when translated incorrectly, the same evolutionary forces may indirectly select for a protein structure with enhanced thermodynamic stability. Together, selection for increased translational accuracy and robustness may lower the fixation rates of both synonymous and nonsynonymous mutations, imposing a form of evolutionary constraint at the sequence level. A higher expression level may therefore increase evolutionary constraint on SSC1, while the relatively lower expression level of SSQ1 may relax constraint. Divergence in the expression levels of paralogous genes could occur as a consequence of accelerated evolution of promoter or regulatory regions, whether adaptive or neutral. This has been suggested to be common in eukaryotes (Zhang 2003); note that regulatory sequence evolution would go undetected in protein-coding ω analyses. Alternatively, divergence in paralog expression levels can result from other sequence changes that contribute to differences in mRNA stability or chromatin structure between the gene duplicates (Li et al. 2005). Indeed, the approximately 1000-fold lower concentration of Ssq1 protein in the mitochondria of S. cerevisiae (Voisine et al. 2000) is accompanied by a lower codon usage bias and a higher overall rate of nucleotide substitution, indicative of relaxed selective constraint. Whereas the codon adaptation index (CAI) of SSC1 is reported to be 0.521, that of SSQ1 is much lower, at 0.148 (SGD project, Sept. 2008), indicating less selective constraint acting on the third-position nucleotides of SSQ1 codons. Less constraint on these nucleotides could allow SSQ1 to tolerate more synonymous substitutions than SSC1.
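The codon adaptation index compared above is usually computed as the geometric mean, over a gene's codons, of each codon's relative adaptiveness: its count in a reference set of highly expressed genes divided by the count of the most-used synonymous codon for the same amino acid. The sketch below implements that definition with the standard genetic code. The toy reference set, the 0.5 pseudocount for codons absent from the reference, and the exclusion of Met, Trp, and stop codons are assumptions of this illustration, and the function names are hypothetical. It will not reproduce the SGD values quoted above, which depend on SGD's own reference set.

```python
import math
from collections import Counter, defaultdict

# Standard genetic code in TCAG order; '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS))


def split_codons(seq: str) -> list[str]:
    """In-frame codons of a coding sequence (a trailing partial codon is dropped)."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def relative_adaptiveness(reference_seqs: list[str]) -> dict[str, float]:
    """w = codon count / count of the most-used synonymous codon, from a reference set."""
    counts = Counter(c for s in reference_seqs for c in split_codons(s) if c in CODON_TABLE)
    families: dict[str, list[str]] = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)
    weights: dict[str, float] = {}
    for aa, codons in families.items():
        if aa in "*MW":  # stops and single-codon amino acids carry no usage signal
            continue
        top = max(counts[c] for c in codons)
        if top == 0:
            continue  # amino acid absent from the reference set
        for c in codons:
            # Assumed convention: unseen codons get a count of 0.5 so log(w) is finite.
            weights[c] = max(counts[c], 0.5) / top
    return weights


def cai(seq: str, weights: dict[str, float]) -> float:
    """Geometric mean of w over the scorable codons of `seq`."""
    logs = [math.log(weights[c]) for c in split_codons(seq) if c in weights]
    if not logs:
        raise ValueError("no scorable codons")
    return math.exp(sum(logs) / len(logs))


# Toy reference set (hypothetical): GCT is the preferred alanine codon.
w = relative_adaptiveness(["GCTGCTGCTGCC"])
print(cai("GCTGCT", w))  # 1.0: only preferred codons
print(cai("GCTGCC", w))  # sqrt(1/3), about 0.577
```

A gene built only from preferred codons scores 1, and each rarely used codon pulls the geometric mean down, which is why a low CAI such as that of SSQ1 is read as weak selection on codon usage.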
Because a higher dS lowers ω (ω = dN/dS), the elevated ω of SSQ1 is all the more notable given that SSQ1 also has an elevated dS, as observed in a gene-wide average of site-specific dS values estimated across tree branches and compared with SSC1 dS averages (data not shown).

Lack of support for Hypothesis 3: The rate of JAC1 evolution is positively correlated with the rate of SSQ1 evolution because JAC1 and SSQ1 are coevolving.

An equal rate of JAC1 evolution in clades encoding SSQ1 and in clades lacking SSQ1 is rejected. However, JAC1 evolution decelerated after the mtHsp70 gene duplication. Here we have examined the influence of a gene duplication event on the selective forces driving the molecular evolution of protein partners specialized in Fe/S cluster assembly. We have demonstrated that JAC1 evolves faster in the absence of SSQ1. Our proposed explanation is that selective constraint acts on JAC1 to preserve an optimized physical interaction with SSQ1, which resulted from the coevolution of JAC1 and the duplicate, specialized mtHsp70. While JAC1 now evolves slowly in the presence of SSQ1, it is conceivable that the rate of JAC1 evolution was initially accelerated after the mtHsp70 gene duplication that gave rise to SSQ1. JAC1 may subsequently have reached an adaptive peak quickly, together with SSQ1, in its ability to facilitate Fe/S cluster assembly. Alternatively, JAC1 may have been brought under constraint by some other influence. The rapid rate of JAC1 evolution, however, precludes testing this hypothesis as was done for SSQ1, because JAC1 ancestral states could not be reconstructed. We speculate that, after an initial rate acceleration during a co-adaptive arms race to fix complementary changes at the sites that physically interact between Jac1p and Ssq1p, Jac1p evolution slowed to maintain efficient cooperation with Ssq1p.
An alternative explanation for the faster rate of JAC1 evolution in the Aspergillus and Fusarium clades could be a smaller effective population size in the representative species compared with those of the Candida and Saccharomyces clades, which would in turn reduce the efficiency of purifying selection. Although JAC1 expression data are unavailable for the fungal species whose sequences were analyzed, it is possible that JAC1 expression has increased in organisms possessing SSQ1 to balance molecular stoichiometry. Indeed, a higher average codon bias, consistent with higher levels of gene expression (Wang et al. 2005), was observed for JAC1 from clades encoding SSQ1. JAC1 CAI means and standard errors calculated for each clade were as follows: Saccharomyces, 0.273 ± 0.013; Candida, 0.249 ± 0.010; Fusarium, 0.195 ± 0.013; and Aspergillus, 0.183 ± 0.010. Thus, the third nucleotide positions of JAC1 codons from clades encoding SSQ1 are likely to be under stronger selective constraint than those of JAC1 from clades lacking SSQ1. Increased constraint on third-position nucleotides, together with an overall increase in constraint to preserve translational robustness when gene expression is elevated, may be depressing ω in JAC1 from Saccharomyces and Candida taxa.

Support for Hypothesis 4: SSQ1 has undergone adaptive evolution to optimize the Ssq1p-Jac1p interaction important for Fe/S cluster biogenesis.

Evolution of SSQ1 in the absence of positive selection is rejected. SSQ1 evolved under positive selection in the lineage immediately following its inception. The antagonistic pleiotropy that characterized the ancestral mtHsp70 prior to gene duplication may have been broken by positive selection in ancestral SSQ1 immediately following the duplication.
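The effective population size argument above can be made concrete with Kimura's diffusion approximation for the fixation probability of a new mutation. The Python sketch below assumes a diploid Wright-Fisher population whose effective size equals its census size n, with the mutation starting at frequency 1/(2n); the selection coefficient and population sizes in the usage lines are illustrative values, not estimates for the fungi studied here.

```python
import math


def fixation_probability(s, n):
    """Kimura's approximation for a new mutation (initial frequency 1/(2n))
    with selection coefficient s in a diploid population, assuming Ne == n."""
    if s == 0:
        return 1.0 / (2 * n)  # neutral expectation
    if s > 0:
        return math.expm1(-2 * s) / math.expm1(-4 * n * s)
    # s < 0: exp(-4ns) can overflow, so use log(e**x - 1) = x + log(1 - e**-x).
    x = -4 * n * s
    log_num = math.log(math.expm1(-2 * s))
    log_den = x + math.log(-math.expm1(-x))
    return math.exp(log_num - log_den)


def relative_to_neutral(s, n):
    """Fixation probability scaled by the neutral expectation 1/(2n)."""
    return fixation_probability(s, n) * 2 * n


# Illustrative only: the same slightly deleterious mutation (s = -1e-4)
# in populations of increasing size.
for n in (1_000, 10_000, 100_000):
    print(n, relative_to_neutral(-1e-4, n))
```

The slightly deleterious mutation fixes at close to the neutral rate when n is small but is almost never fixed when n is large, which is the sense in which purifying selection is less efficient in smaller populations.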
Positive selection may have enabled a burst of adaptive evolution that optimized the Jac1p-Ssq1p partnership important for Fe/S cluster biogenesis and promoted rapid subfunctionalization of SSQ1, thus relaxing constraint on SSC1 at sites necessary for interaction with Jac1p. The retention of SSQ1 in the genome following the gene duplication event may be attributable to this subfunctionalization, possibly involving an adaptive sweep at the ancestrally positively selected sites His3'5, Lys3'7, Glum, and Leu346 within the ATPase domain. Given that past studies have shown that significant adaptive shifts can be instigated by very few amino acid substitutions (Golding and Dean 1998), rapid evolution of the mtHsp70 SSQ1 may have driven its coevolution with JAC1 to specialize the J-protein-mtHsp70 pair. Alternatively, the signature of an initial burst of selection detected in the ancestral SSQ1 sequence may not reflect functional adaptation at all sites, but instead the fixation of compensatory substitutions that rescued a decrease in fitness arising from deleterious mutations within the gene or even elsewhere in the genome (Pal et al. 2006).

Future Directions

Possible future lines of research include protein structural and functional analyses, via experimental genetics and biochemistry, to elucidate the roles of particular SSC1, SSQ1, and JAC1 sites in Fe/S cluster biogenesis. Site-directed mutagenesis and reconstruction of inferred ancestral gene sequences, followed by biochemical characterization of ‘resurrected’ ancestral proteins, is a technique that has been used successfully to gain insight into the fates of paralogous genes following gene duplication (Zhang and Rosenberg 2002). Additional experiments could include mutating SSC1 sites to the residues found at the corresponding SSQ1 sites that were identified as having undergone positive selection immediately after the gene duplication.
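Designing the SSC1-to-SSQ1 substitutions proposed above requires translating alignment column numbers, in which selected sites are usually reported, into residue numbers in the SSC1 protein. The Python sketch below shows one way to do that bookkeeping; the function names and the toy sequences are hypothetical. Residue numbers here are counted from the first residue present in the alignment, so real primer design would need to account for any regions trimmed from the alignment.

```python
def column_to_residue(aligned_seq):
    """Map 1-based alignment columns to 1-based residue numbers in the ungapped sequence (None at gaps)."""
    mapping, residue = {}, 0
    for col, aa in enumerate(aligned_seq, start=1):
        if aa == "-":
            mapping[col] = None
        else:
            residue += 1
            mapping[col] = residue
    return mapping


def design_substitutions(target_aln, donor_aln, columns):
    """List substitutions (e.g. 'K2R') that place donor residues into the target
    at the given alignment columns, skipping gaps and identical residues."""
    if len(target_aln) != len(donor_aln):
        raise ValueError("sequences must come from the same alignment")
    col_to_res = column_to_residue(target_aln)
    mutations = []
    for col in columns:
        t, d = target_aln[col - 1], donor_aln[col - 1]
        if "-" in (t, d) or t == d:
            continue
        mutations.append(f"{t}{col_to_res[col]}{d}")
    return mutations


def apply_substitutions(target_aln, donor_aln, columns):
    """Return the ungapped target sequence carrying the donor residues at the given columns."""
    seq = list(target_aln)
    for col in columns:
        if target_aln[col - 1] != "-" and donor_aln[col - 1] != "-":
            seq[col - 1] = donor_aln[col - 1]
    return "".join(seq).replace("-", "")


# Toy aligned fragments standing in for SSC1 (target) and SSQ1 (donor).
ssc1_toy, ssq1_toy = "MK-AVLE", "MRHSVIE"
print(design_substitutions(ssc1_toy, ssq1_toy, [2, 3, 4, 6]))  # ['K2R', 'A3S', 'L5I']
print(apply_substitutions(ssc1_toy, ssq1_toy, [2, 3, 4, 6]))   # MRSVIE
```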
It would be interesting to determine whether those SSQ1-derived residues improve the efficiency of Ssc1p ATPase activity in the presence of Jac1p and, if so, whether those sites are necessary for direct contact with Jac1p, the nucleotide exchange factor, or the nucleotide. An SSC1 engineered to encode an ATPase domain that more closely resembles that of SSQ1 might also be predicted to have reduced chaperone and stress-mediation functions. Such a result would directly demonstrate a tradeoff within the mtHsp70 between optimization of Jac1p-mediated ATPase activity and performance in other functions, and would pinpoint the source of antagonistic pleiotropy in the ancestral mtHsp70 to the ATPase domain. Conversely, sites in SSQ1 whose homologous positions in SSC1 are under relaxed selection are predicted to be involved in Fe/S cluster biogenesis, as these were the sites predicted to have been released from selection by subfunctionalization; manipulating them would test this prediction. In addition, the sequences encoding the substrate-binding, ATPase, and variable domains of SSQ1 should be independently manipulated to contain the residues found at sites under strict selective constraint in SSC1. Such manipulation might lead investigators to attribute the increased Ssq1p ATPase activity to a domain other than the ATPase domain. Identifying sites in JAC1 that have evolved at a faster rate in the Fusarium and Aspergillus clades but at a slower rate in the Candida and Saccharomyces clades might also help guide similar site-specific mutagenesis of JAC1. The role of regulatory sequence evolution should also be explored, perhaps by evaluating the effect of exchanging the promoters of the paralogous mtHsp70s.
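One simple way to shortlist candidate JAC1 sites of the kind described above is to compare per-column amino acid diversity between the two groups of clades in an alignment. The Python sketch below uses Shannon entropy as that diversity measure; it is a crude screen offered as an assumption, not a substitute for the codon-based clade models used in this study, and the function names, the 1-bit threshold, and the toy sequences are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the residues in one alignment column, ignoring gaps."""
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    return -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())


def flag_divergent_sites(fast_group, slow_group, min_diff=1.0):
    """Return 1-based columns where the 'fast' clades are at least `min_diff`
    bits more variable than the 'slow' clades. All sequences must be aligned."""
    length = len(fast_group[0])
    if any(len(s) != length for s in fast_group + slow_group):
        raise ValueError("all sequences must have the same aligned length")
    flagged = []
    for i in range(length):
        h_fast = column_entropy([s[i] for s in fast_group])
        h_slow = column_entropy([s[i] for s in slow_group])
        if h_fast - h_slow >= min_diff:
            flagged.append(i + 1)
    return flagged


# Toy aligned fragments: a variable group and a conserved group.
fast = ["ACDE", "AKDE", "AMDE", "AWDE"]
slow = ["ACDE", "ACDE", "ACDE", "ACDE"]
print(flag_divergent_sites(fast, slow))  # [2]
```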
One expectation might be that replacing the SSC1 promoter with the one that regulates SSQ1 transcription will decrease the expression level of SSC1 within the mitochondrial matrix. Alteration in the number, orientation, and/or sequence of transcription factor binding sites after mtHsp70 gene duplication might be expected to result in such regulatory differences. Another possible outcome of mtHsp70 promoter swapping is that SSQ1 expression increases when placed under the control of the SSC1 promoter. However, the amount of active Ssq1p produced may still be less than that of Ssc1p, given that SSQ1 has a lower codon bias and therefore might be more prone to translational errors that result in truncated or misfolded proteins. Such studies would be important to verify that the expression level difference between SSC1 and SSQ1 is due to cis-regulatory evolution rather than to other forms of regulation, such as feedback inhibition. The ultimate goal should be to elucidate how the mtHsp70 paralogs differ in their interaction with Jac1p and how these differences confer fitness differences through Fe/S cluster biogenesis. Thus, the direct impact of the increased Jac1p stimulation of Ssq1p ATPase activity on the level of active Fe/S-containing proteins produced in vivo must be established. Further, the fitness advantage of an optimized Fe/S cluster biosynthesis pathway must be demonstrated through the observation of an adaptive phenotype. This will not be a trivial undertaking, as the advantage of a phenotype often varies with growth conditions and with the presence of ecological competitors. However, as with any molecular process, if we are to advance our understanding of Fe/S cluster biosynthesis, we must study the pathway components in the context of evolutionary and ecological dynamics.
Appendix A
Fungal Mitochondrial Heat Shock Protein Coding Region DNA Sequence Sources

[Table of sequence sources not recoverable from the scanned source]

Appendix B
Fungal Mitochondrial Heat Shock Protein Multiple Sequence Alignments

Multiple alignments of amino acid sequences translated from the protein-coding regions of mitochondrial heat shock proteins (mtHsps) were performed using CLUSTAL W (Thompson et al. 1994) with default gap penalties, followed by manual trimming to remove gaps. Alignment columns highlighted in black denote sites sharing 100% identity among all taxa.
Taxon name abbreviations used are listed in the table below:

Abbreviation  Fungal species
Scer_Y        Saccharomyces cerevisiae RM11
Scer_R        Saccharomyces cerevisiae YJM789
Spar          Saccharomyces paradoxus
Smik          Saccharomyces mikatae
Sbay          Saccharomyces bayanus
Scas          Saccharomyces castellii
Cgla          Candida glabrata
Calb          Candida albicans
Ctro          Candida tropicalis
Cpar          Candida parapsilosis
Cgui          Candida guilliermondii
Cdub          Candida dubliniensis
Clus          Candida lusitaniae
Dhan          Debaryomyces hansenii
Foxy          Fusarium oxysporum
Fgra          Fusarium graminearum
Fver          Fusarium verticillioides
Fsol          Fusarium solani
Ncra          Neurospora crassa
Tree          Trichoderma reesei
Pans          Podospora anserina
Nfis          Neosartorya fischeri
Anid          Aspergillus nidulans
Ater          Aspergillus terreus
Acla          Aspergillus clavatus
Afla          Aspergillus flavus
Anig          Aspergillus niger
Aory          Aspergillus oryzae
Afum          Aspergillus fumigatus

[Figure B1: SSC1 amino acid multiple sequence alignment]
[Figure B2: SSC1 and SSQ1 combined amino acid multiple sequence alignment]

[A further alignment figure is not recoverable from the scanned source]
Appendix C
Fungal Mitochondrial Heat Shock Protein Phylogenetic Gene Tree Input Topologies for codeml Evolutionary Rate Analysis

Tree topologies input into codeml represent composite structures of highly supported relationships from trees inferred using the following sequence partitions: 1st, 2nd, and 3rd nucleotide positions; 1st and 2nd positions combined; all nucleotides; and amino acids. Maximum Parsimony, Maximum Likelihood, and Bayesian Inference methods were used. Branches were manually collapsed if bootstrap support or posterior probability was below 90% or 0.9, respectively. All topologically unique composite trees are shown.
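The branch-collapsing rule described above can be sketched in Python. The Node class, the function names, and the toy tree are hypothetical; the sketch only illustrates the logic of turning poorly supported internal branches into polytomies and is not the software actually used to build the composite trees. The threshold must match the support scale: 90 for bootstrap percentages, 0.9 for posterior probabilities.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical minimal tree node; `support` refers to the branch leading to this node."""
    name: Optional[str] = None
    support: Optional[float] = None
    children: List["Node"] = field(default_factory=list)


def collapse_weak_branches(node, threshold):
    """Return a new tree in which internal branches with support below
    `threshold` are collapsed, turning them into polytomies."""
    new_children = []
    for child in node.children:
        child = collapse_weak_branches(child, threshold)  # process bottom-up
        if child.children and child.support is not None and child.support < threshold:
            new_children.extend(child.children)  # splice grandchildren into this node
        else:
            new_children.append(child)
    return Node(node.name, node.support, new_children)


def to_newick(node):
    """Topology-only Newick string (no branch lengths or support values)."""
    if not node.children:
        return node.name
    return "(" + ",".join(to_newick(c) for c in node.children) + ")"


# Toy tree with bootstrap percentages.
tree = Node(children=[
    Node(support=95, children=[Node("Scer_Y"), Node("Scer_R")]),
    Node(support=60, children=[Node("Calb"), Node("Cdub")]),
    Node("Anid"),
])
print(to_newick(collapse_weak_branches(tree, 90)) + ";")
# ((Scer_Y,Scer_R),Calb,Cdub,Anid);
```

Processing children before deciding on their parent branch means that nested weak branches collapse all the way up to the nearest well-supported ancestor.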
Taxon name abbreviations are as listed in Appendix B.

[Figure C1: SSC1 Bayesian Inference Tree 1]
[Figure C2: SSC1 Bayesian Inference Tree 2]
[Figure C3: SSC1 Bayesian Inference Tree 3]
[Figure C4: SSC1 Bayesian Inference Tree 4]
[Figure C5: SSC1 Maximum Likelihood Tree 1]
[Figure C6: SSC1 Maximum Likelihood Tree 2]
[Figure C7: SSC1 Maximum Likelihood Tree 3]
[Figure C8: SSC1 Maximum Parsimony Tree 1]
[Figure C9: SSC1 Maximum Parsimony Tree 2]
[Figure C10: SSC1 Maximum Parsimony Tree 3]
[Figure C11: SSC1 Maximum Parsimony Tree 4]
[Figure C12: SSC1 and SSQ1 Bayesian Inference Tree 1]
[Figure C13: SSC1 and SSQ1 Bayesian Inference Tree 2]
[Figure C14: SSC1 and SSQ1 Bayesian Inference Tree 3]
[Figure C15: SSC1 and SSQ1 Bayesian Inference Tree 4]
[Figure C16: SSC1 and SSQ1 Bayesian Inference Tree 5]
[Figure C17: SSC1 and SSQ1 Bayesian Inference Tree 6]
[Figure C18: SSC1 and SSQ1 Maximum Likelihood Tree]
[Figure C19: SSC1 and SSQ1 Maximum Parsimony Tree 1]
[Figure C20: SSC1 and SSQ1 Maximum Parsimony Tree 2]
[Figure C21: JAC1 Saccharomyces Bayesian Inference/Maximum Parsimony Tree]
[Figure C22: JAC1 Saccharomyces Maximum Likelihood Tree]
[Figure C23: JAC1 Candida Bayesian Inference Tree]
[Figure C24: JAC1 Candida Maximum Likelihood Tree]
[Figure C25: JAC1 Candida Maximum Parsimony Tree]
[Figure C26: JAC1 Fusarium Bayesian Inference/Maximum Likelihood/Maximum Parsimony Tree]
[Figure C27: JAC1 Aspergillus Bayesian Inference Tree]
[Figure C28: JAC1 Aspergillus Maximum Likelihood Tree]
[Figure C29: JAC1 Aspergillus Maximum Parsimony Tree]

Appendix D
Evolutionary Rate Test Specifications Used in Control Files to Run codeml of PAML

Site-Specific Model
model = 0
NSsites = 7 (ω distribution approximated as a beta distribution)
ncatG = 3 or 10 (number of categories predefined in the ω distribution)

Branch-Site Model: Model A as defined by Zhang et al. (2005)
model = 2
NSsites = 2 (ω distribution includes sites under positive selection)
ncatG = 3 (number of categories predefined in the ω distribution)
fix_kappa = 0 (kappa to be estimated)
fix_omega = 0 (omega to be estimated)

Null model for branch-site test
model = 2
NSsites = 2 (ω distribution includes sites under positive selection)
ncatG = 3 (number of categories predefined in the ω distribution)
fix_kappa = 1 (kappa fixed)
kappa = 1 (fixed value of kappa)
fix_omega = 1 (omega fixed)
omega = 1 (fixed value of omega)

Clade Model: Model D as defined by Bielawski and Yang (2004)
model = 3
NSsites = 3 (discrete ω distribution)
ncatG = 3 (number of categories predefined in the ω distribution)

Null model for clade test
model = 0 (ω distribution and estimated values apply to all branches of the tree)
NSsites = 0 (one gene-wide average ω estimated)
ncatG = 1 (number of categories predefined in the ω distribution)

Appendix E
Likelihood Ratio Tests of codeml Evolutionary Rate Analyses
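Each likelihood ratio test in this appendix compares twice the difference between two log-likelihoods against a χ² distribution with the stated degrees of freedom. The Python sketch below reproduces that calculation with the standard library only; it assumes an integer number of degrees of freedom and uses the closed-form χ² survival function for integer df instead of a statistics package. The function names are illustrative and are not part of PAML.

```python
import math


def lrt_statistic(lnl_a, lnl_b):
    """Twice the absolute difference between two log-likelihoods, 2Δ(lnL)."""
    return 2.0 * abs(lnl_a - lnl_b)


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)**i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_{r=1}^{(df-1)/2} x**(r-1) / (1*3*...*(2r-1))
    term, total = 1.0, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 * x / math.pi) * math.exp(-half) * total


# Statistics reported for the JAC1 Saccharomyces clade in Table E3 (df = 7).
print(f"{chi2_sf(1.25, 7):.3f}")   # 0.990
print(f"{chi2_sf(60.55, 7):.2e}")  # 1.17e-10
```

Both printed values agree with the P-values reported for the Saccharomyces clade in Table E3, which suggests the tabulated P-values are upper-tail χ² probabilities with 7 degrees of freedom.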
[The likelihood ratio test tables that precede Table E3 are illegible in the scanned source.]

Table E3: Likelihood Ratio Test Comparison of JAC1 Site-Specific Model Test Outputs

Clade          Tree       lnL, 3 ω categories  lnL, 10 ω categories  2Δ(lnL)  df  P-value
Saccharomyces  MP/BI      -2248.93             -2249.55              1.25     7   9.90E-01
Saccharomyces  ML         -2253.53             -2283.81              60.55    7   1.17E-10 *
Candida        BI         -2767.48             -2765.66              3.63     7   8.21E-01
Candida        ML         -2797.53             -2796.00              3.06     7   8.79E-01
Candida        MP         -2791.03             -2787.95              6.16     7   5.21E-01
Fusarium       MP/BI/ML   -2786.48             -2789.61              14.00    7   5.12E-02
Aspergillus    MP         -2508.72             -2508.73              0.03     7   1.00E+00
Aspergillus    BI         -2801.61             -2790.07              23.08    7   1.65E-03 *
Aspergillus    ML         -2856.26             -2847.00              18.51    7   9.88E-03 *

* Denotes P-values significant at P < 0.05.
Black boxes indicate the number of ω categories in the site-specific model that was significantly most likely to predict the data.
Negative log-likelihood values shaded in gray indicate the overall best likelihood score for the given clade obtained among all site-specific tests.

Table E4: Likelihood Ratio Test Comparison of SSQ1 Branch-Site Model Test Outputs

Tree       lnL, Model A (ω estimated)  lnL, Model A (ω fixed = 1)  2Δ(lnL)  df  P-value
BI tree 1  -31029.34                   -31129.88                   201.07   2   2.18E-44 *
BI tree 2  -31023.33                   -31128.51                   210.35   2   2.11E-46 *
BI tree 3  -31037.84                   -31139.44                   203.21   2   7.48E-45 *
BI tree 4  -31043.68                   -31124.20                   161.03   2   1.08E-35 *
BI tree 5  -34199.23                   -34292.48                   186.50   2   3.18E-41 *
BI tree 6  -31088.51                   -31166.09                   155.15   2   2.04E-34 *
ML tree    -31158.86                   -31254.19                   190.67   2   3.95E-42 *
MP tree 1  -31374.70                   -31463.91                   178.43   2   1.80E-39 *
MP tree 2  -31351.81                   -31458.31                   213.00   2   5.59E-47 *

* Denotes P-values significant at P < 0.05.
Black boxes indicate the model significantly most likely to predict the data.
Negative log-likelihood values shaded in gray indicate the overall best likelihood score obtained among all branch-site PAML tests.

REFERENCES

Abascal F, Zardoya R, Posada D (2005) ProtTest: selection of best-fit models of protein evolution. Bioinformatics 21(9): 2104-2105.
Andrew AJ, Dutkiewicz R, Knieszner H, Craig EA, Marszalek J (2006) Characterization of the interaction between the J-protein Jac1p and the scaffold for Fe-S cluster biogenesis, Isu1p. J Biol Chem 281(21): 14580-14587.
Baumann F, Milisav I, Neupert W, Herrmann JM (2000) Ecm10, a novel hsp70 homolog in the mitochondrial matrix of the yeast Saccharomyces cerevisiae. FEBS Lett 487(2): 307-312.
Benedict MQ, Cockburn AF, Seawright JA (1993) The Hsp70 heat-shock gene family of the mosquito Anopheles albimanus. Insect Mol Biol 2(2): 93-102.
Bettencourt BR, Feder ME (2002) Rapid concerted evolution via gene conversion at the Drosophila hsp70 genes. J Mol Evol 54(5): 569-586.
Bielawski JP, Yang Z (2003) Maximum likelihood methods for detecting adaptive evolution after gene duplication. J Struct Funct Genomics 3(1-4): 201-212.
Bielawski JP, Yang Z (2004a) Maximum Likelihood Methods for Detecting Adaptive Protein Evolution.
Bielawski JP, Yang Z (2004b) A maximum likelihood method for detecting functional divergence at individual codon sites, with application to gene family evolution. J Mol Evol 59(1): 121-132.
Boorstein WR, Ziegelhoffer T, Craig EA (1994) Molecular evolution of the HSP70 multigene family. J Mol Evol 38(1): 1-17.
Brown CJ, Todd KM, Rosenzweig RF (1998) Multiple duplications of yeast hexose transport genes in response to selection in a glucose-limited environment. Mol Biol Evol 15(8): 931-942.
Bucciantini M, Giannoni E, Chiti F, Baroni F, Formigli L et al. (2002) Inherent toxicity of aggregates implies a common mechanism for protein misfolding diseases. Nature 416(6880): 507-511.
Bukau B, Horwich AL (1998) The Hsp70 and Hsp60 chaperone machines. Cell 92(3): 351-366.
Cheetham ME, Caplan AJ (1998) Structure, function and evolution of DnaJ: conservation and adaptation of chaperone function. Cell Stress Chaperones 3(1): 28-36.
Conant GC, Wagner A (2003) Asymmetric sequence divergence of duplicate genes. Genome Res 13(9): 2052-2058.
Craig EA (1989) Essential roles of 70kDa heat inducible proteins. Bioessays 11(2-3): 48-52.
Davis JC, Petrov DA (2004) Preferential duplication of conserved proteins in eukaryotic genomes. PLoS Biol 2(3): E55.
DeBry RW (1999) Maximum likelihood analysis of gene-based and structure-based process partitions, using mammalian mitochondrial genomes. Syst Biol 48(2): 286-299.
Des Marais DL, Rausher MD (2008) Escape from adaptive conflict after duplication in an anthocyanin pathway gene. Nature 454(7205): 762-765.
Diamond ME, Dowhanick JJ, Nemeroff ME, Pietras DF, Tu CL et al. (1989) Overlapping genes in a yeast double-stranded RNA virus. J Virol 63(9): 3983-3990.
Drummond DA, Wilke CO (2008) Mistranslation-induced protein misfolding as a dominant constraint on coding-sequence evolution. Cell 134(2): 341-352.
Drummond DA, Bloom JD, Adami C, Wilke CO, Arnold FH (2005) Why highly expressed proteins evolve slowly. Proc Natl Acad Sci U S A 102(40): 14338-14343.
Endo T, Ikeo K, Gojobori T (1996) Large-scale search for genes on which positive selection may operate. Mol Biol Evol 13(5): 685-690.
Feder ME, Cartano NV, Milos L, Krebs RA, Lindquist SL (1996) Effect of engineering Hsp70 copy number on Hsp70 expression and tolerance of ecologically relevant heat shock in larvae and pupae of Drosophila melanogaster. J Exp Biol 199(Pt 8): 1837-1844.
Felsenstein J (1978) Cases in which parsimony or compatibility methods can be positively misleading. Syst Zool 27: 401-419.
Felsenstein J (1981) Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol 17(6): 368-376.
Felsenstein J (2003) Inferring Phylogenies. United States: Sinauer Associates.
Fitzpatrick DA, Logue ME, Stajich JE, Butler G (2006) A fungal phylogeny based on 42 complete genomes derived from supertree and combined gene analysis. BMC Evol Biol 6: 99.
Force A, Lynch M, Pickett FB, Amores A, Yan YL et al. (1999) Preservation of duplicate genes by complementary, degenerative mutations. Genetics 151(4): 1531-1545.
Germaniuk A, Liberek K, Marszalek J (2002) A bichaperone (Hsp70-Hsp78) system restores mitochondrial DNA synthesis following thermal inactivation of Mip1p polymerase. J Biol Chem 277(31): 27801-27808.
Golding GB, Dean AM (1998) The structural basis of molecular adaptation. Mol Biol Evol 15(4): 355-369.
Goldman N, Yang Z (1994) A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol Biol Evol 11(5): 725-736.
Gribaldo S, Lumia V, Creti R, de Macario EC, Sanangelantoni A et al. (1999) Discontinuous occurrence of the hsp70 (dnaK) gene among Archaea and sequence features of HSP70 suggest a novel outlook on phylogenies inferred from this protein. J Bacteriol 181(2): 434-443.
Guindon S, Gascuel O (2003) A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol 52(5): 696-704.
Gupta RS (1999) Hsp70 sequences and the phylogeny of prokaryotes. Mol Microbiol 31(3): 1007-1009.
Gupta RS, Singh B (1992) Cloning of the HSP70 gene from Halobacterium marismortui: relatedness of archaebacterial HSP70 to its eubacterial homologs and a model for the evolution of the HSP70 gene. J Bacteriol 174(14): 4594-4605.
Gupta RS, Singh B (1994) Phylogenetic analysis of 70 kD heat shock protein sequences suggests a chimeric origin for the eukaryotic cell nucleus. Curr Biol 4(12): 1104-1114.
Hakes L, Lovell SC, Oliver SG, Robertson DL (2007) Specificity in protein interactions and its relationship with sequence diversity and coevolution. Proc Natl Acad Sci U S A 104(19): 7999-8004.
Han W, Christen P (2003) Mechanism of the targeting action of DnaJ in the DnaK molecular chaperone system. J Biol Chem 278(21): 19038-19043.
Hennig W (1966) Phylogenetic Systematics. Urbana, IL: Univ. of Illinois Press.
Herrmann JM, Stuart RA, Craig EA, Neupert W (1994) Mitochondrial heat shock protein 70, a molecular chaperone for proteins encoded by mitochondrial DNA. J Cell Biol 127(4): 893-902.
Hittinger CT, Carroll SB (2007) Gene duplication and the adaptive evolution of a classic genetic switch. Nature 449(7163): 677-681.
Holder M, Lewis PO (2003) Phylogeny estimation: traditional and Bayesian approaches. Nat Rev Genet 4(4): 275-284.
Hurles M (2004) Gene duplication: the genomic trade in spare parts. PLoS Biol 2(7): E206.
Itoh T, Matsuda H, Mori H (1999) Phylogenetic analysis of the third hsp70 homolog in Escherichia coli; a novel member of the Hsc66 subfamily and its possible co-chaperone. DNA Res 6(5): 299-305.
Kellis M, Birren BW, Lander ES (2004) Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature 428(6983): 617-624.
Kiley PJ, Beinert H (2003) The role of Fe-S proteins in sensing and regulation in bacteria. Curr Opin Microbiol 6(2): 181-185.
Kondrashov FA, Rogozin IB, Wolf YI, Koonin EV (2002) Selection in the evolution of gene duplications. Genome Biol 3(2): RESEARCH0008.
Lange M, Macario AJ, Ahring BK, Conway de Macario E (1997) Heat-shock response in Methanosarcina mazei S-6. Curr Microbiol 35(2): 116-121.
Langkjaer RB, Cliften PF, Johnston M, Piskur J (2003) Yeast genome duplication was followed by asynchronous differentiation of duplicated genes. Nature 421(6925): 848-852.
Li WH (1980) Rate of gene silencing at duplicate loci: a theoretical study and interpretation of data from tetraploid fishes. Genetics 95(1): 237-258.
Li WH, Yang J, Gu X (2005) Expression divergence between duplicate genes. Trends Genet 21(11): 602-607.
Lill R, Muhlenhoff U (2008) Maturation of iron-sulfur proteins in eukaryotes: mechanisms, connected processes, and diseases. Annu Rev Biochem 77: 669-700.
Lim EH, Brenner S (1999) Short-range linkage relationships, genomic organisation and sequence comparisons of a cluster of five HSP70 genes in Fugu rubripes. Cell Mol Life Sci 55(4): 668-678.
Lim JH, Martin F, Guiard B, Pfanner N, Voos W (2001) The mitochondrial Hsp70-dependent import system actively unfolds preproteins and shortens the lag phase of translocation. EMBO J 20(5): 941-950.
Lindquist S, Craig EA (1988) The heat-shock proteins. Annu Rev Genet 22: 631-677.
Lutz T, Westermann B, Neupert W, Herrmann JM (2001) The mitochondrial proteins Ssq1 and Jac1 are required for the assembly of iron sulfur clusters in mitochondria. J Mol Biol 307(3): 815-825.
Macario AJ, Dugan CB, Conway de Macario E (1991) A dnaK homolog in the archaebacterium Methanosarcina mazei S6. Gene 108(1): 133-137.
Messier W, Stewart CB (1997) Episodic adaptive evolution of primate lysozymes. Nature 385(6612): 151-154.
Muhlenhoff U, Lill R (2000) Biogenesis of iron-sulfur proteins in eukaryotes: a novel task of mitochondria that is inherited from bacteria. Biochim Biophys Acta 1459(2-3): 370-382.
Neupert W (1997) Protein import into mitochondria. Annu Rev Biochem 66: 863-917.
Nielsen R, Yang Z (1998) Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics 148(3): 929-936.
Nikolaidis N, Nei M (2004) Concerted and nonconcerted evolution of the Hsp70 gene superfamily in two sibling species of nematodes. Mol Biol Evol 21(3): 498-505.
Nimura K, Yoshikawa H, Takahashi H (1996) DnaK3, one of the three DnaK proteins of cyanobacterium Synechococcus sp. PCC7942, is quantitatively detected in the thylakoid membrane. Biochem Biophys Res Commun 229(1): 334-340.
Nuin PA, Wang Z, Tillier ER (2006) The accuracy of several multiple sequence alignment programs for proteins. BMC Bioinformatics 7: 471.
Ohno S (1970) Evolution by gene duplication. New York: Springer-Verlag.
Ota T, Nei M (1994) Divergent evolution and evolution by the birth-and-death process in the immunoglobulin VH gene family. Mol Biol Evol 11(3): 469-482.
Outten FW, Djaman O, Storz G (2004) A suf operon requirement for Fe-S cluster assembly during iron starvation in Escherichia coli. Mol Microbiol 52(3): 861-872.
Pal C, Papp B, Lercher MJ (2006) An integrated view of protein evolution. Nat Rev Genet 7(5): 337-348.
Pelham HR (1984) Hsp70 accelerates the recovery of nucleolar morphology after heat shock. EMBO J 3(13): 3095-3100.
Philippe H, Budin K, Moreira D (1999) Horizontal transfers confuse the prokaryotic phylogeny based on the HSP70 protein family. Mol Microbiol 31(3): 1007-1010.
Renner T, Waters ER (2007) Comparative genomic analysis of the Hsp70s from five diverse photosynthetic eukaryotes. Cell Stress Chaperones 12(2): 172-185.
Ritossa F (1962) A new puffing pattern induced by heat shock and DNP in Drosophila. Experientia 18: 571-573.
Ronquist F, Huelsenbeck JP (2003) MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19(12): 1572-1574.
Rouault TA, Tong WH (2005) Iron-sulphur cluster biogenesis and mitochondrial iron homeostasis. Nat Rev Mol Cell Biol 6(4): 345-351.
Sahi C, Craig EA (2007) Network of general and specialty J protein chaperones of the yeast cytosol. Proc Natl Acad Sci U S A 104(17): 7163-7168.
Scannell DR, Wolfe KH (2008) A burst of protein sequence evolution and a prolonged period of asymmetric evolution follow gene duplication in yeast. Genome Res 18(1): 137-147.
Schilke B, Williams B, Knieszner H, Pukszta S, D'Silva P et al. (2006) Evolution of mitochondrial chaperones utilized in Fe-S cluster biogenesis. Curr Biol 16(16): 1660-1665.
Schilke B, Forster J, Davis J, James P, Walter W et al. (1996) The cold sensitivity of a mutant of Saccharomyces cerevisiae lacking a mitochondrial heat shock protein 70 is suppressed by loss of mitochondrial DNA. J Cell Biol 134(3): 603-613.
Schmidt S, Strub A, Rottgers K, Zufall N, Voos W (2001) The two mitochondrial heat shock proteins 70, Ssc1 and Ssq1, compete for the cochaperone Mge1. J Mol Biol 313(1): 13-26.
Suzuki Y, Nei M (2001) Reliabilities of parsimony-based and likelihood-based methods for detecting positive selection at single amino acid sites. Mol Biol Evol 18(12): 2179-2185.
Swofford DL (2000) PAUP*, Phylogenetic Analysis Using Parsimony, Version 4.0b10.
Takahashi Y, Tokumoto U (2002) A third bacterial system for the assembly of iron-sulfur clusters with homologs in archaea and plastids. J Biol Chem 277(32): 28380-28383.
Tavaria M, Gabriele T, Kola I, Anderson RL (1996) A hitchhiker's guide to the human Hsp70 family. Cell Stress Chaperones 1(1): 23-28.
Thompson JD, Higgins DG, Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22(22): 4673-4680.
Tissieres A, Mitchell HK, Tracy UM (1974) Protein synthesis in salivary glands of Drosophila melanogaster: Relation to chromosome puffs. J Mol Biol 85(3): 389-398.
Tuschl T, Eckstein F (1993) Hammerhead ribozymes: importance of stem-loop II for activity. Proc Natl Acad Sci U S A 90(15): 6991-6994.
Voisine C, Schilke B, Ohlson M, Beinert H, Marszalek J et al. (2000) Role of the mitochondrial Hsp70s, Ssc1 and Ssq1, in the maturation of Yfh1. Mol Cell Biol 20(10): 3677-3684.
Voisine C, Cheng YC, Ohlson M, Schilke B, Hoff K et al. (2001) Jac1, a mitochondrial J-type chaperone, is involved in the biogenesis of Fe/S clusters in Saccharomyces cerevisiae. Proc Natl Acad Sci U S A 98(4): 1483-1488.
Walter L, Rauh F, Gunther E (1994) Comparative analysis of the three major histocompatibility complex-linked heat shock protein 70 (Hsp70) genes of the rat. Immunogenetics 40(5): 325-330.
Wang R, Prince JT, Marcotte EM (2005) Mass spectrometry of the M. smegmatis proteome: protein expression levels correlate with function, operons, and codon bias. Genome Res 15(8): 1118-1126.
Wang TF, Chang JH, Wang C (1993) Identification of the peptide binding domain of hsc70. 18-Kilodalton fragment located immediately after ATPase domain is sufficient for high affinity binding. J Biol Chem 268(35): 26049-26051.
Ward-Rainey N, Rainey FA, Stackebrandt E (1997) The presence of a dnaK (HSP70) multigene family in members of the orders Planctomycetales and Verrucomicrobiales. J Bacteriol 179(20): 6360-6366.
Yang Z (1997) PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci 13(5): 555-556.
Yang Z (1998) Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution. Mol Biol Evol 15(5): 568-573.
Yang Z (2007) PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol 24(8): 1586-1591.
Yang Z, Swanson WJ (2002) Codon-substitution models to detect adaptive evolution that account for heterogeneous selective pressures among site classes. Mol Biol Evol 19(1): 49-57.
Zhang J, Rosenberg HF (2002) Complementary advantageous substitutions in the evolution of an antiviral RNase of higher primates. Proc Natl Acad Sci U S A 99(8): 5486-5491.
Zhang J, Nielsen R, Yang Z (2005) Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level. Mol Biol Evol 22(12): 2472-2479.
Zhang P, Gu Z, Li WH (2003) Different evolutionary patterns between young duplicate genes in the human genome. Genome Biol 4(9): R56.
Zheng L, Cash VL, Flint DH, Dean DR (1998) Assembly of iron-sulfur clusters. Identification of an iscSUA-hscBA-fdx gene cluster from Azotobacter vinelandii. J Biol Chem 273(21): 13264-13272.
Zheng L, White RH, Cash VL, Jack RF, Dean DR (1993) Cysteine desulfurase activity indicates a role for NIFS in metallocluster biosynthesis. Proc Natl Acad Sci U S A 90(7): 2754-2758.