COMPUTATIONAL STUDIES OF THE DNA MISMATCH RECOGNITION PROTEIN

By

Sean Ming-Yin Law

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Biochemistry and Molecular Biology

2011

ABSTRACT

The MutS DNA mismatch recognition protein was studied using a combination of molecular dynamics simulations and normal mode analysis. The two methods revealed distinct structural conformations that were collectively used to characterize a new functional cycle for mismatch recognition. The DNA dynamics from the MutS simulations were also assessed. The G·T mismatch contained within the DNA was found to be relatively stable, whereas the base 5’ adjacent to the mispaired thymine was highly dynamic. In one simulation, the 5’ adjacent base opened up via the major groove and stayed flipped out for the entire duration of the 200 ns simulation. The energetics of base-flipping in the MutS-DNA system were examined, and the relevance and importance of these observations are discussed. The development of a new path-based restraint is presented and applied to study DNA translocation in the Hin recombinase test system. Using multiple path-based restraints, the DNA was successfully translocated by one full base pair in both the forward and backward directions. The method for calculating the corresponding free energy profile along a single DNA translocation reaction coordinate was also reformulated and explained.

ACKNOWLEDGEMENTS

Six productive, challenging, and revealing years at Michigan State University have finally come to a dénouement. Each passing day as a graduate student has taught me more about myself than I could have ever imagined. However, my initial interest in research began well before my days at Michigan State University and, therefore, I must acknowledge Dr. Walter J. Whiteley at York University (Toronto, Canada). The opportunities that were given to me as an undergraduate researcher really opened up my eyes, and Walter’s guidance and advice have played a vital role in my scientific journey.

As an amateur structural biologist with an interest in applied mathematics, I became a member of Dr. Michael Feig’s group at Michigan State University during the spring of 2006. Michael’s exacting nature, endless patience, passion for science, and insistence on “quality over quantity” provided the perfect environment for my professional growth and development. As a mentor and advisor, Michael challenged and encouraged me to step beyond my boundaries and both inspired and motivated me to work harder. Over the years, I have truly come to understand and appreciate all of the things that Michael has done and I consider myself exceptionally privileged and forever indebted. I will always remember the one piece of timely advice that Michael shared with me near the end of my degree, which accurately defines what I have learned while under his supervision: “Gambate!” (Japanese: Work hard/don’t give up!).

My path through graduate school would not have been possible without the support from many past and current members of the Feig lab. First, I would like to thank Dr. Shayantani Mukherjee, Dr. Katarzyna Maksimiak, Hugh Crosmun, and Taraz Buck for their indispensable contributions to the MutS project. Next, I would like to acknowledge the first generation of group members, namely Andrew Stumpff-Kane, Dr. Seiichiro Tanizaki, Dr. Kitiyaporn Wittayanarakul, Brian Connelly, and Jacob Clifford, with whom I have had the pleasure of working. I am also grateful for my other colleagues, Afra Panahi, Maryam Sayadi, Dr. YiMing Cheng, Dr. Srinivasa Gopal, Dr. Kahsay Gebreyohannes, Dr. Alexander Predeus, Dr. Liang Fang, Dr. Kangasabai Vadivel, Vahid Mirjalili, Seref Gul, and Nan Liu.
Thank you all for the great memories and for always making the next day more enjoyable than the last.

I would like to express my sincerest gratitude to all members of my guidance committee, Dr. Robert Cukier, Dr. Robert Hausinger, Dr. Charles Hoogstraten, Dr. Honggao Yan, and Dr. William Wedemeyer. Their expert advice, critical support, and careful tutelage were crucial for allowing me to reach my research goals. I would also like to extend my appreciation to Dr. Leslie Kuhn and former members of the Kuhn Lab for adopting me as their computational half-son, and to acknowledge Dr. Zachary Burton for his critical feedback and helpful discussions. A special thanks goes out to Edward Kabara, as I would have never survived my first year of graduate school without his honest friendship. For fear of omission, I would like to thank all past and present members of the Department of Biochemistry and Molecular Biology as well as all of the admirable administrative personnel.

My days as a graduate student have finally reached the end of a long chapter, but I promise you that the next page will be filled with new and exciting adventures. Finally, to my mom, dad, Stella, Sacha, and to the rest of my supportive friends and family: 多謝.

TABLE OF CONTENTS

List of Tables
List of Figures
Chapter 1 Introduction
  1.1 Overview
  1.2 Replication Fidelity and Methyl-directed DNA Mismatch Repair
  1.3 Molecular Structure of the DNA Mismatch Recognition Protein
  1.4 Investigating the Structural Dynamics of the DNA Mismatch Recognition Complex
  1.5 Computer Simulations
Chapter 2 Deciphering the Mismatch Recognition Cycle in MutS and MSH2-MSH6 Using Normal Mode Analysis
  2.1 Abstract
  2.2 Introduction
  2.3 Methods
  2.4 Results and Discussion
  2.5 Conclusions
  2.6 Acknowledgements
Chapter 3 An Analysis of E. coli MutS Dynamics from Molecular Dynamics Simulations
  3.1 Abstract
  3.2 Introduction
  3.3 Methods
  3.4 Results
  3.5 Discussion
  3.6 Conclusions
Chapter 4 Base-Flipping Mechanism in Post-Mismatch Recognition by MutS
  4.1 Abstract
  4.2 Introduction
  4.3 Materials and Methods
  4.4 Results and Discussion
  4.5 Conclusions
  4.6 Acknowledgements
Chapter 5 A Path-Based Reaction Coordinate for Biased Sampling of Nucleic Acid Translocation
  5.1 Introduction
  5.2 Methods
  5.3 Results
  5.4 Discussion
  5.5 Conclusion
Chapter 6 Conclusions and Perspectives
  6.1
References

LIST OF TABLES

Table 2.1 Overlap Index for a Pair of Modes, Each From Two Different Sets
Table 3.1 Pair-wise Overlap of the First Eigenvector From the Nine Simulations and the Combined Trajectory
Table 4.1 DNA Sequence Used in All MutS Simulations
Table 5.1 DNA Sequence Used in All Hin Recombinase Simulations

LIST OF FIGURES

Figure 1.1 Schematic representation of the methyl-directed DNA mismatch repair in E. coli
Figure 1.2 DNA mismatch recognition proteins
Figure 1.3 Schematic representation of the Hamiltonian Replica Exchange Method
Figure 2.1 Crystal structure of MSH2-MSH6
Figure 2.2 The dynamic behavior of the ATPase site in both chains
Figure 2.3 Root mean square fluctuation of Cα atoms
Figure 2.4 Protein backbone showing thermal fluctuations color coded by B-factors
Figure 2.5 Average covariance from the first 10 modes in MSH2-MSH6 and MutS
Figure 2.6 Mode motions of MSH-free projected on to the minimized crystal structure
Figure 2.7 Close-up views of the motions in the nucleotide-binding domain of MSH-free
Figure 2.8 Schematic diagram representing distinct conformational states during the functional cycle of MSH2-MSH6 or MutS
Figure 3.1 The MutS-DNA structure with different nucleotide-bound conformations
Figure 3.2 Secondary structure propensity
Figure 3.3 The percent contribution of the first 20 eigenvectors
Figure 3.4 Schematic “Porcupine plot” for the ATP:ATP simulation
Figure 3.5 MutS ATPase domain
Figure 3.6 Schematic “Porcupine plot” for the ATP:ADP simulation
Figure 3.7 The RMSF for the S1 and S2 monomers
Figure 4.1 X-ray crystal structure of E. coli MutS
Figure 4.2 Pseudodihedral angle definition
Figure 4.3 Cα protein RMSD and heavy atom DNA RMSD
Figure 4.4 K-means clustering of the nine simulations using a 2.5 Å radius
Figure 4.5 DNA base pair hydrogen bonding
Figure 4.6 Comparison of the G·T and G/C(-1) major groove widths
Figure 4.7 Comparison of T22 glycosyl rotation angle
Figure 4.8 Correlation of C21 base-flipping in NONE:NONE simulation with various structural quantities
Figure 4.9 Snapshots from the NONE:NONE simulation of base flipping
Figure 4.10 Free energy profiles from the HREM simulation
Figure 4.11 Comparison of the C21 backbone ζ torsion angle
Figure 4.12 Free energy profiles from the same HREM simulation
Figure 4.13 HREM sampling overlap
Figure 4.14 Water residence time calculations from the NONE:NONE simulation
Figure 4.15 Solvent-accessible surface area (SASA) calculations
Figure 4.16 Protein domain motions
Figure 4.17 S1 DNA binding domain movement from unbiased and HREM simulations along Y and Z directions
Figure 4.18 Allosteric signaling from the DNA binding domain to the ATPase domains
Figure 4.19 Visualization of the effects of base flipping on the ATPase domains
Figure 5.1 Snapshots from forward and backward DNA translocation in Hin-recombinase
Figure 5.2 Free energy profile for DNA translocation

Chapter 1 Introduction

1.1 Overview

In this dissertation, the structures of the Escherichia coli (E. coli) and human DNA mismatch recognition proteins, MutS and MSH2-MSH6 (MutSα), respectively, are studied using modern computational techniques. This dissertation is composed of six chapters followed by a combined reference section for all chapters. Chapter 1 offers a basic introduction to the DNA mismatch repair process and presents some of the experimental and theoretical methods that are referenced within this dissertation. Chapter 2 discusses the identification of distinct MutS and MutSα conformational states using normal mode analysis, accompanied by the structural characterization of a complete functional cycle for mismatch recognition. Chapters 3 and 4 describe the work from several 200 nanosecond (ns) molecular dynamics computer simulations of E. coli MutS. Chapter 3 presents an analysis of the overall protein dynamics and Chapter 4 details the proposed role and effects of DNA base-flipping in MutS. Chapter 5 describes the development of a novel path-based biasing potential that can be used for studying DNA translocation. Finally, a summary of the conclusions is given in Chapter 6.
1.2 Replication Fidelity and Methyl-directed DNA Mismatch Repair

In 1953, James Watson and Francis Crick presented their classic paper, which accurately described the double helical structure of DNA (1). Because DNA serves as the genetic blueprint, maintaining the integrity of DNA is essential and requires mechanisms to prevent errors arising from attack by external factors and from infidelity during replication or homologous recombination. During DNA synthesis, the error frequency in base misincorporation alone is estimated to be about 10⁻¹ to 10⁻² (2-3), but this error is greatly reduced to 10⁻⁵ to 10⁻⁷ by the nucleotide selectivity of DNA polymerase and the replisomal 3’→5’ proofreading exonuclease (along with a small reduction by accessory proteins such as the single-stranded DNA-binding proteins (SSB)) (2-3). However, this cumulative error rate is still far too high considering that the human genome is made up of roughly three billion base pairs and, on average, the replication machinery only commits three base pair errors per replication cycle (3). Fortunately, the last line of defense is the DNA mismatch repair (MMR) system, which is capable of increasing the accuracy by a factor of 10³, thereby improving the cumulative error frequency to ~10⁻¹⁰ (3). The MMR pathway is conserved from prokaryotes to eukaryotes, and MMR deficiencies in humans have been linked to an increased risk of colorectal cancer as well as other forms of cancer (4).

The methyl-directed MMR pathway in E. coli (Figure 1.1), which has been reconstituted from purified proteins (5), is the most well-studied bacterial mismatch repair pathway and therefore also serves as an excellent prototype for understanding eukaryotic mismatch repair. In E. coli, the MutS DNA mismatch recognition protein is responsible for scanning the DNA in search of short insertion/deletion loops (IDLs) and base-base mismatches resulting from occasional polymerization errors that have eluded proofreading (6).
After binding to a mismatch, MutS, in the presence of ATP, recruits MutL and forms a MutS-DNA-MutL ternary complex (7-9). Next, MutH, a protein that is bound within ~1 kb of the mismatch at a hemimethylated d(GATC) site located either 3’ or 5’ to the lesion and which is activated by the assembly of the ternary complex, is recruited. Then, the latent endonuclease found in MutH cleaves the newly synthesized (unmethylated) DNA strand and not the Dam methylated parental strand (10). This incision acts as a point of entry for binding of SSB and for MutL-facilitated loading of DNA helicase II. Depending on the location of the nick with respect to the mismatch, removal of the damaged DNA strand can occur in either direction (11-12) and is completed by either 3’→5’ exonucleases (ExoI and ExoX) or 5’→3’ exonucleases (RecJ and ExoVII) (13-15). Finally, the excised DNA is re-synthesized and sealed by DNA polymerase III and DNA ligase, respectively. For a more thorough treatment of DNA MMR, the reader is directed to several excellent and comprehensive reviews (4-5, 16-18).

1.3 Molecular Structure of the DNA Mismatch Recognition Protein

The earliest documented effort to predict the three-dimensional structure of the human MutS homolog 2 (hMSH2) DNA mismatch recognition protein was published in 1998 by de las Alas et al. (19). In that work, a then-novel prediction-based threading method was used to identify structural homologs of hMSH2, and coordinates were manually assigned based on matches between the predicted secondary structure of hMSH2 and the secondary structure of the structural homologs found in the Protein Data Bank (PDB). However, two years later, two independent groups published the first ever high-resolution X-ray crystal structures of MutS from E. coli (20) and Thermus aquaticus (T. aquaticus) (21), neither of which bore any resemblance to the previously predicted structure.
Over the years, these seminal structures have played an important role in improving our overall understanding of mismatch recognition.

As suggested by their highly conserved amino acid sequences, the pair of crystal structures was shown to be remarkably similar (Figure 1.2A and Figure 1.2B), with each structure being a homodimer whose two subunits form the shape of a θ symbol (or, alternatively, a pair of praying hands). Each monomer consists of five structural domains, each of which is found to match the fold of a previously determined protein structure (21). Domain I is the DNA binding domain and, even though both subunits (S1 and S2) share the same sequence, only the S1 DNA binding domain interacts directly with the mismatch via a highly conserved Phe-X-Glu motif (20-30) (located near the N-terminus of the protein), while the S2 subunit makes nonspecific DNA contacts. Domain II is referred to as the connector and has been implicated in ATP-dependent interactions with MutL (9, 31). Domain III, the core domain, is largely responsible for providing structural support and is poised for transmitting long-range allosteric signals between both ends of the large protein (21, 32). Domain IV resides at the tip of the protein, forming a clamp around the DNA with the help of domain I and, in both structures, the DNA is bent by about 60° near the site of the mismatch. However, in the absence of DNA, both domains I and IV are found to be highly mobile (21). Finally, domain V is the location of the well-conserved nucleotide-binding site (20-21) (Figure 1.2D), which belongs to the ATP binding cassette (ABC) superfamily. Biochemical studies of E. coli and T. aquaticus MutS have demonstrated that the two chemically identical ATPase subunits act asymmetrically, each showing different affinities for ADP, ATP, and non-hydrolyzable ATP analogues (33-34).
This domain also contains a conserved helix-turn-helix (HTH) motif which is crucial for mismatch binding, ATPase activity, and protein dimerization (35) (see Figure 1.2D).

From single-molecule total internal reflection fluorescence microscopy (TIRFM), it has been clearly demonstrated that the DNA mismatch protein scans the DNA diffusively (in the presence or absence of ADP) (36) and, after recognizing a mismatch, ATP binds to MutS and causes the protein to dissociate from the mismatch and to form a stable clamp around the DNA (37-42). This is the so-called “sliding clamp” conformation. There has been an ongoing debate as to whether MutS sliding (not scanning) is dependent on the hydrolysis of ATP (as supported by the active translocation model) (39, 42) or whether the purpose of ATP hydrolysis is to allow the instantaneous recovery of MutS back to a scanning mode (as supported by the molecular switch model) (38, 40). A more comprehensive discussion of these models is provided in the following review articles (16, 18, 43).

Since 2000, several more structures of E. coli MutS containing a point mutation (44), bound to various mismatches (45), and doubly bound with ATP (instead of ADP as in the original structure) (29) have been published. However, these new structures only capture small local changes in the protein compared to the original, and it has been suggested that: 1) different mismatches are recognized using the same binding mode (45); and 2) the MutS structure observed in the various crystals is possibly a trapped intermediate that is incapable of hydrolyzing ATP (29). More recently, several crystal structures of the hMSH2-hMSH6 (MutSα) human homolog (bound to different mismatches) were determined (30) (see Figure 1.2C) and were found to preserve many of the structural features first identified in the prokaryotic MutS homolog.
1.4 Investigating the Structural Dynamics of the DNA Mismatch Recognition Complex

The structure and dynamics of the MutS-DNA complex (and its homologs) have been studied using a wide range of experimental and theoretical techniques. As discussed above, the high-resolution structural data from X-ray crystallography have been important for our understanding of the overall MutS structure, but they also provide some additional insight into the mobility of the protein as reflected by thermal B-factors. In principle, B-factors indicate the spread of electron density around a specific position in the map, and so parts of the structure that are disordered are reflected by high B-factor values while low B-factors represent low mobility (46). However, B-factors must be interpreted cautiously because crystal packing forces could have an adverse effect on the protein motions.

At a much lower resolution, small angle X-ray scattering (SAXS) has also been helpful in identifying three different nucleotide-dependent conformations of Thermus thermophilus MutS (47). In that study, the size and general shape were measured in solution using SAXS, and it was found that the MutS structure (in the absence of DNA) was stretched out in the presence of ADP, more compact in the presence of ATP, and existed as an intermediate between the two when nucleotides were absent. While new observations were made using this method, the lack of atomic resolution makes it impossible to determine the precise nucleotides that are bound in the asymmetrical ATPases. This is an important point because each ATP binding site can be independently occupied by either ADP, ATP, or nothing at all, which means that there are a total of nine different possible nucleotide-bound combinations to consider (three possible states at each of the two subunits, 3 × 3 = 9).

The dynamic nature of the MutS-DNA complex has been best characterized by using atomic force microscopy (AFM) (48).
This method utilizes a flexible probe/stylus that scans the contours of a surface on which protein-DNA complexes have been deposited and is capable of producing images with nanometer (nm) resolution (48-49). AFM images showed that MutS bound to homoduplex DNA belonged to a single population where the DNA took on a bent conformation, while both bent and unbent DNA conformations were observed when MutS was bound at a mismatch site (48). These results led to the proposition that the DNA is kinked by 60° upon recognition of a mismatch (i.e., as in the crystal structure) but, ultimately, the protein undergoes a conformational change as a result of DNA unbending (16). These ideas were later expanded based on single molecule fluorescence resonance energy transfer (smFRET) experiments (50).

Since MutS was found to be more stable at an unbent mismatch site than at a homoduplex site, it was hypothesized that the unbent state bound by MutS may be stabilized by flipping out one of the mismatched bases (16, 48, 51). Interestingly, while there has been no direct evidence of the mismatched base (or any other base) flipping out of the helical stack, the dynamics of the bases surrounding the mismatch have been probed using 2-aminopurine, a fluorescent adenine analog that is commonly used to study DNA base flipping, and it was found that the base 5’ adjacent to the mismatch experienced enhanced dynamics when bound by MutS (52). The relevance of this study is discussed in more detail in Chapter 4.

The experimental methods discussed in this section have all been vital in contributing to the current understanding of the MutS-DNA complex. However, most of these methods lack the level of detail required to accurately define the different structural conformations expected in the MutS system. Atomic detail computer simulations are capable of filling this void and can play an important complementary role in clarifying experimental data.
In addition, molecular dynamics (MD) has been used extensively to investigate other protein-DNA complexes (53-56) and is therefore well suited for studying the MutS-DNA system. In the next section, some of the key computational techniques referenced in this dissertation are introduced.

1.5 Computer Simulations

In 1958, Kendrew et al. published the first high-resolution protein structure of myoglobin (57) which, to some extent, contributed to the initial view that proteins were rigid rather than dynamic structures (58-59). Nearly 20 years later, McCammon et al. broke new ground by being the first to capture the protein dynamics of the bovine pancreatic trypsin inhibitor (BPTI) at atomic resolution from an 8.8 picosecond (ps) MD computer simulation (60). Since then, MD simulations have played a valuable role in complementing experiment by providing molecular level insight into the dynamical motions of individual macromolecules (59, 61).

Modern MD simulations, which treat atoms as the smallest particles in the system, begin with defining the potential energy function, $U(\vec{R})$, which is traditionally made up of bonding terms and non-bonding terms and is written as a function of the Cartesian coordinates, $\vec{R}$ (61-64):

$$
U(\vec{R}) = \sum_{\text{bond lengths}} K_b (b - b_0)^2
+ \sum_{\text{bond angles}} K_\theta (\theta - \theta_0)^2
+ \sum_{\text{dihedrals}} K_\phi \left[ 1 + \cos(n\phi - \delta) \right]
+ \sum_{\text{impropers}} K_\omega (\omega - \omega_0)^2
+ \sum_{\text{elec}} \frac{q_i q_j}{D r_{ij}}
+ \sum_{\text{vdW}} \left[ \frac{A}{r_{ij}^{12}} - \frac{B}{r_{ij}^{6}} \right]
\tag{1.1}
$$

Bonding terms typically consist of bond lengths ($b$), bond angles ($\theta$), dihedral angles ($\phi$), and improper dihedral angles ($\omega$) (Eq. (1.1)). All constants with the subscript 0 represent equilibrium values, and $K_b$, $K_\theta$, $K_\phi$, and $K_\omega$ denote the respective force constants. The dihedral angle term is modeled as a sinusoidal function where $n$ and $\delta$ are the periodicity and phase shift, respectively. The non-bonding terms consist of electrostatic and van der Waals interactions (Eq. (1.1)), where $r_{ij}$ corresponds to the distance between atoms $i$ and $j$. The electrostatic interactions are calculated between point charges $q_i$ and $q_j$ using a Coulombic potential where $D$ represents the effective dielectric function for the medium. The van der Waals term is often referred to as the Lennard-Jones 6-12 potential and accounts for the repulsion of atomic cores at short distances and for the attractive London dispersion forces. A more thorough explanation of these terms, and of additional terms not shown here, can be found in the following references (61-64).

Once the potential energy function is established, a molecular dynamics simulation can be initiated by solving Newton’s equation of motion:

$$
\vec{F}_i = m_i \vec{a}_i \tag{1.2}
$$

where $m_i$ and $\vec{a}_i$ are the mass and acceleration of atom $i$, respectively. $\vec{F}_i$ is the force acting on atom $i$ and is computed from the gradient of the potential energy function, $U(\vec{R})$:

$$
\vec{F}_i = -\nabla_i U(\vec{R}) \tag{1.3}
$$

Starting with the coordinates of a high-resolution crystal structure, the standard procedure for running an MD simulation typically involves first minimizing the structure in order to remove steric clashes and to relieve local strains within the structure. Next, initial velocities are randomly assigned to each atom from the Maxwellian distribution starting from a low temperature, $T$, and $\vec{a}_i$ is computed from the force as described above. Then, Newton’s equation of motion (which is an ordinary differential equation with no analytical solution) can be solved through numerical integration using discrete steps in order to determine the new position of each atom, $\vec{r}_i$, at some time $t + \Delta t$.
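The pieces described so far (an energy function, forces taken as its negative gradient, and stepwise numerical integration of Newton’s equation) can be sketched in a few lines of Python. This is a minimal one-dimensional illustration and not the CHARMM or NAMD implementation: the function names, the parameter values, and the choice of the velocity Verlet integrator are assumptions made for the example, and all quantities are in arbitrary but mutually consistent units.

```python
def bond_energy(b, k_b, b0):
    """Harmonic bond term of Eq. (1.1): K_b (b - b0)^2, with no factor of 1/2."""
    return k_b * (b - b0) ** 2


def bond_force(b, k_b, b0):
    """Force along the bond, F = -dU/db (Eq. (1.3) reduced to one dimension)."""
    return -2.0 * k_b * (b - b0)


def nonbonded_energy(r, q_i, q_j, a_ij, b_ij, dielectric=1.0):
    """Coulomb plus Lennard-Jones 6-12 energy for one atom pair, as in Eq. (1.1)."""
    elec = q_i * q_j / (dielectric * r)
    vdw = a_ij / r ** 12 - b_ij / r ** 6
    return elec + vdw


def nonbonded_force(r, q_i, q_j, a_ij, b_ij, dielectric=1.0):
    """Radial force on the pair, F = -dU/dr, differentiated by hand."""
    elec = q_i * q_j / (dielectric * r ** 2)
    vdw = 12.0 * a_ij / r ** 13 - 6.0 * b_ij / r ** 7
    return elec + vdw


def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Advance one coordinate with velocity Verlet (an illustrative integrator choice)."""
    a = force(x) / mass                      # Eq. (1.2): a = F / m
    for _ in range(n_steps):
        x += v * dt + 0.5 * a * dt * dt      # position update over one discrete step
        a_new = force(x) / mass              # force at the new position
        v += 0.5 * (a + a_new) * dt          # velocity update with averaged acceleration
        a = a_new
    return x, v


# Hypothetical parameters for a single vibrating bond (illustrative values only).
K_B, B0, MASS, DT = 300.0, 1.0, 12.0, 0.001


def total_energy(x, v):
    """Kinetic plus harmonic bond energy of the one-bond test system."""
    return 0.5 * MASS * v * v + bond_energy(x, K_B, B0)


x0, v0 = 1.1, 0.0
x1, v1 = velocity_verlet(x0, v0, lambda b: bond_force(b, K_B, B0), MASS, DT, 10_000)
energy_drift = abs(total_energy(x1, v1) - total_energy(x0, v0))
```

With a time step that is short relative to the bond’s vibrational period, `energy_drift` stays small, which is the stability criterion that motivates the femtosecond-scale time steps used in practice. Comparing `nonbonded_force` against a finite difference of `nonbonded_energy` is a simple way to check a hand-derived gradient.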
The Taylor expansion of the coordinate for a particular atom around time $t$ can be written as:

$$\vec{r}_i(t + \Delta t) = \vec{r}_i(t) + \vec{v}_i \Delta t + \frac{1}{2} \vec{a}_i \Delta t^2 + \cdots \tag{1.4}$$

where $\vec{r}_i(t)$ is given and $\Delta t$ should be some short time step (typically between 1-2 fs) that is smaller than the period of the highest frequency motion and chosen to ensure stability in the potential energy between each simulation step (61, 65). Several integrators exist for continuously integrating Newton's equations of motion (e.g. Verlet, leap-frog, velocity Verlet, etc.), but discussions of the pros and cons are beyond the scope of this dissertation and the reader is referred to the following resources (58, 65). Normally, to avoid creating local “hot spots” with high velocities, the simulation is gradually heated by applying new velocities from a corresponding Maxwellian distribution at a given temperature up to the target temperature. Once the system is fully equilibrated, the simulation is ready for its production run. The resulting production simulation trajectory will contain snapshots of the system, and any dynamic variable (e.g. angles, distances, energies, etc.) can be measured and plotted as a function of the simulation time. More importantly, average values can also be calculated from the time series for comparison with experiment. Some of the most popular MD simulation programs currently available include CHARMM (63-64), AMBER (66), GROMOS (67), and NAMD (68).

Since the inception of computer simulations, there have been significant advances in computer hardware and software. In fact, in 2010, using a specially constructed, state-of-the-art machine called Anton (designed for producing extensively long simulation trajectories) (69), Shaw et al.
revisited the historical BPTI protein and became the first group ever to simulate a protein in explicit solvent for a full millisecond (ms) (70), nearly 100 times longer than what was previously possible and more than $10^8$ times longer than the original BPTI simulation. Extending the simulations from microsecond (μs) to ms time scales for much larger systems is not trivial, but it opens the door for studying more complex biological phenomena. In this dissertation, both the CHARMM (63-64) and NAMD (68) simulation programs have been used, along with the CHARMM27/CMAP force field (71-73), in order to study the conformational dynamics of the E. coli MutS protein on sub-μs time scales (see Chapters 3 and 4).

Often, large barriers exist between two conformational states (e.g. open and closed states) which may not be effectively sampled by using straight MD due to the long simulation times required to observe these transitions. Thus, it may be necessary to employ enhanced sampling techniques such as Umbrella Sampling (US) (74) or the Hamiltonian Replica Exchange Method (HREM) (75) in order to overcome these barriers. In US, a carefully chosen restraining potential (or umbrella potential), $U_{\text{umbrella}}(\xi)$, is added to the potential energy function, $U(\vec{R})$, in order to bias the sampling towards a particular region along the reaction coordinate of interest, $\xi$, that otherwise would be rarely visited. The resulting biased probability distribution, $P_{\text{biased}}(\xi)$, can be unbiased according to:

$$P_{\text{unbiased}}(\xi) = e^{\beta U_{\text{umbrella}}(\xi)} \times P_{\text{biased}}(\xi) \times e^{-\beta f} \tag{1.5}$$

where $\beta = 1 / k_B T$ ($k_B$: Boltzmann constant and $T$: temperature) and $f$ is a constant that comes from adding $U_{\text{umbrella}}(\xi)$ to $U(\vec{R})$.
The corresponding free energy profile (sometimes referred to as the potential of mean force (PMF)), $w_{\text{unbiased}}(\xi)$, can then be obtained from:

$$w_{\text{unbiased}}(\xi) = -k_B T \ln P_{\text{unbiased}}(\xi) \tag{1.6}$$

$U_{\text{umbrella}}(\xi)$ can take on any functional form but is usually chosen as a quadratic function:

$$U_{\text{umbrella}}(\xi) = K_\xi (\xi - \xi_0)^2 \tag{1.7}$$

where $K_\xi$ is the force constant that controls the width of the umbrella potential and $\xi_0$ is the target equilibrium value along $\xi$. In this case, a sufficiently large value of $K_\xi$ leads to only small deviations away from $\xi_0$. A series of consecutive, overlapping simulations then advances $\xi_0$ incrementally along $\xi$ between the two states, with the final structure of each simulation used as the starting structure for the next in a daisy-chain fashion, ultimately leading to barrier crossing. These individually biased simulations (often called “windows”) can be easily unbiased according to Eq. (1.5) and a relative free energy profile (or PMF) along $\xi$ can be constructed using the Weighted Histogram Analysis Method (WHAM) (76).

HREM is a method where multiple non-interacting copies of the same system, called replicas, each using a restraining potential with a different value of $\xi_0$, are generated and independently simulated for a given time. Periodically, replicas are compared with neighboring conditions (or neighboring values along $\xi$) and may be swapped based on a specific energetic criterion (75) (see Figure 1.3). The probability of accepting or rejecting an exchange follows the Monte Carlo Metropolis criterion, $W$:

$$W(X_i, E_m; X_j, E_n) = \begin{cases} 1 & \text{for } \Delta \le 0 \\ \exp(-\Delta) & \text{for } \Delta > 0 \end{cases} \tag{1.8}$$

where $\Delta = \beta \left\{ \left[ E_m(X_j) + E_n(X_i) \right] - \left[ E_m(X_i) + E_n(X_j) \right] \right\}$ and $E(X)$ is the potential energy of a system for a given configuration, $X$.
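The exchange test in Eq. (1.8) can be sketched in a few lines. The example below is only an illustration and is not the implementation used in this work: it assumes two replicas that share the same underlying potential and differ only in the target value $\xi_0$ of the quadratic restraint in Eq. (1.7), so the unrestrained part of the energy cancels in $\Delta$ and only the umbrella terms need to be evaluated. The function names and the numerical values used to exercise them are hypothetical.

```python
import math
import random


def umbrella_energy(xi, k_xi, xi0):
    """Quadratic restraining potential of Eq. (1.7)."""
    return k_xi * (xi - xi0) ** 2


def exchange_probability(beta, xi_i, xi_j, k_xi, xi0_m, xi0_n):
    """Metropolis acceptance probability of Eq. (1.8).

    Replica m (restraint centre xi0_m) currently holds a configuration with
    reaction coordinate xi_i; replica n (centre xi0_n) holds xi_j.
    """
    e_m_xi = umbrella_energy(xi_i, k_xi, xi0_m)  # E_m(X_i)
    e_n_xj = umbrella_energy(xi_j, k_xi, xi0_n)  # E_n(X_j)
    e_m_xj = umbrella_energy(xi_j, k_xi, xi0_m)  # E_m(X_j)
    e_n_xi = umbrella_energy(xi_i, k_xi, xi0_n)  # E_n(X_i)
    delta = beta * ((e_m_xj + e_n_xi) - (e_m_xi + e_n_xj))
    return 1.0 if delta <= 0.0 else math.exp(-delta)


def attempt_exchange(beta, xi_i, xi_j, k_xi, xi0_m, xi0_n, rng=random):
    """Return True if the swap is accepted for one random trial."""
    return rng.random() < exchange_probability(
        beta, xi_i, xi_j, k_xi, xi0_m, xi0_n
    )
```

When each configuration sits closer to its neighbor's restraint center than to its own, $\Delta$ is negative and the swap is always accepted; the reverse situation is accepted only with probability $\exp(-\Delta)$.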
After a number of exchange cycles, the HREM technique essentially promotes multiple instances of barrier crossing by multiple replicas and, similar to US, allows accurate relative free energies to be calculated. Both the US and HREM methods are applied to study DNA base flipping in MutS (see Chapter 4). Also, the general lack of understanding about how MutS distinguishes between homoduplex and mismatch DNA as it scans duplex DNA has motivated the development of a novel multidimensional path-based restraining potential. In Chapter 5, this new umbrella potential for studying DNA translocation is presented along with a discussion on generating PMFs along arbitrary reaction coordinates from multidimensional WHAM.

Normal mode analysis (NMA) is an effective computational method for deducing large-amplitude conformational dynamics. Unlike MD simulations, where the computational cost of simulating slow conformational changes in large macromolecular assemblies becomes prohibitive as the number of atoms within the system increases significantly, NMA extracts biologically relevant motions (often represented by the lowest frequency vibrational modes) by approximating the potential energy surface of the system as a parabolic function around the potential energy minimum (65, 77-78). If we let $X^0$ be the equilibrium configuration comprised of $N$ atoms and whose potential energy resides at a minimum, then the Taylor expansion of the potential energy function $E(X)$ around $X^0$ (truncated after the quadratic term for small displacements) can be written as:

$$E(X) = E(X^0) + \sum_{i=1}^{3N} \left( \frac{\partial E}{\partial X_i} \right)_0 (X_i - X_i^0) + \frac{1}{2} \sum_{i,j=1}^{3N} \left( \frac{\partial^2 E}{\partial X_i \partial X_j} \right)_0 (X_i - X_i^0)(X_j - X_j^0) \tag{1.9}$$

However, since $X^0$ is the minimum of the potential energy function and the potential energy can be defined relative to $X^0$, both $\left( \partial E / \partial X_i \right)_0$ and $E(X^0)$ can be set to zero and Eq. (1.9) reduces to:

$$E(X) = \frac{1}{2} \sum_{i,j=1}^{3N} \left( \frac{\partial^2 E}{\partial X_i \partial X_j} \right)_0 (X_i - X_i^0)(X_j - X_j^0) \tag{1.10}$$

Thus, the potential energy surface is approximated by a harmonic function centered around the energy minimum and which is governed by the second derivatives in Eq. (1.10). The substitution of Eq. (1.10) into Newton's equation of motion yields:

$$F(X) = -\nabla E(X) \tag{1.11}$$

which can be rearranged and expressed in matrix form as:

$$m \frac{d^2}{dt^2} \begin{pmatrix} \Delta X_1 \\ \vdots \\ \Delta X_{3N} \end{pmatrix} + \begin{pmatrix} \dfrac{\partial^2 E}{\partial X_1 \partial X_1} & \cdots & \dfrac{\partial^2 E}{\partial X_1 \partial X_{3N}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 E}{\partial X_{3N} \partial X_1} & \cdots & \dfrac{\partial^2 E}{\partial X_{3N} \partial X_{3N}} \end{pmatrix} \begin{pmatrix} \Delta X_1 \\ \vdots \\ \Delta X_{3N} \end{pmatrix} = 0 \tag{1.12}$$

where $\Delta X_i = X_i - X_i^0$ and the $3N \times 3N$ matrix of second-order partial derivatives is commonly referred to as the Hessian. Diagonalization of the Hessian matrix produces a set of $3N$ eigenvalues and $3N$ eigenvectors which are directly related to the frequencies and normal modes, respectively. A more detailed derivation of Eq. (1.10) and a discussion of the methods for diagonalizing the Hessian is provided in reference (78).

Normally, for a system with $N$ atoms, standard NMA is carried out in three steps: 1) the starting structure is extensively energy minimized using an appropriate force field; 2) the Hessian is calculated; and 3) the eigenvalues and eigenvectors (or normal modes) are computed by diagonalizing the Hessian. Of course, as with all approximations, extra care must always be taken to make use of experimental information in order to properly pinpoint and assign the functionally relevant modes (79). The method of NMA is used in Chapter 2 to examine the biologically important motions in the human and E. coli MutS proteins.

In the following chapters, the results from several extensive computer simulations will be presented and carefully compared to experimental data.
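As a toy illustration of steps 2 and 3 of the NMA procedure outlined above, the sketch below builds the Hessian of a one-dimensional chain of unit masses joined by identical harmonic springs and diagonalizes it. It is an assumption-laden example rather than the block normal mode approach used in Chapter 2: the chain model, the function names, and the use of a cyclic Jacobi rotation scheme (chosen only because it needs nothing beyond the standard library) are all hypothetical choices. For unequal masses the Hessian would first have to be mass-weighted, which is omitted here.

```python
import math


def chain_hessian(n_atoms, k_spring):
    """Hessian of a 1D chain at its minimum, springs between neighbours."""
    h = [[0.0] * n_atoms for _ in range(n_atoms)]
    for i in range(n_atoms - 1):
        h[i][i] += k_spring
        h[i + 1][i + 1] += k_spring
        h[i][i + 1] -= k_spring
        h[i + 1][i] -= k_spring
    return h


def jacobi_eigen(matrix, tol=1e-20, max_sweeps=50):
    """Diagonalize a real symmetric matrix with cyclic Jacobi rotations.

    Returns (eigenvalues, v) where column k of v is the k-th eigenvector.
    """
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q of a
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q of a
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def mode_frequencies(eigenvalues):
    """Angular frequencies for unit masses: omega = sqrt(eigenvalue)."""
    return [math.sqrt(max(val, 0.0)) for val in eigenvalues]
```

For a three-atom chain the lowest eigenvalue is zero and corresponds to rigid translation, which mirrors the zero-frequency translational and rotational modes that are discarded before the lowest-frequency internal modes of a protein are analyzed.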
In addition, a newly developed path-based umbrella potential used for studying DNA translocation will be presented.

Figure 1.1 Schematic representation of the methyl-directed DNA mismatch repair in E. coli. Post-replicative DNA mismatch repair begins when MutS scans the newly synthesized DNA in search of base-base mismatches or insertion/deletion loops. Upon mismatch binding, additional downstream repair proteins (MutL, MutH, DNA helicase, exonuclease, SSB) are recruited and the mismatch-containing strand is excised. Following this, DNA polymerase III and ligase resynthesize the missing DNA strand and the DNA is repaired. SSB has been omitted for clarity. For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this dissertation.

Figure 1.2 DNA mismatch recognition proteins (front and side views) bound to mismatch DNA. The DNA binding domains are colored red/pink, the connector domains are colored orange/pale orange, the core domains are colored yellow/pale yellow, the clamp domains are colored green/pale green, the ATPases are colored blue/pale blue, and the DNA is colored brown/tan. A) E. coli MutS (20). B) T. aquaticus MutS (21). C) Human MutSα (30). D) E. coli MutS dual ATPases bound by two ATP molecules (purple spheres) (20). The conserved helix-turn-helix is denoted by HTH.

Figure 1.3 Schematic representation of the Hamiltonian Replica Exchange Method. Multiple simulations are coupled and conditions (or replicas) are exchanged in periodic intervals according to a Monte Carlo Metropolis criterion in order to enhance sampling along a specific reaction coordinate (75).

Chapter 2

Deciphering the Mismatch Recognition Cycle in MutS and MSH2-MSH6 Using Normal Mode Analysis

Shayantani Mukherjee, Sean M. Law, and Michael Feig

Adapted from Biophys.
J., 2009, 96, 1707-1720

Sean Law contributed significantly to the analysis of the data from normal mode analysis and also produced several figures for the publication.

2.1 Abstract

Post-replication DNA mismatch repair is essential in maintaining the integrity of genomic information in prokaryotes and eukaryotes. The first step in mismatch repair is the recognition of base-base mismatches and insertions/deletions by bacterial MutS or eukaryotic MSH2-MSH6. Crystal structures of both proteins bound to mismatch DNA reveal a similar molecular architecture, but provide limited insight into the detailed molecular mechanism of long-range allostery involved in mismatch recognition and repair initiation. This study describes normal mode calculations of MutS and MSH2-MSH6 with and without DNA. The results reveal similar protein flexibility and suggest common dynamic and functional characteristics. A strongly correlated motion is present between the lever domain and ATPase domains, which suggests a pathway for long-range allostery from the N-terminal DNA binding domain to the C-terminal ATPase domains, as proposed from experimental studies. Detailed analysis of individual low frequency modes of both MutS and MSH2-MSH6 shows changes in the DNA binding domains coupled to the ATPase sites, which are interpreted in the context of experimental data to arrive at a complete molecular-level mismatch recognition cycle. Distinct conformational states are proposed for DNA scanning, mismatch recognition, repair initiation, and sliding along DNA after mismatch recognition. Hypotheses based on the results presented here form the basis for further experimental and computational studies.

2.2 Introduction

DNA mismatch repair (MMR) pathways maintain the integrity of genomic DNA by eliminating errors incorporated during replication and recombination.
The initial steps of DNA-mismatch recognition and repair initiation in the post-replication MMR pathway are mostly conserved from bacteria to humans, with MutS in prokaryotes and MutS homologs (MSH) in eukaryotes recognizing defective DNA and initiating repair (16-17, 80). A functional MSH protein leading to correct mismatch recognition and subsequent deletion is especially important in humans for the avoidance of cancer phenotypes (81). Prokaryotic MutS is comprised of monomers with identical sequence, termed S1 and S2, although it forms a structural heterodimer when bound to DNA (20-21). MutS is known to recognize base-base mismatches and short base insertions or deletions leading to their successful repair. In eukaryotes, at least 7 variants of MSH have been identified. They form a number of heterodimers, of which MSH2-MSH6 corresponds most closely to MutS (with MSH2 corresponding to S2 and MSH6 corresponding to S1) (80). Like MutS, the MSH2-MSH6 complex also recognizes base pair mismatches with high efficiency and single base insertions or deletions but does not efficiently recognize longer base insertions or deletions (16, 80). After successful association of MutS or MSH2-MSH6 with a mismatch, a complex is formed in the presence of ATP with MutL in prokaryotes (7) or MutL homologs (MLH) in eukaryotes (82) in order to promote downstream repair events.

Crystal structures of prokaryotic MutS from Escherichia coli (E. coli) (20), Thermus aquaticus (21) and human MSH2-MSH6 (30) bound to different base pair mismatches or a single thymine insertion/deletion have become available. The structures all show the same architecture with two main functional sites at opposite ends of the dimer: a DNA binding site and an ATPase site. As evidenced by the crystal structures, the clamp and DNA binding domains (domains IV and I, respectively) from both chains (S1 and S2 of MutS or MSH6 and MSH2 of human) encircle the mismatched DNA (Figure 2.1).
However, only the DNA-binding domain of one of the chains is in direct contact with the mismatch, giving rise to structural and functional asymmetry between the dimer moieties. Specific contacts with the mismatch base are made through a conserved ‘Phe-X-Glu’ motif in the DNA binding domain of chain S1 in MutS and MSH6. Insertion of this motif into the minor groove of the DNA is coincident with significant DNA bending (~60˚) and minor groove widening at and around the mismatch site compared to canonical DNA. The bent conformation of the DNA is further stabilized through non-specific contacts from the clamp domain. The nucleotide binding domains (domain V) reside on the opposite end of the protein with the ATP binding sites (ATPase sites) lying close to the dimerization interface.

Biochemical studies have provided evidence for functional coupling between DNA scanning, mismatch recognition, repair initiation and ATPase activity (34, 44, 83-84), which suggests allosteric signaling within the MutS or MSH dimers. Each MutS ATPase domain, belonging to the ATP binding cassette (ABC) superfamily (85), is comprised of functionally important residues from both chains as shown in Figure 2.1C (29). The nucleotide binding site residing in each particular chain consists of Walker A and Walker B loops that are important for nucleotide phosphate binding and phosphate catalysis, respectively. Another loop containing a conserved phenylalanine residue (596 in MutS, 650 in MSH2, and 1108 in MSH6) stacks with the nucleotide adenine ring and the cavity is completed by the signature loop of the opposite monomer, which has been suggested to play an important role in catalysis. Several studies suggest that the ATPase activities of the two chains are strongly correlated with each other and that they follow a sequential, rather than a simultaneous, pattern of ATP hydrolysis (44).
Moreover, both sites show intrinsic asymmetry in the ATPase activity with nucleotide binding affinities changing significantly for each ATPase site during the recognition cycle (33-34, 44, 83-84). In free enzyme or when bound to regular DNA, the chain that contacts the DNA mismatch (S1 or MSH6) has a higher affinity for ATP compared to the other chain, while chain S2 or MSH2 binds mostly ADP (33, 84). It is further known that ATP hydrolysis is fast in S1/MSH6 when the protein is bound to regular DNA and that ADP release is the rate-limiting step (34). The ATPase site of the other chain has a much slower hydrolysis rate (84). These results highlight a differential behavior of the two ATPase sites when the protein is bound to regular DNA as depicted schematically in Figure 2.2. During scanning of regular DNA, the nucleotide-binding domain of chain S1/MSH6 binds ATP followed by fast hydrolysis to ADP. However, since exchange of ADP for ATP is not as fast as hydrolysis, ADP would be bound to this site for the majority of the time. At the same time, ADP is also bound predominantly to the other nucleotide-binding domain of chain S2/MSH2. Experimental data suggest that mismatch binding promotes the exchange of ADP for ATP while stalling ATP hydrolysis of S1/MSH6 (34, 83). The resulting prolonged ATP bound state at S1/MSH6 'authorizes' recognition of a mismatch by the DNA binding domain, whereas ATP is readily hydrolyzed when the DNA binding domain is bound to regular DNA (24). Furthermore, stable ATP binding by S1/MSH6 ultimately leads to reduced ADP binding affinity in the ATPase site of S2/MSH2. This presumably enhances the ATP binding affinity of S2/MSH2 (84). The dual ATP bound state is believed to trigger a conformational change to a sliding clamp conformation where the mismatch is released by the DNA binding domain and rebinding of mismatched DNA is inhibited (83-84).
Interestingly, a recent single molecule study on MSH2-MSH6 has demonstrated that the sliding motion along DNA after mismatch recognition is independent of ATP hydrolysis (36). While there is a general understanding of the long-range allostery of MutS and its homologs involved in recognition and repair initiation, the molecular-level events leading to the functional correlation between N-terminal DNA mismatch recognition and C-terminal nucleotide binding and hydrolysis have remained elusive. Further advances in this respect have been hindered by the fact that the available crystal structures only show the mismatch-bound state and do not provide information about the different nucleotide bound combinations in the two ATPase domains. Until now, the complex interplay between functional states of the two ATPase and DNA binding sites is mostly understood from biochemical kinetic studies that, on the other hand, fail to offer a molecular-level understanding of the process. While experiments may continue to reveal additional information for different functional states, conformational sampling of proteins can also be studied by theoretical means. Molecular dynamics simulations that often offer insights in this regard are not easily applicable to MutS because of the long time scales of the mismatch recognition process and the large system size of the MutS-DNA complex. Normal mode analysis (NMA) is an alternative strategy for studying large-scale conformational changes in biomolecules. NMA relies on a harmonic approximation of the potential energy surface around a minimum energy structure and the resulting lowest frequency dynamic modes often resemble biologically relevant functional motions (27-28). Here, we have applied NMA to study the conformational dynamics of MutS and its eukaryotic homolog MSH2-MSH6. 
The results suggest a new molecular-level understanding of the long-range allosteric pathway in the functional interplay between DNA mismatch recognition, nucleotide binding activity, and repair initiation. Structural characterization of distinct conformational states along with the elucidation of a complete functional cycle offers possible avenues of validating the proposed cycle through experiments.

2.3 Methods

Normal mode calculations were performed on E. coli MutS and its human homolog, MSH2-MSH6, both in the absence of any bound nucleotides. Calculations were performed on each protein in the presence or absence of DNA, resulting in a total of 4 sets of normal mode calculations. They are referred to as MSH-DNA, MSH-free, MutS-DNA, and MutS-free for MSH2-MSH6 with DNA, MSH2-MSH6 without DNA, MutS with DNA, and MutS without DNA, respectively. Initial structures of the E. coli and human proteins were obtained from PDB IDs 1E3M (20) and 2O8B (30), respectively, and missing loops were constructed using Modeller (86). The structures were then extensively energy minimized using the CHARMM22/CMAP force field (87) and a distance-dependent dielectric (ε = 4). The root mean square deviations (RMSD) of the minimized structures with respect to the crystal structures were 2.08 Å, 2.42 Å, 1.28 Å and 2.06 Å for MSH-DNA, MSH-free, MutS-DNA, and MutS-free, respectively. Low RMSD values indicate that extensive minimization in the absence of explicit water or DNA does not lead to significant structural deviations from the crystal structure. Normal modes were calculated with the block normal mode approach using the VIBRAN module in CHARMM (88-89), version c33a2, and with the same force field as used for minimization. Only low frequency modes were analyzed in both proteins as those are the most relevant for describing functional motions involving the entire complex.
In order to calculate the similarity between individual modes among MutS and MSH2-MSH6, we defined the overlap index for each pair of modes $(i, j)$ as:

$$\frac{\left| \sum_{k} \vec{S}_{ik} \cdot \vec{H}_{jk} \right|}{k}$$

where $k$ is the number of aligned residues of MutS and MSH, $\vec{S}_{ik}$ is the $k$th component unit vector of the $i$th mode of MutS, and $\vec{H}_{jk}$ is the $k$th component unit vector of the $j$th mode of MSH. Each dot product contributing to the sum is between unit vectors and can possess a maximum value of 1 for residue pairs moving in exactly the same direction, or a value of -1 for residue pairs moving in exactly the opposite direction. The value of the overlap index can thus reach a maximum value of 1 for an ideal case where all aligned residues of the two proteins are moving in exactly the same or exactly the opposite direction. The sequence alignment between MSH2-MSH6 and MutS was taken from previous work (30). Molecular graphics were generated using PyMOL (90).

2.4 Results and Discussion

Normal mode calculations were carried out to explore the possible conformational dynamics of MutS and MSH2-MSH6 from the perspective of the crystal structures. Apart from conducting NMA calculations on both proteins with bound DNA, we have also considered proteins without DNA in order to allow dynamics beyond the DNA mismatch bound form. Results from the analysis of MutS and MSH2-MSH6 in the presence and absence of DNA are discussed below. Data from 4 different sets of normal mode calculations are referred to as MSH-DNA, MSH-free, MutS-DNA, and MutS-free for MSH2-MSH6 with DNA, MSH2-MSH6 without DNA, MutS with DNA, and MutS without DNA, respectively.

Flexibility of MutS and MSH2-MSH6 from Normal Modes and X-ray Data

Root mean square fluctuations (RMSF) provide information about inherent protein flexibility. They can be deduced from experimental B-factors or can be calculated from normal modes (91).
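The conversion between B-factors and RMSF values is not spelled out in the text, so the sketch below assumes the standard isotropic relation $B = (8\pi^2/3)\,\langle \Delta r^2 \rangle$, with B in Å² and the RMSF in Å. The function names are hypothetical and the example is meant only to show the direction of each conversion.

```python
import math


def rmsf_from_bfactor(b_factor):
    """RMSF (in Angstrom) from an isotropic B-factor (in Angstrom^2).

    Assumes B = (8 * pi^2 / 3) * <dr^2>, so RMSF = sqrt(3 B / (8 pi^2)).
    """
    return math.sqrt(3.0 * b_factor / (8.0 * math.pi ** 2))


def bfactor_from_rmsf(rmsf):
    """Isotropic B-factor (Angstrom^2) from an RMSF value (Angstrom)."""
    return (8.0 * math.pi ** 2 / 3.0) * rmsf ** 2
```

The first function corresponds to deducing fluctuations from crystallographic data, while the second corresponds to expressing normal mode fluctuations on the B-factor scale for coloring or comparison.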
The results in Figure 2.3 (A-D) show that the RMSFs calculated from experimental B-factors are uniformly high due to the limited resolution of the MSH2-MSH6 crystal structure (2.75 Å) and do not provide significant information about relative domain fluctuations. RMSFs from B-factors of the MutS crystal structure (with a resolution of 2.2 Å) are still high but indicate increased flexibility in MutS domains I, IV and parts of III, in particular, for chain S2. In contrast, RMSFs calculated from the first 10 normal modes show significant differences in the domain movements. MutS-free and MSH-free exhibit large flexibility in the DNA binding and clamp domains (I and IV) and to a lesser degree between the lever domains (III). In contrast, the ATPase domains (V) show comparably low structural fluctuations. Mode calculations for MutS-DNA and MSH-DNA provide qualitatively similar results, but with damped flexibility in the clamp domains and in the DNA binding domains of chain MSH6 and MutS S1. Figure 2.4 (A-D) shows both proteins colored according to the B-factors calculated from normal mode RMSF values. The NMA-based dynamics of MutS and MSH2-MSH6 are remarkably similar between chains as well as between the prokaryotic and eukaryotic enzymes. MSH6 and domain I of MSH2 appear to be slightly more rigid compared to MutS, which may be related to the functional specialization of MSH2-MSH6. One may speculate that MutS requires increased flexibility to recognize both mismatches and longer insertions/deletions in contrast to MSH2-MSH6, which only recognizes mismatches and single base insertions/deletions. While an absolute comparison of RMSF values from a small number of normal modes between proteins of different size may be problematic, very similar results were obtained when motions of the first 100 modes were accumulated (data not shown).
Furthermore, it appears that MSH2 and MutS S2 are slightly more flexible than MSH6 and MutS S1, respectively, with the exception of the clamp domain for the MSH-DNA and MutS-DNA systems. The increased flexibility in domain I of MSH2 or MutS S2 and decreased flexibility of the clamp domain compared to the other chain correspond to the structural asymmetry of MutS and MSH2-MSH6. Increased flexibility of domain I of MSH2/MutS S2 is probably due to the fact that they do not make considerable contacts with the DNA, while the extensive DNA contacts of the clamp domains of MSH2/MutS S2 compared to the other chain account for their decreased flexibility. It was observed that the clamp domains of MSH2 and MutS S2 make 83 and 86 atomic contacts with the DNA, respectively, while those of MSH6 and MutS S1 make only 62 and 42 contacts, respectively. It was further observed that the clamp and lever of MutS S1 are more flexible compared to those of MSH6, which again is a result of fewer atomic contacts made by the clamp of S1 with the DNA than by that of MSH6. The number of atomic contacts was calculated by considering protein heavy atoms within 5 Å of the DNA in the minimized structure of both proteins.

Finally, Figure 2.4 highlights that the clamp and lever domains of MSH2 in MSH-DNA are slightly more flexible than the corresponding domains in MutS S2 of MutS-DNA. These differences are probably the result of the proteins being bound to DNA segments of varying lengths. MutS-DNA has a longer DNA (3 base pair steps more than MSH-DNA) which topologically constrains the mobility of MutS S2, giving rise to a comparatively more rigid S2 clamp than that of MSH2. The portion of the lever domain of S2 tightly connected to the clamp also undergoes some degree of rigidification. The rigidification of MutS S2 may be closer to reality, as DNA undergoing repair in the cell is much longer than the fragments observed in the crystal structures.
It should also be mentioned that a substantially longer DNA can alter the extent of flexibility observed in the clamp of the other chain, namely S1 and MSH6. Hence, differing levels of flexibility of the clamp and lever due to a much longer DNA cannot be directly inferred from these studies using fragmented DNA, except for the fact that an overall decreased flexibility of the clamps and levers will result in both chains when compared to DNA-free systems. Finally, it was observed that the RMSF for protein with DNA spikes at residues 1275 to 1281 in MSH6 and residues 663-666 in MutS S2. This is likely a manifestation of the tip effect (92) and is considered physically meaningless.

Correlated Motions in MutS and MSH2-MSH6

Covariance plots averaged over the ten lowest NMA modes were calculated to examine correlated motions in all 4 systems under investigation. The results shown in Figure 2.5 indicate similar overall correlations in MutS and MSH2-MSH6 in the absence and presence of DNA. Furthermore, both chains of MutS and MSH2-MSH6 show similar average correlation patterns, with only minor variations, despite the structural asymmetry of the complex. Common to all chains are correlations within each domain, reflecting rigid body domain motions, as well as correlations between adjacent domains II (connector domain) and III (lever domain), between III and V (ATPase domain), and between I (DNA binding domain) and II. While correlations within the same subunit are generally positive, correlations between dimer moieties are mostly negative with the exception of a strong positive correlation between the two ATPase domains and the two clamp domains as a result of dimerization. The plots indicate high positive correlation between the lever domains and parts of the ATPase domains immediately adjacent to the lever including the ATPase binding sites.
Experiments suggest the presence of long range allostery between the N-terminal DNA binding domain and the C-terminal ATPase domains, although a clear understanding of the allosteric pathway is missing. Strong correlations between the ATPase sites and lever domains highlight the propagation of signals between the two functional sites via the levers. Furthermore, the DNA binding domain in MSH6 and MutS S1 has a strong negative correlation with the ATPase domain in MSH2 and MutS S2, respectively, in particular for MSH-DNA and MutS-DNA, again suggesting conserved domain motions important for allostery.

Correspondence Between MutS and MSH2-MSH6 Modes

The analysis of RMSFs and motional correlations indicates that MutS and MSH2-MSH6 exhibit similar dynamic characteristics, both in the presence and absence of DNA. Furthermore, the correlation analyses from the first ten modes suggest dynamic coupling between DNA binding and ATPase activity. In order to explore this point in more detail, the ten lowest-frequency modes were individually compared within the same protein, i.e., MSH-free and MSH-DNA or MutS-free and MutS-DNA. The same comparison was also performed between different proteins, i.e., MSH-free and MutS-free and MSH-DNA and MutS-DNA. Table 2.1 shows the overlap indices calculated between any pair of modes from the 4 different systems as described in the methods section. An overlap index value of 1.0 means that atoms move in identical directions in the two modes that are compared, while a value of 0.0 means that motions are entirely orthogonal or that atom motions have zero amplitude. While a value of 1.0 or close to it is unlikely even for very similar structures, visual inspection of the MutS and MSH modes indicates that the motions are qualitatively similar when overlap indices are at 0.6 and above and, to a lesser but still substantial extent, when values are between 0.5 and 0.6, especially when MutS is compared to MSH.
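The overlap index described in the methods section can be sketched as follows. This is an illustrative reading of the definition rather than the script used to produce Table 2.1: it assumes each mode is supplied as a list of three-dimensional displacement vectors, one per aligned residue and in the same residue order for both proteins, that every per-residue vector is normalized to unit length before the dot product is taken, and that the absolute value of the sum is divided by the number of aligned residues. Residues with zero displacement are assumed to contribute zero. All names are hypothetical.

```python
import math


def unit_vector(vec):
    """Scale a 3D vector to unit length; zero vectors are returned as zeros."""
    norm = math.sqrt(sum(component ** 2 for component in vec))
    if norm == 0.0:
        return (0.0, 0.0, 0.0)
    return tuple(component / norm for component in vec)


def overlap_index(mode_a, mode_b):
    """Overlap between two modes defined over the same aligned residues.

    mode_a and mode_b are equal-length lists of (x, y, z) displacements.
    Returns a value between 0 (orthogonal or motionless) and 1 (all
    residues moving in exactly the same, or exactly the opposite, direction).
    """
    if len(mode_a) != len(mode_b) or not mode_a:
        raise ValueError("modes must cover the same non-empty residue set")
    total = 0.0
    for vec_a, vec_b in zip(mode_a, mode_b):
        ua, ub = unit_vector(vec_a), unit_vector(vec_b)
        total += sum(x * y for x, y in zip(ua, ub))
    return abs(total) / len(mode_a)
```

Because each residue contributes through a unit vector, the index measures agreement in direction only; differences in amplitude between the two modes do not lower the score unless the amplitude is exactly zero.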
Relatively low overlap indices despite visually similar motions are due to uncertainties in the alignment between the two proteins with a sequence identity of only 21% and 24% for MSH2 and MSH6, respectively (30), differences in structure, and significant overall flexibility due to the multidomain nature of both MutS and MSH. Table 2.1 shows that the highest degree of overlap on a mode-by-mode basis exists between MSH-free and MSH-DNA and also between MSH-free and MutS-free systems. There is a lesser degree of one-to-one correspondence between MutS-free and MutS-DNA and also between MSH-DNA and MutS-DNA, with individual modes being reordered more significantly according to frequencies in these systems. High one-to-one overlap is found between modes 1, 2, 3, 4 of MSH-free and modes 1, 5, 3, 4 of MutS-free, between modes 1, 3, 4, 9, 10 of MSH-DNA and modes 2, 5, 4, 9, 10 of MutS-DNA, between modes 1, 2, 4, 5, 8, 9 of MSH-DNA and modes 2, 3, 4, 5 and 9, 10, 7 of MSH-free, and between modes 1, 2, 3, 5 of MutS-DNA and modes 2, 5, 3 and 9, 7 of MutS-free. It is apparent that many modes do not match on a one-to-one basis, but share common features with multiple modes (e.g. mode 2 of MSH-free matches modes 2, 5, and 8 of MutS-free). It is known from previous studies that complex domain motions responsible for altered functional states in a large protein are often better represented as a combination of low frequency modes. Our studies also suggest that the low frequency modes of both MSH and MutS exhibit very similar domain flexibilities, while specific protein dynamics are often seen to occur as a combination of multiple modes showing different degrees of mode mixing in both proteins. The degree of mode mixing observed in this study will likely change as a result of using different force fields or coarse-grained models, but the low frequency normal mode space will likely be conserved in all normal mode analyses, provided the starting structure remains the same.
Thus, the main aim of this study is to highlight the conserved nature of domain motions in both proteins, rather than to single out any specific mode or modes responsible for protein function. The presence of DNA alters the structural flexibility to some extent, as evident from the differences between modes in the presence and absence of DNA. For example, mode 1 is present in MSH-free and MutS-free, but not in MSH-DNA or MutS-DNA. As described in more detail below, the mode involves large motions of the clamp domains that are not possible in the presence of DNA. Visual inspection further reveals that altered motions of the clamp and DNA binding domains for structures in the presence and absence of DNA are a major factor in reduced mode overlap indices between the two protein systems, despite otherwise similar overall motion. Interestingly, modes are much more conserved between the MSH-free and MSH-DNA systems than between the two MutS systems. This suggests that protein flexibility is altered more in MutS than in MSH2-MSH6 through specific DNA interactions, especially near the DNA binding domain and clamps. This further reflects a more rigid overall structure in MSH2-MSH6 that is optimized to interact with mismatched DNA, while MutS requires more structural flexibility to interact with either mismatched DNA or significantly distorted DNA structures with insertions or deletions. In this study, we will focus more on modes from MSH-free and MutS-free since they are more likely to indicate motions from the known mismatch-bound crystallographic structures towards alternate states during DNA scanning and mismatch repair. Comparison of the modes between MSH-free and MutS-free indicates that modes 1, 3, 4, and 9 from both complexes significantly overlap and may be considered equivalent. Mode 2 of MSH-free and mode 5 of MutS-free overlap significantly, suggesting that these modes are simply reordered with respect to their frequencies.
However, there is also overlap between modes 2 of both complexes, suggesting common features in both modes. Otherwise, there is significant mode overlap along the diagonal for modes 7 through 9 and additional limited off-diagonal overlap for modes 6 through 10. Overall, mode overlaps between MSH-DNA and MutS-DNA are lower, but high overlap indices are again mostly limited to diagonal or nearby off-diagonal elements, with modes 1, 2, 3, 4, 5, 6, 9, and 10 of MSH-DNA corresponding to modes 1 and 2, 2, 5, 4, 6, 6, 9, and 10 of MutS-DNA.

Low-frequency Modes in MSH2/MSH6

Based on our analyses, the dynamic characteristics are largely conserved between MutS and MSH2-MSH6, both in the presence and absence of DNA. This is not surprising, but it is also not trivial given the structural differences between the eukaryotic and prokaryotic enzymes and the slightly different biological functions. In the following, we will describe the lowest-frequency modes in more detail with a focus on the modes of MSH-free. The protein motion during each of the first 5 modes of MSH-free is shown in Figure 2.6. Close-up views of the ATPase domains of selected modes are shown in Figure 2.7. Initial visual inspection suggests the following general conclusions about the nature of domain motions in both MutS and MSH2-MSH6:

1) Most of the modes show an overall breathing motion of the DNA binding cavity involving the clamp and DNA binding domains. The parts of the clamp domains directly bound to the DNA backbone always show damped motion in proteins with DNA, although movements of other parts of the DNA binding cavity show a similar kind of breathing motion. Such an opening/closing motion of the DNA binding cavity corresponds to conformational transitions between a mismatch-bound state and scanning/sliding conformations where the interaction with DNA is presumed to be weaker.
2) Many modes show a correlation between opening/closing of the DNA binding cavity and alterations in the ATPase domain, in particular, the nucleotide-binding cleft. This finding indicates that MutS or MSH2-MSH6 is capable of allosteric communication between DNA binding and ATPase activity. The correlation between motions of the DNA binding domains and the ATPase domains varies, as it may involve the MSH6, MSH2, or both ATPase domains in an alternating fashion.

3) A mode that affects the nucleotide-binding cleft in both ATPase domains in the same manner and at the same time is not observed in any of the 4 cases studied. This finding agrees with the experimental evidence that ATPase activity in MSH2-MSH6 involves the two domains only in a sequential rather than simultaneous fashion (44).

The individual modes are described in detail in the following:

Mode 1 involves a wagging motion of the clamp domain along the direction of the DNA. The rotating motion around the core, apparent in the rest of the enzyme, results from a fixed center of mass. If the protein is aligned at domains I, II, III, and V, only the clamp domain IV moves in this mode. Mode 1 involves both chains to the same extent. The exact functional role of this mode is unclear, but it may be related to the translocation of MSH2-MSH6 along DNA in the absence of a mismatch, when the clamps do not establish strong contacts with the DNA backbone. This mode is absent in both proteins when bound to mismatched DNA, presumably due to residue contacts with the bent DNA.

Mode 2 consists of a partial opening/closing motion of the DNA binding site that is less pronounced than in some of the other modes. The unique aspect of this mode is an alternating opening/closing of the nucleotide binding clefts between the MSH2 and MSH6 ATPase domains (see Figure 2.6). It appears likely that this mode is involved in coupling MSH6 and MSH2 ATPase activity in a sequential fashion.
As mentioned above, mode 2 in MSH-free has high overlap with mode 5 of MutS-free. However, this is due to similar motions in the DNA binding, clamp, and core domains. The alternating opening/closing of the two nucleotide binding sites is not present in mode 5 of MutS-free but is seen instead in mode 2 of MutS-free. An alternating ATPase movement correlated similarly to motions in the DNA binding cavity is also observed in mode 1 of MutS-DNA and MSH-DNA, suggesting that the inter-domain correlation is conserved in all systems.

Mode 3 couples opening of the DNA binding site with closing of the nucleotide binding cleft in MSH2. The opening of the DNA binding site is achieved by the movement of the clamp domains away from the DNA as well as the movement of the DNA binding domain in MSH6 out of the plane of the MSH2-MSH6 complex. Relative to the DNA, this motion moves domain I out of the DNA groove rather than along its helical axis. In the open form of this mode, most DNA contacts of MSH6 near the mismatch site are lost and the DNA can essentially slide freely relative to MSH2-MSH6. This mode is highly conserved in all other systems of MutS and MSH and is thus expected to play an important role in the protein’s functional cycle. An almost identical motion is observed in modes 2 and 3 of MSH-DNA and MutS-DNA. We note, though, that the overlap index between the two modes in MSH-DNA and MutS-DNA is small due to altered clamp movements in MutS-DNA, but otherwise they show similar domain motions.

Mode 4 consists mainly of a sideways motion of the clamp and part of the lever domains towards either the MSH2 or MSH6 side of the enzyme. This mode is asymmetric with respect to the overall complex. A symmetric version of this mode would result in clamp domain separation and lead to an open dimer where the clamp domains are far away from each other, as proposed for the DNA-free complex from small-angle X-ray scattering (47).
The symmetric mode is not observed, presumably due to limitations of the harmonic approximation in normal mode analysis.

Mode 5 involves closing of the DNA binding cavity that is coupled with opening of the nucleotide binding cleft in MSH6. The closing of the DNA binding cavity is achieved primarily by the motion of the clamp domains directly towards the DNA. A similar overall motion is also found in mode 2 of MutS-free, although the coupling between opening and closing of the DNA binding cavity with changes in the ATPase domain of MutS S1 is more pronounced in mode 6 of MutS-free. It is likely that MutS-free achieves a motion equivalent to the MSH-free mode 5 through a combination of modes 2 and 6. A similar correlated motion between the DNA binding cavity and ATPase domains is further observed in mode 5 of MSH-DNA and mode 4 of MutS-DNA.

Functional Cycle of MSH2-MSH6 and MutS from Normal Modes

The crystal structures of MutS and the MSH2-MSH6 complex only show the mismatch-bound conformation. It is clear, however, that other functional states are involved during scanning of regular DNA, authorization of mismatch repair, and sliding of the enzyme along DNA during and immediately after repair before DNA scanning is resumed. On the molecular level, these different states are likely reflected in altered conformations of MSH2-MSH6 and MutS. X-ray crystallographic approaches have not identified alternate states of MSH2-MSH6, but there is evidence of alternating ATP- and ADP-bound states from small-angle X-ray scattering (47), where ATP binding has resulted in more compact protein conformations. The normal mode analysis presented here offers first insights into the functional dynamics of MSH2-MSH6 and MutS beyond the known DNA-mismatch-bound crystal structures.
Through a combination of the conserved low-frequency modes in both proteins, it is possible to propose, for the first time, a complete functional cycle of MSH2-MSH6 and MutS that is in full agreement with experimental observations. The proposed molecular-level picture of the cycle is illustrated in Figure 2.8 and described in detail in the following:

DNA binding: The functional cycle of MSH2-MSH6 and MutS begins with binding to newly replicated DNA. Experimental data suggest that DNA-free MutS is present in an open form. Upon association with DNA, the clamp domains are presumed to close. The asymmetric mode 4 indicates how the clamp domains might separate starting from the DNA-bound form without significantly affecting the structure of the rest of the enzyme.

DNA scanning and mismatch recognition: Once MSH2-MSH6 or MutS is bound to DNA, it will begin scanning for base mismatches. According to single-molecule experiments, MSH2-MSH6 moves along regular DNA via one-dimensional diffusion (36), while DNA binding kinetics indicate that the protein is not bound strongly to DNA in the absence of a mismatch (16, 23, 93). In contrast, MSH2-MSH6 and MutS interact closely with mismatched DNA in a highly bent form, as evidenced by the crystal structures (20-21, 30). The formation of highly bent DNA is greatly facilitated by the presence of base mismatches or base insertions/deletions (94) and is believed to be the main feature by which mismatched DNA base pairing is recognized (16). The transition from scanning to mismatch recognition is therefore expected to involve a significant change in the DNA binding domain from a relaxed conformation with relatively weak protein-DNA interactions to a tightened conformation where the enzyme holds on to highly bent DNA. The opening/closing motion of the DNA binding cavity in mode 5 of MSH2-MSH6 describes such a transition in molecular detail.
The transition from DNA scanning to mismatch recognition is coupled to the fast exchange of ADP for ATP and subsequent stalling of ATP hydrolysis in MSH6 according to kinetic experiments (34, 44, 83-84). Mode 5 couples closing of the DNA binding cavity to opening of the MSH6 or MutS S1 nucleotide binding cleft, and vice versa. The nucleotide binding cleft is sandwiched between the Walker A motif and a loop, which acts as a flap over the adenine moiety. This loop contains a conserved Phe residue (Phe596 in MutS, Phe650 in MSH2 and Phe1108 in MSH6) that stacks with the adenine ring in all available crystal structures. Previous studies of ATP binding in some ATPases have revealed that binding often induces a tightening of the site that is required for ATP hydrolysis, as suggested by an increase in the hydrophobicity of the binding pocket (95) or by the closing of specific loops in the presence of the nucleotide, resulting in a tightened cavity (96). We hypothesize that an open nucleotide binding cleft in the MSH2-MSH6 and MutS ATPase domains encourages ATP binding but inhibits hydrolysis. Conversely, closing of the ATPase cavity predominantly involves movement of the loop bound to the adenine ring towards the catalytic center (Walker B motif), thereby ensuring successful ATP catalysis. Mode 5 therefore provides a molecular-level picture of how mismatch recognition through deformation of DNA at the mismatch site might be coupled to the experimentally observed changes in MSH6 ATPase activity. In most of the crystal structures of MSH2-MSH6 and MutS, ADP or a non-hydrolyzable ATP analog is present in one or both of the chains, while only one MutS structure with bound ATP on both chains has been reported so far (29). The difficulties of observing the protein with stable ATP-bound MSH6/S1 may be attributed to crystal packing that does not allow the formation of ATPase domains that are truly catalytically inactive (29).
Thus, the crystal structure with ATP at S1 may not be fully representative of the true non-hydrolyzable state of the protein, but may rather represent a different trapped intermediate state.

Initiation of repair: The next step after base mismatch recognition is initiation of the repair process. This involves binding of MutL/MLH to MutS/MSH2-MSH6 (16-17, 80), which then signals further downstream events. Furthermore, kinetic studies indicate that ADP exchanges for ATP in the MSH2 ATPase domain subsequent to ATP binding in the MSH6 ATPase domain (84). The sequential coupling of ATP binding to the two ATPase domains mirrors alternating ATP hydrolysis activity in other dimerized ATPase domains, as in ABC transporters (97), and can be understood in terms of the alternating opening/closing of the nucleotide binding clefts seen in mode 2. We hypothesize that the initially very open ATP binding site in MSH6 following mismatch recognition partially closes upon ATP binding, which in turn leads to opening of the MSH2 ATP binding site according to mode 2 and subsequent exchange of ADP for ATP in MSH2. Mode 2 also involves structural rearrangement outside the ATPase domain, indicating that the enzyme assumes a distinct conformation at this step of the functional cycle, possibly to facilitate MutL/MLH binding.

Repair and mismatch release: Recent experiments suggest that after the initiation of repair, MSH2-MSH6 and MutS form a mobile clamp state that slides along the DNA in search of downstream repair proteins (36, 40, 83-84, 98). A transition from the mismatch-bound state to a sliding conformation requires re-opening of the DNA binding cavity to a form that still holds the DNA but is not competent to rebind mismatched DNA (83). Moreover, this sliding activity is not powered by ATP hydrolysis. The molecular details of this transition have not yet been resolved by experimental studies.
We propose that the most conserved mode in both proteins, i.e., mode 3, describes the molecular events involved in the formation of the sliding clamp conformation. In mode 3, the DNA binding cavity is opened by the release of the clamps from the DNA coupled to a large motion of the DNA binding domain perpendicular to the DNA helix. As a result, intimate interactions with the mismatch through the DNA groove become impossible, in particular, interactions involving the highly conserved Phe-X-Glu motif that is known to interact specifically at the mismatch site (20-21, 29-30). The protein is capable of sliding along the DNA in this state. The release of DNA mismatch binding and sliding according to mode 3 is coupled to a tightening of the MSH2 ATP binding site, which would facilitate eventual ATP hydrolysis in MSH2 and allow recovery of the DNA scanning mode.

Although specific normal modes have been mentioned while describing the molecular events during scanning, mismatch binding, and sliding clamp formation, it is likely that opening and closing of the DNA binding cavity actually occurs as a result of multiple low-frequency modes. This is even more likely as almost all of the low-frequency modes studied, except mode 1 in MSH-free and MutS-free, exhibit some kind of breathing motion of the DNA binding cavity involving different domain motions such as those of the clamps, levers, and DNA binding domain. The specific modes used to describe the conformational changes in the functional cycle only show the necessary synchronization between the opening/closing of the DNA binding cavity and the ATPase cleft, and are thus used to describe the experimentally observed allosteric effects.
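One common way to quantify how well a conformational change is captured by several low-frequency modes together, rather than by a single mode, is a cumulative overlap. The sketch below is illustrative only and was not part of the analysis described here: it assumes that a displacement vector between two conformations is available and that the modes are orthonormal. The function name `cumulative_overlap` is hypothetical.

```python
import math

def cumulative_overlap(displacement, modes):
    """Fraction of a displacement captured by a set of orthonormal modes.

    displacement: flat 3N list, e.g. (target - reference) coordinates.
    modes: list of flat 3N lists, assumed mutually orthonormal.
    Returns sqrt(sum_k (d_hat . m_k)^2), which lies between 0 and 1.
    """
    norm = math.sqrt(sum(d * d for d in displacement))
    if norm == 0.0:
        raise ValueError("displacement has zero length")
    unit = [d / norm for d in displacement]
    squared = 0.0
    for mode in modes:
        projection = sum(u * m for u, m in zip(unit, mode))
        squared += projection * projection
    return math.sqrt(squared)
```

A value that rises steadily as more modes are included would be consistent with a motion that arises from mode mixing rather than from one dominant mode.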
Validation Through Experiments and Further Simulation

The results presented from normal mode calculations make a number of predictions about the functional dynamics of MutS and MSH2-MSH6 and the existence of additional functional states that have not been characterized on a molecular level to date. In particular, this study proposes molecular-level details of long-range allosteric coupling between the N-terminal DNA binding domains and the C-terminal ATPase sites as well as coupling between the two adjacent ATPase sites, which are known to exhibit a sequential pattern of action. In addition, the results presented here provide an atomic-level characterization of distinct states in the functional cycle of MutS and MSH2-MSH6. A more open DNA scanning conformation is proposed and a sliding clamp state is predicted where the DNA binding domain is rotated out of the enzyme to result in structures that are significantly different from the crystal structure. These predictions should stimulate further experimental and computational studies to validate them. In particular, structural experiments could probe the nature of the DNA scanning and sliding conformations based on the predictions presented here, while biochemical studies may test mutations that would disrupt the proposed domain movements. Furthermore, the proposed structures for alternate functional states can be subjected to more extensive computational studies to examine their stability and the transitions between those states.

2.5 Conclusions

Results from the normal mode calculations of MSH2-MSH6 and MutS are presented to develop a molecular-level picture of distinct conformational states involved in their functional cycles. A comparison of the modes between MSH2-MSH6 and MutS reveals striking similarities, indicating that the two enzymes are not just structurally, but also dynamically and functionally equivalent on the molecular level.
The most important result indicates the presence of a strong motional correlation between the ATPase domains and the lever domains in all low-frequency modes analyzed, while individual modes highlight the specific nature of the correlation between the N-terminal DNA binding domains and the ATPase domains. This indicates that both MutS and MSH2-MSH6 are structurally capable of establishing long-range allostery during their functional cycle. Based on a detailed analysis of the lowest-frequency modes in the context of the available experimental data, a detailed mechanism is proposed that involves DNA scanning, mismatch recognition, repair initiation, and sliding of MSH2-MSH6/MutS along DNA before scanning is resumed. Normal mode calculations can provide an approximate view of biologically relevant dynamics in biomolecules but are limited by the theoretical nature of the methodology. The ideas presented here suggest a number of experiments that could validate and extend the proposed mechanism of DNA mismatch recognition by MSH2-MSH6 and MutS. Furthermore, the normal mode results can serve as starting points for additional computational studies that may investigate the proposed functional states and transitions between them in more detail.

2.6 Acknowledgements

The authors thank Dr. Katarzyna Maksimiak and Mr. Hugh Crosmun for valuable contributions during the early stages of the project. Financial support from NSF CAREER grant 0447799 and the Alfred P. Sloan Foundation is acknowledged as well as access to computational resources at the High Performance Computing Center at Michigan State University.

Table 2.1 Overlap Index for a Pair of Modes, Each From Two Different Sets
Mode No.
1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10 0.8 0.2 0.0 0.2 0.1 0.1 0.2 0.3 0.1 0.1 0.5 0.3 0.3 0.3 0.0 0.3 0.3 0.1 0.1 0.1 0.5 0.1 0.5 0.2 0.2 0.2 0.0 0.0 0.0 0.0 0.5 0.4 0.2 0.1 0.2 0.3 0.4 0.1 0.2 0.2 0.1 0.5 0.3 0.1 0.5 0.0 0.2 0.2 0.2 0.3 0.6 0.5 0.2 0.2 0.1 0.2 0.0 0.1 0.2 0.0 0.7 0.0 0.4 0.1 0.0 0.2 0.0 0.2 0.1 0.0 0.6 0.3 0.2 0.2 0.2 0.1 0.3 0.1 0.1 0.2 0.2 0.1 0.8 0.2 0.1 0.1 0.1 0.1 0.2 0.1 0.4 0.4 0.2 0.5 0.2 0.0 0.1 0.1 0.0 0.2 0.0 0.9 0.1 0.1 0.2 0.2 0.2 0.0 0.3 0.1 0.4 0.2 0.7 0.4 0.0 0.2 0.1 0.2 0.2 0.2 0.2 0.0 0.3 0.6 0.1 0.5 0.1 0.1 0.2 0.3 0.0 0.1 0.2 0.6 0.3 0.2 0.4 0.1 0.2 0.1 0.1 0.1 0.2 0.9 0.0 0.1 0.0 0.1 0.1 0.0 0.3 0.3 0.3 0.2 0.3 0.5 0.0 0.1 0.2 0.1 0.0 0.6 0.2 0.3 0.5 0.2 0.3 0.0 0.2 0.0 0.0 0.1 0.8 0.2 0.2 0.3 0.0 0.1 0.2 0.1 0.3 0.2 0.4 0.2 0.6 0.0 0.1 0.3 0.1 0.2 0.2 0.6 0.1 0.0 0.5 0.1 0.4 0.3 0.2 0.1 0.1 0.1 0.2 0.5 0.4 0.3 0.0 0.4 0.4 0.4 0.0 0.2 0.0 0.3 0.5 0.5 0.4 0.2 0.0 0.2 0.0 0.0 0.4 0.3 0.2 0.4 0.5 0.3 0.2 0.4 0.3 0.1 0.2 0.5 0.4 0.1 0.0 0.2 0.2 0.2 0.2 0.3 0.1 0.3 0.3 0.3 0.5 0.5 0.1 0.2 0.0 0.1 0.4 0.0 0.3 0.0 0.0 0.1 0.1 0.4 0.2 0.2 0.3 0.2 0.4 0.3 0.2 0.1 0.7 0.2 0.3 0.2 0.1 0.3 0.6 0.2 0.2 0.3 0.2 0.0 0.3 0.5 0.0 0.3 0.2 0.3 0.3 0.5 0.0 0.1 0.2 0.3 0.0 0.1 0.1 0.1 0.3 0.0 0.0 0.4 0.1 0.2 0.4 0.3 0.1 0.0 0.2 0.1 0.2 0.4 0.4 0.4 0.1 0.5 0.0 0.3 0.1 0.3 0.4 0.1 0.2 0.0 0.3 0.2 0.1 0.4 0.3 0.5 0.6 0.2 0.0 0.1 0.2 0.2 0.3 0.2 0.1 0.2 0.7 0.1 0.0 0.1 0.2 0.3 0.7 0.2 0.4 0.0 0.2 0.1 0.1 0.1 0.6 0.3 0.1 0.0 0.4 0.2 0.1 0.3 0.1 0.1 0.1 0.1 0.2 0.1 0.5 0.0 0.3 0.3 0.1 0.2 0.0 0.0 0.1 0.4 0.2 0.4 0.1 0.6 0.2 0.2 0.2 0.0 0.0 0.2 0.1 0.8 0.0 0.1 0.0 0.2 0.1 0.4 0.1 0.2 0.3 0.2 0.3 0.3 Values ≥ 0.5 are in bold; values ≥ 0.6 are also in italic. 50 MSH-free (rows) vs. MutS-free (columns) MSH-DNA (rows) vs. MutS-DNA (columns) MSH-DNA (rows) vs. MSH-free (columns) MutS-DNA (rows) vs. 
MutS-free (columns)

Figure 2.1 Crystal structure of MSH2-MSH6 in front (A) and sideways (B) orientation. Protein domains are indicated in red (I, DNA binding), orange (II, connector), yellow (III, lever), green (IV, clamp) and blue (V, ATPase). DNA is indicated in light (base pairs) and dark (backbone) brown, while bound ADP molecules are magenta. Darker shades refer to MSH6; lighter shades refer to MSH2. Close-up view of the nucleotide-binding domain (C) highlights the Walker A motif in yellow, the Walker B motif in orange, and the signature loop in green.

Figure 2.2 The dynamic behavior of the ATPase site in both chains is represented along an arbitrary horizontal time axis. Alterations among three possible nucleotide binding states (ADP/ATP/free) of the nucleotide binding domain are shown along the vertical axis with the help of curves that represent different hydrolysis patterns during functionally important phases of the protein. Functionally distinct states are colored blue for scanning, pink for mismatch recognition, and green for sliding phases.

Figure 2.3 Root mean square fluctuation (RMSF) of Cα atoms as a function of residue number calculated from the first 10 normal modes (red: without DNA, black: with DNA) and from crystallographic B-factors according to $\mathrm{RMSF}_{\mathrm{Xray}} = \sqrt{3B/(8\pi^{2})}$ (blue) for MSH6 (A), MSH2 (B), MutS S1 (C), and MutS S2 (D). Discontinuities along the blue curve are due to missing residues in the crystal structures. Protein domains are indicated by colored bars with red, orange, yellow, green, and blue for domains I, II, III, IV, and V. Chains MSH6/MutS S1 are shown in dark shades while MSH2/MutS S2 are shown in light shades.

Figure 2.4 Protein backbone showing thermal fluctuations color coded by B-factors calculated from the RMSF of the first 10 modes for MSH-free (A), MSH-DNA (B), MutS-free (C), and MutS-DNA (D). The color scale for B-factors is provided at the end of the figure.
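The conversion between isotropic crystallographic B-factors and RMSF values used in the captions of Figures 2.3 and 2.4 can be sketched as follows. This is a minimal illustration of the relation RMSF = sqrt(3B / (8π²)) and its inverse, not the script used to prepare the figures; the function names are hypothetical.

```python
import math

def bfactor_to_rmsf(b_factor):
    """RMSF (in Å) from an isotropic B-factor (in Å^2): sqrt(3B / (8 pi^2))."""
    if b_factor < 0.0:
        raise ValueError("B-factor must be non-negative")
    return math.sqrt(3.0 * b_factor / (8.0 * math.pi ** 2))

def rmsf_to_bfactor(rmsf):
    """Inverse relation: B = 8 pi^2 RMSF^2 / 3, e.g. for coloring a structure by mode-derived RMSF."""
    return 8.0 * math.pi ** 2 * rmsf ** 2 / 3.0
```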
Figure 2.5 Average covariance from the first 10 modes in MSH2-MSH6 (A; upper triangle: MSH-free, lower triangle: MSH-DNA), and MutS (B; upper triangle: MutS-free, lower triangle: MutS-DNA). Protein domains in both chains are indicated by colored bars following the same color scheme as in Figure 2.3.

Figure 2.6 Mode motions of MSH-free projected onto the minimized crystal structure for modes 1 (A), 2 (B), 3 (C), 4 (D), and 5 (E). Motions are indicated by colored arrows in the direction of the mode vectors for every 6th residue. Motions involving the clamp, DNA binding, and ATPase domains are shown in green, red, and blue, respectively.

Figure 2.7 Close-up views of the motions in the nucleotide-binding domain of MSH-free during modes 2 (A), 3 (B), and 5 (C). Arrows are placed on the Cα atom of every third residue and only displacements of more than 1 Å are shown. The two chains and bound nucleotides follow the same color scheme as Figure 2.1C.

Figure 2.8 Schematic diagram representing distinct conformational states during the functional cycle of MSH2-MSH6 or MutS.

Chapter 3 An Analysis of E. coli MutS Dynamics from Molecular Dynamics Simulations

3.1 Abstract

Following up on the work presented in Chapter 2, this chapter serves to complement the normal mode analyses by examining the larger-scale structural dynamics of the Escherichia coli MutS from nine independent 200 ns molecular dynamics simulations. Standard techniques were employed to measure the flexibility of each protein residue and to monitor changes in the protein secondary structure. Finally, using principal component analysis, two distinct principal modes that are likely to be linked to important protein function were identified. The first mode, found in eight of the nine simulations, describes the movement of the S2 DNA binding domain along the DNA helical axis towards the S1 DNA binding domain.
This mode is hypothesized to be involved in ATP-hydrolysis-independent movement of MutS along DNA. In the second mode, unique to the simulation with ATP in S1 and ADP in S2, both DNA binding domains moved upwards towards the DNA in a concerted fashion and resembled a DNA bending mode that bends the DNA upon mismatch recognition. The conformational changes observed in both modes also demonstrated coupled motions between the S2 DNA binding domain and the distant ATPase domains. Overall, the results are consistent with our previously proposed functional cycle for mismatch recognition and build upon the individual states characterized in Chapter 2.

3.2 Introduction

In this chapter, new observations from nine independent simulations of the Escherichia coli MutS-DNA structure bound to different combinations of ADP and ATP nucleotides (Figure 3.1) are reported. Each system was derived from the 2.27 Å MutS-DNA crystal structure (PDB ID: 1W7A) (29), which is bound to two ATP nucleotides. Unresolved coordinates within the structure (residues 660-667 in the S1 monomer) were generated using the loop modeling facility in MODELLER (99) and, since MutS is a homodimer, the S1 subunit served as a template for completing missing residues in the S2 subunit. After being fully solvated, each of the nine systems contained over 165,000 atoms and was simulated for at least 200 ns using NAMD (68) along with the CHARMM27/CMAP all-atom force field (71-73) for a collective simulation time of 1.8 μs in the NPT ensemble. Each model is referenced by its nucleotide configuration using the “X:X” notation, which corresponds to the nucleotides that have been modeled into the S1 and S2 ATPases, respectively (Figure 3.1). For example, ADP:ATP has ADP in S1 and ATP in S2, while NONE:NONE contains no nucleotide in either ATPase.
The positional fluctuation measured from each simulation showed that the protein was generally more flexible than in the crowded crystal structure environment, but this extra mobility was not derived from changes in the major protein secondary structure elements (namely, α-helices and β-sheets). Due to the length of each simulation and the complexity of the protein conformational dynamics, direct visualization of each trajectory only provided limited insight. Thus, the method of principal component analysis (PCA) was used to uncover the so-called “essential dynamics” within the protein (100). Similar to NMA, PCA separates the dominant, functionally relevant (largest amplitude) modes from the local fluctuations in an MD simulation. Applying PCA to our nine simulations, two new conformational transitions (or modes) were identified and found to be in good agreement with the previously proposed functional cycle for mismatch recognition (101). Additionally, both modes demonstrated coupled motions between the S2 DNA binding domain and the ATPase domains, which supports an allosteric signaling mechanism in the protein.

3.3 Methods

Analyses

The root mean square fluctuation (RMSF) and other basic distance measurements were calculated using CHARMM (v. c36a1) (64) interfaced with the MMTSB Tool Set (102). Crystallographic B-factors were converted to RMSF values by applying the relation $\mathrm{RMSF} = \sqrt{3B/(8\pi^{2})}$. The protein secondary structure for each simulation snapshot was assigned for each monomer (S1 and S2) using the Dictionary of Secondary Structures in Proteins (DSSP) program (103). All molecular images were generated using PyMOL (104).

Principal Component Analysis

The essential dynamics for each simulation were ascertained via PCA (100) by first constructing and then diagonalizing the variance-covariance matrix of the Cα positional fluctuations.
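The construction and diagonalization of the variance-covariance matrix can be sketched with NumPy as shown below. This is a generic illustration, not the Wordom implementation used in this work: it assumes the frames have already been superimposed onto a common reference, normalizes the covariance by the number of frames (an assumption; some implementations divide by one less), and uses the hypothetical function name `pca_from_trajectory`.

```python
import numpy as np

def pca_from_trajectory(coords):
    """PCA of Cα coordinates.

    coords: array of shape (n_frames, n_atoms, 3), already superimposed.
    Returns (eigenvalues, eigenvectors) sorted by decreasing eigenvalue;
    eigenvectors are stored as columns of length 3N.
    """
    coords = np.asarray(coords, dtype=float)
    n_frames = coords.shape[0]
    flat = coords.reshape(n_frames, -1)        # (n_frames, 3N)
    fluct = flat - flat.mean(axis=0)           # positional fluctuations
    cov = fluct.T @ fluct / n_frames           # 3N x 3N variance-covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # symmetric matrix, ascending order
    order = np.argsort(eigvals)[::-1]          # reorder to decreasing eigenvalue
    return eigvals[order], eigvecs[:, order]
```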
This diagonalization produces a set of 3N eigenvalues ($\lambda_i$) and corresponding eigenvectors ($v_i$) that, collectively, describe the many different motions within a system. Typically, the eigenvalues are arranged in decreasing order such that the first eigenvector, $v_1$, is associated with the largest eigenvalue, $\lambda_1$. This first eigenvalue-eigenvector pair is often referred to as the first principal component (PC1) and represents the motions within a given simulation that have the largest average amplitude. The level of similarity between $v_1$ from different simulations was determined by calculating the inner (dot) product, which should equal 1 when the proteins are moving in exactly the same way, 0 when the motions are not correlated, and -1 when protein motions are anti-correlated. All nine trajectories (with frames extracted every 10 ps of simulation time) were first superimposed onto the same starting structure and the principal components were calculated using the program Wordom (v. 0.21) (105-106). One method for identifying an eigenvector $v_1$ that better represents multiple simulations is to combine the multiple trajectories into one, analyze the combined trajectory using PCA, and then compare this new PC1 with those obtained from the individual simulations (107-108). Individual trajectories with related motions have been shown to have eigenvalues and eigenvectors that are similar to those of the combined trajectory (107-108). Thus, the nine trajectories were concatenated into one, analyzed as described above, and compared with $v_1$ and $\lambda_1$ from the individual simulation models.

3.4 Results

Protein Secondary Structure Propensities

The evolution of the protein secondary structure for each simulation was monitored using the DSSP program (103). While differences existed between the simulations (mostly attributed to localized changes in highly flexible loop, turn, and bend regions), the two major secondary structure elements, α-helices and β-sheets, remained nearly unchanged.
Figure 3.2 shows the propensity of β-sheets and α-helices determined from each simulation and for each residue of each monomer. With the exception of a short β-strand located near residue 268 which lies unimportantly on the periphery of a larger β-sheet, all of the β-sheets that existed in the crystal structure remained present in all nine trajectories for nearly 100% of the simulation time (Figure 3.2A). The minor break located between residues 750 and 760 corresponds to a flexible β-hairpin connecting two flanking β-strands. The β-sheet propensities of S1 and S2 are essentially indistinguishable. Similarly, Figure 3.2B shows that all of the α-helices that were present in the crystal structure remained intact in both monomers for all of the simulations, with the exception of two short helices near residue 480 (located at the tip of the clamp domains) and residue 734 (located near the base of the ATPases), both of which are unstable because they are solvent exposed. Visual examination of the crystal structure showed that the two helices positioned near residue 188 and residue 589 were originally present as an intermediate between a 3/10-helix and an α-helix and therefore were not classified as novel. Overall, the S1 and S2 monomers shared nearly identical helical propensities.

Principal Component Analysis Reveals Unique Motions in the S2 DNA Binding Domain

Figure 3.3 shows the percent contribution of the top 20 eigenvalues, λi (arranged in decreasing order for each of the nine simulations), plotted against the eigenvector index, i (for i ≤ 20). In all of the simulations, the first eigenvector, v1, accounted for at least 73% of the fluctuations while v2 contributed less than 15%. Table 3.1 shows the overlap between v1 for the different simulations calculated from the inner dot product between any given pair of eigenvectors.
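The PCA and overlap calculations described in the Methods can be illustrated with a minimal NumPy sketch. This is not the Wordom implementation used in this work; the function names (`pca_modes`, `percent_contribution`, `overlap`) and the synthetic trajectory are assumptions introduced purely for illustration, and the input is assumed to be already superimposed onto a common reference structure.

```python
import numpy as np

def pca_modes(coords):
    """PCA of superimposed C-alpha coordinates with shape (n_frames, n_atoms, 3)."""
    n_frames = coords.shape[0]
    x = coords.reshape(n_frames, -1)        # flatten each frame to a 3N vector
    dx = x - x.mean(axis=0)                 # positional fluctuations about the mean
    cov = dx.T @ dx / (n_frames - 1)        # 3N x 3N variance-covariance matrix
    evals, evecs = np.linalg.eigh(cov)      # eigh returns eigenvalues in ascending order
    order = np.argsort(evals)[::-1]         # reorder so that lambda_1 is the largest
    return evals[order], evecs[:, order]

def percent_contribution(evals):
    """Percent of the total fluctuation captured by each eigenvalue."""
    return 100.0 * evals / evals.sum()

def overlap(v_a, v_b):
    """Normalized inner (dot) product between two eigenvectors."""
    return float(np.dot(v_a, v_b) / (np.linalg.norm(v_a) * np.linalg.norm(v_b)))

# Synthetic test trajectory (hypothetical data): one large-amplitude collective
# motion along a known direction plus small uncorrelated noise.
rng = np.random.default_rng(0)
n_frames, n_atoms = 500, 10
direction = rng.normal(size=3 * n_atoms)
direction /= np.linalg.norm(direction)
amplitude = rng.normal(scale=5.0, size=(n_frames, 1))
noise = rng.normal(scale=0.2, size=(n_frames, 3 * n_atoms))
traj = (amplitude * direction + noise).reshape(n_frames, n_atoms, 3)

evals, evecs = pca_modes(traj)
pc1_share = percent_contribution(evals)[0]
# Numerical eigenvectors are defined only up to an overall sign, so the sign of
# this overlap depends on the phase returned by the eigensolver.
pc1_overlap = overlap(evecs[:, 0], direction)
print(f"PC1 share: {pc1_share:.1f}%  |overlap| with known motion: {abs(pc1_overlap):.3f}")
```

Because an eigensolver returns eigenvectors with an arbitrary overall sign, comparing eigenvectors from independent calculations in practice requires a consistent phase convention; how Wordom handles this is not stated here.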
The v1 from ATP:ATP (the same nucleotide configuration as the 1W7A crystal structure) had the highest average overlap of 0.74 while the v1 from ATP:ADP had the lowest overlap of 0.45 with the other eight simulations. v1 from the combined trajectory had the maximum average overlap of 0.82, meaning that, with the exception of v1 from ATP:ADP which only had an overlap of 0.58, the motions captured by this eigenvector best describe the essential motions involved in each of the individual simulations. Although v1 from ADP:NONE only had an intermediate level of overlap (0.70) with PC1 from the combined trajectory, visual comparison of ADP:NONE showed that v1 and PC1 had essentially the same motions. Since v1 from ATP:ATP showed the highest overlap (0.93) with PC1 from the concatenated trajectory, the Cα atomic displacement for ATP:ATP that corresponds to the largest amplitude movement along PC1 was mapped onto the starting simulation structure in order to visualize the extent of the motions identified in the projection (Figure 3.4). This “porcupine plot” (which only displays Cα displacements greater than 3 Å) serves as a good representative for the conserved movement observed in eight out of the nine simulations (excluding ATP:ADP). Parts of the S1 DNA binding domain appear to move slightly upwards towards the DNA in the direction of the mismatch site while the entire S2 DNA binding domain, in a concerted motion with the S2 connector domain, slides under and along the DNA helical axis towards the relatively stationary S1 DNA binding domain. There is also some minor movement of parts of the clamp domains that interact with the DNA. In addition, the S1 core domain moved by a few angstroms towards the center of the protein while residues in the ATPase dimer interface, which includes a conserved helix-turn-helix motif (see Figure 3.5), moved downwards and away from the base of the protein.
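The average overlaps quoted above can be checked directly against the pair-wise values in Table 3.1. The short script below is only a bookkeeping sketch: the numbers are transcribed from the upper triangle of Table 3.1, while the variable and function names are assumptions chosen for illustration.

```python
labels = ["ATP:NONE", "NONE:ATP", "ATP:ATP", "ADP:NONE", "NONE:ADP",
          "ADP:ADP", "ADP:ATP", "ATP:ADP", "NONE:NONE", "Combined"]

# Upper triangle of Table 3.1: row k lists the overlaps between labels[k]
# and every later entry labels[k+1:], ending with the combined trajectory.
upper = [
    [0.78, 0.78, 0.53, 0.67, 0.70, 0.76, 0.28, 0.91, 0.88],
    [0.83, 0.63, 0.71, 0.71, 0.73, 0.41, 0.77, 0.89],
    [0.68, 0.83, 0.77, 0.70, 0.52, 0.79, 0.93],
    [0.62, 0.45, 0.45, 0.52, 0.50, 0.70],
    [0.68, 0.59, 0.63, 0.70, 0.86],
    [0.67, 0.50, 0.75, 0.86],
    [0.46, 0.75, 0.82],
    [0.31, 0.58],
    [0.90],
]

# Expand the triangle into a symmetric lookup table.
pair_overlap = {}
for k, row in enumerate(upper):
    for offset, value in enumerate(row):
        a, b = labels[k], labels[k + 1 + offset]
        pair_overlap[(a, b)] = pair_overlap[(b, a)] = value

def mean_overlap(name, partners):
    """Average overlap of `name` with every partner other than itself."""
    values = [pair_overlap[(name, p)] for p in partners if p != name]
    return sum(values) / len(values)

simulations = labels[:-1]  # the nine individual simulations
mean_vs_simulations = {s: mean_overlap(s, simulations) for s in simulations}
mean_combined = mean_overlap("Combined", simulations)

for s in simulations:
    print(f"{s:10s} {mean_vs_simulations[s]:.2f}")
print(f"{'Combined':10s} {mean_combined:.2f}")
```

Averaging each individual simulation over the other eight, and the combined trajectory over all nine, reproduces the values cited in the text (0.74 for ATP:ATP, 0.45 for ATP:ADP, and 0.82 for the combined trajectory).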
For the ATP:ADP simulation, which showed limited overlap with PC1 from the concatenated trajectory (Table 3.1), a similar “porcupine plot” showing the largest amplitude movement along its own first principal component, v1, was also mapped onto the starting simulation structure (Figure 3.6). The most distinctive characteristic of this mode is the significant upward movement of the S2 DNA binding domain towards the DNA. Additionally, this eigenmode showed conformational changes in the S2 connector, S1 DNA binding domain, clamp domains, and ATPase domains (see Figure 3.6) that were similar to the PC1 captured by the concatenated trajectory (which likely accounts for the 0.58 overlap in Table 3.1) but lacked the previously observed motions in the S1 core. v1 from ATP:ADP was also compared with v2 and v3 from the other eight simulations and did not demonstrate any significant overlap (with an average overlap of about 0.2 in both cases). Thus, this mode appears to be exclusive to ATP:ADP.

3.5 Discussion

Protein Flexibility

The Cα root mean square deviation (RMSD) for all nine trajectories was previously assessed and the protein was found to be quite stable for a system of this size and simulation length (109) (see also Figure 4.3 in Chapter 4). However, the overall RMSD is not always a good indicator of the local protein flexibility. Instead, the Cα RMSF calculated from a simulation is often used to better understand the extent of the local protein dynamics and offers a direct method for comparison with crystallographic B-factors (see Materials and Methods). The RMSFs derived from the simulations were found in general to be higher than those derived from the crystal structures (Figure 3.7). Normally, RMSF differences between simulations and X-ray structures can be attributed to incomplete sampling due to overly short simulation lengths and are exemplified by simulated RMSF values that are lower than those derived from the crystal structure (46).
However, considering that our simulations are each over 200 ns long and that they exhibit higher mobility on average, it is more likely that the X-ray structures are restricted from sampling alternate conformations due to crystal packing forces (46).

α-Helix and β-Sheet Propensity

MD simulations can often be used to successfully monitor the evolution of secondary structure elements (110-112). Surprisingly, the β-sheets and α-helices that were present in the crystal structure (Figure 3.1) were essentially maintained throughout the trajectories and the propensities of these dominant secondary elements were extremely high (Figure 3.2). When interpreting these results, it is also important not to underestimate the effects of the CHARMM27 force field, which has been shown to over-stabilize α-helices (113). Nonetheless, the conservation of the two dominant secondary structure elements on sub-μs time scales suggests that the mobility in the Cα atoms may result from domain-level conformational changes.

Principal Component Analysis

The predominant large scale motion sampled in each trajectory and in the combined trajectory was measured using PCA. With the exception of ATP:ADP, the overlap (dot product) measured between v1 from the combined trajectory and v1 from the other eight simulations was remarkably high (see Table 3.1), which implies that PC1 from the combined trajectory is a good representative of the dominant motion observed in eight of the nine simulations. Figure 3.4 demonstrates the extent of movement from the representative ATP:ATP simulation (which showed the highest overlap) projected along PC1 from the combined trajectory. The largest and most interesting conformational change in the protein comes from the S2 DNA binding domain, which moves under and along the local DNA helical axis towards the S1 DNA binding domain in a manner that resembles a mode for sliding along DNA.
However, any substantial movement of the DNA would first require the S1 DNA binding domain to relinquish its contacts with the mismatched DNA. In fact, one of the modes from NMA demonstrated the transition of the MutS protein from a repair initiation mode to a sliding clamp conformation, which involved opening of the DNA binding cavity and movement of the S1 DNA binding domain away from the mismatched DNA (101). The combination of this NMA mode followed by the movement of the S2 DNA binding domain as described by PC1 could very well account for the conformational changes necessary for the experimentally observed ATP-hydrolysis independent movement of MutS along DNA after mismatch recognition (39-40). Finally, PC1 was also compared with the 10 lowest frequency modes from our NMA findings (101) and was found to overlap most with the sliding clamp mode, although the overlap was only about 0.3 (too small to suggest any significant relationship between the modes). The similarities in the conformational sampling amongst the different simulations were also assessed by clustering the structures based on their Cα RMSD (109) (see also Figure 4.4 in Chapter 4). All nine simulations were successfully grouped into six overlapping clusters, where one of the six clusters was visited only by ATP:ADP, which suggested that the ATP:ADP simulation samples slightly different conformations from the other eight simulations. Analysis of the ATP:ADP trajectory using PCA confirmed that the first principal component from this simulation was unique and showed little to no overlap with v1, v2, and v3 from the other eight simulations. Figure 3.6 shows the extent of movement of the protein along v1 from ATP:ADP. The most obvious difference is the large movement of the S2 DNA binding domain which, instead of moving along the DNA helical axis, moves upwards towards the DNA in what resembles a DNA bending action.
Interestingly, the ATP:ADP nucleotide configuration was previously identified from NMA as the mismatch binding mode (101). However, no DNA bending mode was identified from NMA, and comparison of the DNA bending mode from ATP:ADP with the 10 lowest frequency modes from NMA only showed a maximum overlap of 0.1. DNA bending in MutS has been studied experimentally using atomic force microscopy (48), but the exact mechanism by which the protein bends the DNA is largely unknown. Thus, it is proposed that upon mismatch binding by the S1 DNA binding domain, the S2 DNA binding domain pushes upwards on the DNA and bends it as depicted in Figure 3.6. In both the DNA sliding and DNA bending modes identified from PCA, the movement of the S2 DNA binding domain was coupled to an elongation of both ATPases (see Figure 3.4 and Figure 3.6). The two nucleotide binding sites and both conserved C-terminal helix-turn-helix motifs (HTH, residues 766-800) (20, 35) were seen moving downwards and away from the base of the protein, thereby slightly elongating the MutS structure. We note that this elongation is largely dependent on the movement of the HTH motif (Figure 3.6), disruption of which has been previously demonstrated to attenuate ATPase activity and affect dimerization (35). Therefore, it can be speculated that the elongation of the HTH motifs could play a role in modifying the distance across the ATPase dimer interface, which would directly affect ATPase activity since ATP hydrolysis requires that the opposing subunit move closer in order to complete the active site. Thus, an increase in distance between the subunits (governed by the movements of the HTH motifs) would likely inhibit ATP hydrolysis.

3.6 Conclusions

The complex nature of the DNA mismatch recognition cycle involves numerous conformational changes in the MutS protein that are dependent upon the bound nucleotides as well as the presence or absence of a DNA mismatch.
We have investigated the dynamics of the MutS protein by applying the PCA method to several long simulations of the MutS-DNA complex. Overall, the essential dynamics of MutS derived from our nine simulations complements the NMA work presented in Chapter 2. The observation of coupled motions between the DNA binding domain and the distant ATPases strengthens the case for an allosteric signaling pathway within the protein. More importantly, the identification of two novel conformational modes, one for DNA sliding and one for DNA bending, provides new insight into the DNA mismatch recognition process.

Table 3.1 Pair-wise Overlap of the First Eigenvector From the Nine Simulations and the Combined Trajectory

Simulation  NONE:ATP  ATP:ATP  ADP:NONE  NONE:ADP  ADP:ADP  ADP:ATP  ATP:ADP  NONE:NONE  Combined Trajectory
ATP:NONE    0.78      0.78     0.53      0.67      0.70     0.76     0.28     0.91       0.88
NONE:ATP    -         0.83     0.63      0.71      0.71     0.73     0.41     0.77       0.89
ATP:ATP     -         -        0.68      0.83      0.77     0.70     0.52     0.79       0.93
ADP:NONE    -         -        -         0.62      0.45     0.45     0.52     0.50       0.70
NONE:ADP    -         -        -         -         0.68     0.59     0.63     0.70       0.86
ADP:ADP     -         -        -         -         -        0.67     0.50     0.75       0.86
ADP:ATP     -         -        -         -         -        -        0.46     0.75       0.82
ATP:ADP     -         -        -         -         -        -        -        0.31       0.58
NONE:NONE   -         -        -         -         -        -        -        -          0.90

Figure 3.1 The MutS-DNA structure (center) with the different nucleotide-bound conformations (surrounding).

Figure 3.2 Secondary structure propensity for the S1 (red) and S2 (blue) monomers from each simulation. A) β-sheet propensity. B) α-helix propensity. The simulation order shown in A) is the same in B). The colored bar located below each plot corresponds to the five different structural domains and is colored according to the legend in Figure 3.1, and the broken black bars above each plot correspond to the same type of secondary structure elements that were present in the crystal structure.

Figure 3.3 The percent contribution of the first 20 eigenvectors for each of the nine simulations.
Figure 3.4 Schematic “Porcupine plot” for the ATP:ATP simulation projected onto the first principal component (PC1) of the combined trajectory and mapped onto the starting simulation structure. Each cone points in the direction of motion and the length of the cone represents the amplitude of the fluctuation for each Cα atom. For clarity, the cones are colored according to the different structural domains and only motions larger than 3 Å are displayed.

Figure 3.5 MutS ATPase domain with the S1 and S2 subunits colored in dark and light blue, respectively. The conserved helix-turn-helix is colored red and pink for the S1 and S2 subunits, respectively.

Figure 3.6 Schematic “Porcupine plot” for the ATP:ADP simulation showing its own first principal component mapped onto the starting simulation structure. Each cone points in the direction of motion and the length of the cone represents the amplitude of the fluctuation for each Cα atom. For clarity, the cones are colored according to the different structural domains and only motions larger than 3 Å are displayed.

Figure 3.7 The RMSF for the S1 and S2 monomers calculated from the nine MD simulations and from the 1W7A and 1E3M crystal structures. The colored bar located below each panel corresponds to the five different structural domains and is colored according to the legend in Figure 3.1.

Chapter 4

Base-Flipping Mechanism in Post-Mismatch Recognition by MutS

Sean M. Law and Michael Feig

Submitted to Biophys. J.

4.1 Abstract

DNA mismatch recognition and repair is vital for preserving the fidelity of the genome. Conserved across prokaryotes and eukaryotes, MutS is the primary protein that is responsible for recognizing a variety of DNA mismatches.
From molecular dynamics simulations of the Escherichia coli MutS-DNA complex, we describe significant conformational dynamics in the DNA surrounding a G·T mismatch that involves weakening of the base pair hydrogen bonding in the base pair adjacent to the mismatch and, in one simulation, complete base opening via the major groove. The energetics of base flipping was further examined with Hamiltonian replica exchange free energy calculations revealing a stable flipped-out state with an initial barrier on the order of about 2 kcal/mol. Furthermore, we observe changes in the local DNA structure as well as in the MutS structure that appear to be correlated with base flipping. Our results suggest a role of base flipping as part of the repair initiation mechanism, most likely leading to sliding clamp formation.

4.2 Introduction

The integrity of the genome is safeguarded from replication errors by an evolutionarily conserved DNA mismatch repair (MMR) pathway. MMR in E. coli begins with the mismatch recognition protein, MutS, scanning the DNA for base-base mismatches and small insertion/deletion loops (6). Upon mismatch recognition, MutL binds to MutS followed by further downstream repair events to ultimately restore the parental genotype (7, 10-12, 39-40, 114-115). Defects in the MMR pathway lead to replication and recombination errors and have been linked to hereditary non-polyposis colorectal cancer in humans (116) and are likely to play a role in other types of cancer as well (117). Crystal structures of prokaryotic MutS and one of its human homologs, MSH2-MSH6, bound to various DNA mismatches, have provided mechanistic insight into the mismatch recognition process (20-21, 24, 29-30, 44-45, 118-120). Heteroduplex DNA bound to MutS (Figure 4.1A) is bent by about 45°-60° towards the major groove at the site of the mismatch. Mismatch specific contacts are made by a conserved F36-X-E38 motif (Figure 4.1B).
F36, first identified in cross-linking studies (26), forms an aromatic ring stack on the 3’ side of the mismatched base. Mutation of F36 abolishes mismatch binding in vitro and is associated with defective MMR in vivo (22, 121-122). The intrusion of a Phe residue into the duplex stack resembles intercalating residues commonly found in other DNA repair systems such as DNA glycosylases, T4 endonuclease V, and DNA demethylases, all of which involve a base-flipping mechanism (123-124). A similar base-flipping mechanism has also been proposed for MutS (16, 25, 30, 48, 51) but direct evidence has been lacking to date. A recent FRET study has indicated that the MutS-DNA complex may involve transient intermediate states and exhibit more dynamics than suggested by the crystal structures (50). More detailed insight into the dynamics of the MutS-DNA complex during mismatch recognition is difficult to obtain with biochemical experiments but can be gained from computer simulations. Previous computational studies of MutS and homologs include normal mode analysis (101) and limited molecular dynamics (MD) simulations (32, 125-126). Here, we present results from sub-microsecond MD simulations of the MutS-DNA complex to focus on the details of the post-mismatch recognition process. In particular, we describe the observation of spontaneous base-flipping of the base adjacent to the mismatch site when bound to MutS. Quantitative aspects of the base opening transition were additionally analyzed with the Hamiltonian replica exchange method (HREM) (75). Our results suggest that flipping of the base adjacent to the mismatch is energetically likely in the MutS-DNA complex. Furthermore, it appears that base-flipping may be coupled to conformational changes in the protein, suggesting a mechanistic role during repair initiation by MutS.

4.3 Materials and Methods

Simulated Systems and Molecular Dynamics Protocol

MD simulations of E. coli MutS in complex with DNA containing a G·T mismatch were carried out with explicit solvent. The starting conformation of the MutS-DNA complex was taken from the crystal structure 1W7A (29). Missing residues 660-667 in the S1 (mismatch binding) monomer were completed using the loop modeling (127) function in MODELLER, version 9 (99). Visual comparison of the model with a recent crystal structure of MutS where the disordered loop was resolved (120) showed no appreciable differences. Missing residues in the homodimeric S2 subunit were modeled after the S1 chain. Histidine ionization states were predicted using PROPKA3.1 (128) and confirmed visually based on the local protein environment. Nine simulations were carried out with all possible combinations of bound ATP, bound ADP, or no nucleotide at either the S1 or S2 ATPase domain. Positioning of the nucleotides was based on resolved nucleotides in the 1E3M (20) and 1W7A (29) crystal structures. The “X:X” notation is used here to denote which nucleotides are bound to the S1 and S2 subunits, respectively (e.g. ATP:ADP means that ATP is bound to S1 and ADP is bound to S2 while NONE:NONE is free of nucleotides). In addition to the wild-type system, simulations of an S1-F36A mutant with four different nucleotide combinations (ADP:NONE, ADP:ADP, ADP:ATP, NONE:NONE) were also carried out (see below). Each structure was solvated using the TIP3P water model (129) and electrically neutralized with sodium ions. The overall dimensions of each system were approximately 155 Å x 117 Å x 94 Å and each system contained more than 165,000 atoms. The particle-mesh Ewald method (130) was employed to account for electrostatic interactions. The direct electrostatic sum and Lennard-Jones (LJ) interactions were truncated at 10 Å with a switching function becoming effective at 8.5 Å and a non-bonded list cutoff at 12.5 Å.
The all-atom CHARMM27/CMAP force field was used for all calculations (71-73) and was chosen because it has been extensively validated in many other simulations of protein-nucleic acid complexes (54-55, 131), including simulations describing base flipping (132).

Minimization, Equilibration, and MD Protocol

Initial minimization involved 50 steps with the steepest descent method (SD) followed by 10,000 steps of adopted basis Newton-Raphson minimization. During the minimization, a 10 kcal/mol/Å² harmonic restraint was applied to all solute heavy atoms. Following minimization, each structure was gradually heated to 300 K and equilibrated in the NVT ensemble through three consecutive stages: (1) 1.4 ns of MD were carried out during which the solute was restrained as described above but water and ions were allowed to move freely; (2) solute restraints were released gradually over a period of 100 ps; (3) unrestrained MD over 6.4 ns was carried out to further equilibrate the system. All minimization and equilibration steps were carried out using the CHARMM program (64), version c35a1, in conjunction with the MMTSB Tool Set (102). After the initial equilibration phase, each of the nine simulations was then continued for another 200 ns using the NAMD simulation program, version 2.6 (68). The unrestrained NAMD simulations were carried out in the NPT ensemble, which was maintained using a Langevin thermostat and barostat with a friction coefficient of 5 ps⁻¹. A 2 fs integration time step was used in conjunction with SETTLE (133) to holonomically constrain bonds involving hydrogen atoms.

F36A Mutant Set Up and Simulations

Fully equilibrated structures from four wild type simulations (ADP:NONE, ADP:ADP, ADP:ATP, and NONE:NONE) were taken and the S1-Phe36 residue was mutated to an alanine residue using the MMTSB Tool Set (102).
These mutated structures were subjected to the solute restraint protocol described above followed by a gradual release of the restraints and then equilibrated for an additional 10 ns (completely free of restraints). Each of the F36A mutants was simulated for a total of 60 ns by using a production simulation protocol identical to the one described above.

Water Residence Time

The residence time of water molecules located within 4 Å of the G10 base was calculated by using a coordinate correlation function which has been previously used to assess solvent and ion residence times (134-135). Briefly, the water correlation function, $C_{\alpha}(t)$, is written as:

$$C_{\alpha}(t) = \frac{1}{N_{\mathrm{water}}} \sum_{i=1}^{N_{\mathrm{water}}} \frac{1}{N_{\alpha,i}(0,\, t_{\mathrm{total}} - t)} \sum_{t'=0}^{t_{\mathrm{total}} - t} p_{\alpha,i}(t',\, t'+t;\, t^{*}) \qquad (4.1)$$

where $p_{\alpha,i}(t', t'+t; t^{*})$ is a binary function that is set to 1 if water molecule $i$ is found within the predefined area $\alpha$ between time $t'$ and $t'+t$, and to 0 otherwise. To ignore waters that escape and quickly rebind, the rebinding time $t^{*}$ was set to 1 ps. The binary function is then accumulated across the total simulation time $t_{\mathrm{total}}$ and divided by the number of times, $N_{\alpha,i}(0, t_{\mathrm{total}} - t)$, that water molecule $i$ is found within the confines of $\alpha$. Finally, $N_{\mathrm{water}}$ corresponds to the total number of water molecules that participate in the residence time calculation. Depending on the overall sampling, water residence times may be fit to a bi-exponential decay function or, in some cases, a tri-exponential function. Alternatively, by taking the natural logarithm of the water correlation function, the residence times can be obtained from the negative inverse of the slope of those portions of $\ln(C_{\alpha}(t))$ that can be fit to a straight line.

Solvent-Accessible Surface Area

The solvent-accessible surface area of the H1’ atom in the DNA minor groove of bases near the mismatch site was obtained from the COOR SURF command in CHARMM using a 1.4 Å probe radius (which represents a single water molecule).
Solvent-accessible surface areas were then calculated with and without MutS. The reported change in accessibility upon MutS binding, referred to as ∆SASA, is the difference between these two values.

Hamiltonian Replica Exchange Simulations and Free Energy Calculations

To investigate the energetics of base-flipping, umbrella sampling simulations were carried out. A harmonic biasing potential was applied to enhance base-flipping and to obtain sufficient statistical sampling for estimating the free energy profile associated with base-flipping. The reaction coordinate used for the biasing potential is a pseudodihedral angle introduced earlier (136). The pseudodihedral is based on the following four heavy atom sites: 1) the center of mass (COM) of the G9, T22, C11, and G20 bases (flanking the base of interest, C21); 2) the T22 phosphate; 3) the C21 phosphate; and 4) the COM of the C21 base (Figure 4.2A). While other reaction coordinates have been utilized in the past to study base flipping (132, 137-139), this pseudodihedral angle definition provides an improvement over previous methods (136) and has been shown to produce results that are in good agreement with experiment (140). The biasing potential was applied using the miscellaneous mean field potential (MMFP) module (141) of CHARMM and has the following form:

$$w_{i}(\theta) = \frac{k_{i}}{2} (\theta - \theta_{i})^{2} \qquad (4.2)$$

where $k_{i}$ is the force constant, set to 100 kcal/mol/rad², $\theta$ is the pseudodihedral angle, and $\theta_{i}$ is the target value for the $i$-th window. A total range of 0 to 162.5° was covered in 2.5° increments to result in 66 windows. Instead of conventional umbrella sampling, we used HREM (75) with 66 replicas corresponding to the umbrella windows to enhance sampling efficiency further. These simulations involved the entire E. coli MutS-DNA complex in explicit solvent. They were carried out by using CHARMM (64) in conjunction with the MMTSB Tool Set (102).
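As a rough illustration of the umbrella window layout and the harmonic bias of Equation 4.2, the following sketch builds the window centers and evaluates a standard Metropolis acceptance probability for swapping two replicas that differ only in their biasing potential. It is not the CHARMM/MMTSB implementation used in this work; the function names, the value of kT at 300 K, and the neglect of dihedral periodicity are assumptions made for the example.

```python
import math

K_FORCE = 100.0               # force constant in kcal/mol/rad^2, as stated in the text
KT = 0.0019872041 * 300.0     # assumed k_B*T in kcal/mol at 300 K

def window_centers(start=0.0, stop=162.5, step=2.5):
    """Target pseudodihedral values (degrees) for the umbrella windows."""
    n = int(round((stop - start) / step)) + 1
    return [start + i * step for i in range(n)]

def bias(theta_deg, center_deg, k=K_FORCE):
    """Harmonic bias w_i(theta) = (k/2) * (theta - theta_i)^2, in kcal/mol.

    Angles are given in degrees and converted to radians; periodic wrapping
    of the angle difference is ignored in this simplified sketch.
    """
    d = math.radians(theta_deg - center_deg)
    return 0.5 * k * d * d

def exchange_probability(theta_i, theta_j, center_i, center_j, kt=KT):
    """Metropolis probability of swapping the configurations of two windows.

    Both replicas share the same unbiased Hamiltonian and temperature, so
    only the change in total bias energy enters the criterion.
    """
    before = bias(theta_i, center_i) + bias(theta_j, center_j)
    after = bias(theta_j, center_i) + bias(theta_i, center_j)
    return min(1.0, math.exp(-(after - before) / kt))

centers = window_centers()
n_windows = len(centers)
# Example: neighboring windows, each sitting exactly at its own target value.
p_swap = exchange_probability(centers[0], centers[1], centers[0], centers[1])
print(n_windows, round(p_swap, 3))
```

With a 2.5° spacing and this force constant, the bias penalty for sitting one window away from the target is only about 0.1 kcal/mol, which is consistent with neighboring windows overlapping strongly enough for exchanges to be accepted.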
Starting structures for different replicas were taken from one of the unbiased simulations where base-flipping was observed spontaneously. Each starting structure was initially subjected to 200 ps of equilibration with the biasing potential of a given replica. Each replica was then simulated for 10.5 ns (for a total simulation time of 693 ns for all 66 replicas). Exchanges between neighboring replicas were attempted every 1 ps, and between 23% and 37% of the exchange attempts were successful.

Analysis

Most of the analysis was carried out with the MMTSB Tool Set and CHARMM, version c35a1, based on the 200 ns production time for the unbiased simulations. Protein RMSD values were calculated using Cα atoms. The DNA RMSD was calculated by using all heavy atoms, omitting the ultimate and penultimate bases. 1-D potentials of mean force (PMFs) were generated from the replica exchange simulation using WHAM (76) after discarding the first 5 ns as equilibration. 2-D PMFs along additional degrees of freedom were estimated from the HREM simulations (also with the first 5 ns removed as equilibration) using standard WHAM under the assumption that all other degrees of freedom orthogonal to the pseudodihedral angle are thoroughly sampled (142). All structural figures were generated using PyMOL (104).

4.4 Results and Discussion

A series of nine 200 ns MD simulations of MutS in complex with DNA containing a G·T mismatch were analyzed with a primary focus on MutS-DNA interactions and the dynamics of mismatched DNA when bound to MutS. The simulations differed in the nucleotide(s) bound in the ATPase sites since the simulations were initially set up to study the effect of different nucleotides on the MutS structure. During the course of the simulations reported here we did not see significant structural perturbations that could be correlated with the type of nucleotide bound to the ATPase domain. In fact, we found that the MutS-DNA complex sampled similar conformations in all nine simulations.
The Cα RMSDs were all within 3-4 Å relative to the X-ray structure (Figure 4.3). Furthermore, clustering analysis shows that structures from all simulations fall into closely related conformations, with overlapping sampling of conformations belonging to the four largest clusters (Figure 4.4). This suggests that different nucleotides bound in the ATPase domain do not dramatically affect the overall MutS structure on the sub-µs time scales covered here. Consequently, the simulations discussed here are treated as nine independent simulations of essentially the same system providing a total of 1.8 µs of sampling of the MutS-DNA complex.

Dynamics of DNA and Base-flipping in the MutS-DNA complex

Overall, the DNA bound to MutS maintained its bent structure in all simulations as indicated by a heavy-atom RMSD of 1-4 Å (Figure 4.3). However, a more detailed analysis of base pair hydrogen bonding revealed significant base dynamics in the vicinity of the mismatch. More specifically, the G/C(-1) base pair adjacent to the mismatch site on the 5’-side of the thymine of the G·T base pair lost Watson-Crick hydrogen bonding in most of the simulations (Figure 4.5A). The X/Y(±N) notation is used here to denote the X/Y base pair relative to the thymine of the G·T mismatch (see Table 4.1). The G·T mismatch remained stable in all but one of the simulations. In that simulation (NONE:ADP) a new N3-O6 hydrogen bond was formed within the same base pair due to shearing of the G·T base pair. The next-neighbor A/T(+1) and C/G(-2) base pairs stably maintained standard hydrogen bonding in all simulations (Figure 4.5A). The instability of the G/C(-1) base pair was unexpected and involved the loss of N1–N3 hydrogen bonding and at least partial opening of the C21 base into the major groove.
In one of the simulations (NONE:NONE) all of the G/C(-1) hydrogen bonds were lost within the first 10 ns and the base subsequently flipped out into the major groove where it remained for the rest of the simulation. This observation appears to be in conflict with previous NMR and MD studies where significant instability of G·T pairs over canonical base pairs has been established (143-144). However, these studies were not conducted in the presence of MutS and therefore do not account for the severe bend in the DNA caused by interactions with the protein (20-21, 48). The bending leads to significant distortions of the grooves near the mismatch site. In particular, the major groove width is reduced to only 13 Å at the G·T mismatch but increased to 18 Å at the G/C(-1) base pair (see Figure 4.6) compared to the major groove width of canonical B-DNA at around 17 Å (145). The narrow major groove at the G·T pair effectively prevents base opening while the wider major groove at the G/C(-1) base pair is more favorable for base opening. To test a possible role of F36 in stabilizing mismatch base pairing and promoting G/C(-1) base flipping we ran four additional 60 ns simulations of an S1-F36A mutant. We find that mismatch base pairing is stably maintained without F36 (see Figure 4.5B) although the T22 base reorients with different glycosyl rotation angles (see Figure 4.7A-B). Interestingly, we again observed spontaneous base opening of the G/C(-1) base pair in one of the simulations (ADP:NONE:F36A) in a very similar manner as in the NONE:NONE simulation (see Figure 4.5B). These results suggest that F36 does not play a significant role in either stabilizing mismatch base pairing or promoting base opening of the G/C(-1) base pair. Progression of the base-flipping process was quantified with the help of a pseudodihedral angle, θ (see Methods section), which takes negative values as the base opens into the major groove (see Figure 4.8A).
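The pseudodihedral angle referred to above is an ordinary dihedral angle evaluated over four sites (two centers of mass and two phosphates, as listed in the Methods). A minimal sketch of such a calculation is given below. The helper names and the toy coordinates are assumptions for illustration only, and the sign convention and zero point of the angle defined in (136) may differ from the generic dihedral formula used here.

```python
import numpy as np

def center_of_mass(coords, masses):
    """Mass-weighted average position of a group of atoms."""
    coords = np.asarray(coords, dtype=float)
    masses = np.asarray(masses, dtype=float)
    return (coords * masses[:, None]).sum(axis=0) / masses.sum()

def dihedral_deg(p1, p2, p3, p4):
    """Signed dihedral angle (degrees) defined by four points."""
    p1, p2, p3, p4 = (np.asarray(p, dtype=float) for p in (p1, p2, p3, p4))
    b1, b2, b3 = p2 - p1, p3 - p2, p4 - p3
    n1 = np.cross(b1, b2)                      # normal to the plane of sites 1-2-3
    n2 = np.cross(b2, b3)                      # normal to the plane of sites 2-3-4
    y = np.linalg.norm(b2) * np.dot(b1, n2)
    x = np.dot(n1, n2)
    return float(np.degrees(np.arctan2(y, x)))

# Toy geometry (hypothetical coordinates, not taken from the MutS-DNA system):
# site 1: COM of the flanking bases, site 2: T22 phosphate,
# site 3: C21 phosphate,           site 4: COM of the C21 base.
flanking_com = center_of_mass([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]], [1.0, 1.0])
p_t22 = [0.0, 0.0, 0.0]
p_c21 = [0.0, 0.0, 1.0]
base_com = [0.0, 1.0, 1.0]
theta = dihedral_deg(flanking_com, p_t22, p_c21, base_com)
print(round(theta, 1))
```

In a real analysis the four sites would be recomputed for every trajectory frame from the selected heavy atoms, giving the time series of θ that is followed through the base opening process.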
Figure 4.9 shows snapshots of key time points during the base opening process. Initially, G/C(-1) was perfectly base paired (θ ≈ 0°). The base then rapidly lost base pair hydrogen bonds and stacking interactions to reach a semi-open state (θ = -40°) that was stable for a few ns. Further opening led to another intermediate state that was stabilized by hydrogen bonding interactions to the DNA backbone (θ = -81°). This state also persisted for a few ns. Eventually, the C21 base opened entirely at about 10 ns from the beginning of the production phase of the simulation. The base was briefly fully exposed to the solvent environment (θ = -130°) but then began to interact with the DNA backbone of the opposing strand (θ = -120°). This conformation persisted from t ≈ 20 ns to t ≈ 120 ns. During the remainder of the simulation, C21 moved back towards various semi-open states but without re-forming a fully stacked configuration. C21 base flipping was associated with a change in the C21 backbone ζ torsion angle from around -150° to 150° (see Figure 4.8B, Figure 4.10B, and Figure 4.11A) as generally expected for DNA base flipping (146). Otherwise, the DNA structure remained largely unaffected by the opening of the C21 base on the time scale of our simulations.

Our observation of spontaneous base-flipping in DNA complexed to MutS provides new molecular-level evidence for the previously proposed idea that base-flipping may play a role in mismatch recognition (16, 25, 30, 48, 51). In order to gain more quantitative insight we also carried out an HREM simulation of the NONE:NONE MutS-DNA complex where sampling along the base-flipping reaction coordinate was enhanced with a total of 10.5 ns of simulation time for each replica. The main result is a PMF free energy profile along the base-flipping reaction coordinate (see Figure 4.10A). The PMF has a prominent minimum near 0° for the fully base paired state and a second minimum at around -105° corresponding to the flipped-out state.
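A PMF along θ can be converted into relative populations of the stacked and flipped-out states by Boltzmann-weighting each grid point. The sketch below illustrates that step under stated assumptions: the PMF is tabulated on a uniform θ grid, the temperature defaults to 300 K (the simulation temperature is not restated in this section, so this value is an assumption), and the profile used in the example is a made-up two-well curve, not the data behind Figure 4.10A. The function name and its arguments are illustrative only.

```python
import numpy as np

KB_KCAL = 0.0019872  # Boltzmann constant in kcal/(mol K)


def state_populations(theta_deg, pmf_kcal, cutoff_deg=-20.0, temperature=300.0):
    """Return (intact, flipped) fractions from a PMF on a uniform theta grid.

    Grid points with theta >= cutoff_deg are counted as the intact base pair;
    the temperature default is an assumption, not a value from the text.
    """
    theta = np.asarray(theta_deg, dtype=float)
    pmf = np.asarray(pmf_kcal, dtype=float)
    beta = 1.0 / (KB_KCAL * temperature)
    # Boltzmann weights; shifting by the minimum avoids overflow and cancels
    # after normalization.
    weights = np.exp(-beta * (pmf - pmf.min()))
    weights /= weights.sum()  # uniform bin width assumed, so it cancels here
    intact = float(weights[theta >= cutoff_deg].sum())
    return intact, 1.0 - intact


# Toy two-well profile (illustrative only).
theta = np.linspace(-180.0, 20.0, 201)
toy_pmf = np.minimum(0.002 * theta**2, 0.5 + 0.002 * (theta + 105.0) ** 2)
intact, flipped = state_populations(theta, toy_pmf)
print(f"intact: {intact:.2f}, flipped out: {flipped:.2f}")
```

Because the populations depend exponentially on the PMF, an uncertainty of a few tenths of a kcal/mol in the profile translates into an uncertainty of several percent in the resulting fractions.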
The two states are estimated to be separated by a 2 kcal/mol energy barrier. To examine the convergence of the PMF we compared it to PMFs with shorter simulation lengths (7.5 ns/replica and 9 ns/replica) and found negligible change between the 9.5 ns/replica and 10.5 ns/replica PMFs (Figure 4.12). Based on the variation of the PMF over time we roughly estimate the uncertainty to be between 0.1 and 0.5 kcal/mol. Thus, the HREM simulation confirms the existence of a favorable, flipped-out state. Based on the PMF we calculate that the G/C(-1) base pair is intact (θ ≥ -20°) for 69% of the time but that C21 is flipped out to varying degrees during the remaining 31%, with an estimated uncertainty of 5-10% based on the uncertainty of the PMF.

The observed 2 kcal/mol barrier suggests conformational transitions on ns time scales. This is in apparent contradiction with the rarity of full base opening/closing events in the unbiased simulations. In the replica exchange simulations, complete base opening/closing was also never observed for any individual replica although significant sampling overlap from many replicas at each pseudodihedral value (see Figure 4.13) suggests that the PMF presented in Figure 4.10A is realistic. This suggests the presence of significantly higher kinetic barriers in orthogonal degrees of freedom not captured by the projection onto the C21 pseudodihedral angle. One source for such barriers is likely the torsional dynamics of the ζ backbone dihedral with a barrier height estimated to be larger than 5 kcal/mol in previous simulations of base opening (147). Another source for slow base opening/closing kinetics appears to be the presence of long-lived water molecules at the constrained protein-DNA interface (Figure 4.14A-B).
An analysis of residence times of water molecules located within 4 Å of the G10 base from the NONE:NONE simulation found that waters located on the major groove side and within the cavity left by the flipped-out C21 base had residence times up to 500 ps (compared to about 50 ps for surface-bound waters, see Figure 4.14C-D) while waters that managed to enter the cramped minor groove side essentially became trapped near the N2, N3, and N9 atoms of the G10 base with even longer residence times in the ns range (Figure 4.14E-F). The presence of these long-lived water molecules likely hinders base closing, which cannot be accomplished unless these waters are displaced. This would explain why C21 never fully restacked in the NONE:NONE simulation despite the ζ torsion reverting back to the -150° range near the end of the simulation.

Our results suggest that base opening may occur on sub-µs time scales since it was observed spontaneously in two of our simulations. Most likely, base opening kinetics are dominated by the kinetic barrier for ζ backbone dihedral transitions. Base closing, on the other hand, appears to involve much longer time scales due to obstruction by long-lived water molecules. This would imply that the flipped-out state may be kinetically stabilized for a long time despite being thermodynamically slightly less favorable than the fully stacked state according to our analysis.

Opening of the 5’ Adjacent Base Next to the Mismatch is in Agreement with Experiment

Direct structural evidence for DNA base-flipping in the MutS-DNA complex is lacking, but there is indirect experimental evidence for at least partial opening of the 5’ adjacent base next to the mismatch. Prior to the discovery of the MutS structure, chemical footprinting was used to uncover the interactions between Thermus aquaticus MutS and the DNA minor groove (148).
MutS-bound DNA with a G·T mismatch was found to be protected on the 3’ side of the lesion, but not on the 5’ side of the mismatched thymine where the -4, -2, and -1 positions were hyperreactive to oxidative attack. This was attributed to widening of the minor groove when the crystal structure became available (21). To further understand these data, we analyzed the effect of MutS on solvent-accessibility of H1’ (the hydrogen attached to the C1’ attack site located in the minor groove) from our simulations. We found that without base flipping (in ATP:NONE), access to the -4 base is fully maintained, access to the -2 base is partially hindered, but the -1 base is largely occluded (Figure 4.15). Base flipping (in NONE:NONE), on the other hand, fully exposes the -1 base so that all three bases become vulnerable to oxidative attack as indicated by experiment. In a more recent study, 2-AP, a fluorescent adenine analogue often used to probe DNA base-flipping, was incorporated into various positions next to a G·T mismatch (52). It was found that the mean fluorescence lifetime increased when the mismatch was bound by MutS. Furthermore, the level of increase in the observed mean fluorescence lifetime was significantly higher when the probe was placed on the 5’ side of the mismatch as compared to other positions. This increase was attributed to an increased amplitude of the longest lifetime component, which could be explained by an increased fraction of extrahelical states. Interestingly, the relative population with the increased fluorescence lifetime, calculated by summing up the fractional amplitude of the two longest lifetimes, was estimated to be ~31% (52), the same percentage as the fraction of flipped-out conformations in our HREM simulation. 
While we believe that a qualitative comparison with the experiment is meaningful, the surprisingly good quantitative agreement is likely fortuitous because of uncertainties in both the experimental and computational results as well as differences between experiment and simulation. In the experimental results each reported lifetime results from numerous conformers with comparable quenching rates. Furthermore, the experimental study describes opening of an A/T (or 2-AP/T) base pair that is known to have different base opening rates than G/C base pairs (144) as in the MutS-DNA complex studied here.

Changes in MutS as a Result of Base-flipping

Experiments suggest that initial mismatch recognition is followed by several changes in MutS. First, biochemical data indicate altered activity of the ATPase domain as a result of mismatch recognition where ADP is exchanged for ATP and hydrolysis is stalled (8, 24, 84, 149-150). Second, the affinity for MutS-MutL complex formation is increased. Third, a transition to a sliding clamp formation has been suggested to allow MutS to leave the mismatch site after initial recognition so that DNA repair can take place (39-40, 84, 98). Thus, if base-flipping plays a role in post-mismatch recognition, sliding clamp formation, and/or initiation of repair, one would expect that there are correlated changes in the MutS structure either in the DNA binding domains, the core and connector domains where MutL is proposed to bind (31), or in the ATPase domains. In our simulations we identified changes in the DNA binding domains of both chains, local rearrangements in the ATPase domains, but no significant changes in the core and connector domains of MutS correlated with base flipping.
DNA binding domain motions were characterized by a local coordinate system: X corresponds to motion along the DNA helical axis, Y to motion perpendicular to the helical axis towards the tips of the clamp domains, and Z to motion across the S1-S2 dimer interface (see Figure 4.16A). The S1 DNA binding domain (S1-D1), which interacts specifically with the DNA mismatch, showed a significant shift by, on average, 1-1.5 Å along X in the simulation where the base is flipped out (Figure 4.8C) compared to all of the other simulations where the base did not flip out (Figure 4.16C-E). Motion of S1 along Y and Z was not correlated with base flipping (Figure 4.17A-B). Due to the bent shape of the DNA, this motion effectively moves the domain away from the DNA (see Figure 4.16B). The S2 DNA binding domain shows a significant shift along Z, laterally away from the DNA towards the S1 core and a moderate shift along X and Y (Figure 4.16F-H). As a result of the motion of the S2 domain, MutS-DNA interactions are also reduced (Figure 4.16B) and these motions appear to be closely coupled to base flipping (Figure 4.8D-F). Additional analysis based on data from the HREM simulations confirms a strong correlation between base flipping and the motion of S2 along X, Y, and Z (see Figure 4.10D-F) and to a lesser extent for S1 along X (see Figure 4.10C) but not along Y or Z (Figure 4.17C-D). An apparent correlation between motion of the DNA binding domain and base flipping points to a possible connection with sliding clamp formation, which is assumed to involve reduced MutS-DNA interactions.

Functional coupling between mismatch recognition and ATPase activity requires allosteric signaling over 90 Å from the DNA binding domains to the ATPase domains (see Figure 4.1A). Based on the MutS structure it appears that such communication would involve the core and connector domains which provide the structural connection.
In fact, there is a string of highly conserved residues from the DNA-binding to the ATPase domains (Figure 4.18). In our simulations we did not observe motions along this pathway that could be uniquely attributed to base flipping but we did identify changes in the ATPase domain itself in the vicinity of the S1 nucleotide binding pocket that appear to be correlated to base-flipping.

Ser668, a conserved residue in MutS homologs, was previously implicated in ATP hydrolysis (29, 120, 151). Crystal structures suggest that Ser668 in the S2 monomer may move closer to the opposing S1 nucleotide binding site by ~5 Å to take part in ATP hydrolysis (29, 120) (Figure 4.19A). While the exact mechanism is unclear, it has been postulated that S2 Ser668, located at the end of an α-helix, could convey the positive charge generated from the helix dipole to the γ phosphate to assist with catalysis of the hydrolysis reaction (29, 120). For this structural rearrangement to occur, Asn616, situated in the P-loop (Figure 4.19A), has to move away from the dimer interface to allow Ser668 to complete the active site (29). Significant reduction in ATPase activity following mutation of either Asn616 or Ser668 supports such a mechanism (29, 151).

In our simulations, we found that the S2 Ser668 to S1 Asn616 distance, the backbone conformation of Asn616 in the S1 monomer, and the ability to form a salt bridge between Glu594 of the S1 monomer and Arg667 of the S2 monomer are all correlated with base flipping. Changes in the backbone of Asn616 were measured by the Ψ (N-Cα-C-N) torsion angle. Figure 4.8G shows a significant decrease in the Ser668 to Asn616 Cα-Cα distance from about 9 Å to 5 Å upon base flipping which may promote nucleotide hydrolysis as indicated by the biochemical data. As shown in Figure 4.8H, the Asn616 Ψ angle changes from 125° to -30° at the same time as the base flips out and, as a result, allows Ser668 to approach the S1 active site.
A correlation with base flipping is confirmed from the HREM data where base opening appears to limit the Cα-Cα distance between Ser668 and Asn616 to 6-8 Å instead of 6-12 Å when the base is fully stacked (Figure 4.10G). Similarly, base flipping seems to broaden the conformational sampling of the Asn616 Ψ value to the full range from -50° to 180° while only extended conformations are observed for fully base-stacked DNA (see Figure 4.10H).

Repositioning of S1 Asn616 also allows the S2 Arg667 side chain to relocate and form a salt bridge with S1 Glu594 (Figure 4.8I and Figure 4.19B). The interaction between these two residues further stabilizes the S2 signature loop in which Ser668 resides (see Figure 4.8J) and Arg667 is also positioned to hydrogen bond with the ribose of adenosine (Figure 4.19B). Based on these results, we speculate that base flipping may promote (or restore) the ability to hydrolyze ATP in the S1 binding site. However, there remains uncertainty about the exact mechanism of how variations in MutS-DNA interactions are communicated to the distant ATPase domain.

Known crystal structures of MutS are very similar with either ATP (29) or ADP bound in the ATPase domains (20). This suggests that they more likely represent the post-mismatch recognition state where ATP hydrolysis is stalled and MutS is poised for sliding clamp formation. The simulation results suggest that this structure may promote base flipping in DNA which in turn seems to initiate sliding clamp formation. The correlated changes in the ATPase domain suggest a connection to ATPase activity. The coincidence of apparent changes in the ATPase domain and DNA binding domain as a result of base flipping would be consistent with a previously suggested role of ATP hydrolysis during sliding clamp formation (39, 41). However, this idea is inconsistent with a hydrolysis-independent model for sliding clamp formation that is supported by other studies (40, 84).
4.5 Conclusions

Sub-µs computer simulations were used to report direct structural evidence for DNA base-flipping in the large MutS-DNA system. The instability in DNA base pairing was found to be specific for the 5’ adjacent base pair instead of the mismatch. This is in contrast to previous hypotheses, but appears to be in good agreement with experimental data. Energetic analysis of the base-flipping process confirmed the existence of a stable flipped-out state in the presence of MutS and revealed an estimated 2-2.5 kcal/mol activation energy barrier for base flipping. Kinetic rates for base flipping were estimated to be in the µs range due to slow DNA backbone and water rearrangements. Further analysis of changes in the MutS structure suggests that base flipping leads to motions of the DNA-binding domains away from the DNA and more subtle changes in the ATPase domain. Taken together, our simulations suggest that base flipping may be the key step that allows MutS to transition from the post-mismatch recognition complex to the sliding clamp formation. We hope that our results will motivate further computational and experimental studies to better understand the mechanistic details of DNA mismatch repair initiation by MutS.

4.6 Acknowledgements

We thank Dr. Shayantani Mukherjee, Dr. Katarzyna Maksimiak, Dr. Yi-Ming Cheng, Dr. Richard Venable, and Hugh Crosmun for discussions and Dr. Chresten Søndergaard for help with the determination of protein ionization states in the presence of DNA. We also acknowledge access to computational resources from the High Performance Computing Center at Michigan State University. This work was supported by the National Science Foundation [MCB-0447799]; the National Institutes of Health [GM 092949]; TeraGrid [TG-MCB090003]; and the Center for Biological Modeling at Michigan State University [S. M. L.].
Table 4.1 DNA Sequence Used in All MutS Simulations*

Position    Base pair    Strand 1 (3’→5’)    Strand 2 (5’→3’)
(-9)        G/           G18                 ---
(-8)        T/A          T17                 A14
(-7)        G/C          G16                 C15
(-6)        A/T          A15                 T16
(-5)        C/G          C14                 G17
(-4)        C/G          C13                 G18
(-3)        A/T          A12                 T19
(-2)        C/G          C11                 G20
(-1)        G/C          G10                 C21
mismatch    G·T          G9                  T22
(+1)        A/T          A8                  T23
(+2)        C/G          C7                  G24
(+3)        C/G          C6                  G25
(+4)        G/C          G5                  C26
(+5)        T/A          T4                  A27
(+6)        C/G          C3                  G28
(+7)        G/C          G2                  C29
(+8)        A/T          A1                  T30

* The DNA sequence is identical to the crystal structure found in reference (29). The G·T mismatch is the row labeled "mismatch". Rows run from the 3’ end of strand 1 (5’ end of strand 2) at the top to the 5’ end of strand 1 (3’ end of strand 2) at the bottom.

Figure 4.1 X-ray crystal structure of E. coli MutS (20). (A) MutS is colored with respect to its DNA binding domains (red/pink), connector domains (orange or pale orange), core domains (yellow or pale yellow), clamp domains (green or pale green), and ATPase domains (blue or pale blue). The DNA is colored in beige (bases) and brown (backbone). Bound nucleotides are omitted for clarity. (B) A conserved F36-Xaa-E38 motif interacts with the G·T mismatch through the DNA minor groove. The protein is colored in green, the mismatch in pink, and the G/C(-1) base pair with the 5’ adjacent base C21 in yellow. The bifurcated base pair hydrogen bond in the G·T mismatch and hydrogen bonding between E38 and T22 are shown as black dotted lines.

Figure 4.2 Pseudodihedral angle definition (see Methods).

Figure 4.3 Cα protein RMSD (red) and heavy atom DNA RMSD (black) for each of the nine simulation models calculated with respect to a common starting structure. Refer to Methods for the notation used to describe each simulation model.

Figure 4.4 K-means clustering (implemented in the MMTSB Tool Set (102)) of the nine simulations using a 2.5 Å radius (large gray overlapping circles). Structures were extracted at every 1 ns of production simulation and clustered based on the Cα RMSD.
The area of each cluster (six colored circles) is proportional to the number of structures in that cluster and the individual colored slices show the contribution of structures from the nine different simulations. The colored edges correspond to the sampling of each simulation and the length of the edge is proportional to the Cα RMSD between any two connected cluster centers. The largest Cα RMSD of 2.3 Å was between cluster 2 and cluster 4.

Figure 4.5 DNA base pair hydrogen bonding for the C/G(-2), G/C(-1), G·T mismatch, and A/T(+1) base pairs, shown as N3-N1 (C/G base pairs), N1-N3 (A/T base pairs), and N1-O4 (G·T mismatch) distance time series for each simulation. Typical hydrogen bond distances of 3 Å are shown as blue dotted lines. (A) Wild type simulations with different nucleotide combinations. (B) F36A mutant simulations.

Figure 4.6 Comparison of the G·T and G/C(-1) major groove widths from the unbiased NONE:NONE simulation. The solid blue line corresponds to a canonical major groove width of 17 Å estimated from the B-DNA crystal structure 1BNA (145). Major groove widths were calculated using the 3DNA program v2.0 (152).

Figure 4.7 Comparison of T22 glycosyl rotation angle, χ, from the unbiased wild type and F36A mutant simulations.

Figure 4.8 Correlation of C21 base-flipping in NONE:NONE simulation with various structural quantities (see Methods for definitions): (A) Pseudodihedral angle. (B) C21 backbone ζ torsion angle. (C) Movement of the S1 DNA binding domain (S1-D1) along X. (D) Movement of S2-D1 along X. (E) Movement of S2-D1 along Y. (F) Movement of S2-D1 along Z. (G) S2 Ser668 to S1 Asn616 Cα-Cα distance. (H) S1 Asn616 Ψ backbone torsion angle. (I) Salt bridge distance between S2 Arg667 and S1 Glu594 measured between heavy atoms. A distance of 3 Å corresponding to hydrogen bonding is shown as a blue line. (J) Cα-RMSD of the S2 signature loop.
Figure 4.9 Snapshots from the NONE:NONE simulation of base flipping progress viewed from the major groove. Protein, water, and additional DNA are omitted for clarity. The G·T is colored in pink, G/C(-1) in yellow, and C/G(-2) in grey. The red arrow indicates C21.

Figure 4.10 Free energy profiles from the HREM simulation: (A) Free energy of base flipping (10.5 ns/replica). (B) C21 backbone ζ torsion angle vs. base flipping. (C) Movement of S1-D1 along X vs. base flipping. (D) Movement of S2-D1 along X vs. base flipping. (E) Movement of S2-D1 along Y vs. base flipping. (F) Movement of S2-D1 along Z vs. base flipping. (G) S2 Ser668 to S1 Asn616 Cα-Cα distance vs. base flipping. (H) S1 Asn616 Ψ backbone torsion angle vs. base flipping.

Figure 4.11 Comparison of the C21 backbone ζ torsion angle from the (A) unbiased wild type and (B) F36A mutant simulations.

Figure 4.12 Free energy profiles from the same HREM simulation but generated from different length simulations. The PMF from Figure 4.10A is included here for comparison (red).

Figure 4.13 HREM sampling overlap. Thick red bars correspond to the range of equilibrium pseudodihedral angles, θi, prescribed by a given harmonic potential, while thin black lines correspond to the actual range of pseudodihedral angles, θ, that is sampled by each replica (see Methods).

Figure 4.14 Water residence time calculations from the NONE:NONE simulation (see Methods). (A)-(B) Diagram depicting the water molecules that have entered into the minor groove side as a result of C21 base flipping.
The S1 DNA binding domain is shown as a gray surface, the G/C(-1) base pair is colored yellow, the G·T mismatch is colored in magenta, the rest of the DNA is colored in brown (in (B) only), and water molecules located within 4 Å of the G10 base are colored as orange, blue, and green spheres (only waters within a 4 Å radius of the G10 base are used for the residence time calculation). Additional waters within 6 Å of the G10 base are colored in red and were included to illustrate the crowded environment. For clarity, only protein atoms within a 10 Å radius of the G10 base are shown. The white arrow points to the narrow 6 Å wide channel that is created when C21 is flipped out. Orange spheres correspond to fast moving waters with residence times less than 500 ps while green and blue spheres correspond to trapped waters with long residence times in the 1-10 ns range. (C)-(D) Water residence times for water molecules on the surface of the protein (away from the DNA) (C) before and (D) after C21 base flipping (note that the time is in picoseconds). (E)-(F) Water residence times for water molecules located within 4 Å of the G10 base (E) before and (F) after C21 base flipping. The black lines in (C)-(F) correspond to the inverse slope used to calculate the accompanying residence time.

Figure 4.15 Solvent-accessible surface area (SASA) calculations from NONE:NONE and ATP:NONE simulations. Each plot shows the time series for the change in solvent-accessible surface area (ΔSASA) of the H1’ atom (bound to C1’ on the minor groove side) upon binding to MutS. Results from the NONE:NONE simulation (top), where the base is flipped out after 10 ns, are compared with the ATP:NONE simulation (bottom), where the base remains stacked.

Figure 4.16 Protein domain motions. (A) Side and back view of the starting MutS-DNA complex along with the three orthogonal vectors, X, Y, and Z, used to describe the protein domain motions.
The S1 DNA binding domain is red, the S2 DNA binding domain is pink, the DNA is brown, and the rest of the protein is colored white. Vector X is identical to the DNA helical axis, vector Y points upwards towards the phosphorus of cytosine 7, and vector Z is perpendicular to X and Y and points towards the S1 core domain. (B) Side and back view of the S1 and S2 DNA binding domains bound to DNA after ~200 ns of simulation time. The final simulation structure is colored in green and the orientation of the starting structure is identical to (A). (C)-(H) Comparison of the range of motion along the three orthogonal axes X, Y, and Z between trajectories with and without base flipping. The trajectory where base flipping is observed is colored in black and the remaining eight trajectories where no base flipping is observed are collectively colored in red. Movement of the S1 DNA binding domain is shown in (C)-(E) while movement of the S2 DNA binding domain is shown in (F)-(H).

Figure 4.17 S1 DNA binding domain movement from unbiased and HREM (NONE:NONE) simulations along Y and Z directions. (A) Movement of the S1 DNA binding domain (S1-D1) along Y. (B) Movement of S1-D1 along Z. (C) Free energy profile of the movement of S1-D1 along Y with respect to the base opening angle. (D) Free energy profile of the movement of S1-D1 along Z with respect to the base opening angle.

Figure 4.18 Allosteric signaling from the DNA binding domain to the ATPase domains. A structure-based sequence alignment was used (30) to map highly conserved residues onto the NONE:NONE model. The protein is shown as white ribbons and is in the same orientation as the full length structure found in Figure 4.1A. The conserved residues are highlighted as red spheres for the S1 monomer only. The same residues are conserved in the S2 monomer as well. DNA, water, and nucleotides have been omitted for clarity.
Figure 4.19 Visualization of the effects of base flipping on the ATPase domains. The S1 and S2 monomers are colored in blue and white, respectively. Ser668 is colored green, Asn616 is colored pink, Arg667 is colored cyan, Glu594 is colored yellow, and ATP is colored magenta. (A) Starting simulation structure modeled from the 1W7A crystal structure. ATP is modeled in for reference but is absent in the base-flipping simulation. (B) A post-base-flipping conformation from the base-flipped trajectory showing the Asn616 Ψ angle reorientation, salt bridge formation between Arg667 and Glu594, and stabilization of Ser668 and the S2 signature loop.

Chapter 5

A Path-Based Reaction Coordinate for Biased Sampling of Nucleic Acid Translocation

Sean M. Law, Afra Panahi, and Michael Feig

Afra Panahi contributed to the work on the Weighted Histogram Analysis Method.

5.1 Introduction

Protein-nucleic acid interactions are involved in many biological processes, including DNA replication, transcription, translation, DNA repair, DNA degradation, DNA packaging and homologous recombination. In many of these processes, translocation of proteins along double-stranded DNA plays a central role. Experimental investigations of protein-DNA translocation over the past decade include studies of helicases (153-158), FtsK DNA translocase (159-161), and the Zif268 zinc-finger protein (162). Computational studies have provided additional insight into the mechanisms of protein-DNA translocation and serve as a platform for developing new hypotheses (163-164). One such example is the observation of spontaneous forward and backward translocation of nucleic acid in the RNA polymerase II system during sub-microsecond MD simulations (53). While these results have extended the understanding of the eukaryotic transcription process, such findings are rare because translocation more commonly occurs on millisecond-to-second time scales.
Millisecond time scales and beyond remain largely inaccessible with conventional constant-temperature MD simulations but can be studied with enhanced sampling methods such as umbrella sampling. Such methods aid in overcoming kinetic barriers by biasing sampling along a suitable reaction coordinate. The choice of the reaction coordinate is often straightforward and may consist of an intramolecular distance, a torsional angle, or root mean square deviations (RMSD) from a given target structure. Protein-nucleic acid translocation, however, is more difficult to describe with a simple reaction coordinate since the underlying dynamics is often complex with both rotation and translation. Further complications arise when the nucleic acid is deformed during interactions with a protein.

In an earlier investigation using a so-called “path-search algorithm”, Ishida examined the free energy for branch migration in a RuvA-Holliday junction DNA complex by employing the umbrella sampling method (165). In that study, 68 harmonic biasing potentials were applied to 64 individual phosphorus atoms in addition to four other center-of-mass restraints (applied to four central bases) located in the heart of the Holliday junction. In each umbrella window, the 68 independent reaction coordinates were biased along a specific direction vector towards a target position, ultimately resulting in branch migration by a single base position. However, this procedure assumes that each phosphorus atom must migrate towards the exact position of the next target phosphorus atom and this method is limited to branch migration by only one base. More recently, Golosov et al. used targeted molecular dynamics (TMD) along with a more sophisticated reaction coordinate in order to study the translocation step in DNA replication by DNA polymerase I (166).
However, the complex coordinate chosen to characterize the DNA motion is specific for DNA polymerase I and cannot be easily generalized to other nucleic acid translocation systems.

In the current study, we introduce a generalized path-based restraint that can be used in umbrella sampling simulations to enhance the sampling of a system (or a subset of the system) along any specific 3-dimensional path. For nucleic acid translocation, individual paths for each strand are pre-defined and the movement of each nucleotide is biased via its center-of-mass projection onto its corresponding path towards a target location relative to the path. The use of a center-of-mass projection allows the entire nucleotide to freely sample different conformations while maintaining an overall motion that runs parallel to the translocation path. We have tested our path-based restraint on the Hin recombinase protein-DNA complex (167) by translocating the DNA both in the forward and backward directions using umbrella sampling. The resulting motion clearly follows a screw-like movement of the DNA that would be difficult to capture with general umbrella sampling potentials. In the following, the theoretical basis of umbrella sampling is reviewed briefly, the proposed new biasing function for translocation is introduced, and a sample application is described.

5.2 Methods

Umbrella sampling

The time scale associated with transitions between two states depends on the kinetic barrier separating the two states. The rate of conformational transitions can be accelerated effectively by flattening the barrier. This idea is exploited in umbrella sampling where a biasing potential $U_{\mathrm{umbrella}}(\xi)$ along a reaction coordinate $\xi$ is applied (168).
Simulations are then carried out with a biased energy function

$$E_{\mathrm{biased}}(r^N) = E_{\mathrm{unbiased}}(r^N) + U_{\mathrm{umbrella}}(\xi) \tag{5.1}$$

The resulting sampling can be reweighted to obtain the potential of mean force (PMF) as a function of $\xi$ according to:

$$w(\xi) = -U_{\mathrm{umbrella}}(\xi) + w_{\mathrm{biased}}(\xi) + f \tag{5.2}$$

where the biased free energy, $w_{\mathrm{biased}}(\xi)$, is obtained from the sampling probability function $p_{\mathrm{biased}}(\xi)$ in the biased simulation as

$$w_{\mathrm{biased}}(\xi) = -kT \ln p_{\mathrm{biased}}(\xi) \tag{5.3}$$

and $f$ is a constant (with respect to $\xi$) defined according to

$$e^{-\beta f} = \frac{\int e^{-\beta E_{\mathrm{biased}}(r^N)}\, dr^N}{\int e^{-\beta E_{\mathrm{unbiased}}(r^N)}\, dr^N} \tag{5.4}$$

The choice of reaction coordinate is key to successful application of the umbrella sampling method. Ideally, progress along the reaction coordinate would coincide closely with the minimum energy transition path between the two states with minimal sampling of low-energy states in orthogonal directions. Then, the umbrella sampling method can effectively guide the transition between two states over significant kinetic barriers. In practice, the use of a single umbrella is often not sufficient to study transitions in more complex systems (e.g. DNA translocation). Instead, a series of $i$ simulations is carried out with biasing functions $U_{\mathrm{umbrella},i}(\xi)$ that progress along the reaction coordinate in a stepwise fashion with sampling limited to overlapping ranges of $\xi$. As a result, piecewise PMFs $w_i(\xi)$ are obtained from each simulation which subsequently need to be combined into a total PMF $w(\xi)$ along the entire range of $\xi$. The combination of multiple PMFs into a single PMF along the entire transition path is possible with the weighted histogram analysis method (WHAM) (169) or the more recently introduced multi-state Bennett acceptance ratio method (MBAR) (170).

Path-Based Restraint Potential

An arbitrary 3-dimensional path is represented by a set of $P$ points that are connected via $P-1$ piece-wise smooth cubic spline equations, $S_q(t_q)$ $(q = 1, \ldots, P-1)$.
$t_q \in [0,1]$ describes interpolated positions on the $q$th spline piece between spline points $S_q(0)$ and $S_q(1) = S_{q+1}(0)$. Based on the piece-wise $t_q$ values, a continuous reaction coordinate $\xi$ is introduced as $\xi = q + t_q$ with positions on the entire path given as $S(\xi)$ so that, for example, $S(\xi = 1.5) = S_1(t_q = 0.5)$ and $S(\xi = 14.92) = S_{14}(t_q = 0.92)$.

A path-based restraint potential can then be defined by comparing the projection of the center of mass (COM) of a set of atoms onto such a path with a given reference value according to:

$$U_{umbrella}(\xi) = K(\xi - \xi_o)^2 \tag{5.5}$$

where $K$ is the force constant in units of kcal/mol (since $\xi$ has arbitrary units), $\xi$ is the instantaneous COM projection in the moving system, and $\xi_o$ is the target value at which the umbrella potential becomes zero. Because only the projection onto the path is restrained, the system is free to explore conformational space orthogonal to the path, thereby allowing, for example, a protein-nucleic acid complex to respond structurally when the protein translocates along the nucleic acid.

If multiple COMs are projected onto different parts of the path and the biasing potential is used to advance the entire structure along the path (as in protein-nucleic acid translocation), it is more convenient to use $\xi$ and $\xi_o$ values relative to initial values from a reference structure rather than absolute values. It then becomes possible to advance multiple points along different sections of a given path with a single target value. In that case the biasing potential becomes:

$$U_{umbrella}(\xi_1, \ldots, \xi_M) = \sum_{\alpha=1}^{M} K_\alpha \left(\xi_\alpha - \xi_{\alpha,initial} - \xi_o\right)^2 \tag{5.6}$$

where $\xi_\alpha$ and $\xi_{\alpha,initial}$ correspond to the moving and initial COM projections for the $\alpha$th set of atoms, respectively, and $\xi_o$ is the relative change of all COM projections from their initial values.

We will now discuss how to obtain $\xi$, the projection of an atom or center of mass (COM) of a set of atoms onto a given path.
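Before turning to the projection itself, the relative restraint of Eq. (5.6) can be illustrated with a minimal Python sketch that assumes the COM projections are already available as plain numbers. The function name `path_restraint` and the example values are hypothetical and are not taken from the CHARMM implementation; the derivative with respect to each projection is included only to show how the bias acts on each set of atoms.

```python
def path_restraint(xi, xi_initial, xi_o, k):
    """Energy and dU/dxi for U = sum_a K_a (xi_a - xi_a,initial - xi_o)^2 (Eq. 5.6).

    xi          current COM projections, one per restrained set of atoms
    xi_initial  projections of the same sets in the reference structure
    xi_o        common target displacement along the path
    k           force constants K_a (kcal/mol)
    """
    if not (len(xi) == len(xi_initial) == len(k)):
        raise ValueError("xi, xi_initial and k must have the same length")
    deviations = [x - x0 - xi_o for x, x0 in zip(xi, xi_initial)]
    energy = sum(ka * d * d for ka, d in zip(k, deviations))
    # Derivative of the bias with respect to each projection
    gradient = [2.0 * ka * d for ka, d in zip(k, deviations)]
    return energy, gradient


# Hypothetical example: two COMs that started at xi = 1.5 and 14.92,
# biased towards a common displacement of 0.1 with K = 100 kcal/mol.
energy, gradient = path_restraint([1.6, 15.0], [1.5, 14.92], 0.1, [100.0, 100.0])
print(energy, gradient)
```

In this example the first COM has already advanced by the target displacement and contributes essentially nothing, while the second lags behind by 0.02 and is pushed forward along its section of the path.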
A projection onto a path is defined as the point on the path that is closest to a given reference point. In the most general case this involves an analytically intractable quintic equation with potentially multiple solutions. In order to simplify the problem, we approximate the path at $\xi$ by a secant vector that passes through the points $S(\xi_o + \Delta\xi)$ and $S(\xi_o - \Delta\xi)$. Because of the umbrella potential, sampling of $\xi$ is assumed to be sufficiently close to $\xi_o$ for the approximation of a locally linear path to be reasonable. In order to obtain a good approximation of the part of the path sampled with a given umbrella potential, we choose $\Delta\xi = 1/K$. The projection of a given COM onto the linear path can then be solved analytically with a unique solution. If $\xi$ is not close to $\xi_o$ or if it is unknown, such as at the beginning of a simulation, tangent vectors are first defined from the $P-1$ spline points and subsequently used to find the closest spline segment and to obtain an initial estimate of the projection, $\xi_{guess}$. Then a refined estimate is obtained as above by using the secant vector that passes through the points at $(\xi_{guess} + \Delta\xi)$ and $(\xi_{guess} - \Delta\xi)$. To ensure that a closer point does not exist in the two neighboring spline pieces, we also project the COM onto the neighboring splines using the same protocol. From our experience, this two-step approach for determining an initial value of $\xi$ is sufficiently robust. The path-based restraint potential was implemented in CHARMM (v. c36a4).

Generalized Weighted Histogram Analysis Method

Calculation of unbiased PMFs from umbrella sampling simulations (74) is commonly addressed using WHAM (169, 171-173). WHAM is often applied to obtain a PMF for the (single) reaction coordinate that is used during umbrella sampling, but an extension of WHAM to multidimensional umbrella potentials or to the calculation of PMFs for reaction coordinates other than the one(s) used in the umbrella potential is not so straightforward.
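Before the WHAM generalization is developed, the secant-based projection described at the start of this passage can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the CHARMM code: the path is passed in as any function that returns a 3D point for a given value of the reaction coordinate, `helix_path` is a hypothetical stand-in for the cubic spline, and mapping the foot of the perpendicular back to the reaction coordinate by linear interpolation between the two secant end points is our assumption.

```python
import math


def helix_path(xi):
    """Hypothetical stand-in for the spline path S(xi): a helix with 10 units of xi per turn."""
    angle = 2.0 * math.pi * xi / 10.0
    return (10.0 * math.cos(angle), 10.0 * math.sin(angle), 3.4 * xi)


def project_onto_secant(com, path, xi_ref, d_xi):
    """Estimate the projection xi of a COM using the secant through
    S(xi_ref - d_xi) and S(xi_ref + d_xi).

    Assumption: the position along the secant is mapped back to xi by
    linear interpolation between the two end points.
    """
    a = path(xi_ref - d_xi)
    b = path(xi_ref + d_xi)
    ab = [bj - aj for aj, bj in zip(a, b)]
    ac = [cj - aj for aj, cj in zip(a, com)]
    # Fractional position of the closest point on the infinite secant line
    s = sum(u * v for u, v in zip(ab, ac)) / sum(u * u for u in ab)
    return (xi_ref - d_xi) + 2.0 * d_xi * s


def straight_path(xi):
    """Trivial path along z, used only to check the projection."""
    return (0.0, 0.0, xi)


# A point displaced orthogonally from a straight path keeps its z value as xi.
xi_line = project_onto_secant((3.0, 4.0, 5.2), straight_path, 5.0, 0.1)
# A point lying on the helix at xi = 5.0 projects back to approximately 5.0.
xi_helix = project_onto_secant(helix_path(5.0), helix_path, 5.0, 0.1)
print(xi_line, xi_helix)
```

Because the dot product removes every component orthogonal to the secant, displacing the COM sideways leaves the estimate unchanged, which is the property that lets the restrained atoms move freely away from the path.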
In the case of a multidimensional biasing potential with $M$ coordinates $\xi_1, \ldots, \xi_M$ one obtains:

$$p_{unbiased}(\xi_1, \ldots, \xi_M) = e^{\beta U_{umbrella}(\xi_1, \ldots, \xi_M)}\, p_{biased}(\xi_1, \ldots, \xi_M)\, e^{-\beta f} \tag{5.7}$$

The unbiased probability distribution, and hence the PMF, for a different reaction coordinate $\eta$ is formally given according to:

$$p_{unbiased}(\eta) = \int \delta\big(\eta - \eta'(\xi_1, \ldots, \xi_M)\big)\, p_{unbiased}(\xi_1, \ldots, \xi_M)\, d\xi_1 \cdots d\xi_M = e^{-\beta f} \times \int \delta\big(\eta - \eta'(\xi_1, \ldots, \xi_M)\big)\, e^{\beta U_{umbrella}(\xi_1, \ldots, \xi_M)}\, p_{biased}(\xi_1, \ldots, \xi_M)\, d\xi_1 \cdots d\xi_M \equiv e^{-\beta f}\, \Xi(\eta) \tag{5.8}$$

$p_{unbiased}(\eta)$ can be obtained numerically by first constructing the $M$-dimensional histogram for $p_{unbiased}(\xi_1, \ldots, \xi_M)$ according to Eq. (5.7) and then accumulating all of the elements where $\eta(\xi_1, \ldots, \xi_M)$ falls into a given bin in a one-dimensional histogram from $N$ umbrella windows:

$$p_{unbiased}(\eta_j) = \sum_{i_1=1}^{N_1} \cdots \sum_{i_M=1}^{N_M} p_{unbiased}(\xi_{1,i_1}, \ldots, \xi_{M,i_M}) \Big|_{\eta(\xi_{1,i_1}, \ldots, \xi_{M,i_M}) \in [\eta_j,\, \eta_j + \Delta\eta]} \tag{5.9}$$

where $\eta_j = \eta_{min} + j\Delta\eta$ and $\xi_{i,j} = \xi_{i,min} + j\Delta\xi_i$ with histogram bin widths $\Delta\eta$ and $\Delta\xi_i$. In practice, the explicit construction of the $M$-dimensional histogram is not desirable for large values of $M$ because of computer memory limitations and because the limited length of typical simulations does not generate sufficient sampling for conventional multidimensional WHAM to converge. Instead, the histogram for $p_{unbiased}(\eta)$ can be accumulated on the fly without explicitly constructing the $M$-dimensional histogram. The combination of Eq.
(5.7) and (5.9) gives:

$$p_{unbiased}(\eta_j) = e^{-\beta f} \times \left[ \sum_{i_1=1}^{N_1} \cdots \sum_{i_M=1}^{N_M} e^{\beta U_{umbrella}(\xi_{1,i_1}, \ldots, \xi_{M,i_M})}\, p_{biased}(\xi_{1,i_1}, \ldots, \xi_{M,i_M}) \right]_{\eta \in [\eta_j,\, \eta_j + \Delta\eta]} \tag{5.10}$$

Since $p_{biased}(\xi_{1,i_1}, \ldots, \xi_{M,i_M})$ is obtained from the simulation as the fraction of samples where all of the variables $\xi_1, \ldots, \xi_M$ fall into a particular bin in the $M$-dimensional histogram, $p_{unbiased}(\eta)$ can be calculated by directly accumulating $e^{\beta U_{umbrella}(\xi_1, \ldots, \xi_M)}$ from each simulation frame into the bin where $\eta(\xi_1, \ldots, \xi_M) \in [\eta_j, \eta_j + \Delta\eta]$:

$$p_{unbiased}(\eta_j) = \frac{e^{-\beta f}}{n} \sum_{i=1}^{n} e^{\beta U_{umbrella}(\xi_1, \ldots, \xi_M)} \Big|_{\eta(\xi_1, \ldots, \xi_M) \in [\eta_j,\, \eta_j + \Delta\eta]} \tag{5.11}$$

with the summation now running over the $n$ samples from the simulation. Eq. (5.11) applies to any number of variables $\xi_i$ and is therefore valid in general for any biasing term $U_{umbrella}(r^N)$ and any reaction coordinate $\eta(r^N)$.

In order to obtain the relative free energy shifts, $f_k$, between $k$ overlapping umbrella windows, an iterative approach is used in the WHAM formalism. This step is independent of the reaction coordinate used to calculate the PMF from $p_{unbiased}(\eta)$ and therefore the standard WHAM formalism can be applied:

$$e^{-\beta f_k} = \int d\xi_1 \cdots d\xi_M\, \frac{\sum_{i=1}^{N} n_i\, e^{-\beta U_{umbrella,k}(\xi_1, \ldots, \xi_M)}\, p_{biased,i}(\xi_1, \ldots, \xi_M)}{\sum_{j=1}^{N} n_j\, e^{-\beta \left[ U_{umbrella,j}(\xi_1, \ldots, \xi_M) - f_j \right]}} \tag{5.12}$$

where the umbrella potential depends on the variables $\xi_1, \ldots, \xi_M$. The integral is solved numerically through discrete summation. For a small number of variables $\xi_i$ $(M < 4)$ Eq. (5.12) could be used directly, but for the more general case it is again desirable to avoid the explicit construction of the multi-dimensional histogram and instead accumulate the right side of Eq. (5.12) on the fly. This is accomplished by expressing Eq.
(5.12) as:

$$e^{-\beta f_k} = \sum_{i=1}^{N} \sum_{l=1}^{n_i} \frac{e^{-\beta U_{umbrella,k}(\xi_{1,i,l}, \ldots, \xi_{M,i,l})}}{\sum_{j=1}^{N} n_j\, e^{-\beta \left[ U_{umbrella,j}(\xi_{1,i,l}, \ldots, \xi_{M,i,l}) - f_j \right]}}\, \Delta\xi_1 \cdots \Delta\xi_M \tag{5.13}$$

where for each sample $l$ of simulation $i$ with a total of $n_i$ samples, the umbrella potential variables $\xi_{1,i,l}, \ldots, \xi_{M,i,l}$ are determined and then used to calculate $U_{umbrella,k}(\xi_{1,i,l}, \ldots, \xi_{M,i,l})$ and $\sum_{j=1}^{N} n_j\, e^{-\beta [U_{umbrella,j}(\xi_{1,i,l}, \ldots, \xi_{M,i,l}) - f_j]}$ (for a more detailed derivation of Eq. (5.13) see references (170, 174)). Once the $f_k$ values are determined, the PMF along $\eta$ can be calculated according to Eq. (5.11).

Simulation of Hin-recombinase translocation

As a test system we chose DNA translocation in the Hin-recombinase complex due to its small size and high-resolution (2.3 Å) crystallographic data for the structure of the Hin 52-mer peptide bound to a 14 base pair DNA oligomer (16). All crystallographic waters and unpaired bases located at the DNA ends were removed from the crystal structure (PDBID: 1HCR). This initial structure was used as the reference for defining two paths for the two strands of the DNA backbone along which DNA is translocated. More specifically, the phosphorus atoms from residues 4-15 on strand 1 and the phosphorus atoms from residues 18-29 on strand 2 are used as the spline points for the first and second paths, respectively. The restraint potential was then applied to the projection onto the respective path of the center of mass of the heavy atoms of residues 5-13 (strand 1) and residues 19-27 (strand 2) with a total of 18 independent reaction coordinates (for 9 bases on each strand). We define forward translocation as the movement of the biased DNA nucleotides from their initial reference position towards the A15/T17 base pair (or towards the protein C-terminus) and backward translocation as the movement of the biased DNA nucleotides towards the G3/C29 base pair (or towards the protein N-terminus) (see Table 5.1 and Figure 5.1C).
Umbrella sampling of the forward and backward translocation process was achieved by increasing/decreasing $\xi_o$ from +/-0.05 to +/-1.3 in 0.025 increments (resulting in a total of 52 windows in each direction) with final structures from each umbrella used as the input for the next window. Initially, 300 ps simulations with a force constant of K = 200 kcal/mol were carried out at each value of $\xi_o$. The final structures at each value of $\xi_o$ were then simulated with a reduced force constant of K = 100 kcal/mol for 1 ns per window (after 80 ps of additional equilibration). A high force constant was chosen initially to obtain starting structures close to the target $\xi_o$ values, but production simulations were run with a lower force constant to improve sampling overlap between adjacent windows.

All simulations were conducted in implicit solvent using the GBMV method in CHARMM (175-176) and the default parameters as specified in the MMTSB Tool Set along with the following GBMV parameters: $\beta = -12$, $S_0 = 0.65$ (177). In addition, atomic radii developed for nucleic acids by Banavali and Roux were used instead of the standard van der Waals radii (178). All simulations were performed using the program CHARMM (version c36a4c) (64) along with the CHARMM27/CMAP force field (71-72) for proteins and nucleic acids and in combination with the MMTSB Tool Set (102). A force switching function, made effective at 15 Å and truncated at 17 Å, was used in the calculation of non-bonded interactions with the cutoff for list generation set at 20 Å. Langevin dynamics was applied to all heavy atoms with a 10 ps$^{-1}$ friction coefficient (179). SHAKE (180) was used to constrain bond lengths involving hydrogen atoms along with a 1.5 fs integration time step. The starting structure was subjected to energy minimization followed by gradual heating from 100 to 300 K over the course of 60 ps. The final structure was then utilized as the initial input for umbrella sampling as described above.
The energetics of DNA translocation is described by a single reaction coordinate, $\eta(\xi_1, \ldots, \xi_{18})$, that is derived from the 18 individual projection reaction coordinates as follows:

$$\eta(\xi_1, \ldots, \xi_{18}) = \frac{\sum_{i=1}^{18} \xi_i}{18} \tag{5.14}$$

Therefore, $\eta$ can be viewed as the average translocation of the individual bases. While $\eta$ depends on $\xi_i$ in this case, any other reaction coordinate $\eta$ could be chosen as well. The free energy profile for forward and backward translocation along $\eta$ was calculated by using the generalized WHAM method described above for the data from the 1 ns production sampling for each of the 2×52 umbrellas.

5.3 Results

DNA Translocation

Umbrella sampling with the new path-based biasing potential was applied in simulations of DNA translocation in the Hin-recombinase system. During both the forward ($\xi_o > 0$) and backward ($\xi_o < 0$) translocation processes, both the protein and the DNA remained stable. Final snapshots from selected windows are shown in Figure 5.1. During the umbrella sampling simulations the DNA followed the pre-defined path dictated by its DNA backbone and moved in a screw-like fashion in either direction while largely retaining its internal structure. However, some degree of local distortion was observed, especially during forward translocation. Moreover, fraying of the A15/T17 base pair (to which no biasing potential was applied) was observed during forward translocation.

Free Energy Profile for DNA Translocation

Figure 5.2 shows the PMF for DNA translocation as a function of the average translocation reaction coordinate, $\eta(\xi_1, \ldots, \xi_{18})$ (see Eq. (5.14)). A minimum is located near $\eta = 0$ which corresponds to the native binding state. Translocation away from this state by $\eta \approx \pm 0.5$ (approximately equivalent to translocation by one-half base pair) causes a free energy increase of about 10 kcal/mol and 12.5 kcal/mol in the backward and forward directions, respectively.
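To make the link between Eq. (5.14) and the on-the-fly accumulation of Eq. (5.11) concrete, the following Python sketch reduces each frame of a single umbrella window to the average coordinate and reweights it by the exponential of the bias. It is a simplified illustration rather than the analysis code used for Figure 5.2: the function names are hypothetical, the projections are assumed to be given relative to their initial values, a common force constant is assumed, and the WHAM shifts between windows from Eq. (5.13) are not included.

```python
import math


def eta(xi_rel):
    """Eq. (5.14): average of the relative projections."""
    return sum(xi_rel) / len(xi_rel)


def bias(xi_rel, xi_o, k):
    """Eq. (5.6) for relative projections with a common force constant k."""
    return sum(k * (x - xi_o) ** 2 for x in xi_rel)


def unbiased_histogram(frames, xi_o, k, kT, eta_min, d_eta, n_bins):
    """Accumulate exp(+U/kT) of each frame into its eta bin (Eq. 5.11).

    The constant prefactor exp(-beta f)/n is absorbed by normalizing the
    histogram, which is sufficient for a single window.
    """
    hist = [0.0] * n_bins
    for xi_rel in frames:
        j = int(math.floor((eta(xi_rel) - eta_min) / d_eta))
        if 0 <= j < n_bins:
            hist[j] += math.exp(bias(xi_rel, xi_o, k) / kT)
    total = sum(hist)
    return [h / total for h in hist] if total > 0.0 else hist


def pmf(hist, kT):
    """w(eta_j) = -kT ln p(eta_j), shifted so that the lowest bin is zero."""
    w = [-kT * math.log(p) if p > 0.0 else math.inf for p in hist]
    w_min = min(w)
    return [x - w_min for x in w]


# Hypothetical check with the bias switched off (k = 0): the reweighted
# histogram reduces to the plain sampling frequencies.
frames = [[0.0, 0.0], [0.1, 0.1], [0.1, 0.1]]
hist = unbiased_histogram(frames, xi_o=0.0, k=0.0, kT=0.596,
                          eta_min=-0.05, d_eta=0.1, n_bins=2)
profile = pmf(hist, kT=0.596)
print(hist, profile)
```

With a nonzero force constant the exponential weights can become very large for frames far from the window center, so a production implementation would typically work with logarithms of the weights.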
Movement of the DNA by a full base pair resulted in further increases in the relative free energy to about 30 kcal/mol for $\eta \approx -1.0$ and about 60 kcal/mol for $\eta \approx +1.0$. Overall, the PMF shows that backward translocation is favored over forward translocation, but DNA movement in general is highly unfavorable in this system.

5.4 Discussion

This work describes the development of a new path-based restraint that can be used in umbrella sampling or Hamiltonian replica exchange simulations of more complex conformational transitions that cannot be easily captured with simple geometric reaction coordinates. In particular, the path-based potential is useful for the simulation of nucleic acid translocation. A first application involves a small protein-DNA system, Hin-recombinase bound to its target sequence, which has been previously studied in a different context using short MD simulations (181-182). As a result of the biasing potential, the DNA follows a screw-like movement in both directions by one full base pair rather than just direct translational motion. The umbrella sampling results show that the protein can stay bound to the DNA and is capable of forming new interactions during the translocation process.

Based on the free energy profile obtained from our simulations, backward movement appears to be favored over forward movement, but in both directions DNA translocation appears to be highly unfavorable in this system. This suggests that Hin-recombinase does not scan DNA to find its target sequence. This finding is consistent with the experimental observation that Hin-recombinase binds specifically only to its target DNA base sequence (167).
Our approach for enhancing sampling along a given path provides greater flexibility in biased simulations of conformational dynamics than established biased sampling methods since it supports more complex motions than what can be generally described with simple geometric coordinates and provides better control of the entire transition path than RMSD-based restraints. The path-based biasing potential is also less restrictive than RMSD-based restraints because only the projection of selected atoms onto a given path is restrained and the system is free to move in orthogonal degrees of freedom. While the application of the path-based restraint potential is illustrated here for the case of DNA translocation, it is applicable in a wide variety of contexts. Other cases where such a potential could be applied may involve protein folding, DNA base flipping, or the passage of molecules through channels or nanopores.

5.5 Conclusion

DNA translocation is a complex event that plays a fundamental role in cell division, gene transcription, and mismatch recognition. Computational methods offer an alternative means of studying this essential process. The work presented here overcomes some of the disadvantages of previous methods and provides a more intuitive approach for examining DNA translocation. In addition, a free energy profile was calculated using a generalized WHAM and showed high barriers associated with DNA translocation. Finally, we caution the reader that, although adding restraints to a simulation can be helpful in answering complex biological questions, more care is needed than for unrestrained systems when interpreting the validity of the results.
Table 5.1 DNA Sequence Used in All Hin Recombinase Simulations*

5’… G3  T4  T5  T6  T7  T8  G9  A10 T11 A12 A13 G14 A15 …3’
3’… C29 A28 A27 A26 A25 A24 C23 T22 A21 T20 T19 C18 T17 …5’

*The DNA sequence is identical to the crystal structure found in reference (167) with the exception that unpaired terminal bases are omitted.

Figure 5.1 Snapshots from forward and backward DNA translocation in Hin-recombinase. The reference protein structure is white and the protein structure resulting from DNA translocation is colored in dark grey in all snapshots. Each base pair is colored independently of the others in order to track the translocation process with respect to the reference structure. A) Forward translocation by one full base step. B) Forward translocation by half of a base step. C) The reference protein-DNA structure. D) Backward translocation by half of a base step. E) Backward translocation by a full base step.

Figure 5.2 Free energy profile for DNA translocation.

Chapter 6 Conclusions and Perspectives

6.1 Conclusions and Perspectives

MutS and MSH2-MSH6 Conformational Dynamics

Following the crystallization of the first two prokaryotic MutS structures in 2000 (20-21), understanding the protein structure-function relationship became a major focus in the field of DNA mismatch repair. From a number of biochemical studies, it was suggested that MutS takes on various conformational states depending on the type of DNA (homoduplex vs. heteroduplex) that it is bound to and the presence or absence of different ATP/ADP nucleotides. However, almost all of the high resolution X-ray structures of MutS to date are bound to a mismatch and appear virtually identical even when bound by different nucleotides (20, 29) or different mismatches (45). Thus, the current picture of the MutS conformational dynamics has been largely influenced by the limited number of available crystal structures.
Over the past decade, a number of new methods such as AFM (183), FRET (50), and deuterium exchange mass spectrometry (31, 184) have been used to investigate the DNA mismatch repair process, but none of these methods offers sufficient resolution to accurately describe the various MutS conformational states. In contrast, computational approaches have proven in the past to be able to reveal alternative conformational states beyond the crystal structure and they also provide important molecular-level details that can serve as an excellent complement to experiments.

The work described in Chapters 2 and 3 was aimed at improving the relatively static picture of MutS conformational dynamics and at helping to create a detailed mechanism for DNA mismatch recognition that incorporates both experimental and structural data. In Chapter 2, four distinct conformational states (characterized as either a DNA binding mode, a DNA scanning and mismatch mode, a repair initiation mode, or a sliding clamp formation mode) were identified from NMA and found to be conserved across the different MutS and MSH2-MSH6 systems. These results suggest that the prokaryotic and eukaryotic proteins are not only structurally alike but share considerable similarities in their overall dynamics. Additionally, correlated movements between the DNA binding domains and the ATPase domains were found in several of the biologically relevant modes, which supports the idea of a long-range allosteric signaling pathway between the distant domains. From these observations, a detailed functional cycle for mismatch recognition was established based on the conserved low-frequency modes and available experimental data. Extending the work from Chapter 2, Chapter 3 examined the protein essential dynamics from nine sub-μs long MD simulations of the E. coli MutS-DNA system using principal component analysis.
The PCA results contributed two additional conformational states (a DNA sliding mode that follows sliding clamp formation and a DNA bending mode) to the functional mechanism. These modes also demonstrated similar motional coupling between the DNA binding cavity and the distant ATPases as well as between adjacent ATPases. Altogether, the proposed mechanism has improved our understanding of the MutS conformational dynamics and provides a framework for developing a more comprehensive picture of the mismatch recognition cycle. Future work should focus on validating each step of the functional cycle, possibly by introducing new mutations in the protein that would prevent the transition between the different conformational states. Also, the role of MutL within the mismatch recognition cycle is still rather unclear and warrants further attention.

Does MutS Employ a DNA Base-flipping Mechanism?

Since 2003, it has been repeatedly speculated that the MutS protein binds to the mismatch, bends the DNA, and uses a base-flipping mechanism to flip out one of the mismatched bases (16, 25, 30, 48, 51). This proposition is not necessarily surprising when considering the fact that MutS is a DNA repair protein that inserts a conserved residue (phenylalanine) into the DNA minor groove, much like several other DNA repair systems (DNA demethylases, DNA glycosylases, and T4 endonuclease V) which employ the same technique in order to promote base opening (123-124). Thus, it is reasonable to believe that MutS may also utilize a similar base-flipping mechanism during post-mismatch recognition. However, direct evidence for a base-flipping mechanism has been lacking to date. In Chapter 4, the DNA dynamics surrounding a G·T mismatch were analyzed.
The mismatch base pair, which forms a bifurcated hydrogen bond in the crystal structure, was found to be more stable than expected, but the 5’ adjacent base next to the mismatched thymine was observed to be much more dynamic and, in one case, the 5’ adjacent base flipped out of the helical stack via the major groove. To the best of our knowledge, this is the first reported case of spontaneous DNA base-flipping from an unbiased protein-DNA MD simulation. The direct visualization of the base-flipping process is significant not only for the MutS system but also in general for the study of protein-DNA interactions. In other DNA repair systems, the damaged base is often flipped and then removed. However, it is not clear why the 5’ adjacent base is flipped out rather than the mismatch itself. This may have to do with the fact that the mismatch is stabilized through its interactions with the conserved phenylalanine. A flipped out base could serve as an amplification of the mismatch recognition signal and possibly play a role in DNA unbending (48).

To further validate these observations, it may be possible to use a disulphide crosslinking strategy (124), where the 5’ adjacent base is replaced by a disulphide-modified cytosine and an appropriate residue in the DNA binding domain is mutated to cysteine, to capture the base in its open state. However, special care is required to ensure that the disulphide-modified cytosine is not recognized as a mismatch by MutS and that the engineered cysteine residue does not affect the general protein function. It may also be interesting to see from future MD simulations whether or not the 5’ adjacent base still experiences enhanced dynamics when the mismatch is replaced by a normal Watson-Crick base pair.

Homoduplex vs. Heteroduplex DNA

The affinity of MutS for heteroduplex DNA is well understood but exactly how the protein differentiates between damaged and undamaged DNA is still largely unclear (16).
A hypothetical model for DNA scanning describes the free energy landscape as a rugged terrain with deep minima that correspond to locations along the DNA that are highly flexible due to the presence of a mismatch (36). However, brute force simulations are not useful for studying DNA translocation because MutS has been shown to move along DNA at a rate (2 ms/base) that is well beyond the time scales currently accessible by straight MD. Instead, alternative enhanced sampling techniques such as umbrella sampling (74) are required. Chapter 5 details the development of a new path-based restraint, its successful application to drive the forward and backward translocation of DNA in the Hin recombinase test system, and the determination of the corresponding free energy profile along an intuitive translocation reaction coordinate from the biased simulations.

Future application of these same techniques to study DNA translocation in the MutS-DNA system would greatly improve the current understanding of how the protein distinguishes between mismatch DNA and homoduplex DNA. For example, instead of only looking at DNA translocation starting from a mismatch-bound state, it may be interesting to replace the mismatch with a regular base pair and then translocate this DNA through MutS. Since MutS is known to bind tightly to a mismatch, the free energy barrier for moving the mismatch away from this low energy state is expected to be high. In contrast, one would expect the free energy profile to be relatively flat when translocating along homoduplex DNA. Furthermore, since MutS also bends the DNA upon mismatch binding, it may also be interesting to measure DNA bending as a function of DNA translocation.

Future Directions

The body of work presented in this dissertation has provided new insight into the MutS DNA mismatch repair protein.
One of the main challenges of studying MutS (and its homologs) using computer simulation methods is the large size of the protein, which has limited the conformational sampling to sub-μs time scales. Recent experimental studies have demonstrated the importance of understanding the interactions between MutS and the downstream repair protein, MutL (9, 31), as well as MutS tetramerization effects on longer length mismatch-containing DNA (185). However, examining these larger complicated systems using all-atom MD simulations would be prohibitive from a computational standpoint. One possible solution is to decrease the overall resolution of the system by using a coarse-graining method whereby each protein or nucleic acid residue is represented by one or more coarse-grained particles (depending on the level of accuracy required). This kind of approach has been explored in our group through the development of an accurate coarse-graining model for protein-nucleic acid systems, PRIMO/PRIMONA (186), and would be most suitable for the future investigation of multiprotein MutS-DNA complexes.

References

1. Watson, J. D., and F. H. Crick. 1953. Molecular Structure of Nucleic Acids; a Structure for Deoxyribose Nucleic Acid. Nature 171:737-738.
2. Loeb, L. A., and T. A. Kunkel. 1982. Fidelity of DNA Synthesis. Annu Rev Biochem 51:429-457.
3. Friedberg, E. C., G. C. Walker, and W. Siede. 1995. DNA Repair and Mutagenesis. ASM Press, Washington, D.C.
4. Hsieh, P., and K. Yamane. 2008. DNA Mismatch Repair: Molecular Mechanism, Cancer, and Ageing. Mech. Ageing Dev. 129:391-407.
5. Modrich, P., and R. Lahue. 1996. Mismatch Repair in Replication Fidelity, Genetic Recombination, and Cancer Biology. Annu Rev Biochem 65:101-133.
6. Su, S. S., and P. Modrich. 1986. Escherichia Coli Muts-Encoded Protein Binds to Mismatched DNA Base Pairs. Proc. Natl. Acad. Sci. U. S. A. 83:5057-5061.
7. Grilley, M., K. M. Welsh, S. S. Su, and P. Modrich. 1989.
Isolation and Characterization of the Escherichia Coli MutL Gene Product. J. Biol. Chem. 264:1000-1004.
8. Acharya, S., P. L. Foster, P. Brooks, and R. Fishel. 2003. The Coordinated Functions of the E-Coli MutS and MutL Proteins in Mismatch Repair. Mol. Cell 12:233-246.
9. Winkler, I., A. D. Marx, D. Lariviere, R. J. Heinze, M. Cristovao, A. Reumer, U. Curth, T. K. Sixma, and P. Friedhoff. 2011. Chemical Trapping of the Dynamic MutS-MutL Complex Formed in DNA Mismatch Repair in Escherichia Coli. J Biol Chem 286:17326-17337.
10. Au, K. G., K. Welsh, and P. Modrich. 1992. Initiation of Methyl-Directed Mismatch Repair. J. Biol. Chem. 267:12142-12148.
11. Cooper, D. L., R. S. Lahue, and P. Modrich. 1993. Methyl-Directed Mismatch Repair Is Bidirectional. J. Biol. Chem. 268:11823-11829.
12. Grilley, M., J. Griffith, and P. Modrich. 1993. Bidirectional Excision in Methyl-Directed Mismatch Repair. J. Biol. Chem. 268:11830-11837.
13. Viswanathan, M., and S. T. Lovett. 1998. Single-Strand DNA-Specific Exonucleases in Escherichia Coli. Roles in Repair and Mutation Avoidance. Genetics 149:7-16.
14. Viswanathan, M., and S. T. Lovett. 1999. Exonuclease X of Escherichia Coli. A Novel 3'-5' Dnase and Dnaq Superfamily Member Involved in DNA Repair. J Biol Chem 274:30094-30100.
15. Burdett, V., C. Baitinger, M. Viswanathan, S. T. Lovett, and P. Modrich. 2001. In Vivo Requirement for Recj, Exovii, Exoi, and Exox in Methyl-Directed Mismatch Repair. Proc. Natl. Acad. Sci. U. S. A. 98:6765-6770.
16. Kunkel, T. A., and D. A. Erie. 2005. DNA Mismatch Repair. Annu Rev Biochem 74:681-710.
17. Schofield, M. J., and P. Hsieh. 2003. DNA Mismatch Repair: Molecular Mechanisms and Biological Function. Annu Rev Microbiol 57:579-608.
18. Joseph, N., V. Duppatla, and D. N. Rao. 2006. Prokaryotic DNA Mismatch Repair. Prog Nucleic Acid Res Mol Biol 81:1-49.
19. de Las Alas, M. M., R. A. M. de Bruin, L. Ten Eyck, G. Los, and S. B. Howell. 1998.
Prediction-Based Threading of the hMSH2 DNA Mismatch Repair Protein. FASEB J 12:653-663.
20. Lamers, M. H., A. Perrakis, J. H. Enzlin, H. H. Winterwerp, N. de Wind, and T. K. Sixma. 2000. The Crystal Structure of DNA Mismatch Repair Protein MutS Binding to a G•T Mismatch. Nature 407:711-717.
21. Obmolova, G., C. Ban, P. Hsieh, and W. Yang. 2000. Crystal Structures of Mismatch Repair Protein MutS and Its Complex with a Substrate DNA. Nature 407:703-710.
22. Yamamoto, A., M. J. Schofield, I. Biswas, and P. Hsieh. 2000. Requirement for Phe36 for DNA Binding and Mismatch Repair by Escherichia Coli MutS Protein. Nucleic Acids Res. 28:3564-3569.
23. Schofield, M. J., F. E. Brownewell, S. Nayak, C. W. Du, E. T. Kool, and P. Hsieh. 2001. The Phe-X-Glu DNA Binding Motif of MutS - the Role of Hydrogen Bonding in Mismatch Recognition. J. Biol. Chem. 276:45505-45508.
24. Lebbink, J. H. G., D. Georgijevic, G. Natrajan, A. Fish, H. H. K. Winterwerp, T. K. Sixma, and N. de Wind. 2006. Dual Role of MutS Glutamate 38 in DNA Mismatch Discrimination and in the Authorization of Repair. EMBO J. 25:409-419.
25. Holmes, S. F., K. D. Scarpinato, S. D. McCulloch, R. M. Schaaper, and T. A. Kunkel. 2007. Specialized Mismatch Repair Function of Glu339 in the Phe-X-Glu Motif of Yeast MSH6. DNA Repair 6:293-303.
26. Malkov, V. A., I. Biswas, R. D. Camerini-Otero, and P. Hsieh. 1997. Photocross-Linking of the NH2-Terminal Region of Taq MutS Protein to the Major Groove of a Heteroduplex DNA. J. Biol. Chem. 272:23811-23817.
27. Tama, F., M. Valle, J. Frank, and C. L. Brooks. 2003. Dynamic Reorganization of the Functionally Active Ribosome Explored by Normal Mode Analysis and Cryo-Electron Microscopy. Proc. Natl. Acad. Sci. U. S. A. 100:9319-9323.
28. Van Wynsberghe, A., G. H. Li, and Q. Cui. 2004. Normal-Mode Analysis Suggests Protein Flexibility Modulation Throughout RNA Polymerase's Functional Cycle. Biochemistry 43:13083-13096.
29. Lamers, M. H., D. Georgijevic, J. H. Lebbink, H. H.
Winterwerp, B. Agianian, N. de Wind, and T. K. Sixma. 2004. ATP Increases the Affinity between MutS ATPase Domains. Implications for ATP Hydrolysis and Conformational Changes. J Biol Chem 279:43879-43885.
30. Warren, J. J., T. J. Pohlhaus, A. Changela, R. R. Iyer, P. L. Modrich, and L. S. Beese. 2007. Structure of the Human MutS Alpha DNA Lesion Recognition Complex. Mol. Cell 26:579-592.
31. Mendillo, M. L., V. V. Hargreaves, J. W. Jamison, A. O. Mo, S. Li, C. D. Putnam, V. L. Woods, and R. D. Kolodner. 2009. A Conserved MutS Homolog Connector Domain Interface Interacts with MutL Homologs. Proc. Natl. Acad. Sci. U. S. A. 106:22223-22228.
32. Mukherjee, S., and M. Feig. 2009. Conformational Change in MSH2-MSH6 Upon Binding DNA Coupled to ATPase Activity. Biophys. J. 96:L63-65.
33. Bjornson, K. P., and P. Modrich. 2003. Differential and Simultaneous Adenosine Di- and Triphosphate Binding by MutS. J. Biol. Chem. 278:18557-18562.
34. Antony, E., and M. M. Hingorani. 2004. Asymmetric ATP Binding and Hydrolysis Activity of the Thermus Aquaticus MutS Dimer Is Key to Modulation of Its Interactions with Mismatched DNA. Biochemistry 43:13115-13128.
35. Biswas, I., G. Obmolova, M. Takahashi, A. Herr, M. A. Newman, W. Yang, and P. Hsieh. 2001. Disruption of the Helix-U-Turn-Helix Motif of MutS Protein: Loss of Subunit Dimerization, Mismatch Binding and ATP Hydrolysis. J. Mol. Biol. 305:805-816.
36. Gorman, J., A. Chowdhury, J. A. Surtees, J. Shimada, D. R. Reichman, E. Alani, and E. C. Greene. 2007. Dynamic Basis for One-Dimensional DNA Scanning by the Mismatch Repair Complex MSH2-MSH6. Mol. Cell 28:359-370.
37. Gradia, S., S. Acharya, and R. Fishel. 2000. The Role of Mismatched Nucleotides in Activating the hMSH2-hMSH6 Molecular Switch. J. Biol. Chem. 275:3922-3930.
38. Gradia, S., S. Acharya, and R. Fishel. 1997. The Human Mismatch Recognition Complex hMSH2-hMSH6 Functions as a Novel Molecular Switch. Cell 91:995-1005.
39. Allen, D. J., A. Makhov, M. Grilley, J. Taylor, R.
Thresher, P. Modrich, and J. D. Griffith. 1997. MutS Mediates Heteroduplex Loop Formation by a Translocation Mechanism. EMBO J. 16:4467-4476. 40. Gradia, S., D. Subramanian, T. Wilson, S. Acharya, A. Makhov, J. Griffith, and R. Fishel. 1999. hMSH2-hMSH6 Forms a Hydrolysis-Independent Sliding Clamp on Mismatched DNA. Mol. Cell 3:255-261. 41. Blackwell, L. J., K. P. Bjornson, D. J. Allen, and P. Modrich. 2001. Distinct MutS DNABinding Modes That Are Differentially Modulated by ATP Binding and Hydrolysis. J. Biol. Chem. 276:34339-34347. 42. Blackwell, L. J., D. Martik, K. P. Bjornson, E. S. Bjornson, and P. Modrich. 1998. Nucleotide-Promoted Release of Hmuts Alpha from Heteroduplex DNA Is Consistent with an ATP-Dependent Translocation Mechanism. J. Biol. Chem. 273:32055-32062. 194 43. Sixma, T. K. 2001. DNA Mismatch Repair: MutS Structures Bound to Mismatches. Curr. Opin. Struct. Biol. 11:47-52. 44. Lamers, M. H., H. H. Winterwerp, and T. K. Sixma. 2003. The Alternating ATPase Domains of MutS Control DNA Mismatch Repair. EMBO J 22:746-756. 45. Natrajan, G., M. H. Lamers, J. H. Enzlin, H. H. Winterwerp, A. Perrakis, and T. K. Sixma. 2003. Structures of Escherichia Coli DNA Mismatch Repair Enzyme MutS in Complex with Different Mismatches: A Common Recognition Mode for Diverse Substrates. Nucleic Acids Res. 31:4814-4821. 46. Hunenberger, P. H., A. E. Mark, and W. F. van Gunsteren. 1995. Fluctuation and CrossCorrelation Analysis of Protein Motions Observed in Nanosecond Molecular Dynamics Simulations. J Mol Biol 252:492-503. 47. Kato, R., M. Kataoka, H. Kamikubo, and S. Kuramitsu. 2001. Direct Observation of Three Conformations of MutS Protein Regulated by Adenine Nucleotides. J. Mol. Biol. 309:227-238. 48. Wang, H., Y. Yang, M. J. Schofield, C. W. Du, Y. Fridman, S. D. Lee, E. D. Larson, J. T. Drummond, E. Alani, P. Hsieh, and D. A. Erie. 2003. DNA Bending and Unbending by MutS Govern Mismatch Recognition and Specificity. Proc. Natl. Acad. Sci. U. S. A. 100:14822-14827. 
49. Engel, A., and D. J. Muller. 2000. Observing Single Biomolecules at Work with the Atomic Force Microscope. Nat. Struct. Biol. 7:715-718. 50. Sass, L. E., C. Lanyi, K. Weninger, and D. A. Erie. 2010. Single-Molecule FRET TACKLE Reveals Highly Dynamic Mismatched DNA-MutS Complexes. Biochemistry 49:3174-3190. 51. Tessmer, I., Y. Yang, J. Zhai, C. W. Du, P. Hsieh, M. M. Hingorani, and D. A. Erie. 2008. Mechanism of MutS Searching for DNA Mismatches and Signaling Repair. J. Biol. Chem. 283:36646-36654. 52. Nag, N., B. J. Rao, and G. Krishnamoorthy. 2007. Altered Dynamics of DNA Bases Adjacent to a Mismatch: A Cue for Mismatch Recognition by MutS. J Mol Biol 374:3953. 195 53. Feig, M., and Z. F. Burton. 2010. RNA Polymerase Ii with Open and Closed Trigger Loops: Active Site Dynamics and Nucleic Acid Translocation. Biophys. J. 99:2577-2586. 54. Villa, E., A. Balaeff, and K. Schulten. 2005. Structural Dynamics of the Lac RepressorDNA Complex Revealed by a Multiscale Simulation. Proc. Natl. Acad. Sci. U. S. A. 102:6783-6788. 55. Woo, H. J., Y. Liu, and R. Sousa. 2008. Molecular Dynamics Studies of the Energetics of Translocation in Model T7 RNA Polymerase Elongation Complexes. Proteins 73:10211036. 56. Eargle, J., A. A. Black, A. Sethi, L. G. Trabuco, and Z. Luthey-Schulten. 2008. Dynamics of Recognition between Trna and Elongation Factor Tu. J. Mol. Biol. 377:1382-1405. 57. Kendrew, J. C., G. Bodo, H. M. Dintzis, R. G. Parrish, H. Wyckoff, and D. C. Phillips. 1958. A Three-Dimensional Model of the Myoglobin Molecule Obtained by X-Ray Analysis. Nature 181:662-666. 58. Brooks, C. L., M. Karplus, and B. M. Pettitt. 1988. Proteins : A Theoretical Perspective of Dynamics, Structure, and Thermodynamics. J. Wiley, New York. 59. Karplus, M., and J. A. McCammon. 2002. Molecular Dynamics Simulations of Biomolecules. Nat. Struct. Biol. 9:646-652. 60. McCammon, J. A., B. R. Gelin, and M. Karplus. 1977. Dynamics of Folded Proteins. Nature 267:585-590. 61. Karplus, M., and G. A. 
Petsko. 1990. Molecular Dynamics Simulations in Biology. Nature 347:631-639. 62. Mackerell, A. D. 2004. Empirical Force Fields for Biological Macromolecules: Overview and Issues. J. Comput. Chem. 25:1584-1604. 63. Brooks, B. R., R. E. Bruccoleri, B. D. Olafson, D. J. States, S. Swaminathan, and M. Karplus. 1983. CHARMM - a Program for Macromolecular Energy, Minimization, and Dynamics Calculations. J. Comput. Chem. 4:187-217. 196 64. Brooks, B. R., C. L. Brooks, III, A. D. Mackerell, L. Nilsson, R. J. Petrella, B. Roux, Y. Won, G. Archontis, C. Bartels, S. Boresch, A. Caflisch, L. Caves, Q. Cui, A. R. Dinner, M. Feig, S. Fischer, J. Gao, M. Hodoscek, W. Im, K. Kuczera, T. Lazaridis, J. Ma, V. Ovchinnikov, E. Paci, R. W. Pastor, C. B. Post, J. Z. Pu, M. Schaefer, B. Tidor, R. M. Venable, H. L. Woodcock, X. Wu, W. Yang, D. M. York, and M. Karplus. 2009. CHARMM: The Biomolecular Simulation Program. J. Comput. Chem. 30:1545-1614. 65. Becker, O. M., A. D. M. Jr, B. Roux, and M. Watanabe. 2001. Computational Biochemistry and Biophysics. CRC Press, New York, NY. 66. Weiner, P. K., and P. A. Kollman. 1981. Amber - Assisted Model-Building with Energy Refinement - a General Program for Modeling Molecules and Their Interactions. J. Comput. Chem. 2:287-303. 67. Scott, W. R. P., P. H. Hunenberger, I. G. Tironi, A. E. Mark, S. R. Billeter, J. Fennen, A. E. Torda, T. Huber, P. Kruger, and W. F. van Gunsteren. 1999. The Gromos Biomolecular Simulation Program Package. J Phys Chem A 103:3596-3607. 68. Phillips, J. C., R. Braun, W. Wang, J. Gumbart, E. Tajkhorshid, E. Villa, C. Chipot, R. D. Skeel, L. Kale, and K. Schulten. 2005. Scalable Molecular Dynamics with NAMD. J. Comput. Chem. 26:1781-1802. 69. Shaw, D. E. 2009. Anton: A Specialized Machine for Millisecond-Scale Molecular Dynamics Simulations of Proteins. Abstracts of Papers of the American Chemical Society 238:-. 70. Shaw, D. E., P. Maragakis, K. Lindorff-Larsen, S. Piana, R. O. Dror, M. P. Eastwood, J. A. Bank, J. M. 
Jumper, J. K. Salmon, Y. Shan, and W. Wriggers. 2010. Atomic-Level Characterization of the Structural Dynamics of Proteins. Science 330:341-346. 71. MacKerell, A. D., M. Feig, and C. L. Brooks, III. 2004. Improved Treatment of the Protein Backbone in Empirical Force Fields. J Am Chem Soc 126:698-699. 72. MacKerell, A. D., D. Bashford, M. Bellott, R. L. Dunbrack, J. D. Evanseck, M. J. Field, S. Fischer, J. Gao, H. Guo, S. Ha, D. Joseph-McCarthy, L. Kuchnir, K. Kuczera, F. T. K. Lau, C. Mattos, S. Michnick, T. Ngo, D. T. Nguyen, B. Prodhom, W. E. Reiher, B. Roux, M. Schlenkrich, J. C. Smith, R. Stote, J. Straub, M. Watanabe, J. Wiorkiewicz-Kuczera, D. Yin, and M. Karplus. 1998. All-Atom Empirical Potential for Molecular Modeling and Dynamics Studies of Proteins. J. Phys. Chem. B 102:3586-3616. 197 73. Foloppe, N., and A. D. MacKerell. 2000. All-Atom Empirical Force Field for Nucleic Acids: I. Parameter Optimization Based on Small Molecule and Condensed Phase Macromolecular Target Data. J. Comput. Chem. 21:86-104. 74. Torrie, G. M., and J. P. Valleau. 1977. Non-Physical Sampling Distributions in MonteCarlo Free-Energy Estimation - Umbrella Sampling. J. Comput. Phys. 23:187-199. 75. Fukunishi, H., O. Watanabe, and S. Takada. 2002. On the Hamiltonian Replica Exchange Method for Efficient Sampling of Biomolecular Systems: Application to Protein Structure Prediction. J. Chem. Phys. 116:9058-9067. 76. Kumar, S., D. Bouzida, R. H. Swendsen, P. A. Kollman, and J. M. Rosenberg. 1992. The Weighted Histogram Analysis Method for Free-Energy Calculations of Biomolecules. I. The Method. J. Comput. Chem. 13:1011-1021. 77. Hayward, S., and B. L. de Groot. 2008. Normal Modes and Essential Dynamics. Methods in Molecular Biology (Clifton, N.J.) 443:89-106. 78. Cui, Q., and I. Bahar. 2006. Normal Mode Analysis : Theory and Applications to Biological and Chemical Systems. Chapman & Hall/CRC, Boca Raton. 79. Ma, J. 2005. 
Usefulness and Limitations of Normal Mode Analysis in Modeling Dynamics of Biomolecular Complexes. Structure 13:373-380. 80. Modrich, P. 2006. Mechanisms in Eukaryotic Mismatch Repair. J. Biol. Chem. 281:30305-30309. 81. Peltomaki, P. 2003. Role of DNA Mismatch Repair Defects in the Pathogenesis of Human Cancer. J Clin Oncol 21:1174-1179. 82. Gu, L. Y., Y. Hong, S. McCulloch, H. Watanabe, and G. M. Li. 1998. ATP-Dependent Interaction of Human Mismatch Repair Proteins and Dual Role of Pcna in Mismatch Repair. Nucleic Acids Res. 26:1173-1178. 83. Jacobs-Palmer, E., and M. M. Hingorani. 2007. The Effects of Nucleotides on MutSDNA Binding Kinetics Clarify the Role of MutS ATPase Activity in Mismatch Repair. J. Mol. Biol. 366:1087-1098. 198 84. Mazur, D. J., M. L. Mendillo, and R. D. Kolonder. 2006. Inhibition of Msh6 ATPase Activity by Mispaired DNA Induces a MSH2(ATP)-MSH6(ATP) State Capable of Hydrolysis-Independent Movement Along DNA. Mol. Cell 22:39-49. 85. Gorbalenya, A. E., and E. V. Koonin. 1990. Superfamily of Uvra-Related Ntp-Binding Proteins - Implications for Rational Classification of Recombination Repair Systems. J. Mol. Biol. 213:583-591. 86. Fiser, A., R. K. G. Do, and A. Sali. 2000. Modeling of Loops in Protein Structures. Protein Sci. 9:1753-1773. 87. Mackerell, A. D., M. Feig, and C. L. Brooks. 2004. Extending the Treatment of Backbone Energetics in Protein Force Fields: Limitations of Gas-Phase Quantum Mechanics in Reproducing Protein Conformational Distributions in Molecular Dynamics Simulations. J. Comput. Chem. 25:1400-1415. 88. Li, G. H., and Q. Cui. 2002. A Coarse-Grained Normal Mode Approach for Macromolecules: An Efficient Implementation and Application to Ca2+-ATPase. Biophys. J. 83:2457-2474. 89. Tama, F., F. X. Gadea, O. Marques, and Y. H. Sanejouand. 2000. Building-Block Approach for Determining Low-Frequency Normal Modes of Macromolecules. ProteinsStructure Function and Genetics 41:1-7. 90. DeLano, W. L. 2002. 
The PyMOL Molecular Graphics System. 91. Van Wynsberghe, A. W., and Q. Cui. 2005. Comparison of Mode Analyses at Different Resolutions Applied to Nucleic Acid Systems. Biophys. J. 89:2939-2949. 92. Lu, M., B. Poon, and J. Ma. 2006. A New Metho for Coarse-Grained Elastic NormalMode Analysis. Journal of Chemical Theory and Computation 2:464-471. 93. Hays, J. B., P. D. Hoffman, and H. X. Wang. 2005. Discrimination and Versatility in Mismatch Repair. DNA Repair 4:1463-1474. 94. Mitra, R., B. M. Pettitt, G. L. Rame, and R. D. Blake. 1993. The Relationship between Mutation-Rates for the (C-Center-Dot-G) -] (T-Center-Dot-a) Transition and Features of T-Center-Dot-G Mispair Structures in Different Neighbor Environments, Determined by Free-Energy Molecular Mechanics. Nucleic Acids Res. 21:6028-6037. 199 95. Hiratsuka, T. 1994. Nucleotide-Induced Closure of the ATP-Binding Pocket in Myosin Subfragment-1. J. Biol. Chem. 269. 96. Bilwes, A. M., C. M. Quezada, L. R. Croal, B. R. Crane, and M. I. Simon. 2001. Nucleotide Binding by the Histidine Kinase Chea. Nat. Struct. Biol. 8:353-360. 97. Janas, E., M. Hofacker, M. Chen, S. Gompf, C. van der Does, and R. Tampe. 2003. The ATP Hydrolysis Cycle of the Nucleotide-Binding Domain of the Mitochondrial ATPBinding Cassette Transporter Mdl1p. J. Biol. Chem. 278:26862-26869. 98. Pluciennik, A., and P. Modrich. 2007. Protein Roadblocks and Helix Discontinuities Are Barriers to the Initiation of Mismatch Repair. Proc. Natl. Acad. Sci. U. S. A. 104:1270912713. 99. Sali, A., and T. L. Blundell. 1993. Comparative Protein Modelling by Satisfaction of Spatial Restraints. J Mol Biol 234:779-815. 100. Amadei, A., A. B. M. Linssen, and H. J. C. Berendsen. 1993. Essential Dynamics of Proteins. Proteins-Structure Function and Genetics 17:412-425. 101. Mukherjee, S., S. M. Law, and M. Feig. 2009. Deciphering the Mismatch Recognition Cycle in MutS and MSH2-MSH6 Using Normal-Mode Analysis. Biophys. J. 96:17071720. 102. Feig, M., J. Karanicolas, and C. 
L. Brooks, III. 2004. MMTSB Tool Set: Enhanced Sampling and Multiscale Modeling Methods for Applications in Structural Biology. J. Mol. Graphics Modell. 22:377-395. 103. Kabsch, W., and C. Sander. 1983. Dictionary of Protein Secondary Structure - PatternRecognition of Hydrogen-Bonded and Geometrical Features. Biopolymers 22:2577-2637. 104. Schrödinger, L. Schrödinger, LLC. The PyMOL Molecular Graphics System, Version 1.3. 105. Seeber, M., M. Cecchini, F. Rao, G. Settanni, and A. Caflisch. 2007. Wordom: A Program for Efficient Analysis of Molecular Dynamics Simulations. Bioinformatics 23:2625-2627. 200 106. Seeber, M., A. Felline, F. Raimondi, S. Muff, R. Friedman, F. Rao, A. Caflisch, and F. Fanelli. 2011. Wordom: A User-Friendly Program for the Analysis of Molecular Structures, Trajectories, and Free Energy Surfaces. J. Comput. Chem. 32:1183-1194. 107. Brigo, A., K. W. Lee, G. I. Mustata, and J. M. Briggs. 2005. Comparison of Multiple Molecular Dynamics Trajectories Calculated for the Drug-Resistant Hiv-1 Integrase T66i/M154i Catalytic Domain. Biophys. J. 88:3072-3082. 108. van Aalten, D. M. F., A. Amadei, A. B. M. Linssen, V. G. H. Eijsink, G. Vriend, and H. J. C. Berendsen. 1995. The Essential Dynamics of Thermolysin - Confirmation of the Hinge-Bending Motion and Comparison of Simulations in Vacuum and Water. ProteinsStructure Function and Genetics 22:45-54. 109. Law, S. M., and M. Feig. 2011. Base-Flipping Mechanism in Post-Mismatch Recognition in MutS. Submitted for publication. 110. Zheng, X., K. Diraviyam, and D. Sept. 2007. Nucleotide Effects on the Structure and Dynamics of Actin. Biophys. J. 93:1277-1283. 111. van Gunsteren, W. F., P. H. Hunenberger, A. E. Mark, P. E. Smith, and I. G. Tironi. 1995. Computer Simulation of Protein Motion. Comput. Phys. Commun. 91:305-319. 112. Sekijima, M., C. Motono, S. Yamasaki, K. Kaneko, and Y. Akiyama. 2003. 
Molecular Dynamics Simulation of Dimeric and Monomeric Forms of Human Prion Protein: Insight into Dynamics and Properties. Biophys. J. 85:1176-1185. 113. Mittal, J., and R. B. Best. 2010. Tackling Force-Field Bias in Protein Folding Simulations: Folding of Villin Hp35 and Pin Ww Domains in Explicit Water. Biophys. J. 99:L26-28. 114. Lahue, R. S., K. G. Au, and P. Modrich. 1989. DNA Mismatch Correction in a Defined System. Science 245:160-164. 115. Hargreaves, V. V., S. S. Shell, D. J. Mazur, M. T. Hess, and R. D. Kolodner. 2010. Interaction between the Msh2 and Msh6 Nucleotide-Binding Sites in the Saccharomyces Cerevisiae Msh2-Msh6 Complex. J. Biol. Chem. 285:9301-9310. 116. Kolodner, R. D., N. R. Hall, J. Lipford, M. F. Kane, M. R. Rao, P. Morrison, L. Wirth, P. J. Finan, J. Burn, P. Chapman, and et al. 1994. Human Mismatch Repair Genes and Their 201 Association with Hereditary Non-Polyposis Colon Cancer. Cold Spring Harb Symp Quant Biol 59:331-338. 117. Li, G. M. 2008. Mechanisms and Functions of DNA Mismatch Repair. Cell Res. 18:8598. 118. Junop, M. S., G. Obmolova, K. Rausch, P. Hsieh, and W. Yang. 2001. Composite Active Site of an ABC ATPase: MutS Uses ATP to Verify Mismatch Recognition and Authorize DNA Repair. Mol. Cell 7:1-12. 119. Alani, E., J. Y. Lee, M. J. Schofield, A. W. Kijas, P. Hsieh, and W. Yang. 2003. Crystal Structure and Biochemical Analysis of the MutS•ADP•Beryllium Fluoride Complex Suggests a Conserved Mechanism for ATP Interactions in Mismatch Repair. J. Biol. Chem. 278:16088-16094. 120. Lebbink, J. H. G., A. Fish, A. Reumer, G. Natrajan, H. H. K. Winterwerp, and T. K. Sixma. 2010. Magnesium Coordination Controls the Molecular Switch Function of DNA Mismatch Repair Protein MutS. J. Biol. Chem. 285:13131-13141. 121. Bowers, J., T. Sokolsky, T. Quach, and E. Alani. 1999. A Mutation in the MSH6 Subunit of the Saccharomyces Cerevisiae MSH2-MSH6 Complex Disrupts Mismatch Recognition. J. Biol. Chem. 274:16115-16125. 122. Drotschmann, K., W. 
Yang, F. E. Brownewell, E. T. Kool, and T. A. Kunkel. 2001. Asymmetric Recognition of DNA Local Distortion - Structure-Based Functional Studies of Eukaryotic MSH2-MSH6. J. Biol. Chem. 276:46225-46229. 123. Yang, W. 2008. Structure and Mechanism for DNA Lesion Recognition. Cell Res. 18:184-197. 124. Yang, C. G., C. Yi, E. M. Duguid, C. T. Sullivan, X. Jian, P. A. Rice, and C. He. 2008. Crystal Structures of DNA/RNA Repair Enzymes AlkB and ABH2 Bound to dsDNA. Nature 452:961-965. 125. Salsbury, F. R. 2010. Effects of Cisplatin Binding to DNA on the Dynamics of the E. Coli MutS Dimer. Protein Peptide Lett 17:744-750. 202 126. Salsbury, F. R., J. E. Clodfelter, M. B. Gentry, T. Hollis, and K. D. Scarpinato. 2006. The Molecular Mechanism of DNA Damage Recognition by MutS Homologs and Its Consequences for Cell Death Response. Nucleic Acids Res. 34:2173-2185. 127. Fiser, A., M. Feig, C. L. Brooks, III, and A. Sali. 2002. Evolution and Physics in Comparative Protein Structure Modeling. Acc Chem Res 35:413-421. 128. Bas, D. C., D. M. Rogers, and J. H. Jensen. 2008. Very Fast Prediction and Rationalization of Pka Values for Protein-Ligand Complexes. Proteins 73:765-783. 129. Jorgensen, W. L. 1981. Quantum and Statistical Mechanical Studies of Liquids .10. Transferable Intermolecular Potential Functions for Water, Alcohols, and Ethers Application to Liquid Water. J Am Chem Soc 103:335-340. 130. Darden, T., D. York, and L. Pedersen. 1993. Particle Mesh Ewald - an Nlog(N) Method for Ewald Sums in Large Systems. J. Chem. Phys. 98:10089-10092. 131. Golosov, A. A., and M. Karplus. 2007. Analysis of the Translocation Step in DNA Replication by DNA Polymerase I with Computer Simulations. Biophys. J.:225a-225a. 132. Huang, N., N. K. Banavali, and A. D. MacKerell, Jr. 2003. Protein-Facilitated Base Flipping in DNA by Cytosine-5-Methyltransferase. Proc. Natl. Acad. Sci. U. S. A. 100:68-73. 133. Miyamoto, S., and P. A. Kollman. 1992. 
Settle - an Analytical Version of the Shake and Rattle Algorithm for Rigid Water Models. J. Comput. Chem. 13:952-962. 134. Feig, M., and B. M. Pettitt. 1999. Sodium and Chlorine Ions as Part of the DNA Solvation Shell. Biophys. J. 77:1769-1781. 135. Garcia, A. E., and G. Hummer. 2000. Water Penetration and Escape in Proteins. Proteins 38:261-272. 136. Song, K., A. J. Cambell, C. Bergonzo, C. del los Santos, A. P. Grollman, and C. Simmerling. 2009. An Improved Reaction Coordinate for Nucleic Acid Base Flipping Studies. J. Chem. Theory Comput. 5:3105-3113. 203 137. Banavali, N. K., and A. D. MacKerell, Jr. 2002. Free Energy and Structural Pathways of Base Flipping in a DNA GCGC Containing Sequence. J Mol Biol 319:141-160. 138. Varnai, P., and R. Lavery. 2002. Base Flipping in DNA: Pathways and Energetics Studied with Molecular Dynamic Simulations. J Am Chem Soc 124:7272-7273. 139. Hagan, M. F., A. R. Dinner, D. Chandler, and A. K. Chakraborty. 2003. Atomistic Understanding of Kinetic Pathways for Single Base-Pair Binding and Unbinding in DNA. Proc. Natl. Acad. Sci. U. S. A. 100:13922-13927. 140. Nikolova, E. N., E. Kim, A. A. Wise, P. J. O'Brien, I. Andricioaei, and H. M. AlHashimi. 2011. Transient Hoogsteen Base Pairs in Canonical Duplex DNA. Nature 470:498-U484. 141. Beglov, D., and B. Roux. 1997. An Integral Equation to Describe the Solvation of Polar Molecules in Liquid Water. J. Phys. Chem. B 101:7821-7826. 142. Banavali, N. K., and B. Roux. 2005. Free Energy Landscape of A-DNA to B-DNA Conversion in Aqueous Solution. J Am Chem Soc 127:6866-6876. 143. Varnai, P., M. Canalia, and J. L. Leroy. 2004. Opening Mechanism of G Center Dot T/U Pairs in DNA and RNA Duplexes: A Combined Study of Imino Proton Exchange and Molecular Dynamics Simulation. J Am Chem Soc 126:14659-14667. 144. Moe, J. G., and I. M. Russu. 1992. Kinetics and Energetics of Base-Pair Opening in 5'D(Cgcgaattcgcg)-3' and a Substituted Dodecamer Containing G.T Mismatches. Biochemistry 31:8421-8428. 
145. Drew, H. R., R. M. Wing, T. Takano, C. Broka, S. Tanaka, K. Itakura, and R. E. Dickerson. 1981. Structure of a B-DNA Dodecamer - Conformation and Dynamics .1. P Natl Acad Sci-Biol 78:2179-2183. 146. Chen, Y. Z., V. Mohan, and R. H. Griffey. 1998. Effect of Backbone Zeta Torsion Angle on Low Energy Single Base Opening in B-DNA Crystal Structures. Chem. Phys. Lett. 287:570-574. 147. Feig, M., R. Zacharias, and B. M. Pettitt. 2001. Conformations of an Adenine Bulge in a DNA Octamer and Its Influence on DNA Structure from Molecular Dynamics Simulations. Biophys. J. 81:352-370. 204 148. Biswas, I., and P. Hsieh. 1997. Interaction of MutS Protein with the Major and Minor Grooves of a Heteroduplex DNA. J Biol Chem 272:13355-13364. 149. Bjornson, K. P., D. J. Allen, and P. Modrich. 2000. Modulation of MutS ATP Hydrolysis by DNA Cofactors. Biochemistry 39:3176-3183. 150. Antony, E., and M. M. Hingorani. 2003. Mismatch Recognition-Coupled Stabilization of MSH2-MSH6 in an ATP-Bound State at the Initiation of DNA Repair. Biochemistry 42:7682-7693. 151. Acharya, S., and K. Patterson. 2010. Mutations in the Conserved Glycine and Serine of the MutS ABC Signature Motif Affect Nucleotide Exchange, Kinetics of Sliding Clamp Release of Mismatch and Mismatch Repair. Mutat Res-Fund Mol M 684:56-65. 152. Lu, X. J., and W. K. Olson. 2003. 3dna: A Software Package for the Analysis, Rebuilding and Visualization of Three-Dimensional Nucleic Acid Structures. Nucleic Acids Res. 31:5108-5121. 153. Dillingham, M. S., D. B. Wigley, and M. R. Webb. 2000. Demonstration of Unidirectional Single-Stranded DNA Translocation by Pcra Helicase: Measurement of Step Size and Translocation Speed. Biochemistry 39:205-212. 154. Dillingham, M. S., D. B. Wigley, and M. R. Webb. 2002. Direct Measurement of SingleStranded DNA Translocation by Pcra Helicase Using the Fluorescent Base Analogue 2Aminopurine. Biochemistry 41:643-651. 155. Soultanas, P., M. S. Dillingham, P. Wiley, M. R. Webb, and D. B. Wigley. 
2000. Uncoupling DNA Translocation and Helicase Activity in Pcra: Direct Evidence for an Active Mechanism. EMBO J. 19:3799-3810. 156. Caruthers, J. M., and D. B. McKay. 2002. Helicase Structure and Mechanism. Curr. Opin. Struct. Biol. 12:123-133. 157. Enemark, E. J., and L. Joshua-Tor. 2006. Mechanism of DNA Translocation in a Replicative Hexameric Helicase. Nature 442:270-275. 158. Gyimesi, M., K. Sarlos, and M. Kovacs. 2010. Processive Translocation Mechanism of the Human Bloom's Syndrome Helicase Along Single-Stranded DNA. Nucleic Acids Res. 38:4404-4414. 205 159. Massey, T. H., C. P. Mercogliano, J. Yates, D. J. Sherratt, and J. Lowe. 2006. DoubleStranded DNA Translocation: Structure and Mechanism of Hexameric Ftsk. Mol. Cell 23:457-469. 160. Sivanathan, V., M. D. Allen, C. de Bekker, R. Baker, L. K. Arciszewska, S. M. Freund, M. Bycroft, J. Lowe, and D. J. Sherratt. 2006. The Ftsk Gamma Domain Directs Oriented DNA Translocation by Interacting with Kops. Nat. Struct. Mol. Biol. 13:965-972. 161. Lowe, J., A. Ellonen, M. D. Allen, C. Atkinson, D. J. Sherratt, and I. Grainge. 2008. Molecular Mechanism of Sequence-Directed DNA Loading and Translocation by Ftsk. Mol. Cell 31:498-509. 162. Takayama, Y., D. Sahu, and J. Iwahara. 2010. Nmr Studies of Translocation of the Zif268 Protein between Its Target DNA Sites. Biochemistry 49:7998-8005. 163. Yu, J., T. Ha, and K. Schulten. 2007. How Directional Translocation Is Regulated in a DNA Helicase Motor. Biophys. J. 93:3783-3797. 164. Yu, J., T. Ha, and K. Schulten. 2006. Structure-Based Model of the Stepping Motor of Pcra Helicase. Biophys. J. 91:2097-2114. 165. Ishida, H. 2010. Branch Migration of Holliday Junction in Ruva Tetramer Complex Studied by Umbrella Sampling Simulation Using a Path-Search Algorithm. J. Comput. Chem. 31:2317-2329. 166. Golosov, A. A., J. J. Warren, L. S. Beese, and M. Karplus. 2010. The Mechanism of the Translocation Step in DNA Replication by DNA Polymerase I: A Computer Simulation Analysis. 
Structure 18:83-93. 167. Feng, J. A., R. C. Johnson, and R. E. Dickerson. 1994. Hin Recombinase Bound to DNA - the Origin of Specificity in Major and Minor-Groove Interactions. Science 263:348-355. 168. Torrie, G. M., and J. P. Valleau. 1974. Monte-Carlo Free-Energy Estimates Using NonBoltzmann Sampling - Application to Subcritical Lennard-Jones Fluid. Chem. Phys. Lett. 28:578-581. 169. Kumar, S., D. Bouzida, R. H. Swendsen, P. A. Kollman, and J. M. Rosenberg. 1992. The Weighted Histogram Analysis Method for Free-Energy Calculations on Biomolecules .1. The Method. J. Comput. Chem. 13:1011-1021. 206 170. Shirts, M. R., and J. D. Chodera. 2008. Statistically Optimal Analysis of Samples from Multiple Equilibrium States. J. Chem. Phys. 129:124105-124114. 171. Ferrenberg, A. M. 1989. Efficient Use of Monte Carlo Simulation Data. In Department of Physics. Carnegie-Mellon University, Pittsburgh, PA. 86. 172. Ferrenberg, A. M., and R. H. Swendsen. 1988. New Monte Carlo Technique for Studying Phase Transitions. Phys Rev Lett 61:2635-2638. 173. Ferrenberg, A. M., and R. H. Swendsen. 1989. Optimized Monte Carlo Data Analysis. Phys Rev Lett 63:1195-1198. 174. Souaille, M., and B. Roux. 2001. Extension to the Weighted Histogram Analysis Method: Combining Umbrella Sampling with Free Energy Calculations. Comput. Phys. Commun. 135:40-57. 175. Lee, M. S., M. Feig, F. R. Salsbury, and C. L. Brooks. 2003. New Analytic Approximation to the Standard Molecular Volume Definition and Its Application to Generalized Born Calculations. J. Comput. Chem. 24:1348-1356. 176. Lee, M. S., F. R. Salsbury, and C. L. Brooks. 2002. Novel Generalized Born Methods. J. Chem. Phys. 116:10606-10614. 177. Chocholousova, J., and M. Feig. 2006. Balancing an Accurate Representation of the Molecular Surface in Generalized Born Formalisms with Integrator Stability in Molecular Dynamics Simulations. J. Comput. Chem. 27:719-729. 178. Banavali, N. K., and B. Roux. 2002. 
Atomic Radii for Continuum Electrostatics Calculations on Nucleic Acids. J. Phys. Chem. B 106:11026-11035. 179. Pastor, R. W., B. R. Brooks, and A. Szabo. 1988. An Analysis of the Accuracy of Langevin and Molecular-Dynamics Algorithms. Mol Phys 65:1409-1419. 180. Ryckaert, J. P., G. Ciccotti, and H. J. C. Berendsen. 1977. Numerical-Integration of Cartesian Equations of Motion of a System with Constraints - Molecular-Dynamics of NAlkanes. J. Comput. Phys. 23:327-341. 207 181. Komeiji, Y., and M. Uebayasi. 1999. Change in Conformation by DNA-Peptide Association: Molecular Dynamics of the Hin-Recombinase-Hixl Complex. Biophys. J. 77:123-138. 182. Komeiji, Y., and M. Uebayasi. 1999. Molecular Dynamics Simulation of the HinRecombinase - DNA Complex. Mol Simulat 21:303-324. 183. Jia, Y., L. Bi, F. Li, Y. Chen, C. Zhang, and X. Zhang. 2008. Alpha-Shaped DNA Loops Induced by MutS. Biochem Biophys Res Commun 372:618-622. 184. Mendillo, M. L., C. D. Putnam, A. O. Mo, J. W. Jamison, S. Li, V. L. Woods, and R. D. Kolodner. 2010. Probing DNA- and ATP-Mediated Conformational Changes in the MutS Family of Mispair Recognition Proteins Using Deuterium Exchange Mass Spectrometry. J. Biol. Chem. 285:13170-13182. 185. Jiang, Y., and P. E. Marszalek. 2011. Atomic Force Microscopy Captures MutS Tetramers Initiating DNA Mismatch Repair. EMBO J Advance Online Publication. 186. Gopal, S. M., S. Mukherjee, Y. M. Cheng, and M. Feig. 2010. Primo/Primona: A CoarseGrained Model for Proteins and Nucleic Acids That Preserves near-Atomistic Accuracy. Proteins 78:1266-1281. 208