annqw-n “- I m a a . . w m .‘3 Q.~ T“ —Q 3“\ LIBRARY Michigan State University This is to certify that the dissertation entitled DEVELOPMENT OF SPECIAL MOLECULAR DYNAMICS BASED METHODS TO ACCELERATE SAMPLING OF IMPORTANT PROTEIN MOTIONS presented by Li Su has been accepted towards fulfillment of the requirements for the Ph.D. degree in Chemistry fl/vwf CUM; Major Professor’s Signature @‘J’ /8/ 2009 Date MSU is an Affirmative Action/Equal Opportunity Employer ----.-r—.—o—-—.—.-.-.._ - PLACE IN RETURN BOX to remove this checkout from your record. TO AVOID FINES return on or before date due. MAY BE RECALLED with earlier due date if requested. DATE DUE DATE DUE DATE DUE 5108 K:/Prolecc&Pres/CIRCIDateDue.indd DEVELOPMENT OF SPECIAL MOLECULAR DYNAMICS BASED METHODS TO ACCELERATE SAMPLING OF IMPORTANT PROTEIN MOTIONS By Li Su A DISSERTATION Submitted to Michigan State University In partial fulfillment of the requirements for degree of DOCTOR OF PHILOSOHPY Chemistry 2009 ABSTRACT DEVELOPMENT OF SPECIAL MOLECULAR DYNAMICS BASED METHODS TO ACCELERATE SAMPLING 0F IMPORTANT PROTEIN MOTIONS By Li Su Molecular Dynamics (MD) applied to complex systems such as proteins that have energetic barriers separating different configurations may suffer from slow sampling of the configuration space. Therefore, new methods are needed to improve the ability of exploring conformational space. One way to accelerate the exploration of configuration space is with various Hamiltonian replica exchange methods (HREM) whereby multiple systems differing in their Hamiltonians are run by MD in parallel and, periodically, attempts at exchanging them is made according to a Monte Carlo rule that maintains the Boltzmann distribution. The HREM (implemented in the CUKMODY MD code designed for the efficient simulations of solvated proteins) is first used to study the conformational states of the zwitterionic form of the pentapeptide Met-enkephalin. There is a competition between open forms of the peptide driven by polar solvation of the terminal ammonium and carboxylate groups and closed forms driven by their salt-bridge formation. Normal MD started from an open state does not sample closed conformations. A small number of HREM systems were found to be sufficient to sample closed and open states a sufficient number of times to obtain potentials of mean force along various reaction coordinates. A Principal Component Analysis (PCA) shows that the first two principal modes capture more than one-half of the HREM generated fluctuations. The first mode corresponds to the end-to-end distance fluctuations and shows that the closed zwitterionic state is the predominant species. The second mode describes the presence of two conformations of similar end-to-end distance that differ in the values of neighboring psi and phi dihedral angles, because such psi/phi compensation can still produce the same end-to-end distance. HPPK (6-Hydroxymethyl-7,8-dihydropterin pyrophosphokinase) catalyzes the transfer of pyrophosphate from ATP to HP (6-hydroxymethyl-7,8-dihydropterin). This first reaction in the folate biosynthetic pathway is an important target for potential antimicrobial agents. The conformations of the HP binding pockets of E. coli HPPK and Y. pestis HPPK are studied by HREM methods. Root mean square fluctuation (RMSF) calculated based on backbone atoms shows that loop 2 (which is close to the HP binding pocket) of YpHPPK has a much larger flexibility than that of loop 2 of EcHPPK. By using clustering methods, we find that EcHPPK and YpHPPK have residue conformations around the HP binding pocket that are are close to but distinct from those found in the crystal structures. There are near-closed conformations that have different HP binding pocket shapes in EcHPPK and YpHPPK that have potential for discriminating among ligands. The conformational space of ATP binding to HPPK and the unbinding process of ATP from HPPK is studied using a restraint MD method and a targeted reweighting scheme (implemented in CUKMODY). ATP remains remarkably stable in its binding pocket when HPPK is driven, by using the restraint method, towards its open form. When the ATP is induced to leave its binding pocket by using a targeted reweighting method, it uses a special path that preserves the hydrogen bonds and salt bridges existing in previous stages along the path. Dedicated to my family in China and my life at MSU iv Table of Contents List ofTables .................... vii List of Figures ...................................................................................... viii Chapter 1. Introduction - -- 1 1. Molecular Dynamics Simulation ................................................................................ 1 General description ..................................................................................................... 1 Force Field .................................................................................................................. 3 Integrator ..................................................................................................................... 3 Temperature and pressure control ............................................................................... 4 Periodic Boundary Conditions (PBC) ......................................................................... 6 Calculation of long-range (non-bonding) interactions ................................................ 7 2. Methods for enhanced sampling ................................................................................. 9 General Introduction ................................................................................................... 9 Replica Exchange Methods ....................................................................................... 10 Reweighting .............................................................................................................. 1 1 3. Methods to compute the potential of mean force (PMF) .......................................... 13 4. Principle Component Analysis (PCA) ...................................................................... 16 5. Cluster Analysis ........................................................................................................ 18 6. Studied peptides and proteins ................................................................................... 18 Met-Enkephalin ......................................................................................................... 18 6-Hydroxymethyl-7,8-dihydropterin pyrophosphokinase (HPPK) .......................... 19 Chapter 2. A specific Hamiltonian Replica Exchange Method (HREM) and its validation by application to Met-Enkephalin - 22 1. Introduction ............................................................................................................... 22 2. Methodology ............................................................................................................. 24 Hamiltonian Replica Exchange Method ................................................................... 24 Molecular Dynamics Simulation settings ................................................................. 27 Convergence tests for Principal Component Analysis (PCA) .................................. 28 3. Results and Discussion ............................................................................................. 29 Diagnostics ................................................................................................................ 29 PCA trajectory diagnostics ........................................................................................ 39 l-dimensional PCA analysis ..................................................................................... 4O End-to-end (Tyrl terminal nitrogen to MetS terminal carboxylate carbon) distance PMF and its comparison with DREM results ........................................................... 43 2-dimentional PCA Analysis .................................................................................... 47 4. Concluding Remarks ................................................................................................. 53 Chapter 3 A specific Hamiltonian Replica Exchange Method for the study of EcHPPK and YpHPPK 56 1. Introduction ............................................................................................................... 56 2. Methodology ............................................................................................................. 59 Hamiltonian Replica Exchange Method ................................................................... 59 Molecular Dynamics Simulations ............................................................................. 6O Deviation and fluctuation Measures ......................................................................... 62 Clustering method and HP binding pocket profile calculation ................................. 63 3. Results and Discussion ............................................................................................. 64 HREM Replica Exchange Diagnostics ..................................................................... 64 Comparing RMSD and RMSF for HREM trajectories with those for normal MD.. 66 Binding pockets for EcHPPK and YpHPPK ............................................................ 80 Test of adding water molecules in HP binding pockets ............................................ 90 4. Concluding Remarks ................................................................................................. 95 Chapter 4. Studies of HPPK-ATP conformation space and ATP binding through using enhanced MD methods 97 1. Introduction ............................................................................................................... 97 2. Methodology ............................................................................................................. 99 Molecular Dynamics Simulations ............................................................................. 99 Restraint method ..................................................................................................... 101 Reweighting method ............................................................................................... 102 3. Results and Discussion ........................................................................................... 104 Restraint MD of HPPK-ATP .................................................................................. 104 Reweighting MD of HPPK-ATP ............................................................................ 108 4. Concluding Remarks ............................................................................................... 121 Chapter 5. Some potential useful modifications of the HREM approach ............... 124 ‘ l. HTREM (Hamiltonian Temperature REM) ............................................................ 124 2. HRTREM: (Hamiltonian Restraint TREM) ............................................................ 125 3. HTREM_RMS (HTREM_RootMeanSquare) ........................................................ 126 4. HREM_Ion .............................................................................................................. 127 5. HREM_IonWater .................................................................................................... 127 Appendix. Description of implementation of HREM and several modifications.... 129 1. HREM ..................................................................................................................... 129 Initial Loading ......................................................................................................... 129 Exchange Attempt ................................................................................................... 129 Code excerpt of the driver of HREM implementation of CUKMODY .................. 131 2. The implementation of some possibly useful modifications based on HREM ....... 138 HTREM ................................................................................................................... 138 HRTREM ................................................................................................................ 138 I-[RTREMGEN ........................................................................................................ 139 HTREM_RMS ........................................................................................................ 139 HTREM_RMS_gradual .......................................................................................... 141 HREM_Ion .............................................................................................................. 141 HREM_IonWater .................................................................................................... 142 References- - 143 vi List of Tables Table 1.1. Typical motions in proteins ............................................................................. 10 Table 2.1. The acceptance ratio for the time span of the HREM simulation (exchanges are attempted every 80fs.) ............................................................................................... 30 Table 2.2. The RMSIPs (for HREM window 1) for the three 3 ns intervals (3-6, 6-9 and 9-12 us) using the PCA first mode, first two modes and first ten modes ................. 40 Table 3.1. Acceptance ratios for the HREM simulations; (a) for EcHPPK and (b) for YpHPPK ................................................................................................................... 65 Table 3.2. The number of snapshots in the clusters constructed for EcHPPK. ................ 82 Table 3.3. The number of snapshots in the clusters constructed for YpHPPK ................. 83 Table 4.1. The atom(s) whose distances are used for the restraint simulations .............. 106 Table 4.2. The hydrogen bonds that are present in the stages of ATP separation fi'om HPPK. ..................................................................................................................... 1 16 Table 4.3. The salt bridges that are'present in the stages of ATP separation from HPPK. ................................................................................................................................. 1 18 vii List of Figures Images in this dissertation are presented in color. Figure 1.1. Illustration of Periodic Boundary Conditions (The shadowed box is the primary box and the boxes around it are its periodic images.) ................................... 7 Figure 1.2. The function of HPPK in the folate biosynthesis pathway73 .......................... 20 Figure 2.1. (a)-(e): Migration of systems into and out of a given configuration for 3-6 ns. (fl-(i): Migration of configurations (replicas) into and out of a given system for 3-6 ns (1.,- scale value). The figures from (a) to (e) correspond to systems 1 to 5 and, similarly, the figures from (t) to (j) correspond to configurations l to 5. In (a) the points are vertically connected, and the coverage demonstrates the desired itineration. Note that in view of the number of data points that are plotted, it appears as if, at a particular time, several replicas occupy the same window or several systems visit the same replica— this does not happen, looking at shorter intervals dots will be separate. ........................................................................................................ 32 Figure 2.2. (a)-(e): Migration of systems into and out of a given configuration for 6-9 ns. (D-(i): Migration of configurations (replicas) into and out of a given system for 6-9 ns (11,- scale value). The figures from (a) to (e) correspond to systems 1 to 5 and, similarly, the figures from (f) to 0) correspond to configurations l to 5. Note that in view of the number of data points that are plotted it appears as if, at a particular time, several replicas occupy the same window or several systems visit the same replica— this does not happen, looking at shorter intervals dots will be separate. .................. 34 Figure 2.3. (a)-(e): Migration of systems into and out of a given configuration for 9-12 ns. (f)-(j): Migration of configurations (replicas) into and out of a given system for 9-12 ns (2.,- scale value). The figures from (a) to (e) correspond to systems 1 to 5 and, similarly, the figures from (t) to (i) correspond to configurations 1 to 5. Note that in view of the number of data points that are plotted, it appears as if, at a particular time, several replicas occupy the same window or several systems visit the same replica— this does not happen, looking at shorter intervals dots will be separate. 36 Figure 2.4. The end-to-end distances (Tyrl terminal nitrogen to Met5 terminal carboxylate carbon) along the time line for: (a) 3 to 6 ns, (b) 6 to 9 ns, and (c) 9 to 12 us for the ’1] =1 system. ........................................................................................ 38 Figure 2.5. The PMF corresponding to the PCA mode 1 displacements for all lt=1 system snapshots for the three time intervals of 3 to 6 ns, 6 to 9 ns, and 9 to 12 ns. ........... 42 viii Figure 2.6. The PMF for the end-to-end distance generated by the HREM simulation for all systems with ll=1 for the three time intervals of 3 to 6 ns, 6 to 9 ns, and 9 to 12 ns. ................................................................................................................................... 44 Figure 2.7. The PMF for the end-to-end distance generated by the HREM simulation and DREM simulations. The PMF line for DREM is obtained by combining two separate lines for the 3-11 A and the 8-16 A simulations ......................................... 46 Figure 2.8. The 2D PMF for the first and second PCA modes for all X=l system snapshots. Backbone CA stick plots of the configurations in the dense places are shown with the distance between Tyrl backbone nitrogen and Met5 carboxyl carbon shown. ....................................................................................................................... 48 Figure 2.9. CA wire plots for the two salt bridge conformers shown in Figure 2.8, with the Gly—2 and Gly—3 backbone atoms shown explicitly that illustrates the ‘I’(2) and $0) dihedral angle compensation mechanism. ........................................................ 49 Figure 2.10. Ramachandran plots for Gly2, Gly3 and Phe4 of Met-enkephalin; The three plots in the upper row show results for snapshots that are in the PCA first mode deep well (—O.6 to —0.4 A); The lower row for all snapshots ............................................ 50 Figure 3.1. Migration Check for EcHPPK. (a)-(f): Migration of configurations (replicas) into and out of a given system (2,. scale value). (g)-(l): Migration of windows (systems) into and out of a given configuration . The figures of a-f correspond to systems 1-6 and the figures of f-j correspond to configurations 1-6. Note that in view of the number of data points that are plotted it appears as if, at a particular time, several replicas occupy the same window or several systems visit the same replica- this does not happen,—~looking at shorter intervals, the dots will be separate. .......... 67 Figure 3.2. Migration Check for YpHPPK. (a)-(f): Migration of configurations (replicas) into and out of a given system (2,- scale value). (g)-(l): Migration of windows (systems) into and out of a given configuration . The figures of a-f correspond to systems 1-6 and the figures of f-j correspond to configurations 16 (Note that in view of the number of data points that are plotted it appears as if, at a particular time, several replicas occupy the same window or several systems visit the same replica— this does not happen. Looking at shorter intervals, the dots will be separate. ......... 70 Figure 3.3. The RMSD for residues of Loop 2 and Loop 3 in EcHPPK: (a) and (b) are for RMSD based only on backbone atoms for Loop 2 and Loop 3, respectively, while (0) and (d) are based on all atoms. The RMSD based on normal MD are shown with solid lines and the RMSD based on HREM are shown with dashed lines. The RMSDs are average RMSDs calculated for all the snapshots in the two three nanosecond long trajectories based on normal MD and HREM picked when the RMSDs and RMSFs for the trajectories have been stable. ....................................... 74 Figure 3.4. The RMSD for residues of Loop 2 and Loop 3 in YpHPPK: (a) and (b) are for RMSD based only on backbone atoms for Loop 2 and Loop 3, respectively, while (c) ix and (d) are based on all atoms. The RMSD based on normal MD are shown with solid lines and the RMSD based on HREM are shown with dashed lines. The RMSDs are average RMSDs calculated for all the snapshots in the two three nanosecond long trajectories based on normal MD and HREM picked when the RMSDs and RMSFs for the trajectories have been stable. ....................................... 75 Figure 3.5. Starting crystal structures for the simulations: (a) for EcHPPK (PDB entry lqu) (b) for YpHPPK (PDB entry 2qx0) ................................................................ 77 Figure 3.6. The RMSF for residues of Loop 2 and Loop 3 in EcHPPK: (a) and (b) are for RMSD based only on backbone atoms for Loop 2 and Loop 3, respectively, while (c) and (d) are based on all atoms. The RMSF based on normal MD are shown with solid lines and the RMSF based on HREM are shown with dashed lines The RMSFs are calculated for all the snapshots in the two three nanosecond long trajectories based on normal MD and HREM picked when the RMSDs and RMSFs for the trajectories have been stable. .................................................................................... 78 Figure 3.7. The RMSF for residues of Loop 2 and Loop 3 in YpHPPK: (a) and (b) are for RMSD based only on backbone atoms for Loop 2 and Loop 3, respectively, while (c) and (d) are based on all atoms. The RMSF based on normal MD are shown with solid lines and the RMSF based on HREM are shown with dashed lines. The RMSFs are calculated for all the snapshots in the two three nanosecond long trajectories based on normal MD and HREM picked when the RMSDs and RMSFs for the trajectories have been stable. .................................................................................... 79 Figure 3.8. EcHPPK backbone atom conformations of pocket residues of the HP binding pocket of the central structures for clusters 1 and 4 of HREM snapshots: The left panel is for cluster 1 and the right for cluster 4. The central structure of each cluster is shown in green, the starting crystal structure is shown in yellow, and HP, in magenta, is placed according to the ternary complex crystal structure. ................... 84 Figure 3.9. YpHPPK backbone atom conformations of pocket residues of HP binding pocket of the central structures for cluster 1, cluster 2, cluster 5, and cluster 6 of HREM snapshots. The left panel is for cluster 1 (top) and cluster 5 (bottom) and the right panels for cluster 2 (top) and cluster 6 (bottom). The central structure of each cluster is shown in green, the starting crystal structure is shown in yellow and HP, in magenta, is placed according to the ternary complex crystal structure. ................... 85 Figure 3.10. The HP binding pockets profiles for EcHPPK. The HP binding pocket profiles are shown in bright red in the following plots. (a) shows the profile for the starting crystal structure, and (b) and (c) show profiles for the central structures of cluster 1 and cluster 4, respectively. ......................................................................... 87 Figure 3.11. The HP binding pockets profiles for YpHPPK. The HP binding pocket profiles are shown in bright red in the following plots. (a) shows the profile for the starting crystal structure, and (b), (c), (d), and (e) show profiles for central structures of cluster 1 cluster 2, cluster 5 and cluster 6, respectively. ...................................... 88 Figure 3.12. (a) The configuration of EcHPPK and four hydrogen bonded added water molecules. (b) The configuration after 1 ns normal MD simulation with the starting structure as shown in (a). The added water molecules are shown in yellow. ........... 92 Figure 3.13. (a) The configuration of EcHPPK with 20 added water molecules. (b) The configuration after 1 ns normal MD simulation with the starting structure as shown in (a). The added water molecules are shown in yellow. .......................................... 94 Figure 4.1. A schematic representation of the MD simulation protocol. The cylinder denotes residues that form a binding region for ATP with key residue side-chain orientations denoted by curved arrows. ATP is represented as a Wiggly line. The initial HPPK-ATPMgz binary complex is constructed from the ternary crystal structure. Ten consecutive restraint simulations direct HPPK-ATPMgz toward the apo crystal structure. ATPMgz remains bound throughout the process even though residues initially trapping ATPMgz gradually move away, through trapping intermediates, until reaching open trapping states. The reweight simulation then shows that ATPMgz can separate from HPPK through a series of breaking and making hydrogen bonds and salt bridges. ............................................................... 100 Figure 4.2. Superposition of representative structures from three windows along the restraint pathway from closed to open. The structure for window 1, which is quite similar to the closed form, is colored in yellow; the structure for window 5 is colored in purple; the structure for window 10 is colored using the normal convention. ATP in each window is shown in “ball and stick” mode, and residues Arg82 and Trp89 are shown in “stick” mode. It is clear that ATP remains bound to HPPK throughout the restraint simulation ............................................................................................ 107 Figure 4.3. (left panel) CPK view of the starting MD simulation structure derived from the ternary complex. (right panel) CPK view of the MD structure afier maintaining the restraints for 9 ns subsequent to the restraint opening protocol. ATP is colored purple. The yellow residues, 86-89, which cover part of ATP in the initial ternary complex, are mainly replaced by solvent interactions in the MD-opened structure. ................................................................................................................................. 109 Figure 4.4. ATPMgz for six different conformations as it separates from HPPK are shown with yellow, green, blue, purple, orange, and red indicating stages one through six. The gradual separation of ATP is evident as a combination of reorientational and translational motion. ............................................................................................... 1 12 Figure 4.5. Details of the hydrogen bond and salt bridge interactions as ATP separates from HPPK. Left panels, from top to bottom, are hydrogen bonds (dotted lines) for stages 1, 3 and 5. Right panels, the corresponding salt bridges (solid lines). See Table 4.2 for hydrogen bond and Table 4.3 for salt bridge particulars. ................. 114 xi Chapter 1. Introduction 1. Molecular Dynamics Simulation General description 3 is a form of computer simulation for studying molecular Molecular Dynamics“ motion by allowing atoms to interact under force fields, which are developed to approximate certain known physics. Normally, MD simulation can provide dynamic information of a system at atomic detail. (The method only deals with the classical Newtonian forces and does not explicitly take into account any quantum level details.) Thus, in principle, MD simulation can be used to answer any questions about a model system. In other words, in principle, not only any equilibrium property of the model system can be evaluated by calculating a trajectory time average based on the ergodic hypothesis (in practice, the MD trajectory usually only provides a sample of the ensemble) , but also the real dynamics of the system can be obtained. By examining the trajectory, not only can one calculate average system properties such as free energy but also one can learn the main functional motions of a protein. Of course, in practice, a study based on MD simulation will be limited by the time-scale affordable under current MD technique and the correctness of the force field parameters in use. The algorithms used to implement MD simulation are all based on answering a fundamental question3: Given the current configuration of the system, what is the next one? MD propagates the model system in time by answering the above question repeatedly. As the method only deals with classical dynamics, MD propagates the system following the classical equations of motion (equations 1.1 and 1.2): = _ _ . 1.1 , ari art. f, ( ) api ml where, ’1' , pi_ ft and mi are the coordinate, momenttun, force and mass of particle i, respectively, while H and V are the relevant Hamiltonian and potential. In practice for MD, the above first-order differential equations are usually solved for each particle to obtain better accuracy solutions of the equations of motion although the second-order Newton’s Equation (equation 1.3): ft = mi 'rl. (1.3) are sometimes used for implementation due to efficiency concerns. Therefore, in general, after the composition of the system (including the number of atoms and their types, masses and interaction potentials) are specified, there are three steps involved in the process used to propagate the system. First, one needs to assign initial values of positions and velocities to the particles within the model system. In terms of biological systems, simulation is usually started out with the crystal structures of the molecules of interest and, when conducting explicit solvent simulation, solvents can be added by either putting the system in a lattice box or an equilibrated box of solvent molecules. Then, the initial velocities are usually either generated randomly from a Maxwell distribution for a specified temperature3 or assigned to a special value (typically zero). Second, forces are evaluated according to the atoms’ interaction potentials used to define the system. In fact, potentials are defined through choosing developed force fields. Third, an integrator is used to predict the next configuration given the chosen time step. The second and third steps are repeated until reaching the pre-set number of integration steps. Force Field In MD, a molecule is generally described as a series of charged points (atoms) linked by springs (bonds). Therefore, the relevant potential interactions in a MD simulation can be decomposed into the following interaction terms: bonding terms for bond length vibration, bond angle vibration, and proper torsion and improper torsion angles, and non-bonded Lennard-Jones and electrostatic terms. These terms must be parameterized to obtain a force field. The parameterization is obtained by reproducing molecular geometry and selected properties of tested systems and the parameter values are chosen to fit experimental results or ab initio quantum data. By now, a substantial number of force fields have been developed, including some popular ones for biological systems such as the AMBER4, CHARMM5 and GROMOS6 force fields. Integrator Many integrators3 have been developed for MD simulation. One of the most popularly used integrators is the leapfrog algorithm which is a modified version of the standard Verlet algorithm mainly used for integrating the second order Newtonian equations. The Verlet algorithm uses the positions and accelerations at the time t and the positions at the time t - At to predict the positions at the time t + At, where At is the integration step (time step). By using a Taylor expansion, it can easily be derived that the errors in the atomic positions and velocities are of the order of At4 and Atz, respectively. To obtain more accurate velocities, the leapfrog algorithm is used, using velocities at half time step. The following two equations (equations 1.4 and 1.5), in which r represents atom positions, t represents time and At represents one time step, show the leap-frog algorithm7. . At r(t+At)=r(t)+r(t——2—)At (1.4) At f[t+£:—t)= '(t—-2—]+F(I)At (15) One thing worth noting is that the integration step size should be chosen as large as possible to make the simulation efficient but small enough to catch the fastest motion (As a rule of thumb, roughly the step size needs to be set as about 1/10 of the time scale of the fastest motion.) Normally bonds involving hydrogen atoms vibrate in the scale of 10 fs. Therefore, in order to make the MD simulation more efficient, the SHAKE8 algorithm is usually used to constrain at least hydrogen related bonds so that the proper time step size can be increased to 2 fs. Since, for the normal simulation time scale, those quick bond vibration motions will be averaged out, using the above constraints should be a good approximation. Temperature and pressure control When only solving the Newton’s equation with no temperature modification, then the total energy of the system is conserved. (This condition is ofien used as a measure to check the accuracy of the integrators used.) Therefore, without temperature modification, the MD simulation is in fact performed in the microcanonical (fixed number (N), volume (V) and energy (E)) ensemble. Although MD simulations within the microcanonical ensemble are the easiest to perform, in most situations, the canonical (fixed number (N), volume (V) and temperature (T)) or the isothermal-isobaric (fixed number (N), temperature (T) and pressure (P)) ensemble is more appropriate for normal experimental conditions. Therefore, we need methods to control temperature and pressure. A number of methods have been developed to control temperature or pressure 9. One popularly used method is the Berendsen method“); the essential idea behind the method is that, in fact, the system in simulation is not isolated but coupled to an external bath. So, Berendsen at al proposed the following equation of motion with a modification . T on velocities of atoms, F,- = 1'— + i(r7ef_ -1)i‘i , where, r is the coupling time, and T refis mi 22' the reference temperature, which is the temperature pre-chosen for the external bath. T is K , where k is the Boltzmann the instantaneous temperature, which is equal to B 2 kB(Nd) constant, Nd is the number of degrees of freedom in the system and the kinetic energy is N K = Z gmifiz. The extra term added in the above equation, in fact, serves as a frictional i = 1 force and, hence, the actual implementation of the method simply employs a feedback mechanism to control the temperature. The velocity is scaled by a factor -T 0' = J1-91_7Ji . Usually, the coupling time, t, is set to around 0.2 ps. The principle I behind controlling pressure is similar to that behind controlling temperature. The instantaneous pressure P is calculated as P =§27(K — Virial) with V the volume, K the kinetic ener and Virial the virial function3. Thus, for an isotro ic s stem, the ressure gy P y P is controlled by sealing the box side and all atoms’ positions by a l t P-Pref 3 factory = l-% P , where B is the coupling time for pressure control. The implementation of pressure control is much more complex for anisotropic box, in which case a pressure tensor has to be used”). As shown above, the implementation of Berendsen method is quite simple, but the compromise for this simplicity in implementation is that the Berendsen thermostats, in theory, do not lead to any known ensemble. To guarantee an NVT or NPT ensemble theoretically, more complicated methods such as Langevin dynamics method11 or Nose-Hoover method12 need to be introduced. Periodic Boundary Conditions (PBC) In mathematical models and computer simulations, periodic boundary conditions (PBC) are a set of boundary conditions that are often used to simulate an infinite system by using only a finite size box. Since usually only a small box of atoms (usually referenced as primary box) can be afforded in practical MD simulation, the boundary particles’ interactions are a serious concern. The PBC is introduced to help address this difficulty.3 In the PBC approximation, an infinite system is represented as a periodically repeated array of the finite small system being simulated. A schematic diagram for a two- dimensional system is shown in the following Figure 1.1. Clearly by using PBC, certain artificial periodicity (as shown in Figure 1.1, the simulated system is constructed by putting periodic images around the primary box, so there clearly are certain periodicities in the constructed system which might not exist in the real system) is imposed on the system under study. But, it can be shown that this system can still represent a real system provided that an appropriate box size is used.3 In conjunction with using PBC, the long- range interactions can be calculated approximately by using cut-off methods or, more accurately, by using an Ewald Lattice Sum.3 Details of calculation of long range interactions are given in the next paragraph. Figure 1.]. Illustration of Periodic Boundary Conditions (The shadowed box is the primary box and the boxes around it are its periodic images.) Calculation of long-range (non-bonding) interactions For most current MD simulation force fields (including the one described in more detail before), the non-bonded energy an for the molecular system can be written as a sum of pair-wise electrostatic and Lennard-J ones contributions as shown in equation 1.6: qiq Ar" B1" an= Z(_—— j .+—12——6j (1.6) 47:8087241- I“..- When conducting simulation using PBC concept, as far as the system is determined, all its images are determined. Therefore, in theory, all the long-range interactions can be calculated by the above expression. But, in practice, it can be seen that, in the first term of the above equation representing the electrostatic contribution, the order of rij, the distance between two atoms, is -1. It can be shown that a summation of inqj r the long-range interactions whose energy has the form of with p S 3 , converges very slowly.” (Of course, if p 51, the summation does not converge at all.) Probably the easiest method to solve this problem in calculation of those non-bonding interactions is to use a cut-off (truncation) method, which simply ignores all interactions beyond some chosen threshold distance (typically 10 A). But later it was found that for proper simulation of many biological systems more efficient methods, such as Ewald summation3 are needed.14 Using an Ewald summation method would allow the long range interactions to be calculated fully for a periodic system. The Ewald method is implemented by dividing the summation calculations into two parts, within the so-called reciprocal space and within the so-called real space. Then, assuming a charge neutral system, the two summations both converge fairly rapidly given the special formula developed by Ewald in 1921.3 A naive implementation of the Ewald summation method would be implemented in 0(N2), which is too expensive for many biological systems such as proteins simulated in explicit solvents. There are specialized implementations of the Ewald method, which can run as 0(N log N). A popular implementation is called the Particle Mesh Ewald (PME) method”, which makes interpolation of the reciprocal space structure factor in the lattice points using Cardinal BSpline interpolation method and utilizes the popular Fast Discrete Fourier Transform (F DFT) method. 2. Methods for enhanced sampling General Introduction Although, in theory, MD simulation can provide all structural, equilibrium and dynamic information of a system under study at atomic detail, the quality of the information obtained is limited by the quality of the force fields in use and the extent of the coverage in the sample space. In our work we have focused on the sampling problem (In other words, the force field issue, which is another important problem cumbering effective application of MD simulation, is not the focus for the work summarized in this dissertation.) The current affordable time scale for conducting MD simulation through integrating Newton’s equation of motion with a femtosecond time step is usually only tens of nanoseconds for a normal-sized protein simulated under explicit solvents unless large parallel machines are used so that hundreds of nanoseconds or a microsecond is achievable. ,Thus, using MD simulations to study conformational changes that may occur on micro second to seconds time scales is problematic.16 (Some typical motions in proteins are listed in the following Table 1.1.) Proteins usually have complex energy landscapes with energy barriers many times the thermal energy, RT (for room temperature at 300 K, RT is around 0.6 kcal/mol), while going over barriers is ’5“ MT , where k is rate exponentially hard according to the Arrhenius equation, k = Ae constant, A is a pre-factor, and Ea is the activation energy. Therefore, structures are frequently trapped in local energy minima where the thermal energy of the system is insufficient to traverse the energy barriers in normal MD simulation times. Special techniques, such as the replica exchange method (REM) and the reweighting method, have been developed to, respectively, help surmount or reduce those barriers among minima so that, by using those methods appropriately, a better sampling within the given affordable amount of time might be obtained. Table 1.1. Typical motions in proteins Protein Motion 10810 of Characteristic times (5) Local denaturation and opening of secondary 3 to -1 structure Aromatic side-chain rotation l to -5 Conformational changes and allosteric trasitions 0 to -5 Diffusion -6 to -9 Segmental motion -7 to -9 Aliphatic side-chain rotation -8 to -9 Vibrational and torsional modes ~11 to -13 Replica Exchange Methods The Replica exchange method (REM)17'24 is a popularly used method designed to address the sampling issue. The method was first developed for the simulations of spin systems 22 and now is widely used to study peptides and proteins. The first version of the REM introduced into the MD simulation field was temperature replica exchange method (TREM), and recently a Hamiltonian REM (HREM) was introduced into the field by Fukunishi et al.25 The essential algorithm for all the REM approaches contains two parts. In part one, each REM system (a REM system is characterized by a certain property, e.g., temperature for TREM or Hamiltonian for HREM) containing a certain configuration is simulated simultaneously and independently for a certain number of MD steps. In part 10 two, exchange attempts are made among systems according to a Metropolis-Hastings algorithm.9 (To improve the effectiveness, usually exchange attempts are only conducted between adjacent systems). In the TREM, the low temperature systems can borrow fast equilibration properties from high temperature systems, providing doorways for those low temperature systems to overcome energy barriers and thus improve the sampling.26 Of course, the sampling of the low temperature system will not be guaranteed to improve through exchanging with high temperature systems because, for example, the high temperature system may simulate the same sample space as the low temperature system or the high temperature system wastes most time on sampling unproductive regions in the sample space. The TREM suffers from the deficiency that the number of system copies needed is approximately proportional to the square root of the number of degrees of freedom of the system of interest.25 The HREM was introduced to address the above- mentioned difficulty with TREM.25, In HREM, the potential energy function in different Hamiltonians for the systems differ only in a limited set of the total number of degrees of freedom required to characterize a system, thereby, in principal, reducing the number of systems needed. Of course, the basic issue of implementing HREM becomes choosing which degrees of freedom to focus on. Reweighting The essential idea underlying a reweighting method27'3‘, which is a form of umbrella sampling9, is that we can use another potential to run a trajectory, and then by reweighting back we can still get the original ensemble average of any configurational property. The ensemble average of any property A(r~ ) , which is a function of the system configuration r” , can be calculated as shown in equation 1.730 11 A ~ _ ldr”A(r”)e"“"”’_1mm)ewrwte-W) < (r )> — [drive-WON) " IdrN {e+flAV(rN)}e—'6V.('N) : A (rN)e+,6AV(rN) e+flAV(rN) t+T Ns limo A(r.~(s))e*f"’l’* ()st T—-)oo to = t +T lim 0 e+flAVIriv")]ds T—)oo to (1.7) where, the potential energy V=(V+AV)—AVEV*—AV defines V* , a modified potential energy surface upon which the dynamics is carried out, <..>* denotes an average with modified ( V* ) weighting and r.” (s) (to SsStO+T) denotes the V *modified trajectory over the simulation time interval, T. The last identity indicates that, with the assumption of ergodicity, the ensemble averages are to be evaluated as time averages over the trajectories generated from the modified surface. At each step of the dynamics on the modified surface, the data can be “re-weighted” with the factor exp( ,BA V) to guarantee that the average is Boltzmann weighted. If barriers are reduced by use of the modified potential surface, the modified trajectory will explore the configuration space more rapidly. Some successful examples of application of the method including protein folding”, exploration of dihedral conformational space in the alanine dipeptide 33, and study of HIV-1 protease 34. l2 The most technical and difficult part for reweighting method is to choose V*. 32-34 Examples of VI choices from some developed reweighting methods are shown in the following equations 1.8 to 1.10. V“ =expt(—V/nf1 (1.8) if Vm > V(rN), then V* = AV(rN ) + V(rN ) with _ N 2 (1.9) AVON) = (Vthre V(r )])V a + (Vthre — V0" )) V* =1- exp[(—7V(rN ) — Vthre )] (1.10) where, V is the true potential energy; Vthre is called a threshold energy and is a constant chosen before conducting simulation; ”f in eq. 1.8 is the number of degrees of freedom of the system; a and y are constants used in equations 1,9 and 1.10, respectively, to further modify the energy landscape. Because a good choice of ere, and a and y, needs to be made based on some knowledge of the energy landscape, usually a substantial amount of trials need to be made before the final choice can be made on those values. 3. Methods to compute the potential of mean force (PMF) To find a low energy path and barriers between different states, a potential of mean force (pmf)35 is defined as: W (g) = —len p(§)+W0, where, W0 is an arbitrarily 13 chosen reference value to set a zero fiee energy; and [2(6) is the distribution fimction which is defined in following equation 1.11 as: )_ Idr3N6(§'[r3N ] - g)exp[-V(r3N )/kT] p(§ — * Idr3N exp[—V(r3N)/kT] (1.11) in which, V , the potential energy, depends on the 3N coordinates of the system, r3”; 4‘ is a reaction coordinate, which is a fimction depending on a certain number of degrees of fieedom in the system (e.g. an angle or a distance of interest). The 6(4‘ '[r3N ]-5) is a delta function used to pick out the reaction coordinate value 5 of the snapshot being checked, 2," '[r3N ]. Of course, in practice, a certain range will be used to consider 4‘ '[r3N ] being equal to 4’. Usually, there will be some barriers along the chosen reaction coordinate that are difficult to overcome by normal MD simulation. Then, some artificial potential'terms need to be added to help the sampling. One widely used scheme is the Umbrella Sampling scheme.36 For applying this method, a restraining potential UK) (sometimes called umbrella potential or window potential) is used to help improving the sampling of specific regions around 4‘, UK) can be whatever which is convenient and proper, though the most commonly used potential is the harmonic potential. Then, the chosen path along reaction coordinate will be divided into properly spacing windows to provide enough overlap in the reaction coordinate distributions for neighboring windows, and the expression of the restraint potential biased distribution for some window i can be represented as shown in equation 1.12: pm) = expert/45)»? (r) < exp(-flU.-(§)) >;‘ (1.12) 14 where, p," (9‘) is the unbiased (without the restraint potential) distribution for the 1"“ window. Substituting it into the WK) expression , the corresponding W,- can be obtained for each window, which is shown in following equation 1.13, W1=—kT1np.-”(5)+W1 —U.-(:>+/,- 1111) where, f,- is defined as fi = "kT 1n < 3XP(“13U1(§)) >1 (1.14) There have been numerous efforts spent to address problems of unbiasing and recombining the information gotten from all windows. WHAM37’38 is currently the most popular method since all traditional methods, which calculate fi by overlapping ,0? (4‘) as well as possible, only use data within overlapped area between neighboring ,0? (4‘) while wasting a substantial amount of data outside the overlapped area. WHAM instead linearly combines the ply (9’) to one unbiased distribution function: p“(5)=:p,-(:)p;‘(r). :p,(;)=1 11.15) 1.1... "1- exp<-fl[U1(é‘)-f.-]> injexp<-fl[U,-(r)—fi]) 191(5): (1.16) WHAM scheme is developed to make the statistical error minimal.39 The ft ’3 can be obtained from: 15 BXP(-flf.-)= jdrp“(:)exp<—flU.-(«:)> zidf N n1eXp(-flU.-(5)) P515) (1.17) gnjexp(—3[U.(5)—fi]) By plugging eq.1.14 into eq 1,12 and making some rearranging, we can get the following equation: p.” (4")=exp(-fl[U1(§)-fil)p1b(é‘) 1111) Using the above four equations (equations 1.15 to 1.18), the fi can be computed self-consistently (note UiK) is the added window restraint potential which is known), and thus compute the unbiased distribution function ,0“ (5) shown in eq. 1.15. And, it is clear that this method can be easily extended to higher dimensional distributions naturally. For example, for two dimensions, the only difference is that the iteration will be carried over ij instead of fi Though for practical issue, the number of data points needed for pmf construction exponentially depends upon the dimensionality. 4. Principle Component Analysis (PCA) Principle Component Analysis (PCA) is a statistical multivariate analysis method used in many fields“). The analysis method essentially transforms a number of possibly correlated variables into a smaller number of uncorrelated variables that are usually called principal components. The first principal component is constructed to account for as much of the variability in the data as possible, and then each succeeding component is 16 constructed to account for as much of the remaining variability as possible. In field of simulations of biological systems, PCA is essentially used to extract those modes with biggest fluctuations for further study.“‘15 One basic method to implement PCA is to diagonalize the covariance matrix for coordinates of atoms, Covij =< 613-5r j > , where 5r,- is defined as r,- - , with r,- = {x13 yi,zi} denoting the Cartesian components of the position of the ith atom.and <...> denoting the ensemble average. By diagonalizing, the orthonormal eigenvectors and corresponding eigenvalues of the covariance matrix can be obtained. Then the configuration point r(t) = (x1 (I), yl (t),..,zN (t)) can be decomposed as: 3N r(t)=;[r(t)omi]mi =12»,- (1)1111. (1.19) where the m,- are the orthonormal eigenvectors of the covariance matrix Covij , and the p, (t) are the corresponding PCA mode displacements. Typically the first few “important” modes have much larger eigenvalues than the remaining modes. (Normally the number of remaining modes is much larger than that of the first few important modes). The first few modes are considered to be important since they often describe large scale motions such as conformational changes while the remaining modes with small eigenvalues usually correspond mostly to small Gaussian fluctuations. l7 5. Cluster Analysis Cluster analysis or clustering is a common technique for statistical data analysis.46 In essence, clustering is the assignment of objects into groups (referenced as clusters) so that objects from the same cluster are more similar to each other than objects fiom different clusters. Often this similarity is assessed according to a distance measure. One widely used clustering method in analysis of simulated trajectories for bio-systems is the “g_cluster” method“, which uses the root mean square distance (RMSD) as its similarity measure. For each input structure, the method counts the number of neighbors within an RMSD cut-off, takes the structure with the largest number of neighbors, along with all its neighbors, as a cluster, and eliminates all the structures within this cluster from the pool of structures. The above steps are repeated for the remaining structures in the pool until 8 no structures are left. There are other clustering methods such as k-means4 or hierarchical clustering.49 6. Studied peptides and proteins Met-Enkephalin Met-enkephalin is an endogenous opioid peptide neurotransmitter found in the central nervous system and gastrointestinal tract where they bind to opiate receptorsso’5 l The pentapeptide (tyr-gly-gly-phe-met) is one of the two forms of known enkephalins and the other is leu-enkephalin (tyr-gly-gly—phe-leu). The opioid peptide neurotransmitters are shown to play important roles in pain mediation, opiate dependence, and euphoria.52’53 After being identified in 1975, a substantial amount of studies have been performed to investigate the enkephalins. These studies include experimental 18 studies such as nuclear magnetic resonance (NMR)53'55, infrared GR)“, ultraviolet (UV )57’58, and circular dichroism (CD)59’6°, and numerous computations.'9’32’61459 Almost all studies show that the enkephalins exhibit great conformational plasticity, and possibly because of the flexibility of the peptides, the exact dominant conformations of the peptides are still unknown (conflicting evidences for structures are obtained.) However, as will be discussed in more detail in Chapter 2, it seems to be hard to get good samples of met-enkephalin or leu-enkephalin with normal MD simulation. So some part of the work summarized in this dissertation is focused on using a special HREM approach to help improve sampling of met-enkephalin. The simulation work of met-enkephalin is being conducted in explicit water, because although the receptors of enkephalins are usually membrane proteins, the regions of those membrane bound receptors to which enkephalins bind are considered to probably be in the aqueous parts”. 6-HydroxymethyI-7,8-dihydropterin pyrophosphokinase (HPPK) Folate derivatives are essential cofactors for life; mammals, including humans, obtain folates fi'om their environment, while most microorganisms must synthesize folates de novo via the folate biosynthetic pathway. Therefore, biosynthetic pathway of folates is an important target for the development of antibiotics to treat infections from microorganisms. Due to rapidly increasing antibiotic resistance in recent years which has rendered the current antibiotics ineffective for treating many microbial infections, resulting in a worldwide health care crisis, new antimicrobial agents are urgently needed to control the infections that are resistant to the current antibiotic drugs.7l 6—Hydroxymethyl-7,8-dihydropterin pyrophosphokinase (HPPK) catalyzes the transfer of pyrophosphate from ATP to 6-hydroxymethyl- 7,8-dihydropterin (HP)72’73 and 19 is a new target in the folate biosynthesis pathway for developing antimicrobial agents. Its function mechanism is shown in the following figure73 : Figure 1.2. The function of HPPK in the folate biosynthesis pathway73 0 Hu’ufijflou 1.11/b. 1 ATP HPPK Mg” i) i N :o—ii—o—ii-o’ 43: *1 Ti «1- 1 A number of studies have been conducted on HPPK72'82, most of the studies focus on E. coli. HPPK.72’74'79’8"82 The crystal structure of apo E.coli HPPK 74 reveals a three- layered a—B-a fold formed by six B-strands and four a-helices. The fold of the HPPK molecule creates a valley that is approximately 26 A long, 10 A wide, and 10 A deep. Three flexible loops, [32-133 (loop2), [31-011 (loopl), and 012-34 (loop3), form one wall of the valley. The other wall of the valley is relatively rigid and is constructed by the part of the protein’s hydrophobic core. The crystal structure of the ternary complex E.coli HPPK 20 with AMPCPP (a non-reactive analog of ATP) and two associated Mg2+ ions and with HP shows significant conformational changes relative to the apo structure concentrated in the flexible loops75 that serve to sequester the ligands in a catalytically competent form. In the ternary form”, HP is sandwiched between two aromatic rings of Phe123 and Tyr53 and forms six hydrogen bonds with residues Thr42, Pro43, Leu45, and Asn55. Twelve residues are involved in binding AMPCPP, Gln74, Glu77, Arg84, Arg88, Try89, Arg92, Ile98, Arg110, Thr112, HisllS, Tyr116, and Arg121, of which Glu77, Arg92, HisllS, and Arg121 are conserved". The crystal structure data on the binary complex of E. coli HPPK with AMPCPP show that this binary complex is trapped in a “super open” conformation where one of the loops (loop 3) is in an extended conformation.76 NMR studies of HPPK with a related non-interactive ATP analog, AMPPCP present on average a different but also more open (than apo) conformation.79 Actually, examination of the ensemble of structures reveals that some are in conformations that approach the ternary closed conformation. The NMR results suggest that the binary complex of HPPK with ATP most likely is labile, sampling both open and closed-like conformations. In part of the work summarized in this dissertation, a special HREM approach is applied to study the HP binding pockets of E. coli HPPK and Y. pestis HPPK in the hope of discovering different important HP [binding pocket conformations of the proteins, which might be used to develop new narrow band antibiotics. 21 Chapter 2. A specific Hamiltonian Replica Exchange Method (HREM) and its validation by application to Met-Enkephalin 1. Introduction In the work covered in this chapter, the conformational space of Met-enkephalin is explored with the use of the Hamiltonian Replica Exchange Method (HREM) applied in explicit solvent MD simulations. As mentioned in Chapter 1, Met-enkephalin has been 53-56,58,60 and shown to exhibit great conformational plasticity by experiments computation.'9’3 2’61'69 Due to this plasticity and its practical importance as an important opioid peptide, Met-enkephalin is popularly used to test methods that can enhance the rate of conformational sampling. Of particular interest is the zwitterionic form (protonated N-terminus and ionized C-terminus), which should predominate in polar media such as water. The competition between “closed” forms, where the N and C termini are salt-bridged, and charge solvated “open” forms in which the terminal peptide charges interact with solvent dipoles may lead to reasonable co-existence of closed and open forms. A number of MD-based simulations of Met-enkephalin and the closely related Leu-enkephalin (both of them are enkephalins mentioned in Chapter 1) have been carried out. Berendsen et a1.63 noted that the zwitterionic form of Leu-enkephalin was quite labile but only sampled folded, closed forms during their simulation. Smith et al.64 found at neutral pH their simulated zwitterionic form of Leu-enkephalin had a roughly equal mixture of close and open states. They also found that starting from an open form the peptide rapidly closed, but eventually re-opened on their 10 ns time scale (only once during the entire simulation). Nielsen et a1.65 simulated Met- and Leu-enkephalin in zwitterionic forms and found a rapid one-way transition from open to stable, closed 22 conformers. Freed et al.(’6 compared the results of explicit and implicit solvent simulations and found mostly compact with some open conformations. Karvounis et al. 68 simulated the zwitterionic form of Leu-enkephalin in explicit solvent initiating a number of trajectories from open forms and found a variety of behaviors including a persistent salt-bridge form. Garcra et a1.69 simulated neutral Met-enkephalin in explicit solvent using the Temperature Replica Exchange Method (TREM) with 17 replicas (we call replicas “systems” in this dissertation) used. Garcia’s study demonstrated the enhanced sampling capability of a REM relative to conventional MD, showing that it could surmount the barriers separating non-helical from helical conformations in the simulation interval. One thing noticeable of the above mentioned simulation work is that, although the enkephalins are considered to be highly flexible according to experimental evidence (details are in Chapter 1), for the zwitterionic forms of enkephalins, which should be the predominant forms in water, no study has achieved a decent number of transitions between closed and open forms during their simulation time scales. Assuming the co- existence of the open and closed forms, then, to make meaningfirl statistical inferences of the enkephalins (required for the evaluation of potentials of mean force and equilibrium constants) in their conformational spaces not only requires sampling both forms but also requires achieving a large number of transitions between the open and closed forms during simulation. The co-existence is highly probable and supported by above- mentioned simulation results and experimental studies listed in Chapter 1. Therefore, in the work of this chapter, a special version of Hamiltonian Replica Exchange Method (HREM) was developed and applied to address the aforementioned sampling issue for Met-enkephalin. 23 2. Methodology Hamiltonian Replica Exchange Method As described in Chapter 1, the REM concept was first introduced as Temperature Replica Exchange Method (TREM), which suffers from the deficiency that the number of system copies needed could be very large for large system”. To solve ths problem, the REM concept was generalized to a Hamiltonian Replica Exchange Method25 (HREM) where the systems differ by their Hamiltonians (in practice, systems usually differ only in their potential energy functions and their kinetic energy parts are left intact). As a matter of terminology, we shall refer to these different Hamiltonians as systems (versus replicas), since replica connotes a copy of an item. In fact, for HREM, it is more natural to use this terminology and the term replicas will be reserved for the configurations that are present on any particular MD step. In CUKMODY83’84, the MD program deve10ped in our group and the simulation program used for all the work in this dissertation, a given configuration (replica) is maintained on a particular computer node and the systems (with different potential fimctions) move onto and out of that node. The Hamiltonian for the i111 system within the extended HREM ensemble can be represented as, Hi(X,P) =T(P)+Vi(X) where T(P) is the kinetic energy and VI(X) is the potential energy function for the ith system with phase space coordinates X, P. Exchange attempts are made regularly at certain predetermined MD steps and, for the HREM implementation in this work, the attempt was made every 40 steps. Between exchange attempts, normal MD is run for each system. The exchanges may be thought of either as configuration exchanges or potential energy firnction exchanges, which will be scale-of- 24 interaction in this HREM implementation. Computationally, it is much more efficient that only a scale factor for the potential energy function needs to be exchanged, versus exchanging configurations. Exchanges are attempted only between neighboring systems, because for the method to be efficient the overlap between the systems’ probability distributions needs to be adequate. (Efficiency can be roughly measured by the exchange acceptance probability, which is the fraction of successful exchange attempts, and is a compromise between the speed of motion and step size through the configuration space.) When system interchanges are attempted, detailed balance equations for pairs of neighboring systems 101,11. x:x> 1(X)P1-(X’) = am —> x, x91:- (xv;- (X) 12.11 are enforced. Here, a (X, X' —> X', X)is the acceptance probability (transition probability) that configuration X in the ith system and X' in the jth system before exchange results in configuration X' in the ith system and X in the jth system after exchange, and R (X) is the Boltzmann distribution at temperature T =1/kBfl for the ith system. The Metropolis rule for exchange between two systems, a (X, X' _) X', X) = min (I, e—A(X,X'—>X’,X)) (2.2) where A(X’X""x"x)=fl[(Vi(X')—Vj(X'))+(Vj(X)—Vz‘(x))] 12.31 is used to impose the detailed balance equations to guarantee that Boltzmann equilibrium in the extended ensemble of the product of all the systems’ ensembles will result for a 25 sufficiently long traj ectory.25 If the potential functions differ by a restricted set of degrees of freedom, only those will contribute to equation 2.3. In the HREM implementation for the Met-enkephalin study, the potential energy is parameterized as V (X) = 2.2V”, (xP,xP)+/1iVPS (xP,xS)+ VSS (xS,xS), (2.4) l l where the terms in equation 2.4 denote peptide-peptide, peptide-solvent and solvent- solvent interactions, respectively, and x1, is a scaling factor for the Lennard-Jones and electrostatic nonbonded interactions. In explicit solvent simulations, the number of degrees of freedom is dominated by the solvent. Thus, the indicated scaling is much reduced relative to the TREM where the global scaling ,BVi (X) = flAiV(X) would be used. The form of scaling in equation 2.4 is a requirement for the use of an Ewald method”, where the evaluation of energy and force in the reciprocal space is based on a structure factor (a sum over atoms), and consequently necessitates assignment of the scale factor to atoms, versus to atom-atom interactions. The electrostatic and Lennard- J ones interactions are uniformly scaled; therefore, as ,1, decreases, both softer Lennard- Jones and reduced electrostatic interactions are obtained, permitting sampling enhancement. One thing noticeable for REM schemes is the number of steps between two adjacent exchange attempts. Because by imposing the correct detailed balance equations, at equilibrium, all the systems in the extended ensemble will have corresponding Boltzmann distribution. The Boltzmann distributions are guaranteed when the whole 26 extended ensemble is at equilibrium, based only on the following two assumptions. Assumption one, the distributions of the systems within the extended ensemble are independent. Assumption two, the moving process of the whole extended ensemble is a Markov Chain process with only one equilibrium state. Therefore, for different choices of the number of steps between two adjacent exchange attempts, as long as the corresponding extended ensemble is at equilibrium (in practice, this “at equilibrium” is checked by checking whether the trajectories have converged), the systems should sample canonical ensembles. Of course, the optimal choices of the number of steps between two adjacent exchange attempts will be the one that make the trajectories converge most rapidly, The normal potential energy function is being used in system 1 (2, =1), so that the trajectory associated with it samples the normal canonical ensemble. Therefore, all our analysis in the later Results and Discussion section was based on trajectories for system 1, and hence for the normal canonical ensemble. Molecular Dynamics Simulation settings The CUKMODY protein molecular dynamics code, which uses the GROMOS966 force field, was modified to incorporate the HREM, based on the previous DREM83 code. The systems were run independently on different nodes of a Linux cluster computer and, when exchanges were attempted, information was passed between different computer nodes using technique conforming to the Message Passing Interface (MPI) standards. SHAKE8 was used to constrain bond distances enabling a 2 fs time step and temperature was globally controlled using a Berendsen thermostat10 with relaxation time of 0.2 ps. For the evaluation of the electrostatic and the attractive part of the Lennard-J ones 27 energies and forces, the PME method15 was applied with a direct-space cutoff of 8.52 A, an Ewald coefficient of 0.45, and a 30x 30x 30 A3 reciprocal space grid. All simulations were carried out in a box with 30.0 A sides, having 864 waters initially. The starting Met-enkephalin configuration was obtained from an NMR ensemble (pdb 1PLW).55 In this configuration, the end-to-end distance (nitrogen of the N- terminus to carboxylate carbon of the C—terminus) is 10.5 A. This distance in the ensemble of 80 lowest energy structures is ~10-11 A. The peptide was immersed in the water box and 51 overlapping waters were removed. To apply the HREM, five systems were used with the scale factors set to ’11 =1, 1?? =0.925, A3 =O.85, 2,, =0.775, 25 =0.7. All five systems were started from the same initial configuration and the first 3 ns are considered as the equilibration time. Exchanges were attempted every 40 steps. For an odd number 2n+1 of systems, exchange attempts are alternated among 1°2,...,2n-1°2n and 2°3,. ..,2n°2n+1. Convergence tests for Principal Component Analysis (PCA) The Principal. Component Analysis (PCA) method described in Chapter 1 was extensively used in the studies of the simulated trajectories of Met-enkephalin in this work. The PCA analysis was carried out by using ANALYZER”, a program written for the purpose of analyzing trajectory data by a wide variety of methods. One common issue related to using the PCA method is that the slow relaxation of the large fluctuations may prevent the fast convergence of the covariance matrix“, which results in slow convergence of the essential space spanned by those large fluctuation modes. So, before using trajectories generated from PCA, it will be better to first test the convergence in the essential subspace. (These tests can also be used as a severe tests of 28 simulation convergence because the larger PCA eigenvalues correspond to the slower motions of the subject peptide or protein.) Several convergence tests have been proposed.“87 Amadei and co-workers86 introduced a root mean square inner product (RMSIP) measure 1/2 1 n n ' RMSIP= ;Zka(t)-m1(t) (2.5) k=1i=l that evaluates the overlap of a subset of n m k (t) modes obtained from different time ' intervals of the total trajectory. Here, we take t and t’ to be two disjoint time intervals of the trajectory and use the resulting RMSIPs to monitor the stability of these modes. 3. Results and Discussion Diagnostics As discussed before, the HREM has the virtue of requiring only a limited number of systems while, if implemented appropriately, still providing robust sampling. In explicit solvent simulations, not scaling the solvent-solvent interactions should provide a substantial reduction in the ntunber of systems required relative to the TREM that scales all the degrees of freedom. As in all REM versions, the choice and optimization of the acceptance probability of attempted exchanges in the HREM is a central issue. There should be an optimal acceptance probability, because for low exchange probability the rate of movement through configuration space is small, while for high exchange probability the movement through configuration space is slow. Table 2.1 lists the 29 acceptance ratios for the HREM simulation (the ratios are shown for three 3-nanosecond time intervals, 3 to 6 ns, 6 to 9 ns and 9 to 12 ns, the first three nanoseconds are considered as equilibrium time); they are all around 0.43. Predescu and co-workers9 analyzed the optimization of the TREM acceptance ratio for a multi-dimensional oscillator system, and found that 0.3874 is the optimal acceptance ratio, with the efficiency falling off slowly around this value (acceptance probabilities in the 7—82% range provide rates sufficiently close to optimal sampling rates). The uniformity of the acceptance probability values observed indicates that the choice of the number and spacing (the specific 2, values) are appropriate. Table 2.1. The acceptance ratio for the time span of the HREM simulation (exchanges are attempted every 80fs.) Acceptance Acceptance Acceptance Potential ’1 ,1 Ratio for Ratio for Ratio for Index a b 3—6ns 6—9ns 9—12ns 1<--->2 1.0 0.925 0.424 0.438 0.420 2<--->3 0.925 0.85 0.451 0.409 0.463 3<--->4 0.85 0.775 0.457 0.433 0.458 4<--->5 0.775 0.7 0.431 0.408 0.423 To examine whether all the configurations (replicas) can visit a particular system (with a particular A, value) and whether given configurations are visited by all the systems, time trajectories for three 3-nanosecond intervals are displayed in Figures 2.1 to 30 2.3. From the plots, it is clear that all the configurations (replicas) can be visited by all the systems (a-e) and, conversely, all the systems can be visited by all the configurations (replicas) (f-j). In Figure 2.1 (a) the points are vertically connected to show that the plane is covered, demonstrating the desired itineration. In all the other plots, simple points are inserted to indicate occupancy. On this scale, all the plots look uniform in time and similar, which supports the desired feature that the systems undergo a random walk in the whole exchange range. Examined at higher resolution (a shorter time interval) plot 2.1 (a), for example, does show that replica 1 is slightly favored by lower-numbered systems, as actually can be inferred from the white space in the plot. The important migration aspect has been verified, as shown in the above, to make sure that that the exchanges of systems are sufficient. But, whether these exchanges are efficient and effective, in other words, whether these exchanges induce sufficient transitions of states for the normal system (i=1) is the next important thing that needs to be investigated. In Figure 2.4, fiom (a) to (c) the end-to-end (Tyrl terminal nitrogen to Met5 terminal carboxylate carbon) distances are plotted along the time line for the 3 to 6, 6 to 9, and 9 to 12 three 3—nanosecond intervals respectively. From the figure, it is clear that, for each time interval, there are numerous transitions between open and closed states. This observation of sufficient sampling of not only both open and closed states but also transitions is a necessary condition for the following analyses, since statistical estimates are not meaningful unless they are based on enough collected sample data points. In fact, a 18 ns normal MD simulation was conducted using the same starting structure (an open form of Met-enkephalin), but the sampling was always trapped in the open form for the whole simulation period. 31 Figure 2.1. (a)-(e): Migration of systems into and out of a given configuration for 3-6 ns. (f)-(j): Migration of configurations (replicas) into and out of a given system for 3-6 ns (,1,- scale value). The figures from (a) to (e) correspond to systems 1 to 5 and, similarly, the figures from (f) to (j) correspond to configurations l to 5. In (a) the points are vertically connected, and the coverage demonstrates the desired itineration. Note that in view of the number of data points that are plotted, it appears as if, at a particular time, several replicas occupy the same window or several systems visit the same replica— this does not happen, looking at shorter intervals dots will be separate. (a)5 (b) X X {‘3’ 4 8 s 3 5 $3 § 8 2 8 2 [E II 1 1 0:0 015 110 115 210 2:5 310 0.0 0.5 1.0 1.5 2.0 2.5 3.0 Time (ns) Time (ns) (C) (d) 315 5 ‘O 'O s 5 § § 8 2 8 2 a: o: 1 1 0.0 0.5 1.0 1.5 2.0 2.5 3.0 0.0 0.5 1.0 1.5 2.0 2.5 3.0 Time (ns) Time (ns) _ 32 Figure 2.1 A (D v (TI (continued) h L (JO N Replica Index 1—L '''''''''''' 000510152025 30 Time (ns) v Window Index 6 NOD->01 A_ u—L ''''''''''' 0.0 0.5 1.0 1.5 2.0 2.5 3.0 Time (ns) V Vifindow Index :2. o -.*.'°-°’.“.U‘. Time (ns) 0051101115 00255310 33 Window Index 3 A 3' v 01 Window Index o—‘AN‘OJL-h (111‘ IIIIIIIIIIII .0 0.5 1.0 1.5 2.0 2.5 3.0 Time (ns) vvvvvvvvvvvv 0.0 0.5 1.0 1.5 2.0 2.5 3.0 Time (ns) A; Window Index 9 o 4‘10 w‘h‘UlJ """"""" 0051.0 15 2025 30 Time(ns) Figure 2.2. (a)-(e): Migration of systems into and out of a given configuration for 6-9 ns. (f)-(j): Migration of configurations (replicas) into and out of a given system for 6-9 ns (2.,- scale value). The figures from (a) to (e) correspond to systems 1 to 5 and, similarly, the figures from (f) to (j) correspond to configurations l to 5. Note that in view of the number of data points that are plotted it appears as if, at a particular time, several replicas occupy the same window or several systems visit the same replica— this does not happen, looking at shorter intervals dots will be separate. A m v (’1 i A U- V (II # A A # A 4A Replica Ind x 00 Replica Index 0.) N N 1.__ ... -.-- __..... 1 0.0051101520215310 0.0'015'110'115'210'215'310 Time (ns) Time (ns) (c)5 (d)5 x ’ x ‘ U 4 g 4 5 3‘ E 3i 5 1 5 1 a 2 a 2 CC ‘ or ‘ 1 1 0.0051011521025310 0.0051011520215310 Time (ns) Time (ns) 34 Figure 2.2 (continued) (6)5. (f) 5‘ x ‘ >< ‘ 4 a» 4 E 3- E 31 S . 5 1 g- 2-—--- ———- —--—— 1:5” 2 m 1 1 1 1.__.. ..- -.-..___..._.. 0.00.5101152025310 0.0015"110152102530 Time (ns) Time (ns) (9) 5.———.. ———- ...-__ (h) 5 x l x ‘ g 4 g 4 s 1 5 - 3 3 1% a 1 E 2 E 2 3 1 g 1 0.0 ' 015 ' 110 ' 115 ' 210 '215250 0.0'015'17171520215310 Time (ns) Time (ns) (0 5 (I) 5 X i X g 4 g 4 _=. - s - 3 3 5 . g . E 2 3:3 2 3 1 3 1 _ 0.0051101320215310 0.0051101152035310 Time (ns) Time (ns) 35 Figure 2.3. (a)-(e): Migration of systems into and out of a given configuration for 9-12 ns. (f)-(j): Migration of configurations (replicas) into and out of a given system for 9-12 ns ( ,1,- scale value). The figures from (a) to (e) correspond to systems 1 to 5 and, similarly, the figures from (f) to (i) correspond to configurations l to 5. Note that in view of the number of data points that are plotted, it appears as if, at a particular time, several replicas occupy the same window or several systems visit the same replica— this does not happen, looking at shorter intervals dots will be separate. A m v -h 01 i i A C- v C11 Replica Index . A w . 1 . Replica Index (JO A # A N N 1 1 0.0' 015 ' 10' 1152102152320 0.0'0f5'11015210257110 Time (ns) Time (ns) A O V #01 A O. v 01 Replica Index 1 . w I 1 Replica Ind x (A) A $ A N N 4L 1 1 0.0015110 ' 115210215310 0.02 015'120'115'210'215'310 Time (ns) Time (ns) 36 Figure 2.3 (continued) A (D v 01 A f) 5 x ‘ >< ‘ 11> 4 a) 4 2 1 :c’ - .3 3. g 31 a 2 -—-—-— 1: 2 CD ._ m 1 1 3 1i 0.0015101152025310 0.0051103152025370 Time (ns) Time (ns) (9) 5' -—---— 005 X ‘ >< ‘ g 4 g 4 s - _=. 2 3 3 5 . 1% . 1:5” 2 E 2 1 3 1 0.0051101152021530 0.0'015'1T0'115'2i0'2i5'310 Time (ns) Time (ns) (0 5i <1) 5‘ X X g 4 g 4 E ‘ 5 1 3 3 f; 1 i2 « g 2 5 2 1‘-—- 3 1._._ 0.0'015'{01152102152310 0.0'015'11015210232320 Time (ns) Time (ns) 37 Figure 2.4. The end-to-end distances (Tyrl terminal nitrogen to Met5 terminal carboxylate carbon) along the time line for: (a) 3 to 6 ns, (b) 6 to 9 ns, and (c) 9 to 12 ns =1 system. for the 41 00051.0 15 2:0 ' 2T5 ' 3:0 I 0.0 of5 1:0 13520 25 3:0 ‘I 95 8:920 Eu 9 Eu Time (ns) Time (ns) I I I T I I I 000.5 1.0 1.52025 3.0' 4 q _ _ _ 4 2 0 8 6 4 2 1 Time (ns) 38 PCA trajectory diagnostics As discussed in Chapter 1, Principal Component Analysis (PCA) is a highly usefirl and popular tool for the systematic investigation of the conformational sampling of a peptide""’88 since it can reduce the high dimensional configuration space to an essential subspace that contains most of the significant, large-scale motions. Only window 1 in our HREM simulation ( )1 =1) corresponds to the normal force field, and all of the following analyses are based on window 1 for this reason. The first 3 nanoseconds of simulation are considered as an equilibration period, and the PCA trajectories are constructed based on the backbone atoms. Over the trajectories, the first mode represents around 40% of the total backbone variance (the total MSF), the first two modes together around 55%, and the first ten modes together around 90%, showing that the HREM simulation of met- Enkephalin does provide a division into essential and remaining subspaces. Because of the possible convergence problem mentioned earlier in this chapter, before diving into any detailed examination of the trajectories using PCA, we first test the convergence in the essential subspace with the RMSIP86 method (eq 2.5), examining convergence by comparing the overlap of modes constructed from different time intervals. We compare RMSIPs among three time intervals, 3-6 ns, 6-9 ns, and 9-12 ns, for the system 1 trajectories (211 =1) for the first mode, the sum of the first two modes (these two modes are the only modes which will be used for the following analysis), and the sum of the first ten modes. The RMSIP results, which are a direct measure of the projection of the basis of one subspace (for one time interval) onto the other subspace (another time interval), are listed in Table 2. All the RMSIP values are close to their limiting values of 1.0. The mode one result indicates that the three PCA first mode vectors constructed from the 39 three time intervals are essentially the same. That result suggests that there might be a reaction coordinate correlated with the PCA first (slowest) mode, and that 3-nanosecond intervals are long enough to capture this slowest motion. The RMSIP values for the first two and ten modes show a similarly good convergence. Thus, we may be confident that the HREM simulation leads to stable results over the 9-nanosecond interval (3-12 ns) and, that, 3-nanosecond intervals are sufficiently long to capture the fluctuations of Met- enkephalin. Table 2.2. The RMSIPs (for HREM window 1) for the three 3 ns intervals (3-6, 6-9 and 9-12 ns) using the PCA first mode, first two modes and first ten modes 3-6 vs. 6-9 ns 3-6 vs. 9-12 ns 6-9 vs. 9-12ns First Mode 0.972 0.991 0.980 First Two Modes 0.918 0.959 0.909 First Ten Modes 0.926 0.988 0.914 l-dimensional PCA analysis The observation that the first PCA mode captures about 40% of the total MSF combined with the observation of high convergence of this mode shown by the RMSIP test suggests its potential appropriateness to be used as a reaction coordinate. And, it should be more objective in character than that based on, for example, an end-to-end distance which could otherwise be a natural choice. The first mode displacement 40 trajectory, pl (I), defined in eq 1.19 exhibits numerous transitions between its extreme values, which shows that a broad range of distances is being sampled repeatedly during the simulation time. Thus, it is legitimate to construct a potential of mean force (PMF). The PMF, constructed by making a histogram of pl (t) for the three time intervals (for each 3ns interval, 15000 snapshots are used) of 3 to 6 ns, 6 to 9 ns, and 9 to 12 ns are displayed in Figure 2.5. The convergence among the three PMF profiles is as good as anticpated from the RMSIP test results. Along each PMF profile, there is a well defined well, around 1 A, and a broad plateau region from 5 to 18 A. The difference between the lowest points of the two wells is about 2.0 kcal/mol, and the barrier between them is around 3 kcal/mol. The barrier is not high, suggesting that switching between these two states should not be difficult. Although the deep well is about 2 kcal/mol lower then the broad well, it is much narrower, suggesting that there could be an energy/entropy compensation trade-off between the broad and deep wells. These features support the observations of the co-existence of different conformations of Met-enkephalin in water and of a lack of distinguishable secondary structure. The type of atom displacements that correspond to the first PCA mode can be inferred by calculating the correlation coefficient of the end-to-end distance for the first mode (obtained from eq 2.6 with i=1) and for the true trajectory. The correlation coefficients calculated are 0.968, 0.955 and 0.980 for the three 3-nanosecond time intervals respectively, suggesting that mode one is essentially reporting on the end-to-end distance fluctuations. The PCA has thus succeeded in singling out the principal collective motion that spans the open to closed conformations. 41 Figure 2.5. The PMF corresponding to the PCA mode 1 displacements for all k=l system snapshots for the three time intervals of 3 to 6 ns, 6 to 9 ns, and 9 to 12 ns. PMF (kcaI/mol) solid 3 - 6 ns dash 6 - 9 ns dot 9 - 12 ns 3 5 I 3 II vii” t: 1' :‘1 l‘ i " | ': :l 1' A ' 1 ' I . 1: , 11.41.111.11) I . ’1';-;.'u' "\rllr’ I‘ . ’ ’1'"! -.‘ ' V ’I -. i i l '. ‘1 I 1 "1‘, 5":."‘"w'|i-‘.. I“ . I :' .‘“‘ ' 0",; I" . . .‘J . ‘1 ~ I 5 ' 1'0 1 1'5 PCA mode1 (A) 42 20 End-to-end (Tyrl terminal nitrogen to Met5 terminal carboxylate carbon) distance PMF and its comparison with DREM results As noted above, there exists very high correlation between the end—to-end (Tyrl terminal nitrogen to Met5 terminal carboxylate carbon) distance fluctuations for the complete trajectory and that found from the first PCA mode, making the end-to-end distance seemingly to be an interesting reaction coordinate. In addition, the consideration that the zwitterionic form of Met-enkephalin should be capable of forming a salt bridge between the terminal Tyrl amide and Met5 carboxyl groups also makes the end-to-end distance to be a natural choice for a reaction coordinate. Therefore, the 1-dimensional PMF along this end-to-end distance coordinate was calculated from the trajectory of the HREM normal system (system 1) for the three time intervals, 3-6 ns, 6-9 ns, and 9-12 ms respectively and the results were displayed in Figure 2.6. The agreement among the three PMF profiles is farily good with the exception of the region expanded from 9 to 14 A in which there is a discrepancy about 1 kcal/mol. The poorer convergence in this region is most likely due to the extensive configuration space for these less constrained regions (this point is discussed more in detail in the next part for 2-dimentional PCA results). When the end-to-end distance is chosen as the reaction coordinate, an umbrella sampling based technique could be implemented to focus on such a natural reaction coordinate. Usually, an umbrella sampling based technique, if possible to be implemented, should be able to provide a better PMF estimation along a particularly chosen reaction coordinate than would a HREM simulation, due to the former’s focus on a specified reaction coordinate. Of course, for many problems, it is hard to determine a proper 43 Figure 2.6. The PMF for the end-to-end distance generated by the HREM simulation for all systems with lt=l for the three time intervals of 3 to 6 ns, 6 to 9 ns, and 9 to 12 ns. PMF(kcaI/mol) solid 3-6ns dash 6-9ns dot 9-12ns 3'1'0'1'2'1'4'1'6 end to and distance (A) 44 reaction coordinate beforehand, and some chosen reaction coordinates are hard to implement. For example, as for this Met-enkaphalin study, the end-to-end distance was chosen mostly because it was observed to have extremely high correlation with the PCA first mode and it would be very hard, if at all possible, to use a PCA mode as reaction coordinate for implementing an umbrella sampling like techniques. The umbrella sampling based technique chosen here is the Distance Replica Exchange Method (DREM)89 , which can also be considered as a special type of HREM method. The Hamiltonian for the ith system of the DREM approach has the form: H i (X ) = HO(X) + Wi(r) , where, Wi(r) = gki (r — roi)2 is the standard harmonic window potential, r is the chosen reaction coordinate distance and, r6 is the equilibrium distance for the system’s harmonic constraints used along with the force constant k,-. The main difference between DREM and a regular umbrella sampling method is that, for DREM, like all other replica exchange methods, after certain steps (100 steps for this work), exchange attempts will be made between neighboring systems for their window potentials. It was shown89 that DREM trajectories have much better convergence than for trajectories generated with normal umbrella sampling. Of course, since DREM biases the true Hamiltonian, as does the regular umbrella sampling method, some un-biasing procedure is needed to correct the corresponding probability distribution. The Weighted Histogram Analysis Method (WHAM)37’39 was used to do the un-biasing and to combine the trajectory data obtained fiom the different systems. Comparing the DREM PMF with the corresponding HREM PMF (constructed by combining 3ns time interval trajectories to form a long time trajectory from 3 to 21ns to get a smoother estimation in the broad 8 45 Figure 2.7. The PMF for the end-to-end distance generated by the HREM simulation and DREM simulations. The PMF line for DREM is obtained by combining two separate lines for the 3-11 A and the 8-16 A simulations. 7; SOIICI HREM 1 dash DREM ' PMF (kcaI/mol) ‘r’ -1 I ' I ' I ' I ' I ' I ' I ' I ' I 2 4 6 8 10 12 14 16 18 end to and distance (A) 46 to 16 A region) in Figure 2.7, it is clear that the patterns are very similar, but the DREM PMF simulation is shifted down around 1 kcal/mol in the region 7 to 15 A, which covers the broad well and the barrier between the two wells. The total time over all processors in the presented simulation, excluding the equilibration times, was 136 ns for the DREM and 90 us for the HREM simulations. Although the 136 ns vs 90 ns difference seems not to be too substantial, note that they do not represent the real time scale of the simulated trajectories. For REM schemes, because dynamics are accelerated, it is hard to estimate what the real time scale is for a given simulated trajectory. Therefore, the actual motions sampled by the DREM scheme and HREM scheme could be a little bit different, resulting in the corresponding pmf profiles to have some differences. Futhermore, although the DREM simulation was designed to focus on the particular end-to-end distance reaction coordinate, the quality of the resulting pmf depends on how well the other dimensions orthogonal to the reaction coordinate are sampled. Of course, the quality of the pmf calculated from the HREM results also depends on the sample quality of the HREM simulation. Therefore, it is hard to conclude which method provides a more accurate estimation of the pmf profile along end-to-end distance for Met-enkephalin. But, because the difference between the two profiles are not large, and their patterns are very similar, it seems to suggest both schemes are appropriate to producing a pmf. Taking into consideration that HREM does not require any explicit guiding potential for a reaction coordinate, the HREM simulation does a remarkably good job. 2-dimentional PCA Analysis Besides the first PCA mode, the second PCA mode also has a significant contribution (15%) to the overall motion, motivating construction of a 2-dimensional 47 Figure 2.8. The 2D PMF for the first and second PCA modes for all 21=l system snapshots. Backbone CA stick plots of the configurations in the dense places are shown with the distance between Tyrl backbone nitrogen and Met5 carboxyl carbon shown. Second 4 Mode (A) 2 0 -I'8 46 -1'4 .1'2 -10 -8 is First Mode (A) PMF -00 _o.s —1.0 —1.5 .20 —2.5 . (kcal/mol) --3.0 "3.5 r 4.0 45 48 Figure 2.9. CA wire plots for the two salt bridge conformers shown in Figure 2.8, with the Gly-2 and Gly-3 backbone atoms shown explicitly that illustrates the ‘I’(2) and (3) dihedral angle compensation mechanism. 49 Figure 2.10. Ramachandran plots for Gly2, Gly3 and Phe4 of Met-enkephalin; The three plots in the upper row show results for snapshots that are in the PCA first mode deep well (—0.6 to —0.4 A); The lower row for all snapshots. Gly2 1 A A r 1 r 1 Gly3 Phe4 '1“ ~l60 -80 0 460 -80 0 80 I60 -160 -80 0 80 I60 50 PMF displayed in Figure 2.8 (Since we have already shown good PCA convergence in the above discussions, for the 2 dimensional analyses only the 3—6 ns interval trajectories was focused on for convenience). There are two wells around (—0.5 A, 2.3 A) and (—0.5 A, 7.0 A) in the figure that correspond to the deep well in the one dimensional PMF plot. Representative alpha Carbon (CA) pictures of the dominant backbone conformations are also shown in the figure. Configurations in the two wells have end-to-end distances corresponding to salt-bridged conformers, versus the more extended states of all the other displayed configurations. Figure 2.9 shows representative CA wire fi'ame backbone structures with the Gly-2 and Gly—3 atoms explicit for the two wells. The differences in the backbone structures, with the parallel and anti-parallel carbonyls, come from the differences of the Gly-2 ‘1’ and Gly-3 d) dihedral angles. The first row plots of Figure 2.10 shows the Ramachandran plots for Gly2, Gly3 and Phe4, with snapshots picked fiom the PCA first mode within the range from —0.6 to —0.4 A, and the bottom row plots for all snapshots are shown for comparison purposes. Although the configurations corresponding to the two deep wells in the 2-dimensional PMF have large differences in their Gly-2 ‘1’ and Gly-3 (I) dihedral angles, the main patterns of their backbones are similar, illustrating the known90 feature that the LI’(i) and (D(i+1) values of residues i and i+1 can compensate and still lead to overall similar structures. This is the only local mechanism in peptides (or proteins) that can lead to a structure with essentially the same overall conformation. The Ramachandran plots in the first row of Figure 2.10 are quite similar to those generated by van der Spoel and Berendsen63 in their study of zwitterionic Leu-enkephalin solvated by water. Their simulations were started from a salt—bridge and an open form that rapidly closes and remains mostly closed, so they mainly sample salt 51 bridge conformers. They also note a Gly2 ‘I’ and Gly3 (D compensation for these salt- bridged conformers. The finding that the Gly2 and Gly3 are concentrated in two distinct regions of ‘1’ and (1) suggests that these residues may be participating in hydrogen bonding. For the “parallel” carbonyl arrangement of Figure 2.9 there can be Gly3 N-H to Met5 O=C hydrogen bonding/salt-bridge interactions. For the “anti-parallel” arrangement there can be Gly2 C=0 Met5 N-H as well as Gly2 N-H Met5 O=C hydrogen bonding/salt-bridge interactions. Selected snapshots of the parallel and anti-parallel conformers do show these interaction patterns that most likely aid in stabilizing the direct terminal salt-bridge interaction. However, hydrogen bond analysis of the trajectories shows that none of these additional interactions persists. The relevant distances are constantly fluctuating and are mainly beyond the distance that would permit such a conclusion. In the remaining region of the 2-dimensional PMF plot in Figure 2.8, there is a less concentrated but much larger area of relatively low free energy, than in the area around the two deep wells. The barrier between the deep and broad areas is only around 3 kcal/mol, and the difference between them is roughly within 2 to 3.5 kcal/mol, both of which are not very large on a thermal scale but, significantly, the pathway between two areas is very narrow. This observation suggests that although Met-enkephalin samples both regions, one favored by energy and the other favored by entropy, the rate of transition between them may be slower than might be inferred from the small free energy barrier. Starting from an extended configuration, there are many configurations that correspond to the extended states area to first sample, and then a restricted region in configuration space must be found that corresponds to the small transition area in Figure 52 2.8, to enter the salt-bridge region. This feature may explain the discrepancy between the 53-56,58,60 experimental results that Met-enkephalin or Leu—enkephalin shows great flexibility and a lack of definite conformations in water, and the difficulty of going from the extended to closed conformations found through conventional MD sirnulattionslg’32’61' 69. Indeed, for the zwitterionic Met-enkephalin simulated here, an 18 ns normal MD simulation with the same starting configuration and conditions as for the HREM system 1 (71=1) is trapped in extended states and never samples configurations corresponding to the two deep wells. 4. Concluding Remarks 2 The HREM approach was successfully implemented to improving the MD sampling of zwitterionic Met-enkephalin in explicit solvent. The implementation of HREM was quite efficient in the sense that only five windows were required to generate results that, as monitored by the PCA modes, show good convergence properties in the simulation time. More importantly, it was shown that the system 1 (normal system with X=l) trajectory repeatedly sampled configurations that correspond to all the relevant end- to-end distances between the open and close states, providing an essential condition for properly exploring the configuration space of Met-enkephalin and a meaningful calculation of free energy quantities, such as a potential of mean force. The PCA was effective in singling out the dominant end-to-end distance fluctuation motion in the first mode. The l-dimensional PMF profile along the end-to-end distance for the Met- enkephalin calculated fi'om using the HREM approach was also compared to the one 53 calculated from using a DREM approach, which was implemented to use window potentials to focus on the particular end-to-end distance reaction coordinate. The two PMF profiles are quite similar, especially in pattern. The DREM result might be more accurate since it is carefully designed to focus on the reaction coordinate. But, the HREM has the advantage that it does not require a commitment to a particular reaction coordinate, or a particular dimensionality of reaction coordinate. The objectivity of the HREM in this regard is an argument in its favor relative to the DREM since in many cases it is hard to specify or know beforehand a proper reaction coordinate. Of course, the TREM is also objective because all degrees of freedom are thermally excited. However, for a system with many degrees of freedom, the chain of replicas required may become impractically large. The HREM can be viewed as an attempt to excite only important degrees of freedom, but that requires a decision as to which are the important degrees of freedom. For explicit solvent simulations, not having to excite the solvent degrees of freedom is a great advantage since their number is an order of magnitude larger than the number of peptide (and protein) degrees of freedom in typical MD simulations. The simulated trajectory, analyzed with the PCA decomposition, also revealed the presence of another significant PMF dimension that is mainly composed of a correlated local change of the Gly2 ‘I’ and Gly3 (I) dihedral angles. That two very different regions of dihedral space for these residues are stably occupied, even though the overall structure, as monitored by the end-to-end distance, corresponds to a salt-bridge conformer, results from this local ‘l’(i) and 2 1.0 0.83 0.13 2<--->3 0.83 0.73 0.094 3<--->4 0.73 0.65 0.14 4<--->5 0.65 0.57 0.080 5<--->6 0.57 0.5 0.12 Potential Aa Ab Acceptance Ratio Indexm l<--->2 1.0 0.83 0.64 2<--->3 0.83 0.73 0.50 3<--->4 0.73 0.65 0.43 4<--->5 0.65 0.57 0.41 5<--->6 0.57 0.5 0.37 65 To examine whether all the configurations (replicas) can visit a particular system (with a particular 2,- value) and whether given configurations are visited by all the systems, we display these time trajectories in Figures 1 and 2. (Since the plots for the third and fourth us are similar to the plots for the fifth ns, only the plots for the fifth us are displayed for convenience.) From the plots, it is clear that for the simulation of YpHPPK, all the systems (windows) can be visited by all the configurations (replicas) (a- f), and, conversely, all the configurations (replicas) can be visited by all the systems (windows) (g-l). Although the plots for the simulation of EcHPPK are not as turiform as the ones for YpHPPK, it is still true that configurations 2 and 3 can be visited by all the systems and systems 2 and 3 can be visited by all the configurations while all the systems have configurations with both higher and lower indices visiting them. Comparing RMSD and RMSF for HREM trajectories with those for normal MD As noted in the Introduction and Methodology sections in this chapter, the most flexible regions of HPPK are Loops 2 and 3, and these two loop regions comprise the atoms whose interactions were scaled. To see whether the HREM simulations improve sampling relative to the normal MD simulations, the RMSD and RMSF of the two loops are compared. The system 1 HREM trajectories are used for the comparison since it is system 1 that samples from the normal MD force field’s ()1=1) canonical ensemble. The RMSD and RMSF are evaluated based on the backbone atoms and on all the atoms for each residue in the two loops. All trajectories used are ones obtained after the RMSD stabilized. 66 Figure 3.1. Migration Check for EcHPPK. (a)-(f): Migration of configurations (replicas) into and out of a given system (it,- scale value). (g)-(l): Migration of windows (systems) into and out of a given configuration . The figures of a-f correspond to systems 1-6 and the figures of f-j correspond to configurations 1-6. Note that in view of the number of data points that are plotted it appears as if, at a particular time, several replicas occupy the same window or several systems visit the same replica—this does not happen,—looking at shorter intervals, the dots will be separate. (a) 6; (106: q): 5‘ B 5: E 4 E 4‘.“ "" """" .§3""" " "— E3" 3 2 E 2‘ 1 1 .- 0.0 012 0:4 0:6 ' 0:8 ' 110 0.0 ' 0:2 ' 0:4 ' 06 j 018 ' 110 Time (ns) Time (ns) (C) 62. . .- _- .. . .. . _. (d) 6- - _. 5.- -._._-..._-..--. -... .. _ . _._.- __ 5 1 a: 5 :12 4. 1.2 4 — .§ -3‘—— § 3 '— 75- j__._ '5- -._-.. _ g 2)- ; 2 1,”.-. . .......-_._... 1 --.. ._ 0.0 032 0:4 0:6 0:8 110 0.0 '012 0:4 '06 0177.0 Time (ns) Time (ns) 67 Figure 3.1 (continue) (e) 6:— ... (f) 6:-..“ .... .. ...—-.-.__.- 5 5 — as . g 1 3. 41' "— ' ‘—‘— a 41 '— 51:, 3.. -. --....._._ g 3.. -_- _ l- 1. 0.0 02 0:4 0:6 0:8 110 0.0 012 0:4 0:6 0:8 170 Time (ns) Time (ns) (9) ; (h) Window Index 6 5 4 31---... - .. 2 l 0 Window Index o ~— N 05 4:. 111 O\ I .0 ' 0:2 ' 0:4 '016 10:8 ' 110 .0 02 '014'0:6*0f82110 Time (ns) Time (ns) (I) 6.)- . (I) 6-_...-.... .__...-..._......_.. >< 5.. .-.. . ---...._._ >4 5-- ...—..-..____ d.) 1 Q) . :2. 4 — :2 4 — g 32— g 3 '8 2;- E 2i. ._ .--. ..-__.... B 1 3 1 1..._. .. .-.-. .... . ......_ 1, 0.0 072 0:4 0:6 0:8 110 0.0 02 0:4 0:6 08 L 1‘0 Time (ns) Time (ns) 68 Figure 3.1 (continue) A x v 0\ Window Index M A 3: 2. 11 0.0 ' 012 ' 0:4 ' 0:6 ' 0:8 ' i0 Time (ns) 69 — v Window Index 6. 5;_ _. 4;- - _. 3i - _- . _ 2: 1: 0.0 ' 012 T 014 I 016 ' 0:8 ' 1T0 Time (ns) Figure 3.2. Migration Check for. YpHPPK. (a)-(t): Migration of configurations (replicas) into and out of a given system ( 11- scale value). (g)-(l): Migration of windows (systems) into and out of a given configuration . The figures of a—f correspond to systems 1-6 and the figures of f-j correspond to configurations 1-6. (Note that in view of the number of data points that are plotted it appears as if, at a particular time, several replicas occupy the same window or several systems visit the same replica— this does not happen. Looking at shorter intervals, the dots will be separate. .0fi 012T 0:4 '016 '018 ' 110 Replica Index o H N DJ A LII Replica Index o H N U) A LII .0 ' 0:? 0:4 ' 0:6 T08 ' 1:0 Time (ns) Time (ns) (C) 6' ' (d) 6‘ 5._.-...--. .. . . ...-.. 5 a 4 -----------— .5 4. 3 3__. E 2 on? 2.....-“ -_......_...._.__. _ 1 1...-.... .--......___. .— 0.0 ' 012T 0:4 ' 0:6 fois 1 110 0.0 012 0:4 0:6 0:8 110 Time (ns) Time (ns) 70 Figure 3.2 (continue) (6) 6‘—----—--°---——— (f) 6 x. 5 x 5 0 5 3 4' E 4 .§ 3 .3 3 " a 2 a 2 —-———-— — — .- o: a: 1.- 1-_ . . .... .. . 0.0 012 0:4 0:6 0:8 110 0.0 012 04 0:6 0:8 110 Time (ns) Time (ns) (9) 6--- ---..-......... . . .... . (h) 6f—_-—" . ...-.- >< 5' :4 5 0) d) . E 4. - .. --- .. __.. .... E 4._..-.. -_.._.._.__. _ E 3. E 3l .5 2 .5 2 3 B - 1 ---——- 1 0.0 012 0:4 0:6 0:8 110 0.0 012 0T4 0:6 0:8 110 Time (ns) Time (ns) (i) 6 - (J) 6 >< 5 5 5 ‘1’ '0 5 4——. ,5 4 "‘ ‘ a ‘ -..._-.-...___ E 3. 5 3. .5 2 .5 2 a B 1 1 0.0 ' 0:2 ' 0:4 ' 0:6 ' 0:8 ' 1:0 0'0 0:2 0.14 0:6 0:3 110 Time (ns) Time (ns) 71 Figure 3.2 (continue) (k) 6‘ (I) 6+ 5 5. 35 5j—----—-- —__ 3. 4 E 4 g 3T.-...--. .. . . ..... ..-.. g 3. . E 2l '8 2:__.__._ _--.. 3 . 3 . l 1-....- 0.0 ' 0:2 ' 0:4 ' 0:6 ' 0:8 ' 110 0.0 012 0:4 0:6 0.8 110 Time (“5) Time (ns) 72 The RMSD comparisons are displayed in Figures 3.3 and 3.4. Based on EcHPPK backbone atoms, the RMSD calculated for the normal MD trajecotry is around 3 A, showing that the simulated conformations did not deviate far fi'om the original crystal structure. The RMSD for most residues in Loop 2 and Loop 3 based on the HREM trajectories has a value between 3 and 5 A with residues at the ends of two loops having smaller RMSD while two residues in the center of Loop 3 having a bit larger RMSD. The simulated conformations based on HREM deviate more from the starting structure than those based on normal MD trajectories from the starting structure. A larger deviation from the starting structure after simulation for a couple of nanoseconds has a better chance to occur since the starting structure was made by removing HP from the ternary crystal structure. Based on YpHPPK backbone atoms, using either the normal MD or HREM method, the resulting RMSD plots show that the simulated protein loop 2 and 3 conformations are substantially different from the starting crystal structure after several nanoseconds of simulation. Basing the RMSD calculations on all atoms of Loops 2 and 3, the changes of conformations for most residues are larger for both EcHPPK and YpHPPK either by regular MD or by HREM. This result is expected because the side chains are usually more flexible than backbones. The larger RMSD difference for YpHPPK versus EcHPPK may be due to the fact that the starting crystal structure for YpHPPK was obtained by simulating one monomer from the dimeric crystal structure.80 As discussed in this chapter’s methodology section about the starting crystal structure for YpHPPK, in the dimer crystal structure each monomer’s Loop 2 interacts strongly with residues in the other monomer, and it is Loop 2 that shows the largest RMSDs from the crystal structure. 73 Figure 3.3. The RMSD for residues of Loop 2 and Loop 3 in EcHPPK: (a) and (b) are for RMSD based only on backbone atoms for Loop 2 and Loop 3, respectively, while (c) and (d) are based on all atoms. The RMSD based on normal MD are shown with solid lines and the RMSD based on HREM are shown with dashed lines. The RMSDs are average RMSDs calculated for all the snapshots in the two three nanosecond long trajectories based on normal MD and HREM picked when the RMSDs and RMSFs for the trajectories have been stable. 10- (a) 10- (b) 8~ 3. <3: A ‘ '8 6- gr 6.. E ‘ 2 4- 2- 0 I ‘V I V 1 r I ' ' 'fi 0 I I I fl ' I Y I r I ‘ I 44 46 48 50 52 54 82 84 86 88 90 92 94 resnum resnum 8- 3- 4:: l ' ‘3 6-1 g 6. E w '8 . 45 E 4« 25 2. ( 1 0 I I I ' I r I r I r j 0 ' v I v I v I v I I 1 ' l 44 46 48 50 52 54 82 84 86 88 90 92 94 res num res num 74 Figure 3.4. The RMSD for residues of Loop 2 and Loop 3 in YpHPPK: (a) and (b) are for RMSD based only on backbone atoms for Loop 2 and Loop 3, respectively, while (c) and (d) are based on all atoms. The RMSD based on normal MD are shown with solid lines and the RMSD based on HREM are shown with dashed lines. The RMSDs are average RMSDs calculated for all the snapshots in the two three nanosecond long trajectories based on normal MD and HREM picked when the RMSDs and RMSFs for the trajectories have been stable. 10. (a) 10‘ (b) . l 8- 3‘ g 6.. 2 6.. 3 ‘ 3 ‘ E 4. E 4- ‘ l 2- 2. o I I I T T I O I I fir I 7 I 44 46 48 50 52 54 86 88 90 92 94 resnum resnum 109 1°“ W a. 8‘ g 6- g 64 s ‘ 8 ‘ E 4- E 4- 25 2. J O I ' I ‘7 I 1 I ' I T fl 0 . I fl 1 . T fi, ‘ 44 46 48 50 52 54 86 88 90 92 94 resnum resnum 75 The Loop 2 residues of YpHPPK deviate more from the starting crystal structure than those of EcHPPK. Figure 3.5 shows that the Loop 2 conformations for the starting crystal structures of the two proteins are quite similar, while from the ternary structures it is clear that the HP binding pocket in HPPK is close to its loop 2. Therefore, the difference of the magnitudes of the deviations fiom the crystal structures observed for the Loop 2 residues in the two proteins suggests the possible existence of different HP binding pocket conformations for the two proteins. The differences between the RMSDs based on HREM and normal MD trajectories are in the main not large. But, RMSD only shows the deviation from the starting crystal structure and does not contain direct information about protein flexibility. In particular, if the protein is trapped in a small area far from the starting point within the configuration space, the RMSD can be large. Therefore, RMSF comparisons should be more revealing of the extent of configurational sampling. If they turn out to be small, then the RMSD measures indicate that the various trajectories are confined to basins in configuration space that are displaced by varying degrees from the crystal structures. Figures 3.6 and 3.7 display the RMSF comparisons, which show that the RMSF based on HREM trajectories is consistently much larger than the RMSF based on normal MD trajectories for both EcHPPK and YpHPPK. The normal MD RMSF based on the backbone atoms is less than 2 A for each residue of the two loops for the two proteins with the only exception that the Gly47 of YpHPPK has an RMSF of 2.2 A. Even when side chains are included, only Arg84 has an RMSF greater than 3 A. It is clear that the normal MD trajectories were trapped around some basin in the configuration space over the simulation time scale, after a stable RMSD was achieved. The RMSF obtained 76 Figure 3.5. Starting crystal structures for the simulations: (a) for EcHPPK (PDB entry lqu) (b) for YpHPPK (PDB entry 2qx0) (a) Loop2 —-> 77 Figure 3.6. The RMSF for residues of Loop 2 and Loop 3 in EcHPPK: (a) and (b) are for RMSD based only on backbone atoms for Loop 2 and Loop 3, respectively, while (c) and (d) are based on all atoms. The RMSF based on normal MD are shown with solid lines and the RMSF based on HREM are shown with dashed lines The RMSFs are calculated for all the snapshots in the two three nanosecond long trajectories based on normal MD and HREM picked when the RMSDs and RMSFs for the trajectories have been stable. 8- 84 (a) (b) 6- 6. 4 '34- @4- 15 ......... ‘23 E g .............. 0 1 fi F T '71 T l I 0 I I T I f I I I 44 46 48 50 52 54 82 84 86 88 90 92 94 res num res num 8~ 8 (C) j (d) 6- 5. €44 ------ g4. ”‘3 '7. E 1 g 2" 2d 0 ' I ' r ' f V r I o ff I I I I I I T I 44 46 48 50 52 54 82 84 86 88 9o 92 94 res num nes num ' 78 Figure 3.7. The RMSF for residues of Loop 2 and Loop '3 in YpHPPK: (a) and (b) are for RMSD based only on backbone atoms for Loop 2 and Loop 3, respectively, while (c) and (d) are based on all atoms. The RMSF based on normal MD are shown with solid lines and the RMSF based on HREM are shown with dashed lines. The RMSFs are calculated for all the snapshots in the two three nanosecond long trajectories based on normal MD and HREM picked when the RMSDs and RMSFs for the trajectories have been stable. 3- 3- b (a) ( ) 6 4 5 ‘ 5 ‘0 ~.,-, 24 2- 1 /\ \ 0 I ' I ' I ' I ' l ' l O ' I V I r I ' I j 44 46 48 50 52 54 86 88 90 92 94 res num res num 84 8“ (c) ‘ (d) 6_ 6- g 4- .74 E 4‘ u- ,' ‘0 g 4 ....... E 2-1 A 2. 0 I I r r f F I I I 0 fi I I I I I j 44 46 48 50 52 54 86 33 90 92 94 '93 num res num 79 through HREM simulation is much larger, especially for residues in the middle of the loops that are less susceptible to the influence of the rigid core. Based only on backbone atoms, the residues in the middle of the loops for the two proteins all have an RMSF greater than 3 A. It is noteworthy that the YpHPPK backbone of Loop 2 exhibits a much larger flexibility compared with that of EcHPPK. The RMSF based on backbone atoms is mostly more than 5 A for the residues in the middle of Loop 2 of YpHPPK, with the central three having an RMSF close to 7 A, while the similarly calculated RMSF is mostly between 3 and 4 A for EcHPPK. Again, since the HP binding pocket is close to Loop 2, this large difference suggests that there might be different configurations of residues contributing to the HP binding pockets for the two proteins. The relatively large RMSF based on backbone atoms from trajectories obtained through the HREM simulations makes it possible to search for the potential existence of different configurations suited for binding ligands. A possible reason for this large HREM loop 2 RMSF difference between the two proteins may be due to the short length of the Loop 3 of YpHPPK. Loop 3 of YpHPPK only contains eight residues, making it difficult to interact with Loop 2, leading to the possibility of enhanced freedom of motion for Loop 2. Binding pockets for EcHPPK and YpHPPK To investigate HP binding pocket conformations, and their potential differences between EcHPPK and YpHPPK, residues that are close to HP in the respective crystal structures are selected. These residues include T hr42, Pro43, Pro44, Leu45, Try53, Asn55, and Phe123 for EcHPPK and Thr43, Lys44, Pro45, Leu46, Phe54, Asn55, and Phe123 for YpHPPK. (These residues will be referred to as pocket residues hereafter.) Only the backbone atoms of these residues are included in view of their greater stability 80 relative to side-chain atoms. The g_cluster method described in the methodology section of this chapter can be used to divide the total 3000 snapshots of the HREM trajectories for each of the two proteins into different groups according to the RMSD calculated based on backbone atoms of the pocket residues. By using a 2 A cutoff in g_cluster, the HREM trajectories for EcHPPK and YpHPPK can be clustered into 7 and 8 clusters, respectively. (The populations of the clusters are shown in Table 3.2 and Table 3.3). The observation that the YpHPPK trajectory can be separated into more clusters using the same cutoff value may be due to the greater flexibility of Loop 2 of YpHPPK. The g_cluster method was also applied to the normal MD trajectories using the same 2 A cutoff, but all snapshots turned out to be in the same cluster for EcHPPK while 99.9% of the snapshots were in the same cluster for YpHPPK. The backbone atom conformations of the pocket residues of the central structures are shown in light green in Figures 3.8 and 3.9 for cluster 1 and cluster 4 obtained from the EcHPPK trajectory and for clustersl, 2, 5 and 6 from YpHPPK. The crystal structure conformations are shown in bright yellow in the two figures for comparison. HP is also put back and shown in magenta in the two figures, according to its place in the respective crystal ternary complexes, to give a clearer picture of the binding pocket. Central structures of other dense clusters are not shown since they are either similar to the crystal structures or to the central structures shown in Figures 3.8 and 3.9. From Figure 3.8, it is clear that the central structure of cluster 1 for EcHPPK has residues Thr42 and Pro43 moved downward and residues Pro44 and Leu45 move in to be closer to the HP position in the x-ray ternary complex. And, there exist substantial changes of the (D and ‘I’ dihedral angles of Pro44 and (I) dihedral angle of Leu 46 in the structure. These changes are mainly a consequence of the 81 Table 3.2. The number of snapshots in the clusters constructed for EcHPPK. Cluster Number of snapshots outof3000 1 1719 2 1058 3 183 4 29 5 9 6 1 7 1 82 Table 3.3. The number of snapshots in the clusters constructed for YpHPPK. Cluster Number of snapshots outof3000 1 1692 2 906 3 220 4 74 5 39 6 36 7 l 8 8 15 83 Figure 3.8. EcHPPK backbone atom conformations of pocket residues of the HP binding pocket of the central structures for clusters 1 and 4 of HREM snapshots: The lefi panel is for cluster 1 and the right for cluster 4. The central structure of each cluster is shown in green, the starting crystal structure is shown in yellow, and HP, in magenta, is placed according to the ternary complex crystal structure. 84 Figure 3.9. YpHPPK backbone atom conformations of pocket residues of HP binding pocket of the central structures for cluster 1, cluster 2, cluster 5, and cluster 6 of HREM snapshots. The left panel is for cluster 1 (top) and cluster 5 (bottom) and the right panels for cluster 2 (top) and cluster 6 (bottom). The central structure of each cluster is shown in green, the starting crystal structure is shown in yellow and HP, in magenta, is placed according to the ternary complex crystal structure. 9% x, a "' 434W ‘1 ‘7. 85 bending down of the lefi half of loop 2 (residues 44 — 49) toward the protein core. The central structure of cluster 4 shown in Figure 3.8, has residues, Pro43, Pro44, Leu45, and Try53, moved upward, and there are substantial changes of the (D and ‘1’ dihedral angles of Pro43. These changes are mainly due to loop 2 transiting toward the solvent. For YpHPPK, Figure 3.9 shows that the central structures of clusters 1 and 5 both have residues Thr43, Lys44, Pro45, Leu46, and Phe54 displaced substantially upward. In the central structure of cluster 1, there are substantial dihedral changes from the starting crystal structure for the ‘I’ of residue Thr43 and the (l) dihedrals of Pro45 and Leu46. In the central structure of cluster 5, the large dihedral changes are for ‘1’ of Lys44 and the (I) angles of Pro45 and Leu46. The (D dihedrals of Pro45, and Leu46 are also substantially different for the two central structures. These changes are mainly due to the extension of loop 2 toward the solvent. The central structures of clusters 2 and 6 for YpHPPK shown in Figure 3.9 both have residues Thr43, Lys44, Pro45, Leu46, and Phe54 moved upward, with the displacement for cluster 6 larger. The Pro45 ‘I’ and Leu46 (D of the central structure of cluster 2 are substantially different from those in starting structure. For cluster 6, the different dihedrals are ‘I’ for Lys44 and Pro45 and (D for Pro45 and Leu46. The upward displacements are mainly due to the extension of loop 2 toward the solvent. To get a solid picture of how the HP binding pocket conformations really change according to the aforementioned changes in backbone atom conformations of the pocket residues, the HP binding pocket profiles obtained using the methods presented in the methodology section in this chapter are shown in red in Figures 3.10 and 3.11 for EcHPPK and YpHPPK, respectively. It is clear from the two figures that when Loop 2 moves downward towards the core compared with the starting crystal structure (cluster 1 86 Figure 3.10. The HP binding pockets profiles for EcHPPK. The HP binding pocket profiles are shown in bright red in the following plots. (8) shows the profile for the starting crystal structure, and (b) and (c) show profiles for the central structures of cluster 1 and cluster 4, respectively. (3) Crystal (b) Cluster 1 87 Figure 3.11. The HP binding pockets profiles for YpHPPK. The HP binding pocket profiles are shown in bright red in the following plots. (a) shows the profile for the starting crystal structure, and (b), (c), (d), and (e) show profiles for central structures of cluster 1 cluster 2, cluster 5 and cluster 6, respectively. (a) ' 88 Figure 3.11 (continue) Cluster 5 ' - 89 for EcHPPK and clusters 2 and 6 for YpHPPK), the HP binding pockets get narrower, while they get wider when Loop 2 moves upward towards the solvent (cluster 4 for EcHPPK and clusters 1 and 5 for YpHPPK). Figures 3.10 and 3.11 also show that the HP pocket conformations for the central structures shown are quite different for EcHPPK and YpHPPK. This finding supports the hypothesis that it might be possible to design inhibitors that are effective for only one of the two proteins. Test of adding water molecules in HP binding pockets MD simulation trajectories sometimes depend on the starting state. A concern here is that for the results discussed in this chapter, HP was simply removed from the ternary structure to make the starting structure. In addition, from the configuration generated by our start up protocol it was clear that no water molecule was left in the region that was occupied by the HP before its removal. Hereafler, we will refer to this region as HP binding pocket region. The CUKMODY start up protocol consists of placing water molecules on a grid at the density appropriate to the normal water density for T=303. Then, all water molecules that overlap protein atoms are eliminated, based on a van der Waals overlap criterion. Since no waters were left in the HP pocket, it was not clear whether there would be some substantial difference in simulated trajectories if water molecules could be added into the HP binding pocket at the beginning of the MD. To investigate the issue, extensive tests were canied out by adding water molecules into the HP binding pocket of the starting structure for the EcHPPK simulation, as the following describes. (Note that water molecules can be added into the HP binding pocket region because the region has a somewhat flat profile, a feature that leads to the non- specific grid routine tending to eliminate waters from such a region.) In a first try, four 90 water molecules were added into the HP binding pocket region as shown in Figure 3.12 (a). (Note: other water molecules remaining after water removal-routine described above were not shown in the figure, because otherwise there would be more than 7000 water molecules in the figure, and the four separately added in water molecules could not be seen. However, in terms of simulation parameters, there were no differences between the water molecules remaining after the removal routine and the four water molecules added separately.) The configurations (distances and orientations) of the four added water molecules were adjusted so that hydrogen bonds (to define a hydrogen bond, the distance between two oxygen atoms is set to be around 2.7 A and the angle of oxygen—hydrogen- oxygen is set to be close to 180 degrees) are present between any two adjacent water molecules. Then, normal MD was conducted for a nanosecond with the structure shown in Figure 3.12 (a) as starting structure. In Figure 3.13 (b), the configuration after 1 ns of simulation is shown, it is clear that the water molecules originally put inside the HP binding pocket region moved out of the region to some places between Loop 2 and Loop 3. Those places between Loop 2 and Loop 3 would contain water molecules simply by executing the normal start-up procedure. The actual time needed for the water molecules to move away from the HP binding pocket region is only about 200 ps. One possible reason for water molecules to leave the HP pocket region shortly after being added explicitly is that the residues close to the HP binding pocket region (Thr42, Pro43, Pro44, Leu45, Try53, AsnSS, and Phe123) are mostly hydrophobic. Another possible reason is that the HP binding pocket region is in part solvent exposed and also close to the highly- charged ATP phosphate tail. The high charge on ATP phosphate tail and hydrophilic property of residues around it may also help to explain the observation that the added 91 Figure 3.12. (a) The configuration of EcHPPK and four hydrogen bonded added water molecules. (b) The configuration after 1 ns normal MD simulation with the starting structure as shown in (a). The added water molecules are shown in yellow. 92 four water molecules first moved to the region between Loop 2 and Loop 3 close to the ATP phosphate tail after leaving the HP binding pocket region. .Therefore, it seems that adding those four water molecules should not have substantial influence on the HREM simulation results of HPPK. Second, a similar test as described in the above paragraph, but by adding 16 additional water molecules into the region between Loop 2 and Loop3 by a water- insertion routine105 to make sure the region was fully occupied by water molecules at the very beginning, was tried. The result is shown in Figure 3.13. From Figure 3.13 (b), it is clear that the HP binding pocket region no longer had water molecules after 1 ns simulation. Again, it seems that adding water molecules explicitly into the HP binding pockets at the starting point should not have substantial influence on the simulated sample conformations. Moreover, the test using 4 hydrogen-bonded water molecules and the 16 additional water molecules was tried again by using smaller van der Waals constants for the added 20 water molecules to reduce possible large repulsion at the beginning. Because 4 of the water molecules were added in manually, relatively large vdw repulsion might exist at the beginning, and cause these water molecules to be bounced out of the HP binding pocket region quickly. However, even with smaller vdw constants, after 1 ns, still no water molecules stayed at the place that was directly under Loop 2 and was occupied by HP in the starting ternary structure. In summary, it seems that there should be no big difference between starting simulations with an HPPK structure with water molecules added into the HP binding pocket region and starting simulation simply with an HPPK structure. 93 Figure 3.13. (a) The configuration of EcHPPK with 20 added water molecules. (b) The configuration after 1 ns normal MD simulation with the starting structure as shown in (a). The added water molecules are shown in yellow. 94 4. Concluding Remarks A Hamiltonian Replica Exchange Method (HREM) molecular dynamics (MD) approach was successfully applied to Met-enkephalin in the work described in last chapter. In the work summarized in this chapter, a similar approach is applied. A HREM approach was used to study the near-closed conformations of EcHPPK and YpHPPK The crucial point of the HREM implementation is the decision as to which are the important degrees of freedom. For a protein with a rigid core simulated in explicit solvent, only exciting the degrees of freedom of the most flexible parts of the protein (usually just some loops) is a great advantage since their number is at least an order of magnitude smaller than the number of other degrees of freedom in the system in typical MD simulations. In our application of the method, by focusing on Loops 2 and 3, the HREM improves the MD sampling of both EcHPPK and YpHPPK, when monitored by comparing the RMSFs based on HREM trajectories and those based on normal MD simulation. Using clustering methods focused on the conformations generated with the HREM, some well-populated, near-closed structures were obtained for EcHPPK and YpHPPK. In terms of some of the key residues that form the HP binding pocket, the clustering was able to provide some distinct residue conformations. Most importantly, the clusters found for EcHPPK and YpHPPK are distinguishable, indicating that there are differing near-closed conformations suited to HP binding. Those structures could be used as targets to design inhibitors to test the hypothesis that selecting inhibitors to fit some near-closed conformation that is significantly more accessible in the targeted protein than in its homolog is an effective strategy to enhance inhibitor specificity. If successful, those 95 designed inhibitors might be used as new narrow band antibiotics to help deal with the problem of antibiotic resistance. 96 Chapter 4. Studies of HPPK-ATP conformation space and ATP binding through using enhanced MD methods 1. Introduction For the simulation processes conducted for the work described in Chapter 3, one interesting observation is that the ATP is very stable. During all the simulations, ATP always stays in its binding pocket with no substantial change of its conformation. Experimental results show that HPPK binds ATP (in those experiments, the non-reactive analog AMPPCP was used ) with high affinity and that ATP binds first, and slowly, followed by very rapid HP bindingmé. Without the presence of ATP, HPPK does not bind HP in measurable quantities. These data suggest that there is a “proto-pocket” for HP binding that is formed by first binding ATP. The slow ATP binding is consistent with the idea that there is conformational flexibility of HPPK, and that 'ATP selects from a conformational ensemble. Once ATP is bound, the HPPK-ATP complex may stabilize to a set of conformations appropriate for the rapid uptake of HP. This chapter is devoted to the study of the HPPK-ATP conformation space and possibly how ATP can bind to HPPK. Similar to the last chapter, the HPPK—ATP (only E. coli HPPK is studied in this chapter) is considered in this chapter as the basic species of the investigation (HPPK- ATP notation is used as shorthand for the complex of HPPK, ATP and the two Mg2+ ions that are required for the catalytic activity) A natural and simple approach to the investigation, of course, is to use conventional MD simulation to generate trajectories of conformations that can be used to address these issues. However, as shown before in Chapter 2 and Chapter 3, conventional MD ofien suffers the problem that integrating 97 Newton’s equations of motion with femtosecond scale time step will have difficulty reaching times that can capture substantial conformational transitions and fluctuations that may be occurring on the micro to even seconds time scales owing to the complex configuration space that has substantial barriers in the high-dimensional potential energy surface. A number of methods exist that speed up transitions over potential energy barriers. One class of method introduces restraints that operate on the atoms to direct the trajectory between some initial and final state9. The restraints can range from being applied to all the atoms to drive the system between the (known) endpoints with complete conformity. Or, they can be applied to just a particular atom-atom distance as would be appropriate for obtaining a one-dimensional potential of mean force. Another possible approach is to use a reweighting method, which modifies the potential Stu'face to generate a trajectory that more readily surmounts barriers.27'30’33’34’107 The trajectory is then re- weighted back at each step to restore Boltzmann sampling. (More details about reweighting methods are provided in Chapter 1.) Cukier and Morillo developed a version of reweighting that can be referred to as targeted reweighting.30 The term targeted implies that rather than modify the entire potential surface, only very specific terms in the potential are to be modified. Targeting can provide the flexibility to address different impediments to overcoming specific barriers on the potential energy surface. Targeted reweighting can be viewed as the opposite extreme of the global, temperature REM. (More details about REM are provided in Chapter 1 and Chapter 2.) The targeted reweighting method was successfully applied by the authors onto a simple testing protein model consisting of a chain of connected beads characterized by dihedral angles and the van der Waals interactions among the beads. 98 In the work summarized in this chapter, a combination of a restraint method and a targeted reweight method was used to explore the conformation space and mechanism of ATP binding in the HPPK (More details about HPPK are provided in the first introduction chapter and last chapter.) The simulation protocol is summarized in Figure 4.1. The cylinder denotes a set of residues that form a binding pocket for ATP (represented as a Wiggly line) with orientations of key residue side-chains that are important for binding ATP denoted by curved arrows. The initial HPPK-ATP binary complex is based on the ternary crystal structure. Restraint simulations are used to direct HPPK-ATP toward the apo crystal structure74 (pdb entry lHKA). This restrained simulation was done in the hope that the ATP might move away from its binding pocket during the process. However, the simulations show that ATP remains bound throughout the process even though residues initially trapping ATP gradually move away, through ATP trapping intermediates, until reaching ATP trapping states that have an apo-like- HPPK structure. ATP is still held in trap states by a network of hydrogen bonds and salt bridges that operate between ATP-2Mg2+ and a number of HPPK core residues. The reweight simulation that starts from this last trap state then shows that ATP can separate from HPPK through a series of breaking and making hydrogen bonds and salt bridges. 2. Methodology Molecular Dynamics Simulations As similar in previous chapters, the CUKMODY protein molecular dynamics code was used to carry out the simulations. The code was modified to incorporate the 99 Figure 4.1. A schematic representation of the MD simulation protocol. The cylinder denotes residues that form a binding region for ATP with key residue side-chain orientations denoted by curved arrows. ATP is represented as a Wiggly line. The initial HPPK-ATPMgz binary complex is constructed from the ternary crystal structure. Ten consecutive restraint simulations direct HPPK-ATPMgz toward the apo crystal structure. ATPMgz remains bound throughout the process even though residues initially trapping ATPMgz gradually move away, through trapping intermediates, until reaching open trapping states. The reweight simulation then shows that ATPMgz can separate from HPPK through a series of breaking and making hydrogen bonds and salt bridges. initial binary complex openfltrapping released state restraint MD restraint MD trapping intermediate 100 reweight method as discussed below. All simulations were run at 303 K under fixed number, volume and temperature (NV T) conditions.15 The simulations were carried out in a cubic box with sides of 64.1 A, having 7671 waters added after waters overlapping the protein were removed. For the evaluation of the electrostatic and the attractive part of the Lennard-J ones energies and forces, the PME method was applied with a direct-space cutoff of 9.0 A, an Ewald coefficient of 0.32, and a 72x72x72 reciprocal space grid. Five sodium cations were added to neutralize the system. Bond lengths were constrained with the SHAKE algorithm8 allowing a 2 fs time step and the temperature was globally controlled with a Berendsen thermostatlo with relaxation time of 0.2 ps. The starting structure for simulation of HPPK-ATP was the same as the one used for HREM simulation of E. coli HPPK described in Chapter 3. For constructing the starting structure, the two Mg2+ cations were each covalently linked to an oxygen phosphate atom of ATP. One magnesium, designated as Mgl, is linked to an alpha phosphate oxygen and the other is linked to a gamma phosphate oxygen, designated as Mg2, in accord with their placement in the crystal structure. The ATP phosphates were assumed fully deprotonated (ATP thus has a total charge of —4) in agreement with the ATP protonation state when ligated to Mgz‘i.104 Restraint method For the restrained MD simulations that were used to transit the protein configuration between two desired conformations, restraints on distances between mass centers of paired atom groups were introduced. Harmonic restraint potentials V=(k/2)(x—xo)2 with force constant k =2 kcal/mol/A2 were used to restrain the 101 current distance x to be around a desired target distance x0. A small restraint force constant was used to permit the HPPK-ATP complex to fluctuate extensively without being dominated by the effects of the restraints. The restraints were applied in a step-wise manner, by dividing the transition between the initial and final states into 10 sequential windows. Each consisted of 500 ps of MD simulation with restraints set as stated above with x0 = "(xfim - x,,,,-,,-a,)/ 10+ xmma, , n the window number running from 1 to 10, and xfinanapo and xmmanclose denoting a distance in the apo and closed (the temary complex with HP removed) crystal structures, respectively. Reweighting method The reweighting method”3 1 to accelerate configurational sampling is described in detail in Chapter 1. The essential idea underlying reweighting method is that a modified potential surface can be used during actual simulation to reduce barriers and then, by reweighting back when analyzing trajectory data, the correct ensemble average can be obtained. In extreme, if the total potential is modified multiplicatively then it simply corresponds to changing the temperature of the simulation. The essence of targeted- reweighting method developed by Cukier and Morillo30 is that only specific terms in the potential energy are targeted, and these terms are chosen based on specific information such as the presence of strong electrostatic interactions corresponding, for example, to salt bridges between residues or what we still will refer to as salt bridges between charged portions of ATP and the magnesium cations and ionized residues.(These are the potential terms targeted in the work summarized in this chapter). 102 For specific potential term V0. for certain pair of atoms 1' and j to be targeted, a linear scaling is used in the work summarized in this chapter, Vij’" = xii/U. A number of 2 values were tried. The actual results are quite sensitive to the ,1 value chosen with too large a value (closer to 1) leading to no effect and too small leading to a release that is so rapid that the sampling is very poor. The choice of l=0.71 led to escape on the ns time scale. The stages that are found took about 1 ns for this choice, and the simulation was run for a total of 2 ns to make sure that ATP, once unbound, did not rebind. As also mentioned in Chapter 2 and 3, the Ewald9 method, used in our CUKMODY program, requires the system to be electrically neutral. Thus, in general, a direct inclusion of the scaling factor into the Ewald calculation is hard to implement because it would require finding an overall neutral set of interactions to scale. (The Ewald method calculates energy andforces in reciprocal space that is atom-based versus the real space part that is interaction-based.) Since the modified potential surface does not have to correspond to a real surface this is not, in principal, a problem. However, as in any modified potential method, if the trajectory becomes too distorted from the true potential trajectory, the reweighting method becomes counterproductive. To deal with this issue, because we will only target interactions within the protein and/or ATP, and these interactions are dominant within the same MD image cell, the potential modification will only be applied to the primary cell and the real space energy and forces. For example, to target an electrostatic interaction (the only type of interaction targeted in the work summarized in this chapter) between atoms labeled 1 and 2, the interaction energy V12 and corresponding atom forces E and F2 are calculated using the Ewald method with it =1 , which is just the normal Ewald MD procedure. Next, the differences 103 between using 2 =1 and using the scale factor 2 are calculated directly fi'om Coulomb’s law in the primary cell as V131”? = (it —1)(q,q2 / rm) and the corresponding forces FI‘Wr and de'f are obtained by differentiation. Then, the above differences are added to the Ewald V12 , E and F2 calculated before to obtain the net values Vl’z'e‘ = V12 + Vl‘ffl , F,“ = F1+Edifl and F,” = F2 +F2d'fl. These net forces are used to advance the system configuration and the net energy is used to obtain AV. The other (van der Waals and internal) interaction terms can be obtained by a similar scheme. 3. Results and Discussion Restraint MD of HPPK-ATP As described in the methodology section, the simulations are based on the E. coli ternary complex of HPPK, AMPCPP with two magnesium cations and HP, with HP removed and AMPCPP replaced by ATP. The firlly deprotonated ATP phosphates (total charge —4) with the two ligated Mg2+ cations provides a neutral, though highly polar, ligand for HPPK. A primary 7 ns conventional MD simulation of this binary complex led to little change in conformation from the ternary (closed) structure. Furthermore, we also carried out simulations using the HREM approach similar to the one used for the work of last chapter to see if HPPK by itself, again started fi'om the ternary crystal structure with both ATP and HP removed, would take on a more apo-like conformation but these simulations also led to modest rearrangements. For these reasons, the HPPK-ATP binary complex was chosen to be the focus and a particular set of restraints were picked to drive the loops towards a conformation similar to the apo structure. Restraints were picked 104 partly based on the observation that long side-chains of some of the Loop 3 residues (principally Arg82, Arg88 and Trp89) drape over the ATP when HPPK is closed, forming a cylinder whose front face is composed of these residues, as schematized in Figure 1. These side-chains should be pushed out toward the solvent in conformity with the apo crystal structure. Other restraints were chosen to make the backbone conformation of Loop 2 and Loop 3 similar to that of the apo crystal structure. To achieve this transformation, harmonic restraints (with force constant k=2 kcal/Az) on 15 distances, measured between the mass centers of the atom pairs listed in Table 4.1, were introduced. The restraints were applied in a step-wise manner, by dividing the transition from the HPPK-ATP closed to open form into 10 sequential phases, with each phase consisting of 500 ps of MD, to interpolate between the endpoints. Figure 4.2 displays a superposition of representative structures selected from three of the above ten 500 ps time intervals corresponding to windows 1, 5 and 10. From the figure, it is clear that the cores of the three structures are well-aligned, which suggests that even under the forces arising from converting HPPK-ATP from its closed to open forms, there is minimal perturbation of the core, again showing the stability of the protein core. By comparing the loop conformations (including both backbone and important long side-chain conformations, such as Arg82, Arg88 and Trp89), it is clear that Loop 2 and Loop 3 were pulled gradually fi'om closed-oriented to open-oriented conformations. In particular, the side chains of Arg82, Arg88, and Trp89 that point inward to seal ATP in the binding pocket of the closed form progressively move to point out to the solvent in the open form. Another key feature to note is that the ATP position is very stable along the trajectory, with all its parts hardly moving. Our hope was that once the side-chains 105 Table 4.1. The atom(s) whose distances are used for the restraint simulations Atom(s) l Atom(s) 2 CA“) Gly46 CA Asp97 CA Pro47 CA Asp97 CA Asp49 CA Asp97 CA Arg82 CA Asp97 CA Arg84 CA Asp97 CA Ala86 CA Asp97 CA Arg88 CA Asp97 CA Trp89 CA Asp97 CA Arg92 CA Asp97 cz‘b) Arg82 CA A8997 CZ Arg84 CA Asp97 CZ Arg88 CA Asp97 CZ Arg92 CA Asp97 CZ@ Arg92 CA Leu78 C52“) Trp89 CA A8997 (a) CA denotes a C-alpha atom; (b) CZ stands for the 8 carbon and amino groups of the Arg side-chain; (c) This restraint was not used during the ten-window restraint procedure, but was later realized to be important to ATP separation from HPPK. It was added later, while holding the loops open in the tenth window; ((1) CE2 denotes a carbon atom of the 6-member aromatic ring. 106 Figure 4.2. Superposition of representative structures from three windows along the restraint pathway from closed to open. The structure for window 1, which is quite similar to the closed form, is colored in yellow; the structure for window 5 is colored in purple; the structure for window 10 is colored using the normal convention. ATP in each window is shown in “ball and stick” mode, and residues Arg82 and Trp89 are shown in “stick” mode. It is clear that ATP remains bound to HPPK throughout the restraint simulation. 107 that cover part of ATP in the closed form move away, ATP might be able to get separated from HPPK or at least show substantial change in its conformation. That this did not happen suggested that there must be other interactions responsible for maintaining ATP bound to HPPK in the absence of HP. To further the study, a 9 ns MD simulation was carried out starting from the end configuration of the above-mentioned step-wise restraining procedure. The restraints as listed in Table 4.1 were still maintained with equilibrium distances set to be equal to the ones measured from the apo HPPK crystal structure. This simulation was carried out to see if ATP would leave the active site automatically when HPPK was restrained to its apo-like state. In Figure 4.3 (left panel), a CPK view of the HPPK x-ray closed structure is shown for comparison with the CPK view of the protein structure after holding the restraints for 9 ns Figure 4.3 (right panel). Two points are worth noting. One is that from the viewpoint of the figure, it is clear from the right panel picture that there is nothing blocking the pathway for ATP to leave the binding pocket, while the path for ATP to leave in the left panel picture is clearly blocked by Ala86, G1u87, and the side-chains of Arg88 and Trp89. For clarity, these four residues are colored yellow. The other point is that there is minimal difference between the two pictures with respect to the ATP center of mass position relative to the protein core, though the conformations of ATP in the two figures are a bit different. (The conformation for the MD snapshot shown in the right panel is more compact in comparison with the x-ray structure shown in the left panel). Reweighting MD of HPPK-ATP It is clear from the restraint MD study that the AT P-HPPK interactions are 108 Figure 4.3. (left panel) CPK View of the starting MD simulation structure derived from the ternary complex. (right panel) CPK View of the MD structure after maintaining the restraints for 9 ns subsequent to the restraint opening protocol. ATP is colored purple. The yellow residues, 86-89, which cover part of ATP in the initial ternary complex, are mainly replaced by solvent interactions in the MD-opened structure. 109 sufficiently strong that on an MD time scale (here 9 ns) ATP remains bound, even though the interactions present in the closed structure that are certainly designed to hold ATP in position are absent in the MD-open form. To attempt to get the ATP separated from the HPPK, one possible and common scheme is similar to what was just used to open HPPK- ATP. However to do so either requires initial and final states as given by e. g. crystal structures, or if an obvious path does exist the ligand could be pushed out with restraints in an appropriate direction. However, here, an appropriate path is not clear from e. g. the end of the restrained open simulation. More importantly, picking a particular path will certainly introduce a prejudice into the simulation that may or may not be realistic. Thus, as noted in the methodology section in this chapter, when specific interactions can be identified targeted reweighting is an appropriate and more objective choice to accelerate the exploration of configuration space that here is characterized by the ATP-HPPK interaction. It is clear that in both the closed form crystal structure and the restraint MD opened structure the two magnesium cations are very close to the carboxylate groups of two HPPK core residues, Asp95 and Asp97. For example, in the restraint MD opened structure the distances of Mg] (associated with the or phosphate of ATP) to the closest carboxylate oxygens of Asp95 and Asp97 are 2.5 and 3.0 A, and the corresponding distances for the Mg2 (associated with the 7 phosphate of ATP) are 3.3 and 4.6 A. Even though the side-chains of Arg82, Arg88 and Trp89 no longer block the path for ATP to leave its pocket, clearly there still are strong interactions with the protein; notably with the carboxylate groups of core residues Asp95 and Asp97, and these strong “salt bridges” are good candidates for important barriers restraining ATP4‘(Mg2+)2. Unscreened charge 110 interactions that are on the scale of about 100 kcal/mol are essentially as strong as covalent bonds and will clearly prevent the movement of ATP on any normal MD time scale. Therefore, these salt bridges are good candidates to subject to the targeted reweighting scheme discussed in the methodology section in this chapter. The targeted atoms are the carbon and two oxygens of the carboxylate groups in these two residues and both magnesium cations; thus, the Mg] and 'Mg2 to Asp95 and Asp97 electrostatic interactions are targeted. A uniform scaling value of 7t= 0.71 (see the methodology section for details about the reason for picking this value) was used. The modified trajectory was started up fi'om an endpoint configuration of the restrained MD simulation. The targeting leads to ATP separating from HPPK on the nanosecond time scale. Six stable states were found along the separating process. The stable states are defined by examining the reweight energy — the electrostatic energy between the two magnesium cations and the two residues — along the trajectory. They correspond to plateaus in this energy. The initial modified energy of interaction V * is about 300 kcal/mol while at the final state it is about 30 kcal/mol. This energy scale is so large that we were not successful in carrying out the reweighting required to obtain the correct Boltzmann populations along the trajectory. That would be necessary if a potential of mean force were the objective, but the modified trajectory still provides legitimate states along the separating process. Six representative structures from the trajectory were picked for display to show different states as ATP separates, and their superposed pictures are shown in Figure 4.4. The first structure (ATP in yellow) is the starting structure (endpoint conformation of the restrained MD simulation) and the sixth one (ATP in red) is where 111 Figure 4.4. ATPMgz for six different conformations as it separates from HPPK are shown with yellow, green, blue, purple, orange, and red indicating stages one through six. The gradual separation of ATP is evident as a combination of reorientational and translational motion. 112 ATP has moved out of the binding pocket. The middle four structures (the ATP is colored green, blue, purple, and orange, respectively) are stable states along the path of separation. The gradual exit of ATP is evident from the Figure 4.4, and it is accomplished through a combination of mass center movement and rotation of the ATP, although it is difficult to see details of the motion from this figure. Therefore, the relevant hydrogen bonds and salt bridges were analyzed for the first 5 representative structures, and the results for structures 1, 3 and 5 are shown in Figure 4.5 that displays hydrogen bonds (left panels) and salt bridges (right panels) and are listed in Table 4.2 (hydrogen bonds) and Table 4.3 (salt bridges). Between the starting structure (structure 1) and structure 2, the mass center position of ATP does not change much, while its conformation does change significantly so that three new hydrogen bonds were formed and the two salt bridges involving one of the two Mg ions, Mg2, were broken. Between structure 2 and structure 3, ATP moved along its path away from the binding pocket somewhat; whereby three old hydrogen bonds and one salt bridge were broken, while one new hydrogen bond and one new salt bridge were formed. It is noteworthy that ATP has changed its conformation somewhat so that some old hydrogen bonds and salt bridges can be preserved with the change of its mass center position. This aspect of ATP motion is in evidence during the whole process. Throughout the movement fi'om structure 3 to 4, though ATP underwent a large mass center position change, three of the four old hydrogen bonds were preserved, and the one that was broken was replaced by a new one with similar character. For the salt bridges, the one between Mgl and Asp97 was lost, while the one between ATP and Arg110 changed a little bit. Again, though ATP moved a large step down its path (monitored by the energy of interaction with Asp95 and Asp97) of moving away from the 113 Figure 4.5. Details of the hydrogen bond and salt bridge interactions as ATP separates from HPPK. Left panels, from top to bottom, are hydrogen bonds (dotted lines) for stages 1, 3 and 5. Right panels, the corresponding salt bridges (solid lines). See Table 4.2 for hydrogen bond and Table 4.3 for salt bridge particulars. 114 Figure 4.5 (continued) 115 Table 4.2. The hydrogen bonds that are present in the stages of ATP separation from HPPK. Structure Number Hydrogen Bond Atom of ATP Atom of HPPK Structure 1 AN6 (Nitrogen) Carbonyl Oxygen of Ile98 (starting AO3 (Oxygen) Terminal Nitrogen on structure) guanidine group of Argl 10 A05 (Oxygen) Terminal Nitrogen on guanidine group of Argl 10 Structure 2 AN6 (Nitrogen) Carbonyl Oxygen of Ile98 AN 6 (Nitrogen) Carbonyl Oxygen of Thrl 12 AN 7 (Nitrogen) Backbone Nitrogen of Thrl 12 A03 (Oxygen) Terminal Nitrogen on guanidine group of Argl 10 A05 (Oxygen) Terminal Nitrogen on guanidine group of Argl 10 AOZPG (Oxygen) Oxygen of the phenol group of Tyrl 16 Structure 3 AN6 (Nitrogen) Carbonyl Oxygen of Ile98 AN6 (Nitrogen) Oxygen of the hydroxide of Thrl 12 AN 6 (Nitrogen) Backbone Nitrogen of Thr112 116 Table 4.2. (continued) AN 7 (Nitrogen) Oxygen of the hydroxide of Thrl 12 Structure 4 AN 6 (Nitrogen) Carbonyl Oxygen of Met99 AN 6 (Nitrogen) Oxygen of the hydroxide of Thrl 12 AN6 (Nitrogen) Backbone Nitrogen of Thrl 12 AN 7 (Nitrogen) Oxygen of the hydroxide of Thr112 Structure 5 AN 6 (Nitrogen) Oxygen of the hydroxide of Thrl 12 AN 6 (Nitrogen) Backbone Nitrogen of Thrl 12 AN7 (Nitrogen) Oxygen of the hydroxide of Thr112 117 Table 4.3. The salt bridges that are present in the stages of ATP separation from HPPK. Structure Number Salt Bridgem Group 1 (positively Group 2 (negatively charged) charged) Structure 1 (starting Mgl Carboxylate group of Asp95 structure) Mgl Carboxylate group of Asp97 Mg2 Carboxylate group of Asp95 Mg2 Carboxylate group of Asp97 Guanidine group of a-phosphate group of ATP Argl 10 Structure 2 Mgl Carboxylate group of Asp95 Mgl Carboxylate group of Asp97 Guanidine group of (at-phosphate group of ATP Argl 10 Structure 3 Mgl Carboxylate group of Asp97 Mg2 Carboxylate group of Aspl 17 Guanidine group of a-phosphate group of ATP Arg110 Structure 4 Mg2 Carboxylate group of Aspl 17 Guanidine group of B-phosphate group of ATP Arg121 118 Table 4.3. (continued) Structure 5 Mg2 Carboxylate group of Aspl 17 Guanidine group of B-phosphate group of ATP Argl 10 (a) Mgl (Mg2) refers to the one associated with the beta (gamma) phosphate of ATP 119 binding pocket, during the transition fiom structure 4 to structure 5, only one old hydrogen bond was broken, while all 3 other hydrogen bonds and all the salt bridges were preserved. Finally, the ATP-HPPK energetic interaction decreased to its smallest value and ATP moved to positions (see Figure 4.4 with ATP in red) where it is essentially out of the HPPK binding pocket. The successive and staged breaking of the original hydrogen bonds and salt bridges and formation of new ones during the process of separation show those salt bridges and hydrogen bonds are important in the simulated separation process. The ATP leaving process discussed in the above could be interesting for the following two reasons. First, when thinking about the ATP binding (versus leaving) process, the binding process might be similar] to the ATP leaving process simulated in the above. Then, for example, those salt bridges and hydrogen bonds, which seemed to be important in the simulated ATP leaving process, might play important roles in the binding mechanism of ATP. Second, the release process of the product ADP also might be similar to the simulated ATP leaving process, due to the similarity of ATP and ADP. Of course, the ATP leaving path shown in the above discussion was generated by targeting the interactions of the two magnesium cations with Asp95 and Asp97. When other interactions are targeted, it is possible that the generated ATP leaving path could be different. We were not successful in reweighting back the trajectory to obtain the correct Boltzmann equilibrium sampling. Said otherwise, a potential of mean force was not obtained because the large range in energies of the modified potential surface along the trajectory did not permit the collection of sufficient data for the required accurate. This problem stems from the exponential dependence on the reweight potential. The only way 120 to address the problem would be to scale many more degrees of freedom and possibly obtain a smaller range of energy values. However, those more implicit degrees of freedom (in contrast with the explicit salt-bridge interactions that we identified) would be very hard to identify. Ultimately, the difficulty is the very high-dimensional surface that is being explored. , One thing notable is that a reweighting scheme does not directly provide dynamical information on the original surface. Once the potential surface has been modified, a time scale cannot be directly extracted from the simulation. So, one should not get the impression from the reweighting simulation that ATP can be released from HPPK on the ns time scale. According to transition state theory, the real time step can also be calculated by reweighting the simulation time step by the reweighting factor exp(,BA V) , but this calculation requires that the energy difference AV to be zero for areas other than the energy basins, which is non-practical implementation requirement for complex systems.108 To obtain dynamic information from simulation of complex system based on modified energy surface, Hamelberg and McCammon introduced.108 an estimation method based on their special form of AV while Chen and Horing derived 109 a formula to calculate rate of transition out of the initial state for simulations using Langevin dynamics. 4. Concluding Remarks In the work summarized in this chapter, a set of restraints were found that did lead to HPPK-ATP opening to apo-like conformations. One scenario that could have resulted during the restraint simulation was that at some point along the ten stages used to span 121 the closed to apo-like conformations ATP would have separated from HPPK. This was not found. Instead, as shown in Figure 4.2, the relative center of mass position between HPPK and ATP hardly changed, even though several residues present in the ternary structure that clearly block ATP from solvent exposure have moved away from the ATP to be essentially solvated in the last restraint window, as in the apo crystal structure (see Figure 4.3). To investigate the interactions possibly responsible for holding ATP in its binding pocket, a 9 ns conventional MD run was carried out (with the restraints maintained at the last window values) starting from the endpoint of the HPPK-ATP restraint MD simulations and the simulation trajectory showed that ATP still remained bound to HPPK. Examination of the simulation data, illustrated by Figure 4.3, shows that there are several residues (Asp95 and Asp97) that still interact strongly with ATP4‘(Mg2+)2. These core residues are clearly part of the binding pocket for ATP and provide a large, mainly electrostatic, interaction with ATP4'(Mg2+)2. In view of the specific salt-bridge-like interactions between these charged residues and the magnesium cations that in turn have a strong electrostatic interaction with ATP4‘, some of these interactions were the natural candidates to use in a targeted reweighting scheme. The targeted reweighting does lead to the separation of ATP from HPPK as shown in Figures 4.4 and 4.5 and detailed in Tables 4.2 and 4.3. The successive and staged breaking of the original hydrogen bonds and salt bridges and formation of new ones dominate the interactions for ATP binding. In terms of energetics, there are six stages of energetic interaction that can be identified over about 1 ns of simulation time and then, over the remaining 1 us of the simulation, the ATP 122 remains separated from HPPK. Thus, these salt bridges play a key role in the binding mechanism. 123 Chapter 5. Some potential useful modifications of the HREM approach In this chapter, several modifications of the HREM approach described in previous chapters are discussed that are under development and will be usefirl to other protein problems. 1. HTREM (Hamiltonian Temperature REM) HTREMl '0 is developed through reformulating HREM by also scaling the kinetic energy of the protein, thereby introducing an effective temperature 7: = l/kBfil , where _L_[__N_s_]i[_fi_]_1_ (5,1) 16,1 NP+NS :3 NP+NS .B/1 with Np and N, the number of degrees of freedom of protein and solvent, respectively, and k is the same scaling factor used in the HREM approach, described repeatedly in previous chapters. It can be shown that by appropriate implementation, similar to the HREM approach, the Boltzmann equilibrium will result in the extended ensemble of the product of all the systems’ ensembles for a sufficiently long trajectory for the HTREM approach. The main usage of the HTREM approach is that by using these effective temperatures, a pseudo melting curve can be constructed. That is, now each system’s trajectory is of use and corresponds to a simulation at a series of effective temperatures. This method can be used for predicting the stability of dimerized proteins (fraction bound as a function of the effective temperature) and finds use to compare the stability of wild type and mutant formsI 10. 124 2. HRTREM: (Hamiltonian Restraint TREM) The HRTREM is formulated by introducing restraint potentials V01) to the HTREM Hamiltonian H (,1) , so that the resulting Hamiltonian for HRTREM is H (A, p): H (,1)+V(,u). The HRTREM is introduced in the hope of addressing two HTREM possible deficiencies when applied to constructing melting curves. First, it is probable for higher temperature systems to, having just exchanged, exchange back because they are not so different (in truly different conformations). Second, the sampling of separated monomers in a finite MD simulation box might be prejudiced toward less- than-separated distances. For the above mentioned purpose, in implementation, only the HRTREM systems at the highest temperature consist of H (,1, p) while systems at successively lower temperatures only consist of H (2) H (2., ,u) systems and H (,1) systems exchange at the highest temperature, and the chain of H ( ,1) systems exchange down to the lowest temperature, building a cycle where H (1, ,u) systems may help to break the undesired correlation among the higher temperature H (2) systems and this break in correlation could propagate down to the lowest temperature system. The restraint will also tend to keep the monomers separated as desired at the higher temperatures. Of course, only the H (2.) systems are used for melting curve construction. (Note: in the implementation of HRTREM in CUKMODY described in the Appendix Chapter, the HRTREM is implemented by a simple HRTREM routine and a more general (allowing variations in force constants and the form of restraints) HRTREMGEN routine. Details are given in the Appendix, Chapter.) 125 3. HTREM_RMS (HTREM_RootMeanSquare) The HTREM_RMS approach is formulated similar to the HRTREM approach. It is introduced to help deal with the following problem. For a HTREM simulation, while the high effective temperature systems could be effective in overcoming barriers, they sometimes have the downside of tending to explore extensively in places that are not important in the true sample space. Therefore, in HTREM_RMS, H (it =1) is used for the experimental system (as described repeatedly in previous chapters, 1 =1 indicating the true system Hamiltonian) and H (it, ,u) are used for all other systems with higher effective temperatures. (Note, the main difference from HRTREM is here: for HRTREM, only a few systems with the highest temperature are using H (A, ,u) .) Then the restraints V(p) can be set to help to keep the systems with high effective temperature close to some desired places in the sample space. For example, to keep a dimer from separating too readily at some higher temperature and thus permit better sampling of side chain interactions between monomers.) Of course, for HTREM_RMS, only the experimental system (H (2. =1)) contains ensemble information that can be used directly. (Note: for the routines described in the Appendix Chapter, both HTREM_RMS and its modification, HTREM_RMS_gradua1, are for the HTREM_RMS implementation. The difference is that the later routine applies restraints more gradually. Details are provided in the Appendix Chapter.) 126 4. HREM_Ion The conformations of highly-charged proteins might be influenced by the ionic atmosphere formed by explicit ions in the system. However, bound ions near charged residues could be quite sticky in an MD context due to the strong electrostatic salt-bridge- like interactions. A schemel ” whose modifications here will be referred to as HREM_Ion was recently introduced to help increase the sampling efficiency of the ions around highly charged DNA. A base system with normal potential energy V(O) is coupled to a set of N other hypothetical systems, each with a unique potential V“) 2 VM +(,u-1)vi (i=1,..N). Here v' denotes the interactions of the ith group of ions, and system i interacts with a modified potential term ,uvi. Different systems (with different groups of scaled ions) will have the possibility of exchanges that interchange their interaction strength scales, 1 and ,u . Again, although the hypothetical systems could help to accelerate exploration, only the base system provides information that can be used directly. In our implementation oppositely charged ions need to be paired in each system in order to maintain overall neutrality in the Ewald energy and force evaluation. Any number of (i) paired ions can be incoporated in V . 5. HREM_IonWater The scheme for HREM_IonWater has a similar goal to HREM_Ion, though there is an important difference in the underlying concept. All the systems of HREM_IonWater approach are normal systems (all systems contain information that can be used directly). 127 For HREM_IonWater, instead of scaling groups of ions with ,uvi as in HREM_Ion, a water molecule is considered as an ion whose charges on its three sites have been modified so that a vi (y) that is parameterized with water charges (qH,q0,qH) instead of ion charges (0,q,,0) can be introduced. For a certain exchange attempt between two systems with potentials V0) (water i and ion j) and V0) (water j and ion i), if the attempt is successfiil, then the ions and water molecules will exchange positions and then there is a large change in configuration. Of course, ions used to exchange with water must be chosen to have similar vdw parameters to water molecules to avoid bad vdw contacts. 128 Appendix. Description of implementation of HREM and several modifications 1. HREM The HREM approach is implemented in the CUKMODY program based on the 112 former DREM implementation . The following is some description of how HREM is implemented in CUKMODY. Initial Loading When constructing the main MD simulation object of the “Solute _polar” class, a "setHChange" function is called in the constructor to load the parameters needed for HREM and store the parameters in a certain structure member of “Solute _polar” class. Then the “remMachine” class object, which will be used to conduct HREM procedures, is constructed and the HREM parameters are copied from “Solute _polar” class object to the “remMachine” class object into a structure for storing HREM parameters. Note that the correct parameters and configurations are loaded to each MPI process according to its rank. (Each MPI process will contain a certain configuration and the HREM parameters (which are characteristics of systems) will be exchanged according to HREM scheme when attempting exchange). Exchange Attempt The first thing to be noted is that there are two pthread threads being used, one thread will take care of normal MD simulation (in this pthread thread, the MD simulation may also be parallelized by using multiple openmp threads) and another thread will take care of the work involving making exchange attempts, including exchanging information 129 among MPI processes (again, note that each MPI process contains a particular configuration, and a set of HREM parameters according to which system it is at current time.) For a certain MD step, after forces and energies are calculated normally, if the current time step is a multiple of the predefined number of steps for exchange attempt (say, 20 or 40), then a set of commands will be executed in the driver to conduct exchange attempt. First, functions will be called in each MPI process to calculate the energy for the current configuration of the process under its current HREM system parameters (“HREM_energy_one”) and the energy for the configuration under the new HREM system parameters if the exchange attempt would be successful (“I-IREM_energy_another”). (Note: first, one exchange attempt is only made between two adjacent systems. Each system, with the exception of the first and the last each of which has one adjacent system, only have two adjacent systems and the information of these two adjacent systems is already stored as part of the system’s HREM parameters structure. For system i, the exchange attempts are alternated between being made with system i+1 and with system i-1.Therefore, according to step number (by which the system on a certain MPI process can know which one of the two adjacent systems it would attempt to exchange with), in each MPI process the “I-[REM_energy_another” can be calculated independently. Second, the “HREM_energy__one” and “HREM_energy_ another” only include PME reciprocal space energy and the part of PME direct space energy that can not be canceled out for the later Metropolis Delta calculation. The definition of Metropolis Delta is provided in the methodology section of Chapter 2, Eq. 2.3.) 130 After obtaining “HREM_energy_one” and “HREM_energy_another” which are stored as part of HREM parameters structure, the thread responsible for exchange attempt on each MPI process will put its HREM parameters structure into an array created on the MPI process with its rank to be zero. Calculations of Metropolis Deltas for all the attempts are then made on that MPI process in its thread responsible for exchange attempt. If any exchange attempt is determined to be successful, corresponding changes will be made to the array containing all the HREM parameters structures. In the end, the thread responsible for exchange attempt on the MPI process with rank zero will pass the correct HREM parameters structure (again, note, the HREM parameters structure is the characteristic of a system) to the MPI process containing the corresponding configuration left intact. Then the new HREM parameters (could be the same ones if exchange attempt was determined to be not successful) will be used to calculate MD forces and energies starting from next MD step. Code excerpt of the driver of HREM implementation of CUKMODY The following is some excerpt of the c++ codes of the driver file for the HREM implementation of CUKMODY program. The excerpt is chosen to contain important lines relevant to making exchange attempt and some lines close to those important lines which may help understanding. (The sign “...”, as usual, means some lines are omitted, so if no “. . .” between two lines listed then those two lines are also consecutive in the real driver file.) ////creating the simulation 131 Simulation *ptr; // this will report the num threads used if omp used Solute_polar pt_ch2c12(numProcessPerNode); // "setHChange" function was called in the constructor for Solute_polar; //In the function, the parameters needed for HREM were read in and //temporarily stored in "pHChnge".(A substantial lines of codes here //are used to take care of "exchange going down" or "exchange going up" //details.) ptr = &pt_ch2c12; ////set up index ptr->setReactCoordIndex(rank); //At the start, since it is a new start (even for restart, we have re- //ordered back) so configurations and systems are matched at this //moment(say, configuration 0 must be in system O).So we simply, at //this moment, label each node according to the rank (which is an //integer starting from 0) being assigned to the MPI process on it by //the MPI library. And the corresponding configuration will be loaded //latter accordingly.(The correct HREM parameters were already loaded //in setHchange using the same trick as here.) /////create the RemMachine RemMachine remMachine(STEP_REM,ptr,&HREMProb); //Here,as usual data members of remMachine are initialized in the //constructor and the HREM parameters were read into the remMachine //object from the copy constructed in the previous "setHchange" function 132 //in the Solute_polar constructor by calling "pSim->getPHChange()" //function in the RemMachine constructor int index = rank; // Here, the "index" is the copy of index for driver and the previous //"ptr—>setReactCoordIndex(rank)" initializes the copy of index //essentially for the Solute_polar object pt_ch2c12. The usage of the //index and oldIndex variables here in the main function of driver is //for later output.purposes. int oldIndex = index; ////initially scale q c6 c121 c122 c123 ptr->scale_q_c6_c12_initially(); //Since we need to calculate some energies before any exchange attempt //can be made, we need to set up the electrostatic and vdw constants to //be in accordance with the corresponding (scaled) system. remMachine.turnOn(); //Here, we spawn off a second pthread thread to take care of the things //related with exchange (including sending and receiving information, //and arbitrating, which is only done on the node with rank=0). The //original pthread thread will still run normally to execute codes for //MD calculation. //The following three lines prepare for the replica exchange attempt at //the first step, so that later we can "setHREMEnergyOne" ptr->compute_energy(); ptr->compute_energy_solute(); 133 ptr->eva1uateTemperature(); ////The following is the MD simulation loop while(iinitialize(); if(is_solvent_polarizable) ptr->drude_dipole(10,0.01); ptr->build_list(rl,i); ptr->compute_PME(i); ptr->compute_force3(rc); ptr->compute_force_solute(rc); ptr->compute_cobond(i); ptr->compute_cobond_solute(i); ptr->compute_angle(); ptr->compute_angle_solute(i); ptr->compute_dihang_solute(i); ptr->compute_force_internal_solute(rc); ptr—>compute_inter_solute(rc); //The above lines for MD force and energy calculation ////The following in “if" brackets are for exchange attempt if(i% STEP_REM == 0) { ptr->setHREMEnergyOne(); remMachine.applyHChangeForCalc(); ptr->compute_PME_another(); remMachine.setHREMEnergyAnother(); 134 remMachine.setbackHChange(); //The previous five lines set up the HREM energy before exchange //(HREM energy one) and the HREM energy after exchange if exchange //would be successful (HREM energy another) for the latter calculation //in the remMachine object for Metropolis Delta. The HREM energy //comprises the whole reciprocal space energy (calculated by pme method) //and part of the direct space energy.(The not-included part of the //direct space energy will be canceled out in the Delta calculation.) //Due to the requirement of //pme implementation, to calculate the reciprocal space part of "HREM //energy another" we need to use "remMachine.applyHChangeForCalc" //function to set the electrostatic and vdw constants to be proper //values according to the system which the current system is attempting //to exchange with. In terms of direct space energy, since it is //calcu1ated pairwise, we only need to include the energies involving //atoms with their electrostatic and vdw constants scaled (others will //be canceled out). Those energies are only calculated in the //following three functions: "compute_force_solute", //compute_force_internal_solute", "compute_inter_solute". During the //execution of the above—mentioned three functions,the //"vdwE_sca1e","vdwE_scale_2","elecE_scale" and "elecE_scale_2" were //obtained. "_scale" means only one of the two atoms in the pair is to //be scaled and "_scale_2" means both of the two atoms in the pair are //to be scaled. Therefore, the included direct part of "HREMEnergyOne" //is simply the sum of the above four terms, while the calculation of //the included direct part of "HREMEnergyAnother" could be done by //multiplying the corresponding ratios calculated by dividing the new //scaling factor by the old factor (see "comments in "Rem.cpp" for //function "HChange::setHREMEnergyAnother").Since we have changed 135 //electrostatic and vdw constants for calculating the //"HREMEnergyAnother" while we are not sure at this moment whether the //exchange will be successful; we need to change back those constants //to original values through the execution of "setbackHChange" function. remMachine.updateInfo(); //The function puts the previously calculated information stored //in the "Hchange" class object, which is a member of the "remMachine" //class object, into a struct "info" which is a member of "pData", //which itself is a member of the "RemMachine" class object. remMachine.arbitrate(i); //In this function, the thread spawned off for sending and //receiving messages and doing arbitrating is awakened. The thread will //do things (see after " varForArbitrator.wait();" inside "Rem.cpp" //file) related to exchange attempts, including calling functions to //use the calculated "HREMEnergyOne" and "HREMEnergyAnother" to //calculate delta and then use the delta to judge whether the //exchange attempt is successful or not. remMachine.applyHChange(i); //After the exchange, the correct window information is stored in //the "HChange" class object and is used here to scale the //electrostatic and vdw constants of the atoms which are in the //initially specified group including atoms that will be scaled. (Note: //When the exchange attempt is rejected, then the ratio used to scale is actually 1.) remMachine.reverseIsUp()i 136 //As is usual in a REM approach, there are two patterns of exchange: //referred to as “up” and “down” (see Chapter, xxx) For HREM, the //information involved is a bit different. So we need to tell which //pattern it is for a given exchange attempt. For the first exchange //attempt, the "down" pattern is in use. } //end of the “if" brackets for exchange attempt oldIndex = index; index = remMachine.getIndex(); if(i%STEP_SYSLOG==O) { static int oldIndex = index; if(oldIndex != index) { sysLogStream << cfgIndexEnd;//put the index end sign sysLogStream << cfgIndex(index);//pUt the index sign oldIndex = index;//update index } sysLogStream << "\ntime_step " << i <