annqw-n “-

I
m
a
a
.
.
w
m

 

 

.‘3

Q.~ T“
—Q

3“\

 

LIBRARY
Michigan State
University

 

 

 

This is to certify that the
dissertation entitled

DEVELOPMENT OF SPECIAL MOLECULAR
DYNAMICS BASED METHODS TO ACCELERATE
SAMPLING OF IMPORTANT PROTEIN MOTIONS

presented by

Li Su

has been accepted towards fulﬁllment
of the requirements for the

Ph.D. degree in Chemistry

 

 

ﬂ/vwf CUM;

 

Major Professor’s Signature

@‘J’ /8/ 2009

Date

MSU is an Afﬁrmative Action/Equal Opportunity Employer

 

----.-r—.—o—-—.—.-.-.._ -

PLACE IN RETURN BOX to remove this checkout from your record.
TO AVOID FINES return on or before date due.
MAY BE RECALLED with earlier due date if requested.

 

DATE DUE DATE DUE DATE DUE

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

5108 K:/Prolecc&Pres/CIRCIDateDue.indd

 

 

DEVELOPMENT OF SPECIAL MOLECULAR DYNAMICS BASED METHODS
TO ACCELERATE SAMPLING OF IMPORTANT PROTEIN MOTIONS

By

Li Su

A DISSERTATION
Submitted to
Michigan State University
In partial fulﬁllment of the requirements
for degree of
DOCTOR OF PHILOSOHPY
Chemistry

2009

ABSTRACT
DEVELOPMENT OF SPECIAL MOLECULAR DYNAMICS BASED METHODS
TO ACCELERATE SAMPLING 0F IMPORTANT PROTEIN MOTIONS
By
Li Su

Molecular Dynamics (MD) applied to complex systems such as proteins that have
energetic barriers separating different conﬁgurations may suffer from slow sampling of
the conﬁguration space. Therefore, new methods are needed to improve the ability of
exploring conformational space. One way to accelerate the exploration of conﬁguration
space is with various Hamiltonian replica exchange methods (HREM) whereby multiple
systems differing in their Hamiltonians are run by MD in parallel and, periodically,
attempts at exchanging them is made according to a Monte Carlo rule that maintains the
Boltzmann distribution.

The HREM (implemented in the CUKMODY MD code designed for the efﬁcient
simulations of solvated proteins) is ﬁrst used to study the conformational states of the
zwitterionic form of the pentapeptide Met-enkephalin. There is a competition between
open forms of the peptide driven by polar solvation of the terminal ammonium and
carboxylate groups and closed forms driven by their salt-bridge formation. Normal MD
started from an open state does not sample closed conformations. A small number of
HREM systems were found to be sufﬁcient to sample closed and open states a sufﬁcient
number of times to obtain potentials of mean force along various reaction coordinates. A
Principal Component Analysis (PCA) shows that the ﬁrst two principal modes capture
more than one-half of the HREM generated ﬂuctuations. The ﬁrst mode corresponds to

the end-to-end distance ﬂuctuations and shows that the closed zwitterionic state is the

predominant species. The second mode describes the presence of two conformations of
similar end-to-end distance that differ in the values of neighboring psi and phi dihedral
angles, because such psi/phi compensation can still produce the same end-to-end distance.

HPPK (6-Hydroxymethyl-7,8-dihydropterin pyrophosphokinase) catalyzes the
transfer of pyrophosphate from ATP to HP (6-hydroxymethyl-7,8-dihydropterin). This
ﬁrst reaction in the folate biosynthetic pathway is an important target for potential
antimicrobial agents. The conformations of the HP binding pockets of E. coli HPPK and
Y. pestis HPPK are studied by HREM methods. Root mean square ﬂuctuation (RMSF)
calculated based on backbone atoms shows that loop 2 (which is close to the HP binding
pocket) of YpHPPK has a much larger ﬂexibility than that of loop 2 of EcHPPK. By
using clustering methods, we ﬁnd that EcHPPK and YpHPPK have residue
conformations around the HP binding pocket that are are close to but distinct from those
found in the crystal structures. There are near-closed conformations that have different
HP binding pocket shapes in EcHPPK and YpHPPK that have potential for
discriminating among ligands.

The conformational space of ATP binding to HPPK and the unbinding process of
ATP from HPPK is studied using a restraint MD method and a targeted reweighting
scheme (implemented in CUKMODY). ATP remains remarkably stable in its binding
pocket when HPPK is driven, by using the restraint method, towards its open form. When
the ATP is induced to leave its binding pocket by using a targeted reweighting method, it
uses a special path that preserves the hydrogen bonds and salt bridges existing in previous

stages along the path.

Dedicated to

my family in China and my life at MSU

iv

Table of Contents

 

 

 

List ofTables .................... vii
List of Figures ...................................................................................... viii
Chapter 1. Introduction - -- 1
1. Molecular Dynamics Simulation ................................................................................ 1
General description ..................................................................................................... 1
Force Field .................................................................................................................. 3
Integrator ..................................................................................................................... 3
Temperature and pressure control ............................................................................... 4
Periodic Boundary Conditions (PBC) ......................................................................... 6
Calculation of long-range (non-bonding) interactions ................................................ 7
2. Methods for enhanced sampling ................................................................................. 9
General Introduction ................................................................................................... 9
Replica Exchange Methods ....................................................................................... 10
Reweighting .............................................................................................................. 1 1
3. Methods to compute the potential of mean force (PMF) .......................................... 13
4. Principle Component Analysis (PCA) ...................................................................... 16
5. Cluster Analysis ........................................................................................................ 18
6. Studied peptides and proteins ................................................................................... 18
Met-Enkephalin ......................................................................................................... 18
6-Hydroxymethyl-7,8-dihydropterin pyrophosphokinase (HPPK) .......................... 19
Chapter 2. A specific Hamiltonian Replica Exchange Method (HREM) and its
validation by application to Met-Enkephalin - 22
1. Introduction ............................................................................................................... 22
2. Methodology ............................................................................................................. 24
Hamiltonian Replica Exchange Method ................................................................... 24
Molecular Dynamics Simulation settings ................................................................. 27
Convergence tests for Principal Component Analysis (PCA) .................................. 28
3. Results and Discussion ............................................................................................. 29
Diagnostics ................................................................................................................ 29
PCA trajectory diagnostics ........................................................................................ 39
l-dimensional PCA analysis ..................................................................................... 4O
End-to-end (Tyrl terminal nitrogen to MetS terminal carboxylate carbon) distance
PMF and its comparison with DREM results ........................................................... 43
2-dimentional PCA Analysis .................................................................................... 47
4. Concluding Remarks ................................................................................................. 53
Chapter 3 A specific Hamiltonian Replica Exchange Method for the study of
EcHPPK and YpHPPK 56
1. Introduction ............................................................................................................... 56
2. Methodology ............................................................................................................. 59
Hamiltonian Replica Exchange Method ................................................................... 59

Molecular Dynamics Simulations ............................................................................. 6O

 

Deviation and ﬂuctuation Measures ......................................................................... 62
Clustering method and HP binding pocket proﬁle calculation ................................. 63
3. Results and Discussion ............................................................................................. 64
HREM Replica Exchange Diagnostics ..................................................................... 64
Comparing RMSD and RMSF for HREM trajectories with those for normal MD.. 66
Binding pockets for EcHPPK and YpHPPK ............................................................ 80
Test of adding water molecules in HP binding pockets ............................................ 90
4. Concluding Remarks ................................................................................................. 95
Chapter 4. Studies of HPPK-ATP conformation space and ATP binding through
using enhanced MD methods 97
1. Introduction ............................................................................................................... 97
2. Methodology ............................................................................................................. 99
Molecular Dynamics Simulations ............................................................................. 99
Restraint method ..................................................................................................... 101
Reweighting method ............................................................................................... 102
3. Results and Discussion ........................................................................................... 104
Restraint MD of HPPK-ATP .................................................................................. 104
Reweighting MD of HPPK-ATP ............................................................................ 108
4. Concluding Remarks ............................................................................................... 121
Chapter 5. Some potential useful modifications of the HREM approach ............... 124
‘ l. HTREM (Hamiltonian Temperature REM) ............................................................ 124
2. HRTREM: (Hamiltonian Restraint TREM) ............................................................ 125
3. HTREM_RMS (HTREM_RootMeanSquare) ........................................................ 126
4. HREM_Ion .............................................................................................................. 127
5. HREM_IonWater .................................................................................................... 127
Appendix. Description of implementation of HREM and several modifications.... 129
1. HREM ..................................................................................................................... 129
Initial Loading ......................................................................................................... 129
Exchange Attempt ................................................................................................... 129
Code excerpt of the driver of HREM implementation of CUKMODY .................. 131
2. The implementation of some possibly useful modiﬁcations based on HREM ....... 138
HTREM ................................................................................................................... 138
HRTREM ................................................................................................................ 138
I-[RTREMGEN ........................................................................................................ 139
HTREM_RMS ........................................................................................................ 139
HTREM_RMS_gradual .......................................................................................... 141
HREM_Ion .............................................................................................................. 141
HREM_IonWater .................................................................................................... 142
References- - 143

 

vi

List of Tables

Table 1.1. Typical motions in proteins ............................................................................. 10

Table 2.1. The acceptance ratio for the time span of the HREM simulation (exchanges are

attempted every 80fs.) ............................................................................................... 30
Table 2.2. The RMSIPs (for HREM window 1) for the three 3 ns intervals (3-6, 6-9 and
9-12 us) using the PCA ﬁrst mode, ﬁrst two modes and ﬁrst ten modes ................. 40
Table 3.1. Acceptance ratios for the HREM simulations; (a) for EcHPPK and (b) for
YpHPPK ................................................................................................................... 65
Table 3.2. The number of snapshots in the clusters constructed for EcHPPK. ................ 82
Table 3.3. The number of snapshots in the clusters constructed for YpHPPK ................. 83
Table 4.1. The atom(s) whose distances are used for the restraint simulations .............. 106

Table 4.2. The hydrogen bonds that are present in the stages of ATP separation ﬁ'om
HPPK. ..................................................................................................................... 1 16

Table 4.3. The salt bridges that are'present in the stages of ATP separation from HPPK.
................................................................................................................................. 1 18

vii

List of Figures

Images in this dissertation are presented in color.

Figure 1.1. Illustration of Periodic Boundary Conditions (The shadowed box is the

primary box and the boxes around it are its periodic images.) ................................... 7

Figure 1.2. The function of HPPK in the folate biosynthesis pathway73 .......................... 20

Figure 2.1. (a)-(e): Migration of systems into and out of a given conﬁguration for 3-6 ns.

(ﬂ-(i): Migration of conﬁgurations (replicas) into and out of a given system for 3-6
ns (1.,- scale value). The ﬁgures from (a) to (e) correspond to systems 1 to 5 and,

similarly, the ﬁgures from (t) to (j) correspond to conﬁgurations l to 5. In (a) the
points are vertically connected, and the coverage demonstrates the desired
itineration. Note that in view of the number of data points that are plotted, it appears
as if, at a particular time, several replicas occupy the same window or several
systems visit the same replica— this does not happen, looking at shorter intervals dots
will be separate. ........................................................................................................ 32

Figure 2.2. (a)-(e): Migration of systems into and out of a given conﬁguration for 6-9 ns.

(D-(i): Migration of conﬁgurations (replicas) into and out of a given system for 6-9
ns (11,- scale value). The ﬁgures from (a) to (e) correspond to systems 1 to 5 and,

similarly, the ﬁgures from (f) to 0) correspond to conﬁgurations l to 5. Note that in
view of the number of data points that are plotted it appears as if, at a particular time,
several replicas occupy the same window or several systems visit the same replica—
this does not happen, looking at shorter intervals dots will be separate. .................. 34

Figure 2.3. (a)-(e): Migration of systems into and out of a given conﬁguration for 9-12 ns.

(f)-(j): Migration of conﬁgurations (replicas) into and out of a given system for 9-12
ns (2.,- scale value). The ﬁgures from (a) to (e) correspond to systems 1 to 5 and,

similarly, the ﬁgures from (t) to (i) correspond to conﬁgurations 1 to 5. Note that in
view of the number of data points that are plotted, it appears as if, at a particular
time, several replicas occupy the same window or several systems visit the same
replica— this does not happen, looking at shorter intervals dots will be separate. 36

Figure 2.4. The end-to-end distances (Tyrl terminal nitrogen to Met5 terminal

carboxylate carbon) along the time line for: (a) 3 to 6 ns, (b) 6 to 9 ns, and (c) 9 to
12 us for the ’1] =1 system. ........................................................................................ 38

Figure 2.5. The PMF corresponding to the PCA mode 1 displacements for all lt=1 system

snapshots for the three time intervals of 3 to 6 ns, 6 to 9 ns, and 9 to 12 ns. ........... 42

viii

Figure 2.6. The PMF for the end-to-end distance generated by the HREM simulation for
all systems with ll=1 for the three time intervals of 3 to 6 ns, 6 to 9 ns, and 9 to 12 ns.
................................................................................................................................... 44

Figure 2.7. The PMF for the end-to-end distance generated by the HREM simulation and
DREM simulations. The PMF line for DREM is obtained by combining two
separate lines for the 3-11 A and the 8-16 A simulations ......................................... 46

Figure 2.8. The 2D PMF for the ﬁrst and second PCA modes for all X=l system
snapshots. Backbone CA stick plots of the conﬁgurations in the dense places are
shown with the distance between Tyrl backbone nitrogen and Met5 carboxyl carbon
shown. ....................................................................................................................... 48

Figure 2.9. CA wire plots for the two salt bridge conformers shown in Figure 2.8, with
the Gly—2 and Gly—3 backbone atoms shown explicitly that illustrates the ‘I’(2) and
$0) dihedral angle compensation mechanism. ........................................................ 49

Figure 2.10. Ramachandran plots for Gly2, Gly3 and Phe4 of Met-enkephalin; The three
plots in the upper row show results for snapshots that are in the PCA ﬁrst mode deep
well (—O.6 to —0.4 A); The lower row for all snapshots ............................................ 50

Figure 3.1. Migration Check for EcHPPK. (a)-(f): Migration of conﬁgurations (replicas)
into and out of a given system (2,. scale value). (g)-(l): Migration of windows

(systems) into and out of a given conﬁguration . The ﬁgures of a-f correspond to
systems 1-6 and the ﬁgures of f-j correspond to conﬁgurations 1-6. Note that in view
of the number of data points that are plotted it appears as if, at a particular time,
several replicas occupy the same window or several systems visit the same replica-
this does not happen,—~looking at shorter intervals, the dots will be separate. .......... 67

Figure 3.2. Migration Check for YpHPPK. (a)-(f): Migration of conﬁgurations (replicas)
into and out of a given system (2,- scale value). (g)-(l): Migration of windows

(systems) into and out of a given conﬁguration . The ﬁgures of a-f correspond to
systems 1-6 and the ﬁgures of f-j correspond to conﬁgurations 16 (Note that in
view of the number of data points that are plotted it appears as if, at a particular time,
several replicas occupy the same window or several systems visit the same replica—
this does not happen. Looking at shorter intervals, the dots will be separate. ......... 70

Figure 3.3. The RMSD for residues of Loop 2 and Loop 3 in EcHPPK: (a) and (b) are for
RMSD based only on backbone atoms for Loop 2 and Loop 3, respectively, while (0)
and (d) are based on all atoms. The RMSD based on normal MD are shown with
solid lines and the RMSD based on HREM are shown with dashed lines. The
RMSDs are average RMSDs calculated for all the snapshots in the two three
nanosecond long trajectories based on normal MD and HREM picked when the
RMSDs and RMSFs for the trajectories have been stable. ....................................... 74

Figure 3.4. The RMSD for residues of Loop 2 and Loop 3 in YpHPPK: (a) and (b) are for
RMSD based only on backbone atoms for Loop 2 and Loop 3, respectively, while (c)

ix

and (d) are based on all atoms. The RMSD based on normal MD are shown with
solid lines and the RMSD based on HREM are shown with dashed lines. The
RMSDs are average RMSDs calculated for all the snapshots in the two three
nanosecond long trajectories based on normal MD and HREM picked when the
RMSDs and RMSFs for the trajectories have been stable. ....................................... 75

Figure 3.5. Starting crystal structures for the simulations: (a) for EcHPPK (PDB entry

lqu) (b) for YpHPPK (PDB entry 2qx0) ................................................................ 77

Figure 3.6. The RMSF for residues of Loop 2 and Loop 3 in EcHPPK: (a) and (b) are for

RMSD based only on backbone atoms for Loop 2 and Loop 3, respectively, while (c)
and (d) are based on all atoms. The RMSF based on normal MD are shown with
solid lines and the RMSF based on HREM are shown with dashed lines The RMSFs
are calculated for all the snapshots in the two three nanosecond long trajectories
based on normal MD and HREM picked when the RMSDs and RMSFs for the
trajectories have been stable. .................................................................................... 78

Figure 3.7. The RMSF for residues of Loop 2 and Loop 3 in YpHPPK: (a) and (b) are for

RMSD based only on backbone atoms for Loop 2 and Loop 3, respectively, while (c)
and (d) are based on all atoms. The RMSF based on normal MD are shown with
solid lines and the RMSF based on HREM are shown with dashed lines. The RMSFs
are calculated for all the snapshots in the two three nanosecond long trajectories
based on normal MD and HREM picked when the RMSDs and RMSFs for the
trajectories have been stable. .................................................................................... 79

Figure 3.8. EcHPPK backbone atom conformations of pocket residues of the HP binding

pocket of the central structures for clusters 1 and 4 of HREM snapshots: The left
panel is for cluster 1 and the right for cluster 4. The central structure of each cluster
is shown in green, the starting crystal structure is shown in yellow, and HP, in
magenta, is placed according to the ternary complex crystal structure. ................... 84

Figure 3.9. YpHPPK backbone atom conformations of pocket residues of HP binding

pocket of the central structures for cluster 1, cluster 2, cluster 5, and cluster 6 of
HREM snapshots. The left panel is for cluster 1 (top) and cluster 5 (bottom) and the
right panels for cluster 2 (top) and cluster 6 (bottom). The central structure of each
cluster is shown in green, the starting crystal structure is shown in yellow and HP, in
magenta, is placed according to the ternary complex crystal structure. ................... 85

Figure 3.10. The HP binding pockets proﬁles for EcHPPK. The HP binding pocket

proﬁles are shown in bright red in the following plots. (a) shows the proﬁle for the
starting crystal structure, and (b) and (c) show proﬁles for the central structures of
cluster 1 and cluster 4, respectively. ......................................................................... 87

Figure 3.11. The HP binding pockets proﬁles for YpHPPK. The HP binding pocket

proﬁles are shown in bright red in the following plots. (a) shows the proﬁle for the

starting crystal structure, and (b), (c), (d), and (e) show proﬁles for central structures
of cluster 1 cluster 2, cluster 5 and cluster 6, respectively. ...................................... 88

Figure 3.12. (a) The conﬁguration of EcHPPK and four hydrogen bonded added water
molecules. (b) The conﬁguration after 1 ns normal MD simulation with the starting
structure as shown in (a). The added water molecules are shown in yellow. ........... 92

Figure 3.13. (a) The conﬁguration of EcHPPK with 20 added water molecules. (b) The
conﬁguration after 1 ns normal MD simulation with the starting structure as shown
in (a). The added water molecules are shown in yellow. .......................................... 94

Figure 4.1. A schematic representation of the MD simulation protocol. The cylinder
denotes residues that form a binding region for ATP with key residue side-chain
orientations denoted by curved arrows. ATP is represented as a Wiggly line. The
initial HPPK-ATPMgz binary complex is constructed from the ternary crystal
structure. Ten consecutive restraint simulations direct HPPK-ATPMgz toward the
apo crystal structure. ATPMgz remains bound throughout the process even though
residues initially trapping ATPMgz gradually move away, through trapping
intermediates, until reaching open trapping states. The reweight simulation then
shows that ATPMgz can separate from HPPK through a series of breaking and
making hydrogen bonds and salt bridges. ............................................................... 100

Figure 4.2. Superposition of representative structures from three windows along the
restraint pathway from closed to open. The structure for window 1, which is quite
similar to the closed form, is colored in yellow; the structure for window 5 is colored
in purple; the structure for window 10 is colored using the normal convention. ATP
in each window is shown in “ball and stick” mode, and residues Arg82 and Trp89
are shown in “stick” mode. It is clear that ATP remains bound to HPPK throughout
the restraint simulation ............................................................................................ 107

Figure 4.3. (left panel) CPK view of the starting MD simulation structure derived from
the ternary complex. (right panel) CPK view of the MD structure aﬁer maintaining
the restraints for 9 ns subsequent to the restraint opening protocol. ATP is colored
purple. The yellow residues, 86-89, which cover part of ATP in the initial ternary
complex, are mainly replaced by solvent interactions in the MD-opened structure.
................................................................................................................................. 109

Figure 4.4. ATPMgz for six different conformations as it separates from HPPK are shown
with yellow, green, blue, purple, orange, and red indicating stages one through six.
The gradual separation of ATP is evident as a combination of reorientational and
translational motion. ............................................................................................... 1 12

Figure 4.5. Details of the hydrogen bond and salt bridge interactions as ATP separates
from HPPK. Left panels, from top to bottom, are hydrogen bonds (dotted lines) for
stages 1, 3 and 5. Right panels, the corresponding salt bridges (solid lines). See
Table 4.2 for hydrogen bond and Table 4.3 for salt bridge particulars. ................. 114

xi

Chapter 1. Introduction

1. Molecular Dynamics Simulation

General description

3 is a form of computer simulation for studying molecular

Molecular Dynamics“
motion by allowing atoms to interact under force ﬁelds, which are developed to
approximate certain known physics. Normally, MD simulation can provide dynamic
information of a system at atomic detail. (The method only deals with the classical
Newtonian forces and does not explicitly take into account any quantum level details.)
Thus, in principle, MD simulation can be used to answer any questions about a model
system. In other words, in principle, not only any equilibrium property of the model
system can be evaluated by calculating a trajectory time average based on the ergodic
hypothesis (in practice, the MD trajectory usually only provides a sample of the
ensemble) , but also the real dynamics of the system can be obtained. By examining the
trajectory, not only can one calculate average system properties such as free energy but
also one can learn the main functional motions of a protein. Of course, in practice, a
study based on MD simulation will be limited by the time-scale affordable under current
MD technique and the correctness of the force ﬁeld parameters in use.

The algorithms used to implement MD simulation are all based on answering a
fundamental question3: Given the current conﬁguration of the system, what is the next
one? MD propagates the model system in time by answering the above question
repeatedly. As the method only deals with classical dynamics, MD propagates the system

following the classical equations of motion (equations 1.1 and 1.2):

= _ _ . 1.1
, ari art. f, ( )
api ml

where, ’1' , pi_ ft and mi are the coordinate, momenttun, force and mass of particle i,

respectively, while H and V are the relevant Hamiltonian and potential. In practice for
MD, the above ﬁrst-order differential equations are usually solved for each particle to
obtain better accuracy solutions of the equations of motion although the second-order
Newton’s Equation (equation 1.3):

ft = mi 'rl. (1.3)
are sometimes used for implementation due to efﬁciency concerns. Therefore, in general,
after the composition of the system (including the number of atoms and their types,
masses and interaction potentials) are speciﬁed, there are three steps involved in the
process used to propagate the system. First, one needs to assign initial values of positions
and velocities to the particles within the model system. In terms of biological systems,
simulation is usually started out with the crystal structures of the molecules of interest
and, when conducting explicit solvent simulation, solvents can be added by either putting
the system in a lattice box or an equilibrated box of solvent molecules. Then, the initial
velocities are usually either generated randomly from a Maxwell distribution for a
speciﬁed temperature3 or assigned to a special value (typically zero). Second, forces are
evaluated according to the atoms’ interaction potentials used to deﬁne the system. In fact,

potentials are deﬁned through choosing developed force ﬁelds. Third, an integrator is

used to predict the next conﬁguration given the chosen time step. The second and third

steps are repeated until reaching the pre-set number of integration steps.

Force Field

In MD, a molecule is generally described as a series of charged points (atoms)
linked by springs (bonds). Therefore, the relevant potential interactions in a MD
simulation can be decomposed into the following interaction terms: bonding terms for
bond length vibration, bond angle vibration, and proper torsion and improper torsion
angles, and non-bonded Lennard-Jones and electrostatic terms. These terms must be
parameterized to obtain a force ﬁeld. The parameterization is obtained by reproducing
molecular geometry and selected properties of tested systems and the parameter values
are chosen to ﬁt experimental results or ab initio quantum data. By now, a substantial
number of force ﬁelds have been developed, including some popular ones for biological

systems such as the AMBER4, CHARMM5 and GROMOS6 force ﬁelds.

Integrator

Many integrators3 have been developed for MD simulation. One of the most
popularly used integrators is the leapfrog algorithm which is a modiﬁed version of the
standard Verlet algorithm mainly used for integrating the second order Newtonian
equations. The Verlet algorithm uses the positions and accelerations at the time t and the
positions at the time t - At to predict the positions at the time t + At, where At is the
integration step (time step). By using a Taylor expansion, it can easily be derived that the
errors in the atomic positions and velocities are of the order of At4 and Atz, respectively.

To obtain more accurate velocities, the leapfrog algorithm is used, using velocities at half

time step. The following two equations (equations 1.4 and 1.5), in which r represents
atom positions, t represents time and At represents one time step, show the leap-frog

algorithm7.
. At
r(t+At)=r(t)+r(t——2—)At (1.4)

At

f[t+£:—t)= '(t—-2—]+F(I)At (15)

One thing worth noting is that the integration step size should be chosen as large as
possible to make the simulation efﬁcient but small enough to catch the fastest motion (As
a rule of thumb, roughly the step size needs to be set as about 1/10 of the time scale of the
fastest motion.) Normally bonds involving hydrogen atoms vibrate in the scale of 10 fs.
Therefore, in order to make the MD simulation more efﬁcient, the SHAKE8 algorithm is
usually used to constrain at least hydrogen related bonds so that the proper time step size
can be increased to 2 fs. Since, for the normal simulation time scale, those quick bond
vibration motions will be averaged out, using the above constraints should be a good

approximation.

Temperature and pressure control

When only solving the Newton’s equation with no temperature modiﬁcation, then
the total energy of the system is conserved. (This condition is oﬁen used as a measure to
check the accuracy of the integrators used.) Therefore, without temperature modiﬁcation,
the MD simulation is in fact performed in the microcanonical (ﬁxed number (N), volume
(V) and energy (E)) ensemble. Although MD simulations within the microcanonical

ensemble are the easiest to perform, in most situations, the canonical (ﬁxed number (N),

volume (V) and temperature (T)) or the isothermal-isobaric (ﬁxed number (N),
temperature (T) and pressure (P)) ensemble is more appropriate for normal experimental
conditions. Therefore, we need methods to control temperature and pressure.

A number of methods have been developed to control temperature or pressure 9.
One popularly used method is the Berendsen method“); the essential idea behind the
method is that, in fact, the system in simulation is not isolated but coupled to an external

bath. So, Berendsen at al proposed the following equation of motion with a modiﬁcation

. T
on velocities of atoms, F,- = 1'— + i(r7ef_ -1)i‘i ,

where, r is the coupling time, and T refis
mi 22'

the reference temperature, which is the temperature pre-chosen for the external bath. T is

K , where k is the Boltzmann

the instantaneous temperature, which is equal to B

2
kB(Nd)

constant, Nd is the number of degrees of freedom in the system and the kinetic energy is

N
K = Z gmifiz. The extra term added in the above equation, in fact, serves as a frictional
i = 1

force and, hence, the actual implementation of the method simply employs a feedback

mechanism to control the temperature. The velocity is scaled by a factor

 

-T
0' = J1-91_7Ji . Usually, the coupling time, t, is set to around 0.2 ps. The principle
I

behind controlling pressure is similar to that behind controlling temperature. The
instantaneous pressure P is calculated as P =§27(K — Virial) with V the volume, K the

kinetic ener and Virial the virial function3. Thus, for an isotro ic s stem, the ressure
gy P y P

is controlled by sealing the box side and all atoms’ positions by a

l

t P-Pref 3

factory = l-% P , where B is the coupling time for pressure control. The

implementation of pressure control is much more complex for anisotropic box, in which
case a pressure tensor has to be used”). As shown above, the implementation of
Berendsen method is quite simple, but the compromise for this simplicity in
implementation is that the Berendsen thermostats, in theory, do not lead to any known
ensemble. To guarantee an NVT or NPT ensemble theoretically, more complicated
methods such as Langevin dynamics method11 or Nose-Hoover method12 need to be

introduced.

Periodic Boundary Conditions (PBC)

In mathematical models and computer simulations, periodic boundary conditions
(PBC) are a set of boundary conditions that are often used to simulate an inﬁnite system
by using only a ﬁnite size box. Since usually only a small box of atoms (usually
referenced as primary box) can be afforded in practical MD simulation, the boundary
particles’ interactions are a serious concern. The PBC is introduced to help address this
difﬁculty.3 In the PBC approximation, an inﬁnite system is represented as a periodically
repeated array of the ﬁnite small system being simulated. A schematic diagram for a two-
dimensional system is shown in the following Figure 1.1. Clearly by using PBC, certain
artiﬁcial periodicity (as shown in Figure 1.1, the simulated system is constructed by
putting periodic images around the primary box, so there clearly are certain periodicities
in the constructed system which might not exist in the real system) is imposed on the

system under study. But, it can be shown that this system can still represent a real system

provided that an appropriate box size is used.3 In conjunction with using PBC, the long-
range interactions can be calculated approximately by using cut-off methods or, more
accurately, by using an Ewald Lattice Sum.3 Details of calculation of long range

interactions are given in the next paragraph.

Figure 1.]. Illustration of Periodic Boundary Conditions (The shadowed box is the

primary box and the boxes around it are its periodic images.)

 

 

 

 

 

 

 

 

Calculation of long-range (non-bonding) interactions

For most current MD simulation force ﬁelds (including the one described in more
detail before), the non-bonded energy an for the molecular system can be written as a

sum of pair-wise electrostatic and Lennard-J ones contributions as shown in equation 1.6:

qiq Ar" B1"
an= Z(_—— j .+—12——6j (1.6)

47:8087241- I“..-

When conducting simulation using PBC concept, as far as the system is
determined, all its images are determined. Therefore, in theory, all the long-range
interactions can be calculated by the above expression. But, in practice, it can be seen that,

in the ﬁrst term of the above equation representing the electrostatic contribution, the

order of rij, the distance between two atoms, is -1. It can be shown that a summation of

inqj

r

 

the long-range interactions whose energy has the form of with p S 3 , converges

very slowly.” (Of course, if p 51, the summation does not converge at all.) Probably the

easiest method to solve this problem in calculation of those non-bonding interactions is to
use a cut-off (truncation) method, which simply ignores all interactions beyond some
chosen threshold distance (typically 10 A). But later it was found that for proper
simulation of many biological systems more efﬁcient methods, such as Ewald
summation3 are needed.14 Using an Ewald summation method would allow the long range
interactions to be calculated fully for a periodic system. The Ewald method is
implemented by dividing the summation calculations into two parts, within the so-called
reciprocal space and within the so-called real space. Then, assuming a charge neutral
system, the two summations both converge fairly rapidly given the special formula

developed by Ewald in 1921.3 A naive implementation of the Ewald summation method
would be implemented in 0(N2), which is too expensive for many biological systems
such as proteins simulated in explicit solvents. There are specialized implementations of
the Ewald method, which can run as 0(N log N). A popular implementation is called the

Particle Mesh Ewald (PME) method”, which makes interpolation of the reciprocal space
structure factor in the lattice points using Cardinal BSpline interpolation method and

utilizes the popular Fast Discrete Fourier Transform (F DFT) method.

2. Methods for enhanced sampling

General Introduction

Although, in theory, MD simulation can provide all structural, equilibrium and
dynamic information of a system under study at atomic detail, the quality of the
information obtained is limited by the quality of the force ﬁelds in use and the extent of
the coverage in the sample space. In our work we have focused on the sampling problem
(In other words, the force ﬁeld issue, which is another important problem cumbering
effective application of MD simulation, is not the focus for the work summarized in this
dissertation.) The current affordable time scale for conducting MD simulation through
integrating Newton’s equation of motion with a femtosecond time step is usually only
tens of nanoseconds for a normal-sized protein simulated under explicit solvents unless
large parallel machines are used so that hundreds of nanoseconds or a microsecond is
achievable. ,Thus, using MD simulations to study conformational changes that may occur
on micro second to seconds time scales is problematic.16 (Some typical motions in
proteins are listed in the following Table 1.1.) Proteins usually have complex energy
landscapes with energy barriers many times the thermal energy, RT (for room
temperature at 300 K, RT is around 0.6 kcal/mol), while going over barriers is

’5“ MT , where k is rate

exponentially hard according to the Arrhenius equation, k = Ae
constant, A is a pre-factor, and Ea is the activation energy. Therefore, structures are
frequently trapped in local energy minima where the thermal energy of the system is
insufﬁcient to traverse the energy barriers in normal MD simulation times. Special

techniques, such as the replica exchange method (REM) and the reweighting method,

have been developed to, respectively, help surmount or reduce those barriers among

minima so that, by using those methods appropriately, a better sampling within the given

affordable amount of time might be obtained.

Table 1.1. Typical motions in proteins

 

 

 

 

 

 

 

 

Protein Motion 10810 of Characteristic times (5)
Local denaturation and opening of secondary 3 to -1

structure

Aromatic side-chain rotation l to -5

Conformational changes and allosteric trasitions 0 to -5

Diffusion -6 to -9

Segmental motion -7 to -9

Aliphatic side-chain rotation -8 to -9

Vibrational and torsional modes ~11 to -13

 

 

 

 

Replica Exchange Methods

The Replica exchange method (REM)17'24 is a popularly used method designed to
address the sampling issue. The method was ﬁrst developed for the simulations of spin
systems 22 and now is widely used to study peptides and proteins. The ﬁrst version of the
REM introduced into the MD simulation ﬁeld was temperature replica exchange method
(TREM), and recently a Hamiltonian REM (HREM) was introduced into the ﬁeld by
Fukunishi et al.25 The essential algorithm for all the REM approaches contains two parts.
In part one, each REM system (a REM system is characterized by a certain property, e.g.,
temperature for TREM or Hamiltonian for HREM) containing a certain conﬁguration is

simulated simultaneously and independently for a certain number of MD steps. In part

10

two, exchange attempts are made among systems according to a Metropolis-Hastings
algorithm.9 (To improve the effectiveness, usually exchange attempts are only conducted
between adjacent systems). In the TREM, the low temperature systems can borrow fast
equilibration properties from high temperature systems, providing doorways for those
low temperature systems to overcome energy barriers and thus improve the sampling.26
Of course, the sampling of the low temperature system will not be guaranteed to improve
through exchanging with high temperature systems because, for example, the high
temperature system may simulate the same sample space as the low temperature system
or the high temperature system wastes most time on sampling unproductive regions in the
sample space. The TREM suffers from the deﬁciency that the number of system copies
needed is approximately proportional to the square root of the number of degrees of
freedom of the system of interest.25 The HREM was introduced to address the above-
mentioned difﬁculty with TREM.25, In HREM, the potential energy function in different
Hamiltonians for the systems differ only in a limited set of the total number of degrees of
freedom required to characterize a system, thereby, in principal, reducing the number of
systems needed. Of course, the basic issue of implementing HREM becomes choosing

which degrees of freedom to focus on.

Reweighting

The essential idea underlying a reweighting method27'3‘, which is a form of
umbrella sampling9, is that we can use another potential to run a trajectory, and then by

reweighting back we can still get the original ensemble average of any conﬁgurational

property. The ensemble average of any property A(r~ ) , which is a function of the
system conﬁguration r” , can be calculated as shown in equation 1.730

11

A ~ _ ldr”A(r”)e"“"”’_1mm)ewrwte-W)
< (r )> — [drive-WON) " IdrN {e+ﬂAV(rN)}e—'6V.('N)

: A (rN)e+,6AV(rN) e+ﬂAV(rN)

 

 

 

t+T Ns
limo A(r.~(s))e*f"’l’* ()st
T—-)oo to
= t +T
lim 0 e+ﬂAVIriv")]ds
T—)oo to

(1.7)

where, the potential energy V=(V+AV)—AVEV*—AV deﬁnes V* , a modiﬁed
potential energy surface upon which the dynamics is carried out, <..>* denotes an
average with modiﬁed ( V* ) weighting and r.” (s) (to SsStO+T) denotes the

V *modiﬁed trajectory over the simulation time interval, T. The last identity indicates
that, with the assumption of ergodicity, the ensemble averages are to be evaluated as time
averages over the trajectories generated from the modiﬁed surface. At each step of the

dynamics on the modiﬁed surface, the data can be “re-weighted” with the factor

exp( ,BA V) to guarantee that the average is Boltzmann weighted. If barriers are reduced

by use of the modiﬁed potential surface, the modiﬁed trajectory will explore the
conﬁguration space more rapidly.

Some successful examples of application of the method including protein
folding”, exploration of dihedral conformational space in the alanine dipeptide 33, and

study of HIV-1 protease 34.

l2

The most technical and difﬁcult part for reweighting method is to choose V*.

32-34

Examples of VI choices from some developed reweighting methods are shown in the

following equations 1.8 to 1.10.
V“ =expt(—V<rN)—V.h..>/nf1 (1.8)

if Vm > V(rN), then V* = AV(rN ) + V(rN )

 

with
_ N 2 (1.9)
AVON) = (Vthre V(r )])V
a + (Vthre — V0" ))
V* =1- exp[(—7V(rN ) — Vthre )] (1.10)

where, V is the true potential energy; Vthre is called a threshold energy and is a constant

chosen before conducting simulation; ”f in eq. 1.8 is the number of degrees of freedom of
the system; a and y are constants used in equations 1,9 and 1.10, respectively, to further
modify the energy landscape. Because a good choice of ere, and a and y, needs to be

made based on some knowledge of the energy landscape, usually a substantial amount of

trials need to be made before the ﬁnal choice can be made on those values.

3. Methods to compute the potential of mean force (PMF)

To ﬁnd a low energy path and barriers between different states, a potential of

mean force (pmf)35 is deﬁned as: W (g) = —len p(§)+W0, where, W0 is an arbitrarily

13

chosen reference value to set a zero ﬁee energy; and [2(6) is the distribution ﬁmction

which is deﬁned in following equation 1.11 as:

)_ Idr3N6(§'[r3N ] - g)exp[-V(r3N )/kT]
p(§ — * Idr3N exp[—V(r3N)/kT]

 

(1.11)

in which, V , the potential energy, depends on the 3N coordinates of the system, r3”; 4‘ is a

reaction coordinate, which is a ﬁmction depending on a certain number of degrees of

ﬁeedom in the system (e.g. an angle or a distance of interest). The 6(4‘ '[r3N ]-5) is a

delta function used to pick out the reaction coordinate value 5 of the snapshot being
checked, 2," '[r3N ]. Of course, in practice, a certain range will be used to consider

4‘ '[r3N ] being equal to 4’.

Usually, there will be some barriers along the chosen reaction coordinate that are
difﬁcult to overcome by normal MD simulation. Then, some artiﬁcial potential'terms
need to be added to help the sampling. One widely used scheme is the Umbrella
Sampling scheme.36 For applying this method, a restraining potential UK) (sometimes
called umbrella potential or window potential) is used to help improving the sampling of
speciﬁc regions around 4‘, UK) can be whatever which is convenient and proper, though
the most commonly used potential is the harmonic potential. Then, the chosen path along
reaction coordinate will be divided into properly spacing windows to provide enough
overlap in the reaction coordinate distributions for neighboring windows, and the
expression of the restraint potential biased distribution for some window i can be

represented as shown in equation 1.12:

pm) = expert/45)»? (r) < exp(-ﬂU.-(§)) >;‘ (1.12)

14

where, p," (9‘) is the unbiased (without the restraint potential) distribution for the 1"“

window. Substituting it into the WK) expression , the corresponding W,- can be obtained

for each window, which is shown in following equation 1.13,

W1<r>=—kT1np.-”(5)+W1 —U.-(:>+/,- 1111)

where, f,- is deﬁned as

fi = "kT 1n < 3XP(“13U1(§)) >1 (1.14)

There have been numerous efforts spent to address problems of unbiasing and

recombining the information gotten from all windows. WHAM37’38 is currently the most

popular method since all traditional methods, which calculate fi by overlapping ,0? (4‘) as

well as possible, only use data within overlapped area between neighboring ,0? (4‘) while
wasting a substantial amount of data outside the overlapped area. WHAM instead linearly

combines the ply (9’) to one unbiased distribution function:

p“(5)=:p,-(:)p;‘(r). :p,(;)=1 11.15)
1.1...
"1- exp<-ﬂ[U1(é‘)-f.-]>
injexp<-ﬂ[U,-(r)—ﬁ])

 

191(5):

(1.16)

WHAM scheme is developed to make the statistical error minimal.39 The ft ’3 can be

obtained from:

15

BXP(-ﬂf.-)= jdrp“(:)exp<—ﬂU.-(«:)>
zidf N n1eXp(-ﬂU.-(5)) P515) (1.17)
gnjexp(—3[U.(5)—ﬁ])

 

By plugging eq.1.14 into eq 1,12 and making some rearranging, we can get the following

equation:

p.” (4")=exp(-ﬂ[U1(§)-ﬁl)p1b(é‘) 1111)

Using the above four equations (equations 1.15 to 1.18), the fi can be computed
self-consistently (note UiK) is the added window restraint potential which is known),
and thus compute the unbiased distribution function ,0“ (5) shown in eq. 1.15. And, it is

clear that this method can be easily extended to higher dimensional distributions naturally.
For example, for two dimensions, the only difference is that the iteration will be carried

over ij instead of fi Though for practical issue, the number of data points needed for

pmf construction exponentially depends upon the dimensionality.

4. Principle Component Analysis (PCA)

Principle Component Analysis (PCA) is a statistical multivariate analysis method used in
many ﬁelds“). The analysis method essentially transforms a number of possibly
correlated variables into a smaller number of uncorrelated variables that are usually
called principal components. The ﬁrst principal component is constructed to account for

as much of the variability in the data as possible, and then each succeeding component is

16

constructed to account for as much of the remaining variability as possible. In ﬁeld of
simulations of biological systems, PCA is essentially used to extract those modes with

biggest ﬂuctuations for further study.“‘15 One basic method to implement PCA is to

diagonalize the covariance matrix for coordinates of atoms, Covij =< 613-5r j > , where 5r,-

is deﬁned as r,- - <I’i> , with r,- = {x13 yi,zi} denoting the Cartesian components of the

position of the ith atom.and <...> denoting the ensemble average. By diagonalizing, the
orthonormal eigenvectors and corresponding eigenvalues of the covariance matrix can be
obtained. Then the conﬁguration point r(t) = (x1 (I), yl (t),..,zN (t)) can be decomposed
as:
3N
r(t)=;[r(t)omi]mi =12»,- (1)1111. (1.19)

where the m,- are the orthonormal eigenvectors of the covariance matrix Covij , and the

p, (t) are the corresponding PCA mode displacements. Typically the ﬁrst few

“important” modes have much larger eigenvalues than the remaining modes. (Normally
the number of remaining modes is much larger than that of the ﬁrst few important modes).
The ﬁrst few modes are considered to be important since they often describe large scale
motions such as conformational changes while the remaining modes with small

eigenvalues usually correspond mostly to small Gaussian ﬂuctuations.

l7

5. Cluster Analysis

Cluster analysis or clustering is a common technique for statistical data analysis.46
In essence, clustering is the assignment of objects into groups (referenced as clusters) so
that objects from the same cluster are more similar to each other than objects ﬁom
different clusters. Often this similarity is assessed according to a distance measure. One
widely used clustering method in analysis of simulated trajectories for bio-systems is the
“g_cluster” method“, which uses the root mean square distance (RMSD) as its similarity
measure. For each input structure, the method counts the number of neighbors within an
RMSD cut-off, takes the structure with the largest number of neighbors, along with all its
neighbors, as a cluster, and eliminates all the structures within this cluster from the pool
of structures. The above steps are repeated for the remaining structures in the pool until

8

no structures are left. There are other clustering methods such as k-means4 or

hierarchical clustering.49

6. Studied peptides and proteins

Met-Enkephalin

Met-enkephalin is an endogenous opioid peptide neurotransmitter found in the
central nervous system and gastrointestinal tract where they bind to opiate receptorsso’5 l
The pentapeptide (tyr-gly-gly-phe-met) is one of the two forms of known enkephalins
and the other is leu-enkephalin (tyr-gly-gly—phe-leu). The opioid peptide
neurotransmitters are shown to play important roles in pain mediation, opiate dependence,

and euphoria.52’53 After being identiﬁed in 1975, a substantial amount of studies have

been performed to investigate the enkephalins. These studies include experimental

18

studies such as nuclear magnetic resonance (NMR)53'55, infrared GR)“, ultraviolet
(UV )57’58, and circular dichroism (CD)59’6°, and numerous computations.'9’32’61459 Almost
all studies show that the enkephalins exhibit great conformational plasticity, and possibly
because of the ﬂexibility of the peptides, the exact dominant conformations of the
peptides are still unknown (conﬂicting evidences for structures are obtained.) However,
as will be discussed in more detail in Chapter 2, it seems to be hard to get good samples
of met-enkephalin or leu-enkephalin with normal MD simulation. So some part of the
work summarized in this dissertation is focused on using a special HREM approach to
help improve sampling of met-enkephalin. The simulation work of met-enkephalin is
being conducted in explicit water, because although the receptors of enkephalins are
usually membrane proteins, the regions of those membrane bound receptors to which
enkephalins bind are considered to probably be in the aqueous parts”.

6-HydroxymethyI-7,8-dihydropterin pyrophosphokinase (HPPK)

Folate derivatives are essential cofactors for life; mammals, including humans,
obtain folates ﬁ'om their environment, while most microorganisms must synthesize
folates de novo via the folate biosynthetic pathway. Therefore, biosynthetic pathway of
folates is an important target for the development of antibiotics to treat infections from
microorganisms. Due to rapidly increasing antibiotic resistance in recent years which has
rendered the current antibiotics ineffective for treating many microbial infections,
resulting in a worldwide health care crisis, new antimicrobial agents are urgently needed
to control the infections that are resistant to the current antibiotic drugs.7l

6—Hydroxymethyl-7,8-dihydropterin pyrophosphokinase (HPPK) catalyzes the

transfer of pyrophosphate from ATP to 6-hydroxymethyl- 7,8-dihydropterin (HP)72’73 and

19

is a new target in the folate biosynthesis pathway for developing antimicrobial agents. Its

function mechanism is shown in the following ﬁgure73 :

Figure 1.2. The function of HPPK in the folate biosynthesis pathway73

0
Hu’uﬁjﬂou

1.11/b.
1 ATP

HPPK Mg”

i) i N :o—ii—o—ii-o’
43: *1 Ti «1- 1

A number of studies have been conducted on HPPK72'82, most of the studies focus

 

 

 

on E. coli. HPPK.72’74'79’8"82 The crystal structure of apo E.coli HPPK 74 reveals a three-
layered a—B-a fold formed by six B-strands and four a-helices. The fold of the HPPK
molecule creates a valley that is approximately 26 A long, 10 A wide, and 10 A deep.
Three ﬂexible loops, [32-133 (loop2), [31-011 (loopl), and 012-34 (loop3), form one wall of
the valley. The other wall of the valley is relatively rigid and is constructed by the part of

the protein’s hydrophobic core. The crystal structure of the ternary complex E.coli HPPK

20

with AMPCPP (a non-reactive analog of ATP) and two associated Mg2+ ions and with
HP shows signiﬁcant conformational changes relative to the apo structure concentrated in
the ﬂexible loops75 that serve to sequester the ligands in a catalytically competent form.
In the ternary form”, HP is sandwiched between two aromatic rings of Phe123 and Tyr53
and forms six hydrogen bonds with residues Thr42, Pro43, Leu45, and Asn55. Twelve
residues are involved in binding AMPCPP, Gln74, Glu77, Arg84, Arg88, Try89, Arg92,
Ile98, Arg110, Thr112, HisllS, Tyr116, and Arg121, of which Glu77, Arg92, HisllS,
and Arg121 are conserved". The crystal structure data on the binary complex of E. coli
HPPK with AMPCPP show that this binary complex is trapped in a “super open”
conformation where one of the loops (loop 3) is in an extended conformation.76 NMR
studies of HPPK with a related non-interactive ATP analog, AMPPCP present on average
a different but also more open (than apo) conformation.79 Actually, examination of the
ensemble of structures reveals that some are in conformations that approach the ternary
closed conformation. The NMR results suggest that the binary complex of HPPK with
ATP most likely is labile, sampling both open and closed-like conformations.

In part of the work summarized in this dissertation, a special HREM approach is
applied to study the HP binding pockets of E. coli HPPK and Y. pestis HPPK in the hope
of discovering different important HP [binding pocket conformations of the proteins,

which might be used to develop new narrow band antibiotics.

21

Chapter 2. A specific Hamiltonian Replica Exchange Method (HREM) and its

validation by application to Met-Enkephalin

1. Introduction

In the work covered in this chapter, the conformational space of Met-enkephalin
is explored with the use of the Hamiltonian Replica Exchange Method (HREM) applied
in explicit solvent MD simulations. As mentioned in Chapter 1, Met-enkephalin has been

53-56,58,60 and

shown to exhibit great conformational plasticity by experiments
computation.'9’3 2’61'69 Due to this plasticity and its practical importance as an important
opioid peptide, Met-enkephalin is popularly used to test methods that can enhance the
rate of conformational sampling. Of particular interest is the zwitterionic form
(protonated N-terminus and ionized C-terminus), which should predominate in polar
media such as water. The competition between “closed” forms, where the N and C
termini are salt-bridged, and charge solvated “open” forms in which the terminal peptide
charges interact with solvent dipoles may lead to reasonable co-existence of closed and
open forms. A number of MD-based simulations of Met-enkephalin and the closely
related Leu-enkephalin (both of them are enkephalins mentioned in Chapter 1) have been
carried out. Berendsen et a1.63 noted that the zwitterionic form of Leu-enkephalin was
quite labile but only sampled folded, closed forms during their simulation. Smith et al.64
found at neutral pH their simulated zwitterionic form of Leu-enkephalin had a roughly
equal mixture of close and open states. They also found that starting from an open form
the peptide rapidly closed, but eventually re-opened on their 10 ns time scale (only once

during the entire simulation). Nielsen et a1.65 simulated Met- and Leu-enkephalin in

zwitterionic forms and found a rapid one-way transition from open to stable, closed

22

conformers. Freed et al.(’6 compared the results of explicit and implicit solvent
simulations and found mostly compact with some open conformations. Karvounis et al. 68
simulated the zwitterionic form of Leu-enkephalin in explicit solvent initiating a number
of trajectories from open forms and found a variety of behaviors including a persistent
salt-bridge form. Garcra et a1.69 simulated neutral Met-enkephalin in explicit solvent
using the Temperature Replica Exchange Method (TREM) with 17 replicas (we call
replicas “systems” in this dissertation) used. Garcia’s study demonstrated the enhanced
sampling capability of a REM relative to conventional MD, showing that it could
surmount the barriers separating non-helical from helical conformations in the simulation
interval. One thing noticeable of the above mentioned simulation work is that, although
the enkephalins are considered to be highly ﬂexible according to experimental evidence
(details are in Chapter 1), for the zwitterionic forms of enkephalins, which should be the
predominant forms in water, no study has achieved a decent number of transitions
between closed and open forms during their simulation time scales. Assuming the co-
existence of the open and closed forms, then, to make meaningﬁrl statistical inferences of
the enkephalins (required for the evaluation of potentials of mean force and equilibrium
constants) in their conformational spaces not only requires sampling both forms but also
requires achieving a large number of transitions between the open and closed forms
during simulation. The co-existence is highly probable and supported by above-
mentioned simulation results and experimental studies listed in Chapter 1. Therefore, in
the work of this chapter, a special version of Hamiltonian Replica Exchange Method
(HREM) was developed and applied to address the aforementioned sampling issue for

Met-enkephalin.

23

2. Methodology

Hamiltonian Replica Exchange Method

As described in Chapter 1, the REM concept was ﬁrst introduced as Temperature
Replica Exchange Method (TREM), which suffers from the deﬁciency that the number of
system copies needed could be very large for large system”. To solve ths problem, the
REM concept was generalized to a Hamiltonian Replica Exchange Method25 (HREM)
where the systems differ by their Hamiltonians (in practice, systems usually differ only in
their potential energy functions and their kinetic energy parts are left intact). As a matter
of terminology, we shall refer to these different Hamiltonians as systems (versus replicas),
since replica connotes a copy of an item. In fact, for HREM, it is more natural to use this
terminology and the term replicas will be reserved for the conﬁgurations that are present
on any particular MD step. In CUKMODY83’84, the MD program deve10ped in our group
and the simulation program used for all the work in this dissertation, a given
conﬁguration (replica) is maintained on a particular computer node and the systems (with
different potential ﬁmctions) move onto and out of that node. The Hamiltonian for the i111

system within the extended HREM ensemble can be represented as,
Hi(X,P) =T(P)+Vi(X) where T(P) is the kinetic energy and VI(X) is the potential
energy function for the ith system with phase space coordinates X, P. Exchange attempts

are made regularly at certain predetermined MD steps and, for the HREM
implementation in this work, the attempt was made every 40 steps. Between exchange
attempts, normal MD is run for each system. The exchanges may be thought of either as

conﬁguration exchanges or potential energy ﬁrnction exchanges, which will be scale-of-

24

interaction in this HREM implementation. Computationally, it is much more efﬁcient that
only a scale factor for the potential energy function needs to be exchanged, versus
exchanging conﬁgurations. Exchanges are attempted only between neighboring systems,
because for the method to be efﬁcient the overlap between the systems’ probability
distributions needs to be adequate. (Efﬁciency can be roughly measured by the exchange
acceptance probability, which is the fraction of successful exchange attempts, and is a
compromise between the speed of motion and step size through the conﬁguration space.)
When system interchanges are attempted, detailed balance equations for pairs of

neighboring systems
101,11. x:x> 1(X)P1-(X’) = am —> x, x91:- (xv;- (X) 12.11
are enforced. Here, a (X, X' —> X', X)is the acceptance probability (transition probability)

that conﬁguration X in the ith system and X' in the jth system before exchange results in

conﬁguration X' in the ith system and X in the jth system after exchange, and R (X) is

the Boltzmann distribution at temperature T =1/kBﬂ for the ith system. The Metropolis

rule for exchange between two systems,

a (X, X' _) X', X) = min (I, e—A(X,X'—>X’,X))

(2.2)
where
A(X’X""x"x)=ﬂ[(Vi(X')—Vj(X'))+(Vj(X)—Vz‘(x))] 12.31

is used to impose the detailed balance equations to guarantee that Boltzmann equilibrium

in the extended ensemble of the product of all the systems’ ensembles will result for a

25

sufﬁciently long traj ectory.25 If the potential functions differ by a restricted set of degrees
of freedom, only those will contribute to equation 2.3.
In the HREM implementation for the Met-enkephalin study, the potential energy

is parameterized as

V (X) = 2.2V”, (xP,xP)+/1iVPS (xP,xS)+ VSS (xS,xS), (2.4)

l l

where the terms in equation 2.4 denote peptide-peptide, peptide-solvent and solvent-

solvent interactions, respectively, and x1, is a scaling factor for the Lennard-Jones and

electrostatic nonbonded interactions. In explicit solvent simulations, the number of

degrees of freedom is dominated by the solvent. Thus, the indicated scaling is much

reduced relative to the TREM where the global scaling ,BVi (X) = ﬂAiV(X) would be

used.

The form of scaling in equation 2.4 is a requirement for the use of an Ewald
method”, where the evaluation of energy and force in the reciprocal space is based on a
structure factor (a sum over atoms), and consequently necessitates assignment of the

scale factor to atoms, versus to atom-atom interactions. The electrostatic and Lennard-

J ones interactions are uniformly scaled; therefore, as ,1, decreases, both softer Lennard-

Jones and reduced electrostatic interactions are obtained, permitting sampling
enhancement.

One thing noticeable for REM schemes is the number of steps between two
adjacent exchange attempts. Because by imposing the correct detailed balance equations,
at equilibrium, all the systems in the extended ensemble will have corresponding

Boltzmann distribution. The Boltzmann distributions are guaranteed when the whole

26

extended ensemble is at equilibrium, based only on the following two assumptions.
Assumption one, the distributions of the systems within the extended ensemble are
independent. Assumption two, the moving process of the whole extended ensemble is a
Markov Chain process with only one equilibrium state. Therefore, for different choices of
the number of steps between two adjacent exchange attempts, as long as the
corresponding extended ensemble is at equilibrium (in practice, this “at equilibrium” is
checked by checking whether the trajectories have converged), the systems should
sample canonical ensembles. Of course, the optimal choices of the number of steps
between two adjacent exchange attempts will be the one that make the trajectories
converge most rapidly,

The normal potential energy function is being used in system 1 (2, =1), so that the

trajectory associated with it samples the normal canonical ensemble. Therefore, all our
analysis in the later Results and Discussion section was based on trajectories for system 1,

and hence for the normal canonical ensemble.

Molecular Dynamics Simulation settings

The CUKMODY protein molecular dynamics code, which uses the GROMOS966
force ﬁeld, was modiﬁed to incorporate the HREM, based on the previous DREM83 code.
The systems were run independently on different nodes of a Linux cluster computer and,
when exchanges were attempted, information was passed between different computer
nodes using technique conforming to the Message Passing Interface (MPI) standards.
SHAKE8 was used to constrain bond distances enabling a 2 fs time step and temperature
was globally controlled using a Berendsen thermostat10 with relaxation time of 0.2 ps.

For the evaluation of the electrostatic and the attractive part of the Lennard-J ones

27

energies and forces, the PME method15 was applied with a direct-space cutoff of 8.52 A,
an Ewald coefﬁcient of 0.45, and a 30x 30x 30 A3 reciprocal space grid.

All simulations were carried out in a box with 30.0 A sides, having 864 waters
initially. The starting Met-enkephalin conﬁguration was obtained from an NMR
ensemble (pdb 1PLW).55 In this conﬁguration, the end-to-end distance (nitrogen of the N-
terminus to carboxylate carbon of the C—terminus) is 10.5 A. This distance in the
ensemble of 80 lowest energy structures is ~10-11 A. The peptide was immersed in the
water box and 51 overlapping waters were removed. To apply the HREM, ﬁve systems

were used with the scale factors set to ’11 =1, 1?? =0.925, A3 =O.85, 2,, =0.775, 25 =0.7. All

ﬁve systems were started from the same initial conﬁguration and the ﬁrst 3 ns are
considered as the equilibration time. Exchanges were attempted every 40 steps. For an
odd number 2n+1 of systems, exchange attempts are alternated among 1°2,...,2n-1°2n

and 2°3,. ..,2n°2n+1.

Convergence tests for Principal Component Analysis (PCA)

The Principal. Component Analysis (PCA) method described in Chapter 1 was
extensively used in the studies of the simulated trajectories of Met-enkephalin in this
work. The PCA analysis was carried out by using ANALYZER”, a program written for
the purpose of analyzing trajectory data by a wide variety of methods.

One common issue related to using the PCA method is that the slow relaxation of
the large ﬂuctuations may prevent the fast convergence of the covariance matrix“, which
results in slow convergence of the essential space spanned by those large ﬂuctuation
modes. So, before using trajectories generated from PCA, it will be better to ﬁrst test the

convergence in the essential subspace. (These tests can also be used as a severe tests of

28

simulation convergence because the larger PCA eigenvalues correspond to the slower
motions of the subject peptide or protein.) Several convergence tests have been
proposed.“87 Amadei and co-workers86 introduced a root mean square inner product

(RMSIP) measure

1/2

1 n n '
RMSIP= ;Zka(t)-m1(t) (2.5)
k=1i=l

that evaluates the overlap of a subset of n m k (t) modes obtained from different time

' intervals of the total trajectory. Here, we take t and t’ to be two disjoint time intervals of

the trajectory and use the resulting RMSIPs to monitor the stability of these modes.

3. Results and Discussion

Diagnostics

As discussed before, the HREM has the virtue of requiring only a limited number
of systems while, if implemented appropriately, still providing robust sampling. In
explicit solvent simulations, not scaling the solvent-solvent interactions should provide a
substantial reduction in the ntunber of systems required relative to the TREM that scales
all the degrees of freedom. As in all REM versions, the choice and optimization of the
acceptance probability of attempted exchanges in the HREM is a central issue. There
should be an optimal acceptance probability, because for low exchange probability the
rate of movement through conﬁguration space is small, while for high exchange

probability the movement through conﬁguration space is slow. Table 2.1 lists the

29

acceptance ratios for the HREM simulation (the ratios are shown for three 3-nanosecond
time intervals, 3 to 6 ns, 6 to 9 ns and 9 to 12 ns, the ﬁrst three nanoseconds are
considered as equilibrium time); they are all around 0.43. Predescu and co-workers9
analyzed the optimization of the TREM acceptance ratio for a multi-dimensional
oscillator system, and found that 0.3874 is the optimal acceptance ratio, with the
efﬁciency falling off slowly around this value (acceptance probabilities in the 7—82%
range provide rates sufﬁciently close to optimal sampling rates). The uniformity of the
acceptance probability values observed indicates that the choice of the number and

spacing (the speciﬁc 2, values) are appropriate.

Table 2.1. The acceptance ratio for the time span of the HREM simulation (exchanges are

 

 

 

 

 

 

 

 

 

 

 

attempted every 80fs.)
Acceptance Acceptance Acceptance
Potential
’1 ,1 Ratio for Ratio for Ratio for
Index a b
3—6ns 6—9ns 9—12ns
1<--->2 1.0 0.925 0.424 0.438 0.420
2<--->3 0.925 0.85 0.451 0.409 0.463
3<--->4 0.85 0.775 0.457 0.433 0.458
4<--->5 0.775 0.7 0.431 0.408 0.423

 

 

To examine whether all the conﬁgurations (replicas) can visit a particular system

(with a particular A, value) and whether given conﬁgurations are visited by all the

systems, time trajectories for three 3-nanosecond intervals are displayed in Figures 2.1 to

30

2.3. From the plots, it is clear that all the conﬁgurations (replicas) can be visited by all the
systems (a-e) and, conversely, all the systems can be visited by all the conﬁgurations
(replicas) (f-j). In Figure 2.1 (a) the points are vertically connected to show that the plane
is covered, demonstrating the desired itineration. In all the other plots, simple points are
inserted to indicate occupancy. On this scale, all the plots look uniform in time and
similar, which supports the desired feature that the systems undergo a random walk in the
whole exchange range. Examined at higher resolution (a shorter time interval) plot 2.1 (a),
for example, does show that replica 1 is slightly favored by lower-numbered systems, as
actually can be inferred from the white space in the plot.

The important migration aspect has been veriﬁed, as shown in the above, to make
sure that that the exchanges of systems are sufﬁcient. But, whether these exchanges are
efﬁcient and effective, in other words, whether these exchanges induce sufﬁcient
transitions of states for the normal system (i=1) is the next important thing that needs to
be investigated. In Figure 2.4, ﬁom (a) to (c) the end-to-end (Tyrl terminal nitrogen to
Met5 terminal carboxylate carbon) distances are plotted along the time line for the 3 to 6,
6 to 9, and 9 to 12 three 3—nanosecond intervals respectively. From the ﬁgure, it is clear
that, for each time interval, there are numerous transitions between open and closed states.
This observation of sufﬁcient sampling of not only both open and closed states but also
transitions is a necessary condition for the following analyses, since statistical estimates
are not meaningful unless they are based on enough collected sample data points. In fact,
a 18 ns normal MD simulation was conducted using the same starting structure (an open
form of Met-enkephalin), but the sampling was always trapped in the open form for the

whole simulation period.

31

Figure 2.1. (a)-(e): Migration of systems into and out of a given conﬁguration for 3-6 ns.
(f)-(j): Migration of conﬁgurations (replicas) into and out of a given system for 3-6 ns

(,1,- scale value). The ﬁgures from (a) to (e) correspond to systems 1 to 5 and, similarly,

the ﬁgures from (f) to (j) correspond to conﬁgurations l to 5. In (a) the points are
vertically connected, and the coverage demonstrates the desired itineration. Note that in
view of the number of data points that are plotted, it appears as if, at a particular time,
several replicas occupy the same window or several systems visit the same replica— this

does not happen, looking at shorter intervals dots will be separate.

   
     

 

 

  
  

 

(a)5 (b)
X X
{‘3’ 4 8
s 3 5
$3 §
8 2 8 2
[E II
1 1
0:0 015 110 115 210 2:5 310 0.0 0.5 1.0 1.5 2.0 2.5 3.0
Time (ns) Time (ns)
(C) (d)
315 5
‘O 'O
s 5
§ §
8 2 8 2
a: o:
1 1
0.0 0.5 1.0 1.5 2.0 2.5 3.0 0.0 0.5 1.0 1.5 2.0 2.5 3.0
Time (ns) Time (ns) _

32

Figure 2.1

A
(D

v
(TI

(continued)

 

h L

 

(JO

 

N

 

Replica Index

1—L

 

 

 

 

''''''''''''

000510152025 30

Time (ns)

 

v

 

 

Window Index 6

NOD->01

A_

 

u—L

 

 

'''''''''''

0.0 0.5 1.0 1.5 2.0 2.5 3.0

Time (ns)

 

V

 

 

 

Viﬁndow Index :2.

 

 

 

o -.*.'°-°’.“.U‘.

Time (ns)

0051101115 00255310

33

 

 

 

 

Window Index 3

 

A
3'
v
01

Window Index

o—‘AN‘OJL-h (111‘

 

 

IIIIIIIIIIII

.0 0.5 1.0 1.5 2.0 2.5 3.0
Time (ns)

 

 

 

 

 

 

 

vvvvvvvvvvvv

0.0 0.5 1.0 1.5 2.0 2.5 3.0

Time (ns)

 

 

 

A;

 

Window Index 9

 

 

 

o 4‘10 w‘h‘UlJ

"""""""

0051.0 15 2025 30
Time(ns)

Figure 2.2. (a)-(e): Migration of systems into and out of a given conﬁguration for 6-9 ns.
(f)-(j): Migration of conﬁgurations (replicas) into and out of a given system for 6-9 ns

(2.,- scale value). The ﬁgures from (a) to (e) correspond to systems 1 to 5 and, similarly,

the ﬁgures from (f) to (j) correspond to conﬁgurations l to 5. Note that in view of the
number of data points that are plotted it appears as if, at a particular time, several replicas
occupy the same window or several systems visit the same replica— this does not happen,

looking at shorter intervals dots will be separate.

A
m
v
(’1
i
A
U-
V
(II

 

 

# A

 

 

A # A

4A

 

 

 

 

Replica Ind x
00

Replica Index
0.)

N
N

 

1.__ ... -.-- __..... 1

0.0051101520215310 0.0'015'110'115'210'215'310

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Time (ns) Time (ns)
(c)5 (d)5
x ’ x ‘
U 4 g 4
5 3‘ E 3i
5 1 5 1
a 2 a 2
CC ‘ or ‘
1 1
0.0051011521025310 0.0051011520215310
Time (ns) Time (ns)

34

Figure 2.2 (continued)

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

(6)5. (f) 5‘
x ‘ >< ‘
4 a» 4
E 3- E 31
S . 5 1
g- 2-—--- ———- —--—— 1:5” 2
m 1 1
1 1.__.. ..- -.-..___..._..
0.00.5101152025310 0.0015"110152102530
Time (ns) Time (ns)
(9) 5.———.. ———- ...-__ (h) 5
x l x ‘
g 4 g 4
s 1 5 -
3 3
1% a 1
E 2 E 2
3 1 g 1
0.0 ' 015 ' 110 ' 115 ' 210 '215250 0.0'015'17171520215310
Time (ns) Time (ns)
(0 5 (I) 5
X i X
g 4 g 4
_=. - s -
3 3
5 . g .
E 2 3:3 2
3 1 3 1 _
0.0051101320215310 0.0051101152035310

Time (ns) Time (ns)

35

Figure 2.3. (a)-(e): Migration of systems into and out of a given conﬁguration for 9-12 ns.
(f)-(j): Migration of conﬁgurations (replicas) into and out of a given system for 9-12 ns

( ,1,- scale value). The ﬁgures from (a) to (e) correspond to systems 1 to 5 and, similarly,

the ﬁgures from (f) to (i) correspond to conﬁgurations l to 5. Note that in view of the
number of data points that are plotted, it appears as if, at a particular time, several
replicas occupy the same window or several systems visit the same replica— this does not

happen, looking at shorter intervals dots will be separate.

 

 

A

m

v

-h 01
i i
A

C-

v

C11

Replica Index

. A w . 1 .

Replica Index
(JO

 

 

A # A

 

 

N

 

 

N

 

 

 

1 1
0.0' 015 ' 10' 1152102152320 0.0'0f5'11015210257110
Time (ns) Time (ns)

 

 

 

 

 

A
O

V
#01

A

O.

v
01

Replica Index

1 . w I 1

Replica Ind x
(A)

 

 

A $ A

 

 

N
N

 

 

4L

 

1 1
0.0015110 ' 115210215310 0.02 015'120'115'210'215'310
Time (ns) Time (ns)

 

 

 

 

 

36

Figure 2.3 (continued)

A
(D
v
01
A

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

f) 5
x ‘ >< ‘
11> 4 a) 4
2 1 :c’ -
.3 3. g 31
a 2 -—-—-— 1: 2
CD ._
m 1
1 3 1i
0.0015101152025310 0.0051103152025370
Time (ns) Time (ns)
(9) 5' -—---— 005
X ‘ >< ‘
g 4 g 4
s - _=. 2
3 3
5 . 1% .
1:5” 2 E 2
1 3 1
0.0051101152021530 0.0'015'1T0'115'2i0'2i5'310
Time (ns) Time (ns)
(0 5i <1) 5‘
X X
g 4 g 4
E ‘ 5 1
3 3
f; 1 i2 «
g 2 5 2
1‘-—- 3 1._._
0.0'015'{01152102152310 0.0'015'11015210232320

Time (ns) Time (ns)

37

Figure 2.4. The end-to-end distances (Tyrl terminal nitrogen to Met5 terminal

carboxylate carbon) along the time line for: (a) 3 to 6 ns, (b) 6 to 9 ns, and (c) 9 to 12 ns

=1 system.

for the 41

00051.0 15 2:0 ' 2T5 ' 3:0

I

 

 

0.0 of5 1:0 13520 25 3:0

‘I

 

 

 

95 8:920 Eu 9 Eu

Time (ns)

Time (ns)

 

I I I T I I

I

000.5 1.0 1.52025 3.0'

 

 

4 q _ _ _
4 2 0 8 6 4 2
1

Time (ns)

38

PCA trajectory diagnostics

As discussed in Chapter 1, Principal Component Analysis (PCA) is a highly
useﬁrl and popular tool for the systematic investigation of the conformational sampling of
a peptide""’88 since it can reduce the high dimensional conﬁguration space to an essential
subspace that contains most of the signiﬁcant, large-scale motions. Only window 1 in our

HREM simulation ( )1 =1) corresponds to the normal force ﬁeld, and all of the following

analyses are based on window 1 for this reason. The ﬁrst 3 nanoseconds of simulation are
considered as an equilibration period, and the PCA trajectories are constructed based on
the backbone atoms. Over the trajectories, the ﬁrst mode represents around 40% of the
total backbone variance (the total MSF), the ﬁrst two modes together around 55%, and
the ﬁrst ten modes together around 90%, showing that the HREM simulation of met-
Enkephalin does provide a division into essential and remaining subspaces. Because of
the possible convergence problem mentioned earlier in this chapter, before diving into
any detailed examination of the trajectories using PCA, we ﬁrst test the convergence in
the essential subspace with the RMSIP86 method (eq 2.5), examining convergence by
comparing the overlap of modes constructed from different time intervals. We compare
RMSIPs among three time intervals, 3-6 ns, 6-9 ns, and 9-12 ns, for the system 1

trajectories (211 =1) for the ﬁrst mode, the sum of the ﬁrst two modes (these two modes

are the only modes which will be used for the following analysis), and the sum of the ﬁrst
ten modes. The RMSIP results, which are a direct measure of the projection of the basis
of one subspace (for one time interval) onto the other subspace (another time interval),
are listed in Table 2. All the RMSIP values are close to their limiting values of 1.0. The

mode one result indicates that the three PCA ﬁrst mode vectors constructed from the

39

three time intervals are essentially the same. That result suggests that there might be a
reaction coordinate correlated with the PCA ﬁrst (slowest) mode, and that 3-nanosecond
intervals are long enough to capture this slowest motion. The RMSIP values for the ﬁrst
two and ten modes show a similarly good convergence. Thus, we may be conﬁdent that
the HREM simulation leads to stable results over the 9-nanosecond interval (3-12 ns) and,
that, 3-nanosecond intervals are sufﬁciently long to capture the ﬂuctuations of Met-

enkephalin.

Table 2.2. The RMSIPs (for HREM window 1) for the three 3 ns intervals (3-6, 6-9 and

9-12 ns) using the PCA ﬁrst mode, ﬁrst two modes and ﬁrst ten modes

 

 

 

 

3-6 vs. 6-9 ns 3-6 vs. 9-12 ns 6-9 vs. 9-12ns
First Mode 0.972 0.991 0.980
First Two Modes 0.918 0.959 0.909
First Ten Modes 0.926 0.988 0.914

 

 

 

 

 

 

l-dimensional PCA analysis

The observation that the ﬁrst PCA mode captures about 40% of the total MSF
combined with the observation of high convergence of this mode shown by the RMSIP
test suggests its potential appropriateness to be used as a reaction coordinate. And, it
should be more objective in character than that based on, for example, an end-to-end

distance which could otherwise be a natural choice. The ﬁrst mode displacement

40

trajectory, pl (I), deﬁned in eq 1.19 exhibits numerous transitions between its extreme

values, which shows that a broad range of distances is being sampled repeatedly during

the simulation time. Thus, it is legitimate to construct a potential of mean force (PMF).

The PMF, constructed by making a histogram of pl (t) for the three time intervals (for

each 3ns interval, 15000 snapshots are used) of 3 to 6 ns, 6 to 9 ns, and 9 to 12 ns are
displayed in Figure 2.5. The convergence among the three PMF proﬁles is as good as
anticpated from the RMSIP test results. Along each PMF proﬁle, there is a well deﬁned
well, around 1 A, and a broad plateau region from 5 to 18 A. The difference between the
lowest points of the two wells is about 2.0 kcal/mol, and the barrier between them is
around 3 kcal/mol. The barrier is not high, suggesting that switching between these two
states should not be difﬁcult. Although the deep well is about 2 kcal/mol lower then the
broad well, it is much narrower, suggesting that there could be an energy/entropy
compensation trade-off between the broad and deep wells. These features support the
observations of the co-existence of different conformations of Met-enkephalin in water
and of a lack of distinguishable secondary structure.

The type of atom displacements that correspond to the ﬁrst PCA mode can be
inferred by calculating the correlation coefﬁcient of the end-to-end distance for the ﬁrst
mode (obtained from eq 2.6 with i=1) and for the true trajectory. The correlation
coefﬁcients calculated are 0.968, 0.955 and 0.980 for the three 3-nanosecond time
intervals respectively, suggesting that mode one is essentially reporting on the end-to-end
distance ﬂuctuations. The PCA has thus succeeded in singling out the principal collective

motion that spans the open to closed conformations.

41

Figure 2.5. The PMF corresponding to the PCA mode 1 displacements for all k=l system

snapshots for the three time intervals of 3 to 6 ns, 6 to 9 ns, and 9 to 12 ns.

PMF (kcaI/mol)

 

solid 3 - 6 ns
dash 6 - 9 ns
dot 9 - 12 ns

3
5
I
3 II vii” t: 1' :‘1
l‘ i " | ': :l
1' A ' 1 ' I
. 1: , 11.41.111.11)
I . ’1';-;.'u'
"\rllr’ I‘ . ’ ’1'"! -.‘ ' V
’I -. i i
l '. ‘1 I 1 "1‘, 5":."‘"w'|i-‘..
I“ . I :' .‘“‘ ' 0",; I" . .
.‘J . ‘1 ~
I

 

5 ' 1'0 1 1'5
PCA mode1 (A)

42

20

End-to-end (Tyrl terminal nitrogen to Met5 terminal carboxylate carbon) distance

PMF and its comparison with DREM results

As noted above, there exists very high correlation between the end—to-end (Tyrl
terminal nitrogen to Met5 terminal carboxylate carbon) distance ﬂuctuations for the
complete trajectory and that found from the ﬁrst PCA mode, making the end-to-end
distance seemingly to be an interesting reaction coordinate. In addition, the consideration
that the zwitterionic form of Met-enkephalin should be capable of forming a salt bridge
between the terminal Tyrl amide and Met5 carboxyl groups also makes the end-to-end
distance to be a natural choice for a reaction coordinate. Therefore, the 1-dimensional
PMF along this end-to-end distance coordinate was calculated from the trajectory of the
HREM normal system (system 1) for the three time intervals, 3-6 ns, 6-9 ns, and 9-12 ms
respectively and the results were displayed in Figure 2.6. The agreement among the three
PMF proﬁles is farily good with the exception of the region expanded from 9 to 14 A in
which there is a discrepancy about 1 kcal/mol. The poorer convergence in this region is
most likely due to the extensive conﬁguration space for these less constrained regions
(this point is discussed more in detail in the next part for 2-dimentional PCA results).

When the end-to-end distance is chosen as the reaction coordinate, an umbrella
sampling based technique could be implemented to focus on such a natural reaction
coordinate. Usually, an umbrella sampling based technique, if possible to be implemented,
should be able to provide a better PMF estimation along a particularly chosen reaction
coordinate than would a HREM simulation, due to the former’s focus on a speciﬁed

reaction coordinate. Of course, for many problems, it is hard to determine a proper

43

Figure 2.6. The PMF for the end-to-end distance generated by the HREM simulation for

all systems with lt=l for the three time intervals of 3 to 6 ns, 6 to 9 ns, and 9 to 12 ns.

PMF(kcaI/mol)

 

solid 3-6ns
dash 6-9ns
dot 9-12ns

 

 

 

3'1'0'1'2'1'4'1'6
end to and distance (A)

44

reaction coordinate beforehand, and some chosen reaction coordinates are hard to
implement. For example, as for this Met-enkaphalin study, the end-to-end distance was
chosen mostly because it was observed to have extremely high correlation with the PCA
ﬁrst mode and it would be very hard, if at all possible, to use a PCA mode as reaction
coordinate for implementing an umbrella sampling like techniques. The umbrella
sampling based technique chosen here is the Distance Replica Exchange Method
(DREM)89 , which can also be considered as a special type of HREM method. The

Hamiltonian for the ith system of the DREM approach has the form:

H i (X ) = HO(X) + Wi(r) , where, Wi(r) = gki (r — roi)2 is the standard harmonic window

potential, r is the chosen reaction coordinate distance and, r6 is the equilibrium distance

for the system’s harmonic constraints used along with the force constant k,-. The main

difference between DREM and a regular umbrella sampling method is that, for DREM,
like all other replica exchange methods, after certain steps (100 steps for this work),
exchange attempts will be made between neighboring systems for their window
potentials. It was shown89 that DREM trajectories have much better convergence than for
trajectories generated with normal umbrella sampling. Of course, since DREM biases the
true Hamiltonian, as does the regular umbrella sampling method, some un-biasing
procedure is needed to correct the corresponding probability distribution. The Weighted
Histogram Analysis Method (WHAM)37’39 was used to do the un-biasing and to combine
the trajectory data obtained ﬁom the different systems. Comparing the DREM PMF with
the corresponding HREM PMF (constructed by combining 3ns time interval trajectories

to form a long time trajectory from 3 to 21ns to get a smoother estimation in the broad 8

45

Figure 2.7. The PMF for the end-to-end distance generated by the HREM simulation and
DREM simulations. The PMF line for DREM is obtained by combining two separate

lines for the 3-11 A and the 8-16 A simulations.

7; SOIICI HREM 1
dash DREM '

PMF (kcaI/mol)
‘r’

 

 

 

-1 I ' I ' I ' I ' I ' I ' I ' I ' I
2 4 6 8 10 12 14 16 18
end to and distance (A)

46

to 16 A region) in Figure 2.7, it is clear that the patterns are very similar, but the DREM
PMF simulation is shifted down around 1 kcal/mol in the region 7 to 15 A, which covers
the broad well and the barrier between the two wells. The total time over all processors in
the presented simulation, excluding the equilibration times, was 136 ns for the DREM
and 90 us for the HREM simulations. Although the 136 ns vs 90 ns difference seems not
to be too substantial, note that they do not represent the real time scale of the simulated
trajectories. For REM schemes, because dynamics are accelerated, it is hard to estimate
what the real time scale is for a given simulated trajectory. Therefore, the actual motions
sampled by the DREM scheme and HREM scheme could be a little bit different, resulting
in the corresponding pmf proﬁles to have some differences. Futhermore, although the
DREM simulation was designed to focus on the particular end-to-end distance reaction
coordinate, the quality of the resulting pmf depends on how well the other dimensions
orthogonal to the reaction coordinate are sampled. Of course, the quality of the pmf
calculated from the HREM results also depends on the sample quality of the HREM
simulation. Therefore, it is hard to conclude which method provides a more accurate
estimation of the pmf proﬁle along end-to-end distance for Met-enkephalin. But, because
the difference between the two proﬁles are not large, and their patterns are very similar, it
seems to suggest both schemes are appropriate to producing a pmf. Taking into
consideration that HREM does not require any explicit guiding potential for a reaction
coordinate, the HREM simulation does a remarkably good job.
2-dimentional PCA Analysis

Besides the ﬁrst PCA mode, the second PCA mode also has a signiﬁcant

contribution (15%) to the overall motion, motivating construction of a 2-dimensional

47

Figure 2.8. The 2D PMF for the ﬁrst and second PCA modes for all 21=l system
snapshots. Backbone CA stick plots of the conﬁgurations in the dense places are shown

with the distance between Tyrl backbone nitrogen and Met5 carboxyl carbon shown.

 

 

 

Second 4

Mode
(A) 2
0

       

 

-I'8 46 -1'4 .1'2 -10 -8 is
First Mode (A)

PMF -00 _o.s —1.0 —1.5 .20 —2.5
. (kcal/mol) --3.0 "3.5 r 4.0 45

48

Figure 2.9. CA wire plots for the two salt bridge conformers shown in Figure 2.8, with
the Gly-2 and Gly-3 backbone atoms shown explicitly that illustrates the ‘I’(2) and <I>(3)

dihedral angle compensation mechanism.

 

49

Figure 2.10. Ramachandran plots for Gly2, Gly3 and Phe4 of Met-enkephalin; The three
plots in the upper row show results for snapshots that are in the PCA ﬁrst mode deep well

(—0.6 to —0.4 A); The lower row for all snapshots.

Gly2

1 A A r 1 r 1

Gly3 Phe4

 

 

 

    
    

    

'1“

 

 

 

 

 

 

 

 

 

~l60 -80 0

 

460 -80 0 80 I60 -160 -80 0 80 I60

50

PMF displayed in Figure 2.8 (Since we have already shown good PCA convergence in
the above discussions, for the 2 dimensional analyses only the 3—6 ns interval trajectories
was focused on for convenience). There are two wells around (—0.5 A, 2.3 A) and (—0.5 A,
7.0 A) in the ﬁgure that correspond to the deep well in the one dimensional PMF plot.
Representative alpha Carbon (CA) pictures of the dominant backbone conformations are
also shown in the ﬁgure. Conﬁgurations in the two wells have end-to-end distances
corresponding to salt-bridged conformers, versus the more extended states of all the other
displayed conﬁgurations. Figure 2.9 shows representative CA wire ﬁ'ame backbone
structures with the Gly-2 and Gly—3 atoms explicit for the two wells. The differences in
the backbone structures, with the parallel and anti-parallel carbonyls, come from the
differences of the Gly-2 ‘1’ and Gly-3 d) dihedral angles. The ﬁrst row plots of Figure
2.10 shows the Ramachandran plots for Gly2, Gly3 and Phe4, with snapshots picked
ﬁom the PCA ﬁrst mode within the range from —0.6 to —0.4 A, and the bottom row plots

for all snapshots are shown for comparison purposes. Although the conﬁgurations
corresponding to the two deep wells in the 2-dimensional PMF have large differences in
their Gly-2 ‘1’ and Gly-3 (I) dihedral angles, the main patterns of their backbones are
similar, illustrating the known90 feature that the LI’(i) and (D(i+1) values of residues i and
i+1 can compensate and still lead to overall similar structures. This is the only local
mechanism in peptides (or proteins) that can lead to a structure with essentially the same
overall conformation. The Ramachandran plots in the ﬁrst row of Figure 2.10 are quite
similar to those generated by van der Spoel and Berendsen63 in their study of zwitterionic
Leu-enkephalin solvated by water. Their simulations were started from a salt—bridge and

an open form that rapidly closes and remains mostly closed, so they mainly sample salt

51

bridge conformers. They also note a Gly2 ‘I’ and Gly3 (D compensation for these salt-
bridged conformers.

The finding that the Gly2 and Gly3 are concentrated in two distinct regions of ‘1’
and (1) suggests that these residues may be participating in hydrogen bonding. For the
“parallel” carbonyl arrangement of Figure 2.9 there can be Gly3 N-H to Met5 O=C
hydrogen bonding/salt-bridge interactions. For the “anti-parallel” arrangement there can
be Gly2 C=0 Met5 N-H as well as Gly2 N-H Met5 O=C hydrogen bonding/salt-bridge
interactions. Selected snapshots of the parallel and anti-parallel conformers do show these
interaction patterns that most likely aid in stabilizing the direct terminal salt-bridge
interaction. However, hydrogen bond analysis of the trajectories shows that none of these
additional interactions persists. The relevant distances are constantly ﬂuctuating and are
mainly beyond the distance that would permit such a conclusion.

In the remaining region of the 2-dimensional PMF plot in Figure 2.8, there is a
less concentrated but much larger area of relatively low free energy, than in the area
around the two deep wells. The barrier between the deep and broad areas is only around 3
kcal/mol, and the difference between them is roughly within 2 to 3.5 kcal/mol, both of
which are not very large on a thermal scale but, signiﬁcantly, the pathway between two
areas is very narrow. This observation suggests that although Met-enkephalin samples
both regions, one favored by energy and the other favored by entropy, the rate of
transition between them may be slower than might be inferred from the small free energy
barrier. Starting from an extended conﬁguration, there are many conﬁgurations that
correspond to the extended states area to ﬁrst sample, and then a restricted region in

conﬁguration space must be found that corresponds to the small transition area in Figure

52

2.8, to enter the salt-bridge region. This feature may explain the discrepancy between the

53-56,58,60

experimental results that Met-enkephalin or Leu—enkephalin shows great

ﬂexibility and a lack of deﬁnite conformations in water, and the difﬁculty of going from
the extended to closed conformations found through conventional MD sirnulattionslg’32’61'
69. Indeed, for the zwitterionic Met-enkephalin simulated here, an 18 ns normal MD
simulation with the same starting conﬁguration and conditions as for the HREM system 1

(71=1) is trapped in extended states and never samples conﬁgurations corresponding to the

two deep wells.

4. Concluding Remarks

2 The HREM approach was successfully implemented to improving the MD
sampling of zwitterionic Met-enkephalin in explicit solvent. The implementation of
HREM was quite efﬁcient in the sense that only ﬁve windows were required to generate
results that, as monitored by the PCA modes, show good convergence properties in the
simulation time. More importantly, it was shown that the system 1 (normal system with
X=l) trajectory repeatedly sampled conﬁgurations that correspond to all the relevant end-
to-end distances between the open and close states, providing an essential condition for
properly exploring the conﬁguration space of Met-enkephalin and a meaningful
calculation of free energy quantities, such as a potential of mean force. The PCA was
effective in singling out the dominant end-to-end distance ﬂuctuation motion in the ﬁrst
mode.

The l-dimensional PMF proﬁle along the end-to-end distance for the Met-

enkephalin calculated ﬁ'om using the HREM approach was also compared to the one

53

calculated from using a DREM approach, which was implemented to use window
potentials to focus on the particular end-to-end distance reaction coordinate. The two
PMF proﬁles are quite similar, especially in pattern. The DREM result might be more
accurate since it is carefully designed to focus on the reaction coordinate. But, the HREM
has the advantage that it does not require a commitment to a particular reaction
coordinate, or a particular dimensionality of reaction coordinate. The objectivity of the
HREM in this regard is an argument in its favor relative to the DREM since in many
cases it is hard to specify or know beforehand a proper reaction coordinate. Of course, the
TREM is also objective because all degrees of freedom are thermally excited. However,
for a system with many degrees of freedom, the chain of replicas required may become
impractically large. The HREM can be viewed as an attempt to excite only important
degrees of freedom, but that requires a decision as to which are the important degrees of
freedom. For explicit solvent simulations, not having to excite the solvent degrees of
freedom is a great advantage since their number is an order of magnitude larger than the
number of peptide (and protein) degrees of freedom in typical MD simulations.

The simulated trajectory, analyzed with the PCA decomposition, also revealed the
presence of another signiﬁcant PMF dimension that is mainly composed of a correlated
local change of the Gly2 ‘I’ and Gly3 (I) dihedral angles. That two very different regions
of dihedral space for these residues are stably occupied, even though the overall structure,
as monitored by the end-to-end distance, corresponds to a salt-bridge conformer, results
from this local ‘l’(i) and <D(i+l) compensation. Though Met-enkephalin is small and the
barriers between the open and close states is not large, the conﬁguration space of

important states of the peptide seems to be very broad and there seems to be only a

54

narrow bridge between open and closed states. This observation probably suggests that
properly sampling the Met-enkephalin conﬁguration space is more an issue of having to
explore a large space with small potential energy barriers than of having to overcome a
large banier along a reaction coordinate for some transformation as in a folded, stable
protein. The result is a free energy proﬁle that does not exhibit a large banier, yet
requires a great deal of simulation time or implementing special MD methods such as the

HREM used here.

55

Chapter 3 A specific Hamiltonian Replica Exchange Method for the study of

EcHPPK and YpHPPK

1. Introduction

As described in Chapter 1, HPPK (6-Hydroxymethyl-7,8-dihydropterin
pyrophosphokinase), a key enzyme in the folate biosynthetic pathway, which catalyzes
the transfer of pyrophosphate from ATP to 6—hydroxymethyl- 7,8-dihydropterin (HP)72, is
a protein of great interest, since it could possibly be targeted to design new antibiotics. In
this work, an HREM implementation (details are in the following “methodology” section),
similar to the one which helped enhance the sampling of Met-enkephalin, as described in
Chapter 2, was used to study two homologs of HPPK, E. coli HPPK75 (EcHPPK) and Y.
pestis HPPK80 (Y pHPPK). EcHPPK is the most studied HPPK, and these studies have
shown that it has several well-deﬁned, biologically important conformational states along
its catalytic cycle.77’78’74 YpHPPK is a causative agent of septicemic, pneumonic and
bubonic plague, and is one of the most virulent pathogens known.91 The main reason to
study two homologs is, hopefully, to ﬁnd appropriate targets for which species-selective
inhibitors could be successfully designed. If this goal were accomplished, certain new
species-selective inhibitors might be designed to be used as new narrow band antibiotics
effective against targeted microorganisms, helping to counteract the dangerously wide
spread antibiotic resistance.

One important basis for this work is related to current suggestions that
conformational change occurs, to a greater or lesser extent, when proteins and ligands
form complexes. A progression of models has been introduced that in essence increase

the role of protein plasticity in the binding of ligands.”95 The “lock and key” model

56

asserts that a protein has a cavity that a ligand (or another protein) can ﬁt into with minor
rearrangements of protein and ligand. However, this model cannot account for proteins
that can bind differently shaped substrates. In contrast, the “induced ﬁt” model tries to
account for this by suggesting that a ligand induces a conformational change at the
binding site, shifting it toward an active state. The pre-existing equilibrium hypothesisgé,

97’98, asserts that

based on more recent funnel energy landscape protein-folding theories
the native state of a protein can exhibit an ensemble of conformations that can span apo
to more ligand-bound-like conformations. The ligand can select a pre-binding

conformation ﬁ'om the ensemble of protein conformers, and then bias the equilibrium

toward the catalytically competent binding conformation. Some studies have shown that a

99,100 101

number of ligand-free proteins (staphylococcal nuclease , calbindin , adenylate
kinase83’102, and cahnodulin103) do sample conformations required for ligand binding.
Conformational plasticity might be more advantageous for ligand binding by requiring
smaller protein rearrangements (with a reduced energetic requirement) than would be
necessary in the more open forms.

Adopting the reasonable assumption that proteins can ﬂuctuate to produce
conformations suited for ligand binding, in this work the focus is on near-closed (close to
what is referred to as the closed structure that characterizes the ligand-protein complex,
as determined by crystallography) conformations that are relatively stable (i.e.
signiﬁcantly populated) for the reason presented in the rest of this paragraph. Active-site
residues are often well preserved in a given enzyme among different species due to the

fact that the fundamental catalytic mechanism and substrates are typically the same.

Consequently, the closed conformation usually represents a challenging template for

57

developing species-selective inhibitors. Compared with closed conformations, near-
closed conformations may provide greater opportunities for two homologous proteins to
discriminate between different ligands. The identiﬁcation of new, low-energy protein
conformations that have not been trapped by crystallographic or NMR studies may
provide targets for species selective inhibitors.

To carry out the identiﬁcation of near closed conformations, there are two tasks
that need to be completed. First, near closed conformers need to be generated; for this
purpose, the HREM is used. Second, near closed conformers need to be distinguished and
then categorized. For this purpose, a clustering algorithm is used along with a qualitative
method to deﬁne binding cavities for the substrate. Implementing HREM in the way
described in the next methodology section for a protein requires information about which
parts are most rigid (a core region) and which parts are more ﬂexible. By comparing the

”’78 and open (apo)74 forms of

closed (the ternary complex with substrates HP and ATP)
EcHPPK it is clear that the regions of the protein that have the largest differences
between the two forms are Loops 2 and 3. Furthermore, for both EcHPPK” and
YpHPPK7S’80, these loops are among the most ﬂexible regions based on xray B-factors.
Therefore, the HREM method implemented in this work focused on these two loops.
Compared with using normal MD, we found that HREM helps to sample the low-energy
states neighboring the closed conformations of the crystal structures for both EcHPPK
and YpHPPK so that broader simulated ensembles of dominant conformations were
obtained. Based on these simulated ensembles, clustering methods were applied to

identify well-populated conformations that differ signiﬁcantly in HP binding pocket

conformations between the two species. Differences among some near closed

58

conformations in the EcHPPK and YpHPPK HP binding pockets can be identiﬁed that

are potential targets for designing ligands with enhanced speciﬁcity.

2. Methodology

Hamiltonian Replica Exchange Method

The HREM implementation used in the work summarized in this chapter is
similar to the one used for Met-enkaphalin described in detail in last chapter. The only
difference is that for Met-enkephalin the whole small peptide is scaled, while, in this
chapter, only the most ﬂexible regions of the protein are sealed with the aim of further
reducing the number of systems required. In more detail, for the HREM implementation

used for the HPPK studies, the potential energy is parameterized as

V. (X) = 2.2V“ (XL,xL)+/1iVLR (xL,xR)+VRR (xR,xR), (3.1)

l l

where the “L” subscript in eq 3.1 stands for “Loop” and denotes the HPPK Loop 2
(residues 44-53 for both EcHPPK and YpHPPK) and Loop 3 (EcHPPK residues 82-93,
YpHPPK residues 87-94). The R subscript stands for “Rest” which denotes the rest of the
atoms in the system, including the remaining parts of HPPK: ATP, the two magnesium
ions, and the explicit solvent. Thus, the three terms in eq 3.1 represent interactions

between atoms of the two loops, between a loop atom and an atom within the “Rest”

parts, and between atoms within the “Rest” parts. 1,. is a scaling factor for the Lennard-

Jones and electrostatic nonbonded interactions. It is clear by the construction used in eq

3.1 that the number of degrees of freedom is dominated by the Rest parts; thus, the

59

indicated scaling is much reduced relative to TREM for which the global scaling
,BVi (X) = ﬂAiV(X) would be used.

Again as described in last chapter, the normal potential energy function is being
used in system 1 ()11=1), so that the trajectory associated with it samples the normal
canonical ensemble. Therefore, all our analysis in Section 3 is based on trajectories for

system 1.

Molecular Dynamics Simulations

Similar to Chapter 2, the modiﬁed CUKMODY program for HREM is used to
carry out the simulations. Again, the systems were run separately on different nodes of a
Linux cluster computer and, when exchanges were attempted, information was passed
using MPICH, an implementation of the Message Passing Interface technique.

For all simulations, bond distances were constrained using SHAKEs, enabling a 2
fs time step, and temperature was globally controlled at 298 K using Berendsen
thermostat10 with a relaxation time of 0.2 ps. For the evaluation of the electrostatic and
the attractive part of the Lennard-Jones energies and forces, the PME method was applied
with a direct-space cutoff of 9.0 A, an Ewald coefﬁcient of 0.45, and a 72x72x72 A'3
reciprocal space grid. All simulations were carried out in a cubic box with sides of 64.1 A.
For the EcHPPK (Y pHPPK) simulations, 7671 (7644) waters were included, after waters
overlapping the protein were removed. Five sodium cations were added to neutralize the
systems for EcHPPK.

The MD starting structures were obtained from the crystal ternary complex
structures for EcHPPK (PDB entry lqu) and YpHPPK (PDB entry 2qx0) with AMPCPP

(an ATP analog that prevents turnover), two associated magnesium cations, and HP (6-

60

hydroxymethyl-7,8-dihydropterin). For our simulations, HP was removed, and AMPCPP
replaced by ATP, with the ATP modeled using the GROMOS966 force-ﬁeld parameter
values. The two Mg2+ cations were each covalently linked to an oxygen phosphate atom
of ATP with one linked to an alpha phosphate oxygen and the other to a gamma
phosphate oxygen, in accord with their placement in the crystal structures. The ATP
phosphates were assumed ﬁrlly deprotonated (ATP thus has a total charge of —4) in
agreement with the ATP protonation state when ligated to Mg”.104 The protonation states
of all the ionizable residues were set to their normal ionization states at pH 7.

One thing noticeable is that the available crystal structure of YpHPPK is a dimer
and the issue of whether dimerization is catalytically relevant is not clear according to
experiments.80 From the crystal dimer structure, it is evident that each monomer’s Loops
2 and 3 (especially Loop 2, which is longer) extends close to the other monomer, and
seems to strongly interact with the other monomer. Therefore, in the dimer, these active
site loops might spend a substantial amount of time bumping into each other, suggesting
that the dimer is not the physiologically relevant state. For these reasons, the monomer of

YpHPPK is simulated.

In the HREM simulation of both EcHPPK and YpHPPK, six systems were used
with the scale factors set toll=1,42=0.83, 23=0.73, 2,, =0.65, 25=0.57, 26 =0.5. They
were started from their respective initial conﬁgurations, and the ﬁrst 2 nanoseconds are
considered as equilibration time. Exchanges were attempted every 40 steps. For the

conventional MD simulations, the same initial conﬁgurations as the HREM were used

and the ﬁrst 4 nanoseconds are considered as equilibration periods.

61

The HREM scaling used does include interactions between loops 2 and 3 and the rest
of the protein (and the solvent). Thus, there should be and there are some modest
enhancements of the rest of the protein RMSFs relative to the MD results. The interaction
of loops 2 and 3 with the rest of the protein is somewhat stronger in EcHPPK than in
YpHPPK as monitored by the RMSFs. The weaker interaction for YpHPPK might be due
to the shorter length of its loop 3. But, even for EcHPPK, using the HREM approach, the
RMSFs for core residues are mostly below 2 A. An interesting feature, especially evident
for EcHPPK, is that though loop 1 is not scaled its RMSF has increased substantially.
This observation supports the crystal structure evidence that there may be strong
interactions between loop 1 and some parts of loops 2 and 3. As desired, by the

conservative HREM scaling in eq 2.3 the core protein structure is very well maintained.

Deviation and ﬂuctuation Measures

The standard structural deviation measure used is the RMSD} (root mean square

deviation) deﬁned as follows:

 

RMSD}: :jdt(rj(t)—r,‘.’)-(r,(t)—r,‘-’) T. (3.2)

o c o n .
where rj 1s the base posrtron vector (usually the crystal structure) for atom j.

The conventional RMSF)- (root mean square ﬂuctuation for atom j) measure of

protein ﬂuctuations (that can be compared to B-factors, by calculating B-factor using

RMSF through B - factor = (8772 / 3) * RMSF2 )'6 is deﬁned as

62

 

RMSF}: :idt(rj(t)—Fj)-(rj(t)——j)] T- (3-3)

1
where F}. = Idt r j (t) / T is the trajectory averaged position vector for atom j.
o

Clustering method and HP binding pocket proﬁle calculation

The clustering is done using the “g_cluster” routine of GROMACS which is
implemented in the software package using the algorithm as described in Daura et al.47
For each input structure, the algorithm counts the number of neighbors within an RMSD
cut-off (for the work summarized in this chapter, all the clustering is done using a 2 A
cut-off), takes the structure with the largest number of neighbors, along with all its
neighbors, as a cluster, and eliminates all the structures within this cluster from the pool
of structures. These steps are repeated for the remaining structures in the pool until no
structures are left.

The HP binding pocket proﬁle calculation is carried out as follows. First, the
conformation of a protein snapshot is superimposed onto the starting crystal structure.
Then, using the initial position of the mass center of HP in the crystal structure as the
center, and a radius of 5 A as a search sphere, a grid-point search is done within this
sphere with an oxygen test atom for points which have Lennard-J ones interaction

potential energy with protein atoms no greater than zero, and record them. Finally,

connect those points to deﬁne the HP binding pocket proﬁle.

63

3. Results and Discussion

HREM Replica Exchange Diagnostics

As usual, the most important issue for successful application of an HREM method
to a certain protein simulation problem is to limit the number of systems required while
still providing robust conﬁgurational sampling. In explicit solvent simulations, not
scaling the solvent-solvent interactions should provide a large reduction in the number of
systems relative to the temperature REM that scales all the degrees of freedom. The work
for Met-enkaphalin summarized in Chapter 2 provides a good example of the above
statement. In addition, since it can be expected that within a reasonable time span, the
relatively rigid core and Loop 1 of HPPK should not change their conformations
substantially, the interactions among those atoms can be left unscaled to reduce further
the number of systems required. As also highlighted in the last chapter, the choice and
optimization of the acceptance probability of attempted exchanges in the HREM is a
central issue because there should be a “reasonable” acceptance probability, since for too
low an exchange probability the rate of movement through conﬁguration space is too
small, while for too high an exchange probability the movement through conﬁguration
space is too slow. Table 1 lists the acceptance ratios for the HREM simulation for
EcHPPK and YpHPPK. It is clear that all the acceptance ratios are within the optimal
range (7-82%) suggested by Predescu and co-workers.26 By using same system ,1 values
for EcHPPK and YpHPPK, the acceptance ratios for YpHPPK seems to be much larger

which is probably because YpHPPK is more ﬂexible than EcHPPK.

64

Table 3.1. Acceptance ratios for the HREM simulations; (a) for EcHPPK and (b) for

 

 

 

 

 

 

YpI-[PPK
Potential Aa Ab
Acceptance Ratio
Indexm
1<--->2 1.0 0.83 0.13
2<--->3 0.83 0.73 0.094
3<--->4 0.73 0.65 0.14
4<--->5 0.65 0.57 0.080
5<--->6 0.57 0.5 0.12

 

 

 

 

 

 

 

 

Potential Aa Ab

Acceptance Ratio
Indexm
l<--->2 1.0 0.83 0.64

 

2<--->3 0.83 0.73 0.50

 

3<--->4 0.73 0.65 0.43

 

4<--->5 0.65 0.57 0.41

 

5<--->6 0.57 0.5 0.37

 

 

 

 

 

 

65

To examine whether all the conﬁgurations (replicas) can visit a particular system

(with a particular 2,- value) and whether given conﬁgurations are visited by all the

systems, we display these time trajectories in Figures 1 and 2. (Since the plots for the
third and fourth us are similar to the plots for the ﬁfth ns, only the plots for the ﬁfth us
are displayed for convenience.) From the plots, it is clear that for the simulation of
YpHPPK, all the systems (windows) can be visited by all the conﬁgurations (replicas) (a-
f), and, conversely, all the conﬁgurations (replicas) can be visited by all the systems
(windows) (g-l). Although the plots for the simulation of EcHPPK are not as turiform as
the ones for YpHPPK, it is still true that conﬁgurations 2 and 3 can be visited by all the

systems and systems 2 and 3 can be visited by all the conﬁgurations while all the systems

have conﬁgurations with both higher and lower indices visiting them.

Comparing RMSD and RMSF for HREM trajectories with those for normal MD

As noted in the Introduction and Methodology sections in this chapter, the most
ﬂexible regions of HPPK are Loops 2 and 3, and these two loop regions comprise the
atoms whose interactions were scaled. To see whether the HREM simulations improve
sampling relative to the normal MD simulations, the RMSD and RMSF of the two loops
are compared. The system 1 HREM trajectories are used for the comparison since it is
system 1 that samples from the normal MD force ﬁeld’s ()1=1) canonical ensemble. The
RMSD and RMSF are evaluated based on the backbone atoms and on all the atoms for
each residue in the two loops. All trajectories used are ones obtained after the RMSD

stabilized.

66

Figure 3.1. Migration Check for EcHPPK. (a)-(f): Migration of conﬁgurations (replicas)

into and out of a given system (it,- scale value). (g)-(l): Migration of windows (systems)

into and out of a given conﬁguration . The ﬁgures of a-f correspond to systems 1-6 and
the ﬁgures of f-j correspond to conﬁgurations 1-6. Note that in view of the number of
data points that are plotted it appears as if, at a particular time, several replicas occupy the
same window or several systems visit the same replica—this does not happen,—looking at

shorter intervals, the dots will be separate.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

(a) 6; (106:
q): 5‘ B 5:
E 4 E 4‘.“ "" """"
.§3""" " "— E3"
3 2 E 2‘
1 1 .-
0.0 012 0:4 0:6 ' 0:8 ' 110 0.0 ' 0:2 ' 0:4 ' 06 j 018 ' 110
Time (ns) Time (ns)
(C) 62. . .- _- .. . .. . _. (d) 6- - _.
5.- -._._-..._-..--. -... .. _ . _._.- __
5 1 a: 5
:12 4. 1.2 4 —
.§ -3‘—— § 3 '—
75- j__._ '5- -._-.. _
g 2)- ; 2
1,”.-. . .......-_._... 1 --.. ._
0.0 032 0:4 0:6 0:8 110 0.0 '012 0:4 '06 0177.0

Time (ns) Time (ns)

67

Figure 3.1 (continue)

 

 

 

 

 

 

 

 

 

(e) 6:— ... (f) 6:-..“ .... .. ...—-.-.__.-
5 5 —
as . g 1
3. 41' "— ' ‘—‘— a 41 '—
51:, 3.. -. --....._._ g 3.. -_- _
l- 1.
0.0 02 0:4 0:6 0:8 110 0.0 012 0:4 0:6 0:8 170
Time (ns) Time (ns)
(9) ; (h)

 

 

 

 

Window Index

 

 

 

 

 

 

6
5
4
31---... - ..
2
l
0

Window Index
o ~— N 05 4:. 111 O\
I

.0 ' 0:2 ' 0:4 '016 10:8 ' 110 .0 02 '014'0:6*0f82110

 

 

 

 

 

 

 

 

 

Time (ns) Time (ns)
(I) 6.)- . (I) 6-_...-.... .__...-..._......_..
>< 5.. .-.. . ---...._._ >4 5-- ...—..-..____
d.) 1 Q) .
:2. 4 — :2 4 —
g 32— g 3
'8 2;- E 2i. ._ .--. ..-__....
B 1 3 1
1..._. .. .-.-. .... . ......_ 1,
0.0 072 0:4 0:6 0:8 110 0.0 02 0:4 0:6 08 L 1‘0
Time (ns) Time (ns)

68

Figure 3.1 (continue)

A
x
v
0\

Window Index

 

M

 

A

3:
2.
11

 

 

0.0 ' 012 ' 0:4 ' 0:6 ' 0:8 ' i0

Time (ns)

69

—
v

Window Index

 

 

 

 

6.

5;_ _.

4;- - _.

3i - _- . _

2:

1:

0.0 ' 012 T 014 I 016 ' 0:8 ' 1T0
Time (ns)

Figure 3.2. Migration Check for. YpHPPK. (a)-(t): Migration of conﬁgurations (replicas)

into and out of a given system ( 11- scale value). (g)-(l): Migration of windows (systems)

into and out of a given conﬁguration . The ﬁgures of a—f correspond to systems 1-6 and
the ﬁgures of f-j correspond to conﬁgurations 1-6. (Note that in view of the number of
data points that are plotted it appears as if, at a particular time, several replicas occupy the
same window or several systems visit the same replica— this does not happen. Looking at

shorter intervals, the dots will be separate.

 

 

 

 

 

 

 

 

 

 

.0ﬁ 012T 0:4 '016 '018 ' 110

 

 

 

 

 

Replica Index
o H N DJ A LII

Replica Index
o H N U) A LII

.0 ' 0:? 0:4 ' 0:6 T08 ' 1:0

 

 

 

 

 

 

 

 

 

 

 

 

Time (ns) Time (ns)
(C) 6' ' (d) 6‘
5._.-...--. .. . . ...-.. 5
a 4 -----------— .5 4.
3 3__.
E 2 on? 2.....-“ -_......_...._.__. _
1 1...-.... .--......___. .—
0.0 ' 012T 0:4 ' 0:6 fois 1 110 0.0 012 0:4 0:6 0:8 110
Time (ns) Time (ns)

70

Figure 3.2 (continue)

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

(6) 6‘—----—--°---——— (f) 6
x. 5 x 5
0 5
3 4' E 4
.§ 3 .3 3 "
a 2 a 2 —-———-— — — .-
o: a:
1.- 1-_ . . .... .. .
0.0 012 0:4 0:6 0:8 110 0.0 012 04 0:6 0:8 110
Time (ns) Time (ns)
(9) 6--- ---..-......... . . .... . (h) 6f—_-—" . ...-.-
>< 5' :4 5
0) d) .
E 4. - .. --- .. __.. .... E 4._..-.. -_.._.._.__. _
E 3. E 3l
.5 2 .5 2
3 B -
1 ---——- 1
0.0 012 0:4 0:6 0:8 110 0.0 012 0T4 0:6 0:8 110
Time (ns) Time (ns)
(i) 6 - (J) 6
>< 5 5 5
‘1’ '0
5 4——. ,5 4
"‘ ‘ a ‘ -..._-.-...___
E 3. 5 3.
.5 2 .5 2
a B
1 1
0.0 ' 0:2 ' 0:4 ' 0:6 ' 0:8 ' 1:0 0'0 0:2 0.14 0:6 0:3 110
Time (ns) Time (ns)

71

Figure 3.2 (continue)

 

 

 

 

 

 

 

 

 

 

 

 

(k) 6‘ (I) 6+
5 5. 35 5j—----—-- —__
3. 4 E 4
g 3T.-...--. .. . . ..... ..-.. g 3. .
E 2l '8 2:__.__._ _--..
3 . 3 .
l 1-....-
0.0 ' 0:2 ' 0:4 ' 0:6 ' 0:8 ' 110 0.0 012 0:4 0:6 0.8 110
Time (“5) Time (ns)

72

The RMSD comparisons are displayed in Figures 3.3 and 3.4. Based on EcHPPK
backbone atoms, the RMSD calculated for the normal MD trajecotry is around 3 A,
showing that the simulated conformations did not deviate far ﬁ'om the original crystal
structure. The RMSD for most residues in Loop 2 and Loop 3 based on the HREM
trajectories has a value between 3 and 5 A with residues at the ends of two loops having
smaller RMSD while two residues in the center of Loop 3 having a bit larger RMSD. The
simulated conformations based on HREM deviate more from the starting structure than
those based on normal MD trajectories from the starting structure. A larger deviation
from the starting structure after simulation for a couple of nanoseconds has a better
chance to occur since the starting structure was made by removing HP from the ternary
crystal structure. Based on YpHPPK backbone atoms, using either the normal MD or
HREM method, the resulting RMSD plots show that the simulated protein loop 2 and 3
conformations are substantially different from the starting crystal structure after several
nanoseconds of simulation. Basing the RMSD calculations on all atoms of Loops 2 and 3,
the changes of conformations for most residues are larger for both EcHPPK and
YpHPPK either by regular MD or by HREM. This result is expected because the side
chains are usually more ﬂexible than backbones. The larger RMSD difference for
YpHPPK versus EcHPPK may be due to the fact that the starting crystal structure for
YpHPPK was obtained by simulating one monomer from the dimeric crystal structure.80
As discussed in this chapter’s methodology section about the starting crystal structure for
YpHPPK, in the dimer crystal structure each monomer’s Loop 2 interacts strongly with
residues in the other monomer, and it is Loop 2 that shows the largest RMSDs from the

crystal structure.

73

Figure 3.3. The RMSD for residues of Loop 2 and Loop 3 in EcHPPK: (a) and (b) are for
RMSD based only on backbone atoms for Loop 2 and Loop 3, respectively, while (c) and
(d) are based on all atoms. The RMSD based on normal MD are shown with solid lines
and the RMSD based on HREM are shown with dashed lines. The RMSDs are average
RMSDs calculated for all the snapshots in the two three nanosecond long trajectories
based on normal MD and HREM picked when the RMSDs and RMSFs for the

trajectories have been stable.

 

 

 

 

 

     

 

 

 

 

10- (a) 10- (b)
8~ 3.
<3: A ‘
'8 6- gr 6..
E ‘ 2
4-
2-
0 I ‘V I V 1 r I ' ' 'ﬁ 0 I I I fl ' I Y I r I ‘ I
44 46 48 50 52 54 82 84 86 88 90 92 94
resnum resnum
8- 3-
4:: l '
‘3 6-1 g 6.
E w '8 .
45 E 4«
25 2.
( 1
0 I I I ' I r I r I r j 0 ' v I v I v I v I I 1 ' l
44 46 48 50 52 54 82 84 86 88 90 92 94

res num
res num

74

Figure 3.4. The RMSD for residues of Loop 2 and Loop 3 in YpHPPK: (a) and (b) are for
RMSD based only on backbone atoms for Loop 2 and Loop 3, respectively, while (c) and
(d) are based on all atoms. The RMSD based on normal MD are shown with solid lines
and the RMSD based on HREM are shown with dashed lines. The RMSDs are average
RMSDs calculated for all the snapshots in the two three nanosecond long trajectories
based on normal MD and HREM picked when the RMSDs and RMSFs for the

trajectories have been stable.

   

 

 

 

 

   

 

 

 

 

10. (a) 10‘ (b)
. l
8- 3‘
g 6.. 2 6..
3 ‘ 3 ‘
E 4. E 4-
‘ l
2- 2.
o I I I T T I O I I ﬁr I 7 I
44 46 48 50 52 54 86 88 90 92 94
resnum resnum
109 1°“ W
a. 8‘
g 6- g 64
s ‘ 8 ‘
E 4- E 4-
25 2.
J
O I ' I ‘7 I 1 I ' I T ﬂ 0 . I ﬂ 1 . T ﬁ, ‘
44 46 48 50 52 54 86 88 90 92 94
resnum resnum

75

The Loop 2 residues of YpHPPK deviate more from the starting crystal structure
than those of EcHPPK. Figure 3.5 shows that the Loop 2 conformations for the starting
crystal structures of the two proteins are quite similar, while from the ternary structures it
is clear that the HP binding pocket in HPPK is close to its loop 2. Therefore, the
difference of the magnitudes of the deviations ﬁom the crystal structures observed for the
Loop 2 residues in the two proteins suggests the possible existence of different HP
binding pocket conformations for the two proteins.

The differences between the RMSDs based on HREM and normal MD
trajectories are in the main not large. But, RMSD only shows the deviation from the
starting crystal structure and does not contain direct information about protein ﬂexibility.
In particular, if the protein is trapped in a small area far from the starting point within the
conﬁguration space, the RMSD can be large. Therefore, RMSF comparisons should be
more revealing of the extent of conﬁgurational sampling. If they turn out to be small,
then the RMSD measures indicate that the various trajectories are conﬁned to basins in
conﬁguration space that are displaced by varying degrees from the crystal structures.
Figures 3.6 and 3.7 display the RMSF comparisons, which show that the RMSF based on
HREM trajectories is consistently much larger than the RMSF based on normal MD
trajectories for both EcHPPK and YpHPPK. The normal MD RMSF based on the
backbone atoms is less than 2 A for each residue of the two loops for the two proteins
with the only exception that the Gly47 of YpHPPK has an RMSF of 2.2 A. Even when
side chains are included, only Arg84 has an RMSF greater than 3 A. It is clear that the
normal MD trajectories were trapped around some basin in the conﬁguration space over

the simulation time scale, after a stable RMSD was achieved. The RMSF obtained

76

Figure 3.5. Starting crystal structures for the simulations: (a) for EcHPPK (PDB entry

lqu) (b) for YpHPPK (PDB entry 2qx0)

(a)

 

     

Loop2 —->

 

 

 

 

 

 

 

77

Figure 3.6. The RMSF for residues of Loop 2 and Loop 3 in EcHPPK: (a) and (b) are for

RMSD based only on backbone atoms for Loop 2 and Loop 3, respectively, while (c) and

(d) are based on all atoms. The RMSF based on normal MD are shown with solid lines

and the RMSF based on HREM are shown with dashed lines The RMSFs are calculated

for all the snapshots in the two three nanosecond long trajectories based on normal MD

and HREM picked when the RMSDs and RMSFs for the trajectories have been stable.

 

 

 

 

 

 

 

 

 

 

8- 84
(a) (b)
6- 6.
4
'34- @4-
15 ......... ‘23
E g ..............
0 1 ﬁ F T '71 T l I 0 I I T I f I I I
44 46 48 50 52 54 82 84 86 88 90 92 94
res num res num
8~ 8
(C) j (d)
6- 5.
€44 ------ g4.
”‘3 '7.
E 1 g
2" 2d
0 ' I ' r ' f V r I o ff I I I I I I T I
44 46 48 50 52 54 82 84 86 88 9o 92 94
res num nes num '

78

Figure 3.7. The RMSF for residues of Loop 2 and Loop '3 in YpHPPK: (a) and (b) are for
RMSD based only on backbone atoms for Loop 2 and Loop 3, respectively, while (c) and
(d) are based on all atoms. The RMSF based on normal MD are shown with solid lines
and the RMSF based on HREM are shown with dashed lines. The RMSFs are calculated
for all the snapshots in the two three nanosecond long trajectories based on normal MD

and HREM picked when the RMSDs and RMSFs for the trajectories have been stable.

 

 

 

 

 

 

 

 

 

3-
3-
b
(a) ( )
6 4 5 ‘
5 ‘0 ~.,-,
24 2-
1 /\ \
0 I ' I ' I ' I ' l ' l O ' I V I r I ' I j
44 46 48 50 52 54 86 88 90 92 94
res num res num
84 8“
(c) ‘ (d)
6_ 6-
g 4- .74 E 4‘
u- ,' ‘0
g 4 ....... E
2-1 A 2.
0 I I r r f F I I I 0 ﬁ I I I I I j
44 46 48 50 52 54 86 33 90 92 94
'93 num res num

79

through HREM simulation is much larger, especially for residues in the middle of the
loops that are less susceptible to the inﬂuence of the rigid core. Based only on backbone
atoms, the residues in the middle of the loops for the two proteins all have an RMSF
greater than 3 A. It is noteworthy that the YpHPPK backbone of Loop 2 exhibits a much
larger ﬂexibility compared with that of EcHPPK. The RMSF based on backbone atoms is
mostly more than 5 A for the residues in the middle of Loop 2 of YpHPPK, with the
central three having an RMSF close to 7 A, while the similarly calculated RMSF is
mostly between 3 and 4 A for EcHPPK. Again, since the HP binding pocket is close to
Loop 2, this large difference suggests that there might be different conﬁgurations of
residues contributing to the HP binding pockets for the two proteins. The relatively large
RMSF based on backbone atoms from trajectories obtained through the HREM
simulations makes it possible to search for the potential existence of different
conﬁgurations suited for binding ligands. A possible reason for this large HREM loop 2
RMSF difference between the two proteins may be due to the short length of the Loop 3
of YpHPPK. Loop 3 of YpHPPK only contains eight residues, making it difﬁcult to

interact with Loop 2, leading to the possibility of enhanced freedom of motion for Loop 2.

Binding pockets for EcHPPK and YpHPPK

To investigate HP binding pocket conformations, and their potential differences
between EcHPPK and YpHPPK, residues that are close to HP in the respective crystal
structures are selected. These residues include T hr42, Pro43, Pro44, Leu45, Try53,
Asn55, and Phe123 for EcHPPK and Thr43, Lys44, Pro45, Leu46, Phe54, Asn55, and
Phe123 for YpHPPK. (These residues will be referred to as pocket residues hereafter.)

Only the backbone atoms of these residues are included in view of their greater stability

80

relative to side-chain atoms. The g_cluster method described in the methodology section
of this chapter can be used to divide the total 3000 snapshots of the HREM trajectories
for each of the two proteins into different groups according to the RMSD calculated
based on backbone atoms of the pocket residues. By using a 2 A cutoff in g_cluster, the
HREM trajectories for EcHPPK and YpHPPK can be clustered into 7 and 8 clusters,
respectively. (The populations of the clusters are shown in Table 3.2 and Table 3.3). The
observation that the YpHPPK trajectory can be separated into more clusters using the
same cutoff value may be due to the greater ﬂexibility of Loop 2 of YpHPPK. The
g_cluster method was also applied to the normal MD trajectories using the same 2 A
cutoff, but all snapshots turned out to be in the same cluster for EcHPPK while 99.9% of
the snapshots were in the same cluster for YpHPPK. The backbone atom conformations
of the pocket residues of the central structures are shown in light green in Figures 3.8 and
3.9 for cluster 1 and cluster 4 obtained from the EcHPPK trajectory and for clustersl, 2, 5
and 6 from YpHPPK. The crystal structure conformations are shown in bright yellow in
the two ﬁgures for comparison. HP is also put back and shown in magenta in the two
ﬁgures, according to its place in the respective crystal ternary complexes, to give a
clearer picture of the binding pocket. Central structures of other dense clusters are not
shown since they are either similar to the crystal structures or to the central structures
shown in Figures 3.8 and 3.9. From Figure 3.8, it is clear that the central structure of
cluster 1 for EcHPPK has residues Thr42 and Pro43 moved downward and residues
Pro44 and Leu45 move in to be closer to the HP position in the x-ray ternary complex.
And, there exist substantial changes of the (D and ‘I’ dihedral angles of Pro44 and (I)

dihedral angle of Leu 46 in the structure. These changes are mainly a consequence of the

81

Table 3.2. The number of snapshots in the clusters constructed for EcHPPK.

 

 

 

 

 

 

 

 

Cluster Number of snapshots
outof3000

1 1719

2 1058

3 183

4 29

5 9

6 1

7 1

 

 

 

 

82

Table 3.3. The number of snapshots in the clusters constructed for YpHPPK.

 

 

 

 

 

 

 

 

 

Cluster Number of snapshots
outof3000

1 1692

2 906

3 220

4 74

5 39

6 36

7 l 8

8 15

 

 

 

 

83

Figure 3.8. EcHPPK backbone atom conformations of pocket residues of the HP binding
pocket of the central structures for clusters 1 and 4 of HREM snapshots: The leﬁ panel is
for cluster 1 and the right for cluster 4. The central structure of each cluster is shown in
green, the starting crystal structure is shown in yellow, and HP, in magenta, is placed

according to the ternary complex crystal structure.

 

84

Figure 3.9. YpHPPK backbone atom conformations of pocket residues of HP binding
pocket of the central structures for cluster 1, cluster 2, cluster 5, and cluster 6 of HREM
snapshots. The left panel is for cluster 1 (top) and cluster 5 (bottom) and the right panels
for cluster 2 (top) and cluster 6 (bottom). The central structure of each cluster is shown in
green, the starting crystal structure is shown in yellow and HP, in magenta, is placed

according to the ternary complex crystal structure.

9%
x, a

"'

434W
‘1 ‘7.

85

bending down of the leﬁ half of loop 2 (residues 44 — 49) toward the protein core. The
central structure of cluster 4 shown in Figure 3.8, has residues, Pro43, Pro44, Leu45, and
Try53, moved upward, and there are substantial changes of the (D and ‘1’ dihedral angles
of Pro43. These changes are mainly due to loop 2 transiting toward the solvent. For
YpHPPK, Figure 3.9 shows that the central structures of clusters 1 and 5 both have
residues Thr43, Lys44, Pro45, Leu46, and Phe54 displaced substantially upward. In the
central structure of cluster 1, there are substantial dihedral changes from the starting
crystal structure for the ‘I’ of residue Thr43 and the (l) dihedrals of Pro45 and Leu46. In
the central structure of cluster 5, the large dihedral changes are for ‘1’ of Lys44 and the (I)
angles of Pro45 and Leu46. The (D dihedrals of Pro45, and Leu46 are also substantially
different for the two central structures. These changes are mainly due to the extension of
loop 2 toward the solvent. The central structures of clusters 2 and 6 for YpHPPK shown
in Figure 3.9 both have residues Thr43, Lys44, Pro45, Leu46, and Phe54 moved upward,
with the displacement for cluster 6 larger. The Pro45 ‘I’ and Leu46 (D of the central
structure of cluster 2 are substantially different from those in starting structure. For
cluster 6, the different dihedrals are ‘I’ for Lys44 and Pro45 and (D for Pro45 and Leu46.
The upward displacements are mainly due to the extension of loop 2 toward the solvent.
To get a solid picture of how the HP binding pocket conformations really change
according to the aforementioned changes in backbone atom conformations of the pocket
residues, the HP binding pocket proﬁles obtained using the methods presented in the
methodology section in this chapter are shown in red in Figures 3.10 and 3.11 for
EcHPPK and YpHPPK, respectively. It is clear from the two ﬁgures that when Loop 2

moves downward towards the core compared with the starting crystal structure (cluster 1

86

Figure 3.10. The HP binding pockets proﬁles for EcHPPK. The HP binding pocket
proﬁles are shown in bright red in the following plots. (8) shows the proﬁle for the
starting crystal structure, and (b) and (c) show proﬁles for the central structures of cluster

1 and cluster 4, respectively.

(3)
Crystal

    
 

(b)
Cluster 1

 

87

Figure 3.11. The HP binding pockets proﬁles for YpHPPK. The HP binding pocket
proﬁles are shown in bright red in the following plots. (a) shows the proﬁle for the
starting crystal structure, and (b), (c), (d), and (e) show proﬁles for central structures of

cluster 1 cluster 2, cluster 5 and cluster 6, respectively.

(a) '

  

   

88

Figure 3.11 (continue)

Cluster 5 ' -

   

89

for EcHPPK and clusters 2 and 6 for YpHPPK), the HP binding pockets get narrower,
while they get wider when Loop 2 moves upward towards the solvent (cluster 4 for
EcHPPK and clusters 1 and 5 for YpHPPK). Figures 3.10 and 3.11 also show that the HP
pocket conformations for the central structures shown are quite different for EcHPPK and
YpHPPK. This ﬁnding supports the hypothesis that it might be possible to design

inhibitors that are effective for only one of the two proteins.

Test of adding water molecules in HP binding pockets

MD simulation trajectories sometimes depend on the starting state. A concern
here is that for the results discussed in this chapter, HP was simply removed from the
ternary structure to make the starting structure. In addition, from the conﬁguration
generated by our start up protocol it was clear that no water molecule was left in the
region that was occupied by the HP before its removal. Hereaﬂer, we will refer to this
region as HP binding pocket region. The CUKMODY start up protocol consists of
placing water molecules on a grid at the density appropriate to the normal water density
for T=303. Then, all water molecules that overlap protein atoms are eliminated, based on
a van der Waals overlap criterion. Since no waters were left in the HP pocket, it was not
clear whether there would be some substantial difference in simulated trajectories if water
molecules could be added into the HP binding pocket at the beginning of the MD.

To investigate the issue, extensive tests were canied out by adding water molecules into
the HP binding pocket of the starting structure for the EcHPPK simulation, as the
following describes. (Note that water molecules can be added into the HP binding pocket
region because the region has a somewhat ﬂat proﬁle, a feature that leads to the non-

speciﬁc grid routine tending to eliminate waters from such a region.) In a ﬁrst try, four

90

water molecules were added into the HP binding pocket region as shown in Figure 3.12
(a). (Note: other water molecules remaining after water removal-routine described above
were not shown in the ﬁgure, because otherwise there would be more than 7000 water
molecules in the ﬁgure, and the four separately added in water molecules could not be
seen. However, in terms of simulation parameters, there were no differences between the
water molecules remaining after the removal routine and the four water molecules added
separately.) The conﬁgurations (distances and orientations) of the four added water
molecules were adjusted so that hydrogen bonds (to deﬁne a hydrogen bond, the distance
between two oxygen atoms is set to be around 2.7 A and the angle of oxygen—hydrogen-
oxygen is set to be close to 180 degrees) are present between any two adjacent water
molecules. Then, normal MD was conducted for a nanosecond with the structure shown
in Figure 3.12 (a) as starting structure. In Figure 3.13 (b), the conﬁguration after 1 ns of
simulation is shown, it is clear that the water molecules originally put inside the HP
binding pocket region moved out of the region to some places between Loop 2 and Loop
3. Those places between Loop 2 and Loop 3 would contain water molecules simply by
executing the normal start-up procedure. The actual time needed for the water molecules
to move away from the HP binding pocket region is only about 200 ps. One possible
reason for water molecules to leave the HP pocket region shortly after being added
explicitly is that the residues close to the HP binding pocket region (Thr42, Pro43, Pro44,
Leu45, Try53, AsnSS, and Phe123) are mostly hydrophobic. Another possible reason is
that the HP binding pocket region is in part solvent exposed and also close to the highly-
charged ATP phosphate tail. The high charge on ATP phosphate tail and hydrophilic

property of residues around it may also help to explain the observation that the added

91

Figure 3.12. (a) The conﬁguration of EcHPPK and four hydrogen bonded added water
molecules. (b) The conﬁguration after 1 ns normal MD simulation with the starting

structure as shown in (a). The added water molecules are shown in yellow.

 

92

four water molecules ﬁrst moved to the region between Loop 2 and Loop 3 close to the
ATP phosphate tail after leaving the HP binding pocket region. .Therefore, it seems that
adding those four water molecules should not have substantial inﬂuence on the HREM
simulation results of HPPK.

Second, a similar test as described in the above paragraph, but by adding 16
additional water molecules into the region between Loop 2 and Loop3 by a water-
insertion routine105 to make sure the region was fully occupied by water molecules at the
very beginning, was tried. The result is shown in Figure 3.13. From Figure 3.13 (b), it is
clear that the HP binding pocket region no longer had water molecules after 1 ns
simulation. Again, it seems that adding water molecules explicitly into the HP binding
pockets at the starting point should not have substantial inﬂuence on the simulated
sample conformations.

Moreover, the test using 4 hydrogen-bonded water molecules and the 16
additional water molecules was tried again by using smaller van der Waals constants for
the added 20 water molecules to reduce possible large repulsion at the beginning.
Because 4 of the water molecules were added in manually, relatively large vdw repulsion
might exist at the beginning, and cause these water molecules to be bounced out of the
HP binding pocket region quickly. However, even with smaller vdw constants, after 1 ns,
still no water molecules stayed at the place that was directly under Loop 2 and was
occupied by HP in the starting ternary structure. In summary, it seems that there should
be no big difference between starting simulations with an HPPK structure with water
molecules added into the HP binding pocket region and starting simulation simply with

an HPPK structure.

93

 

Figure 3.13. (a) The conﬁguration of EcHPPK with 20 added water molecules. (b) The
conﬁguration after 1 ns normal MD simulation with the starting structure as shown in (a).

The added water molecules are shown in yellow.

 

94

4. Concluding Remarks

A Hamiltonian Replica Exchange Method (HREM) molecular dynamics (MD) approach
was successfully applied to Met-enkephalin in the work described in last chapter. In the
work summarized in this chapter, a similar approach is applied. A HREM approach was
used to study the near-closed conformations of EcHPPK and YpHPPK The crucial point
of the HREM implementation is the decision as to which are the important degrees of
freedom. For a protein with a rigid core simulated in explicit solvent, only exciting the
degrees of freedom of the most ﬂexible parts of the protein (usually just some loops) is a
great advantage since their number is at least an order of magnitude smaller than the
number of other degrees of freedom in the system in typical MD simulations. In our
application of the method, by focusing on Loops 2 and 3, the HREM improves the MD
sampling of both EcHPPK and YpHPPK, when monitored by comparing the RMSFs
based on HREM trajectories and those based on normal MD simulation.

Using clustering methods focused on the conformations generated with the
HREM, some well-populated, near-closed structures were obtained for EcHPPK and
YpHPPK. In terms of some of the key residues that form the HP binding pocket, the
clustering was able to provide some distinct residue conformations. Most importantly, the
clusters found for EcHPPK and YpHPPK are distinguishable, indicating that there are
differing near-closed conformations suited to HP binding. Those structures could be used
as targets to design inhibitors to test the hypothesis that selecting inhibitors to ﬁt some
near-closed conformation that is signiﬁcantly more accessible in the targeted protein than

in its homolog is an effective strategy to enhance inhibitor speciﬁcity. If successful, those

95

designed inhibitors might be used as new narrow band antibiotics to help deal with the

problem of antibiotic resistance.

96

Chapter 4. Studies of HPPK-ATP conformation space and ATP binding through

using enhanced MD methods

1. Introduction

For the simulation processes conducted for the work described in Chapter 3, one
interesting observation is that the ATP is very stable. During all the simulations, ATP
always stays in its binding pocket with no substantial change of its conformation.
Experimental results show that HPPK binds ATP (in those experiments, the non-reactive
analog AMPPCP was used ) with high afﬁnity and that ATP binds ﬁrst, and slowly,
followed by very rapid HP bindingmé. Without the presence of ATP, HPPK does not bind
HP in measurable quantities. These data suggest that there is a “proto-pocket” for HP
binding that is formed by ﬁrst binding ATP. The slow ATP binding is consistent with the
idea that there is conformational ﬂexibility of HPPK, and that 'ATP selects from a
conformational ensemble. Once ATP is bound, the HPPK-ATP complex may stabilize to
a set of conformations appropriate for the rapid uptake of HP. This chapter is devoted to
the study of the HPPK-ATP conformation space and possibly how ATP can bind to
HPPK.

Similar to the last chapter, the HPPK—ATP (only E. coli HPPK is studied in this
chapter) is considered in this chapter as the basic species of the investigation (HPPK-
ATP notation is used as shorthand for the complex of HPPK, ATP and the two Mg2+ ions
that are required for the catalytic activity) A natural and simple approach to the
investigation, of course, is to use conventional MD simulation to generate trajectories of
conformations that can be used to address these issues. However, as shown before in

Chapter 2 and Chapter 3, conventional MD oﬁen suffers the problem that integrating

97

Newton’s equations of motion with femtosecond scale time step will have difﬁculty
reaching times that can capture substantial conformational transitions and ﬂuctuations
that may be occurring on the micro to even seconds time scales owing to the complex
conﬁguration space that has substantial barriers in the high-dimensional potential energy
surface. A number of methods exist that speed up transitions over potential energy
barriers. One class of method introduces restraints that operate on the atoms to direct the
trajectory between some initial and ﬁnal state9. The restraints can range from being
applied to all the atoms to drive the system between the (known) endpoints with complete
conformity. Or, they can be applied to just a particular atom-atom distance as would be
appropriate for obtaining a one-dimensional potential of mean force. Another possible
approach is to use a reweighting method, which modiﬁes the potential Stu'face to generate
a trajectory that more readily surmounts barriers.27'30’33’34’107 The trajectory is then re-
weighted back at each step to restore Boltzmann sampling. (More details about
reweighting methods are provided in Chapter 1.) Cukier and Morillo developed a version
of reweighting that can be referred to as targeted reweighting.30 The term targeted implies
that rather than modify the entire potential surface, only very speciﬁc terms in the
potential are to be modiﬁed. Targeting can provide the ﬂexibility to address different
impediments to overcoming speciﬁc barriers on the potential energy surface. Targeted
reweighting can be viewed as the opposite extreme of the global, temperature REM.
(More details about REM are provided in Chapter 1 and Chapter 2.) The targeted
reweighting method was successfully applied by the authors onto a simple testing protein
model consisting of a chain of connected beads characterized by dihedral angles and the

van der Waals interactions among the beads.

98

In the work summarized in this chapter, a combination of a restraint method and a
targeted reweight method was used to explore the conformation space and mechanism of
ATP binding in the HPPK (More details about HPPK are provided in the ﬁrst
introduction chapter and last chapter.) The simulation protocol is summarized in Figure
4.1. The cylinder denotes a set of residues that form a binding pocket for ATP
(represented as a Wiggly line) with orientations of key residue side-chains that are
important for binding ATP denoted by curved arrows. The initial HPPK-ATP binary
complex is based on the ternary crystal structure. Restraint simulations are used to direct
HPPK-ATP toward the apo crystal structure74 (pdb entry lHKA). This restrained
simulation was done in the hope that the ATP might move away from its binding pocket
during the process. However, the simulations show that ATP remains bound throughout
the process even though residues initially trapping ATP gradually move away, through
ATP trapping intermediates, until reaching ATP trapping states that have an apo-like-
HPPK structure. ATP is still held in trap states by a network of hydrogen bonds and salt
bridges that operate between ATP-2Mg2+ and a number of HPPK core residues. The
reweight simulation that starts from this last trap state then shows that ATP can separate

from HPPK through a series of breaking and making hydrogen bonds and salt bridges.

2. Methodology

Molecular Dynamics Simulations

As similar in previous chapters, the CUKMODY protein molecular dynamics

code was used to carry out the simulations. The code was modiﬁed to incorporate the

99

Figure 4.1. A schematic representation of the MD simulation protocol. The cylinder
denotes residues that form a binding region for ATP with key residue side-chain
orientations denoted by curved arrows. ATP is represented as a Wiggly line. The initial
HPPK-ATPMgz binary complex is constructed from the ternary crystal structure. Ten
consecutive restraint simulations direct HPPK-ATPMgz toward the apo crystal structure.
ATPMgz remains bound throughout the process even though residues initially trapping
ATPMgz gradually move away, through trapping intermediates, until reaching open
trapping states. The reweight simulation then shows that ATPMgz can separate from

HPPK through a series of breaking and making hydrogen bonds and salt bridges.

     

 

initial binary complex openﬂtrapping released state

restraint MD restraint MD

 

 

trapping intermediate

100

 

reweight method as discussed below.

All simulations were run at 303 K under ﬁxed number, volume and temperature
(NV T) conditions.15 The simulations were carried out in a cubic box with sides of 64.1 A,
having 7671 waters added after waters overlapping the protein were removed. For the
evaluation of the electrostatic and the attractive part of the Lennard-J ones energies and
forces, the PME method was applied with a direct-space cutoff of 9.0 A, an Ewald
coefﬁcient of 0.32, and a 72x72x72 reciprocal space grid. Five sodium cations were
added to neutralize the system. Bond lengths were constrained with the SHAKE
algorithm8 allowing a 2 fs time step and the temperature was globally controlled with a
Berendsen thermostatlo with relaxation time of 0.2 ps.

The starting structure for simulation of HPPK-ATP was the same as the one used
for HREM simulation of E. coli HPPK described in Chapter 3. For constructing the
starting structure, the two Mg2+ cations were each covalently linked to an oxygen
phosphate atom of ATP. One magnesium, designated as Mgl, is linked to an alpha
phosphate oxygen and the other is linked to a gamma phosphate oxygen, designated as
Mg2, in accord with their placement in the crystal structure. The ATP phosphates were
assumed fully deprotonated (ATP thus has a total charge of —4) in agreement with the

ATP protonation state when ligated to Mgz‘i.104

Restraint method

For the restrained MD simulations that were used to transit the protein
conﬁguration between two desired conformations, restraints on distances between mass

centers of paired atom groups were introduced. Harmonic restraint potentials

V=(k/2)(x—xo)2 with force constant k =2 kcal/mol/A2 were used to restrain the

101

current distance x to be around a desired target distance x0. A small restraint force
constant was used to permit the HPPK-ATP complex to ﬂuctuate extensively without
being dominated by the effects of the restraints. The restraints were applied in a step-wise
manner, by dividing the transition between the initial and ﬁnal states into 10 sequential

windows. Each consisted of 500 ps of MD simulation with restraints set as stated above
with x0 = "(xﬁm - x,,,,-,,-a,)/ 10+ xmma, , n the window number running from 1 to 10,

and xﬁnanapo and xmmanclose denoting a distance in the apo and closed (the temary

complex with HP removed) crystal structures, respectively.

Reweighting method

The reweighting method”3 1 to accelerate conﬁgurational sampling is described in
detail in Chapter 1. The essential idea underlying reweighting method is that a modiﬁed
potential surface can be used during actual simulation to reduce barriers and then, by
reweighting back when analyzing trajectory data, the correct ensemble average can be
obtained. In extreme, if the total potential is modiﬁed multiplicatively then it simply
corresponds to changing the temperature of the simulation. The essence of targeted-
reweighting method developed by Cukier and Morillo30 is that only speciﬁc terms in the
potential energy are targeted, and these terms are chosen based on speciﬁc information
such as the presence of strong electrostatic interactions corresponding, for example, to
salt bridges between residues or what we still will refer to as salt bridges between
charged portions of ATP and the magnesium cations and ionized residues.(These are the

potential terms targeted in the work summarized in this chapter).

102

For speciﬁc potential term V0. for certain pair of atoms 1' and j to be targeted, a

linear scaling is used in the work summarized in this chapter, Vij’" = xii/U. A number of 2

values were tried. The actual results are quite sensitive to the ,1 value chosen with too
large a value (closer to 1) leading to no effect and too small leading to a release that is so
rapid that the sampling is very poor. The choice of l=0.71 led to escape on the ns time
scale. The stages that are found took about 1 ns for this choice, and the simulation was
run for a total of 2 ns to make sure that ATP, once unbound, did not rebind.

As also mentioned in Chapter 2 and 3, the Ewald9 method, used in our
CUKMODY program, requires the system to be electrically neutral. Thus, in general, a
direct inclusion of the scaling factor into the Ewald calculation is hard to implement
because it would require ﬁnding an overall neutral set of interactions to scale. (The
Ewald method calculates energy andforces in reciprocal space that is atom-based versus
the real space part that is interaction-based.) Since the modiﬁed potential surface does not
have to correspond to a real surface this is not, in principal, a problem. However, as in
any modiﬁed potential method, if the trajectory becomes too distorted from the true
potential trajectory, the reweighting method becomes counterproductive. To deal with
this issue, because we will only target interactions within the protein and/or ATP, and
these interactions are dominant within the same MD image cell, the potential
modiﬁcation will only be applied to the primary cell and the real space energy and forces.
For example, to target an electrostatic interaction (the only type of interaction targeted in
the work summarized in this chapter) between atoms labeled 1 and 2, the interaction

energy V12 and corresponding atom forces E and F2 are calculated using the Ewald

method with it =1 , which is just the normal Ewald MD procedure. Next, the differences

103

between using 2 =1 and using the scale factor 2 are calculated directly ﬁ'om Coulomb’s

law in the primary cell as V131”? = (it —1)(q,q2 / rm) and the corresponding forces FI‘Wr and
de'f are obtained by differentiation. Then, the above differences are added to the Ewald
V12 , E and F2 calculated before to obtain the net values Vl’z'e‘ = V12 + Vl‘fﬂ ,

F,“ = F1+Ediﬂ and F,” = F2 +F2d'ﬂ. These net forces are used to advance the system

conﬁguration and the net energy is used to obtain AV. The other (van der Waals and

internal) interaction terms can be obtained by a similar scheme.

3. Results and Discussion

Restraint MD of HPPK-ATP

As described in the methodology section, the simulations are based on the E. coli
ternary complex of HPPK, AMPCPP with two magnesium cations and HP, with HP
removed and AMPCPP replaced by ATP. The ﬁrlly deprotonated ATP phosphates (total
charge —4) with the two ligated Mg2+ cations provides a neutral, though highly polar,
ligand for HPPK. A primary 7 ns conventional MD simulation of this binary complex led
to little change in conformation from the ternary (closed) structure. Furthermore, we also
carried out simulations using the HREM approach similar to the one used for the work of
last chapter to see if HPPK by itself, again started ﬁ'om the ternary crystal structure with
both ATP and HP removed, would take on a more apo-like conformation but these
simulations also led to modest rearrangements. For these reasons, the HPPK-ATP binary
complex was chosen to be the focus and a particular set of restraints were picked to drive

the loops towards a conformation similar to the apo structure. Restraints were picked

104

 

partly based on the observation that long side-chains of some of the Loop 3 residues
(principally Arg82, Arg88 and Trp89) drape over the ATP when HPPK is closed,
forming a cylinder whose front face is composed of these residues, as schematized in
Figure 1. These side-chains should be pushed out toward the solvent in conformity with
the apo crystal structure. Other restraints were chosen to make the backbone
conformation of Loop 2 and Loop 3 similar to that of the apo crystal structure. To
achieve this transformation, harmonic restraints (with force constant k=2 kcal/Az) on 15
distances, measured between the mass centers of the atom pairs listed in Table 4.1, were
introduced. The restraints were applied in a step-wise manner, by dividing the transition
from the HPPK-ATP closed to open form into 10 sequential phases, with each phase
consisting of 500 ps of MD, to interpolate between the endpoints.

Figure 4.2 displays a superposition of representative structures selected from three
of the above ten 500 ps time intervals corresponding to windows 1, 5 and 10. From the
ﬁgure, it is clear that the cores of the three structures are well-aligned, which suggests
that even under the forces arising from converting HPPK-ATP from its closed to open
forms, there is minimal perturbation of the core, again showing the stability of the protein
core. By comparing the loop conformations (including both backbone and important long
side-chain conformations, such as Arg82, Arg88 and Trp89), it is clear that Loop 2 and
Loop 3 were pulled gradually ﬁ'om closed-oriented to open-oriented conformations. In
particular, the side chains of Arg82, Arg88, and Trp89 that point inward to seal ATP in
the binding pocket of the closed form progressively move to point out to the solvent in
the open form. Another key feature to note is that the ATP position is very stable along

the trajectory, with all its parts hardly moving. Our hope was that once the side-chains

105

Table 4.1. The atom(s) whose distances are used for the restraint simulations

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Atom(s) l Atom(s) 2
CA“) Gly46 CA Asp97
CA Pro47 CA Asp97
CA Asp49 CA Asp97
CA Arg82 CA Asp97
CA Arg84 CA Asp97
CA Ala86 CA Asp97
CA Arg88 CA Asp97
CA Trp89 CA Asp97
CA Arg92 CA Asp97
cz‘b) Arg82 CA A8997
CZ Arg84 CA Asp97
CZ Arg88 CA Asp97
CZ Arg92 CA Asp97
CZ@ Arg92 CA Leu78
C52“) Trp89 CA A8997

 

 

(a) CA denotes a C-alpha atom; (b) CZ stands for the 8 carbon and amino groups of the Arg side-chain; (c)
This restraint was not used during the ten-window restraint procedure, but was later realized to be
important to ATP separation from HPPK. It was added later, while holding the loops open in the tenth

window; ((1) CE2 denotes a carbon atom of the 6-member aromatic ring.

106

Figure 4.2. Superposition of representative structures from three windows along the
restraint pathway from closed to open. The structure for window 1, which is quite similar
to the closed form, is colored in yellow; the structure for window 5 is colored in purple;
the structure for window 10 is colored using the normal convention. ATP in each window
is shown in “ball and stick” mode, and residues Arg82 and Trp89 are shown in “stick”

mode. It is clear that ATP remains bound to HPPK throughout the restraint simulation.

 

107

that cover part of ATP in the closed form move away, ATP might be able to get separated
from HPPK or at least show substantial change in its conformation. That this did not
happen suggested that there must be other interactions responsible for maintaining ATP
bound to HPPK in the absence of HP.

To further the study, a 9 ns MD simulation was carried out starting from the end
conﬁguration of the above-mentioned step-wise restraining procedure. The restraints as
listed in Table 4.1 were still maintained with equilibrium distances set to be equal to the
ones measured from the apo HPPK crystal structure. This simulation was carried out to
see if ATP would leave the active site automatically when HPPK was restrained to its
apo-like state. In Figure 4.3 (left panel), a CPK view of the HPPK x-ray closed structure
is shown for comparison with the CPK view of the protein structure after holding the
restraints for 9 ns Figure 4.3 (right panel). Two points are worth noting. One is that from
the viewpoint of the ﬁgure, it is clear from the right panel picture that there is nothing
blocking the pathway for ATP to leave the binding pocket, while the path for ATP to
leave in the left panel picture is clearly blocked by Ala86, G1u87, and the side-chains of
Arg88 and Trp89. For clarity, these four residues are colored yellow. The other point is
that there is minimal difference between the two pictures with respect to the ATP center
of mass position relative to the protein core, though the conformations of ATP in the two
ﬁgures are a bit different. (The conformation for the MD snapshot shown in the right

panel is more compact in comparison with the x-ray structure shown in the left panel).

Reweighting MD of HPPK-ATP

It is clear from the restraint MD study that the AT P-HPPK interactions are

108

Figure 4.3. (left panel) CPK View of the starting MD simulation structure derived from
the ternary complex. (right panel) CPK View of the MD structure after maintaining the
restraints for 9 ns subsequent to the restraint opening protocol. ATP is colored purple.
The yellow residues, 86-89, which cover part of ATP in the initial ternary complex, are

mainly replaced by solvent interactions in the MD-opened structure.

 

109

 

 

sufﬁciently strong that on an MD time scale (here 9 ns) ATP remains bound, even though
the interactions present in the closed structure that are certainly designed to hold ATP in
position are absent in the MD-open form. To attempt to get the ATP separated from the
HPPK, one possible and common scheme is similar to what was just used to open HPPK-
ATP. However to do so either requires initial and ﬁnal states as given by e. g. crystal
structures, or if an obvious path does exist the ligand could be pushed out with restraints
in an appropriate direction. However, here, an appropriate path is not clear from e. g. the
end of the restrained open simulation. More importantly, picking a particular path will
certainly introduce a prejudice into the simulation that may or may not be realistic. Thus,
as noted in the methodology section in this chapter, when speciﬁc interactions can be
identiﬁed targeted reweighting is an appropriate and more objective choice to accelerate
the exploration of conﬁguration space that here is characterized by the ATP-HPPK
interaction.

It is clear that in both the closed form crystal structure and the restraint MD
opened structure the two magnesium cations are very close to the carboxylate groups of
two HPPK core residues, Asp95 and Asp97. For example, in the restraint MD opened
structure the distances of Mg] (associated with the or phosphate of ATP) to the closest
carboxylate oxygens of Asp95 and Asp97 are 2.5 and 3.0 A, and the corresponding
distances for the Mg2 (associated with the 7 phosphate of ATP) are 3.3 and 4.6 A. Even
though the side-chains of Arg82, Arg88 and Trp89 no longer block the path for ATP to
leave its pocket, clearly there still are strong interactions with the protein; notably with
the carboxylate groups of core residues Asp95 and Asp97, and these strong “salt bridges”

are good candidates for important barriers restraining ATP4‘(Mg2+)2. Unscreened charge

110

interactions that are on the scale of about 100 kcal/mol are essentially as strong as
covalent bonds and will clearly prevent the movement of ATP on any normal MD time
scale. Therefore, these salt bridges are good candidates to subject to the targeted
reweighting scheme discussed in the methodology section in this chapter. The targeted
atoms are the carbon and two oxygens of the carboxylate groups in these two residues
and both magnesium cations; thus, the Mg] and 'Mg2 to Asp95 and Asp97 electrostatic
interactions are targeted. A uniform scaling value of 7t= 0.71 (see the methodology
section for details about the reason for picking this value) was used. The modiﬁed
trajectory was started up ﬁ'om an endpoint conﬁguration of the restrained MD simulation.

The targeting leads to ATP separating from HPPK on the nanosecond time scale.
Six stable states were found along the separating process. The stable states are deﬁned by
examining the reweight energy — the electrostatic energy between the two magnesium
cations and the two residues — along the trajectory. They correspond to plateaus in this
energy. The initial modiﬁed energy of interaction V * is about 300 kcal/mol while at the
ﬁnal state it is about 30 kcal/mol. This energy scale is so large that we were not
successful in carrying out the reweighting required to obtain the correct Boltzmann
populations along the trajectory. That would be necessary if a potential of mean force
were the objective, but the modiﬁed trajectory still provides legitimate states along the
separating process. Six representative structures from the trajectory were picked for
display to show different states as ATP separates, and their superposed pictures are
shown in Figure 4.4. The ﬁrst structure (ATP in yellow) is the starting structure (endpoint

conformation of the restrained MD simulation) and the sixth one (ATP in red) is where

111

Figure 4.4. ATPMgz for six different conformations as it separates from HPPK are shown
with yellow, green, blue, purple, orange, and red indicating stages one through six. The
gradual separation of ATP is evident as a combination of reorientational and translational

motion.

 

112

ATP has moved out of the binding pocket. The middle four structures (the ATP is colored
green, blue, purple, and orange, respectively) are stable states along the path of separation.

The gradual exit of ATP is evident from the Figure 4.4, and it is accomplished
through a combination of mass center movement and rotation of the ATP, although it is
difﬁcult to see details of the motion from this ﬁgure. Therefore, the relevant hydrogen
bonds and salt bridges were analyzed for the ﬁrst 5 representative structures, and the
results for structures 1, 3 and 5 are shown in Figure 4.5 that displays hydrogen bonds (left
panels) and salt bridges (right panels) and are listed in Table 4.2 (hydrogen bonds) and
Table 4.3 (salt bridges). Between the starting structure (structure 1) and structure 2, the
mass center position of ATP does not change much, while its conformation does change
signiﬁcantly so that three new hydrogen bonds were formed and the two salt bridges
involving one of the two Mg ions, Mg2, were broken. Between structure 2 and structure 3,
ATP moved along its path away from the binding pocket somewhat; whereby three old
hydrogen bonds and one salt bridge were broken, while one new hydrogen bond and one
new salt bridge were formed. It is noteworthy that ATP has changed its conformation
somewhat so that some old hydrogen bonds and salt bridges can be preserved with the
change of its mass center position. This aspect of ATP motion is in evidence during the
whole process. Throughout the movement ﬁ'om structure 3 to 4, though ATP underwent a
large mass center position change, three of the four old hydrogen bonds were preserved,
and the one that was broken was replaced by a new one with similar character. For the
salt bridges, the one between Mgl and Asp97 was lost, while the one between ATP and
Arg110 changed a little bit. Again, though ATP moved a large step down its path

(monitored by the energy of interaction with Asp95 and Asp97) of moving away from the

113

Figure 4.5. Details of the hydrogen bond and salt bridge interactions as ATP separates
from HPPK. Left panels, from top to bottom, are hydrogen bonds (dotted lines) for stages

1, 3 and 5. Right panels, the corresponding salt bridges (solid lines). See Table 4.2 for

hydrogen bond and Table 4.3 for salt bridge particulars.

 

114

Figure 4.5 (continued)

 

115

Table 4.2. The hydrogen bonds that are present in the stages of ATP separation from

 

 

 

 

 

 

 

 

 

 

 

 

 

 

HPPK.
Structure Number Hydrogen Bond
Atom of ATP Atom of HPPK
Structure 1 AN6 (Nitrogen) Carbonyl Oxygen of Ile98
(starting AO3 (Oxygen) Terminal Nitrogen on
structure) guanidine group of Argl 10
A05 (Oxygen) Terminal Nitrogen on
guanidine group of Argl 10
Structure 2 AN6 (Nitrogen) Carbonyl Oxygen of Ile98
AN 6 (Nitrogen) Carbonyl Oxygen of Thrl 12
AN 7 (Nitrogen) Backbone Nitrogen of Thrl 12
A03 (Oxygen) Terminal Nitrogen on
guanidine group of Argl 10
A05 (Oxygen) Terminal Nitrogen on
guanidine group of Argl 10
AOZPG (Oxygen) Oxygen of the phenol group of
Tyrl 16
Structure 3 AN6 (Nitrogen) Carbonyl Oxygen of Ile98
AN6 (Nitrogen) Oxygen of the hydroxide of
Thrl 12
AN 6 (Nitrogen) Backbone Nitrogen of Thr112

 

 

 

 

116

 

Table 4.2. (continued)

 

 

 

 

 

 

 

 

 

 

 

AN 7 (Nitrogen) Oxygen of the hydroxide of
Thrl 12
Structure 4 AN 6 (Nitrogen) Carbonyl Oxygen of Met99
AN 6 (Nitrogen) Oxygen of the hydroxide of
Thrl 12
AN6 (Nitrogen) Backbone Nitrogen of Thrl 12
AN 7 (Nitrogen) Oxygen of the hydroxide of
Thr112
Structure 5 AN 6 (Nitrogen) Oxygen of the hydroxide of
Thrl 12
AN 6 (Nitrogen) Backbone Nitrogen of Thrl 12
AN7 (Nitrogen) Oxygen of the hydroxide of

Thr112

 

117

 

Table 4.3. The salt bridges that are present in the stages of ATP separation from HPPK.

 

Structure Number Salt Bridgem

 

Group 1 (positively Group 2 (negatively charged)

 

 

 

 

charged)
Structure 1 (starting Mgl Carboxylate group of Asp95
structure) Mgl Carboxylate group of Asp97
Mg2 Carboxylate group of Asp95
Mg2 Carboxylate group of Asp97

 

Guanidine group of a-phosphate group of ATP

 

 

Argl 10
Structure 2 Mgl Carboxylate group of Asp95
Mgl Carboxylate group of Asp97

 

Guanidine group of (at-phosphate group of ATP

 

 

Argl 10
Structure 3 Mgl Carboxylate group of Asp97
Mg2 Carboxylate group of Aspl 17

 

Guanidine group of a-phosphate group of ATP

Arg110

 

Structure 4 Mg2 Carboxylate group of Aspl 17

 

Guanidine group of B-phosphate group of ATP

Arg121

 

 

 

 

 

118

Table 4.3. (continued)

 

Structure 5

 

 

 

Mg2 Carboxylate group of Aspl 17
Guanidine group of B-phosphate group of ATP
Argl 10

 

 

(a) Mgl (Mg2) refers to the one associated with the beta (gamma) phosphate of ATP

119

 

binding pocket, during the transition ﬁom structure 4 to structure 5, only one old
hydrogen bond was broken, while all 3 other hydrogen bonds and all the salt bridges were
preserved. Finally, the ATP-HPPK energetic interaction decreased to its smallest value
and ATP moved to positions (see Figure 4.4 with ATP in red) where it is essentially out
of the HPPK binding pocket. The successive and staged breaking of the original
hydrogen bonds and salt bridges and formation of new ones during the process of
separation show those salt bridges and hydrogen bonds are important in the simulated
separation process. The ATP leaving process discussed in the above could be interesting
for the following two reasons. First, when thinking about the ATP binding (versus
leaving) process, the binding process might be similar] to the ATP leaving process
simulated in the above. Then, for example, those salt bridges and hydrogen bonds, which
seemed to be important in the simulated ATP leaving process, might play important roles
in the binding mechanism of ATP. Second, the release process of the product ADP also
might be similar to the simulated ATP leaving process, due to the similarity of ATP and
ADP. Of course, the ATP leaving path shown in the above discussion was generated by
targeting the interactions of the two magnesium cations with Asp95 and Asp97. When
other interactions are targeted, it is possible that the generated ATP leaving path could be
different.

We were not successful in reweighting back the trajectory to obtain the correct
Boltzmann equilibrium sampling. Said otherwise, a potential of mean force was not
obtained because the large range in energies of the modiﬁed potential surface along the
trajectory did not permit the collection of sufﬁcient data for the required accurate. This

problem stems from the exponential dependence on the reweight potential. The only way

120

to address the problem would be to scale many more degrees of freedom and possibly
obtain a smaller range of energy values. However, those more implicit degrees of
freedom (in contrast with the explicit salt-bridge interactions that we identiﬁed) would be
very hard to identify. Ultimately, the difﬁculty is the very high-dimensional surface that
is being explored.

, One thing notable is that a reweighting scheme does not directly provide
dynamical information on the original surface. Once the potential surface has been
modiﬁed, a time scale cannot be directly extracted from the simulation. So, one should
not get the impression from the reweighting simulation that ATP can be released from
HPPK on the ns time scale. According to transition state theory, the real time step can
also be calculated by reweighting the simulation time step by the reweighting factor

exp(,BA V) , but this calculation requires that the energy difference AV to be zero for

areas other than the energy basins, which is non-practical implementation requirement for
complex systems.108 To obtain dynamic information from simulation of complex system
based on modiﬁed energy surface, Hamelberg and McCammon introduced.108 an
estimation method based on their special form of AV while Chen and Horing derived 109

a formula to calculate rate of transition out of the initial state for simulations using

Langevin dynamics.

4. Concluding Remarks

In the work summarized in this chapter, a set of restraints were found that did lead
to HPPK-ATP opening to apo-like conformations. One scenario that could have resulted

during the restraint simulation was that at some point along the ten stages used to span

121

 

 

the closed to apo-like conformations ATP would have separated from HPPK. This was
not found. Instead, as shown in Figure 4.2, the relative center of mass position between
HPPK and ATP hardly changed, even though several residues present in the ternary
structure that clearly block ATP from solvent exposure have moved away from the ATP
to be essentially solvated in the last restraint window, as in the apo crystal structure (see
Figure 4.3).

To investigate the interactions possibly responsible for holding ATP in its binding
pocket, a 9 ns conventional MD run was carried out (with the restraints maintained at the
last window values) starting from the endpoint of the HPPK-ATP restraint MD
simulations and the simulation trajectory showed that ATP still remained bound to HPPK.
Examination of the simulation data, illustrated by Figure 4.3, shows that there are several
residues (Asp95 and Asp97) that still interact strongly with ATP4‘(Mg2+)2. These core
residues are clearly part of the binding pocket for ATP and provide a large, mainly
electrostatic, interaction with ATP4'(Mg2+)2. In view of the speciﬁc salt-bridge-like
interactions between these charged residues and the magnesium cations that in turn have
a strong electrostatic interaction with ATP4‘, some of these interactions were the natural
candidates to use in a targeted reweighting scheme. The targeted reweighting does lead to
the separation of ATP from HPPK as shown in Figures 4.4 and 4.5 and detailed in Tables
4.2 and 4.3. The successive and staged breaking of the original hydrogen bonds and salt
bridges and formation of new ones dominate the interactions for ATP binding. In terms
of energetics, there are six stages of energetic interaction that can be identiﬁed over about

1 ns of simulation time and then, over the remaining 1 us of the simulation, the ATP

122

 

 

remains separated from HPPK. Thus, these salt bridges play a key role in the binding

mechanism.

123

Chapter 5. Some potential useful modiﬁcations of the HREM approach

In this chapter, several modiﬁcations of the HREM approach described in
previous chapters are discussed that are under development and will be useﬁrl to other

protein problems.

1. HTREM (Hamiltonian Temperature REM)
HTREMl '0 is developed through reformulating HREM by also scaling the kinetic

energy of the protein, thereby introducing an effective temperature 7: = l/kBﬁl , where

_L_[__N_s_]i[_ﬁ_]_1_ (5,1)
16,1 NP+NS :3 NP+NS .B/1
with Np and N, the number of degrees of freedom of protein and solvent, respectively, and

k is the same scaling factor used in the HREM approach, described repeatedly in previous
chapters. It can be shown that by appropriate implementation, similar to the HREM
approach, the Boltzmann equilibrium will result in the extended ensemble of the product
of all the systems’ ensembles for a sufﬁciently long trajectory for the HTREM approach.
The main usage of the HTREM approach is that by using these effective temperatures, a
pseudo melting curve can be constructed. That is, now each system’s trajectory is of use
and corresponds to a simulation at a series of effective temperatures. This method can be
used for predicting the stability of dimerized proteins (fraction bound as a function of the
effective temperature) and ﬁnds use to compare the stability of wild type and mutant

formsI 10.

124

2. HRTREM: (Hamiltonian Restraint TREM)

The HRTREM is formulated by introducing restraint potentials V01) to the
HTREM Hamiltonian H (,1) , so that the resulting Hamiltonian for HRTREM is

H (A, p): H (,1)+V(,u). The HRTREM is introduced in the hope of addressing two

HTREM possible deﬁciencies when applied to constructing melting curves. First, it is
probable for higher temperature systems to, having just exchanged, exchange back
because they are not so different (in truly different conformations). Second, the sampling
of separated monomers in a ﬁnite MD simulation box might be prejudiced toward less-

than-separated distances. For the above mentioned purpose, in implementation, only the

HRTREM systems at the highest temperature consist of H (,1, p) while systems at
successively lower temperatures only consist of H (2) H (2., ,u) systems and H (,1)
systems exchange at the highest temperature, and the chain of H ( ,1) systems exchange
down to the lowest temperature, building a cycle where H (1, ,u) systems may help to

break the undesired correlation among the higher temperature H (2) systems and this
break in correlation could propagate down to the lowest temperature system. The restraint
will also tend to keep the monomers separated as desired at the higher temperatures. Of
course, only the H (2.) systems are used for melting curve construction. (Note: in the
implementation of HRTREM in CUKMODY described in the Appendix Chapter, the
HRTREM is implemented by a simple HRTREM routine and a more general (allowing

variations in force constants and the form of restraints) HRTREMGEN routine. Details

are given in the Appendix, Chapter.)

125

3. HTREM_RMS (HTREM_RootMeanSquare)

The HTREM_RMS approach is formulated similar to the HRTREM approach. It
is introduced to help deal with the following problem. For a HTREM simulation, while
the high effective temperature systems could be effective in overcoming barriers, they

sometimes have the downside of tending to explore extensively in places that are not

important in the true sample space. Therefore, in HTREM_RMS, H (it =1) is used for

the experimental system (as described repeatedly in previous chapters, 1 =1 indicating

the true system Hamiltonian) and H (it, ,u) are used for all other systems with higher

effective temperatures. (Note, the main difference from HRTREM is here: for HRTREM,

only a few systems with the highest temperature are using H (A, ,u) .) Then the restraints

V(p) can be set to help to keep the systems with high effective temperature close to

some desired places in the sample space. For example, to keep a dimer from separating
too readily at some higher temperature and thus permit better sampling of side chain

interactions between monomers.) Of course, for HTREM_RMS, only the experimental
system (H (2. =1)) contains ensemble information that can be used directly. (Note: for
the routines described in the Appendix Chapter, both HTREM_RMS and its modiﬁcation,
HTREM_RMS_gradua1, are for the HTREM_RMS implementation. The difference is

that the later routine applies restraints more gradually. Details are provided in the

Appendix Chapter.)

126

 

 

4. HREM_Ion

The conformations of highly-charged proteins might be inﬂuenced by the ionic
atmosphere formed by explicit ions in the system. However, bound ions near charged
residues could be quite sticky in an MD context due to the strong electrostatic salt-bridge-
like interactions. A schemel ” whose modiﬁcations here will be referred to as HREM_Ion

was recently introduced to help increase the sampling efﬁciency of the ions around

highly charged DNA. A base system with normal potential energy V(O) is coupled to a set

of N other hypothetical systems, each with a unique potential V“) 2 VM +(,u-1)vi

(i=1,..N). Here v' denotes the interactions of the ith group of ions, and system i interacts

with a modiﬁed potential term ,uvi. Different systems (with different groups of scaled
ions) will have the possibility of exchanges that interchange their interaction strength
scales, 1 and ,u . Again, although the hypothetical systems could help to accelerate

exploration, only the base system provides information that can be used directly. In our
implementation oppositely charged ions need to be paired in each system in order to
maintain overall neutrality in the Ewald energy and force evaluation. Any number of

(i)

paired ions can be incoporated in V .

5. HREM_IonWater

The scheme for HREM_IonWater has a similar goal to HREM_Ion, though there
is an important difference in the underlying concept. All the systems of HREM_IonWater

approach are normal systems (all systems contain information that can be used directly).

127

 

For HREM_IonWater, instead of scaling groups of ions with ,uvi as in HREM_Ion, a

water molecule is considered as an ion whose charges on its three sites have been

modiﬁed so that a vi (y) that is parameterized with water charges (qH,q0,qH) instead

of ion charges (0,q,,0) can be introduced. For a certain exchange attempt between two

systems with potentials V0) (water i and ion j) and V0) (water j and ion i), if the attempt
is successﬁil, then the ions and water molecules will exchange positions and then there is
a large change in conﬁguration. Of course, ions used to exchange with water must be

chosen to have similar vdw parameters to water molecules to avoid bad vdw contacts.

128

Appendix. Description of implementation of HREM and several modiﬁcations

1. HREM

The HREM approach is implemented in the CUKMODY program based on the

112

former DREM implementation . The following is some description of how HREM is

implemented in CUKMODY.

Initial Loading

When constructing the main MD simulation object of the “Solute _polar” class, a
"setHChange" function is called in the constructor to load the parameters needed for
HREM and store the parameters in a certain structure member of “Solute _polar” class.
Then the “remMachine” class object, which will be used to conduct HREM procedures,
is constructed and the HREM parameters are copied from “Solute _polar” class object to
the “remMachine” class object into a structure for storing HREM parameters. Note that
the correct parameters and conﬁgurations are loaded to each MPI process according to its
rank. (Each MPI process will contain a certain conﬁguration and the HREM parameters
(which are characteristics of systems) will be exchanged according to HREM scheme

when attempting exchange).

Exchange Attempt

The ﬁrst thing to be noted is that there are two pthread threads being used, one
thread will take care of normal MD simulation (in this pthread thread, the MD simulation
may also be parallelized by using multiple openmp threads) and another thread will take

care of the work involving making exchange attempts, including exchanging information

129

 

 

among MPI processes (again, note that each MPI process contains a particular
conﬁguration, and a set of HREM parameters according to which system it is at current
time.)

For a certain MD step, after forces and energies are calculated normally, if the
current time step is a multiple of the predeﬁned number of steps for exchange attempt
(say, 20 or 40), then a set of commands will be executed in the driver to conduct
exchange attempt. First, functions will be called in each MPI process to calculate the
energy for the current conﬁguration of the process under its current HREM system
parameters (“HREM_energy_one”) and the energy for the conﬁguration under the new
HREM system parameters if the exchange attempt would be successful
(“I-IREM_energy_another”). (Note: ﬁrst, one exchange attempt is only made between two
adjacent systems. Each system, with the exception of the ﬁrst and the last each of which
has one adjacent system, only have two adjacent systems and the information of these
two adjacent systems is already stored as part of the system’s HREM parameters
structure. For system i, the exchange attempts are alternated between being made with
system i+1 and with system i-1.Therefore, according to step number (by which the
system on a certain MPI process can know which one of the two adjacent systems it
would attempt to exchange with), in each MPI process the “I-[REM_energy_another” can
be calculated independently. Second, the “HREM_energy__one” and “HREM_energy_
another” only include PME reciprocal space energy and the part of PME direct space
energy that can not be canceled out for the later Metropolis Delta calculation. The
deﬁnition of Metropolis Delta is provided in the methodology section of Chapter 2, Eq.

2.3.)

130

 

 

After obtaining “HREM_energy_one” and “HREM_energy_another” which are
stored as part of HREM parameters structure, the thread responsible for exchange attempt
on each MPI process will put its HREM parameters structure into an array created on the
MPI process with its rank to be zero. Calculations of Metropolis Deltas for all the
attempts are then made on that MPI process in its thread responsible for exchange attempt.
If any exchange attempt is determined to be successful, corresponding changes will be
made to the array containing all the HREM parameters structures. In the end, the thread
responsible for exchange attempt on the MPI process with rank zero will pass the correct
HREM parameters structure (again, note, the HREM parameters structure is the
characteristic of a system) to the MPI process containing the corresponding conﬁguration
left intact. Then the new HREM parameters (could be the same ones if exchange attempt
was determined to be not successful) will be used to calculate MD forces and energies

starting from next MD step.

Code excerpt of the driver of HREM implementation of CUKMODY

The following is some excerpt of the c++ codes of the driver ﬁle for the HREM
implementation of CUKMODY program. The excerpt is chosen to contain important
lines relevant to making exchange attempt and some lines close to those important lines
which may help understanding. (The sign “...”, as usual, means some lines are omitted,
so if no “. . .” between two lines listed then those two lines are also consecutive in the real

driver ﬁle.)

////creating the simulation

131

Simulation *ptr;

// this will report the num threads used if omp used

Solute_polar pt_ch2c12(numProcessPerNode);
// "setHChange" function was called in the constructor for Solute_polar;
//In the function, the parameters needed for HREM were read in and
//temporarily stored in "pHChnge".(A substantial lines of codes here
//are used to take care of "exchange going down" or "exchange going up"

//details.)
ptr = &pt_ch2c12;

////set up index

ptr->setReactCoordIndex(rank);
//At the start, since it is a new start (even for restart, we have re-
//ordered back) so configurations and systems are matched at this
//moment(say, configuration 0 must be in system O).So we simply, at
//this moment, label each node according to the rank (which is an
//integer starting from 0) being assigned to the MPI process on it by
//the MPI library. And the corresponding configuration will be loaded
//latter accordingly.(The correct HREM parameters were already loaded

//in setHchange using the same trick as here.)

/////create the RemMachine

RemMachine remMachine(STEP_REM,ptr,&HREMProb);
//Here,as usual data members of remMachine are initialized in the
//constructor and the HREM parameters were read into the remMachine
//object from the copy constructed in the previous "setHchange"

function

132

//in the Solute_polar constructor by calling "pSim->getPHChange()"

//function in the RemMachine constructor

int index = rank;
// Here, the "index" is the copy of index for driver and the previous
//"ptr—>setReactCoordIndex(rank)" initializes the copy of index
//essentially for the Solute_polar object pt_ch2c12. The usage of the
//index and oldIndex variables here in the main function of driver is

//for later output.purposes.

int oldIndex = index;

////initially scale q c6 c121 c122 c123
ptr->scale_q_c6_c12_initially();

//Since we need to calculate some energies before any exchange attempt

//can be made, we need to set up the electrostatic and vdw constants to

//be in accordance with the corresponding (scaled) system.

remMachine.turnOn();
//Here, we spawn off a second pthread thread to take care of the things
//related with exchange (including sending and receiving information,
//and arbitrating, which is only done on the node with rank=0). The
//original pthread thread will still run normally to execute codes for

//MD calculation.

//The following three lines prepare for the replica exchange attempt at
//the first step, so that later we can "setHREMEnergyOne"
ptr->compute_energy();

ptr->compute_energy_solute();

133

ptr->eva1uateTemperature();

////The following is the MD simulation loop

while(i<total_time_steps)

{

ptr->initialize();
if(is_solvent_polarizable)
ptr->drude_dipole(10,0.01);
ptr->build_list(rl,i);
ptr->compute_PME(i);
ptr->compute_force3(rc);
ptr->compute_force_solute(rc);
ptr->compute_cobond(i);
ptr->compute_cobond_solute(i);
ptr->compute_angle();
ptr->compute_angle_solute(i);
ptr->compute_dihang_solute(i);
ptr->compute_force_internal_solute(rc);
ptr—>compute_inter_solute(rc);

//The above lines for MD force and energy calculation

////The following in “if" brackets are for exchange attempt
if(i% STEP_REM == 0)
{
ptr->setHREMEnergyOne();
remMachine.applyHChangeForCalc();
ptr->compute_PME_another();

remMachine.setHREMEnergyAnother();

134

remMachine.setbackHChange();

//The previous five lines set up the HREM energy before exchange
//(HREM energy one) and the HREM energy after exchange if exchange
//would be successful (HREM energy another) for the latter calculation
//in the remMachine object for Metropolis Delta. The HREM energy
//comprises the whole reciprocal space energy (calculated by pme method)
//and part of the direct space energy.(The not-included part of the
//direct space energy will be canceled out in the Delta calculation.)
//Due to the requirement of
//pme implementation, to calculate the reciprocal space part of "HREM
//energy another" we need to use "remMachine.applyHChangeForCalc"
//function to set the electrostatic and vdw constants to be proper
//values according to the system which the current system is attempting
//to exchange with. In terms of direct space energy, since it is
//calcu1ated pairwise, we only need to include the energies involving
//atoms with their electrostatic and vdw constants scaled (others will
//be canceled out). Those energies are only calculated in the
//following three functions: "compute_force_solute",
//compute_force_internal_solute", "compute_inter_solute". During the
//execution of the above—mentioned three functions,the
//"vdwE_sca1e","vdwE_scale_2","elecE_scale" and "elecE_scale_2" were
//obtained. "_scale" means only one of the two atoms in the pair is to
//be scaled and "_scale_2" means both of the two atoms in the pair are
//to be scaled. Therefore, the included direct part of "HREMEnergyOne"
//is simply the sum of the above four terms, while the calculation of
//the included direct part of "HREMEnergyAnother" could be done by
//multiplying the corresponding ratios calculated by dividing the new
//scaling factor by the old factor (see "comments in "Rem.cpp" for

//function "HChange::setHREMEnergyAnother").Since we have changed

135

//electrostatic and vdw constants for calculating the
//"HREMEnergyAnother" while we are not sure at this moment whether the
//exchange will be successful; we need to change back those constants

//to original values through the execution of "setbackHChange" function.

remMachine.updateInfo();

//The function puts the previously calculated information stored
//in the "Hchange" class object, which is a member of the "remMachine"
//class object, into a struct "info" which is a member of "pData",

//which itself is a member of the "RemMachine" class object.

remMachine.arbitrate(i);

//In this function, the thread spawned off for sending and
//receiving messages and doing arbitrating is awakened. The thread will
//do things (see after " varForArbitrator.wait();" inside "Rem.cpp"
//file) related to exchange attempts, including calling functions to
//use the calculated "HREMEnergyOne" and "HREMEnergyAnother" to
//calculate delta and then use the delta to judge whether the

//exchange attempt is successful or not.

remMachine.applyHChange(i);

//After the exchange, the correct window information is stored in
//the "HChange" class object and is used here to scale the
//electrostatic and vdw constants of the atoms which are in the
//initially specified group including atoms that will be scaled. (Note:
//When the exchange attempt is rejected, then the ratio used to scale

is actually 1.)

remMachine.reverseIsUp()i

136

//As is usual in a REM approach, there are two patterns of exchange:
//referred to as “up” and “down” (see Chapter, xxx) For HREM, the
//information involved is a bit different. So we need to tell which
//pattern it is for a given exchange attempt. For the first exchange
//attempt, the "down" pattern is in use.

}

//end of the “if" brackets for exchange attempt

oldIndex = index;
index = remMachine.getIndex();
if(i%STEP_SYSLOG==O)
{
static int oldIndex = index;
if(oldIndex != index)
{
sysLogStream << cfgIndexEnd;//put the index end sign
sysLogStream << cfgIndex(index);//pUt the index sign
oldIndex = index;//update index

}

sysLogStream << "\ntime_step " << i <<endl;

}

//The above lines are used to set marks for the end of the
//application of a previous system and the number of the system which
//will be applied next. "oldIndex" and "index" are local variables of
//main function and are used in the "if(oldIndex 1: index)" to check
//whether some changes have occurred after exchange through updating
//index with the line "index = remMachine.getIndex();".Note that each
//node has its own copy of driver so, of course, each node has its own

//copies of "oldIndex" and "index". In other words, the values for

137

//"oldIndex" and "index" are different for different nodes.

}

//end of “while” loop for MD simulation steps

2. The implementation of some possibly useful modiﬁcations based on HREM

In chapter 5, several possibly useful approaches based on modifying the HREM
approach were described. A simple description of the implementation of those

modiﬁcations (focusing on the implementation difference) is listed in the following.

HTREM

The main implementation modiﬁcation made on HREM to get HTREM is as
follows. In the driver ﬁle, the function "scale_temperature_HTREM" is called. The
function will assign an effective temperature (which is the temperature for the system
calculated as a weighted average as deﬁned in Chapter 5, eq 5.1) to every node based on
the lambda value (scaling factor for the system applied) speciﬁed in an input

ﬁle,"rewFactor.txt".

HRTREM

The HRTREM implementation is modiﬁed based on the implementation of
HTREM and has the exactly same feature as HTREM in terms of "effective temperature".
The addition for HRTREM is that some restraints speciﬁed in the input ﬁle
"rewFactor.txt" are applied to the systems that have "mu"=l ("mu" is a member of

Solute _polar class and values for different systems are speciﬁed in "rewFactor.txt" ﬁle) at

138

every step. Note, for all the systems with "mu"=l, the same set of restraints are used
(namely, same pairs of atoms, same equilibrium distances, same force constants).For the
“HREMEnergyOne” and “HREMEnergyAnother” used for Metropolis Delta calculation,
the corresponding restraint energies are included. The nature of the restraint is easy to
change in the "applyRestrain " function of Solute _polar class, which is called in the

driver.

HRTREMGEN

The HRTREMGEN, as can be told from its name, is an extension of HRTREM.
The added ﬂexibility for HRTREMGEN includes the following. First, now different
systems can have different equilibrium distances and force constants. Second, there is a
switch implemented between one-sided harmonic restraint (with switch value 1) and
harmonic restraint (with switch value 2). If needed, more switches can be easily added.
(Note that for all systems of HRTREMGEN, the restrained pairs of atoms still have to be

the same.)

HTREM_RMS

The HTREM_RMS is modiﬁed based on HTREM and has the exactly same
feature as HTREM in terms of having effective temperatures. The input ﬁle
"rewFactor.txt" containing the Hamiltonian scaling factors for different systems is
exactly the same as the one needed for HTREM, but some additional information needed
for HTREM_RMS is speciﬁed in another input ﬁle "restraint_rms.txt". In ﬁle
"restraint_nns.txt", restraint atom groups, and force constants are speciﬁed. Note that the

ﬁnal column of the restraints part in the ﬁle repeats the system number and this number

139

will be read into the variable "actualIndex". There is no real difference between
"windowIndex" and "actualIndex" and the only reason for both of them to be used is to
keep the structure of some existing routine for implementation simplicity. The "lambda"
values speciﬁed next in "restraint_rms.txt" are used to set the 2-d array (the array
constructed from substantiating the global pointer "pointerToDistMap_globa ")
containing the equilibrium distances for all pairs of restraints and for all systems. The
equilibrium distances are calculated in the "setHchange" function using two protein
structures (one speciﬁed in "coord_gbp_initial.da " ﬁle and another speciﬁed in
"coord_gbp_ﬁnal.dat" ﬁle) by applying simple linear interpolation on the distances
between restraint atom group mass centers from the two structures. The "lambda" values
are the factors for linear interpolation conducted for different systems. Note: ﬁrst,
although "lambda" values are often set to be the same for all systems (say, one for each
system, meaning that the distances of the structure contained in "coord_gbp_ﬁnal.da "
ﬁle are used as equilibrium distances for all the systems), they can be set to be different.
Second, although the lambda for the ﬁrst system may be set as some real value, no
constraints are applied in the system due to the fact that the force constant for the system
is always set to zero. Third, the two structures speciﬁed in ﬁles "coord_gbp_initial.dat"
and "coord_gbp_ﬁnal.da " are only used to simplify the initial setting of the 2-d array for
equilibrium distances; in other words, the program could have been implemented without
using the two structures but loading in a directly speciﬁed matrix of equilibrium distances,

though a large input matrix needs to be made manually then.

140

HTREM_RMS_graduaI

This is a variation of the HTREM_RMS. The difference here is that the target
equilibrium distances stored in the 2-d array ("pointerToDistMap_globa]") are only used
as equilibrium distances of the restraints at the last time step. For other time steps, the
linear interpolation (using a factor equals to the quotient of current time step over total
time step) results of the distances from starting structure and the target equilibrium
distances stored in the 2-d array, are used as equilibrium distances of the restraints. Note
that this method was implemented to make the change more gradual so that the exchange
between the normal system and the system with applied restraints might be enhanced.

Normally, using this method, a restart is not expected.

HREM_Ion

The implementation of HREM_Ion is modiﬁed based on the implementation of
HREM. The basic idea for HREM_Ion, as also described in Chapter 5, is that, with the
exception of the ﬁrst system (the normal system), each system has a neutral group of ions
(the number of ions in the group must be same for each system, and no two systems can
have the same ion in their groups) on which a uniform lambda value is used to scale their
electrostatic interactions (vdw interactions are not scaled but can be easily added in).
Note that each system has the same ions in it, the difference is which ions are in the
group to be scaled by the lambda (of course, no ions are sealed for the ﬁrst system). The
ions needed to be scaled for the systems are loaded from ﬁle “HREM.txt”. Once we think
those different groups of ions to be the HREM characteristic of systems, the
implementation is almost the same. One more difference noticeable is that, for

HREM_Ion, a roulette kind of method is used to mix systems with the exception of the

141

ﬁrst system, in this implementation. That is done to not privilege any scaled parameter

system. So, all systems after the ﬁrst do not have any useful sampling information.

HREM_IonWater

The implementation of the HREM_IonWater is based on the HREM_Ion
implementation. However, as also pointed out in Chapter 5, there is an important
difference in the underlying concept. All systems for HREM_IonWater are normal
systems. For HREM_IonWater, a 2-d array (substantiated "pWaterTable") is loaded
according to the "HREM.txt" ﬁle. The array contains places at which there are additional
waters in the “list of ions and additional waters” for the systems (the list for the ﬁrst
system is stored in "parameters.dat"). Because the same "coord_gbp.dat" and
"parameters.da " (set for the ﬁrst system) are loaded for all the systems in the constructor
of "Solute_polar", in function "set_c6_c12_initially", according to the "pWaterTable"
table, the systems are permuted for where the added waters (those that can turn into ions)
are. Similarly, for conducting an exchange, it is the system index that needs to be
exchanged so that the old and new indices can be used to check the "pWaterTable" for
the old and new water places to be permuted in the "applyHChange" ﬁrnction. (Note that
since ions are assumed to move swiftly, the term "neighbor" is only perfunctory here.)
For implementation simplicity, the ions are considered to be three-site molecules with
appropriate charges, and the vdw radii must be at least very close to a that of a water
molecule, to lead to reasonable exchange probabilities. The vdw parameters for those

ions are reloaded in frmction "set_c6_c12_initially" from "ioanw.txt".

142

References

(1) Alder, B. J .; Wainwright, T. B. Journal of Chemical Physics 1959, 31, 459.

(2) Bemal, J. D. Proceedings of the Royal Society of London Series a-
Mathematical and Physical Sciences 1964, 280, 299.

(3) Allen, M. P.; Tildesley, D. J. Computer Simulation of Liquids; Clarendon
Press: Oxford, 1987.

(4) Case, D. A.; Pearhnan, D. A.; Caldwell, J. W.; III, T. E. C.; Wang, J .;
Ross, W. S.; Simmerling, C. L.; Darden, T. A.; Merz, K. M.; Stanton, R. V.; Cheng, A. L.;
Vincent, J. J .; Crowley, M.; Tsue, V.; Gohlke, H.; Radmer, R.; Duan, Y.; Pitera, J .;
Massova, I.; Seibel, G. L.; Singh, C.; Weiner, P.; Kollman, P. A. AMBER7; University of
California: San Francisco, 2002.

(5) Mackerel], A. D.; Feig, M.; Brooks, C. L. Journal of Computational
Chemistry 2004, 25, 1400.

(6) van Gunsteren, W. F.; Billeter, S. R.; Eising, A. A.; Hiinenberger, P. H.;
Kruger, R; Mark, A. E.; Scott, W. R. P. Biomolecular simulation: the GROMOS96
manual and user guide; Vdf hochschulverlag AG an der ETH: Ziirich, 1996.

(7) Hockney, R. W.; Eastwood, J. W. Computer simulation using particles;
McGraw-Hill: New York, 1981.

(8) Ryckaert, J. P.; Ciccotti, G.; Berendsen, H. J. C. J. Comput. Phys. 1977,
23, 327.

(9) Frenkel, D.; Smit, B. Understanding Molecular Simulation: From
Algorithms to Applications; Academic: San Diego, 1996.

(10) Berendsen, H. H. C.; Postrna, J. P. M.; Gunsteren, W. F .; DiNola, A.;
Haak, J. R. J. Chem. Phys. 1984, 81, 3684.

(11) Loncharich, R. J .; Brooks, B. R.; Pastor, R. W. Biopolymers 1992, 32, 523.
(12) Andersen, H. C. Journal of Chemical Physics 1980, 72, 2384.

(13) Karasawa, N.; Goddard, W. A. Journal of Physical Chemistry 1989, 93,
7320.

(14) York, D. M.; Darden, T. A.; Pedersen, L. G. Journal of Chemical Physics
1993, 99, 8345.

143

(15) Essmann, U.; Perera, L.; Berkowitz, M. L.; Darden, T.; Lee, H.; Pedersen,
L. G. Journal of Chemical Physics 1995, 103, 8577.

(16) McCammon, A.; Harvey, S. C. Dynamics of proteins and nucleic acids;
Cambridge University Press: Cambridge, 1987.

(17) Swendsen, R. H.; Wang, J. S. Physical Review Letters 1986, 5 7, 2607.

(18) Geyer, C. J. Markov Chain Monte Carlo Maximum Likelihood. In
Computing Science and Statistics: Proceedings of the 23rd Symposium on the Interface
Kerarnidas, E. M., Ed.; Interface Foundation: Fairfax Sta tion,,1991.

(19) Sugita, Y.; Okamoto, Y. Chemical Physics Letters 1999, 314, 141.

(20) Hansmann, U. H. E. Chem. Phys. Lett. 1997, 281, 140.

(21) Wang, J. S.; Swendsen, R. H. Progress of Theoretical Physics Supplement
2005, 3 1 7.

(22) Hukushirna, K.; Nemoto, K. Journal of the Physical Society of Japan 1996,
65, 1604.

(23) Lei, H. X.; Duan, Y. Current Opinion in Structural Biology 2007, 1 7, 187.
(24) Sugita, Y. Front Biosci. 2009, 14, 1292

(25) Fukunishi, H.; Watanabe, 0.; Takada, S. Journal of Chemical Physics
2002, 116, 9058.

(26) Predescu, C.; Predescu, M.; Ciobanu, C. V. Journal of Physical Chemistry
B 2005, 109, 4189.

(27) Liu, Z. H.; Berne, B. J. Journal of Chemical Physics 1993, 99, 6071.
(28) Wenzel, W.; Hamacher, K. Physical Review Letters 1999, 82, 3003.

(29) Homak, V.; Simmerling, C. Proteins-Structure Function and Genetics
2003, 51, 577.

(30) Cukier, R. 1.; Morillo, M. Journal of Chemical Physics 2005, 123.
(31) Shen, T.; Hamelberg, D. J Chem Phys 2008, 129, 034103.

(32) Hansmann, U. H. E. Eur. Phys. J. B 1999, 12, 607.

144

(33) Hamelberg, D.; Mongan, J .; McCammon, J. A. Journal of Chemical
Physics 2004, 120, 11919.

(34) Hamelberg, D.; McCammon, J. A. Journal of the American Chemical
Society 2005, 127, 13778.

(35) Kirkwood, J. G. J. Chem. Phys. 1935, 3, 300.
(36) Torrie, G. M.; Valleau, J. P. Chemical Physics Letters 1974, 28, 578.

(37) Kumar, S.; Bouzida, D.; Swendsen, R. H.; Kolhnan, P. A.; Rosenberg, J.
M. Journal of Computational Chemistry 1992, 13, 1011.

(38) Chodera, J. D.; Swope, W. C.; Pitera, J. W.; Seok, C.; Dill, K. A. Journal
of Chemical Theory and Computation 2007, 3, 26.

(39) Souaille, M.; Roux, B. Computer Physics Communications 2001, 135, 40.

(40) Jolliffe, I. T. Principal Component Analysis, Second ed.; Springer Science:
New York, 2004.

(41) Balsera, M. A.; Wriggers, W.; Oono, Y.; Schulten, K. Journal of Physical
Chemistry 1996, 100, 2567.

(42) Garcia, A. E. Phys. Rev. Lett. 1992, 68, 2696.

(43) Garcia, A. E.; Blumenfeld, R.; Hummer, G.; Krumhansl, J. A. Physica D
1997, 107, 225.

(44) Amadei, A.; Linssen, A. B. M.; Berendsen, H. J. C. Proteins: Structure,
Function, and Genetics 1993, 17, 412.

(45) Cox, T. E; Cox, M. A. A. Multidimensional scaling, 2nd ed. ed.;
Chapman & Hall: Boca Raton, 2001.

(46) T. Hastie, R. T. a. J. F. The Elements of Statistical Learning: Data Mining,
Inference and Prediction; Springer: Berlin, 2001.

(47) Daura, X.; Gademann, K.; Jaun, B.; Seebach, D.; van Gunsteren, W. F .;
Mark, A. E. Angewandte Chemie-International Edition 1999, 38, 236.

(48) Scheffe, H. Biometrika 1953, 40, 87.

(49) Johnson, S. C. Psychometrika 1967, 32, 241.

145

(50) Hughes, J .; Smith, T. W.; Kosterlitz, H. W.; Fothergill, L. A.; Morgan, B.
A.; Morris, H. R. Nature 1975, 258, 577.

(51) Schiller, P. W. Conformational analysis of enkephalin and conformation-
activity relationships; Academic Press: Orlando, 1984; Vol. 6.

(52) Spanagel, R.; Herz, A.; Shippenberg, T. S. Proceedings of the National
Academy of Sciences of the United States of America 1992, 89, 2046.

(53) Graham, W. H.; Carter, E. S.; Hicks, R. P. Biopolymers 1992, 32, 1755.

(54) Higashijima, T.; Kobayashi, J.; Nagai, U.; Miyazawa, T. European
Journal of Biochemistry 1979, 97, 43.

(55) Marcotte,1.; Separovic, F.; Auger, M.; Gagne, S. M. Biophysical Journal
2004, 86, 1587.

(56) Surewicz, W. K.; Mantsch, H. H. Biochemical and Biophysical Research
Communications 1988, 150, 245.

(57) Khaled, M. A.; Long, M. M.; Thompson, W. D.; Bradley, R. J .; Brown, G.
B.; Urry, D. W. Biochemical and Biophysical Research Communications 1977, 76, 224.

(58) Takeuchi, H.; Ohtsuka, Y.; Harada, 1. Journal of the American Chemical
Society 1992, 114, 5321.

(59) Spirtes, M. A.; Schwartz, R. W.; Mattice, W. L.; Coy, D. H. Biochemical
and Biophysical Research Communications 1978, 81 , 602.

(60) DAlagni, M.; Delﬁni, M.; DiNola, A.; Eisenberg, M.; Paci, M.; Roda, L.
G.; Veglia, G. European Journal of Biochemistry 1996, 240, 540.

(61) Hansmann, U. H. E.; Okamoto, Y. Journal of Computational Chemistry
1997, I8, 920.

(62) Hansmann, U. H. E.; Okamoto, Y.; Eisenmenger, F. Chemical Physics
Letters 1996, 259, 321.

(63) vanderSpoel, D.; Berendsen, H. J. C. Biophysical Journal 1997, 72, 2032.
(64) Aburi, M.; Smith, P. E. Biopolymers 2002, 64, 177.
(65) Nielsen, B. G.; Jensen, M. 0.; Bohr, H. G. Biopolymers 2003, 71, 577.

(66) Shen, M. Y.; Freed, K. F. Biophysical Journal 2002, 82, 1791.

146

(67) Zaman, M. H.; Shen, M. Y.; Berry, R. S.; Freed, K. F. Journal of Physical
Chemistry B 2003, 107, 1685.

(68) Karvounis, G.; Nerukh, D.; Glen, R. C. Journal of Chemical Physics 2004,
121, 4925.

(69) Sanbonmatsu, K. Y.; Garcia, A. E. Proteins-Structure Function and
Genetics 2002, 46, 225.

(70) Spadaccini, R.; Temussi, P. A. Cellular and Molecular Life Sciences 2001,
58, 1572.

(71) Murray, B. E. Advances in Internal Medicine, Vol 42 1997, 42, 339.

(72) Blakley, R. L., Benkovic, S. J. Chemsitry and Biochemistry of folates;
John Wiley & Sons, Inc.: New York, 1984; Vol. 1.

(73) Hennig, M.; Dale, G. E.; D'Arcy, A.; Danel, F.; Fischer, S.; Gray, C. P.;
Jolidon, S.; Muller, F.; Page, M. G. P.; Pattison, P.; Oefner, C. Journal of Molecular
Biology 1999, 287, 211.

(74) Xiao, B.; Shi, G. B.; Chen, X.; Yan, H. G.; Ji, X. H. Structure 1999, 7, 489.

(75) Blaszczyk, J .; Shi, G.; Yan, H.; Ji, X. Structure 2000, 8, 1049.

(76) Xiao, B.; Shi, G. B.; Gao, J. H.; Blaszczyk, J .; Liu, 0.; Ji, X. H.; Yan, H.
G. Journal of Biological Chemistry 2001, 276, 40274.

(77) Li, Y.; Gong, Y. C.; Shi, G. B.; Blaszczyk, J .; Ji, X. H.; Yan, H. G.
Biochemistry 2002, 41, 8777.

(78) Blaszczyk, J .; Shi, G. B.; Li, Y.; Yan, H. G.; J i, X. H. Structure 2004, 12,
467.

(79) Li, G. Y.; Felczak, K.; Shi, G. B.; Yan, H. G. Biochemistry 2006, 45,
12573.

(80) Blaszczyk, J .; Li, Y.; Cherry, S.; Alexandratos, J .; Wu, Y.; Shaw, G.;
Tropea, J. E.; Waugh, D. S.; Yan, H. G.; Ji, X. H. Acta Crystallographica Section D-
Biological Crystallography 2007, 63, 1169.

(81) Lescop, E.; Lu, Z. W.; Liu, 0.; Xu, H. M.; Li, G. Y.; Xia, B.; Yan, H. G.;
Jin, C. W. Biochemistry 2009, 48, 302.

(82) Sn, L.; Cukier, R. 1. Journal of Physical Chemistry A 2009, 113, 2025.

147

(83) Lou, H. F .; Cukier, R. I. J. Phys. Chem. B 2006, 110, 12796.
(84) Su, L.; Cukier, R. 1. Journal of Physical Chemistry B 2007, 111, 12310.
(85) Lou, H. Analyzer; 1.0 ed. East Lansing, 2005.

(86) Amadei, A.; Ceruso, M. A.; Di Nola, A. Proteins-Structure Function and
Genetics 1999, 36, 419.

(87) Hess, B. Physical Review E 2002, 65, 031910.

(88) Romo, T. D.; Clarage, J. B.; Sorensen, D. 0; Phillips, G. N. Proteins-
Structure Function and Genetics 1995, 22, 311.

(89) Lou, H. F.; Cukier, R. 1. Journal of Physical Chemistry B 2006, 110,
24121.

(90) Korn, A. P.; Rose, D. R. Protein Engineering 1994, 7, 961.
(91) Brubaker, R. R. Clinical Microbiology Reviews 1991, 4, 309.

(92) Kumar, S.; Ma, B. Y.; Tsai, C. J .; Sinha, N.; Nussinov, R. Protein Science
2000, 9, 10.

(93) James, L. C.; Tawﬁk, D. S. Trends in Biochemical Sciences 2003, 28, 361.

(94) Goh, C. S.; Milbum, D.; Gerstein, M. Current Opinion in Structural
Biology 2004, 14, 104.

(95) Bahar, 1.; Chennubhotla, C.; Tobi, D. Current Opinion in Structural
Biology 2007, 1 7, 633.

(96) Ma, B. Y.; Kumar, S.; Tsai, C. J .; Nussinov, R. Protein Engineering 1999,
12, 713.

(97) Bryngelson, J. D.; Onuchic, J. N.; Socci, N. D.; Wolynes, P. G. Proteins-
Structure Function and Genetics 1995, 21, 167.

(98) Dill, K. A.; Chan, H. S. Nature Structural Biology 1997, 4, 10.

(99) Taniuchi, H.; Anﬁnsen, C. B. Journal of Biological Chemistry 1966, 241,
4366.

(100) Markley, J. L.; Williams, M. N.; Jardetzky, 0. Proceedings of the
National Academy of Sciences of the United States of America 1970, 65, 645.

148

(101) Chazin, W. J .; Kordel, J .; Drakenberg, T.; Thulin, E.; Brodin, P.;
Grundstrom, T.; F orsen, S. Proceedings of the National Academy of Sciences of the
United States of America 1989, 86, 2195.

(102) Zhang, H. J .; Sheng, X. R.; Niu, W. D.; Pan, X. M.; Zhou, J. M. Journal of
Biological Chemistry 1998, 273, 7448.

(103) Wilson, M. A.; Brunger, A. T. Journal of Molecular Biology 2000, 301 ,
1237.

(104) Wang, P. M.; Izatt, R. M.; Oscarson, J. L.; Gillespie, S. B. Journal of
Physical Chemistry 1996, 100, 9556. ‘

(105) Cukier, R. I. Biochimica Et Biophysica Acta-Bioenergetics 2005, 1 706,
1 34.

(106) Bermingham, A.; Bottomley, J. R.; Primrose, W. U.; Derrick, J. P. Journal
of Biological Chemistry 2000, 275, 17962.

(107) Figueirido, R; Levy, R. M.; Zhou, R. H.; Beme, B. J. Journal of Chemical
Physics 1997, 106, 9835.

(108) Hamelberg, D.; Shen, T.; McCammon, J. A. Journal of Chemical Physics
2005, 122.

(109) Chen, L. Y.; Horing, N. J. M. Journal of Chemical Physics 2007, 126.
(110) Li Su, R. I. C. J. Phys. Chem. B 2009, 113, 9595.

(111) Min, D. H.; Li, H. Z.; Li, G. H.; Berg, B. A.; Fenley, M. 0.; Yang, W.
Chemical Physics Letters 2008, 454, 391.

(112) Lou, H. Development of a Molecular Dynamics based method to
accelerate sampling of large domain motions in proteins: Applications to Adenylate
Kinase, Michigan State University, 2006.

149