DEVELOPMENT OF NOVEL COMPUTATIONAL TECHNIQUES FOR THE STUDY OF BIOMOLECULAR SYSTEMS USING MOLECULAR DYNAMICS SIMULATION By Vahid Mirjalili A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Mechanical Engineering - Doctor of Philosophy 2014 ABSTRACT DEVELOPMENT OF NOVEL COMPUTATIONAL TECHNIQUES FOR THE STUDY OF BIOMOLECULAR SYSTEMS USING MOLECULAR DYNAMICS SIMULATION By Vahid Mirjalili In this dissertation, we have developed novel computational techniques that have been effectively utilized to extend our knowledge of proteins and lipid membrane systems. Application of molecular dynamics combined with newly developed techniques and protocols to study protein structure refinement and interactions of amino-acid analog pairs within lipid membranes are studied. A robust protocol for structure refinement of proteins from a given homologous model is designed and optimized that uses restrained molecular dynamics followed by optimal subset selection and structure averaging. This protocol is tested on CASP8 and CASP9 targets, and later successfully applied to CASP10 in blind prediction manner. In order to understand physical characteristics of peptide interactions embedded in bilayer membrane, we have used umbrella sampling technique with model amino acid side-chain analog pairs to study their association free energy while placed in membrane bilayer. As a result of convergence issues observed in such simulations due to bilayer deformation, a novel enhanced sampling technique is developed which biases the density of water in a cylinder, thereby effectively imposing bilayer deformation. Applying this method to a DPPC bilayer, we were able to study free energy of pore formation in membrane bilayers, and showed that while the undergone mechanism is different from currently existing methods, the mechanism by our proposed method is closer to the natural pore formation mechanism. TABLE OF CONTENTS LIST OF TABLES ......................................................................................................................... vi LIST OF FIGURES ..................................................................................................................... viii Chapter 1 Introduction ............................................................................................................... 1 1.1 Background ...................................................................................................................... 2 1.2 Molecular Dynamics Simulation...................................................................................... 3 1.3 Molecular Force-Field ...................................................................................................... 4 1.4 Treatment of solvent molecules ....................................................................................... 5 1.4.1 Explicit treatment of solvent molecules .................................................................... 5 1.4.2 Implicit treatment of solvent environment ................................................................ 6 1.4.3 Implicit membrane model ......................................................................................... 8 1.5 Free Energy Calculation ................................................................................................. 10 1.6 Dissertation Scope .......................................................................................................... 12 1.6.1 Protein Structure Refinement .................................................................................. 12 1.6.2 Understanding peptide-membrane interactions ...................................................... 13 Chapter 2 Protein Structure Refinement through Structure Selection and Averaging from Molecular Dynamics Ensembles................................................................................................... 16 2.1 Abstract .......................................................................................................................... 17 2.2 Introduction .................................................................................................................... 17 2.3 Methods .......................................................................................................................... 19 2.4 Results ............................................................................................................................ 23 2.4.1 Final and Best Structures ........................................................................................ 23 2.4.2 Lowest-scoring Structures ...................................................................................... 28 2.4.3 Ensemble-averaged Structures ................................................................................ 29 2.4.4 Structure Interpolation ............................................................................................ 35 2.4.5 Quality Assessment ................................................................................................. 37 2.4.6 Final Refinement of Averaged Structures............................................................... 39 2.5 Discussion and Conclusion ............................................................................................ 41 2.6 Acknowledgment ........................................................................................................... 42 2.7 Supporting Information .................................................................................................. 44 Chapter 3 Physics Based Protein Structure Refinement via Data Mining through Multiple Molecular Dynamics Trajectories and Structure Averaging ........................................................ 47 3.1 Abstract .......................................................................................................................... 48 3.2 Introduction .................................................................................................................... 48 3.3 Methods .......................................................................................................................... 51 3.4 Results ............................................................................................................................ 56 3.4.1 Overall CASP10 Performance ................................................................................ 57 3.4.2 Model Selection based on Lowest DFIRE and Highest iRMSD ............................ 59 3.4.3 Quality Assessment using Correlation between iRMSD and DFIRE ..................... 60 3.4.4 Restraint Choice ...................................................................................................... 61 3.4.5 Simulation Time: Single MD vs. Multiple Short MDs ........................................... 66 iii 3.4.6 Final Stage of Refinement ...................................................................................... 68 3.5 Conclusion...................................................................................................................... 70 Chapter 4 Protein Structure Refinement on CASP11 Targets ................................................. 72 4.1 Introduction .................................................................................................................... 73 4.2 Methods .......................................................................................................................... 73 4.2.1 Conformational sampling........................................................................................ 75 4.2.2 Subset selection and average structure ................................................................... 76 4.2.3 Model submission in CASP11 ................................................................................ 78 4.3 Results and Discussion ................................................................................................... 78 4.4 Conclusion and Future Work ......................................................................................... 81 Chapter 5 Density Biased Sampling: A Robust Computational Method for Studying Pore Formation in Membranes .............................................................................................................. 82 5.1 Abstract .......................................................................................................................... 83 5.2 Introduction .................................................................................................................... 83 5.3 Methods .......................................................................................................................... 85 5.3.1 Density Biasing Potential ........................................................................................ 85 5.3.2 Simulation Details................................................................................................... 87 5.3.2.1 Method Validation............................................................................................... 87 5.3.2.2 Membrane Simulations ....................................................................................... 89 5.3.2.3 One-sided deformation of a membrane bilayer ................................................... 90 5.3.2.4 Pore formation in a membrane bilayer ................................................................ 90 5.3.2.5 Parameter Selection ............................................................................................. 91 5.3.2.6 Implementation.................................................................................................... 92 5.4 Results and Discussion ................................................................................................... 92 5.4.1 Mixing entropy of two-component gas ................................................................... 92 5.4.2 Membrane Simulations ........................................................................................... 94 5.5 Conclusions .................................................................................................................. 103 Chapter 6 Interactions of Amino Acid Analogs within Membrane Environments ............... 104 6.1 Abstract ........................................................................................................................ 105 6.2 Introduction .................................................................................................................. 105 6.3 Materials and Methods ................................................................................................. 108 6.3.1 Explicit Solvent Simulations................................................................................. 108 6.3.2 Implicit Solvent Simulations................................................................................. 110 6.3.3 Bilayer Deformation Simulation ........................................................................... 110 6.4 Results and Discussion ................................................................................................. 112 6.4.1 Membrane Deformations ...................................................................................... 112 6.4.2 Association Free Energy Profiles.......................................................................... 115 6.4.3 Contact Pair Formation of Polar Compounds ....................................................... 122 6.4.4 Comparison with Implicit Membrane Models ...................................................... 124 6.5 Conclusions .................................................................................................................. 128 6.6 Acknowledgement ........................................................................................................ 128 6.7 Supplementary Materials.............................................................................................. 128 Chapter 7 Conclusion and Perspective .................................................................................. 136 iv REFERENCES ........................................................................................................................... 140 v LIST OF TABLES Table 2-1 CASP8 and CASP9 refinement targets used here as test cases with the total number of residues and Cα-RMSD of the initial models from the respective native structures. Restraint regions denote residues for which harmonic restraints were applied to maintain structures near their initial structures. The targets were sorted according to increasing RMSD values. The regions suggested by CASP are shown in bold. ........................................................................... 21 Table 2-2 Changes in RMSD (Å) from the experimental structure relative to the RMSDs of the initial models during MD simulations with and without restraints over different simulation lengths. For all cases, the ∆RMSD for the final conformation and the overall lowest RMSD are given. Improved cases with negative ∆RMSD values are highlighted in bold. ............................ 25 Table 2-3 Changes in GDT-HA from the experimental structure relative to the GDT-HA values of the initial models during MD simulations as in Table 2-2. Improved cases with positive ∆GDT-HA values are highlighted in bold. ................................................................................... 26 Table 2-4 Changes in RMSD (Å) and GDT-HA upon selecting structures with the lowest DFIRE score and correlation coefficients of RMSD or GDT-HA vs. iRMSD or DFIRE. Correlation coefficients larger than 0.30 (RMSD) or less than -0.30 (GDT-HA) are highlighted in bold. ..... 28 Table 2-5 Change in RMSD (Å) and GDT-HA upon averaging over selected subsets (see text) with and without additional structure interpolation. Averages were calculated for all targets and for those where the correlation coefficient of iRMSD vs. DFIRE is less than 0.4 (indicated by*) ....................................................................................................................................................... 34 Table 2-6 Quality measures of averaged structures before (Avg) and after (MD) refinement via restrained MD simulations. ........................................................................................................... 40 Table 2-7 Summary of the average improvements in RMSD (Å) and GDT-HA for all the attempted methods for structure selection out of 8×3 ns and 200 ns simulation sets; Best in trajectory is given as a reference for the maximum possible improvement. ................................ 41 Table 2-8 Change in RMSD (Å) and GDT-HA for the structure closest to the subset average relative to their respective values from the initial models. ........................................................... 44 Table 3-1 Type of restraints applied on Cα atoms for the two simulation sets; strong (1 kCal/mol/Å2), weak (0.05 kCal/mol/Å2) and a combination of both. The strong force constants are only applied to the selected regions. ....................................................................................... 54 Table 3-2 Refinement results showing the best observed structures in the trajectories, the first submitted model, the best of five submitted models, and best model among models 3-5............ 57 Table 3-3 Fraction of improved trajectory frames in replicas classified by correlation between iRMSD and DFIRE score; Fraction of improved frames in trajectories with correlation <0.4 that are larger by 10% than fractions with correlation≥0.4 are highlighted. ....................................... 61 vi Table 3-4 Refinement results of different restraints for the best observed structure in terms of RMSD (Å) and GDT-HA; comparing strong (1 kCal/mol/Å2) restraint on selected residues vs. weak (0.05 kCal/mol/Å2) restraint on all Cαs, and a combination of both. Cases associated with * indicate targets that had suggestions from CASP on which regions need refinement.................. 63 Table 3-5 Refinement results of different restraints using the established structure generation protocol; comparing strong (1 kCal/mol/Å2) on selected residues vs. weak (0.05 kCal/mol/Å2) restraint on all Cα, and a combination of both .............................................................................. 65 Table 3-6 Summary of comparing refinement results between 10×20 ns simulations and single 200 ns simulations having the same restraint conditions. The results are averaged over 27 CASP10 targets. ............................................................................................................................ 67 Table 3-7 MolProbity results of the structure obtained from the averaging and structure interpolation .................................................................................................................................. 70 Table 4-1 List of refinement targets in CASP11 .......................................................................... 74 Table 4-2 Selected regions for the second round of sampling for each target .............................. 77 Table 4-3 GDT-HA results and MolProbity and ClashScore measures for quality assessment of submitted models .......................................................................................................................... 80 Table 6-1 Binding free energies in kcal/mol obtained from explicit simulations (CHARMM) at different Z distances as the difference between the free energy for the contact pair and the average energy for distances greater than 10 Å. Standard errors are given in parentheses. Values obtained at z=0, 12 Å, and 24 Å are compared with values obtained previously in decane, octanol, and water by de Jong et al.[1]. ...................................................................................... 122 Table 6-2 Simulation time in nanoseconds for explicit simulations of each amino-acid analog pair under different umbrella potential; the simulation time listed is used in forward and backward directions. ................................................................................................................... 129 Table 6-3 Improved HDGB parameters, dielectric profile (ε) and non-polar profile (γ). .......... 130 vii LIST OF FIGURES Figure 1-1 Number of research articles that have used molecular dynamics in biological science. ......................................................................................................................................................... 2 Figure 1-2 Schematic diagram of decomposing solvation free energy in a thermodynamic cycle. 7 Figure 1-3 Schematic diagram of variation of dielectric constant across membrane bilayer. ........ 9 Figure 1-4 The dielectric profile (A), and non-polar scaling profile (B) for HDGB model ......... 10 Figure 2-1 Change in RMSD with respect to native structure (A) and in GDT-HA (B) upon averaging different subsets of structures sorted by either DFIRE scores or iRMSD. Results from the 200 ns MD runs are shown in blue (circles) and from 8x3 ns sampling in green (triangles). Open symbols denote iRMSD-based selection; closed symbols refer to DFIRE-based selection. ....................................................................................................................................................... 30 Figure 2-2 Subset selection based on combination of DFIRE and iRMSD scores (normalized by their respective standard deviations). Selected structures (green triangles) are outside the circle with radius (ρ) and within the segment with angle (θ). ................................................................ 32 Figure 2-3 Change in RMSD with respect to native structure (A) and GDT-HA (B) as a function of radius (ρ), and angle (θ). Parameters considered to be optimal and used subsequently for subset averaging are indicated by ‘X’. .......................................................................................... 33 Figure 2-4 Change in RMSD with respect to native structure (A) and GDT-HA (B) upon structure interpolation between the initial (α=0.0) and the subset-averaged structures (at α=1.0). Results from 200 ns MD runs are shown in blue (circles) and from 8x3 ns sampling in green (triangles). ..................................................................................................................................... 37 Figure 2-5 Change in RMSD with respect to native structure as a function of correlation between iRMSD and DFIRE scores with (green triangles) and without (red squares) structure interpolation. ................................................................................................................................. 39 Figure 2-6 Change in GDT-HA of all CASP8 and CASP9 targets after refinement without imposing restraints using C36ff. ................................................................................................... 45 Figure 2-7 Change in GDT-HA vs. time for all targets with 200 ns simulation with imposed restraints. ....................................................................................................................................... 46 Figure 3-1 Flowchart of the refinement protocol, from simulation to model selection ................ 55 Figure 3-2 Correlation of iRMSD vs. DFIRe scores of individual replicas for TR674; Set 1 (replicas 1:20) is shown with red boxes and set 2 (replicas 21:30) with blue. Replicas with corr>0.4 are discarded for model selection. .................................................................................. 59 viii Figure 3-3 Subset selection based on normalized iRMSD and DFIRE scores from the replicas in set 1 with corr(iRMSD,DFIRE)<0.4 for TR674. The subset shown in the lower left corner of the scatter-plot with green triangles are selected to calculate the average structure and model selection. ....................................................................................................................................... 67 Figure 3-4 Initial model (blue), refined (green) and native (magenta) for (a) TR662, (b) TR723, (c) TR738 and (d) TR674 ............................................................................................................. 68 Figure 3-5 Sampling efficiency toward native, showing the cumulative minimum ∆RMSD (top) and cumulative maximum ∆GDT-HA (bottom), comparing the single 200 ns MD simulations (red) vs. 10×20 ns MD simulations (green) averaged over 27 CASP10 targets. .......................... 69 Figure 4-1 Comparing the initial (green) and refined (cyan) models of TR759 to its experimentally observed crystal structure .................................................................................... 81 Figure 5-1 A: Schematic representation of the biasing cylinder aligned to the bilayer normal. The center of the switching region is indicated with dashed lines; B: Volume function used in axial and radial directions. ..................................................................................................................... 86 Figure 5-2 Schematic representation of the mixing process for a simple two-component noble gas mixture that is fully demixed (A) and partially mixed (B). .................................................... 89 Figure 5-3 Free energy cost of mixing two noble gas species as a function of the biasing reaction coordinate ξ based on theory (mixing entropy) and simulation (free energy calculated from umbrella sampling simulations). Theory A is using uniform density to estimate total number of particle in cylinder, whereas theory B uses the empirical average number of particles observed during the simulations. .................................................................................................................. 93 Figure 5-4 Local membrane bilayer thickness of the upper leaflet vs. water density per unit volume from biased sampling of one-sided membrane deformations. ......................................... 95 Figure 5-5 Snapshots illustrating the one-sided deformation process from a flat bilayer state to a fully deformed state at water densities of 0.0016 (A), 0.0073 (B), 0.0111 (C), 0.0143 (D), 0.0167 (E), and 0.0170 Å-3 (F); Red spheres represent water molecules, brown spheres represent phosphorous atoms of the lipids, and lipid tails are shown in green. ........................................... 96 Figure 5-6 Average bilayer thickness in radial slabs for each umbrella window as a function of radial distance from the pore center. B: Free energy profile for one sided bilayer deformation as a function of water density in the cylinder. Standard error values obtained by calculating the PMF profiles over 10 2-ns subsets from the umbrella sampling simulation are shown as light blue shades. ........................................................................................................................................... 97 Figure 5-7 Snapshots illustrating the pore formation process from a flat bilayer state to a stable pore at water densities of 0.0067 (A), 0.0144 (B), 0.0159 (C), 0.0168 (D), 0.0196 (E), and 0.0222 Å-3 (F) with coloring as in Figure 5-5. ......................................................................................... 98 Figure 5-8 Number density of water molecules across bilayer normal compared between a flat bilayer and a bilayer with a stable pore (A), and their differences (B). ........................................ 99 ix Figure 5-9 Free energy of pore formation as a function of water density in the cylinder from density-biased sampling with errors indicated as in Figure 5-6B. A previous result from Bennett et al. is shown for comparison. ................................................................................................... 100 Figure 5-10 Average z coordinate of the two closest lipid phosphates from the bilayer center vs. water density within pore cylinder illustrating different mechanisms between density-driven and phosphate-driven pore formation bias. Sampling from each umbrella is shown in different colors. ..................................................................................................................................................... 101 Figure 5-11 Intermediate bilayer states with low average distance of phosphates to the bilayer center. .......................................................................................................................................... 102 Figure 6-1 Local membrane thickness of the upper leaflet as a function of the radial distance from the center of the amino acid analog pairs at different distances from the center of the membrane. A: sampling in forward direction; B: sampling in backward direction. Local thickness is calculated as average z of phosphorous atoms in the upper leaflet which fall into radial slabs of width 4 Å..................................................................................................................................... 114 Figure 6-2 Potentials of mean force as a function of water density to reflect membrane deformation. A: acetamide pair at z=0 and d=5.5 Å; B: methanol pair at z=0 and d=4.5 Å. ..... 115 Figure 6-3 Potential of mean force for acetamide as a function of pair distance at different insertion depth into the lipid bilayer from simulations with A) explicit solvent and lipids B) HDGB implicit membrane C) GBSW implicit membrane and D) EEF1 implicit membrane models; For each insertion depth, the bound state was used as the reference with an energy of zero. ............................................................................................................................................. 118 Figure 6-4 PMF of methanol as in Fig. 6-3. ............................................................................... 119 Figure 6-5 PMF of toluene as in Fig. 6-3.................................................................................... 120 Figure 6-6 PMF of propane as in Fig. 6-3. ................................................................................. 121 Figure 6-7 Conformational analysis of acetamide pair at the bound state; A) average hydrogen bonds formed between acetamide pair and water molecules as a function of bilayer normal distance, B) fraction of conformations that form one intra-pair hydrogen bond, C) fraction of conformations forming two intra-pair hydrogen bonds .............................................................. 123 Figure 6-8 Conformational analysis of methanol pair at two possible bound states; A) average hydrogen bonds formed with water, B) fraction of conformation 1 at bound distance 3.5 Å (blue) and at 4.5 Å (red). Results of explicit simulations are shown in solid lines, HDGB in dashed lines, GBSW in dotted line, and IMM1 in dash-dotted line ....................................................... 124 Figure 6-9 PMF profiles for acetamide, methanol, toluene, and propane at Z=0 and Z=12 as a function of the pair distance obtained from explicit, HDGB, GBSW, and EEF1 models .......... 126 Figure 6-10 Amino acid analogs used in this study, acetamide (Asn), methanol (Ser), toluene (Phe) and propane (Val). ............................................................................................................. 131 x Figure 6-11 Root mean squared deviation between the PMF profiles of biased simulations in forward and backward pulling directions. .................................................................................. 132 Figure 6-12 A: 2D PMF of acetamide pair association at z=0 and water density in biasing cylinder; B: Corrected 1D PMF as a function of pair distance after Boltzmann averaging along the water density reaction coordinate. ........................................................................................ 133 Figure 6-13 A: 2D PMF of methanol pair association at z=4 Å and water density in biasing cylinder; B: Corrected 1D PMF as a function of pair distance after Boltzmann averaging along the water density reaction coordinate. ........................................................................................ 134 Figure 6-14 Free energy profiles of insertion of single amino acid side-chain analogs using HDGB simulations with old and improved parameters compared with results of explicit simulation and experimental measurements. .............................................................................. 135 xi Chapter 1 Introduction 1 1.1 Background The use of computational techniques for study of biological systems is rapidly increasing. Molecular dynamics (MD) simulation is among the most influential computational methods that has given insights into physical behavior of complex biological systems. The increasing use of MD simulation in biological science is indicat indicated ed by the rapid increase of number of research article published in recent years. In 2014, more than 35 35,000 000 research articles are published in scientific journals that have used MD in biological science (Google Scholar,, see Figure 1-1). Figure 1-1 Number of research articles that have used molecular dynamics in biological science. science MD simulation can be used to analyze conformational dynamics and the kinetic and thermodynamic properties of proteins, nucleic acids and lipid membranes. While experimental techniques have limitations in extracting fine details of such systems, MD simulations have been 2 widely applied to analyze properties of these systems.[1-3] In recent years, MD simulations are considered a necessary stage prior/posterior to performing advanced experimental studies, which indicates the significance of results obtained from MD simulations.[4-6] Yet, computer simulations have gone beyond the limitations of experimental research. Scientists have used computer simulations to study how certain enzymes react to antibiotics.[7] The molecular-level insight obtained from computer simulations can be effectively used in future medicine to understand how bacteria become resistance to antibiotics. In this dissertation, we have developed novel computational techniques that have been effectively utilized to extend our knowledge of proteins and lipid membrane systems. This chapter provides an introduction of MD and other computational techniques that are utilized. Then, later chapters focus in more details on the application of such techniques combined with newly developed techniques and protocols to study protein structure refinement and interactions of amino-acid analog pairs within lipid membranes. 1.2 Molecular Dynamics Simulation Molecular dynamics (MD) simulation solves time evolution of a set of discrete particles by solving the Newton’s equation of motion in classical mechanics. A molecular force-field defines the level of interactions among particles and their environment. The notion of discrete particles determines the resolution at which the physical system is described. With advancement of modern computers, study of biological systems in atomistic details has been made feasible.[8] r Given a set of atomic coordinates ( ri ), atomic forces are determined from the derivatives of a 3 given set of potential energy functions, which is based on pairwise interactions of particles. From these forces, acceleration of each atom can be computed according to equation 1-1 Fi = −∇ iU = mi a i 1-1 where mi is the mass of atom i and a i is its acceleration. By numerical integration over equation 1-1, atomic velocities and their new coordinates can be determined. 1.3 Molecular Force-Field A molecular force-field is a set parameters used in different potential energy terms. There are a number of force-fields for biological systems, and understanding their differences and application is essential. CHARMM[9-11], AMBER[12], GROMOS[13], and OPLS[14, 15] are among the most widely used force-fields for biological systems, however, a number of forcefields also exist at different resolutions.[16-18] While the set of potential energy functions used in different force-fields are different, they are generally categorized as bonded interactions and non-bonded interactions. We used CHARMM force-field[9] throughout this dissertation. The latest version of CHARMM force field[9] has the following energy terms as given in equation 12 r V ( R) = ∑K b (b − b0 ) 2 + bonds 0 0 2 + ∑ K χ (1 + cos(nχ − δ )) + )2 + angles ∑ Kω (ω − ω ) improper ∑ Kθ (θ − θ 1-2 dihedrals ∑k UB Urey − Bradley (b 1−3 − b1−3,0 ) 2 + ∑u CMAP (Φ, Ψ ) residues where Kb, Kθ, Kχ, Kω, and KUB are the force constants for bonds, valence angles, dihedral angles, improper angles and Urey-Bradley term, respectively. The CMAP[19] term is a two dimensional spline-based energy function that was introduced to improve backbone treatments of proteins in MD simulations. 4 The non-bonded term contains the Lennard-Jones (LJ) and Coulomb terms in the following form as given in equation 1-3 Vnon −bonded  R min ij = ∑ ε ij    rij nonb . pairs  where ε ij = ε i ε j and R min ij = 12   R min  − 2 ij   r   ij Rimin + R min j 2     6  qi q j + ∑  nonb. pairs εrij  1-3 , and ( ε i , Rimin ) are atomic LJ parameters. In the Coulomb term, qi is the charge of atom i, and ε is the permittivity of free space. rij in both terms represent the distance between atoms i and j. 1.4 Treatment of solvent molecules Biomolecules are naturally embedded in a solvent environment and they interact directly with water molecules. Therefore, accurate treatment of solvents is necessary to derive physical conformations of such systems. Approaches for treatment of solvent molecules have two categories; one in which solvent molecules are explicitly present in the system, and they interact with the solute through non-bonded interactions. While this approach stands as the most accurate representation of a biological system, one of the main computational bottlenecks in inclusion of solvent molecules in the non-bonded calculations is that the cost of non-bonded calculations can be dominated by solvent-solvent interactions. As a result, a second approach has been developed that treats the solvents as a continuum environment, and utilizes the net effect of solvent molecules on the solute. 1.4.1 Explicit treatment of solvent molecules In explicit solvent MD simulations, solvent molecules are represented with a water model. Ref [20] lists 46 distinct water models. In this dissertation, TIP3P[21] is used in all explicit solvent 5 simulations. TIP3P contains 3 particles that represent one oxygen atom and two hydrogens for a water molecule. Although, there are some models that have a higher number of particles per water molecule, however, they make the non-bonded computations more costly. 1.4.2 Implicit treatment of solvent environment Explicit MD simulation of biomolecules uses the most detailed information of solvent atoms, which makes it costly. An alternative way is to remove solvent molecules, and only include the solvent degrees of freedom, and instead estimate the net effect of solvent environment on the solute atoms.[22] To do that, free energy cost of solvating the solute should be calculated, which has three components, i.e., electrostatic, non-polar, and cost of cavity formation, as given in equation 1-4 ∆Gsolv = ∆Gelec + ∆Gnon− polar 1-4 Figure 1-2 shows the schematic diagram of decomposing solvation free energy into its electrostatic and non-polar components in a thermodynamic cycle. In this diagram, the solute molecule is transferred from vacuum (white area) to solvent environment (gray area) in two different thermodynamic paths. Since free energy change is a path-independent thermodynamic property, the change in free energy from direct insertion (the first path) and step-by-step insertion (the second path) should be equivalent to each other. In step-by-step insertion path, first ε =1 the atomic charges in the solute are turned off ( − ∆Gelec ), then the uncharged solute is inserted from vacuum to solvent environment, resulting in non-polar solute-solvent and solvent-solvent ε =80 interactions ( ∆Gnon − polar ). Finally, the charges in the solute are turned on ( ∆Gelec ). 6 Figure 1-2 Schematic diagram of decomposing solvation free energy in a thermodynamic cycle. cycle The electrostatic component of solvation free energy is calculated by Generalized Born (GB) theory, using the formulation proposed by Still et al. al.[23] as follows  1 1  ∑ ∆Gelec = −k  −  ε solute ε solvent  i , j 1-5 qi q j rij2 + α iα j exp(− rij2 4α iα j ) where ε solute , ε solvent are the dielectric constants of the solute and solvent environments, respectively. αi represents the Born radius of atom i, which is average distance of atom i to the solvent-exposed exposed surface of the solute. The more buried atom i is, the larger Born radius it has. The main computational cost in the GB term is spent on Born radii calculation. Several methods have been proposed that estimate Born radii in different ways[24-27] that differ in computational complexity, and accuracy. In one category of such methods, a numerical integration integratio has to be performed over atomic molecular volume to obtain Born radii. radii.[25, 26] The latest GBMV (GB Molecular Volume) formulation[[26] is shown to have very good accuracy compared to PoissonPo 7 Boltzmann (PB) theory in computing the self-polarization energies for solute atoms.[26] Born radii in GBMV are calculated as follows αi = 1 (1 − 2 2) I 4 + I 7 I4 = 1 1 − Ri 4π 1 dV r4 solute , r > Ri  1 1 I7 =  4 −  4 Ri 4π  ∫  1  dV 7 ∫  r solute, r > Ri  1-6 1/ 4 where r is distance of grid points to atom i, and I4 and I7 are integration terms of r to atom i to the 4th and 7th power, respectively. The non-polar component of solvation free energy in equation 1-4, accounts for cost of cavity formation in solvent, as well as van der Waals (vdW) interactions between solute and solvent. In most GB implementations, this term is approximated by solvent-accessible surface area (SASA) ∆Gnon− polar = ∑γ i . SASAi 1-7 i where the scaling factor γ represents the surface tension coefficient. 1.4.3 Implicit membrane model In an implicit membrane environment, the dielectric constant is not homogenously uniform. A schematic diagram of variation of the dielectric constant across membrane is shown in Figure 13. 8 Figure 1-3 Schematic diagram of variation of dielectric constant across membrane bilayer. bilayer Therefore, a special treatment is needed to account for effects of heterogeneous dielectric environment on atomic Born radii and solvation free energy. Heterogeneous Dielectric Generalized Born (HDGB)[28 28, 29] is an extension of GBMV for implicit membrane environments,, which models dielectric constant across bilayer normal as a function of distance from bilayer center, Z. Further, Born radii calculation is also computed as a function of dielectric constant, and the HDGB GB energy formulation is given according to equation 1-8 1-8  1  1 × ∆Gelec,HDGB = − k  − ε  ( , ) ε ε ε solute solvent i j   qi q j ∑ i, j rij2 + α i (ε i )α j (ε j ) exp( − rij2 4α i (ε i )α j (ε j ) ) and the equation for Born radius is modified as follows α i (ε i , ε p ) = 1  3ε w C0 A4 + C1   3ε + 2ε p  w E D+ ε w +1   A7   1-9 + 9 The non-polar polar contribution is also modified by scaling the non non-polar term by a continuous profile S(z). The shape of ε(z) (z) and S(z) profiles as adopted from Sayadi and Feig[30 30], as shown in Figure 1-4 Figure 1-4 The dielectric profile (A), and non non-polar polar scaling profile (B) for HDGB model 1.5 Free Energy Calculation In order to estimate free energy landscape of a thermodynamic reaction along a reaction coordinate using molecular simulation, one needs to sufficiently sample configurational space to reach equilibrium conditions between two thermodynamic states. In molecular ular simulation of liquid environment, Gibbs free energy can be stated as follows ∆G = ∆U + p∆V − T∆S = ∆H − T∆S 1-10 where ∆U, ∆V, and ∆S are the change in internal energy, volume and entropy of the system, respectively, and p and T are pressure and temperature. Entropy of a thermodynamic system is directly related to the total number of configurations accessible to it. it.[31] As the system size gets 10 larger, the number of accessible configurations becomes larger; as a result, sampling all configurations with traditional MD simulations to estimate free energy becomes infeasible. Therefore, enhanced sampling methods for free energy calculations have been developed, which can be classified into three general categories; modified potential, modified sampling, and modified dynamics.[3] In this dissertation, umbrella sampling[32] is used for free energy calculation, which belongs to the first category. In umbrella sampling, the potential function is modified to bias sampling along a specific thermodynamic direction, also known as the reaction coordinate (ξ). The choice of reaction coordinate is critical for this method. The reaction coordinate should drive the system from one desired state to another. The reaction coordinate is divided into equal bins, and the system is simulated with additional potential function for each bin U umb (ξ ) = 1-11 1 2 K umb (ξ − ξ i ) 2 where Kumb is the force constant. Multiple biased systems are run with different equilibrium value of reaction coordinate for the ith bin (ξi). Free energy along the reaction coordinate can be recovered from the ensemble of biased sampling using weighted histogram analysis method (WHAM)[33]. Energy of the unbiased system can be written as E unbiased ( R N ) = E biased ( R N ) − U umb (ξ ) 1-12 Given energy of a particular configuration, the probability of finding the system having reaction coordinate ξ is ∫ δ (ξ − ξ ) exp( − βE p(ξ ) = (R ∫ exp( − βE 0 unbiased N unbiased 11 ( R N )) dR N )) dR N 1-13 where β = 1 k B T . substituting for unbiased energy, the probability distribution can be recovered from biased sampling as follows p (ξ ) = C exp(βU umb (ξ ) ) pbiased (ξ ) 1-14 where constant C is added due to integration. Then, potential of mean force (PMF) can be obtained from the probability distribution of reaction coordinate given above W (ξ ) = −k B T ln p (ξ ) = C − U umb (ξ ) − k B T ln p biased (ξ ) 1-15 = C − U umb (ξ ) + Wbiased (ξ ) 1.6 Dissertation Scope The theoretical backgrounds described in previous sections provide a valuable tool to study biological systems. In this dissertation, MD simulations have been used for two main tasks, protein structure refinement and characterizing peptide-membrane interactions within membrane environment. In the next section, a brief description of tasks is provided. 1.6.1 Protein Structure Refinement Refinement of protein structures using computational methods has remained a challenging task. Given a low resolution 3-dimensional model of a protein from homology modeling, the goal is to further refine the structure towards a high-resolution native-like model.[34] Successful refinement of protein targets using computational techniques can have a huge impact on future biological and pharmaceutical research. Critical Assessment of techniques in protein Structure Prediction (CASP) is a biennial world-wide competition which provides a benchmark for researchers to effectively test their method on new protein targets. The refinement category of CASP, called CASPR, selects the best homology models from tertiary structure prediction 12 groups as input for refinement. Over the past rounds of CASP, refinement has remained a difficult task, with most participating groups showing little to no improvement.[35, 36] The challenge in protein structure refinement is two folds; sampling conformational space of a given protein model, and model scoring and selection from a bag of sampled conformations.[37] Divers approaches involving a combination of physics-based[37-40] and knowledge-based[4143] approaches have been employed to address these challenges using Monte Carlo (MC) or MD simulations. For model selection, a number of scoring functions have also been proposed[44, 45], such as DFIRE[45, 46], Seder[47], and RW-plus[48]. However, structure selection still remains a difficult task since scoring functions are not accurate enough for discriminating models at close resolutions. Therefore, efforts in improving protein models would make the model qualities worse on average, even with long MD simulations of up to 100 µs.[34] We established a robust protocol, which for the first time showed positive improvement on average over CASP targets.[37, 38] Given an initial model, we used harmonic restraints on selected Cα atoms, and ran explicit solvent MD simulations using CHARMM36 force-field. An ensemble of conformations is obtained from MD simulations, which then for structure selection, we scored the models, and selected a subset based on an optimized criterion. Finally, the average structure from this subset was shown to have consistent improvement from the initial model. The details of our protocol is given in chapter 2, the results of our method applied to CASP10 for blind prediction is provided in chapter 3, and results of CASP11 in chapter 4. 1.6.2 Understanding peptide-membrane interactions Membrane proteins play critical roles in cellular processes and signaling pathways that are crucial for cell survival. The rapidly increasing number of experimental structures of membrane 13 proteins has shed light on their biological importance. However, experimental techniques for understanding behavior of peptides and proteins have limitations, especially when dealing with bilayer membranes. As a result, computational approaches provide valuable tools. In the second part of this dissertation, we characterized the peptide-membrane interactions. Previously, experimental data for free energy of insertion of amino acids into membrane bilayer was characterized.[49] Then, MacCallum et al. obtained the insertion profiles of amino acid sidechain analogs into membrane bilayer through molecular dynamics simulations.[2] The free energy profiles of insertion of amino acid side-chain analogs provided a valuable source for understanding polarities and behavior of side-chain analogs in bilayer environment, as well as a benchmark for comparison and parameterization of computational tools for membrane, such as implicit membrane models. While, the mentioned studies gave very useful biological insights on the interactions of peptides with membrane, they did not consider the interactions of amino acids with each other within membrane environment. Therefore, in the first attempt to address amino acid interactions within membrane, de Jong et al.[1] considered different pairs of amino acid side-chain analogs in three different environments, water, n-octanol, and decane. These environments have different polarities, and decane to some extent represents the hydrophobic region of membrane bilayers. However, the interactions of amino-acid pairs in a real bilayer have not been addressed so far. Knowing the importance of such interactions for understanding proteins structure and function, we tried to characterize the interactions of amino acid side-chain analog pairs within membrane environment. Considering all possible pairs of amino-acids could be very expensive, therefore, we only selected four amino acids, i.e. Phe, Val, Ser and Asn. These four amino acids resemble a wide range of amino-acid sizes and polarities. A pair of each side-chain analog is placed in 14 bilayer at different distances from bilayer normal (Z), and their association is studied using umbrella sampling molecular dynamics by pulling them apart at a fixed Z. Due to the polarity of some of the side-chain analog pairs placed in bilayer interior, water defect and membrane deformation was observed for some cases. While these phenomena are described in later chapters in detail, however, convergence issues are raised if neighboring windows are not sampling the same flat/deformed bilayer states. As a result of bilayer deformation in neighboring umbrellas, it is necessary to address the effect of amino acid side-chain analog pairs on bilayer deformation. Therefore, in order to study this physical process, we developed a new biasing potential that can effectively deform bilayer. The number of water molecules in a cylinder along the bilayer normal axis is computed in a continuous, rather than discrete fashion, using a smooth switching function. Then, number density of water molecules is computed, and used as a new reaction coordinate in umbrella sampling. Full description of this biasing potential and its applications in studying density driven processes are provided in chapter 5. 15 Chapter 2 Protein Structure Refinement through Structure Selection and Averaging from Molecular Dynamics Ensembles Vahid Mirjalili, Michael Feig Adapted from Journal of Chemical Theory and Computation, V2, p. 1294-1303, 2012 16 2.1 Abstract A molecular dynamics (MD) simulation based protocol for structure refinement of templatebased model predictions is described. The protocol involves the application of restraints, ensemble averaging of selected subsets, interpolation between initial and refined structures, and assessment of refinement success. It is found that sub-microsecond MD-based sampling when combined with ensemble averaging can produce moderate but consistent refinement for most systems in the CASP targets considered here. 2.2 Introduction Much progress has been made towards predicting the tertiary structure of proteins from their amino-acid sequence.[50-52] By far the most success has been found with template-based modeling (TBM) methods[53-55] where information from known experimental structures is utilized. Traditionally, TBM would use a single homologous protein for which a structure is available, but the best methods combine structural information from multiple templates in a variety of different algorithms.[50, 56-60] Using such methods, structures for most soluble proteins can be obtained today with high accuracy as long as sufficiently close structural templates can be found in the Protein Data Bank.[61] Nevertheless, the resulting models for nontrivial cases often retain structural errors with respect to experimental structures that limit the use of such models in further studies. For example, TBM-derived structures are often problematic as drug design targets[62, 63] or as starting structures for detailed mechanistic studies via molecular dynamics simulations and other computational methods.[64] Structure refinement methods aim at the further improvement of TBM-based models towards experimental accuracy.[35, 36, 65] Because TBM-based models already utilize knowledge from 17 related structures, most refinement algorithms that have been proposed rely on physics-based techniques, in particular molecular dynamics (MD) simulations.[65-68] Although successful examples of MD-based refinement have been reported in the past,[40, 51, 60, 66-72] consistent success appears to be hindered by a combination of insufficient sampling,[60, 73, 74] force field inaccuracies,[67, 75] and an inability to reliably identify refined structures that may be generated during the course of an MD simulation.[60, 70, 75-78] To address these issues, statistical potentials[41, 68, 79, 80] and optimized force fields[67, 81, 82] have been used as well as effective sampling techniques such as replica-exchange[40, 41, 66, 71] and self-guided Langevin dynamics[83] simulations. In some studies it was possible to generate improved structures by as much as 0.5 Å in root-mean-square deviation (RMSD) in one out of five models,[40, 41] but reliable identification of a single refined structure remained difficult. Recently, Fan et al.[71] have shown that by mimicking the electrostatic effects with chaperone Hamiltonian replicaexchange MD simulation can generate refined structures for 10 out of 15 targets with improvements of more than 1 Å RMSD for the secondary structure elements, but again reliable selection of refined structures without knowledge of the native state remained challenging. However, on average models selected based on a statistical potential function, Distance-scaled Finite Ideal gas REference (DFIRE),[44, 84] could be improved by 0.25 Å from the initial models.[71] A common observation is that unrestrained MD simulations of template-based models almost invariably end up drifting away from the native structure.[66, 70] Refinement is more likely to occur when structures are restrained,[66, 70] but the drawback of using restraints is that the degree to which structures can be refined is limited. The most extensive test of MD-based refinement published so far involved simulations up to 100 µs for CASP8 (Critical Assessment 18 of techniques for protein Structure Prediction) and CASP9 refinement targets.[70] In that work from the Shaw group, the final structures were not improved on average but refinement could be achieved by using a cluster-based selection method to reach 1% in terms of GDT-TS (Global Distance Test-Total Score)[85] for conformations extracted from simulations exceeding 10 µs in length. Better structures with sometimes much more significantly improved GDT-TS scores were generated in these simulations but could not be identified reliably.[70] Finally, Zhang et al.[81] used a fragment-guided MD technique, in which different fragments of target proteins were restrained to their homologous templates. Using this technique, improvements in GDT-HA (GDT-High Accuracy) scores were possible for targets with initial GDT-HA scores of greater than 50. However, for CASP8 and CASP9 targets average improvement was limited to only 0.6% in terms of GDT-HA and the improvement in RMSD was insignificant. Here, we are presenting a structure refinement protocol that combines MD-based sampling in explicit solvent using the latest CHARMM (Chemistry at HARvard Molecular Mechanics) force field[9], a scoring protocol that identifies the most native-like structures, and ensemble averaging to mimic the conditions under which experimental structures are obtained. Using this protocol, we are able to consistently refine CASP8 and CASP9 targets with relatively modest computational resources. In the following, the computational methods are described before results are presented and discussed. 2.3 Methods 19 We have performed all-atom molecular dynamics (MD) simulations for 26 refinement targets from CASP8 and CASP9. The targets used here as test sets are listed in Table 2-1. The initial structures were provided by the CASP organizers and represent predicted models of high accuracy for the respective targets that were submitted during CASP. Along with the initial coordinates, the CASP organizers also provided information for many targets about regions that refinement should focus on. This information was used here to apply restraints on the remaining parts of the structure considered to be accurate. For targets where a refinement residue range was not provided during CASP we determined a residue restraint list during the respective CASP rounds when knowledge of the experimental structures was not yet available under the assumption that the core secondary structure elements are likely to be more correct than other parts of the structure. The resulting list of restraints for each target is given in Table 2-1. For 16 targets the restraint regions were selected based on CASP suggestions, and for the remaining 10 targets restraints were based on core secondary structure elements. For each initial structure, missing hydrogens were built using the HBUILD module in CHARMM.[86, 87] The protein structures were then solvated in a cubic box of water with a minimum distance of 10 Å between any protein atom and the edge of the box. The systems were neutralized by adding Naା or Clି as counterions to balance the overall charge. All of the systems were equilibrated by minimization followed by heating through short simulations over 1 ps at 50 K, 100 K, 150 K, 200 K, 250 K, and 298 K. Subsequent production simulations were carried out at 298 K and 1 bar pressure in the NTP (constant number of particles, temperature, and pressure) ensemble over different simulation lengths up to 200 ns. 20 Table 2-1 CASP8 and CASP9 refinement targets used here as test cases with the total number of residues and Cα-RMSD of the initial models from the respective native structures. Restraint regions denote residues for which harmonic restraints were applied to maintain structures near their initial structures. The targets were sorted according to increasing RMSD values. The regions suggested by CASP are shown in bold. Target # of RMSD GDTres. (Å) HA TR592 105 1.26 72.9 TR453 87 1.47 71.3 TR432 130 1.65 77.5 TR462a 75 1.76 57.7 TR594 140 1.82 67.0 TR614 121 1.87 71.5 TR435 137 1.89 67.9 TR530 80 1.99 69.1 TR488 95 2.11 75.0 TR469 63 2.18 63.5 TR462b 68 2.42 48.9 TR389 135 2.64 63.3 TR464 69 2.73 59.8 TR569 79 3.01 52.2 TR454 192 3.24 42.3 TR567 142 3.44 58.3 TR574 102 3.58 40.0 TR557 125 4.06 46.8 TR429a 79 4.31 54.8 TR517 159 4.64 53.6 TR606 123 4.85 52.6 TR429b 76 4.98 30.3 TR624 69 5.19 35.9 TR568 97 6.15 35.8 TR622 122 6.47 51.9 TR576 138 6.85 45.3 Restraint regions 17-29;36-46;58-67;76-121 5-34;45-91 1-84;93-130 1-5;10-16;21-30;35-42;50-53;57-60;64-75 1-71;82-101;114-140 11-33;53-64;75-109 15-19;26-27;38-66;75-87;92-94;98-103;113-133;137-151 36-44;56-74;80-115 1-11;17-95 3-7;11-28;33-50;54-65 76-83;88-91;97-106;114-124;127-129;133-136;140-143 10-15;22-34;49-55;68-73;81-82;100-109;116-126 18-37;44-56;61-86 1-25;44-49;62-79 5-24;29-34;40-44;50-71;77-107;113-138;147-167;176-196 4-21;28-47;55-59;67-74;90-101;109-145 28-35;49-57;71-73;79-81;85-91;97-106 1-11;21-40;49-52;73-100;107-125 22-37;44-57;68-80;89-93;98-100 1-62;89-159 56-144 101-104;108-111;115-122;128-154;162-176 5-11;16-20;34-51;57-73 62-77;91-94;107-108;124-158 1-96 25-56;66-119 The CHARMM36 force field[9] was used in combination with the TIP3 water model[88]. The CHARMM36 force field was recently introduced as an improved version of the previous CHARMM22/CMAP force field[89, 90]. The main differences are improved sampling of 21 backbone propensities in better agreement with experimental data, in particular NMR J-coupling data, and improved side chain torsions, also to improve agreement with experimental data.[9] In all simulations, periodic boundaries were applied and particle-mesh Ewald summation was used to calculate electrostatic interactions using a grid spacing of 1 Å. Direct-space electrostatic and Lennard-Jones interactions were truncated using a switching function between 8.5 Å and 10 Å. All simulations used holonomic constraints on bonds involving hydrogens so that a 2 fs integration time step could be used. Simulations were carried out with and without restraints according to Table 2-1. Restraints were applied through a harmonic force on Cα atoms with a ଶ force constant of 1 kcal/mol/Å . Because part of our refinement protocol involves averaging over structural ensembles, a second set of simulations was carried out to allow side chains in the averaged structures to relax while maintaining the backbone geometries. This was accomplished by resolvation of the refined structures followed by minimization over 5000 steps and two short MD simulations at 10 K and 100 K, each for 40 ps. During these minimization and MD simulations, all Cα atoms were ଶ restrained with a force constant of 100 kcal/mol/Å . The quality of the structures before and after the final refinement simulations was assessed using the MolProbity structure validation web service[91]. All of the systems were initially setup using CHARMM[86, 87] and the MMTSB (Multiscale Modeling Tools for Structural Biology) Tool Set[92]. Production simulations were carried out using NAMD[93]. Analysis was carried out using a combination of CHARMM, the MMTSB Tool Set, and custom scripts and programs. 22 2.4 Results Molecular dynamics simulations were carried out for the CASP8 and CASP9 refinement targets starting from the template-based models provided during the respective CASP rounds for the CASPR refinement competition. Simulations were run with and without restraints and over different lengths of 24 ns, 200 ns, or eight times 3 ns to compare the effect of different amounts of sampling. The conformations sampled for each target during these simulations were then subjected to different selection and averaging protocols with the goal to obtain refined structures. Each protocol and the corresponding results are described in more detail in the following. 2.4.1 Final and Best Structures The most straightforward MD-based refinement protocol would consist of simply considering the final structure at the end of a given MD run. Tables 2-2 and 2-3 show the change in RMSD and GDT-HA, respectively, relative to the native structures for the final structures under different conditions. We show here changes in both RMSD and GDT-HA[94] values because they emphasize different aspects. GDT-HA represents the fraction of residues in the model that are within a short RMSD cutoff from a reference structure. Improvements in GDT-HA characterize to what extent the fraction of high-quality parts of a given structure is increased while ignoring parts of a structure that are of poor quality. RMSD changes capture the entire structure including bad parts of the structure. Often, GDT-HA and RMSD are highly correlated but in some cases, we find refinement in one measure but not in the other and vice versa. The first observation from the results in Tables 2-2 and 2-3 is that without restraints most of the structures move away from the native structure, some significantly, despite the relatively short simulation length of 24 ns. However, for the few cases where the final structure is refined, the improvement can also be quite significant, by about 1 Å for two targets. The occasional success 23 but overall failure with unrestrained MD simulations is consistent with similar findings by other groups.[70] When restraints are applied during simulations of the same length, the number of refined targets increases from 5 to 9 (out of total of 26 cases considered here) but while the restraints prevent large deviations away from the native they also limit to what extent structures can be improved. 24 Table 2-2 Changes in RMSD (Å) from the experimental structure relative to the RMSDs of the initial models during MD simulations with and without restraints over different simulation lengths. For all cases, the ∆RMSD for the final conformation and the overall lowest RMSD are given. Improved cases with negative ∆RMSD values are highlighted in bold. Target TR592 TR453 TR432 TR462a TR594 TR614 TR435 TR530 TR488 TR469 TR462b TR389 TR464 TR569 TR454 TR567 TR574 TR557 TR429a TR517 TR606 TR429b TR624 TR568 TR622 TR576 Avg. #better NO RESTRAINTS 24 ns Final Best 0.42 -0.05 0.34 0.08 1.39 -0.12 0.53 0.04 1.37 0.13 1.11 0.03 0.13 -0.31 -0.27 -0.64 1.29 -0.08 0.72 -0.14 0.23 -0.16 0.81 0.01 0.89 -0.14 0.46 -0.03 0.89 -0.31 -1.00 -1.46 1.82 0.15 -0.01 -0.75 2.32 -1.19 3.05 0.03 1.76 0.01 -0.35 -0.59 -0.90 -1.83 0.59 0.13 0.20 0.03 1.01 0.51 0.72 -0.26 5 15 WITH RESTRAINTS 24 ns Final Best -0.07 -0.18 0.16 -0.09 -0.13 -0.30 0.44 0.04 0.75 0.00 0.40 -0.13 0.30 0.03 0.18 -0.27 0.00 -0.25 -0.02 -0.19 0.10 -0.11 -0.27 -0.62 -0.02 -0.16 -0.24 -0.50 0.06 -0.15 0.02 -0.18 1.07 0.07 -0.58 -0.67 0.20 -0.20 0.50 -0.12 1.63 -0.28 -0.02 -0.17 -0.21 -0.68 0.07 -0.31 0.23 -0.05 0.74 0.49 0.20 -0.19 9 21 25 8 × 3 ns Final Best -0.12 -0.16 0.17 -0.10 -0.26 -0.34 -0.04 -0.26 0.05 -0.12 0.25 -0.11 0.07 -0.08 -0.22 -0.40 -0.16 -0.23 -0.09 -0.20 -0.02 -0.14 -0.11 -0.51 0.03 -0.15 -0.26 -0.47 -0.10 -0.19 -0.03 -0.11 0.09 -0.50 -0.35 -0.57 -0.08 -0.21 0.15 -0.17 0.58 -0.93 -0.04 -0.25 -0.03 -0.37 0.32 -0.10 0.06 -0.72 0.70 0.37 0.02 -0.27 15 25 200 ns Final Best -0.12 -0.20 0.44 -0.09 -0.18 -0.31 0.42 -0.07 0.34 0.00 0.08 -0.13 0.78 0.03 0.26 -0.35 -0.13 -0.26 0.15 -0.19 0.17 -0.11 0.31 -0.62 -0.12 -0.23 -0.28 -0.69 -0.12 -0.20 -0.06 -0.20 1.16 -0.40 -0.61 -0.84 -0.03 -0.26 0.46 -0.12 -0.80 -1.51 0.01 -0.23 -0.63 -0.89 0.29 -0.43 1.63 -0.32 0.28 0.00 0.14 -0.33 11 23 Table 2-3 Changes in GDT-HA from the experimental structure relative to the GDT-HA values of the initial models during MD simulations as in Table 2-2. Improved cases with positive ∆GDT-HA values are highlighted in bold. TR592 TR453 TR432 TR462a TR594 TR614 TR435 TR530 TR488 TR469 TR462b TR389 TR464 TR569 TR454 TR567 TR574 TR557 TR429a TR517 TR606 TR429b TR624 TR568 TR622 TR576 Avg. #better NO RESTRAINTS 24 ns Final Best -10.5 6.2 -5.5 4.0 -22.7 1.4 -6.0 1.3 -23.8 -4.5 -16.2 -1.1 -6.2 0.2 -1.6 2.8 -1.3 6.6 -16.7 -4.8 1.5 8.5 -18.9 -7.3 -4.7 3.3 -7.0 3.2 -11.6 1.3 -3.0 1.6 -7.9 -2.5 1.0 7.2 0.4 10.5 -1.6 2.4 -7.9 -2.2 2.6 7.9 6.9 11.2 -8.0 3.9 -14.1 0.0 -11.6 -4.2 -7.5 2.2 5 18 WITH RESTRAINTS 24 ns Final Best 4.5 8.3 -4.0 5.8 -2.5 5.0 -1.3 5.7 -0.7 3.8 0.0 4.6 -4.9 0.2 -0.9 3.1 2.4 6.6 -2.4 2.0 -1.1 2.2 -6.9 -1.9 0.0 5.4 0.0 6.0 -1.3 2.3 -0.4 4.8 1.7 4.2 2.2 7.4 2.8 12.1 -1.9 3.0 1.8 4.7 -0.7 2.6 4.4 6.2 1.0 3.9 2.7 6.0 -1.1 2.0 -0.3 4.5 9 25 8 × 3 ns Final Best 4.1 6.4 1.7 4.0 3.9 4.8 -0.3 7.3 0.9 2.1 0.4 2.8 -1.6 1.3 1.3 4.4 5.0 5.8 -1.6 2.8 0.7 4.0 -5.8 -0.6 -0.4 3.6 1.3 5.7 1.3 3.0 2.5 4.2 3.2 3.7 3.8 6.6 5.6 11.7 2.0 3.6 -0.4 3.5 0.0 4.6 0.4 4.0 0.3 3.4 4.8 6.0 -0.7 0.5 1.2 4.2 18 25 200 ns Final Best 5.7 9.1 0.9 5.8 2.5 6.4 -1.0 7.3 -0.7 4.1 -1.4 6.3 -3.5 1.8 -2.8 3.1 5.3 7.1 -5.6 3.2 -1.5 2.6 -7.3 -1.9 1.5 6.2 -0.6 7.6 0.3 4.0 2.8 5.3 0.7 6.4 5.2 9.0 8.9 14.5 -2.4 3.0 -1.4 5.7 -1.0 4.3 2.2 6.2 0.3 4.6 2.5 7.4 -1.3 2.9 0.3 5.5 13 25 Extending the sampling to 200 ns further increases the number of structures that were refined at the end to 11 (according to RMSD) or 13 (according to GDT-HA). However, even better results were found when the average final structures from many short simulations (8 × 3 ns) were 26 considered with now more than half of the structures being refined. The use of multiple short simulations is expected to improve sampling over a single long simulation[41, 95] and our results suggest that increased sampling does lead to improved success with refinement. This is in agreement with previous findings.[70] It is interesting to note that when selecting the average final structure from the 8 × 3 ns simulations, we already find an average improvement in GDTHA score by 1.2, comparable to the results reported by the Shaw group after much longer simulations. As shown in Figures 2-6 and 2-7 (supplementary material), the RMSD and GDT-HA scores fluctuate significantly during the simulations and while the final structures are often not improved, there are improved structures at other times during the simulation for many targets. Tables 2-2 and 2-3 also show the improvement in RMSD and GDT-HA for the best structures (in terms of RMSD or GDT-HA) that were sampled during the simulations. Without restraints, only about half of the targets are refined at some point during the trajectory, but with restraints refined structures are found for almost all of the targets, in particular during the longer 200 ns simulation and during the multiple short simulations. The average maximum improvement in terms of GDTHA is again similar to the values for the simulations from the Shaw group after about 10 µs. This finding raises the possibility that such long simulations may not be necessary to achieve refinement and that other methodological factors may be more critical. 27 Table 2-4 Changes in RMSD (Å) and GDT-HA upon selecting structures with the lowest DFIRE score and correlation coefficients of RMSD or GDT-HA vs. iRMSD or DFIRE. Correlation coefficients larger than 0.30 (RMSD) or less than -0.30 (GDT-HA) are highlighted in bold. 200 ns Target ∆ RMSD TR592 TR453 TR432 TR462a TR594 TR614 TR435 TR530 TR488 TR469 TR462b TR389 TR464 TR569 TR454 TR567 TR574 TR557 TR429a TR517 TR606 TR429b TR624 TR568 TR622 TR576 Avg. -0.06 0.16 -0.12 0.21 0.18 0.38 0.20 0.96 -0.13 0.09 0.30 0.30 -0.13 -0.37 -0.09 -0.05 1.08 -0.56 -0.14 0.03 -0.96 -0.09 -0.44 0.03 0.04 0.29 0.04 2.4.2 ∆ GDTHA 0.5 0.3 2.1 4.7 2.0 -4.2 -2.7 -3.4 2.9 -3.2 -3.3 -7.1 0.4 3.8 0.1 0.7 2.0 6.0 9.7 -1.3 0.4 1.0 1.8 2.6 3.9 -2.5 0.7 8 × 3 ns Correlation Correlation ∆ ∆ RMSD/GDT-HA RMSD/GDT-HA GDTRMSD HA vs iRMSD vs DFIRE vs iRMSD vs DFIRE 0.10/0.02 0.04/-0.10 -0.06 0.16/-0.09 0.35/-0.27 0.2 0.19/-0.17 0.30 -1.4 0.95/-0.33 0.89/-0.43 0.35/-0.30 -0.03/0.02 0.06/-0.11 -0.04 -0.25/0.24 -0.01/-0.17 0.4 -0.7 -0.11/-0.21 -0.16/-0.14 0.51/-0.43 0.25/-0.51 0.31 0.61/0.07 0.30/-0.25 0.07 -1.4 0.50/-0.06 0.17/-0.15 0.05/-0.02 0.22/-0.32 0.29 -0.03/-0.01 0.06/-0.33 0.7 0.95/0.15 0.57/0.03 0.08 -3.1 0.71/-0.37 0.34/-0.21 0.16/-0.14 0.15/-0.27 0.93/-0.55 -0.02/-0.03 0.03 0.3 -0.24/0.24 0.06/-0.11 -0.20/0.23 0.01/-0.14 -0.10 0.3 0.46/-0.27 0.11/-0.22 -0.04 -0.8 -0.10/-0.12 0.22/-0.26 -1.5 0.57/-0.43 -0.15/0.04 0.02 0.43/-0.48 0.24/-0.23 -5.8 0.08/-0.49 0.62/-0.28 0.71/-0.22 0.27/-0.15 -0.51 -0.37/0.18 0.12/-0.03 0.04 -2.2 -0.14/0.00 -0.06/0.07 0.0 -0.45/0.21 0.01/-0.13 -0.03 -0.70/0.08 0.18/0.00 0.37/-0.16 0.35/-0.37 -0.19 0.13/-0.07 0.09/-0.16 0.8 -0.10/-0.11 -0.07/-0.08 -0.02 0.05/-0.20 0.05/0.03 0.7 0.64/-0.03 0.15/-0.25 -0.09 -2.0 0.32/0.22 0.48/-0.18 1.6 -0.66/0.34 -0.20/0.06 -0.30/0.36 -0.16/0.00 -0.03 0.20/-0.09 0.32/-0.32 0.06 0.36/-0.04 0.04/-0.23 6.5 0.48/0.00 0.43/0.01 0.01 0.51/-0.12 0.45/-0.15 1.1 -0.04/0.08 0.80/-0.14 -0.26 -2.2 0.43/-0.02 0.55/-0.01 0.19/-0.06 0.41/-0.27 -0.10 0.3 0.71/-0.50 0.48/-0.44 -0.46/-0.05 0.06/-0.05 0.32 -2.2 0.16/-0.03 -0.12/0.05 -0.14/0.15 0.02/-0.20 0.14 0.36/-0.20 0.29/-0.26 1.6 0.82/-0.23 0.77/-0.28 0.27 -0.23/-0.06 0.09/0.05 0.8 -0.22/0.04 0.40/-0.10 0.84 -4.9 0.47/0.03 0.07/-0.08 0.24/-0.06 0.21/-0.16 0.05 -0.5 0.15/-0.10 0.18/-0.15 Lowest-scoring Structures Since refined structures were generated during most of the simulations, the next question we investigated was whether application of a scoring function to an ensemble of structures extracted from the MD runs would allow us to identify the most native-like, and therefore refined 28 structures. Table 2-4 shows the change in RMSD and GDT-HA with respect to the experimental structures when selecting the conformation with the lowest DFIRE score. We chose DFIRE as one of the best-performing scoring functions that has been widely applied in structure prediction applications.[44, 84] The results indicate that selecting structures based on the lowest DFIRE score has similar performance or is even slightly worse than simply taking the final structures. This is not entirely surprising when considering the correlation coefficients between RMSD or GDT-HA and the DFIRE score. Although the correlation coefficients largely have the correct sign (positive for RMSD, negative for GDT-HA), their small magnitude – with a few exceptions – suggests that it would be difficult to reliably select a single structure. We also considered other scoring functions (data not shown) and found similar results. 2.4.3 Ensemble-averaged Structures Next, we considered that experimental structures are the product of conformational averaging rather than representing single snapshots. Consequently, we obtained average structures from the MD-generated structure ensembles. Figure 2-1 shows the effect of averaging different percentages of the MD-generated structures that were sorted either according to their DFIRE score or based on their distance from the initial structure (iRMSD). We find that averaging generally outperforms selecting a single structure, while averaging over the 10% of structures with the lowest DFIRE scores results in a maximum improvement in GDT-HA by 2.6, which is about half of what could be achieved theoretically if the best conformation could be selected from each trajectory. However, when considering RMSD, an even smaller ensemble of only the 1% best-scoring structures results in a maximum improvement by 0.04 Å. Interestingly, selecting structures according to low iRMSD values, i.e. averaging over structures that have moved the least from the initial structure, also results in refinement. The rationale for that finding is that 29 when structures start to deviate significantly from the initial template template-based based model, they are much more likely to move away from the native structure than towards it. Figure 2-1 Change in RMSD with respect to native structure (A) and in GDT-HA GDT (B) upon averaging different subsets of structures sorted by either DFIRE sscores cores or iRMSD. Results from the 200 ns MD runs are shown in blue (circles) and from 8x3 ns sampling in green (triangles). Open symbols denote iRMSD-based based selection; closed symbols refer to DFIRE DFIRE--based selection. The observation that both DFIRE and iRMSD appear to be suitable metrics to identify ensembles of structures that when averaged provide structures that are likely closer to the native state, prompted us to consider a combination of both scores for selecting a subset of structures to be 30 averaged. Since the range of these two scores is different, we first normalized the values by subtracting the mean and dividing by their respective standard deviations for a given set of structures. We then chose values in an open arc segment as illustrated in Fig. 2-2. Given the identity line through the origin (dashed line in Fig. 2-2), structures were chosen within a given angle θ/2, around the line to the origin and at a minimum radial distance ρ from the center of the distribution. To find optimal values of (ρ, θ), we varied ρ from 0.2 to 1.9 with increments of 0.1, and changed the angle θ from 30 to 200 degrees at increments of 10. For each target, we extracted the structures that lie in the aforementioned region, and then calculated the average structure. Figure 2-3 shows the average improvements in RMSD and GDT-HA as functions of ρ and θ. As optimal values that maximize both RMSD and GDT-HA we chose ρ=1.2 and θ=120º. Using these values, the RMSD is improved by 0.07 Å and GDT-HA scores by 2.6. The improvements in RMSD and GDT-HA for individual targets using this criterion are given in Table 2-5. We find that GDT-HA is not further improved over simply selecting the 10% of the structures with the lowest DFIRE score but the improvement in RMSD appears to be more significant. A drawback of structure averaging is that further refinement is necessary afterwards to generate stereochemically good models. As an alternative protocol, we also selected the ensemble structure closest to the subset averages. The data given in Table 2-8 shows that on average there is no improvement in RMSD and there is only a small improvement in GDT-HA for structures taken from the 200 ns simulation. This suggests that averaging rather than selecting a single structure is a key to the success of the refinement protocol described here. 31 Figure 2-2 Subset selection based on combination of DFIRE and iRMSD scores (normalized by their respective standard deviations). Selected structures (green triangles) are outside the circle with radius (ρ) and within the segment with angle (θ). 32 Figure 2-3 Change in RMSD with respect to native structure (A) and GDT-HA (B) as a function of radius (ρ), and angle (θ). Parameters considered to be optimal and used subsequently for subset averaging are indicated by ‘X’. 33 Table 2-5 Change in RMSD (Å) and GDT-HA upon averaging over selected subsets (see text) with and without additional structure interpolation. Averages were calculated for all targets and for those where the correlation coefficient of iRMSD vs. DFIRE is less than 0.4 (indicated by*) 200 ns Corr. Subset Structure iRMSD Average Interpolation Target vs. ∆ ∆ ∆ ∆ DFIRE RMSD GDT RMSD GDT-HA HA TR592 0.01 -0.14 6.2 -0.13 4.3 TR453 0.14 0.09 0.04 2.9 2.9 TR432 -0.03 -0.19 4.4 -0.19 3.7 TR462a 0.49* 0.20 0.13 4.0 3.0 TR594 -0.05 0.15 0.09 2.0 1.3 * TR614 0.45 0.24 0.08 3.9 4.2 TR435 0.59* 0.23 -1.8 0.14 -0.9 TR530 -0.07 -0.16 0.9 -0.16 0.6 TR488 0.04 -0.12 5.0 -0.11 4.5 TR469 -0.16 -0.8 -0.02 -0.03 0.0 TR462b -0.28 0.07 0.00 0.7 2.6 TR389 0.17 -2.6 -1.5 -0.43 -0.48 TR464 0.09 -0.01 1.1 -0.01 0.7 TR569 0.06 -0.29 1.0 -0.27 1.0 TR454 0.23 -0.09 1.7 -0.09 1.8 TR567 0.23 -0.06 3.3 -0.06 2.5 TR574 -0.01 0.24 0.10 3.9 2.7 TR557 0.12 -0.56 4.2 -0.49 3.6 TR429a 0.18 -0.09 9.3 -0.10 8.5 TR517 0.28 0.22 0.12 1.3 1.4 TR606 -0.19 -1.04 2.6 -1.00 3.3 TR429b 0.28 0.0 -0.12 0.3 -0.13 TR624 -0.09 -0.33 4.0 -0.29 3.6 TR568 -0.09 0.02 3.1 -0.02 2.8 TR622 0.84* 0.14 0.07 5.8 5.4 TR576 0.31 0.31 0.21 0.4 0.4 Avg. --0.07 2.6 -0.10 2.4 Avg.* --0.12 2.5 -0.14 2.3 #better -15 23 16 23 34 8 × 3 ns Corr. Subset Structure iRMSD Average Interpolation vs. ∆ ∆ ∆ ∆ DFIRE RMSD GDT- RMSD GDT HA -HA 0.14 -0.12 3.1 -0.11 1.9 0.20 0.03 0.00 2.3 2.9 0.18 -0.14 3.5 -0.12 3.7 0.68* 0.08 0.03 0.7 0.3 0.13 0.01 0.7 -0.01 0.7 * 0.54 0.33 -0.4 0.22 0.7 0.36 -0.9 -0.2 -0.01 -0.02 0.11 -0.17 2.2 -0.15 1.6 -0.06 -0.13 4.2 -0.12 4.5 0.06 -2.4 -0.8 -0.06 -0.06 0.37 -0.03 2.2 -0.06 3.3 0.27 -2.2 -0.8 -0.14 -0.16 0.14 0.03 0.02 0.0 0.4 -0.19 -0.07 0.3 -0.06 1.0 0.14 -0.08 1.3 -0.07 2.1 -0.04 -0.02 3.3 -0.02 2.6 0.12 -0.04 1.5 -0.06 1.2 0.33 -0.18 3.8 -0.15 3.0 0.10 -0.08 6.1 -0.08 6.9 * 0.50 0.03 0.02 2.4 2.0 0.32 -0.01 0.2 -0.03 0.4 * 0.49 -0.15 1.7 -0.13 1.3 -0.07 0.00 0.0 0.4 -0.01 0.48* 0.18 0.14 1.0 1.0 0.26 0.17 0.12 4.3 3.9 0.52* 0.68 -1.3 0.52 -0.9 -0.00 1.5 -0.01 1.6 --0.05 1.7 -0.06 1.9 -16 21 18 20 2.4.4 Structure Interpolation As a result of subset averaging described above we can generate refined structures for a majority of cases (15-16 out of 26 in terms of RMSD and 21-23 in terms of GDT-HA, see Table 2-5). The idea we followed next was that whether it would be possible to refine structures further by extrapolating the 3N-dimensional vector between the initial model and the refined structures. More specifically, we consider the vector difference between the Cα coordinates in the initial r (init ) model, RCα r ( avg ) , and the ones obtained from the ensemble-averaged structures R Cα , most of which are refined relative to the initial model. Note, that the average structure is already superimposed to the initial model as a result of how the ensemble average was generated. We then tested whether a new set of coordinates obtained according to Eq. 2-1 would increase the degree of refinement: r ) r ( init ) r ( avg ) RC( new = ( 1 − α ) R + α R Cα Cα α 2-1 where α is a scaling factor. Here, α=0 corresponds to the initial model, and α=1 corresponds to the ensemble-averaged structure. Values of α between 0 and 1 would correspond to interpolation between the initial and refined structures, values beyond 1 would be extrapolation beyond the refined structures. Figure 2-4 shows the effect of applying Eq. 1 on the overall change in GDTHA and RMSD. We find the optimum value of α to be α=0.6 for maximizing improvements in RMSD, and α=1 for GDT-HA. This result was surprising as we expected that values of α>1 may improve structures further. However, closer inspection of which targets are most affected by the structure interpolation approach suggests that scaling coordinates according to Eq. 2-1 has a stronger effect on the RMSD of targets where the RMSD increased during the refinement stage (see Fig. 2-5), i.e. structures that were made worse during the refinement. On the other hand, 35 there was less of an impact on the structures that could be refined. Hence, the overall effect is an average improvement. It is unclear to what extent this is a general finding but as a result of applying the structure interpolation method (with α=0.8) we find further improvement in terms of RMSD. However, GDT-HA becomes slightly worse when the structure interpolation method is applied. The restraints applied during the MD simulations were either given by the CASP organizers or determined by us (see Table 2-1). An interesting question is whether the origin of the restraint list had an impact on the refinement success. The changes in RMSD and GDT-HA after refinement for the targets with CASP-suggested restraints were -1.4 Å and 2.6, respectively, but somewhat less, -0.04 Å and 2.0, respectively, for the targets where we selected the restraints. Hence, refinement is most successful if sampling can be targeted to the regions known to be deviating most from the native. 36 Figure 2-4 Change in RMSD with respect to native structure (A) and GDT-HA HA (B) upon structure interpolation rpolation between the initial ((α=0.0) and the subset-averaged averaged structures (at α=1.0). Results from 200 ns MD runs are shown in blue (circles) and from 8x3 ns sampling in green (triangles). 2.4.5 Quality Assessment Finally, we considered whether it is possible to predict in which cases refinement is successful and when structures become worse as a result of refinement. Motivated by a previous analysis using a correlation-based based metric, metric,[96, 97] we considered the correlation between the two scores iRMSD and DFIRE, both of which are available without knowledge of the native structure. The rationale for using this score is that because iRMSD is often correlated with RMSD (see Table 237 4), a correlation between DFIRE and iRMSD is indicative of a correlation between DFIRE and RMSD. Figure 2-5 shows the change in RMSD after refinement as a function of this correlation coefficient. It can be seen that all of the significantly refined structures have a correlation coefficient between -0.4 and 0.4 while higher correlation coefficients larger than 0.4 correlate with a lack of refinement. Significant correlation between DFIRE and RMSD (and by proxy with iRMSD) most likely occurs when structures move by a significant extent. It appears from this analysis that in those cases the motion is likely to be away from the native structure rather than towards it. Using a DFIRE/iRMSD correlation coefficient of <0.4 as a criterion that refinement has been successful, we identify four cases, TR435, TR462A, TR614, and TR622, that are outside this range and for which refinement was therefore assumed not to be successful. If we use the initial model (∆RMSD=0) for these targets instead of the ‘refined’ structures, the average change in RMSD from the native improves further, to -0.12 (without structure interpolation) and to -0.14 (with structure interpolation). The effect on GDT-HA is less clear, because the improvement is actually slightly decreased for the 200 ns set but it improves for the 8 × 3 ns sampling set. 38 Figure 2-5 Change in RMSD with respect to native structure as a function of correlation between iRMSD and DFIRE scores with (green triangles) and without (red squares) square structure interpolation. 2.4.6 Final Refinement of Averaged Structures So far, the structural analysis has focused on the Cα coordinates. As a result of the averaging and structure interpolation procedures, the generated structures are of poor quality in terms of bond geometries, clashes, etc. which is readily apparent when submitting those models to structural analysis tools (see Table 2-6). 6). In order to generate overall high quality structures, we performed additional short MD simulations where the Cα atoms were constrained to maintain the overall improvement in structure but where other atoms were allowed to relax. The quality qualit of the final models was improved dramatically (see Table 2-7) to result in high-quality quality refined structures. After the final step, the average change in RMSD was still -0.08 0.08 Å, and the change in GDT-HA GDT was 2.3. For comparison with other studies, we also ccalculated alculated the average improvement in GDT-TS TS for the final structures to be 1.6. 39 Table 2-6 Quality measures of averaged structures before (Avg) and after (MD) refinement via restrained MD simulations. Target TR592 TR453 TR432 TR462a TR594 TR614 TR435 TR530 TR488 TR469 TR462b TR389 TR464 TR569 TR454 TR567 TR574 TR557 TR429a TR517 TR606 TR429b TR624 TR568 TR622 TR576 Avg: Clash score % poor rotamers Avg 147.8 435.8 295.0 445.7 253.2 722.7 382.9 219.0 314.1 366.8 686.6 537.0 126.5 320.3 243.4 168.4 515.8 336.6 656.0 363.7 590.1 551.9 330.1 455.5 472.8 702.3 409.2 Avg 5.6 11.4 5.5 14.3 9.7 45.7 13.0 9.1 9.0 2.3 20.4 17.4 0.0 8.6 4.4 6.7 28.2 11.1 26.5 11.5 38.4 14.3 15.5 11.0 37.4 38.2 16.0 MD 3.0 2.2 0.9 0.0 3.1 8.2 1.4 3.9 0.7 1.1 1.8 4.2 0.0 2.7 1.0 2.2 3.9 2.6 1.6 2.7 6.3 3.2 3.6 2.6 5.5 5.5 2.8 MD 4.2 5.7 4.6 0.0 5.3 10.6 4.6 3.0 7.5 4.6 6.1 10.1 2.0 1.7 1.5 1.0 6.4 2.0 4.4 5.3 6.1 6.4 5.2 2.4 7.7 11.8 5.0 % Ramach. outliers Avg MD 8.8 0.0 8.4 0.0 7.2 0.8 15.3 4.2 14.2 6.0 31.0 10.6 7.6 0.8 10.4 2.6 4.4 3.3 3.4 1.7 19.7 4.6 14.0 5.4 1.5 0.0 6.7 1.3 5.9 0.5 4.5 0.8 23.2 5.1 9.0 2.5 34.2 7.9 8.9 3.8 30.6 5.0 28.4 8.1 9.1 0.0 17.2 4.3 29.3 3.5 24.8 6.8 14.5 3.4 Cβ dev. % bad bonds % bad angles MolProbity score Avg MD Avg MD Avg MD Avg 64 2 82.7 0.0 41.4 0.0 3.9 77 3 92.9 0.0 75.0 0.0 4.6 112 2 85.8 0.0 76.4 0.0 4.2 63 3 94.6 0.0 93.2 0.0 4.7 116 5 91.2 0.0 74.3 0.7 4.4 119 15 100 0.0 100 6.1 5.6 114 3 88.7 0.0 75.2 0.8 4.6 63 1 80.8 0.0 65.4 0.0 4.2 72 3 89.3 0.0 61.3 0.0 4.3 49 0 78.7 0.0 75.4 0.0 3.9 58 2 98.5 0.0 92.7 1.5 5.2 124 11 96.2 0.0 92.4 3.8 5.0 44 0 79.4 0.0 50.0 0.0 2.7 57 3 85.7 0.0 75.3 1.3 4.3 149 0 86.7 0.0 56.9 0.5 4.0 101 3 75.0 0.0 46.3 0.7 3.9 94 6 97.0 0.0 90.1 4.0 5.2 111 5 91.9 0.0 83.9 0.8 4.5 76 6 10 0.0 98.7 3.9 5.3 148 8 96.9 0.0 87.4 0.6 4.5 118 15 95.9 0.0 100 4.9 5.4 71 8 92.1 0.0 96.1 4.0 5.0 58 2 95.6 0.0 88.2 0.0 4.6 87 4 97.9 0.0 96.8 1.1 4.8 110 7 97.5 0.0 96.6 0.9 5.3 127 13 98.5 0.0 99.3 1.5 5.4 92 5 91.1 0.0 80.3 1.4 4.6 40 MD 2.0 1.9 1.3 1.0 2.3 2.9 1.8 2.0 1.8 1.5 2.1 2.6 0.7 1.6 1.0 1.3 2.5 1.8 2.1 2.1 2.6 2.4 2.1 1.8 2.5 2.8 1.9 Table 2-7 Summary of the average improvements in RMSD (Å) and GDT-HA for all the attempted methods for structure selection out of 8×3 ns and 200 ns simulation sets; Best in trajectory is given as a reference for the maximum possible improvement. ∆ RMSD ( Å ) 8 × 3 ns 200 ns -0.27 -0.33 0.02 0.14 0.05 0.04 -0.03 -0.04 0.01 -0.04 Method: Best in trajectory Final Structure Lowest DFIRE Average over 10% lowest DFIRE Average over 1% lowest iRMSD Subset average from combined DFIRE/iRMSD scores Closest structure to subset average Subset average and structure interpolation Subset average/interpolation with correlation-based filtering ∆ GDT-HA 8 × 3 ns 200 ns 4.2 5.5 1.2 0.3 -0.5 0.7 1.6 2.6 1.4 2.4 0.00 -0.07 1.5 2.6 0.07 0.01 -0.6 0.6 -0.01 -0.10 1.6 2.4 -0.06 -0.14 1.9 2.3 2.5 Discussion and Conclusion We are presenting here a new protocol for structure refinement that is based on MD simulations, but adds a new scoring and averaging protocol. A summary of the performance with different structure selection methods is presented in Table 2-7. Overall, the refinement results reported here are moderate, but what we consider most important is that we are able to consistently refine the large majority of structures rather than making a significant fraction worse as in earlier attempts at structure refinement. The overall refinement results are better than those reported recently by the Shaw group despite the much shorter simulations used here which may be due to a number of different reasons. The force field that was used here is a recently updated version of the CHARMM force field that appears to outperform most other available force fields in other tests.[9] Furthermore, the use of ensemble averages instead of single structures appears to lead to significant improvements that may compensate for the much more limited sampling compared to 41 the work by Shaw et al. With respect to the sampling, we find that nearly equivalent refinement can be achieved with multiple short simulations rather than a single long simulation. This is consistent with previous findings,[41, 95] but is a point that merits further investigation since it is generally much easier to run many short simulations than one very long simulation on commonly available computer platforms. We also attempted here to employ an extrapolation scheme to further refine structures –which was not successful so far – and an assessment criterion to determine whether structure refinement is successful –which does appear to have merit. Another question is whether the refinement success is biased by how the starting structures were generated. The targets considered here were selected by the CASP organizers from the best predictions during the CASP competition. While this limits the methods by which the models were generated to a few top groups, an effort was made to avoid selecting models from only one participating group. Hence, the models used as starting structures here represent some degree of diversity in terms of how they were created. Since we see consistent refinement across most of the targets we assume that refinement success is independent of the exact way the structures were initially prepared. Furthermore, similar results for sampling from 200 ns simulations vs. 8 x 3 ns simulations suggests that just a few nanoseconds were enough to equilibrate the structures sufficiently. Finally, it would be interesting to see whether repeated application of the protocol presented here can be used in an iterative protocol to achieve more significant refinement. These are areas that we will focus on in more detail in future studies. 2.6 Acknowledgment 42 We would like to thank Nan Liu for the initial setup and generation of some of the simulation data presented here. Funding from NIH GM084953 and NSF CBET 0941055 is acknowledged. Computer resources were used at XSEDE facilities (TG-MCB090003) and at the HighPerformance Computing Center at Michigan State University. 43 2.7 Supporting Information Table 2-8 Change in RMSD (Å) and GDT-HA for the structure closest to the subset average relative to their respective values from the initial models. Target TR592 TR453 TR432 TR462a TR594 TR614 TR435 TR530 TR488 TR469 TR462b TR389 TR464 TR569 TR454 TR567 TR574 TR557 TR429a TR517 TR606 TR429b TR624 TR568 TR622 TR576 Avg. 200 ns ∆ RMSD ∆ GDT-HA -0.08 3.1 0.15 -0.3 -0.20 0.4 0.20 2.7 0.24 -0.2 0.44 -1.1 0.26 -3.8 -1.6 -0.11 -0.14 2.1 0.03 -2.8 0.11 0.4 -5.2 -0.21 0.03 0.7 -1.0 -0.30 -0.08 1.4 -0.05 2.6 0.79 0.7 -0.50 1.4 -0.11 8.9 0.31 -1.0 -0.48 1.2 -0.3 -0.13 -0.42 4.0 -0.06 1.6 0.21 2.5 0.40 -0.7 0.01 0.6 8 × 3 ns ∆ RMSD ∆ GDT-HA -0.2 -0.03 0.07 -0.3 -0.11 0.4 0.11 0.7 0.04 -0.9 0.19 -2.5 0.08 -3.3 -2.2 -0.09 -0.07 0.5 0.00 -3.2 0.08 0.4 0.03 -4.9 0.04 -0.4 -0.03 0.3 -0.1 -0.05 0.01 2.8 0.14 -2.7 -0.16 2.2 0.03 2.4 0.07 0.8 0.02 -2.9 -0.06 0.3 0.06 0.0 0.22 0.5 0.39 -0.6 0.72 -3.3 0.07 -0.6 44 Figure 2-6 Change in GDT-HA of all CASP8 and CASP9 targets after refinement without imposing restraints using C36ff. 45 Figure 2-7 Change in GDT-HA vs. time for all targets with 200 ns simulation with imposed restraints. 46 Chapter 3 Physics Based Protein Structure Refinement via Data Mining through Multiple Molecular Dynamics Trajectories and Structure Averaging Vahid Mirjalili, Keenan Noyes, Michael Feig Adapted from Proteins, Structure, Function and Bioinformatics, V82, p196-207, 2014 47 3.1 Abstract We used molecular dynamics (MD) simulations for structure refinement of CASP10 targets. Refinement was achieved by selecting structures from the MD-based ensembles followed by structural averaging. The overall performance of this method in CASP10 is described and specific aspects are analyzed in detail to provide insight into key components. In particular, the use of different restraint sets, sampling from multiple short simulations vs. a single long simulation, the success of a quality assessment criterion, the application of scoring vs. averaging, and the impact of a final refinement step are discussed in detail. 3.2 Introduction Two decades of CASP (Critical Assessment of Techniques for Protein Structure Prediction) have documented significant progress with predicting the structure of proteins from their amino acid sequences.[55, 98-102] This can be attributed to the development of new techniques but an increasing number of structures in the Protein Data Bank (PDB)[103] are at least an equally important factor.[104-107] The most reliable method for protein structure prediction is template based modeling.[102, 105, 108, 109] The resulting models are often overall correct, but deviate from experimental structures in detail with typical root mean square deviations (RMSD) of 2-6 Å due to intrinsic errors when constructing models based on template structures.[35, 36] Therefore, recent attention has shifted towards the refinement of template-based models to improve their accuracy and generate models that are suitable for biological and pharmaceutical studies.[57] A variety of methods for the refinement of template-based models have been proposed, with the majority involving some combination of sampling and scoring with an emphasis on physicsbased methods, such as molecular dynamics[65, 70, 110, 111]. At the same time, knowledge 48 based methods have also been proposed[77, 82, 112, 113]. The challenges with typical structure refinement protocols are two-fold: 1) Sampling has to progress at least in part towards the native structure; and 2) improved structures generated by the sampling method have to be reliably selected. In terms of sampling, different strategies have been explored. The application of restraints on some regions of the protein judged to be of higher quality than other regions often leads to improved sampling of refined structures.[70] Other strategies have involved enhanced sampling methods such as replica exchange MD simulation[114] and self-guided Langevin dynamics[83] as well as implicit and explicit solvent simulations.[111, 115] A key issue is the quality of the force field which ultimately determines whether refined structures are likely to be generated. In the past, force fields have been optimized specifically for refinement[67, 116], but improvements in general biomolecular force fields[117, 118] are expected to also impact the ability to carry out successful structure refinement. While sampling methods are often able to generate refined models, these are typically not found at the end of a given sampling run but instead at intermediate time points. The challenge is then to find those refined structures from the ensemble of structures generated at the sampling stage. The force fields used for sampling, while physically accurate, are often too noisy to reliably identify single structures or small subsets of structures that are most native-like. Instead, a number of statistical potential functions have been used for scoring decoy structures, such as DFIRE,[119, 120] GOAP,[121] DOPE,[122] and OPUS-PSP[122]. All of these scoring functions have shown promise in selecting native-like structures from an ensemble, but struggle with consistently selecting refined structures.[37, 114, 123] Despite considerable efforts, effective structure refinement protocols have remained elusive. During the last round of CASP, CASP9, there were only a few groups that were able to 49 outperform a naive prediction of simply resubmitting the initial model given by the organizers to be refined[35]. Furthermore, refinement progress was very modest and predictions from the most successful groups lacked consistency as some targets were refined significantly, while others were made worse. Further efforts since CASP9 include very long MD simulation by the D. E. Shaw group[70]. In that work, it was clearly shown that without restraints the initial models are likely to drift away from the native structure making refinement largely impossible. When restraints were applied, the sampling of refined structures became possible but the reliable selection of refined structures remained a significant obstacle. Overall, structures selected based on cluster size and/or energetic criteria were improved on average 1% in terms of GDT-TS. A similar level of performance was reported by Zhang et al.,[110] in which they combined knowledge-based information with physics-based MD simulations and applied a fragmentguided method with distance restraints used on global and local structural templates from the PDB. Gront et al.[124] recently provided a comprehensive review of refinement methods ranging from physics based to knowledge based methods and concluded that refinement is more challenging when starting structures are already within 2-3 Å from the native structure. In that paper it was also noted that knowledge-based methods may have an advantage because they are parameterized based on experimental structures which are the target of refinement protocols vs. physics-based methods that aim at capturing the protein dynamics at the global minima of the energy landscape. The distinction between (simulation-generated) protein dynamics and experimentally-obtained structures may become increasingly important as refinement methods aim to reproduce experimental structures at high accuracy. One particular issue is that experimental structures reflect ensemble- and time-averaged conformations rather than instantaneous snapshots. 50 Following this idea, we have recently devised a structure refinement protocol that obtains refined structures from ensemble averages over selected subsets instead of single snapshots[37]. When this protocol was applied to ensembles from extensive MD-based sampling with the recently updated CHARMM36 force field in combination with explicit water, significant and consistent refinement became possible when tested on CASP8 and CASP9 targets. Here, we describe the blind application of such a refinement protocol during CASP10. In the following we will first describe the methodology before presenting and discussing results obtained during CASP10 and from subsequent post-analysis. 3.3 Methods The initial models from CASP10 were preprocessed by adding missing hydrogens using the HBUILD module in CHARMM.[86] Protonation states of His residues (if present), were determined by visual inspection. The pKa values of other titratable residues (Glu, Asp, Lys, Arg) were determined using the PROPKA web server[125, 126] followed by visual inspection. All proteins were subsequently solvated in a cubic box of water with at least 9 Å cutoff to the edge of the box. The systems were neutralized by adding Na+ or Cl- to balance the net charge of the systems. The solvated systems were then subjected to molecular dynamics (MD) simulations with periodic boundary conditions. The non-bonded interactions were cut off using the switching method between 8.5 to 10 Å, along with particle-mesh Ewald (PME) summation using a grid spacing of 1 Å for long range electrostatic interactions. The simulations were performed under NPT condition using Langevin dynamics at a temperature of 298 K with a Langevin piston to maintain constant pressure at 1 bar. A time step of 2 fs was used with the SHAKE algorithm to 51 fix bonds involving hydrogen atom. The CHARMM36 force field[118] was used to model the proteins in conjunction with the TIP3 water model[88]. All of the simulations used some form of restraints. Two types of restraints were used for almost all of the targets; type 1 consisted of weak restraints (with a force constant of 0.05 kcal/mol/Å2) applied to all Cα atoms; type 2 involved strong restraints (with a force-constant of 1 kcal/mol/Å2) applied to Cα atoms of only the regions that were assumed to be reliable in the starting model. For targets, where CASP organizers indicated which regions to refine, we followed their suggestions. In other cases, we assumed that secondary structure elements are likely to be more reliable and applied restraints to those while leaving loops flexible. Table 3-1 shows the regions which were selected for the strong restraints. In some cases, a combination of weak and strong restraints was used by applying strong restraints on selected residues but weak restraints on the rest. Due to the presence of zinc ions in TR754, the first set was modeled with weak restraints on all Cαs except for the region around the zinc fingers. The heating and equilibration protocol involved 10 stages: First, simulations were carried out at 50 K using Cα restraints according to Table 3-1 with a force constant of 2 kcal/mol/Å2 and a force constant of 0.5 kcal/mol/Å2 for all other Cα atoms. The temperatures and force constants were subsequently increased/decreased in 10 ps steps to (100 K, 2/0.5 kcal/mol/Å2), (200 K, 2/0.5 kcal/mol/Å2) (200 K, 1.5/0.2 kcal/mol/Å2), (200 K, 1/0.1 kcal/mol/Å2), (200 K, 1/0.05 kcal/mol/Å2), (250 K, 1/0.05 kcal/mol/Å2), (298 K, 1/.0.05 kcal/mol/Å2), (298 K, 1/0.01 kcal/mol/Å2), and (298 K, 1/0 kcal/mol/Å2). The structure at the end of the final stage was used as the starting point for all of the production runs. 52 Production simulations consisted of 10 or 20 replicate MD simulations, each 20 ns long and starting from the same starting structure using either strong or weak restraints (see Table 3-1). We ran multiple short simulations instead of a single long simulation to maximize sampling given limited availability of computer resources.[95] During post-analysis we also carried out single long simulations (200 ns) using the restraint types listed as set 1 (Table 3-1). Ensembles of structures were generated from the simulations, containing 500 snapshots for each of the MD trajectories. Structures in each replica ensemble were analyzed in terms of the RMSD from the initial model (iRMSD) and their DFIRE scores. Following our previously established protocol[37] (see Fig. 3-1), we began by using the correlation coefficient between iRMSD vs. DFIRE as a quality assessment score. Replicas with correlation coefficients greater than 0.4 were discarded from subsequent analyses. From the remaining replicas a subset of structures with combined minimal iRMSD and DFIRE scores[37] were then selected. Briefly, the selection criteria is based on normalized iRMSD and DFIRE scores to be within an angle θ/2 around the identity line and outside a circle of radius ρ from the center of the distribution, corresponding to the lower left corner of the scatter plot of iRMSD vs. DFIRE.[37] The criterion used in CASP experiment was, however, slightly different than what was used for testing the protocol on CASP8 and CASP9 targets because of additional optimization.[37] Here, we used ρ=1, and θ=100º. An average structure was then calculated from the selected subset of structures followed by a structure interpolation. This was accomplished by taking the point on the vector between the corresponding Cα atoms in the average and the initial model, with its distance to the initial model to be 0.55 of the vector length. The coordinates of all other atoms were copied from the initial model. 53 Table 3-1 Type of restraints applied on Cα atoms for the two simulation sets; strong (1 kCal/mol/Å2), weak (0.05 kCal/mol/Å2) and a combination of both. The strong force constants are only applied to the selected regions. Target TR644 TR655 TR661 TR662 TR663 TR671 TR674 TR679 Set 1: Set 2: 20×20 ns 10×20 ns Combined Strong Strong Weak Weak Weak Strong Combined Strong Strong Weak Weak Strong Strong Weak TR681 Strong Weak TR688 Combined Strong TR689 Strong Weak TR696 TR698 TR699 Weak Strong Weak Strong Weak Strong TR704 Weak Strong TR705 Weak Strong TR708 Weak Strong TR710 TR712 Weak Strong Strong Weak TR720 Weak Strong TR723 TR724 TR738 TR747 TR750 TR752 Strong Weak Strong Weak Weak Strong Weak Strong Weak Strong Strong Combined TR754 Weak Strong Strongly Restrained Regions 53:56, 61:66, 71:75, 84:87, 115:119, 129:132, 140:142, 151:153 21:50, 65:90, 94:141, 164:180 5:16, 38:50, 66:79 79:140, 182:204 38:54, 77:80, 85:89, 96, 108:125 284:288, 300:305, 310:312, 318:320, 333:335 1:24, 46:145, 157:186, 198:223 21:40, 51:57, 65:87, 102:118, 128:144, 153:157, 171:172, 200:224 46:54, 67:76, 89:98, 113:122, 137:145, 160:167, 182:190 14:21, 33:39, 48:59, 64:72, 81:89, 116:118, 143:147, 156:160, 165:169, 181:190, 197:207, 211:218, 226:234 18:22, 27:35, 41:43, 50:51, 58:60, 69:73, 93:96, 101:105 1:16, 36:89, 101:119 8:11, 37:45, 53:55, 86:94, 103:135, 161, 205:206, 219:234 25:32, 40:42, 50:55, 61:64, 81:83, 100:102, 113:120, 128:132, 141:149, 161:166, 188:189, 193:200, 204:209, 217:226, 236:237, 242:246 40:42, 65:67, 82:85, 90:91, 110:114, 119:126 24:27, 45:60, 66:70, 99:101, 113:119, 125:129, 136:152, 172:183 27:50, 67:83, 100:117, 135:152, 168:185, 201:220 38:79, 90:115, 130:140, 156:223 27:29, 53:71, 80:86, 91:103, 108:114, 127:139, 144:147, 154:157, 162:176 39:73, 99:112 135:136, 152:157, 198:202, 210:216, 232:238 1:38, 88:90, 103:249 24:26, 46:49, 55:59, 68:71, 80:83, 92:94, 103:109, 114:121 1:6, 28:29, 48:57, 64:66, 78:93, 98:100, 121:137, 168:182 1:40, 51:99, 111:124, 129:156 25, 33, 63:76 (weak restraints are not applied to the zinc fingers) 54 Initial setup of the model Determine choice of restraint sets Restraint Set1: Run 20x20 ns MD Simulation Restraint Set2: Run 10x20 ns MD Simulation Calculate iRMSD, DFIRE and their correlation for each replica Extract replicas with corr(iRMSD, DFIRE) < 0.4 Create two subsets of structures from all replicas based on the iRMSD and DFIRE score criterion MODELS 1, 2: Calculate the Avg. Str. and interpolations Str. from sets 1, 2 MODELS 3, 4, 5: Select structures with lowest DFIRE score Figure 3-1 Flowchart of the refinement protocol, from simulation to model selection 55 The resulting structure was then solvated again, neutralized by adding appropriate charges, and subjected to 5,000 energy minimization steps followed by 40 ps of MD simulation at 100 K with restraints on all Cαs and a force constant of 100 kcal/mol/Å2. The purpose of the final MD simulations was to relax structural artifacts due to the averaging procedure and generate structures that are of high stereochemical quality. The application of the above protocol to simulations with restraint sets 1 and 2 resulted in models 1 and 2 submitted to CASP. Models 3-5 were selected from the trajectory snapshots with low DFIRE score but outside the region of the scatter plot used for averaging with the idea that some of these structures may be refined more extensively compared to models 1 and 2. All of the molecular dynamics simulations were carried out with the NAMD molecular dynamics package in conjunction with the MMTSB tool set[92] which was also used for analysis along with custom scripts. The protein structures were visualized via the PyMol molecular visualization software.[127, 128] 3.4 Results The MD-based refinement protocol described in the methods section was applied to 27 CASP10 refinement targets. The protocol was not applied to one target, TR722, which was modeled unsuccessfully using an entirely different procedure. 56 Table 3-2 Refinement results showing the best observed structures in the trajectories, the first submitted model, the best of five submitted models, and best model among models 3-5. Target TR644 TR655 TR661 TR662 TR663 TR671 TR674 TR679 TR681 TR688 TR689 TR696 TR698 TR699 TR704 TR705 TR708 TR710 TR712 TR720 TR723 TR724 TR738 TR747 TR750 TR752 TR754 Avg. 3.4.1 Best in 30x20ns trajectories ∆RMSD ∆GDT(Å) HA -0.94 11.0 -0.26 2.4 -0.25 6.1 -0.54 13.0 -0.41 4.6 -0.62 5.4 -0.78 7.0 -0.55 4.8 -0.13 5.2 -0.14 6.9 -0.25 2.3 -0.81 11.0 -0.32 3.6 -0.33 4.1 -0.57 7.8 -0.51 10.7 -0.84 1.3 -0.20 11.1 -0.54 3.1 -1.85 5.1 -0.71 11.8 -1.49 8.5 -0.37 10.6 -0.44 13.1 -0.43 11.8 -0.30 3.1 -0.35 2.6 -0.55 7.0 First submitted model ∆RMSD ∆GDT(Å) HA -0.03 2.8 0.3 0.04 -0.03 1.9 -0.20 5.3 -0.12 2.6 -0.01 0.6 4.9 0.00 0.6 0.01 -0.04 1.1 1.5 0.01 -0.10 3.5 -0.13 3.5 -0.02 -0.4 -0.09 4.6 -0.17 3.9 -0.14 6.0 2.7 0.09 -0.04 4.3 -0.08 3.4 2.7 0.02 -0.13 6.5 -0.01 2.6 -0.20 6.0 -0.10 0.8 -0.16 4.8 -0.12 1.4 0.09 -6.3 -0.06 2.6 Best of five models ∆RMSD (Å) -0.55 0.00 -0.03 -0.25 -0.15 -0.25 -0.06 -0.03 -0.15 -0.02 -0.13 -0.33 -0.02 -0.09 -0.23 -0.24 0.09 -0.06 -0.14 -0.99 -0.39 -0.48 -0.30 -0.10 -0.16 -0.12 0.09 -0.19 ∆GDTHA 5.3 0.3 1.9 6.7 3.6 2.8 4.9 3.3 5.4 2.2 4.9 4.8 -0.4 4.6 5.6 7.3 2.9 4.3 5.0 3.2 9.7 3.7 9.5 0.8 4.8 1.4 -6.3 3.8 Best of models 3-5 ∆RMSD (Å) 0.04 0.20 0.17 -0.25 -0.15 0.09 -0.06 0.12 -0.15 0.02 -0.12 -0.33 0.09 -0.07 -0.23 -0.24 0.10 -0.04 -0.14 -0.99 -0.39 --0.09 0.10 -0.04 -0.05 0.09 -0.09 ∆GDTHA -1.4 -0.9 -2.2 6.7 3.3 2.8 -3.4 1.0 5.4 -0.1 2.3 4.8 -0.6 3.7 5.6 7.3 -2.4 2.1 5.0 1.1 9.7 -3.5 -0.6 2.5 -0.7 -7.4 1.8 Overall CASP10 Performance Five models were submitted for each of the targets. The first and second models resulted from ensemble averaging. The other models were selected based on favorable DFIRE scores (see methods section). Table 3-2 shows the changes upon refinement, ∆RMSD and ∆GDT-HA, with respect to the initial models provided by CASP for the first submitted model and the best of all five models, respectively. The average change in RMSD for the first models is -0.06 Å, and the 57 average change in GDT-HA is 2.6. More importantly, 20 out of 27 targets improved in RMSD, and 25 targets improved in terms of GDT-HA. This performance is similar to what we found previously when testing the protocol on CASP8 and CASP9 targets.[37] When selecting the best out of five structures, the average improvement in terms of RMSD is -0.19 Å, and in terms of GDT-HA is 3.8. Looking at all five models, 24 targets are improved with respect to RMSD, and all targets except for TR754 are improved in GDT-HA. The overall best refinement case has an RMSD value that is improved by almost 1 Å (TR720) and GDT-HA improvements by nearly 10 units (TR723 and TR738). These results suggest that with this refinement protocol it is possible to consistently generate significantly refined structures from the initial template-based models. The only target where the predicted structure was significantly worse than the starting structure was TR754, where the presence of zinc ions presumably complicated the scoring with DFIRE. Furthermore, Table 3-2 lists the best observed structures in terms of RMSD and GDT-HA throughout all 30×20 ns trajectories. Because the best cases were not necessarily picked out for submission, this information provides a theoretical limit of how much refinement could have been achieved with a perfect scoring function. Significant refinement of 1.85 Å in TR720 is observed, as well as several cases with improvements in GDT-HA higher than 10%. On average, improvement in RMSD is 0.55 and 7.0 for GDT-HA. On the other hand looking at the best of five models, we see that the RMSD and GDT-HA are improved by 34% and 69% of the maximum possible improvements, respectively. To our knowledge, no single-structure selection protocol can achieve such a result. Interestingly, there are a few cases where the refined structures are actually better (in terms of GDT-HA) than the best single structure from the trajectories (TR689, TR699, TR708, TR712). This indicates that the averaging procedure used here leads to additional refinement over just selecting the best structure from a given ensemble. 58 Figure 3-2 shows four of the best modeled targets, TR662, TR674, TR723, and TR738. While most of the secondary structure elements are fixed, some of the loop regions were refined to conformations intermediate between the initial model and the experimental reference. This suggests that refinement is proceeding towards the right direction but it is clear that further progress is needed to fully reach experimental accuracy. Figure 3-2 Correlation of iRMSD vs. DFIRe scores of individual replicas for TR674; Set 1 (replicas 1:20) is shown with red boxes and set 2 (replicas 21:30) with blue. Replicas with corr>0.4 are discarded for model selection. 3.4.2 Model Selection based on Lowest DFIRE and Highest iRMSD For models 3-5, we selected structures with the lowest DFIRE score and higher iRMSD values. The rationale was that ideally the lowest DFIRE scores would identify the most native structures while higher iRMSD values would allow for more significant refinement but also risks larger deviations away from the native. This is in contrast to the more conservative criterion used for the subset ensemble selection based on small iRMSD values. Table 3-2 shows the best structures among the submitted models 3-5. It can be seen that while there are indeed some cases with 59 significantly refined structures (TR720, TR723) and overall average improvement in both RMSD and GDT-HA, there are also many cases where no refinement was achieved. Although the best of three models were analyzed here, the results remain inferior to the single model 1 obtained from ensemble averaging. 3.4.3 Quality Assessment using Correlation between iRMSD and DFIRE One aspect of our refinement protocol is to estimate whether a given set of samples likely includes significantly refined structures. As discussed in more detail in our previous paper,[37] we identified the correlation coefficient between DFIRE and iRMSD as a suitable metric. Correlation coefficients above 0.4 appeared to be correlated with poor refinement performance;[37] we applied this criterion here to discard trajectories where this condition was satisfied from further analysis. To further assess the validity of this assumption, we compare in Table 3-3 the fractions of improved structures in terms of RMSD and GDT-HA for replicas where the correlation is less than 0.4, with those where the correlation is greater than or equal 0.4. While the results vary greatly for individual targets, there is on average a modest enrichment in terms of both RMSD and GDT-HA, both by about 6%, when discarding samples from replicas where the correlation coefficient is above 0.4. This suggests that the quality assessment procedure used here adds value and it could be used in the future to guide the generation of additional trajectories for cases where refinement appears to be difficult as suggested by many replicas with correlation coefficients above the 0.4 threshold. 60 Table 3-3 Fraction of improved trajectory frames in replicas classified by correlation between iRMSD and DFIRE score; Fraction of improved frames in trajectories with correlation <0.4 that are larger by 10% than fractions with correlation≥0.4 are highlighted. Target TR644 TR655 TR661 TR662 TR663 TR671 TR674 TR679 TR681 TR688 TR689 TR696 TR698 TR699 TR704 TR705 TR708 TR710 TR712 TR720 TR723 TR724 TR738 TR747 TR750 TR752 TR754 Avg. 3.4.4 Fraction (%) of traj. frames improved in RMSD Corr < 0.4 Corr ≥ 0.4 46.5 N.A. 4.4 3.7 12.6 N.A. 70.4 8.4 28.0 22.7 18.7 5.4 23.1 5.2 15.3 35.9 0.4 0.0 1.7 0.5 33.4 10.5 55.1 33.8 60.3 59.7 29.2 26.8 53.2 NA 33.7 3.0 3.4 9.8 26.3 NA 28.4 20.0 27.8 39.2 57.4 68.7 37.6 99.6 85.7 99.9 38.2 8.0 54.3 22.4 35.7 13.9 1.0 0.5 32.7 26.0 # replicas with Corr≥0.4 0 16 0 6 15 12 6 3 2 3 7 1 4 1 0 4 6 0 1 8 3 1 2 5 6 6 22 Fraction (%) of traj. frames improved in GDT-HA Corr < 0.4 Corr ≥ 0.4 14.1 N.A. 2.3 0.5 8.5 N.A. 76.1 33.8 49.0 41.0 13.2 3.0 9.3 1.3 16.7 30.7 2.2 0.1 19.8 17.3 0.7 0.3 33.6 15.6 13.6 13.2 1.6 0.2 37.9 N.A. 55.3 20.5 0.2 0.1 79.4 N.A. 5.5 0.6 15.6 32.1 55.7 47.0 22.9 60.8 78.8 99.8 20.8 5.7 76.6 36.2 6.1 0.7 0.1 0.1 26.5 20.0 Restraint Choice We used different restraints out of the following three choices: 1) strong restraints on selected Cαs; 2) weak restraints on all Cαs; 3) a combination of strong restraints on selected regions and weak restraints on the rest. The first choice is most appropriate for cases where there is specific information about which regions require refinement. In the case of CASP, this information was provided for some targets. However, for other targets – and more general applications of 61 structure refinement methods – such information may not be available. Therefore, we evaluated how the choice of restraints affected the results. In order to compare results in a consistent fashion, we used only the first 10 replicas of each set. Some targets use strong, partial restraints for the first set with 20 replicas while for other targets weak, complete restraints were used for the first set (see Table 3-1). Therefore, the total number of replicas that were used for each restraint type does not match among different targets. Furthermore, not all targets were run with strong, partial and weak, complete restraints. Those targets were excluded from the comparison (see Table 3-4). 62 Table 3-4 Refinement results of different restraints for the best observed structure in terms of RMSD (Å) and GDT-HA; comparing strong (1 kCal/mol/Å2) restraint on selected residues vs. weak (0.05 kCal/mol/Å2) restraint on all Cαs, and a combination of both. Cases associated with * indicate targets that had suggestions from CASP on which regions need refinement. Strong Restraint Target TR644 TR655 * TR661 TR662 TR663 * TR671 TR674 TR679 * TR681 TR688 TR689 TR696 TR698 * TR699 TR704 TR705 TR708 TR710 TR712 * TR720 TR723 TR724 TR738 * TR747 TR750 TR752 * TR754 Avg. of common rows: Avg. * (CASP sugg.) Avg. (no sugg.) Weak on all Cα ΔRMSD ΔGDT-HA (Å) ---0.14 -0.3 -0.22 3.9 -0.54 10.7 ---0.62 4.3 -0.40 4.7 -0.23 2.4 -0.13 3.5 ---0.24 0.9 -0.50 7.8 -0.16 -1.5 -0.33 1.7 -0.57 6.6 -0.45 5.7 -0.43 -0.1 -0.19 8.8 -0.23 -1.9 -0.21 2.8 -0.35 7.8 -0.40 3.1 -0.37 6.3 -0.44 9.2 -0.42 10.2 ---0.35 2.57 ΔRMSD (Å) ΔGDT-HA -0.94 -0.26 --0.22 -0.22 -0.21 -0.78 -0.55 -0.02 -0.09 -0.15 -0.81 -0.32 -0.32 -0.37 -0.48 -0.84 -0.16 -0.48 -1.85 -0.71 -1.49 -0.34 -0.19 -0.26 -0.30 -0.21 5.1 1.3 -5.0 3.6 1.7 3.8 2.9 1.6 4.1 -0.2 6.0 2.3 1.2 4.5 8.3 0.0 6.7 2.3 2.8 9.7 7.4 8.5 3.3 4.5 2.5 -0.7 -0.50 3.8 -0.35 4.3 -0.35 3.3 -0.23 1 -0.53 3.9 -0.38 5.2 Strong + weak ΔRMSD ΔGDT-HA (Å) -0.49 4.4 -0.34 3.6 -0.13 5.0 -0.24 1.0 Not enough data Not enough data Table 3-4 shows the best structures in terms of RMSD (Å) and GDT-HA from all the 10 replicas for a given restraint type. Average values were calculated only for the 22 targets, which have 63 both strong and weak restraint sets. The analysis suggests that in terms of best structures that were generated, strong, partial restraints may be roughly equivalent to using weak, complete restraints. Interestingly, the RMSD seems to be improved more with strong, partial restraints while GDT-HA scores appear to be improved more with weak, complete restraints. It is instructive to further separate the analysis into targets where the CASP organizers suggested regions to be refined vs. targets where no such information was given. We find that the degree of refinement was actually greater for the targets where no information was given, indicating that the additional information given during CASP10 was not essential for successful refinement. However, we also note that in the cases where information was available about which regions to refine, the application of partial restraints clearly outperformed weak restraints on all residues. On the other hand, targets where no information was given resulted in significantly better GDTHA scores with weak, overall restraints than with partial restraints based on secondary structures. This suggests that an optimal strategy may be to use partial restraints if information is available which regions require refinement while applying weak restraints for all residues otherwise. While Table 3-4 focuses on the best structures that are generated, Table 3-5 shows the result of refinement when the entire protocol is applied. The overall trends match those of Table 3-4. 64 Table 3-5 Refinement results of different restraints using the established structure generation protocol; comparing strong (1 kCal/mol/Å2) on selected residues vs. weak (0.05 kCal/mol/Å2) restraint on all Cα, and a combination of both Target TR644 TR655 * TR661 TR662 TR663 * TR671 TR674 TR679 * TR681 TR688 TR689 TR696 TR698 * TR699 TR704 TR705 TR708 TR710 TR712 * TR720 TR723 TR724 TR738 * TR747 TR750 TR752 * TR754 Avg. of common rows: Avg. * (CASP sugg.) Avg. (no sugg.) Strong restraint ΔRMSD ΔGDT(Å) HA -0.54 3.0 -0.05 0.0 ---0.03 1.3 3.1 0.54 1.1 0.02 -0.11 3.4 0.1 0.03 -0.06 -0.1 -0.02 2.3 -0.11 3.6 -0.24 2.8 -0.01 -0.4 -0.14 4.0 -0.10 2.3 -0.15 5.2 2.8 0.11 -0.05 2.6 -0.08 3.5 -0.54 1.1 -0.22 4.6 -0.51 3.7 -0.20 6.1 -0.08 0.3 -0.11 2.9 -0.13 1.7 0.16 -7.4 -0.11 2.0 Weak on all Cα ΔRMSD ΔGDT(Å) HA ---0.01 -1.7 -0.03 2.3 -0.20 4.7 ---0.05 1.4 5.3 0.00 -0.05 3.0 -0.04 1.3 ---0.14 4.4 -0.17 4.0 0.03 -1.3 -0.04 4.2 -0.18 3.9 -0.14 4.4 2.2 0.08 -0.05 4.4 -0.07 4.3 2.4 0.01 -0.13 6.3 -0.02 2.8 -0.23 7.5 -0.11 0.8 -0.16 4.0 --0.07 -5.5 -0.07 -0.11 2.8 0.01 1.4 -0.10 1.0 Not enough data Not enough data 2.9 0.01 2.0 -0.07 2.4 -0.14 2.1 -0.08 3.5 65 Strong + weak ΔRMSD ΔGDT(Å) HA -0.36 3.2 3.4.5 Simulation Time: Single MD vs. Multiple Short MDs Finally, we compared the sampling efficiency of two sets of simulations in order to assess the benefits of using multiple short simulations vs. a single long MD simulation. During CASP10 we ran multiple short simulations because of time and resource constraints. A single long simulation was run after completion of CASP for over 200 ns for each target continued from the first replica in set 1 using the same restraints as for the short simulations. We then compared the output of the 10×20 ns simulations with the results from the single 200 ns simulations. Note that the restraints can be either of strong, partial type, or weak, complete type. Figure 3-3 shows the cumulative minimum ∆RMSD and cumulative maximum ∆GDT-HA averaged over all 27 targets for the single 200 ns simulations and 10×20 ns simulations. The results of the 10 replicas in multiple short simulations are combined at each time slot, so at each time t, the cumulative minimum ∆RMSD and maximum ∆GDT-HA values are calculated from the t/10 portion of all of the 10 trajectories. There is an expanding gap between the single and multiple trajectories where the multiple short simulations outperform the long simulation both in terms of RMSD and GDT-HA. Furthermore, in Table 3-6 we compare the refinement performance by using structures either from a single 200 ns simulation or from multiple 10×20 ns simulations. We tested two selection protocols, using the lowest DFIRE score and subset selection and averaging followed by structure interpolation as described above. Selecting structures with the lowest DFIRE score performs poorly in both cases. However, applying our protocol improves the average RMSD with a similar level of accuracy (-0.08 Å), while the average improvement in GDT-HA is actually slightly higher in the case of the single 200 ns simulations (2.9 vs. 2.5 for 10×20 ns simulations). Given the increased sampling of refined structures with multiple short simulations, this is somewhat surprising and warrants further investigation. Assuming that the differences are 66 statistically significant, it may be that much longer simulations generate a broader sampling that when averaged result in a structure that is closer to the experimentally averaged structures. Figure 3-3 Subset selection based on normalized iRMSD and DFIRE scores from the replicas in set 1 with corr(iRMSD,DFIRE)<0.4 for TR674. The subset shown in the lower left corner of the scatter-plot with green triangles are selected to calculate the average structure and model selection. Table 3-6 Summary of comparing refinement results between 10×20 ns simulations and single 200 ns simulations having the same restraint conditions. The results are averaged over 27 CASP10 targets. Best ∆RMSD in trajectory Best ∆GDT-HA in trajectory Lowest ∆RMSD DFIRE score ∆GDT-HA Subset Avg. + ∆RMSD Str. Interp. ∆GDT-HA 67 200 ns -0.29 5.5 0.06 0.0 -0.08 2.9 10×20 ns -0.34 6.0 0.03 -0.1 -0.08 2.5 Figure 3-4 Initial model (blue), refined (green) and native (magenta) for (a) TR662, (b) TR723, (c) TR738 and (d) TR674 3.4.6 Final Stage of Refinement As mentioned in the methods section, structure averaging and interpolation cause some unphysical conformation with bad bonds, angles, dihedrals and steric clashes. Therefore, an extra stage of refinement is required to generate stereochemically acceptable structures. Table 3-7 shows the MolProbity measures for individual targets before (avg) and after final refinement stage (MD). This final refinement had only a small effect on RMSD from native and GDT-HA, 68 as before this stage, the average change in RMSD was -0.08 Å before the final stage and -0.06 Å after the final stage while the average GDT-HA did not change during the final refinement stage. Figure 3-5 Sampling efficiency toward native, showing the cumulative minimum ∆RMSD (top) and cumulative maximum ∆GDT-HA (bottom), comparing the single 200 ns MD simulations (red) vs. 10×20 ns MD simulations (green) averaged over 27 CASP10 targets. 69 Table 3-7 MolProbity results of the structure obtained from the averaging and structure interpolation Target TR644 TR655 TR661 TR662 TR663 TR671 TR674 TR679 TR681 TR688 TR689 TR696 TR698 TR699 TR704 TR705 TR708 TR710 TR712 TR720 TR723 TR724 TR738 TR747 TR750 TR752 TR754 Avg. MolProbity Avg. Final 1.42 4.16 4.83 2.33 1.03 4.00 1.49 4.48 4.80 2.51 4.69 2.52 1.99 4.34 1.74 3.54 1.74 4.45 1.64 4.09 4.26 2.00 4.82 2.14 1.47 3.88 4.34 2.10 1.32 4.72 5.07 2.36 1.44 4.11 1.14 4.06 1.82 3.79 1.28 4.60 1.80 4.44 1.90 4.87 0.88 3.88 1.12 3.91 1.61 4.12 1.05 3.02 5.15 2.59 1.72 4.31 3.5 Conclusion We applied a recently established molecular dynamics-based structure refinement protocol to CASP10 targets. Overall, we were able to reliably refine most of the targets both in terms of RMSD and GDT-HA relative to the experimental structures. The key components of our protocol are the use of restraints during MD simulations, the selection of trajectories based on a 70 quality assessment score, and the generation of refined structures following structure subset selection and averaging. We compared the results of using strong restraints on selected residues vs. weak restraints on all Cαs, and concluded that using strong restraints on selected Cαs leads to improved RMSD values, while weak restraints can improve GDT-HA measures better. Another question in MD based refinement is the time scale of the simulation, and in this study we compared the sampling in multiple short MD simulations vs. one single long simulation, and we observed that multiple short MD simulations may have a higher sampling efficiency. Although our protocol outperformed other refinement methods, overall, the improvements in RMSD and GDT-HA measures in refining protein structures are still relatively minor and it is clear that further progress is needed. One possibility is to take advantage of the consistent and reliable refinement obtained here and extrapolate along the initial direction. Another direction is the further improvement of structure selection methods since for many targets significantly more refined structures were generated than what we submitted as predictions. 71 Chapter 4 Protein Structure Refinement on CASP11 Targets 72 4.1 Introduction Previously, we developed a robust protocol for MD-based protein structure refinement.[37] As our protocol was tested on CASP10 targets, 23 out of 27 cases we had improvements in ∆GDTHA by taking advantage of subset selection algorithm and structure averaging.[38] Our work has remained state of the art solution to protein structure refinement to date. However, the improvements were still minor, reaching only up to 3% in ∆GDT-HA on average, and maximum improvement per target was limited to 5.5%. In order to make MD-based structure refinement a practical approach for structure determination procedure, improvements in ∆GDT-HA of about 20% is required. Therefore, we seek the limitations and bottlenecks in our approach. Computational methods for structure refinement of proteins rely on performance of two aspects: conformational sampling and structure selection.[37] Therefore, improvements in both categories are necessary. Our analysis showed that restraints impose a strong limit on conformational sampling. We realized that using weaker restraints on all Cα atoms for targets in CASP10 gave better or nearly the same performance in refinement compared to strong restraint on selected Cα atoms.[38] Therefore, we modified our protocol and used weaker restraints in MD simulations. Applying weak restraints will greatly enhance the sampling efficiency, since the protein backbone has more freedom to respond to its environment. Yet, we also investigated the structure selection category, by comparing the performance of our protocol using different scoring functions. Many scoring functions were considered, such as ITScore[129], RW+[48], DFIRE[46], GOAP[130], OPUS-PSP[131], DOPE[132], and Seder[47]. The final protocol was refined and optimized based on its performance on CASP8, CASP9, and CASP10 targets. 4.2 Methods 73 The refinement category of CASP11 provided 37 protein targets, ranging from 62 to 288 residues. Table 4.1 shows detailed information of targets, such as number of residues, initial GDT-HA, and suggestions from CASP on which regions need further refinement. Table 4-1 List of refinement targets in CASP11 Target Number of residues PDB Initial GDT-HA TR217 TR228 TR274 TR280 TR283 TR759 TR760 TR762 TR765 TR768 TR769 TR772 TR774 TR776 TR780 TR782 TR783 TR786 TR792 TR795 TR803 TR810 TR811 TR816 TR817 TR821 TR822 TR823 TR827 TR828 TR829 TR833 TR837 TR848 TR854 TR856 TR857 224 84 194 96 168 62 201 257 76 143 97 198 155 219 95 110 243 217 80 136 134 243 251 68 265 255 117 288 193 84 67 108 121 138 70 159 96 4WED 62.8 4QB7 4QDY 4CVH 4Q28 4PQX 4Q5T 4PWU 4OJU 2MQ8 4QHZ 4QB7 4Q9A 4QDY 4QRL 4CVH 4QVU 29.0 59.9 41.2 44.3 57.3 70.6 57.9 64.0 56.2 52.4 37.8 62.8 54.0 64.8 57.5 47.9 CASP Suggestions on regions to refine Residues 137-149 4WED 4R7S 65.8 48.3 N-terminal residues 2-9 4R03 61.3 4R4G 58.0 2MQC 32.8 74 As table 4-1 shows, the experimental structures for 22 targets are released on Protein Data Bank (PDB). The majority of targets did not have any suggestions from CASP on which regions need to be refined further. In the following sections, we describe our protocol applied to these protein models. 4.2.1 Conformational sampling Two rounds of MD sampling were performed for each target. In all the simulations, CHARMM36[9] force field was used, along with TIP3[21] water model. In the first round, the initial models underwent 5,000 energy minimization steps, and heated to temperatures of 10, 50, 100, 200, and 298 K within 42 ps MD simulations. Then, multiple replica MD simulations in explicit solvent starting from the given initial structures were run for 30 ns per target per replica. Having 40 replicas per target, total simulation time in the first round summed up to 1.2 µs for each target. The restraint force constant was chosen to be 0.05 kcal/mol/Å2 on all Cα atoms, unless suggestions from CASP exist for that protein, in which case, the restraint on suggested regions were relaxed (no restraint) and the force constant for the remaining regions was 0.1 kcal/mol/ Å2. An ensemble containing 30,000 structures was generated from all 40 replicas. Then, according to our optimized/tuned subset selection algorithm and structure averaging described in the next section, a final refined structure is obtained. This model constitutes the first submitted model to CASP, and the starting structure for the second round of refinement. In the second round of refinement, the refined and initial structures are geometrically aligned, and regions that have moved more than a certain threshold are identified. For this purpose, a moving window of size 3 residues is applied and the RMSD between the two structures subject to that window is computed. Consecutive regions that have RMSD more than threshold of 3 Å are identified. The selected regions are considered as potential regional targets for improvement 75 in the second round. Table 4-2 shows the number of regions, and the range of selected regions for each target. A set of 3 independent MD simulations of 20 ns were run for each region, while the residues outside the considered region were restrained with force constant of 1 kcal/mol/Å. For each region an ensemble of 1500 conformations was generated. The same subset selection algorithm, which was used in the first round, is applied to each ensemble, and an average structure for each region is obtained. 4.2.2 Subset selection and average structure The model selection algorithm is based on our previous work, in which a subset of structures was selected based on two scoring functions. Here, we modified the criterion for subset selection as follows. We used iRMSD (RMSD from the initial model) and RW+ as two scoring functions. These scores were standardized by subtracting their mean and dividing by their standard deviation. Then, the subset of points, which satisfy the following condition in polar coordinates, are selected S = {pi | ri > 1 & 200o < θ i < 270o } where ri and θi are the polar coordinates of point pi in the space formed by iRMSD and RW+ scores. The selected structures are geometrically superimposed to the initial model, and an average structure is computed. The average structure is further refined to remove clashes and unphysical bonds and angles, by a MD simulation of length 200 ps, and restraints applied to all Cα atoms with force constant 10 kcal/mol/Å2. 76 Table 4-2 Selected regions for the second round of sampling for each target Target Number of regions TR217 10 TR228 4 TR274 14 TR280 6 TR283 11 TR759 5 TR760 12 TR762 12 TR765 TR768 TR769 5 10 4 TR772 14 TR774 12 TR776 13 TR780 TR782 7 6 TR783 18 TR786 15 TR792 TR795 TR803 4 9 9 TR810 13 TR811 13 TR816 TR817 TR821 TR822 3 10 10 9 TR823 14 TR827 13 TR828 TR829 TR833 TR837 TR848 TR854 TR856 TR857 6 5 4 7 4 3 8 0 Regions 277-287; 303-313; 316-326; 326-334; 347-357; 367-376; 404-414; 435-445; 447-455; 481488; 236-245; 245-255; 269-277; 284-290; 187-197; 198-206; 206-212; 213-223; 243-253; 253-259; 264-267; 267-277; 277-287; 293303; 309-312; 314-323; 337-347; 357-367; 135-145; 157-163; 176-185; 190-196; 201-206; 214-217; 247-254; 254-262; 273-283; 286-296; 298-308; 317-327; 331-341; 349-357; 359-363; 369377; 396-404; 46-51; 52-62; 63-73; 73-79; 89-99; 41-50; 50-58; 65-74; 81-91; 94-101; 109-118; 130-140; 149-157; 165-174; 187-196; 205208; 217-224; 24-29; 38-46; 49-54; 60-68; 117-127; 162-169; 183-187; 209-218; 224-229; 239-247; 254259; 266-274; 37-47; 54-64; 70-74; 76-82; 95-101; 24-29; 36-44; 46-56; 63-73; 82-90; 99-104; 104-111; 118-128; 137-147; 155-160; 6-16; 32-39; 41-48; 81-90; 69-77; 78-88; 99-106; 106-110; 121-131; 137-147; 153-161; 161-165; 166-174; 174-180; 193-203; 220-230; 232-240; 243-252; 31-34; 35-45; 49-56; 60-69; 69-75; 75-85; 86-95; 95-105; 121-131; 136-144; 144-153; 156166; 38-43; 60-66; 66-73; 73-77; 85-90; 107-117; 128-134; 135-144; 160-164; 175-180; 180-190; 228-235; 236-246; 40-48; 50-57; 57-67; 67-75; 81-91; 91-101; 109-119; 38-41; 43-48; 51-54; 60-68; 85-95; 100-107; 1-4; 13-23; 26-32; 42-48; 52-58; 63-73; 76-85; 91-101; 125-135; 142-145; 150-158; 158168; 182-189; 189-198; 199-209; 212-221; 224-232; 232-237; 37-44; 45-55; 67-77; 91-100; 106-114; 114-122; 125-133; 133-142; 144-150; 155-165; 170177; 177-185; 189-199; 223-233; 242-247; 6-14; 21-30; 46-56; 57-66; 19-28; 36-46; 61-66; 68-77; 87-92; 95-98; 102-107; 113-122; 128-133; 1-5; 30-38; 44-50; 50-60; 61-69; 70-80; 87-96; 101-111; 114-124; 137-140; 140-149; 152-159; 224-229; 233-241; 266-273; 277-285; 285-293; 293-299; 305310; 320-326; 348-353; 364-373; 10-18; 21-31; 53-60; 60-63; 69-79; 83-88; 130-140; 155-163; 170-180; 196-205; 210-218; 230-237; 244-249; 25-35; 47-50; 56-62; 44-54; 54-62; 67-75; 122-131; 134-144; 148-158; 186-196; 200-210; 245-255; 280-288; 35-44; 45-55; 69-76; 82-90; 115-123; 128-136; 137-147; 156-166; 180-189; 217-225; 2-7; 8-17; 21-26; 31-41; 59-68; 68-73; 74-84; 99-106; 107-112; 5-8; 17-26; 30-35; 45-52; 73-82; 113-123; 129-136; 138-148; 148-155; 159-169; 190-200; 201-207; 243-251; 278-282; 31-40; 46-54; 54-64; 67-76; 84-92; 94-101; 113-118; 126-134; 137-142; 162-172; 176-181; 184-190; 191-201; 137-145; 145-153; 161-169; 177-185; 185-194; 205-211; 2-6; 23-32; 32-39; 42-52; 52-61; 46-53; 64-68; 92-100; 114-124; 8-17; 17-25; 37-44; 61-71; 75-83; 89-99; 110-115; 48-53; 96-106; 139-148; 155-164; 37-47; 64-69; 77-87; 1-8; 15-24; 43-53; 61-71; 83-92; 92-100; 118-126; 138-145; 77 4.2.3 Model submission in CASP11 For all the targets, we used the final refined model obtained after structure averaging from the first round of simulations as model 1. The second model is taken from the consensus structure obtained via the average of both average structures from the first and second rounds. The last 3 models are selected from the average structures of individual regions in the second round of sampling that have the highest RMSD from the initial model. 4.3 Results and Discussion The experimental structures for 22 targets are released in the PDB data bank. We have assessed the quality of our submitted models by calculating the change GDT-HA of submitted models from those of the initial models. Larger ∆GDT-HA means higher improvement in structure quality compared to the experimental structure is achieved. Table 4-3 shows ∆GDT-HA for all 5 models for each target. The average improvement in GDT-HA for the first model is 4.31% and for the best of five models is 4.71%. Comparing the average improvement of the first model to that of CASP10 suggests that our refinement protocol has improved from what we used in CASP10 with only 2.8% improvement. On the other hand, the best of 5 models does not show much difference to the first model. This indicates the shortcoming of our second round of refinement, in which the targets had restraint with force constant 1 kcal/mol/Å2 on selected regions. Significant improvement in 3 targets, i.e. TR759, TR765, and TR821, is observed with more than 10% change in GDT-HA. By carefully examining these targets, we can see that the improvement is made in the all parts of the protein including loops and secondary structure elements. Therefore, applying strong restraints on secondary structure elements would have limited refinement as was the case in our previous protocol. The initial, refined and native models for 78 TR759 are shown in Fig. 4-1. This level of refinement has not been observed in the past. While our results show significant improvement, it is still not clear how our results for targets TR759, TR765, and TR821 are compared against other groups. It could be that other groups were also able to achieve such high improvements. Therefore, a through comparison of the results in CASP11 would give insights on the effectiveness of refinement methods. 79 Table 4-3 GDT-HA results and MolProbity and ClashScore measures for quality assessment of submitted models Target Model 1 TR217 TR228 TR274 TR280 TR283 TR759 TR760 TR762 TR765 TR768 TR769 TR772 TR774 TR776 TR780 TR782 TR783 TR786 TR792 TR795 TR803 TR810 TR811 TR816 TR817 TR821 TR822 TR823 TR827 TR828 TR829 TR833 TR837 TR848 TR854 TR856 TR857 Avg. Model 2 ∆GDT-HA Model 3 Model 4 Model 5 Best of 5 0.34 0.00 0.46 -0.58 0.11 0.46 -0.50 5.70 0.30 12.30 0.40 -4.00 19.70 6.10 1.80 1.40 3.00 6.50 4.20 8.60 5.00 4.60 -2.70 4.90 0.30 14.30 -1.00 -5.60 20.70 6.50 0.80 0.90 2.20 5.50 2.90 7.50 5.50 3.90 -1.50 3.60 0.00 13.10 0.60 -6.20 20.10 4.70 1.00 0.50 1.80 3.60 1.10 8.40 3.20 4.30 -0.50 4.20 0.50 15.20 0.40 -5.00 18.80 4.50 1.30 0.80 3.70 4.90 2.90 9.30 2.80 4.80 -2.50 4.90 -0.20 12.70 -0.10 -5.10 18.10 4.40 1.00 -0.60 2.70 4.40 1.80 8.60 3.90 3.90 -0.50 5.70 0.50 15.20 0.60 -4.00 20.70 6.50 1.80 1.40 3.70 6.50 4.20 9.30 5.50 4.80 -0.94 12.16 -0.94 12.75 -1.03 12.26 -1.41 11.96 -1.32 12.26 -0.94 12.75 2.30 0.90 2.80 3.00 -2.50 3.00 1.99 0.91 1.45 1.09 -1.63 1.99 3.90 4.31 3.60 -- 2.10 -- 4.40 -- 3.10 -- 4.40 4.71 80 Figure 4-1 Comparing the initial (green) and refined (cyan) models of TR759 to its experimentally observed crystal structure 4.4 Conclusion and Future Work In CASP11 we modified our protocol by extending the amount of sampling by molecular dynamics simulation, using weaker restraints, and tuned the subset selection algorithm. As a result, we observed greater progress in structure refinement. The average improvement impro in GDTHA for the first model was significantly higher than our previous result in CASP10. Yet, in three targets we have achieved improvements of more than 10% in GDT GDT-HA. HA. We believe that such significant improvements would be impossible using strong restraints. On the other hand, the second round of refinement with strong refinement on selected regions which led to models 2-5 2 did not show much difference to the first model. 81 Chapter 5 Density Biased Sampling: A Robust Computational Method for Studying Pore Formation in Membranes Vahid Mirjalili, Michael Feig Submitted to Journal of Chemical Theory and Computation 82 5.1 Abstract A new reaction coordinate to bias molecular dynamics simulation is described which allows enhanced sampling of density-driven processes, such as mixing and de-mixing two different molecular species. The methodology is validated by comparing the theoretical entropy of demixing two ideal gas species and then applied to induce deformation and pore formation in phospholipid membranes within an umbrella sampling framework. Comparison with previous biased simulations of membrane pore formation suggest overall quantitative agreement but the density-based biasing potential results in a different, more realistic transition pathway than in previous studies. 5.2 Introduction Advanced computational methods have long attracted the attention of biophysicists to shed light on the behavior of biological systems. The computer simulation of proteins, membranes, and nucleic acids are a powerful technique for understanding the physical characteristics of these complex systems.[8] Despite advances in computer power, the time scales required for studying many physical phenomena are still beyond the possibilities for the majority of the scientific community. However, the use of enhanced sampling methods[133-137] can overcome such limitations. One example where enhanced sampling is needed is the pore formation and deformation of lipid membranes.[138-147] Pore formation is involved in a variety of biological processes, such as signal transduction and small molecule transports,[138-140, 147] but it is also highly-relevant in the context of toxins and antimicrobial peptides that induce membrane pores to cause cell leakage and ultimately kill cells.[39, 40, 148] 83 A common strategy for overcoming kinetic barriers is the use of umbrella sampling techniques[32], where a main challenge is the choice of a suitable reaction coordinate. Geometric properties such as distances, angles, or dihedrals between groups of atoms have been widely used, but some physical processes are not described well by such simple reaction coordinates. As a result, enhanced sampling simulations using such coordinates may be less effective for these systems. For example, density-driven processes may not be described well by traditional reaction coordinates. Membrane pore formation is one such process where the application of enhanced sampling methods has been challenging.[149] In one previous study, the pore radius was incorporated as a reaction coordinate in a molecular dynamics framework,[149, 150] and the free energy cost of pore formation was measured using the potential of mean constraint field (PMCF) approach[151]. Furthermore, Bennett et al.[152] investigated the mechanism of pore formation initially by long equilibrium MD simulations followed by umbrella sampling where a single phosphorous atom in one of the lipids was pulled to the center. However, both choices of the reaction coordinate could be problematic as they make assumptions about how the membrane structure deforms upon pore formation. A natural reaction coordinate for studying membrane pore formation is the density of water molecules within the membrane in the area where pore formation takes place. Using the water density instead of a structural property of the membrane avoids biasing membrane structure unnecessarily but still provides enhanced sampling across the key kinetic barrier, i.e. water penetration into the membrane. Here, we are describing the development of a density-based reaction coordinate and its application in umbrella sampling simulations of membrane pore formation. The method introduced here biases the density of a group of atoms in a volume of interest, such as a cylinder. Therefore, our density biasing potential function can be used not just 84 for studying membrane pores but it is also applicable more generally for reaching a target density for a given molecular species relative to another species in any context. This methodology was implemented in the CHARMM biomolecular software package[86]. In the remainder of this paper, we will provide a detailed description of the density biasing potential, followed by validation of our method by comparing entropic components of de-mixing free energy of two ideal gases with theoretical estimates. Then, this method is applied to a pure DPPC membrane bilayer system to demonstrate its potential for estimating free energies of membrane pore formation. 5.3 Methods 5.3.1 Density Biasing Potential In this section, we provide the mathematical basis of the density biasing potential function. r Given the coordinates qi for atom i , the total number of atoms in any arbitrary sub-volume of r interest V can be calculated by integrating the product of a volume function f (r ) with the Dirac r r r delta function: δ ( r − q i ) f ( r ) for all atoms: ∞ ∞ ∞ N r r r 3 N r ΓV = ∫ ∫ ∫ ∑δ (r − qi ) f (r ) dr = ∑ f (qi ) −∞−∞−∞ i =1 (1) i =1 r where f (r ) returns one inside V while it switches smoothly to zero on the boundaries, and stays zero for all the points outside the volume (see below). In general, any differentiable volume r function can be used to define f (r ) , however, simpler functions are preferred since they are easier to implement in a molecular dynamics framework. 85 The volume of interest in our study is a cylinder with radius Rcyl and height Z cyl with its axis aligned to the bilayer normal (Fig. 1A). Therefore, we use cylindrical coordinates and decompose the volume function into radial and axial components so that: N ΓV = ∑ f radial ( ri ) . f axial ( zi ) (2) i =1 Figure 5-1 A: Schematic representation of the biasing cylinder aligned to the bilayer normal. The center of the switching region is indicated with dashed lines; B: Volume function used in axial and radial directions. Choosing the switching function as a third degree polynomial used in CHARMM PBEQ[3] PBEQ and GBSW[31, 153] modules results in the following differentiable volume function:  1  3 1 1 f radial ( r) =  − (r − Rcyl ) + 3 ( r − Rcyl ) 3 4w  2 4w  0  86 r ≤ Rcyl − w r − Rcyl < w r > Rcyl + w (3) 1  1 3 1 ( z − Z up ) + 3 ( z − Z up ) 3  − 4h f axial ( z ) =  2 4h 1 3 1  + ( z − Z low ) − 3 ( z − Z low ) 3 4h  2 4h 0  Z low + h ≤ z ≤ Z up − h z − Z up < h (4) z − Z low < h z > Z up + h or z < Z low − h where w and h are the switching distances for the radial and axial terms, respectively. Figure 1B shows the shape of radial component of volume function; the axial component has a similar shape. The number density ρ V is calculated by normalizing ΓV by the cylinder volume. The potential energy is then calculated for a given value of target density ρ t with the force constant k U= k (ρV − ρ t )2 2 (5) The corresponding force components can be obtained from the gradient of the potential term r r r k Fi = ∇ iU = ( ρV − ρ t ) ∇ i ΓV V (6) r  ∂f (r ) ∂f (r ) ∂f ( z )  ∇ i ΓV =  f axial ( zi ) radial i , f axial ( zi ) radial i , f radial ( ri ) axial i  ∂x i ∂y i ∂zi   (7) The details of the derivative components are provided in the Appendix. 5.3.2 Simulation Details 5.3.2.1 Method Validation For validation of our computational method, we investigated the mixing entropy of two noble gas species. 200 helium atoms were placed in a box, 40 of which were tagged to make two distinguishable species with identical parameters. The box dimensions were 200×200×50 Å3. A density biasing cylinder with a radius of 50 Å was placed in the center of the box with the 87 cylinder axis aligned with the z axis. The switching distance in the radial direction was set to 1 Å. The cylinder height was considered to be infinite; therefore the biasing potential did not vary along the z-axis. The number densities were normalized by the equilibrium number of particles in the cylinder volume. In order to fully separate the two molecular species, the reaction coordinate in the density biasing potential was constructed as the difference between the densities of the tagged and untagged species in the cylinder. In this case, an increase in the value of the reaction coordinate can be due to either increasing the number of tagged species or decreasing the number of untagged ones assuming that the total number of particles in the cylinder is constant on average over time. For this system, the equilibrium value of the reaction coordinate is -6×10-5 Å-3 for the fully mixed state and 1×10-4 Å-3 for the fully separated state. Therefore, using umbrella sampling, the reaction coordinate was varied from -5.1×10-5 to 7.1×10-5 Å-3 in increments of 2.54×10-7 Å-3. Each umbrella window was simulated for 20 ns with a force constant of 107 kcal/mol/Å6 and a time step of 2 fs. The last 16 ns from each window were used to construct the PMF as a function of the reaction coordinate using WHAM analysis. A theoretical estimate of the mixing entropy for two molecular gas species A and B is given by ∆S = nR ( x A log( x A ) + x B log( x B )) (8) where x is the mole fraction of each species, n is the total number of moles, and R is the universal gas constant. The total change in entropy is given by ∆Stot = ntot, V1 ∆S ( x A, V1 , xB, V1 ) + ntot, V2 ∆S ( x A, V2 , xB, V2 ) (9) where ntot, V1 and ntot, V 2 are the average total number of atoms in volumes V1 and V2 at equilibrium, respectively. In order to compare the theoretical mixing entropy with our 88 computational approach, we evaluated the theoretical estimate as a function of the mole fraction of species A in volume V1 in the process of going from a fully separated state (i) ( to a partially mixed state (ii)) as shown in Fig. 2. The mole fraction is then converted to the reaction coordinate (ξ)) used in the umbrella sampling simulations according to: ξ= n tot ,V1 ( x A,V1 − x B ,V1 ) (10) V1 two-component component noble Figure 5-2 Schematic representation of the mixing process for a simple two gas mixture that is fully demixed (A) and partially mixed (B). 5.3.2.2 Membrane Simulations A pure membrane bilayer was constructed by web web-based CHARMM--GUI membrane builder[154],, containing 288 dipalmitoyl phosphatidylcholine (DPPC) and 8376 water molecules placed in a periodic box of size 95.2×95.2×66.6 Å3. The x-y y dimensions were adjusted to match 89 the experimental value of 63 Å2 for the area per lipid of DPPC in the fluid phase.[155, 156] The z dimension was chosen large enough to avoid boundary artifacts. The CHARMM36 force field[157] was used along with the TIP3 water model[21]. Lennard-Jones interactions were cut off at 9 Å (with a switching function beginning at 8 Å). Particle-Mesh Ewald summation[158] was used for long-range electrostatic interactions with a 9 Å cutoff for the direct sum. A time step of 2 fs was used in combination with the SHAKE algorithm.[159] The initially flat bilayer was heated in steps at 50K, 100K, 200K, 250K, and 323K, each for 100 ps with a Nosé-Hoover thermostat and barostat (target pressure of 1 bar) to maintain an NPT ensemble. The center of mass of the bilayer was restrained to the plane at z=0 with a force constant of 100 kcal/mol/Å2. The final equilibrated system was then used to study membrane deformation and pore formation with our density biased sampling method. 5.3.2.3 One-sided deformation of a membrane bilayer The density biasing approach was applied to the DPPC membrane bilayer system. A cylinder with a radius of 8 Å was aligned to the bilayer normal (z) axis. The cylinder spanned from z=-2.5 Å to z=+15 Å, and the radial and axial switching distances were set to 1 Å and 5 Å, respectively. Umbrella sampling simulations were performed with 10 windows, increasing the number density of water molecules per unit area in the cylinder from 1.1×10-3 to 2.17×10-2 Å-3. A force constant of 9.2×105 kcal/mol/Å6 was used. To prevent deformation in the lower leaflet, a plane potential with a force constant of 100 kcal/mol/Å2 was applied to the phosphates of the lower leaflet if their distance to bilayer center was less than 8 Å. Each umbrella was simulated for 50 ns. 5.3.2.4 Pore formation in a membrane bilayer In order to create a pore in a membrane bilayer, we expanded the cylinder from the previous case to cover both leaflets, i.e. from z=-18 Å to z=+18 Å. The radius of the cylinder was chosen as 90 r=6 Å, and the radial and axial switching distances were set to 2 Å and 8 Å, respectively. The parameters were adjusted based on initial trial simulations in order to achieve double-sided pore formation. 20 umbrella windows were used to vary the number density of water molecules in the cylinder from 6.7×10-3 Å-3 to 2.25×10-2 Å-3, using a force constant of 5.18×106 kcal/mol/Å6. Each umbrella was simulated for 50 ns. The total simulation time for pore formation was 1 µs. 5.3.2.5 Parameter Selection While our method can be used for a diverse set of applications, the biasing potential parameters would have to be adjusted accordingly. We will provide guidance here how to choose the two key parameters, height and radius, for the case of a cylindrical biasing volume. Generally, the cylinder height should encompass and extend beyond the region where the density is meant to be changed. For membrane simulations, a short cylinder height would be appropriate to induce one-sided deformation while longer cylinders are necessary to induce transmembrane pores. Furthermore, for one-sided deformations, the lower bound of the cylinder was fixed at z=2.5 Å to let water molecules reach the bilayer center without forming complete pores. In the helium gas de-mixing simulations, the cylinder height was chosen bigger than the box size to avoid gradients along the z axis. The cylinder radius should be chosen large enough so that the cylinder extends beyond the pore or deformation that is meant to be formed. Otherwise, the biasing potential may affect the shape of the deformation. On the other hand, a cylinder radius that is too large may not be effective in inducing pore formation because large membrane deformation could also satisfy a bias towards increased water densities within the cylinder. Because it was not entirely clear a priori which radius and cylinder height would be optimal, we conducted a series of test simulations with varying radii and cylinder heights until pore formation was accomplished successfully. 91 Finally, the force constants and window spacing were optimized by trial error. We found that the final values were similar as those predicted by the criterion given by Park and Im[97]. 5.3.2.6 Implementation The density biasing method using a cylinder-based volume function was implemented in the CHARMM biomolecular software package[86], version c40a1. Although not implemented so far, it would be easy to extend the method to other geometries such as a rectangular box with switching regions on each edge or a spherical geometry. 5.4 Results and Discussion 5.4.1 Mixing entropy of two-component gas The free energy cost of separating two noble gas species was calculated using theoretical and computational methods. Since the two species have identical properties, there is no change in the mean of internal energy of the system upon separating the two species. Figure 3 compares theoretical estimates of − T∆ S according to Eqs. 9 and 10 with the change in free energy computed using the density biased sampling method. The reference point in this figure is the fully mixed state which has the highest entropy. This state corresponds to a mole fraction of xA=0.22. If the theoretical estimate assumes a perfectly uniform particle distribution to obtain the number of particles in the cylinder (Fig 3 – theory A), the ∆G from the simulation underestimates the theory significantly. The agreement improves when the actual average number of particles in the umbrella windows that corresponds to the de-mixed states is used in the theoretical estimate (Fig 3 – theory B). The remaining small discrepancy is due to a non- 92 negligible virial term that results from a pressure differe difference nce inside and outside the cylinder during the umbrella simulateons in response to the application of the biasing potential. A correction by adding -∆(PV),, calculated from the average external pressures from simulations of the fully mixed and fully de-mixe mixed d states as reported by CHARMM, brings the theoretical and simulation estimates in near-perfect perfect agreement. We note, that the simulated system is not an ideal gas because of weak attractive interactions and volume exclusion effects as a result of the Lennard-Jones Jones interaction potential. This would lead to a small correction of the theoretical estimate that is expected to be smaller or on the same order as the uncertainties in the free energies obtained from the simulations. Therefore, the simple test case va validates lidates the density biasing potential introduced here. Figure 5-3 Free energy cost of mixing two noble gas species as a function of the biasing bi reaction coordinate ξ based on theory (mixing entropy) and simulation (free energy calculated from umbrella sampling simulations). Theory A is using uniform density to estimate total number of 93 particle in cylinder, whereas theory B uses the empirical average number of particles observed during the simulations. 5.4.2 Membrane Simulations We will now demonstrate the application of the density biasing approach to simulations of membrane bilayers. As described in detail in the Methods section, the density biasing potential was applied to water molecules within a cylinder encompassing a section of a phospholipid bilayer. Figure 4 demonstrates how local membrane thickness, calculated as the average z coordinate of the phosphorous atoms in a cylinder of radius 8 Å, responds to the water density in the cylinder when varied in umbrella sampling simulations. The strong correlation reaffirms that water density within the bilayer is a suitable reaction coordinate for inducing membrane deformations. Figure 5 shows snapshots of the membrane bilayer after 50 ns molecular dynamics simulation with the density biasing potential set to increasing target values. The increasing degree of membrane deformation is readily apparent and we note that the deformation appears to proceed with a slight bending on both leaflets (Fig. 5C), presumably because this lowers the overall free energy for these intermediate states. However, a further increase in the water density results in a pronounced one-sided deformation with little apparent perturbation on the opposing leaflet. This is shown in Fig. 5F. Another feature of the deformation process is that it progresses from an initially wide and shallow deformation to a narrow and deep deformation, presumably due to a balance between the elastic properties of the membrane bilayer and the free energy costs of forming water defects within the membrane. The deformation of the bilayer is further quantified in Figure 6A, where the bilayer thickness at the deformation location is shown for each umbrella window. The first umbrella is simulated with equilibrium flat bilayer conditions, therefore, no deformation is observed. However, other umbrellas increase the density of water molecules, which induces a deformation in bilayer. From the umbrella sampling, a free energy 94 profile was obtained ined by weighted histogram analysis method (WHAM) (WHAM)[33].. Figure 6B shows the resulting potential of mean force (PMF) as a function ooff the water density in the cylinder. As would be expected, increasing the number of water molecules, and thereby deforming the bilayer, is highly unfavorable in terms of free energy with a cost exceeding 40 kcal/mol for a one-sided sided water defect that extend extendss to the center of the membrane. As shown below, the cost of forming a pore is about half so that without any restraints on the lower leaflet (see methods section) the bilayer would not be expected to stably maintain a one one-side side deformation. Figure 5-4 Local membrane bilayer thickness of the upper leaflet vs. water density per unit volume from biased sampling of one one-sided sided membrane deformations. 95 Figure 5-5 Snapshots illustrating the one one-sided sided deformation process from a flat bilayer state to a fully deformed state at water densities of 0.0016 (A), 0.0073 (B), 0.0111 (C), 0.0143 (D), 0.0167 (E), and 0.0170 Å-3 (F); Red spheres represent water molecules, brown spheres represent phosphorous atoms of the lipids, and lipid tails are shown in green. 96 Figure 5-6 Average bilayer thickness in radial slabs for each umbrella window as a function of radial distance from the pore center. B: Free energy profile for one sided bilayer deformation as a function of water density in the cylinder. Standard error values obtained d by calculating the PMF profiles over 10 2-ns ns subsets from the umbrella sampling simulation are shown as light blue shades. Finally, we applied the density biasing method across the entire DPPC bilayer in order to induce pore formation. Snapshots of the bbilayer ilayer after 50 ns molecular dynamics simulations with increasing water density biases are shown in Fig. 7. Similar to what has been described previously[152],, pore formation starts by bending both leaflets inward. A water wire forms initially (Fig. 7D). The lipid head groups then rearrange and form the familiar hourglass shape of a stable pore once a critical pore radius is passed (Fig. 7E). A transition involving an initial water wire is consistent with results from the equilibrium simulations by B Bennett ennett et al.[152] al. The 97 average number density profiles of water molecules across the bilayer normal for a flat bilayer and a bilayer with a stable pore (with average water density of 0.0216 Å-3) are compared in Fig. 8. By integrating over the diffe difference rence between the two curves, it is found that 148 water molecules exist in the pore. This result is comparable with the 124 water molecules obtained by Leontiadou et al.[140],, in which they applied mechanical stress (surface tension) to form a pore p in a DPPC bilayer. Figure 5-7 Snapshots illustrating the pore formation process from a flat bilayer state to a stable pore at water densities of 0.0067 (A), 0.0144 (B), 0.0159 (C), 0.0168 (D), 0.0196 (E), and 0.0222 Å-33 (F) with coloring as in Figure 55-5. We computed the pore size by assuming perfect cylindrical shape between z= z=-8 8 to z=8 Å, and a uniform water density in that region. The average number of water molecules in the region was found to be 117.7 in the last umbrella. The resulting pore radius is found to be 8.8 Å. Similar analyses assuming perfect cylinder for water wire result in pore radius of 4.2 Å. Figure 9 shows the PMF of pore formation as a function of water density in the aforementioned aforementi cylinder. Again, pore formation is energetically unfavorable as expected. A plateau free energy of 22.2 (+/- 0.4) kcal/mol is reached at a critical water density of 0.018 Å-3 once a stable pore is 98 formed. This result is close to the value of 19.02 kca kcal/mol l/mol reported by Bennett et al. for DPPC.[152] The agreement is excellent, especially when considering differences in force fields. We further decomposed the free energy into enthalpic and entropic contributions. The change in enthalpy is estimated by computing the average potential energy of the syst system em and we found that pore formation is enthalpically favorable by 46±1 kcal/mol. The simple de de--mixing test case above suggests that there may be an additional ∆PV term but for a partially de-mixed mixed system the contribution is estimated to be less than 1 kcal kcal/mol /mol and it is therefore neglected here. This implies an entropic cost (-T∆S) S) of pore formation of about 68 kcal/mol. Figure 5-8 Number density of water molecules across bilayer normal compared between a flat fl bilayer and a bilayer with a stable pore (A), and their differences (B). 99 Figure 5-9 Free energy of pore formation as a function of water density in the cylinder from density-biased biased sampling with errors iindicated as in Figure 5-6B. 6B. A previous result from Bennett et al. is shown for comparison. As mentioned above, one motivation for inducing membrane pores via water density biasing rather than biasing the membrane structure directly was to avoid artifacts tthat hat could affect the pore formation pathway and thereby the energy profiles obtained from umbrella sampling. Figure 10 compares the water density ((ξ), ), our reaction coordinate that imposes minimal bias on the membrane structure, with the average distance of the two closest phosphates from the bilayer center (λ). ). The latter relates to previous biased simulation studies where the distance of a single phosphate group from the bilayer center was used. used.[152] Poor correlation between the two reaction coordinates suggests that there could be mechanistic differences when either of the two reaction coordinates is used to induce pore formation. With the density biasing term, a typical transition path (indicated in red in Fig. 10) would delay a transition of phosphates to the bilayer center until a critical water density is reached at which point there is a sharp, cooperative cooper transition that leads to formation of a full pore. On the other hand, we speculate that forming the pore by pulling down a phosphate group would follow a path indicated in green in Fig. 10 where 100 phosphates approach the center of the bilayer early and a sharp, cooperative transition could be absent. Figure 11 shows two intermediate conformations with extreme low λ values that may be intermediates on such a transition path. In these conformations, the membrane exhibits large deformations on one leaflet, and the water molecules are dragged into the center along with the lipid headgroups, as shown in Fig. 11. Since free energies are state functions, overall energies of pore formation are of course independent of the path taken. However, the free energy profile pro along the transition path and any mechanistic insight obtained from such simulations does depend on the path taken as a result of the biasing potential. Figure 5-10 Average z coordinate of the two clo closest sest lipid phosphates from the bilayer center vs. water density within pore cylinder illustrating different mechanisms between density-driven density and phosphate-driven driven pore formation bias. Sampling from each umbrella is shown in different colors. 101 Figure 5-11 Intermediate bilayer states with low average distance of phosphates to the bilayer center. The proposed method in this work applies a minimal bias to induce a pore in membrane. There is no assumption madee about the shape of the pore or the density distribution inside the cylinder. However, the performance of this method is sensitive to the choice of cylinder parameters as described above. Therefore, we believe that this method is more universally applicable applicab to membrane pore formation and deformations in response to interactions with other molecules, especially in cases where it is not clear a priori how exactly the membrane responds to such molecules. The variation of the water density in our method is rem reminiscent iniscent of grand canonical ensemble methods[41, 132] that have been widely used to simulate the mixing process of model fluids[34, fluids 160]. However, because de-mixing mixing and bilayer pore formation processes maybe either thermodynamically unfavorable or kinetically hindered, enhanced sampling ttechniques echniques such as umbrella sampling would still be required. Furthermore, a global variation of the chemical potential for water in a membrane membrane-bilayer bilayer system may not necessarily lead to pore formation since water molecules could be added in the bulk region while a targeted change of a local chemical potential would eventually result in a method similar to ours but with the additional complications of the grand-canonical canonical machinery. 102 Finally, while the method presented here focuses on overcoming the kinetic barriers in creating membrane deformations and pores, it may not fully address overcoming the slow relaxation times of lipid motions. Therefore, mechanistic studies of membrane pore formation would likely require longer simulations and/or a combination with other enhanced sampling techniques such as replica exchange sampling that can accelerate lipid motions to guarantee full convergence of deformed bilayer systems. 5.5 Conclusions We have developed a new computational technique to bias the density of a group of molecular species, or the difference in densities of two molecular groups. The method was validated for the case of de-mixing two ideal gas species. Furthermore, we applied the new biasing term in the context of membrane pore formation. We believe that biasing the water density rather than structural properties of the membrane is less likely to introduce artifacts. Furthermore, the density biasing approach allows the study of one-sided deformations which has not been described with umbrella sampling techniques previously. The density biasing function is also more broadly applicable to any system involving the mixing or de-mixing of molecular species with respect to each other. Possible applications include lipid raft formation, co-solvent effects, and studies of concentration gradients in complex systems. 103 Chapter 6 Interactions of Amino Acid Analogs within Membrane Environments Vahid Mirjalili, Michael Feig Submitted to Journal of Physical Chemistry B 104 6.1 Abstract The interactions among four amino acid analog pairs (Asn, Ser, Phe, and Val) in the membrane environment are investigated using umbrella sampling molecular dynamics simulations. The physical characteristics of interactions among the amino acid pairs at the bound states and transition states were analyzed, and useful insights are gained by observing the differences in the relative population of the bound state conformations at different distances from the bilayer interface. It is shown that the distance from the bilayer interface dictates the interactions between the polar pairs and their conformations. Furthermore, the binding free energy obtained from all atom explicit simulations of each pair with respect to the bilayer normal distance is calculated. The results of this study can potentially be used for parameterization of other membrane models, as we have compared the results with three implicit membrane models. 6.2 Introduction Membrane proteins are involved in a variety of cellular processes such as molecular transport and signaling pathways, and they are the target of many pharmaceutical studies. Membrane proteins are embedded in lipid bilayers that support and modulate their activity.[161-163] As with soluble proteins, the interactions among the amino acids and the environment are the primary determinants of membrane protein structure and function.[163] Yet, knowledge gaps remain about the fundamental nature of amino acid interactions within the membrane environment. Experimentally, such interactions are notoriously difficult to probe while computational studies have been hindered by the long time scales and complexity of bilayer systems. Therefore, many computational studies of 105 amino acids in bilayer environments rely on simplifications such as coarse-graining[164, 165] or implicit membrane[134, 166-168] models to facilitate the conformational sampling of membrane protein systems[169-172] at the expense of neglecting subtle details of amino acid lipid interactions. The association and insertion of small peptides in aqueous and lipid environments has been the subject of several studies.[2, 170-185] In particular, amino acid insertion into membrane bilayers is fairly well understood. MacCallum et al. investigated the insertion of 17 amino acid side chains from the bulk water into membrane region and calculated the distribution of amino acid analogs with respect to the distance from the bilayer center.[2] The membrane insertion free energy profiles for each amino acid group (aliphatic, aromatic, and polar side chains) were compared, and it was found that the energetic minimum of aliphatic side chains is at the center of the bilayer, while the free energy minimum for aromatic side chains (Trp, Tyr, Phe) is located near the lipid carbonyl group. Polar residues (Asn, Gln, Ser, and Thr) have large positive free energies at the bilayer center that can be resolved in part by allowing water penetration into the lipid bilayer.[2] Membrane deformations are especially dramatic when charged amino acids are inserted as described most clearly for the case of arginine insertion.[2] The energetics of amino acid interactions within the membrane is less well understood. Kim and Im[174] studied the interactions of transmembrane (TM) helices with lipid bilayers. They decomposed the PMF of helix tilt angles into entropic and helix-lipid interactions, and concluded that helix-lipid interactions provide a driving force for helix orientation under positive hydrophobic mismatch conditions.[174] In a recent study, Castillo et al.[177] studied the association of two WALP23 peptides in three lipid 106 membrane systems using the MARTINI coarse grained model[164]. In that study, the peptide-peptide, peptide-lipid, and lipid-lipid interactions upon peptide binding were analyzed and characterized in terms of their thermodynamic behavior. They reported that association of WALP23 peptides is favored by more than 20 kJ/mol, without any free energy barrier separating associated and dissociated states.[177] In a more systematic study by de Jong et al.[1], the dimerization of amino acid side-chain pairs was simulated using different force fields in water, n-octanol, and decane as mimics of lipid membranes. The general features of favoring association of polar compounds and disfavoring association of hydrophobic compounds in decane and octanol were reproduced but it remains unclear how their results transfer to actual lipid bilayer environments. In order to better understand amino acid interactions in lipid bilayers, this work describes the association of pairs of four amino acid analogs, acetamide (Asn), methanol (Ser), toluene (Phe), and propane (Val) in DPPC bilayers from extensive biased-sampling allatom computer simulations. The results provide association free energy profiles and detailed insight into the coupling between inserted amino acid pairs and membrane deformations. Furthermore, the energy profiles were compared with common implicit solvent models[24, 186, 187] to assess their ability to reproduce amino acid interactions within the membrane. 107 6.3 Materials and Methods Pairs of four amino acid analogs were considered in this study: toluene-toluene, acetamide-acetamide, methanol-methanol, and propane-propane (Fig. 6-10 supplementary materials). Molecular dynamics umbrella sampling simulations were used to study the interactions among each pair at different positions along the bilayer normal: z=0, 4, 8, 12, 16, 20, 24 Å. The center of mass was restrained to the respective z values using a harmonic potential function with a force constant of 50 kcal/mol/Å2. The center of mass distance between the analogs was then varied from 3 Å to 15 Å with increments of 0.5 Å. At each distance, umbrella sampling was carried out[32] using a force constant of 5 kcal/mol/Å2 to maintain the respective distances. Initial systems were set up by placing the pair of molecules inside two spheres that were created inside the membrane at different values of z. Two sets of umbrella sampling simulations were carried out. In one set (called forward sampling), the amino acid analogs were initially equilibrated at a distance of 5 Å, and then after 2 ns simulation, the pair distance was decreased to 4.5, 4, 3.5 and 3 Å as well as increased to 5.5, 6, 6.5 … 15 Å in subsequent simulations. In the second set (called backward sampling), the pair was initially equilibrated for 2 ns at a distance of 15 Å and then pulled to increasingly shorter distances up to 3 Å. 6.3.1 Explicit Solvent Simulations A membrane bilayer consisting of 288 DPPC molecules was constructed and enclosed in a periodic box with a fixed lateral size of 95.24 Å × 95.24 Å. The non-bonded interactions were calculated within a cutoff distance of 10 Å (switched to zero between 8.5 to 10 Å), and for long range electrostatic interactions particle-mesh Ewald (PME) with a grid spacing of 1 Å was used. The simulations were performed using the NAMD 108 molecular dynamics package[188], under NPAT conditions using Langevin dynamics with a temperature of 323 K, and a constant normal pressure of 1 bar. A time step of 2 fs was used in conjunction with SHAKE. The CHARMM36 force field[9] was used to model the lipids, the CHARMM General force field (CGenFF)[11] for the amino acid analogs, and the TIP3 water model[189] was used. Initial configurations were minimized for 500 steps and then heated and equilibrated to temperatures of 20 K, 100 K, 250 K and 323 K for 2 ps, 2 ps, 2 ps, and 10 ps, respectively under the restraining potentials with respect to the pair distance and the z position of the pair. The overall center of mass of the lipids was also restrained to zero using a force constant of 100 kcal/mol/Å2. Subsequent umbrella runs were started from the previous 2 ns production run, and equilibrated and heated to 100 K, 250 K, and 323 K with their corresponding umbrella potential. The first 2 ns of each simulation was discarded, and the rest of the data was used for calculating PMFs. To assess convergence, the root mean squared deviations (RMSD) between the potentials of mean force (PMF) at a given distance were compared between the forward and backward sets. Simulations were initially carried out for 6 ns per umbrella and continued in both sets until an RMSD value of less than 0.2 kcal/mol was achieved (see Figure 6-11). For some umbrellas this required as much as 200 ns with explicit solvent and lipids (see Table 6-2). Generally, polar compounds required more sampling because of coupling with membrane deformations as discussed below. The total simulation time for acetamide and methanol pairs were 10.4 µs and 8.0 µs, respectively, whereas for toluene and propane the aggregate simulation times were 6.8 µs and 2.1 µs. Finally, weighted histogram analysis method (WHAM)[33] was used to generate a 109 composite unbiased PMF from the individual umbrellas along the entire range of pair distances. 6.3.2 Implicit Solvent Simulations Three implicit solvent models were considered in this study, HDGB, GBSW, and IMM1, the implicit membrane extension of EEF1. The implicit solvent simulations were run using CHARMM[86] following the same umbrella sampling protocol as with the explicit lipids and solvent but with a shorter time of 1.5 ns per umbrella that was sufficient to satisfy the convergence criterion. All the initial systems underwent 50 steepest descent energy minimization steps followed by 500 adopted basis Newton Raphson method. Then the systems were heated to 100 K, 200 K, and 323 K for 500 MD steps. The production runs were performed for 1.5 ns in each direction. For HDGB simulations, the dielectric and non-polar profiles along the Z axis were adopted from Sayadi et al.[169] (also shown in table 6-3 supplementary materials). A scaling factor of 0.015 kcal/mol/Å2 was used to obtain non-polar solvation free energies proportional to the solvent-accessible surface area (SASA).[26] For GBSW simulations, the implicit membrane thickness was set to 28 Å, and a switching length of 0.3 Å was used. In the case of IMM1 model, a membrane thickness of 28 Å was used. For IMM1, the amino acid analog parameters were directly adopted from their corresponding amino acids in the EEF1 model without further modifications. With the given parameters, all three implicit solvent models are meant to approximate the energetics of a DPPC bilayer. 6.3.3 Bilayer Deformation Simulation For certain separation distances and certain values of z, membrane bilayer deformations were observed with acetamide and methanol pairs (see results section). In most cases, 110 forward and backward umbrellas exhibited the same behavior (deformed or undeformed membrane), but in a few cases bistable behavior was observed where forward and backward sampling did not converge to the same state and where the membrane was deformed in one case but not the other. In order to be able to generate a complete free energy profile we carried out additional umbrella biasing simulations at a fixed distance and z value but varying the degree of bilayer deformation. To connect states with different degrees of membrane deformation we employed a recently introduced density-biasing approach[190]. In this method, an imaginary cylinder is placed along the bilayer normal axis. A volume function V is defined with two independent radial and axial components with a value of 1 inside the cylinder that is smoothly switched to zero to points outside the cylinder. The integral of the volume function over all water molecules gives the number of water molecules within the volume, which once normalized by the cylinder volume, is used as the reaction coordinate where low water density corresponds to an undeformed bilayer and high water density indicates deformation. In this case, a cylinder with radius 8 Å was used, spanning from z=-2.5 to z=15 Å, with the switching region set to 1 and 5 Å in radial and axial directions, respectively. Umbrella sampling was then used to vary the water density in the cylinder from 1.1e-3 Å-3 to 17.1e-3 Å-3 over eight umbrella windows with a force constant of 1.225e6 kcal/mol/Å-6. Due to convergence issues, we increased the number of umbrellas to 16 for methanol at z=4 Å. An additional restraint was applied to the phosphates of the lower leaflet if their distance to bilayer center was less than 8 Å, which prevents deformation in the lower leaflet. Density biased molecular dynamics simulations were carried out for 48 ns for each umbrella. The water-density biasing simulations were 111 combined with the distance-based umbrella simulations to generate 2D PMFs as a function of the pair distance (ξ) and water density (ρ) using WHAM[33]. Final 1D PMF profiles as a function of the pair distance (ξ) were obtained by Boltzmann averaging according to Eq. 1  ∆G (ξ , ρ )   ∆G ′(ξ ) = − k B T log exp − k B T   (1) 6.4 Results and Discussion Results from extensive biased molecular dynamics simulations are presented that describe the pairwise interactions between acetamide, methanol, toluene, and propane pairs at different distances from the center of a lipid bilayer. Although the main focus of this study is on the amino acid interactions within lipid bilayers, we observed significant coupling with the lipid bilayer structure, which will be described first before continuing to amino acid association energetics and structural details. 6.4.1 Membrane Deformations Because none of the compounds are charged, we initially assumed that membrane deformations would be modest and limited to cases where the analogs are near the membrane surface. However, we found significant membrane deformations even for deeply inserted acetamide and methanol pairs as shown in Figure 1. In the case of acetamide, deep deformations of the bilayer are observed consistently at z=4 Å and z=8 Å. When the acetamide pair is at the center (z=0), deformations are observed in some of the umbrellas and only at some pair distances suggesting a bi-stable scenario where 112 deformed and undeformed membrane states are similarly favorable but separated by a significant kinetic barrier. Methanol pairs also result in membrane deformations at z=4 Å and z=8 Å but not at z=0. At z=4 Å, the sampling is again bi-stable with all of the backward sampling umbrellas showing a deformed membrane while the membrane is deformed only at three pair distances in the forward sampling umbrellas. The non-polar compounds toluene and propane do not lead to water insertion when inserted deeply but when fixed at z=16 Å and z=20 Å the bilayer expands to accommodate the hydrophobic pairs. The membrane deformation largely disappears when the pairs are placed even further away from the center at z=24 Å. In order to further understand the bi-stable membrane deformation states for acetamide and methanol, we carried out additional density-biasing umbrella sampling simulations along the deformation reaction coordinate for acetamide and methanol pairs at z=0 and z=4 Å and at short pairwise distances where the bi-stable behavior was observed. The results are shown in Figure 2. In both cases, two states are found, separated by a kinetic barrier. In the case of acetamide, deformed and undeformed membranes are similarly favorable; for methanol the deformed membrane appears to be slightly more favorable when the pair is placed at z=4 Å. Water density biasing simulations were also carried out for additional pair distances of 5 and 6 Å for acetamide in order to be able to connect the forward and backward umbrella sampling sets (see below). 113 Figure 6-1 Local membrane thickness of the upper leaflet as a function of the radial distance from the center of the amino acid analog pairs at different distances from the center of the membrane. A: sampling in for forward ward direction; B: sampling in backward direction. Local thickness is calculated as average z of phosphorous atoms in the upper leaflet which fall into radial slabs of width 4 Å. 114 Figure 6-2 Potentials of mean force as a function of water density to reflect membrane deformation. A: acetamide pair at z=0 and d=5.5 Å; B: methanol pair at z=0 and d=4.5 Å. 6.4.2 Association Free Energy Profiles A main goal of this study is to obtain free energy profiles for amino acid side chain analog association within lipid bilayer environments. Umbrella sampling along the pair 115 distance reaction coordinate was carried out at different membrane insertion depths and a comparison between forward and backward sampling umbrella runs was used to assess satisfactory convergence. As shown in Figure 6-11, convergence, defined as an RMSD of less than 0.2 kcal/mol between forward and backward runs, was achieved for almost all windows except for acetamide and methanol at certain short distances and deep membrane insertions. These cases correspond to the bi-stable membrane deformation scenario described above where both deformed and undeformed membranes are similarly favorable but transitions between the two states are not sampled in the pair distance umbrella simulations. The additional water-density biasing simulations described above provide access to that transition and a combination of the pair distance umbrella runs with the water-density umbrella runs was necessary to obtain a complete energetic picture. In order to do so, two-dimensional PMFs as a function of pairwise distance and water density were constructed from the combined sampling (see Fig. 6-12 and 6-13) and then integrated using Boltzmann averaging along the density reaction coordinate to obtain correct one-dimensional PMFs as a function of the pair distance. When compared to the naïve case where the pair distance umbrella runs are simply combined without considering that in fact disconnected states are sampled, the corrected PMFs differ by 0.25-0.5 kcal/mol (see Fig. 6-12 and 6-13). For other pairs, distances, and membrane insertions, such a correction was not necessary because forward and backward sampling umbrella appear to have reached convergence. The complete association energy profiles as a function of pair distance and membrane insertion are presented in Figures 6-3A, 4A, 5A, and 6A. These profiles include the corrected PMF profiles for acetamide at z=0 Å and for methanol at z=4 Å. We note that 116 because the pairs were fixed at certain insertion depths the present simulations do not provide information about the relative free energies along the z direction. Instead, the PMFs are combined so that the contact pair has the same free energy at all values of z. Information about membrane insertion free energies is available from previous studies while adequate sampling of membrane insertion along with separation within the membrane would have greatly increased the need for additional sampling beyond what we can accomplish with the resources available to us. Overall, the free energy analysis confirms what would be expected qualitatively: both acetamide and methanol have a deep minimum when forming a contact pair inside the membrane but separating the pair becomes increasingly favorable towards the edge of the bilayer where the polar molecules can interact with water rather than with each other. At z=0 the acetamide pair is stabilized by as much as 2.5 kcal/mol while the methanol pair is stabilized by about 1.5 kcal/mol. Toluene and propane pairs on the other hand are slightly more favorable when separated in the membrane by about 0.25 kcal/mol while favoring weak association at the edge of the bilayer as would be expected for hydrophobic compounds. For all compounds there is a ‘desolvation’ peak immediately after separating the contact pair with an energetic penalty of 0.5 to 1 kcal/mol. 117 Figure 6-3 Potential of mean force for acetamide as a function of pair distance at different insertion depth into the lipid bilayer from simulations with A) explicit solvent and lipids B) HDGB implicit membrane C) GBSW implicit membrane and D) EEF1 implicit membrane models; For each insertion depth, the bound state was used as the reference with an energy of zero. 118 Figure 6-4 PMF of methanol as in Fig. 6-3. 119 Figure 6-5 PMF of toluene as in Fig. 6-3. 120 Figure 6-6 PMF of propane as in Fig. 6-3. The pair binding free energies obtained from the PMF profiles as the difference between the free energy at the contact pair and at the greatest pair distance considered here can be compared to previous results for pair formation in different solvents by de Jong et al[1] with GROMOS[13] and OPLS[14] force fields. More specifically, we compare our results at z=0, 12, and 24 Å insertion depths to the values obtained in decane, octanol and water, respectively. Overall, the agreement is good especially if one considers differences in force fields, the oversimplification of using decane and octanol as mimics of lipid bilayer environments, and the missing contribution due to membrane deformations with the simple hydrophobic solvents. However, taking the data at face value, it appears that 121 the agreement with the OPLS results is better while GROMOS may be overestimating contact pair formation in decane except for propane. Table 6-1 Binding free energies in kcal/mol obtained from explicit simulations (CHARMM) at different Z distances as the difference between the free energy for the contact pair and the average energy for distances greater than 10 Å. Standard errors are given in parentheses. Values obtained at z=0, 12 Å, and 24 Å are compared with values obtained previously in decane, octanol, and water by de Jong et al.[1]. CHARMM Acetamide GROMOS[1] OPLS[1] CHARMM Methanol GROMOS OPLS CHARMM Toluene GROMOS OPLS CHARMM Propane 6.4.3 GROMOS OPLS 0 4 8 12 16 20 24 -2.54 (0.02) -4.21 NA -1.47 (0.03) -2.72 -1.37 0.07 (0.01) -1.09 -0.31 -0.24 (0.02) -0.04 -0.06 -1.78 (0.04) -0.65 (0.05) -0.15 (0.06) 0.03 (0.03) -1.74 (0.06) -0.46 (0.11) 0.13 (0.05) 0.22 (0.02) 0.19 (0.02) -0.15 (0.03) 0.26 (0.02) -0.68 (0.03) -0.33 (0.03) -0.53 (0.04) -0.17 (0.03) 0.06 -0.31 0.18 (0.03) -0.10 -0.19 0.33 (0.05) 0.21 -0.03 -0.92 (0.09) -0.38 -0.11 -0.89 (0.09) -0.86 (0.05) 0.18 (0.01) 0.13 0.04 0.24 (0.01) 0.34 0.32 -0.86 (0.08) -0.29 -0.47 -0.32 (0.02) -0.06 -0.06 Contact Pair Formation of Polar Compounds Closer inspection of the conformations of the polar side-chain pairs (acetamide and methanol) indicate a conformational bias at the bound state as a function of the presence or absence of water molecules around the pair, while conformational analysis of the hydrophobic compounds, on the other hand, did not reveal any noticeable difference along the bilayer normal. We refer to the bound state as the closest pair distance where the association profile is still favorable, while the longest pair distance is referred to as the free state. We observed that relative population of different conformations of acetamide and methanol pairs at the bound state are directly related to the number of 122 hydrogen bonds they form with water molecules. By clustering acetamide pair conformations at the local minimum of the free energy profiles (d=4 Å), we distinguished three different conformations that could form 0, 1 and 2 hydrogen bonds within the pair. Figure 7 shows the relative population of conformations that form two or one hydrogen bond, as a function of the distance from the bilayer center. At z=0, no hydrogen bond is formed with water molecules because the membrane is not deformed, and as a result the percentage of conformations forming two hydrogen bonds within the pair is 25%. This value decreases as the pair moves to z=4 Å, due to membrane deformation that allow the formation of hydrogen bonds with water. Figure 6-7 Conformational analysis of acetamide pair at the bound state; A) average hydrogen bonds formed between acetamide pair and water molecules as a function of bilayer normal distance, B) fraction of conformations that form one intra-pair hydrogen bond, C) fraction of conformations forming two intra-pair hydrogen bonds Methanol shows a shift in the contact pair distance from 3.5 Å for z values below 10 Å to a distance of 4.5 Å for z values above 12 Å (see Fig. 4A). The corresponding conformations are shown in Figure 8. At deeper insertion depths the methyl groups are 123 exposed to the hydrophobic environment while self-interactions between the two hydroxyl groups are maximized, leading to a shorter center of mass distance. On the other hand, at shallower insertion depths, the hydroxyl groups is exposed to the environment while the methyl groups interact with each other so that they are shielded from the more polar environment. Figure 6-8 Conformational analysis of methanol pair at two possible bound states; A) average hydrogen bonds formed with water, B) fraction of conformation 1 at bound distance 3.5 Å (blue) and at 4.5 Å (red). Results of explicit simulations are shown in solid lines, HDGB in dashed lines, GBSW in dotted line, and IMM1 in dash-dotted line 6.4.4 Comparison with Implicit Membrane Models The data presented here is especially useful for parameterizing simplified models of membrane environment. Implicit membrane models have been previously parameterized using amino acid side chain insertion free energies but so far little attention has been paid to how well implicit membrane models can capture interactions of solutes within the membrane. Figures 3-6 compare the association free energy PMFs for acetamide, methanol, toluene, and propane with HDGB, GBSW, and IMM1 to the explicit solvent results. Very qualitatively, the main trends are more or less reproduced, but, in detail, there are quite significant differences. For example, GBSW greatly overestimates the binding free energy of acetamide in the membrane while the acetamide contact pair is still more favorable than the separated pair at z=24 Å. In the case of methanol, both 124 HDGB and IMM1 do not find a significant favorable binding energy at z=0, only GBSW captures the explicit lipid trend correctly. HDGB and GBSW do capture the shift from favoring the hydroxyl-interacting close distance contact pairs at deep insertion to the methyl-interacting longer contact pair beyond 10-12 Å while IMM1 does not. For the non-polar compounds the differences are less dramatic but nevertheless significant when compared to the explicit lipid simulations. For example, GBSW shows little variation as a function of z while HDGB appears to overemphasize the attraction of hydrophobic pairs near the aqueous phase. 125 Figure 6-9 PMF profiles for acetamide, methanol, toluene, and propane at Z=0 and Z=12 as a function of the pair distance obtained from explicit, HDGB, GBSW, and EEF1 models Based on the new data from this study we attempted to improve the parameterization of the HDGB model that was previously developed in our group. Specifically, we adjusted 126 the dielectric profile as well as the overall scaling factor γ for the non-polar contribution to improve agreement with the pair distance free energies within the membrane while maintaining good agreement with membrane insertion free energies of single amino acid side chain analogs. The overall scaling factor was set to 0.02 kcal/mol/Å2. The optimized dielectric profile is given in Table 6-3 along with the (unmodified) non-polar profile. Figure 9 focuses on the distance profiles at z=0 and z=12 Å for the four analog pairs with the original and improved HDGB model. As can be seen, it is possible to significantly improve the agreement between the implicit membrane model and the explicit lipid results. At the same time, amino acid insertion profiles for 14 amino acid side-chain analogs are in similar agreement with results from explicit simulation[2] and experimental measurements[49] as for the previous HDGB parameterization (see Fig. S5). Nevertheless, with the modified parameters, the association free energy is now overestimated for acetamide at z=0 while dissociated toluene is still not favorable enough, especially for z=12 Å. The use of an implicit model that would allow membrane deformations such as the DHDGB model[134] that may improve the agreement with the explicit lipid results. Another possibility is the inclusion of implicit van der Waals interactions that are expected to become more important in the membrane environment as the role of electrostatics decreases due to the hydrophobic environment. 127 6.5 Conclusions In this study, we are presenting a detailed energetic and structural analysis of amino acid side chain analog interactions within lipid bilayer environments which has received little attention in previous studies. Qualitatively, we confirm expected trends of polar compounds associating strongly inside lipid bilayers compared to hydrophobic compounds. Furthermore, we present detailed quantitative data about the energetics of pair formation at different membrane insertion depths that required a careful analysis of the coupling between amino acid pair interactions and membrane deformations. The presented data is especially useful for the validation and parameterization of simplified membrane models. We show that established implicit membrane models have difficulties to reproduce the association energetics described here. However, it was possible to improve the HDGB model to better reproduce the new data from this study while maintaining good insertion free energy profiles. In future studies we will aim to further improve the implicit membrane model by considering membrane deformations and implicit van der Waals terms. 6.6 Acknowledgement Computational resources for this work were provided by XSEDE facilities (TGMCB090003) and High Performance Computing Center at Michigan State University (HPCC@MSU). 6.7 Supplementary Materials 128 Propane Toluene Methanol Acetamide Table 6-2 Simulation time in nanoseconds for explicit simulations of each amino-acid analog pair under different umbrella potential; the simulation time listed is used in forward and backward directions. 0 4 8 12 16 20 24 0 4 8 12 16 20 24 0 4 8 12 16 20 24 0 4 8 12 16 20 24 3 3.5 4 4.5 5 5.5 6 6.5 7 7.5 8 8.5 9 9.5 10 10.5 11 11.5 12 12.5 13 13.5 14 14.5 15 20 30 20 20 20 20 20 20 70 20 20 20 20 20 30 30 20 20 30 20 20 20 50 20 20 30 20 20 20 200 20 20 30 20 20 200 90 20 20 30 20 20 120 150 20 20 30 20 20 20 50 20 20 30 20 20 20 50 20 20 30 20 20 20 30 20 20 30 20 20 20 30 20 20 30 20 20 20 20 20 20 30 20 20 20 20 20 20 30 20 20 10 10 10 10 40 10 10 10 10 20 10 14 10 10 10 10 10 10 10 10 10 20 30 20 20 20 20 20 20 70 20 20 20 20 20 30 30 20 20 30 20 20 20 50 20 20 30 20 20 20 200 20 20 30 20 20 200 90 20 20 30 20 20 120 150 20 20 30 20 20 20 50 20 20 30 20 20 20 50 20 20 30 20 20 20 20 20 20 20 20 20 12 12 12 12 70 12 12 6 6 6 6 6 6 6 20 20 20 20 20 20 20 12 12 12 12 70 12 12 6 6 6 6 30 6 6 20 20 92 20 50 20 20 12 12 12 12 70 12 12 6 6 6 16 30 6 6 20 200 40 20 20 20 20 12 20 12 12 70 12 12 6 6 6 16 30 6 6 20 30 20 20 20 20 20 12 12 12 40 40 12 12 6 6 6 16 30 6 6 20 30 20 20 20 20 20 12 12 20 40 40 12 12 6 6 20 16 30 6 6 20 20 20 20 20 20 20 12 12 12 40 40 20 12 6 6 6 16 30 6 6 20 20 20 20 20 20 20 12 12 12 40 40 12 30 6 6 6 16 30 6 6 50 20 20 20 20 20 20 12 12 12 20 40 12 30 6 6 6 30 40 20 6 20 20 30 20 20 20 20 12 12 12 20 40 12 12 6 6 6 30 100 20 6 20 20 20 20 20 20 20 12 12 12 30 40 12 12 6 6 6 30 40 6 6 20 20 30 20 20 20 20 12 12 12 20 40 12 12 6 6 20 16 60 6 6 20 20 20 20 20 20 20 12 12 12 20 40 12 12 6 6 6 30 30 6 6 10 10 10 10 10 10 10 10 10 10 10 40 10 10 6 6 6 30 20 6 6 10 10 10 10 10 10 10 10 10 10 62 10 10 10 6 6 20 20 20 20 6 10 10 10 10 10 10 10 10 10 10 10 30 10 10 6 6 6 6 6 20 6 20 20 20 20 20 20 20 12 12 12 12 70 12 12 6 6 6 6 6 6 6 20 20 20 20 20 20 20 12 12 12 12 70 12 12 6 6 6 6 30 6 6 20 20 92 20 50 20 20 12 12 12 12 70 12 12 6 6 6 16 30 6 6 20 200 40 20 20 20 20 12 20 12 12 70 12 12 6 6 6 16 30 6 6 20 30 20 20 20 20 20 12 12 12 40 40 12 12 6 6 6 16 30 6 6 20 30 20 20 20 20 20 12 12 20 40 40 12 12 6 6 20 16 30 6 6 20 20 20 20 20 20 20 12 12 12 40 40 20 12 6 6 6 16 30 6 6 20 20 20 20 20 20 20 12 12 12 40 40 12 30 6 6 6 16 30 6 6 50 20 20 20 20 20 20 12 12 12 20 40 12 30 6 6 6 30 40 20 6 129 Table 6-3 Improved HDGB parameters, dielectric profile (ε) and non-polar profile (γ). Z 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 8.5 9.0 9.5 10.0 10.5 11.0 11.5 12.0 12.5 13.0 13.5 14.0 14.5 15.0 15.5 16.0 16.5 17.0 17.5 18.0 18.5 19.0 19.5 20.0 20.5 21.0 21.5 22.0 22.5 23.0 23.5 24.0 24.5 25.0 ε(z) - old 1.80 1.80 1.81 1.82 1.83 1.84 1.85 1.86 1.87 1.89 1.91 1.93 1.97 2.00 2.04 2.09 2.15 2.22 2.31 2.41 2.53 3.23 3.63 4.13 4.73 5.43 6.13 6.98 7.84 8.80 10.96 14.05 19.04 25.85 35.38 45.88 54.11 60.79 65.52 69.42 72.31 74.07 75.53 76.63 77.14 77.83 78.22 78.92 79.35 79.66 80.00 ε(z) - new 1.15 1.15 1.16 1.17 1.18 1.19 1.20 1.21 1.22 1.24 1.26 1.28 1.32 1.35 1.49 1.64 1.80 1.87 1.96 2.06 2.18 2.28 2.38 2.58 2.89 3.42 4.00 5.08 6.04 7.50 10.06 13.75 19.04 25.85 35.38 45.88 54.11 60.79 65.52 69.42 72.31 74.07 75.53 76.63 77.14 77.83 78.22 78.92 79.35 79.66 80.00 z 0.0 0.6 1.2 1.8 2.4 3.0 3.6 4.2 4.8 5.4 6.0 6.6 7.2 7.8 8.4 9.0 9.6 10.2 10.8 11.4 12.0 12.6 13.2 13.8 14.4 15.0 15.6 16.2 16.8 17.4 18.0 18.6 19.2 19.8 20.4 21.0 21.6 22.2 22.8 23.4 24.0 24.6 25.2 25.8 26.4 27.0 27.6 28.2 28.8 29.4 30.0 130 γ(z) 0.0000 0.0001 0.0002 0.0010 0.0050 0.0075 0.0100 0.0150 0.0200 0.0250 0.0300 0.0350 0.0410 0.0470 0.0520 0.0610 0.0720 0.0850 0.1000 0.1200 0.1500 0.1900 0.2500 0.3200 0.4000 0.5000 0.6200 0.7500 0.8753 0.9500 1.0308 1.0680 1.0921 1.1000 1.1000 1.0927 1.0690 1.0468 1.0328 1.0197 1.0130 1.0052 1.0005 1 1 1 1 1 1 1 1 Figure 6-10 Amino acid analogs used in this study, acetamide (Asn), methanol (Ser), toluene (Phe) and propane (Val). 131 Figure 6-11 Root mean squared deviation between the PMF profiles of biased simulations in forward and backward pulling directions. 132 Figure 6-12 A: 2D PMF of acetamide pair association at z=0 and water density in biasing cylinder; B: Corrected 1D PMF as a function of pair distance after Boltzmann averaging along the water density reaction coordinate. 133 Figure 6-13 A: 2D PMF of methanol pair association at z=4 Å and water density in biasing cylinder; B: Corrected 1D PMF as a function of pair distance after Boltzmann averaging along the water density reaction coordinate. 134 side-chain chain analogs using HDGB simulations with old and improved Figure 6-14 Free energy profiles of insertion of single amino acid side parameters compared with results of explicit simulation and experimental measurements. 135 Chapter 7 Conclusion and Perspective 136 Molecular dynamics simulation is a powerful computational technique that gives useful insights to understand the physical characteristics and detailed dynamical information of biophysical systems, such as proteins, membranes, and nucleic acids. Molecular dynamics combined with enhanced sampling methods, such as umbrella sampling, can be used to estimate the free energy and other thermodynamic properties of such systems. In this dissertation, we used molecular dynamics to investigate two goals in biophysical systems. The first goal was to improve/refine protein structures starting from a homology model and develop a robust MD-based protocol for structure refinement, and the second goal was to understand and characterize amino-acid interactions within membrane environments. A robust protocol for structure refinement of proteins models was developed. This protocol was applied to CASP10 targets, and 23 out of 27 targets were successfully refined. To this date, our methodology has remained state of the art solution to protein structure refinement. The key winning factor in our method is optimal subset selection and structure averaging, which was introduced for the first time in protein structure refinement. With the aforementioned success of our method, we applied this protocol to CASP11 with minor modifications; Sampling by MD was extended to 1.2 µs, as well as changes in restraint and subset selection algorithm. Finally, the outcome of this method with such modification has resulted in further improvement. Indeed, breakthrough results are achieved, in which in 3 cases, we have obtained up to 20% improvements in GDT-HA. While the results of structure refinement in CASP11 will be addressed thoroughly in future, we need to understand the effect of our modifications to the protocol. 137 As the second aim of this dissertation, we characterized the association free energy of four amino acid side-chain analog pairs (acetamide (Asn), methanol (Ser), toluene (Phe) and propane (Val)) within membrane bilayer at different distances from bilayer center. Throughout this study, it was observed that acetamide and methanol can create two separate states. The bilayer could be flat or deformed with the same position of those analog pairs, while the relative free energy between flat and deformed state was unknown. Therefore, in order to measure the free energy difference between flat and bilayer states with polar compounds placed at specific bilayer normal, we developed a new computational tool to study free energy of bilayer deformation under umbrella sampling framework. This methodology uses density of water molecules in a cylinder as a reaction coordinate. With application of this methodology, the association free energy surface of acetamide and methanol were corrected through Boltzmann averaging of PMF profiles as functions of association pair distance and water density. Polar compounds in this study, i.e. acetamide and methanol, showed favorable binding free energy at bilayer center, while this effect diminishes as the pair is moved toward water region. On the other hand, non-polar compounds, toluene and propane showed the opposite behavior. This result provides a useful benchmark for understanding peptide-membrane interactions, as well as a valuable tool for comparison and parameterization of other membrane models. In order to improve the performance of HDGB model, we have also re-parameterized HDGB and modified the dielectric profile by comparing the association free energies of these analog pairs against the obtained PMF profiles in explicit simulations. The new HDGB dielectric profile is made available through this study. Prior to this study, the field of protein structure refinement had remained steady with very little progress. We stand as the pioneers of structural subset selection and structure averaging for 138 protein structure refinement. The computational approach developed for membrane deformation and pore formation provides a useful tool for study of membrane bilayer stability under different stress conditions. The protocols and tools developed in this dissertation are freely available to the greater scientific community. 139 REFERENCES 140 REFERENCES [1] D.H. de Jong, X. Periole, S.J. Marrink, Dimerization of Amino Acid Side Chains: Lessons from the Comparison of Different Force Fields, Publisher, City, 2012. [2] J.L. MacCallum, W.F.D. Bennett, D.P. Tieleman, Distribution of Amino Acids in a Lipid Bilayer from Computer Simulations, Publisher, City, 2008. [3] S.A. Adcock, J.A. McCammon, Molecular Dynamics:  Survey of Methods for Simulating the Activity of Proteins, Publisher, City, 2006. [4] L. Fang, H.J. Cho, C. Chan, M. Feig, Binding site multiplicity with fatty acid ligands: Implications for the regulation of PKR kinase autophosphorylation with palmitate: PalmitateKinase Binding Multiplicity, Publisher, City, 2014. [5] B. Wang, Alexander V. Predeus, Zachary F. Burton, M. Feig, Energetic and Structural Details of the Trigger-Loop Closing Transition in RNA Polymerase II, Publisher, City, 2013. [6] M. Feig, Z.F. Burton, RNA polymerase II with open and closed trigger loops: active site dynamics and nucleic acid translocation, Publisher, City, 2010. [7] E.I. Chudyk, M.A.L. Limb, C. Jones, J. Spencer, M.W. van der Kamp, A.J. Mulholland, QM/MM simulations as an assay for carbapenemase activity in class A β-lactamases, Publisher, City, 2014. [8] M. Karplus, J.A. McCammon, Molecular dynamics simulations of biomolecules, Publisher, City, 2002. [9] R.B. Best, X. Zhu, J. Shim, P.E.M. Lopes, J. Mittal, M. Feig, A.D. MacKerell, Optimization of the Additive CHARMM All-Atom Protein Force Field Targeting Improved Sampling of the Backbone  , ψ and Side-Chain χ1 and χ2 Dihedral Angles, Publisher, City, 2012. [10] R.W. Pastor, A.D. MacKerell, Development of the CHARMM Force Field for Lipids, Publisher, City, 2011. [11] K. Vanommeslaeghe, E. Hatcher, C. Acharya, S. Kundu, S. Zhong, J. Shim, E. Darian, O. Guvench, P. Lopes, I. Vorobyov, A.D. Mackerell, CHARMM general force field: A force field for drug-like molecules compatible with the CHARMM all-atom additive biological force fields, Publisher, City, 2010. [12] W.D. Cornell, P. Cieplak, C.I. Bayly, I.R. Gould, K.M. Merz, D.M. Ferguson, D.C. Spellmeyer, T. Fox, J.W. Caldwell, P.A. Kollman, A Second Generation Force Field for the Simulation of Proteins, Nucleic Acids, and Organic Molecules, Publisher, City, 1995. 141 [13] C. Oostenbrink, A. Villa, A.E. Mark, W.F. Van Gunsteren, A biomolecular force field based on the free enthalpy of hydration and solvation: The GROMOS force-field parameter sets 53A5 and 53A6, Publisher, City, 2004. [14] W.L. Jorgensen, D.S. Maxwell, J. Tirado-Rives, Development and Testing of the OPLS AllAtom Force Field on Conformational Energetics and Properties of Organic Liquids, Publisher, City, 1996. [15] W.L. Jorgensen, J. Tirado-Rives, The OPLS [optimized potentials for liquid simulations] potential functions for proteins, energy minimizations for crystals of cyclic peptides and crambin, Publisher, City, 1988. [16] S.J. Marrink, H.J. Risselada, S. Yefimov, D.P. Tieleman, A.H. de Vries, The MARTINI Force Field:  Coarse Grained Model for Biomolecular Simulations, Publisher, City, 2007. [17] P. Kar, S.M. Gopal, Y.-M. Cheng, A. Predeus, M. Feig, PRIMO: A Transferable CoarseGrained Force Field for Proteins, Publisher, City, 2013. [18] S.M. Gopal, S. Mukherjee, Y.-M. Cheng, M. Feig, PRIMO/PRIMONA: A coarse-grained model for proteins and nucleic acids that preserves near-atomistic accuracy, Publisher, City, 2010. [19] A.D. Mackerell, M. Feig, C.L. Brooks, Extending the treatment of backbone energetics in protein force fields: Limitations of gas-phase quantum mechanics in reproducing protein conformational distributions in molecular dynamics simulations, Publisher, City, 2004. [20] B. Guillot, A reappraisal of what we have learnt during three decades of computer simulations on water, Publisher, City, 2002. [21] P. Mark, L. Nilsson, Structure and Dynamics of the TIP3P, SPC, and SPC/E Water Models at 298 K, Publisher, City, 2001. [22] J. Chen, C.L. Brooks, J. Khandogin, Recent advances in implicit solvent-based methods for biomolecular simulations, Publisher, City, 2008. [23] W.C. Still, A. Tempczyk, R.C. Hawley, T. Hendrickson, Semianalytical treatment of solvation for molecular mechanics and dynamics, Publisher, City, 1990. [24] W. Im, M.S. Lee, C.L. Brooks, Generalized born model with a simple smoothing function, Publisher, City, 2003. [25] M.S. Lee, F.R. Salsbury, C.L. Brooks, Novel generalized Born methods, Publisher, City, 2002. [26] M.S. Lee, M. Feig, F.R. Salsbury, C.L. Brooks, New analytic approximation to the standard molecular volume definition and its application to generalized Born calculations, Publisher, City, 2003. 142 [27] A. Onufriev, D. Bashford, D.A. Case, Modification of the Generalized Born Model Suitable for Macromolecules, Publisher, City, 2000. [28] M. Feig, W. Im, C.L. Brooks, Implicit solvation based on generalized Born theory in different dielectric environments, Publisher, City, 2004. [29] S. Tanizaki, M. Feig, A generalized Born formalism for heterogeneous dielectric environments: Application to the implicit modeling of biological membranes, Publisher, City, 2005. [30] M. Sayadi, M. Feig, Role of conformational sampling of Ser16 and Thr17-phosphorylated phospholamban in interactions with SERCA, Publisher, City, 2013. [31] D.A. McQuarrie, Statistical mechanics, Harper & Row, New York, 1975. [32] G.M. Torrie, J.P. Valleau, Nonphysical sampling distributions in Monte Carlo free-energy estimation: Umbrella sampling, Publisher, City, 1977. [33] S. Kumar, J.M. Rosenberg, D. Bouzida, R.H. Swendsen, P.A. Kollman, The weighted histogram analysis method for free-energy calculations on biomolecules. I. The method, Publisher, City, 1992. [34] A. Raval, S. Piana, M.P. Eastwood, R.O. Dror, D.E. Shaw, Refinement of protein structure homology models via long, all-atom molecular dynamics simulations, Publisher, City, 2012. [35] J.L. MacCallum, A. Pérez, M.J. Schnieders, L. Hua, M.P. Jacobson, K.A. Dill, Assessment of protein structure refinement in CASP9, Publisher, City, 2011. [36] J.L. MacCallum, L. Hua, M.J. Schnieders, V.S. Pande, M.P. Jacobson, K.A. Dill, Assessment of the protein-structure refinement category in CASP8, Publisher, City, 2009. [37] V. Mirjalili, M. Feig, Protein Structure Refinement through Structure Selection and Averaging from Molecular Dynamics Ensembles, Publisher, City, 2012. [38] V. Mirjalili, K. Noyes, M. Feig, Physics-based protein structure refinement through multiple molecular dynamics trajectories and structure averaging, Publisher, City, 2014. [39] A. Jagielska, L. Wroblewska, J. Skolnick, Protein model refinement using an optimized physics-based all-atom force field, Publisher, City, 2008. [40] M.S. Lin, T. Head-Gordon, Reliable Protein Structure Refinement Using a Physical Energy Function, Publisher, City, 2011. [41] J. Zhu, H. Fan, X. Periole, B. Honig, A.E. Mark, Refining homology models by combining replica-exchange molecular dynamics and statistical potentials, Publisher, City, 2008. [42] G. Chopra, N. Kalisman, M. Levitt, Consistent refinement of submitted models at CASP using a knowledge-based potential, Publisher, City, 2010. 143 [43] J.P.G.L.M. Rodrigues, M. Levitt, G. Chopra, KoBaMIN: a knowledge-based minimization web server for protein structure refinement, Publisher, City, 2012. [44] Y. Yang, Y. Zhou, Ab initio folding of terminal segments with secondary structures reveals the fine difference between two closely related all-atom statistical energy functions, Publisher, City, 2008. [45] C. Zhang, S. Liu, H. Zhou, Y. Zhou, The Dependence of All-Atom Statistical Potentials on Structural Training Database, Publisher, City, 2004. [46] H. Zhou, Y. Zhou, Distance-scaled, finite ideal-gas reference state improves structurederived potentials of mean force for structure selection and stability prediction, Publisher, City, 2009. [47] E. Faraggi, A. Kloczkowski, A global machine learning based scoring function for protein structure prediction: Global Knowledge-Based Function for Protein Structure, Publisher, City, 2014. [48] J. Zhang, Y. Zhang, A Novel Side-Chain Orientation Dependent Potential Derived from Random-Walk Reference State for Protein Fold Selection and Structure Prediction, Publisher, City, 2010. [49] A. Radzicka, R. Wolfenden, Comparing the polarities of the amino acids: side-chain distribution coefficients between the vapor phase, cyclohexane, 1-octanol, and neutral aqueous solution, Publisher, City, 1988. [50] A. Roy, A. Kucukural, Y. Zhang, I-TASSER: a unified platform for automated protein structure and function prediction, Publisher, City, 2010. [51] P. Bradley, K.M.S. Misura, D. Baker, Toward High-Resolution de Novo Structure Prediction for Small Proteins, Publisher, City, 2005. [52] A. Šali, T.L. Blundell, Comparative Protein Modelling by Satisfaction of Spatial Restraints, Publisher, City, 1993. [53] B.K. Lance, C.M. Deane, G.R. Wood, Exploring the potential of template-based modelling, Publisher, City, 2010. [54] K. Joo, J. Lee, S. Lee, J.-H. Seo, S.J. Lee, J. Lee, High accuracy template based modeling by global optimization, Publisher, City, 2007. [55] J. Moult, A decade of CASP: progress, bottlenecks and prognosis in protein structure prediction, Publisher, City, 2005. [56] Y. Zhang, I-TASSER server for protein 3D structure prediction, Publisher, City, 2008. [57] Y. Zhang, Protein structure prediction: when is it useful?, Publisher, City, 2009. 144 [58] D. Fischer, 3D-SHOTGUN: A novel, cooperative, fold-recognition meta-predictor, Publisher, City, 2003. [59] K. Ginalski, A. Elofsson, D. Fischer, L. Rychlewski, 3D-Jury: a simple approach to improve protein structure predictions, Publisher, City, 2003. [60] K.M.S. Misura, D. Baker, Progress and challenges in high-resolution refinement of protein structure models, Publisher, City, 2005. [61] H. Berman, K. Henrick, H. Nakamura, J.L. Markley, The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data, Publisher, City, 2007. [62] T.Y. Liu, G.W. Tang, E. Capriotti, Comparative Modeling: The State of the Art and Protein Drug Target Structure Prediction, Publisher, City, 2011. [63] M. Takeda-Shitaka, D. Takaya, C. Chiba, H. Tanaka, H. Umeyama, Protein structure prediction in structure based drug design, Publisher, City, 2004. [64] A. Giorgetti, D. Raimondo, A.E. Miele, A. Tramontano, Evaluating the usefulness of protein structure models for molecular replacement, Publisher, City, 2005. [65] M.R. Lee, J. Tsai, D. Baker, P.A. Kollman, Molecular dynamics in the endgame of protein structure prediction, Publisher, City, 2001. [66] J. Chen, C.L. Brooks, Can molecular dynamics simulations provide high-resolution refinement of protein structure?, Publisher, City, 2007. [67] A. Jagielska, L. Wroblewska, J. Skolnick, Protein model refinement using an optimized physics-based all-atom force field, Publisher, City, 2008. [68] M.S. Lee, M.A. Olson, Assessment of detection and refinement strategies for de novo protein structures using force field and statistical potentials, Publisher, City, 2007. [69] H. Fan, A.E. Mark, Refinement of homology-based protein structures by molecular dynamics simulation techniques, Publisher, City, 2004. [70] A. Raval, S. Piana, M.P. Eastwood, R.O. Dror, D.E. Shaw, Refinement of protein structure homology models via long, all-atom molecular dynamics simulations, Publisher, City, 2012. [71] H. Fan, X. Periole, A.E. Mark, Mimicking the action of folding chaperones by Hamiltonian replica-exchange molecular dynamics simulations: Application in the refinement of de novo models, Publisher, City, 2012. [72] R. Ishitani, T. Terada, K. Shimizu, Refinement of comparative models of protein structure by using multicanonical molecular dynamics simulations, Publisher, City, 2008. 145 [73] A.W. Stumpff-Kane, K. Maksimiak, M.S. Lee, M. Feig, Sampling of near-native protein conformations during protein structure refinement using a coarse-grained model, normal modes, and molecular dynamics simulations, Publisher, City, 2008. [74] D.E. Kim, B. Blum, P. Bradley, D. Baker, Sampling Bottlenecks in De novo Protein Structure Prediction, Publisher, City, 2009. [75] C.M. Summa, M. Levitt, Near-native structure refinement using in vacuo energy minimization, Publisher, City, 2007. [76] D.W. Li, R. Bruschweiler, Dynamic and Thermodynamic Signatures of Native and NonNative Protein States with Application to the Improvement of Protein Structures, Publisher, City, 2012. [77] G. Chopra, C.M. Summa, M. Levitt, Solvent dramatically affects protein structure refinement, Publisher, City, 2008. [78] G. Chopra, N. Kalisman, M. Levitt, Consistent refinement of submitted models at CASP using a knowledge-based potential, Publisher, City, 2010. [79] H. Lu, J. Skolnick, Application of statistical potentials to protein structure refinement from low resolution Ab initio models, Publisher, City, 2003. [80] C. Zhang, S. Liu, Y.Q. Zhou, Accurate and efficient loop selections by the DFIRE-based all-atom statistical potential, Publisher, City, 2004. [81] J. Zhang, Y. Liang, Y. Zhang, Atomic-Level Protein Structure Refinement Using FragmentGuided Molecular Dynamics Conformation Sampling, Publisher, City, 2011. [82] J. Zhu, L. Xie, B. Honig, Structural refinement of protein segments containing secondary structure elements: Local sampling, knowledge-based potentials, and clustering, Publisher, City, 2006. [83] M.A. Olson, S. Chaudhury, M.S. Lee, Comparison Between Self-Guided Langevin Dynamics and Molecular Dynamics Simulations for Structure Refinement of Protein Loop Conformations, Publisher, City, 2011. [84] Y. Yang, Y. Zhou, Specific interactions for ab initio folding of protein terminal regions with secondary structures, Publisher, City, 2008. [85] A. Zemla, C. Venclovas, J. Moult, K. Fidelis, Processing and analysis of CASP3 protein structure predictions, Publisher, City, 1999. [86] B.R. Brooks, C.L. Brooks, A.D. Mackerell, L. Nilsson, R.J. Petrella, B. Roux, Y. Won, G. Archontis, C. Bartels, S. Boresch, A. Caflisch, L. Caves, Q. Cui, A.R. Dinner, M. Feig, S. Fischer, J. Gao, M. Hodoscek, W. Im, K. Kuczera, T. Lazaridis, J. Ma, V. Ovchinnikov, E. Paci, R.W. Pastor, C.B. Post, J.Z. Pu, M. Schaefer, B. Tidor, R.M. Venable, H.L. Woodcock, X. Wu, 146 W. Yang, D.M. York, M. Karplus, CHARMM: The biomolecular simulation program, Publisher, City, 2009. [87] B.R. Brooks, R.E. Bruccoleri, B.D. Olafson, D.J. States, S. Swaminathan, M. Karplus, CHARMM: A program for macromolecular energy, minimization, and dynamics calculations, Publisher, City, 1983. [88] W.L. Jorgensen, J. Chandrasekhar, J.D. Madura, R.W. Impey, M.L. Klein, COMPARISON OF SIMPLE POTENTIAL FUNCTIONS FOR SIMULATING LIQUID WATER, Publisher, City, 1983. [89] A.D. MacKerell, D. Bashford, M. Bellott, R.L. Dunbrack, J.D. Evanseck, M.J. Field, S. Fischer, J. Gao, H. Guo, S. Ha, D. Joseph-McCarthy, L. Kuchnir, K. Kuczera, F.T.K. Lau, C. Mattos, S. Michnick, T. Ngo, D.T. Nguyen, B. Prodhom, W.E. Reiher, B. Roux, M. Schlenkrich, J.C. Smith, R. Stote, J. Straub, M. Watanabe, J. Wiórkiewicz-Kuczera, D. Yin, M. Karplus, AllAtom Empirical Potential for Molecular Modeling and Dynamics Studies of Proteins, Publisher, City, 1998. [90] A.D. MacKerell, M. Feig, C.L. Brooks, Improved treatment of the protein backbone in empirical force fields, Publisher, City, 2004. [91] V.B. Chen, W.B. Arendall, III, J.J. Headd, D.A. Keedy, R.M. Immormino, G.J. Kapral, L.W. Murray, J.S. Richardson, D.C. Richardson, MolProbity: all-atom structure validation for macromolecular crystallography, Publisher, City, 2010. [92] M. Feig, J. Karanicolas, C.L. Brooks, MMTSB Tool Set: enhanced sampling and multiscale modeling methods for applications in structural biology, Publisher, City, 2004. [93] J.C. Phillips, R. Braun, W. Wang, J. Gumbart, E. Tajkhorshid, E. Villa, C. Chipot, R.D. Skeel, L. Kale, K. Schulten, Scalable molecular dynamics with NAMD, Publisher, City, 2005. [94] A. Zemla, LGA: a method for finding 3D similarities in protein structures, Publisher, City, 2003. [95] L.S.D. Caves, J.D. Evanseck, M. Karplus, Locally accessible conformations of proteins: Multiple molecular dynamics simulations of crambin, Publisher, City, 1998. [96] M.I. Zavodszky, A.W. Stumpff-Kane, D.J. Lee, M. Feig, Scoring confidence index: statistical evaluation of ligand binding mode predictions, Publisher, City, 2009. [97] A.W. Stumpff-Kane, M. Feig, A correlation-based method for the enhancement of scoring functions on funnel-shaped energy landscapes, Publisher, City, 2006. [98] V. Mariani, F. Kiefer, T. Schmidt, J. Haas, T. Schwede, Assessment of template based protein structure predictions in CASP9, Publisher, City, 2011. [99] A. Roy, A. Kucukural, Y. Zhang, I-TASSER: a unified platform for automated protein structure and function prediction, Publisher, City, 2010. 147 [100] A. Kryshtafovych, C. Venclovas, K. Fidelis, J. Moult, Progress over the first decade of CASP experiments, Publisher, City, 2005. [101] J. Moult, K. Fidelis, B. Rost, T. Hubbard, A. Tramontano, Critical assessment of methods of protein structure prediction (CASP) - Round 6, Publisher, City, 2005. [102] Y. Zhang, J. Skolnick, The protein structure prediction problem could be solved using the current PDB library, Publisher, City, 2005. [103] K. Henrick, Z.K. Feng, W.F. Bluhm, D. Dimitropoulos, J.F. Doreleijers, S. Dutta, J.L. Flippen-Anderson, J. Ionides, C. Kamada, E. Krissinel, C.L. Lawson, J.L. Markley, H. Nakamura, R. Newman, Y. Shimizu, J. Swaminathan, S. Velankar, J. Ory, E.L. Ulrich, W. Vranken, J. Westbrook, R. Yamashita, H. Yang, J. Young, M. Yousufuddin, H.M. Berman, Remediation of the protein data bank archive, Publisher, City, 2008. [104] A. Bazzoli, A.G.B. Tettamanzi, Y. Zhang, Computational Protein Design and Large-Scale Assessment by I-TASSER Structure Assembly Simulations, Publisher, City, 2011. [105] A. Fiser, A. Sali, MODELLER: Generation and refinement of homology-based protein structure models, in: C.W. Carter, R.M. Sweet (Eds.) Macromolecular Crystallography, Pt D, Elsevier Academic Press Inc, San Diego, 2003, pp. 461-+. [106] D. Xu, Y. Zhang, Ab initio protein structure assembly using continuous structure fragments and optimized knowledge-based force field, Publisher, City, 2012. [107] Y. Zhang, J. Skolnick, Tertiary structure predictions on a comprehensive benchmark of medium to large size proteins, Publisher, City, 2004. [108] J. Ko, H. Park, C. Seok, GalaxyTBM: template-based modeling by building a reliable core and refining unreliable local regions, Publisher, City, 2012. [109] X.T. Qu, R. Swanson, R. Day, J. Tsai, A Guide to Template Based Structure Prediction, Publisher, City, 2009. [110] J. Zhang, Y. Liang, Y. Zhang, Atomic-Level Protein Structure Refinement Using Fragment-Guided Molecular Dynamics Conformation Sampling, Publisher, City, 2011. [111] J.H. Chen, C.L. Brooks, Can molecular dynamics simulations provide high-resolution refinement of protein structure?, Publisher, City, 2007. [112] D. Bhattacharya, J.L. Cheng, 3Drefine: Consistent protein structure refinement by optimizing hydrogen bonding network and atomic-level energy minimization, Publisher, City, 2013. [113] J. Rodrigues, M. Levitt, G. Chopra, KoBaMIN: a knowledge-based minimization web server for protein structure refinement, Publisher, City, 2012. 148 [114] J. Zhu, H. Fan, X. Periole, B. Honig, A.E. Mark, Refining homology models by combining replica-exchange molecular dynamics and statistical potentials, Publisher, City, 2008. [115] M.A. Olson, M.S. Lee, Structure refinement of protein model decoys requires accurate side-chain placement, Publisher, City, 2013. [116] L. Wroblewska, A. Jagielska, J. Skolnick, Development of a physics-based force field for the scoring and refinement of protein models, Publisher, City, 2008. [117] J.W. Ponder, D.A. Case, Force fields for protein simulations, Publisher, City, 2003. [118] R.B. Best, X. Zhu, J. Shim, P.E.M. Lopes, J. Mittal, M. Feig, A.D. MacKerell, Optimization of the Additive CHARMM All-Atom Protein Force Field Targeting Improved Sampling of the Backbone phi, psi and Side-Chain chi(1) and chi(2) Dihedral Angles, Publisher, City, 2012. [119] Y.D. Yang, Y.Q. Zhou, Ab initio folding of terminal segments with secondary structures reveals the fine difference between two closely related all-atom statistical energy functions, Publisher, City, 2008. [120] Y.D. Yang, Y.Q. Zhou, Specific interactions for ab initio folding of protein terminal regions with secondary structures, Publisher, City, 2008. [121] H.Y. Zhou, J. Skolnick, GOAP: A Generalized Orientation-Dependent, All-Atom Statistical Potential for Protein Structure Prediction, Publisher, City, 2011. [122] M.Y. Shen, A. Sali, Statistical potential for assessment and prediction of protein structures, Publisher, City, 2006. [123] H. Fan, X. Periole, A.E. Mark, Mimicking the action of folding chaperones by Hamiltonian replica-exchange molecular dynamics simulations: Application in the refinement of de novo models, Publisher, City, 2012. [124] D. Gront, S. Kmiecik, M. Blaszczyk, D. Ekonomiuk, A. Kolinski, Optimization of protein models, Publisher, City, 2012. [125] M.H.M. Olsson, C.R. Søndergaard, M. Rostkowski, J.H. Jensen, PROPKA3: Consistent Treatment of Internal and Surface Residues in Empirical pKa Predictions, Publisher, City, 2011. [126] C.R. Søndergaard, M.H.M. Olsson, M. Rostkowski, J.H. Jensen, Improved Treatment of Ligands and Coupling Effects in Empirical Calculation and Rationalization of pKa Values, Publisher, City, 2011. [127] W.L. DeLano, PyMOL molecular viewer: Updates and refinements, Publisher, City, 2009. [128] W.L. DeLano, J.W. Lam, PyMOL: A communications tool for computational models, Publisher, City, 2005. 149 [129] S.-Y. Huang, X. Zou, An iterative knowledge-based scoring function for protein-protein recognition, Publisher, City, 2008. [130] H. Zhou, J. Skolnick, GOAP: A Generalized Orientation-Dependent, All-Atom Statistical Potential for Protein Structure Prediction, Publisher, City, 2011. [131] M. Lu, A.D. Dousis, J. Ma, OPUS-PSP: An Orientation-dependent Statistical All-atom Potential Derived from Side-chain Packing, Publisher, City, 2008. [132] M.-y. Shen, A. Sali, Statistical potential for assessment and prediction of protein structures, Publisher, City, 2006. [133] S.J. Marrink, H.J. Risselada, S. Yefimov, D.P. Tieleman, A.H. de Vries, The MARTINI Force Field:  Coarse Grained Model for Biomolecular Simulations, Publisher, City, 2007. [134] A. Panahi, M. Feig, Dynamic Heterogeneous Dielectric Generalized Born (DHDGB): An Implicit Membrane Model with a Dynamically Varying Bilayer Thickness, Publisher, City, 2013. [135] M. Feig, Implicit Membrane Models for Membrane Protein Simulation, in: A. Kukol (Ed.) Molecular Modeling of Proteins, Humana Press, 2008, pp. 181-196. [136] Y.Z. Ohkubo, Taras V. Pogorelov, Mark J. Arcario, Geoff A. Christensen, E. Tajkhorshid, Accelerating Membrane Insertion of Peripheral Proteins with a Novel Membrane Mimetic Model, Publisher, City, 2012. [137] G. Brannigan, P.F. Philips, F.L.H. Brown, Flexible lipid bilayers in implicit solvent, Publisher, City, 2005. [138] R.A. Böckmann, B.L. de Groot, S. Kakorin, E. Neumann, H. Grubmüller, Kinetics, Statistics, and Energetics of Lipid Membrane Electroporation Studied by Molecular Dynamics Simulations, Publisher, City, 2008. [139] A.A. Gurtovenko, J. Anwar, I. Vattulainen, Defect-Mediated Trafficking across Cell Membranes: Insights from in Silico Modeling, Publisher, City, 2010. [140] H. Leontiadou, A.E. Mark, S.J. Marrink, Molecular Dynamics Simulations of Hydrophilic Pores in Lipid Bilayers, Publisher, City, 2004. [141] J.D. Litster, Stability of lipid bilayers and red blood cell membranes, Publisher, City, 1975. [142] J.N. Sachs, P.S. Crozier, T.B. Woolf, Atomistic simulations of biologically realistic transmembrane potential gradients, Publisher, City, 2004. [143] D. Tieleman, The molecular basis of electroporation, Publisher, City, 2004. [144] D.P. Tieleman, H. Leontiadou, A.E. Mark, S.-J. Marrink, Simulation of Pore Formation in Lipid Bilayers by Mechanical Stress and Electric Fields, Publisher, City, 2003. 150 [145] J.C. Weaver, Y.A. Chizmadzhev, Theory of electroporation: A review, Publisher, City, 1996. [146] L. Delemotte, M. Tarek, Molecular Dynamics Simulations of Lipid Membrane Electroporation, Publisher, City, 2012. [147] Z. Levine, P.T. Vernier, Life Cycle of an Electropore: Field-Dependent and FieldIndependent Steps in Pore Creation and Annihilation, Publisher, City, 2010. [148] K.A. Dill, J.L. MacCallum, The Protein-Folding Problem, 50 Years On, Publisher, City, 2012. [149] T.V. Tolpekina, W.K. den Otter, W.J. Briels, Nucleation free energy of pore formation in an amphiphilic bilayer studied by molecular dynamics simulations, Publisher, City, 2004. [150] J. Wohlert, W.K. den Otter, O. Edholm, W.J. Briels, Free energy of a trans-membrane pore calculated from atomistic molecular dynamics simulations, Publisher, City, 2006. [151] E. Guàrdia, R. Rey, J.A. Padró, Potential of mean force by constrained molecular dynamics: A sodium chloride ion-pair in water, Publisher, City, 1991. [152] W.F.D. Bennett, N. Sapay, D.P. Tieleman, Atomistic Simulations of Pore Formation and Closure in Lipid Bilayers, Publisher, City, 2014. [153] D. Bhattacharya, J. Cheng, 3Drefine: Consistent protein structure refinement by optimizing hydrogen bonding network and atomic-level energy minimization, Publisher, City, 2013. [154] T. Nugent, D. Cozzetto, D.T. Jones, Evaluation of predictions in the CASP10 model refinement category: Assessment of Model Refinement Predictions, Publisher, City, 2014. [155] N. Kučerka, S. Tristram-Nagle, J.F. Nagle, Closer Look at Structure of Fully Hydrated Fluid Phase DPPC Bilayers, Publisher, City, 2006. [156] J.F. Nagle, S. Tristram-Nagle, Structure of lipid bilayers, Publisher, City, 2000. [157] J.B. Klauda, R.M. Venable, J.A. Freites, J.W. O’Connor, D.J. Tobias, C. MondragonRamirez, I. Vorobyov, A.D. MacKerell, R.W. Pastor, Update of the CHARMM All-Atom Additive Force Field for Lipids: Validation on Six Lipid Types, Publisher, City, 2010. [158] T. Darden, D. York, L. Pedersen, Particle mesh Ewald: An N⋅log(N) method for Ewald sums in large systems, Publisher, City, 1993. [159] J.-P. Ryckaert, G. Ciccotti, H.J.C. Berendsen, Numerical integration of the cartesian equations of motion of a system with constraints: molecular dynamics of n-alkanes, Publisher, City, 1977. [160] A.B. Larsen, J.R. Wagner, A. Jain, N. Vaidehi, Protein Structure Refinement of CASP Target Proteins Using GNEIMO Torsional Dynamics Method, Publisher, City, 2014. 151 [161] R.S. Ostrom, P.A. Insel, The evolving role of lipid rafts and caveolae in G protein-coupled receptor signaling: implications for molecular pharmacology, Publisher, City, 2004. [162] W.F.D. Bennett, D.P. Tieleman, Computer simulations of lipid membrane domains, Publisher, City, 2013. [163] O.S. Andersen, R.E. Koeppe, Bilayer Thickness and Membrane Protein Function: An Energetic Perspective, Publisher, City, 2007. [164] S.J. Marrink, H.J. Risselada, S. Yefimov, D.P. Tieleman, A.H. de Vries, The MARTINI force field: Coarse grained model for biomolecular simulations, Publisher, City, 2007. [165] P. Kar, S.M. Gopal, Y.-M. Cheng, A. Predeus, M. Feig, PRIMO: A Transferable CoarseGrained Force Field for Proteins, Publisher, City, 2013. [166] W. Im, M. Feig, C.L. Brooks, An Implicit Membrane Generalized Born Theory for the Study of Structure, Stability, and Interactions of Membrane Proteins, Publisher, City, 2003. [167] T. Lazaridis, Effective energy function for proteins in lipid membranes, Publisher, City, 2003. [168] S. Tanizaki, M. Feig, A generalized Born formalism for heterogeneous dielectric environments: Application to the implicit modeling of biological membranes, Publisher, City, 2005. [169] M. Sayadi, S. Tanizaki, M. Feig, Effect of Membrane Thickness on Conformational Sampling of Phospholamban from Computer Simulations, Publisher, City, 2010. [170] A. Panahi, M. Feig, Conformational Sampling of Influenza Fusion Peptide in Membrane Bilayers as a Function of Termini and Protonation States, Publisher, City, 2009. [171] N.R. Latorraca, K.M. Callenberg, J.P. Boyle, M. Grabe, Continuum Approaches to Understanding Ion and Peptide Interactions with the Membrane, Publisher, City, 2014. [172] G. Brannigan, F.L.H. Brown, Contributions of Gaussian Curvature and Nonconstant Lipid Volume to Protein Deformation of Lipid Bilayers, Publisher, City, 2007. [173] S. Esteban-Martín, J. Salgado, The Dynamic Orientation of Membrane-Bound Peptides: Bridging Simulations and Experiments, Publisher, City, 2007. [174] T. Kim, W. Im, Revisiting Hydrophobic Mismatch with Free Energy Simulation Studies of Transmembrane Helix Tilt and Rotation, Publisher, City, 2010. [175] J. Lee, W. Im, Transmembrane Helix Tilting: Insights from Calculating the Potential of Mean Force, Publisher, City, 2008. [176] S.H. Park, S.J. Opella, Tilt Angle of a Trans-membrane Helix is Determined by Hydrophobic Mismatch, Publisher, City, 2005. 152 [177] N. Castillo, L. Monticelli, J. Barnoud, D.P. Tieleman, Free energy of WALP23 dimer association in DMPC, DPPC, and DOPC bilayers, Publisher, City, 2013. [178] P. Lagüe, M.J. Zuckermann, B. Roux, Lipid-Mediated Interactions between Intrinsic Membrane Proteins: Dependence on Protein Size and Lipid Composition, Publisher, City, 2001. [179] F.J.M. de Meyer, M. Venturoli, B. Smit, Molecular simulations of lipid-mediated proteinprotein interactions, Publisher, City, 2008. [180] A. Benjamini, B. Smit, Robust Driving Forces for Transmembrane Helix Packing, Publisher, City, 2012. [181] W. Im, C.L. Brooks, Interfacial folding and membrane insertion of designed peptides studied by molecular dynamics simulations, Publisher, City, 2005. [182] S. Choe, K.A. Hecht, M. Grabe, A Continuum Method for Determining Membrane Protein Insertion Energies and the Problem of Charged Residues, Publisher, City, 2008. [183] H. Aranda-Espinoza, A. Berman, N. Dan, P. Pincus, S. Safran, Interaction between inclusions embedded in membranes, Publisher, City, 1996. [184] S. Mondal, G. Khelashvili, H. Weinstein, Not Just an Oil Slick: How the Energetics of Protein-Membrane Interactions Impacts the Function and Organization of Transmembrane Proteins, Publisher, City, 2014. [185] D.P. Tieleman, J.L. MacCallum, W.L. Ash, C. Kandt, Z. Xu, L. Monticelli, Membrane protein simulations with a united-atom lipid and all-atom protein model: lipid–protein interactions, side chain transfer free energies and model proteins, Publisher, City, 2006. [186] J.L. Knight, C.L. Brooks, Surveying implicit solvent models for estimating small molecule absolute hydration free energies, Publisher, City, 2011. [187] T. Lazaridis, M. Karplus, Effective energy function for proteins in solution, Publisher, City, 1999. [188] J.C. Phillips, R. Braun, W. Wang, J. Gumbart, E. Tajkhorshid, E. Villa, C. Chipot, R.D. Skeel, L. Kalé, K. Schulten, Scalable molecular dynamics with NAMD, Publisher, City, 2005. [189] W.L. Jorgensen, J. Chandrasekhar, J.D. Madura, R.W. Impey, M.L. Klein, Comparison of simple potential functions for simulating liquid water, Publisher, City, 1983. [190] V. Mirjalili, M. Feig, Density-Biased Sampling: A Robust Computational Method for Studying Pore Formation in Membranes, Publisher, City. 153