DEVELOPMENT OF MOLECULAR DYNAMICS FORCE FIELD OF YOPRO-1 AND DEEP LEARNING MODELS FOR PROTEIN CLASSIFICATION By Chi Jin A DISSERTATION Michigan State University Submitted to for degree of 
in partial fulfillment of the requirements Chemistry—Doctor of Philosophy 2019 ABSTRACT DEVELOPMENT OF MOLECULAR DYNAMICS FORCE FIELD OF YOPRO-1 AND DEEP LEARNING MODELS FOR PROTEIN CLASSIFICATION By Chi Jin Cyanine dyes, such as Oxazole yellow (YOPRO), are almost non-fluorescent in water but their fluorescence is greatly enhanced after intercalation in double- stranded DNA, providing the basis of DNA concentration assays. The rationale for this property is the flexibility difference of the conformations of the molecule in different environments, mainly attributed to the linker dihedral rotations. We compared two methods for deriving the specific dihedral force field on the linker of YOPRO, namely by modifying the AMBER generated force field (GAFF) and by using the IPolQ fitting protocol. There are two dihedral angles and the IPolQ method showed that their potential surfaces are coupled. Thus, going beyond the GAFF approach, coupled dihedral surfaces were obtained for the ground S0 and first excited S1 electronic states. Molecular Dynamics (MD) simulations of YOPRO were carried out in water and intercalations using these force field models. The MD simulations started at the minima of the S0 state vertically excited to the S1 state. The contrast between YOPRO conformational relaxation on the S1 surface in water and when intercalated provided the non-radiative relaxation pathways relevant to fluorescence decay and explain the differences in quantum yield. For the second topic, we investigated a number of deep machine learning (ML) models for protein family classification. We used one dimensional sequence and three dimensional secondary structural information of proteins as the input for training the neural network models. The results show that deeper convolutional networks of the Long Short Term Memory (LSTM) variety significantly enhanced the prediction accuracies compared to less sophisticated models. The addition of the secondary structural information greatly increases the testing accuracies with the training data size remaining the same. Proteins belonging to different families can be successfully distinguished using these methods. TABLE OF CONTENTS LIST OF TABLES ........................................................................................................................................ vi LIST OF FIGURES .................................................................................................................................... viii 1 Introduction ........................................................................................................................................ 1 1.1 General Introduction .............................................................................................................. 1 1.2 Molecular Dynamics Simulation ....................................................................................... 4 1.2.1 General Description .................................................................................................... 4 1.2.2 Force Field ....................................................................................................................... 5 Integrator ......................................................................................................................... 6 1.2.3 1.2.4 Period Boundary Condition ..................................................................................... 6 1.2.5 Temperature and Pressure Control ..................................................................... 7 Long-range Interactions ............................................................................................ 9 1.2.6 1.3 Weighted Histogram Analysis Method 
 ..................................................................... 10 1.3.1 Umbrella Sampling .................................................................................................... 10 1.3.2 Unbiasing Procedure ................................................................................................ 11 Implicitly Polarized Charge Method .............................................................................. 14 1.4 1.5 Force Constant Fitting ......................................................................................................... 16 1.6 Fluorescence Theory ............................................................................................................ 18 1.7 Kramers Theory ..................................................................................................................... 23 1.8 Gas Phase Approaches ......................................................................................................... 28 2 Methods ............................................................................................................................................... 31 Simulations with Modified Amber-Generated Force Fields ................................ 31 2.1 2.1.1 Introduction .................................................................................................................. 31 2.1.2 YOPRO Structure and Internal Force Field Parameters ............................ 32 YOPRO Solvation in Water ..................................................................................... 33 2.1.3 2.1.4 bZIP-DNA Solvation in Water ............................................................................... 34 2.1.5 bZIP-DNA-YOPRO Intercalation in Water

 ....................................................... 35 2.1.6 Umbrella Sampling of YOPRO Model 1 and Model 2 in Water ............... 36 2.1.7 Unrestrained Intercalation Simulation of Model 1, 2 and 3 ......................... 37 2.1.8 Umbrella Sampling Simulation of Models 2 and 3 Intercalated 
 .............. 38 2.2 Simulations with Fitted Force Fields by the Implicitly Polarized Charge Method 
 ................................................................................................................................................. 40 2.2.1 Partial Charge Set Fitting for Water-solvated YOPRO

 ........................... 40 2.2.2 Dihedral Force Constant Fitting ........................................................................... 41 2.2.3 Umbrella Sampling of YOPRO in Water on the S0 and S1 Surfaces ...... 42 Umbrella Sampling Simulation of Intercalated YOPRO Surfaces .......... 44 2.2.4 3 Results ................................................................................................................................................. 45 3.1 bZIP-DNA in Water ............................................................................................................... 45 3.2 Preparing Space in bZIP-DNA for YOPRO intercalation ........................................ 47 iv 3.3 YOPRO Models 1,2 and 3 .................................................................................................... 50 YOPRO Model 1 Intercalation in bZIP-DNA .................................................... 50 3.3.1 3.3.2 YOPRO Model 1 in Water ........................................................................................ 53 3.3.3 YOPRO Model 1 Intercalated ................................................................................. 54 Revision of the YOPRO Force Field, Model 2 .................................................. 56 3.3.4 3.3.5 YOPRO Model 2 in Water ........................................................................................ 57 3.3.6 YOPRO Model 2 Intercalated ................................................................................. 59 3.3.7 YOPRO Model 3 ........................................................................................................... 66 YOPRO Model 3 in Water ........................................................................................ 67 3.3.8 3.3.9 YOPRO Model 3 Intercalated ................................................................................. 68 3.3.10 Discussion and Conclusion Remarks of the Three Models ....................... 71 3.4 Defects of the Three Models .............................................................................................. 75 S0 and S1 YOPRO Free Energy Surfaces in Water with Fitted Force Constant 3.5 Parameters ............................................................................................................................................ 77 3.6 S0 and S1 YOPRO dsDNA intercalated. ......................................................................... 80 3.7 Nonradiative Relaxation Path on the S1 Surface from the S0 Equilibrium Position ................................................................................................................................................... 83 3.8 Discussion and Concluding Remarks of the Results from the Fitted Force Field of the Two Electronic States ............................................................................................... 85 C++ program implementing the steepest descent algorithm ................. 88 3.8.1 4 Protein Classification Using Deep Neural Networks ....................................................... 96 4.1 Introduction ............................................................................................................................. 96 4.2 Data Sources ............................................................................................................................ 99 4.3 List of terms used ............................................................................................................... 102 4.4 Neural Network Experiments ....................................................................................... 104 Model 1: Vanilla neural network with one hidden layer ...................... 104 4.4.1 4.4.2 Model 2: Convolutional Neural Network ...................................................... 106 4.4.3 A Deeper CNN ........................................................................................................... 108 4.4.4 Model 3: LSTM network ....................................................................................... 112 4.5 Conclusion and Remarks ................................................................................................. 118 5 Conclusion and Outlook ............................................................................................................ 120 Intercalation Outlook ........................................................................................................ 120 5.1 5.2 Molecular Dynamics of MiR-155 .................................................................................. 122 APPENDICES ........................................................................................................................................... 124 APPENDIX A Fitting Protocol from IpolQ Method ........................................................ 125 APPENDIX B C++ program implementing the steepest descent algorithm ....... 127 REFERENCES .......................................................................................................................................... 135 v LIST OF TABLES Table 2.1.1 Window data for Umbrella sampling of Model 2 intercalated. .................... 38 Table 2.1.2 Window data for Umbrella sampling of Model 3 intercalated ..................... 39 Table 2.2.1 S0 YOPRO a and b values of D1 surface as functions of D2 ........................... 41 Table 2.2.2 S0 YOPRO a and b values of D2 surface as functions D1 ................................ 41 Table 2.2.3 S1 YOPRO a and b values of D1 surface as functions of D2 ........................... 42 Table 2.2.4 S1 YOPRO a and b values of D2 surface as functions D1 ................................ 42 Table 3.2.1 Box expanded atom pair distances. ......................................................................... 48 Table 3.3.1 Hydrogen bond averages and standard deviations of the YOPRO- intercalated bases over a 10 ns simulation. ....................................................................... 51 Table 3.3.2 Averages and standard deviations of the YOPRO-intercalated box dimensions over a 10 ns simulation. ..................................................................................... 52 Table 3.3.3 Averages and standard deviations of distances between G, C, A and T and indicated YOPRO atoms over the MD trajectory. ............................................................. 53 Table 3.3.4 The intercalating hydrogen bond average lengths and standard deviations over the 10 ns simulation. ................................................................................... 59 Table 3.3.5 The box atom pair average lengths and standard deviations over the 10 ns simulation. .................................................................................................................................. 59 Table 3.3.6 The atom pairs average lengths and standard deviations (see Figure 7) for the distances used to characterize the intercalation extent over the 10 ns simulation. ........................................................................................................................................ 60 Table 3.3.7 The hydrogen bond average lengths and standard deviations over the 13 ns simulation. All H bond are well maintained. ............................................................... 61 Table 3.3.8 The box atom pair average lengths and standard deviations over the 13 ns simulation. .................................................................................................................................. 61 Table 3.3.9 The atom pair average lengths and standard deviations for the distances used to characterize the intercalation extent over the 13 ns simulation. ............. 62 vi Table 3.3.10 The hydrogen bond average lengths and standard deviations over the simulation. All H bond are well maintained. ...................................................................... 69 Table 3.3.11 The box atom pair average lengths and standard deviations over the simulation. ........................................................................................................................................ 69 Table 3.3.12 The atom pair average lengths and standard deviations for the distances used to characterize the intercalation extent. .............................................. 69 Table 3.3.13 Dihedral probability peaks (degrees) for all the simulations. ................... 71 Table 4.4.1 Accuracies of binary classification using the vanilla neural network ... 104 Table 4.4.2 Accuracies of classifying class 2 and 3 with regularization. ...................... 105 Table 4.4.3 Accuracies of using the above convolutional neural network ................. 108 Table 4.4.4 Accuracies of binary classification using a deeper convolutional neural network with the identity block ........................................................................................... 111 Table 4.4.5 Accuracies of binary classifications between Nup and the first four families with the deep CNN model ...................................................................................... 111 Table 4.4.6 Method 1: Accuracies of LSTM using the sequential representation ..... 116 Table 4.4.7 Method 2: Accuracies of LSTM adding in the first structural representation with threshold = 8Å ................................................................................... 116 Table 4.4.8 Method 3: Accuracies of LSTM adding in the second structural representation with threshold = 6Å ................................................................................... 116 Table 4.4.9 Method 4: Accuracies of LSTM adding in the second structural representation with threshold = 12Å ................................................................................. 116 Table 4.5.1 Accuracies of binary classification using the deeper convolutional neural network in Model 2 with 400 sequence for each class ............................................... 119 vii Figure 1.1.1 Structure of YOPRO (without hydrogen atoms) ................................................. 1 Figure 1.2.1 Illustration of the Periodic Boundary Condition. The center black box is the primary box and all the others are the copies of it. Once the dark green particle in the primary box goes out, the light green one in the bottom copy enters the box to compensate. From https://en.wikipedia.org/wiki/Periodic_boundary_conditions ................................. 7 Figure 1.3.1 Overlapping of the probability peaks of all the windows sampling the angle space of a bond angle31. .................................................................................................. 11 Figure 1.4.1 Work cycle of the IPolQ Method .............................................................................. 15 Figure 1.6.1 Fluorescence Theory. https://en.wikipedia.org/wiki/Franck%E2%80%93Condon_principle .............. 19 Figure 1.6.2 Avoided crossing. (Turro P157, Fig 6.3c) ............................................................ 20 Figure 1.6.3 Conical Intersection. https://commons.wikimedia.org/wiki/File:Conical_intersection.svg ................... 20 Figure 1.7.1 Potential U(x) with two metastable states A and C. Escape occurs via the forward rate k+ and the backward rate k- respectively, and Eb+ is the corresponding activation energy (Reaction Rate Theory -- 50 Years after Kramers, FIG 3). .............................................................................................................................. 23 Figure 1.8.1 Me-1122P51 ...................................................................................................................... 29 Figure 1.8.2 S1 state energy profiles along the constructed Linearly interpolated internal coordinate (LIIC) pathways .The inset illustrates the energy profile in the gas phase between the Franck-Condon point and minS1. 51 ............................... 30 Figure 2.1.1 Structure of YOPRO (without hydrogen atoms) with benzoxazole and 162°. ..................................................................................................................................................... 32 Figure 2.1.2 This density distribution shows that these 105 windows mentioned above cover the whole region of D1∈ [-180,180] and D2∈ [-40,40] for the model 2. .............................................................................................................................................. 37 quinoline rings connected by a cyanine-based linker. Dihedral C2−C6−C10−C11 (D1) and C6−C10−C11−O1 (D2) are the two dihedrals on linker C11−C10−C6 that define the twist angle of the two ring systems. In this example, D1 = 180° and D2 = LIST OF FIGURES viii Figure 3.1.1 The bZIP crystal structure59 showing the two monomers composed of leucine zipper and basic domains. The dsDNA that is perpendicular to the protein dimer is held in place by the basic domains. ..................................................... 45 Figure 3.1.2 RMSFs of the DNA bases and protein residues for bZIP-DNA. The dsDNA strands are numbered 2-20 and 22-40. The protein monomers are numbered 41-97 and 98-154. The bases and residues are quite stable with the protein dimer maintained and base pairing maintained with greater fluctuations at the ends of both monomers and both DNA strands. .............................................................. 46 Figure 3.2.1 The hydrogen bonded base pairs, G38-C4 and A37-T5 separated for YOPRO intercalation, with the three GC and two AT hydrogen bond distances at the end of the separation indicated. ...................................................................................... 47 Figure 3.2.2 During the expansion to provide room for YOPRO intercalation, the base pairing hydrogen bonds are well maintained. ........................................................ 49 Figure 3.3.1 A snapshot from the 10 ns MD trajectory of YOPRO intercalated into the ‘box’ formed from the separating the base pairs. Six distances are used to characterize the stability of the box: A37C2-C4C2, T5C5-C4C2, G38C4-T5C5, G38C4-C4C2, A37C2-T5C5, and A37C2-G38C4. ............................................................... 51 Figure 3.3.2 Base-to-YOPRO distances used to characterize the intercalation extent, for a snapshot from the 10 ns MD trajectory. .................................................................... 52 Figure 3.3.3 PMF of the dihedral D1(C2-C6-C10-C11). The red line is the result of the unrestrained simulation while the black line is obtained from umbrella sampling. ............................................................................................................................................ 54 Figure 3.3.4 PMF of the dihedral D1(C2-C6-C10-C11). The red line is the result of the unrestrained simulation while the black line is obtained from umbrella sampling. ............................................................................................................................................ 55 Figure 3.3.5 Model 1 RMSFs of the DNA bases and bZIP residues compared for bZIP- DNA and bZIP-DNA with intercalated YOPRO over their MD trajectories. .......... 56 Figure 3.3.6 The probability density for the two dihedrals for the unrestrained YOPRO Model 2 in water. ........................................................................................................... 57 Figure 3.3.7 The probability density (top) for the two dihedrals from the 2D umbrella sampling simulation. ................................................................................................ 58 Figure 3.3.8 Model 2 RMSFs of the DNA bases and bZIP residues compared for bZIP- DNA (black) and bZIP-DNA with intercalated YOPRO (red) over their MD trajectories. ....................................................................................................................................... 60 ix Figure 3.3.9 The dihedral angles D1 and D2 for intercalated YOPRO Model 2.
 ......... 61 Figure 3.3.10 Model 2 extended dihedral angles D1 and D2 for intercalated YOPRO. ................................................................................................................................................................ 62 Figure 3.3.11 The intercalated YOPRO probability density (top) for the two dihedrals from a 2D umbrella sampling simulation. ...................................................... 63 Figure 3.3.12 Four characteristic distances (see Figure 3.3.2) to monitor the positions of the ring systems, as a function of the D1 dihedral angle. .................... 64 Figure 3.3.13 The last snapshot from the window where D1 is constrained at –120 degrees. In this twisted configuration the benzoxazole ring has moved out of the box formed from the surrounding bases. .................................................................... 65 Figure 3.3.14 Model 2 average and standard deviations of the three AT and two GC hydrogen bonds as a function of the angle D1. The hydrogen bonding of the four bases is maintained even as the oxazole ring is no longer intercalated. ............... 66 Figure 3.3.15 Model 2 extended RMSFs of the DNA bases and bZIP residues compared for bZIP-DNA and bZIP-DNA with intercalated YOPRO of the most twisted window, (-120, 0) over their MD trajectories. .................................................. 66 Figure 3.3.16 The dihedral angles D1 and D2 for unrestrained YOPRO Model 3 in water. .................................................................................................................................................. 67 Figure 3.3.17 Model 3 RMSF of the DNA bases and bZIP residues compared for bZIP- DNA and bZIP-DNA with intercalated YOPRO over their MD trajectories. .......... 68 Figure 3.3.18 The dihedral angles D1 and D2 for intercalated YOPRO Model 3. ......... 68 Figure 3.3.19 The intercalated YOPRO probability density for the two dihedrals from the 2D umbrella sampling simulation for YOPRO Model 3. ......................................... 70 Figure 3.5.1 PMF surfaces of S0 and S1 YOPRO in water. ...................................................... 77 Figure 3.5.2 Subtracted energy surfaces of S0 and S1 YOPRO. ............................................ 78 Figure 3.5.3 PMFs of S1 YOPRO in water around the four minima on the S0 quantum energy surface. ................................................................................................................................ 79 Figure 3.6.1 PMFs of intercalated S0 YOPRO centered at the S0 global minima. ........ 81 Figure 3.6.2 PMFs of intercalated S1 YOPRO centered at the S0 global minima. ........ 82 x The two views displayed, panels (a) and (b), are for better visualization. Black (path) and gray (PMF) are in water. Red (path) and light red (PMF) are intercalated. The Figure 3.7.1 PMFs and relaxation pathways for S1 YOPRO in water and intercalated. barrier to relaxation is present only in the intercalated pathway. .................................... 84 Figure 4.4.1 Convolution and Flatten Operation .................................................................... 106 Figure 4.4.2 A simple CNN network. What’s inside the parentheses above the arrows show the shapes of the sample inputted into the next step. .................................... 107 Figure 4.4.3 An identity block consisting of two paths. ....................................................... 110 Figure 4.4.4 Architecture of a RNN. From https://en.wikipedia.org/wiki/Recurrent_neural_network ................................... 112 Figure 4.4.5 Structure of the repeating cell module in an LSTM. From http://colah.github.io/posts/2015-08-Understanding-LSTMs/ ........................... 114 Figure 5.1.1 Chemical structure of YOYO (1,1'-(4,4,8,8- tetramethyl-4,8- diazaundecamethylene)bis[4-[3-methyl-benzo- 1 OX- azol-2-yl]methylidene]- 1,4-dihydroquinolinium])71. .................................................................................................. 121 Figure 5.2.1 miR-155 secondary structure and sequence conservation. ..................... 122 xi 1 Introduction 1.1 General Introduction Oxazole yellow (YOPRO), shown in Figure 1.1.1, is a cyanine-based dye widely used in commercial DNA concentration assays.1, 2 During gel electrophoresis for DNA detection, it forms a stable complex with the DNA to avoid its removal.3, 4 Furthermore, it is also used to monitor the formation and break up of protein-DNA complexes,5, 6 due to its high binding affinity to double-stranded DNA. At low concentrations, the preferred binding mode of YOPRO is intercalation between paired bases7, whereas at higher dye-to-DNA ratios, other binding modes on the surface, like external electrostatic binding or minor groove binding can occur. Figure 1.1.1 Structure of YOPRO (without hydrogen atoms) YOPRO is almost non-fluorescent in water but its fluorescence is enhanced ~1800 fold after intercalation in double-stranded DNA, providing the basis of DNA concentration assays1, 8-10. The rationale for this difference is that in water rotation around the linker connecting the benzoxazole and quinoline rings (Figure 1.1.1) lets the excited molecule decay through nonradiative relaxation pathways, thus quenching the 1 fluorescence. However, intercalation into double-stranded DNA enhances the constraints on the rotational motion around the linker, therefore eliminating this pathway and results in the observed intense fluorescence. In this work, we first carried out molecular dynamics (MD) simulations to explore the conformational sampling of YOPRO in the different environments as discussed above and constructed free energy surfaces (potentials of mean force) along the dihedral angles using modified Amber generated force field models. The simulation results and defects of these force field models are discussed in Chapter 2. Then in Chapter 3, we introduce a more accurate dihedral force field by using the implicitly polarized charge method (IPolQ)11 for both the ground (S0) and the first excited electronic states (S1). Then, MD simulations using the new force fields are carried out on both electronic surfaces for water solvated and intercalated dye. Finally, a steepest decent algorithm was implemented to find the non-radiative relaxation pathway on the S1 free energy surfaces for both YOPRO in water and intercalated. The distinct energy barrier found on the intercalated path that is absent in the water solvated path supports the hypothesis that constraints on linker between the two ring systems are responsible for the large difference in fluorescence intensity in different environments. Chapter 4 is devoted to the different topic of protein family classification12 using Machine Learning13. Proteins are classified into families based on evolutionary ancestry. A given protein family has similar three dimensional structure and function. In our investigations we studied if a neural network can be trained to discriminate between protein families based on sequence alone, and based on sequence and corresponding 2 structure. That sequence alone can discriminate between different families indicates that sequence does encode protein structure and function. We investigated different neural network models for predicting protein families and found that the long short-term memory networks (LSTM)14 show the highest performance among all the models studied. Different methods for extracting protein structural information as the input data, by mapping the extracted structural information into protein family labels, were considered. Because there are many more sequences known than their structures and because machine learning is data intensive there is a tradeoff between the sequence and sequence plus structure approaches. All in all, the LSTM neural network approach is shown to be capable of good discrimination in binary comparison of pairs of protein families. 3 1.2 4 Molecular Dynamics Simulation 1.2.1 General Description Molecular dynamics15, 16 (MD) is a method that simulates the motion of molecules in a closed system using by computationally integrating Newton’s equations of motion. The inter and intramolecular motions are driven by a force field that is predesigned to approximate physical laws. Therefore, MD simulations can provide dynamic information of a molecular system at atomic level. For instance, by examining the resulting trajectory of a simulation which is a sample in an ensemble, one can evaluate the average system properties such as energy, entropy and free energy by calculating their time averages. The algorithms that implement MD simulations are to answer one fundamental question: if the current configuration of the system is given, what is for next moment? By answering it repeatedly, MD propagates the system in time following Newton’s equation of motion: (1.2.1) Specifically, three steps are involved for MD to propagate the system: first, one needs to initialize the positions and velocities to all the atoms in the system. For most biological systems, MD starts out with the crystal structure of the molecules of interest. To conduct simulation in solvent, either implicit solvent is added as a continuous media or explicit solvent are added by putting the system into an equilibrated box of solvent molecules. Then the velocities of the system are initialized with a random seed generated from a Maxwell distribution17. Secondly, the forces are F m !!r = 1.2.2 Force Field evaluated according to the force field. Finally, an integrator is used to predict the configuration for the next moment given a certain time step. The last two steps are repeated until the preset number of step are completed. A MD force field is a set of functional forms and parameters that are used to simulate the potential energy change due to the intramolecular interactions and intermolecular motions. The force field parameters is developed to fit the experimental data or quantum calculation results into the following terms: bond stretches and vibrations, dihedral torsions, Lennard-Jones and electrostatic potentials. A number of force fields have been developed for biological systems such as the AMBER18, CHARMM19 and GROMOS20 force fields. As an example, equation 1.1.2 shows the functional form of the AMBER internal bonded force field: (1.2.2) The first term on the right is the bond stretching term as the function of the and force constant equal . bond length with The second term is the bond bending term as the function of the bond angle . The last one is the the equilibrium bond angle at dihedral torsional term as a function of as the phase and shift . and force constant equal with the multiplicity with the equilibrium bond length at [1+ cos(nω−γ)] V (r N ) = kb(l − l0) 2 + ka(θ−θ0) 2 + ∑ torsions 1 ∑ 2n n γ kb θ ∑ bonds ∑ angles θ0 ω 5 l l0 ka r(t + Δt) = r(t) + v(t − Δt 2 v(t + Δt 2 ) = v(t − Δt 2 ) + a(t)Δt )Δt 1.2.3 Integrator Many integrators have been designed, one of which is the leap-frog algorithm21: (1.2.3) where t is the time, and r(t), v(t) and a(t) are the coordinates, the velocity and the acceleration of the particles. The integration step, Δt, should be chosen small enough to catch the fastest motion which is the hydrogen related bond vibrations. To speed up the MD, the SHAKE algorithm17 is usually used to constrain these bonds to increase the step size to the scale of 1fs. This should be a good approximation since these bond vibration motions will be averaged out in a normal simulation scale. Period Boundary Conditions (PBC) are used to simulate an infinite system with a finite size box by representing it as a repeated array of the box (Figure 1.2.1). Certainly an artificial periodicity is imposed on the system simulated. In conjunction with using PBC, the long range interactions can be calculated using the Ewald Lattice Sum17 implemented by the Particle Mesh Ewald method22. 1.2.4 Period Boundary Condition 6 1.2.5 Temperature and Pressure Control Figure 1.2.1 Illustration of the Periodic Boundary Condition. The center black box is the primary box and all the others are the copies of it. Once the dark green particle in the primary box goes out, the light green one in the bottom copy enters the box to compensate. From https://en.wikipedia.org/wiki/Periodic_boundary_conditions The Newton’s equation (equation 1.2.1) conserves the total energy of the system simulated. Therefore, without temperature control, the system is simulated in a microcanonical ensemble with fixed number of particles (N), volume(V) and energy(E). However, the normal experimental conditions usually allows the total energy to fluctuate but fix temperature and the pressure of the system. Therefore, we need to control the temperature and pressure. Several methods have been developed for this problem23, one of them is the Berendsen24 method. It couples the simulated system with an external bath of the . For any particle i, Berendsen et al proposed the modification on temperature the velocity: Tref 7 !!ri = T − Tref T Fi mi + 1 2τ ( 2 3kBN 1 N ∑ mi !ri 2i=1 2 Tref T −1) !ri σ = 1− Δt τ The instantaneous pressure P is computed by (1.2.4) where τ is the coupling time and usually τ=0.2ps is used. T is the instantaneous , where N is the total number of temperature, which is equal to particles kB is the Boltzmann constant. The extra term added in equation 1.2.3 serves as a frictional force as a feedback mechanism to scale T back to . The velocity is , where Δt is the MD step size. therefore scaled by a factor (1.2.5) where K is the kinetic energy of the system and Virial is the virial function17. The pressure of a n isotropic system is constrained by a factor of γ: (1.2.6) is the reference pressure. For an anisotropic box, the pressure control is where complicated where a pressure tensor has to be used24. Given the simplicity of the implementation of the Berendsen method, however, it doesn’t lead to any known ensemble. To guarantee an NPT ensemble, methods such as the Langevin dynamics need to be used25. γ = (1− Δt β 2 3V P = (K −Virial) P − Pref 1 3 )− P Tref Pref 8 ( ∑ i, j qiq j rij + Bij 6 ) rij Vnb = Kqiq j r p Aij 12 + rij 1.2.6 Long-range Interactions The potential energy of the non-covalent interactions is usually expressed as a sum of the pair-wise electrostatic and Lennard-Jones contributions: (1.2.7) The first term on the right, which represents the electrostatic contribution, has with p =1. It is known that a summation of long-range interaction a form of terms of this form with p ≤ 3 is a conditionally convergent sum26. The easiest way to solve the convergence problem is to use a cut-off method, which simply ignores the interactions beyond some chosen cutoff distance. But later more accurate methods were developed, such as the Ewald summation17. It divides the calculation of summation into two parts that are in the real space and the reciprocal space, respectively, which converge rapidly given the system is charge neutral. The original reciprocal summation was implemented in O(N2), which is still too expensive for large systems. Particle Mesh Ewald (PME) method22 makes interpolation of the reciprocal structure factor in the lattice points and uses Fast Discrete Fourier Transform (FDFT) to reduce the time cost to O(NlogN). 9 1.3 Weighted Histogram Analysis Method 
 1.3.1 Umbrella Sampling The current affordable time scale for MD simulations for a normal-sized protein in explicit solvents is limited to only tens of nanoseconds. Therefore, when there are large barriers in the complex potential energy surface of the molecule of interest, MD may not be able to sample the configuration space properly. To compensate for this defect, umbrella sampling method is often used23. In an umbrella sampling, a restraint potential, which is usually a parabolic function of the coordinates, is added to the original potential surface to force the system to stay around the restraint coordinate and make a biased energy surface. This modified system is called a window. Multiple restraint potentials centered at different locations can be applied to make multiple windows, which are sampled by MD one after the other. Finally, the resulting window biased probability data are combined the Weighted Histogram Analysis (figure 1.3.1) and unbiased using method(WHAM)27, 28 to construct a potential of mean force (PMF)29, 30 of the whole sampling space. 10 ith Wi R U0(η) h h= R ( ) 1.3.2 Unbiasing Procedure window are functions of the reaction coordinate Figure 1.3.1 Overlapping of the probability peaks of all the windows sampling the angle space of a bond angle31. and the restraint Suppose the original potential energy surface potential of the , is the set of atom coordinates. Thus, the biased and unbiased probability where distributions are: (1.3.1) and (1.3.2) respectively. By plugging equation 1.3.2 into equation 1.3.1, we can obtain: (1.3.3) 11 e−(Wi (η)+U0 (η))β ∑ e−U0 (η)β e−(Wi (η)+U0 (η))β ∑ e−(Wi (η)+U0 (η))β (u ) (η) = ρi ρi ( b) (η) = ρi (b)(η)eβWi (η) e = å ) u ( ) r h ( i ∑ η ( h b ) 0 U - 0 ( h b ) U - e η h η fi e e h - f b i e U 0 ( h b )) W - i ( h b ) U - 0 ( h b ) ( W -+ i ( ) h ith e f b - ´ i å h ) u ( ) r h ( i e å ρ0(η) ρ0(η) ρ(u)(η) å h º= window ρ(u)(η) , and use We now define the free energy of a certain of the Now we combine the unbiased probability using the third term on the right of equation 1.3.3: (1.3.4) of all the windows to approximate the global unbiased probability to replace in equation 1.3.4: (1.3.5) where (1.3.6) is the weight of the where (1.3.7) In order to minimize the statistical error on the global probability distribution32, we differentiate the standard deviation with respects to the weights and set them to 0: (1.3.8) to derive: windows, and it is normalized: ( s r h ( )]) ( ) ( ) h r h ( ) u i p h i ( ) 1 = [ 0 ip ¶ 0( ) r h =´å =´å ( ) ip h ( ) r h N å i 1 = W - i ( h b ) = 0 N i 1 = 2 ¶ 0 p i e h ith 12 N j 1 = = 0 - ( W j - f ) j n i ( ) r h ( ) b i ( ) r h i å å N 1 = ( ) n e b h j Note that is the number of data points in The resulting So on and so forth until the difference of the from two iterations converges to some given tolerance. The final to construct the PMF surface. (1.3.9) window. then goes back to equation 1.3.5 to start the next iteration. value for all the windows obtained is then used ρ0(η) ρ0(η) ith ni f 13 Implicitly Polarized Charge Method The simulation of the solute molecule of interest relies on the construction of accurate force field parameters33. The Implicitly Polarized Charge method (IPolQ) derives the partial charge set of the molecule in solution as the first step to obtain these potentials from ab initio quantum calculations11. To start with, the solute molecule is solvated in a water box, and it is kept fixed. A 10 ns conformational sampling of the water solvent at 450K is carried out to collect a set of equilibrated water configurations. The solvent charge density and the electrostatic field of these water configuration is calculated as the reaction field in Figure 1.4.1. For the second step, the MD simulation of the solute molecule is performed from each equilibrated water configuration with the water solvent fixed to obtain a set of time average solute configurations. The fixed locations of the water solvent later will serve to create a field of point charges surrounding the solute. Once the MD simulation of the solute is finished, the quantum chemistry program orca34 is used to calculate the single point energy and electrostatic potential of each solute configuration in vacuum for charge fitting. The solvent charge density calculated in the first step is taken as a perturbation for the electrostatic potentials of the solute in water. The final IPolQ charge set was derived from the perturbed electrostatic potential set using the Restrained Electrostatic Potential (RESP) model35. The IPolQ charges set was used to replace the original guess to start the next iteration of charge fitting. Once the fitted charge set is converged, the charge fitting 1.4 14 process is finished. The IPolQ charge set then goes to the force constant fitting section. The architecture of whole procedure is show as below: Figure 1.4.1 Work cycle of the IPolQ Method 15 1.5 Force Constant Fitting The partial charge set of the molecule of interest obtained by the IPolQ method in Section 1.4 is used to sample a set of conformations in vacuum by systematically varying the investigated degrees of freedom (one or two) for which the force constants are to be fitted and optimizing the rest part of the molecule. For each conformation, the single point energy Q and the classical potential energy V from the AMBER Generalized Force Field18 with the fitted terms excluded were calculated. Therefore, the pure contribution of the fitted terms is computed by subtracting V from Q to construct a “subtracted” potential surface along them. If we have two terms to fit and the resulting potential surface shows a strong coupling of them then, for each conformer, we fit the force constant parameters of one of them (D1) with the other(D2) fixed at a set of preset values. For any fixed D2 value, a Fourier expansion of the potential energy of D1 with four terms (equation 1.2.2) is used. For instance, if both D1 and D2 are dihedrals, we use the following equation to express the torsional energy at a fixed D2 value: (1.5.1) values are the fitted force field parameters for the energy where the barrier and phase shift, respectively. of D1 and (1+ cos(4x − b4)) V( D1|D2)(x) = a0 + (1+ cos(2x − b2)) a4 4 a6 6 a8 8 + + + (1+ cos(6x − b6)) V( D1|D2)(x) (x) (1+ cos(8x − b8)) an bn a2 2 16 The whole procedure is repeated with the roles of D1 and D2 interchanged; thus, each conformation has a fitted force field parameter set for its corresponding window in the later umbrella samplings. 17 1.6 Fluorescence Theory An atom absorbs a photon when resonance happens, namely the oscillations of the light wave are coupled with the oscillation of the electrons of the atom, causing the electron distribution to “reshape” and conversion of the low electronic state to higher electronic state, which the has more nodes on the orbital surface36. The energy gap between the two state is equal to the energy of the photon: In a molecule, this electronic transition is coupled with nucleic motions such as vibrations, rotations. According to the Franck-Condon principle, since the electronic motions are much faster than the nuclear motions, the molecule maintains its nuclear structure after the transition, therefore the transition often ends at higher vibrational states rather than the ground state as indicated by the end of the blue arrow in Figure 1.6.1. The excited molecule usually quickly relaxes to the lowest vibrational state of the excited electronic state through non-radiative relaxations by dissipating heat to the surroundings and from there decays to the ground electronic state as shown with the green arrow in Figure 1.6.1, where a photon, known as the fluorescence is emitted. . E hn D= 18 Fluorescence 1.6.1 Figure https://en.wikipedia.org/wiki/Franck%E2%80%93Condon_principle Theory. In addition to fluorescence, the excited molecule may also decay to the ground electronic state via non-radiative transitions at critical nuclear configurations, rc (Figure 1.6.2), which correspond to the minima on the excited surface. In the vicinity of rc, ѱ(ground) and ѱ(excited) are mixed, especially when the energy gap ΔE is large and the surface jump is “strongly avoided”, which leads the electron to stay on the same adiabatic surface. 36 On the other hand, when the energy gap is small, the surface jump becomes possible near rc so the electron can therefore go down to the ground state surface . 19 Figure 1.6.2 Avoided crossing. (Turro P157, Fig 6.3c) Another type of non-radiative transitions happen at conical intersections (CI), where the S1 and S0 surfaces intersects(Figure 1.6.3). In this case, there’s no such region where ѱ(ground) and ѱ(excited) are mixed except the CI point. Figure https://commons.wikimedia.org/wiki/File:Conical_intersection.svg 1.6.3 Intersection. Conical 20 The basis of DNA concentration assays using cyanine dyes like YOPRO relies on the fact that these dyes are almost non-fluorescent in aqueous solutions, but intensely fluorescent when intercalated in DNA.1, 9, 10, 37-40 This results from the different configurational distributions in the two environments. In solution, twisting around the linker between the two ring systems leads to geometries that let the excited dye access nonradiative decay through avoided crossing or CI, quenching the fluorescence. However, when intercalated, the environment rigidity increases the constraints on the rotational motion around the linker by forming energy barriers on the pathway, resulting in fluorescence enhancements of greater than 1,000 fold.7, 41 This mechanism has been suggested broadly in the context of molecular photochemistry arising from increasing medium viscosity42; for example, the molecular motions of the stilbenes in solution versus rigid environments. 36, 43 It has been studied that the quantum yield of fluorescence of the stilbenes depends on rotational isomerization which is inhibited by the high viscosities.44 Moreover, in the first excited singlet state of stilbenes, there has been found some vibrational modes that are responsible for the low quantum yield at low viscosities. The population of these modes are all decreased at high viscosities where an increase in the quantum yield is observed. These motion are absent in rigid nonplanar molecules even at low viscosities.43, 44 For YOPRO, this scenario has also been inferred from the similarity between the activation energies for the temperature dependence of non-radiative decay processes and for viscous flow38. Similar results has also been observed for the dyes such as DPTox, T3ox42 and YO38 in a viscous solution, where the rotational mobility around the linker is 21 decreased and the corresponding activation energy for the temperature-dependent nonradiative decay is similar to that for the intercalation. Based on these, a series of hemicyanine derivatives are used as sensors for the viscosity in solutions or biological fluids in vivo. By measuring the lifetime, intensity and anisotropy of the fluorescence, the viscosity of the cell membranes and the cytoplasm can be determined for biological researches of intracellular mass and signal transport, reactive metabolites and biomacromolecules interactions and diffusions, and clinical diagnoses. By making choice of the heterocycles on the hemicyanine dyes, different levels of sensitivity can be achieved.39 In contrast, understanding this mechanism helps to increase the quantum yield of many chromophores used as noninvasive markers. By binding them to rigid macromolecular matrixes to restrict the twisting motions, their fluorescence intensities are raised by several orders of magnitude.41 A good example is the green fluorescent protein (GFP), which is used to mark gene expression and protein localization. Restraining the twisting degrees of freedom of its chromophores by tightly fixing the inside the protein exhibits a quantum yield of ~0.8,40 and the fluorescence emission can be reduced by protease digestion or heat denaturation, when the chromophores regain their flexibility. This is analogue to the intercalation of cyanine dye molecules like YOPRO in the double-stranded DNA. 22 1.7 Kramers Theory In 1940, Kramers developed a model describing the reaction rate of thermally activated barrier crossing processes.45, 46 As shown in Figure 1.7.1, a classical particle of mass M is moving in a asymmetric double-well potential U(x), where the two local minima at xa and xc on the reaction coordinate axis X designate the initial and final states, and the local maximum at xb gives the energy barrier of Eb+. We now derive the escaping rate of the particle over the potential barrier in consequence of Brownian motion. Figure 1.7.1 Potential U(x) with two metastable states A and C. Escape occurs via the forward rate k+ and the backward rate k- respectively, and Eb+ is the corresponding activation energy (Reaction Rate Theory -- 50 Years after Kramers, FIG 3). where γ is the fiction coefficient, and η is fluctuating force. 1.5.1) is neglected, by reordering the equation, we have: Given the Langevin Equation, M!!x = − ∂U(x) (1.7.1) In the overdumped region where the inertial term (left hand side of equation ∂(x) −γ !x +η 23 ¶ = = g 2 + D P x t P x t D + U ¶ x ¶ P x t x ¶ P x t ¶ x ¶ ¶ x ¶ U ¶ x ¶ ( ,) 2 P x t ¶ t ¶ ¶ x g ¶ J ¶ x ¶ γ !x = − ∂U(x) (1.7.2) ∂(x) +η The Fokker-Planck equation for the probability density P(x,t) corresponding to this Langevin equation, is47 1 ( ,) ( ,)] [ 1 ( ,) ] [ ( ,) where J is known as the current density: ( ,) 1 ( ,) with D a diffusion coefficient. Therefore, J can be rewritten as: ( ,) ( ,) ) ( ) (1.7.3) (1.7.4) (1.7.5) U ¶ x g ¶ k T B D U ¶ k T x ¶ B P x t ¶ x ¶ P x t ¶ x ¶ P x t D P x t D ( D g =- =- =- - = - J J ( ) ) U x k T B x ¶ - ¶ e P =- De U x k T B Hence ( ( ) By integrating both sides from A to C, we get: ( ) ) U x k T B x ¶ J e D ( ) U x k T B =- P ( e ¶ ' ' U x k T B e P C A =- U x k T B J D C A ò e dx 24 (1.7.6) (1.7.7) A −e J D kBT e kBT e C∫ A ⇒ J = dx' U(x') De C∫ By assuming P(x=C) is very small, because the activation energy is large compared with kBT, we get: kBT P(x = A)= − U(a) (1.7.8) kBT P(x = A) U(a) U(x') dx' We now want to evaluate the escape rate r. First, if the barrier is high then we have approximate equilibrium, which makes J in equation 1.7.5 close to 0, therefore: ( ( ) (1.7.9) ( ) The probability of finding the particle in the well A is therefore: where Δ is the size of the well A. The integrand is peaked at x=A, hence: 0 ( ) ( ) ( ) ( ) ( ) [ ( )( (1.7.10) (1.7.11) ( ) ) ( P x dx P A e = P A e = U x k T B ) ) 2 ò ( U x k T B x ¶ -+ -+ U A U A x A -+ Þ= P x P A e ( ) ( ) U x - k T B U x - k T B U A x A U A x A ) U x U A - U A k T B dx = U A k T B ´ ò U A k T B A +D A -D e ...] A +D A -D ò e ¶ e A +D A -D ò p = » e ( '' ( ) ( ) ) 1 2 k T B - - k T B P ® )( ' )( ) 2 k T B dx dx dx e dx 25 ) ) ) ( ( ( U A k T B U A k T B A +- -D e A +D 10 ( ò e 2 '' ( )( '' 2 ) () 12 e -D k T p B U a A +D A A +D -D ò ''2( A U A x A - k T B ) 2 ´ - - e =´ e » So - '' C A e e ( ( ) ) ò '' '' U b =´ U A k T B U x k T B U b U b ( P A b x +´ -+ ´ - 2 2 ) 1 2 U A k T ´=´ B p P A e k T p B U A k T p B U A ) 1 2 1 2 ( ) [ ( ) ) 2 ( ) ) ( ' ( ) ( ) ( ) ) ( )( ' ( ) ( ( )( )] 2 for the denominator of the current J in equation 1.7.8, similarly, we have so ( Finally, the Kramers flux over probability escape rate is De U B k T B U b k T B U b k T B dx k T B b x = = = e e e ' J (1.7.12) (1.7.13) (1.7.14) ' ' B B B B +D +D -D e e » ) ò U b dx dx U A k T B b x ´ - k T B -D k T p B U B P A k T p B U B 1 ( )( ò 2 '' ''2( ) ( ) 12 ( ) ) ( 2( ) ( ) 12 '' ( ) ) ( 2( ) ( ) ( ) 12 2 '' ( ) ( ) ( ) 12 '' ( ) ( )] [ 1 '' '' 2 ( ) [ ( )] 1 '' '' 2 P A k T p B U B k T p B U A De U B k T B P A U A k T B e r = = J p ) ( ) ( = = = ´ - k TB Eb - k TB Eb - k TB U A U B U A U B e U A U B e D k T p B D k T p B 2 212 (1.7.15) where Eb= U(B)-U(A) is the barrier height, and ωA and ωB are the square roots of the curvature of the potential surface U at A and B . The escape rate falls 26 ww A B pg e exponentially with the barrier height. It also decreases when the potential surface at point A and B become flatter, namely the curvatures decrease. On the other hand, the diffusion coefficient D decreases with the increase of the viscosity γ in the solution. All these factor decreases the escape rate r of the particle from the potential well A. Note this only applied to the overdumped region where Eb >>kBT, since we assumed the particle is at quasi-equilibrium and the current density J is close to 0. 27 1.8 Gas Phase Approaches HN(CH)n NH + n = 3,4,5 In 1999, Bernhard Schlegel et al. investigated the ultrafast photoisomerization correspond to tri-, penta- and of three cyanine dye models ( heptamethine cyanines, respectively) of different chain lengths using CASSCF quantum chemical calculations.48 In summary, for all the three models, the photoisomerization processes terminate with torsional motions coupled with the decay in the region of the twisted intramolecular charge-transfer(TlCT)49 state with an adjacent conical intersection (CI). Excited to S1 state at the planar conformation, the dyes are driven off the Franck−Condon point through a symmetric skeletal stretching mode along a barrierless path. Then a second non-symmetric vibrational mode is dominated by torsional motion about one of the double bonds in the linker that leads the system towards a 90° twisted configuration which is adjacent to the CI. Schlegel et al. found the intersection is not located at the bottom of the S1 surface but near a fully twisted minimum that corresponds to a twisted intramolecular charge- transfer(TlCT) state. After partial equilibration of the TICT state, the skeletal asymmetric stretching and NH wagging modes modified the equilibrium geometry and triggered the nonradiative decay through the CI, leading to the low quantum yield. Inhibition of this trans-cis isomerization by dissolving the dye in viscous medium or chemically rigidizing the linker50 or by intercalation as found in our work, prevents this decay mechanism, leading to the dramatic fluorescence enhancement. Gao et al. investigated the photoisomerization of 1,10 -dimethyl-2,20 - pyridocyanine (Me-1122P in Figure 1.8.1) both in gas phase and in methanol.51 It was , 28 found that in the gas phase, conical intersections were near the minima of the S1 state, and the S1 decay follows a barrierless pathway to the global minimum of S1 (minS1) before relaxing to the S0 state through the CI. The solvent effects were estimated using the polarizable continuum model (PCM)52. In methanol as well as other solvents with high polarities such as 1-propanol, ethanol and water, the system would reach a stationary structure (minS1-trans in Figure 1.8.2) first before the CI, and a significant barrier exists between the stationary structure and minS1, which results in a much longer lifetime of the excited state. Figure 1.8.1 Me-1122P51 29 Figure 1.8.2 S1 state energy profiles along the constructed Linearly interpolated internal coordinate (LIIC) pathways .The inset illustrates the energy profile in the gas phase between the Franck-Condon point and minS1. 51 30 2 Methods 2.1 2.1.1 Introduction Simulations with Modified Amber-Generated Force Fields Oxazole yellow, known as YOPRO (Figure 2.1.1), is a member of the cyanine dye family. It consists of benzoxazole and quinoline rings connected by a linker of two carbon-carbon bonds. YOPRO is almost non-fluorescent in water but its fluorescence is enhanced ~1800 fold after intercalation in double-stranded DNA, providing the basis of DNA concentration assays1, 8-10. The rationale for this difference is that in solution, rotational motion of the two rings around the linker permits non-radiative decay to the ground state, while when intercalated, twisting is suppressed, eliminate the non-radiative pathway. To explore the conformational sampling of YOPRO, we first simulated it in water and when intercalated in double stranded DNA that is stabilized by a basic leucine zipper (bZIP) protein. Both unrestrained and umbrella sampling molecular dynamics were used to obtain the free energy as a function of rotation around the two dihedral angles of the linker. Three different YOPRO force were built by varying the dihedral force constants of the linker bonds, reflecting different assumptions on the linker bonding. 31 2.1.2 YOPRO Structure and Internal Force Field Parameters Figure 2.1.1 Structure of YOPRO (without hydrogen atoms) with benzoxazole and quinoline rings connected by a cyanine-based linker. Dihedral C2−C6−C10−C11 (D1) and C6−C10−C11−O1 (D2) are the two dihedrals on linker C11−C10−C6 that define the twist angle of the two ring systems. In this example, D1 = 180° and D2 = 162°. A structure of YOPRO was obtained from the ChEBI website53. Its structure was optimized using Gaussian 0354 and the Merz-Kollman fitting procedure55 used to generate partial charges and a mol2 file. Parmchk in Amber56 was used to generate a frcmod file for tleap. In the frcmod file, we modified the dihedral force constants of the two linker dihedrals to produce three models: Model 1: The original force field generated by Parmchk has a double bond C6-C10 and a single bond C10-C11. But consideration of the stable resonant structure has a single bond C6-C10 and a double bond C10-C11. Dihedral force field parameters from the Generalized Amber Force Field (GAFF) of 1.0 kcal/mol for each of the nine contributions to single bonds and 6.65 kcal/mol for each of the four contributions to double bond were then used. Model 2: In the GAFF, the force constant of each of the four torsional energies of a 32 carbon-carbon double bond is set to 26.6/4=6.65 kcal/mol, (the values are half the barriers in accord with the form of the Amber dihedral force field) while that of a single bond is set to 1.4/9= 0.156 kcal/mol. If the numbers of dihedrals involved in a double bond and a single bond are considered, these two force constants are partitioned in the force field by dividing by 4 and 9, respectively. Based on these values, we estimated the force constant for the two YOPRO dihedrals by splitting the difference between single and double bonds; that is, using the geometric average of these partitioned force constants for single and double bonds. The result is close to 1 kcal/mol. This value agrees with the force constant for an sp2 carbon-sp2 carbon bond in the middle of a conjugated system in the GAFF. Thus, we adopt this value for the conjugated cyanine system. Model 3: Both dihedral force constants were reduced by half to 0.5 kcal/mol. 2.1.3 YOPRO Solvation in Water The solvation of YOPRO with TIP3P waters was done using tleap.56 The buffer distance chosen was 8 Å. To achieve electroneutrality of the system, 2 chloride counterions were added. Thus, the YOPRO was surrounded by 751 water molecules and 2 Cl− ions (which made the total size of the system 2312 atoms). A restrained energy minimization was performed using the SANDER module of AMBER12. For both the minimizations and the subsequent MD runs, the long-range Coulombic interactions were handled by the particle mesh Ewald method. The minimizations were performed in 2 stages. In both stages 4000 iterations of steepest descent were performed followed by 4000 iterations of conjugate gradient algorithm. A force 33 constant of 10 kcal/mol/rad2 was applied to the restrained atoms. During the first minimization, the heavy atoms of YOPRO were restrained, which allowed for the optimization of the hydrogens of the waters added by tleap. While in the second one, all atoms in the system are minimized. After the restrained minimizations, a 200 ps heating run was done to 300 K. The heat run used the exact same restraints that were used in the minimizations except that a force constant of 100 kcal/mol/rad2 was applied to the restrained atoms. SHAKE57 was used in the heating and subsequent production runs for bonds involving hydrogen atoms allowing for a 2-fs time step. A Langevin thermostat56 was used to maintain the temperature at 300 K. The heated structures were used as the reference structures for the next 800 ps equilibration run, which allowed the solvent environment to reach their proper densities. At the end of the equilibration run, the density of the YOPRO system was 0.9922 g/cc. The restart files from those equilibration runs were then used as the inputs for all of the subsequent production runs. Then MD runs were performed at 300 K using constant NPT conditions (isothermal-isobaric) for the three models as detailed in Chapter 3. The reference pressure was set equal to 1 bar, and the Berendsen barostat58 used with a pressure- coupling constant of 0.1 psec. The temperature was maintained at 300 K using a Langevin thermostat with a collision frequency of 0.2 ps–1. The crystal structure of GCN4 in the presence of DNA (PDB accession code 2.1.4 bZIP-DNA Solvation in Water 34 1YSA) was used59 to initiate the simulations with bZIP. It was solvated with 22526 TIP3P waters with an 8 Å buffer distance. 25 Na+ counterions were added to achieve electroneutrality. The minimization and heating runs were performed using the same restraints that were used for the “YOPRO in water” system. The heating run lasted 200 ps and was followed by a 400 ps equilibration run. At the end of the equilibration run, the density of the bZIP system was 1.0070 g/cc. The restart file from those equilibration runs were then used as the inputs for the subsequent production MD runs. The 20 ns MD run was performed at 300 K using constant NPT conditions with reference pressure 1 Bar and pressure-coupling constant of 0.1 psec using a Berendsen barostat.58 The temperature was maintained at 300 K using a Langevin thermostat with a collision frequency 2 ps–1. We used PyMOL60 to intercalate YOPRO into the space between the 2 base pairs C4-G38 and T5-A37 in the bZIP structure to obtain a coordinate file for the bZIP- DNA-YOPRO complex. The two oxygen atoms on end bases 1 and 21 were deleted to make the structure compatible with tleap. The bZIP-DNA-YOPRO complex was solvated with 20524 TIP3P waters with a buffer distance of 8 Å. 21 Na+ counterions were added for electroneutrality. The minimization and heating run were performed using the same restraints that were used for the YOPRO in water system. The heating run lasted 200 ps and was followed by the equilibration run. The heated structures were used as the reference structures for the next 200 ps equilibration run, which let the protein maintain its original structure while the solvent environments were 2.1.5 bZIP-DNA-YOPRO Intercalation in Water

 35 allowed to reach their proper densities. At the end of the equilibration run, the density of the bZIP-DNA-YOPRO complex was 1.011g/cc. The restart file from those equilibration runs were then used as the inputs for the subsequent production MD runs. After equilibration, different starting dihedral angles were used, as detailed in the following sections. 2.1.6 Umbrella Sampling of YOPRO Model 1 and Model 2 in Water For the model 1, the umbrella sampling of the dihedral C2-C6-C10-C11 referred to as D1 started from -110 degrees using the restart file from the free simulation. The windows were spaced 20 degrees apart from -170 to 170 degrees. A force constant of 20 kcal/mol/rad2 was used for the windows from -130 to -10 degrees while the force constant of all other windows was 40 kcal/mol/rad2. Each window was started with a 200 ps energy minimization consisting of 500 iterations of steepest descent and 1500 iterations of conjugate gradient algorithm. The next step was 200 ps equilibration. The same conditions as for unrestrained runs were used. A Berendsen thermostat was used to heat the system from 0 to 300 K under constant pressure (1 bar) with a pressure relaxation time of 5 ps. Each window production run was 1 ns. For the model 2, two dimensional umbrella sampling was carried on D1 and D2 (dihedral C6-C10-C11-O1) starting from D1=–120 and D2=0 degrees. The restart file from the free simulation was used. For D1, the “regular” windows were spaced 20 degrees apart from –180 to 180 degrees with the addition of several windows at the extremes for better overlapping (D1=±19 and ±10 degrees). D2 was limited to the 36 interval [-40, 40] with 5 windows spaced 20 degrees apart because the free simulation results indicated the probability density distribution concentrated only in that range. The force constant for D2 was always 20 kcal/mol/rad2, while that of D1 was also 20 kcal/mol/rad2 in all window except those six where the force constant of D1 was set to 60 kcal/mol/rad2: (D1=–10, D2=0, ±20, ±40) and (D1=10, D2=–40). All constraints used except the force constants were the same as those in Model 1. Each window was simulated for 10 ns after 1ns equilibration. All the windows overlap with their neighboring ones very well (Figure 2.1.2). 2.1.7 Unrestrained Intercalation Simulation of Model 1, 2 and 3 Figure 2.1.2 This density distribution shows that these 105 windows mentioned above cover the whole region of D1∈ [-180,180] and D2∈ [-40,40] for the model 2. For the model 1, a 10ns simulation of D1 was started from –165.0 degrees. For both model 2 and 3, D1 was started from –152.4 degrees and D2 was started from – 4.6 degrees, after equilibration. Two successive trajectories of 10 and 13 ns were run for Model 2 and one trajectory of 13 ns was run for Model 3. 37 2.1.8 Umbrella Sampling Simulation of Models 2 and 3 Intercalated 
 Only D1 was scanned, while D2 was constrained around 0 degrees (for model 2) and 20 (for model 3). The restart file from the free simulation was used for start up. The sampling was started from D1=–160 degrees (for model 2) and –140 degrees (for model 3) and then extended to both positive and negative directions to cover the whole range [–340,10]. After unbiasing by WHAM, this dihedral was converted to [- 180,180] for plotting. The force constant of D2 was always 20 kcal/mol/rad2. The force constants of D1 are given in Table 2.1.1 and 2.1.2: Force constant (kcal/mol/rad2) D1 -240 to -340, every 10 degrees 60 40 -170 to -230,every 10 degrees -160 to -120,every 10 degrees 20 -110 to -50, every 10 degrees 40 -40 to -20, every 10 degrees 60 80 -10 to 0, every 5 degrees 1 degree 100 3 degrees 160 200 5 degrees 8 degrees 200 10 degrees 100 Table 2.1.1 Window data for Umbrella sampling of Model 2 intercalated. D1 15 degrees 10 degrees 5 degrees 0 degree -2 degrees -5 degrees Force constant (kcal/mol/rad2) 160 160 140 120 120 120 38 -7 degrees -10 degrees -12 degrees -15 degrees -19 degrees -20 to -30, every 10 degrees -40 to -110, every 10 degrees -120 to -160, every 10 degrees -170 to -230, every 10 degrees -240 to -280, every 10 degrees -290 to -340, every 10 degrees Table 2.1.2 Window data for Umbrella sampling of Model 3 intercalated 120 100 100 60 80 60 40 20 40 60 80 39 2.2 Simulations with Fitted Force Fields by the Implicitly Polarized Charge Method 
 2.2.1 Partial Charge Set Fitting for Water-solvated YOPRO

 The Implicitly Polarized Charge Method(IPolQ)11 described in Section 1.3 was used to fit for the partial charge set for water-solvated YOPRO. After solvating YOPRO in 2238 SPC water molecules, we froze the YOPRO and carried out 10 ns conformational sampling of the solvent at 450K to collect a set of equilibrated water configurations. Next, the MD simulation of the solute YOPRO were performed from each equilibrated water configuration with the solvent fixed to obtain a set of time averaged solute configurations. The fixed locations of the water solvent served to create a field of point charges surrounding the solute. Once the MD simulations were finished, we used orca34 to calculate the single point energy and electrostatic potential of each solute configuration in vacuum for charge fitting at B3LYP/cc-pvTZ level, and took the solvent charge density as a perturbation for the electrostatic potentials in water. The final IPolq charge set was derived from the perturbed electrostatic potential set. The IPolQ charges set was used as the input for the next iteration by replacing the original guess. Once the fitted charge set is converged, the charge fitting process is finished. The IPolQ charge then goes to the force constant fitting section. 40 a6 a8 an bn 2.2.2 Dihedral Force Constant Fitting The force constant fitting method described in Section 1.5 was used to fit for the force constant parameter of D1 and D2 (figure 2.1.1). To start with, 36×36 conformations of YOPRO were generated by applying NMR constraints on the two dihedrals, in vacuum. values ): The following are some fitted force field parameters ( and D2a, b a2 b8 b6 b4 b2 a4 -20.620 4.010 0.645 1.836 174.181 155.290 101.500 -175 8.595 -145.600 10.940 -113.100 91.120 -125 2.592 2.462 0.867 165.630 -120.900 -107.900 55.780 2.286 1.456 1.829 202.610 12.715 -75 -70.410 -25 12.060 2.026 1.684 0.917 194.400 7.009 240.300 25 11.420 2.217 1.458 2.348 171.478 -37.760 125.800 94.930 -1.898 -45.790 2.221 0.502 0.940 165.770 125.700 11.440 75 125 11.470 2.978 2.003 0.246 188.932 143.040 -77.370 -106.500 175 8.920 1.405 2.334 3.090 188.786 22.430 162.860 -105.300 Table 2.2.1 S0 YOPRO a and b values of D1 surface as functions of D2 D1a, b a2 b8 b6 b4 b2 a6 a4 a8 3.437 1.02 -175 225.55 192.52 9.775 1.091 174.767 -0.924 -125 105.8 147.17 10.515 1.615 1.173 1.704 168.71 3.323 -121.3 247.71 69.53 2.829 1.633 1.595 216.69 12.305 -75 -25 8.91 1.683 0.457 0.431 187.063 -28.27 130.61 166.36 25 11.15 1.53 0.936 1.871 167.91 0.261 81.66 137.3 244.04 -32.03 2.368 1.279 1.564 150.78 11.625 -211.5 75 125 11.235 2.846 1.274 0.638 191.13 78.67 -76.14 244.86 175 8.175 3.289 0.783 1.872 184.417 14.44 181.946 208.43 Table 2.2.2 S0 YOPRO a and b values of D2 surface as functions D1 D2a, b a2 b2 41 b4 a4 a6 a8 b6 b8 195.350 145.020 132.980 178.759 166.330 179.736 210.960 177.134 a6 a8 3.742 3.375 1.806 8.553 -38.470 7.520 2.179 5.580 2.747 3.171 3.212 -90.120 5.790 4.004 3.725 -27.000 3.825 4.503 1.315 -7.339 1.223 1.960 2.091 202.530 9.875 2.198 5.295 37.870 3.522 3.143 4.763 15.890 -16.860 -14.480 -175 6.105 -113.800 206.900 -125 10.145 -86.530 -97.600 -75 2.409 0.349 3.228 -25 6.815 -24.470 -23.730 25 7.130 48.600 41.410 75 2.326 86.170 166.440 125 9.795 -4.631 -5.574 175 5.195 Table 2.2.3 S1 YOPRO a and b values of D1 surface as functions of D2 The torsional force constants of D1 are minimized at D2= ±75, which corresponds to high energy regions on the quantum energy surface of S1 YOPRO. D2a,b b6 b8 a2 -95.060 92.930 11.105 -175 -55.720 589.300 13.855 -125 208.160 -75 164.430 12.830 31.450 -9.114 10.140 -25 84.830 25 11.160 6.343 170.567 75 19.335 198.240 9.348 61.610 14.565 125 -21.640 175 10.055 -93.940 Table 2.2.4 S1 YOPRO a and b values of D2 surface as functions D1 The torsional force constants of D2 are maximized at D1= ±75, which corresponds to low energy regions on the quantum energy surface of S1 YOPRO. The final fitted result shows that in the S0 state, the two linker bonds are more or less in the same bond order, as expected for the ground state of YOPRO, while in the S1 state, D1 (see Figure 2.1.1) turns to be more like a pure single bond and D2 more like a pure double bond. a4 b2 2.344 0.914 0.565 183.751 3.599 2.018 0.195 188.427 8.380 4.466 1.256 146.220 0.889 2.142 1.866 177.814 0.985 0.905 0.564 174.326 12.31 5.855 1.534 209.330 2.001 1.653 0.452 175.116 3.188 0.230 0.629 180.027 b4 -7.855 63.950 -22.980 -114.000 247.670 18.080 -48.170 29.220 2.2.3 Umbrella Sampling of YOPRO in Water on the S0 and S1 Surfaces Two dimensional umbrella sampling on the S0 surface was carried out on dihedrals D1 (C2-C6-C10-C11) and D2 (C6-C10-C11-O1). Note that the dihedral 42 potentials that we developed, coupling D1 and D2, are well-suited to the umbrella sampling method whereby restricted window ranges for these dihedrals are used. Otherwise, in a free simulation, the specific dihedral-dependent force field would have to be introduced when the dihedrals’ would reach different regions of their angles. That would require monitoring the dihedral values and stopping and restarting the simulation for the new force field values. The windows were started from the trans-cis conformation (180, 0) which is around the global minimum on the quantum chemical energy surface with the corresponding fitted internal force constant sets. The starting window was initialized with a minimization followed by a 5ns equilibration at 300 K using constant NPT (isothermal-isobaric) conditions, and the restart file was passed to start the equilibration of its neighboring windows. Each adjacent window was spaced by 10 degrees along D1 or D2. Finally, the sampling range expanded to the full space (360⁰×360⁰) covered by 36×36 windows. Once all the windows were equilibrated, their production runs were started at the same time. All minimization, equilibration and production runs were carried out with a restraint force constant of 10 kcal/mol/rad2 for both D1 and D2. All the umbrella samplings were unbiased with the WHAM methodology61, 62 to obtain the two dimensional free energy surfaces along the D1 and D2 dihedral angles. For the S1 surface, two ways of generating the window data were used. The first way was started at the same places as in S0 sampling. The exact same protocol 43 was used, except that the internal force fields of YOPRO for each window were those for the S1 state. In the second way, the umbrella sampling was started at all four minima: (0,0), (0,180), (180,0) and (180,180) on the “subtracted” S0 surface of YOPRO for the force constant parameter fitting as described in Section 1.5 with the corresponding fitted internal force constant sets for the S1 state. The same protocols were used as in the first method, except this time we sampled four regions covered by 15×15 windows with the four starting windows in the center. This protocol is used since the minima on the S0 surface are the Frank-Condon transition points as discussed in Section 1.6. Later on this protocol is used for the S0 and S1 intercalations as discussed in the next section because during the full space sampling of intercalated YOPRO, deintercalation always happens which must be avoided because YOPRO never go back to the “box” after that. The same protocols were used to sample intercalated YOPRO in the S0 and S1 states as for the second sampling method of S1 YOPRO in water outlined in Section 2.2. When the production runs were finished, VMD63 was used to view the trajectories of the windows to exclude those where YOPRO de-intercalated. 2.2.4 Umbrella Sampling Simulation of Intercalated YOPRO Surfaces 44 3 Results 3.1 bZIP-DNA in Water Figure 3.1.1 The bZIP crystal structure59 showing the two monomers composed of leucine zipper and basic domains. The dsDNA that is perpendicular to the protein dimer is held in place by the basic domains. The bZIP-dsDNA complex is comprised of 40 paired bases and two leucine zipper monomers each with 57 residues. The crystal structure of GCN4 in the presence of DNA (PDB accession code 1YSA)59 was used to initiate the simulations (see Chapter 2). Figure 3.1.1 displays the two alpha-helical monomers and the DNA that is bound essentially perpendicular to dimerized bZIP. The bZIP-DNA was first simulated without YOPRO intercalated for 20 ns. Root mean square fluctuations (RMSFs) were traced for DNA (nucleic acid residues) and bZIP (protein residues) separately, as shown in Figure 3.1.2. For the double stranded dsDNA we ignored the bases at the beginnings of the two strands so the RMSFs of DNA start from base 2 and end at base 40, with one strand labeled as 2-20 and the other 22-40. The DNA is quite stable, with greater fluctuations at its extremities where it is not as constrained by interactions with bZIP. 45 6 RMSF of the nucleic residues 5 6 RMSF of the protein residues 5 ) Å ( F S M R 4 3 2 1 0 ) Å ( F S M R 4 3 2 1 0 0 40 10 30 nucleic residues 20 40 80 100 120 140 160 60 protein residues Figure 3.1.2 RMSFs of the DNA bases and protein residues for bZIP-DNA. The dsDNA strands are numbered 2-20 and 22-40. The protein monomers are numbered 41-97 and 98-154. The bases and residues are quite stable with the protein dimer maintained and base pairing maintained with greater fluctuations at the ends of both monomers and both DNA strands. In order to simulate bZIP with intercalated YOPRO space must be provided between paired bases. The procedure and results for doing so are now presented. 46 3.2 Preparing Space in bZIP-DNA for YOPRO intercalation Based on experiments intercalating YOYO, which consists of two YOPRO moieties connected by a bridge region, sites for YOPRO intercalation were found. The rings of each YOPRO predominantly intercalated in a double stranded, palindromic dsDNA consisting of 16 or 24 residues. The YOYO rings intercalate preferentially to d1(5'- CTAG-3') : d2(3’-GATC-5’) binding sites. 64 Therefore, we assume that YOPRO intercalates into d1(CT):d2(GA) and d1(AG):d2(TC) binding sites. We picked one d1(5’-C4T5-3’):d2(3’-G38A37-5’) and used constraint MD to gradually separate these neighbor base pairs to provide intercalation space for YOPRO. The expansion protocol is detailed in the Methods section. Figure 3.2.1 The hydrogen bonded base pairs, G38-C4 and A37-T5 separated for YOPRO intercalation, with the three GC and two AT hydrogen bond distances at the end of the separation indicated. Figure 3.2.1 displays the endpoint of the separation simulation and Table 3.2.1 provides key distances. The figure shows that the five hydrogen bonds between the 47 GC and AT bases are preserved. The distances that measure the expansion of the space between the neighbor base pairs increase from about 3.5 to 9.4 Å. This about 6 Å increase in separation is required to intercalate YOPRO. T5C5- Atom pairs T5C5- C4C2- G38C4 A37C2 A37C2 6.44 5.49 7.06 Starting Distance (Å) 10.9 11.19 Ending 5.49 Distance (Å) Table 3.2.1 Box expanded atom pair distances. Plots of the hydrogen bond distances (Figure 3.2.2) show that all five are well maintained throughout the expansion. G38C4- A37C2 3.59 9.39 C4C2- G38C4 6.08 6.08 C4C2- T5C5 3.37 9.37 48 Figure 3.2.2 During the expansion to provide room for YOPRO intercalation, the base pairing hydrogen bonds are well maintained. 49 3.3 YOPRO Models 1,2 and 3 3.3.1 YOPRO Model 1 Intercalation in bZIP-DNA The structure of YOPRO is given in Figure 2.1.1.1. As commonly drawn64, 65, the linker between the benzoxazole and the quinoline ring systems consists of a single bond (C11-C10) and a double bond (C10-C6). Therefore, the dihedral defined by C2- C6-C10-C11 is sufficient to represent the angle between the two rings systems. This YOPRO will be referred to as YOPRO Model 1. Standard internal force field parameters for these single and double bonds were set by the Generalized Amber Force Field (GAFF), as discussed in Chapter 2. YOPRO was manually docked into the ‘box’ in Figure 3.2.1, and 10 ns MD of this bZIP-DNA-YOPRO complex initiated. Figure 3.3.1 shows the intercalated YOPRO along with some distances that are used to characterize its bounding box. 50 Figure 3.3.1 A snapshot from the 10 ns MD trajectory of YOPRO intercalated into the ‘box’ formed from the separating the base pairs. Six distances are used to characterize the stability of the box: A37C2-C4C2, T5C5-C4C2, G38C4-T5C5, G38C4-C4C2, A37C2- T5C5, and A37C2-G38C4. The five GC and AT hydrogen bonds and the box dimensions were well maintained during the simulation, as summarized in Tables 3.3.1-3.3.2. Standard deviations (Å) H bonds A37N1-T5N3 0.1082 A37N6-T5O4 0.1921 G38N1-C4N3 0.1065 0.1309 G38N2-C4O2 G38O6-C4N4 0.1965 Table 3.3.1 Hydrogen bond averages and standard deviations of the YOPRO- intercalated bases over a 10 ns simulation. Average bond lengths (Å) 2.944 3.017 2.961 2.896 2.969 51 Average distances (Å) Atom pairs Standard deviations (Å) 8.739 A37C2-C4C2 0.5273 7.754 A37C2-G38C4 0.5518 5.992 A37C2-T5C5 0.1656 6.268 G38C4-C4C2 0.1157 10.407 G38C4-T5C5 0.5296 7.408 T5C5-C4C2 0.5799 Table 3.3.2 Averages and standard deviations of the YOPRO-intercalated box dimensions over a 10 ns simulation. In addition, the location of YOPRO relative to the box was parameterized by some distances between atoms belonging to YOPRO and the bases, as displayed in Figure 3.3.2. Figure 3.3.2 Base-to-YOPRO distances used to characterize the intercalation extent, for a snapshot from the 10 ns MD trajectory. Monitoring these distances over the MD trajectory (see Table 3.3.3) shows that YOPRO always remains inside the box in the sense that both rings are kept within the enclosure of the four bases. 52 3.3.2 YOPRO Model 1 in Water Ave distances (Å) 4.3574 5.5450 3.8205 5.7346 Atom pairs Standard deviations (Å) 0.9552 A37N7-YOPRO C3 1.9467 C4C2-YOPRO C14 0.5871 G38N3-YOPRO C8 1.4060 T5C4-YOPRO C17 Table 3.3.3 Averages and standard deviations of distances between G, C, A and T and indicated YOPRO atoms over the MD trajectory. The propanamine “tail” does stick out of the base enclosure; however, what is key is that the two ring systems remain well intercalated during the MD trajectory. Using the model 1 force field (see Section 2.1), two simulations of YOPRO Model 1 were carried out in water. First, a 10 ns simulation started from a configuration with the single bond dihedral D1 (C2-C6-C10-C11) at angle –170.5 degrees, obtained from the intercalated structure. This unrestrained simulation covered a small dihedral range. A histogram of the dihedral probability shows that the dihedral fluctuated around –120 degrees. This is far from a planar conformation of the two ring systems. However, even in water, MD may not indicate the global free energy minimum of a system by not being able to surmount barriers in the complex potential energy surface. Therefore, we carried out standard umbrella sampling66, 67 to obtain the probability of observing this dihedral angle. Nineteen windows, each simulated for 1 ns, were used to cover the range (–180,180), as detailed in the Section 2.1. The umbrella constraints were unbiased with WHAM27, 61, 62. Figure 3.3.3 shows that the dihedral angle is indeed restrained to about –115 degrees while the unrestrained simulation dihedral probability agrees well over its limited range with that obtained from the umbrella sampling. 53 Figure 3.3.3 PMF of the dihedral D1(C2-C6-C10-C11). The red line is the result of the unrestrained simulation while the black line is obtained from umbrella sampling. The second minimum at around 90 degree is about 4 kcal/mol higher than the , relative to the global global minimum, indicating that it has a population, minimum. Thus, in water the YOPRO the single bond C2-C6-C10-C11 dihedral is quite well restricted to sampling around –120 degrees. P ∼ 0.0007 3.3.3 YOPRO Model 1 Intercalated To see if the intercalated YOPRO model 1 would lead to a more planar configuration of the ring systems, we simulated the intercalated bZIP-DNA-YOPRO complex for 10 ns starting from the geometry at the end of the docking intercalation run in Section 3.3.1, when D1 was –165.0 degrees. A histogram of D1 is displayed in Figure 3.3.4. 54 Figure 3.3.4 PMF of the dihedral D1(C2-C6-C10-C11). The red line is the result of the unrestrained simulation while the black line is obtained from umbrella sampling. The unrestrained simulation of intercalated YOPRO shows that this dihedral fluctuates in a range around –145 degrees, which is not very different from that obtained with YOPRO in water (see Figure 3.3.4). Thus, the differentiation between YOPRO in water and intercalated is not evident here. Of course, it is possible that if a different initial D1 angle were used, YOPRO could fluctuate around some other minimum angle. However, rather than pursue an umbrella sampling simulation to obtain a PMF around this dihedral, a more realistic model for the YOPRO linker will be now introduced in the next section. With regard to the overall structure of the bZIP-DNA-YOPRO complex, Figure 3.3.5 shows the RMSFs of the bases and the residues for the complex, along with those for bZIP-DNA. The RMSFs are all quite similar. Thus, both in terms of overall structure and the specifics of the YOPRO intercalation, the bZIP-DNA-YOPRO complex is stable and the YOPRO ring systems are properly intercalated within the four bases. 55 6 4 2 ) Å ( F S M R RMSF of the nucleic residues bZIP-DNA bZIP-DNA-YOPRO RMSF of the protein residues bZIP-DNA bZIP-DNA-YOPRO 8 6 4 2 ) Å ( F S M R 40 60 80 100 120 140 160 0 40 0 0 20 10 nucleic residues 30 protein residues 3.3.4 Revision of the YOPRO Force Field, Model 2 Figure 3.3.5 Model 1 RMSFs of the DNA bases and bZIP residues compared for bZIP- DNA and bZIP-DNA with intercalated YOPRO over their MD trajectories. The small difference between the YOPRO single bond dihedral angle in both water and when intercalated brought us to the idea of reconsidering the accuracy of the force field for the atoms forming the linker between the ring systems. Indeed, the quantum chemical optimization (see Section 2.1) showed that the two bonds forming the linker are essentially the same length, 1.4 Å, and, therefore, should be of the same bond order, as appropriate to a more-or-less symmetrical cyanine dye. Cyanine dyes with the structure Aryl=N+=CH[CH=CH]n-N=Aryl are generally considered as delocalized systems.68, 69 That seems to be the case here. Thus, the two linker bonds are considered as delocalized “bond-and-a-half” bonds and modifications to the generalized amber force field (GAFF) were made. The result is that the torsional angle force constants for the two dihedrals D1 and D2 were set to values that interpolate single and double bonds (see Section 2.1 for details). This model will be referred to as YOPRO Model 2. 56 3.3.5 YOPRO Model 2 in Water We carried out a 270 ns simulation of YOPRO in water with this revised force field. Now the dihedral angles around the both bonds in the YOPRO linker were monitored. D1 is the original dihedral, while the new one, D2, is defined as C6-C10- C11-O1 (see Figure 2.1). Figure 3.3.6 shows that D2 fluctuates around zero degrees, while D1 now has four positions of high probability: ±70 and ±140 degrees, which is quite different from the previous result shown in Figure 3.3.6. Figure 3.3.6 The probability density for the two dihedrals for the unrestrained YOPRO Model 2 in water. We also carried out two dimensional umbrella sampling to support this result, as detailed in the Section 2.5. A total of 105 windows were used to sample D1 from – 180 to 180 degrees and D2 from –40 to 40 degrees, to cover the high probability ranges of the unrestrained simulation. The probability are displayed in Figure 3.3.7. 57 Figure 3.3.7 The probability density (top) for the two dihedrals from the 2D umbrella sampling simulation. This result is in good agreement with that obtained from the unrestrained simulation in the sense that D1 peaks are at ±65 and ±130 degrees, the same position as those in the free simulation. To investigate why the probability distributes like this, we used PYMOL60 to constrain D2 at 0 and rotated the benzoxazole ring around the bond between C6 and C10 and measured the distances between different atom pairs every 20 degrees. As is evident in Figure 2.1, the C7-O1 and C5-O1 are two atom pairs for which their distances could sample smaller than van der Waals contact. Assuming that these distances might be the dominant factors in the PMF, we evaluated the sum of the Lennard-Jones potentials of these two atom pairs at different D1 values. The results show that when D1 approaches zero degrees, both C7-O1 and C5-O1 distances become very short, leading to Lennard-Jones potential energies that are much larger 58 ± 3.3.6 YOPRO Model 2 Intercalated than thermal. This clash explains much of the structure of Figure 3.3.6. There is more structure in the probability surface of the umbrella sampling. However, the slight enhanced probability around 65 degrees corresponds to only about a 1 kcal/mol barrier in the PMF, which is on the thermal energy scale. We then carried out a 10 ns simulation on intercalated YOPRO with this modified force field. Table 3.3.4 shows that the five intercalating base-base hydrogen bonds are well maintained and Table 3.3.5 shows that the atom pair distances of the box that confines YOPRO are stable. H bonds Average bond lengths (Å) G38N1-C4N3 2.952 G38O6-C4N4 2.926 2.996 A37N6-T5O4 A37N1-T5N3 2.956 G38N2-C4O2 2.866 Table 3.3.4 The intercalating hydrogen bond average lengths and standard deviations over the 10 ns simulation. Average distances (Å) Atom pairs A37C2-C4C2 8.535 A37C2-G38C4 7.449 A37C2-T5C5 6.007 7.166 T5C5-C4C2 G38C4-C4C2 6.286 G38C5-T5C4 10.137 Table 3.3.5 The box atom pair average lengths and standard deviations over the 10 ns simulation. The distance monitors for the YOPRO ring systems remaining between the confining bases displayed in Table 3.3.6also show that the intercalation is stable. Standard deviations (Å) 0.08896 0.1410 0.1968 0.1086 0.1201 Standard deviations (Å) 0.4529 0.4120 0.1595 0.6327 0.1082 0.7354 59 Average distances(Å) 4.356 4.283 3.966 4.900 Atoms pairs C4C2-YOPRO C14 T5C4-YOPRO C17 A37N7-YOPRO C3 G38N3-YOPRO C8 Table 3.3.6 The atom pairs average lengths and standard deviations (see Figure 7) for the distances used to characterize the intercalation extent over the 10 ns simulation. Figure 3.3.8 demonstrates that there is no significant difference of the RMSF profiles between the solution and the intercalation simulations with the new force field: Standard deviation (Å) 0.5716 0.4600 0.4196 0.9586 Figure 3.3.8 Model 2 RMSFs of the DNA bases and bZIP residues compared for bZIP- DNA (black) and bZIP-DNA with intercalated YOPRO (red) over their MD trajectories. This 10 ns MD trajectory was started from –150 degrees for D1 and 0 degrees for D2. Figure 3.3.9 shows that D2 still fluctuates around 0 degrees, as in solution, while D1 is fixed at around –165 degrees. The behavior is significantly different from the solution model 2 YOPRO result shown in Figure 3.3.7. It indicates that intercalated YOPRO in this model is reasonably close to planar. Certainly much more so than in solution. 60 Figure 3.3.9 The dihedral angles D1 and D2 for intercalated YOPRO Model 2.
 To confirm the conclusions regarding the stability of the YOPRO intercalation and its negligible effect on the structure of the bZIP-DNA-YOPRO complex the simulation was continued for another 13 ns. The results are presented as follows and are, to statistical fluctuations, the same as what has been presented in this section. H bonds Standard deviations (Å) Average bond lengths (Å) A37N1-T5N3 2.9473 0.1043 0.1882 2.9983 A37N6-T5O4 0.08568 2.9530 G38N1-C4N3 0.1222 2.8702 G38N2-C4O2 0.1383 2.9232 G38O6-C4N4 Table 3.3.7 The hydrogen bond average lengths and standard deviations over the 13 ns simulation. All H bond are well maintained. Atom pairs Standard deviations (Å) Average distances (Å) T5C5-C4C2 7.0020 0.7810 G38C4-T5C5 9.9039 0.8569 0.4828 8.5856 A37C2-C4C2 A37C2-G38C4 7.4347 0.4508 A37C2-T5C5 5.9951 0.1620 G38C4-C4C2 6.2891 0.1088 Table 3.3.8 The box atom pair average lengths and standard deviations over the 13 ns simulation. 61 Atom pairs Standard deviations (Å) A37N7-YOPRO C3 0.4155 C4C2-YOPRO C14 0.5404 G38N3-YOPRO C8 1.0278 T5C4-YOPRO C17 0.5268 Table 3.3.9 The atom pair average lengths and standard deviations for the distances used to characterize the intercalation extent over the 13 ns simulation. Average distances (Å) 4.0691 4.2586 5.2105 4.3699 Figure 3.3.10 Model 2 extended dihedral angles D1 and D2 for intercalated YOPRO. As for YOPRO model 1 in water, we carried out umbrella sampling to investigate this result. Of course, with an intercalated YOPRO, if a PMF around one or both linker dihedrals is constructed there is the distinct possibility that, for some twist angle range between the two ring systems, YOPRO will no longer be intercalated and/or the bases will no longer be hydrogen bonded, or form a box. The windows procedure was started from D1 at –152.4 and D2 at –4.6 degrees, where the probability is maximum. The D1 dihedral was restrained to cover the range (– 180,180) while the D2 dihedral was restrained around 0 degrees (see Section 2.1). This probability is shown in Figure 3.3.11. 62 Figure 3.3.11 The intercalated YOPRO probability density (top) for the two dihedrals from a 2D umbrella sampling simulation. The minimum on the PMF profile shifted from about –160 to –135 degrees. This is going away from the more planar conformation of the intercalated MD simulation. To understand this result, we explored the possibility that the ring system might no longer be well confined by the box formed from the surrounding bases. If, for example, the benzoxazole ring protruded from the box, its rotational motion could become unfrozen. As indicated in Figure 3.3.12, the G38N3-YOPRO C8 and A37N7- YOPRO C3 distances can be used to characterize the quinoline ring position relative to the bases, while the T5C4-YOPRO C17 and C4C2-YOPRO C14 distances can be used to characterize the benzoxazole ring position relative to the bases. 63 Figure 3.3.12 Four characteristic distances (see Figure 3.3.2) to monitor the positions of the ring systems, as a function of the D1 dihedral angle. Figure 3.3.12 plots these four characteristic distances as a function of the D1 dihedral angle. The T5C4-YOPRO C17 distance rapidly increases when D1 goes beyond –160 degrees, indicating that the benzoxazole ring does go out of the box when the conformation of YOPRO is constrained to be further from planar. The relative location of YOPRO shown in Figure 3.3.13 supports this conclusion. 64 Figure 3.3.13 The last snapshot from the window where D1 is constrained at –120 degrees. In this twisted configuration the benzoxazole ring has moved out of the box formed from the surrounding bases. It is significant that the five hydrogen bonds of the box base pairs are still well- preserved during the windowing, as shown in Figure 3.3.14. Thus, the box conformation didn’t undergo very significant changes when the benzoxazole ring departs from the box confines suggesting the existence of another binding mode. 65 Figure 3.3.14 Model 2 average and standard deviations of the three AT and two GC hydrogen bonds as a function of the angle D1. The hydrogen bonding of the four bases is maintained even as the oxazole ring is no longer intercalated. In addition, the RMSFs of the residues for the most twisted window, (–120, 0), also show that they are hardly affected by the benzoxazole de-intercalation as below. Figure 3.3.15 Model 2 extended RMSFs of the DNA bases and bZIP residues compared for bZIP-DNA and bZIP-DNA with intercalated YOPRO of the most twisted window, (-120, 0) over their MD trajectories. That YOPRO is a conjugated system is quite clear from the quantum chemistry and from previous work on cyanine dyes as a class.68-70 It is worth exploring a 3.3.7 YOPRO Model 3 66 somewhat reduced value for these force constants to see what sensitivity there is to the bridging dihedral force constants. We use the values 0.5 kcal/mol for this model 3 as a value that is about the thermal energy. Figure 3.3.8.1 displays the two dihedral probability distributions. YOPRO was simulated in water for 30 ns with the new force constant values. 3.3.8 YOPRO Model 3 in Water Figure 3.3.16 The dihedral angles D1 and D2 for unrestrained YOPRO Model 3 in water. The D1 probability distribution is similar to Model 2 (see Figure 3.3.6) but without the “holes” in the distribution. D2 is still peaked around zero degrees. Thus there is some influence from the reduction in force constant, but the features that YOPRO in water is far from planar and samples broad ranges of these dihedral angles are still in evidence. 67 3.3.9 YOPRO Model 3 Intercalated The bZIP-DNA complex with YOPRO intercalated was simulated for 13 ns. Monitors of the stability of Model 3 intercalation are presented in the figure 3.3.17 and the tables 3.3.10–3.3.12. Figure 3.3.18 displays the probability distributions of the dihedrals. Figure 3.3.17 Model 3 RMSF of the DNA bases and bZIP residues compared for bZIP- DNA and bZIP-DNA with intercalated YOPRO over their MD trajectories. Figure 3.3.18 The dihedral angles D1 and D2 for intercalated YOPRO Model 3. D1 is now centered at –140 degrees, while the equilibrium position of D2 is shifted to 20 degrees. Therefore the angle between the two ring systems is 68 Standard deviations (Å) 0.1463 0.1338 0.09067 0.3313 0.1527 degrees. That is a significant shift away from planarity in the Model 2 intercalation simulation. Furthermore, in contrast to Model 2, in this simulation, the oxazole ring does not remain intercalated (see Table 3.3.12). After about 3 ns over an interval of about 200 ps its position shifts from being intercalated to de- intercalated, much like in the Model 2 PMF simulation. H bonds: Average bond lengths (Å) G38O6-C4N4h: 2.918 G38N2-C4O2h: 2.901 2.953 G38N1-C4N3h: A37N6-T5O4h: 3.184 2.962 A37N1-T5N3h: Table 3.3.10 The hydrogen bond average lengths and standard deviations over the simulation. All H bond are well maintained. Atom pairs Average distances (Å) 8.3688 A37C2-C4C2 7.6844 A37C2-G38C4 5.9107 A37C2-T5C5 6.2544 G38C4-C4C2 G38C4-T5C5 9.4147 Table 3.3.11 The box atom pair average lengths and standard deviations over the simulation. Atom pairs Standard deviations (Å) A37N7-YOPRO C3 0.6491 C4C2-YOPRO C14 0.5057 0.4293 G38N3-YOPRO C8 T5C4-YOPRO C17 0.4282 Table 3.3.12 The atom pair average lengths and standard deviations for the distances used to characterize the intercalation extent. Standard deviations (Å) 0.4334 0.3977 0.1753 0.1143 0.6400 −140 + 20 = −120 Average distances (Å) 5.5613 4.5867 4.1045 7.7156 69 Figure 3.3.19 The intercalated YOPRO probability density for the two dihedrals from the 2D umbrella sampling simulation for YOPRO Model 3. Similar to model 2, we also carried out umbrella sampling to explore the free energy surface around the probability maxima of the unrestrained simulation, with D1 around –140 and D2 around 20 degrees as shown in Figure 3.3.19. Windows were spaced 10 degrees apart for D1 while D2 was always constrained around 20 degrees. The details are given in the Section 2.1. The free energy minimum is found around D1 = –135 degrees, which is consistent with the result from the unrestrained simulation. This is quite far from the planar conformation, indicating that the reduced force constants leads to deviations from planarity, compared to the original force constants 1 kcal/mol of Model 2. This PMF run was initiated from the end of the unrestrained run with the oxazole ring de- intercalated, and measurement of the characteristic distances shows that it remains de-intercalated. 70 3.3.10 Discussion and Conclusion Remarks of the Three Models Three models of YOPRO were investigated. In model 1 the two linker bonds are treated as a single (C10-C11) and a double bond (C10-C11) with standard GAFF values for the dihedrals associated with these bonds. For models 2 and 3 both bonds are treated as appropriate to a delocalized bonds resulting in dihedral force constants that are intermediate between single and double bond values, as detailed in the Methods Section. Model 3 was generated by simply reducing the model 2 value by a factor of two. Thus, model 2 should be considered as the most accurate model based on the GAFF methodology. The results of our simulations are summarized in Table 3.3.13. Only the peak positions are indicated, but looking at the various plots indicated in the table shows that the intercalated simulations lead to somewhat more constrained dihedral angle sampling than do the water simulations. Model 1 (a) Model 3 Model 2 D1 D2 D2 D1 D1 –100 (b) ±100 (d) Water, unrestrained 0 (d) ±70; ±140 (c) –115(e) – Water, window ±65; ±130 (f) – 20 (i) –140 (i) –165 (h) –145 (g) Inter, unrestrained – –135 (k) Inter, window –135 (j) – Table 3.3.13 Dihedral probability peaks (degrees) for all the simulations. (a) Only D1 fluctuates. (b) Figure 3.3.2.1. (c) Figure 3.3.5.1. (d) Figure 3.3.8.1. (e) Figure 3.3.2.1. (f) Figure 3.3.5.2.(g) Figure 3.3.3.1. (h) Figure 3.3.6.2. (i) Figure 3.3.9.2. (j) Figure 3.3.6.4.(k) Figure 3.3.9.3 Model 1 in water shows that YOPRO is far from planar in the unrestrained simulation and is in good agreement with the umbrella sampling simulation over the range sampled by the unrestrained simulation. That the unrestrained sampling of D1 is so limited does show that even in water there are substantial barriers to this dihedral. Turning this dihedral introduces some van der Waals clashes within YOPRO, D2 0 (c) ±10 (f) 0 (h) – 71 along with various solvation effects. When intercalated, D1’s sampling shifts from peaking around–120 to around –145 degrees. Though the YOPRO intercalated between the base pairs is somewhat more planar than the unrestrained YOPRO in water, both are far from planar. In model 2, the water unrestrained and window simulations qualitatively agree with each other. The water simulation has four peaks in the probability distributions. The D1 (C2-C6-C10-C11 dihedral) is always far from planar and the D2 (C6-C10-C11-O1) dihedral samples around zero degrees. The window simulation does put more of the D1 weight on the ±65 versus ±130 degree peaks than does the unrestrained simulation. In either case a large range of angles between the planes of the two rings is sampled. The model 2 intercalated unrestrained simulation shows that the angle between the two ring systems is much closer to planar with (D1, D2) around (–165,0) degrees. The bases surrounding YOPRO are quite constraining though there are still fluctuations of both D1 and D2 around their average values. But the fluctuations are certainly much smaller than in water. In the corresponding model 2 intercalation window simulation, where D2 was restrained around zero degrees and D1 rotated through its entire range, the possibility of de-intercalation of either one of or both ring systems, break-up of the four intercalating bases and general decomposition of the bZIP-DNA complex arises. What did happen, actually, is a shift of the D1 peak from –165 to –135 with a corresponding change in the position of the benzoxazole ring so as to protrude from its confining box (see Figure 3.3.13), as D1 was rotated over this range. The 72 probability displayed in Figure 3.3.11 strongly favors this conformation. Note that D1 and D2 are, respectively, proximally associated with the quinoline and benzoxazole ring systems. Nevertheless it is the benzoxazole that comes out of its confining box. That is potentially due to the pyrimidine bases that are associated with the benzoxazole ring system (see Figure 3.3.13) presenting less ring surface area than do the purine bases. In spite of the benzoxazole de-intercalation, the hydrogen bonding of the four confining bases was not significantly altered (see Figure 3.3.14) for all the sampled D1 values indicating that this binding mode does not disturb the DNA pairing. The experimental linear dichroism data on YOPRO that suggested a surface binding mode at higher concentration was based on an excitonic71 interaction mechanism that indicates sufficiently close multiple dye binding to the DNA. Clearly, in the MD with one YOPRO molecule, such effects are beyond the scope of this investigation. While, it should be understood that any MD force field used may not completely reflect reality, it is still of interest that an external binding mode of lower free energy can be found, at least in a reaction coordinate-based, window simulation. Model 3 also has both linker bonds equivalent but with reduced (by a factor of two) dihedral force constant values relative to model 2. The values are not fundamentally based, but were introduced to explore the consequences of increasing the ability to twist around these dihedrals. In water, the D2 probability mainly peaks around zero degrees, and is essentially the same as model 2. Along the D1 dihedral coordinate, the four peak behavior of model 2 is smeared out into two peaks. Thus reduction of the dihedral potential, though intrinsically weak (from a barrier of 2 to 1 kcal/mol, so essentially thermal) has a modest but noticeable effect on the 73 probability along the D1 coordinate. The sampling is very diffuse and certainly far from producing planar configurations. When intercalated, the unrestrained model 3 simulation produces an average D1+D2 angle (essentially the angle between the planes of the two ring systems) of – 120 degrees. This is substantially different than the much more planar arrangement in Model 2. This difference can be traced to the “spontaneous” de-intercalation that occurs rapidly in the unrestrained simulation. Once the benzoxazole ring is no longer confined by the surrounding bases, it becomes freer in its ability to rotate. An exploration of the PMF with D2 restrained around 20 degrees, its average position in the unrestrained simulation, and D1 rotating through its entire range finds the free energy minimum around –135 degrees. Thus, again a stable de-intercalated conformation is found. In summary, model 2 that is best among the three from the point of view of force field development, leads to a view of the intercalation that is in good agreement with experimental data.64, 71 Namely, YOPRO does intercalate in a cage of four bases with a relatively planar conformation of the two ring systems. Furthermore, the orientation of the YOPRO ring planes is essentially perpendicular to direction of the dsDNA. Another potential mode of YOPRO binding is found where the oxazole ring is de-intercalated. A surface binding mode is also inferred in the experimental data71 at higher YOPRO concentrations. 74 3.4 Defects of the Three Models By examining three distinct parameterizations of the methine linker differing in the dihedral force constants, we revealed considerable sensitivity of the dye’s conformational behavior and intercalation stability to the parameterization of the dihedral potentials involved. This finding points to the improper description of YOPRO and other cyanine dyes by automated force field parameter-assigning tools and would be relevant for future MD studies of YOPRO and related dyes. However, these modified Amber-Generated models have several severe defects. Firstly, these models are not parameterized from quantum chemical calculations but based on the empirical estimations on the delocalized system on the linker. Therefore, the choice of Model 2 as the most accurate parameterization may not be reasonable. Most importantly, none of these models investigated the possible correlations between D1 and D2 in this conjugated system. As we show in Section 3.3, this turns out to be crucial to the correct description of the linker dihedral force field and has a major influence on the sampling in both water and when intercalated. Secondly, only one planar conformation (trans-cis as shown in Figure 3.3.2) was used to initialize the simulations of the intercalation, which leads to the limitation of the sampling range of the intercalated YORPO. Intuitively, all four planar conformations should be considered equivalently as the initial state since every time YOPRO is “trapped” in one conformation by the box constraints and can hardly jump to another without the de-intercalation. So the chosen trans-cis conformation may not represent all intercalation cases. 75 Thirdly, the bZIP bound DNA may not be necessary for the intercalation simulations. When intercalated, the YOPRO doesn’t interact with the bZIP directly. The DNA concentration assays using YOPRO doesn’t require protein-bound DNA samples either72. Finally, all simulation were performed on the electronic ground state, while the relevant dynamics during fluorescence take place in the excited state. The conformational probability distribution for solvated and intercalated YOPRO must be evaluated. Therefore, later on we carried out charge and force field parameter fitting using the IPolQ method11 as in Section 2.2 and reran the MD simulations on both S0 and S1 states with the corresponding fitted force field respectively. 76 Constant Parameters 3.5 S0 and S1 YOPRO Free Energy Surfaces in Water with Fitted Force Due to defects of the AMBER modified force field models as discussed in section 3.4, we developed our own force field models for both S0 and S1 electronic states of YOPRO starting from IPolQ charge fitting model11. The whole fitting procedure is detailed in Section 2.2. The following is the results from performing the simulations with the fitted force field models and closely follows the our paper States in Solution and when Intercalated in dsDNA published in 201773. Molecular Dynamics of Oxazole Yellow Dye in its Ground and First Excited Electronic YOP_S0_WAT_PMF YOP_S1_WAT_PMF 120 60 0 -60 -120 2 D 0.000 2.500 5.000 7.500 10.00 12.50 15.00 17.50 20.00 22.50 25.00 120 60 0 -60 -120 2 D 0.000 4.620 9.240 13.86 18.48 23.10 27.72 32.34 36.96 41.58 46.20 60 120 60 120 -120 -60 0 D1 -120 -60 0 D1 Figure 3.5.1 PMF surfaces of S0 and S1 YOPRO in water. We carried out two-dimensional umbrella sampling to obtain free energy surfaces, as detailed in Section 2.2. The two dimensional free energy surfaces, also potential of mean force (PMF) surfaces are defined as known as the corresponding the probability distribution function. The results are shown in Figure 3.5.1: PMF(D1, D2) ≡ −kBT ln[ p(D1, D2)] where 77 p(D1, D2) is The PMF plots of the S0 and S1 states reproduce the minima and maxima on the corresponding subtracted energy surfaces from subtracting the AMBER derived classical potential V from the quantum chemical energy Q as used for the force constant parameter fitting in Section 1.4 (Figure 3.5.2), which indicates that the dihedral potentials are quite strong compared to the solvation ability of water. On both the PMF and subtracted energy surfaces, S0 has four minima around (0, 0), (0, ±180), (±180, 0), (±180, ±180), which characterize the four planar conformations: cis−cis, cis−trans, trans−cis, trans−trans, respectively. In the S1 state, all of these minima become high-free-energy regions. The probability analysis corresponding to the PMF shows that the S0 state is dominated by the trans−cis conformer, peaked around (±180,0). Thus, it should the region with the highest probability of Franck−Condon transitions. For the S1 state, the probabilities peak around (+90,0) and (−90,0). Figure 3.5.2 Subtracted energy surfaces of S0 and S1 YOPRO. 78 The umbrella sampling results of the S1 state in Figure 3.5.3, obtained by using the second method described in the Section 2.2, show that all four minima on the S0 surface correspond to high free energy regions of the S1 surface, which is consistent with the result of the full range sampling using the first method. Thus, they are also suitable starting points for obtaining trajectories on the S1 surface starting from the respective Franck−Condon points. However, as noted above, the S0 surface probability is dominated by the trans-cis conformation. Figure 3.5.3 PMFs of S1 YOPRO in water around the four minima on the S0 quantum energy surface. 79 3.6 S0 and S1 YOPRO dsDNA intercalated. of the four local minima on the S0 state. The YOPRO does stably intercalate in the As discussed in Section 2.2, we sampled intercalated YOPRO in the vicinity S0 state at all four planar conformations as in Figure 3.6.1. For the S1 state, PMFs of intercalated YOPRO around the three S0 minima at (0, 0), (0,180) and (180,180) show that intercalation of these planar conformations of S1 YOPRO doesn’t change free energy profile pattern significantly. The global minimum of the S0 state is at around (±180, 0) which, therefore, is the geometric origin of the Franck−Condon transition to the S1 surface. This most highly populated ground-state conformer de-intercalates when D1 goes below 150°. Thus, the only relaxation path for this S1-intercalated YOPRO is in the direction of increasing D1. The PMF surfaces of the two electronic states of the intercalations are shown below in Figure 3.6.1 and Figure 3.6.2. 80 Figure 3.6.1 PMFs of intercalated S0 YOPRO centered at the S0 global minima. 81 Figure 3.6.2 PMFs of intercalated S1 YOPRO centered at the S0 global minima. 82 3.7 Nonradiative Relaxation Path on the S1 Surface from the S0 Equilibrium Position We obtained the relaxation pathway on the S1 surface of both water-solvated and intercalated YOPRO starting from the S0 equilibrium position (±180,0) using a steepest-descent algorithm that we implemented. As shown in Figure 3.7.1, the intercalated path has a distinct barrier between the initial (FC point) and final (local minimum) positions that is absent in the water simulation path. Figure 3.7.1a shows that starting from the lower point of 15.85 kcal/mol on the intercalated surface the relaxation path goes uphill at D1 = 210° and forms a barrier of 2.73 kcal/mol, whereas the path on the water-solvated surface starts from the higher point of 20.07 kcal/mol and is completely downhill. Figure 3.7.1b shows that the relaxation path for the intercalation is elongated due to the uphill region, which indicates that a longer free-energy pathway is involved in the relaxation process on the intercalated surface than on the water-solvated surface. 83 Figure 3.7.1 PMFs and relaxation pathways for S1 YOPRO in water and intercalated. The two views displayed, panels (a) and (b), are for better visualization. Black (path) and gray (PMF) are in water. Red (path) and light red (PMF) are intercalated. The barrier to relaxation is present only in the intercalated pathway. 84 Field of the Two Electronic States 3.8 Discussion and Concluding Remarks of the Results from the Fitted Force Conformational sampling of YOPRO in its S0 and S1 electronic states in solution and when intercalated in DNA was investigated with the aims of showing that YOPRO could stably intercalate in dsDNA and that intercalation leads to more constrained sampling around the linker dihedrals. The Methods Section 2 used a recently developed procedure11 to obtain potentials for the linker dihedrals. A coupling of the two dihedrals in both electronic states was observed. Therefore, an individual force constant fitting is essential for each dihedral conformational range. The S0 quantum energy surface along the two dihedrals shows a centrosymmetric pattern, which indicates the two C-C bonds on the linker (C6-C10 and C10-C11) are of the same bond order. This symmetry is not present in the S1 surface, revealing destruction of the conjugation of the linker due to charge transfer to the central carbon (C10) from its adjacent carbons (C6 and C11) 48. Tables 2.2.3-2.2.4 show that in the S1 state the D1 dihedral (C6-C10) becomes more like a single bond when D2 goes to the high energy regions around ±90°, while the D2 dihedral (C10-C11) becomes more like a pure double bond when D1 goes to the same angles, which correspond to the energy minima. The ground state dihedral population distributions in solution and when intercalated are essential to finding out where, in the vertical, Frank-Condon transition, there will be excited state population. Computational48, 74, 75 and experimental76-81 investigations of the S1 excited state of cyanine dyes suggest that torsional motions and other deformations in the bonds linking the two ring systems, 85 if not restrained, can take place readily. From the computational perspective, there is consensus that the S1 potential surface varies more slowly than the S0 surface. Experiments suggest correspondingly rapid motions along the S1 surface. Thus, without some constraint mechanism, twisting in the excited state is facile, avoided crossings or conical intersections (CIs) between excited and ground electronic states are readily reached, and the ground state is accessed via a nonradiative pathway. This process leads to a great reduction in fluorescence intensity. To prevent twisting, various constraints have been invoked. Lowering the temperature and increasing solution viscosity, or both, are the standard ways to prevent access of nonradiative pathways.82 83 Dye Intercalation in DNA is another method of enforcing a constraint. Our umbrella samplings show that when intercalated in the S0 equilibrium conformation (trans-cis), the difference between the starting pointing, which is the free energy maximum on the S1 surface in the sampling region, and the local minimum, is reduced from 20.07 kcal/mol to 15.85 kcal/mol. On the S1 surface of ME-1122P, a related dye, the local minimum is in the vicinity of the conical intersection point.74 This reduction is also observed when the S1 YOPRO is intercalated in the other three planar conformations (Figures 3.5.3 and 3.6.1), which correspond to the other low free energy regions on the S0 surface. The S1 surface dynamics of starting from the Franck-Condon point to a conical intersection point that is in the vicinity of S1 surface minimum can be thought of as consisting of two steps: a diffusive motion along a reaction coordinate, followed by the transition from the S1 to S0 surface around the CI point. Thus, the overall rate 86 Dk Rk Dk Dk Dk = 1 - k = k 1 - D + k 1 - R - E k T b B bww 0 b eww 0 g and barrier is the product of the well constant for this consecutive process can be written as with a is the rate of S1→S0 deactivation. Assuming that diffusive rate constant and . For as is often the case83, the overall rate is dominated by large compared with the intercalation case, where there is a barrier to transition, and what amounts to a high viscosity for the reorientation of the linker dihedrals, a Kramers rate expression in the highly overdamped regime is indicated.84 In this regime, is given by the expression: (3.8.1) frequencies, respectively, where is the barrier height is the friction coefficient along the reaction coordinate and relative to the well origin. The Boltzmann factor rate reduction from the barrier of 2.73 kcal/mol, when D1 goes to 210° and D2 fluctuates around 0°, would reduce the rate constant of non-radiative relaxation by about 100 fold. It is not possible to obtain a value for the friction coefficient for twisting of intercalated YOPRO, however it is safe to assume that it would be considerably larger than that for YOPRO twisting in water. Furthermore, the longer path to climb the barrier, when intercalated, also should decrease the rate of passage. Therefore, intercalation provides a combination of increased viscosity and a barrier to reaching the CI point, relative to the completely downhill process in water, and leads to the increase of fluorescence intensity with YOPRO intercalation. 87 Eb bw 0w g Rk Dk 3.8.1 C++ program implementing the steepest descent algorithm #include #include #include #include #include using namespace std; int main() { ifstream input; string in_name = ""; cout<<"input a file"<>in_name; input.open(in_name); if(!input.is_open()) { cout<<"No such file!"<>xbsize>>ybsize; 88 double xmax,xmin,ymax,ymin =0; cout<<"input xmin,xmax,ymin,ymax"<>xmin>>xmax>>ymin>>ymax; int xbnum = 1+(xmax-xmin)/xbsize; int ybnum = 1+(ymax-ymin)/ybsize; double** PMF = new double* [xbnum]; for(int i=0;i>skip>>skip>>PMF[i][j]>>skip; //cout<>x_st>>y_st; int x_index = (x_st-xmin)/xbsize; int y_index = (y_st-ymin)/ybsize; double PMF_st = PMF[x_index][y_index]; ofstream output(in_name+"result.txt"); vector> x_y; x_y.push_back(make_pair(x_index,y_index)); cout<=0 && y_index<=ybnum-1 && y_index>=0) { if(y_index0) neighbor_down = PMF[x_index][y_index-1]; if(x_index2) { if(y_index==x_y[x_y.size()-2].second && x_index == x_y[x_y.size()-2].first) //prevent stepping back and forth { y_index--; if(neighbor_right<=neighbor_down) { 91 PMF_next = neighbor_right; x_index++; } else { PMF_next = neighbor_down; y_index--; } if(PMF_next==PMF_old) { cout<<"done"<2) { if(y_index==x_y[x_y.size()-2].second && x_index == x_y[x_y.size()-2].first) //prevent stepping back and forth { y_index++; if(neighbor_right<=neighbor_up) { PMF_next = neighbor_right; x_index++; } else { PMF_next = neighbor_up; y_index++; } if(PMF_next==PMF_old) { cout<<"done"<