$.28... v3 .1 1 V. I. t v .«A :w 14.39%. J. . mm»_.$.m.m_ kW .7!“ mfiWVMIA a: 1.. IV V? f. .dr 7 . V . «751:... V V .- 3; . . V 2.9.3... 2 _ 3.3 av “, . V ,x : . V .3- .5 . . e ‘0‘ ‘ . -” .Nuwunmmw .r 17.... imuué , . R 2—. .1: i ii. 3.11.!“ £3..." unit; x4“ 00. .11! 3: In}. .ultoi‘x 51.5!" . 3.33.11. izl . 1i: .. 3%.”? . ! 5r .2 . 2... . . . V V HA . ....w.... Hat, . .. V ‘ v V V Latin...“ V...IV..5...V.‘ . V V V V h vs. $1.55: V. THE. Sinai»: a my»; 1 ‘ "Y‘V‘fl'r \. . 1:. Mg)” ; cjfl 03 5%6’791‘6 LIBRARY Michigan State This is to certify that the University dissertation entitled Modeling Non-Crystalline Networks presented by Ming Lei has been accepted towards fulfillment of the requirements for the PhD. degree in Physics M1 M. 'Major'Professor’s Si ature 0 8 '09“ — 7003 Date MSU is an Affirmative Action/Equal Opportunity Institution PLACE IN RETURN BOX to remove this checkout from your record. TO AVOID FINES return on or before date due. MAY BE RECALLED with earlier due date if requested. DATE DUE DATE DUE DATE DUE W AA?” Irv CUW 6I01 cJClRC/DateDue.pG§-p.15 MODELING NON-CRYSTALLINE NETWORKS By MING LEI A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Department of Physics and Astronomy 2003 ABSTRACT MODELING NON-CRYSTALLINE NETWORKS By MING LEI In this thesis, the author reports the modeling of both the static and the dynamical aspects of non—crystalline networks. Porous silicon and silica have attracted attention recently due to their unusual photoelectronic properties. Porosity is central to these striking properties which are not present in non-porous silicon and silica. We propose an algorithm that is effective in building fully-coordinated amorphous networks that are discontinuous in certain regions — that is, they contain large voids of mesoscopic or macroscopic dimensions. Such networks can be both porous and amorphous, and can also be finite in certain dimensions. Voids of arbitrary shapes and sizes are first superimposed on a crystalline silicon network. The atoms in the pore regions are removed. Local “defects” are created, then eliminated, as pairs of them are brought together by a defect migration process. The network is fully coordinated after the defect migration process. The Wooten Winer Weaire (WWW) algorithm, is then applied to make the network amorphous. Oxygen is inserted on every silicon-silicon bond to create a porous silica network. Silica networks in the form of an amorphous fiber and an amorphous film are created by this procedure. Distortions due to surface effects are investigated. The local atomic arrangement in these discontinuous networks is similar to that in bulk amorphous silica. Covalent bond lengths and angles in amorphous networks do not vary much be- cause of the high energies associated with bond length and angle distortions. There- fore, they can be viewed as constraints that do not change with time in any significant way. Proteins, viewed as another type of non-crystalline network, are glued together by covalent bonds, hydrogen bonds, hydrophobic interactions, and other interactions. The concentration of constraints in some regions of the proteins are so high that these regions are rigid. The other regions are flexible. The flexible regions of protein can exhibit large conformational changes. Protein functions and bio-activities are often coupled with these conformational changes. We have built an algorithm that samples protein conformations randomly. It is called Rigidity Optimized Conformational Kinetics (ROCK). It is efficient, as it avoids sampling conformations for the rigid regions of the proteins. The constraints in the flexible regions of proteins inter-lock with each other to form complicated networks of rings. ROCK closes all the rings simultaneously at every step of sampling the protein conformations. It is the first algorithm that samples the protein conformations by following the closure of all the rings in a complicated network. All the bond length and angle constraints are exactly preserved in the conformations sampled by ROCK. Main chain dihedral angles are restricted in the preferred regions of the Ramachandran plot. The generated conformations have good stereo-chemistry properties. ROCK samples a large scale conformational changes. Its capability is first demon- strated on a model molecule with two degrees of freedom. The conformations sampled by ROCK observes the same two symmetries which are present in the topology of the molecule. A large scale motion is shown in the conformations of HIV-1 protease sam- pled by ROCK. ROCK also samples conformational pathways between distinct con- formations of proteins. Multiple conformational trajectories are explored by ROCK between the closed, the occluded, and the open conformations of DHFR. Since ROCK explores both the main chain flexibility and the side chain flexibility, it is a good tool in the studies of protein-ligand interactions, ligand design, protein motions etc. To, My wife Jz'ng, My parents, My sister, And My mother in law iv ACKNOWLEDGEMENTS I would like to express my sincere gratitude toward my adviser, Prof. Michael F. Thorpe, for his insightful guidance, invaluable encouragement and countless sup- port. I did not have much research experience when I started my Ph.D. studies. It was under the direction of Mike that I finished the first research project in my life. Challenging and smart questions raised by l\=’Iike taught me how to check the results of research from a variety of ways, and how to ask the right questions to obtain the most interesting results. Mike is an exceptional expert on explaining complicated phenomenon by minimalist models. His teaching is invaluable to my future work. Mike gave me many suggestions on this thesis. I am much obligated to Prof. Leslie A. Kuhn for her endless help in my graduate study. Leslie taught me how to do research systematically through her patient and friendly teaching and communications. I learned how to appreciate the charming characteristics of individual proteins from Leslie. I am much obliged to Dr. Maria Zavodszky for her remarkable help in my research. It has always been a great pleasure to discuss a wide range of scientific topics with her. Candid feedback and innovative suggestions from Leslie and Maria are the key factors behind all the improvements in the program ROCK. Maria patiently and industriously scrutinized the manual of ROCK, and the Appendix and Acknowledgments of this thesis. I am indebted to Prof. A. Roy Day of Marquette University for close collaboration and numerous discussions on the flexibility of small molecules. I appreciate the generosity of Dr. Maria Kurnikova of University of Pittsburgh and Prof. Robert Cukier very much in sharing their molecular dynamic simulation data with me. Although these simulations are not discussed or cited in this dissertation, they greatly helped my understanding of the protein conformations. I thank very much Prof. Petkov of Central Michigan University and Prof. Simon J. L. Billinge for their generous sharing of the synchrotron diffraction experimental data with me. I also thank very much Dr. Michael VVachhold, Dr. Kashturi Rangan and Prof. Kanatzidis for their sharing of the knowledge on adamantane with me. I thank very much Prof. Walter Whiteley of York University, Prof. Normand Mousseau of University of Toronto, Prof. Chien-Peng Yuan, Prof. S. D. Mahanti and Dr. Claire Vieille for sharing their thoughts with me on the mathematics of rigidity analysis, on the protein dynamical trajectories, on the validity of the geometry aspects in proteins, and on the protein conformations in dihedral angle space. I also thank them a lot for their encouragement and help in various matters throughout my graduate career. I benefited a lot from the discussions with former graduate students in Prof. Thorpe’s and in Prof. Kuhn’s groups: Dr. Brandon Hespenheide, Dr. Andrew John Rader, Dr. Mykyta Chubynsky and Mr. Valentin Levashov. We spent a lot of time discussing the geometry and biochemical aspects of constraints and rigidity in proteins. Dr. Brandon Hespenheide and Dr. Andrew John Rader helped me a lot on the usage of the program FIRST. At last I would like to extend my thanks to my wife, my parents, my sister and my mother in law. I would not have been able to finish this challenging journey of the Ph.D. study without their support and understanding. vi TABLE OF CONTENTS LIST OF TABLES LIST OF FIGURES LIST OF ABBREVIATIONS 1 Introduction 1.1 1.2 1.3 Static Models of N(’)n-Crystalline Networks ............... 1.1.1 Continuous Random Network Models .............. 1.1.2 Discontinuous Networks Constraints and Flexibility Analysis ................... Sampling Conformations ......................... 1.3.1 Conformations of Proteins 1.3.2 Algorithms to Sample Protein Conformations ......... 1.3.3 Algorithms to Close Rings .................... 2 Modeling Discontinuous Random Networks 2.1 2.2 2.3 2.4 WWW Algorithms on CRN Models ................... Procedure to Build DCRN Silicon Network Models .......... From Silicon to Silica ........................... Defects and Hydrogen .......................... vii xi xii xvi 10 11 12 16 16 30 3 Discontinuous Random Network Models 3.1 Amorphous Fiber Silica Model ...................... 3.1.1 3.1.2 Procedure to Build Amorphous Fiber Network Model ..... Properties of Amorphous Fiber Network Model ........ 3.2 Amorphous Film Silica Model ...................... 3.2.1 3.2.2 Procedure to Build Amorphous Film Network Model ..... Properties of Amorphous Film Network Model ......... 3.3 Distribution Fimctions .......................... 3.4 Metal-Adamantane Network Model ................... 4 Constraints, Conformations and Flexibility 4.1 Constraints and Conformations ..................... 4.2 Flexibility and Degrees of Freedom ................... 4.3 Flexibility Analysis ............................ 4.4 Proteins .................................. 4.4.1 4.4.2 4.4.3 4.4.4 4.4.5 4.4.6 Hydrogen Bonds and HydrOphobic Interactions ........ Interactions in Proteins ...................... Flexibility Analysis on Proteins ................. Validity of Flexibility Analysis on Proteins ........... Advantage of Flexibility Analysis ................ viii 34 34 34 39 44 44 44 47 56 63 63 64 67 69 69 71 73 74 77 5 Rigidity Optimized Conformational Kinetics (ROCK) 5.1 5.2 5.3 5.4 5.5 Ring Clusters and Side Branches .................... Sampling Conformation of a Single Ring ................ 5.2.1 Conformations of a Seven-Fold Ring .............. The Complexity ASSOCI‘dIOd with a Network of Rings ......... Conformations of Side Chains ...................... V'Vorkflow ................................. 6 Results and Discussions 6.1 6.2 6.3 Model TA'IOIPCUIC’ H8Cg $20 ......................... 6.1.1 Conformations of Model Molecule HgCgSgo ........... Conformations of HIV-1 Protease .................... 6.2.1 6.2.2 6.2.3 Structures and Functions of HIV-1 Protease .......... Flexibility Analysis on HIV-1 Protease ............. Conformations of HIV-1 Protease ................ Conformational Pathways of DHFR ................... 6.3.1 6.3.2 6.3.3 6.3.4 Structures and Functions of DHFR ............... Flexibility Analysis on DHFR .................. Sampling Directed Pathways ................... Conformational Pathways of DHF R ............... 7 Summary and Perspectives ix 79 83 86 88 93 98 98 100 103 103 107 108 120 120 125 130 132 145 7.1 Summary ................................. 145 7.2 Limitations ................................ 151 7.3 Api’flications and Perspectives ...................... 152 APPENDIX 157 A Radial Distribution Function of Uniform Media 158 B Ramachandran Plot 164 B.1 Ramachandran Plot. of Residues Other Than Glycine and Proline . . 164 B2 Ramachandran Plot, of Glycine ..................... 166 B3 Ramachandran Plot of Proline ...................... 167 LIST OF REFERENCES 171 LIST OF TABLES 6.1 Differences in coordinates of two conformations . . . . . . . . . . . . 140 6.2 Differences in main chain (,6 and t": angles of two conformations . . . . 142 LIST OF FIGURES IMAGES IN THIS THESIS ARE PRESENTED IN COLOR 2.1 “W’HV algoritlun illustrated in 2D ................... 2.2 An amorphous silicon network ...................... 2.3 A crystalline porous silicon network ................... 2.4 A fully coordinated porous silicon network ............... 2.5 An amorphous porous silicon network .................. 2.6 Defects migration procedure ....................... 2.7 Hydrogen migration procedure ...................... 3.1 A crystalline fiber silicon network .................... 3.2 A fully coordinated crystalline fiber silicon network .......... 3.3 A fully coordinated amorphous fiber silica network .......... 3.4 Density distribution of fiber silica network ............... 3.5 Definition of surface atoms ........................ 3.6 Bond length and angle distortions in fiber silica network ....... 3.7 A crystalline film silicon network .................... 3.8 A fully coordinated amorphous film silica network ........... 3.9 Bond length and angle distortions in film silica network ........ 3.10 Radial Distribution Function of networks ................ 3.11 Reduced Density Distribution Function of fiber silica network xii 20 23 28 29 32 33 35 37 40 42 46 48 52 54 3.12 Reduced Density Distribution Function of networks .......... 55 3.13 An adamantane unit ........................... 57 3.14 Schematic diagram of mesoscopically porous metal-adamantane . . . 59 3.15 Diffraction pattern of metal-adamantane network ........... 61 4.1 Two molecules having same degrees of freedom ............. 65 4.2 Two conformations of a six-fold ring .................. 67 4.3 Dependent and independent constraints in molecules ......... 68 4.4 Amino acid chains ............................ 70 4.5 Hydrogen bonds .............................. 71 4.6 Energy landscape of constraint concept ................. 75 5.1 Topology of a small branch of HIV —1 protease ............. 81 5.2 A small flexible molecule with two rings ................ 82 5.3 Definitions of bond length, angle and dihedral angle .......... 84 5.4 A seven-fold ring ............................. 87 5.5 Conformations of a seven-fold ring .................... 89 5.6 A simple network with two degrees of freedom ............. 91 5.7 chirality .................................. 94 6.1 A model molecule H8C8S20 ........................ 99 6.2 Conformational space of model molecule ................ 101 6.3 Structure of unliganded HIV-1 protease ................. 105 xiii 6.4 6.6 6.8 6.9 6.10 6.11 6.12 6.13 6.14 6.15 6.16 6.17 6.18 A2 B.1 B2 B3 Flexibility properties of HIV-1 protease ................. Superimposition of HIV-1 protease conformations in ribbon diagram . Superimposition of HIV-1 protease conformations in wire diagram . . Distance fluctuations between flexible loops in HIV-1 protease Average fluctuations of C, atoms in HIV-1 protease .......... Distributions of ILE14 (X) and 1;" angles ................. Tip of flexible flaps of HIV-1 protease .................. Three conformations of DHFR ...................... Flexibility properties of DHFR at (:lifferent cut off energies ...... Flexibility properties of DHFR ...................... RMSD to the occluded and the closed conformations ......... RMSD to the occluded and the open conformations .......... RMSD to the closed and the open conformations ............ DARMSD to the occluded and the closed conformations ....... Superimposition of two conformations .................. Space correlation of a sphere and a film ................. RDF of uniform media .......................... Ramachandran plot of 18 residues .................... Glycine and proline ............................ Ramachandran plot of glycine ...................... xiv 109 111 112 119 122 128 131 135 136 141 160 165 167 168 B.4 Ramachandran plot of proline ...................... 170 XV LIST OF ABBREVIATIONS 2D ............................................................. Two Dimension 3D ........................................................... Three Dimension CRN ............................................. Continuous Random Network DARMSD ........................ Dihedral Angle Root Mean Square Deviation DCRN ....................................... Dis-Continuous Random Network DDF ............................................ Density Distribution Function DHF .......................................................... 7,8-dihydrofolate DHFR ................................................ Dihydrofolate Reductase DOF ....................................................... Degrees of Freedom ecDHFR .............................. Escherichia coli Dihydrofolate Reductase HIV ........................................... Human Immunodeficiency Virus MC ............................................................... lV‘Ionte Carlo MD ....................................................... Molecular Dynamics MM ...................................................... lV‘Iolecular Mechanics NADPH .......................... nicotinamide adenine dinucleotide phosphate PDB ....................................................... Protein Data Bank PDF ................................................ Pair Distribution Function QM ....................................................... Quantum Mechanics RCE ................................................... Ring Closure Equations xvi RDDF ................................. Reduced Density Distribution Function RDF ............................................. Radial Distribution Function RMSD ........................................... Root Mean Square Deviation RPDF ..................................... Reduced Pair Distribution Function THF ................................................... 5.6,7,8-tetrahydrofolate WWW .................................................. Wooten Winer Vt'eaire xvii Chapter 1: Introduction Physics and geometry are “a marriage made in heaven”, as said by Sir Michael Atiyah in his talk [1]. Geometry, which is a branch of mathematics, has been ex- tensively utilized in the study of the physical world. Copernicus, for example, put forward the heliocentric theory to replace the geocentric theory after pondering on the geometrical properties of the observed planet orbits. N avigations from the middle age till today all rely on calculations using geometry. Fractal geometry, which is a newly invented branch in geometry, is the mathematical language of chaotic systems. Geometry and mathematics are of the most important tools in theoretical physics studies. Complicated systems can of be too modeled using a few simple geometrical prin- ciples. Graner [2] nicely reviewed comprehensive aspects of building fluid foam models from geometrical considerations. Arns et al. [3] discuss how to build disordered ma- terial models whose micro-structures obey particular geometrical requirements. Free energy of the Langmuir mono-layer is examined by Lésche and his co—worker [4] based on an empirical Hamiltonian that accounts for the geometrical arrangement of micro- structures of the layers. Many more examples can be listed. All of them address certain properties of complicated materials from simple yet adequate geometrical thinking. This thesis shows two more examples of the application of geometry in condensed matter physics. The first example is on the building of non-crystalline network models with geometrical restrictions on bond length and angles. The second example is about sampling conformations of non—crystalline networks. Conformations are spatial arrangements of atoms whose bond lengths and angles are correct. Both of these two examples show how to build complicated non-crystalline network models from geometrical rules. This chapter is an introduction of the thesis. It explains the motivations of our research work. Chapter 2 discusses the algorithm to build non-crystalline networks, the results of which are examined in Chapter 3. The similarities and differences be- tween glassy networks and proteins are investigated in Chapter 4. Though they share common characteristics, amorphous networks and proteins differ in that the former do not have multiple dynamical conformations yet the latter do. A new approach to sample protein conformations is elucidated in Chapter 5. Two applications of the algorithm are shown in Chapter 6. Chapter 7 summarizes this thesis and re-iterate how simple geometrical considerations can lead to complex non-crystalline network models. 1.1 Static Models of Non-Crystalline Networks Short range structures of amorphous networks are very similar to those in the corresponding crystalline networks. For example, the bond lengths and angles of silicon atoms in the amorphous silicon networks are almost the same as those in the crystalline silicon networks. Those properties that rely on the short range structures, such as the vacuum UV absorption spectra and the electronic density of states curves, differ only in detail [5] between the amorphous and the crystalline networks. On the other hand, amorphous networks are remarkably different from crystalline networks in the characteristics related to the medium to long range order. For example the X-ray or neutron diffraction of amorphous materials detect broad peaks while the diffractions from crystalline materials show many fine sharp peaks. The modeling of amorphous networks has long been an interesting topic in condensed matter physics. 1.1.1 Continuous Random Network Models In the early 1930s, it was hypothesized that amorphous material are composed of numerous micro—crystals. The orientations of the micro-crystals are. not aligned so that the long range order of the crystals is destroyed. The X-ray powder diffraction pattern of the models built upon this hypothesis does not fit with the experimen- tal data. Zachariasen [6] proposed that the amorphous material is not made up of micro—crystals. The atomic arrangement in an amorphous network does not have symmetry and periodicity. However Zachariasen did not propose an algorithm to build a continuous three-dimensional network which lacks symmetry and periodicity. An amorphous network model that satisfies Zachariasen’s criteria is built by hand by Bell and Dean [7, 8]. It is an amorphous silica model with 614 atoms. A hand built amorphous silicon model containing 440 atoms was reported by Polk [9] in 1971. Atomic coordinates in that model were detected by laser beams and saved into a computer for analysis [10]. These hand built models have free surfaces. The sizes of these models are limited in the order of tens of angstroms. Moreover, since a large portion of the atoms are at or close to the surfaces, the number of atoms that can be used to analyze the bulk properties of amorphous material is not large. Henderson [11] built a periodic amorphous network model by hand. There are only 61 atoms in the supercell of the periodic model. Computational modeling is the area that thrived since the early attempt [12] when a periodic amorphous model with 54 atoms in the supercell was generated by computer. The scalability problem hindered further development of the algorithm be- hind the model. Guttman [13, 14] built amorphous network models in computers by linking atoms randomly. A subsequent relaxation procedure reduces the distortions of bond lengths and angles. Bonds are allowed to be switched in the relaxation pro— cedure. The most successful algorithm today in building amorphous network models is the WWW technique proposed by Wooten, W iner, and Weaire [15]. The details of the algorithm are discussed in Chapter 2. Several other algorithms are suggested thereafter, all of which are modifications of the original W W \V technique. For ex- ample the improvement by Barkema and Mousseau [16] makes the computational modeling of device sized amorphous silicon models [17] with more than 10,000 atoms possible. The amorphous network models built by the WWVVV and other similar techniques are called Continuous Random Network (CRN) models. By continuous it is meant that the networks are infinite, without disruption in the distribution of atoms. By random it is meant that the topology of an amorphous network model is different to that in the corresponding crystalline network. The CRN models built by the WWW algorithm match well with real amorphous silicon in terms of properties of electronic states, X-ray diffraction and the pair distribution function (PDF). The success of the WWW algorithm comes from the two geometrical principles in building CRN networks that are the essence of amorphous networks. Atoms are maintained as fully- coordinated. That is, all silicon and germanium atoms have four and exactly four nearest neighbors, while all oxygen, sulfur and selenium atoms have two and exactly two nearest neighbors. This principle originates from the preferred valences of these elements in semi-conductors. The topology of a CRN model is different from that of a crystalline network model. Randomly positioning atoms in a supercell will not create a CRN model. The WWW technique is a trustworthy method in creating a network whose topology is totally different from that of a crystalline network. The high quality models built by the WWW algorithm prove how geometrical approaches benefit the modeling of non-crystalline networks. As an example, we show in Section 3.4 how to model the amorphous metal-adamantane network starting from an amorphous gallium arsenide model. The powder diffraction pattern calculated from the model matches well with that measured in experiment. Molecular dynamics (MD) simulations coupled with empirical or semi-empirical potentials have been used to build amorphous silicon network models since 1985. Several empirical potentials have been invented and parameterized for the simulation of amorphous silicon. The SW potential by Stillinger and Weber [18], the potential invented by Biswas and Hamann [19], and the potential by Chelikowsky [20] are all composed of a two-body interaction part and a three-body interaction part, the first of which depicts the bond lengths vibrations while the latter of which describes the bond bending vibrations. Though the potential of Tersoff [21] has only pair wise interactions, the parameters in the interaction depend on the bond angles as well. Therefore the three-body interaction is implicitly calculated in the Tersoff’s potential. The semi-empirical potential by Baskes et al. [22] is more complicated in form. It is supposed to be in good agreement with first principle calculations. The SW potential is most widely used in MD simulations. Despite much effort, the empirical and the semi-empirical potentials are not accurate on all phases of silicon. Furthermore, the connectivity is not guaranteed. There are dangling bonds in the amorphous network models built by MD simulations. Therefore MD is not the best way to generate amorphous models as of today. Car and Parrinello [23] generated a small amorphous silicon model of 54 atoms by first principle quantum mechanical calculations. The calculation cost however forbids further application of such algorithms at present, and the small size means that these models have serious strains due to the periodic boundary conditions. 1.1.2 Discontinuous Networks Discontinuities in atom distributions lead to intriguing and unexpected properties that are not present in materials without discontinuities. Porous silicon has application potential in harvesting solar energy. Canham et al. [24] report that porous silicon is photo-luminescent. The average diameter of the pores is about 13nm. The thickness of the silicon layers between the pores is on the order of mm, measured by X-ray diffraction experiments [25]. Quantum confinement effects as well as the altered gaps between electronic states [26] are the causes of the photoluminescence phenomenon. The silicon layers between the pores are largely crystalline [27] rather than amorphous. Porous silicon is bio-compatible and bio-degradable [28]. The body does not reject organs made of porous silicon. Porous silicon has the potential to be the platform of future biomedical implants and artificial organs. Porous silica films are easy to fabricate. They have been used as chemical sen- sors and sources of photoluminescence. Zhao et al. [29] produced silica films whose porosities are between 51% and 75%. McDonagh et al. [30] applied sol-gel porous silica films to sensor the oxygen. Both Cohen and his co-workers [31] and Dag et al. [32] observe bright photoluminescence from the porous silica films containing nan- oclusters of silicon. Amorphous silica films, though not porous, are photo-luminescent as well. Yoshida et al. [33] have found blue photoluminescence from slightly silicon d0ped amorphous silica films. The origin of the photoluminescence is believed to be in the silicon nanocrystals. The structures and surface properties of amorphous silica films have been studied by a variety of techniques including electron diffraction [34], infrared spectroscopy [35], scanning reflection electron microscopy [36], Raman spectroscopy [37], and N MR [38] et al. Ab initio simulations [39, 40], MD simulations [41, 42], and Monte Carlo (MC) simulations [43] have all been used to study either amorphous silica films or the surface properties of amorphous silica. All of these materials mentioned above are not continuous in the traditional sense in that they are not microscopically homogeneous. Porous silicon and silica contain virtually periodic voids that break the uniform distribution of the atoms over space. Amorphous silica films are not continuous because of the discontinuity of atom distributions over the surfaces. Such discontinuities result in exciting and new material properties. Bulk silicon and silica, either crystalline or amorphous, are not photo-luminescent. On the other hand, porous silicon, porous silica and thin silica films all show photolumiuescence effects. Though the origin of photoluminescence in these materials is currently being debated, it is almost certain that photo-luminescent characteristics in these materials involve the discontinuity in the spatial arrangement of atoms. Though algorithms to build CRN models such as the “WV W technique have been available for a long time, there is not yet. a generic algorithm for building discontin— uous random network (DCRN) models. Though the endeavor of building DCRN models may seem to be unnecessary at the first glance because such materials as amorphous porous silicon and amorphous porous silica are not much discussed in lit- eratures yet, the author argues that these materials are not far fetched from being manufactured, considering the facts that 1) porous silicon and silica are easy to fab- ricate and 2) amorphous silicon and silica are stable. Since the porosity in crystalline silicon and silica leads to new properties, the porosity in amorphous silicon and silica is likely to bring exciting properties as well. The amorphous porous silicon models provide the first glimpse of the likely structural properties of such materials. Though not have been manufactured, the amorphous silicon film has been computationally modeled by Monte Carlo simulations using empirical potential [44] and by ab initio simulations [45]. The properties of amorphous silica films have also been examined, as described above. Local bond geometries in the DCRN models should be similar to those in the crystalline networks. This requirement is the natural result of the strongly covalent characteristics of the glass forming elements such as silicon, germanium, oxygen, sulfur and selenium. The geometrical concepts which are the roots of the WWW algorithm also serve as the foundations of our algorithm to build the DCRN models. Step by step, Chapter 2 reveals the methodology to build the DCRN models. 1.2 Constraints and Flexibility Analysis The empirical Keating potential is used in building DCRN models. The Keating potential reaches its minimum values of zero when bond lengths and angles are of their optimal values. The potential energy can be huge when distortions in bond lengths and angles are large. \Vhen it costs an infinite amount of energy to distort bond lengths and angles, every bond length and angle requirement is called a constraint. At finite temperatures the atoms in non-crystalline networks are in constant motions, due to the thermal fluctuation energy of kBT, in which kg and T are the Boltzmann constant and the temperature respectively. The thermal fluctuation en— ergy pushes the atoms so that they oscillate around the local potential minima. These oscillatory motions do not change the averaged relative orientation between atoms, not to mention the overall shapes of the networks. On the other hand, some non- crystalline networks have predominantly internal motions. Proteins are such exam- ples. The scale of the protein internal motions is large, for example the relative distance between atoms in different conformations of HIV-1 protease can vary be- tween 2.7A and 8.0A, shown in Section 6.2 in this thesis. These large scale motions do not result from the thermal fluctuations of bond lengths and angles. Rather they are caused by large thermal and ligand-induced fluctuations of the internal degrees of freedom (DOF) in the network. The concept of constraints simplifies the analysis of the internal large scale mo- tions of the non-crystalline networks. When bond lengths and angles are treated as constraints, the DOF of a network is simply the difference between the total number of degrees of freedom and the total number of independent constraints, as explained by Maxwell in 1864 [46]. The question of whether a non-crystalline network has large scale internal motions is answered by the counting of the DOF. A positive DOF in a network is correlated with the likelihood of large scale motions. Moreover, a network shows large scale motions without breaking any constraints when it undergoes motion by sampling the DOF. However the iV‘Iaxwell counting is not exact. A procedure called rigidity analysis was first used by Jacobs et al. to count constraints in proteins, based on the pebble game algorithm [47, 48]. Since what we care about here is the flexibility properties of networks, the author uses the phrase flexibility analysis instead of rigidity analysis throughout this thesis. When only bond length constraints are counted, or when both bond length and bond angle constraints are counted, the DOF calculated by this pro- cedure is exact for generic networks in 2D. Flexibility analysis is not exact for generic networks in 3D when only bond length constraints are included. However under usual conditions, though this has not been proven rigorously, its application to 3D networks is exact when both bond length and bond angle constraints are counted [49, 50, 51]. Flexibility analysis identifies the regions in generic networks that have positive numbers of DOF. These regions can have large scale internal motions that allow big changes in the relative orientations between atoms and the overall shapes of the networks. They are called the flexible regions. The other regions do not have large scale motions, hence they are called the rigid regions. Negative DOF are associated with over-constrained, or stressed regions of the network. Chapter 4 accounts for the details of the flexibility analysis. 1.3 Sampling Conformations It is one matter to distinguish the flexible regions from the rigid regions in net- works, yet another topic to search the spatial arrangements of atoms under which the bond lengths and angles obey all constraints. A spatial arrangement of atoms is called a conformation if bond lengths and angles in such an atomic arrangement observe the predefined constraints. In usual cases, one conformation is already ob- tained from modeling, from X-ray or neutron diffraction experiments, or from NIVIR experiments. The problem is to find other conformations that satisfy the same set of constraints as the original one does. 1.3.1 Conformations of Proteins Proteins, being non-crystalline and finite networks from a topological point of view, have one or more rigid regions and several flexible regions. The rigid regions of proteins serve as the stabilizing cores. At least one flexible region is typically close to the catalytic site in every protein. The catalytic functions of proteins are always coupled with conformational changes, either involving the side chain atoms or involving both the side chain and the main chain atoms. The main chain and the side chain of proteins are discussed in detail in Section 4.4.1. The flexible regions of proteins either uncover the functional sites, or make specific interactions with the substrates, or escort the substrates to and from the function sites, or directly assist the catalytic functions, through large scale conformational changes. Adenylate kinase (ADK), for example, has large scale domain motions related to its catalytic cycle. Berry et al. [52] resolved the X-ray crystal structure of the protein in its ligand-binding conformation, which is called the “closed” conformation. The large flexible lid domain covers one of the bound substrates in one such conformation. The conformation of the lid domain is completely changed when there is no ligand bound to the protein [53, 54]. The conformational transitions of ADK are believed to be driven by the rotation of several dihedral angles at a few hinges [55]. Calmodulin, which regulates a variety of cellular processes, is another example of a protein that goes through large scale conformational changes in its catalytic pathway. Its binding of calcium initiates conformational changes, which in turn forces the other proteins that are bound to calmodulin to change their conformations also, resulting in an activation of certain biological functions of the cells [56, 57]. Zhang et al. [58] report the coupling of conformational changes with the electron transfer function of the 10 protein cytochrome bcl. 1.3.2 Algorithms to Sample Protein Conformations Proteins have to change their conformations to achieve certain biochemical func- tions. It is the ability to sample different conformations with functional significance and to carry out chemistry on their molecular particles that distinguishes proteins from other non-crystalline networks. Unfortunately, there is no way to detect all protein conformations from experiments. X-ray crystallography experiments iden- tify one or several conformations of the same protein. NMR techniques reveal the most populated conformations, but they cannot detect the less populated ones. Com- putational modeling is indispensable in studying protein conformational transitions. Quite naturally, there have already been numerous such studies on sampling protein conformations. Some algorithms on sampling protein conformations use databases of existing peptide conformations. The database used by Deane et al. [59] in their PETRA pro- gram stores low energy conformations of short peptide segments up to twelve residues long. The conformations are calculated by ab initio methods. Short peptide segments in the loop region of the proteins are then replaced by the conformations of the same or different segments in the database. To reduce the total calculation cost, filter- ing on several easily calculated criteria is first done to rule out the most impossible conformations. The empirical potential of the replacement peptide segment is then calculated and minimized. The application of algorithm is limited to the flexible loops on the surfaces of proteins. Another category of algorithms samples protein conformations on the grid of dihedral angles [60, 61] systematically. Such algorithms are more appropriate for small molecules rather than on proteins, because the number of grid points to check grows exponentially with size. 11 Protein conformations can be sampled by MD simulations, in which protein motion trajectories are calculated by integrating Newton’s equations. Every snapshot of the motion trajectory is a viable protein conformation. Recent developments in MD simulations are reviewed by Wang et al. [62]. The time step within MD simulations is typically one or two femto-seconds. Integration of Newton’s equations is carried out at every time step. The MD simulations currently can reach one or two nanoseconds of real motion trajectories. Some fast protein conformational changes which are in the nanosecond time range can hence be simulated by MD algorithms. The slow protein conformational changes which are in the milliseconds to seconds time range are still out of the reach of the MD simulations. Clever techniques have been presented to make the MD calculations run faster. In the multicanonical MD simulations [63, 64], the possibility distribution of sampling conformations at different energies is artificially flattened [65], so that the probability of jumping over energy barriers is enhanced. Multiple MD trajectories are sampled in parallel at different temperatures in the replica exchange MD algorithm [66, 67]. Trajectories at different temperatures are periodically exchanged so that they all have chances to overcome high energy barriers at high temperatures. Where there is a MD technique, there is a corresponding MC method. The multicanonical MC method [68, 69, 70, 71] and the replica-exchange MC algorithm [72] have all been applied on the studies of proteins. 1.3.3 Algorithms to Close Rings Sampling conformations by MD or by MC algorithms is equivalent to exploring local minima on the rough and complicated energy surfaces which are constructed by the intricate potential functions. These potentials have numerous local minima and energy barriers. One way to sample the protein conformations is to follow the saddle points between the local minima [167]. In some studies, when to obtain an ensemble 12 of conformations that are as diverse as possible is sufficient, it is not necessary to know all the trivial peaks and troughs in the energy landscape. As discussed in Section 1.2, flexible regions of proteins have multiple conformations even in the most simplified potential, which is infinitely high or absolutely zero, depending on whether bond lengths and angles are distorted. Therefore it is very attractive to sample protein conformations obeying all bond lengths and angle constraints. The difficulty in sampling conformations in this way is to close all the rings in the proteins at every step otherwise the bond lengths and angles are distorted. A ring is a closed loop of bonds which connect atoms. There are two paths connecting any two atoms in a ring. Geometry solves the biochemical problem by providing algorithms to close the rings. The ring closure equations (RCE) proposed by G6 and Scheraga [73] state the conditions under which a ring is closed when the six unknown dihedral angles in the ring are consecutively positioned, or when they are only separated by locked peptide bonds. A peptide bond is the bond between two amino acid residues in proteins. The ideal dihedral angle of a peptide bond is either 0° or 180°. G6 and Scheraga elucidate a procedure to convert the RCE, which are six independent equations, to a single variable equation with a single unknown dihedral angle. A subsequent work [74] ex- hausts the conformational space of a short cyclic peptide segment by following the solutions to the RCE. G6 and Scheraga [75] later developed a method to close the big rings when the rings have C", I , or 52,, symmetry. The conformations of gramicidin S, which is a cyclic molecule with 18 rotatable bonds, are sampled by this proce- dure [76], assuming the molecule has an exact 02 symmetry. The conformations of the molecule cyclo—hexaglycyl under symmetries were also generated [77] and checked against conformations generated by MC algorithms [78]. Several algorithms have been invented to close the rings since the pioneering work of G6 and Scheraga. Wedemeyer and Scheraga [79] discovered that a ring is closed when its dihedral angles are roots of polynomial equations. The form and the 13 solutions to the polynomial functions of seven-fold and eight-fold rings are developed in that article. Wu and Deem [80] use three distance and angle constraints at the break point of a ring. These constraints are then transformed to a polynomial function of a single variable which can be solved numerically. Unlike all the other algorithms that optimize all of the involved dihedral angles simultaneously, the cyclic coordinate descent algorithm by Canutescu and Dunbrack [81] optimizes them one at a time. Bruccoleri and Karplus [82] allow bond angles to be relaxed when a ring cannot be closed exactly under the condition of fixed bond angles. All the algorithms listed above close only a single ring. In some studies, the whole protein main chain whose two ends are fixed in space is treated as a big ring. In some other studies, a short protein main chain segment is the ring to be closed. The main chain dihedral angles are varied systematically or randomly so that protein conformations are built or sampled [83, 84, 85, 86]. A protein has multiple interlocking rings which have to be closed exactly. Gibson and Scheraga [87] write the bond length and angle constraints at the disulfide bond as the pseudo-potential of a ring. The ring closes if the pseudo-potential is zero. They show several examples where three or four rings containing disulfide bonds are closed simultaneously. Chapter 4, Chapter 5 and Chapter 6 of this thesis present our approach to sample protein conformations by closing all the rings in a complicated network simultaneously. Our algorithm differs from the algorithm by Gibson and Scheraga in several aspects. First, the hydrogen bonds are treated as components of rings in our algorithm, but not in Gibson and Scheraga’s algorithm. Second, the fictitious potential of a ring utilized in our algorithm is different to that in Gibson and Scheraga’s algorithm. As discussed in detail in Section 5.2, the fictitious potential of a ring is defined to be the sum of the squares of the RCE in our algorithm. Gibson and Scheraga define the sum of the bond length and angle constraints at the disulfide bonds as the pseudo-potential of a ring. Third, our algorithm is able to handle a large 14 number of inter-locked rings. Though hydrogen bonds and hydrophobic interactions are important in proteins, previous studies do not regard them as forming rings. Therefore in the research works mentioned above few rings are identified in proteins. The flexibility analysis [88, 89, 90] has demonstrated that the predicted protein structural properties match better with what is measured in experiments when strong hydrogen bonds and hydrophobic interactions are counted as real constraints. Therefore proteins should be viewed as networks composed of covalent bonds, strong hydrogen bonds and hydrophobic interactions. The definitions of strong hydrogen bonds and hydrophobic interactions are discussed in Chapter 4. A large number of rings appear when hydrogen bonds are treated the same way as the covalent bonds. These rings inter-lock with each other in such complicated ways that the rotation of any single bond can break several rings instead of one. The algorithms designed to close a single ring are not capable of sampling conformations for these complicated networks, because they are not created to close all the rings simultaneously. Chapter 5 presents a new algorithm that is efficient in closing all the rings in a network. It is a powerful tool for sampling large scale protein conformations. Chapter 2: Modeling Discontinuous Random Networks As discussed in Chapter 1, porous silicon and silica networks have unique proper- ties and application potentials. This chapter demonstrates an algorithm to build fully coordinated amorphous silicon and silica networks with any desired characteristics of discontinuity in atomic distributions. Section 2.1 is a brief review of the standard WWW technique in building CRN models. Section 2.2 explains how to build DCRN models in 3D with voids of arbitrary shapes and sizes. Defects, such as dangling bonds and hydrogenated silicon atoms are inevitable in real amorphous silicon and silica material. Section 2.4 discusses how to include defects into the DCRN models. Examples of amorphous film and fiber silica models are shown in Chapter 3. 2.1 WWW Algorithms on CRN Models Amorphous silicon networks are made up of five—fold, six-fold, seven-fold, eight- fold rings and a small trace of four-fold rings and bigger rings. Crystalline networks are composed exclusively of six-fold rings. All rings mentioned in this thesis are irreducible rings. An irreducible ring is such a ring that the shorter path between two atoms in the ring is one of the shortest paths among all possible paths in the whole network connecting these two atoms. The WWW algorithm introduces five—fold and seven-fold rings to a silicon network by a bond switching process. In each bond switch process, the WWW algorithm exchanges a pair of nearest neighbors of two randomly chosen bonded atoms. Figure 2.1(a) shows a crystalline honeycomb network in 2D. The atom A in the figure has three nearest neighbors of atom B, C and D. The three 16 nearest neighbors of atom B are atom A, E and F. By breaking two bonds of AB and BE and creating two new bonds of AE and BD, the W\\"W algorithm alters the topology of the network, as shown in Figure 21(1)). The strain in the new network is relaxed so the lengths of new bonds become acceptable, as shown in Figure 2.1(c). Comparing the networks in Figure 2.1(a) and (c) shows that the WWW algorithm destroys four six-fold rings in the original network, followed by the creation of two five-fold rings and two seven—fold rings. The bond switch process works identically in 3D, though the number of six-fold rings eliminated is different than in 2D. After tens of thousands of such bond switch processes, the topology of the network does not have any residues of the t0pology of the original crystalline network. A subsequent simulated annealing process with even more bond switches reduces the total energy of bond length and angle variations. The resulting network has a totally different topology from that of the initial crystalline network. Bond length distortions and bond angle distortions in the networks are also low. For the sake of simplicity, the phrase bond distortions is used from here on to denote both bond length distortions and bond angle distortions. The Keating potential [91] used in the relaxation process in the WWW technique is composed of two parts: a bond length variation part and a bond angle variation part 16 1:22:01 Raf +8R2 B2:( Rij Rak — COSQJikRfi) (2.1) i jd. By connecting the two nearest neighbors of the two-coordinated atoms and removing the two-coordinated atoms themselves, we obtain a fully coordinated network which is finite in the :1: and y directions and periodic in the z direction, as shown in Figure 3.2. There are 64 cases of a two-coordinated atom connected directly to another two-coordinated atom. New bonds are created between the two nearest neighbors of the two two-coordinated atoms while the two-coordinated atoms are removed. In this case 64 four-fold rings are created. We allowed the creation of four-fold rings in this case because the procedure in building fiber model is simpler in this way. The distortions at the 64 four-fold rings will be relaxed in the following WWW procedure. The boundary of the original supercell is also shown in the figure to clarify how much the supercell shrinks in .r and y directions after this procedure. This bond rearrangement procedure removes almost half the atoms from the original supercell. There are only 640 atoms left in the supercell after this procedure. The remaining network is uniform in radius from top to bottom. The WWW algorithm is then applied to the network to make it amorphous. Oxygen atoms are inserted afterward between every silicon-silicon bond to create an amorphous silica network. The final model which is shown in Figure 3.3 has 640 silicon atoms and 1280 oxygen atoms in one supercell. The length of the supercell is 114.8A. Because the oxygen angles are soft, the fiber model is not stressed compared to the amorphous CRN silicon models. The root mean square deviation (RMSD) of Si-O bond is 0.107A, which is about 6.6% 0f the optimal value. The bond angle RMSD at silicon and oxygen atoms are 13.2° and 138° respectively. 36 .=®o.~@Q=m my: ad mEOud 4.5.7. ovo PE 823. muse N 2: wee? Sharon can $5385. a. was a 2: E 8ch mm 833 .1952. 885908 .33 4. and Sawm— V I ,2 . avuuvuaflotuaruu:avatars...unvuuvafln... a. h... .9. ... Auk... a”; I A . I! 34. .4. .4. .. i. . - a... a... ,.... a... ... 5.. . . .. f. ..~.~.~.-.-.. u. 6%» .oa» .oc. . .«v Jo» Jay .00. do. do. . c. .o, l Stu in Gin Gin C!" Cr“ Gin Cr" Ola 0‘" Urn 0.." din a... c... w... t... c... 434.14.... .. .. a. . w o o O..- ZA] 37 0.04 E- 1 E t ‘ 0 r 1 a, p . D ‘l : 10 p A A 1 5 E ‘ ‘C b ‘ ‘ ‘ c» : . 0- ‘ C O — 1 0’ 0.1 li- " 2 1 g b 4 .0 A E — > A 1 c : : '0 ’ O : : C ' ‘ n -o.04 .- ‘ 3 8 - I» o : ‘ 1k D ‘ a 1 O I 1 S : ‘ : (é) r G) I " E 1? -0.08 1 1 1 1 1 < m 0.05 1 1 1 1 1 1 2 3 4 5 6 7 1 2 3 4 5 6 7 Layer Layer 2., 0 : a I 5 I 1' C t I I I I T . C ; I ‘ _ c E 1 g : : .9 18 "f '1 .‘E : e : ‘di 1 0)) ’4 ':' E s > 1 'o : ‘ .8 I ‘ 1 a) I I . it 5, : 9 14 ,- . ': (CO -8 I' ‘ 11 g) : : 1 cu I ‘ E E U E : 8 5 5 1o '— g o '12 r— " D 1 : O) : 0 E 3 :9 : (é) : ‘ 1 m : - + 2 ~16 - 1 1 1 1 m < m 6 r 1 1 1 1 1 I 1 2 3 4 5 6 7 1 2 3 4 5 6 7 Layer Layer Figure 3.6: The average bond length deviations from the optimal value (top left panel), the root mean square deviation (RMSD) of bond lengths from the optimal value (top right panel), the average bond angle deviation from the optimal value (bottom left panel) and the RMSD of bond angle deviation from the Optimal value (bottom right panel) in all seven layers of the fiber model. The first layer is the fiber surface, and the highest layers are in the center of the fiber. the average bond angles at silicon atoms are very close to the optimal value of 109°. Average bond angles at oxygen atoms are all about 10° less than the optimal value of 147°. This is caused by the presence of non-negligible number of four-fold rings in the network. The RMSD of both bond length distortions and bond angle distortions are small in all layers, proving distortions in our fiber model are acceptable. 43 3.2 Amorphous Film Silica Model 3.2.1 Procedure to Build Amorphous Film Network Model Fully coordinated amorphous film silica model can be easily built from crystalline silicon networks that are cut along the (0,0, 1) direction. All the atoms at the (0,0, 1) surface have two nearest neighbors, while all the non-surface atoms are fully coordi- nated, as shown in Figure 3.7. The example in the figure is made up of 4 by 4 by 8 supercells of crystalline silicon. It is periodic in :r and y directions but finite in the z axis. Since each supercell of the crystalline silicon network has 8 atoms, the total number of atoms in the supercell of the film model is 1024. Following the standard procedure described in Chapter 2, the two-coordinated atoms at the (0,0, 1) surfaces are removed after their nearest neighbors are connected by new bonds. The resulting network is fully coordinated. The number of remaining atoms in the supercell is reduced from 1024 to 960 after this procedure. The WWW algorithm is then applied on this network to make it amorphous. Oxygen atoms are added between every silicon—silicon bond to transfer the amorphous film silicon network to an amorphous film silica network. The supercell of the final result is shown in Figure 3.8. There are 960 silicon atoms and 1920 oxygen atoms in the supercell. The size of the supercell is 28.70A along the :r and y axes, and is 57.39A along the z axis. The model is finite in 2 directions. The top and bottom surfaces of the supercell are the surfaces of the film model. 3.2.2 Properties of Amorphous Film Network Model Similar to our studies in the amorphous fiber silica model, we can identify the layers in the amorphous film silica model. The atoms in the first layer are the surface atoms. The atoms in the second layer are the nearest neighbors to the first layer atoms, so on and so forth. All odd numbered layers in the model are made up of 44 Figure 3.7: A network that is made up of 4 x 4 x 8 supercells of crystalline silicon. Since the network is finite in z axis, atoms are not fully coordinated at the top and bottom surfaces of the network. The network is periodic in x and y directions. Fully coordinated silicon atoms are colored yellow, while two-coordinated atoms are colored blue. 45 N Figure 3.8: The supercell of an amorphous film silica network model. The sizes of the supercell are 28.70A, 28.70A and 57.39A along the :c, y and 2 directions respectively. The top and bottom surfaces of the supercell are the surfaces of the film silica network model. There are 960 silicon atoms (yellow) and 1920 oxygen atoms (red) in the supercell. 46 oxygen atoms while all even mimbered layers are of silicon atoms. Figure 3.9 shows the bond distortions at all layers in the amorphous film model. The average deviations of bond lengths from the optimal value is highest at the surface, then drops almost linearly to be about zero at the fifth layer. The distortions caused at the surface is then limited to the outer-most five layers of the film. The RMSD of bond lengths from the optimal value is also the highest at the surface, then decreases to be normal within five layers. Since the amorphous fiber silica model has only seven layers, the whole fiber silica model are influenced by the surface effect. The average deviations of bond angles in the amorphous film silica model oscillate around zero at the silicon atoms, but have negative values at the oxygen atoms. The smaller average bond angle at the oxygen atoms is caused by the four-fold rings in the network. The RMSD of all silicon-oxygen bonds is 0.09.31, which is slightly better than that in the amorphous fiber silica model. The RMSD of bond angles at silicon atoms and at oxygen atoms are 952° and 12.64°, which are all improved when compared to the corresponding values in the amorphous silica model. 3.3 Distribution Functions Due to the lack of long range order in amorphous networks, there is no way to predict positions and orientations of atoms far from the observation point based on the spatial arrangement of atoms nearby. The only information that is available in an amorphous network is the correlations between atomic distributions. Several distri- bution functions have been used to depict the atomic spatial distribution correlations. The nomenclatures of the distribution functions vary in literatures. The definitions used in this dissertation follow what are used by Thorpe et al. [97]. The Density Distribution Function (DDF) p('r) is the probability to find an atom in a unit volume that is at a particular distance r away observed from an averaged 47 A 0.08 ‘: Y—r Y ‘7 t V7 Y V V I 1' V 1' V I V Y V V I r: 0.15 P V V V V I V T V V I V Y V V I I V V V I V ‘ 3 3 ‘1‘, ,: ‘ s 2‘ a c . . a 0-04 .- 1 g P A 1 m I d H b 1 S I ‘ 1 g P ‘ m - > h ‘ 1 “o ‘ CD ‘ ‘ ‘ A A ‘ 11 U ’ ‘ 5 0 ' A ‘ A A A A " 5 0.1 *- ‘ "‘ 03 A ‘ "' ’ A ‘ C A ‘ ‘ ‘ a h 1 Q C A A A . 1 b A A A A A a : : 2 > . ‘ ‘ A ‘ A ‘ 1 o: -0.04 :- —I Q ~ ‘ A ‘3 : : CD ' ‘ 0 : ‘ 2 F 1 z s s a: : 2> —0'08 ' A A A A l A A A A l A A A A l A A A A l A ‘ 0.05 A A A A l A A A A l A A A A A A A A A l A 1 6 11 16 21 1 6 11 16 21 Layer Layer C = = a" E i c E . E V 18 r 1 O . : C 6 . :0 _4 — d .9 F (u : 0‘ > g : A ‘ ‘ A ‘ E .g E A ‘ A} b A A 1 - A ‘ 1 D E ‘ E 8 14 :- ‘ ‘ A 1 9 -8 L“ A 1 0 : ‘ ‘ A 3 cc» E E O) ‘ : CU : : c 1 c» _12 :. 4 D 10 T . a A ': (U * : (I) : A ‘ A e 1 a 1‘ 1 2 n . ‘ ‘ 1 > E It a: 3 < E I 1 -16 A l A l A A A A l A L A A l A A L A l A 6 A A A A l A A A A l A A A A l A A A A l A 1 6 11 16 21 1 6 11 16 21 Layer Layer Figure 3.9: The average bond length deviations from the optimal value (top left panel), the RMSD of bond lengths from the optimal value (top right panel), the average bond angle deviation from the optimal value (bottom left panel) and the RMSD of bond angle deviation from the optimal value (bottom right panel) in all layers of the amorphous film silica network model. The first layer is the film surface, and the highest layers are in the center of the film. 48 atom. It is mathematically defined as m = ,,..,,1{; De — n.) (3.1) 15161 in which N is the total number of atoms in the network, 13,- is the distance between atom i and atom j and the sum is over all pairs of atoms excluding such terms as i and j are equal. 6(r — 73,-) is a Dirac delta function. The Radial Distribution Function (RDF) T(r) is the total number of atoms within a thin spherical shell whose inner and outer radii are r and r + dr respectively observed from an averaged atom. It is related to the DDF by: T(r) = p(r)47r‘r2 = —— 2 6(7‘ — 73,-) (3.2) An integration of T(r) over all distance range produces N — 1 which is the total number of atoms in the network excluding the atom observed from. The atomic distribution in a random network is virtually uniform when the distance r is large. Therefore RDF should be roughly 47rr2p0dr in the limit of r —> oo, in which p0 is the average number density of the network. Hence p(r) approaches the average number density pa in the long distance range. It is for this reason that the p(r) defined in Equation 3.1 is called the density distribution function. The density distribution p(r) is not the same as the average number density p0 in the short and intermediate distance range because the atomic distribution is far from uniform when r is small. The DDF and its sister functions are great tools in analyzing the atomic arrangement in the short and intermediate distance range in amorphous networks. The Reduced pair distribution function (RPDF) G (r) is related to the DDF by the simple transformation C(r) = 47V [p('r) — po] (3.3) 49 Obviously the RPDF approaches zero in the long distance range where p(r') is virtually the same as p0. RPDF is indirectly measured in X-ray or neutron powder diffraction experiments through a conversion from the diffraction pattern 3(q) by C(r) : — [000 [8((]) — 1] sin(q7‘)q(1q (3.4) Because of the easy conversion between RPDF and the powder diffraction pattern, RPDF and its sister functions have been extensively used as the standard tool to study the short and intermediate atomic spatial distributions. The DDF, RDF, RPDF and other variants are generally called the pair distribution function (PDF). There are two typical approaches in comparing the PDF obtained from modeling and from experiments. The first approach is to convert the diffraction pattern 3(q) to PDF, then compare the experimentally obtained PDF with the PDF from modeling in the real space. The second approach is to convert the PDF to a diffraction pattern, then compare the diffraction pattern from the modeling with that from the experiments in reciprocal space. The first approach is preferred because a peak in the real space has obvious meanings while a peak in the reciprocal space may not be so easy to explain. One question we want to address is how much our amorphous fiber and film silica models differ from CRN models in term of PDF in the short to intermediate distance range. The PDF of our amorphous fiber and film models have to be close to that of CRN models, otherwise our models are likely to be unrealistic. The comparisons of the PDF of the amorphous fiber and film models with the PDF of the CRN silica model are thus not only a test of the validity of the built models, but also a test on the algorithm upon which the DCRN models are built. But all variants of PDF are defined under the assumption that atoms are isotrOp- ically distributed in all directions. However the atomic distributions in both the fiber and the film models are anisotropic. Atoms in the fiber model are distributed along a 50 long cylinder, and atoms in the film model are distributed within two surfaces. The long range behaviors of the PDF of the fiber and film models are thus quite different from the PDF of CRN models. Figure 3.10 shows the RDF of three amorphous mod- els: the CRN model, the amorphous fiber model and the amorphous film model. The RDF of the CRN model rises much faster than the RDF of the amorphous film model, which is roughly linear in long distance range. The RDF of the amorphous fiber has a maximum at about 10A. It is almost flat in long distance range. Therefore it is not reasonable to compare the PDF of the amorphous fiber, amorphous film and CRN models directly. To reveal the differences in PDF of the three models in the short and intermediate distance range, we define a new distribution function called the Reduced Density Distribution Function (RDDF) P(r) which is the radial distribution function of a network model divided by the radial distribution function of a uniform media that has the same overall shape as the model. As discussed in Appendix A, the RDF of infinitely large uniform media, of uni- form media in the shape of a film of thickness d, and of uniform media in the shape of a fiber of radius (1 are 3,1».(7‘) = 47r7‘2p0 (3.5) 47rr2(1— {—d)p0 0 g r < d Tflilm(r) : (36) 27rdrp0 'r > d 47r7‘2p0[ — fin + gang) + 3%(1— %)K(fi)] 7' < 2d Triber(r) : 47173,), [ — xii-$0 + 5;»)E(sin—l ($1172) + 36.12“ — I—.)K(2d)] r > 2d (3.7) in which p0 is the average number density and functions E and K are the elliptic 51 50m : Y T v V Y fir V r T I VVVVVVVVV r YYYYYYYYY I vvvvvvvvv V V Y : 4500 :— Amorphous Fiber Model -I E Amorphous Film Model - A 5 Continuous Random Model 5 E 4000 ,— ‘= 3 : I a = = E 3500 I- -I ‘9 : : g . . t, 3000 E- -I t- : : c ’ I .9 E : ‘5 :. .- c 2500 : 1 U- . . c E 3 .2 2000 r 1 3 I I :9. E i b : : (D - . O 1500 E E I . 8 E 3 CE 1000 — '1 500 E- /’ '- 0 h A AA A J J A A l AAAAAAAAA l A A A L L L A A L A J A A L L A A A A A A A A ‘ 0 5 10 15 20 r(A) Figure 3.10: Radial Distribution Function of a CRN model (blue curve), of an amor- phous film model (green curve) and of an amorphous fiber model (red curve). integrals defined as a EM, 1:) : / \/1 — A72 sin.2 (9(10 0 I\'(k) = f2 ‘19 . (3.8) 0 m The merit of the RDDF is that by dividing the RDF of uniform media of the same shapes, the RDF of amorphous models are transformed into new functions in which the peaks in the short and intermediate distance range are manifested. Figure 3.11(a) shows the RDF of the amorphous fiber model superimposed with the RDF of uniform media in the shape of a fiber. The peaks in the short and intermediate distance range are buried in the noise. Figure 3.11(b) shows the RDDF of the same fiber network model. Compared to the RDF, RDDF clearly exposes the first several peaks of the correlations in local atomic arrangement. The effects of the overall shapes on usual PDF are totally eliminated. The RDF of infinitely large and uniform media is 47rr2p0, as given in Equation 3.5. Therefore the RDDF of CRN models which are infinite and isotopic is the RDF of the model divided by 47rr2p0, which is identical to the definition of the DDF of the CRN models except a constant factor p0. The RDDF of the amorphous fiber and film models whose atomic distributions are anisotropic are different from the corresponding DDF. Atomic correlations in the local and intermediate distance range are better dis— closed in RDDF than in any other distribution functions. Figure 3.12 shows the RDDF of a CRN model, of the amorphous fiber model and of the amorphous film model. All three RDDF remarkably resemble each other. The first sharp peak comes from the nearest neighbor silicon-oxygen distance. The second peak is composed of the second nearest neighbor oxygen-oxygen correlations and the silicon-silicon corre- lations. A third peak is broad and barely visible in all three RDDF. The RDDF of IArno morphous fiber model—— Uniform media in shape oi liber 400- (a) Radial distribution function T(r) (arbitrary unit) RDDF u o l l I l l O 10 20 30 40 50 60 r(A) Figure 3.11: Figure (a) shows the RDF of the amorphous fiber silica model super- imposed with the RDF of uniform media in the overall shape of a fiber. The RDF of amorphous fiber model divided by that of the uniform media produces the RDDF which is shown in Figure (b). The effects of shapes on the atomic distribution func- tions is absent in RDDF. 54 Amorphous Fiber Model Amorphous Film Mridel Continuous Random Model AAlAA V I i Y T I V I Y I V T b "IfYYIITYYfI'VVVVVVY' AAALAAJAAAAAAAAA 'VV'IVIVVVV Reduced Density Distribution Function P(r) w I vvvrvarvav 0 l AAAAAAAAA l AAAAAAAA AAAAAAAAALALAAAAAALAAI AAAAAAAAA 1 AAAAAAAAA 0 1 2 3 4 5 6 7 8 Figure 3.12: The RDDF of an amorphous fiber model (red curve), of an amorphous film model (green curve) and of a CRN model (blue curve). The inset is the close-ups of the second peaks in the RDDF of the three models. all three models are featureless beyond the third peak. The first peak in the RDDF of the CRN model is narrower and sharper than that in the RDDF of the amorphous fiber and film models. This is due to the smaller distortions in nearest neighbor silicon-oxygen bond lengths in the CRN model. The atoms at and close to the surfaces in the amorphous fiber and film models are unavoidably more distorted in bond lengths and angles than the atoms far away from the surface. Since the CRN model does not have any surface, the average bond lengths and angle distortions in CRN models are expected to be lower than those in the fiber and film models in which surfaces are present. This is proved by the fact that the RDDF of the first peak of the CR.\T model is sharper and narrower than that of the other two models. The amorphous fiber model has the largest surface area, both in absolute and in relative terms. Therefore the distortions in the amorphous fiber model should be the largest among the three. Since the bond length distortions incurs higher potential penalties than the bond angle distortion does, most of the distortions are bond angle distortions, particularly oxygen bond angle distortions because the oxygen bond angles are softer than the silicon bond angles. The second peak in RDDF is the distribution of second nearest neighbor distances, which is an indirect measurement of the bond angle distortions. The second peak of the RDDF of the fiber model is the widest of the three RDDF, indicating the bond angle distortions in amorphous fiber model are larger than those in the other two models. The amorphous film model is in the middle between the amorphous fiber model and the CRN model in term of the width of the second peak. The general forms of the RDDF of the three models are the same. The shapes and the positions of the peaks are virtually identical. In this sense the DCRN models generated by our algorithm — the amorphous fiber and amorphous film models shown in this chapter — do not differ much from the CRN models in terms of local and intermediate atomic distributions. Though the distortions of bond lengths and angles at the surfaces play roles in affecting local atomic distributions, they are not beyond reasonable levels so that the RDDF are not altered significantly. 3.4 Metal-Adamantane Network Model Pivan et al. [98] synthesized a chemical material containing triogermanate ada- mantane [G84SIO]4— units in 1994. X-ray diffraction experiment proves this material is crystalline. Each adamantane unit is a tetrahedron, as shown in Figure 3.13. Each adamantane unit is composed of ten sulfur atoms and four germanium atoms. Six out of the ten sulfur atoms -— the gold atoms in the figure — are bonded to the four Figure 3.13: A [Ge4510]4~ adamantane unit. Each adamantane unit is composed of ten sulfur atoms and four germanium atoms. Four out of the ten sulfur atoms are connected to the sulfur atoms in the other adamantane units, or are bonded to metal atoms. These four sulfur atoms are colored in red. The other six sulfur atoms are bonded to the germanium atoms in the same adamantane unit. They are colored gold in the figure. Germanium atoms are colored as yellow. The whole adamantane unit forms a tetrahedron. germanium atoms. The other four sulfur atoms are called the terminal sulfur atoms. One end of each terminal sulfur atom is bonded to a germanium atom in the same unit, and the other end may be connected to a metal atom or a terminal sulfur atom in another adamantane unit. Later studies [99, 100, 101] report metal-adamantane networks. All these networks are crystalline and micro-porous. The material synthesized by Bonhomme and Kanatzidis [102] has mesoscopic structures. Thin layers of crystalline adamantane units are separated by long or- ganic surfactant molecules. A novel metal—adamantane material is first reported by MacLachlan et al. [103]. Long surfactant molecules cluster to form large cylinder- shaped tunnels. These tunnels are well organized on a hexagonal network. The metal 57 atoms and the adamantane units fill between these tunnels. Therefore from a topolog- ical point of view, the metal atoms and the adamantane units form a porous network. A subsequent work [104] suggests that the metal-adamantane network is well ordered. On the contrary, Rangan and his co—workers [105] propose that the metal-adamantane network should be disordered. Later X-ray powder diffraction analysis by Wachhold et al. [106] confirms the lack of long range order in the metal-adamantane network. Hence the metal-adamantane network is not only porous, but also amorphous. A schematic diagram is plotted in Figure 3.14. The organic surfactant molecules form cylindrical channels. According to Wachhold et al. [106], the diameter of the channels ranges from 22A to 32A, depending on the lengths of the surfactant molecules. The separation between the channels are between 30A to 44A. Therefore the thickness of the metal-adamantane network can be as thin as about 10A in some regions of the material. Since the distance between a terminal sulfur atom and the center of the adamantane unit is only 4.47A, at most two layers of adamantane units are allowed in the thinest regions between the surfactant channels. Metal atoms are tetrahedrally coordinated to the adamantane units. The ada- mantane units are also tetrahedrally bonded to other atoms. In this sense the amor- phous metal-adamantane network resembles the amorphous gallium arsenide network, in which all atoms are tetrahedral. It is energetically favorable for the metal atoms to be bonded to adamantane units, and for the adamantane units to be connected to the metal atoms. All the rings in the network should be even-numbered. However a large amount of five-fold rings and seven-fold rings exist in the networks built by our algorithm. A large number of unlikely bonds will be present if the metal-adamantane network is modeled by our algorithm. For this reason we turn to other models to model the metal-adamantane network. As the starting point we use an amorphous gallium arsenide model constructed by Barkema and Mousseau [107]. This model contains 1000 atoms. There are not any Figure 3.14: A schematic diagram of the proposed mesoscopically porous metal- adamantane material. Surfactant molecules cluster to form long cylindrical channels. The diameters of the channels range from 22A to 32A, depending on the lengths of the surfactant molecules. These surfactant channels can be approximately viewed as being on a hexagonal lattice. Average distance between channels are about 30A to 44A. These channels are sketched as the light blue circles in the figure. Metal atoms and adamantane units form cross-linked network between the channels. 59 odd-numbered rings in the model. The gallium and arsenic in the model are redefined to become metal atoms and adamantane units respectively. Since all the rings in the model are even-numbered, each metal atom is bonded to four adamantane units, and each adamantane unit is connected to four metal atoms. Since this is a bulk model, only the bulk properties of the metal-adamantane network can be studied. The network is then relaxed by a Keating-like potential E = $2022]- — 75,2").2 + 37'" Z (rij . rik — 7'?" cos 6",)2 + €220.“ - r12 — rsr‘m cos 0,)2 i,j i,j 6, one can systematically or randomly try all the possible combinations of N -— 6 dihedral angles of the ring, and minimize the fictitious ring closure potential f with respect to the six unknown dihedral angles at every step. Each zero fictitious potential f corresponds to a set of solutions to the ring closure equations, which in turn suggests a new conformation of the ring. We have utilized the limited-memory BF GS source code [122], which is a quasi-Newton unconstrained nonlinear optimization algorithm, to minimize the function f. The optimized fictitious potential may not have a zero value at some combinations of the N —- 6 dihedral angles. It may be due to the nonexistence of a solution of the remaining six dihedral angles, or it may be due to the inefficiency of the L-BFGS method to find a solution. The L-BFGS algorithm is a local potential optimization algorithm so it is not capable of finding the global minimum point under any conditions. Only when the function value of f is zero do we consider a set of solutions to all the ring closure equations has been found. 5.2.1 Conformations of a Seven-Fold Ring At every step while working on a single N-fold ring, ROCK first randomly se- lects and rotates N — 6 dihedral angles from their values in the previously accepted conformation. Then it minimizes the fictitious potential f with respect to the six remaining dihedral angles. The new conformation, if the fictitious potential is zero, is subjected to checks on van der Waals overlaps before accepted. A van der Waals overlap occurs when the distance between two non-bonded atoms is smaller than the sum of their van der Waals radii times a coefficient. A small coefficient such as 0.6 or 0.7 represents a soft van der Waals repulsion between non-bonded atoms. A large coefficient such as 0.9 or 1.0 represents a stifl van der Waals repulsion. A generated conformation is rejected if the number of van der Waals overlaps is not zero. The whole process is a random walk process. It is not exactly a Monte Carlo approach because it does not accept or reject conformations based on Metropolis [123] criterion on the energy of the generated conformations. Since ROCK searches conformations in the vicinity of the last one it accepted, it is capable of exploring the conformations space in a quasi-continuous manner. It is not able to jump from one cluster of con- formations to another cluster of conformations, if there are multiple conformational clusters. This point is manifested in the conformational space of a seven-fold ring. The bond lengths and angles are exactly 1.54A and 67° for all bonds (it is worth not- ing that bond angle values are given according to the definition shown in Figure 5.3). Due to the perfect seven-fold symmetry, three pairs of dihedral angle correlations are enough to describe the conformational space of the seven-fold ring. They are: the correlations of the nearest neighbor dihedral angles (251 vs. (152, the correlations of the second nearest neighbor dihedral angles (,0, vs. Q53, and the correlations of the third nearest neighbor dihedral angles (t1 vs. 6.4, as shown in figure 5.4. G?) and Scheraga’s algorithm [73] of reducing ring closure equations to four non- linear single variable equations is able to find all possible conformations of the seven 86 Figure 5.4: A seven fold ring whose bond lengths and angles are exactly the same. Correlations of (151 vs. (:52, (151 vs. $3 and Q51 vs. 4)., reveals the whole conformational space of the seven fold ring due to its seven-fold symmetry. fold ring in a single run. The left column in Figure 5.5 shows the conformations sampled by Go and Scheraga’s method. The seven-fold ring has two well separated clusters of conformations. One cluster of conformations forms a twisted loop in the seven dimensional conformational space which is spanned by the seven dihedral an- gles. The other cluster of conformations forms an ellipse in the conformational space. G5 and Scheraga’s algorithm is able to find all possible conformations in both clusters in one run. Our algorithm ROCK solves the ring closure equations by minimizing the fic- titious ring closure potential. It does not jump between clusters. Starting from a conformation in the twisted loop cluster, it is able to find all the other conformations in the same cluster, as shown in the middle column of Figure 5.5. But it is not able to find any conformation in the ellipse cluster. Or, if the starting conformation is in the ellipse cluster, ROCK will find all conformations in the ellipse cluster, but not any single conformation in the twisted loop cluster, as shown in the right column in 87 Figure 5.5. The. fact that ROCK is not capable ofjumping between clusters of con- formations can be viewed as a disadvantage in sampling conformations. Or it in fact can be regarded as a safeguard against jumping over high energy barriers. The bond distortions, which are necessities for the ring to transform from a conformation in one cluster to a conformation in another cluster, are equivalent to high energy barriers. Since our goal is to explore low energy protein conformations at room temperature, it is an advantage of ROCK that it does not overcome obvious potential barriers easily. 5.3 The Complexity Associated with a Network of Rings As stated in Section 5.1, it is difficult to generate conformations for a flexi- ble network with lots of inter-connected rings due to the correlations between all of the dihedral angles in the network. Go and Scheraga’s method, although it is able to generate conformations for single rings when the six unknown dihedral angles are con- secutively arranged, is not. efficient in searching conformations for networks. ROCK can be applied on any ring system because of its simplicity. We define a fictitious total ring closure potential of all the rings in a network as f: E t (5.4) all rings which is the sum of the fictitious ring closure potential of every ring in the network. The total fictitious potential of the whole network can then be minimized with respect to all the rotatable and unknown dihedral angles, in the hope to find a zero potential point that is the new conformation of the ring network. However the computation cost of solving multiple nonlinear equations simulta- neously makes it infeasible to solve ring closure equations of all rings concurrently. 88 -100 - . -. ........ I ......... I ......... I ........ : _,00 5W -100 -50 0 50 100 -100 -50 0 50 100 -100 '50 0 50 100 $0 4’0 4’0 Figure 5.5: Complete conformations of a seven-fold ring calculated by G5 and Scher- aga’s algorithm (left column) and by minimizing the fictitious ring closure potential (middle and right columns). The top, middle and bottom rows show correlations between nearest neighbor, second nearest neighbor and third nearest neighbor dihe- dral angles respectively. The seven-fold ring has two clusters of conformations. One cluster of conformation form a twisted loop in the conformational space. The other cluster looks like an ellipse. Go and Scheraga’s algorithm is able to find both clusters of conformations in one run (shown in left column). Our method of minimizing the fictitious ring closure potential however does not jump between clusters. The confor- mations sampled by ROCK are either confined to the twisted loop cluster (shown in middle column), or to the ellipse cluster (shown in right column), depending on the initial conformation of the seven-fold ring. 89 Suppose a generic network of rings has N bonds and M DOF. In principle one can randomly select and disturb M dihedral angles in the network, and minimize the total fictitious ring closure energy of all the rings in the network, with respect to the total number of variables N = N — M. Since the computational cost of minimizing such nonlinear potential increases in the order of N3, this method is not capable of handling networks with hundreds of bonds, which are common in the flexible regions in small to moderate sized proteins. It is more preferable to handle a network of rings in a ring by ring fashion. Suppose the number of rings in the network is n, the average number of variables per ring is thus N/n. The computational cost of solving ring closure equations for every ring one by one for one time is thus on the order of (JV/703 x n = N3/n2. Since both total number of rings 72. and total number of variables N — M scale linearly with the total number of bonds N in common protein structures, the computation cost in theory scales linearly, if ring closure equations of all rings can be solved one by one. In practice, however, any other rings in the network break when dihedral angles in one ring are rotated to close the ring, as have been explained in Section 5.1. In order to solve the ring closure equations for all rings in a network as efficient as possible, we design a procedure that minimizes the fictitious ring closure potentials of an expanding network. The algorithm first tries to close the ring which has the smallest number of unknown dihedral angles in the whole network. This ring is called the seed. After succeeding at closing this ring, the algorithm then minimizes the sum of fictitious ring closure potentials of the seed and of up to five more rings that share bonds with the seed. The newly added rings and the old seed is now the new seed of the network. If all rings in the seed can be closed simultaneously, ROCK then adds up to five more rings that share bonds with the seed to be the newly expanded seed. Step by step, ROCK adds rings to the expanding seed, and then minimizes the sum of the fictitious ring closure potential of all the rings in the seed. Because all the 90 Figure 5.6: A simple network with two DOF. The left side eight-fold ring and the right side seven-fold ring share one bond which is shown as the dashed line. rings in the seed are already closed when new rings are added, only small adjustments in dihedral angles are necessary to close all the rings in the seed concurrently. The whole process stops when all the rings in the network are added to the seed. The total calculation cost of this procedure is lower than minimizing the total fictitious ring closure potential of all the rings directly. One crucial step in generating a new conformation for a network of rings is to select a set of bonds to be randomly rotated about their original values. Inappropriate selections of bonds results in unsuccessful trials and wasted computation time. For example there are two DOF in the network in Figure 5.6. One can select and rotate two bonds in order to find a new conformation for the network. But the selections of the two bonds cannot be arbitrary. If two selected bonds are all in the seven-fold ring, there are only five variables left unknown in the ring closure equations of the seven- fold ring. But each set of ring closure equations has six independent equations. In most cases, there is not a solution for the five variables in six independent equations. This trial is most likely to be unsuccessful due to the wrong combination of selected bonds to be randomly rotated. The possibility to close the rings after two bonds are rotated is greatly enhanced if one selected bond is in the seven-fold ring and the other is in the eight—fold ring, or if both selected bonds are in the eight fold ring. ROCK utilizes the following procedure to select a set of bonds to rotate. It is 91 designed to avoid choosing the wrong combination of bonds to rotate from constraint theory point. of view. 1. Randomly select a freely rotate-ible bond. 2. Count the DOF in the ring to which the selected bond belongs. Go back to step 1 if the DOF is negative which means the ring is over-constrained. One more constraint is counted because a randomly selected bond is equivalent to one more constraint on the dihedral angles. The ring is considered to be the seed of the network. 3. Expand the seed by one more ring by adding one ring that shares bonds with the rings in the seed to the seed. The DOF of the seed is calculated. Every randomly selected bond is counted as one more constraint. Go back to step 1 if the DOF is negative. 4. Repeat step 3 until all rings in the network are included in the seed. The bond selected in step 1 is then officially selected to be randomly rotated. Any local area the network is not over constrained by the rotation of this bond. 0. Repeat step 1 to step 4 until a desired number of bonds has been selected. The procedure listed above ensures that the randomly selected bonds do not over constrain any local area of the network. Though it does not guarantee that there are six unknown variables for each set of ring closure equations, it does help reduce the rate of unsuccessful trials. Randomly selecting and rotating fewer bonds than the DOF further improves the rate of successful trials. The discussion carried so far assumes every bond in a network is freely rotatable. There are bonds which, however, should be considered to be locked. The peptide bonds, for example, favor either trans or cis conformation. There are energy barriers between these two conformations. The dihedral angles of these bonds should be kept 92 unchanged from their values in the initial conformation. Each fixed bond adds one constraint to the network. A seven-fold ring with one fixed bond has zero DOF which is the same as that of a six fold ring. 5.4 Conformations of Side Chains Once new conformations of the flexible ring clusters are generated, side branch atoms are anchored back to the ring clusters with correct bond lengths and angles. Side branch atoms are first randomly disturbed to sample the conformations of the side branches. The coordinates of the side branch atoms are then relaxed in the Cartesian coordinates so that 1) bond lengths and angles of side branch atoms are undistorted from the original values; 2) there are no van der Waals overlaps between side branch atoms themselves and between side branch atoms and ring cluster atoms; and 3) chiralities at side branch atoms are not changed. An atom is called a chiral center if the atom has equal to or more than three nearest neighbors. The bond lengths and angles between the chiral atom and its nearest neighbors are the same as those in the mirror image. The atomic arrangement of a chiral center and its three nearest neighbors is not identical to that in the mirror image. A chiral center atom and its three nearest neighbors are illustrated in Figure 5.7. Since the bond lengths and angles are equality constraints, while the checks against van der Waals overlaps and chirality flips are inequality constraints, ROCK calls the program DONLP2 [124] to minimize the function fee): 2 (r—r0)2+ 2 09—00)? (5.5) bonds angles subject to a collection of inequality constraints of van der Waals repulsions g1 .17 = r2. — r2 2 0 (5.6) 2] v 93 (8) I (b) 1 2 o o c 3 3 c Figure 5.7: A chiral center atom c and its three nearest neighbors of atom 1, atom 2 and atom 3. Figure (b) shows the mirror image of Figure (a). The bond lengths and angles between the chiral center atom and its nearest neighbors are the same as those in the mirror image. The atomic arrangement of the chiral center and its nearest neighbors is not identical to that in the mirror image. It goes counter-clockwise from atom 1 to atom 2 to atom 3 in the original image, but it goes clockwise from atom 1 to atom 2 to atom 3 in the mirror image. and a set of inequality constraints that chirality at atoms should not be flipped , _ 0 0 0 ,. 92(1") — [13‘1” (I‘z‘k X Pull [rij ' (rt/c X Full 2 0 (0-7) in which 7‘ and 0 are bond lengths and angles in the trial conformation, r0 and 60 are the bond lengths and angles in the initial conformation, 73-]- is the distance between two non-bonded atoms of atom i and atom j, rv is the sum of the van der Waals radii of the atom i and atom j, r”, rik and r“ are the vectors between atom i and 0 ij’ its bonded neighbors in the trial conformation, and r r3, and r9, are the vectors between atom i and its bonded neighbors in the initial conformation. The function f (:r) is minimized to be zero when the bond lengths and angles of the side branch atoms are identical to the corresponding values in the initial conformation. The sign of the dot and cross product of the three vectors r,j, rik and r“ specifies the chirality of atom i. The chirality of atom i in the generated conformation is identical to the chirality of the same atom in the initial conformation if and only if the sign of the dot and cross product of the three vectors at the atom in the generated conformation is the same as the sign of the same product in the initial conformation. DONLP2 is a non-linear optimization program that can optimize a function subject to both 94 equality and inequality constraints. The inequality constraints of van der V-Vaals overlaps forces the distances between any pair of non-bonded atoms to be larger than a critical value 1', which is the sum of van der Waals radii of these two atoms. When all the inequality constraints are satisfied there are no van der Waals overlaps between any pair of atoms in the protein, nor chirality at any atoms are flipped. After randomly disturbing the side branch atoms from their original Cartesian coordinates, ROCK checks the distances between every pair of non-bonded atoms to build the van der Waals overlap list. It constructs the inequality constraints gl(;r) according to the list. Then it minimizes the function f (51:) subject to the inequality constraints. Once the function is minimized to practically zero, ROCK builds a new van der Waals overlap list to begin a new round of minimization. A new conformation of the side branch is found when there are no van der Waals overlaps, when the function f (1') is practically zero, and when the chirality at every atom is not flipped. 5.5 Workflow According to the algorithms outlined above, we wrote a FORTRAN program package ROCK to sample conformations of the flexible regions in proteins. Since it relies on flexibility analysis to sort the flexible regions from the rigid cores of the proteins, it is preferable to have the flexibility analysis result from FIRST ready before the program is run. The program works in the following procedure: 1. Read in the initial protein conformation. Calculate the bond lengths and angles. 2. Read in the flexibility properties of the protein analyzed by FIRST. If the rigidity analysis result is not available, ROCK counts the distribution of DOF by itself by counting constraints in all local regions in the protein. 3. Find the rigid cores in the protein. Starting from one non-rotatable bond as the seed, ROCK adds nearest neighbor non-rotatable bonds to the seed until 95 10. there are no non-rotatable bonds can be included. ROCK finds all rigid cores in the proteins by this method. It then fixes the orientation and position of the largest rigid core in space. All the other smaller rigid cores and flexible ring clusters move relative to the largest rigid core. . Find all flexible ring clusters and side branches. . Randomly select and rotate several bonds in each flexible ring cluster according to the procedure described in Section 5.3. Minimize the total fictitious ring closure potential F of the ring cluster defined in Equation 5.4 by the L-BFGS algorithm [122]. Go back to step 5 if the fictitious potential cannot be minimized to be zero. . Check for van der Waals overlaps between the ring cluster atoms themselves and between the ring clusters atoms and the rigid core atoms. Go back to step 5 if van der Waals overlaps exist. Randomly disturb the side branch atoms away from their positions in the pre- vious conformation. Utilize the DONLP2 [124] algorithm to find a new conformation of each side branch with zero bond distortions, with correct chiralities at every atom and without any van der Waals overlaps. Go back step 8 for nine extra trials on each branch if new conformations cannot be found, or go back to step 5 to start from the beginning if ten consecutive searches of side branch conformations fail. Accept the new conformation. Go back to step 5 to search for another confor- mation by using this new conformation as the starting point. Stop the whole process when a predefined number of new conformations are generated. 96 ROCK also checks the quality of main chain 95 and t": angles against the Ra- machandran plot [110] to ensure the stereo—chen‘iical quality of the generated confor- mations. Eighteen out of the twenty standard residues (alanine, arginine, asparagine, aspartic acid, cysteine, glutamine, glutamic acid, histidine, isoleucine, leucine, lysine, methionine, phenylanaline, scrine, threonine, tryptophan, tyrosine and valine) are checked against the Ramachandran plot generated by Morris et al. [125]. The other two residues, glycine and proline, are checked against two Ramachandran plots spe- cially designed for glycine and proline. The Ramachandran plots of glycine and proline are discussed in Appendix B. The main chain d) and it) angles of the twenty standard residues are restricted to the core and the allowed regions of the Ramachandran plot by default. ROCK rejects any conformations which have one or more residues whose 06 and if) angles are not in the allowed regions of the Ramachandran plot. 97 Chapter 6: Results and Discussions ROCK has been tested on several macromolecules. This chapter shows three examples. The first example is a model molecule made up of four ten-fold rings. The second example is the conformations of the human immunodeficiency virus type 1 (HIV-1) protease, which is one of the most important proteins controlling the life cycle of the virus. The third example shows multiple randomly generated pathways between the occluded, the closed and the open conformations of the protein dihydrofolate reductase (DHFR). 6.1 Model Molecule H8CSSQO Figure 6.1 shows the model molecule HBCSSQO. Figure 6.1(a) shows the topology of the molecule, while Figure 6.1(b) shows a low energy conformation. From the topological point of view, eight carbon atoms are positioned on the corners of a rectangular box, connected either by double or by single sulfur atoms. Eight hydrogen atoms complete the valency of the carbon atoms. The bond lengths between carbon and sulfur, sulfur and sulfur, and carbon and hydrogen atoms are 1.805A, 2.019A and 1.120A respectively. These values are the optimal bond lengths used in the MM3 force fields [126]. All the bond angles at carbon atoms are exactly tetrahedral (109.5°). To avoid van der Waals overlaps, the bond angles of sulfur atoms are increased to be 135°, rather than a more realistic value such as 95°. The model molecule has 36 atoms, 60 independent bond angle and 40 bond length constraints. Note that there are only 5 not 6 independent bond angle constraints at the carbon atoms. The total number of DOF is thus 2. The molecule was chosen by us to have both many interlocking rings as well as two internal DOF, to make the conformational space both non-trivial and easy to display. The conformational space 98 r. - . ' - a." (y—L (a is LL Figure 6.1: The model molecule H8C8S-20. In the topological graph shown in Fig— ure (a), eight carbon atoms (green spheres) are at the corners of a rectangular box. Di-sulfur linkages are at eight out of twelve edges of the box. Single sulfur linkages occupy the other four edges. Sulfur atoms are represented by large yellow spheres. Hydrogen atoms (small Gray spheres) are connected to carbon atoms. A low energy conformation is shown in Figure (b). This molecule has two DOF. of the molecule can be plotted on a 2D graph, in which the two variables representing the two DOF are the two axes. Any two dihedral angles can serve as the two axes on which the conformational space is projected. A convenient choice is the two dihedral angles 61 and (1)2 as shown in Fig. 6.1(a). As illustrated in the figure, these two dihedral angles are equivalent in the sense that they are interchangeable. The topology of the model molecule also shows mirror symmetry over the plane cutting through the centers of (251 and 4);. Because a dihedral angle changes its sign under a mirror symmetry operation, and because the mirror image of a ring has the same bond lengths and angles as the ring does, a ring is closed when all its dihedral angles change sign. Therefore if there is a conformation at a certain combination of (151 and $2, there is also a conformation when the signs of ¢1 and (152 are changed. Because of these two symmetries, one quarter of the whole 27r x 27r plane expanded by the two dihedral angles $1 and ¢2 is enough to depict the conformational space accessible to the molecule. These symmetries are {$1, 52} —> {(152, (151} and {051, (92} —> {—¢1,—¢2}. Symmetries in conformational space hold true in the model molecule because it 99 does not have any bond length and angle variations. The lengths of all bonds between carbon and sulfur atoms are the same, the lengths of all bonds between sulfur and sulfur atoms are the same, the angles of all carbon atoms are the same, and the angles of all sulfur atoms are the same. The symmetries discussed above are not apparent in any single conformation of the model molecule, but are obvious in the ensemble of all conformations. 6.1.1 Conformations of Model Molecule H8C8820 Our program successfully generates 10,000 conformations in 53 hours on an Athlon AMD MP 1900+ processor. The search was carried out with a random walk procedure in the (.51, 62 space. Van der Waals overlap is allowed in the first 5,000 conformations but not in the later 5,000. As shown in Figure 6.2(a), the conforma- tions cover almost the whole two dimensional space when van der Waals overlaps are allowed. The conformations are not as densely packed in the vicinity of (1)1 ~ 0 or (1)2 ~ 0 as in the other regions. This is because the distribution of number of solutions to the ring closure equations is not uniform. As discussed in Section 5.2, 6 dihedral angles in a single N-fold ring can have zero or several solutions to the RCE, if the other N -— 6 dihedral angles are given. Similarly, because our model molecule has two degrees of freedom, for a given pair of (251 and (t2 values, there can be none to mul- tiple sets of solutions of the remaining dihedral angles to the RCE. There are fewer solutions to the RCE at points in (251 ~ 0 and €752 ~ 0 than at points in other regions. The distribution of the number of solutions in the ((1)1, (252) plane is directly reflected by the frequency of our program finding solutions in any regions. A low resolution grid search and a random search in conformational space confirm this point. Our program implements a hard sphere model in checking for van der Waals overlaps. The distances between any two non-bonded atoms have to be greater than the sum of their van der Waals radii otherwise the conformation is abandoned. Van 100 130 130 : ......... , ......... , ......... , ......... , ......... , ......... : , (b) 120 3 120 r ‘: 6° 6° - ’“r :5". _ 75:11..“ - :1]..- ‘ '9' O 0 "LI... ‘ , _ '1‘: . . it". 1‘ -60 -60 - , “\‘Sfig '- -120 420 E- - '180 '180 : ......... I ......... l ......... t ......... l ......... t ......... ~180 120 -60 O 60 120 180 180 : ’70 VVVVVVVVV I vvvvvvvvv : - r t l (d) + + i __ _1 P + + 4 120 : 5 _ + . + + : : + * +‘-O- + + ‘ = Q ' g + ++ + it, . 60 .- -. + + J»... , , 3 1 ++4" : : r e + l E - b + i a“ o . ‘1 '80 r + . * " 1 + l + t. + + ‘ ‘60 E- + + "1 f + f E «f t a * + ‘ : : + t -120 .- -' , + , E : l I '180 i ......... l ......... l ......... l ......... l ......... I ......... : _90 ......... 1 ......... -180 -120 -60 0 60 120 180 ~90 -80 -70 $1 4’1 Figure 6.2: The distribution of conformations projected on the two axes (231 and 452. The top left Figure (a) shows the generated conformations when van der Waals overlaps are allowed. The top right Figure (b) shows the generated conformations when van der Waals overlaps are prohibited. The bottom left Figure (c) shows the optimized conformations in the MM3 force field. The bottom right Figure ((1) shows a close up of one of the four clusters of optimized conformations in Figure (c). Two degenerate global minimum states are shown as solid circles in Figure ((1). 101 der Waals radii used in our program are 1.632A, 1.72A and 1.296A for carbon, sulfur and hydrogen respectively. These values are all 0.8 times the optimum radii used in the van der Waals interactions in the MM3 force field. The van der Waals potential in l\ll\~l3 force field is soft at the optimum distance. It does not rise sharply until the distance between two atoms is considerably shorter than the sum of their optimum radii. The potential increases by roughly 4 kcal/mol when the distance between two atoms decreases to 0.8 times the optimum distance. By allowing a maximum 4 kcal/mol potential penalty in the van der Waals interactions, our program samples as large a conformational space as possible, while avoiding generating conformations whose van der Waals potentials would be unreasonably high. Therefore our hard sphere radii of atoms are set to be 0.8 times the optimum radii used in the MMB force field. As shown in Figure 6.2(b), the conformational space sampled when the hard sphere interaction is turned on is considerably reduced from the conformational space when the van der Waals overlap effect is not included. This is indeed expected. The figure shows clearly that some regions are sampled more frequently than other regions. Van der Waals constraints, in addition to the uneven distribution in the number of solutions to the RCE, cause the nonuniform distribution of conformations. The 5000 conformations without van der Waals overlap are Optimized further by an external software package TINKER [127] with the MM3 force field. To be consistent with the model molecule, the optimal angles of sulfur atoms in MM3 force field are adjusted to be 135°. The conformational space of the optimized structures is plotted in Figure 6.2(c). The energies of most optimized conformations are all below - 58 kcal/mol. The global minimum energy is -61.998 kcal/mol. All but a few optimized conformations lie in one of the four symmetric clusters. The 5000 conformations generated when van der Waals overlap is included are also optimized by the same force field. The energy of the global minimum structure is also -61.998 kcal/mol, and has the same conformation as above. 102 It is worth noting that the ensen‘ible of conformations sampled by our pro— gram shows the two expected symmetries of {491,52} —> {052,001} and {abhog} —> {—51,—62}. As shown in Figure 6.2, these two symmetries can be vaguely identi- fied among the 5000 conformations generated when the van der Waals overlaps are allowed, and is obvious among the 5000 conformations generated when the van der Waals overlaps are disallowed. The fact that the two symmetries required by the topology of the molecule are manifested in the ensemble of conformations sampled by our program is a necessary yet insuflicient proof that our algorithm samples the whole conformational space of this model molecule. It is clear that our algorithm samples conformational space efficiently by using this hierarchical approach. Maximum conformational space is sampled when van der Waals overlaps are allowed. As expected, the conformational space is considerably reduced when van der Waals overlap is prohibited. The conformational space is fur- ther reduced when the molecular structure is Optimized in a full force field such as MM3. By exploring the conformational space with only bond length and angle con- straints while forbidding van der Waals overlap, our algorithm is able to explore the conformational space where local minima of full force field are most likely to reside. Without bond length distortion, bond angle distortion and van der Waals overlaps, every generated conformation is accessible by the model molecule at moderate tem- peratures. 6.2 Conformations of HIV-1 Protease 6.2.1 Structures and Functions of HIV-1 Protease HIV-1 protease is vital for the reproduction of the HIV virus. The HIV virus replicates several proteins that are essential for the viral maturation process on a long peptide chain which is called the “polyprotein”. The proteins in the long polypro- 103 tein chain are not active. There is a time window for the HIV-1 protease to cut the polyprotein into several pieces to activate these proteins before the polyprotein de- grades. The binding of prohibitory molecules to the HIV-1 protease therefore hinders the activation of the proteins in the HIV polyprotein, with the consequence that the HIV virus cannot reach its matured stage. The importance of the HIV-1 protease in the life circle of the HIV virus has made itself the primary pharmaceutical target. for curbing the acquired immune deficiency syndrome (AIDS) which is caused by the HIV virus. Several drugs designed to bind to the HIV-1 protease with great affinity have shown positive effects on AIDS patients. Since the first 3D structure of the HIV-1 protease was published [128], more than 200 structures of the protease have been reported on various resources [129]. Most of the structures are bound with inhibitors. The inhibitor-free structure of HIV-1 protease contains two identical amino acid chains. Each chain is made up of 99 residues. The two chains are glued together by hydrogen bonds and hydrophobic interactions. The molecule has an exact C2 symmetry. Shown in Figure 6.3 are a ligand-free HIV—1 protease structure (1HHP) obtained from X-ray crystallography experiment [130]. The two chains are shown as red and blue ribbons respectively. The big free volume in the middle of the protein is the binding site. Polyproteins are locked and cut in this region. The catalytic sites ASP25—THR26-GLY27 in both chains are rendered as spheres. It is worth noting that all conformations of the HIV-1 protease observed in ex- periments are either bound with ligands or are in the closed conformation, similar to the one shown in Figure 6.3. In such a conformation the catalytic site is not covered by the two flaps at the top of the protease due to the large void that is immediately above the site, but the short distance between the two flaps forbids the polyprotein to approach to the catalytic site from the top. There should be many open conforma- tions where the two flaps are widely open so that the polyprotein can pass through. 104 Figure 6.3: The ligand-free structure of the HIV-1 protease (1HHP). The two identical chains are rendered as red and blue ribbons respectively. The flexible flaps at the top of the protein are rendered as widened strands. The catalytic site is indicated by spheres. These open conformations are not observed in X-ray crystallography because other molecules in adjacent unit cells in the protein crystal force the protease to take a unique and closed conformation. The two flaps at the top of the protease are flexible. NMR experiments show that the two flaps, residue 48 to 55 which are shown as the widened ribbons in Figure 6.3, have two types of motions in different time range. One motion is the slow motion of the flexible flaps in the time range of [,tS-mS [131]. The other motion is the fast curl in and curl out motion of the tips of the flexible flaps (residue 49 to 53) which is in the sub-ns time range [132]. The fast motion confirmed what is seen in the MD simulation [133], though Freedberg et al. [132] do not agree on the scale of the flexibility shown in the simulation. Another MD simulation by Carlson‘s group [134] shows that the fast curl motion of the tips of the flaps is absent when water molecules are first equilibrated before the MD simulation. Carlson argues that the 105 curl motion observed in MD simulations is actually caused by the voids between water molecules and the flaps, if the water molecules are not in their optimal positions at the beginning of the simulation. Whether the curl motion is real is yet to be cleared. The MD simulation shows that the characteristic of the curl motion is that the 03 and it: angles of the GLY51 residue are in different regions in the Ramachandran plot as other glycine residues in the protease are. The motion of the flexible flaps is important in the function of HIV-1 protease. The distance between the two tips of the flaps in crystal structures is only 2.7A. Such distance is not large enough for the polyprotein to pass through, bearing in mind that the diameter of a water molecule is 2.8A. The two flaps have to undergo a large range of conformational transformations to catch a small part of the polyprotein. The fast sub-ns range motion enables the protease to adapt to a suitable conformation to better interact with the polyprotein that is passing nearby. The flaps then drag a part of the polyprotein chain to the reaction center via the slow motion. We would like to answer two questions pertaining to the conformations of the flexible flaps of the HIV-1 protease: 1) How large the distance between the tips of the flexible flaps can be in all possible conformations? This distance have to be large enough so that the amino acid chain of the polyprotein can pass through the voids between the flaps. 2) Are there conformations in which the tips are curved inward? Are the (13 and ’t/J angles of the GLY51 residue different than those of other glycine residues in the protease? We utilize the combination of FIRST and ROCK to address these questions. Our algorithm samples the possible conformations but not the dynamical tra- jectories. Therefore our algorithm can study the statistical behavior of possible tra- jectories such as the maximum distance between two atoms and the distribution of dihedral angles et al. Our algorithm cannot answer the questions related to the abso- lute dynamical trajectories. However we do get hints of what a real trajectory looks 106 like by studying the statistical behavior of an ensemble of possible conformations. 6.2.2 Flexibility Analysis on HIV-1 Protease The software package FIRST is used to analyze the flexibility properties of the HIV-1 protease. FIRST assumes a hydrophobic interaction whenever two non-bonded carbon atoms are within a certain distance. This criterion works well in the interior of the proteins where hydrophobic interactions are stable. On the other hand, the hydrophobic interactions on the surface of the proteins are fragile. The only in- teraction between two non-bonded carbon atoms is the van der Waals interaction, which is weak and easy to break. The hydrophobic interactions in the interior of the protein are stable not because of the interactions between the non-bonded atoms themselves are strong, but because of the overall confinement and compression ef- fects of the water molecules around the proteins. The presence of water molecules around the proteins strengthens the hydrOphobic interactions in the interior of the proteins. However water weakens or even destroys the hydrophobic interactions on the surface of the protein. Two non-bonded carbon atoms on the protein surface may accidentally be close to each other in fractions of the whole protein motion trajec- tory, but the contacts are easily destroyed by the collision of water molecules which are always in thermal motion. Another argument of why the hydrophobic interac- tions are stable in the interior but not on the surface is that the concentration of hydrophobic interactions in the interior of proteins is higher than that on the surface. The high concentration hydrophobic interactions interlock with each other so all of the interactions are strengthened. Therefore the analysis of hydrophobic interactions should consider the environments in the surrounding of the potential hydrophobic interactions. FIRST however does not take the environmental effects on hydrophobic interactions into consideration. Its simple definition of hydrophobic interactions may produce false positives in some cases. 107 FIRST identifies six hydrophobic interactions between the two flexible flaps. From the fact that. these two flexible flaps are in constant motion between the open and the closed states. though the open states are not observed in X-ray crystallography experiments, we know that. these six hydrophobic interactions must be short lived and weak, otherwise the two flexible flaps will be always in the closed conformation. The fact that these six pairs of non-bonded carbon atoms are in close contacts in the crystals does not mean these atoms always stay together. For this reason we manually removed these six hydrophobic interactions. A flexibility analysis without these six interactions shows that majority of the HIV-1 protease is rigid with small flexible regions, as shown in Figure 6.4. The two flaps on the top are flexible, as required by the biochemical function of these two flaps. The four additional flexible regions in the protein are not of interest to us. Not shown in the figure are many other smaller flexible regions which are exclusively made up of flexible side chain atoms. 6.2.3 Conformations of HIV-1 Protease ROCK generates 600 conformations of the protein HIV-1 protease obeying all the constraints specified in the flexibility analysis. Because a large portion of the protease is rigid, the calculational power of ROCK is concentrated on the two top flexible flaps and on the other smaller flexible regions shown in Figure 6.4. ROCK also generates conformations for the small flexible regions which involve only side chain atoms. The total CPU time is 6 hours and 40 minutes on an AMD Athlon 1900+ (real frequency is 1.6GHZ) processor. All of the generated conformations do not have van der Waals overlaps. All main chain (t and i!) angles are restricted to the core and the allowed regions in the Ramachandran plot. The Ramachandran plot is discussed in detail in Appendix B. Figure 6.5 shows the superimposition of the 600 conformations in the ribbon diagram. The top figure is viewed from the side and the bottom figure is viewed from 108 Rigid Isostatic Flexible {1| {E} Figure 6.4: The flexibility properties of the HIV—1 protease. The protease has one major rigid core as shown in blue and gray. The blue regions are over constrained. Lots of bond constraints have to be cut to make the blue regions flexible. The gray regions are almost flexible but still rigid. The two flaps on the top of the protein are flexible, indicated by the color gold. Four additional regions in the protease are flexible, shown as the yellow regions in the figure. The regions colored by gold have greater density of DOF than the regions colored by yellow. 109 the top. Only the protein main chain motion is shown in the figure. The flexible and the rigid regions of the proteins are colored by yellow or gold and blue or gray respectively. Figure 6.6 shows the superimposition of the same 600 conformations in wire diagram. All bonds are shown in the figure. The motion of flexible side chains is revealed in this figure but not in Figure 6.5. The distance between the tips of the two flexible flaps in all conformations indi- cates how big the conformational space the protein can sample. The distance between the flaps is defined as the smallest distance between any atom in one flap and any atom in the other flap. The distance between the two flaps is 2.7A in the crystal structure. This distance is too small for any peptide chain or small molecules to pass through. The distance can be as large as more than 8.0A however, as shown in Figure 6.7. As shown in the figure, the distance between the two flaps is smaller than 3.0A only occasionally. The distance oscillates around 5.0A in most of the conformations. The distance of 5.0A is large enough for water molecules to pass through, but not large enough for a thick peptide or a big inhibitor molecule to pass. So the protein does not open up enough in most of the time. There are occasions however that the distance is larger than 8.0A, which is large enough for a peptide chain to go through. Once a short segment of polyprotein can go through the gap between the two flexible flaps, the flaps falls back to the closed conformations to hold the polyprotein tightly. The catalytic residues in the HIV-1 protease then cut the polyprotein. Our calculation shows that the first step of the whole catalytic function of the protease, which is the opening of the two flexible flaps, is indeed possible without any external driving force. The RMSD of Cu atoms indicate the average deviation of the main chain atoms in generated conformations from those in the crystal structure. Its mathematical form u is calculated by N . 1 . . "(1) = N guy) _ A”)? (6.1) Rigid lsostatic Flexible Figure 6.5: Superimposition of 600 conformations of HIV—1 protease generated by ROCK shown in ribbon diagram. The residues in the rigid regions of the protein are plotted in blue and gray. The flexible residues are shown in yellow, gold and red. Figure (a) shows the side view while Figure (b) shows the view from the top. 111 Figure 6.6: Superimposition of 600 conformations of HIV-1 protease generated by ROCK shown in wire diagram. Atoms are not shown in the figure but the bonds are colored by the atom types at the two ends of the bonds. Carbon, nitrogen, oxygen and hydrogen atoms are colored in green, blue, red and gray respectively. The conformations shown in this figure are the same as those in Figure 6.5. Figure (a) shows the side view while Figure (b) shows the view from the top. 112 Distance between two flaps (A) Conformations Figure 6.7: Distance between the two flexible flaps in all generated 600 conformations. The distance is only 2.7A in the crystal structure. The distances in most of the generated conformations are around 5.0A. The largest distance is more than 8.0A. 113 for the RMSD of the Ca atom of the jth residue. r?) is the coordinate of the CO atom of the jth residue in the ith conformation. r?) is the coordinate of the Ca atom of the jth residue in the crystal structure. The sum is over all N conformations. Atoms are in constant motion even in crystal structures. The averaged atomic motion in X-ray crystallography data is given by the Debye-Waller B factor which is related to the RMSD by B‘“ = 879W)? (6.2) Figure 6.8 shows the calculated and the measured RMSD of the Ca atoms. The calculated data is based on the coordinates of the Ca atoms in the 600 generated conformations. The measured data is converted from the measured Debye-Waller factor provided in the crystal structure of the HIV-1 protease [130]. Because the motion of the protease is restricted in the crystal, the RMSD of the Ca atoms in the crystal structure does not have much interesting features. It oscillates around 0.7A without any major peaks. The protease in the crystal structure does not sample many conformations because of crystal contacts and volume constraints. Our calculation shows that the protein should have large conformational changes in three regions: from residue 15 to residue 18, from residue 35 to residue 42 and from residue 45 to residue 56. Prominent motion is shown in residue 45 to residue 56, which are the flexible flaps at the top of the HIV-1 protease. The calculated RMSD is only non-zero in the flexible regions. This is expected because our algorithm fixes the positions of the atoms in the rigid cores of the proteins. Our algorithm artificially eliminates the small scale fluctuations of all atoms around their equilibrium positions. It illustrates only the large scale conformational changes that are beyond the averaged fluctuations. The calculated RMSD of the two chains, shown as the red and the blue curves in Figure 6.8 do not overlap with each other. This is because our calculation is not able to sample all possible conformations. These two curves will be identical when 114 5 ........ , ......... , ......... , ......... , ......... , ...... ., , _ , . ,. . . I Calculated from 600 conformations. Chain A — 2 Calculated from 600 contormations. Chem B — . From X—ray crystallography data I RMSD (A) Residue Figure 6.8: The comparison of the calculated and the measured RMSD of the main chain Ca atoms in each residue. The red and the blue curves are calculated based on the 600 generated conformations. The black curve is converted from the Debye- Waller factor in the X-ray crystallography data of the crystal structure of the HIV-1 protease. 115 the generated conformations of one chain are the same as those of the other chain. Since the conformations of the two chains are independently generated, the generated conformations of the two chains will be symmetric only when they are the complete conformations of the two chains. Given the number of DOF in the HIV-1 protease is large, the calculation time required to sample the complete conformations would be astronomical. Scott et al. [133] publish a similar figure in the discussion of their MD simulation on the same protease. Because MD simulation catches both the slow and the fast motions in proteins, the RMSD calculated from the MD simulation has a oscillating background of 1.2A. The highest RMSD value of about 48A is observed at the 50th residue in one chain. This data is comparable to the RMSD of the 50th residue in our generated conformations. The RMSD of the CO, atoms in the three flexible regions identified by FIRST are all distinguishable from the background in the MD simulation. MD simulation shows that the RMSD of the CO atoms in residue 75 to residue 83 are higher than the background noise, which implies residues from 75 to 83 are flexible. However flexibility analysis predicts these residues to be rigid. The flexibility analysis may have falsely identified a few hydrogen bonds or hydrOphobic interactions in this region. The distribution of the main chain ¢ and if) angles of the glycine residues in the conformations generated by our algorithm are qualitatively different from the those of the conformations sampled by the MD simulation reported by Scott et al. [133]. As shown in the four panels in Figure 6.9, the distributions of d) and ti) angles of four glycine residues that are on the tips of the flexible flaps are very narrow in the whole Ramachandran plot. The main chain o and 1/2 angles of the GLY51 residue in both chains are distributed in such a narrow range that they can be considered as not having changed at all in all of the 600 conformations. A structural analysis reveals that these two dihedral angles are all included in a ten-fold ring which is formed by 116 the main chain atoms of GLY49, ILE50 and GLY51. The hydrogen bond between the GLY52 main chain hydrogen atom and the GLY49 main chain oxygen atom closes the ten-fold ring, as indicated by the dashed bond in Figure 6.10. The peptide bond of the ILE50, GLY51 and GLY52 residues are also included in this ten-fold ring, as indicated by the red bonds in the figure. Because peptide bonds are not rotatable, the ten-fold ring has in fact only one DOF. The dihedral angles of all rotatable bonds, including the main chain at) and in angles of the residue GLY51, are hence much limited. On the contrary, the distribution of the main chain (5 and 2,!) angles of the GLY51 residue in the conformations created by the MD simulation [133] covers a significant portion of the whole Ramachandran plot. Assume the bond lengths and angles of the covalent bonds do not vary much in the MD simulation, the bond length and angle of the hydrogen bond that closes the ten-fold ring must undergo great distortions to allow the (b and il) angles of GLY51 to vary much. This implies that the hydrogen bond of the ten-fold ring is not stable in the MD simulation. Because the energy of the hydrogen bond is -3.14 kcal/mol in the crystal structure, which is a typical value of a strong hydrogen bond, flexibility analysis by FIRST lists this bond as stable constraints. The discrepancy between the hydrogen bond stability predicted by the flexibility analysis and that by the MD simulation has to be eliminated in future studies. Since the distribution of the main chain (0 and w angles of the GLY51 residue is limited to a small region in the Ramachandran plot, the flexible flaps in the conforma- tions generated by our algorithm do not have the curl in motion discovered in the MD simulation. Even then, the two flexible flaps are widely open in some conformations, for example the distance between the two flaps can be as large as 8.0A. Therefore we conclude that the curl motion is not related to the open and close motion of the flexible flaps at the top of the HIV-1 protease. 117 180 _ ....... w ....... , ......... , ......... , t" ....... ,. ........ , ......... , ..... ., .g : . 2 1+ tat "r-s‘ . r : li 1 : 5.; : : : - - l- u 90 E- GLY48 in Chain A -: l— GLY48 in Chain B -: Z . : : ; : : 3 5* 0 r 1 .- 1 I I : . : 3 : . P - . -90 r 1 r 1 Z 1 - : : + 1 inf +3 ’ i .1. 1J1"‘:"‘-kuilll ......... l ............... 5.113.... ..1 u .1 mum‘s” ‘4 180 _ ......... , ......... , ......... , ......... _ ......... , ......... , ......... rm...“ : : : : h n n - ' I - ' 90 r GLY51 In Chain A -. _- GLY51 In Chain B —_ : : : : . 5. 0 L _ '_ _: I W : “an «at g : : 5 -90 .- 1 r 1 —180 ’ ......... l ......... l ......... l .................. l ......... l ......... l 111111111 —1 80 —90 0 90 1 8O —90 0 90 180 Figure 6.9: The distribution of main chain (13 and 21) angles of the GLY48 residue in chain A (top left panel), of the GLY48 residue in chain B (top right panel), of the GLY51 residue in chain A (bottom left panel) and of the GLY51 residue in chain B (bottom right panel). The distribution of main chain (2) and ii) angles of GLY48 in both chains covers much larger space in the Ramachandran plot than the distribution of <13 and ti) angles of the GLY51 residues does. 118 GLYSZ O GLY49 L, ILESO Figure 6.10: The tip of the flexible flaps at the top of the HIV‘I protease is made up of GLY49, ILE50, GLY51 and GLY52. The main chain (15 and w angles are included in the ten—fold ring which is closed by the hydrogen bond between the main chain hydrogen atom of GLY52 and the main chain oxygen atom of GLY49. The hydrogen bond is indicated as the dashed bond. Three non-rotatable peptide bonds, which are the red bonds in the figure, are also in the ten—fold ring. Green, blue, red and gray spheres are carbon, nitrogen, oxygen and hydrogen respectively. 119 6.3 Conformational Pathways of DHFR 6.3.1 Structures and Functions of DHFR The enzymatic protein DHF R catalyzes the reduction of 7,8—dihydrofolate (DHF) or folate to 5,6,7,8—tetrahydrofolate (THF) with the help of the coenzyme nicoti- namide adenine dinucleotide phosphate (NADPH). THF is the one carbon carrier in the synthesis of many amino acids. It also plays a key role in the bio-synthesis of purine and thymidylate, which are essential components of DNA. Therefore the activity of DHFR, which transforms folate or DHF to THF, indirectly controls the bio-synthesis of DNA. The inhibition of DHFR inevitably results in the blocked DNA synthesis, which ultimately leads to the death of the cells. The key role it plays in the DNA metabolism has made DHF R the target of anti-cancer drugs [135]. These drugs hinder the growth of cancer cells by blocking the activity of DHF R. Because cancer cells grow faster than human cells, anti-cancer drugs that bind to DHFR do not affect the human cells as much as they do cancer cells. Because of the importance it bears, the protein DHFR is present in all living or- ganisms, including archae, prokaryotes and eukaryotes. It is also extensively studied. Structures of DHF R complexed with various ligands have been determined by X-ray crystallography and by NMR techniques. Up to date there are already 105 DHFR structures in the Protein Data Bank (PDB) [136]. Statistical analysis [137] shows that there are three Escherichia coli DHFR (ecDHF R) conformations: the Open, the closed and the occluded conformations. Crystal structures of vertebrate DHFR pro- teins are always found in one conformation which resembles the closed conformation of ecDHFR [138, 139]. Figure 6.11(a) shows the superimposition of the three conformations of ecDHF R. The closed, the occluded and the Open conformations are represented by the protein 1RX1, IRXG and IRAQ respectively [140]. The three conformations are almost iden- 120 tical except in the loop region Of residue 14 to 24, which are conventionally called the ;\I-20 loop. Figure 6.11(b) shows the close up Of the loop region. The M-20 loop covers the binding site Of the ecDHF R. Its movement is believed to be coupled with the catalytic reaction of the protein. The M-20 loop catches the ligands when it is in a particular conformation. The loop then escorts the ligand to the binding site through proper conformational changes. Once the ecDHFR finishes the catalyzing process, the i\-‘I-2() loop then Opens, guides the reduced ligands out Of the binding site, and then releases it. The NMR experiments by Falzone et al. [141] prove the frequency Of the M-20 loop conformational change is the same as the disassociation rate Of the THF. Beyond the function of steering the ligands in and out Of the binding site, the motion of the M-20 lOOp may participate in the catalytic reaction directly by coupling with the reaction coordinates, according to the theory that protein motions may acti- vate catalytic reactions [142]. Through their hybrid Quantum Mechanics/ Molecular Mechanics (QM/MM) simulation, Agarwal et al. [143] identify the coupling between the side chain motions and the reaction coordinate Of the ecDHFR. Simulation of Shrimpton et al. [138] reports similar results on chicken DHF R. Sawaya et al. [137] conclude from systematic studies that the closed conforma- tion of the ecDHFR is in the first half cycle Of the catalytic reaction of the protein, while the occluded conformation is in the second half cycle. There must be confor- mational pathways between these two distinct conformations. The role Of the Open conformation is not clear though. The Open conformation is not seen in either the first or the second half Of the catalytic reactions Of ecDHFR. The Open structure, shown as the blue bend in Figure 6.11, is not in the middle Of the closed (the yel- low bend) and the occluded (the red bend) conformations in real space. However, Sawaya et al. find that the Open conformation is in the middle of the closed and the occluded conformations in terms Of structural and biochemical characteristics, such 121 Figure 6.11: The superimposition Of the open (blue), the closed (yellow) and the occluded conformations (red) of ecDHFR. The closed, the occluded and the open conformations are represented by the protein 1RX1, 1RX6 and IRAQ respectively. Figure (a) shows the whole proteins while Figure (b) shows the close up of the M-20 loops. 122 as the pattern of hydrogen bonds, the packing with ligands, the secondary structures and the distribution of main chain dihedral angle. The importance of the motion of the M-20 loop requires studies of its detailed motion patterns. There are two questions to be answered in understanding the confor- mational changes of the M-20 loop. The first question relates to the conformational pathways. Since the occluded and the closed conformations Of the M-20 loop are detected and proved to be intermediates of the catalytic reactions, there must be conformational pathways between these two distinct conformations. It is not clear whether there is only one conformational pathway, or there are many. The second question concerns the existence of the open conformations. The open conformation is in the middle Of the occluded and the closed conformations when examined from the point of the main chain dihedral angles, but is outside of the range of these two conformations in real space, as shown in Figure 6.11. It is not clear whether the con- formational pathways between the occluded and the closed conformations involved the Open conformation. The N MR technique, which measures an ensemble Of conformations, is the right tool to gain the structural data of the most populated conformations, but not the tool to extract the structures of a conformation that is less populated. For this reason the intermediate conformations between the occluded, the closed and the Open conformations are never Observed in experiments. Simulation techniques have to be utilized to gain insight into the conformational pathways Of the M-20 loop. MD simulation, being the standard method in exploring protein conformational changes, is in practice not be able to sample the whole conformational pathways between the Open, the closed and the occluded conformations Of ecDHFR, because the time range of the ecDHFR conformational changes its conformation is beyond the calculation capacity of current MD simulation techniques. The ecDHFR changes its conformation at a rate of about 203‘1 [144]. The best MD simulations reach a 123 couple of microseconds while the majority of MD simulations in literatures are in the nanosecond time range. Yet several MD simulations have been tried on ecDHF R [145, 146]. The MD simulation by Rod et al. [147] shows very promising results. It shows multiple transitions between the three distinct conformations, though the simulation is only 10ns long which is far less than the 203“1 conformational change rate reported in experiments. It also predicts several other distinct conformations that are not detected by experiments. In practice, the MD simulation is limited in its capacity to sample long time scale protein conformational changes partly because it wastes time on high frequency motions, partly because it includes the whole protein in calculation. The protein ecDHF R is very stable. Except in the M-20 loop region, the protein does not undergo much structural transformation under usual conditions. The structural stability Of the protein support our interpretation of proteins as flexible regions anchored on motionless rigid cores. The rigid core Of DHF R does not change its conformation in a full catalytic reaction cycle. MD simulations waste time on calculating the dynamic trajectory of the rigid cores of proteins which is not Of interest in our study Of the motion Of the M-20 lOOp. ROCK eliminates calculation endeavor on any high frequency motions by fixing the bond lengths and angles. The flexibility analysis enables our algorithm to sample the conformations of the flexible regions Of the proteins only. These two advantages make our algorithm a powerful tool in sampling the conformational changes that is be- yond the scope Of present MD simulations. Our algorithm is capable Of sampling the conformational pathways between the occluded and the closed conformations of the protein ecDHF R. We would like to answer the questions of how similar these confor- mational pathways are and whether the Open conformation appears in the pathways. Section 6.3.2 examines the flexibility characteristic Of the protein. It is worth noting that our algorithm samples the protein conformations by a random walk procedure. 124 Section 6.3.3 shows how a small n’iodification enables our algorithm to search the random yet directional pathways. Section 6.3.4 discusses the calculation results. Un- like the MD simulation which determines whether a snapshot Of a trajectory is closer to which Of three conformations (Open, closed or occluded ) according tO the char- acteristics of the main chain dihedral angles, our analysis are mainly done in real space. 6.3.2 Flexibility Analysis on DHFR Experiments extract more than one distinct conformation for some proteins, such as HIV-1 and DHFR. The existence of multiple conformations makes it easier tO pre- dict which hydrogen bonds or hydrOphobic interactions are stable in the whole path Of the protein conformational changes. If a hydrogen bond or a hydrophobic interac- tion is present in one conformation but not in the others, it is safe to say that this interaction is not stable. Therefore by comparing the distinct protein conformations we can identify those interactions that are truly stable. The quality of the flexibility analysis is improved by including only the interactions that are truly stable in the whole protein motion trajectory. In principle the more distinct conformations Of a protein there are the more reliable the prediction Of the stable hydrogen bonds and hydrophobic interactions is in flexibility analysis. We use only one conformation in the flexibility analysis of HIV- 1 protease however because our algorithm is powerful enough to provide much insight into the intrinsically allowed motions of proteins on the basis of only one structure. After all, many proteins have only one distinct structure which is determined from experiments. Since the occluded and the closed conformations are two Observed intermediate states in the catalytic pathways of the enzyme ecDHFR, these two structures are good indicators of which hydrogen bonds and hydrophobic interactions are stable in the whole enzyme reaction pathways. \Ve exclude the unstable hydrophobic interactions and hydrogen bonds from our flexibility analysis. Of those hydrogen bonds that are present in both conformations, the energies are taken to be the higher values (weaker bonds) of the same bonds in the two conformations, because the higher energy of a bond in two conformations tells how unstable the hydrogen bond can be. Information from the Open conformation is not used in the analysis of stable hydrogen bonds and hydrophobic interactions. One question we would like to address is whether the Open conformation is accessible on the pathways for the protein to transform between the closed and the occluded conformations. The inclusion of any information from the Open conformation in the flexibility analysis would bring bias in favor of showing Open conformation in the pathways. The protein structures IRXI and 1RX6 [140] are used in this study tO represent the closed and the occluded conformations Of ecDHFR. The two conformations are identical in amino acid residues. Therefore the covalent bonds are the same in the two conformations. Both conformations are bound with ligands. Ligands and sur- rounding water molecules are removed since they are irrelevant to the sampling Of intrinsically allowed conformations of the protein. Both protein structures are from X-ray crystallography with a resolution Of 2.0A. Polar hydrogen atoms are added to both conformations by the Unix version Of the software WhatIF99 [148]. The software FIRST [88] is applied to both conformations. It first calculates the energies of the potential hydrogen bonds based on geometrical considerations. Hydrogen bonds in the two conformations are then re—organized so that the hydrogen bonds are identical in both conformations. A potential hydrophobic interaction is listed when a pair of hydrophobic centers is closer than 0.7A plus the sum Of the van der Waals radii Of the two centers. According to Dr. Zavodszky’s experience [119] on the usage of FIRST on several proteins, biochemical properties match better with the predictions from FIRST when the critical distance Of hydrophobic interactions is 126 0.7A plus the sum of van der “'aals radii of a pair Of hydrophobic centers. A possible hydrophobic interaction is taken as a stable hydrophobic interaction when it is present in both conformations. This procedure creates a list of interactions and constraints that are common to both conformations. The two conformations, when obeying this single set Of constraints and interactions, are identical in flexibility properties. The selection of hydrogen bonds depends on the hydrogen bond cut Off energy. Figure 6.12 shows the structural properties of the protein under different hydrogen bond cut off energies. Each horizontal line represents the protein under one particular hydrogen bond cut Off energy. The thick bar shows rigid cores while the thin lines are flexible loops. The red and thick bars in the figure denote the biggest rigid cluster in the protein. From top to bottom, hydrogen bonds are accumulatively cut according tO their energies. Majority part of the protein is in the single and large rigid core until hydrogen bonds whose energies are higher than -2.661 kcal/mol, which is equivalent to more than 2300 K in temperature, are all cut. The structure of ecDHFR is very stable in this sense. In ambient temperatures those hydrogen bonds whose energies are less than —1.0 kcal/mol should be considered as stable interactions. The flexibility properties of ecDHFR, when all hydrogen bonds whose energies are less than —1.0 kcal/mol are counted as constraints while all the other hydrogen bonds are eliminated, is shown in Figure 6.13. The core of the protein is rigid as expected. The M-2O lOOp is flexible. The PRO21 residue is shown as a rigid residue because its main chain ()5 angle is locked by a five fold ring. Appendix B.3 discusses the distribution Of main chain dihedral angles of proline residues in detail. In addition to the M-20 loop, the residues from 118 to 129 and from 142 to 150 are also flexible. Because these two ranges of residues are Close to the M-20 loop in the coordinate space, the conformational changes of these two ranges of residues are coupled to the conformational fluctuation Of the M-20 lOOps through van der Waals interactions, hydrogen bonds and hydrophobic interactions. 127 882:5: 0:38: 2: 8: no: 33 :0 3:: BC. .388 to :8 :55 88.613 2: 8:. d3 2: :o 5:: no 5:38 8:88 2E. 8.5858 580:: 2: :_ wfiEwEO: 8:02 88:83 mo SAFE: on... 8: d2 2: :0 Ed: mo 5:38 85 2; :23: Emu 2:8 a 3 w:o_mn :28 2:8 2: 13 8:38 88:.me 8: 83:3. .80me BE: 305m 8:2:me .8: 8:28 8:: x82: 22% :03 gage ar:w_m 388%; 8:: :59 .388 to a8 :8: 89533 H2835: w as mmEQom £88: 2: $8838 @5— _c..:oN_:o: 8mm .mmmw88 to :8 9:5 88:33 9:88:28 a: mmIQoo 2: mo 52585 on? “mmc mSmE 128 u u u u u an h n u ww~.m- m. u u ... n u ... nu I u u Row- 8 u u u I" I T'lxll .fu wwmé- mm u u u III-ill. u n wmnd- vm u u u lillll—l'llll u u :33- am u u n In T" II-Tl u u 80m- cm ... u u Ifllllllxl u u 30m- .m u u u IilllllT'l—il u u wand- mm u u u 'Tllll-I'IIIIlI u u mg.” 8 .llllllflllllllllflllfl God. we .lllulllellllllITllllllil-l' sand- 3 .lllllIlT-lllll'alllllllalllvfllqlll. SWN- 8 IltlllIllllll-lllllfllill-lll 34mm- 5 ll-llchlulllllalllifil ommd- we 'lTIlil-IllfllTll-filfl omfim- ac Illllfgllalllmlllrfll-ull EON- on ll-l'IllilT'lf-l-Ill cemd- Q. lllllflilllllllIlI-ll' m5..- 2. [iii ~24- mm lllllqlllTll-ll Bad. 2: llllllllllTlfl cmmd- m: I'llllllllll 3.6. _~_ lilTll-ll 83. v2 ill-ll coed- a: in! coed of Ill-lull coed me. Ill oood mm: III—ll seed at ill-l coed a: In! coed a. In! oood mom Iii coed Km 1! omm _ _ _ _ _ _ _ _ _ _ . _ . . _ _ m .5953: of oi oi oi o: oo— oa ow on 8 om ow om cm 2 _ 2.8-: 129 Residues from 63 to 72, which are shown in color red at the top of the protein in the figure, are also flexible. The conformational changes in these residues do not have obvious biochemical functions. 6.3.3 Sampling Directed Pathways ROCK samples conformations randomly. Starting from one conformation, it searches a nearby conformation in random directions. In order to search pathways directed from the occluded or the closed conformation Of ecDHFR to the other, a simulated annealing procedure [149, 150] is incorporated in our algorithm. The RMSD d between the main chain atoms in a generated conformation and the corresponding atoms in the target conformation is the pseudo-energy Of the generated conformation. It is calculated as 1 N d 2 iv' gm — r§)2 (6.3) in which r,- is the coordinate Of the ith atom in the generated conformation and the rf is that of the same atom in the target conformation. The sum over the index 2’ is over all main chain atoms Of interested residues. In this case the sum is over all main chain atoms in the M-20 loop region. Those conformations whose RMSD tO the target conformation are smaller than those of their immediately proceeding accepted conformations are always accepted. The other conformations are accepted with a probability Of exp [—-Ad/T*], in which Ad is the change Of the RMSD to the target conformation since the last accepted conformation and T "‘ is the pseudo-temperature. The pseudo-temperature is always proportional tO the current RMSD tO the target conformation by T" = d7 (6.4) In this way the pseudo-temperature is high when the RMSD to the target confor- mation is big so that it is possible for the protein to overcome some pseudo-energy 130 Rigid Isostatic Flexible 4" 'i E} Figure 6.13: Flexibility properties of ecDHFR. The protein shown in this figure is 1RX6 which is in the occluded conformation. Only those hydrogen bonds and hy- drophobic interactions that are present in both the occluded and the closed confor- mations are included in the flexibility analysis. Residues shown in blue and gray are rigid. Residues shown in yellow, gold and red are flexible. The flexibility property is analyzed by the software FIRST. 131 barriers. The pseudo-temperature is low when the RMSD to the target conformation is small so that only those conformations which are closer to the target conforma- tion than their previously accepted conformations are accepted. The setting Of the parameter T depends on the properties Of the conformational space Of the proteins being studied. It is set to be 0.05 in this calculation. The generated conformations are dragged toward the target conformation when the RMSD decreases. Because the trajectories generated by our algorithm are not driven by any empirical potential, they are not the real dynamic pathways Of the proteins. However since all of the conformations generated by our algorithm are feasible conformations, the connections Of all these conformations link to form possible conformational pathways. These conformational pathways are statistically correct. The properties Of an ensemble of possible conformational pathways generated by our algorithm will be the same as those of an ensemble Of conformational pathways generated by MD or any other algorithms, when the size of the ensemble is large. 6.3.4 Conformational Pathways Of DHFR Pathways from the Occludedl to the Closed Conformation Six trajectories starting from the occluded conformation targeting at the closed conformation are generated by our algorithm. All Of these calculations generate thou- sands Of conformations within two to three days of CPU time on a single AMD Athlon 1900+ processor. The parameter settings for all Of the six pathways are the same ex- cept the initial random seeds. The random numbers used in the program ROCK are generated by the program authored by L’Ecuyer et al. [151]. The region of our interest is the M-20 loop. There are 11 residues in the M—20 loop. Each residue has four main chain atoms. Therefore the conformations are best defined in a 3 x 4 x 11 = 132 dimensional space. TO simplify the analysis, we define three reference points in the high dimensional space, which are the occluded, the 132 vvvvvv [vvvvvvvvvlivvf‘v1‘virl’rvv rEieétoiy '1 ' j . : Trajectory2 x 1 * t_ o . . rajectorya I : “Twagfi‘ s6 q) Trajectory4 a , r I .[j J LE] l'.,u" ".r F l r +- Q1 [3 D“. d? Traiecto 6 0 l , ‘I‘M I . ‘ D b l W 4 f ‘34.? i ‘ : <3 : ‘. : .5 3 ~ _‘ Ti . I g . o E o 0 g . . 2 2 :- - O . 9. O . (D P 2 L I . 1 - _ l . I f L 0 f 1 L LJ_L L l 4._L . I ......... I ......... l 0 1 2 3 4 RMSD to Oocluded Conformation (A) Figure 6.14: The correlation of RMSD Of the six trajectories tO the occluded and to the closed conformations. The RMSD Of all six trajectories are exactly 0.0A to the occluded and roughly 4.0A to the closed conformations in the beginnings. The calculations are terminated when the RMSD to the closed conformations are below 1.0/3.. closed and the Open conformations. All the conformations in the high dimensional space then can be projected to a simpler three dimensional space, in which the RMSD of a conformation to the three reference points are its coordinates. Trajectories of conformations are easily tracked and examined in the three dimensional space, at the cost that some information is lost when trajectories in high dimensional space is expressed in the newly built three dimensional space. Figure 6.14 illustrates the correlations between the RMSD of generated confor- mations to the occluded and the closed conformations. Since the calculation begins from the occluded conformation, the RMSD of conformational trajectories to the oc- cluded conformation are exactly zero at the beginnings. The RMSD of trajectories 133 to the closed conformation are more than 4.0A at the first several snapshots. Our calculations are terminated when the RMSD to the closed conformation are below 1.0A. Because the bond lengths and angles in the starting conformation, the occluded conformation in this case. are not exactly the same as those in the ending conforma- tion which is the closed conformation. our calculation is not capable of driving the RMSD to be exactly zero because the bond lengths and angles are not changed in our algorithm. In a simple test, we build a conformation in which the bond lengths and angles are identical tO those Of the occluded conformation, and the dihedral an- gles are the same as those of the closed conformation. The RMSD Of the best fit Of this manually built conformation to the closed conformation is about 0.6A. Therefore driving the RMSD down to the vicinity Of 0.6A is the limit Of our algorithm when the bond lengths and angles are not disturbed. In reality the lowest RMSD Observed in calculations is between 0.8A and 1.0A. All six trajectories do not pass the Open conformation, as shown in Figure 6.15. All six trajectories are distinctly away from the Open conformation at any point. The smallest RMSD of the trajectories to the Open conformation is about 2.0A. This fact is repeated in Figure 6.16 in which the correlations Of RMSD tO the closed and to the open conformations are shown. All the correlations between the RMSD of the trajectories to the three confor- mations prove that these six trajectories do not differ from each other much. All trajectories fall into same regions in all correlations shown in Figure 6.14, Figure 6.15 and Figure 6.16. This is reasonable considering that there are not Obvious barriers among the occluded, the closed and the Open conformations. As shown in Figure 6.11, it is not as crowded in the M-20 lOOp region as it is in other regions in the protein, even when the van der Waals repulsion of the side chains are taken into account. Therefore the M-20 lOOp can transform from the occluded conformation to the closed conformation without as much difficulty as conformational changes in other regions 134 ...... , ,,, : Trajectoryi + , . + TflljFrluly 2 - . . 6‘ y»: Trajectory 3 I ‘ 4 .i'crfip ‘ Trajectory 4 G " mafia: 1 [W ' ‘ : have“. Tra'ecto 6 0 ‘9‘} g o ‘ ry s i w .5 3 r ‘ ‘65 g . 2 I c . o 0 I c . 5‘ 2 " ‘ 2 o 1 m . 5 I I 1 1 - _‘ o ......... l ......... l ......... l ......... I ‘ 0 1 2 3 4 RMSD to Oocluded Conformation (A) Figure 6.15: Correlations of the RMSD Of trajectories to the occluded and to the open conformations. All six trajectories are not close to the Open conformation at any point. 135 + ......... , ......... , ........ , ....... l V v t 4 L 3" I - C ~- E :@i 8 + . )r '3 : 1&9” : .5 3 r 1 ’5 ~ ‘ E : 1 o . ‘E ; Tralectory1 + 1 o _ Trajectory2 - 4 o , . ., , Trajectory3 I < 5 2 L Trajectory4 r: _‘ 8- , T'f.j(1"tl:,‘ a o o _ Trajectory6 o D l. w . 2 (I 1 - _ i 4 0 ' ......... L ......... 1 . . . . . .4 A . l . . . LA . l . . . . l 0 1 2 3 4 RMSD to Closed Conformation (A) Figure 6.16: Correlations of the RMSD of trajectories to the closed and to the open conformations. This figure, together with the two previous figures, proves that the six trajectories do not differ from each other much. 136 of the protein encounter. Since the overall shapes of the correlations are almost linear in all of the three figures, we hypothesize that the conformational pathways between the occluded and the closed conformations are roughly linear connections between the two. Real vs. Main Chain (1) and 2.x"; Space Similar to the analysis in real space, we define the dihedral angle RMSD 0 (DARMSD) of the trajectories to the three conformations as 1 IV I C ‘7 9 = W E [(e. — a)? + (w.- — z/ )2} (6.5) in which (15, and w,- are main chain dihedral angles of a generated conformation and 96f and 2,525 are the corresponding dihedral angles in the closed conformation. The sum is over all residues of interest, which are residues in the M-20 loop in this case. Since there are 11 residues in the M-20 loop, the conformations are best described in a 22 dimensional space which is spanned by the 22 main chain dihedral angles. Similar to the analysis in real space, we define three reference points which are the closed, the occluded and the open conformations respectively. Any conformation is then expressed in a smaller three dimensional space in which the RMSD to the three reference points are its coordinates. Figure 6.17 shows the correlations of DARMSD of the six trajectories to the occluded and to the closed conformations. Surprisingly, the DARMSD of the six trajectories to the closed conformation are always bigger than 50°. Some trajectories do not show any inclination to smaller DARMSD values at all. The DARMSD of the six trajectories to the closed conformation in real space are all below IDA when calculations are terminated, but the DARMSD of the same trajectories to the closed conformation in dihedral angle space oscillate around big values. 137 ......... l . . . . . . . . r Trajectory l + Trajectory 2 Trajectory 3 85 - Trajectory 4 OUT]!- Tréjectory 6 80 75 70 65 Dihedral Angle RMSD to Closed Conformation (°) 55 0 10 20 30 40 50 60 70 Dihedral Angle RMSD to Oocluded Conformation (°) Figure 6.17: Correlation of the RMSD of the six trajectories to the occluded and to the closed conformations. The DARMSD is defined in Equation 6.5. 138 To further investigate the difference between the RMSD in real space and the DARMSD in dihedral angle space, we analyzed the similarities between a generated conformation and the closed conformation. This generated conformation is one of the conformations which are. very near to the closed conformation in term of RMSD in real space. Table 6.1 lists the difference in coordinates of the main chain atoms of the l\I-20 loop between the two conformations. The RMSD between these two conformations is mere 1.057A in real space. The biggest distance between corresponding atoms in the two conformations is only 121613;. All these data prove that the generated conforn‘iation is almost identical to the closed conformation. Figure 6.18 shows the M-20 loop main chain atoms of both the generated and the closed conformations. Though not perfectly matched, the two conformations wind around each other like the two chains of the double helix. It is obvious that these two conformations resemble each other. Though these two conformations are virtually the same in real space, they are quite different in the main chain ¢ and I/J angle space. Table 6.2 lists the main chain (1') and 111 angles of the M-20 loop in the generated and in the closed conformations. The main chain dihedral angles are calculated by the software ViewerLite 5.0 [152] to eliminate the possibility that our program may be erroneous in calculating the main chain d) and w angles. The DARMSD between the two conformations is 569°. Differences in (15 and 1/1 angles are also listed in the table. The difference between some corresponding dihedral angles can be more than 100°. When the RMSD between two conformations is exactly zero in real space, the DARMSD between them must also be zero in dihedral angle space. Therefore, it is intuitive to assume that when the RMSD is small in real space the DARMSD must also be small in dihedral angle space. In contrast our calculations show that the RMSD in dihedral angle space can be very large even when it is small in real space. It is easy to prove that the case of small DARMSD in dihedral angle space 139 3.1: A}; A; R N -0305 1.112 -0073 1.155 1LE14 C0 -0550 1.077 0.132 1.216 C -0.163 0.298 0.123 0.361 N 0.016 0.344 -0248 0.424 GLv15 Co, 0.406 -0391 -0202 0.599 C 0.259 0.023 -0.600 0.654 N 0542 -O.683 0.308 0.925 .\IET16 Ca 0.288 -0304 0.111 0.433 C -0402 -0.164 0.169 0.466 N 0.457 -0.281 0.196 0.571 GLL’IT CO -0001 -0152 0.190 0.250 C -0852 0.139 0.762 1.151 N 0302 -0.736 0.025 0.796 ASl\'18 Ca -0.136 -0.632 0.371 0.745 C —0.450 -0377 0.056 0.590 N -0030 -0244 -0312 0.397 ALAIQ CO 0169 -0111 -0544 0.580 C 0.584 —0.081 -0498 0.772 N -0195 0.045 0.246 0.317 MET20 CO 0.390 0.190 0.425 0.607 C 0.277 0.104 0.076 0.305 N 0.458 0.357 —0.045 0.582 PR021 CO 0.375 0.277 -0130 0.484 C -0147 0.102 0.492 0.524 N -0015 0.643 -0.783 1.013 TRP22 CO -0403 0.590 -0427 0.832 C 0.241 0.223 -0141 0.357 N -0.708 0.421 -0152 0.838 AS.\"23 CO -0333 -0092 0.132 0.370 C -0414 -0053 0.349 0.544 N 0.052 -0372 -0545 0.662 LEU24 Ca 0.054 -0352 -0477 0.595 C -0237 0.000 -0701 0.740 Table 6.1: Difference in coordinates of main chain atoms in the M-20 loop between the generated and the closed conformations. The difference in coordinates in :r, y and z are listed, together with the distance R between corresponding atoms in the generated and in the closed conformations. 140 Figure 6.18: Superimposition of the M-20 loop of the generated and of the closed conformations. One conformation is colored red while the other is colored blue. The two conformations are almost identical. 141 Generated Closed 4(0) 410) 44°) 44°) 444°) 444°) ILE14 -79.8 -46.1 -111.7 -13.4 -31.9 32.7 GLY15 179.1 133.7 —162.0 161.4 18.9 27.7 .\IET16 -54.0 167.6 -147.4 109.8 ~93.4 -57.8 GLU17 29.3 -58.9 50.6 59.6 21.3 118.5 ASN18 163.0 24.0 53.9 31.8 -109.9 7.8 ALAIQ -153.7 105.9 -148.3 154.7 5.4 48.8 MET20 -54.1 127.3 -99.9 123.5 -45.8 -3.8 PRO21 -79.7 57.6 -66.8 —46.5 12.9 -104.1 TRP22 —120.1 -145.9 -65.6 178.0 54.5 -37.1 ASN23 165.5 118.1 -134.8 89.2 59.7 -28.9 LEU24 176.2 92.2 -117.4 78.2 66.4 -14.0 Table 6.2: Main chain (25 and 1,0 angles of residue 14 to 24 in the generated and in the closed conformations. Angles larger than or close to 100.0° are marked by bold font. and large Rh‘ISD in real space is also possible. Suppose one dihedral angle in a chain molecule is rotated by 180°. The RMSD between the resulting and the initial conformations is large because the flip of one dihedral angle can drag part of the molecule far away from its original coordinates. The DARMSD between the two conformations is small because only one dihedral angle is altered. The lack of correlations between the RMSD in real space and the DARMSD in dihedral angle space has been seen before. In their study of the variations of main chain (15 and d) angles in different conformations of proteins, Kern and Rose [153] discover that the main chain conformation of a protein is not deformed when the main chain dihedral angles are rotated compensatorily. The effect of a big change in the main chain 212 angle in one residue may be offset either by the rotation of the main chain (0 angle in the next residue by the same amount in the opposite direction, or by small rotations of several main chain dihedral angles in nearby residues. Vieille [154] observes the same phenomenon in her analysis of the conformational fluctuations in a MD simulation trajectory. The counter-intuitive fact that the RMSD in real space is not correlated with the 142 DARMSD in dihedral angle space disqualifies the usage of the main chain dihedral angles in the analysis of the similarities between conformations. Whether two confor- mations are alike should be investigated in real space. On the other hand, the analysis of the variations of the main chain 05 and 2,0 angles can be used to pinpoint the exact positions of the conformational changes. Whether to inspect the deviations in real space or in dihedral angle space therefore depends on the questions to be answered. Pathways from the Occluded and the Closed Conformations to the Open Conformation The six conformational trajectories between the occluded and the closed confor- mations do not pass the open conformation. It is not clear whether it is because the trajectories do not need to pass by the open conformation, or because the constraints in the occluded and the closed conformations inhibit the sampling of the open con- formation. To clarify this point, two more conformational trajectories are sampled by the ROCK. One trajectory starts from the occluded conformation targeting at the open conformation. The other one starts from the closed conformation targeting at the open conformation. A trajectory between the closed and the Open conformations is successfully ex- plored. The RMSD between generated conformations and the open conformation drops to around 1.014 after ROCK samples about 2000 conformations. A trajectory between the occluded and the open conformations is also found. The best RMSD between generated conformations and the open conformation is around 1.214.. The closed conformation is not on the trajectory between the occluded and the open con- formations. The occluded conformation is not on the trajectory between the closed and the open conformations too. To summarize, ROCK successfully generates direct pathways between any two of the three conformations _ the closed, the occluded and the open conformations. 143 The pathways between any two conformations do not pass the third one. Therefore it is possible that the three conformations form a triangle in the high dimensional conformational space. Direct pathways between any two conformations can be built, without the necessity of passing through the third. However our calculations are not conclusive. The trajectories are sensitive to the constraints. The addition or removal of one or more constraints could change the characteristics of the trajectories. The removal of constraints enables a protein to sample larger conformational space. The addition of constraints compresses the conformational space of a protein. For example an addition of one or few crucial constraints may disqualify the six conformational trajectories between the occluded and the closed conformations. The ecDHF R may have to go through the open confor- mation along its pathways between the occluded and the closed conformations in this case. The influences of the constraints on the properties of protein motion trajectories deserve further analysis. 144 Chapter 7: Summary and Perspectives 7. 1 Summary This thesis covers two sub-topics of modeling non-crystalline networks: the mod- eling of discontinuous networks and the sampling of conformations of non-crystalline networks. We propose and test two algorithms on these two topics. The first algo- rithm is dedicated on the rearrangement of dangling bonds in network models so that 1) all atoms in the resulting networks are all fully coordinated and 2) the bond length and angle distortions are within acceptable limits. The second algorithm samples conformations of non-crystalline networks which are composed of complicated ring clusters. Proteins are such networks. The building of the DCRN models starts from crystalline networks. Voids are cut from the networks, resulting in dangling bonds at the surface of the voids. The defects involving dangling bonds at three-coordinated atoms are transferred to their nearest neighboring atoms at each step of the defect migration process. The defects are then removed when they come together. The defect migration process creates fully coordinated networks. The networks are randomized to be amorphous by the WWW technique. Since the bond switch process is local in the WWW technique, the voids in the initial networks are roughly unchanged before and after the WWW process. The resulting network is random, but discontinuous in the sense that it can have built in voids of any shapes and sizes. The insertion of oxygen between all silicon-silicon bonds makes amorphous silica models. This algorithm is demonstrated to build the amorphous fiber silica and the amorphous film silica models. The distortions of bond lengths and angles are reasonable in both models. The variation of distortions in atom layers suggests that the surface effects diminish within five layers to the surfaces. Since the overall shapes of the amorphous fiber silica and film silica models are confined in one or two dimensions, the PDF of these two models are not directly comparable with that of the CRN silica models. The effects of the overall shape on the PDF can be eliminated by dividing the RDF of the models over that of the continuous and uniform media of the same shape. The obtained RDDF of both the amorphous fiber silica and film silica models are quantitatively in good agreement with that of CRN silica models, except a small shoulder at the second peak which is caused by distortions at the surfaces. The procedure to build the DCRN models can be used to build any glassy network models. Moreover, defects and hydrogen can be introduced into a fully coordinated random network by reversing the defect migration process used to build the fully coordinated networks. Hydrogenated networks can be built with any concentration and distribution of hydrogen. If desired, models produced by this algorithm can be further optimized by other techniques, for example by MD simulations with empirical potentials or by ab initio algorithms. The refined models are then suitable for the studies of the electron states of defects, of the photo luminescent characteristics of voids, and of the electron distributions around hydrogen atoms et al. The amorphous metal-adamantane network is modeled starting from a CRN gallium arsenide model. Gallium and arsenic atoms are replaced by the metal atoms and the adamantane units respectively. Since the amorphous gallium arsenide model does not have odd-numbered rings, each metal atom is bonded to four adamantane units and each adamantane unit is bonded to four metal atoms. The calculated X-ray powder diffraction pattern fits that measured in the synchrotron experiment. There are only covalent bonds in the DCRN models. The concentration of the 146 covalent bonds in DCRN networks is so high that DCRN models we examined are all rigid. The concentration of bonds in another type of non-crystalline network, namely the protein, is not high enough to restrict the relative positions of atoms, so some regions of proteins have multiple conformations. Proteins have evolved in such a way that some regions of the proteins are rigid while the other regions are flexible. Constraints in the rigid regions stabilize the proteins and define a template for interacting with other molecules. The flexible regions of proteins can carry out biological functions. Since distortions of bond lengths and angles are energetically unfavorable, it is preferred to sample the conformations without disturbing the bond lengths and angles. This task is not easy for proteins, because there are lots of rings in proteins which are composed of covalent and hydrogen bonds. The concentration of the bonds is high, so virtually all these rings are inter-locked with each other. The bonds linking the rings bring extra constraints on how rings are relatively positioned against each other. It is therefore vital to build an algorithm that closes all the rings simultaneously. Inspired by the RCE proposed by Go and Scheraga [73], we define a fictitious ring closure potential which is the sum of the squares of the RCE. A set of dihedral angles is the solution to RCE if and only if it makes the fictitious ring closure potential to be zero. Since the fictitious potential is non-negative everywhere, the potential is at one of its minima whenever its value is zero. Therefore by minimizing the fictitious potential, our algorithm is able to numerically solve the RCE. This method can close any rings, regardless of how the non-rotatable and rotatable dihedral angles are mixed. The main advantage of this method is that by minimizing the total fictitious potential of all the rings in a complicated ring cluster, it is able to close all the rings simultaneously. All the bond constraints are concurrently satisfied. The ability to close all rings and to generate stereo-chemically correct structures makes our algorithm efficient in sampling conformations for proteins, which are composed 147 of numerous inter-locked rings. Our method of solving the RCE is tested on a model molecule, which is made up of four inter-connected rings. The molecule has only two DOF, so its conformations can be easily projected on a 2D graph. The distributions of the conformations gener- ated by our algorithm show two symmetries which are identical to the two symmetries of the topology of the molecule. This fact supports the claim that, given enough cal- culation time, our algorithm can sample all conformations of a small macromolecule, if these conformations are continuous in the conformational space. While all the dihe- dral angles in a large macromolecule cannot be sampled exhaustively, the combination of ROCK and FIRST makes ROCK sample the important DOF in the system. The application of flexibility analysis [88] to proteins reduces the calculation cost by differentiating the flexible regions from the rigid regions in proteins. The rigid re- gions of proteins have negative or zero DOF. They may have multiple conformations, but these conformations are well separated in the conformational space, so that it is reasonable to assume these rigid regions do not sample more than one conformation under usual conditions. The program ROCK samples conformations only for the flex— ible ring clusters which are identified from the flexibility analysis. It saves calculation time by avoiding sampling conformations for the rigid regions of proteins. ROCK first perturbs the rotatable dihedral angles in the flexible regions of pro- teins by modest rotations which are typically within 5° to 10°. It then closes all the rings by minimizing the total fictitious ring closure potential. It randomly samples side branch conformations as well. It has three minimal requirements on side branch conformations: 1) the bond lengths and angles should be undistorted from their orig- inal values; 2) there should be no van der Waals overlaps between non-bonded atoms; and 3) the chirality at all chiral centers should not be inverted. In the case when a bond in the side branch is not rotatable, a distance constraint between the two nearest neighboring atoms of the two ends of the bond is imposed, so that the bond 148 is effectively locked. ROCK also checks the quality of the generated protein conformations on the main chain Ramachandran plot. It rejects those conformations whose main chain (,0 and 17' angles are not in the preferred regions in the plot (see Appendix B). The distributions of the dihedral angles in the side chains have certain patterns [155]. An additional check on the quality of side chain conformations against the dihedral angle distributions in the rotamer library [156, 157] could be added to the program. However the additional check would bring an additional calculational cost to our algorithm. The side chains often adopt to non-rotameric dihedral angles. Moreover, at least in the examples of HIV-1 protease and DHF R, the main interest of sampling the main chain conformations does not require elegant algorithms on the side chain conformation sampling, if there is a feasible set of side chain conformations for the given main chain conformation. By keeping only the bond length and angle constraints, ROCK samples the conformations that are consistent with the constraints. ROCK has been applied on two proteins, the HIV-1 protease and DHF R. Our algorithm did a very good job on sampling the conformations of the HIV-1 protease. The distances between the tips of the flaps can be as large as 8.0A in some confor- mations. This is indeed a big distance considering the fact that such distance is only 2.7A in the crystal structure of the protease which is used as the initial conformation. Our calculation shows that a curling motion is not necessary for the two flaps to open. A detailed analysis of the bond constraints in the tips of the flaps suggests that such curling motion may not exist at all. Because the hydrogen bond between the main chain hydrogen atom of the residue GLY52 and the main chain oxygen atom of residue GLY49 encloses the main chain 05 and 0) bonds of residue GLY51 into a ten-fold ring which has only one DOF, the distribution of the main chain 03 and 1,0 angles of GLY51 is limited in a very narrow region in the Ramachandran plot. One explanation is that this hydrogen bond is not dynamically stable, though it has a 149 favorable energy in the crystal structure. However, it is safe to say that any curling motion is independent of the slow flap motions. ROCK has been used to generate six conformational trajectories between the occluded and the closed conformations of the protein ecDHF R, one trajectory between the closed and the open conformations, and one trajectory between the occluded and the open conformations. Based on the prOperties of these conformational trajectories, we hypothesize that the open, closed and occluded conformations of ecDHFR form a triangle in the conformational space. The direct conformational pathways between any two conformations do not necessarily pass the third one. The six conformational trajectories between the occluded and the closed conformations are very similar to each other. We also analyzed the similarities between the generated conformations and the closed conformation. The RMSD in real space between the generated conformations and the target conformation can be as low as 1.0A. The DARMSD in dihedral angle space between the same conformations and the closed conformation are always large. This analysis yields the surprising fact that there are virtually no correlations between the RMSD in real space and the DARMSD in dihedral angle space. Two conforma- tions that are close in terms of RMSD in real space may have little similarity in their dihedral angles. Real space RMSD values of two conformations that are similar in their dihedral angles may be large. Therefore, caution should be taken in choosing how to analyze protein trajectories. Conformations should be compared in real space to answer the question of whether they are similar. The analysis in dihedral angle space is good at locating the dihedral angles that have big variations and result in conformational changes. 150 7 .2 Limitations Both the algorithms to build the DCRN models and to sample protein confor- mations are somewhat limited in their applications. The former algorithm is suitable for modeling amorphous networks made of strong covalent bonds. The atoms in the network models are limited to the group IV semiconductor elements of silicon and germanium, and to the group VI elements of oxygen, sulfur and selenium. The four strong covalent bonds of group IV elements usually form a tetrahedron. Covalent bonds of group VI elements favor certain geometries too. The bonds of all elements mentioned above can be described simply by geometries of bond lengths and angles. It is for this reason that our algorithm to build the DCRN models and the WWW technique to build the CRN models are able to predict the characteristics of these net- works based on simple geometry considerations. Both our algorithm and the WWW technique do not apply when the covalency of elements is weak. Two other group IV elements, tin and lead, bond with their nearest neighbors more like metal than cova- lent elements. Our procedure is not appropriate in building glassy networks for these two elements. Valency electrons in the metallic atoms flow freely around, causing the bonds around metallic atoms to lack fixed bond angle geometries. Therefore, our algorithm is not suitable for the building metallic glassy networks or semi-conductor glassy networks with metal atoms. Proteins are mainly built of carbon, nitrogen, oxygen and hydrogen. All these atoms are bonded together by covalent bonds and by hydrogen bonds with certain optimal geometries of bond lengths and angles. Our algorithm to sample protein conformations is also based on these simple geometry considerations. Care should be taken when sampling conformations of proteins with buried metallic atoms to model their bonds correctly. All of the limitations in both algorithms are caused by the lack of consideration of quantum effects and electronic states. In general, our algorithms cannot be used to 151 study reactions related to the re-distribution of electrons. The initial and final states of the hopping of a hydrogen atom in the amorphous silica network can be simulated in our algorithm, but the detailed bond breaking and reforming mechanism cannot be. Though our program ROCK can sample conformations of ecDHF R, it does not study such questions as how a hydrogen atom is transferred from NADPH to the DHF in ecDHFR. These types of questions can be addressed by QM / MM hybrid simulations. 7 .3 Applications and Perspectives Non-crystalline networks have rich physics due to the lack of long range or- der. Local atomic arrangements in non-crystalline networks are roughly identical to those in crystalline networks. For example, the bond angles at silicon atoms in non- crystalline networks are all roughly 109°, which is the optimal silicon bond angle in crystalline networks. But non-crystalline networks do not have any long range order as crystalline networks do. The absence of long range order brings two difficulties in simulations: the complicated energy landscape and the large number of atoms included in the simulation. The number of atoms involved in ab initio calculations is usually between 10 and 100. Due to the long range translational and rotational symmetries, electronic properties of roughly 10 or more atoms in a small supercell in the crystalline network are identical to those of all of the other atoms in the whole network. Therefore, the simulations in crystalline networks can be very accurate through the use of ab initio algorithms. Simulations in non-crystalline networks, however, do not have such an advantage. Because of the dearth of long range order, an infinite number of atoms, in principle, should be included in the simulation. The supercell of an amorphous network typically has thousands or even millions of atoms. The supercell of MD simulations on proteins with explicit water contains one large protein surrounded 152 by tens of thousands of water molecules. In all of these cases, the precise quantum calculations have to be confined to a small region of the network, for example a dozen or fewer atoms on the surface of amorphous silica, or ten atoms or so in the catalytic sites of proteins. The large scale structures and the long range motions of non-crystalline networks have to be modeled by empirical algorithms. The local distortions in bond lengths and angles lead to complicated energy landscapes. Since atoms in one supercell are equivalent to the atoms in all the other supercells in a crystalline network, the energy levels of the one supercell are degenerate with those of the other supercells. The energy landscapes of crystalline networks are clean and easy to interpret. The degeneracy is lost when bonds are distorted. The consequence of the distortions is a very complicated energy landscape with numerous local energy minima. The energy barriers between these local energy minima may vary a lot. The energy barriers of the amorphous networks, for example, are high enough so that the networks are trapped in local minima even under high temperatures. An amorphous silicon network is in one local minimum in the whole energy landscape of silicon networks, because its potential is higher than that of a crystalline silicon network. The energy barriers of proteins, on the other hand, are low enough so that proteins can transform from one conformation to another fairly freely under room temperature. Our model molecule described in Section 6.1 exemplifies how complicated the energy landscape can be even for small molecules. It has more than 230 local minima, some of which are shown in Figure 6.2. The energy landscape of this simple molecule containing only 36 atoms is already complicated, not to mention a large protein or a large amorphous fiber silica network. The protein energy landscapes have been studied by various techniques [158, 159, 160, 161]. The funnel picture [162, 163, 164, 165] has been accepted by many scientists. It states that a protein has one global minimum conformation, which is the folded state, and many other local minimum conformations, which are the intermediate or the unfolded states. The whole. landscape is very rough, with numerous local minima. The sampling of all local energy minima is not possible for large proteins. The sampling of all minima for small molecules and for large proteins in the vicinity around a. given conformation are computationally feasible. Our program ROCK cannot be utilized directly to sample the local minima in protein energy landscapes, because the effective force field used in ROCK is not the same as the usual ones used in MD simulations. The force field used in ROCK is a sum of a series of infinitely high step functions. Any violations of the bond length and angle constraints or any van der Waals overlaps push the effective potential to be infinitely high. The dihedral angles are freely rotatable, as long as the rotations do not result in van der Waals overlaps and the break of rings. The force fields used in standard MD simulations are much more complicated. Potentials of bond length and angle variations are high but finite. Many weak interactions, such as the dihedral angle rotation interaction and the elec- trostatic interaction, are included in the force fields used in standard MD simulations. The advantage of ROCK is that by ignoring the details of energy variations caused by the weak interactions, it is capable of easily jumping over small energy barriers that require calculation cost in other algorithms utilizing more complex force fields. The coupling of ROCK and commonly used force fields enables a fast sampling of conformations without losing details of energy landscape. Such an algorithm can be a good tool in scanning the maps of local minima of the protein energy landscape. The sampling of local minima of the model molecule in Section 6.1 is such an ex- ample. There are also methods of predicting saddle points [166, 167, 168] around local minima in complicated energy landscapes. Once all local minima and saddle points are known, statistical quantities such as transition rate, relative populations and conformational pathways are easy to obtain. The goal of protein docking is to filter tens of thousands of potential ligands 154 (drugs) for their ability to match the binding sites of proteins. In principle, both the flexibility of ligands and that of the proteins should be taken into account in docking studies. In many studies. however, only the flexibility of ligands is considered, for example. the methods used in DOCK [169] and in a modified version of DOCK [170], and the more complicated approach by Given et al. [171]. Or, in some cases, only the side chain flexibility of the proteins is included in the calculations [172, 173]. It is only recently that the flexibility of main chains of proteins catches the attention in the protein docking studies [175]. Since ROCK is efficient at sampling conformations for ring clusters, it is a perfect tool to build conformational libraries in which protein main chain and side chain conformations are stored. Zavodszky et al. [176, 177] report improved docking when flexible drugs are docked to the ensemble of protein conformations created by ROCK. Flexibility of the ligands is handled by the tool, SLIDE [178, 179]. It is interesting that the length scale of both the pores in the mesoscopic porous network models and proteins are in the order of nanometers. Physics on the length scale of angstroms requires quantum mechanics to interpret. Physics on larger length scales such as micrometers and millimeters can be safely described by classical me- chanics of continuous media. It is in the length scales of nanometers to micrometers that classical mechanics based on single atoms is the suitable tool, on the condition that the interactions involved are not subject to large fluctuations. Covalent bonds are the stable interactions in amorphous material. We include covalent bonds, strong hydrogen bonds and hydrophobic interactions as stable interactions in our sampling of protein conformations. The algorithm to build DCRN models, the algorithm of ROCK, and MD simulations, are dedicated trials to build a set of geometrical lan- guage that is precise and fast to describe the physics in the nanometer length scale and up. All these efforts will facilitate future studies and simulations of larger and more complicated systems, such as protein-protein interactions, viruses, semiconductors in the nanometer range and the niicro-machines. APPENDIX 157 Appendix A: Radial Distribution Function of Uniform Media RDF of infinite and uniform media is T(r) = 477r2p0 in which p0 is the average density. The RDF is proportional to r2 in any distance range in the infinite media. The RDF of uniform media distributed in the shape of a film and in the shape of a fiber do not follow this square law due to the anisotropic mass distribution. Suppose uniform mass is distributed in an infinitely wide and flat film of thickness d in space. Let. 2 = 0 plane is the center of the film and let 3 = d/2 and z = —d/2 be the top and bottom surfaces of the film. Suppose a point (0,0, 20)T which is on the z axis is the observation point. Due to the mirror symmetry of mass distribution about the z = 0 plane, the RDF observed from a point whose z coordinate is positive is the same as that observed from its mirror point about the z = 0 plane. Therefore we limit our considerations to points above the z = 0 plane. Since the observation point should be inside of the mass distribution we have 20 g d/ 2. RDF at distance 7' observed from this point is proportional to the surface area of a sphere of radius 7‘ that is buried in the film. When the distance r is small, as shown in Figure A.1(a), complete surface of a sphere that is centered at the observation point is within the top and the bottom surfaces of the film. In this case, the RDF T i observed from the particular point at (0, 0, 20) s T(r; zo) = 47772720 (A.1) which is valid in the range of 7" E [0, d/2 — 2:0). When the radius r of the sphere increases beyond the point of r = d/2 — zo, the sphere crosses with the top surface of the film, as shown in Figure A.1(b). The 158 surface area of the sphere that is still buried inside of the film is less than 4777’). Simple calculation shows that the RDF observed from the point of (0. 0, 20)T is T(/'; :0) = (27775“) + 7rrlr - 277207')p0 (A2) which is valid when I‘ is in the range of r E [(1/2 — :0, (1/2 + .20). When the radius r of the sphere is larger than (1/2 + 20, the sphere crosses with both the top and the bottom surfaces of the film, as shown in Figure A.1(c). The RDF in the large distance range is T(‘ :30) : 277(17'p0 (A.3) which is valid when r E [(1/2 + 20. +00). Therefore the RDF observed from one particular point of (0, 0, 2:0)T is: 47772;)0 0 S 1‘ < % — :0 T(7‘§ 30) = (2777'2 + 7rd]: — 277:07‘)p0 % — zo S r < g+ 20 (A4) 277(1‘rp0 % + 20 g r RDF of the whole mass distribution is the averaged RDF observed at every point in the media. Because of the translational symmetry in the :c and y directions, any point whose 2 component is :0 observes the same RDF as shown in Equation A.4. Therefore the average is only necessary along the z axis where :0 ranges from 0 to (1/ 2. When 0 S 7‘ < (1/2, the first case in Equation A.4 is valid when 0 S 2.0 < d/2 - 7', while the second case in the equation is valid when d/ 2 — r g 20 < d/ 2. There is no 20 in the full range of 0 S 20 S (1/2 satisfies the third condition in Equation A.4. The (a) j z=d/2 @ ------------ - -----—-—----- z: z=—d/2 (b) A z=d/2 z: z=—d/2 z=d/2 z: z=—d/2 Figure A.1: A sphere of radius T. It can be totally inside of a continuous media of the shape of infinitely wide film (Figure (a)), or it may cross with one surface of the film (Figure (b)), or it may cross with both surfaces of the film (Figure (c)). 160 RDF of the film media is hence F d d 1 5"" 2 T(7‘) = E / T(-r; :0)(1:0 +/ T(r; 20)(le 5 _ 0 g—r 1 P i” . :’ . : U / 4777‘zporlzo + / (2777‘2 + 7rrl-r — 277207‘)p0d20 7; _ 0 g_r . ‘7' = 477r2(1 — a)“, (A.5) which is valid when r E [0, (1/2). When d/ 2 S r < d, the first case in Equation A.4 is invalid for all possible values of 20 in the range of [0, (1/2]. The second case is valid in the range ofr—d/2 S zo S d/2 while the third case is valid in the range of 0 S 2.0 < r —— d/2. An average of T(r; 20) over all possible :0 values produce T(r) = 477r2(1 — £8“) (A.6) which is valid when r E [d/ 2, d). The RDF of the whole media in this distance range happens to be in the same form as that in the range of r E [0, (U2). Similar analysis produces T(r) = 27rdrp0 (A.7) when r > d. Therefore the RDF of an uniform mass distribution in a film of thickness d is 47rr2(1— filpo 0 S r < d T(7') : (A8) 277d'rp0 7‘ > d The RDF of the film mass distribution is proportional to 7 instead of r2 in the large distance. Similar to the RDF of infinite uniform distribution, the RDF of mass distribution in the shape of a film is proportional to 4777‘2 in the small distance range 161 where r ——> 0. The RDF of the mass distribution in a form of a fiber can be calculated by the same technique, though the derivation is much more complicated due to the complicated integration of the spherical surface area that is buried inside. ofa cylinder. The final result is ‘ r2 r ( 7'2 ' r 47fl'2/)0[1— f—rfirt(1+lrl3)E(2—d)+:71r(1_ m)1\(:a)] T<2d T(r) = 4777'2/)0[ — £0 + {(1:._.)E‘(sin-1 (27"), 2%,) -l- .]:‘:;(1 — :_(:2)A(Z’t_1)] r > 2d (4.9) in which (1 is the radius of the fiber and functions E and K are elliptic integrations defined as <25 E(g$, k) =/ \/1 — k72811120(19 0 11'(k) = [2 do , (4.10) 0 \/1— k2si1120 The RDF of mass distributions of an infinitely long fiber is proportional to 4771"2 in the short distance range, and approaches to a constant in the long distance range. Figure A.2 shows the RDF of mass distribution in infinite media, in a film of thickness d and in a fiber of radius d. The difference in the three RDF in the long distance range is obvious. 162 30 p VVVVVVVVV I VVVVVVVVV I I 1 1' I V 7 fr 7" VVVVVVVVV I ''''''''' 1 h d b 1 V 1 b 1 F 1 y: d t 1 25 ~ 4 D . 1 : 1 b d b I P I P d L _ , , < 20 _- Infinite Media 1 » Film ———-— 4 b . d . Fiber 4 b '4 b 4 u. I 1 d o 15 ’- ~ CE I I D d D 1 d I d D Q 10 ’- -‘ p d b d ‘ D d b 1 F d t . b d 5 b— - L 1 >- 1 b 4 > 4 r 4 r- -l b 4 P d 0 > A J L l AAAAAAAAA l AAAAAAAAA l AAAAAAAA L l AAAAAAAAA T 1 2 3 4 r/d Figure A.2: RDF of uniform mass distribution in infinite (red curve), in a film thickness d (blue curve) and in a fiber of radius (1 (green curve). 163 of Appendix B: Ramachandran Plot B.1 Ramachandran Plot of Residues Other Than Glycine and Proline As discussed in Chapter 4, proteins are made up of 20 amino acid residues. As shown in Figure 4.4, the main chain 05 and the 1/2 dihedral angles are freely rotatable. Certain values of these two dihedral angles would bring the side chain groups in close contact with the main chain atoms, resulting in high van der Waals potential energies, which are unfavorable. Ramachandran et al. [110] systematically examined the intra— and inter-residue van der Waals contacts at all possible combinations of the main chain 05 and w dihedral angles. Their study led to the construction of the Ramachandran plot, which shows the favorable combinations of the 05 and 21) angles that do not produce van der Waals collisions. The Ramachandran plot was later refined to reflect the statistical preferences for (15 and 11) dihedral angle values observed in high resolution crystal structures of proteins [125, 180, 181]. These distributions are used to assess the quality of experi- mentally determined or predicted protein structures. Except for glycine and proline, the other 18 standard amino acid residues share a similar distribution of favored 05 and 112 angles in the Ramachandran plot, shown in Figure B.1. The values of the (15 and w angles are most likely to be in one of the three large regions of the plot. The (15 and 1/2 angles of right handed a helices, of left handed a helix’s, and of 13 sheets are in the regions whose centers are at (05 = —100°, 11) = -30°), (¢ = 60°, 212 = 50°) and (0’) 2 —120°, 0) = 150°) respectively. The plot by Morris et al. [125] is widely used in protein structure validation. Based on the experimentally observed statistical preferences, the boundaries of these regions can be set at various cut off values, depending on what percentage of the angles 164 Ramachandran Plot (Excluding Gly and Pro) 180 :zjzl 120 60 -1 89180 —120 -60 Figure B.1: Reproduction of the Ramachandran plot by Morris et al. [125]. Red, yellow and brown blocks represent the core, allowed and generously allowed regions, respectively. White regions represent the disallowed regions. A residue has van der Waals collisions with its adjacent residues if its db and 1,11 angles are in the white regions in the plot. 165 fall within the given boundaries. Over 90% of the residues in the protein structures fall in the core regions. Most of the remaining 10% of the residues are in the allowed regions. Morris et al. designate the regions that are within 20° of the allowed regions as generously allowed regions. The percentage of the residues in the generously allowed regions is at most 1% or 2% in well-resolved protein structures. The remaining areas of the plot are the disallowed regions. Figure BI is a reproduction of the original data by Morris et al. [125]. The core regions, the allowed regions, and the generously allowed regions are represented by red, yellow and brown blocks in the figure. Our program ROCK utilizes this Ramachandran plot to check the stereo-chemical quality of residues other than glycine and proline in generated conformations, and can be set to discard those conformations that have (I) and 0) angle values in the generously allowed and / or disallowed regions. B.2 Ramachandran Plot of Glycine Unlike the other standard amino acid residues, glycine has only one side chain atom, which is hydrogen, as shown in Figure B.2(a). All other residues contain larger side chains. Because the van der Waals radius of a hydrogen atom is small, the d and w angles of a glycine residue can be in a large range without introducing intra— or inter-residue van der Waals contacts. The Ramachandran plot for glycine is therefore qualitatively different from the plot of the majority of residues which is shown in Figure 8.1. The Ramachandran plot containing the distribution of d) and w angles of glycine residues in high resolution crystal structures [182] was provided to Dr. Zavodszky by Dr. Laskowski (University College London) . The data show the numbers of occurrences of glycine main chain (0 and w angles in protein structures in 8° x 8° blocks. The number of occurrences in every block is first converted into the frequency of 166 (a) side chain Figure B2: The glycine residue shown in Figure (a) has only one hydrogen atom in its side chain. The side chain of proline shown in Figure (1)) links with the main chain to form a five-fold ring. Carbon, nitrogen, oxygen and hydrogen atoms are represented by green, blue, red and white spheres respectively. The side chain atoms in both residues are enclosed in circles. occurrence by dividing the number of occurrence by the total number of occurrences of d and w angles in the whole 360° x 360° range. Starting from the block with the highest frequency of occurrence, we mark the blocks in order of their decreasing occurrence until 90% of occurrences are accounted for. These blocks form the core regions in the glycine Ramachandran plot. All the other blocks with a non-zero frequency of occurrence are labeled as allowed regions of the glycine Ramachandran plot. The generously allowed regions were defined following the same procedure as in the case of the non-glycine and non-proline residues. The final glycine Ramachandran plot is shown in Figure B.3. Our glycine Ramachandran plot is in qualitative agreement with the plot created by Lovell et al. [181]. B.3 Ramachandran Plot of Proline Proline is a unique amino acid. Unlike all the other residues whose side chains are bonded to their main chains through one bond, proline has two covalent bonds between its main chain and side chain forming a five—fold ring, as shown in Figure B.2. Glycine Ramachandran Plot '18-0180 -120 -60 0 60 120 180 Figure B.3: The glycine Ramachandran plot shown in 8° x 8° resolution. The red, yellow and brown blocks are the core, allowed and generously allowed regions. The white regions are disallowed regions. 168 As discussed in Chapter 4, a five-fold ring is rigid, so the main chain 6) angle which is in the five-fold ring of proline is not rotatable. The (f) angles of proline residues in all high resolution protein structures are therefore restricted within a narrow range. Following the same procedure by which we created the glycine Ramachandran plot, as explained in Section 8.2, we generated the proline Ramachandran plot from the data generously provided to us by Dr. Laskowski. The proline Ramachandran plot is shown in Figure 8.4. Our proline Ramachandran plot is in agreement with the one shown in the paper by Lovell et al. [181]. 169 Proline Ramachandran Plot 180 120 60 -120 '1891 80 -120 -60 0 60 120 180 Figure 8.4: The proline Ramachandran plot shown in 8° x 8° resolution. The red, yellow and brown blocks are the core, allowed and generously allowed regions. The white regions are disallowed regions. 170 BIBLIOGRAPHY [l] .\I. F. Atiyah, Geometry and physics: A marriage made in heaven, lecture addressed to Department of Physics at University of Michigan, 2001, video and slides of this lecture are collected by the Web Lecture Archive Project. They are available for free download at http://www.wlap.org. [2] F. Graner, Two-dimensional fluid foams at equilibrium, in Morphology of Condensed Matter: Physics and Geometry of Spatially Complex systems, edited by K. R. Mecke and D. Stoyan, volume 600 of Lecture Notes in Physics, pages 187~211, Springer-Verlag, Heidelberg, 2002. [3] C. H. Arns, M. A. Knackstedt, and K. R. Mecke, Characterising the morphol- ogy of disordered materials, in Morphology of Condensed Matter: Physics and Geometry of Spatially Complex systems, edited by K. R. Mecke and D. Stoyan, volume 600 of Lecture Notes in Physics, pages 37—74, Springer-Verlag, Heidel- berg, 2002. [4] M. L5sche and P. Kriiger, Morphology of langmuir monolayer phases, in Mor- phology of Condensed Matter: Physics and Geometry of Spatially Complex sys- tems, edited by K. R. Mecke and D. Stoyan, volume 600 of Lecture Notes in Physics, pages 37—74, Springer-Verlag, Heidelberg, 2002. [5] P. H. Gaskell, Structure and Properties of Glasses — How Far Do We Need To Go .9, Journal of Non-Crystalline Solids 222, 1 (1997). [6] W. H. Zachariasen, Atomic Arrangement in Glass, Journal of the American Chemical Society 54, 3841 (1932). [7] R. J. Bell and P. Dean, Properties of Vitreous Silica: Analysis of Random Network Models, Nature 212, 1354 (1966). [8] R. J. Bell and P. Dean, The Structure of Vitreous Silica: Analysis of Random Network Theori, Philosophical Magzine 25, 1381 (1972). [9] D. E. Polk, Structural Model for Amorphous Silicon and Germanium, Journal of Non-Crystalline Solids 5, 365 (1971). [10] D. E. Polk and D. S. Boudreaux, Tetrahedrally Coordinated Random-Network Structure, Physical Review Letters 31, 92 (1973). [11] D. Henderson, Random Tetrahedral Network with Periodic Boundary Condi- tions, Journal of Non-Crystalline Solids 16, 317 (1974). [12] W. Y. Ching, C. C. Lin, and L. Guttman, Structural Disorder and Electronic Properties of Amorphous Silicon, Physical Review B 16, 5488 (1977). [13] L. Guttman, Simulation of continuous random network models with periodic boundary conditions, in American Institute of Physics Conference Proceedings, pages 224—228, 1974. [14] L. Guttman, Vibrational spectra of four-coordinated random networks with periodic boundary conditions, in American Institute of Physics Conference Proceedings, volume 31, pages 268—272, 1975. [15] F. Wooten, K. Winer, and D. Weaire, Computer-Generation of Structural Mod- els of Amorphous Si and Ge, Physical Review Letters 54, 1392 (1985). [16] G. T. Barkema and N. Mousseau, High-quality Continuous Random Networks, Physical Review B 62, 4985 ( 1985). [17] R. L. C. Vink, G. T. Barkema, M. A. Stijnman, and R. H. Bisseling, Device-size Atomistic Models of Amorphous Silicon, Physical Review B 64, 5214 (2001). [18] F. H. Stillinger and T. A. Weber, Computer Similation of Local Order in Condensed Matter Phases of Silicon, Physical Review B 31, 5262 (1985). [19] R. Biswas and D. R. Hamann, Interatomic Potentials for Silicon Structure Energies, Physical Review Letters 55, 2001 (1985). [20] J. R. Chelikowsky, J. C. Phillips, M. Kama], and M. Strauss, Surface and Thermodynamic Interatomic Force Fields for Silicon Clusters and Bulk Phases, Physical Review Letters 62, 292 (1989). [21] J. Tersoff, New Empirical Approach for the Structure and Energy of Covalent Systems, Physical Review B 37, 6991 (1988). [22] M. I. Baskes, J. S. Nelson, and A. F. Wright, Semiempim'cal Modified Embedded- atom Potential for Silicon and Germanium, Physical Review B 40, 6085 (1989). [23] R. Car and M. Parrinello, Structural, Dynamical, and Electronic Properties of Amorphous Silicon: An ab initio Molecular Dynamics Study, Physical Review Letters 60, 204 (1988). [24] L. T. Canham, Silicon Quantum Wire Array Fabrication by Electrochemical and Chemical Dissolution of Wafers, Applied Physics Letters 57, 1046 (1990). [25] D. Buttard, D. Bellet, and G. Dolino, X-ray-diffraction Investigation of the Anodic Oxidation of Porous Silicon, Journal of Applied Physics 79, 8060 (1996). [26] V. Lehmann and U. Gésele, Porous Silicon Formation: A Quantum Wire Effect, Applied Physics Letters 58, 856 (1991). [27] A. G. Cullis, L. T. Canham, and P. D. J. Calcott, The Structural and Lu- minescence Properties of Porous silicon, Journal of Applied Physics 82, 909 (1997) 172 [30] [31] [3‘2] [33] [34] [35] [36] [37] [38] [39] [’40] V. Chin, B. E. Collins, .\1. J. Sailor, and S. Bhatia, Compatibility of Primary Hepatocytes with Oxidized Nanoporous Silicon, Advanced Materials 13, 1877 (2001) D. Zhao et al., Continuous Mesoporous Silica Filmns with Highly Ordered Large Pore Structures, Advanced Materials 10, 1380 (1998). C. McDonagh, B. D. MacCraith, and A. K. McEvoy, Tailoring of Sol-gel Films for Optical Sensoring of Oxygen in Gas and Aqueous Phase, Analytical Chem- istry 70, 45 (1998). Y. Cohen et al., Spin-on Nanostructured Silicon-Silica Film Displaying Room- Temperature Nanosecond Lifetime Photoluminescence, Advanced Materials 15, 572 (2003). O. Dag et al., Photoluminescent Silicon Clusters in Oriented Hexagonal Meso- porous Silica Film, Advanced Materials 11, 474 (1999). S. Yoshida, T. Hanada, S. Tanabe, and N. Soga, Blue Photoluminescence from Si-doped Amorphous Silica Films by RF Sputtering, Japanese Journal of Applied Physics Part I - Regular Papers, Short Notes and Review Papers 35, 2694 (1996). T. Morioka et al., Study of the Structure of Silica Film by Infrared Spectroscopy and Electron Diffraction Analyses, Monthly Notices of the Royal Astronomical Society 299, 78 (1998). V. M. Izgorodin et al., Composition and Surface Properties of a Silicon Dioxide Film Deposited by a Plasma-Chemical Technique, High Energy Chemistry 36, 426 (2002). K. Z. Baba Kishi, Scanning Reflection Electron Microscopy of Surface To- pography by Difiusely Scattered Electrons in the Scanning Electron Microscope, Scanning 18, 315 (1996). M. Yoshikawa et al., Characterization of Fluorine-Doped Silicon Dioxide Film by Raman Spectroscopy, Thin Solid Filmns 310, 167 (1997). S. Yoshida, T. Hanada, S. Tanabe, and N. Soga, Annealing Characteristics of Si Doped Amorphous Silica Films by RF Sputtering, Journal of Materials Science 34, 267 (1999). B. Civalleri et al., Quantum Mechanical ab initio Characterization of a Simple Periodic Model of the Silica Surface, Journal of Physical Chemistry B 103, 2165 (1999). C. Mischler, W. Kob, and K. Binder, Classical and Ab-initio Molecular Dynamic Simulation of an Amorphous Silica Surface, Computer Physics Communications 147, 222 (2002). 173 [41] A. Roder, W. Kob, and W. Binder, Structure and dynamics of amorphous silica surfaces, Journal of Chemical Physics 114, 7602 (2001). [42] C. Wang, X. Kuzuu, and Y. Tamai, Molecular Dynamics Study on Surface Structure of a-Si02 by Charge Equilibration Method, Journal of Non—Crystalline Solids 318, 131 (2003). [43] V. I. Bogillo, L. S. Pirnach. and A. Dabrowski, Monte Carlo Simulation of Silica Surface Dehydroxylation Under Nonisothermal Conditions, Langmuir 13, 928 (1997). [44] G. Hadjisavvas, G. Kopidakis, and P. C. Kelires, Structural Models of Amor- phous Silicon Surfaces, Physical Review B 64, 5413 (2001). [45] K. A. Kilian, D. A. Drabold, and J. B. Adams, First-Principles Simulations of a-Si and a-Si:H Surfaces, Physical Review B 48, 17393 (1993). [46] J. C. Maxwell, On the Calculation of the Equilibrium and Stiflrzess of Frames, Philosophical iN’Iagzine 27, 294 (1864). [47] D. J. Jacobs and M. F. Thorpe, Generic Rigidity Percolation: The Pebble Came, Physical Review Letters 75, 4051 (1995). [48] D. J. Jacobs and M. F. Thorpe, Generic Rigidity Percolation in Two Dimen- sions, Physical Review E 53, 3682 (1996). [49] T. S. Tay and W. Whiteley, Recent Advances in the Generic Rigidity of Struc- tures, Structural Topology 9, 31 (1984). [50] W. Whiteley, Rigidity of molecular structures, in Rigidity Theory and Appli- cations, edited by M. F. Thorpe and P. M. Duxbury, pages 2146, New York, 1999, Kluwer Academic/ Plenum Publishers. [51] D. J. Jacobs, Generic Rigidity in Three-Dimensional Bond Bending Networks, Journal of Physics A: Mathematical and General 31, 6653 (1998). [52] M. B. Berry et al., The Closed Conformation of a Highly Flexible Protein - the Structure of Escherichia-Cali Adenylate Kinase with Bound AMP and AMPPNP, Proteins: Structure, Function, and Genetics 19, 183 (1994). [53] C. W. Miiller, G. J. Schlauderer, J. Reinstein, and G. E. Schulz, Adenylate Kinase Motions During Catalysis: an Energetic Counterweight Balancing Sub- strate Binding, Structure 4, 147 (1996). [54] G. J. Schlauderer and G. E. Schulz, The Structure of Bovine Mitochondrial Adenylate Kinase: Comparison with Isoenzymes in Other Compartments, Pro- tein Science 5, 434 (1996). 174 [55] M. Gerstein, G. Schulz, and C. Chothia, Domain Closure in Adenylate K inase: Joints on Either Side of Two Helices Close Like Neighboring Fingers, Journal of Molecular Biology 229, 494 (1993). [56] A. Crivici and M. Ikura, Molecular and Structural Basis of Target Recognition by Calmodulin, Annual Review of Biophysics and Biomolecular Structure 24, 85 (1995). [57] A. Houdusse, l\-‘I. Silver, and C. Cohen, A Model of Ca2+-fr‘ee Calmodulin Bind- ing to Unconventional Myosins Reveals How Calmodulin Acts as a Regulatory Switch, Structure 4, 1475 (1996). [58] Z. Zhang et al., Electron Transfer by Domain Movement in Cytochrome be], Nature 392, 677 (1998). [59] C. M. Deane and T. L. Blundell, A Novel Exhaustive Search Algorithm for Predicting the Conformation of Polypeptide Segments in Proteins, Proteins: Structure, Function, and Genetics 40, 135 (2000). [60] P. Giintert et al., Conformational Analysis of Protein and Nucleic Acid Frag- ments ith the New Grid Search Algorithm FOUND, Journal of Biomolecular NMR 12, 543 (1998). [61] R. A. Dammkoehler, S. F. Karasek, E. F. B. Shands, and G. R. Marshall, Sampling Conformational Hyperspace: Techniques for Improving Completeness, Journal of Computer-Aided Molecular Design 9, 491 (1995). [62] W. Wang, 0. Donini, C. M. Reyes, and P. A. Kollman, Biomolecular Simula- tions: Recent Developments in Force Fields, Simulations of Enzyme Catalysis, Protein-Ligand, Protein-Protein, and Protein-Nucleic Acid Noncovalent Inter- actions, Annual Review of Biophysics and Biomolecular Structure 30, 211 (2001) [63] N. N akajima, H. Nakamura, and A. Kidera, Multicanonical Ensemble Generated by Molecular Dynamics Simulation for Enhanced Conformational Sampling of Peptides, Journal of Physical Chemistry B 101, 817 (1997). [64] U. H. E. Hansmann, Y. Okamoto, and F. Eisenmenger, Molecular Dynamics, Langevin and Hybrid Monte Carlo Simulations in a Multicanonical Ensemble, Chemical Physics Letters 259, 321 (1996). [65] K. B. Kamal and J. P. Sethna, Multicanonical Methods, Molecular Dynamics, and Monte Carlo Methods: Comparison for Lennard-Jones Glasses, Physical Review E 57, 2553 (1998). [66] Y. Sugita and Y. Okamoto, Replica-exchange Molecular Dynamics Method for Protein Folding, Chemical Physics Letters 314, 141 (1999). 175 [67] Y. Sugita and Y. Okamoto, Replica-exchange Multicanonical Algorithm and Multicanonical Replica-exchange Method for Simulating Systems with Rough Energy Landscape, Chemical Physics Letters 329, 261 (2000). [68] B. A. Berg and T. Neuhaus, Multicanonical Algorithms for First Order Phase Transitions, Physics Letters B 267, 249 (1991). [69] J. Lee, New Monte Carlo Algorithm: Entropic Sampling, Physical Review Letters 71, 211 (1993). [70] J. Higo et al., Two-Component Multicanonical Monte Carlo Method for Effective Conformation Sampling, Journal of Computational Chemistry 18, 2086 (1997). [71] U. H. E. Hansmann and Y. Okamoto, Prediction of Peptide Conformation by Multicanonical Algorithm - New Approach to the Multiple-minima Problem, Journal of Computational Chemistry 14, 1333 (1993). [72] R. H. Swendsen and J.-S. Wang, Replica Monte Carlo Simulation of Spin- Glasses, Physical Review Letters 57, 2607 (1986). [73] N. G5 and H. A. Scheraga, Ring Closure and Local Conformatinal Deformations of Chain Molecules, Macromolecules 3, 178 (1970). [74] N. G6 and H. A. Scheraga, Calculation of the Conformation of the Pentapep- tide cyclo-(Glycylglycylglycylprolylprolyl). I. A Complete Energy Map, Macro— molecules 3, 188 (1970). [75] N. G5 and H. A. Scheraga, Ring Closure in Chain Molecules with C", I, or S2,l Symmetry, Macromolecules 6, 273 (1973). [76] M. Dygert, N. Go, and H. A. Scheraga, Use of a Symmetry Condition to Compute teh Conformations of Gramicidin S, Macromolecules 8, 750 (1975). [77] N. G6 and H. A. Scheraga, Calculation of the Conformation of cyclo-Hexaglycyl, Macromolecules 6, 525 (1973). [78] N. Go and H. A. Scheraga, Calculation of the Conformation of cyclo-Hexaglycyl 2: Application of a Monte Carlo Method, Macromolecules 11, 552 (1978). [79] W. J. Wedemeyer and H. A. Scheraga, Exact Analytical Loop Closure in Pro- teins Using Polynomial Equations, Journal of Computational Chemistry 20, 819 ( 1999). [80] M. G. Wu and M. W. Deem, Efiicient Monte Carlo methods for cyclic peptides, Molecular Physics 97, 559 (1999). [81] A. A. Canutescu and R. L. Dunbrack, Jr, Cyclic Coordinate Descent: A Robotics Algorithm for Protein Loop Closure, Protein Science 12, 963 (2003). 176 [8‘2] [83] [87] [88] [89] [90] [91] [92] [93] R. E. Bruccoleri and M. Karplus, Chain Closure with Bond Angle Variations, Macromolecules 18, 2767 (1985). K. A. Palmer and H. A. Scheraga, Standard-Geometry Chains Fitted to X-ray Derived Structures: Validation of the Rigid-Geometry Approximation I: Chain Closure through a Limited Search of “Loop” Conformations, Journal of Com- putational Chemistry 12, 505 (1991). K. A. Palmer and H. A. Scheraga, Standard-Geometry Chains Fitted to X-ray Derived Structures: Validation of the Rigid-Geometry Approximation II: Sys- tematic Searches for Short Loops in Proteins: Applications to Bovine Pancreatic Ribonuclease A and Human Lysozyme, Journal of Computational Chemistry 13, 329 (1992). J. Moult and M. N. G. James, An Algorithm for Determining the Conformations of Polypeptide Segments in Proteins by Systematic Search, Proteins: Structure, Function, and Genetics 1, 146 (1986). M. J. Dudek and H. A. Scheraga, Protein Structure Prediction Using a Combi- nation of Sequence Homology and Global Energy Minimization I. Global Energy Minimization of Surface Loops, Journal of Computational Chemistry 11, 121 (1990). K. D. Gibson and H. A. Scheraga, Energy Minimization of Rigid-Geometry Polypeptides with Exactly Closed Disufide Loops, Journal of Computational Chemistry 18, 403 (1997). D. J. Jacobs, A. J. Rader, L. A. Kuhn, and M. F. Thorpe, Protein Flexibility Prediction Using Graph Theory, Proteins: Structure, Function, and Genetics 44, 150 (2002). A. J. Rader, B. M. Hespenheide, L. A. Kuhn, and M. F. Thorpe, Protein Unfolding: Rigidity Lost, Proceedings of the National Academy of Sciences of the United States of America 99, 3540 (2002). B. M. Hespenheide, A. J. Rader, M. F. Thorpe, and L. A. Kuhn, Identifying Protein Folding Cores from the Evolution of Flexible Regions During Unfolding, Journal of Molecular Graphics and Modelling 21, 195 (2002). P. N. Keating, Efiect of Invariance Requirements on the Elastic Strain Energy of Crystals with Application to the Diamond Structure, Physical Review 145, 637 (1966). B. R. Djordjevic’, M. F. Thorpe, and F. Wooten, Computer Model of Tetrahedral Amorphous Diamond, Physical Review B 52, 5685 (1995). Y. Tawada, H. Okamoto, and Y. Hamakawa, a-SiC:H/a-Si:H Heteriojunction Solar Cell Having Moer Than 7.1% Conversion Efi‘iciency, Applied Physics Letters 39, 237 (1981). 177 [94] [95] [99] [100] [101] [102] [103] [104] [105] H. Iida et al., Efficiency of the a-Si:H Solar Cell and Grain Size of SN02 Transparant Conductive Film, IEEE Electron Device Letters 4, 157 (1983). S. C. Deane, F. J. Clough, W. I. Milne, and M. J. Powell, The Role of the Gate Insulator in the Defect Pool Model for Hydrogenated Amorphous Silicon Thin Film Transistor Characteristics, Journal of Applied Physics 73, 2895 (1993). S. K. Kim, Y. J. Choi, K. S. Cho, and J. Jang, Coplanar Amorphous Silicon Thin Film Transistor Fabricated by Inductively Coupled Plasma Chemical Vapor Deposition, Journal of Applied Physics 84, 4006 (1998). M. F. Thorpe, V. V. Levashov, M. Lei, and S. J. L. Billinge, Notes on the analysis of data for pair distribution functions, in From Semiconductors to Proteins: Beyond the Average Structure, edited by S. J. L. Billinge and M. F. Thorpe, pages 105—128, Kluwer Academic/ Plenum Publishers, New York, 2002. J. Y. Pivan, O. Achak, L. Michéle, and L. Daniel, The Novel T hiogermanate [(CH3,)4N]4 Ge4Sm with a High Cubic Cell Volumn. Ab Initio Structure Deter- mination from Conventional X-ray Powder Diffraction, Chemistry of Materials 6, 827 (1994). C. L. Bowes et al., Dimetal Linked Open Frameworks: [(CH3)4N]2 (Agg, Cug)Ge4Slo, Chemistry of Materials 8, 2147 (1996). O. M. Yaghi, Z. Sun, D. A. Richardson, and T. L. Groy, Directed Transforma- tion of Molecules to Solids: Synthesis of a Microporous Sulfide from Molecular Germanium Sulfide Cages, Journal of the American Chemical Society 116, 807 (1994). C. Cahill and J. B. Parise, Synthesis and Structure of MnGe4Slo - (C6H14N2) - 3H20.’ A Novel Sulfide Framework Analogous to Zeolite Li-A (BW), Chemistry of Materials 9, 807 (1997). F. Bonhomme and M. G. Kanatzidis, Structurally Characterized Mesostruc- tured Hybrid Surfactant—Inorganic Lamellar Phases Containing the Adaman- tane [Ge4510]4_ Anion: Syntesis and Properties, Chemistry of Materials 10, 1153 (1998). M. J. MacLachlan, N. Coombs, and G. Ozin, Non-aqueous Supramolecular As- sembly of Mesostructured Metal Germanium Sulphides from (GC4SIO)4— Clus— ters, Nature 397, 681 (1999). M. J. MacLachlan et al., Mesostructured Metal Germanium Sulfides, Journal of the American Chemical Society 121, 12005 (1999). K. K. Rangan et al., Aqueous Mediated Synthesis of Mesostructured Manganese Germanium Sulfide with Hexagonal Order, Chemistry of Materials 11, 2629 (1999). 178 [106] [108] [109] [110] [111] [112] [113] [114] [115] [116] [117] M. Wachhold et al., ll/Iesostructured Metal Germanium Sulfide and Selenide Ma- terials Based on the Tetrahedral [Ge4SIO]4— and [Ge4Selo]4’ Units: Surfactant Templated Three-Dimensional Disordered Frameworks Perforated with Worm Holes, Journal of Solid State Chemistry 152, 21 (2000). The amorphous GaAs model is generously provided by G. T. Barkema and N. Mousseau. V. Petkov, K. K. Rangan, M. G. Kanatzidis, and S. J. L. Billinge, Structure of crystallographically challenged materials by profile analysis of atomic pair distribution functions: Study of limosg and mesostructured mnge4slo, in Ma- terials Research Society Proceedings, edited by P. G. Allen, S. M. Mini, D. L. Perry, and S. R. Stock, volume VI of Application of Synchrotron Radiation Techniques to Materials Science, page EE1.5, Materials Research Society, 2001. M. F. Thorpe, M. V. Chubynsky, D. J. Jacobs, and J. C. Phillips, Non- Randomness in Network Glasses and Rigidity, Glass Physics and Chemistry 21, 160 (2001). G. N. Ramachandran and V. Sasiskharan, Conformation Of Polypeptides And Proteins, Advances in Protein Chemistry 23, 283 (1968). E. N. Baker and R. E. Hubbard, Hydrogen Bonding in Globular Proteins, Progress in Biophysics and Molecular Biology 44, 97 (1984). B. I. Dahiyat, D. B. Gordon, and S. L. Mayo, Automated Design of the Surface Positions of Protein Helices, Protein Science 6, 1333 (1997). F. Fabiola, R. Bertram, A. Korostelev, and M. S. Chapman, An Improved Hydrogen Bond Potential: Impact on Medium Resolution Protein Structures, Protein Science 11, 1415 (2002). R. Balasubramanian, R. Chidambaram, and G. N. Ramachandran, Poten- tial Functions for Hydrogen Bond Interactions II: Formulation of an Empirical Potential Function, Biochimica et Biophysica Acta, International Journal of Biochemistry and Biophysics 221, 196 (1970). W. D. Cornell et al., A Second Generation Force Field for the Simulation of Proteins, Nucleic Acids, and Organic Molecules, Journal of the American Chemical Society 117, 5179 (1995). D. A. Pearlman et al., AMBER, A Package of Computer Programs for Ap- plying Molecular Mechanics, Normal Mode Analysis, Molecular Dynamics and Free Energy Calculations to Simulate the Structural and Energetic Properties of Molecules, Computational Physics Communication 91, 1 (1995). A. D. J. MacKerell et al., All-Atom Empirical Potential for Molecular Modeling and Dynamics Studies of Proteins, Journal of Physical Chemistry B 102, 3586 (1998). 179 Ls.- [118] [119] [1‘26] [127] [128] [129} B. R. Brooks et al., CHARMM — A Program For Macromolecular Energy, Min- imization, and Dynamics Calculations. Journal of Computational Chemistry 4, 187 (1983). M. I. Zavodszky, personal communication. D. F. Stickle, L. G. Presta, K. A. Dill, and G. D. Rose, Hydrogen-Bowling in Globular-Proteins, Journal of Molecular Biology 226, 1143 (1992). B. H. Hespenheide et al., Improved definition of hydrophobic interactions in rigidity analysis, to be published. D. C. Liu and J. Nocedal, On the Limited Memory Method for Large Scale Optimization, Mathematical Programming B 45, 503 (1989). N. Metropolis et al., Equation of State Calculations by Fast Computing Ma- chines, Journal of Chemical Physics 21, 1087 (1953). P. Spellucci, A SQP Method for General Non-linear Programs Using Only Equality Constrained Subproblems, Mathematical Programming 82, 413 (1998). A. L. Morris, M. W. MacArthur, E. G. Hutchioson, and J. M. Thornton, Stere- ochernical Quality of Protein Structure Coordinates, Proteins: Structure, Func- tion, and Genetics 12, 345 (1992). N. L. Allinger, Y. H. Yuh, and J. H. Lii, Molecular Mechanics. The MM3 Force Field for Hydrocarbons: 1, Journal of the American Chemical Society 111, 8551 (1989). http://dasher.wustl.edu/tinker. M. Miller et al., Structure Of Complex Of Synthetic HIV-1 Protease With a Substrate-Based Inhibitor At 2.3/I Resolution, Science , 1149 (1989). M. Kumar and M. V. Hosur, Adaptability And Flexibility Of HIV-1 Protease, European Journal of Biochemistry 270, 1231 (2003). [130] S. Spinelli et al., The Ill-Dimensional Structures of the Aspartyl Protease from [131] [132] the HIV-1 Isolate Bru, Biochimie 73, 1391 (1990). R. Ishima et al., Flap Opening and Dimer-Interface Flexibility in the Free and Inhibitor Bound HIV Protease and Their Implications for Function, Structure 7, 1047 (1999). D. I. Freedberg et al., Rapid Structural Fluctuations Of The Free HIV Protease Flaps In Solution: Relationship To Crystal Structures And Comparison With Predictions Of Dynamics Calculations, Protein Science 11, 221 (2002). 180 [133] [134] [13.3] [136] [137] [138] [139} [140] [141] [142] [143] [144] [145] [146] W. R. P. Scott and C. A. Schiffer, Curling of Flap Tips in HIV-1 Protease as a Mechanism for Substrate Entry and Tolerance of Drug Resistance, Structure 8, 1259 (2000). H. A. Carlson, personal communications. C. Shih et al., LY28’1514, A Pyrrolo/2,3-dfpyrimidine—based Antifolate That Inhibits Multiple Folate-requiring Enzymes, Cancer Research 57, 1116 (1997). http://www.pdb.org. M. R. Sawaya and J. Kraut, Loop and Subdomain Movements in Mechanism of Escherichia Coli Dihydrofolate Reductase: Crystallographic Evidence, Bio- chemistry 36, 586 (1997). P. Shrimpton, A. Mullaney, and R. K. Allemann, Functional Role for Tyr31 in the Catalytic Cycle of Chicken Dihydrofolate Reductase, Proteins: Structure, Function, and Genetics 51, 216 (2003). V. Cody et al., Ligand-induced Conformational Changes in the Crystal Structure of Pneumocystis Carinii Dihydrofolate Reductase Complexes with Folate and NADP+, Biochemistry 38, 4303 (1999). V. M. Reyes, M. R. Sawaya, K. A. Brown, and J. Kraut, Isomophous Crystal Structures of Escherichia Coli Dihydrofolate Reducatase Complexed with Folate, 5-Deazafolate, and 5,10-dideazatetrahydrofolate: Mechanistic Implications, Bio- chemistry 34, 2710 (1995). C. J. Falzone, P. E. Wright, and S. J. Benkovic, Dynamics of a Flexible Loop in Dihydrofolate Reductase from Escherichia Coli and Its Implication for Catalysis, Biochemistry 33, 439 (1994). D. Antoniou and S. D. Schwartz, Internal Enzyme Motions as a Source of Cat- alytic Activity: Rate-Promoting Vibrations and Hydrogen Tunneling, Journal of Physical Chemistry B 105, 5553 (2001). P. K. Agarwal et al., Network of Coupled Promoting Motions in Enzyme Catal- ysis, Proceedings of the National Academy of Sciences of the United States of America 99, 2794 (2002). G. G. Hammes, Multiple Conformational Changes in Enzyme Catalysis, Bio- chemistry 41, 8221 (2002). E. Y. Lau and J. T. Gerig, Effects of Fluorine Substtution on the Structure and Dynamics of Complexes of Dihydrofolate Reductase (Escherichia Coli), Bio- physical Journal 73, 1579 (1997). J. L. Radkiewicz and C. L. Brooks 111, Protein Dynamics in Enzymatic Cataly- sis: Exploration of Dihydrofolate Reductase, Journal of the American Chemical Society 122, 225 (2000). 181 [147] T. H. Rod, J. L. Radkiewicz, and C. L. Brooks III, Correlated Motion and the Effect of Dismal Mutations in Dihydrofolate Reductase, Proceedings of the National Academy of Sciences of the United States of America 100, 6980 (2003). [148] G. \e'riend, WHAT IF: a Molecular Modelling and Drug Design Program, Jour- nal of Molecular Modelling and Graphics 8, 52 (1990). [149] H. Gould and J. Tobochnik, An Introduction to Computer Simulation Methods, Addison-Wesley Publishing Company, 1998. [150] S. Kirkpatrick, C. D. Gelatt Jr., and M. P. Vecchi, Optimization by Simulated Annealing, Science 220, 671 (1983). [151] P. L’Ecuyer and S. Cote, Implementing a Random Number Package with Split- ting Facilities, ACM Transactions on Mathematical Software 17, 98 (1991). [152] http: / /www.accelrys.com. [153] A. P. Korn and D. R. Rose, Torsion Angle Differences as a Means of Pinpointing Local Polypeptide Chain Trajectory Changes for Identical Proteins in Different Conformational States, Protein Engeering 7, 961 (1994). [154] C. Vieille, personal communication. [155] J. Janin, S. Wodak, M. Levitt, and B. Maigret, Conformation of Amino-Acid Side-Chains in Proteins, Journal of Molecular Biology 125, 357 (1978). [156] R. L. Dunbrack Jr. and K. Martin, Backbone-Dependent Rotamer Library for Protein Application to Side-Chain Prediction, Journal of Molecular Biology 230, 543 (1993). [157] S. C. Lovell, J. M. Word, J. S. Richardson, and D. C. Richardson, The Penul- timate Rotamer Library, Proteins: Structure, Function, and Genetics 40, 389 (2000). [158] H. F rauenfelder et al., The Role of Structure, Energy Landscape, Dynamics and Allostery in the Enzymatic Function of Myoglobin, Proceedings of the National Academy of Sciences of the United States of America 98, 2370 (2001). [159] K. W. Plaxco, K. T. Simons, I. Ruczinski, and D. Baker, Topology, Stability, Sequence and Length: Defining the Determinants of Two-state Protein Folding Kinetics, Biochemistry 39, 11177 (2000). [160] T. Kortemme et al., Similarities Between the Spectrin 5H3 Domain Denatured State and its Folding Transition State, Journal of Molecular Biology 297, 1217 (2000). [161] D. T. Leeson et al., Protein Folding and Unfolding on a Complex Energy Land- scape, Proceedings of the National Academy of Sciences of the United States of America 97, 2527 (2000). 182 [162] [163] [164] [165] [166] [167] [168] [169] [170] [171] [172] [173] [174] [175] \V. A. Eaton et al., Fast Kinetics and Mechanisms in Protein Folding, Annual Review of Biophysics and Biomolecular Structures 29, 327 (2000). M. Karplus, Aspects of Protein Reaction Dynamics: Deviation from Simple Behavior, Journal of Physical Chemistry B 104, 11 (2000). J. N. Onuchic, Z. Luthey-Schulten, and P. G. VVolynes, theory of Protein Fold- ing: The Energy Landscape Perspective, Annual Reviews of Physical Chemistry 48, 545 (1997). D. J. Brockwell, D. A. Smith. and S. E. Radford, Protein Folding Mechanisms: New Methods and Emerging Ideas, Current Opinion in Structural Biology 10, 16 (2000). L. Angelani et al., Saddles in the Energy Landscape Probed by Supercooled Liquids, Physical Review Letters 85, 5356 (2000). N. Mousseau and G. T. Berkema, Traveling Through Potential Energy Land- scapes of Disordered Materials: The Activation-Relaxation Technique, Physical Review E 57, 2419 (1998). N. Mousseau, P. Derreumaux, G. T. Barkema, and R. Malek, Sampling Acti- vated Mechanisms in Proteins with the Activation-Relaxation Technique, Jour- nal of Molecular Graphics and Modelling 19, 78 (2001). T. J. A. Ewing, S. Makino, A. G. Skillman, and I. D. Kuntz, DOCK 4.0: Search Strategies for Automated Molecular Docking of Flexible Molecule Databases, Journal of Computer-Aided Molecular Design 15, 411 (2001). D. M. Lorber and B. K. Shoichet, Flexible Ligand Docking Using Conforma- tional Ensembles, Protein Science 7, 938 (1998). J. A. Given and M. K. Gilson, A Hierarchical Method for Generating Low- Energy Conformers of a Protein-Ligand Complex, Proteins: Structure, Func— tion, and Genetics 33, 475 (1998). A. R. Leach, Ligand Docking to Proteins with Discrete Side-chain Flexibility, Journal of Molecular Biology 235, 345 (1994). R. Najmanovich, J. Kuttner, V. Sobolev, and M. Edelman, Side-chain Flex- ibility in Proteins upon Ligand Binding, Proteins: Structure, Function, and Genetics 39, 261 (2000). H. Clauben, C. Buning, M. Rarey, and T. Lengauer, FLEXE: Efficient Molec- ular Docking Considering Protein Structure Variations, Journal of Molecular Biology 308, 377 (2001). H. A. Carlson and J. A. McCammon, Accommodating Protein Flexibility in Computational Drug Design, Molecular Pharmacology 57 , 213 (2000). 183 [176] [177] [178] [179] [180] [181] M. I. Zavodszky, M. Lei, M. F. Thorpe, and L. A. Kuhn, Modeling protein main chain flexibility for docking, to be published. M. I. Zavodszky, Modeling Flexibility in Protein-Ligand Recognition, PhD thesis, Michigan State University, 2003. V. Schnecke et al., Screening a Peptidyl Database for Potential Ligands to Proteins Including Side-Chain Flexibility, Proteins: Structure, Function, and Genetics 33 (1998). V. Schnecke and L. A. Kuhn, Virtual Screening with Salvation and Ligand- Induced Complementarity, Perspectives in Drug Design and Discovery 20, 171 (2000). G. J. Kleywegt and T. A. Jones, Phi/Psi-chology: Ramachandran Revisited, Structure 4, 1395 (1996). S. C. Lovell et al., Structure Validation by C0, Geometry: (b, w and C3 Deviation, Proteins: Structure, Function, and Genetics 50, 437 (2003). R. A. Laskowski, M. W. i\—~’IacArthur, D. S. Moss, and J. M. Thornton, PROCHECK: a program to check the stereochemical quality of protein struc- tures, Journal of Applied Crystallography 26, 283 (1993). 184 N T [[[]]]][[[][‘]][[1]]