’4 . ‘ :41 new. .55 , wk. 2.. i V afluifierx _ ._ aura i. .1; .- Zeiiéw. it a .v ,% bxhu .z 23mm? uwm.14.ze.f:tn.avr4 : til at. n: ‘ S fit an .3!- 3.5%, A? ‘4 Infiniti- w? V 9; . 5 311.1. II. .I 1 iii Yui . X“ l. .h #6... . vaLk .ul a“? a. ,q . . . 5.. This is to certify that the dissertation entitled Modeling Flexibility in Protein-Ligand Recognition presented by Maria lldiko Zavodszky has been accepted towards fulfillment of the requirements for the Ph.D degree in Biochemistry and Molecular Biology KVVVV V4. £44., Major Professor’s Signature April 29, 2003 Date MSU is an Affinnative ActiorVEquaI Opportunity Institution PLACE IN RETURN Box to remove this checkout from your record. To AVOID FINES return on or before date due. MAY BE RECALLED with earlier due date if requested. DATE DUE DATE DUE DATE DUE 6/01 c:/ClRC/DateDue.p65-p. 15 MODELING FLEXIBILITY IN PROTEIN-LIGAND RECOGNITION By Mdria Ildiko' Zdvodszky A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Department of Biochemistry and Molecular Biology 2003 COPYRIGHT BY MARIA ILDIKO ZAVODSZKY 2003 ABSTRACT MODELING FLEXIBILITY IN PROTEIN-LIGAND RECOGNITION By Mdria Ildiké Zdvods'zky The function of many proteins is to recognize and bind peptides and other small molecules. Understanding the way these proteins work implies understanding the driving forces of protein-ligand interactions. This, in tum, is necessary to find new, specific ligands for proteins that are potential targets for disease therapy, or to help elucidate the fitnction of the increasing number of proteins with known structure, but yet unknown function. Computational methods are well suited for analyzing the large amount of existing experimental data to identify the underlying principles of protein-ligand interactions. These principles, combined with efficient algorithms, are built into docking and screening schemes used to predict binding orientations and interactions of partner molecules providing a working hypothesis for further experiments. At the same time, these computational tools facilitate screening of large databases for reducing the large number of organic molecules to a smaller set of potential ligands to be tested for binding to the target of interest. SLIDE, the protein structure-based ligand docking and screening tool developed in our laboratory, handles ligands and protein side chains flexibly and has the possibility of taking water-mediated interactions into account. It is capable of screening a database of 100 000 molecules in 1-2 days on a desktop computer. SLIDE was used to propose a viable model of the ternary complex of R67 dihydrofolate reductase ° folate - cofactor, taking into account existing experimental data. The results are in good agreement with the predictions of another widely used docking tool, DOCK, and propose specific interactions that can be tested by mutagenesis. Improved representation of the binding site, using a knowledge-based approach, coupled with the realistic modeling of protein side-chain and ligand flexibility in SLIDE allowed the identification of new ligands for thrombin. This was achieved by screening the Available Chemical Directory, followed by the experimental measurement of the binding constants of the predicted top scoring ligand candidates using Isothermal Titration Calorimetry. Two of the top scoring ligand candidates were found to have binding affinities in the micromolar range for human thrombin. As part of this thesis work, a new approach toward modeling main-chain flexibility in docking was proposed: Flexibility analysis of the target protein was performed using the graph-theoretic algorithm FIRST, followed by the generation of alternative conformations for the predicted flexible regions with ROCK, a conformer searching program based on a random walk sampling of the rotatable bonds. A representative set of the conformational ensemble generated this way was used as targets for docking with SLIDE. ROCK is uniquely suited for flexibly handling ring structures and can be used to model the flexibility of large circular ligands, as well, as demonstrated on the case of cyclosporin. The use of this combined method to perform fully flexible docking is illustrated for cyclophilin A — cyclosporin, while addressing the question of how much flexibility of the interacting molecules is tolerated without hindering molecular recognition. ACKNOWLEDGEMENTS First of all, I would like to thank my advisor, Dr. Leslie A. Kuhn, for her encouragement, support and guidance throughout my graduate carrier. She has been a great example of professionalism, enthusiasm for science and healthy optimism, a pleasure to work with. I have found not only an excellent advisor, but also a good friend in her. I want to thank Dr. Michael Thorpe for the opportunity to work with him. He encouraged me to rise above the details, look at the big picture. The interesting conversations with him gave me valuable insight, from a physicist’s point of view, of how science works. I extend my thanks to the members of my committee, Dr. Shelagh Ferguson- Miller, Dr. Jack Preiss and Dr. Gregory Zeikus for providing encouragement and valuable feedback on my work. I am thankful to collaborators Dr. Elisabeth Getzoff and Dr. John Tainer from the Scripps Research Institute, and Dr. Michal Zolkiewski from Kansas State University for giving me the opportunity to use their resources to screen the Available Chemical Directory and perform the ITC experiments, respectively. Scientific work these days is usually a group effort. My thanks go out to the former members of the Kuhn lab, Dr. Volker Schnecke, Dr. Mike Raymer, Dr. Paul Sanschagrin, Dr. Brandon Hespenheide, Rajesh Korde and the former member of the Thorpe group, Dr. A.J. Rader for their invaluable help and advice. I would also like to thank the present members of the Kuhn lab, Litian He, Chetan Sukuru, and Sameer Arora for creating an enjoyable working environment. I am especially thankful to Ming Lei from the Thorpe group for the excellent collaboration over the ROCK project. He was always ready to help, debate over ideas and approaches, and answer my suggestions promptly. Finally, I would like to thank my fiiends and family, especially my husband Peter for making it all possible and worth while. vi TABLE OF CONTENTS LIST OF TABLES xi LIST OF FIGURES xii LIST OF ABBREVIATIONS xv 1 Introduction 1 1.1 Modeling Protein-Ligand Interactions .................................................................... 1 1.2 Computational Docking and Screening ................................................................... 2 1.3 Overview of Current Approaches in Docking ......................................................... 4 1.3.1 Binding Site Representation .................................................................................... 4 1.3.2 Orientational and Conformational Search ............................................................... 5 1.3.3 Scoring .................................................................................................................... 8 1.4 SLIDE ...................................................................................................................... 9 2 Predicting the Ternary Complex of R67 DHFR-NMN-Folate with SLIDE 12 2. 1 Introduction ........................................................................................................... 12 2.2 Methods ................................................................................................................. 16 2.3 Results ................................................................................................................... 19 2.3.1 Active Site Symmetry and Docking Constraints ................................................... 19 2.3.2 Docking of NMN into R67 DHFR-Fol 1 Using SLIDE ........................................ 21 2.4 Discussion ............................................................................................................. 28 vii 2.4.1 2.4.2 2.4.3 2.5 3.1 3.2 3.2.1 3.2.2 3.3 3.3.1 3.3.2 3.3.3 3.3.4 3.3.5 3.4 3.4.1 3.4.2 3.5 3.5.1 How Can R67 DHFR Bind Both NADPH and Folate? ........................................ 28 Relationship to Mutagenesis Results .................................................................... 33 A Model for Hydride Transfer .............................................................................. 34 Conclusions ........................................................................................................... 36 Distilling the Essential Features of a Protein Surface for Improving Protein-Ligand Docking, Scoring, and Virtual Screening 37 Abstract ................................................................................................................. 37 Introduction ........................................................................................................... 39 The Evolution of Protein Surface Representations in SLIDE ............................... 39 Other Approaches for Discrete Representation of Protein Binding Sites ............. 41 Methods ................................................................................................................. 44 Knowledge-Based Representation of Protein Binding Sites ................................. 44 Ligand Interaction Points ...................................................................................... 49 Ligand Databases .................................................................................................. 50 Key Template Points ............................................................................................. 51 Evaluation of New Protein and Ligand Representations in Ligand Screening and Docking .......................................................................................................... 53 Results ................................................................................................................... 55 Thrombin ............................................................................................................... 56 Glutathione S—Transferase .................................................................................... 66 Discussion ............................................................................................................. 7 l The Influence of Accurate Binding Site Representation on Docking and Scoring .................................................................................................................. 71 viii 3.5.2 3.5.3 3.6 4.1 4.2 4.3 4.4 4.5 5.1 5.2 5.2.1 5.2.2 5.3 5.4 6.1 6.2 6.3 6.3.1 The Role of Flexibility in Docking to Thrombin and GST ................................... 75 Previous Docking and Screening Validation Studies on Thrombin and GST ...... 76 Conclusions ........................................................................................................... 79 Side-Chain Flexibility in Docking with SLIDE: Testing the Minimal Rotation Hypothesis 80 Introduction ........................................................................................................... 80 Methods ................................................................................................................. 82 Results ................................................................................................................... 88 Discussion ............................................................................................................. 94 Conclusions ........................................................................................................... 95 Using SLIDE to Find New Ligands For Thrombin 96 Introduction ........................................................................................................... 96 Methods ................................................................................................................. 98 Screening the ACD with SLIDE ........................................................................... 98 Isothermal Titration Calorimetry .......................................................................... 99 Results ................................................................................................................. 100 Discussion ........................................................................................................... 106 Modeling Protein Main-Chain Flexibility in Docking 108 Introduction ......................................................................................................... 108 Methods ............................................................................................................... 112 Results ................................................................................................................. 116 Flexibility Analysis ............................................................................................. l 16 ix 6.3.2 Conforrner Generation ......................................................................................... 120 6.3.3 Docking ............................................................................................................... 127 6.4 Discussion ........................................................................................................... l3 1 6.5 Conclusions ......................................................................................................... l 32 7 Summary and Future Directions 134 7.1 Summary of Advances Made .............................................................................. 134 7.2 Interesting Problems Remaining to be Solved .................................................... 137 BIBLIOGRAPHY 141 2.1 3.1 3.2 4.1 4.2 4.3 6.1 6.2 LIST OF TABLES Steady state kinetic values for R67 DHF R variants at pH 7.0 .............................. 34 SLIDE scores, DrugScore scores and RMSD values of known thrombin ligands 59 SLIDE scores, DrugScore scores and RMSD values of known GST ligands ....... 68 Thrombin crystallographic complexes used in testing the minimal rotation hypothesis .............................................................................................................. 85 GST crystallographic complexes used in testing the minimal rotation hypothesis .............................................................................................................. 86 Li gand-free structures and their corresponding 1i gand-bound complexes used in testing the minimal rotation hypothesis ............................................................ 87 Results of docking cyclosporin conformers into CypA conformers ................... 129 Interactions monitored to identify correct cyclosporin dockings ........................ 130 xi 1.1 2.1 2.2 2.3 2.4 2.5 2.6 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 LIST OF FIGURES Images in this dissertation are presented in color. An overview of the SLIDE screening algorithm ................................................... 11 The structure of R67-DHFR .................................................................................. 15 The structures of folate and NADPH .................................................................... 18 SLIDE scores and DrugScore scores of well docked NMN molecules ................ 23 LIGPLOT figures for the DHFR-Fol I-NMN ternary complex ........................... 24 Comparing the docked orientations of NMN obtained with SLIDE and DOCK.. 27 Overlap of the NMN binding site with Fol I and F01 11 sites ................................ 32 Comparison of the grid-based and knowledge-based templates ........................... 40 Placement of optimal hydrogen-bonding template points in SLIDE .................... 47 Summary of rules for hydrophobic interaction point assignment ......................... 49 Examples of new knowledge-based templates ...................................................... 54 Comparing the docked orientations to the crystal structure position of a B-strand mimetic inhibitor in the binding site of thrombin ................................... 58 RMSD values of thrombin and GST ligands docked using the original and new template generation methods ......................................................................... 61 Screening and enrichment improvements for thrombin as reflected by SLIDE scores ..................................................................................................................... 63 Screening and enrichment improvements for thrombin as reflected by Drugscore .............................................................................................................. 65 xii 3.9 3.10 4.1 4.2 4.3 4.4 4.5 5.1 5.2 5.3 5.4 6.1 6.2 6.3 Screening and enrichment improvements for GST as reflected by Drugscore ..... 70 Correlation between SLIDE scores and DrugScores ............................................ 72 The active site of thrombin filled with template points ........................................ 89 Example of side chains rotated by SLIDE upon docking ..................................... 89 Side-chain rotations performed by SLIDE compared to the dihedral angle differences observed between 1i gand-free and 1i gand-bound crystal structures of thrombin ............................................................................................................ 9O Side-chain rotations performed by SLIDE compared to the dihedral angle differences observed between ligand-free and ligand-bound crystal structures of GST ................................................................................................................... 92 Side-chain rotations performed by SLIDE compared to the dihedral angle differences observed between 1i gand-free and ligand-bound crystal structures of 18 proteins ......................................................................................................... 93 ACD compounds selected for testing based on their scores ............................... 102 ACD compounds selected for testing based on molecular graphics inspection of their docked complex with thrombin .............................................................. 103 ITC data for 4-aminobenzamidine, morelloflavone, and new fuchsin ................ 105 Docked orientations of morelloflavone and new fuchsin in the binding site of thrombin .............................................................................................................. 106 The cyclic undecapeptide Thr2-cyclosporin ....................................................... l 14 Hydrogen-bond dilution plot of the unliganded human CypA ........................... 1 18 Ribbon diagrams of the ligand-free and the ligand-bound CypA colored by flexibility index ................................................................................................... 120 xiii 6.4 6.5 6.6 6.7 6.8 The 12 most distinct conformers of CypA generated by ROCK compared to the NMR structure loca .................................................................................. 121 Ramachandran plots of the crystal structure and a ROCK conformer of CypA. 123 Maximum Ca deviations of CypA conformers generated with ROCK compared to NMR HD exchange rates of backbone amide protons of free CypA .............. 124 Free, protein-bound and ROCK- generated conformations of cyclosporin ......... 126 Cyclosporin conformers docked into the binding sites of CypA conformers ..... 130 xiv ACD CSD CypA DHF DHFR DMSO FIRST GA GST HD exchange ITC MD + NADP NADPH NMN NMR NOE PDB RMSD ROCK LIST OF ABBREVIATIONS Available Chemicals Directory Cambridge Structural Database cyclophilin A dihydrofolate dihydrofolate reductase dimethyl sulfoxide Floppy Inclusions in Rigid Substructure Topography genetic algorithm glutathione S-transferase hydrogen-deuterium exchange isothermal titration calorimetry molecular dynamics nicotinamide adenine dinucleotide phosphate, oxidized form nicotinamide adenine dinucleotide phosphate, reduced form nicotinamide mononucleotide nuclear magnetic resonance nuclear Overhauser effect Protein Data Bank root mean square deviation Rigidity Optimized Conformational Kinetics XV SLIDE Screening for Ligands with Induced-fit Docking Efficiently THF tetrahydrofolate xvi Chapter 1 Introduction 1.1 Modeling Protein-Ligand Interactions Life can be viewed as interconnecting series of specific binding steps. The molecular organization of living matter implies the dependence of all biochemical processes on molecular recognition, from protein-ligand interactions through macromolecular associations to the intriguing process of protein folding. This thesis deals with modeling the molecular recognition between proteins and small molecule ligands that is of great practical significance for finding new specific ligands to proteins that are potential targets for disease therapy. Detected patterns in binding preferences can also help elucidate the function of the increasing number of proteins with known structures, yet unknown functions. Scientists have been puzzled by the specificity and accuracy of molecular recognition since the beginning of modern biochemistry. Advances in technology facilitated rapid advancement in this area over the past few decades. Knowledge about the features contributing to protein-ligand interactions is derived from experimental data, most importantly from the exponentially increasing number of protein structures solved by X-ray crystallography and nuclear magnetic resonance spectroscopy (N MR) and deposited into the Protein Data Bank. (Berman et al., 2000). Calorimetric methods (Jelesarov and Bosshard, 1999), often combined with mutagenesis, provide useful information about the energetics of association of biomolecules. Reciprocal burial of hydrophobic surface patches as well as shape and chemical complementarity of the interacting partners along the interface are traditionally believed to be the critical contributors to the specificity of the binding process (Halperin et al., 2002; Kuntz et al., 1999; Sobolev et al., 1996). Nevertheless, proteins bind ligands even if their shapes in free form do not seem to be optimized for complementing each other. The differences observed between the ligand-flee and ligand-bound structures of a number of proteins (Betts and Stemberg, 1999) point to the importance of accounting for the flexible changes occurring upon binding. According to the original concept of induced fit (Koshland, 1958), the match becomes perfect after these changes are induced in the protein by the ligand itself. The newly emerging view of the pre-selection of a complementary conformer, based on experimental and theoretical considerations (Bosshard, 2001; Ma et al., 2002), is an alternative to the classical induced fit, arguing that all proteins exist in ensembles of substates, presenting to the incoming ligand a range of binding shapes. 1.2 Computational Docking and Screening A computational biochemist creating a docking program has to find solutions to the following problems: (1) accurate representation of the shape, chemistry and flexibility of the protein binding site and the ligand; (2) efficient search algorithm to explore the orientational and conformational space of the ligand in the binding site, and (3) prediction of the correct docked ligand orientation. In addition, when the goal is to identify new ligands for a protein of interest, in which case the process is called screening, potential ligands have to be separated from other molecules based on their estimated binding affinity, providing a reasonably short list of ligand candidates for experimental testing. These problems are strongly interrelated: predicting the correct binding orientation depends on the proper representation of the binding site, and a list of plausible ligand candidates could not be provided without a reasonable estimate of the binding affinity, nor could the binding affinity be estimated without a reasonably correct prediction of binding orientation. Although every screening program includes a docking step, screening is different from docking to some extent: the goal of screening is to identify ligands from a large database of typically diverse molecules. When a large database of small molecules is searched for potential ligands, the computational time spent on docking one molecule must be very short. Current docking algorithms spend a relatively long time (ranging from minutes up to hours) on docking one ligand with high accuracy. If only one minute were spent on docking each ligand when searching through a database of about 100 000 molecules, such as the Cambridge Structural Database (CSD) (Allen, 2002), the screening time would be more than two months. The most efficient way to overcome this problem is to rule out the unfeasible ligand candidates as early as possible from the screening process and spend the most time-consuming final docking step on promising candidates. The computational intensity and relative ranking of different ligands are problems that make screening a more challenging problem than docking. 1.3 Overview of Current Approaches in Docking There are a number of excellent reviews in the literature summarizing the principles of docking (Abagyan and Totrov, 2001; Halperin et al., 2002; Lengauer and Rarey, 1996) and the different approaches used for screening large databases for lead generation (Abagyan and Totrov, 2001; Klebe, 2000). This section is intended to give only a brief overview of the main steps involved in the docking and screening processes. 1.3.1 Binding Site Representation All-atom representation of both the binding site and the ligand is used throughout the simulation only when docking is done with molecular dynamics (MD) simulation. The protein is usually held fixed, while the orientational and conformational space of the ligand is sampled with a Monte Carlo (Carlson et al., 2000) or a genetic algorithm (GA) based method (Taylor and Burnett, 2000). The energy of the system is monitored, and a docked ligand orientation is considered to be a possible solution when a local energy minimum is found. These types of methods are the most accurate, but their high computational cost makes them a non-practical alternative for screening. While maintaining the all-atom representation for the small ligands, fast docking programs use a reduced representation of the binding site in order to minimize the cost of the conformational search that follows. Discrete representations come in different flavors: geometric, with added chemical features, knowledge based or grid based. There is a whole spectrum of binding site representations between the geometric and the knowledge-based method, where a limited number of points are laid down to trace the protein binding surface while the density of the points in certain areas is determined by knowledge about favored hydrogen-bond geometry, for example. A more detailed description of the currently used methods for binding site representation is provided in Chapter 3. 1.3.2 Orientational and Conformational Search A rigid-body transformation superimposes the ligand over the binding site in all possible orientations that result in no deep interpenetrations between the molecules’ van der Waals surfaces. When the molecules are represented by discrete points indicating the position of atoms capable of participating in potentially key interactions, it is usually sufficient to match three non-collinear points from the two molecules to perform the 3D alignment. A triplet of non-collinear points bears just enough chemical and spatial information about a molecule to indicate a possible nontrivial match to another molecule with a complementary set of three points. In the same time, it is easier to match triplets of points sets of four or more. The result is a large number of possible matches due to the combinatorial complexity of matching every possible atom triplet from the ligand to every possible triplet point describing the binding site. For example, assuming the binding site is described by only 100 points and the ligand by 10 points, there are 100x99x98 = 970 200 possible protein triangles to be matched to 10x9x8 = 720 ligand triplets, which would result in approximately 7x108 computations. Geometric hashing algorithms are used to reduce the complexity of the problem by replacing the exhaustive search of matching every property of an object A to every property of all the objects from set B with a hierarchical search of matching one property at a time and eliminating mismatches at each step. In the case of SLIDE, for example, all the possible template point triangles describing the protein binding site are organized in index tables based on individual simple properties like chemical labels attached to the points, length of shortest side, length of longest side, and perimeter of the triangle. As a first step, the chemical labels of one possible ligand interaction-point triplet are compared to the chemical labels of all template triangles. Only template triangles with matching chemistry are kept for the next step of matching the length of the shortest side, and during the third step, only triangles with similar shortest sides are kept for comparing the length of the longest side, and so on. The number of matches to be checked is reduced at each level of the index table, resulting in much faster execution times compared to exhaustive matching. There are two main forms of docking: redocking and predictive docking, with redocking being far simpler. This is done by taking the ligand structure out of a crystallographic complex, and docking it back into its target structure, with both molecules initially possessing their favored conformation for binding to each other. Predictive docking, in which the free structure of the ligand is docked into the unliganded, apo structure of the target protein, is much more complex. In this case, the orientational search of the ligand has to be complemented with the exploration of the internal conformational space available for both the protein and the ligand to find the appropriate conformers that complement each other the best. Most current methods treat the ligand flexibly while keeping the protein rigid. The ligand is either incrementally built up in the binding site, or internal dihedral rotations are used to fit the ligand into the rigid binding site. Although better than completely rigid docking, the shortcoming of this approach is unrealistically placing all the burden of induced conformational change onto the ligand. Another method to take into account induced fit upon binding is by allowing a certain amount of van der Waals overlap between protein side-chains and the ligand. This is often called sofi docking. Analysis of conformational changes on complex formation for a representative set of 39 pairs of ligand-free and ligand-bound structures (Betts and Stemberg, 1999) showed that about 50% of the proteins undergo substantial main-chain and side-chain conformational changes when binding ligands. This induced fit is often modeled by selecting alternative side chain conformations for the binding-site residues from a rotamer library or by performing directed rotations of rotatable bonds in the protein side chains and flexible ligand portions to resolve collisions afler the ligand is transformed into the binding site. Inducing main-chain flexibility changes while performing the docking is too expensive computationally, so efforts are directed toward generating a conformational ensemble of the protein, and using this set as the target for the docking instead of one single structure. This approach is also following the line suggested by a number of theoreticians and experimentalists (Bosshard, 2001; Ma et al., 2002), who argue that the idea of the ligand selecting a complementary conformer from the preexisting native state ensemble of the protein is at least an alternative to the classical induced fit, where ligand binding triggers the conformational changes in the binding partners necessary to create a good steric complementarity. A more detailed analysis of handling protein side-chain and main chain flexibility in docking is presented in Chapters 4and 6. 1.3.3 Scoring Docking programs usually return a number of possible docked orientations for each ligand. A scoring function is employed to select the best docking among all. When known ligands are docked to their targets, the scoring function is expected to give the best score to the docking closest to the orientation of that ligand seen in the crystal or NMR structure of the protein-ligand complex. Also, when multiple ligands are docked to a single target, the scoring function should rank them according to their binding affinity. Theoretically, free energy calculations combined with MD simulations were shown provide reliable ranking for some systems, but they are too time-consuming, and as such, not a practical alternative (Miyamoto and Kollrnan, 1993; Pearlman and Chan'fson, 2001). Instead of calculating the binding affinity from first principles, docking programs use scoring functions to estimate the tightness of binding from structural parameters. Empirical scoring functions estimate the fiee energy of binding as a sum of several terms, each of them describing the contribution of one type of interaction to binding, such as van der Waals interactions, hydrogen bonds, ionic interactions, etc. (Bohm, 1994; Rognan et al., 1999; Schapira et al., 1999; Wang et al., 2002). Knowledge-based scoring functions, on the other hand, use statistical preferences derived from pair-wise interatomic distances and frequencies observed in crystal structures of protein-ligand complexes to determine the contribution of individual ligand atoms to the final score (Gohlke et al., 2000; Mitchell et al., 1999; Muegge and Martin, 1999). Scoring is one of the most challenging problems in the field, and there is no existing scoring function that performs consistently well across various systems (Halperin et al., 2002; Tame, 1999). The correct docking, or the one closest to the crystal structure position, is usually near the top of the list, but buried among the large number of false positives (poor or approximate dockings given very favorable scores). Similarly, true inhibitors are often given smaller scores than inactive compounds when a mixed database of known ligands and decoy molecules is screened against a protein target. Consensus scoring has been suggested by several authors as a feasible way to ease this problem, resulting in improvements of up to 65- 70% in hit rates (Bissantz et al., 2000; Charifson et al., 1999). 1.4 SLIDE The docking and screening software SLIDE (Screening for Ligands by Induced-fit Docking, Efficiently) developed in our laboratory (Schnecke et al., 1998; Schnecke and Kuhn, 1999, 2000) models flexible protein-ligand interactions based on steric complementarity combined with hydrogen bonding and hydrophobic interactions. SLIDE efficiently reduces the large number of ligand candidates to a manageable number by using geometry indexing and distance geometry filtering on discrete representations of the protein and ligand candidates. Approximately 100,000 small molecules can be screened and docked by SLIDE in one to two days on a typical desktop workstation. A novel feature of SLIDE amongst geometric (rather than MD) methods is that it can take into account solvation. Consolv, a k-nearest-neighbor classifier developed in our laboratory (Raymer et al., 1997), is applied to predict conservation of binding site water molecules upon ligand binding. Waters predicted to be conserved are included as part of the binding site, although they can be displaced by a ligand atom at a later step if this results in greater molecular complementarity. SLIDE was the first method to balance protein and ligand flexibility in docking. Due to this feature, it can identify and correctly dock diverse, known ligands into the ligand-free conformation of the binding site for a variety of proteins, e.g., subtilisin, cyclodextrin glycosyltransferase, uracil DNA glycosylase, rhizopuspepsin, HIV protease, estrogen receptor, and Asn tRNA synthetase (Schnecke et al., 1998; Schnecke and Kuhn, 1999, 2000). Scoring of the docked protein-ligand complex by SLIDE is based on the number of hydrogen bonds and the hydrophobic complementarity between the ligand and its protein environment. The main steps involved in screening with SLIDE are shown in Figure 1. The purpose of the work described in this thesis was not only to enhance the performance of SLIDE by improving the representation of the protein binding site and including protein main chain flexibility into the docking process, but to apply it to solving biologically relevant problems. 10 M PREPARATION y ‘4/ Generate template with hydrogen bonding and hydrophobic interaction points for the binding site of the protein of interest. Determine interaction points of the prospective ligand candidates. Create look-up tables indexing all possible template triangles by chemistry (donor/acceptor/hydrophobic) labels and triangle geometry. i Identify feasible template triangles for each triplet of ligand interaction points. Dock ligand into the binding site by least-squares fit of ligand triangle onto template triangle. /’MODELING INDUCED? \ / Identify rigid anchor fragment (defined by matched interaction point triangle). Identify flexible bonds in the ligand. Resolve collisions between ligand anchor fragment and protein by iterative ligand translations. Resolve side chain collisions by directed rotations. i Score protein-1i gand complex based on number of intermolecular H-bonds and hydrophobic complementarity. Figure 1.1. An overview of the SLIDE screening and docking algorithm. 11 Chapter 2 Predicting the Ternary Complex of R67 DHFR - NMN 0 Folate with SLIDE The research presented in this chapter has been previously published in: Howell, E.E., Shukla, U., Hicks, S.N., Smiley, R.D., Kuhn, L.A., Zavodszky, M.I. One site fits both: a model for the ternary complex of folate + NADPH in R67 dihydrofolate reductase, a D2 symmetric enzyme. J. Comput. Aided M01. Des. 15:1035-52, 2001. 2.1 Introduction R67 dihydrofolate reductase (DHFR) is a novel enzyme that confers resistance to the antibiotic trimethoprim. The crystal structure of R67 DHFR displays a toroidal structure with a central active-site pore. This homotetrameric protein exhibits 222 symmetry, with only a few residues from each chain contributing to the active site, so related sites must be used to bind both substrate (dihydrofolate) and cofactor (N ADPH) in the productive R67 DHFR-NADPH-dihydrofolate complex. Whereas the site of folate binding has 12 been partially resolved crystallographically, an interesting question remains: how can the highly symmetrical active site also bind and orient NADPH for catalysis? Since computational docking tools are optimally suited for predicting such biologically important protein-ligand complexes, 1 employed our docking program SLIDE to model this ternary complex. Approaching a problem with different methods followed by a comparison of the results has the potential of providing a more confident prediction by supplying the complementary pieces of the whole puzzle. I compared the SLIDE results with the model predicted by Dr. Elisabeth Howell using DOCK, another method for docking flexible ligands into proteins using a quite different algorithm (Howell et al., 2001). One of the strengths of SLIDE is the balanced protein-ligand flexibility modeling, whereas DOCK explores the ligand conformational space more thoroughly. The two programs also employ different scoring functions to rank the dockings. Dihydrofolate reductase (DHFR) catalyzes the reduction of dihydrofolate (DHF) to tetrahydrofolate (THF) using NADPH as a cofactor. This enzyme is essential in folate metabolism since tetrahydrofolate is required for the synthesis of thymidylate, purine nucleosides, methionine, and other metabolic intermediates; thus, DHFR has been a prime target for anticancer and antibacterial therapy. Whereas chromosomal DHFR has been extensively studied and was one of the first successful targets for structure-based drug design, the plasmid R67 encoded DHFR has only recently been characterized. R67 DHFR is of special interest because it can transfer resistance between bacteria against the antibiotic trimethoprim. This D HFR h as an entirely different 8 equence and fold from chromosomal DHFR (Narayana et al., 1995). R67 DHFR is a homotetrarner in which each short chain forms a five—stranded B-barrel also found in SH3 domains (Narayana et 13 al., 1995) and a variety of other proteins including the Tudor domain of human survival motor n euron p rotein l , f erredoxin thioredoxin reductase, n itrile h ydratase, two 0 f t he SOS ribosomal proteins, and HIV integrase (Holm and Sander, 1996). Type II DHFR, typified by R67 DHFR, is a dimer of dimers as shown in Figure 2.1. The central pore forms the active site, and the high degree of symmetry means that each of the four subunits contributes the same few residues to the binding surface. R67 DHFR is unlike the chromosomal enzyme in another respect. There are three different ligand binding combinations available to its active site: 2 folate/DHF, or 2 NADPH, or 1 folate/DHF plus 1 NADPH (Bch et al., 1996). The latter is the productive ternary complex. Thus, each half of the pore can bind either NADPH/NADP+ or folate/DHF, a very different binding strategy than observed for chromosomal DHF R. Crystallographically defining the positions of bound ligands has proven especially difficult for the plasmid encoded enzyme, as the four-fold symmetry within the pore results in a four-fold dilution of the electron density. For example, if one ligand is bound, there is an equal probability that this binding will be in any one of the four equivalent sites within the pore for each of the individual protein copies in the crystal lattice. This effectively dilutes the observed electron density to an average over these four states. The symmetry and small size of the pore also means that the same residues (possibly from different chains) must contribute to the binding of both folate and NADPH. Thus, R67 DHFR is a fascinating system for studying how evolution can select a limited number of residues to co-optimize the catalytically productive binding of two quite different ligands, folate and NADPH. l4 Figure 2.1. Panel A: The structure of R67 DHFR is a homotetramer formed by a dimer of dimers in which all four subunits (shown in red, yellow, blue, and green ribbons) contribute equally to create the symmetry related binding sites for folate and NADPH. The pteridinc ring of the bound folate (shown in tubes in the central pore) and the side chains of the residues lining the pore are also shown. The view is down a twofold symmetry axis. Panels B and C describe a reverse image of the active site generated using the SPHGEN subroutine of DOCK on the 1VIE DHFR coordinates from the PDB. Water molecules were removed fi‘om the PDB file prior to running SPHGEN. Each sphere point corresponds to a possible atom position for docked ligands. In Panel A, the sphere cluster would fill the active site pore. Two perpendicular orientations of the protein chains and sphere cluster are shown in panels B and C. A previous model of the ternary complex was given in Narayana et a1. (Narayana et al., 1995). However, that model contained three bound ligands (2 folate + 1 NADPH), inconsistent with more recent solution studies indicating only two ligands are bound (Bradrick et al., 1996). Also the model by Narayana et a1. positioned the productively bound folate molecule parallel to and above the NADPH molecule. This would predict numerous interligand NOEs, which are not observed in NMR experiments (Li et al., 2001). H owever, the p artial d ensity available for f olate in the b inary c rystallographic complex provides a valuable guide to its favored position within the pore. The possibility that NADPH could interact in R67 DHFR in the same orientation relative to folate as it does in the chromosomal DHFR crystal structures was evaluated. However, due to steric limitations within the small pore of R67 DHFR, this binding mode is not feasible. Because the chromosomal DHFR complexes do not explain how the substrate and cofactor bind in R67 DHFR, and this ternary complex has so far proven crystallographically inaccessible, docking methods were used to predict their interactions in R67 DHFR. The predicted interaction of NADPH with folate in R67 DHFR were then compared with their orientation in chromosomal DHFR, and related to the effects of site- directed mutants on ligand binding. 2.2 Methods DOCK v4.0 and SLIDE v1.1 were utilized to predict the binding modes of NADPH and folate in the active-site pore of R67 DHFR. DOCK v4.0 uses van der Waals interactions in its scoring and allows ligand flexibility (Ewing et al., 2001). SLIDE v1.1 includes protein side-chain flexibility, ligand flexibility, probabilistic inclusion of active-site l6 bound water molecules, and a scoring function with hydrophobic interaction and hydrogen bond terms (Schnecke and Kuhn, 2000). The structures of folate and NADPH and their atom labeling conventions are given in Figure 2.2. SLIDE (Schnecke and Kuhn, 2000) is described in Chapter 1, section 1.4. CONSOLV, a k-nearest-neighbor-based classifier (Raymer et al., 1997), was used to identify b inding-site w aters likely to b e c onserved upon ligand binding b ased on their mobility and their favorable interactions with the protein. CONSOLV labeled each bound water molecule in the lVIF R67 DHFR structure according to its probability of being conserved upon ligand binding, and these values were used by SLIDE to appropriately incorporate bound water molecules or to penalize their displacement by non-polar ligand atoms. DRUGSCORE is a knowledge-based scoring function (Gohlke et al., 2000) that was shown to discriminate efficiently between well-docked ligand-binding modes and computer-generated artifacts. DrugScore was used in addition to the built—in scoring function of SLIDE to score and rank all docked ligand orientations with a suitable distance between the C4 atom of the NADPH nicotinamide ring and the C6 of the folate pteridinc ring (<5.0 A). LIGPLOT (Wallace et al., 1995) was used to create the figures showing the protein- ligand and ligand-ligand hydrogen bonds and hydrophobic interactions. 17 FOLATE NADPH N7 HS HR Hzl\ NC7 070/ NC3 NC5 NCB‘ ch' HO OH N03' N02 \ J V Figure 2. 2. The structures of folate and NADPH. Reduction of folate across the C7-N8 bond yields dihydrofolate. During catalysis, the A or re hydrogen (HR) on C4 of the nicotinamide ring faces the si face of the folate pteridinc ring, which accepts a hydride at C7. The hydride would approach the si face of the pteridinc ring from beneath the plane of the paper. The NMN moiety of NADPH is indicated by the bracket. 18 The coordinates of apo R67 DHFR as well as a binary complex with 2 folates bound are available as lVIE and lVIF (Narayana et al., 1995) at the PDB. In the present study, the structure lVIF was used. The coordinates of the NADPH molecule were taken from the TRIPOS database for the DOCK experiment. SLIDE handles ligands as flexible molecules, but it avoids large conformational changes compared to the starting conformation. To include a broad range of energetically favorable starting conformations in docking with SLIDE, 59 NADPH molecules were extracted from crystal structures of various protein-NADPH complexes fiom the PDB. The nicotinamide ring is syn with respect to the ribose ring in 14 of these NADPH conformations and it is anti in 45 of them. 2.3 Results 2.3.1 Active Site Symmetry and Docking Constraints A reverse image of R67 DHFR’s active site was generated using the DOCK subroutine, SPHGEN. Two orientations of this image, given in Figure 2.1.B and C, show the symmetry associated with the pore as well as its size. If the ligand were small with respect to the binding site, four symmetry related sites could potentially be occupied. A larger ligand would reduce the number of possible binding sites because of steric hindrance. Binding of the ligand near the center of the pore, as is the case with Fol I from the crystal structure, is expected to have a similar effect by breaking the 222 symmetry, limiting the number of possible bound molecules to at most two, which is consistent with the experimental results (Bradrick et al., 1996). 19 Several constraints obtained from experimental data were used in preparing the docking experiments and in screening the docked ligand conformations to eliminate unlikely binding modes: 1. Isothermal titration calorimetry (ITC) data show a total of two ligands bind (Bradrick et al., 1996). The combinations are two folates or 2 NADPHs or 1 NADPH + 1 folate. Binding of two NADPH molecules shows negative cooperativity (24), suggesting the first molecule binds at or near the center of symmetry and impedes binding of a second molecule at a symmetry related site. Binding of two folate molecules shows positive cooperativity, indicating there are interactions between the bound folate molecules that enhance affinity. 2. Interligand NOE (ILOE) data from Li et al. (Li et al., 2001) show few ILOE’s, suggesting the ligands are bound in extended conformations on opposite sides of the pore and meet somewhere in the middle of the pore. 3. From fitting the electron density, two folate molecules were modeled in asymmetric positions in lVIF (Narayana et al., 1995). Fol I is bound productively with its si face exposed, whereas Fol H has its si face against the side of the pore, making it unavailable to receive a hydride. For this reason, Fol I was used to dock NADPH to the binary complex of R67 DHFR-folate. 4. For docking of folate or its analogues, the docked pteridinc ring should conform to the observed electron density in the crystal structure (Narayana et al., 1995). This flat density was observed at the center of the pore near the Gln 67 residues, which form the “floor” and “ceiling” of the binding site. Density for the p—aminobenzoic acid- Glu (PABA-Glu) tail was not observed in the crystal structure, indicating disorder. 20 2.3.2 Docking of NMN into R67DHFR0F01 I Using SLIDE All SLIDE dockings with a distance of 5 A or less between the C4 of the nicotinamide ring of NMN and the C6 of the folate pteridinc ring involved in hydride transfer were analyzed. There are four possible orientations: the nicotinamide ring can be syn or anti with respect to its ribose ring, and in both cases either the pro-R (A-side) or the pro-S (B- side) hydrogen can point toward the pteridinc ring. These orientations are named syn R, syn S, anti R and anti S, respectively. Among the docked orientations, 39 adopted a syn R conformation, 4 were in syn S, 20 in anti R, and 12 in anti S. This distribution indicated a preference for the syn R orientation of NMN to interact with the R67 DHFR- Fol I complex, especially given that there were about three times as many anti conformers as syn conformers in the input data set of NADPH molecules. The syn R orientation is the one most consistent with the experimental results (Brito et al., 1990; Li et al., 2001) In addition to the built-in scoring function of SLIDE, DrugScore (Gohlke et al., 2000) was used to evaluate these NMN dockings. DrugScore calculates an empirical intermolecular potential, with the best scores having the largest negative values, whereas the best SLHDE scores have the largest positive values (greater hydrophobic complementarity and number of intermolecular hydrogen bonds). Most of the high- scoring ligand orientations (lower right in Figure 2.3) were in syn conformation with the R-side hydrogen of the nicotinamide ring directed toward the folate. These orientations had the best scores with both scoring fimctions, except for one anti R orientation, which obtained an unusually high score with DrugScore. The available version of DrugScore does not consider water-mediated interactions, and therefore preferred dockings of 21 NADPH closest to the wall of the binding site such as this one. The anti S orientations, which obtained high scores from SLIDE but not DrugScore, had a larger number of hydrogen b onds formed b etween the P, o f NMN and v arious p rotein residues, b ut the nicotinamide ring formed at most one hydrogen bond with the protein. However, to have a well-defined stereochemistry between NADPH, folate, and the protein, some specific hydrogen b ending is expected b etween the h ead o f t he N ADPH rn olecule and D HFR. The docked NMN in syn R orientation best fulfills this requirement by forming three hydrogen bonds between the O7 and N7 atoms of the nicotinamide head and the backbone hydrogen and nitrogen of Ile 68A, as well as the backbone oxygen of Val 66A, the latter being mediated by a water molecule (W 121A). For waters bound in the DHFR-Fol I crystallographic complex, CONSOLV was used to predict their probability of conservation upon NADPH binding, based on the favorability of their interactions with the protein. After eliminating those water molecules that were found to be too close (<2.5 A) to a protein or folate atom, only 11 water molecules were predicted to be more than 50% likely to be conserved inside the pore. Performing the docking experiment with the conserved water molecules included as part of the binding site did not result in significantly different dockings. The preference for the syn R orientation of the docked NMN was slightly higher compared to the dockings without waters, accounting for 62% of the docked conformations that have a high SLIDE ranking. A number of water molecules were found to be important in anchoring the docked NMN to the protein (Figure 2.4.A), similarly to water molecules 121 and 124 (Figure 2.4.B) which have been suggested to form a bridge between the pteridine ring of folate 22 and the backbone of the R67 DHFR (Narayana et al., 1995). However, these water molecules were predicted to be only moderately conserved by CONSOLV. The explanation of this finding originates in the symmetry of the binding site: the productively bound pteridinc ring can occupy any of the four symmetry related positions in the R67 DHFR tetramer structure, and by doing so it displaces different water molecules in different tetramers in the crystal lattice. As a result, many water molecules from the crystal structure of the R67 DHFR-folate complex (PDB entry lVIF) have high temperature factors. In predicting conserved waters, CONSOLV weighs temperature factors heavily, so most of these waters were predicted to be only 28 - 55% conserved. -20 i I SynR -22-1. *' A SynS ._ o AntiR -24- .' _ . - . ' tr AntiS 9 . . .' l' I I t * * 8 -26— . ‘ . '- ' ,, w * o l! ‘ * t 2 '28— A. ' x I: l t 8 . 1 .I l .' . I a '30- . I ' I ' D " I -32" I I -34-1 0 ' I ' I ' I ' I ' I ' 10 15 20 25 30 35 40 SLIDE score Figure 2.3. Comparison of the scores for well docked NMN molecules (consistent with experimental constraints and a distance of less than 5.0 A between C4 of NADPH and C6 of folate) obtained with two different scoring functions: those of SLIDE and DrugScore. Because the currently available version of DrugScore does not include water-mediated contacts, these dockings did not include water molecules from the binding site, though similar dockings were found with water molecules included. 23 A Cir 64(C) Ile 68(D) \ Val 66(C) Ser 65(C) Val 66(A) Ile 68(A) Tyr 69(A) Gln 67(C) Gln 67( A) Figure 2. 4.A. Potein-ligand and ligand-ligand interactions from the R67 DHFR-Fol IONMN ternary complex for the NMN docked in the R67 DHFROFol I structure (drawn by LIGPLOT). The position of the NMN molecule in this complex corresponds to the top scoring docking obtained with SLIDE. W denotes water molecules. 24 Ile 68(D) FOL , 3.41)” ,r \ j” ‘ Gln 67(D) - \\ l ‘ II HI ’I 3.46 _ Val 66(B) D 3"" llhh lux‘ 2.4K ; 2.63. 1”“ g , $2.46 \\‘.Ijlllll \\'l,‘|IlM 193 Val 66(1)) Ile 68(B) \gxliw/ Q 4 Gln 67(B) KEY TO SYMBOLS FOL folate; NMN nicotinamide mono-nucleotide (A), (B), (C), (D) subunit identifiers in R67 DHFR H Bonds in ligand His 53 H Bonds in other molecules W6 .310 Hydrogen bond and its length 0 Figure 2. 4. B. Protein-ligand and ligand-ligand Non—ligand residues involved in hydrophobic contact(s) Corresponding atoms involved in hydrophobic contact(s) interactions from the R67- DHFROFOIIONMN ternary complex for the pteridine ring. The position of the NMN molecule in this complex corresponds to the top scoring docking obtained with SLIDE. W denotes water molecules. 25 There is good agreement between the predictions of SLIDE and DOCK: both predict syn R to be the most likely orientation of the NMN molecule relative to folate. The position of the nicotinamide ring in the top scoring orientations (using SLIDE’s scoring function) syn R is very similar to the top orientation produced using DOCK (Figure 2.5). The largest differences are found in the position of the phosphate group of NMN, which is understandable given the large space available and the absence of constraints because of the missing tail of the NADPH. The non-hydrogen atom RMSD between the top NMN orientations obtained with D OCK and S LIDE is 1 .5 A (Figure 2.5). The SLIDE scores and DrugScore scores for these two top dockings are 28.8 and — 34,1300 for the DOCK docking and 36.4 and ~32,2246 for the SLIDE docking. The protein-ligand interactions generated by LIGPLOT (Wallace et al., 1995) for the R67 DHFR-Fol I-NMN ternary complex are shown in Figures 2.4.A and 2.4.3 for NMN and the pteridinc ring, respectively. The position of the NMN molecule corresponds to the top scoring NMN docking obtained with SLIDE, and the F01 I position from the crystal structure is u sed. A c omparison o f the c ontacts for NMN and f olate shows that symmetry related residues were involved in binding both ligands. For example, Gln 67 from both the B and D subunits made several contacts with the pteridinc ring, while Gln 67 from the A and C subunits made several contacts with the nicotinamide ring. Utilization of symmetry related residues during binding was also observed for Ile 68. Fol I binding involved Ile 68 from the D subunit which interacted with the pteridinc ring, while Ile 68 from the A and D subunits interacted with the nicotinamide and ribose groups. Numerous van der Waals contacts and a hydrogen bond were also predicted between the ligands, as shown in Figures 2.4A and 2.4B. Positive 26 Figure 2. 5. The NMN portion of NADPH docked into the binding site of R67 DHFR in syn R orientation next to the pteridinc ring of folate (purple, at top). The solvent accessible molecular surface of the binding site is colored according to atom type: carbon is green, oxygen is red and nitrogen is blue. The top scoring orientation of NMN obtained with SLIDE (obtained with the water-mediated template and ranked lSt by SLIDE and 3rd by DrugScore) is shown in white and that obtained with DOCK is shown in magenta. Hydrogen atoms are shown only for the C4 of NADPH, which donates the hydride to reduce folate. cooperativity has been previously observed between R67 DHFRONADPH and DHF (Bradrick et al., 1996). The proposed interactions between NMN and F01 I may describe how positive cooperativity between NADPH and folate is generated. One of the significant differences between SLIDE and DOCK is that SLIDE allows protein flexibility upon docking by balancing ligand and protein side chain rotations to resolve van der Waals overlaps, whereas DOCK more thoroughly explores ligand conformational space. In the case of R67 DHFR, there were only slight movements of two Gln 67 residues from subunits A and C, resulting in displacements of 27 less than 0.5 A away from the docked NMN molecule, maintaining the original hydrogen-bonding pattern of the protein. 2.4 Discussion 2.4.1 How Can R67 DHFR Bind Both NADPH and Folate? There are a number of cases in which the same site in a protein is designed to accommodate binding of several different ligands. Binding of diverse peptides to the major histocompatibility complex is achieved by having a number of specific binding pockets available for different side chains as well as by making key interactions to the peptides’ backbones (Fremont et al., 1992; Matsumura et al., 1992). Binding of different unfolded protein chains to GroEL is proposed to be accomplished mainly by hydrophobic interactions where more flexibility is allowed (Chen and Sigler, 1999). To bind various sugars, the maltodextrin transport/chemosensory receptor uses aromatic rings to interact with the sugar ring faces (Quiocho et al., 1997). Binding of various peptides to oppA, a peptide transporter, utilizes numerous intermediary water molecules (Sleigh et al., 1999), as does binding of various fatty acids to adipocyte lipid-binding protein (LaLonde et al., 1994), binding of various sugars to arabinose binding protein (Quiocho et al., 1997), and high-affinity binding of a proteinaceous inhibitor, BLIP, to B-lactamases with diverse sequences (Strynadka et al., 1994). These are all mechanisms to facilitate numerous binding modes. 28 Hot spots for protein-protein interactions have been noted and evaluated by mutagenesis and statistical analysis (Bogan and Thorn, 1998; DeLano et al., 2000; Hu et al., 2000). A general trend proposed is the presence of residues that are amphipathic or can make hydrophobic and hydrogen-bonding interactions. For example, Tyr, Trp and Arg have a large hydrophobic component to their side c hains as well a s the ability to provide polar interactions. The residues that provide binding contacts in the center of R67 DHFR’s active site pore include Ser 65, Val 66, Gln 67, Ile 68 and Tyr 69. The side chains of Ser 65 and Gln 67 are polar, while those of Val 66 and Ile 68 are hydrophobic. However, since Val 66 and Ile 68 present both their hydrophobic side chains as well as their backbone NH- and carbonyl groups for potential interactions, they can mediate both hydrophobic and polar interactions on the active site pore surface. Similarly, the side- chain methylene groups of Gln 67 also comprise part of the binding surface. From Figures 2.4.A and 2.4.3, it is clear that the same residues are likely to be involved in binding both NADPH/NMN and folate/DHF. Utilization of protein symmetry is the mechanism by which this is achieved. For example, Gln 67 from subunits A and C make contacts with the NMN moiety while Gln 67 from subunits B and D make contacts with folate. This trend is also apparent with Val 66, Ile 68, Tyr 69, and Lys 3 2 r esidues. When symmetry 0 perations are p erforrned o n the d ocked f olate and NMN conformers, it is clear that while the binding sites are not identical, they overlap to a great extent. Three of the four symmetry related sites (generated by symmetry rotations) are shown in Figure 2.6. Two of the symmetry related sites compare the F01 I and NMN (top scoring conformer from DOCK) binding modes while the third compares NMN and Fol II (the non-productively bound folate in lVIF). The fourth symmetry related site is 29 empty, precluding a comparison. Polar atoms that occupy similar positions in panel A are N5 of Fol I and N1 of the nicotinamide ring of NMN. In panel B, the C4 oxygen (Fol I) and the carboxamide oxygen (NMN), the N1 (Fol I) and N1 (NMN) as well as the N3 (Fol I) and carboxamide nitrogen (NMN) atoms occupy similar positions. Finally in panel C, the corresponding pairs of polar atoms that are close in space include: the C4 oxygen (Fol II) and the carboxamide oxygen (NMN) as well as the N1 (Fol II) and the N1 (NMN) atoms. This comparison supports a variation of hot spot binding, in which a few residues are responsible for most of the binding through making both polar and hydrophobic interactions with a small molecule ligand, rather than a protein (DeLano et al., 2000). The number of similar docking orientations of the NMN fiagment of NADPH indicates some alternative possibilities for hydrogen bonding to DHFR. This is also consistent with some mobility of bound NADPH, which in turn may explain the lower catalytic efficiency of R67 DHFR(Dion-Schultz and Howell, 1997; Reece et al., 1991). Because of the high degree of symmetry associated with the binding site of R67 DHFR, the catalytically productive folate-NADPH complex can bind in four equivalent positions, such that both molecules can be positioned at either side of the pore. The position adopted by NADPH independent of folate might well be different from the optimal p osition w hen f olate is p resent, for two reasons: b ecause folate c reates a n ew chemical and structural environment that can favor a different placement of NADPH, and because the symmetry of the pore tells us that there may be several favored, overlapping optimal placements for folate and NADPH (Figure 2.6). Therefore, it seems that co- optimization of both ligands’ binding is important. 30 Figure 2.6. Overlap o fthe NMN binding site with Fol I and Fol II sites. While two molecules do not bind in the same site concurrently, the symmetry of R67 DHFR implies that the same site must be used at different times for both NADPH and folate (in different halves of the pore or in different copies of the protein). Here, the top-scoring orientation of NMN from DOCK (Figure 3A) is compared (by symmetry operations) with the crystallographic orientation of Pol I or Fol II in the same site. Their substantial overlap corresponds to the region in which residues must be co-optimized for NADPH and folate binding. NMN atoms are labeled in yellow while Fol I or Fol H atoms are shown in white. In panel A, the closest protein atoms for interaction with the N1 (NMN) and N5 (Fol I) nitrogens are the carboxamide groups of the Q67 residues (3.69—4.46A distant). In panel B, the closest protein atoms for interaction with the N1 ligand nitrogens are again the Q67 carboxamide groups (3.68-3.93A). For interaction with the O4 (Fol I) or 07 (NMN) oxygens, the backbone NH from 168 lies nearby (3.07-3.25A). The N3 (Fol I) or N7 (NMN) atoms come closest to the backbone oxygen of I68 (3.57-4.75A). In panel C, the backbone NH of 168 is close (2.90—3.25A) to the O4 (Fol II) or 07 (NMN) oxygens while the backbone oxygen of 168 could interact with the N5 (Fol H) or the N7 (NMN) atoms (2.68-3.28 A). The closest protein atoms for interaction with the N1 nitrogens are again the carboxamide groups from the Q67 pairs (3.68-4.37A). A similar comparison of the overlap between the F01 I and F01 11 sites is shown in Figure 4b of Narayana et a1. (Narayana et al., 1995). 31 F()|. ll Figure 2. 6. 32 2.4.2 Relationship to Mutagenesis Results Mutagenesis of R67 DHFR has been performed to evaluate the roles of many of the pore residues in ligand binding: Lys 32, Ser 65, Gln 67, Ile 68 and Tyr 69 (Park et al., 1997 ; Strader et al., 2001). The effects of mutations are consistent with the docked interactions of NADPH and folate (Figure 2.4). Mutating Ser 65 to Ala does not affect catalytic efficiency, suggesting it does not interact directly with the ligands. NMN docking by SLIDE predicted the Ser 65 side chain hydrogen bonds to a water molecule that participates in NMN binding; however, this water site is also stabilized by interactions with Tyr 69 and could persist in the absence of interactions with Ser 65. Gln 67 hydrogen bonds directly with NMN and makes hydrophobic interactions with folate in the docked ternary complex. Ile 68 makes direct hydrogen bond and hydrophobic interactions with NMN, as well as water-mediated interactions with folate. Tyr 69 participates in water-mediated interactions with NMN. As shown in Table 1, mutations at any of these residues (except S65) alter the Km values for both ligands. The changes in Km vary over three orders of magnitude (fiorn 100 fold tighter to 10 fold weaker), however the ability of the mutations to preferentially alter NADPH vs. DHF binding appears marginal (Park et al., 1997; Strader et al., 2001). These data support a dual role for these active-site residues in binding both ligands. 33 Table 2.1. A comparison of steady state kinetic values for R67 DHFR variants at pH 7.0. DHFR Species 11..., (s'l) Kmmmr) Kmmmu) (pH 7) (14M) (11M) Wt R67 DHFR“ 1.3 i 0.07 5.8 i 0.02 3.0 :I: 0.06 S65A R67 DHFRb 1.1 i 0.10 4.0 :l: 0.51 2.9 i 0.57 Q67H R67 DHFR (pH 8)“ 0.022 :1: 0.003 0.16 a; 0.01 0.028 1: 0.001 I68M R67 DHFR” 0.17 d: 0.03 25 i 3.0 21 :1 3.0 Y69F R67 DHFRb 2.5 1: 0.04 44 1: 2.1 66 i 2.6 “ Values from reference Reece et al., 1991. b Values from reference Strader et al., 2001. c Values from reference Park et al., 1997. 2.4.3 A Model for Hydride Transfer The picture emerging from the docking studies using SLIDE and DOCK (Howell et al., 2001) is that folate and NADPH approach the catalytic site from the opposite ends of the R67 DHFR binding pore, with the pro-R side of the nicotinamide ring of NADPH turned toward the si face of the pteridinc ring of folate. The hydride transfer distances between C7 of the pteridinc ring and C4 of the nicotinamide ring, which participate in the reduction of folate (Figure 2.5) are predicted to be between 3.72 - 3.93 A. These distances are longer than the 2.6-2.7 A predicted by ab initio calculations (Castillo et al., 1999; Wu and Houk, 1987) and from a model of the transition state in E. coli DHFR (Bystroff et al., 1990). No docking method would probably be able to reproduce the 34 distances predicted by ab initio calculations for transition state complexes, but it is possible to reproduce crystal structure orientations with differences in intermolecular distances of approximately 0.2 A. When testing the capacity of SLIDE to reproduce the crystal structure orientation of NADP+ from a chromosomal DHFR in complex with folate and NADP+ (PDB code 1RA2), the docked orientation of NADP+ closest to the crystal structure position resulted in a C4-C7 distance of 3.45 A, comparable to the 3.21 A value found in that same crystal structure. The greater distances observed in the R67 DHFR dockings imply either a low rate of hydride transfer or an interligand chemical attraction that shortens the distance. Molecular dynamic studies suggest that in general, enthalpic contributions to catalysis predominate over entropic contributions (Bruice and Benkovic, 2000). However in R67 DHFR, a range of similar docking modes is predicted for the ligands, or perhaps an unusual degree of mobility (Howell et al., 2001). Both these options likely result from the use of symmetry related residues. The ability of the PABA-Glu tail of folate and the 2’,5’-ADP tail of NADPH to remain flexible but still maintain favorable electrostatic interactions may enhance binding through entropic as well as enthalpic contributions. An additional consequence of alternate binding modes for the ligand tails (or an enhanced mobility) might be to prevent binding of two molecules in one half of the pore, and instead steer binding to one molecule in opposite sides of the pore. 35 2.5 Conclusions The evolution of catalytic activity is the focus of many recent research articles. One perspective suggests new enzymes evolve by gene duplication followed by accumulation of mutations. This approach takes advantage of structural and mechanistic similarities in generating different catalytic activities and suggests a certain level of catalytic promiscuity (Babbitt and Gerlt, 1997 ; O'Brien and Herschlag, 1999). In addition, catalytic antibodies might be expected to provide insight into the process of enzyme evolution. They appear to adopt predominately a lock and key strategy towards binding transition state analogs. Also, a comparison of different catalytic antibodies that catalyze the same reaction suggests they mostly converge to the same binding site motif (Karlstrom et al., 2 000; Mader and B artlett, 1 997; S mithrud and B enkovic, 1 997). In contrast to these evolutionary strategies, the results of DOCK and SLIDE showing the favored orientation of NADPH relative to folate in R67 DHFR indicate this enzyme has adopted a novel, yet simple approach: the utilization of symmetry related residues to bind both NADPH/NADP+ and folate/DHF using a range of interaction types through a limited number of amphipathic residues. This symmetry is used to generate a hot-spot surface that accommodates numerous, different interactions. 36 Chapter 3 Distilling the Essential Features of a Protein Surface for Improving Protein-Ligand Docking, Scoring, and Virtual Screening The research presented in this chapter has been previously published in: Zavodszky, M.l., Sanschagrin, P.C., Korde, R.S., Kuhn, L.A. Distilling the essential features of a protein surface for improving protein-ligand docking, scoring, and virtual screening J. Comput. Aided M01. Des. in press. 3.1 Abstract For the successful identification and docking of new ligands to a protein target by virtual screening, the essential features of the protein and ligand surfaces must be captured and distilled in an efficient representation. Since the running time for docking increases exponentially with the number of points representing the protein and each ligand candidate, it is important to place these points where the best interactions can be made 37 between the protein and the ligand. This definition of favorable points of interaction can also guide protein structure-based ligand design, which typically focuses on which chemical groups provide the most energetically favorable contacts. In this chapter, a method of protein template and ligand interaction point design that identifies the most favorable points for making hydrophobic and hydrogen—bond interactions by using a knowledge base is presented. The knowledge-based protein and ligand representations have been incorporated in version 2.0 of SLIDE and resulted in dockings closer to the crystal structure orientations when screening a set of 57 known thrombin and glutathione S—transferase (GST) ligands against the apo structures of these proteins. There was also improved scoring enrichment of the dockings, meaning better differentiation between the chemically diverse known ligands and a ~15,000-molecule dataset of randomly-chosen small organic molecules. This approach for identifying the most important points of interaction between proteins and their ligands can equally well be used in other docking and design techniques. While much recent effort has focused on improving scoring functions for protein-ligand docking, our results indicate that improving the representation of the chemistry of proteins and their ligands is another avenue that can lead to significant improvements in the identification, docking, and scoring of ligands. This work is the result of a group effort. My roles were to develop the knowledge-based hydrogen bonding template, design the experiments to test the new method, analyze the results, and write the manuscript. 38 3.2 Introduction 3.2.1 The Evolution of Protein Surface Representations in SLIDE Two methods to generate a template for the binding site of interest were initially implemented in SLIDE: small, biased, pharmacophore-like templates, and unbiased, grid- based approaches. The biased template is based on known ligand binding modes and consists of coordinates of ligand atoms making hydrogen bonds or engaging in hydrophobic interactions with the protein of interest, as seen in crystal structures of protein-ligand complexes. This pharmacophorc-like representation of binding determinants is biased towards known ligands and is especially appropriate when the aim is to identify other molecules that make similar interactions. When the goal instead is to identify new classes of ligands or help define the ligand specificity for protein structures with unknown functions, an unbiased, thorough representation of the potential ligand- binding site is preferable. Therefore, SLIDE also has an option to automatically generate an unbiased template based on a ligand-free structure of the protein. To generate an unbiased template in version 1 of SLIDE, the binding site was filled with a large number of points, initially located on a fine grid with a spacing of 0.3—0.7 A (Figure 3.1.A) (Schnecke and Kuhn, 2000). Initial experiments with random placement of the points showed significant under-representation of some areas in the binding site, so the grid- based approach was adopted instead. Only points located 2.5 to 5.0 A from the nearest protein atom were kept. Each point was then checked to determine if it could serve as a hydrogen bond donor, acceptor, or form a hydrophobic interaction with the protein, and was either labeled as such, or eliminated from the set. All points of the same class were 39 then clustered using complete linkage clustering to reduce the number of template points to 150 or fewer. OOOOOOOOOOOOOOOO O. 0 0 protein protein Figure 3.1. Comparison of the grid-based (A) and knowledge-based (B) template generation methods. Template points are generated on a grid in version 1 of SLIDE. The method implemented in SLIDE, version 2, uses a knowledge base to define points where optimal protein—ligand interactions can be made, based on points where the ligand could make optimal hydrogen bonds and hydrophobic interactions with the protein. Template points are colored according to their type: green for hydrophobic, red for acceptor, blue for donor, and purple for donor and/or acceptor points. Improving the success rate of docking known ligands to a protein structure that does not already have correct side-chain conformations for that ligand (e.g., an “apo” structure of the protein, solved in the ligand-free state) was the motivation for the present work, which is aimed at defining protein templates that capture optimal points for interacting with the protein. Knowledge bases of hydrogen-bonding geometry around protein groups (Ippolito et al., 1990; McDonald and Thornton, 1994) allow us to focus now on optimal (rather than just feasible) positions for hydrogen bonding. Significantly hydrophobic positions at the protein surface can also be distinguished from the 40 background level of solvent-exposed carbon atoms, based on the local enhancement of hydrophobic atoms. Similarly, the interaction points on ligand candidates can be sampled to have similar density and chemistry to the hydrophobic and hydrogen-bonding assignments in the protein template. While this work has been driven by the aim to improve the modeling of protein recognition through docking in SLIDE, this representation of key interacting groups in proteins and ligand candidates is also expected to be usefirl for other docking methods, and to provide a focus on optimal interactions to make in structure-based protein and ligand design. 3.2.2 Other Approaches for Discrete Representation of Protein Binding Sites Reduced representations of protein binding sites have been developed by other groups for use in modeling protein recognition. Typically, the protein’s binding site is discretized to a set of 100 or fewer interaction points to enable fast comparison between the protein and each ligand. Many of these methods use reduced representations to aid in matching the protein and ligand surfaces. The initial, computationally complex search of the 6 degrees of rotational and translational freedom of the ligand relative to the protein is reduced to a problem of matching a set of N points on the ligand to the best-matching subset of N points from M points on the protein. N and M typically must be small due to the factorial complexity of the number of ways of matching N points to a larger set of M points. In the case of SLIDE, 3-point subsets of N interaction points on the ligand are tested for matching to all 3-point subsets of a set of typically 100-150 template points representing the protein. 41 In the case of DOCK (Kuntz et al., 1982), the earliest protein-ligand docking technique, the binding site is filled with spheres, whose centers serve as possible ligand atom positions. Chemical properties or other characteristics can be associated with the spheres, and a sphere with a particular characteristic can only be matched with a ligand atom of complementary character (Shoiehet and Kuntz, 1993). Jones et a1. (Jones et al., 1995) identify solvent—accessible hydrogen—bond donor and acceptor atoms within the active site of the protein and associate virtual points with each hydrogen and lone pair of these atoms, enabling the genetic algorithm employed by GOLD (Jones et al., 1997) to transform the ligand into the binding site by minimizing the least—square distance between protein virtual points and similarly defined ligand virtual points. Ruppert et a1. (Ruppert et al., 1997) coat the protein’s binding site surface with probes of three types, hydrophobic, acceptor and donor, which could potentially interact with the protein. These probes can serve as potential alignment points for ligand atoms and are scored to represent the probe's affinity for the protein. High affinity probe-clusters identify sticky spots, or regions of strongest potential binding. This method can also be used to find binding pockets on the surface of a protein. FlexX (Kramer et al., 1999) uses a multi- layered representation of the binding site adopted from its predecessor LUDI (Bohm, 1992): interaction types are arranged on three levels depending on their directionality, with H-bonds being the most directional at level three and hydrophobic interactions the least directional at level one. Each group capable of forming an interaction is characterized by an interaction center and a surface, the latter being approximated by a finite number of points. Ligand interaction centers are superimposed over these points and aligned, giving preference to higher-level interaction points over lower-level ones. In 42 an approach related to that of SLIDE, Fischer et a1. (Fischer et al., 1993; Fischer et al., 1995) describe the surfaces of the protein and ligand by a set of critical points and their normals, then apply geometric indexing to dock the ligands into the protein by matching the critical points and vectors. Grid-based representations are also used to map favorable points of interaction with proteins. In preparation for docking with AutoDock (Morris et al., 1998), the protein binding site is placed in a grid. The protein-ligand pair-wise interaction energies are precalculated at each grid point for each possible ligand atom type and are stored in a look-up table for use during the docking simulation. The Grid technique developed by Boobbyer et al. (Boobbyer et al., 1989) calculates for each grid point an empirical energy designed to represent the interaction energy of a chemical probe group, such as a carbonyl oxygen or an amine nitrogen atom, around the target molecule. This function is used to determine the sites where ligands may bind to the target, such as a protein. Finally, knowledge bases of the frequency of pair-wise atomic or functional group interactions deduced from the crystallographic protein structures in the PDB (Berrnan et al., 2000) and small organic molecule structures in the Cambridge Structural Database (CSD) (Allen, 2002) can be used to map favorable sites for ligand interactions with proteins. Relibase (Bergner et al., 2001), a database system of protein-ligand interactions fiom the PDB, has been used to derive atomic potentials between protein and ligand atom groups for use in DrugScore (Gohlke et al., 2000a). DrugScore can then calculate “hotspots” for interactions with different ligand atom types, which are displayed as contour maps within the binding site (Gohlke et al., 2000b). Similarly, the SuperStar software (V erdonk et al., 2001), based on pair-wise interaction frequencies in the CSD 43 database, can calculate hotspots for the binding of 16 probe atom types to proteins. A recent paper analyzes how the interaction maps developed from PDB versus CSD data complement each other (Boer et al., 2001). Another knowledge-based approach was taken by Moreno and Leon (Moreno and Leon, 2002) to describe the binding site for DOCK: templates of attached points or contact points are constructed for each amino acid type, representing the geometry of the interactions observed in the different protein- ligand complexes from the PDB. In this chapter, it is shown how a knowledge-based approach for describing favorable interaction sites on proteins and ligands can improve the performance of SLIDE when a database of known ligands combined with a random selection of CSD compounds is screened against two protein targets, thrombin and glutathione S- transferase (GST). 3.3 Methods 3.3.1 Knowledge-Based Representation of Protein Binding Sites Because grid placement of hydrophobic and hydrogen-bond points is not always optimal with respect to protein interactions, here we describe the development of a knowledge- based approach to placing points in an unbiased template. Geometrically favored subsites for ligand hydrogen-bonding atoms are assigned based on the distance and angle to protein hydrogen-bonding partners (Figure 3.1.B). After identifying the protein atoms capable of hydrogen bonding, a number of template points are placed at and around the optimal hydrogen bonding position for each of these atoms, using the geometries shown 44 in Figure 3.2. The template points belonging to one hydrogen-bonding protein atom are separated by ~l A and are placed at a distance of 2.9 A (for Asp, Glu, Lys, Thr and Tyr side chains) or at 3.0 A (for all the other side chains and backbone oxygen and nitrogen) from the protein donor or acceptor atom. The parameters for optimal hydrogen bonding geometry were taken fiom the literature (Ippolito et al., 1990; McDonald and Thornton, 1994). The points are labeled as donors, acceptors or donor/acceptors, depending on the role an atom at this position would have in hydrogen bonding to the protein. A donor template point, for example, is located near an acceptor protein atom, such as a backbone carbonyl oxygen, and represents a favorable placement for a ligand atom acting as an H— bond donor. A donor/ acceptor point is defined in two cases: when a ligand atom at that point could make favorable hydrogen bonds with separate hydro gen-bond donor and acceptor atoms in the protein, or when it could interact with a group that both donates and accepts hydrogen bonds (e.g., -OH in the side chains of Ser, Thr, or Tyr). Template points that overlap with those belonging to neighboring atoms (template points separated by < 1 A) are clustered and relabeled, and points closer than 2.5 A to a protein atom are discarded. The clustering of hydrogen-bonding template points reduces the number of points by about 10—25%. Points generated by the clustering of a donor and an acceptor point are relabeled as donor/acceptors. Hydrophobic template points are generated using a grid for initial point placement, as before, but the criteria have been updated for which of these points should be included to represent favorable sites for ligand interactions. Hydrophobic points are those grid points with a hydrophobic enhancement score of at least 3. This score is defined as the number of carbon atoms minus the number of hydrophilic atoms, such as oxygen or 45 Figure 3. 2. Panel A: Placement of optimal hydrogen-bonding template points in SLIDE. For each polar side chain, the optimal placement of hydrogen-bond donor (D), acceptor (A) and donor and/or acceptor (N ) template points is shown with respect to the donor and acceptor atom positions in the side chain. These template points represent positions where a ligand atom matching the template point can form a hydrogen bond with the protein. A ligand atom matching a donor/acceptor (N) template point must be either a donor or acceptor, or both. These optimal distances and angles are consensus values describing preferred geometries (Ippolito et al., 1990; McDonald and Thornton, 1994) observed in high resolution protein structures from the PDB. The positions of hydrogen atoms in the protein are not assumed in template point placement, since these positions are not available in most crystal structures. Instead, the most favorable positions for hydrogen-bonding partners is measured relative to the geometry of the covalent bonds in the side chains (e.g., trans and gauche positions for Lys), as found from analysis of crystallographic data (Ippolito et al., 1990; McDonald and Thornton, 1994). Panel B: Three-dimensional example of template point placement relative to a Lys side chain. The template points defined for minimal, sparse, and dense templates are shown, along with the most-preferred distance and angle for hydrogen bonding, as shown above. The default template specification in SLIDE is dense, and thus there are more possible H- bond template point matches, each of which is shifted by a small amount relative to the optimal position and still allows formation of a near-optimal hydrogen bond between the matched ligand atom and the protein. Sparse and minimal hydrogen-bond templates are alternatives that can be used to decrease the number of hydro gen-bond template points when the complete template for a protein, including hydrophobic points, exceeds the practical limit of about 150 points. 46 p, Asn and D " Gln gauche gauche .‘D gauche —? "" gauche Main chain nitrogen Main chain oxygen / ,0 O'=C 120° \ ”'/' Ni+1 """""" A / \J‘\ C / N 1+1 140° ‘g 0: 1+1 D A A Q gauche . g . 8, . gauche "a. .. ,1" o- . ,. " OI, ' o trans . / / 2 Minimal Sparse Dense Figure 3.2 47 nitrogen, within a spherical shell of radius 2.5-5.0 A from the template point in question. The cutoff value of 3 was found to define the significantly hydrophobic protein surface patches that complement the hydrophobic groups in ligands for a number of 3D protein- ligand complexes. After they are generated separately, the H-bonding and hydrophobic template points are merged into one template that can be used for docking with SLIDE. If the total number of template points is much larger than 150 (a practical upper limit given the combinatorics of matching ligand interaction points with template points), then the complete linkage clustering feature can be used to reduce neighboring points of the same class to a single point, the cluster centroid. Complete linkage clustering has the desirable features that the clusters can be defined to not exceed a certain diameter (helping control the separation between centroids), and they are guaranteed to be the most densely occupied set of clusters for that diameter (Sanschagrin and Kuhn, 1998). Typically we use a clustering threshold of 4 A, resulting in hydrophobic template points separated by about 2 A. When a clustering threshold of x A is used with complete linkage clustering (where x is typically chosen between 2 and 4 A), the average distance between the final template points (the centroids of each cluster) is very close to x/2. For any uniformly distributed set of points clustered by complete linkage, the centroids of the clusters will be separated by half the cluster diameter (called the clustering threshold in this work), on average. 48 3.3.2 Ligand Interaction Points Hydrophobic ligand interaction points are assigned using a rule-based approach summarized in Figure 3.3. These rules are designed to place an interaction point at approximately every 1.5 hydrophobic carbon atoms in hydrophobic chains and around the circumference of hydrophobic rings. This density of hydrophobic interaction points is Methyl Group lsopropyl Group Tetrahedral Group Hydrophobic Ring with Hydrophillc Hydrophobic Ring with Single Hydrophobic Ring with Multiple Substituent Hydrophobic Substituent Hydrophobic Substituents Hydrophobic Rings with Shared lntemal Hydrophobic Atom Triplet Edges Figure 3. 3. Summary of rules for hydrophobic interaction point assignment. The goal is to place a point at approximately every 1.5 carbon atoms, which is commensurate with the default spacing of hydrophobic points in the template. Hydrophobic interaction points are denoted by green spheres, carbon atoms by gray tubes, and nitrogen atoms, representing hydrophilic atoms, by blue tubes. 49 commensurate with the spacing of hydrophobic points in the protein template, using the default clustering criteria. For this approach, carbon and sulfur atoms bonded only to carbon, sulfirr or hydrogen atoms are considered to be hydrophobic. Other atoms are taken as hydrophilic. Hydrogen-bonding interaction points in the ligand are identified as atoms capable of accepting or donating hydrogen bonds, based on the SYBYL atom types in the molZ file (described at http://www.tripos.com). 3.3.3 Ligand Databases A combined database of known ligands from the PDB and a subset of 14,691 randomly selected CSD compounds was assembled for alpha-thrombin and n-class human GST. The CSD database was prescreened to exclude molecules with excessive molecular weight as well as those containing unusual atoms. The nonredundant subset of known ligands for thrombin contained 42 molecules taken from thrombin-ligand complexes available fi'om the PDB. To screen for ligands to GST, 15 known ligands with PDB crystal structures in complex with human GST were selected. For both thrombin and GST, ligands fi'om crystal structures with a resolution of 3.0 A or better were included in the known ligand test set. If a ligand was found in multiple structures, the one with the highest resolution was chosen. To ensure that SLIDE can appropriately model the side- chain conformational changes necessary in nature when proteins bind their ligands, structures of thrombin and 1t-GST determined crystallographically with ligand-free active sites (apo structures) were used as the targets for screening and docking (PDB code lvrl for thrombin (Dekker et al., 1999) and PDB code l6gs for GST (Oakley et al., 1998)). This also avoided the docking bias that is implicit in redocking experiments (when the 50 ligand-bound structure of the protein, already conformationally biased for that ligand, is used as the basis for docking). Because interactions in a mutant protein structure might change the favored orientation of a ligand relative to its orientation in the wild-type protein (and therefore not allow fair comparison of the docking with the crystallographic complex), ligands from complexes containing a mutant version of n—GST were excluded fiom the analysis. Four of the GST crystal complexes (PDB codes l3gs, 20gs, 21gs and 2gss (Oaldey et al., 1997; Oakley et al., 1999)) contained two ligands: glutathione, and a smaller hydrophobic ligand bound to the xenobiotic subsite of the active site. Only the hydrophobic ligands from these structures were included in the screening dataset, and glutathione from the GST—glutathione complex laqw (Prade et al., 1997) was used as the single representation of this ligand in the screening set. 3.3.4 Key Template Points In order to focus the large number of orientations that can result from the screening/docking process on productive binding modes, selected template points can be labeled as key points. Template points from parts of the binding site known to be critical for tight and/or specific binding can be marked as key points, and any docking must then include a match to one (not all) of these points. This ensures that docked molecules will at least partially occupy the targeted site. For thrombin, points in the specificity pocket within 5.0 A of the carboxyl oxygens of Asp 189 were selected as key points. Assignment of key points in the GST binding site was more challenging, as it is made up of two subsites, one for hydrophobic ligands and the other for glutathione, which is fairly polar. SLIDE was run twice on the known ligands in the case of GST: initially with key 51 hydrogen bonding points in a 5.0 A radius sphere around the side chain hydroxyl oxygen of Ser 65 in the deepest pocket of the glutathione binding site, to capture ligands that can bind to this polar site, then with key hydrophobic points in the area between Tyr 108 and Phe 8, the xenobiotic (hydrophobic) binding site. Screening against the CSD ligands was done using the first set of key points in the glutathione-binding pocket, which includes both hydrophobic and hydrogen-bonding interactions. Using key points is mainly a convenient way to ensure that ligands make interactions in the deep pockets of the binding site, rather than making less favorable, superficial interactions. Placing key points in the deepest pocket of the thrombin active site would be usefirl, in the absence of any knowledge of thrombin ligand structure or chemistry, to ensure the absence of a significant, destabilizing cavity in the complex. Ensuring that deep pockets are filled is also a widely used approach in structure-based drug design to increase ligand binding affinity and specificity. For GST, the use of key points allows a convenient analysis of ligand binding to the hydrophobic binding site versus binding to the glutathione site, without specifying which ligands favor which site, or how they bind. We can therefore assess the accuracy of ligand specificity as well as docking for GST: hydrophobic ligands should fit and score well in the hydrophobic site, and score poorly if they also dock into the polar site (when key points are included there, instead), and vice versa for the polar ligands. This allows a more sophisticated analysis for GST, making use of both its binding sites. Key points can also hurt docking results, because not all ligands may make one of the chosen interactions and therefore would either not be docked at all, or would be forced to dock by making a non-native interaction. Thus, using key points is only recommended for predicting the docking of ligands if there 52 is a strong indication as to the location of a key binding pocket within the larger binding site (as is obvious in the case of thrombin, which has a funnel-shaped active site). Another appropriate occasion for including key points is in design applications, when the intent is to control which pocket or binding site is to be probed by a database of ligand candidates or fragments. 3.3.5 Evaluation of New Protein and Ligand Representations in Ligand Screening end Docking Templates for thrombin and GST were created both with the grid-based and the knowledge-based template generation methods; the knowledge-based templates are shown in Figures 3.4.A and B. Sets of interaction points for the known ligands and the CSD compounds were also identified using both assignment methods. SLIDE was used to screen the known ligands and the CSD compounds against thrombin and GST, first using the grid-based template and the original ligand interaction points, and in a second experiment using the knowledge-based template and the new ligand interaction points. The two methods for representing the protein target and ligand candidates were evaluated in two ways. First, they were evaluated based on how well SLIDE, using these protein and ligand representations, could reproduce the known ligand positions in the structure of the protein-ligand complex. This involved docking the ligands into an apo structure of the protein, with side-chain positions not already optimized for the ligands. Secondly, they were evaluated by how well known ligands and nonspecific molecules (in our case, CSD compounds) could be differentiated. The heavy atom root-mean—square—deviation (RMSD) was used to compare the docked ligand orientation with its crystal structure 53 Figure 3.4. Examples of new knowledge-based templates. The Connolly solvent— accessible molecular surfaces (Connolly, 1993) of the GST (A) and thrombin (B) active sites are shown, color—coded according to atom type (green — carbon, blue — nitrogen, red — oxygen, yellow — sulfur). Known ligands from PDB structures 2pgt (A) and la5g (B) were docked into the binding site with SLIDE and are shown as tubes, also colored according to atom type. The template points are represented as spheres, with blue representing hydrogen—bond donor points, red for acceptors, whitefor donor/acceptors, and green for hydrophobic interaction points. 54 position. Because scoring remains a major challenge in the field (Bissantz et al., 2000; Charifson et al., 1999; Stahl and Rarey, 2001), and to ensure that the results were not very dependent on the particulars of one scoring function, the dockings were also evaluated using DrugScore as well as the SLIDE score. While SLIDE scores the protein- ligand complex based on the number of hydrogen bonds and the hydrophobic complementarity (Schnecke and Kuhn, 2000), DrugScore (Gohlke et al., 2000a) calculates protein-ligand interaction energies employing a knowledge-based potential that reflects the frequency of pair-wise atomic distances observed in protein-ligand complexes from the PDB. The known ligands and CSD compounds were each docked, scored, and sorted by score. Then, the enrichment in selecting known ligands from the random database, based on scores, was calculated as the percentage of known ligands captured as a firnction of the percentage of the database screened, where the top 1% of the database represented the top scoring compounds. 3.4 Results All four combinations of template and ligand interaction point design were evaluated: grid-based template with original interaction points, grid-based template with new interaction points, knowledge-based template with original interaction points, and knowledge-based template with new interaction points. Both the knowledge-based template design and the new interaction point assignments resulted in improvements individually, but the most improvement was seen upon combining the two. For brevity, only the results obtained with the two most relevant combinations are presented: grid- based protein template with original ligand interaction point assignments (subsequently 55 referred to as method 1, corresponding to the implementation in SLIDE v.1), and knowledge-based template with new interaction points (method 2, as now implemented in SLIDE v.2). 3.4.1 Thrombin The 42 known thrombin ligands used in this study are listed in Table 3.1, along with the PDB code of the crystallographic complexes from which they were obtained. SLIDE docked 36 ligands into the binding site of thrombin using both methods. The ligands with no scores listed could not be docked, due to unresolved steric overlaps with the apo- active site thrombin structure (lvrl) except for the case of benzamidine (PDB code ldwb), which was not docked, due to the unusual proximity of its three interaction points (the two amide N’s, and any pair of its three benzene-ring hydrophobic points, were all < 2.5 A apart). This caused benzamidine dockings to not meet a default parameter setting in SLIDE which ensures that the minimum edge of any triangle being matched is > 2.5 A. This is intended to ensure that ligand dockings are complementary to more than a very local region of the binding site. (If the binding site is small, or the goal is to find small molecules that match very locally, this parameter can be changed easily.) Among the docked ligands, 27 had a heavy atom RMSD smaller than 2.0 A compared to the crystal structure orientation using method 1, while 33 such dockings were obtained with method 2. As shown in Figure 3.5.A, the dockings were generally closer to the crystal structure position using method 2, as reflected by their lower RMSD values. The mean RMSD for thrombin ligand dockings was 1.83 A using method 1, and 1.28 A using method 2. An 56 example of the typical improvement in the quality of docking for thrombin ligands is shown in Figure 3.5. Enrichment plots of the percentage of known ligands docked as a function of the percentage of the database screened (CSD plus thrombin ligands) are shown for SLIDE scores (Figure 3.7) and DrugScores (Figure 3.8). Higher enrichment is gained with method 2 compared to method 1, independently of the scoring function used (indicated by a shift to the left of the new curve compared to the original one in panel A in Figures 3.7 and 3.8). This means that more known ligands are returned by SLIDE among the top scoring CSD compounds. Based on the SLIDE score, for example, the percentage of the known ligands that ranked among the top scoring 100 molecules increased from 38% (16 out of 42) to 64% (27 out of 42). The results are very similar when using DrugScores: 67% of the known molecules (28 of the 42) ranked among the top scoring 100 molecules with method 2, compared to 33% (14 of the 42) using method 1. The score distributions also show that the knowledge—based protein and ligand representations provide a better separation between known ligands and randomly chosen CSD compounds for both the SLIDE scores (Figure 3.7.8 and C) and DrugScores (Figure 3.8.B and C). The difference between the mean SLIDE scores of the known ligands and CSD compounds increased from 20.7 score units to 27.1 score units when method 1 was replaced by method 2. DrugScore also mirrors a better discrimination between known ligands and CSD compounds when the knowledge—based method is used. 57 Figure 3. 5. Comparing the docked orientations to the crystal structure position of a B— strand mimetic inhibitor (PDB code 1a46) in the binding site of thrombin. The crystal structure position of the ligand is shown in white, and the docked orientation using the knowledge-based method is in magenta (RMSD 1.03 A), while the docking obtained with the grid-based method is shown in blue (RMSD 2.48 A). This is representative of the improvement in docking quality observed for the thrombin and GST ligands in general. The view into the thrombin active site is slightly shified relative to that in the previous panel. 58 9.0 00.0 0.00I 0.00I 0mm 0.0 0.40 0.00I Eflmi 82 00.0 v: 0.00I 0.31 0.00 0.5 0.40 ST 00.020-80-050 002 5.0 00.0 4.0T 0.07 0.00 0.6 0.00 0.5I 2- .05th 008 04.0 K0 0.0T F.00I 0.00 0.00 0.00 0.0? 05-02-80 002 E0 5; 3.? 0.0T 0.00 30 00> 0.001 051-02-90-05: 9'31 I 00.0 I 4.91 I 0.04 0.00 0.31 00000Eo Emai— 00. I 00.. 0.40I 0.0T 0.00 0.4.0 0.00 0.31 $0.00 Ea ES _m>o._on I I I I I I 0.00 0.me 65000060001050 Va: .2900 I I I I I I 0.3. ET -oiaaodooofoi 0:2 80 04.0 0.0I 0..0I 0.00 0.00 0.00 0.00I 202: -_ :2 05202-0 ES 00; 00,— 0.00I 0.00I 0.04 4.00 0.00 0.00I 000-0 62-90-08 22 [00.0 00.0 4.91 0.31 0.00 0.00. 0.00 0.0.1 0:0-ma<-oi-0§-o-oow 003 I I I I I I 0.0 0.0? 0000502 008 00. F 00..— 0.va P.00I 0.40 2.0 0.00 001 53-02-65. 52 3.0 00.0 0.00I 0.0? 0.: 0.00 V00 00? 80-02-20 003 00.0 00; 0.0mI 0.0.? V. .0 0.00 0. K 0.0T 5-0600205“ 5.3 00.0 04.0 0.00I 0.ka 0. 5 0.04 0.00 0.00I V2325 0:955 00000-0me 02: I I I I I I 0.00 0.0? 000650 600. 00.0 00; 0.00I 0.07 0.00 0.00 0.04 0.00I 30.960 003 00.0 00.0 0.00I «NT 0.00 0.00 4.00 0.31 $00058; 22 602 Donna comma woman woman woman comes coined 02.2265. .25 0900.305. .25 ,bmufliocx -25 9.303....» .2030 2 70.0. 268 70: 6&0: 0:00: 2.8 man 002m .80 280055 son 26% 00:0 .80 mojm 280005 Eon com .0880 mEI—m 3 836906 «c .590 030.0888 0 oz» 2 4A: .3 3330 80 0280 0.80me 353.8 2F .0052: Iago—>55— ofi HEB cemgafioo E 08508 sonneoeow 050m 55835 05m: use 30388 1698 05 mime Mafia ,3 :58 .s 20 2V8 20 as 0380 saw: asses 5.65 a V0020 as .28. 2800.5 use... 0030 .3 caused .5 Abo>woon02 .070 can wIm 0:83—03 QmEM 00252 8 280 «8&2 05 .23 wee—cow 05 8 02080280 .203: was .2000 Dues 0 8008 03? 83030 Home: 0 .8080 @093 2.: Souk 59 00.0 00; 0. F0I 0.40I 0.00 E4 0.00 0.00I 0000000 0:20 04.0 40.0 F.00I 0W? 4.00 4.00 004 4.00I E0000 0:0: 00.0 00.. 0.0T 0..0MI 0.00 0.04 4.00 0. V0I 25:: < 00: 0550. 020 050 .0000; .0 40.0 00.0 0.00I :07 0.00 0.00 0.00 0.07 -_>650e8_>£0&_0-z.z 0&2 050 _ _ 085203 40.0 .0; 0.44I 0.00I 0.04 004 0.00 0.04I -oc_50-oi-0i-_>£02 0? 40.0 9.0 0.04I F.00I 4.00 0.04 0.04 0.40I 02-90-20 E: 00; 0.0 4.00I 0.00I 0.00 0.40 0.00 4.00I 4. 02508052600 05: 0 V. V :4 4. V0I 0.00I 0.00 0. 4 0.04 0.mI0I 60.02.90.000. 00: E0 05 0.00I 0.00I 4.44 0.04 0.04 0.00I 4.94%. 000: 00.0 04; 0.04I 0.40I :0 0.04 0.00 0.00I 02-20-094-03 ES IOImEcchOOEOcoLon 00. V I 0.00I I 0.00 I 0.04 0.04I 05-054064. 0;: 10-020 6-05250 K0 00. V 0.00I 0.04I V00 0. 0 0.40 0.00I ..23-26V60-oi-0cn_-0-o< 9.: R0 00. V 0.04I 0.00I 0.04 0.00 0. .4 V. V0I 10-032805054004. E: 00.0 0 V. V 0.00I 0.04I 0.04 0.00 0.00 0.00I 10-0Hm£9 0950: 5.65. .o .0090... A B C SLIDE score 63 Figure 3. 7 Figure 3.8. Significant improvement in enrichment for thrombin ligands, as reflected by the scoring function DrugScore (A), where a lefiwards shift of the curve corresponding to the knowledge-based method indicates improved enrichment. The distributions of DrugScore scores (divided by 104) obtained using the grid-based method (B) and the knowledge-based method (C) show a much better separation between the scores of known thrombin ligands and CSD compounds. This is reflected by a lO-unit increase in separation between the mean DrugScore for ligands and the mean DrugScore for random CSD compounds. Curves that do not reach 100% for the “Percent of known ligands retrieved” indicate that some ligands were not docked. 64 knowledge-based method 100 I 0 Top scoring percent of all docked molecules original method ~~ no enrichment 5 .0. A wwwmo 099.02 00002 $5ch .0 .0090... (Mean: -30.06 SD: 6.97) // known ligands (Mean: -41.91 SD: 9.83) ‘7////. 4\\\\\V CSD compounds V/u/é/g/é e .......... ... ........ ........... -4O -50 -60 -7O -30 DrugScore x 10'4 ‘0 CSD compounds (Mean: -27.25 SD: 6.71) C "////. // known ligands (Mean: -48.31 SD: 10.52) Figure 3.8 65 3.4.2 Glutathione S—Transferase SLIDE was able to find a collision—free orientation for 14 of the 15 known ligands in the active site of GST with method 2, while 13 were docked using method 1 (Table 3.2). The ligand from the crystal complex l9gs could not be docked for the same reason described for benzamidine in the previous section, whereas the reason for failure of chlorambucil (21 gs) to dock was the existence of unresolved steric clashes with the protein. Method 2 resulted in better dockings (lower RMSD values), as illustrated in Figure 3.6.B by the majority of points falling under the diagonal. Only one of the 14 docked ligands had a lower RMSD when method 1 was used, two were docked about equally well, while 10 were docked closer to their crystal structure position with method 2. The number of known ligands docked with an RMSD less than 2.0 A doubled fiom five to ten, and the mean RMSD between crystal structure and docked positions decreased from 2.15 A to 1.00 A upon introducing the knowledge—based method. The four hydrophobic ligands, shown by the crystal complexes to bind to the hydrophobic subsite of GST (l3gs, 20gs, 21gs, 2gss), were docked incorrectly (RMSD > 5.0 A) when polar template points were used as key points. This is not surprising given that these ligands must make interactions in a region different from where the key points were assigned. However, their docking improved substantially when hydrophobic key points were used in the second run with either method of template generation and interaction point assignment. Hydrophobic template points can be used as key points for docking smaller sets of ligands to a protein, but this is not a practical alternative when screening large databases. Since matching three template points is sufficient for docking with SLIDE, using hydrophobic key points when screening a large database can result in docking a very large number of small, 66 relatively nonspecific, hydrophobic molecules. They could later be eliminated based on their scores, of course, but this would still result in a considerable increase of the running time and output volume. Only the results of the first run (with hydrogen bonding key points) were used to construct the enrichment plots for GST (Figure 3.9.A). For brevity, only the enrichment plot for DrugScores is shown; the results were substantially similar using SLIDE scores. DrugScores indicate that more of the known ligands were retrieved among the top scoring molecules (Figure 3.9.A), meaning improved enrichment was achieved with method 2 compared to method 1 for GST. When the SLIDE scoring function was used, 73% of the known ligands (11 out of 15) were ranked among the top scoring 100 of all docked molecules when using method 2, compared to 60% (9 out of 15) among the top 100 with method 1. Using DrugScore, the percentage of the known ligands ranking among the top scoring 100 of all the docked molecules increased from 33% (5 out of 15) to 60% (9 out of 15). The distribution of scores obtained for the docked known ligands and CSD compounds to GST are shown in Figures 3.9.B and C. The difference between the mean scores of the GST ligands and randomly selected GST molecules increased due to the introduction of the knowledge—based method, independently of the scoring function applied: the means were separated by an additional 7.5 score units using SLIDE scores, and by an additional 9.4 X 104 units using DrugScore. Although the standard deviations of the DrugScores and SLIDE scores also increased, the increased separation of the means was roughly three times greater than the increase in standard deviations. 67 00.0 00.0 0.00- 0.00- .40 0.00 .00 0. .0- 0000-02-3 .0 0.05.08 082.058 .000 00.0 00.. 0.04- 0. .0- 4. .0 0.00 0.00 0.00- 2000.08 0025058000 20000.00 000m 00950000202358.6888? 40.0 04.4 4.00- 0.00- 0.00 0.00 0.00 0.40- 0.-._>co_£0s_0r0.-0-.00.00. .000 .000. .04... . .0.-. .040. ...00. .000. 00.0 .00 0.0.- 0.4.- 0.00 .0. 0.00 0.0.- .80 20400050 . 0000 ..0.4. .004. 8.00-. .00.-. .009 .000. ...0 I 0.0.- I .0. I 4.00 0.00- 8320520 .. 00.0 $00. .000. 00.00-. .040. .0000 .000. 04.0 .00 0.00- .00- 0.44 0.0. 0. .0 0.00- 020 090008 . 0000 00.0 04.0 0.00- 0.0? 4.04 .00 0.04 .0? 08.50.203.000 .000 00.0 44.. 0.04- 0.00- 4.04 0.00 4.00 0.04- 050.000.400.009:05-0.0.0.-0 x00. 04.0 00.0 0.00- 0.00- 0.00 0.40 0.0 0.00- 00250.20 300. 44.0 .00 0.04- 0.0.- 0.04 4.04 0.04 0.04- 082.05%..085900 >00. :2 .80 o_co._=0_0-..0._0 I I I I I I 0.0. 0.0.- £2058 02200004400.12000 000. 40.0 00.. 0.00- 0.00- 0.04 0.00 0.44 0.4- 00000009..__0-4.0-._..co_£0s_0w-0.-. 00.01. .00... .000. .0.00-. .000. $000 .400. 04.0 00.0 0. .0- 4.0.- 0.00 0.00 0.40 0.00- 0500.00.30 . 000. 00.0 00.0 0.44- 4.40- 0.00 0.04 0.04 0.04- 052009.280 00Mm 00.0 00.0 .04- 0.00- 0.44 0.04 0.04 0.00I 0500.03.20 050.030.0000 00.01. .8000 0300 0800 0800 .803 .4800 5.002— .050.2305 -25 6940.39... .25 003.305. .25 0.5.0:...» .850 0000 280 4.0... 2:0: .800... 00.. .<. 00..: .80 10... 280020 .80 28» 00:0 .80 00:0 280020 .9.o>.80..00. 670 .000 0-0 0:828. D045. .0026. 8 0.800 .023... 05 5.3 0.5.8.. on. 9 0.0800208 :02... .80 .0800 8:0... 0 0.80:. 020.. 0.2030 80.0. 0 .0088 52. 80. .0088 mad-Hm o. 0.038%... 00 .008 030.0088 0 03w 8 4c. .3 32>... 20 00.800 080mwan. .0538 on... .8508 00009-032308. 0... 5.3 0800.00.88 .0 002008 8000000 :80 80000.0. 0000.. 0.00.5 000 0.0.000. 00..-00-0.00 2.. 5.3 000: as 0050 .000. 008 mm... .000 .8 3.0 0380 05 9:. 09.8.. 005w: .50 8505. .8 0.0.0210. .80 00.800 080me9 .0088 40.940 .8 80.00980 N“. 030.0 68 Figure 3.9. Enrichment for glutathione S—transferase ligands, as reflected by the scoring function DrugScore (A), where the significant lefiwards shift of the curve corresponding to the knowledge-based method indicates greater enrichment. The distributions of the scores (divided by 10‘) obtained using the original grid-based method (B) and the knowledge-based method (C) again show a better separation between the scores of known GST ligands and CSD compounds, indicated by the large increase of 10 units between the means of these two classes of compounds. Given the smaller sample size (15) of GST ligands, this score distribution is less well defined than those for thrombin (Figures 3.7 and 3.8). However, the same trends in improvement are found for both proteins and both scoring functions. Curves that do not reach 100% for the “Percent of known ligands retrieved” indicate that some ligands were not docked. This percentage decreased with use of the knowledge—based template. 69 100 60 0 //////////A ..//w////////////////////////////////////.m.w ,/////////////////// ////////0-.mv ///////////////// 00000000000 / 0 Top scoring percent of docked molecules If. ’ ooooooooooooooooooooooo \\\\\\\\\\\\\\\\\\\\\\\\\\\s \\\\\\\\ m \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ § .00. \\\\\\. \ w 0 .......... 00000000000 .......... ooooooooooo 0000000000 m A: oooooooooo .¢.o 0.0.-n. ...... {.0005 m ....................u.vummmammmfih.” 3.3mm, W///////////0.0.-0.0.0.0.”...x00.x...3.3.00.0...“.H 0 S . U r D / 10 ooooooooooooooooooooooooooooo ooooooooooooooooooooooooooooo ooooooooooooooooooooooooooooo ooooooooooooooooooooooooooooo ooooooooooooooooooooooooooooo ooooooooooooooooooooooooooooo original method --~- no enrichment CSD compounds (mean: -18.83 SD. 6. 77) — knoMedge-based method V//////. known ligands (mean: -26.91 SD: 8.89) CSD compounds (mean 1711 SD 6 34) \-10 W known ligands (mean: -38.09 SD: 12. 33) O 1 w w m o 090:8. 00:09. 5.65. .o Ememn. A 100 80 B C 3.5 Discussion 3.5.1 The Influence of Accurate Binding Site Representation on Docking and Scoring Because the computation time increases nearly exponentially with the size of the template, a compromise must be reached such that the most important features of the binding site are captured with the smallest possible number of template points. Using a knowledge—based approach for identifying the most favorable hydrogen—bonding subsites in the binding site of the protein proved to be superior over grid-based sampling followed by the selective retention of points where ligand atoms could act as hydrogen— bond donors or acceptors. More known ligands could be docked closer to their known crystal structure positions for both thrombin and GST using the knowledge—based method of template and ligand interaction point generation. Docking experiments usually return multiple docked orientations per ligand. Ideally, the scoring function will indicate the one closest to the crystal structure by giving it the highest score. Also, when a large database is screened, the scoring function should be able to discriminate between promising ligand candidates and artificial hits. Using the assumption that most CSD compounds are unlikely to be ligands of thrombin and of GST, the ability of SLIDE scores and DrugScores to discriminate between known ligands and CSD compounds was tested. The enrichment plots calculated with both scoring methods showed improvement upon replacing the grid—based template with the knowledge—based one, and the separation of scores between ligands and CSD compounds also increased. The reason for this is the ability of SLIDE to dock ligands better with the 71 knowledge—based method, with better dockings receiving higher scores, whereas the CSD compounds received roughly the same scores using both methods. Precise computational prediction of the binding affinities of a series of ligands for an arbitrary protein target cannot be routinely achieved by any method at this time. Particular challenges remain in the handling of interfacial solvation and protein and ligand flexibility, so scoring functions perform best when the details of the protein—ligand complex are well—resolved. Thus, docking presents a particularly hard case for scoring, and consensus scoring by combining several scoring functions has been suggested to enhance hit rates (Bissantz et al., 2000; Charifson et al., 1999; Stahl and Rarey, 2001). To compensate for the shortcomings of using a single scoring function, a second, independent scoring function, DrugScore, was also used to score the ligands docked by 93 2 ° 8 3 0 LU Lu 9 c: ..J :l U) , (I) A 20‘ 030000 coefficient R= 0.66 20" Correlation coefficient R= 0.80 10‘ . (SD: 8.59.N= 36,.P<00001) 10* (SD= 7.87N: 36,P939 momcmco mace .9350 @039 $9.20 mace .2850 cm _. mm _. om mv 02. m2. om 9V 0 hi p b i? u — o P ”/2 E K A o I O ‘— tunoo $5826 .50 m: 85828 two 2 " /-o ,/ z: /' /{/ 4" / O N tunoo excmm uc=mEm no one “3 2.52801 excmm uo=mEm .0 one Co 22501 $3 ”.o__mEm .0 03 do mcozfiom m— Qcmw co=mEm .5 cm? do mcozflom 4 row O O N 92 .8585 258-3%: 98 020-23»: mo 8%. mfiufin 05 89: mane 02m mficqommoboo 5232. moon—Bog: 2mg 3.63:3 2: 5V 8 @2888 .8585 $99 03.3%: wficqoqmotoo 2: 85 mugw: 859.6: M: waxes com: :3 ”1550 .3 38.5830 20:98 59.9020 6.» 239k $93095 >930 80:20 0.98 .9350 $039 $9.20 06cm .9850 s.-. 8:. . .. _ s. , s... . . so u: s. o... e. o . . . . . we a a o . ”w / e // fl - // 0 / 8 o 9.9 Essa mm m 2.9 599:. M m ccece-cccm_:cc:-ccu9_ 2 . fl 1 cc:ce-cccm_:cue-cccm__2 M 8 l M 02 fl / / ” $3 :0..me Lo om... 28:93. com .me :2.me .0 one 80:90”. ov $3 uo=mEm .o 03 mcozmuom m— .xao 9.2.95 ..o 03 20:93. < . om 93 4.4 Discussion When measuring side-chain dihedral angle differences of ligand-bound and ligand-free proteins, only protein side-chains in direct contact with the ligand in the ligand-bound were taken into account. This was done to ensure that only ligand-induced changes were considered. There is a very good qualitative agreement between the pattern of side-chain rotations that occur upon ligand binding provided by SLIDE and the picture that emerges from comparing 1i gand-free and ligand-bound protein structures. On average, about 85- 90% of side chain rotations are smaller than 45°. Studies of ligand-induced changes in side-chain conformations in protein binding sites usually count only differences larger than 45°, or even 60 or 75° (Betts and Stemberg, 1999; Najmanovich et al., 2000), that would correspond to changes in rotameric states of the side chains. Heringa and Argos on the other hand, observed that ligand binding induces non-rotarnericity in the preferred side-chain conformations (Heringa and Argos, 1999). The model of protein side-chain flexibility implemented in SLIDE provided results that are in good agreement with these latter observations. Not only were most of the rotations too small to allow changes in rotameric states, but they were necessary to correctly dock the ligand in about 60% of the thrombin ligands. Even when docking could be achieved with a rigid protein structure, the resulting docked orientation was farther from the correct position in many cases, compared to the orientation resulting from flexible docking. Comparing the A and B panels of the plots showing the distributions of side chain rotations (Figures 4.3, 4.4, and 4.5) it is noticeable that SLIDE is somewhat more parsimonious than nature by producing a smaller number of rotations in the protein upon binding the ligand. One reason for this is that the larger the number of side chains 94 allowed to be rotated by SLIDE, the larger the possibility of creating new intramolecular overlaps in the protein. This would ultimately lead to an increase in computational time, limiting the usefiilness of the program in screening large databases. Another reason for the above mentioned quantitative discrepancy could be that SLIDE does move side chains away to resolve collisions but does not move them toward the ligand to make new interactions. This is a future improvement to be implemented in SLIDE. 4.5 Conclusions The assumption that both protein side chains and ligands move as little as necessary in order to achieve a collision-free complex proved to be both reasonable and sufficient to dock most of the known ligands into the binding sites of their target proteins for the systems tested in this study. The results of the ligand-free and ligand-bound crystal structure comparisons underscore that side chain conformational changes are typically not rotameric, but instead involve modest (<15°) changes in side-chain angles. 95 Chapter 5 Using SLIDE to Find New Ligands for Thrombin 5.1 Introduction Increase in efficiency and reliability of computational tools has enabled virtual screening to become a valuable method in the pharmaceutical drug discovery process, complementing high-throughput screening (Good, 2001; Schneider and Bohm, 2002; Shoichet et al., 2002; Waszkowycz, 2002). Novel inhibitors have been identified for thrombin (Fox and Haaksma, 2000; Massova et al., 1998), protein tyrosine phosphatase- lB (Doman et al., 2002), various nuclear hormone receptors (Schapira et al., 2000, 2001), human carbonic anhydrase (Gruneberg et al., 2002), and thyrnidylate synthase (Shoichet et al., 1993) by in silico screening of compound databases. As described in the previous chapters, SLIDE is a computational tool which can efficiently screen databases of hundreds of thousands of molecules to identify feasible ligand candidates for a target protein with known three dimensional structure (Schnecke 96 and Kuhn, 1999; Schnecke and Kuhn, 2000). The realistic modeling of protein side- chain and ligand flexibility, combined with the improved representation of the binding site by knowledge-based template design has allowed a better discrimination between true ligands and non-specific compounds (Zavodszky et al., 2003). Since experimental testing is a useful complement to modeling, a screening experiment was designed to test the predictive power of SLIDE. After screening the Available Chemicals Directory (ACD; MDL Information Systems, Inc.) to identify new ligands for thrombin, binding affinities were measured for the top scoring candidates using isothermal titration calorimetry (ITC). The Target: Thrombin Thrombin is a key player in the blood coagulation cascade: it catalyzes the proteolytic cleavage of the soluble plasma protein fibrinogen to produce fibrin. The linear fibrin monomers are then cross-linked by factor XIII, producing insoluble blood clots. Factor XIII is a transglutaminase, the last enzyme of the coagulation cascade, which is itself activated by thrombin. Thrombin is also a potent platelet activator. Activated platelets adhere to the site of vascular injury, aggregate, and form a plug to reduce blood loss. The coagulant activity of thrombin is kept under control by thrombomodulin, a thrombin binding protein on the surface of endothelial cells. When too much thrombin is generated, thrombomodulin binds to thrombin, dramatically altering its specificity. The complex rapidly cleaves the protein C zymogen to form the anticoagulant, activated protein C. Complex formation between thrombin and thrombomodulin also prevents thrombin from cleaving fibrinogen. Numerous efforts to control the blood clotting process are directed 97 toward thrombin because of its pivotal role in maintaining the intricate balance between hemostasis and thrombolysis (Davie etal., 1991; Esmon, 1995). Thrombin is a trypsin-like serine protease with the characteristic Ser-His-Asp catalytic triad at the active site. Its specificity is also similar to that of trypsin, preferentially binding substrates with Lys or Arg residues in its specificity pocket. Two additional binding sites (the fibrinogen binding exosite and the heparin binding site), and the ability to use different combinations of these elements allow thrombin to play a key role in a variety of blood coagulation related processes (Tulinsky, 1996). Biochemical modeling studies are greatly aided by the extent of structural data on thrombin that has become available during the last few years (Stubbs and Bode, 1993; Stubbs and Bode, 1995). 5.2 Methods 5.2.1 Screening the ACD with SLIDE SLIDE (described in more detail in Chapter 1, section 1.4 and Chapter 3, section 3.3) was used to screen the Available Chemicals Directory (ACD) to identify new ligands for thrombin. After eliminating compounds with fewer than 6 or greater than 200 non- hydrogen atoms, the ligand database contained 214,713 small organic molecules. DrugScore (Gohlke et al., 2000) was used to rescore the top dockings returned by SLIDE. A short description of DrugScore is provided in Chapter 2, section 2.2. 98 5.2.2 Isothermal Titration Calorimetry Isothermal Titration Calorimetry (ITC) is a particularly suitable technique to follow the energetics of an association reaction between macromolecules (J elesarov and Bosshard, 1999), allowing the measurement of the enthalpy as well as the entropy changes of such interactions. The experiment is performed at a constant temperature by titrating the ligand into the protein solution in the sample cell of the calorimeter. After each step of adding a small aliquot of the ligand, the heat exchange in the sample cell is determined by measuring the electrical power necessary to keep the small temperature difference between the sample cell and the reference cell constant. The integrated heat changes plotted against the molar ratios of the binding reaction show the characteristic sigrnoidal curve of the binding reaction. For a single set of identical binding sites, the total heat of the reaction Q can be calculated from the following equation: 1 X 1 X 1 2 4X Q'ENIPMHV 1+F+NK[P]_\/[l+7\7+NK[P]] "7v— where N is the number of binding sites, [P] the total protein concentration, AH the enthalpy of the binding, V the volume of the calorimetric cell, X the ligand/protein molar ratio, and K the binding constant. Least square fitting of the equation describing the binding process to the experimental data allows determination of the enthalpy of the binding (AH), the association constant (K), and the stoichiometry, which reflects the number of ligands binding to one protein molecule (N). The other thermodynamic parameters, the Gibbs free energy (AG) and entropy change (AS), for the interaction can be calculated from the relationship: 99 AG == -RTan = AH — TAS where R is the universal gas constant, and T is the absolute temperature. Measurements were carried out using an MCS_ITC instrument from MicroCal (Northampton, Massachusetts). Human a-thrombin (Enzyme Research Laboratories, South Bend, Indiana) was dialyzed overnight at 4°C against TRIS buffer (50mM TRIS, 100 mM NaCl, 0.1% PEG800, pH 7.8) with the buffer changed twice to remove salts and impurities. The third dialyzate was saved and used for making the protein and ligand dilutions. Protein concentrations in the sample cell were in the 0.39 - 0.85 mg/ml range. The ligands were added to the protein solution using a lOOul syringe, with concentrations ranging from 0.4 to lmM. All the experiments were carried out at 30°C, with both the protein and ligand solutions degassed before measurements. The reference cells contained deionized and degassed water. As a negative control, a buffer-run was performed with each ligand candidate, when the ligand was titrated into the buffer without thrombin. A known thrombin ligand, 4-aminobenzamidine, was used as a positive control. Data analysis was performed with the Origin software supplied with the instrument. 5.3 Results The ACD screening run to identify potential new ligands for thrombin was completed in approximately two days on a double processor desktop workstation. SLIDE returned 15,474 docked molecules, with an average of two orientations per compound that fit the active site. The top 3000 orientations were rescored with the knowledge—based scoring 100 function, DrugScore, and these compounds were then ranked according to their consensus score, calculated as the normalized sum of the SLIDE score and DrugScore SCOTCI SLIDE _score + DrugScore_sc0re Consensus _score 2 Max_ SLIDE _score Min _DrugScore_sc0re The largest SLIDE score and the smallest DrugScore were used to calculate the normalized scores because SLIDE scores are positive numbers with larger being better, while DrugScore calculates energy-type scores with negative values, where the smaller the value is the better the score. Seven compounds were selected based on their consensus scores (Figure 5.1), excluding closely related molecules and compounds that were difficult to obtain. Molecules that were obviously dyes were also excluded, because they tend to bind to a wide variety of biological macromolecules non-specifically. Eight other compounds were selected based on molecular graphics inspection of shape and chemical complementarity of the ligand with the protein (Figure 5.2). These 15 compounds were chosen to be assayed for binding affinity by ITC. Of the 11 out of 15 compounds that proved to be soluble, two, morelloflavone and new firchsin, showed micromolar binding affinity to human thrombin and are novel ligands for this protein (Figure 5.3). 101 Morelloflavone 172.46 Sennoside 157.71 H HO CHz-OH Carbobenzyloxy-L-tyrosyl-L- leucine methyl ester 152.42 .0 i 5““. Cefsulodin 147.48 "III , ,s -0 o N‘\'/ "'N Z- Arg-P-nitrobenzyl ester HCl 142.17 Trans- (E)- flupenthixol OH 150. 96 CNN Figure 5.1. ACD compounds selected for testing based on their scores. The numbers next to the ligand names are consensus scores, a normalized sum of SLIDE score and DrugScore, where higher is more favorable. 102 N-alpha-trityl-L-Arg 133.68 3-aminobenzenesulfonic (6-(fluoro-SOZ-)- 1 -Me- 2-Ph—4(1H)-quinolydine) hydrazide l 33 .62 O O Z>- . ~ .1 4 Ole D-glucose dibenzyl O mercaptal 133.17 2,2-bis[4-(4- w aminophenoxy) 0 . s phenyl] propane g A 0 131.46 4,4’-(alpha,4-dichlorobenzylidene)bis [2,2’-(phenylimino)diethanoll 13 1 .81 Rutin hydrate l 29.27 2,4,6-tris-(hydroxyl-phenyl)-pyranylium on R R 131.41 “° “ Figure 5. 2. ACD compounds selected for testing based on molecular graphics inspection of their docked complexes with thrombin. The numbers next to the ligand names are consensus scores, where higher is more favorable. 103 Figure 5.3. The recorded heat changes upon successive injections of the ligand into the buffer (negative control, top panel) and the protein solution (middle panel) are shown for each ligand. Morelloflavone and new fuchsin produced larger heat changes (heat of dilution) when injected into the buffer compared to 4—arninobenzamidine, due to the small percentage of DMSO that had to be used to solubilize the first two compounds, while 4-aminobenzamidine was directly soluble in the working buffer. For each ligand, the heat of dilution was subtracted from the corresponding heat of the binding reaction. The integrated heat values plotted against the molar ratios are shown in the lower panels. The red lines represent the least square fitting of the one—binding—site per protein model (N=1) to the experimental data. The parameters calculated from this fitting are the association constant (K, in M"), the enthalpy (AH in cal/mol) and the entropy change (AS in cal/molK) of the binding reaction. The dissociation constant (K. in M) is the inverse of K. The shape of the fitted curve depends on the protein concentration, binding constant, and the stoichiometry of the binding reaction. 104 ofim. has. n a v o p P - NNI «ma. ma 6349.5... In . 383% v. .3. a z 1061 .38 aw. .3- 8898890 .55 2.: rueroalug jo slow/130x oasneod m w 2sz oasneori iueroalug lo mow/[eon oasneod 0v 8 on or o .55 as.» OER—2‘ . .. m m . o 8.3- we .. . n. mg? 33. I< 1N..- m. gwuvawmé v. . W 19.7 m. . a w. :06. w. a . d. in: .8. m . 8629:8823. . 1 #6. £99.. + Lotam -8. -No. d . m. :::. . is -111. , ll .o.o .5 82¢ 5:3 .8. .3 m -3. w p- rcpt—riplrrrr -F ..o.o 15. A 4 d J 4 1. 12 . n p - p r p p L b F.o 105 The predicted binding orientations of these newly identified thrombin ligands mimic the binding modes of known thrombin ligands but have a different molecular scaffold (Figure 5.4). Figure 5. 4. Docked orientation of morelloflavone in magenta (A) and new fuchsin in blue (B) in the binding site of thrombin (PDB code lvrl). The molecules colored by atom types are known thrombin inhibitors fiom X—ray complexes (PDB codes ldwd and laht, respectively), showing that known thrombin ligands sample similar regions of the binding pocket. 5.4 Discussion Given the time constraints imposed by the large number of compounds in screening libraries, virtual screening tools can only afford to perform a relatively rough, approximate docking and employ a simple and quick scoring fimction instead of highly detailed quantum mechanics/molecular mechanics calculations to score the hits. Under 106 these conditions, finding low affinity binders with novel scaffolds is a realistic expectation from screening random compounds. Micromolar affinity is typical of lead compounds identified by high throughput combinatorial library screening for drug discovery. These leads can then be further modified, with functional groups added or deleted to develop tight and specific inhibitors for the given target protein. Two of the 11 soluble ligand candidates tested for binding turned out to be micromolar binders to human thrombin. This success rate for identifying new ligands based on SLIDE virtual screening is comparable to the best results reported by other groups (Doman et al., 2002; Fox and Haaksma, 2000; Gruneberg et al., 2002; Massova et al., 1998; Schapira et al., 2000; Schapira et al., 2001; Shoichet et al., 1993), and is about 1000-fold more efi‘ective than in vitro high-throughput screening, which typically has a success rate of ~0.02% (Doman et al., 2002). SLIDE also explicitly predicts the binding mode between the protein and ligand (Figure 5 .4), which will aid in optimizing the new ligands for higher affinity and protein selectivity (e. g., binding to thrombin over other coagulation and digestive serine proteases). 107 Chapter 6 Modeling Protein Main-Chain Flexibility in Docking 6.1 Introduction Analysis of conformational changes on complex formation for a representative set of 39 pairs of ligand-flee and ligand-bound structures (Betts and Stemberg, 1999) showed that about 50% of the proteins undergo substantial main-chain and side-chain conformational changes when binding the ligand. In another study, focused mainly on evaluating the average number and the type of protein side-chains that undergo major rearrangements upon ligand binding, aside from ubiquitous side-chain movements, Najmanovich and co- workers found backbone displacements larger than 1 .3. in 25% of the cases (Najmanovich et al., 2000). This means that in many instances the protein-ligand recognition process cannot be correctly described unless protein main-chain flexibility is taken into account. Excellent reviews have been published recently (Carlson, 2002; Halperin et al., 2002) summarizing the state of the art in flexible docking. Except for 108 limited cases — simple hinge motions (Sandak et al., 1998), crystallographically determined alternative conformations (Claussen et al., 2001), or small-scale motions typical of molecular dynamics simulations (Carlson et al., 1999; Lin et al., 2002) — main- chain flexibility has not been considered in docking. The various approaches to model side-chain flexibility published in the literature are summarized in Chapter 4, followed by the analysis of how induced fit is modeled for side-chains in SLIDE. This chapter introduces a new and generally applicable method including main-chain flexibility in modeling protein-1i gand recognition. Inducing changes in the protein main chain while performing docking is too expensive computationally, so efforts are directed toward generating a representative conformational ensemble of the protein and using this set as targets for the docking instead of a single structure. This approach is also following the line suggested by a number of theoreticians and experimentalists who argue that the idea of selection of a naturally occurring, fitting conformer is closer to reality than the classical induced fit model (Bosshard, 2001; Carlson and McCammon, 2000; Ma et al., 2002). According to this paradigm, the protein exists in a number of conformations in solution. Ligands of various shapes and sizes can bind to any conformation of the unbound protein, not only to the one with the lowest free energy. A ligand that binds to a less populated conformational state of the receptor with very high affinity can be a stronger binder than one that binds to the lowest energy conformation of the target with lower affinity. Nevertheless, this ligand would be missed if only the lowest energy conformer of the receptor or the average of several low energy structures was used as docking targets. 109 The set of multiple protein conformers usually come from NMR studies, x-ray structures of the same protein with various ligands, or MD simulations. In their groundbreaking work, Kuntz and co-workers (Knegtel et al., 1997) use ensembles of NMR and x-ray protein structures as targets for docking with DOCK. The binding site is placed on a grid, and intermolecular force field values are calculated at the grid points. Variations among different observable conformations are taken into account by calculating the average of the force field values at each grid point. Two types of averaging are used: energy weighted, and geometry weighted. The first method involves calculating the contribution of each atom from each structure to the potential energy, then calculating a weighted potential by averaging over all structures. Geometry weighted averaging means that the averaging is performed at the structural level by calculating a mean position for every atom of the protein. Although this approach does not include receptor flexibility in a dynamic sense, the composite grid representing the interaction energies of the docked ligands with the different protein conformers is shown to outperform many of the grids derived from individual structures in identifying known inhibitors for the cases studied. Claussen and co-workers use FlexE, an extention of FlexX (Kramer et al., 1999), to dock ligands into a united protein description generated from the superimposed structures of the x-ray srtucture ensemble of the target protein (Claussen et al., 2001). While averaging the similar backbone and side-chain positions, the regions with larger variations are retained in form of conformational libraries. New conformations of the receptor are created by combining compatible conformations of the various flexible regions of the binding site. The method can handle several side-chain conformations and smaller loop (up to three or four amino acid) movements but not 110 motions of larger backbone segments. Nevertheless, docking into the united protein description with F LEXE did not provide considerably better dockings than docking into the individual crystal structures with FLEXX for the proteins studied. The use of multiple experimental structures limits the conformational sampling to already observed and existing conformations. Some proteins do not have multiple x-ray structures or are too large or flexible for NMR structure determination. MD simulations can provide novel protein conformers to be used as targets for docking, however, they generate smaller scale movements than may be observed in nature due to their high computational cost. The development of a dynamic pharmacophore model for HIV-1 integrase is described by Carlson et al. by using snapshots of MD simulations and the multi-copy minimization method MUSIC to determine binding regions for probe molecules in the dynamic binding site (Carlson et al., 2000). The drawback of MD simulations is the long time (from weeks to months) required to achieve a good sampling. In fact, it is almost impossible to get beyond microsecond timescale motions. In this chapter, a new and relatively efficient approach to modeling main-chain flexibility in docking and screening is described. Flexibility analysis from a single conformation of the target protein was performed using the graph-theoretic algorithm FIRST (Jacobs et al., 2001), followed by the generation of alternative conformations for the predicted flexible regions with ROCK (Thorpe et al., 2001), a fast and efficient conformational sampling algorithm. A representative and diverse set of the conformational ensemble generated this way was used as a series of targets for docking with SLIDE. ROCK is uniquely suited for flexibly handling ring structures and can be used to model the flexibility of macrocyclic ligands as well, as it is demonstrated for 111 cyclosporin. The use of this combined method to perform flexible docking is illustrated on the cyclophilin A — cyclosporin system, while addressing the question of how much flexibility of the interacting molecules is tolerated without hindering recognition. 6.2 Methods WHAT IF is a program suite for protein structure analysis (Vriend, 1990) used in this study to add the polar hydrogens to the crystallographic structure of the protein. This program was selected because it proved to reliable reproduce hydrogen positions observed in proteins whose structure were determined with neutron scattering. F IRST (Floppy Inclusion and Rigid Substructure Topography) is a graph theoretical approach to identify rigid and flexible regions based on the bond network in proteins (Jacobs et al., 2001). The bond network consists of covalent bonds, hydrogen bonds and hydrophobic tethers. The algorithm counts the number of internal bond-rotational degrees of fi'eedom in the system and identifies rigid regions as those having no bond- rotational degrees of freedom, in other words having enough constraints to become rigid. Flexible regions are those with remaining degrees of freedom or not enough constraints to become rigid. The number of extra constraints or the number of remaining degrees of fi'eedom is used to calculate the relative rigidity or flexibility index of the region. This computational approach is very fast and is able to reliably predict the conformational flexibility of a protein from a single, static three-dimensional structure (Jacobs et al., 1999; Jacobs et al., 2001). 112 ROCK (Rigidity Optimized Conformational Kinetics) uses a “random walk” approach to search the conformational space available to proteins represented as bond networks (Thorpe et al., 2001; Thorpe and Lei, 2003). The program keeps bond lengths and coordination angles constant, randomly performing small rotations for the rotatable bonds. It also ensures that all the original bond constraints are obeyed and van der Waals overlaps between atoms are avoided. By using the results of the FIRST flexibility analysis, ROCK generates a set of semi-continuous conformations by sampling only those bonds in the proteins that are predicted to be rotatable. A non-linear constrained optimization algorithm is used for repositioning the side chains not involved in rings, consistent with the new main chain conformation. Only those main-chain conformers obeying the favored is the average number of covalent and non-covalent bonds for the atoms in the protein, and provides an overall description of the protein bond network, depending strongly on the number of bonds present in the structure. The mean coordination is a useful parameter when comparing rigid to flexible transitions in different proteins (Rader et al., 2002). Moreover, Rader et a1. find = 2.405 i 0.015 to be a universal value describing the rigid-to-flexible phase transition of every protein analyzed, a property shared with amorphous glasses. An energy cutoff value of -2.3 kcal/mol was selected from the hydrogen bond dilution plot as corresponding to the flexibility observed in the native state of the CypA protein. This energy corresponds to the thermal energy of the protein and was selected to reflect a state near physiological conditions where the protein has one rigid core but the outer loops are flexible (Figure 6.3.A), corresponding to regions found flexible in the well—determined NMR structure of CypA (Ottiger et al., 1997). The flexibility properties of the bond network at this particular energy cutoff were determined by FIRST. The following regions were identified to be flexible: residues 12-15, 24-29, 43-47, 54-60, 65- 76, 79-82, 87-94, 101-107, 116-127, 133-135, and 143-155. Three strands of the B-sheet forming the bottom of the binding site (Figure 6.3.A) are rigid, while the loops surrounding the incoming ligand are flexible. To study the effect of the ligand binding on the flexibility of the target protein, FIRST analysis was also performed on the CypA- 119 cyclosporin complex (Figure 6.3.B). Cyclosporin rigidifies part of the CypA binding site, especially the 87-94 strand and loops 24-29, 87-94, and 116-127. f, 116-127 8.}, 5.38 f/ t 5 V \_ / 3.7!. ()5- 76 Figure 6.3. Ribbon diagram of the ligand-free (A) and the ligand-bound (B) CypA structures colored by flexibility index. Grey regions are isostatic or just rigid, blue regions are overconstrained, having more than enough bonds to make them rigid, while yellow to red regions are flexible. The red arrow in panel A indicates the location of the binding site, which is occupied by the cyclosporin ligand (colored green) in panel B. 6.3.2 Conformer Generation The results of the FIRST analysis for CypA (list of hydrogen bonds stronger than -2.3 kcal/mol, list of hydrophobic tethers, flexibility index of each bond) were used as the input for ROCK to generate alternative conformations for the flexible regions. Two ROCK rtms were performed: one with dihedral angle rotation steps of maximum 5 120 degrees, the second one with steps up to 10 degrees. Given the random walk nature of the conformational sampling with ROCK, runs with different angle step have the potential of sampling different areas of the conformational space. Each run generated 600 conformers. The 20 most distinct conformations fi'om each of the two runs were combined and the 12 most distinct conformers (Figure 6.4.A) of these 40 structures were identified and used as targets for docking with SLIDE. The good stereochemistry of these conformers was confirmed with PROCHECK, and the Ramachandran plots for the x-ray structure and the most distinct conformer from the x-ray structure are shown in Figure 6.5, with the other conformers being of similar quality. Figure 6.4. The 12 most distinct conformers of CypA generated by ROCK (A). The ligand, in magenta, indicates the location of the binding site. The ribbon diagram of the lowest energy NMR structure of free CypA (PDB code loca) is shown in panel B. The thickness of the tube is proportional to the maximum deviation of the backbone Cu atoms fi‘om the average Ca position of the 20 energy-minimized NMR structures fi‘om PDB entry loca. 121 The set of 12 most distinct conformers was identified based on the backbone RMS deviations of residues 42-46, 67-75, 79-81, 120-124, and 148-149 relative to the crystal structure (PDB entry lbck), ranging from 0.94 to 1.34 A overall. The movements of these regions were monitored because these are the main regions predicted to be flexible by FIRST surrounding the binding site and they were also identified as the ones with the most significant backbone differences in NMR structures (Ottiger et al., 1997). The conformers generated by ROCK (Figure 6.4A) sample approximately the same conformational space as the 20 lowest energy NMR structures of CypA (Figure 6.4B). The regions with the largest movements modeled by ROCK are the regions with the most variations among the individual NMR structures, with the ROCK conformers showing somewhat larger backbone deviations. The backbone RMS deviations of the ROCK conformers for the whole sequence of CypA were in the range of 0.50-0.76 A. To illustrate the range of motions captured by ROCK, the maximum Ca deviations fiom the x-ray structure observed in the 12 most distinct conformers were plotted for each residue (Figure 6.6.A). As a comparison, the deuterium exchange rates of the backbone amide protons of free CypA are shown in panel B of Figure 6.6. The HD exchange data was generously provided by Marcel Ottiger and Kurt Wfithrich (Ottiger et al., 1997). 122 .28me worse—E bmsococom 8 30:0» 28 98 .223.“ 832? 3 26:02 View .mqomwoc 88 3.826.“ 82: 3 309.028 82m and Ame MUOM 5m? woufioeow 5:28:00 852% «88 2t mo 98 A3 <96 mo 838.5. Rambo 05 mo 82a guess; we 93me .1 . . a .i... .4 4.5323853852853839 Excusfiaoaivsnisaiaugagiia Exafigggisugissfiina godgiuouog—guogo—Coaiigz EQNI-Sfluogo-Buo-DBREBCoaba-EEE v85 9 ggfififinfifl! «$6 a gags—:83!“ «2.6 _ 3...). 91.0.; :8? v2.21 >368» E 8318— *o. o o ELLA—1.0.; b . . i .111...— $v§ on 2.3.5 3052 v2.8:- 1323- E .388.— «8: 2 3.3.; 3932 3331 ion—3.. 3 .828: 3a.: >8 Eda {gave—65388.86!"— oSfiu m: EQS {swag-«gag 8g .05 Aseuoe Em (“0139?) Ed £58 22: -. x2: «oi gauges m 85 unevenness—mm < 123 I I I ' I ' I ' I ' I ' I A 6- 20 40 60 80 100 120 140 16q A Residue Number 3 . E 4‘ > a: D 1 <3 2- X N 2 I ..x O l I I I I A (A) N I I I J A Exchange rate (mind) w an I O3 Figure 6. 6. Maximum deviations of backbone Ca atom positions compared to x-ray structure positions (PDB code lbck) seen among the most distinct CypA conformers generated with ROCK (A). As a comparison, the hydrogen-deuterium (HD) exchange rates of the backbone amide protons of free CypA are shown in panel B (Ottiger et al., 1997). Exchange rates of -1 indicate that the exchange was too fast to be observed. Exchange rates of —5 indicate very slow exchange. The locations of the regular secondary structure elements are given in the top panel, where blue lines indicate B-sheets and the red zig-zags corresponds to a-helices. The backbone RMS deviations of the ROCK conformers provide a measure for how different these conformers are, in average. Even if RMSD values are relatively small (0.94 to 1.34 A overall for the flexible regions), individual backbone atom deviations can be much larger. The largest Ca deviations of the ROCK conformers of up 124 to 4.28 A from the crystallographic structure were observed in the loop regions surrounding the binding site (residues 43-47, 65-76, 116-127, 143-155). These are also the regions with very fast HD exchange rates (Figure 6.6) observed with NMR. The only apparent discrepancy between the FIRST/ROCK conformational predictions and the NMR data is found for helix 136-143, which has low HD exchange rates, suggestive of maintaining rigidity, but large Ca deviations, indicating larger movements. This helix is predicted to be an independent rigid cluster by FIRST, with the ability to move as a rigid unit relative to the rest of the protein. This rigid body movement gives rise to backbone deviations compared to the x-ray structure, but the helix remains intact and so do the intrahelical hydrogen bonds, prediction consistent with slow HD exchange rates. Conformers for the cyclic ligand cyclosporin were also generated with ROCK. The protein-bound conformation of cyclosporin was used as a starting structure, and only the covalent bond lengths and angles were used as constraints. This was considered to be a reasonable approach given there is only one hydrogen bond in this peptide and no intramolecular hydrophobic tethers were identified by FIRST. Generating conformers starting fiom the unbound form of cyclosporin was also considered, but later dismissed because the unbound conformations of cyclosporin and its derivatives have a cis amide bond between residues 9 and 10, while the bound conformations are all-trans (Figure 6.7.A). The peptide bonds are locked in ROCK, so the correct protein-bound conformation would have never been sampled; allowing flips between cis and trans peptide bonds can be incorporated in a future version of ROCK. 125 Unbound Bound Figure 6. 7. (A) Comparing the crystal structures of free (CSD codes ZAJDUJ and KEPNAU) and protein-bound conformations (PDB entry lbck) of cyclosporin. The arrow indicates the location of the peptide bond between residues 9 and 10, which is cis in the free structures but trans in the protein-bound one. (B) A subset of cyclosporin conformers generated with ROCK from the protein-bound conformation (taken from PDB entry lbck). The x-ray structure of cyclosporin is shown in red. A total of 3000 cyclosporin conformers were generated in three separate runs using small angular steps. The runs were different in the step sizes of the dihedral angle rotations (2.0 and 5.0 degrees) and the maximum percentage of bonds that could be rotated at each step (10 and 20%). Since cyclosporin is a relatively large and flexible ligand, and docking thousand of conformers to 13 CypA targets (12 conformers plus the original x-ray structure) would be very time consuming, the most distinct 395 conformers out of the 3000 cyclosporin structures were selected for docking. The backbone RMS deviations of these conformers compared to the x-ray structure of the protein-bound cyclosporin (PDB code lbck) were in the range of 0.45-8.60 A, with Ca deviations up to 126 14.17 A (Figure 6.7.3) indicating a diverse set of conformers and good sampling of the available conformational space. 6.3.3 Docking CypA and cyclosporin conformers were used for docking with SLIDE to probe the range of flexibility consistent with molecular recognition. The templates created for the x-ray structure of CypA (PDB entry lbck) and the 12 ROCK-generated conformers included 77 to 108 points each. All the hydrogen bonding template points were assigned as key points to assure that only those dockings with at least one hydrogen bond between protein and the ligand were generated. The results of the SLIDE dockings are summarized in Table 6.1. Docked ligands maintaining interactions with the rigid base of the binding site were considered to be correct dockings (Table 6.2). A docking was classified as correct if the contacts listed in Table 6.2 were maintained with a maximum distance of 5 A (5.5 A for the hydrophobic contact). This approach of using known recognition determinants (Kallen et al., 1998) to identify good dockings was employed since scoring firnctions, trained on correct dockings, do not perform well at distinguishing slight misdockings from gross misdockings (Zavodszky et al., 2003). As can be seen in Table 6.1, every protein conformation could recognize and accommodate at least one ligand conformer (CypA_3 89 and CypA_548) although not necessarily the x-ray conformation. Main chain deviations at Ca positions of up to 4.28 A were tolerated in the protein binding site, which was able to accommodate a wide range of cyclosporin ligand conformations with backbone RMSD values up to 4.25 A compared to the x-ray conformation. Upon docking, key interactions with the more rigid 127 portions of the binding site were maintained by residues 1, 2 and 11 of cyclosporin, while the effector loop protruding fi'om the binding site could flex considerably, reaching up to about 10 A in Ca deviations (Figure 6.8). 128 mam mad $6 I on; ww.~ I o: a oo.~ 3”.— Nam end 35 mmd SN _ we; «3 men mod cad 2am I m; mod I $6 2 mad an; may £6 36 m3. I mud mean I cod No EN 34 Se 36 9.6 end I 34 enm I cod m— $.N we; Ev owe mod 3N I 3d 84.. _ Sum an; awm wed mofi mmd I we; worm I on; Q awd 34 aom ONE 36 New I $6 26 I cod 8 Kan mu; m2 Sum 36 mm.m I n: v?” I who 3 mad :4 #2 35 who 86 I 56 me I cod 2. 3d wad w: owd g6 mod I and mod I cod 3 ~m.m n: wwo 3.3 36 cod I 2.0 mmd I cod me Nwd em; m8 $.m vww wm.m I :3 3.N - cod 2. cod cod ..um >88“ wee—cow meme—ecu meme—ecu wee—eon 8% 280m B¢< 880m Refinance 3e 9 A5 commerce do 335er 9e Omar/E woe—cow :03 n 5% do 432 3Q QmEM $5.880 floéomeoo gramme—emu coo—08 =03 <96 .Eoecemeoo <96 85 Engage 5.83205 wee—cow me 838% .~ 6 m3§ 129 Table 6. 2. Interactions monitored to identify correct cyclosporin dockings. CypA NE2 02: Asn102: N 113: CD1 X-ray BMTl: 1 THR2: N MVA11: MVA11: Figure 6.8. Cyclosporin conformers (green tubes) docked into the binding sites of CypA conformers (ribbons colored by flexibility index where yellow to red is increasingly flexible and grey to dark blue is increasingly rigid). The red tube is the x-ray conformation of cyclosporin docked into the x-ray conformation of CypA. 130 6.4 Discussion In addition to studying the flexibility of CypA starting from the PDB structure lbck, the FIRST analysis was also performed on the ligand-free crystal structure 2cpl, with a resolution of 1.63 A (Ke, 1992). The results obtained were in excellent agreement with those for lbck, indicating that the FIRST results are not very dependent on the particularities of the individual x-ray structures, if they have good stereochemistry. The good agreement between the flexibility predictions of FIRST and the NMR data indicating which regions are the most flexible in CypA (Figures 6.4 and 6.5), as well as previous results on other proteins (Jacobs et al., 2001; Rader et al., 2002; Thorpe et al., 2000), suggest that FIRST is a reliable method to predict the flexible regions of a protein from a single x-ray structure. Further studies are needed, however, to identify a consistent way of identifying energy cutoff values corresponding to the native state of various proteins. ROCK is a unique tool in its ability to sample flexibility in multiple interlocked ring systems. The protein main-chain conformers generated for CypA are of good stereochemical quality and span a conformational space very similar to that of the NMR solution structures. A very long MD simulation would be the only other computational method that could provide a similar degree of conformational sampling, but it would require several orders of magnitude longer time than ROCK. Comparisons between MD and ROCK sampling are ongoing in the research groups of our collaborators Michael Thorpe and David Case. The shortcomings of this method are the lack of a timescale associated with the modeled motions, as well as the lack of an energy firnction that allows assessment of the relative likelihood of the generated conformers. Also, an efficient way 131 of repositioning the side chains on the modified main chains should be implemented to find not only a feasable side chain conformation for each residue, but to identify the most favorable one. These aspects will be the focus of our future work on ROCK. FIRST provided flexibility analysis combined with ROCK, for probing the range of possible motions, can have other applications besides providing a set of conformers for modeling main chain flexibility in docking. This approach also has great potential for studying entropy changes upon macromolecular association, as well as for studying allostery. The suggestion of using multiple protein conformations is only the first step toward realistic modeling of main chain flexibility in docking. When the conformational space to be sampled is large, a large number of individual conformations should be used to ensure uniform sampling, which is not a practical solution. The idea of the combinatorial joining of the discrete conformations of different segments seems to be a sound one (Kramer et al., 1999), and it is worth pursuing. 6.5 Conclusions Comparing our results to NMR data on CypA indicates that employing FIRST flexibility analysis of the target protein, followed by generating conformers for the predicted flexible regions with ROCK, is a realistic approach to model the flexibility of the protein main chain. Protein conformers with good stereochemistry are generated given only one starting structure. These conformers can be successfully used for docking experiments to model protein main chain flexibility. This method also provides the unique opportunity to study the effects of ligand binding on the entropy of the system. The docking 132 experiments of cyclosporin conformers to CypA conformers using SLIDE confirmed our hypothesis that there is a considerable amount of flexibility tolerated in the protein-ligand recognition process, reflected by the fact that multiple target structures accommodate a wide range of ligand conformers while maintaining key interactions. 133 Chapter 7 Summary and Future Directions 7.1 Summary of Advances Made Our early work on of predicting the binding mode of NMN in the active site of R67- DHFR suggested that the binding site representation in SLIDE needed improvement. This was done, and we envisioned that the next major advance in protein-ligand docking was to incorporate firll flexibility of the protein, including main chain motion. The docking experiments of NMN into the highly symmetric active site pore of R67-DHFR described in Chapter 2 led to the conclusion that a limited number of symmetry-related amphipathic residues allowed binding of either of the two ligands, NADPH or folate, using the same binding site residues. Using internal symmetry in the protein to generate hot-spots that accommodate a number of different interactions was proposed as a novel evolutionary strategy used by this enzyme to confer antibiotic resistance to bacteria. 134 To improve the quality of docking and scoring in SLIDE, a new, knowledge- based approach to representing the protein binding site for docking was evaluated in Chapter 3. Instead of randomly or uniformly sampling the binding site, template points were placed at positions where the strongest hydrogen bonds with optimal geometry could be made between the protein and its ligand. A better description of the binding site led not only to better dockings but also to improvements is scoring, as reflected by the ability of SLIDE to better differentiate between known ligands and nonspecific molecules in the case of thrombin and GST. The reason behind scoring improvements was that scoring functions trained on correct x-ray binding orientations performed much better when the dockings were close to the correct position, while performing poorly on misdocked ligands. An evaluation of modeling side-chain flexibility by SLIDE was presented in Chapter 4. This model of induced fit was built upon the hypothesis that both proteins and ligands change their conformations as little as necessary to resolve the interatomic collisions arising upon their association. This assumption was shown to be both necessary and sufficient to dock most known ligands to their target proteins correctly, at least when main-chain conformational change is not significant. The side chain rotations performed by SLIDE were shown to mimic the differences observed in side chain conformations in the binding sites of ligand-free and ligand-bound proteins. The systems studied included thrombin with 35 known ligands, human glutathione S-transferase with 14 known ligands, and a set of 18 diverse protein-ligand complexes. Our results reinforced earlier findings that ligand binding induces nonrotamericity in the target protein (Heringa and Argos, 1999), meaning that smaller rotations are found that do not 135 necessarily match distribution of the favored side-chain conformations in 1i gand-free proteins. Every model, no matter how intuitive and elegant it is, should also relate usefully to experimental work. To assess the predictive power of SLIDE, it was used to screen the Available Chemicals Directory to predict new ligands for thrombin. These predictions were tested by measuring the binding affinity of the ligand candidates with isothermal titration calorimetry. As shown in Chapter 5, two of the molecules tested, morelloflavone and new fuchsin, turned out to be new ligands having micromolar affinities for thrombin. The main role of virtual screening is to identify leads that can be further developed into drugs, and micromolar affinity is typical for lead compounds identified by in vitro screening, too. Thus, SLIDE can discover promising ligand candidates from screening a large database. A novel approach was suggested to model main-chain flexibility in docking in Chapter 6. The most distinct feature of this method is that it did not require lengthy MD simulations, nor was it restricted to available experimental structures to provide alternative main-chain protein conformations. This approach, combining the conformational sampling tool ROCK with SLIDE, was applied to explore the amount of flexibility allowed during the recognition process of cyclosporin by human cyclophilin A. As a first step, the flexibility of CypA was assessed based on the network of covalent bonds, hydrogen bonds and hydrophobic interactions identified in the crystallographic structure using the graph theoretic algorithm FIRST. As a next step, the program ROCK was employed to explore the conformational space available for the regions predicted by FIRST to be flexible, generating feasible main-chain CypA conformers with good 136 stereochemistry. Comparing the flexibility predictions of FIRST and the range of motions sampled by ROCK to existing NMR data on CypA showed very good agreement. This approach was shown to be not only a possible but also a realistic way to mimic the protein main-chain motions observed in NMR experiments. ROCK was also used to generate a set of conformers for the cyclic ligand cyclosporin. Main chain flexibility in protein-ligand recognition was modeled by docking a wide range of cyclosporin conformers into the CypA conformers with SLIDE. The docking results confirmed the hypothesis that a considerable amount of flexibility can be tolerated without hindering the protein-ligand recognition process. 7.2 Interesting Problems Remaining to be Solved In an attempt to model a natural phenomenon or find an acceptable explanation to a puzzling problem, usually more questions than answers arise. In this last part, I enumerate some of the new problems that surfaced during my work and which need solutions in the near future. During docking with SLIDE, both the protein and the ligand are handled as flexible molecules. However, the initial matching of ligand interaction point triangles to template point triplets rigidifies part of the ligand, which can bias the docking toward the starting ligand conformation. This is not a problem if the ligand is in a conformation not too far from the binding conformation. Many times that is not the case. According to the calculations of Bostrom et a1. (Bostrom et al., 1998), the difference between the conformational energy of the free ligand in solution and that of the bioactive conformation is S 3 kcal/mol for most ligands studied, indicating that the energy 137 required to distort the ligand too much for binding would decrease the binding affinity, and as such is not favorable. It is relatively safe to assume, that a ligand can exist in multiple low energy conformations in solution, so a possible approach would be to sample the conformational space of the ligand, generate a number of low energy solution conformers, and use them as input to SLIDE. Water molecules are known to mediate protein-1i gand interactions in many cases. Often, correct docking cannot be achieved without taking interfacial water into account. SLIDE has the possibility of taking water molecules into account at the protein-ligand interface, keeping or replacing them during the docking process based on Consolv (Raymer et al., 1997) or other predictions. SLIDE currently seems to penalize too much for displacing a water molecule predicted to be conserved upon ligand binding. Further studies are needed to find an appropriate way of handling this issue. In addition, we need to handle the more general case of new bound water molecules being recruited into the binding interface. An additional functionality is needed that would detect the existence of a cavity between the protein and the ligand appropriate for accommodating a water molecule and place water there such to form a bridge between the interacting partners, when energetically favorable. One of the greatest challenges of the docking/screening field is predicting the binding affinity of the docked ligand to its target protein. This is done by either free energy calculations or by employing simple physical or empirical scoring functions. None of the methods reported so far performs satisfactorily across various systems. An adequate scoring scheme would facilitate identification of the best docked orientation among the tens or even hundreds of possible orientations produced by a docking 138 experiment. It would also allow ranking of ligand candidates according to their binding affinity, which would make virtual screening a more reliable tool in searching for ligands. The usefulness of a good scoring method would not be restricted to protein-ligand docking. The same principles of interaction apply to protein-protein docking and protein folding, for example. It is widely known that the bigger challenge in the protein folding field is not generating possible structures but ranking them and identifying the most native-like ones. Docking a ligand into a rigid protein is of limited use. While it is relatively easy to sample side-chain conformations on the fly, main-chain conformational sampling is more computationally intensive, and it is more feasible to generate a set of possible main chain conformations in advance to use them as a library of alternative structures during docking. Docking into individual conformers might still be very time consuming, especially if a thorough sampling has to be done. Combining the available main-chain conformations in a combinatorial way seems to be a good way to speed up the docking and explore new conformations. In addition, associating energy values with the main chain conformers would help concentrating on the energetically most favorable ones instead of taking into account all the stericly feasible conformers. A major improvement regarding flexibility modeling in SLIDE also could be made by invoking dynamics not only to move protein and/or ligand parts out of the way to resolve inter-atomic collisions, but to move them toward making better interactions as well. In conclusion, this thesis shows that computational methods can be effective tools for modeling interactions between flexible biological molecules, and they can be used successfully to predict binding orientations of ligands relative to their target proteins, as 139 well as to find new ligands for proteins. Computational modeling of biomolecular interactions is one of the fastest evolving fields of science today. Further improvements in algorithms, combined with the rapidly growing amount of new information accumulated about the structure, function, and interaction of bio-molecules assures an important role of computational modeling of protein-ligand recognition in the future of biological sciences. 140 BIBLIOGRAPHY Abagyan, R. and Totrov, M. 2001. High-throughput docking for lead generation. Curr. Opin. Chem. Biol. 5 :375-382. Allen, RH. 2002. The Cambridge Structural Database: A quarter of a million crystal structures and rising. Acta Crystallogr. B 58:380-388. Babbitt, RC. and Gerlt, J.A. 1997. Understanding enzyme superfamilies: Chemistry as the fundamental determinant in the evolution of new catalytic activities. J. Biol. Chem. 272:30591-30594. Baxter, C.A., Murray, C.W., Waszkowycz, 8., Li, J., Sykes, R.A., Bone, R.G., Perkins, T.D., Wylie, W. 2000. New approach to molecular docking and its application to virtual screening of chemical databases. J. Chem. Inf Comput. Sci. 40:254-262. Bergner, A., Gunther, J ., Hendlich, M., Klebe, G., Verdonk, M. 2001. Use of Relibase for retrieving complex three-dimensional interaction patterns including crystallographic packing effects. Biopolymers 61 :99-1 10. Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., Boume, RE. 2000. The Protein Data Bank. Nucleic Acids Res. 28:235-242. Betts, M.J. and Stemberg, M.J. 1999. An analysis of conformational changes on protein- protein association: Implications for predictive docking. Protein Eng 12:271-283. Bissantz, C., Folkers, G., Rognan, D. 2000. Protein-based virtual screening of chemical databases. 1. Evaluation of different docking/scoring combinations. J. Med. Chem. 43:4759-4767. Boer, D.R., Kroon, J., Cole, J .C., Smith, B., Verdonk, M.L. 2001. SuperStar: Comparison of CSD and PDB-based interaction fields as a basis for the prediction of protein- ligand interactions. J. Mol. Biol. 312:275-287. Bogan, AA. and Thorn, KS. 1998. Anatomy of hot spots in protein interfaces. .1. Mol. Biol. 280: 1-9. Bohm, HJ. 1992. The computer program LUDI: A new method for the de novo design of enzyme inhibitors. J. Comput. Aided Mal. Des 6:61-78. Bohm, HI. 1994. The development of a simple empirical scoring function to estimate the binding constant for a protein-ligand complex of known three-dimensional structure. J. Comput. Aided Mol. Des 8:243-256. 141 Boobbyer, D.N., Goodford, P.J., McWhinnie, P.M., Wade, RC. 1989. New hydrogen- bond potentials for use in determining energetically favorable binding sites on molecules of known structure. J. Med. Chem. 32: 1083-1094. Bosshard, HR. 2001. Molecular recognition by induced fit: How fit is the concept? News Physio] Sci. 16:171-173. Bostrom, J., Norrby, P.O., Liljefors, T. 1998. Conformational energy penalties of protein- bound ligands. J. Comput. Aided Mal. Des. 12:383-396. Bradrick, T.D., Beechem, J .M., Howell, BE. 1996. Unusual binding stoichiometries and cooperativity are observed during binary and ternary complex formation in the single active pore of R67 dihydrofolate reductase, a D2 symmetric protein. Biochemistry 35:1 1414-1 1424. Brito, R.M., Reddick, R., Bennett, G.N., Rudolph, F.B., Rosevear, PR. 1990. Characterization and stereochemistry of cofactor oxidation by a type II dihydrofolate reductase. Biochemistry 29:9825-9831. Bruice, TC. and Benkovic, SJ. 2000. Chemical basis for enzyme catalysis. Biochemistry 39:6267-6274. Burkhard, P., Taylor, P., Walkinshaw, MD. 1998. An example of a protein ligand found by database mining: Description of the docking method and its verification by a 2.3 Angstrom X-ray structure of a thrombin-ligand complex. J. Mol. Biol. 277:449-466. Bystroff, C., Oatley, S.J., Kraut, J. 1990. Crystal structures of Escherichia coli dihydrofolate reductase: The NADP+ holoenzyme and the folate.NADP+ ternary complex. Substrate binding and a model for the transition state. Biochemistry 29:3263-3277. Carlson, H.A. 2002. Protein flexibility and drug design: How to hit a moving target. Curr. Opin. Chem. Biol. 6:447-452. Carlson, H.A., Masukawa, K.M., McCammon, J.A. 1999. Method for Including the Dynamic Fluctuations of 3 Protein in Computer-Aided Drug Design. J. Phys. Chem. A [03:10213-10219. Carlson, H.A., Masukawa, K.M., Rubins, K., Bushman, F.D., Jorgensen, W.L., Lins, R.D., Briggs, J.M., McCammon, J.A. 2000. Developing a dynamic pharrnacophore model for HIV-1 integrase. J. Med. Chem. 43:2100-2114. Carlson, H.A. and McCammon, J.A. 2000. Accommodating protein flexibility in computational drug design. Mol. Pharmacol. 57:213-218. Castillo, R., Andres, J., Moliner, V. 1999. Catalytic mechanism of dihydrofolate reductase enzyme: A combined quantum-mechanical/molecular-mechanical 142 characterization of transition state structure for the hydride transfer step. J. Am. Chem. Soc. 121:12140-12147. Charifson, P.S., Corkery, J.J., Murcko, M.A., Walters, WP. 1999. Consensus scoring: A method for obtaining improved hit rates from docking databases of three- dimensional structures into proteins. J. Med. Chem. 42:5100-5109. Chen, L. and Sigler, PB. 1999. The crystal structure of a GroEL/peptide complex: Plasticity as a basis for substrate diversity. Cell 99:757-768. Chen, Y.Z. and Ung, CY. 2001. Prediction of potential toxicity and side effect protein targets of a small molecule by a ligand-protein inverse docking approach. J. Mol. Graph. Model. 20: 199-218. Claussen, H., Buning, C., Rarey, M., Lengauer, T. 2001. FlexE: Efficient molecular docking considering protein structure variations. J. Mol. Biol. 308:377-395. Connolly, M.L. 1993. The molecular surface package. J. Mol. Graph. 11:139-141. Davie, E.W., Fujikawa, K., Kisiel, W. 1991. The coagulation cascade: Initiation, maintenance, and regulation. Biochemistry 30: 10363-10370. Dekker, R.J., Eichinger, A., Stoop, A.A., Bode, W., Pannekoek, H., Horrevoets, A.J. 1999. The variable region-l fi'om tissue-type plasminogen activator confers specificity for plasminogen activator inhibitor-1 to thrombin by facilitating catalysis: release of a kinetic block by a heterologous protein surface loop. J. Mol. Biol. 293 :613-627 . ‘ DeLano, W.L., Ultsch, M.H., de Vos, A.M., Wells, J.A. 2000. Convergent solutions to binding at a protein-protein interface. Science 28 7: 1279- 1 283. Dion-Schultz, A. and Howell, EB. 1997. Effects of insertions and deletions in a beta- bulge region of Escherichia coli dihydrofolate reductase. Protein Eng 10:263-272. Doman, T.N., McGovern, S.L., Witherbee, B.J., Kasten, T.P., Kurumbail, R., Stallings, W.C., Connolly, D.T., Shoichet, B.K. 2002. Molecular docking and frigh- throughput screening for novel inhibitors of protein tyrosine phosphatase-IE. J. Med. Chem. 45:2213-2221 . Dunbrack, R.L., Jr. and Karplus, M. 1993. Backbone-dependent rotamer library for proteins. Application to side-chain prediction. J. Mol. Biol. 230:543-574. Esmon, CT. 1995. Thrombomodulin as a model of molecular mechanisms that modulate protease specificity and function at the vessel surface. FASEB J. 9:946-955. Ewing, T.J., Makino, S., Skillman, A.G., Kuntz, ID. 2001. DOCK 4.0: Search strategies for automated molecular docking of flexible molecule databases. J. Comput. Aided M01. Des 15:411-428. 143 Fischer, D., Lin, S.L., Wolfson, H.L., Nussinov, R. 1995. A geometry-based suite of molecular docking processes. J. Mol. Biol. 248:459-477. Fischer, D., Norel, R., Wolfson, H., Nussinov, R. 1993. Surface motifs by a computer vision technique: Searches, detection, and implications for protein-ligand recognition. Proteins 16:278-292. Fox, T. and Haaksma, EB. 2000. Computer based screening of compound databases: 1. Preselection of benzamidine-based thrombin inhibitors. J. Comput. Aided M01. Des 14:411-425. Fradera, X., Knegtel, R.M., Mestres, J. 2000. Similarity-driven flexible ligand docking. Proteins 40:623-636. Fremont, D.H., Matsumura, M., Stura, E.A., Peterson, P.A., Wilson, LA. 1992. Crystal structures of two viral peptides in complex with murine MHC class I H-2Kb. Science 25 7 2919-927. Gohlke, H., Hendlich, M., Klebe, G. 2000a. Knowledge-based scoring function to predict protein-ligand interactions. J. Mol. Biol. 295 :337-356. Gohlke, H., Hendlich, M., Klebe, G. 2000b. Predicting binding modes, binding affinities and ‘hot spots' for protein-ligand complexes using a knowledge-based scoring function. Perspectives in Drug Discovery and Design 20:1 15-144. Good, A. 2001. Structure-based virtual screening protocols. Curr. Opin. Drug Discov. Devel. 4:301-307. Goodsell, D.S., Morris, G.M., Olson, A.J. 1996. Automated docking of flexible ligands: Applications of AutoDock. J. Mol. Recognit. 9: 1-5. Gruneberg, S., Stubbs, M.T., Klebe, G. 2002. Successful virtual screening for novel inhibitors of human carbonic anhydrase: Strategy and experimental confirmation. J. Med. Chem. 45:3588-3602. Hacker, J. and Fischer, G. 1993. Irnmunophilins: Structure-function relationship and possible role in microbial pathogenicity. Mol. Microbiol. 10:445-456. Halperin, 1., Ma, 3., Wolfson, H., Nussinov, R. 2002. Principles of docking: An overview of search algorithms and a guide to scoring functions. Proteins 47:409-443. Heringa, J. and Argos, P. 1999. Strain in protein structures as viewed through nonrotarneric side chains: 11. Effects upon ligand binding. Proteins 3 7:44-55. Hespenheide, B.M., Rader, A.J., Thorpe, M.F., Kuhn, LA. 2002. Identifying protein folding cores from the evolution of flexible regions during unfolding. J. Mol. Graph. Model. 21: 195-207. 144 ‘ “fl.- Holm, L. and Sander, C. 1996. Mapping the protein universe. Science 273 :595-603. Howell, E.E., Shukla, U., Hicks, S.N., Smiley, R.D., Kuhn, L.A., Zavodszky, M.I. 2001. One site fits both: A model for the ternary complex of folate + NADPH in R67 dihydrofolate reductase, a D2 symmetric enzyme. J. Comput. Aided M01. Des 15:1035-1052. Hu, Z., Ma, 8., Wolfson, H., Nussinov, R. 2000. Conservation of polar residues as hot spots at protein interfaces. Proteins 39:331-342. Ippolito, J .A., Alexander, R.S., Christianson, D.W. 1990. Hydrogen bond stereochemistry in protein structure and function. J. Mol. Biol. 215:457-471. Jacobs, D.J., Kuhn, L.A., Thorpe, M.F. 1999. Flexible and rigid regions in proteins. Jacobs, D.J., Rader, A.J., Kuhn, L.A., Thorpe, M.F. 2001. Protein flexibility predictions using graph theory. Proteins 44:150-165. Jelesarov, I. and Bosshard, HR. 1999. Isothermal titration calorimetry and differential scanning calorimetry as complementary tools to investigate the energetics of biomolecular recognition. J. Mol. Recognit. 12:3-18. Jiang, F. and Kim, SH. 1991. "Soft docking": Matching of molecular surface cubes. J. Mol. Biol. 219:79-102. Jones, G., Willett, P., Glen, RC. 1995. Molecular recognition of receptor sites using a genetic algorithm with a description of desolvation. J. Mol. Biol. 245 :43-53. Jones, G., Willett, P., Glen, R.C., Leach, A.R., Taylor, R. 1997. Development and validation of a genetic algorithm for flexible docking. J. Mol. Biol. 26 7 2727-748. Kallblad, P. and Dean, RM. 2003. Efficient conformational sampling of local side-chain flexibility. J. Mol. Biol. 326: 1651-1665. Kallen, J ., Mikol, V., Taylor, R, Walkinshaw, MD. 1998. X-ray structures and analysis of 11 cyclosporin derivatives complexed with cyclophilin A. J. Mol. Biol. 283 :435-449. Karlstrom, A., Zhong, G., Rader, G, Larsen, N.A., Heine, A., Fuller, R., List, 8., Tanaka, R, Wilson, [A., Barbas, CR, 111, Lerner, RA. 2000. Using antibody catalysis to study the outcome of multiple evolutionary trials of a chemical task. Proc. Natl. Acad. Sci. U. S. A 97:3878-3883. Ke, H. 1992. Similarities and differences between human cyclophilin A and other beta- barrel structures. Structural refinement at 1.63 A resolution. J. Mol. Biol. 228:539-550. 145 Kc, H., Mayrose, D., Belshaw, P.J., Alberg, D.G., Schreiber, S.L., Chang, Z.Y., Etzkom, P.A., Ho, S., Walsh, CT. 1994. Crystal structures of cyclophilin A complexed wrth cyclosporin A and N-methyl-4-[(E)-2-buteny1]-4,4-dimethylthreonine cyclosporin A. Structure 2:33-44. Klebe, G. 2000. Recent developments in structure-based drug-design. J. Mol. Med. 78:269-281. Knegtel. R.M., Bayada, D.M., Engh, R.A., von der, S.W., van Geerestein, V.J., Grootenhurs, PD. 1999. Comparison of two implementations of the incremental construction algorithm in flexible docking of thrombin inhibitors. J. Comput. Aided Mol. Des 13:167-183. Knegtel, R.M., Kuntz, I.D., Oshiro, CM. 1997. Molecular docking to ensembles of , protein structures. J. Mol. Biol. 266:424-440. Knegtel, R.M. and Wagener, M. 1999. Efficacy and selectivity in flexible database dockrng. Proteins 3 7:334-345. Koehler, .R.T., Villar, H.O., Bauer, K.E., Higgins, D.L. 1997. Ligand-based protein alignment and isozyme specificity of glutathione S-transferase inhibitors. Proteins 28:202-216. Koshland, DE]. 1958. Application of a theory of enzyme specificity to protein synthesis. Proc. Natl. Acad. Sci. USA 44:98-104. Kramer. 8., Rarey, M., . Lengauer, T. 1999. Evaluation of the FLEXX incremental OOHStI'uctron algorithm for protein-ligand docking. Proteins 37:228-241. Getzoff, ED. 1995. Atomic and Kuhn, L.A., Swanson, C.A., Pique, M.E., Tainet, J.A., Proteins 23:536- ge‘tsidue hydrophilicity in the context of folded protein structures. R., Ferrin, T.E. 1982. A geometric Kuntz, I.D., Blaney, J.M., Oatley, S.J., Langridge, 161:269-288. approach to macromolecule-ligand interactions. J. Mol. Biol. Kuntz, I.D., Chen, K., Sharp, K.A., Kollman, PA. 1999. The maximal affinity of ligands. Proc. Natl. Acad. Sci. U. S. A 96:9997-10002. ographic structures of Lalmde. J .M., Bernlohr, D.A., Banaszak, LJ. 1994. X-ray crystal] d hexadecanesulfonic adipocyte lipid-binding protein complexed with palmitate an acrd. Properties of cavity binding sites. Biochemistry 33:4885-4895. mton, J .M. 1993. Procheck - A Laskowski. R.A., MacArthur, M.W., Moss, D.S., Tho f protein structures. Journal of program to check the stereochemical quality 0 Applied Crystallography 26:283-291. 146 Leach, AR. 1994. Ligand docking to proteins with discrete side-chain flexibility. J. Mol. Biol. 235:345-356. Leach, AR. and Lemon, AP. 1998. Exploring the conformational space of protein side chains using dead-end elimination and the A* algorithm. Proteins 33:227-239. Lengauer, T. and Rarey, M. 1996. Computational methods for biomolecular docking. Curr. Opin. Struct. Biol. 62402-406. Li, D., Levy, L.A., Gabel, S.A., Lebetkin, M.S., DeRose, E.F., Wall, M.J., Howell, E.E., London, RE. 2001. Interligand Overhauser effects in type II dihydrofolate reductase. Biochemistry 40:4242-4252. Lin, J.H., Perryman, A.L., Schames, J.R., McCammon, J.A. 2002. Computational drug design accommodating receptor flexibility: The relaxed complex scheme. J. Am. Chem. Soc. 124:5632-5633. Lovell, S.C., Word, J .M., Richardson, J.S., Richardson, DC. 2000. The penultimate rotamer library. Proteins 40:389-408. Ma, 3., Shatsky, M., Wolfson, H.J., Nussinov, R. 2002. Multiple diverse ligands binding at a single protein site: A matter of pre-existing populations. Protein Sci. 11:184- 197. Mader, M.M. and Bartlett, RA. 1997. Binding energy and catalysis: The implications for transition-state analogs and catalytic antibodies. Chem. Rev. 97: 1281-1302. Massova, 1., Martin, P., Bulychev, A., Kocz, R., Doyle, M., Edwards, B.F., Mobashery, S. 1998. Templates for design of inhibitors for serine proteases: Application of the program DOCK to the discovery of novel inhibitors for thrombin. Bioorg. Med. Chem. Lett. 8:2463-2466. Matsuda, S. and Koyasu, S. 2000. Mechanisms of action of cyclosporin. Immunopharmacology 4 7: 1 19-125. Matsumura, M., Fremont, D.H., Peterson, P.A., Wilson, LA. 1992. Emerging principles for the recognition of peptide antigens by MHC class I molecules. Science 25 7:927-934. McDonald, 1K. and Thornton, J. M. 1994. Satisfying hydrogen bonding potential in proteins. J. Mol. Biol. 238. 777- 793. Mestres, J., Rohrer, D.C., Maggiora, GM. 1997. MIMIC: A molecular-field matching program. Exploiting applicability of molecular similarity approaches. J. Comp. Chem. 18:934-954. 147 Mitchell, J .80., Laskowski, R.A., Alex, A., Thornton, J.M. 1999. BLEEP - potential of mean force describing protein-ligand interactions: 1. Generating potential. J. Computat. Chem. 20: 1 177-1 185. Miyarnoto, S. and Kollman, PA. 1993. Absolute and relative binding free energy calculations of the interaction of biotin and its analogs with streptavidin using molecular dynamics/free energy perturbation approaches. Proteins 16:226-245. Moreno, E. and Leon, K. 2002. Geometric and chemical patterns of interaction in protein- -ligand complexes and their application in docking. Proteins 47: 1-13. Morris, A.L., MacArthur, M.W., Hutchinson, E.G., Thornton, J .M. 1992. Stereochernical quality of protein structure coordinates. Proteins 12:345-364. Morris, G.M., Goodsell, D.S., Halliday, R.S., Huey, R., Hart, W.E., Belew, R.K., Olson, A.J. 1998. Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy function. J. Comp. Chem. 19:1639-1662. Muegge, I. and Martin, Y.C. 1999. A general and fast scoring function for protein-ligand interactions: A simplified potential approach. J. Med. Chem. 42:791-804. Murray, C.W., Baxter, C.A., Frenkel, AD. 1999. The sensitivity of the results of molecular docking to induced fit effects: Application to thrombin, thermolysin and neuraminidasc. J. Comput. Aided Mal. Des 13:547-562. Najmanovich, R., Kuttner, J ., Sobolev, V., Edelman, M. 2000. Side-chain flexibility in proteins upon ligand binding. Proteins 39:261-268. Narayana, N., Matthews, D.A., Howell, E.E., Xuong, NH. 1995. A plasmid-encoded dihydrofolate reductase from trimethoprim-resistant bacteria has a novel D2- syrnmetric active site. Nat. Struct. Biol. 2:1018-1025. O'Brien, P.J. and Herschlag, D. 1999. Catalytic promiscuity and the evolution of new enzymatic activities. Chem. Biol. 6:R91-R105. Oakley, A.J., Lo, B.M., Nuccetelli, M., Mazzetti, A.P., Parker, M.W. 1999. The ligandin (non-substrate) binding site of human Pi class glutathione transferase is located in the electrophile binding site (H-site). J. Mol. Biol. 291 :913-926. Oakley, A.J., Lo, B.M., Ricci, G., Federici, G., Parker, M.W. 1998. Evidence for an induced-fit mechanism operating in pi class glutathione transferases. Biochemistry 37:9912-9917. Oakley, A.J., Rossjohn, J., Lo, B.M., Caccuri, A.M., Federici, G., Parker, M.W. 1997. The three-dimensional structure of the human Pi class glutathione transferase Pl- 1 in complex with the inhibitor ethacrynic acid and its glutathione conjugate. Biochemistry 36:576-585. 148 Osterberg, F ., Morris, G.M., Sanner, M.F., Olson, A.J., Goodsell, BS. 2002. Automated docking to multiple target structures: Incorporation of protein mobility and structural water heterogeneity in AutoDock. Proteins 46:34-40. Ottiger, M., Zerbe, O., Guntert, P., Wutlnich, K. 1997. The NMR solution conformation of unligated human cyclophilin A. J. Mal. Biol. 272:64-81. Park, H., Bradrick, T.D., Howell, BE. 1997. A glutamine 67 —) histidine mutation in homotetrameric R67 dihydrofolate reductase results in four mutations per single active site pore and causes substantial substrate and cofactor inhibition. Protein Eng 10:1415-1424. Pearlman, DA. and Charifson, PS. 2001. Are free energy calculations useful in practice? A comparison with rapid scoring functions for the p38 MAP kinase protein system. J. Med. Chem. 44:3417-3423. Prade, L., Huber, R., Manoharan, T.H., Fahl, W.E., Renter, W. 1997. Structures of class pi glutathione S-transfcrase from human placenta in complex with substrate, transition-state analogue and inhibitor. Structure. 5: 1287-1295. Quiocho, F.A., Spurlino, J.C., Rodseth, LE. 1997. Extensive features of tight oligosaccharide binding revealed in high-resolution structures of the maltodextrin transport/chemosensory receptor. Structure. 5:997-101 5. Rader, A.J., Hespenheide, B.M., Kuhn, L.A., Thorpe, M.F. 2002. Protein unfolding: Rigidity lost. Proc. Natl. Acad. Sci. U. S. A 99:3540-3545. Ramachandran, G.N. and Sasisekharan, V. 1968. Conformation of polypeptides and proteins. Adv. Protein Chem. 23:283-438. Raymer, M.L., Sanschagrin, P.C., Punch, W.F., Venkataraman, S., Goodman, E.D., Kuhn, LA. 1997. Predicting conserved water-mediated and polar ligand interactions in proteins using a K-nearest-neighbors genetic algorithm. J. Mal. Biol. 265 :445-464. Reece, L.J., Nichols, R., Ogden, R.C., Howell, EB. 1991. Construction of a synthetic gene for an R-plasmid-encoded dihydrofolate reductase and studies on the role of the N-terminus in the protein. Biochemistry 30:10895-10904. Rognan, D., Lauemoller, S.L., Holm, A., Buus, S., Tschinke, V. 1999. Predicting binding affinities of protein ligands fi'orn three-dimensional models: Application to peptide binding to class I major histocompatibility proteins. J. Med. Chem. 42:4650-4658. Ruppert, J ., Welch, W., Jain, A.N. 1997. Automatic identification and representation of protein binding sites for molecular docking. Protein Sci. 6:524-533. Sadowski, J. and Gasteiger, J. 1993. From atoms and bonds to three-dimensional atomic coordinates: Automatic model builders. Chem. Rev. 93:2567-2581. 149 Sandak, B., Wolfson, H.J., Nussinov, R. 1998. Flexible docking allowing induced fit in proteins: Insights from an open to closed conformational isomers. Proteins 32:159-174. Sanschagrin, RC. and Kuhn, LA. 1998. Cluster analysis of consensus water sites in thrombin and trypsin shows conservation between serine proteases and contributions to ligand specificity. Protein Sci. 7 :2054-2064. Saphire, A.C., Bobardt, M.D., Gallay, RA. 2000. Human immunodeficiency virus type 1 hijacks host cyclophilin A for its attachment to target cells. Immunal. Res. 21 :21 1- 217. Schaffer, L. and Verkhivker, GM. 1998. Predicting structural effects in HIV-1 protease mutant complexes with flexible ligand docking and protein side-chain optimization. Proteins 33:295-310. Schapira, M., Raaka, B.M., Samuels, H.H., Abagyan, R. 2000. Rational discovery of novel nuclear hormone receptor antagonists. Proc. Natl. Acad. Sci. U. S. A 97:1008-1013. Schapira, M., Raaka, B.M., Samuels, H.H., Abagyan, R. 2001. In silico discovery of novel retinoic acid receptor agonist structures. BMC. Struct. Biol. 1:1-7. Schapira, M., Totrov, M., Abagyan, R. 1999. Prediction of the binding energy for small molecules, peptides and proteins. J. Mal. Recognit. 12:177-190. Schnecke, V. and Kuhn, LA. 1999. Database screening for HIV protease ligands: The influence of binding-site conformation and representation on ligand selectivity. Proc. Int. Canf Intell. Syst. Mal. Biol. 242-251. Schnecke, V. and Kuhn, LA. 2000. Virtual screening with salvation and ligand-induced complementarity. Perspectives in Drug Discovery and Design 20:171-190. Schnecke, V., Swanson, C.A., Getzoff, E.D., Tainer, J.A., Kuhn, LA. 1998. Screening a peptidyl database for potential ligands to proteins with side-chain flexibility. Proteins 33 :74-87. Schneider, G. and Bohm, HI. 2002. Virtual screening and fast automated docking methods. Drug Discov. T aday 7:64-70. Shoichet, B.K. and Kuntz, ID. 1993. Matching chemistry and shape in molecular docking. Protein Eng 6:723-732. Shoichet, B.K., McGovern, S.L., Wei, 3., Irwin, J.J. 2002. Lead discovery using molecular docking. Curr. Opin. Chem. Biol. 6:439-446. Shoichet, B.K., Stroud, R.M., Santi, D.V., Kuntz, I.D., Perry, KM. 1993. Structure-based discovery of inhibitors of thymidylate synthase. Science 259: 1445-1450. 150 Sleigh, S.H., Seavers, P.R., Wilkinson, A.J., Ladbury, J.E., Tame, JR. 1999. Crystallographic and calorimetric analysis of peptide binding to OppA protein. J. Mol. Biol. 291:393-415. Smithrud, DB. and Benkovic, SJ. 1997. The state of antibody catalysis. Curr. Opin. Biatechnal. 8:459-466. Sobolev, V., Wade, R.C., Vriend, G., Edelman, M. 1996. Molecular docking using surface complementarity. Proteins 25 : 120- l 29. Sotriffer, C.A., Gohlke, H., Klebe, G. 2002. Docking into knowledge-based potential fields: A comparative evaluation of DrugScore. J. Med. Chem. 45: 1967-1970. Stahl, M. and Rarey, M. 2001. Detailed analysis of scoring functions for virtual screening. J. Med. Chem. 44:1035-1042. Strader, M.B., Smiley, R.D., Stinnett, L.G., VerBerkmoes, N.C., Howell, EB. 2001. Role of S65, Q67, I68, and Y69 residues in homotetrameric R67 dihydrofolate reductase. Biochemistry 40:1 1344-1 1352. Strynadka, N.C., Jensen, S.E., Johns, K., Blanchard, H., Page, M., Matagne, A., Frere, J.M., James, MN. 1994. Structural and kinetic characterization of a beta- lactarnase-inhibitor protein. Nature 368:657-660. Stubbs, M.T. and Bode, W. 1993. A player of many parts: The spotlight falls on thrombin's structure. T hramb. Res. 69: 1-58. Stubbs, M.T. and Bode, W. 1995. The clot thickens: Clues provided by thrombin structure. Trends Biochem. Sci. 20:23-28. Tame, J .R. 1999. Scoring functions: A view from the bench. J. Comput. Aided Mal. Des 13:99-108. Taylor, J.S. and Burnett, RM. 2000. DARWIN: A program for docking flexible molecules. Proteins 41:173-191. Thorpe, M.F., Hespenheide, B.M., Yang, Y., Kuhn, LA. 2000. Flexibility and critical hydrogen bonds in cytochrome c. Pac. Symp. Biacamput. 191-202. Thorpe, M.F. and Lei, M. 2003. Macromolecular flexibility. Philosophical Magazine Special Edition: 1-8. Thorpe, M.F., Lei, M., Rader, A.J., Jacobs, D.J., Kuhn, LA. 2001. Protein flexibility and dynamics using constraint theory. J. Mal. Graph. Model. 19:60-69. Tuffery, P., Etchebest, C., Hazout, S., Lavery, R. 1991. A new approach to the rapid determination of protein side chain conformations. J. Biamal. Struct. Dyn. 8:1267-1289. 151 Tulinsky, A. 1996. Molecular interactions of thrombin. Semin. Thramb. Hemast. 22:117- 124. Verdonk, M.L., Cole, J.C., Watson, P., Gillet, V., Willett, P. 2001. SuperStar: Improved knowledge-based interaction fields for protein binding sites. J. Mal. Biol. 307:841-859. Vriend, G. 1990. What If - A Molecular Modeling and Drug Design Program. Journal of Molecular Graphics 8:52-56. Wallace, A.C., Laskowski, R.A., Thornton, J .M. 1995. LIGPLOT: A program to generate schematic diagrams of protein-ligand interactions. Protein Eng 8: 127-134. Wang, R., Lai, L., Wang, S. 2002. Further development and validation of empirical scoring fiinctions for structure-based binding affinity prediction. J. Comput. Aided Mal. Des 16:11-26. Waszkowycz, B. 2002. Structure-based approaches to drug design and virtual screening. Curr. Opin. Drug Discov. Devel. 5 :407-413. Wu, Y.D. and Houk, K.N. 1987. Theoretical transition structures for hydride transfer to methyleneiminium ion from methylamine and dihydropyridine. On the nonlinearity of hydride transfers. J. Am. Chem. Soc. 109:2226-2227. Zavodszky, M.l., Sanschagrin, P.C., Korde, R.S., Kuhn, LA. 2003. Distilling the essential features of a protein surface for improving protein-1i gand docking, scoring, and virtual screening. J. Comput. Aided Mal. Des. in press. 152 IIllllll[IllilljljllllflllsII