Michigan State University

This is to certify that the dissertation entitled STRUCTURE-BASED LIGAND SCREENING AND DESIGN FOR AMINOACYL-tRNA SYNTHETASE INHIBITORS presented by SAI CHETAN K. SUKURU has been accepted towards fulfillment of the requirements for the Ph.D. degree in Biochemistry and Molecular Biology.

STRUCTURE-BASED LIGAND SCREENING AND DESIGN FOR AMINOACYL-tRNA SYNTHETASE INHIBITORS

By Sai Chetan K. Sukuru

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Department of Biochemistry and Molecular Biology

2007

ABSTRACT

STRUCTURE-BASED LIGAND SCREENING AND DESIGN FOR AMINOACYL-tRNA SYNTHETASE INHIBITORS

By Sai Chetan K. Sukuru

Asparaginyl-tRNA synthetase (AsnRS) is a rational target for drug development against lymphatic filariasis caused by the human parasite, Brugia malayi. This thesis describes the application of structure-based computational techniques to identify novel Brugia AsnRS inhibitors. A new method, developed to incorporate specificity determinants in virtual ligand screening and design, is also presented. Large databases of organic molecules were screened using SLIDE to identify potential inhibitors of the ATP binding site in a 1.9 Å resolution Brugia AsnRS crystal structure. SLIDE is a structure-based virtual screening tool that models the flexibility of protein and ligand side chains while docking. Seven new classes of compounds, identified by SLIDE, were confirmed as Brugia AsnRS inhibitors in experimental assays.
Analogs of variolin B, one of the inhibitors, showed 3- to 8-fold selectivity for Brugia over human AsnRS. These analogs, unlike variolin B, cannot bind to the closed conformation of the Brugia AsnRS crystal structure due to steric clashes. Modeling of the different main-chain loop flexibility in Brugia and human AsnRS, which arises from a single amino acid sequence difference at the base of the loop, can explain the selectivity of these analogs.

Variolin B and triazinylamine, the most potent inhibitors in experimental assays, are predicted to bind in the adenosyl pocket of Brugia AsnRS. To optimize these compounds and enhance their binding affinity and specificity for the enzyme, analogs with different substituents were designed. The energetic favorability of the designed analogs was assessed using predicted protein-ligand complementarity scores and the difference in the ligand internal (conformational) energies between their bound and lowest-energy free conformations. Analogs with maximal shape and chemical complementarity with the Brugia AsnRS binding site and minimal strain were identified for chemical synthesis by our collaborators.

Determining the significant differences between protein binding sites is key to designing drugs that are selective for one protein relative to another. Here, an approach is presented, based on SLIDE’s calculation of points at which ligand atoms can make chemically favorable interactions with the protein, to automatically compare binding sites and identify their significant similarities and differences with respect to favored ligand interactions. Application of this method to Brugia AsnRS and a set of ATP-binding proteins reveals novel chemical and steric differences in their binding sites, which can guide future structure-based drug design efforts against the parasite.

ACKNOWLEDGEMENTS

I would like to first thank my advisor Dr. Leslie Kuhn for her patient guidance throughout my graduate school career.
Her infectious energy and enthusiasm in teaching science inspired me to work harder. I consider myself fortunate to be mentored by her and aspire to become a competent, compassionate and responsible scientist like her. Working in her lab offered me great opportunities not only to train myself in different aspects of drug design but also to meet excellent scientific collaborators. I couldn’t have asked for a better collaborator cum teacher than Dr. Michael Kron. His passion for infectious diseases research and generous appreciation of diverse approaches to the problem encouraged me to contribute actively to the project. I would also like to sincerely thank the other members of my guidance committee, Dr. Robert Cukier, Dr. Michael Feig, Dr. Robert Hausinger and Dr. Rawle Hollingsworth. Their timely advice and critical assessment have helped me keep on track and improved the quality of the research. I would also like to extend an enthusiastic thank you to other collaborators on the AsnRS project, Dr. Jonathan Morris, Dr. Morten Grotli, Dr. Stephen Cusack, Dr. Thibaut Crepin, Dr. Frank Danel and Dr. Malcolm Page. The research work that I have completed over the years would not have been possible without the help of many current and former members of the Kuhn lab, with whom I have had the pleasure to work. I would like to wholeheartedly thank Matthew Tonero, Sandeep Namilikonda, Jeff Van Voorst, Dr. Maria Zavodszky, Anjali Rohatgi, Sameer Arora, Litian He, Erica Scheller and Andrew Stumpff-Kane. Although I have never worked with them, I would like to thank Dr. Paul Sanschagrin, Dr. A. J. Rader, Dr. Brandon Hespenheide and Dr. Ming Lei for their helpful advice. I will also cherish my friendship with many current and former graduate students from other labs in the department. Of particular note are Dr. Lishan Yao, Dr. Dean Shooltz, Dr. Harini Krishnamurthy, Josh Kwekel, James Johnson, Colleen Doherty, Joonyul Kim, Sean Law and Soledad Quiroz.
Helen Geiger, Julie Oesterle, Lesley Reed, Jessica Hudson, Melinda Kochenderfer and Dr. K Padmanabhan deserve special thanks for their timely help at various points during the course of my graduate school career. I would like to extend special thanks to my friends Tejas Kadakia, Soumya Korrapati, Rahul Datta, Sumohan Misra, Dr. RG Iyer, Dr. Nan Ding, Dr. Priya Mani and Dr. Gauri Jawdekar for all the support and good times that made my stay in East Lansing really memorable. Finally, I would like to thank my family for their overwhelming affection and encouragement. I want to thank my mom, dad, uncles, aunts, cousins, nephews and nieces for their belief in me and prayers for my success.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
Chapter 1: Introduction
1.1 Screening the chemical space
1.2 Structure-based screening and docking
1.3 Modeling flexibility in ligand binding
1.4 Defining binding site specificity determinants
1.5 Motivation for this thesis work
References
Chapter 2: Discovering new classes of Brugia malayi asparaginyl-tRNA synthetase inhibitors and relating specificity to conformational change
2.1 Abstract
2.2 Introduction
2.3 Materials and methods
2.3.1 Asparaginyl-tRNA synthetase structures
2.3.2 Experimental assay
2.3.3 Screening and docking with SLIDE
2.3.4 Scoring protein-ligand interactions
2.3.5 Modeling main-chain flexibility
2.4 Results
2.4.1 Scoring Brugia AsnRS-ligand interactions
2.4.2 Screening the databases
2.4.3 Modeling the conformational flexibility of Brugia AsnRS
2.4.4 Impact of main-chain conformational flexibility on ligand binding: interpreting the observed affinities and specificities
2.5 Discussion
2.6 Conclusions
References
Chapter 3: Optimizing variolin B and triazinylamine to improve their binding affinity and specificity for Brugia AsnRS
3.1 Introduction
3.2 Methods
3.2.1 Designing new analogs and generating their structures
3.2.2 Scoring the interactions between designed analogs and Brugia AsnRS
3.2.3 Assessing ligand internal energies
3.3 Results and discussion
3.3.1 Variolin B analogs
3.3.2 Triazinylamine analogs
3.4 Conclusions
References
Chapter 4: Automated shape and chemistry comparison for defining binding site invariants and specificity determinants
4.1 Abstract
4.2 Introduction
4.3 Materials and methods
4.3.1 Representing the protein binding sites
4.3.2 Superposition to bring the templates in the same reference
4.3.3 Complete-linkage clustering to identify the shared interaction sites
4.3.4 Post-clustering processing to identify similar and chemical difference sites
4.3.5 Relative significance of the chemical difference sites
4.3.6 Clustering sensitivity to superpositional accuracy
4.3.7 Steric difference sites
4.3.8 Datasets
4.4 Results
4.4.1 Chemical difference sites: Explaining the observed experimental selectivity of ligands bound to proteins of the AffinDB set
4.4.2 Similar sites identified in the ATP-set proteins
4.4.3 Chemical difference sites identified in protein pairs of the ATP set
4.4.4 Steric difference sites identified between AsnRS and phosphorylase kinase
4.5 Discussion
4.5.1 Relative significance of chemical difference sites
4.5.2 Integrating our method into virtual screening protocol
4.5.3 Conformational flexibility and chemical difference sites
4.5.4 Superpositional accuracy
4.6 Conclusions
References
Chapter 5: Summary and future directions
5.1 Virtual screening for aminoacyl-tRNA synthetase inhibitors
5.1.1 Summary and perspective
5.1.2 Future directions
5.2 Using specificity determinants in virtual screening
5.2.1 Summary and perspective
5.2.2 Future directions
References

LIST OF TABLES

2.1 Data collection and refinement statistics of Brugia AsnRS crystal structures
2.2 Predicted AsnRS-inhibitor complementarity scores and experimentally determined affinity values of known ligands of Brugia AsnRS
2.3 Predicted AsnRS-inhibitor complementarity scores and experimentally determined affinity values of SLIDE-predicted inhibitors of Brugia AsnRS
2.4 Predicted AsnRS-inhibitor complementarity scores and experimentally determined affinity values of synthesized analogs of SLIDE-discovered Brugia AsnRS inhibitors variolin B and cycloadenosine
2.5 Known flexible regions in Brugia AsnRS
3.1 Designed analogs of variolin B with sulfamoyl-asparagine (S-ASN) attached to two different positions on the variolin scaffold
3.2 Predicted protein-ligand complementarity scores and difference in ligand internal energies of docked orientations of designed variolin analogs compared with known ligands
3.3 Predicted protein-ligand complementarity scores and difference in ligand internal energies of docked orientations of designed triazinylamine analogs
4.1 Proteins of the AffinDB set used in our analysis with their bound ligands and affinity data
4.2 Proteins of the ATP set used in our analysis
4.3 Significant chemical difference sites identified between protein pairs of the AffinDB set that can explain the known relative binding affinities of their bound ligands
4.4 Significant chemical difference sites identified between Brugia AsnRS and representative structures of other ATP-binding proteins

LIST OF FIGURES

2.1 Brugia AsnRS dimer and the three class II aminoacyl-tRNA synthetase (AARS) sequence motifs
2.2 The three residues that differ near the active sites of Brugia and human AsnRS and diverse ROCK-generated conformations of known flexible regions of Brugia AsnRS
2.3 Enrichment plot for the three different scoring functions: SLIDE score, DrugScore and X-Score
2.4 The predicted binding modes of seven SLIDE-discovered Brugia AsnRS inhibitors compared with the crystallographic binding mode of known ligand ASNAMS
2.5 The surface of closed and open Brugia AsnRS conformations compared with its ligand-free (apo) crystal structure. The adenine binding loop residue His 219 undergoes significant motion between the closed and open conformations. The steric clashes between the long side-chain variolin B derivative and the closed conformation of the adenine binding loop compared with its overlap-free docked orientation in the open conformation
3.1 The 2D structure of variolin B scaffold and its SLIDE-predicted binding mode
3.2 The 2D structure of triazinylamine, its SLIDE-predicted binding mode, and the truncated scaffold used to design substituents
3.3 Manually assessed binding modes of designed variolin B analogs I, II and III compared with the SLIDE-predicted binding mode of variolin B and ASNAMS bound in the crystal structure
3.4 Two manually assessed binding modes of designed variolin B analog IV compared with the SLIDE-predicted binding mode of variolin B and ASNAMS bound in the crystal structure.
For the attached asparagine side chain to be docked in the amino acid pocket, the pendant ring of analog IV has to be buried further in the ribose pocket of AsnRS binding site
3.5 The SLIDE-predicted binding mode of designed variolin B analog V is compared with the SLIDE-predicted binding mode of variolin B and ASNAMS bound in the crystal structure
3.6 The SLIDE-predicted binding modes of nine designed analogs of triazinylamine are compared with the SLIDE-predicted binding mode of variolin B and ASNAMS bound in the crystal structure
4.1 The steps in the algorithm to perform automated comparison of binding sites are briefly explained using a flowchart
4.2 The complete-linkage clustering algorithm is briefly explained using a hypothetical example
4.3 A plot to show the sensitivity of the clustering algorithm to superpositional accuracy
4.4 Significant chemical difference sites identified between protein pairs of the AffinDB set that can explain the known relative binding affinities of their bound ligands
4.5 Chemically similar sites identified for each of the four subsets of ATP-binding proteins
4.6 Significant chemical difference sites identified between Brugia AsnRS and representative structures of each of the other three classes of ATP-binding proteins
4.7 Steric difference sites, showing accessible space in Brugia AsnRS relative to a representative protein kinase structure, and vice-versa

Images in this thesis/dissertation are presented in color.

LIST OF ABBREVIATIONS

2D: 2-dimensional
3D: 3-dimensional
AARS: aminoacyl-tRNA synthetases
ASNAMS: asparaginyl sulfamoyl adenylate
AsnRS: asparaginyl-tRNA synthetase
ATP: adenosine triphosphate
CSD: Cambridge Structural Database
DS: DrugScore
HTS: high-throughput screening
LBHAMP: L-aspartate-β-hydroxamate adenylate
MD: molecular dynamics
MMFF: Merck Molecular Force Field
NCI: National Cancer Institute
PDB: Protein Data Bank
QSAR: quantitative structure-activity relationship
RMSD: root mean square deviation
ROCK: Rigidity Optimized Conformational Kinetics
SLIDE: Screening for Ligands by Induced-fit Docking Efficiently

Chapter 1

Introduction

1.1 Screening the chemical space

Screening of chemical compounds to discover inhibitors or agonists of proteins as drug targets is a dominant tool in the modern drug discovery process. In 1909, Paul Ehrlich and colleagues were the first to screen a few hundred compounds to discover a drug that binds to a certain ‘chemoreceptor’ of therapeutic interest (1-3). Advances in assay and instrument technologies, coupled with the use of combinatorial chemistry to produce diverse libraries of compounds, have significantly increased the compound coverage and throughput in screening methodologies (4).
Experimental high-throughput screening (HTS) can now screen between thousands and millions of compounds in an attempt to discover novel drug leads. However, despite its many successes, the method has its drawbacks. It is a complex and expensive method with declining productivity in terms of the money spent and the number of new drugs developed (5). It also provides no insights into the mode of interaction between the compounds and the protein target. The advent of computers and the technology to store chemical information enabled the creation of chemical databases and associated information retrieval systems. Virtual screening, analogous to HTS but less expensive, refers to screening of a large number of compounds stored in virtual libraries by computer instead of by experiment. In its earliest forms during the 1980s, virtual screening was widely used to find molecules in databases that were ‘similar’ to the query molecule. The central premise of these similarity search methods was that structurally similar molecules are more likely to exhibit similar biological activity (6, 7). These similarity methods could be differentiated from each other based on the way they represent the molecules and the way they calculate the similarity between the molecules (8-10). The most common representations are binary fingerprints, which encode molecular structures in a string of 0s and 1s (bits) that describe the presence or absence of a certain feature, e.g. a functional group. The Tanimoto coefficient (8) is a common similarity metric used to efficiently compare these binary strings. Molecular shape and electrostatics, in conjunction with structural fingerprints, have also been found to be important variables in similarity searching (11). However, all similarity methods come with an inherent risk of missing structurally diverse potential candidates in the database.
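As a minimal sketch of the fingerprint comparison described above, the following Python example computes the Tanimoto coefficient, i.e. the number of bits set in both fingerprints divided by the number of bits set in either one. The fingerprints, the compound names, and the convention of returning 0.0 when both fingerprints are empty are assumptions made for illustration only; they are not taken from any particular screening tool.

```python
def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto coefficient of two binary fingerprints stored as integers.

    Each bit marks the presence (1) or absence (0) of a structural feature.
    """
    shared = bin(fp_a & fp_b).count("1")   # features present in both molecules
    either = bin(fp_a | fp_b).count("1")   # features present in at least one
    if either == 0:
        return 0.0  # assumed convention for two empty fingerprints
    return shared / either


def rank_by_similarity(query: int, library: dict) -> list:
    """Return (name, similarity) pairs, most similar to the query first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical 6-bit fingerprints, for illustration only
query = 0b110100
library = {
    "compound_1": 0b100110,
    "compound_2": 0b110100,
    "compound_3": 0b001011,
}
ranking = rank_by_similarity(query, library)
```

In this toy library, compound_2 has the same fingerprint as the query and ranks first, while compound_3 shares no bits with it and ranks last. This also illustrates the risk noted above: an active compound with a very different fingerprint would be missed by such a ranking.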
Virtual screening works best in an information-rich environment, i.e., its ability to evaluate a large number of compounds automatically with a high degree of accuracy will be greatest in those cases where the most information is available (12). For example, in cases where high-resolution structural information about the target protein is available, success is more likely when this information is also taken into account. The advantages of and challenges to protein structure-based virtual screening are discussed in the next section.

1.2 Structure-based screening and docking

In 1902, Sir Archibald Garrod was the first to attribute a disease (alkaptonuria) to an enzyme defect, which he identified as an “inborn error of metabolism” (13). The biochemical evidence for his theory was provided fifty years later, when the enzyme responsible for the metabolic defect was identified (14). The elucidation of the structural and mechanistic details of the enzyme involved in the disease had to wait even longer (15), since the technology to determine protein structures was not developed until the 1960s. Myoglobin and hemoglobin were the first protein structures solved at atomic resolution, by Kendrew and Perutz (16, 17). Their work showed that it was possible to see how macromolecules bind to their ligands, providing insights into how they carry out their functions. The crystal structure of hemoglobin, a pharmacologically relevant target, enabled Goodford and colleagues to carry out the first reported example of ‘structure-based design’ (18). They used primitive physical models to find compounds similar to diphosphoglycerate, the allosteric effector of hemoglobin. Their approach had a significant impact on how chemists and molecular modelers viewed protein active sites and the possibility for rational design. This was followed by the development of an innovative molecular docking approach by Kuntz and colleagues (19) to dock ligands to binding sites of medically relevant proteins.
Molecular docking can be defined as the prediction of the binding orientation of small-molecule ligand candidates in protein binding sites. Kuntz’s classical algorithm for computational docking generates a set of spheres to describe the volume, or negative image, of the binding site and uses the centers of these spheres as sites for matching to ligand atoms. Sets of spheres representing the binding site are matched to sets of ligand atoms to generate a ligand orientation. Since then, the exponential growth in the number of protein structures deposited in the Protein Data Bank (PDB; (20)) has been accompanied by rapid growth and evolution of structure-based drug design methods over the last two decades (21-23). The structure-based screening and docking tool SLIDE (24-26), developed in our laboratory, was used for the work presented in this thesis. SLIDE represents the binding site of the target protein by a set of points collectively called a template. These template points identify the optimal positions for potential ligand atoms to form favorable hydrogen bond and hydrophobic interactions with the neighboring protein atoms. The ligand candidates in the database are similarly represented by a set of hydrogen bonding and hydrophobic interaction points, assigned to polar atoms and centers of hydrophobic atom clusters, respectively. A ligand candidate is docked after finding a feasible match between all possible triplets of its ligand interaction points and all geometrically and chemically compatible protein template triangles. Molecular docking is an integral part of protein structure-based virtual screening. For docking to be useful in a high-throughput mode, accuracy and speed in assessing orientations and chemical complementarity of the two molecules are key factors. Since these can be contradictory requirements, some simplifications (which will be discussed in later sections) need to be made to make docking tractable in virtual screening efforts.
Nevertheless, structure-based screening and docking methods have led to successful discoveries, such as the HIV protease inhibitor Viracept (27) and the anti-influenza drug Relenza (28), and to significantly higher hit rates (ligands discovered per molecules tested) than experimental HTS (29). While experimental validation of docking hits must always be done, docking provides a clear, testable hypothesis of how the molecules interact and intelligently filters a chemical database to focus on those molecules that are most complementary to the target protein structure. The fundamental challenges in structure-based screening and docking are sampling and scoring (30-32). Sampling relates to assessing various conformations and relative orientations of the flexible molecules (ligands as well as the target protein), while scoring relates to calculating the binding affinities between each docked ligand candidate and the target protein. Since the computational complexity of the problem increases exponentially with the number of degrees of freedom (bond rotations, molecular translations/rotations), modeling the flexibility of ligands is less difficult than modeling that of the target protein. Conformational sampling of the ligands is necessary during docking because it is usually not known which low-energy conformation interacts most favorably with the protein. This can be achieved by pre-computing a database of conformers for each compound to be screened. Examples of virtual screening tools that work with ligand conformer libraries are FRED (33) and SLIDE (34). Alternatively, docking programs can explore the conformational flexibility of the ligands as the docking proceeds, as done by a variety of docking tools, e.g. the genetic algorithm in GOLD (35), Monte Carlo methods in QXP (36) and incremental construction in FlexX (37) and GLIDE (38). Approaches to model protein flexibility during docking will be discussed in the next section of this chapter.
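To make the exponential growth in sampling cost concrete, the sketch below enumerates torsion-angle combinations for a molecule with a given number of rotatable bonds. The three torsion values per bond are an assumed, deliberately coarse sampling chosen for illustration; they are not the sampling scheme of any docking program named above.

```python
from itertools import product

# Assumed coarse sampling: three torsion values (degrees) per rotatable bond
TORSION_STATES = (60.0, 180.0, 300.0)


def conformer_count(n_rotatable_bonds: int,
                    states_per_bond: int = len(TORSION_STATES)) -> int:
    """Number of torsion combinations: states_per_bond ** n_rotatable_bonds."""
    return states_per_bond ** n_rotatable_bonds


def enumerate_conformers(n_rotatable_bonds: int):
    """Lazily yield every combination of torsion values, one tuple per conformer."""
    return product(TORSION_STATES, repeat=n_rotatable_bonds)


# A ligand with 5 rotatable bonds is cheap to enumerate and store...
ligand_conformers = list(enumerate_conformers(5))

# ...whereas 20 rotatable side-chain bonds in a binding site are not.
site_combinations = conformer_count(20)
```

Under these assumptions, the ligand yields 243 torsion combinations, which can be pre-computed and stored, whereas the 20 side-chain bonds yield about 3.5 billion, which is why protein flexibility is usually handled by more selective strategies than exhaustive enumeration.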
The ultimate purpose of a screening and docking tool is to provide a ranked list of compounds to be tested for biological activity. Therefore, once a binding orientation for a ligand candidate is generated by the docking program, it needs to be scored to rank the quality of the orientation, not only with respect to other possible orientations of the same compound but also with respect to other compounds in the database. Protein-ligand scoring functions can be broadly divided into three categories: a) force field-based, b) empirical, and c) knowledge-based. The force field-based methods, although generally more accurate, are computationally intensive and hence unsuitable for high-throughput molecular docking purposes. Structure-based screening and docking tools like QXP (36) employ force field-based scoring functions in a minimalist manner with no explicit solvent term. Empirical scoring functions, derived from fitting to known experimental binding affinities of different protein-ligand complexes, are widely employed by docking algorithms, e.g. PLP (27), FlexX (37), and SLIDE (26, 34). These scoring functions use an additive approximation to estimate the binding affinity and are usually composed of several terms corresponding to hydrogen bonding, hydrophobic interactions and, in some cases, interactions with metal ions. On the other hand, knowledge-based scoring functions use information from available structures of protein-ligand complexes to estimate the binding affinity based on potentials of mean force (e.g. PMF (39)) or preferred interatomic distances (e.g. DrugScore (40)). The different scoring functions may give individual docking programs an advantage in one aspect relative to another, but they are still far from perfect (32). No current scoring function is able to accurately estimate binding affinities across different protein classes.
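The additive form of an empirical scoring function, and its use for ranking both orientations and compounds, can be sketched as follows. The two terms, their weights, and the convention that a higher score is more favorable are hypothetical choices for illustration; a real empirical function fits its weights to measured binding affinities, and this sketch does not reproduce the parameters of any program cited above.

```python
from dataclasses import dataclass


@dataclass
class DockedPose:
    compound: str                # identifier of the ligand candidate
    n_hbonds: int                # protein-ligand hydrogen bonds in this pose
    hydrophobic_contacts: float  # hypothetical hydrophobic complementarity term


# Hypothetical weights, chosen only to make the arithmetic easy to follow
W_HBOND = 1.0
W_HYDROPHOBIC = 0.5


def additive_score(pose: DockedPose) -> float:
    """Weighted sum of interaction terms (higher is assumed more favorable)."""
    return W_HBOND * pose.n_hbonds + W_HYDROPHOBIC * pose.hydrophobic_contacts


def rank_compounds(poses: list) -> list:
    """Keep the best-scoring pose per compound, then rank the compounds."""
    best = {}
    for pose in poses:
        score = additive_score(pose)
        if pose.compound not in best or score > best[pose.compound]:
            best[pose.compound] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


poses = [
    DockedPose("A", n_hbonds=2, hydrophobic_contacts=4.0),
    DockedPose("A", n_hbonds=3, hydrophobic_contacts=1.0),
    DockedPose("B", n_hbonds=4, hydrophobic_contacts=2.0),
]
ranked = rank_compounds(poses)
```

The first loop compares alternative orientations of the same compound, and the final sort compares compounds with each other, which mirrors the two levels of ranking described above.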
Consensus scoring by combining several scoring functions has been suggested to overcome their individual deficiencies and enhance the hit rates (41). The empirical scoring function in SLIDE is a weighted sum of hydrophobic and hydrogen-bonding interaction terms trained to match binding affinity values in known protein-ligand complexes (26). For the work presented here, I have used SLIDE's internal scoring function, DrugScore (40), and X-Score, another empirical scoring function (42), focusing on assessing which scoring function, or combination thereof, performed best for the particular protein target.

1.3 Modeling protein flexibility in ligand binding

Virtual screening and docking experiments have led to the identification of many bioactive compounds that, although very different from the natural ligands for a given protein, bind to the active site as predicted (43). The recognition of structurally diverse ligands by the same binding site is facilitated by protein flexibility in ligand binding. The evidence of conformational rearrangements in the protein leading to the binding of structurally diverse ligands is substantial (44, 45). From a structure-based drug design perspective, incorporating protein flexibility into the docking algorithm should enhance the diversity of lead compounds with desired bioactivity (46, 47). Protein conformational changes induced upon ligand binding can range from the local rotation of a few side chains to whole domain rearrangements (48, 49). In general, two broad schemes have been employed to model protein flexibility in structure-based screening and docking methods. First, an ensemble of protein conformations is obtained from multiple sources, e.g. multiple crystal or NMR structures of the same protein, molecular dynamics simulations or homology models. A ligand candidate is then docked to an average (50, 51), the most conserved (52), or all of these protein conformations (53).
Second, the protein conformation is allowed to change during the docking process, either by rotating the optimal side-chain torsional angles (34, 54) or by using a rotamer library to represent the preferred side-chain orientations (55, 56). Deriving a high-affinity lead compound that specifically binds to an alternate conformation of the target protein is a highly attractive strategy in structure-based drug design. However, the existing approaches are limited either by availability of experimentally solved structures or by the ability of algorithms to sample large scale motions involving the protein backbone. The screening and docking tool SLIDE (26, 34), used for the work presented here, models flexibility of protein side chains during docking. It resolves steric overlaps between the docked ligand and the protein through minimal directed rotation, determined by mean-field optimization, of the rotatable bonds in protein and ligand side chains (34, 49). To model the main-chain flexibility of our target protein and assess its impact on ligand binding, we used a graph-theoretic algorithm ProFlex (57, 58) to first predict the flexible and rigid regions in the protein structure, and then search the conformational space available to those flexible regions using a restricted random-walk sampling algorithm ROCK (Rigidity Optimized Conformational Kinetics) (59, 60). ProFlex predicts the protein flexibility based on the analysis of the constraints posed by the protein’s network of covalent bonds and non-covalent interactions including hydrogen bonds, salt bridges and hydrophobic interactions. Protein conformers, generated by sampling dihedral angles in ROCK, are either accepted or rejected, depending upon whether they maintain the non-covalent bond network and have no van der Waals overlaps between atoms. 
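The accept/reject logic of a restricted random walk over dihedral angles can be sketched as follows. This is a toy illustration under stated assumptions, not the ROCK implementation: the `is_valid` predicate is a hypothetical stand-in for the real checks (preservation of the non-covalent bond network and absence of van der Waals overlaps), and the helper `stays_near` simply limits each angle to a window around its starting value.

```python
import random


def restricted_random_walk(start, is_valid, n_steps=1000, max_step=10.0, seed=0):
    """Sample dihedral-angle vectors, keeping only trial moves that pass is_valid.

    start    : sequence of dihedral angles in degrees
    is_valid : callable(angles) -> bool, a stand-in for structural constraints
    Returns the list of accepted conformers, beginning with the start state.
    """
    rng = random.Random(seed)
    current = list(start)
    accepted = [tuple(current)]
    for _ in range(n_steps):
        trial = list(current)
        i = rng.randrange(len(trial))            # perturb one dihedral at a time
        moved = trial[i] + rng.uniform(-max_step, max_step)
        trial[i] = (moved + 180.0) % 360.0 - 180.0   # wrap into [-180, 180)
        if is_valid(trial):                      # reject moves that break constraints
            current = trial
            accepted.append(tuple(current))
    return accepted


def stays_near(reference, tolerance=30.0):
    """Toy constraint: every angle must stay within `tolerance` of its reference."""
    def check(angles):
        return all(abs(a - r) <= tolerance for a, r in zip(angles, reference))
    return check


if __name__ == "__main__":
    start = (-60.0, 140.0)
    conformers = restricted_random_walk(start, stays_near(start), n_steps=200)
    print(len(conformers), "conformers accepted")
```

Rejected trial moves leave the walk at its current conformer, so every structure in the returned list satisfies the constraints by construction.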
1.4 Defining binding site specificity determinants

Even some of the most potent bioactive compounds discovered by structure-based screening are limited by their inability to discriminate between the target protein and its homologs or other related proteins. The problem of specificity has plagued many common drug targets like serine proteases (61), nuclear receptors (62) and matrix metalloproteinases (63). In the case of protein kinases, the problem is more acute, as some of the promising ATP-site inhibitors, including those that were approved or were in clinical development, have been reported to have poor specificity profiles (64). Most of the methods used to model the specificity of drug candidates compare models of protein binding sites. Kastenholz and co-workers analyzed several binding site models, generated by the program GRID (65), using consensus principal component analysis to obtain contour plots identifying regions that are important for specificity in the chosen target protein (66). GRID generates molecular interaction fields to identify the energetically favorable sites for ligands to bind to a protein. The GRID/PCA technique was adapted in a different way by Braiuca and co-workers to partially account for protein flexibility, allowing the prediction of selectivity differences caused by amino-acid residue differences not only in the active site but also in regions that do not interact directly with the ligand (67). Sheridan and co-workers developed a mathematically simpler method, FLOGTV (68), that uses the trend vector paradigm to compare the binding site field maps, generated by the program FLOG (69), to visualize the differences between closely related proteins that have been superimposed in a reasonable way.
Deng and co-workers used hierarchical clustering to analyze interaction fingerprints, which reduce the three-dimensional (3D) structural binding information of protein-ligand complexes to corresponding one-dimensional binary strings, to identify similarities and diversities between their small-molecule binding interaction patterns (70). By modifying the virtual screening protocol to introduce essential protein-ligand interactions, identified by prior findings, as filters during the docking stage, Perola reported a significant reduction in the false positive rates in kinase virtual screens (71). However, these methods are handicapped by their reliance on known ligands of the target protein, and their results cannot be easily integrated into other structure-based screening and docking protocols. A new method to identify specificity determinants in one protein relative to another has been developed for the work presented here. This method uses complete-linkage clustering to identify significant similarities and differences between SLIDE-generated templates representing protein binding sites. The results generated by this method can be incorporated into our structure-based screening protocol to identify prospective ligands specific to a target protein.

1.5 Motivation for this thesis work

Lymphatic filariasis is caused by the parasitic nematode worms Wuchereria bancrofti and Brugia malayi. It is a debilitating human disease that afflicts more than 200 million people worldwide. More than a billion people reside in areas where the disease is actively transmitted, making it one of the top ten tropical diseases being targeted by the World Health Organization (72, 73). In its most obvious manifestation, lymphatic filariasis causes enlargement of the entire leg or arm, the genitals, vulva and breasts. The crippling physical effects of the disease have a huge social and economic impact.
Existing drugs to combat the disease, discovered decades ago as chemotherapeutic agents, are deficient because of their inability to kill adult worms, severe side effects, long treatment durations and the emergence of drug-resistant strains in humans (72, 74). Brugia malayi asparaginyl-tRNA synthetase (AsnRS) has been acknowledged as a rational target for drug development against filariasis (72, 75). This dissertation presents the discovery of new inhibitors of Brugia AsnRS using structure-based ligand screening and design techniques. Chapter 2 describes the discovery of seven new classes of Brugia AsnRS inhibitors using SLIDE (25, 26, 34), our virtual screening tool capable of modeling the flexibility of protein and ligand side chains during docking. The discovery of these inhibitors was a result of our collaboration with a parasitologist and biochemist, Dr. Michael Kron at Medical College of Wisconsin; a crystallographer, Dr. Stephen Cusack at EMBL Grenoble in France; medicinal chemists, Dr. Jonathan Morris at University of Adelaide in Australia and Dr. Morten Grotli at University of Goteborg in Sweden; and biochemists, Dr. Frank Danel and Dr. Malcolm Page at Basilea Pharmaceutica in Switzerland. The sampling of the active-site loop motions, using our restricted random-walk algorithm ROCK (59, 60), not only allows the modeling of protein conformational flexibility in ligand binding but also provides a potent tool for exploiting it in structure-based screening and design of species-selective inhibitors. Chapter 3 describes the optimization of two Brugia AsnRS inhibitors to design and identify analogs, with improved affinity and specificity for the target protein, for chemical synthesis by our medicinal chemistry collaborators. Our efforts to optimize Brugia AsnRS inhibitors, to improve their affinity and selectivity for the target protein relative to human AsnRS or other proteins, have motivated the development of an automated binding site comparison tool.
Chapter 4 describes the algorithm for the automated shape and chemistry comparison for defining binding site invariants and specificity determinants. The application of this algorithm to Brugia AsnRS and other ATP-binding proteins reveals novel binding site differences which can be used as effective filters in our virtual screening protocol.

References

(1) Drews, J. (2000) Drug discovery: a historical perspective. Science 287, 1960-4.
(2) Schwartz, R. S. (2004) Paul Ehrlich's magic bullets. N Engl J Med 350, 1079-80.
(3) Riethmiller, S. (2005) From Atoxyl to Salvarsan: searching for the magic bullet. Chemotherapy 51, 234-42.
(4) Hertzberg, R. P., and Pope, A. J. (2000) High-throughput screening: new technology for the 21st century. Curr Opin Chem Biol 4, 445-51.
(5) Fox, S., Farr-Jones, S., Sopchak, L., Boggs, A., and Comley, J. (2004) High-throughput screening: searching for higher productivity. J Biomol Screen 9, 354-8.
(6) Sheridan, R. P., and Kearsley, S. K. (2002) Why do we need so many chemical similarity search methods? Drug Discov Today 7, 903-11.
(7) Martin, Y. C., Kofron, J. L., and Traphagen, L. M. (2002) Do structurally similar molecules have similar biological activity? J Med Chem 45, 4350-8.
(8) Willett, P., Barnard, J. M., and Downs, G. M. (1998) Chemical similarity searching. J Chem Inf Comp Sci 38, 983-996.
(9) Bajorath, J. (2001) Selected concepts and investigations in compound classification, molecular descriptor analysis, and virtual screening. J Chem Inf Comp Sci 41, 233-245.
(10) Schnecke, V., and Bostrom, J. (2006) Computational chemistry-driven decision making in lead generation. Drug Discov Today 11, 43-50.
(11) Nicholls, A., MacCuish, N. E., and MacCuish, J. D. (2004) Variable selection and model validation of 2D and 3D molecular descriptors. J Comput Aided Mol Des 18, 451-74.
(12) Walters, W. P., Stahl, M. T., and Murcko, M. A. (1998) Virtual screening - an overview. Drug Discov Today 3, 160-178.
(13) Garrod, A. E. (1902) The incidence of alkaptonuria: a study in chemical individuality. Lancet 2, 1616-1620.
(14) Ladu, B. N., Seegmiller, J. E., Laster, L., and Zannoni, V. G. (1958) The Nature of the Metabolic Defect in Alcaptonuria. Arthritis Rheum 1, 271-271.
(15) Titus, G. P., Mueller, H. A., Burgner, J., Rodriguez De Cordoba, S., Penalva, M. A., and Timm, D. E. (2000) Crystal structure of human homogentisate dioxygenase. Nat Struct Biol 7, 542-6.
(16) Kendrew, J. C., and Perutz, M. F. (1957) X-ray studies of compounds of biological interest. Annu Rev Biochem 26, 327-72.
(17) Perutz, M. F. (1960) Structure of hemoglobin. Brookhaven Symp Biol 13, 165-83.
(18) Beddell, C. R., Goodford, P. J., Norrington, F. E., Wilkinson, S., and Wootton, R. (1976) Compounds designed to fit a site of known structure in human haemoglobin. Br J Pharmacol 57, 201-9.
(19) Kuntz, I. D., Blaney, J. M., Oatley, S. J., Langridge, R., and Ferrin, T. E. (1982) A geometric approach to macromolecule-ligand interactions. J Mol Biol 161, 269-88.
(20) Berman, H., Henrick, K., Nakamura, H., and Markley, J. L. (2007) The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data. Nucleic Acids Res 35, D301-3.
(21) Kuntz, I. D. (1992) Structure-based strategies for drug design and discovery. Science 257, 1078-82.
(22) Klebe, G. (2000) Recent developments in structure-based drug design. J Mol Med 78, 269-81.
(23) Kroemer, R. T. (2007) Structure-based drug design: docking and scoring. Curr Protein Pept Sci 8, 312-28.
(24) Schnecke, V., Swanson, C. A., Getzoff, E. D., Tainer, J. A., and Kuhn, L. A. (1998) Screening a peptidyl database for potential ligands to proteins with side-chain flexibility. Proteins 33, 74-87.
(25) Schnecke, V., and Kuhn, L. A. (1999) Database screening for HIV protease ligands: the influence of binding-site conformation and representation on ligand selectivity. Proc Int Conf Intell Syst Mol Biol, 242-51.
(26) Zavodszky, M. I., Sanschagrin, P. C., Korde, R. S., and Kuhn, L. A. (2002) Distilling the essential features of a protein surface for improving protein-ligand docking, scoring, and virtual screening. J Comput Aided Mol Des 16, 883-902.
(27) Gehlhaar, D. K., Verkhivker, G. M., Rejto, P. A., Sherman, C. J., Fogel, D. B., Fogel, L. J., and Freer, S. T. (1995) Molecular recognition of the inhibitor AG-1343 by HIV-1 protease: conformationally flexible docking by evolutionary programming. Chem Biol 2, 317-24.
(28) von Itzstein, M., Wu, W. Y., Kok, G. B., Pegg, M. S., Dyason, J. C., Jin, B., Van Phan, T., Smythe, M. L., White, H. F., Oliver, S. W., et al. (1993) Rational design of potent sialidase-based inhibitors of influenza virus replication. Nature 363, 418-23.
(29) Doman, T. N., McGovern, S. L., Witherbee, B. J., Kasten, T. P., Kurumbail, R., Stallings, W. C., Connolly, D. T., and Shoichet, B. K. (2002) Molecular docking and high-throughput screening for novel inhibitors of protein tyrosine phosphatase-1B. J Med Chem 45, 2213-21.
(30) Verkhivker, G. M., Bouzida, D., Gehlhaar, D. K., Rejto, P. A., Arthurs, S., Colson, A. B., Freer, S. T., Larson, V., Luty, B. A., Marrone, T., and Rose, P. W. (2000) Deciphering common failures in molecular docking of ligand-protein complexes. J Comput Aided Mol Des 14, 731-51.
(31) Brooijmans, N., and Kuntz, I. D. (2003) Molecular recognition and docking algorithms. Annu Rev Biophys Biomol Struct 32, 335-73.
(32) Klebe, G. (2006) Virtual ligand screening: strategies, perspectives and limitations. Drug Discov Today 11, 580-94.
(33) McGann, M. R., Almond, H. R., Nicholls, A., Grant, J. A., and Brown, F. K. (2003) Gaussian docking functions. Biopolymers 68, 76-90.
(34) Schnecke, V., and Kuhn, L. A. (2000) Virtual screening with solvation and ligand-induced complementarity. Perspect Drug Discov 20, 171-190.
(35) Jones, G., Willett, P., and Glen, R. C. (1995) Molecular recognition of receptor sites using a genetic algorithm with a description of desolvation. J Mol Biol 245, 43-53.
(36) McMartin, C., and Bohacek, R. S. (1997) QXP: powerful, rapid computer algorithms for structure-based drug design. J Comput Aided Mol Des 11, 333-44.
(37) Rarey, M., Kramer, B., Lengauer, T., and Klebe, G. (1996) A fast flexible docking method using an incremental construction algorithm. J Mol Biol 261, 470-89.
(38) Friesner, R. A., Banks, J. L., Murphy, R. B., Halgren, T. A., Klicic, J. J., Mainz, D. T., Repasky, M. P., Knoll, E. H., Shelley, M., Perry, J. K., Shaw, D. E., Francis, P., and Shenkin, P. S. (2004) Glide: a new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy. J Med Chem 47, 1739-49.
(39) Muegge, I. (2000) A knowledge-based scoring function for protein-ligand interactions: Probing the reference state. Perspect Drug Discov 20, 99-114.
(40) Gohlke, H., Hendlich, M., and Klebe, G. (2000) Knowledge-based scoring function to predict protein-ligand interactions. J Mol Biol 295, 337-56.
(41) Charifson, P. S., Corkery, J. J., Murcko, M. A., and Walters, W. P. (1999) Consensus scoring: A method for obtaining improved hit rates from docking databases of three-dimensional structures into proteins. J Med Chem 42, 5100-9.
(42) Wang, R., Lai, L., and Wang, S. (2002) Further development and validation of empirical scoring functions for structure-based binding affinity prediction. J Comput Aided Mol Des 16, 11-26.
(43) Shoichet, B. K. (2004) Virtual screening of chemical libraries. Nature 432, 862-5.
(44) Tong, L., Pav, S., Mui, S., Lamarre, D., Yoakim, C., Beaulieu, P., and Anderson, P. C. (1995) Crystal structures of HIV-2 protease in complex with inhibitors containing the hydroxyethylamine dipeptide isostere. Structure 3, 33-40.
(45) Weichsel, A., and Montfort, W. R. (1995) Ligand-induced distortion of an active site in thymidylate synthase upon binding anticancer drug 1843U89. Nat Struct Biol 2, 1095-101.
(46) Teague, S. J. (2003) Implications of protein flexibility for drug discovery. Nat Rev Drug Discov 2, 527-41.
(47) Alberts, I. L., Todorov, N. P., and Dean, P. M. (2005) Receptor flexibility in de novo ligand design and docking. J Med Chem 48, 6585-96.
(48) Lesk, A. M., and Chothia, C. (1988) Elbow motion in the immunoglobulins involves a molecular ball-and-socket joint. Nature 335, 188-90.
(49) Zavodszky, M. I., and Kuhn, L. A. (2005) Side-chain flexibility in protein-ligand binding: the minimal rotation hypothesis. Protein Sci 14, 1104-14.
(50) Knegtel, R. M., Kuntz, I. D., and Oshiro, C. M. (1997) Molecular docking to ensembles of protein structures. J Mol Biol 266, 424-40.
(51) Osterberg, F., Morris, G. M., Sanner, M. F., Olson, A. J., and Goodsell, D. S. (2002) Automated docking to multiple target structures: incorporation of protein mobility and structural water heterogeneity in AutoDock. Proteins 46, 34-40.
(52) Carlson, H. A., Masukawa, K. M., Rubins, K., Bushman, F. D., Jorgensen, W. L., Lins, R. D., Briggs, J. M., and McCammon, J. A. (2000) Developing a dynamic pharmacophore model for HIV-1 integrase. J Med Chem 43, 2100-14.
(53) Fernandes, M. X., Kairys, V., and Gilson, M. K. (2004) Comparing ligand interactions with multiple receptors via serial docking. J Chem Inf Comput Sci 44, 1961-70.
(54) Totrov, M., and Abagyan, R. (1997) Flexible protein-ligand docking by global energy optimization in internal coordinates. Proteins Suppl 1, 215-20.
(55) Leach, A. R. (1994) Ligand docking to proteins with discrete side-chain flexibility. J Mol Biol 235, 345-56.
(56) Kallblad, P., Todorov, N. P., Willems, H. M., and Alberts, I. L. (2004) Receptor flexibility in the in silico screening of reagents in the S1' pocket of human collagenase. J Med Chem 47, 2761-7.
(57) Rader, A. J., Hespenheide, B. M., Kuhn, L. A., and Thorpe, M. F. (2002) Protein unfolding: rigidity lost. Proc Natl Acad Sci U S A 99, 3540-5.
(58) Hespenheide, B. M., Rader, A. J., Thorpe, M. F., and Kuhn, L. A. (2002) Identifying protein folding cores from the evolution of flexible regions during unfolding. J Mol Graph Model 21, 195-207.
(59) Lei, M., Zavodszky, M. I., Kuhn, L. A., and Thorpe, M. F. (2004) Sampling protein conformations and pathways. J Comput Chem 25, 1133-48.
(60) Zavodszky, M. I., Lei, M., Thorpe, M. F., Day, A. R., and Kuhn, L. A. (2004) Modeling correlated main-chain motions in proteins for flexible molecular recognition. Proteins 57, 243-61.
(61) Walker, B., and Lynas, J. F. (2001) Strategies for the inhibition of serine proteases. Cell Mol Life Sci 58, 596-624.
(62) Coghlan, M. J., Elmore, S. W., Kym, P. R., and Kort, M. E. (2003) The pursuit of differentiated ligands for the glucocorticoid receptor. Curr Top Med Chem 3, 1617-35.
(63) Matter, H., and Schudok, M. (2004) Recent advances in the design of matrix metalloprotease inhibitors. Curr Opin Drug Discov Devel 7, 513-35.
(64) Fabian, M. A., Biggs, W. H., 3rd, Treiber, D. K., Atteridge, C. E., Azimioara, M. D., Benedetti, M. G., Carter, T. A., Ciceri, P., Edeen, P. T., Floyd, M., Ford, J. M., Galvin, M., Gerlach, J. L., Grotzfeld, R. M., Herrgard, S., Insko, D. E., Insko, M. A., Lai, A. G., Lelias, J. M., Mehta, S. A., Milanov, Z. V., Velasco, A. M., Wodicka, L. M., Patel, H. K., Zarrinkar, P. P., and Lockhart, D. J. (2005) A small molecule-kinase interaction map for clinical kinase inhibitors. Nat Biotechnol 23, 329-36.
(65) Goodford, P. J. (1985) A computational procedure for determining energetically favorable binding sites on biologically important macromolecules. J Med Chem 28, 849-57.
(66) Kastenholz, M. A., Pastor, M., Cruciani, G., Haaksma, E. E., and Fox, T. (2000) GRID/CPCA: a new computational tool to design selective ligands. J Med Chem 43, 3033-44.
(67) Braiuca, P., Cruciani, G., Ebert, C., Gardossi, L., and Linda, P. (2004) An innovative application of the "flexible" GRID/PCA computational method: study of differences in selectivity between PGAs from Escherichia coli and a Providentia rettgeri mutant. Biotechnol Prog 20, 1025-31.
(68) Sheridan, R. P., Holloway, M. K., McGaughey, G., Mosley, R. T., and Singh, S. B. (2002) A simple method for visualizing the differences between related receptor sites. J Mol Graph Model 21, 217-25.
(69) Miller, M. D., Kearsley, S. K., Underwood, D. J., and Sheridan, R. P. (1994) FLOG: a system to select 'quasi-flexible' ligands complementary to a receptor of known three-dimensional structure. J Comput Aided Mol Des 8, 153-74.
(70) Deng, Z., Chuaqui, C., and Singh, J. (2004) Structural interaction fingerprint (SIFt): a novel method for analyzing three-dimensional protein-ligand binding interactions. J Med Chem 47, 337-44.
(71) Perola, E. (2006) Minimizing false positives in kinase virtual screens. Proteins 64, 422-35.
(72) Lazdins, J., and Kron, M. (1999) New molecular targets for filariasis drug discovery. Parasitol Today 15, 305-6.
(73) Melrose, W. D. (2002) Lymphatic filariasis: new insights into an old disease. Int J Parasitol 32, 947-60.
(74) Brown, K. R., Ricci, F. M., and Ottesen, E. A. (2000) Ivermectin: effectiveness in lymphatic filariasis. Parasitology 121 Suppl, S133-46.
(75) Kron, M. A., Kuhn, L. A., Sanschagrin, P. C., Hartlein, M., Grotli, M., and Cusack, S. (2003) Strategies for antifilarial drug development. J Parasitol 89 (Suppl), S226-S235.

Chapter 2

Discovering new classes of Brugia malayi asparaginyl-tRNA synthetase inhibitors and relating specificity to conformational change

This research has been previously published as: Sukuru, S. C. K., Crepin, T., Milev, Y., Marsh, L. G., Hill, J. B., Anderson, R. J., Morris, J. C., Rohatgi, A., O'Mahony, G., Grotli, M., Danel, F., Page, M. G. P., Hartlein, M., Cusack, S., Kron, M., and Kuhn, L. A.
(2006) Discovering new classes of Brugia malayi asparaginyl-tRNA synthetase inhibitors and relating specificity to conformational change. J Comput Aided Mol Des 20, 159-78.

2.1 Abstract

SLIDE, which models the flexibility of protein and ligand side chains while docking, was used to screen several large databases to identify inhibitors of Brugia malayi asparaginyl-tRNA synthetase (AsnRS), a target for anti-parasitic drug design. Seven classes of compounds identified by SLIDE were confirmed as having micromolar inhibition constants against the enzyme. Analogs of one of these classes of inhibitors, the long side-chain variolins, cannot bind to the adenosyl pocket of the closed conformation of AsnRS due to steric clashes, though the short side-chain variolins identified by SLIDE apparently bind isosterically with adenosine. We hypothesized that an open conformation of the motif 2 loop also permits the long side-chain variolins to bind in the adenosine pocket and that their selectivity for Brugia relative to human AsnRS can be explained by differences in the sequence and conformation of this loop. Loop flexibility sampling using ROCK confirms this possibility, while scoring of the relative affinities of the different ligands by SLIDE correlates well with the compounds' ranks in inhibition assays. Combining ROCK and SLIDE provides a promising approach for exploiting conformational flexibility in structure-based screening and design of species-selective inhibitors.

2.2 Introduction

Lymphatic filariasis, also known as elephantiasis, is caused by the nematode worms Wuchereria bancrofti and Brugia malayi. It is a debilitating human disease that afflicts more than 200 million people worldwide. More than 1.2 billion people in 80 countries reside in areas where the disease is actively transmitted and are at great risk of contracting the disease (1-3).
The crippling physical effects of the disease have a huge economic and social impact, which is why lymphatic filariasis is one of the top 10 tropical diseases being targeted by the World Health Organization. Strategies to control the disease include administration of drugs like ivermectin, diethylcarbamazine and albendazole, which reduce the level of infection and prevent transmission. Most of these drugs were discovered decades ago as chemotherapeutic agents to combat human filariasis. However, the use of these drugs has been plagued with concerns about their inability to kill the adult worms even after long treatment durations, severe side effects, and the emergence of drug resistance in humans (4, 5). Aminoacyl-tRNA synthetases (AARS) have been acknowledged as rational targets for anti-infective drug development (6) because these enzymes are essential for viability. AARS are one of several new drug targets in human filarial parasites that have been proposed in recent years (1). They differ significantly in sequence and structure between the parasite and host organism, although they share a common catalytic site topology. AARS are responsible for the specific aminoacylation of transfer RNAs (tRNAs). The two-step catalytic reaction involves the ATP-based activation of the amino acid and the transfer of the activated amino acid to the 3'-end of the cognate tRNA. The 20 AARS (one for each amino acid) are divided into two classes on the basis of their active site architecture and conserved sequence identity. The class I AARS possess two signature amino-acid sequences (HIGH and KMSKS) located in the active site with its characteristic nucleotide-binding Rossmann fold, which consists of an alternating pattern of β-strands and α-helices (7). The active site of class II AARS comprises a six-stranded antiparallel β-sheet flanked by an additional parallel strand and three α-helices (8).
Three signature sequence motifs (motifs 1, 2 and 3) characterize this class (Figure 2.1.3). Motif 1 consists of: +G(F/Y)XX(V/L/I)PXXXXFRxE. Motif 3 consists of: GER [...]

[Table 2.1 footnotes, partially recovered: [...] \( -\, I_{hkl,i}|\,] / [\sum_{hkl}\sum_{i} |I_{hkl}|] \), where i is the number of reflection hkl. Refinement with CNS (34). Ramachandran diagram has been calculated with PROCHECK (35).]

[...] (LBHAMP) were used for analyzing the conformational flexibility of the protein. Data collection and refinement statistics are provided in Table 2.1. Details of crystallization and structure determination will be presented elsewhere, and X-ray coordinates will be deposited in the Protein Data Bank (33). In the meantime, X-ray coordinates can be obtained by contacting Stephen Cusack (cusack@embl-grenoble.fr).

2.3.2 Experimental assay

We standardized the malachite green assay for phosphate release (36-39) for use in monitoring inhibition of aminoacylation using colorimetric measurement of pyrophosphate generation from the first step in the aminoacylation reaction:

\[ \text{E} + \text{AA} + \text{ATP} \rightleftharpoons \text{E(AA-AMP)} + \text{PP}_i \tag{2.1} \]
\[ \text{E(AA-AMP)} + \text{tRNA} \rightleftharpoons \text{E} + \text{AA-tRNA} + \text{AMP} \tag{2.2} \]

This assay was used to measure inhibition constants for inhibitors predicted by SLIDE and their analogs.

2.3.3 Screening and docking with SLIDE

SLIDE (Screening for Ligands by Induced-Fit Docking Efficiently) was used to screen databases of small organic molecules to find potential inhibitors of Brugia AsnRS. SLIDE (20, 26, 40) is a screening and docking tool that uses distance geometry to screen and dock ligand candidates into the binding site of the target protein. SLIDE represents the binding site of the protein by a template consisting of points identified as the most favorable positions for ligand atoms to form hydrogen bonds or make hydrophobic interactions with the neighboring protein atoms (27). The ligand candidates in the database are similarly represented by a set of interaction points, assigned to polar atoms
For each ligand candidate, all possible triplets of its interaction points are mapped onto all geometrically and chemically compatible template triangles. The anchor fragment of the ligand is defined by the triplet of interaction points that match with a template triangle. Any portion of the ligand outside this anchor fragment is considered flexible by SLIDE. After finding a feasible match between the ligand candidate and protein template, SLIDE models induced-fit by resolving steric overlaps between the flexible portion of the ligand and the protein side chains using minimal rotations determined by mean-field optimization (26, 29). Collision-free docked ligand orientations are scored based on the number of hydrogen bonds and degree of hydrophobic complementarity with the protein. The SLIDE software is available to academic and commercial researchers; see Software at http://www.bch.msu.edu/labs/kuhn 2.3.4 Scoring protein-ligand interactions Several comparative studies of docking and scoring methods (28, 41 -45) have shown that no one scoring function for predicting ligand binding and affinity performs consistently well across diverse protein families. Hence, to develop a scoring protocol that can distinguish ligands from non-ligands, reliably detect the correct conformation and binding mode for known ligands and score them in the order of their relative affinity for Brugia AsnRS, a panel of three different scoring functions was tested: SLIDE score (2 7), DrugScore (46), and X-Score (4 7). SLIDE score is a weighted sum of hydrophobic and hydrogen-bond interaction terms, trained to match affinity values in known complexes. 30 DrugScore is a knowledge-based scoring ftmction that uses structural information from the Protein Data Bank (PDB) to score protein-ligand complexes based on the preferred distances observed between different ligand and protein atom pairs. 
X-Score is an empirical scoring function that calculates the binding affinity of a protein-ligand complex by using terms that account for van der Waals interactions, hydrogen bonding, deformation, and the hydrophobic effect. Scoring accuracy was determined by how well the scoring functions assessed the binding modes and relative affinities of known Brugia AsnRS ligands, whereas the enrichment accuracy was determined by their ability to select true ligands from a large number of decoys (1000 diverse, random drug-like molecules obtained from the website of Dr. Didier Rognan, CNRS: http://bioinfo.pharma.u-strasbg.fr/bioinformatics-cheminformatics-group.html). Low-energy conformers of these decoy molecules were generated using Omega (OpenEye Software) as input to SLIDE screening. All conformers within 7.5 kcal/mol of the minimum energy conformer sampled (using the MMFF force field) were included. For each docked compound (known ligands as well as non-ligand decoys), only the top scoring binding orientation was ranked. Any two docked compounds scoring identically were given identical ranks.

2.3.5 Modeling main-chain flexibility

The active sites of human and Brugia AsnRS are very similar, with only 3 amino acid differences in the first shell of amino acids surrounding the active site, including all residues within 9 Å of any atom of ASNAMS (Figure 2.2.A). Because these three side chains point away from the binding site, it is most likely that they influence the conformations of residues that interact with asparagine or adenosine or the conformations of the motif 2 adenine-binding loop, which has residue 224 (Ala in Brugia and Thr in human) near its hinge.

Figure 2.2: (A) Known flexible regions of Brugia AsnRS, based on comparison of the three crystal structures, are shown in green ribbons, while the rest of the monomer is shown in grey ribbons (see Table 2.5 for details; residue ranges are given for each flexible loop). The three residues that differ near the active site in Brugia and human AsnRS are rendered in space filling models colored by atom type, with carbon atoms colored grey. ASNAMS is rendered in atom-colored tubes with carbon atoms colored orange. The template generated by SLIDE to represent the binding site is rendered in small stars. The template points, representing ligand chemistry that would be favored at that site, are colored red for hydrogen-bond acceptor; blue for hydrogen-bond donor; white for hydrogen-bond donor and/or acceptor; and green for hydrophobic interactions. (B) ROCK flexible loop conformations generated from the ASNAMS-bound Brugia AsnRS crystal structure. The diverse conformations shown were chosen from 500 ROCK conformers by selecting those conformations with the largest pairwise RMSDs in main-chain dihedral angles, as described in (31). The monomer is rendered in ribbons colored by the flexibility index, with blue being the most rigid and red being the most flexible regions of the structure, according to ProFlex. The ligand is shown in atom-colored tubes with carbon atoms in orange. The three active-site-neighboring residues that differ between Brugia and human AsnRS are rendered as space filling models colored by flexibility index. Different conformers for the flexible regions of Brugia AsnRS, generated by ROCK, are shown in green ribbons, with the adenine binding loop shown at bottom.

To examine alternate conformations accessible to the known flexible active-site loops in Brugia AsnRS, we used ProFlex software (30) to identify the flexible regions in the protein, and ROCK (31, 32) to sample them. ProFlex predicts the flexible and rigid regions in a given structure (which bonds are constrained and which bonds remain free to rotate) based on analysis of constraints posed by the protein's network of covalent bonds, hydrogen bonds, salt bridges, and hydrophobic interactions.
ProFlex calculations are fast and have been shown to predict the conformational flexibility of a protein reliably from a single 3D structure (30, 48, 49). ROCK (Rigidity Optimized Conformational Kinetics) uses a restricted random-walk sampling to search the conformational space available to proteins, given the flexible regions defined by ProFlex as input. A conformer generated by ROCK is either accepted or rejected, depending upon whether it maintains the non-covalent bond network and results in no van der Waals overlaps between atoms. The most distinct main-chain conformers generated by ROCK are selected based on the RMSD values relative to the initial structure. Brugia AsnRS active-site loop conformers representing favorable open conformations of the protein were used to interpret the observed affinities and specificities for Brugia AsnRS relative to human AsnRS for inhibitors that could not bind (based on steric clashes) with the closed loop conformation of Brugia AsnRS. ProFlex and ROCK software are available to academic and commercial researchers; see Software under http://www.bch.msu.edu/labs/kuhn.

2.4 Results

2.4.1 Scoring Brugia AsnRS-ligand interactions

The results of the scoring analysis by SLIDE score, DrugScore and X-Score (Figure 2.3) show that both SLIDE score and DrugScore reliably assess the correct conformation, binding mode, and relative affinity of known AsnRS ligands, and also distinguish the three known ligands from 1000 diverse drug-like molecules.

[Plot: curves for SLIDE, DrugScore and X-Score; horizontal axis: Rank.]

Figure 2.3: Enrichment plot for the three different scoring functions: SLIDE score, DrugScore and X-Score.
The scoring functions were assessed for their ability to distinguish as top-scoring compounds the three known Brugia AsnRS ligands (ASNAMS, LBHAMP, ATP) when mixed into a set of 1000 drug-like small molecules (from http://bioinfo-pharma.u-strasbg.fr/bioinformatics-cheminformatics-group.html). The top-scoring binding orientation and conformer for each ligand candidate was ranked relative to the other candidates.

The first of these known ligands discovered for Brugia malayi independently of SLIDE, with an experimentally characterized binding mode, is ASNAMS, which was designed as a product analog for step 1 in the aminoacylation reaction and had previously been shown to bind in the crystal structure of Thermus thermophilus AsnRS (8). The structure of Complex I (Figure 2.1) shows that this compound binds in the same orientation in Brugia malayi AsnRS, with an IC50 value of 4.5 µM, as measured by the malachite green assay (Table 2.2). L-aspartate-β-hydroxamate was used as a reagent during development of the malachite green assay (39). When the third known ligand, the substrate ATP, was added during the assay, L-aspartate-β-hydroxamate adenylate (LBHAMP) was formed and remained bound to AsnRS, as confirmed by the structure of Complex II (Table 2.1). This complex contains LBHAMP and pyrophosphate bound to one of the monomers in the dimer, whereas ATP alone was observed in the other monomer. The IC50 value of LBHAMP is 4 µM (Table 2.2), very similar to that of the product analog ASNAMS. The three known ligands, ASNAMS, LBHAMP, and ATP, were all docked by SLIDE to within 1 Å RMSD of their crystallographically observed positions, and the scoring functions ranked them correctly according to their experimentally determined IC50 values against Brugia AsnRS; adenosine is not observed to inhibit AsnRS at a concentration of 500 µM.
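The ranking convention used for the enrichment analysis (one top-scoring pose per compound, with identically scoring compounds sharing a rank) can be illustrated with a minimal Python sketch. The function name, the assumption that a more negative score is better, the choice of competition ranking (1, 2, 2, 4) for ties, and the example scores are all hypothetical; they are not taken from SLIDE, DrugScore, or X-Score output.

```python
def rank_best_poses(best_scores, lower_is_better=True):
    """Rank compounds by their single best docking score.

    best_scores maps a compound name to the score of its top-scoring pose.
    Compounds with identical scores receive identical ranks
    (competition ranking: 1, 2, 2, 4). This tie rule is an assumption.
    """
    ordered = sorted(best_scores.items(), key=lambda kv: kv[1],
                     reverse=not lower_is_better)
    ranks = {}
    previous_score = None
    current_rank = 0
    for position, (name, score) in enumerate(ordered, start=1):
        if score != previous_score:
            # A new score value starts a new rank at the current position.
            current_rank = position
            previous_score = score
        ranks[name] = current_rank
    return ranks


# Hypothetical scores: one known ligand mixed with three decoys.
example_scores = {"ligand_A": -9.1, "decoy_1": -7.4,
                  "decoy_2": -7.4, "decoy_3": -5.0}
example_ranks = rank_best_poses(example_scores)
```

An enrichment curve such as Figure 2.3 could then be traced by counting how many known ligands have a rank at or below each rank threshold.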
The enrichment plot in Figure 2.3 shows that SLIDE score performs particularly well, ranking the three known Brugia AsnRS ligands within the top 5 scoring compounds and clearly distinguishing them from the vast majority of the molecules used as non-ligand decoys.

[Tables 2.2, 2.3 and 2.4: contents not recoverable from the extracted text.]

Although variolin B and its derivatives show promising inhibition of Brugia AsnRS enzymatic activity, they are highly cytotoxic in human Namalwa cell lines (F. Danel, data not shown).
The binding mode predicted by SLIDE indicates that the ligand binds in the adenosyl pocket of the binding site. Because variolins contain five- and six-membered rings that are isosteric and share chemistry with adenine, these compounds may be toxic because they bind to ATP sites in general. This interpretation is supported by research showing that pyrrolopyrimidines and their variants inhibit human protein kinases (62-65). The key to addressing cytotoxicity would be to design in selectivity for Brugia AsnRS by conjugating an asparagine side chain to the variolin scaffold via an appropriate linker, and this work is in progress. No crystallographic information is available on the variolin B derivatives, but the Cambridge Structural Database provides a crystal structure for the variolin B scaffold (CSD code: LEPWIM). The 3D structures of the variolin B derivatives were therefore built on this scaffold using CORINA (66), and all low-energy conformers were then generated using Omega (OpenEye Software). All conformers within 7.5 kcal/mol of the minimum-energy conformer sampled (using the MMFF force field) were included. Results of docking these conformers will be presented in the section entitled "Impact of Main-chain Conformational Flexibility on Ligand Binding".

2.4.2.1.2 Rishirilide B

Rishirilide B (CSD code CUQZUJ), isolated from Streptomyces rishiriensis, has been shown to have antithrombotic activity through selective α2-macroglobulin inhibition, leading to the activation of plasmin (67). It has a tricyclic scaffold that is relatively rigid and is isosteric with the adenosine moiety of ASNAMS, according to the SLIDE docking. This orientation (Figure 2.4.B) was scored highly by both SLIDE and DrugScore. An alkyl side chain protrudes from the binding site and is in a polar environment surrounded by residues Arg411, Glu310 and His219.
The scaffold of this compound, generated by pruning the alkyl chain and a carboxymethyl group (-COOMe) from the ligand, was also docked using SLIDE. The scaffold docked with improved isostericity with the adenine portion of ASNAMS but with a poor interaction score, apparently due to loss of the side chains. The malachite green assay was carried out on a sample of rishirilide B, provided generously by Dr. Samuel Danishefsky (Sloan-Kettering Institute, New York), to test for inhibition of Brugia AsnRS. The compound has (+) and (-) enantiomers, and only the enantiomeric mixture showed weak inhibitory activity (Table 2.3), while the (-) enantiomer showed no inhibition. This suggests that the (+) enantiomer is responsible for inhibition. Given the unavailability of purified (+) enantiomer for assays and the apparently weak binding of the enantiomeric mixture, we chose to focus on tighter-binding compounds, as described in the following sections.

2.4.2.1.3 Cycloadenosine

Also identified by SLIDE as a top-scoring potential inhibitor was 8,2'-cycloadenosine (CSD code for the crystal structure of its trihydrate: CYADOT). Cycloadenosine is a modified nucleoside cyclized at the C(8) and O(2') atoms and is known to be active against leukemic and other tumor cells (68). Derivatives of cycloadenosine are also known to inhibit other tRNA synthetases: PheRS, SerRS, LysRS, ValRS, IleRS and ArgRS (69). This compound was docked by SLIDE in a position that is isosteric with the adenosine of ASNAMS and was considered to have a potential advantage over adenosine because the bridged ring system in cycloadenosine could reduce the entropic cost of binding to AsnRS. The binding mode predicted by SLIDE is shown in Figure 2.4.C. This compound can be made more stable by replacing the oxygen in the bridge between the adenine and ribose moieties by a methylene group.
While this cycloadenosine did not show inhibitory activity even at a concentration of 500 µM, this is also true of the native substrate, adenosine. The corresponding sulfamoyl asparagine derivative (CYADOT-S-Asn) was then synthesized and assayed for inhibition of Brugia AsnRS. CYADOT-S-Asn showed moderate inhibition, with IC50 values of 70 µM and 90 µM against Brugia and human AsnRS, respectively (Table 2.4). This represents weaker binding than indicated by the IC50 values of the un-cyclized analog, ASNAMS (4.5 µM and 1.7 µM, respectively). Strain caused by cyclization of the ribose moiety to the adenine, resulting in the 5' sulfamoyl asparagine group being directed somewhat out of the asparagine pocket of the binding site, may have weakened the binding relative to non-cyclized ASNAMS. This can be addressed by redesigning the linker.

2.4.2.2 Results from screening the NCI plated compounds database

2.4.2.2.1 Phenanthridinol

8-Chloro-3-(hydroxy(oxido)amino)-6-phenanthridinol (NCI code: NSC114691) has a rigid tricyclic scaffold (Figure 2.4.D) that fills the adenine pocket of AsnRS. Although this compound showed a favorable 65 µM inhibition of Brugia AsnRS in the experimental assay (Table 2.3), the planarity and aromaticity of the scaffold are a cause for concern. Planar, tricyclic scaffolds are potentially toxic because of their ability to intercalate DNA. Compounds sharing similar scaffolds have been shown to possess inhibitory activity against Brugia AsnRS (70) and phosphodiesterase 4 (PDE4) (71).

2.4.2.2.2 Triazinylamine

4-(3-(4-amino-6-isopropenyl-1,3,5-triazin-2-yl)phenyl)-6-isopropenyl-1,3,5-triazin-2-ylamine (NCI code: NSC363624) has a symmetric structure with two substituted triazine rings connected by a phenyl group.
The SLIDE-predicted orientation of the compound (Figure 2.4.E) places one of the triazine rings isosteric with the 6-membered ring of ASNAMS and mimics its interactions with the surrounding binding site residues as well. The bridging phenyl ring in the center is docked in the ribose pocket of the binding site, but is unable to mimic the polar interactions of the ribose ring owing to its hydrophobic character. Results of the malachite green assay (Table 2.3) show that this compound inhibits 50% of Brugia AsnRS activity and 80% of human AsnRS activity at 25 µM concentration. Further studies will assess whether structure-based substitutions can make it an even more potent and selective inhibitor of Brugia AsnRS. 1,3,5-triazine-substituted polyamines have been shown to be active against the malarial parasite, Plasmodium falciparum (72).

2.4.2.2.3 Phenanthrylethanone

2-(3-methyl-1λ5-pyridin-1-yl)-1-(2-phenanthryl)ethanone (NCI code: NSC35467) is a charged pyridine derivative. A keto group bridges the tricyclic moiety and the pyridine ring. The SLIDE-predicted orientation (Figure 2.4.F) shows the tricyclic scaffold in the adenine pocket which, though isosteric with adenine, does not form any specific hydrogen bonds with the surrounding binding site residues. Planarity of the tricyclic scaffold in this compound increases concerns about potential toxicity associated with DNA intercalation. Results from the malachite green assay (Table 2.3) show that at 200 µM, this compound inhibits 53% of Brugia AsnRS activity but does not inhibit human AsnRS. Although it is a weak inhibitor of Brugia AsnRS, its selectivity for Brugia relative to human AsnRS is attractive. We are focusing on identifying the source of specificity within this compound to guide the optimization of more potent inhibitor scaffolds.

2.4.2.2.4 Dimethylmalonamide

N1,N3-bis(4-amino-2-methyl-6-quinolinyl)-2,2-dimethylmalonamide (NCI code: NSC12156) is a symmetric compound with two bicyclic ring systems.
Disubstituted malonamides are known to have weak trypanocidal activity against Trypanosoma brucei (73). SLIDE docked one of the bicyclic groups, which shares chemistry and shape with the 6-membered ring of adenine, into the adenine pocket of the binding site (Figure 2.4.G), mimicking the hydrogen bonds formed by N3 and N6 of adenine. However, in this predicted binding mode, the other bicyclic ring system could not be docked favorably. Results from the inhibition assay (Table 2.3) show that this compound weakly inhibits both Brugia and human AsnRS (~50% inhibition at 200 µM ligand concentration). However, a single malonamide group could be substituted to allow binding in the ribose and asparagine pockets as well as the adenine pocket.

2.4.2.3 Success rate of screening

From SLIDE screens on the CSD and NCI drug-like compounds, 45 compounds altogether were tested for Brugia and human AsnRS inhibition. Out of the compounds tested, seven classes of compounds predicted by SLIDE were confirmed as low- to mid-micromolar inhibitors: rishirilide and cycloadenosine from the CSD, four NCI plated compounds, and variolin B and its analogs (Tables 2.3 and 2.4 and Figure 2.4). Some of these compounds and their analogs (particularly the long-chain variolins, Table 2.4) selectively inhibit Brugia relative to human AsnRS. The success rate in screening by SLIDE for AsnRS inhibitors is thus 7 out of 45 compounds (~15%). The best published hit rate for structure-based screening is 34% (19), involving visual screening by medicinal chemists as well as using docking scores as a guide.

2.4.3 Modeling the conformational flexibility of Brugia AsnRS

Modeling protein flexibility in ligand binding is important to improve the accuracy of results in computational ligand screening and design, as even a small change in the protein binding site conformation can introduce large changes in ligand interactions and computed binding affinities.
Understanding conformational differences can also enable the design of substituents that improve binding and specificity. The results from the enzyme inhibition assays performed on ligand candidates identified by SLIDE indicate that they bind to Brugia AsnRS with a range of binding affinities, from low (≥ 200 µM) to moderate (≤ 25 µM). The variolin B derivatives are of particular interest because they show some selectivity towards Brugia AsnRS. Given the absence of active-site sequence differences between Brugia and human AsnRS, we sought to understand how the flexibility of active-site loops coupled with neighboring sequence differences (Figure 2.2.A) could influence ligand binding through conformational differences.

To model flexibility that might contribute to inhibitor specificity, the known flexible regions in Brugia AsnRS were mapped by comparing the ASNAMS-bound and ligand-free (apo) crystal structures (Table 2.5). Analysis of the apo crystal structure indicates high mobility of the adenine-binding loop, and comparison of Brugia AsnRS structures bound to LBHAMP and ATP indicates that the amino acid recognition loop adopts significantly different conformations depending on the type of ligand bound. To assess alternative conformations for the adenine-binding loop, ProFlex flexibility analysis (30) was performed on the ASNAMS-bound conformation of Brugia AsnRS to identify the coupled networks of covalent and non-covalent bonds within the protein. The ligand was removed from the protein before running ProFlex, and only those hydrogen bonds and salt bridges with energies of ≤ -1.0 kcal/mol were included, to avoid including hydrogen bonds that are too weak to influence protein flexibility. The results of ProFlex analysis for Brugia AsnRS included the relative flexibility for each bond (from rigid/non-rotatable through entirely flexible) and lists of which bond rotations were coupled through rings of covalent and non-covalent interactions.
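The energy cutoff applied to hydrogen bonds and salt bridges before the flexibility analysis amounts to a simple filter. The following Python sketch illustrates it under stated assumptions: the `Interaction` record, its field names, and the example residue labels and energies are hypothetical and do not reflect ProFlex's actual input or output formats; only the -1.0 kcal/mol cutoff comes from the text.

```python
from typing import NamedTuple


class Interaction(NamedTuple):
    """Hypothetical record for one hydrogen bond or salt bridge."""
    donor: str
    acceptor: str
    energy_kcal_per_mol: float


def keep_strong_interactions(interactions, cutoff=-1.0):
    """Keep interactions whose energy is at or below the cutoff.

    More negative energies are stronger, so "<= cutoff" retains the
    strong interactions and discards the weak ones.
    """
    return [i for i in interactions if i.energy_kcal_per_mol <= cutoff]


# Made-up interactions for illustration only.
candidates = [
    Interaction("res_A:N", "res_B:O", -3.2),
    Interaction("res_C:NE2", "res_D:OE1", -1.0),
    Interaction("res_E:OG", "res_F:O", -0.4),
]
retained = keep_strong_interactions(candidates)
```

With the default cutoff, the first two interactions are retained and the weakest one is dropped; an interaction exactly at -1.0 kcal/mol is kept because the criterion is inclusive.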
The ProFlex flexibility information and the structure of Complex I with ASNAMS removed were used as the input to ROCK. ROCK then generated alternative low-energy conformations that preserved the non-covalent bond network, by sampling (31) favored main-chain dihedral angles in the flexible regions of Brugia AsnRS. Five hundred main-chain conformers were generated, spanning from closed to very open conformations.

Table 2.5: Known flexible regions in Brugia AsnRS. See structures in Figure 2.2.

Residue range   Flexible loop                               Remarks
K213 - L220     Adenine binding loop                        Coordinates could not be determined due to mobility of these residues in both chains A and B of the apo crystal structure.
E297 - F302     Distal loop                                 Residues were mobile (no coordinates determined) in chain A of the apo crystal structure.
Q163 - L172     Amino acid substrate recognition loop       Significant conformational differences were observed for these residues between the LBHAMP-bound monomer and the ATP-bound monomer of the same crystal structure.

Out of the 500 conformers generated by ROCK, there were 14 conformers with a significant main-chain deviation (more than 4.5 Å) in the adenine-binding loop. To select from these conformations the most open conformer of the protein overall, we computed the minimum and maximum distances of the three known flexible loops (Table 2.5 and Figure 2.2.A) from the centroid of the co-crystallized ligand, ASNAMS, and chose the AsnRS conformation with the greatest sum of these loop distances. Thus, the open conformation analyzed not only had a significant main-chain deviation in the adenine-binding loop, but also reflected a feasible, highly open conformation of the protein overall, when compared to the closed conformation. The most open conformation generated by ROCK for Brugia AsnRS shows a significant opening of the adenine-binding loop (residues K213 - L220; Figure 2.5.B) connecting the two anti-parallel β-strands near the binding site (Figure 2.1.B).
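The selection of the most open conformer can be sketched as follows. This is one possible reading of the procedure, not the code used in this work: for each flexible loop, the minimum and maximum atom distances from the ligand centroid are added, these values are summed over the loops, and the conformer with the greatest sum is chosen. The function names, the data layout (a dictionary mapping a conformer name to a list of loops, each a list of Cartesian coordinates), and the toy coordinates are hypothetical.

```python
import math


def centroid(points):
    """Arithmetic mean of a list of (x, y, z) coordinates."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def loop_openness(loop_atoms, ligand_centroid):
    """Minimum plus maximum distance of a loop's atoms from the centroid."""
    distances = [math.dist(atom, ligand_centroid) for atom in loop_atoms]
    return min(distances) + max(distances)


def most_open_conformer(conformers, ligand_atoms):
    """Return the name of the conformer with the greatest summed loop distance.

    conformers maps a conformer name to a list of loops; each loop is a
    list of (x, y, z) atom coordinates.
    """
    center = centroid(ligand_atoms)

    def total_openness(name):
        return sum(loop_openness(loop, center) for loop in conformers[name])

    return max(conformers, key=total_openness)


# Toy one-loop example along the x axis; coordinates are made up.
ligand = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]          # centroid at x = 1
toy_conformers = {
    "closed": [[(4.0, 0.0, 0.0), (5.0, 0.0, 0.0)]],  # distances 3 and 4
    "open": [[(7.0, 0.0, 0.0), (9.0, 0.0, 0.0)]],    # distances 6 and 8
}
selected = most_open_conformer(toy_conformers, ligand)
```

In this toy case the summed distances are 7 for the "closed" entry and 14 for the "open" entry, so the latter is selected.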
The adenine-binding loop is involved in the binding of ATP and the acceptor end of the cognate tRNA and has been reported to play a significant role in conformational changes associated with other class II AARS, such as AspRS (74), LysRS (75), SerRS (76), ProRS (77) and HisRS (78). Mutations in this loop have also been shown to affect the tRNA-dependent amino acid recognition by SerRS (79). Motion of this loop in Brugia AsnRS, as simulated by ROCK, exposes a new cavity near the adenosine pocket of the binding site, leading to an open conformation that emulates the apo crystal structure of the protein (Figure 2.5.B). A shift in the position of His 219 facilitates the significant change in backbone conformation of this loop. In the ASNAMS-bound crystal structure, representing the closed conformation of the protein, His 219 is docked between Glu 310 and Arg 411 and blocks access to the cavity that is exposed in the open conformation (Figure 2.5.C, D).

Figure 2.5: (A) The ASNAMS-bound closed crystal structure conformation of Brugia AsnRS is compared with the ligand-free (apo) crystal structure. ASNAMS is shown as atom-colored tubes, with carbon atoms colored orange. The Connolly solvent-accessible molecular surface of the apo conformation, lacking atomic coordinates of the mobile adenine-binding loop, is rendered as a solid surface (off-white), while that of the closed conformation is rendered as mesh in cyan, showing the closed adenine-binding loop as a cyan-colored ribbon. (B) The ROCK-generated most open conformation of Brugia AsnRS is compared with the apo crystal structure. The Connolly surface of the apo crystal structure conformation is again rendered as solid, while that of the open conformation is rendered as green mesh. The most open conformation of the adenine-binding loop is rendered as a green ribbon.
(C) The adenine-binding loop residue His 219 (Connolly surface colored by atom type, with carbon atoms colored cyan) is docked between Glu 310 and Arg 411 (atom-colored Connolly surface, with carbons in green) in the closed crystal structure conformation of Brugia AsnRS. ASNAMS is shown in atom-colored tubes with carbons colored orange. (D) His 219 (Connolly surface colored by atom type) undergoes a significant motion away from adenosine, due to reorientation of the main chain in the ROCK-generated open conformation. (E) LCM02, a long side-chain variolin B derivative shown in atom-colored tubes, manually docked in the closed crystal structure conformation of Brugia AsnRS (with Connolly surface colored purple) according to the predicted binding mode of variolin B (Figure 2.4.A). ASNAMS is shown in atom-colored tubes, with carbons colored orange. The steric clashes between the side chain of LCM02 at the lower left and the closed conformation of the adenine-binding loop (purple ribbon) could not be resolved by any single bond rotations in the ligand or protein. (F) LCM02, a long side-chain variolin B derivative shown in atom-colored tubes, docked in the ROCK-generated open conformation of Brugia AsnRS (Connolly surface colored cyan) to match the variolin B binding mode. For reference, ASNAMS is shown in atom-colored tubes with carbons colored orange. There were no steric clashes between LCM02 and the open conformation of the adenine-binding loop (cyan ribbon), and the long side chain with the dimethylamino group (lower left) fits well into the channel uncovered by opening the adenine-binding loop and the proposed His 219 gate.
2.4.4 Impact of main-chain conformational flexibility on ligand binding: interpreting the observed affinities and specificities

The binding modes predicted by SLIDE, using the closed, crystal structure Brugia AsnRS conformation, could explain the observed binding affinities and specificities of all the compounds except the two long side-chain derivatives of variolin B (LCM01 and LCM02 in Table 2.4). All low-energy conformations of the variolin B derivatives, generated using Omega, were tested for docking into the closed conformation with SLIDE. LCM01 and LCM02 could not be docked into the binding site of the closed conformation, due to unresolvable steric collisions between the long side chains of the variolins and the backbone atoms of the adenine-binding loop (Figure 2.5.E). However, with the same docking protocol, SLIDE was able to dock these compounds into the binding site of the open conformation generated by ROCK (Figure 2.5.F). (The apo crystal structure was not used for docking these compounds because the adenine-binding loop is so flexible in the apo structure that its atomic coordinates could not be determined, and the interactions between the ligand and the loop therefore could not be assessed.) The binding mode of LCM01 and LCM02 docked in the ROCK-generated open conformation of Brugia AsnRS was in good agreement with the SLIDE-predicted binding mode of the unsubstituted variolin B in the closed conformation of the protein.

Long side-chain variolins LCM01 and LCM02 have IC50 values of 173±90 µM and 123±54 µM, respectively, indicating they are moderate to weak binders of Brugia AsnRS, whereas variolin B and its short side-chain derivative SMEVAR have IC50 values of ~50 µM against the enzyme. However, the assay data on the long side-chain variants of variolin B indicate they bind to Brugia AsnRS 3- to 8-fold more tightly than to human AsnRS.
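A fold-selectivity figure of this kind is conventionally the ratio of the two IC50 values. The short Python sketch below assumes that definition (human IC50 divided by Brugia IC50, so that values above 1 favor the parasite enzyme); the function name is hypothetical. Because the human AsnRS IC50 values for LCM01 and LCM02 are not quoted in the text, the example instead uses the two compounds for which both values are given above, ASNAMS (4.5 µM and 1.7 µM) and CYADOT-S-Asn (70 µM and 90 µM).

```python
def fold_selectivity(ic50_human_uM, ic50_brugia_uM):
    """Return IC50(human) / IC50(Brugia).

    A value above 1 means the compound inhibits the Brugia enzyme at a
    lower concentration than the human enzyme. This ratio definition is
    an assumption about how fold selectivity is computed.
    """
    if ic50_human_uM <= 0 or ic50_brugia_uM <= 0:
        raise ValueError("IC50 values must be positive")
    return ic50_human_uM / ic50_brugia_uM


# IC50 values in uM as quoted in the text: (Brugia, human).
ic50_values = {"ASNAMS": (4.5, 1.7), "CYADOT-S-Asn": (70.0, 90.0)}
selectivity = {
    name: fold_selectivity(human, brugia)
    for name, (brugia, human) in ic50_values.items()
}
```

Under this definition ASNAMS comes out below 1 (it inhibits the human enzyme at a lower concentration), while CYADOT-S-Asn comes out slightly above 1; both are far from the 3- to 8-fold range reported for the long side-chain variolins.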
While the structure of human AsnRS has not yet been determined, there are only three sequence differences (A224T, A3353, and L353V; Figure 2.2.A) near the active site. One of these residues, Ala 224 in Brugia AsnRS, is near the hinge of the adenine-binding loop, and may favor a different, more open conformation in Brugia than in human AsnRS, since the less bulky and non-hydrogen-bonding alanine side chain is likely to restrict the motion of this loop less than threonine. The Brugia selectivity of long side-chain variolins may therefore be explained by their fitting only into the open conformation of the binding site, with this conformation being more readily accessible in Brugia than in human AsnRS due to the sequence substitution at the base of the loop. Similarly, the greater potency of the short side-chain variolins relative to the long side-chain analogs in Brugia could also be accounted for by this conformational model, since the closed conformation of the adenine-binding loop allows more favorable contacts with inhibitors.

2.5 Discussion

A realistic expectation of structure-based drug screening is to find low-affinity binders with novel scaffolds that can be further optimized by adding substituents to develop tight and selective inhibitors. Low micromolar affinity is typical of such lead compounds, especially in the case of Brugia AsnRS, where even the product mimic has a low micromolar (4.5 µM) IC50 (Table 2.2). While aminoacyl-tRNA synthetases are acknowledged as rational drug targets (6), this is the first published account of discovering new classes of AARS inhibitors by structure-based screening. Screening and docking algorithms previously had been used to model the binding and relative affinity values of known inhibitors of synthetases and their analogs. Goddard and co-workers used the HierDock virtual screening protocol to dock and predict the relative binding energies of phenylalanine analogs to the T. thermophilus PheRS crystal structure (80).
Lee and Kim used comparative molecular field analysis (81) to dock four known inhibitors of S. aureus MetRS and develop a predictive quantitative structure-activity relationship (82). Most of the highly potent and selective AARS inhibitors discovered in recent years have come from in vitro screening and optimization studies (83-86). Here we show that structure-based screening against an AARS target can identify several new classes of inhibitors. SLIDE has identified seven classes of inhibitors showing 50% inhibition of Brugia AsnRS at 25-240 µM concentrations. Analogs of variolin B showed 3- to 8-fold selectivity for Brugia relative to human AsnRS. This success rate for identifying new ligands based on SLIDE virtual screening (~15%) supports the benefits of including 3-dimensional structural information in high-throughput screening, since structure-blind in vitro screening typically has a success rate of <0.1% (19). In the process of screening, SLIDE predicts the binding mode of the docked ligand in the binding site of the protein, which aids in optimizing the new ligands for higher affinity and selectivity for the target protein.

Incorporating complete protein flexibility during the screening of large molecular databases is sufficiently computationally intensive that it is not yet feasible. Various methods developed in recent years have shown that selecting a small ensemble of protein structures can satisfactorily represent the conformational space available to the flexible regions of a protein binding site. In particular, crystallographic snapshots, representing structures of the same protein in different conformational states, have been used to represent protein flexibility in the screening and design of ligand candidates (22, 24, 25, 87). However, this approach is limited to experimentally observed states rather than fully representing the low-energy conformations of a protein.
Molecular dynamics simulations can provide a sample of low-energy states, but remain limited to sampling motions on the sub-millisecond timescale, typically reflecting small-scale motions. However, Gorfe and Caflisch have used explicit-water MD simulations in a similar application to ours, to assess the flexibility of the substrate binding site between apo and inhibitor-bound structures of β-secretase (88). Their results indicate that the open- and closed-flap conformations of the protein are accessible at room temperature; hence, the open conformation could also be used for β-secretase inhibitor design. ROCK is designed to sample flexible regions in a protein using a non-forcefield approach, in which the motions maintain the non-covalent bond network and avoid steric overlaps. Unlike MD, ROCK does not ascribe timescales to modeled motions nor assess the relative likelihood/energy of the generated conformers; this can be done by coupling ROCK and MD, however. By preserving non-covalent interactions, ROCK tends to sample low-energy states and follow low-barrier paths between conformations. Furthermore, the ProFlex software used with ROCK can automatically define interactions that are coupled within the protein, without the need for expensive normal modes or essential dynamics calculations (32). This approach can also assess how flexibility in a protein changes or redistributes upon complex formation, as has been analyzed for HIV protease and the Ras-Raf complex (89).

The active sites of Brugia and human AsnRS have high sequence identity, with only three amino acid differences adjacent to the substrate binding sites. One of these substitutions occurs at the base of the adenine-binding loop (residue 224 is Ala in Brugia and Thr in human AsnRS), which likely alters its conformational flexibility in Brugia relative to human AsnRS.
Designing inhibitor substituents that optimally fill the pocket created when this loop opens could improve inhibitor binding affinity and selectivity for Brugia AsnRS. This approach is supported by the work of others. For instance, Bursavich and Rich have proposed that stabilizing the conformational ensemble of an enzyme, including less-populated open conformations, can explain a range of ligand binding events that cannot be explained by lock-and-key or induced fit to a single target structure (90). Stroud and co-workers also suggest, based on their crystallographic analysis of C. neoformans and E. coli thymidylate synthase (91), that differences in flexibility or dynamics can be employed for species-specific inhibition. Thus, we envision that considering conformational differences of active-site loops between species, rather than only considering residue differences in the static parts of binding pockets, will open a range of new possibilities for gaining specificity between closely homologous enzymes.

2.6 Conclusions

Using a template designed to represent the active site of Brugia AsnRS and its interactions with known ligands, SLIDE has successfully identified seven diverse compounds that mimic the interactions between adenosine and the protein and bind with micromolar affinity. All the CSD and NCI compounds docked into the adenosyl pocket of the binding site. This protein is highly specific for binding asparagine in its aminoacyl pocket, as is generally true for AARS and their cognate amino acids. As a consequence, a productive strategy for AARS inhibitor design is to find promising scaffolds that bind strongly in the adenosyl pocket and can be linked appropriately to the cognate aminoacyl group. SLIDE identification of variolin B as an inhibitor led to the testing of variolin derivatives, which prove to be similarly potent and show selectivity for the Brugia enzyme.
The impact of main-chain conformational flexibility on ligand binding in Brugia AsnRS has been modeled, providing insights into the binding of long side-chain variolins and their selectivity for the parasite AsnRS. The motions of active-site loops sampled by ROCK enable us to assess the contributions of protein conformational flexibility to ligand binding and specificity, and provide a powerful tool for developing even more selective inhibitors of the protein.

References

(1) Lazdins, J., and Kron, M. (1999) New molecular targets for filariasis drug discovery. Parasitol Today 15, 305-6.
(2) Melrose, W. D. (2002) Lymphatic filariasis: new insights into an old disease. Int J Parasitol 32, 947-60.
(3) Kron, M. A., Kuhn, L. A., Sanschagrin, P. C., Hartlein, M., Grotli, M., and Cusack, S. (2003) Strategies for antifilarial drug development. J Parasitol 89 (Suppl.), S226-S235.
(4) Brown, K. R., Ricci, F. M., and Ottesen, E. A. (2000) Ivermectin: effectiveness in lymphatic filariasis. Parasitology 121 Suppl, S133-46.
(5) Horton, J., Witt, C., Ottesen, E. A., Lazdins, J. K., Addiss, D. G., Awadzi, K., Beach, M. J., Belizario, V. Y., Dunyo, S. K., Espinel, M., Gyapong, J. O., Hossain, M., Ismail, M. M., Jayakody, R. L., Lammie, P. J., Makunde, W., Richard-Lenoble, D., Selve, B., Shenoy, R. K., Simonsen, P. E., Wamae, C. N., and Weerasooriya, M. V. (2000) An analysis of the safety of the single dose, two drug regimens used in programmes to eliminate lymphatic filariasis. Parasitology 121 Suppl, S147-60.
(6) Schimmel, P., Tao, J., and Hill, J. (1998) Aminoacyl tRNA synthetases as targets for new anti-infectives. FASEB J 12, 1599-609.
(7) Eriani, G., Delarue, M., Poch, O., Gangloff, J., and Moras, D. (1990) Partition of tRNA synthetases into two classes based on mutually exclusive sets of sequence motifs. Nature 347, 203-6.
(8) Cusack, S., Berthet-Colominas, C., Hartlein, M., Nassar, N., and Leberman, R. (1990) A second class of synthetase structure revealed by X-ray analysis of Escherichia coli seryl-tRNA synthetase at 2.5 Å. Nature 347, 249-55.
(9) Woese, C. R., Olsen, G. J., Ibba, M., and Soll, D. (2000) Aminoacyl-tRNA synthetases, the genetic code, and the evolutionary process. Microbiol Mol Biol Rev 64, 202-36.
(10) Brown, M. J., Mensah, L. M., Doyle, M. L., Broom, N. J., Osbourne, N., Forrest, A. K., Richardson, C. M., O'Hanlon, P. J., and Pope, A. J. (2000) Rational design of femtomolar inhibitors of isoleucyl tRNA synthetase from a binding model for pseudomonic acid-A. Biochemistry 39, 6003-11.
(11) Casewell, M. W., and Hill, R. L. (1985) In-vitro activity of mupirocin ('pseudomonic acid') against clinical isolates of Staphylococcus aureus. J Antimicrob Chemother 15, 523-31.
(12) Hughes, J., and Mellows, G. (1980) Interaction of pseudomonic acid A with Escherichia coli B isoleucyl-tRNA synthetase. Biochem J 191, 209-19.
(13) Nilsen, T. W., Maroney, P. A., Goodwin, R. G., Perrine, K. G., Denker, J. A., Nanduri, J., and Kazura, J. W. (1988) Cloning and characterization of a potentially protective antigen in lymphatic filariasis. Proc Natl Acad Sci U S A 85, 3604-7.
(14) Kron, M., Petridis, M., Milev, Y., Leykam, J., and Hartlein, M. (2003) Expression, localization and alternative function of cytoplasmic asparaginyl-tRNA synthetase in Brugia malayi. Mol Biochem Parasitol 129, 33-9.
(15) Ramirez, B. L., Howard, O. M., Dong, H. F., Edamatsu, T., Gao, P., Hartlein, M., and Kron, M. (2006) Brugia malayi asparaginyl-transfer RNA synthetase induces chemotaxis of human leukocytes and activates G-protein-coupled receptors CXCR1 and CXCR2. J Infect Dis 193, 1164-71.
(16) Kron, M., Marquard, K., Hartlein, M., Price, S., and Leberman, R. (1995) An immunodominant antigen of Brugia malayi is an asparaginyl-tRNA synthetase. FEBS Lett 374, 122-4.
(17) Beaulande, M., Tarbouriech, N., and Hartlein, M. (1998) Human cytosolic asparaginyl-tRNA synthetase: cDNA sequence, functional expression in Escherichia coli and characterization as human autoantigen. Nucleic Acids Res 26, 521-4.
(18) Doucet, J. P., and Weber, J. (1996) Molecular Similarity, in Computer-Aided Molecule Design: Theory and Applications pp 328-362, Springer.
(19) Doman, T. N., McGovern, S. L., Witherbee, B. J., Kasten, T. P., Kurumbail, R., Stallings, W. C., Connolly, D. T., and Shoichet, B. K. (2002) Molecular docking and high-throughput screening for novel inhibitors of protein tyrosine phosphatase-1B. J Med Chem 45, 2213-21.
(20) Schnecke, V., Swanson, C. A., Getzoff, E. D., Tainer, J. A., and Kuhn, L. A. (1998) Screening a peptidyl database for potential ligands to proteins with side-chain flexibility. Proteins 33, 74-87.
(21) Carlson, H. A., and McCammon, J. A. (2000) Accommodating protein flexibility in computational drug design. Mol Pharmacol 57, 213-8.
(22) Knegtel, R. M., Kuntz, I. D., and Oshiro, C. M. (1997) Molecular docking to ensembles of protein structures. J Mol Biol 266, 424-40.
(23) Osterberg, F., Morris, G. M., Sanner, M. F., Olson, A. J., and Goodsell, D. S. (2002) Automated docking to multiple target structures: incorporation of protein mobility and structural water heterogeneity in AutoDock. Proteins 46, 34-40.
(24) Claussen, H., Buning, C., Rarey, M., and Lengauer, T. (2001) FlexE: efficient molecular docking considering protein structure variations. J Mol Biol 308, 377-95.
(25) Carlson, H. A., Masukawa, K. M., Rubins, K., Bushman, F. D., Jorgensen, W. L., Lins, R. D., Briggs, J. M., and McCammon, J. A. (2000) Developing a dynamic pharmacophore model for HIV-1 integrase. J Med Chem 43, 2100-14.
(26) Schnecke, V., and Kuhn, L. A. (1999) Database screening for HIV protease ligands: the influence of binding-site conformation and representation on ligand selectivity. Proc Int Conf Intell Syst Mol Biol, 242-51.
(27) Zavodszky, M. I., Sanschagrin, P. C., Korde, R. S., and Kuhn, L. A. (2002) Distilling the essential features of a protein surface for improving protein-ligand docking, scoring, and virtual screening. J Comput Aided Mol Des 16, 883-902.
(28) Zavodszky, M. I., and Kuhn, L. A. (2006) Improving Docking Validation. Proteins, in review.
(29) Zavodszky, M. I., and Kuhn, L. A. (2005) Side-chain flexibility in protein-ligand binding: the minimal rotation hypothesis. Protein Sci 14, 1104-14.
(30) Jacobs, D. J., Rader, A. J., Kuhn, L. A., and Thorpe, M. F. (2001) Protein flexibility predictions using graph theory. Proteins 44, 150-65.
(31) Lei, M., Zavodszky, M. I., Kuhn, L. A., and Thorpe, M. F. (2004) Sampling protein conformations and pathways. J Comput Chem 25, 1133-48.
(32) Zavodszky, M. I., Lei, M., Thorpe, M. F., Day, A. R., and Kuhn, L. A. (2004) Modeling correlated main-chain motions in proteins for flexible molecular recognition. Proteins 57, 243-61.
(33) Berthet-Colominas, C., Crepin, T., Haertlein, M., Kron, M., and Cusack, S., to be submitted.
(34) Brunger, A. T., Adams, P. D., Clore, G. M., DeLano, W. L., Gros, P., Grosse-Kunstleve, R. W., Jiang, J. S., Kuszewski, J., Nilges, M., Pannu, N. S., Read, R. J., Rice, L. M., Simonson, T., and Warren, G. L. (1998) Crystallography & NMR system: A new software suite for macromolecular structure determination. Acta Crystallogr D Biol Crystallogr 54, 905-21.
(35) Laskowski, R. A., Moss, D. S., and Thornton, J. M. (1993) Main-chain bond lengths and bond angles in protein structures. J Mol Biol 231, 1049-67.
(36) Hess, H. H., and Derr, J. E. (1975) Assay of inorganic and organic phosphorus in the 0.1-5 nanomole range. Anal Biochem 63, 607-13.
(37) Baykov, A. A., Evtushenko, O. A., and Avaeva, S. M. (1988) A malachite green procedure for orthophosphate determination and its use in alkaline phosphatase-based enzyme immunoassay. Anal Biochem 171, 266-70.
(38) Cogan, E. B., Birrell, G. B., and Griffith, O. H. (1999) A robotics-based automated assay for inorganic and organic phosphates. Anal Biochem 271, 29-35.
(39) Danel, F., Walle, C., Kron, M., Haertlein, M., Cusack, S., and Page, M. G. P. (2004) in International Conference on Aminoacyl tRNA Synthetases pp 115, Seoul, Korea.
(40) Schnecke, V., and Kuhn, L. A. (2000) Virtual screening with solvation and ligand-induced complementarity. Perspect Drug Discov 20, 171-190.
(41) Bissantz, C., Folkers, G., and Rognan, D. (2000) Protein-based virtual screening of chemical databases. 1. Evaluation of different docking/scoring combinations. J Med Chem 43, 4759-67.
(42) Stahl, M., and Rarey, M. (2001) Detailed analysis of scoring functions for virtual screening. J Med Chem 44, 1035-42.
(43) Halperin, I., Ma, B., Wolfson, H., and Nussinov, R. (2002) Principles of docking: An overview of search algorithms and a guide to scoring functions. Proteins 47, 409-43.
(44) Ferrara, P., Gohlke, H., Price, D. J., Klebe, G., and Brooks, C. L., 3rd. (2004) Assessing scoring functions for protein-ligand interactions. J Med Chem 47, 3032-47.
(45) Perola, E., Walters, W. P., and Charifson, P. S. (2004) A detailed comparison of current docking and scoring methods on systems of pharmaceutical relevance. Proteins 56, 235-49.
(46) Gohlke, H., Hendlich, M., and Klebe, G. (2000) Knowledge-based scoring function to predict protein-ligand interactions. J Mol Biol 295, 337-56.
(47) Wang, R., Lai, L., and Wang, S. (2002) Further development and validation of empirical scoring functions for structure-based binding affinity prediction. J Comput Aided Mol Des 16, 11-26.
(48) Hespenheide, B. M., Rader, A. J., Thorpe, M. F., and Kuhn, L. A. (2002) Identifying protein folding cores from the evolution of flexible regions during unfolding. J Mol Graph Model 21, 195-207.
(49) Rader, A. J., Hespenheide, B. M., Kuhn, L. A., and Thorpe, M. F. (2002) Protein unfolding: rigidity lost. Proc Natl Acad Sci U S A 99, 3540-5.
(50) Taylor, R. (2002) Life-science applications of the Cambridge Structural Database. Acta Crystallogr D Biol Crystallogr 58, 879-88.
(51) Ihlenfeldt, W. D., Voigt, J. H., Bienfait, B., Oellien, F., and Nicklaus, M. C. (2002) Enhanced CACTVS browser of the Open NCI Database. J Chem Inf Comput Sci 42, 46-57.
(52) Lipinski, C. A., Lombardo, F., Dominy, B. W., and Feeney, P. J. (2001) Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv Drug Deliv Rev 46, 3-26.
(53) Perry, N. B., Ettouati, L., Litaudon, M., Blunt, J. W., Munro, M. H. G., Parkin, S., and Hope, H. (1994) Alkaloids from the antarctic sponge Kirkpatrickia varialosa. Part 1: Variolin B, a new antitumour and antiviral compound. Tetrahedron 50, 3987.
(54) Evers, D. L., Breitenbach, J. M., Borysko, K. Z., Townsend, L. B., and Drach, J. C. (2002) Inhibition of cyclin-dependent kinase 1 by purines and pyrrolo[2,3-d]pyrimidines does not correlate with antiviral activity. Antimicrob Agents Chemother 46, 2470-6.
(55) Gompel, M., Leost, M., De Kier Joffe, E. B., Puricelli, L., Franco, L. H., Palermo, J., and Meijer, L. (2004) Meridianins, a new family of protein kinase inhibitors isolated from the ascidian Aplidium meridianum. Bioorg Med Chem Lett 14, 1703-7.
(56) Anderson, R. J., and Morris, J. C. (2001) Total synthesis of variolin B. Tetrahedron Lett 42, 8697-8699.
(57) Anderson, R. J., and Morris, J. C. (2001) Studies toward the total synthesis of the variolins: rapid entry to the core structure. Tetrahedron Lett 42, 311-313.
(58) Anderson, R. J., Hill, J. B., and Morris, J. C. (2005) Concise total syntheses of variolin B and deoxyvariolin B. J Org Chem 70, 6204-12.
(59) Anderson, R. J. (2002) in Department of Chemistry, University of Canterbury, Christchurch, NZ.
(60) Marsh, C. L. (2005) in Department of Chemistry, University of Canterbury, Christchurch, NZ.
(61) Hill, J. B. (2005) in Department of Chemistry, University of Canterbury, Christchurch, NZ.
(62) Saffer, J. D., and Glazer, R. I. (1981) Inhibition of histone H1 phosphorylation by sangivamycin and other pyrrolopyrimidine analogues. Mol Pharmacol 20, 211-7.
(63) Davies, L. P., Jamieson, D. D., Baird-Lambert, J. A., and Kazlauskas, R. (1984) Halogenated pyrrolopyrimidine analogues of adenosine from marine organisms: pharmacological activities and potent inhibition of adenosine kinase. Biochem Pharmacol 33, 347-55.
(64) Recchia, I., Rucci, N., Festuccia, C., Bologna, M., MacKay, A. R., Migliaccio, S., Longo, M., Susa, M., Fabbro, D., and Teti, A. (2003) Pyrrolopyrimidine c-Src inhibitors reduce growth, adhesion, motility and invasion of prostate cancer cells in vitro. Eur J Cancer 39, 1927-35.
(65) Recchia, I., Rucci, N., Funari, A., Migliaccio, S., Taranta, A., Longo, M., Kneissel, M., Susa, M., Fabbro, D., and Teti, A. (2004) Reduction of c-Src activity by substituted 5,7-diphenyl-pyrrolo[2,3-d]-pyrimidines induces osteoclast apoptosis in vivo and in vitro. Involvement of ERK1/2 pathway. Bone 34, 65-79.
(66) Sadowski, J., and Gasteiger, J. (1993) From Atoms And Bonds To 3-Dimensional Atomic Coordinates - Automatic Model Builders. Chem Rev 93, 2567-2581.
(67) Allen, J. G., and Danishefsky, S. J. (2001) The total synthesis of (+/-)-rishirilide B. J Am Chem Soc 123, 351-2.
(68) Neidle, S., Taylor, G. L., and Cowling, P. C. (1979) Crystal And Molecular-Structure Of 8,2'-Cycloadenosine Trihydrate. Acta Crystallogr B 35, 708-712.
(69) Freist, W., and Cramer, F. (1980) Phenylalanyl-tRNA, lysyl-tRNA, isoleucyl-tRNA and arginyl-tRNA synthetases. Substrate specificity in the ATP/PPi exchange with regard to ATP analogs. Eur J Biochem 107, 47-50.
(70) Dhananjeyan, M. R., Milev, Y. P., Kron, M. A., and Nair, M. G. (2005) Synthesis and activity of substituted anthraquinones against a human filarial parasite, Brugia malayi. J Med Chem 48, 2822-30.
(71) Burnouf, C., and Pruniaux, M. P. (2002) Recent advances in PDE4 inhibitors as immunoregulators and anti-inflammatory drugs. Curr Pharm Des 8, 1255-96.
(72) Klenke, B., Barrett, M. P., Brun, R., and Gilbert, I. H. (2003) Antiplasmodial activity of a series of 1,3,5-triazine-substituted polyamines. J Antimicrob Chemother 52, 290-3.
(73) Goble, F. C. (1950) Chemotherapy of experimental trypanosomiasis; trypanocidal activity of certain bis (2-methyl-4-amino-6-quinolyl) amides and ethers. J Pharmacol Exp Ther 98, 49-61.
(74) Ruff, M., Krishnaswamy, S., Boeglin, M., Poterszman, A., Mitschler, A., Podjarny, A., Rees, B., Thierry, J. C., and Moras, D. (1991) Class II aminoacyl transfer RNA synthetases: crystal structure of yeast aspartyl-tRNA synthetase complexed with tRNA(Asp). Science 252, 1682-9.
(75) Shiba, K., Stello, T., Motegi, H., Noda, T., Musier-Forsyth, K., and Schimmel, P. (1997) Human lysyl-tRNA synthetase accepts nucleotide 73 variants and rescues Escherichia coli double-defective mutant. J Biol Chem 272, 22809-16.
(76) Cusack, S., Yaremchuk, A., and Tukalo, M. (1996) The crystal structure of the ternary complex of T. thermophilus seryl-tRNA synthetase with tRNA(Ser) and a seryl-adenylate analogue reveals a conformational switch in the active site. EMBO J 15, 2834-42.
(77) Burke, B., Yang, F., Chen, F., Stehlin, C., Chan, B., and Musier-Forsyth, K. (2000) Evolutionary coadaptation of the motif 2--acceptor stem interaction in the class II prolyl-tRNA synthetase system. Biochemistry 39, 15540-7.
(78) Yaremchuk, A., Tukalo, M., Grotli, M., and Cusack, S. (2001) A succession of substrate induced conformational changes ensures the amino acid specificity of Thermus thermophilus prolyl-tRNA synthetase: comparison with histidyl-tRNA synthetase. J Mol Biol 309, 989-1002.
(79) Lenhard, B., Filipic, S., Landeka, I., Skrtic, I., Soll, D., and Weygand-Durasevic, I. (1997) Defining the active site of yeast seryl-tRNA synthetase. Mutations in motif 2 loop residues affect tRNA-dependent amino acid recognition. J Biol Chem 272, 1136-41.
(80) Wang, P., Vaidehi, N., Tirrell, D. A., and Goddard, W. A., 3rd. (2002) Virtual screening for binding of phenylalanine analogues to phenylalanyl-tRNA synthetase. J Am Chem Soc 124, 14442-9.
(81) Cramer, R. D., Patterson, D. E., and Bunce, J. D. (1988) Comparative Molecular-Field Analysis (CoMFA). 1. Effect Of Shape On Binding Of Steroids To Carrier Proteins. J Am Chem Soc 110, 5959-5967.
(82) Kim, S. Y., and Lee, J. (2003) 3-D-QSAR study and molecular docking of methionyl-tRNA synthetase inhibitors. Bioorg Med Chem 11, 5325-31.
(83) Finn, J., Mattia, K., Morytko, M., Ram, S., Yang, Y., Wu, X., Mak, E., Gallant, P., and Keith, D. (2003) Discovery of a potent and selective series of pyrazole bacterial methionyl-tRNA synthetase inhibitors. Bioorg Med Chem Lett 13, 2231-4.
(84) Lee, J., Kim, S. E., Lee, J. Y., Kim, S. Y., Kang, S. U., Seo, S. H., Chun, M. W., Kang, T., Choi, S. Y., and Kim, H. O. (2003) N-Alkoxysulfamide, N-hydroxysulfamide, and sulfamate analogues of methionyl and isoleucyl adenylates as inhibitors of methionyl-tRNA and isoleucyl-tRNA synthetases. Bioorg Med Chem Lett 13, 1087-92.
(85) Jarvest, R. L., Erskine, S. G., Forrest, A. K., Fosberry, A. P., Hibbs, M. J., Jones, J. J., O'Hanlon, P. J., Sheppard, R. J., and Worby, A. (2005) Discovery and optimisation of potent, selective, ethanolamine inhibitors of bacterial phenylalanyl tRNA synthetase. Bioorg Med Chem Lett 15, 2305-9.
(86) Bernier, S., Akochy, P. M., Lapointe, J., and Chenevert, R. (2005) Synthesis and aminoacyl-tRNA synthetase inhibitory activity of aspartyl adenylate analogs. Bioorg Med Chem 13, 69-75.
(87) Cavasotto, C. N., and Abagyan, R. A. (2004) Protein flexibility in ligand docking and virtual screening to protein kinases. J Mol Biol 337, 209-25.
(88) Gorfe, A. A., and Caflisch, A. (2005) Functional plasticity in the substrate binding site of beta-secretase. Structure 13, 1487-98.
(89) Gohlke, H., Kuhn, L. A., and Case, D. A. (2004) Change in protein flexibility upon complex formation: analysis of Ras-Raf using molecular dynamics and a molecular framework approach. Proteins 56, 322-37.
(90) Bursavich, M. G., and Rich, D. H. (2002) Designing non-peptide peptidomimetics in the 21st century: inhibitors targeting conformational ensembles. J Med Chem 45, 541-58.
(91) Finer-Moore, J. S., Anderson, A. C., O'Neil, R. H., Costi, M. P., Ferrari, S., Krucinski, J., and Stroud, R. M. (2005) The structure of Cryptococcus neoformans thymidylate synthase suggests strategies for using target dynamics for species-specific inhibition. Acta Crystallogr D Biol Crystallogr 61, 1320-34.

Chapter 3

Optimizing variolin B and triazinylamine to improve their binding affinity and specificity for Brugia AsnRS

3.1 Introduction

The preclinical drug discovery process can be broadly divided into two phases: the lead discovery phase and the lead optimization phase (1, 2). Lead discovery typically involves different flavors of high-throughput screening (HTS) in which large sections of chemical space are sampled for biological activity. During lead optimization, the chemically feasible hits obtained from HTS are subjected to synthetic modification to optimize activity. The underlying rationale for chemical modification is that a change in the molecular structure of a lead compound could maximize its biological activity (3, 4). The objectives of the lead optimization process can be multipronged, improving one or more of the following: biological properties (e.g., in vitro and in vivo potency), physicochemical properties (e.g., logP, pKa), pharmaceutic properties (e.g., solubility, crystallinity), or pharmacokinetic properties (absorption, distribution, metabolism and elimination) (3, 5).
The objective of the work presented here is to optimize, using computational techniques, two of the top hits discovered using SLIDE (6) to improve their affinity and specificity for Brugia AsnRS over human AsnRS or other human proteins. The best analogs of these compounds, assessed using the metrics described in later sections of this chapter, were recommended to our medicinal chemistry collaborators for chemical synthesis.

Variolin B, a pyrrolopyrimidine, was identified by SLIDE as a potential inhibitor, and was confirmed to inhibit ~50% of Brugia AsnRS activity at a concentration of 50 µM (6, 7). It had been reported to bind to a cyclin-dependent kinase (8), possibly explaining its cytotoxicity. The SLIDE-predicted binding mode (Figure 3.1), supported by quantitative structure-activity relationship (QSAR) analysis of inhibition assay results for synthesized analogs, indicates that variolin B binds in the adenosyl pocket. We hypothesize that growing an asparagine side chain from the variolin scaffold will improve its affinity for Brugia AsnRS relative to other ATP-binding proteins. New analogs of variolin B were designed with the asparagine side chain attached to two different growth points on its scaffold and investigated to see whether they could bind well in both the adenosine and asparagine pockets of the binding site. Protein-ligand complementarity scores and the difference in ligand internal energies (between the bound and free conformations) were used to compare the docked orientations, if any, of these analogs.

Figure 3.1: (A) The 2D structure of variolin B. (B) The SLIDE-predicted binding mode of variolin B, shown in atom-colored tubes (carbon, green; oxygen, red; nitrogen, blue), is compared to ASNAMS bound in the crystal structure, shown in atom-colored tubes, but with carbon atoms colored orange. The pendant ring of variolin B, containing the N6 atom, sits in the ribose pocket, with N6 oriented towards the asparagine pocket. Five analogs with sulfamoyl-asparagine grown from two different positions (occupied by the N6 and O1 atoms) on the variolin scaffold were assessed.

Triazinylamine has a symmetric structure with two substituted triazine rings connected by a phenyl linker. It was identified by SLIDE as a potential inhibitor and was confirmed to inhibit 50% of Brugia AsnRS activity and 80% of human AsnRS activity at a concentration of 25 µM. The SLIDE-predicted binding mode (Figure 3.2) indicates that one of the triazine rings is isosteric with the adenine ring of ASNAMS and also mimics its interactions with the surrounding binding site residues. The bridging phenyl ring is docked in the ribose pocket and is unable to make any polar interactions owing to its hydrophobic character. The program BOMB (Biochemical and Organic Molecule Builder, (9)) was used to design user-defined substituents off of the truncated triazinylamine scaffold, which retains only one of the two triazine rings (Figure 3.2.B). The aims were to replace the phenyl linker with a polar ring, isosteric with the ribose in ASNAMS, and to fill the small cavity in the AsnRS binding site where the C2 atom of adenine binds. The analogs designed by BOMB were assessed using protein-ligand complementarity scores and the difference in ligand internal energies.

Figure 3.2: (A) The 2D structure of triazinylamine is shown. (B) The truncated scaffold used to design substituents (R1 and R2) is shown. (C) The SLIDE-predicted binding mode of triazinylamine, shown in atom-colored tubes (carbon, green; oxygen, red; nitrogen, blue), is compared to ASNAMS bound in the crystal structure, shown in atom-colored tubes, but with carbon atoms colored orange. The phenyl ring in the middle, linking the two triazine rings, sits in the ribose pocket. Combinations of R1 and R2 groups were assessed to see whether they can replace the isopropenyl and phenyl groups in triazinylamine, respectively.
3.2 Methods

3.2.1 Designing new analogs and generating their structures

The analogs of variolin B were designed manually, while those of triazinylamine were designed using an automated method. The most probable binding mode of variolin B (Figure 3.1.B) indicates that it binds in the adenosyl pocket of Brugia AsnRS. Given that the amino acid pocket of AsnRS is highly specific for asparagine, attaching a sulfamoyl-asparagine side chain to the variolin scaffold is expected to improve its affinity and specificity for Brugia AsnRS relative to other ATP-binding proteins. Starting from the Cambridge Structural Database structure of the variolin B scaffold (CSD code: LEPWIM), the 3D structures of the new analogs were built using CORINA (10), and multiple low-energy conformers were then generated using Omega (OpenEye Software). Five analogs of variolin with the sulfamoyl-asparagine attached to two different positions on the variolin B scaffold (Figure 3.1.A) were designed.

The analogs of triazinylamine were designed using BOMB, a molecular mechanics-based method that builds new structures by growing user-defined substituents off of a given core structure. The truncated triazinylamine scaffold (Figure 3.2.B) was used as the input structure to BOMB. The analogs generated by BOMB had one structure each, assessed as the best by its internal scoring function. However, to be consistent with the optimization protocol used for variolin B, the conformational space available to these analogs was explored by generating multiple low-energy conformers using Omega.

3.2.2 Scoring the interactions between designed analogs and Brugia AsnRS

The multiple low-energy conformers generated for analogs of both variolin B and triazinylamine were screened and docked into the AsnRS binding site using SLIDE.
For scoring the interactions between the docked orientations of designed analogs and Brugia AsnRS, we used a new version of SLIDE score (11) that assesses protein-ligand complementarity using two scoring functions: SLIDE OrientScore to pick the correct orientation and SLIDE AffiScore to predict the binding affinity. All three scoring functions, SLIDE OrientScore, SLIDE AffiScore and DrugScore (12), do a reliable job of assessing the right conformation, binding mode, and relative order of affinity of known AsnRS ligands.

3.2.3 Assessing ligand internal energies

It has been observed that the bioactive, protein-bound conformation of a ligand often differs from its minimum-energy conformation in a protein-free environment or solution. The energetic cost of the protein-induced ligand strain must be offset by the free energy of binding. Hence, it is important to assess the difference in ligand internal energies between the docked (bound) and minimum-energy conformations of the designed analogs. An analog whose docked conformation has a small difference in ligand internal energy, and hence minimal strain, is desirable. We used the MMFF94s force field, available with Omega, to assess the ligand internal energies of conformers of the designed analogs (13).

3.3 Results and discussion

3.3.1 Variolin B analogs

Five analogs of variolin B were designed and investigated to see whether they could bind well in both the adenosine and asparagine pockets of the Brugia AsnRS binding site. Three of them had the sulfamoyl-asparagine attached to the N6 atom of the pendant ring of the variolin B scaffold (Figure 3.1.A), while two of them had it attached to the O1 atom of the scaffold. In analogs where the sulfamoyl-asparagine was attached at the O1 position, the oxygen atom at that position was replaced by a nitrogen atom to make the bond more stable. The 2D structures of all five analogs are shown in Table 3.1.
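The strain assessment described in Section 3.2.3 amounts to a subtraction, which the following minimal sketch makes explicit. The energy pairs are the minimum-energy and bound-conformer values reported for ASNAMS, variolin B, and analog V in Table 3.2; the function name and the kcal/mol unit noted in the comments are assumptions made for illustration.

```python
# Ligand strain = internal energy of the bound (docked) conformer minus the
# internal energy of the minimum-energy free conformer, both evaluated with
# the same force field. Units are assumed to be kcal/mol.
def ligand_strain(e_min, e_bound):
    """Return the internal-energy difference (bound - minimum)."""
    return e_bound - e_min


# (minimum-energy conformer, bound conformer) values as reported in Table 3.2.
energies = {
    "ASNAMS": (-106.4, -69.5),
    "Variolin B": (-117.9, -95.0),
    "Analog V": (-89.6, -11.7),
}

strain = {
    name: round(ligand_strain(e_min, e_bound), 1)
    for name, (e_min, e_bound) in energies.items()
}

# List the ligands from least to most strained; lower is more favorable.
for name in sorted(strain, key=strain.get):
    print(f"{name}: {strain[name]:.1f}")
```

The computed differences match the values listed in Table 3.2, with analog V showing by far the largest strain of the three.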
Each of the five analogs and its relative favorability are briefly discussed in the following sections.

Table 3.1: Designed analogs of variolin B with the sulfamoyl-asparagine (S-ASN) attached to two different positions on the variolin scaffold (Figure 3.1.A). [Table columns: Analog; 2D Structure. Entries: analogs I-V.]

Analogs I, II and III are similar overall and differ only in the chemical group present at the O1 position on the variolin B scaffold (Figure 3.1.A). Analog I has an OMe group at this position, while analog II has an OH group. Analog III does not have any group there and hence is labeled as deoxy in Table 3.1. SLIDE could not dock any of the conformers of analogs I, II or III in the Brugia AsnRS binding site because the steric overlaps between the attached side chain and residues in the adenosyl pocket could not be resolved. When the minimum-energy conformers of each of these three analogs were docked manually in the AsnRS binding site by superimposing them onto the binding mode of the variolin B scaffold shown in Figure 3.1.B, the pendant ring was found to be rotated by varying degrees compared to its position in the unsubstituted scaffold, as shown in Figure 3.3. Due to the rotated positions of the pendant ring, the attached sulfamoyl-asparagine was oriented in totally different directions than intended. Ideally, the asparagine side chain in the designed analogs should be docked as close as possible to that of ASNAMS. The rotated positions of the pendant ring in these analogs could be due to the electrostatic interactions between the oxygen atoms of the attached sulfamoyl group and the planar tricyclic ring system, including the chemical groups present at the O1 position (OMe in analog I and OH in analog II).

The conformers of analog IV, with the sulfamoyl-asparagine attached to the O1 position, also could not be docked by SLIDE in the AsnRS binding site.
When the minimum-energy conformers of analog IV was docked manually in the AsnRS binding site by superimposing onto the binding mode of the variolin B scaffold, it was observed that the asparagine side chain, although oriented in the right direction, was still far off from its 81 Figure 3.3: The SLIDE-predicted binding mode of variolin B, rendered as atom colored tubes (carbon, green; oxygen, red; and nitrogen, blue), along with ASNAMS bound in the crystal structure, rendered as atom colored tubes but with carbon atoms colored orange, is shown in all the panels for reference. Binding modes of minimum-energy conformers of designed variolin analogs, rendered as atom colored tubes but with carbon atoms colored yellow, assessed manually by superimposing onto the binding mode of variolin B scaffold are also shown. (A) analog 1, (B) analog II, and (C) analog III. 82 counterpart in ASNAMS as shown in Figure 3.4.A. After detailed docking analysis performed both manually and using SLIDE, we inferred that for the asparagine side chain to be docked in the right orientation, the pendant ring has to be further buried in the ribose pocket, as shown in Figure 3.4.3. However, when this ring is docked deep in the Figure 3.4: The SLIDE-predicted binding mode of variolin B, rendered as atom colored tubes (carbon, green; oxygen, red; and nitrogen, blue), along with ASNAMS bound in the crystal structure, rendered as atom colored tubes but with carbon atoms colored orange, is shown in panel A for reference. Binding mode of minimum-energy conformer of variolin analog IV, rendered as atom colored tubes but with carbon atoms colored yellow, assessed manually by superimposing onto the binding mode of variolin B scaffold is shown in panel A. The pendant ring of analog IV has to be further buried in the ribose pocket in order to match the orientation of its attached asparagine side chain with that of ASNAMS as shown in panel B. 
ribose pocket, there are many steric overlaps with the neighboring residues that form a beta sheet, and these cannot be resolved without altering the main-chain torsion angles. In view of the steric problems posed by the pendant ring, the possibility of attaching the sulfamoyl-asparagine side chain to a truncated variolin scaffold was explored, leading to the design of variolin analog V (Table 3.1). SLIDE was able to dock one of the low-energy conformers of analog V with a favorable orientation of the asparagine side chain, as shown in Figure 3.5. Analog V, in its docked orientation, forms multiple hydrogen bonds with several binding site residues (Figure 3.5). In addition to these favorable interactions, analog V is also capable of displacing the buried crystallographic water molecule 28 (Figure 3.5). Releasing a bound water molecule and mimicking its interactions reduces the entropic costs and contributes favorably to the binding free energy. Protein-ligand complementarity scores as well as the difference in ligand internal energies of known AsnRS ligands and variolin analogs are reported in Table 3.2.

Figure 3.5: The SLIDE-predicted binding mode of variolin analog V, rendered as atom-colored tubes (carbon, green; oxygen, red; and nitrogen, blue), is compared with ASNAMS bound in the crystal structure, rendered as atom-colored tubes but with carbon atoms colored orange. The hydrogen bonds between analog V and Brugia AsnRS residues, rendered as atom-colored tubes, are shown as white dashed lines. Bound crystallographic water molecules that mediate interactions between ASNAMS and AsnRS are shown as cyan spheres. Analog V is capable of displacing water molecule 28 and mimics its interactions as well.

Table 3.2: Predicted protein-ligand complementarity scores and difference in ligand internal energies of docked orientations of designed variolin analogs compared with known ligands.
The scores were computed for dockings into chain A of the ASNAMS-bound Brugia AsnRS structure.

Compound | Could be fit in the binding site? | Protein-ligand complementarity scores (a) | MMFF94s ligand internal energy (Kcal/mol)
ASNAMS (known ligand) | Yes | OS (b) = -10.88; AS (c) = -9.00; DS (d) = -7.25 | Min. (e) = -106.4; Bound (f) = -69.5; Diff. (g) = 36.9
Variolin B (known ligand) | Yes | OS = -7.90; AS = -9.24; DS = -3.22 | Min. = -117.9; Bound = -95.0; Diff. = 22.9
Analog I | No | - | -
Analog II | No | - | -
Analog III | No | - | -
Analog IV | No | - | -
Analog V | Yes | OS = -8.90; AS = -8.10; DS = -3.71 | Min. = -89.6; Bound = -11.7; Diff. = 77.9

(a) More negative is more favorable. (b) SLIDE OrientScore (OS) in Kcal/mol. (c) SLIDE AffiScore (AS) in Kcal/mol. (d) DrugScore (x10^5) (DS) in arbitrary units. (e) Ligand internal energy of the minimum-energy conformer. (f) Ligand internal energy of the bound conformer. (g) Ligand internal energy difference between the bound and minimum-energy conformer. A lower value is more favorable.

Analog V lacks the pendant ring of the variolin B scaffold (Figure 3.1.A), and this results in a loss of interactions in the ribose pocket. Given the large difference predicted in the ligand internal energy between the bound conformation and the minimum-energy conformer of this analog (Table 3.2), substituting the truncated pendant ring with a smaller, more flexible, polar ring should help overcome the ligand strain by increasing its interactions with the protein.

3.3.2 Triazinylamine analogs

Combinations of R1 and R2 groups were tested by the BOMB program to substitute the isopropenyl group and the bridging phenyl ring, respectively (Figure 3.2). Nine different analogs designed by the program were investigated further to see if they could be docked into the Brugia AsnRS binding site. Multiple low-energy conformers of these analogs were screened using SLIDE, and the scores and ligand internal energies for the top-scoring docked orientation of each are tabulated in Table 3.3.
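The ligand strain reported in the Diff. column of Tables 3.2 and 3.3 is the internal energy of the bound conformer minus that of the minimum-energy conformer. As a minimal sketch of that arithmetic, the Python snippet below recomputes the Diff. values from the Min. and Bound entries of Table 3.2. The function and variable names are illustrative assumptions and are not part of SLIDE or of any MMFF94s implementation.

```python
# Min. and Bound MMFF94s internal energies (Kcal/mol) as listed in Table 3.2.
# The dictionary layout and names are illustrative, not taken from any program.
ENERGIES = {
    "ASNAMS": (-106.4, -69.5),
    "Variolin B": (-117.9, -95.0),
    "Analog V": (-89.6, -11.7),
}


def ligand_strain(e_min, e_bound):
    """Strain = E(bound conformer) - E(minimum-energy conformer), one decimal."""
    return round(e_bound - e_min, 1)


strain = {name: ligand_strain(e_min, e_bound)
          for name, (e_min, e_bound) in ENERGIES.items()}

# List the ligands from least to most strained (lower is more favorable).
for name in sorted(strain, key=strain.get):
    print(f"{name}: {strain[name]} Kcal/mol")
```

Under this definition analog V carries by far the largest strain of the three ligands, which is the observation that motivates replacing its truncated pendant ring.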
The substituents to replace the phenyl ring of triazinylamine docked in the ribose pocket (R2 groups in Table 3.3) are all variants of pyrimidyl and furanyl linkers, as shown in Figure 3.6. The difference in ligand internal energies between the bound and free conformations for all the triazinylamine analogs varies from 2.2 to 4.6 Kcal/mol. Favorable interactions between the docked orientations of these compounds and AsnRS binding site residues can readily overcome the energetic costs of ligand strain in this range. Both pyrimidyl and furanyl linkers are rigid ring structures, unlike the puckered ribose ring in ASNAMS, which binds to AsnRS in a C3'-endo conformation. Alternate flexible linkers for the ribose pocket would be more favorable than the limited number of heterocyclic compounds in the R-group library used by BOMB.

Table 3.3: Predicted protein-ligand complementarity scores and difference in ligand internal energies of docked orientations of designed triazinylamine analogs. These analogs were designed using the BOMB program by testing combinations of R1 and R2 groups on the truncated triazinylamine scaffold (Figure 3.2.B). The scores were computed for dockings into chain A of the ASNAMS-bound Brugia AsnRS structure. [The 2D drawings of the R2 groups are not reproduced; only the ring type and the SO2NH2 substituent are listed.]

Compound | R1 | R2 | Protein-ligand complementarity scores (a) | MMFF94s ligand internal energy (Kcal/mol)
1 | CONH2 | pyrimidyl ring with SO2NH2 | OS (b) = -8.80; AS (c) = -8.10; DS (d) = -4.03 | Min. (e) = -132.6; Bound (f) = -129.5; Diff. (g) = 3.0
2 | COOH | pyrimidyl ring with SO2NH2 | OS = -8.3; AS = -8.4; DS = -3.65 | Min. = -121.9; Bound = -118.7; Diff. = 3.2
3 | OH | pyrimidyl ring with SO2NH2 | OS = -7.7; AS = -8.5; DS = -3.86 | Min. = -291.2; Bound = -289.0; Diff. = 2.2
4 | CONH2 | pyrimidyl ring with SO2NH2 | OS = -8.0; AS = -7.4; DS = -3.79 | Min. = -54.1; Bound = -50.4; Diff. = 3.7
5 | OH | pyrimidyl ring with SO2NH2 | OS = -7.7; AS = -7.5; DS = -3.32 | Min. = -141.9; Bound = -137.3; Diff. = 4.6
6 | CH2OH | pyrimidyl ring with SO2NH2 | OS = -7.6; AS = -6.7; DS = -3.81 | Min. = -127.4; Bound = -124.8; Diff. = 2.6
7 | OH | furanyl ring with SO2NH2 | OS = -8.1; AS = -8.1; DS = -3.43 | Min. = -254.6; Bound = -251.5; Diff. = 3.1
8 | CH2OH | furanyl ring with SO2NH2 | OS = -8.1; AS = -7.7; DS = -2.94 | Min. = -162.1; Bound = -158.1; Diff. = 4.0
9 | OH | furanyl ring with SO2NH2 | OS = -8.3; AS = -7.4; DS = -2.66 | Min. = -220.3; Bound = -215.9; Diff. = 4.4

(a) More negative is more favorable. (b) SLIDE OrientScore (OS) in Kcal/mol. (c) SLIDE AffiScore (AS) in Kcal/mol. (d) DrugScore (x10^5) (DS) in arbitrary units. (e) Ligand internal energy of the minimum-energy conformer. (f) Ligand internal energy of the bound conformer. (g) Ligand internal energy difference between the bound and minimum-energy conformer. A lower value is more favorable.

Figure 3.6: The SLIDE-predicted binding modes of designed analogs of triazinylamine, rendered as atom-colored tubes (carbon, green; oxygen, red; and nitrogen, blue), are compared with ASNAMS bound in the crystal structure, rendered as atom-colored tubes but with carbon atoms colored orange. The compound names and 2D structures of the R2 groups (Figure 3.2.B), docked in the ribose pocket, are tabulated in Table 3.3. (A) compound 1, (B) compound 2, (C) compound 3, (D) compound 4, (E) compound 5, (F) compound 6, (G) compound 7, (H) compound 8, and (I) compound 9. The first six compounds have six-membered pyrimidyl rings while the last three have five-membered furanyl rings docked in the ribose pocket.

The R2 groups in compounds 1, 6 and 7 (Table 3.3 and Figure 3.6) occupy roughly the same volume as ribose in ASNAMS and also mimic some of its hydrogen bonds with AsnRS. These compounds have good protein-ligand complementarity scores and relatively low ligand strain, as indicated by their difference in ligand internal energies.
In addition, the R1 groups for compounds 1 and 6 are capable of displacing the buried crystallographic water molecule 20 (Figure 3.5), forming direct hydrogen bonds with most of the surrounding AsnRS residues. Meta-substituted R2 rings seem to allow better fits in the adenine and ribose regions than ortho-substituted R2 rings. All the substituents in the ribose pocket carry a -SO2NH2 (sulfonamide) linker, which could be used as a growth point for an asparagine side chain. Compounds 4 and 8 are attractive because their sulfonamide linkers are docked very close to that in ASNAMS. In addition, their R1 groups also displace the buried crystallographic water molecule 20 (Figure 3.5) and mimic all its interactions with the binding site residues. However, their R2 groups only partially occupy the ribose pocket, making them less effective as ribose mimics. Overall, compounds 1 and 9 look best, as they could displace a buried water molecule as well as fit the ribose pocket with greater similarity to ribose in ASNAMS, both in terms of interactions and of volume occupied. The R1 group in compound 9 (-CH2OH) is less bulky and more flexible than that of compound 1 (-CONH2). A more flexible R1 substituent is desirable, as it could provide more options for making optimal interactions with the binding site residues. However, compound 1 has better protein-ligand complementarity scores than compound 9, implying better interactions between its docked orientation and the binding site residues.

3.4 Conclusions

A series of analogs was designed for two of the top Brugia AsnRS inhibitors discovered using SLIDE. For the variolin analogs, it was observed that the pendant ring in the scaffold posed a hurdle to fitting the attached sulfamoyl-asparagine, with minimal strain, in the highly specific asparagine pocket of the AsnRS binding site.
Variolin analog V was assessed as the best analog, because its SLIDE-predicted binding mode indicates that the sulfamoyl-asparagine, attached to the 01 position of the truncated variolin scaffold, not only fits in the asparagine pocket and makes favorable interactions but is also capable of displacing a buried crystallographic water molecule in that pocket. The chemical synthesis of variolin analog V, along with analogs II and IV, is being pursued in the laboratory of Dr. Jonathan Morris (University of Adelaide, Adelaide, Australia), our medicinal chemistry collaborator. For compounds designed from the truncated triazinylamine scaffold, compounds 1 and 9 were assessed as the best candidates to be pursued further for chemical synthesis. The SLIDE-predicted binding modes for these compounds indicate better isostericity in the ribose pocket and the ability to displace a buried crystallographic water molecule in the adenine pocket of the AsnRS binding site. The chemical synthesis of the starting compound, the truncated triazinylamine scaffold, is being pursued in the laboratory of Dr. Morten Grotli (University of Goteborg, Goteborg, Sweden), our medicinal chemistry collaborator. Upon successful synthesis and purification of the starting compound, chemical synthesis of compounds 1 and 9 will be pursued.

References

(1) Kenakin, T. (2003) Predicting therapeutic value in the lead optimization phase of drug discovery. Nat Rev Drug Discov 2, 429-38.
(2) Hajduk, P. J., and Greer, J. (2007) A decade of fragment-based drug design: strategic advances and lessons learned. Nat Rev Drug Discov 6, 211-9.
(3) Pickett, S. D., McLay, I. M., and Clark, D. E. (2000) Enhancing the hit-to-lead properties of lead optimization libraries. J Chem Inf Comput Sci 40, 263-72.
(4) Joseph-McCarthy, D., Baber, J. C., Feyfant, E., Thompson, D. C., and Humblet, C. (2007) Lead optimization via high-throughput molecular docking. Curr Opin Drug Discov Devel 10, 264-74.
(5) Stahl, M., Guba, W., and Kansy, M. (2006) Integrating molecular design resources within modern drug discovery research: the Roche experience. Drug Discov Today 11, 326-33.
(6) Sukuru, S. C., Crepin, T., Milev, Y., Marsh, L. G., Hill, J. B., Anderson, R. J., Morris, J. C., Rohatgi, A., O'Mahony, G., Grotli, M., Danel, F., Page, M. G., Hartlein, M., Cusack, S., Kron, M. A., and Kuhn, L. A. (2006) Discovering new classes of Brugia malayi asparaginyl-tRNA synthetase inhibitors and relating specificity to conformational change. J Comput Aided Mol Des 20, 159-78.
(7) Anderson, R. J., Hill, J. B., and Morris, J. C. (2005) Concise total syntheses of variolin B and deoxyvariolin B. J Org Chem 70, 6204-12.
(8) Bettayeb, K., Tirado, O. M., Marionneau-Lambot, S., Ferandin, Y., Lozach, O., Morris, J. C., Mateo-Lozano, S., Drueckes, P., Schachtele, C., Kubbutat, M. H., Liger, F., Marquet, B., Joseph, B., Echalier, A., Endicott, J. A., Notario, V., and Meijer, L. (2007) Meriolins, a new class of cell death inducing kinase inhibitors with enhanced selectivity for cyclin-dependent kinases. Cancer Res 67, 8325-34.
(9) Jorgensen, W. L., Ruiz-Caro, J., Tirado-Rives, J., Basavapathruni, A., Anderson, K. S., and Hamilton, A. D. (2006) Computer-aided design of non-nucleoside inhibitors of HIV-1 reverse transcriptase. Bioorg Med Chem Lett 16, 663-7.
(10) Sadowski, J., and Gasteiger, J. (1993) From Atoms And Bonds To 3-Dimensional Atomic Coordinates - Automatic Model Builders. Chem Rev 93, 2567-2581.
(11) Zavodszky, M. I., Sanschagrin, P. C., Korde, R. S., and Kuhn, L. A. (2002) Distilling the essential features of a protein surface for improving protein-ligand docking, scoring, and virtual screening. J Comput Aided Mol Des 16, 883-902.
(12) Gohlke, H., Hendlich, M., and Klebe, G. (2000) Knowledge-based scoring function to predict protein-ligand interactions. J Mol Biol 295, 337-56.
(13) Bostrom, J., Greenwood, J. R., and Gottfries, J.
(2003) Assessing the performance of OMEGA with respect to retrieving bioactive conformations. J Mol Graph Model 21, 449-62.

Chapter 4

Automated shape and chemistry comparison for defining binding site invariants and specificity determinants

4.1 Abstract

Structure-based virtual screening methods have been successful in identifying new ligands for a protein target. However, it is challenging to use such methods to identify new ligands that will bind strongly to one protein relative to another. This is an important factor in designing selective inhibitors (or agonists) as reagents to probe biochemistry, or as drug lead compounds. We have developed an automated approach, CompSite, to identify specificity determinants in one protein relative to another protein to aid in structure-based virtual screening and design. SLIDE, our screening and docking software, generates a binding site template to represent sites where a ligand can make favorable hydrogen-bond and hydrophobic interactions with atoms in the protein binding site. Using complete-linkage clustering of superimposed templates, shared interaction sites that are chemically similar or different between the templates are identified. Steric difference sites between the templates are also identified by checking for van der Waals overlaps, to indicate pockets for ligand binding that are available in one protein relative to another. We applied this method to different pairs of proteins that bind to the same compound, to see if the identified chemical difference sites could explain the experimentally measured relative binding affinities. In addition, we also applied this method to identify the binding site invariants and differences between Brugia malayi asparaginyl-tRNA synthetase (AsnRS), an anti-parasitic drug target, and a set of diverse ATP-binding proteins.
The ATP-binding proteins were broadly divided into two categories based on the way they bind to the adenine moiety, with the exocyclic amine N6 buried in the binding site (N6-in) or exposed to the solvent (N6-out). Combining our CompSite method with virtual screening protocols provides a promising approach for screening prospective ligands to identify those most likely to be specific to a protein target. The results obtained from our method can also be used to suggest modifications to known ligands that improve their specificity for the target protein relative to other protein(s). Examples are shown for serine proteases, protein kinases and our drug discovery target, Brugia AsnRS.

4.2 Introduction

Successful applications of protein structure-based virtual screening efforts have been well documented in several reviews in recent years (1-5). The universally acknowledged major bottlenecks in virtual high-throughput screening methods are identifying and ranking true hits from a large pool of small organic molecules (6-8). Recent algorithmic advances aimed at addressing those problems have led to improvement in the hit rates and have identified potent drug leads that were predicted and experimentally validated to bind strongly to the target protein (9, 10). However, even some of the potent lead compounds discovered by structure-based screening are beset by drawbacks in the early stages of drug discovery, one of the biggest being the limited ability of the compound to discriminate between the targeted (protein) structure and its structural homologs. Protein kinases are an obvious example because more than 500 homologous kinases have been identified in the human genome (11) and many of them are rational drug targets for various diseases including cancer, diabetes and inflammation (12). Some of the promising ATP-site kinase inhibitors, including those that were approved or were in clinical development, have been reported to have poor specificity profiles (13).
The problem of selectivity is not limited to protein kinases but has also been reported in other common drug targets like serine proteases (14), nuclear receptors (15) and matrix metalloproteinases (MMPs) (16). One of the new classes of inhibitors of Brugia malayi asparaginyl-tRNA synthetase (AsnRS), discovered in our lab (17), has recently been reported to bind to a cyclin-dependent kinase (18). This lack of selectivity may explain its cytotoxicity observed in human cell lines. The problem of ligand selectivity can be addressed computationally, without affecting the quality of docking or the speed of virtual screening of current structure-based methods, by filtering docking results to select those molecules that include pre-identified specificity determinants using an algorithm like CompSite. A few other methods published in the literature follow this paradigm of treating computational selectivity modeling as a separate module that could be integrated into the overall virtual screening and docking protocol. Kastenholz and co-workers developed a method in which the molecular interaction fields generated by the program GRID (19, 20) for the binding sites of several target protein structures are analyzed with consensus principal component analysis to obtain contour plots identifying regions that are important for selectivity in the chosen target protein (21). The GRID/PCA technique was adapted in a different way by Braiuca and co-workers to partially account for protein flexibility, so that it could predict selectivity differences caused by amino-acid residue differences not only in the active site but also in regions that do not directly interact with the ligand (22).
Sheridan and co-workers developed a mathematically simpler method, FLOGTV (23), that uses the trend vector paradigm (24) to compare the binding site field maps, generated by the program FLOG (25), to visualize the differences in closely related proteins superimposed in a reasonable way. Deng and co-workers developed the structural interaction fingerprint method that reduces the three-dimensional structural binding information of protein-ligand complexes into corresponding one-dimensional (1D) binary strings (26). A hierarchical clustering analysis of these binary strings helps identify similarities and diversities between their small-molecule binding interaction patterns, and facilitates the post-docking analysis to organize and filter screening results. The same authors modified their method to develop a strategy for designing protein target-focused chemical libraries by using the desirable protein-ligand interactions as filtering constraints (27). By modifying the virtual screening protocol to introduce essential protein-ligand interactions, identified by prior findings, as constraints during the docking stage, Perola reported a significant reduction in the false positive rates in kinase virtual screens (28). Other than the structure-based methods that we have briefly covered here, different sequence-based and ligand-based approaches to model ligand selectivity in drug design have been reviewed recently by Ortiz and co-workers (29). My goal was to develop a method that can automatically identify differences in preferred ligand chemistry, or differences in pockets available for ligand binding, to aid structure-based drug design and incorporate specificity determinants in virtual screening. Instead of focusing on an atomistic protein comparison approach, we focus on the implications of such changes in the ligand binding space.
For instance, if an amino group were interchanged with a hydroxyl group on the other side of the binding site, this would appear to be a significant difference between two proteins being compared. However, the interchange of the two groups could create a substantially similar environment in the ligand binding space: the presence of hydrogen-bond donors and acceptors at similar distances and angles. Here we implement a ligand-space chemistry and shape comparison tool, CompSite, and test its ability to explain experimentally observed specificity differences in protein homologs. Our structure-based screening and docking tool SLIDE (30-34), capable of efficiently screening large databases of small organic molecules while accommodating protein and ligand side-chain flexibility during the process, generates knowledge-based templates to represent the protein binding sites. Templates are essentially detailed pharmacophores representing the favored spatial distribution of ligand chemistry from the protein's point of view. The key to our method is the underlying mechanism to find the shared interaction sites in the proteins, achieved by complete-linkage clustering of their SLIDE-generated templates. Complete-linkage clustering provides an objective technique to resolve superimposed interaction template points into the most highly occupied set of sites of fixed radius. This approach has been used earlier to locate consensus water sites in serine proteases (35). The results of the clustering process in this method depend on the clustering threshold chosen and on the accuracy of the superposition method used to bring the templates into the same reference frame. I will elaborate on those aspects in the "Materials and methods" section of this chapter.
Each of the shared interaction sites identified in the clustering process consists of the individual template points, representing one or more of the proteins included as input to the algorithm, that make up the cluster at that site. Post-clustering processing is performed to classify the shared interaction sites as either chemically similar or different sites based on the preferred ligand chemistry represented by the clustered template points. In order to assess the quantitative importance of any differences identified, we scored these sites using DrugScore, a knowledge-based scoring function developed to score the binding geometry of ligands in proteins (36). DrugScore converts structural information obtained from the Protein Data Bank (PDB) into distance-dependent preferences for pairs of interacting protein and ligand atoms. Since our method finds the shared interaction sites where potential ligand atoms can interact with the respective proteins, DrugScore is suited to score those sites and confirm any chemical differences. The chemical difference sites were further validated using LigPlot (37) to compute the interactions at each shared interaction site with each protein. In addition to the chemical difference sites, steric difference sites between two proteins are also identified, independent of the clustering process, by checking for van der Waals overlaps between the template points of one protein and the atoms of the other protein. Steric difference sites are useful in indicating pockets available for ligand binding in one protein relative to another and are automatically validated by checking for van der Waals overlaps.
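The steric difference check described above can be illustrated with a short Python sketch. This is not the CompSite implementation: the function name, the 1.4 Å radius assigned to a template point, and the 0.5 Å overlap tolerance are all assumptions made here for illustration. A template point of protein A is flagged when a ligand atom placed there would overlap an atom of the superimposed protein B, which suggests a pocket available in A but not in B.

```python
from math import dist


def steric_difference_points(template_points, other_atoms,
                             point_radius=1.4, tolerance=0.5):
    """Return template points of protein A that clash with atoms of protein B.

    template_points: list of (x, y, z) tuples from protein A's template.
    other_atoms: list of ((x, y, z), vdw_radius) tuples for protein B.
    point_radius and tolerance are assumed values (in angstroms).
    """
    flagged = []
    for point in template_points:
        for xyz, radius in other_atoms:
            # Overlap: centers closer than the sum of radii minus a tolerance.
            if dist(point, xyz) < point_radius + radius - tolerance:
                flagged.append(point)
                break  # one clash is enough to flag this point
    return flagged


# Hypothetical coordinates: one point clashes with a carbon atom of protein B.
template_a = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
atoms_b = [((0.5, 0.0, 0.0), 1.7)]
clashing = steric_difference_points(template_a, atoms_b)
print(clashing)
```

Both structures are assumed to be already superimposed in a common reference frame before the check is run.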
The two distinct advantages of our fast, modular method are the ease with which it can be integrated into structure-based virtual screening and docking protocols, independent of the availability of known ligands, and its usefulness in guiding lead optimization efforts by suggesting synthetic modifications that could improve the specificity of the lead compounds. We applied this method to two sets of proteins, assembled to address these questions: i) can we explain the known relative binding affinities of a ligand that binds to two different proteins? and ii) can we identify specificity determinants between Brugia AsnRS and a set of other ATP-binding proteins? To address the first question, we assembled four protein pairs taken from the AffinDB database (38). Each homologous protein pair was bound to the same ligand, and both the crystal structure of the complex and the binding affinities were available. Henceforth, we will call this set the AffinDB set. We addressed the second question in the context of Brugia AsnRS and other ATP-binding proteins. To attain ligands specific to AsnRS, it is important to find the chemical and steric difference sites that can be exploited to identify or design ligands that discriminate between AsnRS and other ATP-binding sites. For this, we assembled a set of ATP-binding proteins (the "ATP set") that was further subdivided based on the way they bind to the adenine moiety. Results obtained by applying our method to this set elucidate the significant similarities and differences.

4.3 Materials and methods

4.3.1 Representing the protein binding sites

The first step in our algorithm (Figure 4.1) is to generate the templates to represent the protein binding sites using our SLIDE software (30-32). SLIDE uses distance geometry to screen and dock ligand candidates into the template of the target protein.
A template consists of points identified as the most favorable positions for ligand atoms to form hydrogen bonds or make hydrophobic interactions with the neighboring protein atoms (33). The template points are given chemistry labels as donors, acceptors, doneptors (donor and/or acceptor) or hydrophobic, depending upon the type of interaction that a potential ligand atom at that site would make with the protein. An acceptor template point, for example, is located near a donor protein atom, such as the backbone amide nitrogen, and represents a favorable placement for a ligand acceptor atom at that point. A doneptor (donor/acceptor) point is defined in two cases: when a ligand atom at that site could make favorable hydrogen bonds with separate hydrogen-bond donor and acceptor atoms in the protein, or when it could interact with a group that both donates and accepts hydrogen bonds (e.g., -OH in the side chains of Ser, Thr, or Tyr). Geometrically favored interaction sites for ligand hydrogen-bonding atoms are assigned based on the distance and angle to protein hydrogen-bonding partners. The parameters for optimal hydrogen bonding geometry were taken from the literature (39, 40). Hydrophobic template points are placed near significantly hydrophobic protein surface patches that complement the hydrophobic groups in ligands for a number of 3D protein-ligand complexes (41).
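To make the template-point labels concrete, the following Python sketch represents labeled template points and applies the post-clustering checks summarized in Figure 4.1 to a single cluster. It is only an illustration under stated assumptions, not the CompSite code: the class and function names are invented here, the rule that a doneptor matches a donor or an acceptor is an assumed reading of "chemically similar", and the way the pruning, pairwise and 2/3-majority checks are combined is one plausible interpretation of the flowchart.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class TemplatePoint:
    protein: str   # which template the point came from
    label: str     # "donor", "acceptor", "doneptor" or "hydrophobic"
    xyz: tuple     # (x, y, z) in the common reference frame


def chemically_similar(a, b):
    """Assumed rule: equal labels match; a doneptor matches any polar label."""
    if a == b:
        return True
    return "doneptor" in (a, b) and "hydrophobic" not in (a, b)


def prune(cluster):
    """Keep, per template, only the point closest to the cluster centroid."""
    n = len(cluster)
    centroid = tuple(sum(p.xyz[i] for p in cluster) / n for i in range(3))
    best = {}
    for p in cluster:
        kept = best.get(p.protein)
        if kept is None or dist(p.xyz, centroid) < dist(kept.xyz, centroid):
            best[p.protein] = p
    return list(best.values())


def classify(cluster):
    """Label one cluster as 'singlet', 'difference', 'similar' or 'unclassified'."""
    pts = prune(cluster)
    if len(pts) < 2:
        return "singlet"
    if len(pts) == 2 and not chemically_similar(pts[0].label, pts[1].label):
        return "difference"
    # Size of the largest group of templates compatible with one chemistry.
    support = max(sum(chemically_similar(p.label, q.label) for q in pts)
                  for p in pts)
    return "similar" if 3 * support >= 2 * len(pts) else "unclassified"


# Hypothetical clusters from two superimposed templates, A and B.
cluster_1 = [TemplatePoint("A", "acceptor", (0.0, 0.0, 0.0)),
             TemplatePoint("B", "acceptor", (1.1, 0.0, 0.0))]
cluster_2 = [TemplatePoint("A", "donor", (4.0, 0.0, 0.0)),
             TemplatePoint("B", "hydrophobic", (4.0, 1.2, 0.0))]
print(classify(cluster_1), classify(cluster_2))
```

Keeping the similarity rule in one small function makes it easy to substitute a different definition of chemical similarity without touching the classification logic.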
Figure 4.1: A flowchart describing the steps for defining chemically similar or different sites. The boxes of the flowchart read as follows. (1) Generate the templates for the binding sites of the proteins. (2) Bring the proteins into the same reference frame by ligand-based or protein backbone-based superposition. (3) Cluster the template points using complete-linkage clustering. The flowchart then splits into post-clustering processing to identify (chemical) difference sites between a pair of proteins and post-clustering processing to identify (chemically) similar sites, with these boxes: prune sites that have more than one point from a given template to keep the one closest to the cluster centroid; if there are any two points in the cluster, one from each template, that are chemically similar, this is not a chemical difference site; is the dominant chemical type the same in at least 2/3 of the templates clustered?; if the cluster passes all the previous conditions, then it is a chemical difference site; if the cluster passes all the previous conditions, then it is a chemically similar site.

A dense template generated by SLIDE finely samples the protein binding site surface and hence leads to a better representation of the chemistry of the protein, which not only improves the identification, docking and scoring of ligands (33), but also allows us to compare different binding sites of similar proteins. The binding site limits (and hence the template volume) are user-defined by creating a 3D box. A known ligand, if available, can serve as a starting point to generate the box; in this case the extreme (x, y, z) coordinates of the ligand atoms become the two opposite corners of the box. For the proteins selected for our analysis, the co-crystallized ligands were used to define the volume of the binding site.

4.3.2 Superposition to bring the templates into the same reference frame

Once the templates representing the binding sites are generated using SLIDE, they have to be transformed into the same reference frame before we cluster them to identify the shared interaction sites.
This can be achieved by superimposing the proteins (from which the templates are derived) using either bound (co-crystallized) ligand-based superposition or protein main-chain least-squares superposition. The choice of the superposition method depends on two things: first, whether the structure of the bound ligand is available at all, and second, the purpose of clustering to compare the binding site templates and the questions being addressed. When no structural information is available for ligands that are known to bind to the proteins being compared, we have to bring them into the same reference frame using main-chain least-squares superposition. On the other hand, when structures of the proteins being compared are available with their bound ligands, we have the option of choosing either of the two superposition methods. If the objective of clustering the templates is to identify specificity determinants in order to optimize a ligand (or a scaffold) that is known to hit two or more proteins, such that it becomes more specific to one protein relative to the other protein(s), then ligand-based superposition should be used. If the objective is to find out what makes two or more proteins that have a degree of sequence and structural similarity bind to different ligands, and whether there is any as yet unknown ligand (or scaffold) that can bind to both or all of them, then protein main-chain least-squares superposition should be used. In our case, protein structures were either bound to the same ligand (AffinDB set) or to ligands that shared a common scaffold (adenine in the ATP set), and hence we used ligand-based superposition before clustering the templates.
To reiterate, the AffinDB set was assembled to address the question of whether our method can explain the known relative binding affinities of a ligand that binds to two different proteins, while the ATP set was assembled to address the question of whether our method can identify specificity determinants between Brugia AsnRS and a set of other ATP-binding proteins. The sensitivity of the clustering to superpositional accuracy will be discussed in the subsection entitled "Clustering sensitivity to superpositional accuracy".

4.3.3 Complete-linkage clustering to identify the shared interaction sites

Complete-linkage clustering was chosen to identify the shared interaction sites of the superimposed templates. It provides an objective technique not only to detect the template points (representing different proteins) that occupy the same site but also to resolve overlapped sites into dense microclusters representing shared interaction sites. Complete-linkage clustering allows us to define the maximum diameter (or threshold) for any cluster, which helps control the separation between the cluster centroids, and it produces the most compact, most densely occupied set of clusters for that diameter. The complete-linkage clustering algorithm has been used in our lab earlier to identify consensus water sites in different serine protease structures (35). The algorithm is explained using a hypothetical case in Figure 4.2, in which 5 template points are clustered. Three of these template points (labeled as 1, 2 and 3 in Figure 4.2) belong to one protein while the other two (labeled as 4 and 5 in Figure 4.2) belong to the other protein. The first step is to compute the distances between all pairs of template points, as shown in Figure 4.2.A. The complete-linkage clustering algorithm begins by placing the two closest points together into a cluster, provided they are separated by a distance lower than the clustering threshold (chosen to be 1.3 Å for our analysis).
In our example, points 1 and 5 form the first cluster as shown in Figure 4.2.B. Each subsequent iteration of the algorithm finds the next closest pair. The distance between a point and a cluster is defined as the maximum distance between that point and all the elements in the cluster. In our example, points 2 and 4, which form cluster II (Figure 4.2.B), are at distances of 1.8 Å and 1.9 Å respectively from point 3. Therefore, the distance between cluster II and point 3 is 1.9 Å (the greater of the two distances), as shown in Figure 4.2.B. This feature of complete-linkage clustering ensures that all the pairwise distances within a given cluster meet the distance threshold, and hence results in the most compact clusters for that threshold.

Figure 4.2: The complete-linkage clustering algorithm is briefly explained using a hypothetical example of 5 interaction sites (template points), of which 1, 2 and 3 (rendered as solid spheres) belong to one protein and 4 and 5 (rendered as spheres with a mesh surface) belong to another protein. They are colored by the type of interaction that a ligand atom can make at each site: red for hydrogen bond acceptor, blue for hydrogen bond donor, and green for hydrophobic. The clustering threshold for this example is 1.3 Å, the same as used in our analysis. The clustering process begins by computing all pairwise distances, as shown in A. The two closest sites are clustered together provided their separation is below the clustering threshold. Hence, sites 1 and 5, separated by 1.1 Å, form the first cluster (enclosed in a white box) as shown in B. Each subsequent iteration of the algorithm finds the next closest pair. Sites 2 and 4, separated by 1.2 Å (below the threshold), happen to be the next closest pair and hence form the second cluster in this example. The distance between a site and a cluster is defined as the maximum distance between that site and all the elements in the cluster.
For example, the distance between cluster II (formed by sites 2 and 4) and site 3 in this example is 1.9 Å, defined by the distance (shown as a dotted line in B) between site 4 and site 3. The iterative process is repeated until no further elements can be clustered without exceeding the threshold distance. Sites that are not clustered because of the clustering threshold are considered to define single-point or singlet clusters, e.g. site 3 forms singlet cluster III in B. Once these clusters are identified, the post-clustering processing identifies chemically similar sites (e.g. cluster I in B is a similar (acceptor) site) and chemical difference sites (e.g. cluster II in B is a difference (polar-phobic) site).

[Figure 4.2 image: clusters labeled I, II and III]

This iterative process is repeated until no further elements can be clustered without exceeding the threshold distance. Any points that are not clustered because of the clustering threshold are considered to define single-point or singlet clusters, e.g. point 3 shown in Figure 4.2.B. A clustering threshold of 1.3 Å was chosen for our analysis. The rationale behind defining a particular clustering threshold is to choose a distance that ensures that only spatially overlapping template points (from two or more proteins) are clustered. The non-overlapping separation between the centers of two atoms with a typical van der Waals radius of 1.4 Å is 2.8 Å. However, when the separation between their centers is 1.3 Å, there is an overlap of 1.5 Å between the two atoms. Of course, for atoms with a van der Waals radius greater than 1.4 Å, the overlap will be even higher. Choosing a clustering threshold of 1.3 Å ensures that the clustered template points, which represent potential ligand atom positions, will be spatially overlapping and, in most cases, prevents two template points from the same protein from being clustered in the same interaction site.
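The complete-linkage procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function name is hypothetical, and the coordinates are invented so that they approximately reproduce the distances quoted for Figure 4.2 (1.1 Å between points 1 and 5, 1.2 Å between points 2 and 4, and 1.8 Å and 1.9 Å from point 3 to points 2 and 4); all other distances are arbitrary.

```python
import math
from itertools import combinations


def complete_linkage(points, threshold=1.3):
    """Cluster labeled 3D points by complete linkage.

    points: dict mapping a label to an (x, y, z) tuple.
    Two clusters are merged only if the LARGEST pairwise distance between
    their members is below `threshold`; the closest such pair merges first.
    Returns a sorted list of clusters, each a sorted list of labels.
    """
    clusters = [[label] for label in points]

    def linkage(a, b):
        # Complete linkage: the maximum distance over all member pairs.
        return max(math.dist(points[p], points[q]) for p in a for q in b)

    while True:
        best = None
        for a, b in combinations(clusters, 2):
            d = linkage(a, b)
            if d < threshold and (best is None or d < best[0]):
                best = (d, a, b)
        if best is None:  # nothing left that can merge within the threshold
            break
        _, a, b = best
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return sorted(sorted(c) for c in clusters)


# Hypothetical coordinates (in Angstroms) mimicking the Figure 4.2 example.
FIG_4_2_POINTS = {
    1: (5.0, 0.0, 0.0),
    2: (0.0, 0.0, 0.0),
    3: (0.446, 1.744, 0.0),
    4: (1.2, 0.0, 0.0),
    5: (6.1, 0.0, 0.0),
}

print(complete_linkage(FIG_4_2_POINTS))  # point 3 remains a singlet
```

With the 1.3 Å threshold, points 1 and 5 merge first, points 2 and 4 merge next, and point 3 stays a singlet because its complete-linkage distance to the cluster of points 2 and 4 is about 1.9 Å.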
Rarely, the fine sampling of hydrogen-bonding points in the templates results in two H-bonding points from the same protein being placed in one cluster. This is explicitly considered in defining chemically similar and different sites.

4.3.4 Post-clustering processing to identify similar and chemical difference sites

After the shared interaction sites are identified by clustering, each one of them is processed computationally to check whether it can be classified as either a similar site or a chemical difference site, based on the template points that occupy the site.

4.3.4.1 Similar sites

The rationale behind identifying the similar sites in the templates representing different proteins is to find the binding site invariants. Each of the individual template points that occupy a shared interaction site denotes a chemical type, based on the preferred ligand chemistry that it represents at that site. For each of the shared interaction sites, the method checks whether the dominant chemical type is shared by at least 2/3 of the templates clustered. If so, the interaction site is classified as a chemically similar site. The idea is to impose a strict filter so that only those interaction sites that have at least 2/3 occupancy, i.e. sites that are occupied by template points representing at least 2/3 of the proteins clustered, and that represent a dominant chemical type are identified as similar sites. Hence, by definition, the method assigns a chemical type to each similar site that it identifies. Any interaction site that has more than one point from a given template is pruned to include only the point closest to the cluster centroid. Pruning is done to avoid any bias while determining the chemical type of the similar site. Our method has been designed to identify similar sites between two or more templates that could belong either to conformers of the same protein or to a set of homologs that can be reasonably superposed.
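The similar-site rule (pruning followed by the 2/3 filter) can be sketched as follows. This is an illustration only: the function name and data layout are hypothetical, and it assumes one reading of the rule, namely that the dominant chemical type must be contributed by at least 2/3 of all templates being clustered, with each chemical type label (A, D, N, C) counted as distinct.

```python
import math
from collections import Counter


def classify_similar_site(site_points, n_templates):
    """Return the dominant chemical type of a similar site, or None.

    site_points: list of (template_id, chem_type, (x, y, z)) in one cluster.
    n_templates: total number of templates that were clustered.
    """
    n = len(site_points)
    centroid = tuple(sum(p[2][i] for p in site_points) / n for i in range(3))

    # Pruning: keep, per template, only the point closest to the centroid.
    closest = {}
    for template_id, chem_type, xyz in site_points:
        d = math.dist(xyz, centroid)
        if template_id not in closest or d < closest[template_id][0]:
            closest[template_id] = (d, chem_type)

    counts = Counter(chem_type for _, chem_type in closest.values())
    dominant, votes = counts.most_common(1)[0]
    # Integer form of votes / n_templates >= 2/3 (avoids float rounding).
    return dominant if 3 * votes >= 2 * n_templates else None


# Hypothetical site: template t2 contributes two points; only the acceptor
# nearer the centroid survives pruning, so 2 of 3 templates agree on "A".
site = [
    ("t1", "A", (0.0, 0.0, 0.0)),
    ("t2", "A", (0.2, 0.0, 0.0)),
    ("t2", "D", (1.0, 0.0, 0.0)),
    ("t3", "C", (0.4, 0.0, 0.0)),
]
print(classify_similar_site(site, n_templates=3))
```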
Templates of proteins that are diverse in both sequence and structure space, but bind to the same ligand (or scaffold), could also be clustered and processed to identify the similar sites that enable them to bind the same ligand. Identifying similar sites in templates representing conformers of the same protein is similar in concept to MUSIC, developed by Carlson and co-workers (42) to identify the binding regions that are conserved in molecular dynamics (MD) simulations of the target protein and use the information for drug design. Recently, Ramensky and co-workers developed a similar method that identifies local binding site similarity based on the analysis of protein environments of ligand fragments, the results of which could be used for ligand optimization (43).

4.3.4.2 Chemical difference sites

Our method has been designed to identify chemical difference sites between two templates only. As the name suggests, the rationale behind identifying chemical difference sites is to find the shared interaction sites that are occupied by template points, representing the two proteins, of opposite chemical type; in other words, to find the specificity determinants for the two proteins. For each of the shared interaction sites, the method checks whether there are any two points, one from each template, clustered in the site that are chemically similar. If so, the interaction site is not a chemical difference site. Hence, algorithmically, unlike the similar sites, the chemical difference sites are identified by elimination.

4.3.5 Relative significance of the chemical difference sites

Our method identifies the chemical difference sites in a binary, qualitative way, i.e., a shared interaction site is either a chemical difference site or not. However, quantifying the degree of the chemical difference is required to assess the relative significance of these sites and their potential contribution to binding specificity.
Since the template points represent the geometrically optimal positions in a protein binding site for a ligand atom to make favorable hydrogen bond and hydrophobic interactions, we chose DrugScore (36), a knowledge-based scoring function, to score the chemical difference sites. DrugScore uses structural information from the PDB to score protein-ligand complexes based on the preferred distances observed between different ligand and protein atom pairs, and hence is suited to scoring our chemical difference sites. Scoring these sites also enables us to filter out any noise that may have been introduced by one or more of the following: the quality of the protein structures, the quality of the superposition, or the chosen clustering threshold. Each chemical difference site is defined by two template points of opposite chemical type, one from each protein. To score a chemical difference site, for example one that has a hydrogen-bond acceptor template point from protein 1 and a hydrogen-bond donor template point from protein 2, the following scheme is implemented using DrugScore (DS):

\Delta DS_{\text{protein 1}} = DS(A) - DS(D) \quad \text{in protein 1} \qquad (4.1)

\Delta DS_{\text{protein 2}} = DS(D) - DS(A) \quad \text{in protein 2} \qquad (4.2)

The difference in DrugScore values, denoted by ΔDS, is computed for both proteins in order to determine the degree of preference. Hence, ΔDS_{protein 1} quantitatively indicates protein 1’s degree of preference for its own acceptor point at that site over protein 2’s donor point that also shares the site. Similarly, ΔDS_{protein 2} quantitatively indicates protein 2’s degree of preference for its own donor point at that site over protein 1’s acceptor point that also shares the site. Since a more negative DrugScore value is more favorable, ΔDS must also be negative for a chemical difference site to qualify as a favorable difference. To be considered a significant chemical difference site, we chose a threshold of -10,000 (arbitrary DrugScore units) for the ΔDS values.
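The elimination rule for chemical difference sites and the ΔDS scheme of Equations 4.1 and 4.2 can be illustrated with a short sketch. Several things here are assumptions rather than details given in the text: the function names are hypothetical, DrugScore values are supplied as plain numbers (no DrugScore interface is implied), an acceptor-and/or-donor point (N) is assumed to be chemically compatible with A and D points, and a ΔDS value is assumed to pass when it is less than or equal to the threshold.

```python
# Assumed compatibility table: identical types match, and N (acceptor and/or
# donor) is treated as compatible with A and with D.
COMPATIBLE = {
    frozenset(pair)
    for pair in (("A", "A"), ("D", "D"), ("C", "C"), ("N", "N"), ("N", "A"), ("N", "D"))
}


def is_chemical_difference_site(types_1, types_2):
    """types_k: chemical types of the points template k places in the site."""
    if not types_1 or not types_2:
        return False  # a difference site needs points from both templates
    # Elimination: any chemically similar cross-template pair disqualifies it.
    return not any(frozenset((a, b)) in COMPATIBLE for a in types_1 for b in types_2)


def delta_ds(ds_by_type, own_type, other_type):
    """Eq. 4.1 / 4.2: DS of the protein's own point minus DS of the other's."""
    return ds_by_type[own_type] - ds_by_type[other_type]


def significant_preferences(ds_protein_1, ds_protein_2, type_1, type_2, threshold=-1.0e4):
    """Return the proteins whose delta-DS passes the significance threshold."""
    d1 = delta_ds(ds_protein_1, type_1, type_2)  # Eq. 4.1
    d2 = delta_ds(ds_protein_2, type_2, type_1)  # Eq. 4.2
    scores = (("protein 1", d1), ("protein 2", d2))
    return {name: d for name, d in scores if d <= threshold}


# Hypothetical DrugScore values for an acceptor/donor difference site.
ds_1 = {"A": -25000.0, "D": -12000.0}  # probes scored in protein 1
ds_2 = {"A": -4000.0, "D": -9000.0}    # probes scored in protein 2
print(is_chemical_difference_site(["A"], ["D"]))
print(significant_preferences(ds_1, ds_2, "A", "D"))
```

In this invented example, only protein 1 shows a significant preference (ΔDS = -13,000), while the ΔDS of -5,000 for protein 2 does not pass the threshold.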
This threshold, apart from being the average of all the ΔDS values computed in our analysis, was successful not only in eliminating more than half of the false positives (see Figure 4.3 and the subsection entitled “Clustering sensitivity to superpositional accuracy”) but also in detecting the most significant chemical difference sites that explain the observed differences in binding affinities for the proteins of the AffinDB set. The degree of preference of the chemical difference sites was further validated using LigPlot (37). LigPlot computes whether the template points at any chemical difference site can make hydrogen-bonding or hydrophobic interactions with the neighboring atoms in either of the proteins. In any chemical difference site, it would be expected that a template point can make interactions with the neighboring atoms of its own protein and none with those of the other protein.

4.3.6 Clustering sensitivity to superpositional accuracy

The results of clustering and the subsequent processing to identify the similar and chemical difference sites depend on the superposition performed to bring the templates into the same reference frame. It is, of course, desirable to have a robust clustering method in which the results do not vary much with small, reasonable shifts in the template positions introduced by alternative superposition methods. To assess the sensitivity of our clustering method to superpositional accuracy, the Brugia AsnRS template and a copy of it were used as sample templates to be clustered, at a clustering threshold of 1.3 Å. The template copy was shifted from 0 to 2.0 Å in one direction, in steps of 0.25 Å, and clustered with the original template after every shift. The same steps were repeated in an orthogonal direction. The rationale behind shifting the template copy was to mimic the shifts that are introduced by using alternative superposition methods.
Since we are clustering two templates that have the same spatial distribution of chemistry, albeit with a shift in one of them, we would expect that the greater the shift, the lower the number of similar sites and the higher the number of chemical difference sites. Indeed, as the plot in Figure 4.3 indicates, 90% of true chemical similarities or differences are preserved when the proteins are shifted by 1.25 Å or less. When the chemical difference sites are scored and only those that pass the threshold difference in DrugScore values (-1.0 × 10^4 in arbitrary units) are included, there is almost a two-fold reduction in their number.

[Figure 4.3 plot: similar sites and chemical difference sites (all, and DrugScore selected) identified for shifts in the X and Z directions, plotted against the RMSD of the shifted templates in Ångstroms (0.00 to 2.00)]

Figure 4.3: Clustering sensitivity to superpositional accuracy was assessed and the results are shown in this plot. A copy of the AsnRS template was shifted (in steps of 0.25 Å) in two orthogonal directions (X and Z), and each shifted copy was clustered (clustering threshold of 1.3 Å) with its original and processed to identify chemically similar sites and chemical difference sites between the two. The RMSD shift shown here mimics the RMSD shift obtained as a result of the superposition (ligand-based or protein backbone-based) that is required to bring the templates into the same reference frame before clustering them to identify the shared sites (step 2 in Figure 4.1). Since we are clustering the original AsnRS template with its shifted copy, we would expect the number of similar sites to gradually decrease with increasing RMSD and the number of chemical difference sites to gradually increase. The chemical difference sites become less influenced by superpositional error when the chemical differences are quantified using DrugScore. Only the significant differences that pass the threshold difference in DrugScore values (-1.0 × 10^4 in arbitrary units) were selected. At the clustering threshold of 1.3 Å, 80-90% of the shared interaction sites are identified as similar sites while only 0-10% of them are identified as significant chemical difference sites, when the superpositional error has an RMSD of 1.25 Å or less.

4.3.7 Steric difference sites

Like the chemical difference sites, the steric difference sites are also pairwise in nature and are identified for two templates. However, unlike the chemical difference sites, the steric difference sites are identified by checking for van der Waals overlaps between the template and the other protein structure, without performing clustering. These sites identify volumes that are sterically accessible in one protein relative to the other. Let us consider two templates, template 1 and template 2, that represent the binding sites of protein 1 and protein 2. For each interaction site in template 1, the closest atom in protein 2 is identified. A point in template 1 is identified as a steric difference site if it has a van der Waals overlap with its nearest-neighbor atom in protein 2. The van der Waals radius of each template point was defined to be 1.4 Å, which tends to be on the conservative (small) side; the effective van der Waals radii of organic atoms, including a correction for the bound hydrogens, range from 1.4 to 2.0 Å. This ensures that the van der Waals overlaps detected with this radius do not result in overprediction of steric differences.
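The steric difference check described above can be sketched as follows. This is a minimal illustration under assumptions: the function name and data layout are hypothetical, an overlap is assumed to mean that the center-to-center distance is smaller than the sum of the two van der Waals radii, and the atomic radii in the example are invented.

```python
import math


def steric_difference_sites(template_points, protein_atoms, point_radius=1.4):
    """Return indices of template-1 points that overlap protein 2.

    template_points: list of (x, y, z) interaction sites from template 1.
    protein_atoms:   list of ((x, y, z), vdw_radius) for atoms of protein 2.
    A point is flagged when it overlaps its nearest-neighbor atom, i.e. the
    distance between centers is below the sum of the two radii (assumed rule).
    """
    flagged = []
    for index, point in enumerate(template_points):
        # Nearest atom of protein 2 by center-to-center distance.
        atom_xyz, atom_radius = min(
            protein_atoms, key=lambda atom: math.dist(point, atom[0])
        )
        if math.dist(point, atom_xyz) < point_radius + atom_radius:
            flagged.append(index)
    return flagged


# Hypothetical data: the first point sits 2.0 A from an atom of radius 1.7 A
# (overlap, since 2.0 < 1.4 + 1.7); the second point is far from every atom.
points = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
atoms = [((2.0, 0.0, 0.0), 1.7), ((20.0, 0.0, 0.0), 1.7)]
print(steric_difference_sites(points, atoms))
```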
4.3.8 Datasets

We assembled two sets of proteins on which to apply our method and check whether the results can address our questions. As introduced earlier, the two sets are referred to as the AffinDB set and the ATP set in our analysis, and they are detailed in the following subsections.

4.3.8.1 AffinDB set

This set was assembled to address the question of whether the results from our method can explain the known relative binding affinities of a ligand that binds to two different proteins. The AffinDB database (38) is a freely accessible repository of affinity data for structurally resolved protein-ligand complexes from the PDB. We selected 4 protein pairs (Table 4.1) from this database that met our selection criteria: i) two different proteins (or variants) are bound to the same ligand, ii) the difference in binding affinity is at least 2-fold, iii) the resolution of the crystal structure complex is 2.5 Å or better, and iv) there is no major change in the ligand conformation between the two structures. The difference in binding affinity in 3 out of the 4 pairs turns out to be almost 2 orders of magnitude. All the protein pairs happen to belong to the serine protease family. The individual protein pairs were brought into the same reference frame using (bound) ligand-based superposition. The database is regularly updated, and we acknowledge that more protein pairs that meet our criteria could have been added at the time of preparing this manuscript.

4.3.8.2 ATP set

AsnRS, a class IIb aminoacyl-tRNA synthetase (AARS), is responsible for the specific aminoacylation of tRNA(Asn). The two-step catalytic reaction involves the ATP-based activation of the cognate amino acid asparagine and the transfer of the activated asparagine to the 3’-end of tRNA(Asn). We earlier applied SLIDE to discover new inhibitors of Brugia AsnRS by targeting a 1.9 Å resolution closed form of the enzyme in complex with ASNAMS (17).
The catalytic binding site of AsnRS can be divided into two pockets: the adenosine pocket and the asparagine pocket. The asparagine pocket in AsnRS is very specific for the amino acid, as expected for an AARS. In fact, collaborator Michael Kron’s screen of 11,000 amine compounds in the collection of Discovery Technologies indicated that none would bind to the asparagine or other sites of AsnRS. However, SLIDE-predicted binding modes of the new Brugia AsnRS inhibitors, supported by experimental assays and flexibility modeling studies, indicate that the adenosine pocket can bind to inhibitor scaffolds that mimic the interactions of adenine, in full or in part. In fact, variolin B, one of the SLIDE-discovered Brugia AsnRS inhibitors, has recently been reported to bind also to a human cyclin-dependent kinase (18), which could explain the compound’s cytotoxicity in human cell lines as well. Hence, motivated by the desire to optimize the AsnRS specificity of variolin B and other inhibitors binding to the ATP site, CompSite was tested for its ability to identify binding site specificity determinants that set AsnRS apart from other ATP-binding proteins.

[Table 4.1: protein pairs of the AffinDB set]
The ATP-binding proteins were broadly divided into two categories based on the way they bind to the adenine moiety: with the exocyclic amine N6 buried in the binding site (N6-in) or exposed to the solvent (N6-out). Chene reported, in the context of ATPases, that the N6-in orientation, which allows for maximum hydrogen-bonding contacts between the purine ring and the active site, is found in the majority of ATPases reviewed (44). AsnRS falls in the N6-out category while protein kinases belong to the N6-in category. To expand our dataset beyond AsnRS and protein kinases, we also included diverse non-AsnRS N6-out and non-protein-kinase N6-in proteins in our study. Dividing the ATP-binding proteins in this manner provides us with a convenient tool to validate the results of CompSite, especially in terms of identifying steric differences, and also allows identification of chemical specificity determinants in AsnRS relative to a variety of ATP-binding proteins. The 4 subsets of the ATP set are detailed in the following paragraphs and tabulated in Table 4.2. All the selected proteins are bound to ATP or its analogs and have a resolution of 2.5 Å or better. The proteins were brought into the same reference frame by ligand-based superposition, using the adenine moiety.

A set of 9 PDB protein kinase structures, all of them mammalian, was assembled for the analysis. The selected kinase structures cover most of the different kinase families defined (11) and analyzed (45) in the literature.

A set of 5 PDB structures of non-protein-kinase N6-in proteins, three of which are from a mammalian source and two from a bacterial source, was assembled. It is not possible to scan all the available structures of ATP-binding proteins and pick the N6-in ones. Hence, the structures of ATP-binding protein families identified and analyzed earlier by Kuttner and co-workers (46) were surveyed to assemble our N6-in subset.
A set of 5 PDB crystal structure complexes of non-AsnRS, N6-out proteins was assembled, two of which are from mammalian sources, two from bacterial sources and one from yeast. For assembling the N6-out subset too, the structures of ATP-binding protein families identified and analyzed earlier by Kuttner and co-workers (46) were surveyed.

Apart from the crystal structure bound to ASNAMS, another crystal structure, with one monomer bound to ATP and the other bound to L-aspartate-β-hydroxamate adenylate (LBHAMP), was also used. In summary, 3 different crystallographic conformations of Brugia AsnRS were used in our analysis. This allowed the three crystallographic conformations of AsnRS, provided by collaborator Stephen Cusack (EMBL Grenoble), to be analyzed by CompSite to identify ligand interaction sites shared between them.

Table 4.2: Proteins of the ATP set used in our analysis

Protein kinase subset
PDB    Protein                                Source                       Resolution (Å)   Ligand (a)
1ATP   cAMP-dependent Protein Kinase A        Mus musculus                 2.20             ATP
1CM8   MAP Kinase p38-gamma                   Homo sapiens                 2.40             ANP (b)
1HCK   Cyclin-dependent Kinase (CDK) 2        Homo sapiens                 1.90             ATP
1IR3   Insulin Receptor (Tyr) Kinase          Homo sapiens                 1.90             ANP
1JNK   c-Jun N-terminal Kinase                Homo sapiens                 2.30             ANP
1JPA   EphB2 (Ephrin B2 receptor) Kinase      Mus musculus                 1.91             ANP
1PHK   Phosphorylase Kinase                   Oryctolagus cuniculus        2.20             ATP
1QPC   Lymphocyte-Specific Kinase             Homo sapiens                 1.60             ANP
2SRC   Protein Kinase c-Src                   Homo sapiens                 1.50             ANP

N6-in subset
PDB    Protein                                Source                       Resolution (Å)   Ligand
1A82   Dethiobiotin Synthetase                Escherichia coli             1.80             ATP
1AUX   Synapsin IA                            Bos taurus                   2.30             SAP (c)
1BG2   Ubiquitous Kinesin Motor Domain        Homo sapiens                 1.80             ADP
1BYQ   Heat Shock Protein (HSP) 90            Homo sapiens                 1.50             ATP
1L4U   Shikimate Kinase                       Mycobacterium tuberculosis   1.80             ADP

N6-out subset
PDB    Protein                                Source                       Resolution (Å)   Ligand
1BX4   Adenosine Kinase                       Homo sapiens                 1.50             ADN (d)
1HP1   5'-Nucleotidase (open form)            Escherichia coli             1.70             ATP
1QSY   DNA Polymerase I                       Thermus aquaticus            2.30             DAD (e)
1S3X   Heat Shock Protein (HSP) 70            Homo sapiens                 1.84             ADP
1YAG   Actin                                  Saccharomyces cerevisiae     1.90             ATP

(a) The 3-letter code taken from the PDB file.
(b) Phosphoaminophosphonic acid-adenylate ester.
(c) ADP-monothiophosphate.
(d) Adenosine.
(e) 2’,3’-dideoxy ATP.

4.4 Results

4.4.1 Chemical difference sites: Explaining the observed experimental selectivity of ligands bound to proteins of the AffinDB set

Significant chemical difference sites identified by CompSite in the overlapping volumes of binding sites can explain the selectivity of the same ligand bound to two different protein structures of the AffinDB set (Table 4.1). The degree of preference for one template point (or interaction) over the other in each chemical difference site is quantified by the difference in DrugScore (ΔDS) value, with a more negative value being more favorable. To further validate that the template points are well placed for making favorable interactions with the neighboring protein atoms, we used LigPlot to detect the interactions.
The sites detailed in Table 4.3 can be located, with reference to the bound ligand, in the labeled panels of Figure 4.4. For each of the 4 protein pairs of the AffinDB set, the chemical difference sites identified by our method and their agreement with experimental results are explained in the following paragraphs.

Thrombin (PDB: 1C5N) and urokinase type plasminogen activator (UPA, PDB: 1C5X) are both bound to the ligand 4-iodobenzo[b]thiophene-2-carboxamidine (PDB 3-letter code ESI, Figure 4.4.A). The ligand ESI inhibits UPA better than thrombin, with almost a 100-fold difference in the inhibition constants (Ki) (Table 4.1).

Table 4.3: Significant chemical difference sites identified between protein pairs of the AffinDB set that can explain the known relative binding affinities of their bound ligands (Table 4.1).

Protein pair (a): 1C5N and 1C5X
Site (b)   Protein   Interaction preferred (c)   Degree of preference (d), ΔDS (× 10^4)   Interactions detected
26         1C5X      C over N                    -1.05                                     C interacts with Val 213: CG1 and Trp 215: CA. N has no interactions.

Protein pair: 1C5O and 1C5Z
Site       Protein   Interaction preferred       Degree of preference, ΔDS (× 10^4)       Interactions detected
16         1C5Z      C over N                    4.75                                      C interacts with Cys 191: C. N has no interactions.
21         1C5O      C over N                    -1.22                                     C interacts with Trp 215: C. N has no interactions.

Protein pair: 1F0U and 1EZQ
Site       Protein   Interaction preferred       Degree of preference, ΔDS (× 10^4)       Interactions detected
116        1EZQ      C over D                    -0.74                                     C interacts with ring atoms of Phe 174. D has no interactions.
127        1EZQ      C over N                    -1.04                                     C interacts with Val 213: CG1 and Trp 215: C, CA. N has no interactions.

Protein pair: 1V2J and 1V2L
Site       Protein   Interaction preferred       Degree of preference, ΔDS (× 10^4)       Interactions detected
21         1V2L      C over N                    -1.50                                     C interacts with Val 213: CG1. N has no interactions.

(a) Only the PDB codes of the protein pairs are mentioned here. For the protein names and other details, please refer to Table 4.1.
(b) The shared interaction sites are numbered by the clustering algorithm. To locate these sites, please see Figure 4.4.
(c) The template points are labeled by their interaction type: A for hydrogen-bond acceptor, D for hydrogen-bond donor, N for hydrogen-bond acceptor and/or donor, and C for hydrophobic points.
(d) Degree of preference is quantitatively indicated by the difference in DrugScore (ΔDS) value (in arbitrary units).

Figure 4.4: The significant chemical difference sites, tabulated in Table 4.3, for the 4 pairs of proteins in the AffinDB set (Table 4.1) are shown in this figure. For each pair, these sites can explain the observed difference in binding affinities for the ligand that both proteins bind. The superimposed bound conformations of each ligand, rendered as atom-colored tubes (carbon, green or orange; oxygen, red; nitrogen, blue; and sulfur, yellow), are shown in all panels for reference. The template points in these chemical difference sites, rendered as spheres (solid or mesh surface), are colored by interaction type: red for hydrogen bond acceptor, blue for hydrogen bond donor, white for hydrogen bond acceptor and/or donor, and green for hydrophobic. The significant chemical difference site between thrombin (PDB: 1C5N) and urokinase type plasminogen activator (UPA, PDB: 1C5X) is shown in A. The 1C5N template point is rendered as a sphere with a mesh surface while the 1C5X template point is rendered as a solid sphere. The bound ligand inhibits 1C5X with a better Ki than 1C5N (Table 4.1). The significant chemical difference sites between thrombin (PDB: 1C5O) and urokinase type plasminogen activator (UPA, PDB: 1C5Z) are shown in B. The 1C5O template points are rendered as spheres with a mesh surface while the 1C5Z template points are rendered as solid spheres. The bound ligand inhibits 1C5Z with a better Ki than 1C5O (Table 4.1). The significant chemical difference sites between trypsin (PDB: 1F0U) and factor Xa (PDB: 1EZQ) are shown in C.
The 1F0U template points are rendered as spheres with a mesh surface while the 1EZQ template points are rendered as solid spheres. The bound ligand inhibits 1EZQ with a better Ki than 1F0U (Table 4.1). The significant chemical difference site between trypsin variant 1 (PDB: 1V2J) and trypsin variant 2 (PDB: 1V2L) is shown in D. The 1V2J template point is rendered as a sphere with a mesh surface while the 1V2L template point is rendered as a solid sphere. The bound ligand inhibits 1V2L with a better Ki than 1V2J (Table 4.1).

[Figure 4.4 image]

The chemical difference site 26 (Table 4.3, Figure 4.4.A) indicates that UPA prefers a hydrophobic ligand atom over a hydrogen bond acceptor and/or donor. A hydrophobic ligand atom at this site can interact with carbon atoms in UPA residues Val 213 and Trp 215, whereas no hydrogen bond interactions could be made. This is well corroborated by the interactions made by the nearby (~1.5 Å) sulfur atom in the ligand with the same residues, Val 213 and Trp 215, in UPA. Thrombin too has the same residues (Val 213 and Trp 215), and they do interact with the sulfur atom in its bound ligand. However, a hydrogen bond acceptor and/or donor ligand atom at this site in thrombin could improve the binding affinity, as it can both donate to the main-chain oxygen atom in Ser 214 and accept from the hydroxyl group in Ser 195. In the crystal structure, these two serine residues make hydrogen bonds with water molecules. The same serine residues, 195 and 214, are also present in UPA but are located at an unfavorable distance to interact with a ligand atom at this site. In the UPA crystal structure, Ser 195 donates to the citrate ion while Ser 214 accepts from a water molecule. Katz and co-workers suggested that the selectivity of the ligand ESI is due to a hydrogen bond between the ligand’s amidine group and the hydroxyl group in Ser 190, located in the S1 pocket of UPA (47).
In the thrombin structure, the same serine residue in the S1 pocket is replaced by Ala 190, and hence thrombin lacks the hydrogen bond with the amidine group of the ligand. The templates representing the two protein structures did place a hydrophobic point in UPA and a hydrogen bond acceptor and/or donor point in thrombin, but the clustering algorithm could not identify them as a shared interaction site because they were separated by 1.53 Å, greater than our clustering threshold of 1.3 Å. In summary, our method identified an additional specificity determinant between the two proteins which could further explain the ligand’s selectivity.

Thrombin (PDB: 1C5O) and urokinase type plasminogen activator (UPA, PDB: 1C5Z) are both bound to the ligand benzamidine (PDB 3-letter code BAM, Figure 4.4.B). The ligand BAM inhibits UPA better than thrombin, with almost a 3-fold difference in Ki (Table 4.1). Two significant chemical difference sites were identified by our method between these two protein structures (Table 4.3, Figure 4.4.B). The chemical difference site 16 indicates that UPA prefers a hydrophobic ligand atom over a hydrogen bond acceptor and/or donor. A hydrophobic ligand atom at this site can interact with UPA residue Cys 191, whereas no hydrogen bond interactions could be made. This is well corroborated by the interactions made by the nearby (~0.6 Å) carbon atom in the ligand with other hydrophobic residues in its proximity. Similarly, the chemical difference site 21 indicates that thrombin prefers a hydrophobic ligand atom over a hydrogen bond acceptor and/or donor. A hydrophobic ligand atom at this site can interact with thrombin residue Trp 215, whereas no hydrogen bond interactions could be made. This site actually detects a known binding site chemistry difference in the S1 pockets of the two structures, where a serine residue in UPA is substituted by an alanine residue in thrombin at position 190 (47).
Trypsin (PDB: 1FOU) and factor Xa (PDB: 1EZQ) are both bound to the ligand 3-[(3'-aminomethyl-biphenyl-4-carbonyl)-amino]-2-(3-carbamimidoyl-benzyl)-butyric acid methyl ester (PDB 3-letter code RPR, Figure 4.4.C). The ligand RPR inhibits factor Xa better than trypsin, with almost an 80-fold difference in Ki (Table 4.1). Maignan and co-workers reported that the only binding site differences in these two serine protease structures are found in their S1 (left side of Figure 4.4.C) and S4 (right side of Figure 4.4.C) pockets (48). In both structures, the S1 pocket is occupied by the benzamidine group of the ligand and the S4 pocket is occupied by the aminomethylbiphenyl group of the ligand. The two chemical difference sites identified by our method are in good agreement with the known differences, although one of them didn’t pass the significance threshold, a difference in DrugScore (ΔDS) value of -1.0 × 10^4 (arbitrary units). The significant chemical difference site 127 in factor Xa, located in the S1 pocket, indicates that a hydrophobic ligand atom is preferred over a hydrogen bond acceptor and/or donor. A hydrophobic ligand atom at this site can interact with carbon atoms of factor Xa residues Val 213 and Trp 215 whereas no hydrogen bond interactions could be made. Apart from the interactions made by nearby (~0.9 Å) carbon ring atoms in the ligand with the same residues, the site also detects the known binding site chemistry difference due to the presence of Ala 190 in the S1 pocket of factor Xa as opposed to Ser 190 in the S1 pocket of trypsin. In fact, a hydrogen bond acceptor and/or donor ligand atom at this site in trypsin can accept from the hydroxyl group of Ser 190 in its S1 pocket. The chemical difference site 116 in factor Xa, located in the S4 pocket, indicates that a hydrophobic ligand atom is preferred over a hydrogen bond donor.
A hydrophobic ligand atom at this site can interact with ring atoms of factor Xa residue Phe 174 whereas no hydrogen bond interactions could be made. This is well corroborated by the ring atoms in the aminomethylbiphenyl group of the ligand RPR that interact with residues Phe 174 and Tyr 99. A hydrogen bond donor ligand atom at this site in trypsin can interact with the side chain oxygen atom (OE1) of Gln 175, which occupies the same space as Phe 174 in factor Xa.

The structures of several recombinant bovine trypsin variants were solved and analyzed by Rauh and co-workers to study the effect of conformational variability on binding affinity (49). For our analysis, we selected two structures that were bound to the same ligand and had the maximum difference in their experimental Ki values. The first trypsin variant (PDB: 1V2J) contained Ser 172, Ser 173, Arg 174 and the Ile 175 insertion, while the second (PDB: 1V2L) contained Glu 97, Tyr 99, Ser 172, Ser 173, Phe 174 and the Ile 175 insertion, as well as Ala 190 and Glu 217 of factor Xa. Both these structures were bound to the ligand benzamidine (PDB 3-letter code BEN, Figure 4.4.D). The ligand BEN inhibits variant 2 better than variant 1, with a 100-fold difference in Ki (Table 4.1). Except for Ser 190 in variant 1 and Ala 190 in variant 2, the differences between the two structures are peripheral to the ligand-binding site. The chemical difference site 21 indicates that variant 2 prefers a hydrophobic ligand atom over a hydrogen bond acceptor and/or donor. A hydrophobic ligand atom at this site can interact with variant 2 residue Val 213 whereas no hydrogen bond interactions could be made. Apart from the interactions made by the closest (~0.2 Å) carbon atom in the ligand with the same residue, this site also detects the actual binding site chemistry difference: the substitution of Ser 190 in variant 1 by Ala 190 in the S1 pocket of variant 2.
In fact, a hydrogen bond acceptor and/or donor ligand atom at this site in variant 1 can accept from the hydroxyl group of Ser 190 in its S1 pocket. Beyond this chemical difference site, it was algorithmically challenging to assess the known contribution of the conformational flexibility of the different insertions, which are considerably distant from the ligand-binding site that our method focused on.

4.4.2 Similar sites identified in the ATP-set proteins

Chemically similar sites identified by our method for proteins of each of the 4 subsets of the ATP set are shown in Figure 4.5. These sites represent the binding site invariants in each of the 4 subsets, as they are identified after passing a stringent filter described in the “Materials & Methods” section of this chapter. The similar sites identified for each of the 4 subsets of the ATP set are explained in the following subsections.

4.4.2.1 AsnRS similar sites

Based on the comparison of available crystal structures, we know the flexible regions of Brugia AsnRS that adopt different conformations when bound to different ligands (17). Similar sites identified for the AsnRS subset, representing the invariant features of the 3 conformers, are shown in Figure 4.5.A. The individual templates representing the 3 AsnRS conformers bound to ASNAMS, LBHAMP and ATP had 148, 142 and 123 points respectively. The adenine moiety of all the three bound ligands was superimposed, as shown in Figure 4.5.A, to bring the templates in the same reference frame. Clustering and further processing of these individual templates led to the identification of 98 chemically similar sites, out of which 77 were hydrogen bonding sites while 21 were hydrophobic sites. Since the three structures were essentially conformers of the same protein, the large number of similar sites identified, covering more than 50% of the points in each individual template, wasn’t unexpected.
But our method was able to sense the conformational variability by not identifying the template points that were found in only one of the three structures clustered, thereby accounting for the change in binding site architecture. The density of hydrogen bonding sites, as expected, was high near the polar atoms of the adenine moiety, the ribose moiety and the asparagine side chain. ATP binds to AsnRS in a bent conformation and our method could identify many similar hydrogen-bond acceptor sites near the phosphate tail. The ASNAMS-bound AsnRS conformation, with its template points occupying 80 of the 98 similar sites, was chosen as the representative structure of this subset for further analysis.

Figure 4.5: The chemically similar sites identified for each of the 4 subsets of ATP-binding proteins in the ATP set (Table 4.2) are shown in this figure. The bound ligands of the clustered structures (templates) in each class are rendered as atom-colored tubes in all panels. The similar sites are rendered as solid spheres, colored by interaction type (red for hydrogen bond acceptor, blue for hydrogen bond donor, white for hydrogen bond acceptor and/or donor, and green for hydrophobic) in all the panels. The similar sites for: 3 conformers of AsnRS are shown in A, 9 protein kinase structures in B, 5 non-kinase N6-in structures in C, and 5 N6-out structures in D.

4.4.2.2 Protein Kinase similar sites

Similar sites identified by our method for the protein kinase subset (Table 4.2), representing the most invariant features of the 9 structures selected for our analysis, are shown in Figure 4.5.B. The number of interaction sites in the individual templates representing the 9 protein kinase structures bound to ATP or its analogs ranged from 59 to 183. We used the co-crystallized ligand to define the volume of the binding site while generating the template for each of the selected proteins.
In the ephrin B2 receptor (EPHB2) kinase (PDB: 1JPA) crystal structure complex, only the adenine ring of the bound phosphoaminophosphonic acid-adenylate ester (ANP) was seen in the density. As a result, the template representing the binding site of this kinase structure had the least number of points. The template representing protein kinase c-src (PDB: 2SRC) had the highest number of interaction sites. The adenine moiety of all the bound ligands was superimposed, as shown in Figure 4.5.B, to bring the templates in the same reference frame. Clustering and further processing of these individual templates led to the identification of 17 chemically similar sites, out of which 10 were hydrogen bonding sites while 7 were hydrophobic sites. The selected set of protein kinases are structurally quite diverse and also differ in the way they bind to the ATP analogs, as is evident in the different puckered conformations of the ribose ring that send the phosphate tail in different directions (Figure 4.5.B). However, the interactions made by the polar nitrogen atoms, especially N1 (a hydrogen bond acceptor) and N6 (a hydrogen bond donor), of the adenine moiety in bound ligands are well represented by similar sites identified by our method and are in agreement with previous surveys on ATP-binding proteins (50, 51). There were hardly any hydrogen-bonding similar sites identified near the phosphate tails, indicating the difference in their bound conformations observed across all the 9 structures included in our analysis. For further analysis, we chose the phosphorylase kinase structure (PDB: 1PHK), with its template points occupying 16 of the 17 similar sites, as the representative structure for the protein kinase subset.

4.4.2.3 N6-in similar sites

Similar sites identified by our method for proteins of the N6-in subset (Table 4.2), representing the most invariant features of the 5 structures selected for our analysis, are shown in Figure 4.5.C.
The number of points in the individual templates representing the 5 N6-in structures bound to ATP or its analogs ranged from 125 to 159. The adenine moiety of all the bound ligands was superimposed, as shown in Figure 4.5.C, to bring the templates in the same reference frame. Clustering and further processing of these individual templates led to the identification of 11 chemically similar sites, out of which only 3 were hydrogen-bonding sites while 8 were hydrophobic sites. The polar similar sites are located near the α-phosphate moiety, the 5’-OH of the ribose moiety, and the N1 and N6 atoms of the adenine moiety of the superimposed ligands (Figure 4.5.C). Although the 5 selected structures are structurally quite diverse, we were still expecting to see more similar sites. The low number of hydrogen bonding interaction sites is surprising and calls for further investigation. There are three possible explanations for this low number. The first one pertains to the post-clustering processing of the shared interaction sites to identify chemically similar sites. For a shared interaction site to be identified as a similar site, a strict filter of 2/3 occupancy (in this case, points from at least 4 of the 5 templates) is imposed. There are shared sites which have an occupancy of 0.6, that is, points from 3 out of 5 templates, but they are left out because of the strict filter. The second one pertains to the clustering process itself, which identifies the shared interaction sites. The lack of structural and sequence similarity between these proteins could result in the geometrically favored positions for ligand atoms to make hydrogen bonds being farther apart than the clustering threshold of 1.3 Å, and hence not being clustered.
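The occupancy filter described in the first explanation can be written as a short sketch. The function names and template labels below are hypothetical, and the filter is assumed to accept a site when the fraction of contributing templates is at least 2/3; exact fractions are used to avoid floating-point rounding at the boundary.

```python
from fractions import Fraction

# Assumed convention: a shared interaction site is kept as a "similar site"
# when the fraction of templates contributing a point is >= 2/3.
MIN_OCCUPANCY = Fraction(2, 3)

def occupancy(template_ids, n_templates):
    """Fraction of templates that contribute at least one point to a cluster."""
    return Fraction(len(set(template_ids)), n_templates)

def is_similar_site(template_ids, n_templates, min_occupancy=MIN_OCCUPANCY):
    """Apply the occupancy filter to one shared interaction site."""
    return occupancy(template_ids, n_templates) >= min_occupancy

# Hypothetical clusters: each list names the templates whose points were clustered.
print(is_similar_site(["t1", "t2", "t3", "t4"], 5))  # True  (4/5 >= 2/3)
print(is_similar_site(["t1", "t2", "t3"], 5))        # False (3/5 <  2/3)
```

With 5 templates this reproduces the 4-of-5 requirement stated above, and a site with an occupancy of 0.6 is rejected.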
The third reason could be that the hydrogen bonding potential of the polar atoms in the adenine moiety of the bound ligands may actually be satisfied by water molecules, and hence we don’t see many hydrogen-bonding points from the individual templates being clustered. For further analysis, we chose the dethiobiotin synthetase structure (PDB: 1A82), with its template points occupying 10 of the 11 similar sites, as the representative structure for the N6-in subset.

4.4.2.4 N6-out similar sites

Similar sites identified by our method for proteins of the N6-out subset (Table 4.2), representing the most invariant features of the 5 structures selected for our analysis, are shown in Figure 4.5.D. The number of points in the individual templates representing the 5 N6-out structures bound to ATP or its analogs ranged from 85 to 162. The template representing the adenosine kinase structure (PDB code 1BX4) had the least number of interaction sites, since the protein was bound to adenosine and hence the volume defined by it during template generation was relatively smaller. The adenine moiety of all the bound ligands was superimposed, as shown in Figure 4.5.D, to bring the templates in the same reference frame. Clustering and further processing of these individual templates led to the identification of 9 chemically similar sites, out of which only 2 were hydrogen-bonding sites while 7 were hydrophobic sites. The two polar similar sites are located near the C2 and N3 atoms of the adenine moiety and the 3’-OH of the ribose moiety of the superimposed ligands (Figure 4.5.D). The low number of similar hydrogen-bond sites could be because of the same reasons discussed earlier in the “N6-in similar sites” section. A stronger reason could be that the hydrogen bonding potential of most of the polar atoms in the adenine moiety of the bound ligands may actually be satisfied by water molecules or other hetero groups (e.g.
DNA in the DNA polymerase I structure (PDB: 1QSY)), and hence we don’t see many polar interaction sites in the individual templates being clustered. For further analysis, we chose the actin structure (PDB: 1YAG), with its template points occupying 8 of the 9 similar sites, as the representative structure for the N6-out subset.

4.4.3 Chemical difference sites identified in protein pairs of the ATP set

We used only the representative structures of the ATP subsets, chosen based on the analysis of their similar sites, to identify the chemical difference sites between each of them and AsnRS. Significant chemical difference sites identified by our method between AsnRS and protein structures representing each of the other 3 subsets of the ATP set are tabulated in Table 4.4 and shown in Figure 4.6. Since our objective is to identify specificity determinants in Brugia AsnRS relative to other ATP-binding proteins, only the sites with AsnRS-preferred interactions are discussed here. As for the AffinDB set, the degree of preference is quantified by the difference in DrugScore (ΔDS) value and the template points are validated by LigPlot to detect the interactions. The sites detailed in Table 4.4 can be located, with reference to the bound ligands, in the labeled panels of Figure 4.6. Based on their location in the AsnRS binding site, we have divided the chemical difference sites into three categories described in the following subsections.

4.4.3.1 Chemical difference sites in adenine pocket

The adenine pocket seems to be the most conserved in the different binding sites of ATP-set proteins in our analysis.
The hydrogen-bond interactions of the polar nitrogen atoms and the hydrophobic interactions of the aromatic carbon atoms of adenine have been identified as specificity determinants to distinguish adenine from other nucleotides (50, 51). Even the similar sites identified by our method for these proteins (Figure 4.5) indicate high similarity in the adenine pocket. Therefore, it wasn’t surprising to find that there were only 2 significant chemical difference sites, with AsnRS-preferred interactions, identified by our method: site 53 between AsnRS and dethiobiotin synthetase (Figure 4.6.B) and site 54 between AsnRS and actin (Figure 4.6.C).

Table 4.4: Significant chemical difference sites identified between Brugia AsnRS and representative structures of other ATP-binding proteins (Table 4.2).

Protein pair [a]: AsnRS and 1PHK (a representative structure of the protein kinase subset)
Site [b] | Protein | Interaction preferred [c] | Degree of preference [d], ΔDS (× 10^4) | Interactions detected
11 | AsnRS | A over D | -1.03 | A accepts from Gly 408: N. D donates to Ile 361: O.
71 | AsnRS | C over D | -1.63 | C interacts with atoms in Ile 361 and Val 362. D has no interactions.

Protein pair: AsnRS and 1A82 (a representative structure of the N6-in subset)
Site | Protein | Interaction preferred | Degree of preference, ΔDS (× 10^4) | Interactions detected
3 | AsnRS | A over D | -1.72 | A accepts from Arg 411: NH1. D donates to Glu 360: O.
29 | AsnRS | A over C | -1.58 | A accepts from Tyr 223: OH and/or Arg 210: NH2. C interacts with His 225: CE1 and Arg 210: CZ.
53 | AsnRS | C over A | -1.67 | C interacts with Tyr 223: CE2. A has no interactions.
64 | AsnRS | C over A | -1.39 | C interacts with Gly 363: CA. A has no interactions.

Protein pair: AsnRS and 1YAG (a representative structure of the N6-out subset)
Site | Protein | Interaction preferred | Degree of preference, ΔDS (× 10^4) | Interactions detected
7 | AsnRS | A over C | -1.58 | A accepts from Tyr 223: OH and/or Arg 210: NH2. C interacts with His 225: CE1 and Arg 210: CZ.
18 | AsnRS | A over D | -1.72 | A accepts from Arg 411: NH1. D donates to Glu 360: O.
54 | AsnRS | C over A | -1.44 | C interacts with His 219: CD2. A has no interactions.

[a] Only the PDB codes of the ATP-set proteins are mentioned here. For the protein names and other details, please refer to Table 4.2.
[b] The shared interaction sites are numbered by the clustering algorithm. To locate these sites please see Figure 4.6.
[c] The template points are labeled by their interaction type: A for hydrogen-bond acceptor, D for hydrogen-bond donor, N for hydrogen-bond acceptor and/or donor, and C for hydrophobic points.
[d] Degree of preference is quantitatively indicated by the difference in DrugScore (ΔDS) value (in arbitrary units).

Figure 4.6: The significant chemical difference sites (Table 4.4) between Brugia AsnRS and representative structures of each of the other three classes of ATP-binding proteins in the ATP set (Table 4.2) are shown: (A) phosphorylase kinase (PDB: 1PHK), a representative protein kinase structure bound to ATP, (B) dethiobiotin synthetase (PDB: 1A82), a representative non-kinase, N6-in structure bound to ATP, and (C) actin (PDB: 1YAG), a representative N6-out structure bound to ATP. Most of the difference sites are located in the ribose pocket of the AsnRS binding site. The bound ligands (superimposed by their adenine moiety), rendered as atom-colored tubes, are shown for reference in all the panels. ASNAMS bound to AsnRS can be distinguished from other ligands by its orange carbon atoms. The template points in the chemical difference sites are rendered as spheres (solid surface for AsnRS and mesh surface for the other) and colored by interaction type: red for hydrogen bond acceptor, blue for hydrogen bond donor, white for hydrogen bond acceptor and/or donor, and green for hydrophobic.

The chemical difference site 53 (Table 4.4, Figure 4.6.B) indicates that AsnRS prefers a hydrophobic ligand atom over a hydrogen bond acceptor.
A hydrophobic ligand atom at this site can interact with the aromatic side chain of AsnRS residue Tyr 223 whereas no hydrogen bond interactions could be made. On the other hand, the hydrogen bond acceptor at this site in dethiobiotin synthetase can accept from the side-chain amide nitrogen atom of 1A82 residue Asn 175, mimicking the interactions of the nearby ligand nitrogen atom N7 of the adenine moiety. The N7 atom in the bound ligand (ASNAMS) of AsnRS does not interact with either the neighboring protein atoms or any crystallographic water molecules. Therefore, a hydrophobic ligand atom at this site in the AsnRS binding site should improve the affinity and selectivity for AsnRS over dethiobiotin synthetase, a non-protein-kinase N6-in representative structure. The other significant chemical difference site in the adenine pocket, site 54 between AsnRS and actin (Table 4.4, Figure 4.6.C), indicates that AsnRS prefers a hydrophobic ligand atom over a hydrogen bond acceptor. A hydrophobic ligand atom at this site can interact with a side-chain carbon atom of AsnRS residue His 219 (CD2) whereas no hydrogen bond interactions could be made. On the other hand, the hydrogen bond acceptor at this site in actin can accept from the side-chain terminal amine of 1YAG residue Lys 336.

4.4.3.2 Chemical difference sites in ribose pocket

The different puckers of the ribose ring, observed in structures of bound ligands, have been used earlier in designing selective ligands (52, 53). Our method identified 5 unique significant chemical difference sites in the ribose pocket between AsnRS and the 3 other ATP-binding proteins. Chemical difference sites 3 and 29 (Figure 4.6.B), between AsnRS and dethiobiotin synthetase, are almost identical to sites 18 and 7 (Figure 4.6.C) respectively, between AsnRS and actin.
Two of the chemical difference sites near the 2’-OH of the ribose moiety in ASNAMS, site 11 (Figure 4.6.A) and the almost identical sites 3 (Figure 4.6.B) and 18 (Figure 4.6.C), prefer a hydrogen-bond acceptor ligand atom in AsnRS over the hydrogen-bond donors preferred in the other three proteins. An acceptor atom at site 11 can accept from the main-chain nitrogen atom of AsnRS residue Gly 408 whereas a donor atom can donate to the main-chain oxygen atom of Ile 361 (Table 4.4). On the other hand, a hydrogen-bond donor at site 11 (Figure 4.6.A) can donate to the main-chain oxygen atom of Leu 25 and/or the side-chain carboxyl oxygen atom of Glu 110 in phosphorylase kinase. An acceptor atom at site 3 in Figure 4.6.B, identical to site 18 in Figure 4.6.C, can accept from the side-chain terminal amine of AsnRS residue Arg 411 whereas a donor atom can donate to the main-chain oxygen atom of Glu 360 (Table 4.4). On the other hand, a hydrogen-bond donor at site 3 in dethiobiotin synthetase can donate to the side-chain carboxyl oxygen atom of Glu 211, and at site 18 in actin can donate to the side-chain carboxyl oxygen atom of Glu 214. While hydrogen-bond donors can interact with AsnRS residues, the interactions made by an acceptor atom seem to be much stronger, as indicated by the difference in DrugScore values (Table 4.4). It is also significant to note that these sites, located near the 2’-OH of the ribose moiety in ASNAMS, are exposed to the solvent in both phosphorylase kinase and dethiobiotin synthetase while they are buried in the binding site in AsnRS. In fact, the 2’-OH groups in the bound ligands of phosphorylase kinase and dethiobiotin synthetase interact with the crystallographic water molecules, whereas in ASNAMS the 2’-OH donates to the main-chain oxygen atom of AsnRS residue Ile 361 and accepts from the main-chain nitrogen atom of Gly 408.
The significant chemical difference site 29 in Figure 4.6.B, almost identical to site 7 in Figure 4.6.C, near the 5’-OH of the ribose moiety, indicates that AsnRS prefers a hydrogen-bond acceptor over a hydrophobic atom. An acceptor atom at this site can accept from the side-chain hydroxyl group of AsnRS residue Tyr 223 and/or the side-chain terminal amine of Arg 210, whereas a hydrophobic atom can interact with side-chain carbon atoms of AsnRS residues His 225 and Arg 210 (Table 4.4). A hydrophobic ligand atom is preferred in AsnRS for the other two sites, site 71 (Figure 4.6.A) and site 64 (Figure 4.6.B), over a hydrogen-bond donor atom in phosphorylase kinase and an acceptor atom in dethiobiotin synthetase, respectively. The hydrophobic atoms at sites 71 and 64, located near the C3’ atom and the sulfamoyl group respectively of the ribose moiety in ASNAMS, can interact with AsnRS residues Ile 361, Val 362 and Gly 363 whereas no hydrogen bond interactions could be made (Table 4.4).

4.4.3.3 Chemical difference sites in the amino-acid (asparagine) pocket

As expected from an AARS, the asparagine pocket in AsnRS is very specific for the cognate amino acid. However, there were no significant differences identified by our method in and around this pocket. When the proteins are brought in the same reference frame, other protein binding site pockets in the vicinity of the AsnRS asparagine pocket are more polar as they bind to the phosphate tail. However, it must be noted that AsnRS does bind to ATP but in a bent conformation (Figure 4.5.A) such that the β and γ phosphates of ATP bind and interact with residues located away from the asparagine pocket in the AsnRS binding site. In terms of specificity determinants for AsnRS in this pocket, no new information is revealed except the obvious one: it is sterically highly specific for asparagine.
In fact, the steric difference sites (presented in the next subsection) identified by our method reveal that the asparagine pocket may not even be sterically accessible in the phosphorylase kinase structure.

4.4.4 Steric difference sites identified between AsnRS and phosphorylase kinase

We used our method to identify steric difference sites between AsnRS and protein structures representing each of the other 3 subsets of the ATP set. However, we present here results obtained for only one pair, AsnRS and phosphorylase kinase (representative structure of the kinase subset), as it sufficiently demonstrates the ability of our method to identify the true steric difference sites between a pair of protein structures. As mentioned earlier, AsnRS and phosphorylase kinase bind to ATP in completely opposite ways with reference to the exocyclic amine N6, which is exposed to the solvent in the former while it is buried in the binding site in the latter. Hence, it is easier to verify the steric difference sites by looking at the surfaces of the two proteins around them. Also, the fact that we check for van der Waals overlaps between template points of one protein structure and atoms of the other self-validates these steric difference sites. The Connolly solvent-accessible molecular surfaces of both AsnRS and phosphorylase kinase are shown in the panels of Figure 4.7. Steric difference sites indicating the accessible space in AsnRS (surface colored grey) are shown in Figure 4.7.A, while the same space is inaccessible in phosphorylase kinase (surface colored yellow) as shown in Figure 4.7.B. Similarly, steric difference sites indicating the accessible space in phosphorylase kinase are shown in Figure 4.7.C, while the same space is inaccessible in AsnRS as shown in Figure 4.7.D. It turns out that the asparagine pocket is sterically inaccessible in phosphorylase kinase, explaining the lack of significant chemical difference sites in the same pocket between the two proteins.

Figure 4.7: The steric difference sites, showing accessible space in Brugia AsnRS relative to phosphorylase kinase (PDB: 1PHK), a representative protein kinase, and vice-versa, are shown in this figure. The ligands (ASNAMS in A and B, ATP in C and D), rendered as atom-colored tubes, bound in each of the structures are also shown for reference. The steric difference sites are rendered as solid cyan spheres in all the panels. The Connolly solvent-accessible molecular surface of AsnRS is colored grey while that of phosphorylase kinase is colored yellow. The back of both the surfaces appears black in B and D. The accessible space in AsnRS (binds to adenine with the exocyclic N6 amine exposed to the solvent) is depicted by steric difference sites in A. This same space is inaccessible in phosphorylase kinase (binds to adenine with the exocyclic N6 amine buried in the binding site) as shown in B, where the steric difference sites, accessible in AsnRS, are occluded behind the kinase surface. Similarly, the accessible space in phosphorylase kinase is depicted by steric difference sites in C. This same space is inaccessible in AsnRS as shown in D, where the steric difference sites, accessible in the kinase, are occluded behind the AsnRS surface.

The steric difference sites identified by our method reveal two important things. Firstly, the adenine moiety is efficiently packed in both the binding sites, hardly leaving any room for binding of additional ligand atoms. Of course, this observation discounts any conformational changes that might alter the shape and accessible volume in the two binding sites, allowing for larger ligands to bind. Secondly, the ribose moiety in AsnRS (and other N6-out proteins as well) is buried and tightly packed against the binding site, leaving no room for any larger ligand to bind in that pocket.
In contrast, the ribose moiety in phosphorylase kinase (and other N6-in proteins as well) is exposed to the solvent and hence offers more options for ligand modification to improve the selectivity for the kinase. The asparagine pocket provides the best sterically accessible option for ligand modification to improve selectivity for AsnRS.

4.5 Discussion

4.5.1 Relative significance of chemical difference sites

We used DrugScore to score the chemical difference sites identified by our method in order to assess their relative significance. An interaction site that is quantified as a significant chemical difference in one protein need not be a significant difference in the other protein. Although counterintuitive at first, this can be explained by considering the contribution of specific interactions to the binding affinities of the same ligand bound to two different proteins. For example, ligand RPR binds to both trypsin and factor Xa (Table 4.1, Figure 4.4.C) with the benzamidine group occupying the S1 pocket in both the binding sites and making interactions with the residue Asp 189, present in both the proteins. However, the residue Ser 190 in trypsin is substituted by Ala 190 in factor Xa in the S1 pocket, which is sensed by the chemical difference site 127 (Table 4.3, Figure 4.4.C) identified by our method. Similarly, the residue Gln 175 in trypsin occupies the same volume as Phe 174 in factor Xa in the S4 pocket, which is sensed by the chemical difference site 116 (Table 4.3, Figure 4.4.C) identified by our method. The interactions made by the trypsin residue Ser 190 with the benzamidine group may contribute significantly to its binding affinity but are not enough to confer selectivity relative to factor Xa, and hence site 127 is not characterized as significant in trypsin.
The 80-fold difference in Ki for ligand RPR between trypsin and factor Xa has been experimentally accounted for by the differences in the S1 and S4 pockets of the two binding sites (48), and is in good agreement with the chemical difference sites identified by our method. Similarly, the chemical difference sites identified in the ribose pocket are quantified as significant in AsnRS but not in other ATP-binding proteins (Table 4.4, Figure 4.6). In particular, site 71 (near C3’, Figure 4.6.A) and site 3 (near 2’-OH, Figure 4.6.B), with ΔDS values of -16,300 and -17,200 respectively (Table 4.4), have a very high degree of preference for AsnRS relative to phosphorylase kinase or dethiobiotin synthetase. The fact that the co-crystallized ligands bind to these proteins with their ribose moieties either buried (AsnRS) or exposed to the solvent (phosphorylase kinase and dethiobiotin synthetase) does have a bearing on whether a chemical difference site located in these pockets is significant or not. Based on our results, we observe that a chemical difference site, identified in a pocket that is solvent exposed in one but buried in another protein binding site, usually is a significant specificity determinant for the protein where it is buried.

4.5.2 Integrating our method into virtual screening protocol

The chemical as well as steric difference sites identified by our method could be directly integrated into our structure-based virtual screening tool SLIDE. For docking a ligand, represented by a set of interaction points, into the binding site of the target protein, all possible triplets of its interaction points are mapped onto geometrically and chemically compatible template triangles. Template points that are identified as significant chemical difference sites by our method can be marked as key points, and any docking must then include a match to at least one of these points.
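The key-point requirement described above can be sketched as a filter on triangle matches. This is a simplified illustration and not SLIDE's actual matching code: the function names, the 0.5 Å side-length tolerance, the compatibility rule (an N template point accepts either an acceptor or a donor ligand point), and all coordinates are assumptions made for this example.

```python
import math
from itertools import combinations, permutations

def compatible(lig_type, tmpl_type):
    """Assumed chemistry rule: same label matches; 'N' template points
    (acceptor and/or donor) accept 'A' or 'D' ligand points."""
    return lig_type == tmpl_type or (tmpl_type == "N" and lig_type in ("A", "D"))

def sides(p, q, r):
    """Side lengths of the triangle p-q-r, in a fixed vertex order."""
    return (math.dist(p, q), math.dist(q, r), math.dist(p, r))

def triangle_matches(ligand, template, key_points, tol=0.5):
    """Map ligand triplets onto template triangles that contain a key point.

    ligand, template: lists of (type, (x, y, z)) interaction points.
    key_points: set of template indices marked as key points.
    Returns a list of (ligand_triplet, template_triplet) index tuples.
    """
    matches = []
    for lig in combinations(range(len(ligand)), 3):
        lig_sides = sides(*(ligand[i][1] for i in lig))
        for tmpl in permutations(range(len(template)), 3):
            if not key_points.intersection(tmpl):
                continue  # Match must include at least one key point.
            if not all(compatible(ligand[i][0], template[j][0])
                       for i, j in zip(lig, tmpl)):
                continue  # Chemically incompatible pairing.
            tmpl_sides = sides(*(template[j][1] for j in tmpl))
            if all(abs(a - b) <= tol for a, b in zip(lig_sides, tmpl_sides)):
                matches.append((lig, tmpl))  # Geometrically compatible.
    return matches

# Hypothetical data: template point 0 plays the role of a significant
# chemical difference site marked as a key point.
ligand = [("A", (0.0, 0.0, 0.0)), ("D", (3.0, 0.0, 0.0)), ("C", (0.0, 4.0, 0.0))]
template = ligand + [("C", (10.0, 10.0, 10.0))]
print(triangle_matches(ligand, template, key_points={0}))
```

In this toy case, marking template point 0 as the key point keeps the one geometrically and chemically valid match, while marking only the distant point 3 would reject every candidate.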
Labeling selected template points as key points in SLIDE has yielded improved results in the past in docking and identifying both known as well as new ligands of thrombin, glutathione S-transferase (GST), HIV-1 protease and AsnRS (17, 33). Results from our method could also be used as pharmacophore constraints and/or filters, enabling a bias to be applied in other structure-based virtual screening protocols like DOCK (54, 55), FlexX (56, 57) and FRED (58). Other approaches for target-biased structure-based virtual screening were reviewed by Jansen and co-workers (59).

4.5.3 Conformational flexibility and chemical difference sites

For identifying chemical difference sites, the method is currently designed to work for a pair of protein structures fed into the algorithm, without accounting for their conformational flexibility. To circumvent this issue, one could use our method to first identify the similar sites among the available structures of conformers of the same protein (see AsnRS similar sites, Figure 4.5.A) and then compare the similar sites, representing the binding site invariants, for identifying chemical difference sites. When structures of different conformers of the same protein are not available, sampling algorithms could be used to generate low-energy conformers of the input protein structures. We have earlier employed ProFlex (60, 61), our graph-theoretic algorithm for identifying the flexible regions in a given protein structure, combined with ROCK (62, 63), our random-walk sampling algorithm, to generate several low-energy conformers of various proteins including cyclophilin A, estrogen receptor, dihydrofolate reductase, HIV protease and AsnRS (17, 62, 63).

4.5.4 Superpositional accuracy

All methods of this type need a reasonable superposition between the protein structures to start with.
We used ligand-based superposition when applying our method, and the method was shown to perform robustly as long as the superpositional shift between the two structures is under 1.5 Å (Figure 4.3). Protein structure-based superposition methods like DALI (64) and MSDfold (65) could also be used to bring the two structures into the same reference frame. However, the alignment of the structures may be challenging when the proteins are not related and have low sequence and/or structural homology.

4.6 Conclusions

Using complete-linkage clustering of superimposed templates, generated by SLIDE to represent the protein binding sites, we developed a method to identify binding site invariants and specificity determinants between proteins. We applied this method to two sets of proteins, assembled to address different questions. The significant chemical difference sites identified by our method were able to explain the experimentally observed selectivity of ligands bound to the proteins of the AffinDB set. For proteins of the ATP set, we used our method to identify chemically similar sites, chemical difference sites and steric difference sites. Given the high density of similar sites identified in the adenine pocket, a productive strategy for AsnRS inhibitor design would be to exploit the significant chemical difference sites and steric difference sites identified in the ribose and asparagine pockets of the AsnRS binding site, respectively. The results from this method could easily be integrated into our structure-based virtual screening protocol, which could then be used to screen for selective ligands that occupy the significant chemical difference sites in AsnRS.

References

(1) Carr, R., and Jhoti, H. (2002) Structure-based screening of low-affinity compounds. Drug Discov Today 7, 522-7.
(2) Lyne, P. D. (2002) Structure-based virtual screening: an overview. Drug Discov Today 7, 1047-55.
(3) Stahura, F.
L., and Bajorath, J. (2004) Virtual screening methods that complement HTS. Comb Chem High Throughput Screen 7, 259-69.
(4) Ghosh, S., Nie, A., An, J., and Huang, Z. (2006) Structure-based virtual screening of chemical libraries for drug discovery. Curr Opin Chem Biol 10, 194-202.
(5) Seifert, M. H., Kraus, J., and Kramer, B. (2007) Virtual high-throughput screening of molecular databases. Curr Opin Drug Discov Devel 10, 298-307.
(6) Verkhivker, G. M., Bouzida, D., Gehlhaar, D. K., Rejto, P. A., Arthurs, S., Colson, A. B., Freer, S. T., Larson, V., Luty, B. A., Marrone, T., and Rose, P. W. (2000) Deciphering common failures in molecular docking of ligand-protein complexes. J Comput Aided Mol Des 14, 731-51.
(7) Gohlke, H., and Klebe, G. (2002) Approaches to the description and prediction of the binding affinity of small-molecule ligands to macromolecular receptors. Angew Chem Int Ed Engl 41, 2644-76.
(8) Coupez, B., and Lewis, R. A. (2006) Docking and scoring-theoretically easy, practically impossible? Curr Med Chem 13, 2995-3003.
(9) Alvarez, J. C. (2004) High-throughput docking as a source of novel drug leads. Curr Opin Chem Biol 8, 365-70.
(10) Cavasotto, C. N., and Orry, A. J. (2007) Ligand docking and structure-based virtual screening in drug discovery. Curr Top Med Chem 7, 1006-14.
(11) Manning, G., Whyte, D. B., Martinez, R., Hunter, T., and Sudarsanam, S. (2002) The protein kinase complement of the human genome. Science 298, 1912-34.
(12) Cohen, P. (2002) Protein kinases--the major drug targets of the twenty-first century? Nat Rev Drug Discov 1, 309-15.
(13) Fabian, M. A., Biggs, W. H., 3rd, Treiber, D. K., Atteridge, C. E., Azimioara, M. D., Benedetti, M. G., Carter, T. A., Ciceri, P., Edeen, P. T., Floyd, M., Ford, J. M., Galvin, M., Gerlach, J. L., Grotzfeld, R. M., Herrgard, S., Insko, D. E., Insko, M. A., Lai, A. G., Lelias, J. M., Mehta, S. A., Milanov, Z. V., Velasco, A. M., Wodicka, L. M., Patel, H. K., Zarrinkar, P.
P., and Lockhart, D. J. (2005) A small molecule-kinase interaction map for clinical kinase inhibitors. Nat Biotechnol 23, 329-36.
(14) Walker, B., and Lynas, J. F. (2001) Strategies for the inhibition of serine proteases. Cell Mol Life Sci 58, 596-624.
(15) Coghlan, M. J., Elmore, S. W., Kym, P. R., and Kort, M. E. (2003) The pursuit of differentiated ligands for the glucocorticoid receptor. Curr Top Med Chem 3, 1617-35.
(16) Matter, H., and Schudok, M. (2004) Recent advances in the design of matrix metalloprotease inhibitors. Curr Opin Drug Discov Devel 7, 513-35.
(17) Sukuru, S. C., Crepin, T., Milev, Y., Marsh, L. G., Hill, J. B., Anderson, R. J., Morris, J. C., Rohatgi, A., O'Mahony, G., Grotli, M., Danel, F., Page, M. G., Hartlein, M., Cusack, S., Kron, M. A., and Kuhn, L. A. (2006) Discovering new classes of Brugia malayi asparaginyl-tRNA synthetase inhibitors and relating specificity to conformational change. J Comput Aided Mol Des 20, 159-78.
(18) Bettayeb, K., Tirado, O. M., Marionneau-Lambot, S., Ferandin, Y., Lozach, O., Morris, J. C., Mateo-Lozano, S., Drueckes, P., Schachtele, C., Kubbutat, M. H., Liger, F., Marquet, B., Joseph, B., Echalier, A., Endicott, J. A., Notario, V., and Meijer, L. (2007) Meriolins, a new class of cell death inducing kinase inhibitors with enhanced selectivity for cyclin-dependent kinases. Cancer Res 67, 8325-34.
(19) Goodford, P. J. (1985) A computational procedure for determining energetically favorable binding sites on biologically important macromolecules. J Med Chem 28, 849-57.
(20) Wade, R. C., Clark, K. J., and Goodford, P. J. (1993) Further development of hydrogen bond functions for use in determining energetically favorable binding sites on molecules of known structure. 1. Ligand probe groups with the ability to form two hydrogen bonds. J Med Chem 36, 140-7.
(21) Kastenholz, M. A., Pastor, M., Cruciani, G., Haaksma, E. E., and Fox, T.
(2000) GRID/CPCA: a new computational tool to design selective ligands. J Med Chem 43, 3033-44.
(22) Braiuca, P., Cruciani, G., Ebert, C., Gardossi, L., and Linda, P. (2004) An innovative application of the "flexible" GRID/PCA computational method: study of differences in selectivity between PGAs from Escherichia coli and a Providentia rettgeri mutant. Biotechnol Prog 20, 1025-31.
(23) Sheridan, R. P., Holloway, M. K., McGaughey, G., Mosley, R. T., and Singh, S. B. (2002) A simple method for visualizing the differences between related receptor sites. J Mol Graph Model 21, 217-25.
(24) Sheridan, R. P., Nachbar, R. B., and Bush, B. L. (1994) Extending the trend vector: the trend matrix and sample-based partial least squares. J Comput Aided Mol Des 8, 323-40.
(25) Miller, M. D., Kearsley, S. K., Underwood, D. J., and Sheridan, R. P. (1994) FLOG: a system to select 'quasi-flexible' ligands complementary to a receptor of known three-dimensional structure. J Comput Aided Mol Des 8, 153-74.
(26) Deng, Z., Chuaqui, C., and Singh, J. (2004) Structural interaction fingerprint (SIFt): a novel method for analyzing three-dimensional protein-ligand binding interactions. J Med Chem 47, 337-44.
(27) Deng, Z., Chuaqui, C., and Singh, J. (2006) Knowledge-based design of target-focused libraries using protein-ligand interaction constraints. J Med Chem 49, 490-500.
(28) Perola, E. (2006) Minimizing false positives in kinase virtual screens. Proteins 64, 422-35.
(29) Ortiz, A. R., Gomez-Puertas, P., Leo-Macias, A., Lopez-Romero, P., Lopez-Vinas, E., Morreale, A., Murcia, M., and Wang, K. (2006) Computational approaches to model ligand selectivity in drug design. Curr Top Med Chem 6, 41-55.
(30) Schnecke, V., Swanson, C. A., Getzoff, E. D., Tainer, J. A., and Kuhn, L. A. (1998) Screening a peptidyl database for potential ligands to proteins with side-chain flexibility. Proteins 33, 74-87.
(31) Schnecke, V., and Kuhn, L. A.
(1999) Database screening for HIV protease ligands: the influence of binding-site conformation and representation on ligand selectivity. Proc Int Conf Intell Syst Mol Biol, 242-51.
(32) Schnecke, V., and Kuhn, L. A. (2000) Virtual screening with solvation and ligand-induced complementarity. Perspect Drug Discov 20, 171-190.
(33) Zavodszky, M. I., Sanschagrin, P. C., Korde, R. S., and Kuhn, L. A. (2002) Distilling the essential features of a protein surface for improving protein-ligand docking, scoring, and virtual screening. J Comput Aided Mol Des 16, 883-902.
(34) Zavodszky, M. I., and Kuhn, L. A. (2005) Side-chain flexibility in protein-ligand binding: the minimal rotation hypothesis. Protein Sci 14, 1104-14.
(35) Sanschagrin, P. C., and Kuhn, L. A. (1998) Cluster analysis of consensus water sites in thrombin and trypsin shows conservation between serine proteases and contributions to ligand specificity. Protein Sci 7, 2054-64.
(36) Gohlke, H., Hendlich, M., and Klebe, G. (2000) Knowledge-based scoring function to predict protein-ligand interactions. J Mol Biol 295, 337-56.
(37) Wallace, A. C., Laskowski, R. A., and Thornton, J. M. (1995) LIGPLOT: a program to generate schematic diagrams of protein-ligand interactions. Protein Eng 8, 127-34.
(38) Block, P., Sotriffer, C. A., Dramburg, I., and Klebe, G. (2006) AffinDB: a freely accessible database of affinities for protein-ligand complexes from the PDB. Nucleic Acids Res 34, D522-6.
(39) McDonald, I., and Thornton, J. M.
(40) Ippolito, J. A., Alexander, R. S., and Christianson, D. W. (1990) Hydrogen-Bond Stereochemistry In Protein-Structure And Function. J Mol Biol 215, 457-471.
(41) Kuhn, L. A., Swanson, C. A., Pique, M. E., Tainer, J. A., and Getzoff, E. D. (1995) Atomic and residue hydrophilicity in the context of folded protein structures. Proteins 23, 536-47.
(42) Carlson, H. A., Masukawa, K. M., Rubins, K., Bushman, F. D., Jorgensen, W. L., Lins, R. D., Briggs, J. M., and McCammon, J. A.
(2000) Developing a dynamic pharmacophore model for HIV-1 integrase. J Med Chem 43, 2100-2114.
(43) Ramensky, V., Sobol, A., Zaitseva, N., Rubinov, A., and Zosimov, V. (2007) A novel approach to local similarity of protein binding sites substantially improves computational drug design results. Proteins 69, 349-357.
(44) Chene, P. (2002) ATPases as drug targets: learning from their structure. Nat Rev Drug Discov 1, 665-73.
(45) Vieth, M., Higgs, R. E., Robertson, D. H., Shapiro, M., Gragg, E. A., and Hemmerle, H. (2004) Kinomics-structural biology and chemogenomics of kinase inhibitors and targets. Biochim Biophys Acta 1697, 243-57.
(46) Kuttner, Y. Y., Sobolev, V., Raskind, A., and Edelman, M. (2003) A consensus-binding structure for adenine at the atomic level permits searching for the ligand site in a wide spectrum of adenine-containing complexes. Proteins 52, 400-11.
(47) Katz, B. A., Mackman, R., Luong, C., Radika, K., Martelli, A., Sprengeler, P. A., Wang, J., Chan, H., and Wong, L. (2000) Structural basis for selectivity of a small molecule, S1-binding, submicromolar inhibitor of urokinase-type plasminogen activator. Chem Biol 7, 299-312.
(48) Maignan, S., Guilloteau, J. P., Pouzieux, S., Choi-Sledeski, Y. M., Becker, M. R., Klein, S. I., Ewing, W. R., Pauls, H. W., Spada, A. P., and Mikol, V. (2000) Crystal structures of human factor Xa complexed with potent inhibitors. J Med Chem 43, 3226-32.
(49) Rauh, D., Klebe, G., and Stubbs, M. T. (2004) Understanding protein-ligand interactions: the price of protein flexibility. J Mol Biol 335, 1325-41.
(50) Moodie, S. L., Mitchell, J. B., and Thornton, J. M. (1996) Protein recognition of adenylate: an example of a fuzzy recognition template. J Mol Biol 263, 486-500.
(51) Mao, L., Wang, Y., Liu, Y., and Hu, X. (2004) Molecular determinants for ATP-binding in proteins: a data mining and quantum chemical analysis. J Mol Biol 336, 787-807.
(52) Pankiewicz, K. W., Zatorski, A., and Watanabe, K. A.
(1996) NAD-analogues as potential anticancer agents: conformational restrictions as basis for selectivity. Acta Biochim Pol 43, 183-93.
(53) Jacobson, K. A. (2001) Probing adenosine and P2 receptors: Design of novel purines and nonpurines as selective ligands. Drug Develop Res 52, 178-186.
(54) Shoichet, B. K., and Kuntz, I. D. (1993) Matching chemistry and shape in molecular docking. Protein Eng 6, 723-32.
(55) Good, A. C., Cheney, D. L., Sitkoff, D. F., Tokarski, J. S., Stouch, T. R., Bassolino, D. A., Krystek, S. R., Li, Y., Mason, J. S., and Perkins, T. D. (2003) Analysis and optimization of structure-based virtual screening protocols. 2. Examination of docked ligand orientation sampling methodology: mapping a pharmacophore for success. J Mol Graph Model 22, 31-40.
(56) Gruneberg, S., Stubbs, M. T., and Klebe, G. (2002) Successful virtual screening for novel inhibitors of human carbonic anhydrase: strategy and experimental confirmation. J Med Chem 45, 3588-602.
(57) Hindle, S. A., Rarey, M., Buning, C., and Lengauer, T. (2002) Flexible docking under pharmacophore type constraints. J Comput Aided Mol Des 16, 129-49.
(58) Schulz-Gasch, T., and Stahl, M. (2003) Binding site characteristics in structure-based virtual screening: evaluation of current docking tools. J Mol Model 9, 47-57.
(59) Jansen, J. M., and Martin, E. J. (2004) Target-biased scoring approaches and expert systems in structure-based virtual screening. Curr Opin Chem Biol 8, 359-64.
(60) Jacobs, D. J., Rader, A. J., Kuhn, L. A., and Thorpe, M. F. (2001) Protein flexibility predictions using graph theory. Proteins 44, 150-65.
(61) Rader, A. J., Hespenheide, B. M., Kuhn, L. A., and Thorpe, M. F. (2002) Protein unfolding: rigidity lost. Proc Natl Acad Sci U S A 99, 3540-5.
(62) Lei, M., Zavodszky, M. I., Kuhn, L. A., and Thorpe, M. F. (2004) Sampling protein conformations and pathways. J Comput Chem 25, 1133-48.
(63) Zavodszky, M. I., Lei, M., Thorpe, M. F., Day, A. R., and Kuhn, L. A.
(2004) Modeling correlated main-chain motions in proteins for flexible molecular recognition. Proteins 57, 243-61.
(64) Holm, L., and Sander, C. (1993) Protein-Structure Comparison by Alignment of Distance Matrices. J Mol Biol 233, 123-138.
(65) Krissinel, E., and Henrick, K. (2004) Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. Acta Crystallogr D 60, 2256-2268.

Chapter 5

Summary and future directions

5.1 Virtual screening for aminoacyl tRNA synthetase inhibitors

5.1.1 Summary and perspective

In chapter 2, the discovery of seven new classes of Brugia AsnRS inhibitors using structure-based virtual screening is presented. This is the first reported example of tRNA synthetase inhibitors being discovered by protein structure-based screening methods. A survey of the recently published database of known aminoacyl-tRNA synthetase (AARS) inhibitors (1) reveals that the majority of them are bacterial class I AARS (e.g., IleRS and MetRS) inhibitors and very few are class II AARS (e.g., AsnRS, ProRS) inhibitors. Most of the potent class I AARS inhibitor scaffolds are natural products discovered in experimental screening (2-4).

A combination of an empirical scoring function (SLIDE score) and a knowledge-based scoring function (DrugScore) does a reliable job of assessing the right conformation, binding mode, and relative affinity of known AsnRS ligands, as well as distinguishing them from a pool of 1000 decoy molecules. This scoring protocol also guided us in selecting 45 compounds for experimental assays, of which 7 were confirmed as inhibitors. In our experience, a proper representation of the binding site and ligand conformers that are close to the bioactive, bound conformation are important factors for scoring functions to perform well on a given system.
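One common way to combine two scoring functions is rank-sum consensus, sketched below in Python purely as an illustration. The combination scheme, the function names `ranks` and `consensus_order`, and the assumption that lower scores are better by default are all hypothetical; they are not taken from the scoring protocol used in this work.

```python
def ranks(scores, higher_is_better=False):
    """Rank positions (1 = best) for a list of scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i],
                   reverse=higher_is_better)
    result = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def consensus_order(names, scores_a, scores_b,
                    a_higher_is_better=False, b_higher_is_better=False):
    """Order ligands by the sum of their ranks under two scoring functions.

    Ties in the rank sum are broken alphabetically by ligand name.
    """
    rank_a = ranks(scores_a, a_higher_is_better)
    rank_b = ranks(scores_b, b_higher_is_better)
    rank_sums = [a + b for a, b in zip(rank_a, rank_b)]
    return [name for _, name in sorted(zip(rank_sums, names))]
```

Working with ranks rather than raw values avoids having to put two scores with different units and ranges on a common scale. The direction flags let each score follow its own sign convention.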
The selectivity of long side chain variolins for Brugia relative to human AsnRS was explained using a ROCK-generated open conformation of a flexible active-site loop of Brugia AsnRS, coupled with a sequence substitution at the base of the loop. These results suggest that specificity between close homologs can be gained by considering conformational differences between active-site loops, rather than only residue differences in the static parts of binding pockets.

5.1.2 Future directions

Brugia AsnRS inhibitors discovered by SLIDE were all docked in the adenosyl pocket of the binding site. AsnRS is highly specific for binding asparagine in its aminoacyl pocket, as is generally true for all AARS and their cognate amino acids. A productive strategy for AsnRS inhibitor design is to link the sulfamoyl-asparagine group to the promising inhibitor scaffolds that bind in the adenosyl pocket. Chapter 3 describes the design of analogs of two of the most promising Brugia AsnRS inhibitors by employing this strategy. However, given the high active-site sequence homology between Brugia and human AsnRS, there is a need to identify alternative binding pockets in Brugia AsnRS that could be used for screening and design of new inhibitors.

Assisted by the binding site comparison tool described in chapter 4, our laboratory identified a pocket in Brugia AsnRS, off the binding site occupied by the co-crystallized ligand ASNAMS. The ZINC database (5), containing more than a million commercially available compounds for virtual screening, was screened by my colleague Anjali Rohatgi to find inhibitors that may be docked in the new binding pocket. It will be interesting to find compounds, predicted to bind away from the known binding site, that inhibit the enzyme.

Our collaborator Prof.
Michael Kron and co-workers have shown that Brugia AsnRS could possibly contribute to the acute host inflammatory response against the filarial parasite by activating human chemokine receptors CXCR1 and CXCR2 (6). To elucidate the structural and/or sequence determinants that confer chemokine activity on Brugia AsnRS, it was compared with the interleukin IL8, a representative chemokine that binds to both CXCR1 and CXCR2 with high affinity. Preliminary analysis of the results showed two short (three residues long) sequence stretches in Brugia AsnRS, far from its binding site, that were most similar to human IL8. Further experiments are required to test whether these sequences play a crucial role in conferring the chemokine activity of Brugia AsnRS. The Brugia AsnRS pocket containing residues interacting with chemokine receptors, once confirmed, could be an additional target for virtual screening to identify potential ligands that block the enzyme’s interactions with chemokine receptors.

A structurally distinct editing site for proofreading has been reported in many AARS (7). However, for class II AARS, of which AsnRS is a member, the editing site has been identified only for AlaRS (8), ThrRS (9) and ProRS (10). Elucidation of an editing site in Brugia AsnRS will provide an additional binding site for virtual screening to identify potential ligands that can inhibit the proofreading activity of the enzyme.

5.2 Using specificity determinants in virtual screening

5.2.1 Summary and perspective

Structure-based virtual screening has been successful in identifying ligands with good shape and chemical complementarity to a protein target. However, it is challenging to find ligands that are specific to one protein relative to another in a fast, automated way. In chapter 4, a new method to perform automated shape and chemistry comparison to identify binding site invariants and specificity determinants has been described.
The results from this method could be integrated into our structure-based virtual screening protocol to selectively screen for ligands that match the specificity determinants of a target protein.

To identify the specificity determinants between Brugia AsnRS and other ATP-binding proteins, their binding sites were compared using the new method. The results obtained are not only useful in aiding structure-based drug design efforts but also elucidate novel differences between the binding sites of ATP-binding proteins. The adenine pockets of these proteins are very similar, given the high density of similar sites identified by the method. However, there are key chemical and steric difference sites in their ribose and phosphate pockets which can be exploited in ligand design.

5.2.2 Future directions

The results obtained from comparing the binding sites will be very useful when they are integrated with a structure-based screening protocol. Our screening and docking tool SLIDE can use the specificity determinants identified by the method by labeling them as key points in its protein binding site model, called a template. The docking of ligand candidates must then include a match to at least one of these key template points. The results obtained from our method could also be used as pharmacophore constraints and/or filters to enable a bias to be applied in other virtual screening protocols.

The degree of preference of chemical difference sites, identified by our method between two protein binding sites, was quantified using DrugScore, a knowledge-based scoring function. Two interesting questions can be addressed by performing cross-docking experiments of promising compounds to the binding sites of the two proteins:

1. Can a compound, docked by matching the specificity determinants in one protein relative to the other, be docked at all to the binding site of the other?
2.
If a compound can be docked to both proteins, do the predicted relative binding affinities correlate with the quantified degree of preference?

Answers to these questions can contribute to our understanding of the essential features in molecular recognition that are similar and/or different between two proteins. They can also aid in further development of scoring functions that predict protein-ligand complementarity, by deciphering the features that contribute most to the binding affinity.

References

(1) Torchala, M., and Hoffmann, M. (2007) IA, database of known ligands of aminoacyl-tRNA synthetases. J Comput Aided Mol Des 21, 523-5.
(2) Kim, S., Lee, S. W., Choi, E. C., and Choi, S. Y. (2003) Aminoacyl-tRNA synthetases and their inhibitors as a novel family of antibiotics. Appl Microbiol Biotechnol 61, 278-88.
(3) Pohlmann, J., and Brotz-Oesterhelt, H. (2004) New aminoacyl-tRNA synthetase inhibitors as antibacterial agents. Curr Drug Targets Infect Disord 4, 261-72.
(4) Ochsner, U. A., Sun, X., Jarvis, T., Critchley, I., and Janjic, N. (2007) Aminoacyl-tRNA synthetases: essential and still promising targets for new anti-infective agents. Expert Opin Investig Drugs 16, 573-93.
(5) Irwin, J. J., and Shoichet, B. K. (2005) ZINC--a free database of commercially available compounds for virtual screening. J Chem Inf Model 45, 177-82.
(6) Ramirez, B. L., Howard, O. M., Dong, H. F., Edamatsu, T., Gao, P., Hartlein, M., and Kron, M. (2006) Brugia malayi asparaginyl-transfer RNA synthetase induces chemotaxis of human leukocytes and activates G-protein-coupled receptors CXCR1 and CXCR2. J Infect Dis 193, 1164-71.
(7) Ibba, M., and Soll, D. (2000) Aminoacyl-tRNA synthesis. Annu Rev Biochem 69, 617-50.
(8) Beebe, K., Ribas De Pouplana, L., and Schimmel, P. (2003) Elucidation of tRNA-dependent editing by a class II tRNA synthetase and significance for cell viability. Embo J 22, 668-75.
(9) Beebe, K., Merriman, E., Ribas De Pouplana, L., and Schimmel, P. (2004) A domain for editing by an archaebacterial tRNA synthetase. Proc Natl Acad Sci U S A 101, 5958-63.
(10) Crepin, T., Yaremchuk, A., Tukalo, M., and Cusack, S. (2006) Structures of two bacterial prolyl-tRNA synthetases with and without a cis-editing domain. Structure 14, 1511-25.