. u...
”mm
:m

.11:

I

.x: {.1

:5:
.95

V .
69$
, T :J
u s. I 1 .

urxdxa J

a
.. a
u» a.

.ﬁ
...

 

Ltd
a. “an“. 3. 1
.6. {9.3.07
3%”) 3.19:.
ti!!!-
3 .

 

Jpn "gasp-qty";

Jame-eta.

I c

4..."... V .
. .. , . H V $23 m . ‘ _ .. . ‘
‘ . . . . . ‘ . . . . ,aéuﬁm ggﬁﬁg .ﬁaﬁégaw? is: . ﬂawlhwau.‘ .u

 

 

7%! 7355’

LIBRARY
Michigan State
University

This is to certify that the
thesis entitled

OPTIMIZING SIDE-CHAIN INTERACTIONS IN
PROTEIN-LIGAND INTERFACES

presented by

SAMEER ARORA

has been accepted towards fulﬁllment
of the requirements for the

MS. degree in Computer Science

 

5/44,

Major Professor’s Signature

2/ f/ of

Date

MSU is an Aﬂirmative Action/Equal Opportunity Institution

 

PlACE lN RETURN BOX to remove this checkout from your record.
To AVOID FINES return on or before date due.
MAY BE RECALLED with earlier due date if requested.

 

DATE DUE

DATE DUE

DATE DUE

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

2/05 chlRC/DatoDqundd—pts

OPTIMIZING SIDE-CHAIN INTERACTIONS
IN PROTEIN-LIGAND INTERFACES

By

Sameer Arara

A THESIS

Submitted to
Michigan State University
in partial fulﬁllment of the requirements
for the degree of

MASTER OF SCIENCE
Department of Computer Science

2005

ABSTRACT
OPTIMIZING SIDE-CHAIN INTERACTIONS IN PROTEIN-LIGAND
INTERFACES
By

Sameer Arora

Proteins bind to other proteins or small molecules to perform essential cellular
functions. Protein side-chain ﬂexibility is crucial for binding and molecular recog-
nition. Hence modeling side-chain ﬂexibility in protein-ligand docking algorithms
to predict the optimal inter- and intra-molecular interactions is extremely desirable.
However, modeling side-chain ﬂexibility in docking and screening is computationally
expensive due to the numerous side chains and their many degrees of freedom.

Our research indicates that direct, intra-protein hydrogen bonds and hydrophobic
interactions are preserved to a signiﬁcant extent upon ligand binding. This provides
guidance to restrict the number of side-chain candidates for conformational sam-
pling in docking. While these bond-preservation tendencies limit side-chain move-
ments, large side-chain motions are also observed in the protein-ligand interface.
Subsequently, the extent of these large side-chain motions from ligand-free to ligand-
bound crystal-structure conformations are characterized, as is the suitability of using
backbone-dependent rotamers for sampling these larger motions.

The ability to accurately identify which side chains move signiﬁcantly upon ligand
binding as well as their optimal conformations is crucial for docking and screening. A
new scoring function, having good linear correlation with experimentally determined
protein-ligand binding afﬁnities, is presented for scoring dockings and side-chain in-
teractions by SLIDE. Using the new scoring function as a cost measure, a mean-ﬁeld
based algorithm, exploiting rotamer-based side-chain ﬂexibility modeling, is proposed

and tested for optimizing interactions in protein-ligand interfaces.

Copyright by
Sameer Arora
2005

To my Mummyji and Papajz'. It is your endless love, sacriﬁces, committment and
support, that I am here, forever indebted.

iv

ACKNOWLEDGEMENTS

First, I would like to express my deep gratitude for my advisor, Dr. Leslie A.
Kuhn, for all hers support, guidance and encouragement throughout my graduate
education. Her boundless energy and infectious enthusiasm for science helped me
embark and persist on this research work. I am grateful for her mentorship, and
feel lucky to have been associated with such a person, whose optimism, patience and
dynamism continue to set high standards for me.

I want to thank my co-advisor, Dr. Bill Punch. His advice and guidance strength-
ened my decision and faith in pursuing this interdisciplinary research. And his support
and patience helped me continue forward with a calm and positive mind.

I extend my thanks to Dr. Phil Duxbury. My discussions with him always left
me with valuable insight and a physicist’s perspective on our research problem.

I would also like to thank the past and present members of the Kuhn lab, Litian
He, Chetan Sukuru, Rajesh Korde, Dr. Brandon Hespenheide, and Sandeep Nami-
likonda for creating an enjoyable and stimulating research environment. I am espe-
cially thankful to Dr. Maria Zévodszky, who helped me grow, besides teaching me
many things over the years, including Biochemistry and SLIDE. She was always ready
to help, discuss ideas and answer countless (quick) questions.

Finally, I would like to thank my friends (especially Kantha Kumar) and family,

without whose support and encouragement, it would be hard to imagine myself here.

Table of Contents

List of Tables viii
List of Algorithms viii
List of Figures viii
1 Flexibility in Protein-Ligand Docking 1
1.1 Introduction: Protein-Ligand Docking .................. 1
1.2 Overview of Protein-Structure Prediction Methods ........... 2
1.3 Overview of Current Side-Chain Modeling Approaches in Docking Tools 7
1.4 Side-Chain Modeling in SLIDE ..................... 10
1.5 Motivation and Focus in this Thesis Work ............... 11
1.6 Organization of the Thesis ........................ 13
2 Hydrogen Bond Preservation in Protein Binding Sites 15
2.1 Motivation ................................. 15
2.2 Methods .................................. 16
2.3 Results and Discussion .......................... 22
2.4 Conclusions ................................ 27
3 Sampling Side-Chain Positions in Protein-Ligand Interfaces in
SLIDE 29
3.1 Side-Chain Displacements upon Ligand Binding ............ 30
3.2 Current Rotation Paradigm in SLIDE .................. 32
3.2.1 Motions Modeling in SLIDE ................... 39
3.3 Employing Hydrogen-Bond Preservation Bias in Mean-Field Optimiza-
tion .................................... 43
3.4 Sampling Large Side-Chain Motions ................... 45
3.4.1 Rotamer Libraries ......................... 49
3.5 Conclusions ................................ 56
4 Scoring SLIDE Dockings 58
4.1 Introduction ................................ 58
4.2 SLIDE Scoring Function ......................... 60
4.3 Methods .................................. 61
4.3.1 Calculation of Terms for Scoring Function ........... 62
4.3.2 Preparation of Test Set ...................... 66
4.3.3 Training and Testing ....................... 67
4.4 Results ................................... 69
4.5 Conclusions ................................ 73

vi

5 Guiding Sampling by Score 74

5.1 Sampling Choices for Maximizing Score ................. 75
5.2 Methods .................................. 81
5.3 Results ................................... 82
5.4 Analysis .................................. 94
5.5 Conclusion ................................. 97
6 Summary and Future Directions 99
6.1 Future Directions ............................. 100
Bibliography 103

vii

2.1

2.2
3.1

4.2
4.1
4.3

4.4

List of Tables

List of protein-structure pairs studied for hydrogen bond preservation
upon ligand binding. ...........................
Rules for categorization of atom types .................

Visual analysis of reasons behind large dihedral rotations in interfacial
side chains .................................

Scoring function variants evaluated to predict binding afﬁnities . . . .
Potential terms for new scoring function for SLIDE ..........
PDB codes of crystal complexes used for training and testing the scor-
ing functions ................................
Predicted and experimental binding-afﬁnity linear correlation coefﬁ-
cients and average weights derived from linear multiple regression

viii

50

61
63

68

70

List of Algorithms

1 Pseudo-code for rotamer sampling for unsatisﬁed side chains and mean-
ﬁeld optimization for maximizing score ................. 79

ix

1.1
1.2

2.1

2.2

2.3

2.4

2.5

2.6

3.1

3.2

3.3

3.4
3.5

3.6

3.7

3.8

3.9

List of Figures

Example of a protein-ligand crystal complex .............. 3
Screening and docking algorithm implemented in SLIDE ....... 12
Geometric parameters used to identify hydrogen bonds and measure
their energy ................................ 18
Percentages and displacements, by atom category, of preserved direct
intra-protein hydrogen bonds in protein-ligand interfaces ....... 23
Percentages, by residue type, of preserved direct intra-protein hydrogen
bonds in protein-ligand interfaces .................... 24
Percentages, by residue type, of preserved interfacial intra-protein hy-
drogen bonds mediated by one water molecule ............. 25
Percentages, by residue type, of preserved interfacial intra-protein hy-
drogen bonds mediated by two water molecules ............ 26
Percentages, by atom category, of preserved interfacial intrarprotein
hydrophobic interactions ......................... 28

Frequency distribution of RMSD ligand-bound side-chain conforma-
tions from ligand-free conformations ................... 31
Change in X1 and x2 dihedral angles of preserved interfacial all side
chains within 4.0 A of the ligand, across 30 structures listed in Table 2.1. 33
Change in x3 and x4 dihedral angles of preserved interfacial all side
chains within 4.0 A of the ligand, across 30 structures listed in Table 2.1. 34
Rotation angle choices for resolving an inter-atomic steric clash . . . 36
The algorithm for resolving side-chain collisions using the mean-ﬁeld
Optimization technique. When there are still collisions exceeding the
threshold after 10 iterations of the outer loop, this ligand orientation
is discarded ................................. 38
Frequency distributions of RMSD between ligand-free and ligand-
bound side-chain conformation, as well as RMSD between ligand-free
conformation and conformation in the best docking generated by SLIDE. 40
Comparison of side-chain RMSDs between ligand-free to ligand-bound
conformation, and between ligand-free conformation and conformation
in the top docking. ............................ 42
Comparison of the number of intra-protein hydrogen bonds before and
after ligand binding in crystal complexes, in best dockings by SLIDE
and the hydrogen-bond-preservationist version SLIDE. ........ 46
Comparison of the number of intra-protein hydrogen-bonds lost upon
ligand binding in nature (between ligand-free and ligand-bound struc-
tures), in best dockings by SLIDE, and in the best dockings by
hydrogen-bond-preservationist version SLIDE. ............. 47

3.10 Comparison of the number of intra-protein hydrogen bonds gained
upon ligand binding in crystal complexes, in the best dockings by
SLIDE and by the hydrogen-bond—preservationist version of SLIDE. .

3.11 Illustration of bond-rotation angles associated with single bonds in a
side chain. .................................

3.12 Number of rotamers approximating dihedral angles of target confor-
mations for 25 interfacial side chains undergoing large rotation upon
ligand binding. ..............................

3.13 Cartesian space search for rotamers close to ligand-bound conforma-
tions of selected side chains using Dunbrack rotamer library .....

4.1 Correlation between experimentally known binding afﬁnity and score
values determined by SLIDE scoring function 61 ............
4.2 Correlation between experimentally known binding afﬁnity and score
values determined by old SLIDE scoring function ...........
4.3 Correlation between afﬁnity values known experimentally and those
determined by DrugScore .........................

5.1 Flowchart of algorithm used to guide selection of a rotamer from a pool
of rotamers ................................
5.2 Distribution of change in score across all dockings with triangle
matches in common, generated by both old and new SLIDE versions .
5.3 Comparison of scores of best dockings by old and new SLIDE ver-
sions for each protein, as well as scores evaluated for the corresponding
crystal complexes .............................
5.4 Comparison of the number of interfacial unsatisﬁed polar groups in the
best dockings generated by old and new SLIDE versions .......
5.5 Comparison of displacement and direction of side-chain motions per-
formed for resolving protein-ligand collisions ..............
5.6 Side-chain motions performed in new SLIDE to optimize interactions
in the same 24 complexes .........................
5.7 Large natural side-chain motions undetected by old or new SLIDE
versions ..................................
5.8 Selection or rejection reasons for each of the sampled rotamers for
unsatisﬁed side chains in the best dockings ...............

Images in this thesis/dissertation are presented in color.

xi

48

52

54

57

69

71

72

80

84

86

87

89

90

96

Chapter 1

Flexibility in Protein-Ligand

Docking

1.1 Introduction: Protein-Ligand Docking

Like people interact and cooperate with one another to perform many functions
for the sustenance and growth of life at the social level, proteins too interact and
cooperate with each other as well as with other molecules for the sustenance and
growth of life at the cellular level. Proteins perform various vital functions in a
cell - they provide cellular structure, bind and transport other proteins or organic
compounds, and catalyze or inhibit reactions. Underlying all stable bindings between
a protein and another molecule lies the mechanism for the two molecules to recognize
each other and achieve a bound state more stable than their individual unbound
states. Understanding the mechanism of protein binding is key understanding their
function as well providing valuable insights for discovering novel compounds that can
bind to speciﬁc proteins for designing therapeutic drugs.

Molecular docking is a term used to describe computational techniques that at-

tempt to ﬁnd the “best” mode binding between two molecules. Protein-ligand docking

aims at ﬁnding the optimal binding between a protein and small molecule to a spe-
ciﬁc site of the protein. In protein-ligand docking, the atomic structures of the two
molecules are given as input in the most general form, no additional data is provided.
However, in practice, additional biochemical information can be given, speciﬁcally,
information about the location of the binding site.

Docking methods can be categorized in multiple ways. “Rcdocking” attempts to
reconstruct a complex using bound structures of the receptor and the ligand. More
challenging is the “unbound” docking, which attempts to reconstruct a complex us-
ing unbound structures of the receptor and the ligand. In this case, some degree of
conformational change in the protein and the ligand must be modeled or accommo-
dated. Another categorization may be based on ﬂexibility: “rigid” docking keep the
structures rigid during the docking, while ﬂexible docking techniques allow ﬂexibility
in the receptor or ligand or both.

Typically, there are three key ingredients in docking algorithms:

1. representation of the molecular system
2. conformational and orientational space search (“sampling”)

3. ranking of potential solutions (“scoring”)

The present work focuses on modeling protein side-chain motion in docking, and
the subsequent sections of this Introduction cover how ﬂexibility has been modeled

by others in a variety of protein modeling methods.

1.2 Overview of Protein-Structure Prediction

Methods

Over several years, various techniques have been developed to predict protein

structures from their amino acid sequences. These prediction techniques attempt to

 

 

Figure 1.1: This image is presented in color. Example of a protein-ligand crys-
tal complex. Crystal structure of a-momorcharin complexed with formycin 5'-
monophosphate. Green ribbon represents the protein backbone while the ligand is
displayed with Connolly solvent-accessible molecular surface, colored by atom type.
Only binding-site side chains are displayed (in tubes) with interfacial side-chains col-
ored cyan and side chains beyond the interface colored by atom type.

 

deﬁne the protein structure in the native state from the known sequence of amino
acids, and in some cases aim to capture dynamics of processes like protein folding
and protein-ligand or protein-protein docking. While each of these methods is used
for structure prediction, they have been used in context of dockings as well.

Molecular Dynamics (MD) Simulation is one such technique that serves as an im-
portant tool in protein structural dynamics and reﬁnement. It uses torsional degrees
of freedom in proteins and ligands and key physical properties at the atomic level
to solve Newton’s equations of motion. Forces on individual atoms are represented
as sums of potential terms (e.g. electrostatic and van der Waals) during the entire
simulation. Each simulation run is divided into time steps, which are typically scaled
small enough (1-2 femtoseconds) to ensure that the physical interactions are modeled
accurately. While smaller time steps, combined with an accurate force ﬁeld, lead to a
better quality of simulation results, the large number of steps required makes the sim-
ulation extremely slow for docking and screening purposes, as each time step involves
thousands of degrees of freedom in the molecule(s), requires evaluation of computa-
tionally expensive potential energy terms, and must also account for the surrounding
solvent. The computational intensity also limits the amount of conformational space
sampled for both protein and ligands. To optimize side-chain placement in docking,
sometimes less computationally demanding energy minimization is used, but this in-
volves very local motions. Carlson and colleagues[2] have used MD to determine
relatively rigid regions from multiple crystal structures and focus on these immobile
regions as templates for drug design.

Monte Carlo (MC) sampling is a stochastic sampling method applied to a num-
ber of diverse domains, including economics, physics, trafﬁc regulation and others. In
molecular simulations, an MC simulation is often guided by simulated annealing. In
simulated annealing, the “temperature” of the system is raised and then system is al-

lowed to cool down gradually. Here, temperature implies any parameter of the system

that controls the magnitude of sampled motions. Like high an input of energy helps
electrons jump to higher energy orbital in atoms, high “temperature” in MC simu-
lations helps torsional rotations overcome high energetic barriers. Once the systems
has been allowed to sample a variety of states, temperature is dropped in a step-wise
manner to allow the system to relax into a low energy state. This relaxation may be
guided by a force ﬁeld or could result from a random move. New conformations are
accepted if the energy drops. the simulation stops when the temperature falls below
a threshold. Since there is some randomness to the generation of the ﬁnal molecular
complex, and since there are usually many protein-ligand conformations with similar
energy, an ensemble of many structures is usually calculated to observe which binding
modes appear most often. This means that this computationally intensive program
must be repeated many times.

The Dead-End Elimination (DEE) approach relies on a potential energy function
to evaluate discrete side—chain conformations. A DEE run searches for a combination
of side-chain conformations that result in global energy minimum for the protein or
protein-ligand complex[6]. Typically, a set of side-chains is identiﬁed for conforma-
tional sampling. The rest of the atoms provide a representation of the immediate
environment and additional constraints for the sampling. The DEE implementations
typically use rotamer libraries to generate side-chain conformations. The relatively
high efﬁciency of this relies on a mathematical expression, the DEE criterion, which
rules out infeasible rotamers that cannot be part of the global minimum conforma-
tions. One example is that if a certain x angle value of a particular side chain is
observed to cause unresolvable collisions with neighboring residues, then this x value
is ruled out in all future moves. This criterion helps reduce the number of rotamers
to be considered for side chains in the modeling set, resulting in a number that is
sufﬁciently small to analyze by means of “brute force” combinatorial analysis. The

DEE approach is useful for homology modeling, where backbone structure is usually

known from a related protein. For docking and screening purposes, DEE scales well
as long as the number of side-chains sampled is small. Schaffer et a1 [40] have success-
fully applied DEE for side-chain Optimization after docking HIV-1 protease mutant
complexes using Monte Carlo/simulated annealing sampling.

The latest version of Dunbrack’s SCRWL algorithm[1] for side-chain modeling
uses results from graph theory to solve the combinatorial problem encountered in
side-chain prediction. In this method, side chains are represented as vertices in an
undirected graph. Any two residues that have rotamers with nonzero interaction
energies are considered to have an edge in the graph. The resulting graph can be
partitioned into connected subgraphs with no edges between them. These subgraphs
can in turn be broken into biconnected components, which are graphs that cannot be
disconnected by removal of a single vertex. The combinatorial problem is reduced to
ﬁnding the minimum energy of these small biconnected components representing sets
of mutually acceptable rotamers for different side chains and combining the results
to identify the global minimum energy conformation. While SCRWL’s main usage
is in homology modeling, ab initio protein structure prediction, and protein design
applications, it can be used for side-chain modeling in the presence of ligands too.
While such an approach may help develop induced ﬁt, the ligand largely remains rigid
in the process.

In mean-ﬁeld optimization methods[38, 26], the part of the protein that requires
modeling, like the side chains, is oversampled by replacing single copies by multiple
different copies. The copies of each side chain do not interact with each other but “see”
the other side chains as an average. The multiple copies essentially correspond to a
distribution function of one side chain, represented by a number of discrete structures
with assigned, often equal, weights. Mean—ﬁeld optimization has been used both for
side-chain modeling in homology modeling [38] as well as modeling ﬂexible side chains

during docking in SLIDE, a ligand screening and docking software developed in our

laboratory [42, 41]. Jackson et al[19] also attempt side-chain optimization in protein-
protein interfaces through iterative cycles of mean-ﬁeld optimization and rigid-body
energy minimization. Rotamers from the rotamer library of Thffrey et al [46] are used
for discrete sampling of side-chain conformational space while the main-chain atoms
have ﬁxed coordinates. The self-consistent mean ﬁeld approach is used to determine
the most probable set of rotamers from an ensemble of rotamers. In this closed
system, the potential mean force on each rotamer is based on the internal energy of
the rotamer itself, rotamer-backbone interactions, interactions between the rotamer
and rotamers of other residues, and the rotamer’s interactions with the surrounding
solvent. Each rotamer’s interactions are calculated prior to the Optimization, thus
saving time during the mean-ﬁeld optimization step.

Raplacing a system of side chains by a multiple copies per side chain forming
a mean-ﬁeld system has two major advantages. First, while mean-ﬁeld optimization
complexity is lower than the exponential complexity of searching the entire conforma-
tional space, the global energy minimum of the new mean-ﬁeld system is the same[38].
Secondly, the barriers separating the minima in the mean-ﬁeld calculation are lower
than in the original system, making annealing easier. However, the result of opti-
mization does depend on initial conditions, like the probability distributions for the

multiple copies of the side chains.

1.3 Overview of Current Side-Chain Modeling Ap-
proaches in Docking Tools

Accounting for side-chain reorientation during docking is similar to predicting
side-chain conformations in homology modeling. The search for candidate solutions in
a docking problem is addressed by two essentially different approaches: (1) a gradual

guided search through solution space, or (2) a full solution space search. The ﬁrst

either scans only part of the solution space in a partially random and partially criteria:
guided manner. This approach consists mainly of Monte Carlo (MC), simulated
annealing, molecular dynamics (MD), and evolutionary algorithms such as genetic
algorithms (GA). In contrast, the second scans the entire solution space in a predeﬁned
systematic manner such as using rotamer-library or geometric-hashing based searches
of conformations.

The classical algorithm implemented for computational docking is that Of DOCK,
the ﬁrst ligand docking tool [7, 5, 9, 44]. DOCK operates by generating a set of spheres
to describe the volume, or negative image, of the binding site and uses the centers
Of these spheres as sites for matching to ligand atoms. Sets of receptor spheres are
matched to sets of ligand atoms to generate a ligand orientation, which can then
be scored according to their complementarity with the protein. The early internal
DOCK scoring function, GRID [32], is a grid-based scoring function in the method of
Goodford and colleagues [13]. Later implementations have used more robust scoring
functions which are also grid-based. It is possible to use the ligand docking method
of DOCK with an externally supplied scoring function. The initial implementation
of DOCK used only steric ﬁt and electrostatics as a determinant for ligand docking,
but later versions implemented chemical type matching to better model chemical
complementarity between ligand and receptor groups, including hydrogen bonding
interactions. It should be noted that DOCK uses only rigid-body translations and
rotations, including no internal molecular ﬂexibility within the docking algorithm.
Recently, Shoichet and colleagues have used conformational ensembles as input to
DOCKBm.

Another popular docking algorithm is AutoDock[33], which employs a Monte-
Carlo simulated annealing method to sample binding orientations and ligand con-
formations, by randomized rotation of torsional angles in the ligand. AutoDock is

inexpensive, easy to use, achieves reasonable dockings, and has been applied success-

fully to predict ligand binding modes in several cases[33].

Another stochastic approach based on the use of a genetic algorithms (GA) was
developed by Jones and colleagues[21]. Their approach uses a simple GA operating
on rotational angles in the protein, rotational angles in the ligand, and on hydro-
gen bonds between protein and ligand. The ﬁtness or scoring function encompasses
terms for the hydrogen bond energy between protein and ligand, for the van der
Waals energy between protein and ligand, and for the internal van der Waals energy
Of contacts within the ligand. Improvements to this algorithm resulted in the de-
velopment of the GOLD algorithm[22], with changes in the representation of angles
and hydrogen bonds and inclusion of a more robust scoring function. The GOLD
algorithm achieved “acceptable” dockings for 71 of 100 test cases. Similar to GA
approaches is the evolutionary programming approach AGDOCK [11], developed at
Agouron Pharmaceuticals, which is able to correctly dock (within 1.5 A of correct po-
sition), methotrexate into dihydrofolate reductase (DHFR) and a proprietary ligand,
AG-l343, into HIV protease.

To address side-chain and limited main-chain ﬂexibility of the receptor structure
in docking, FlexE [3], was created. FlexE docks ligands into an ensemble of structures
of the receptor instead of a single receptor binding site. The binding site ensembles can
be from different crystallographic structures, as in [3], from a homology model with
uncertain side-chain positions, from a series molecular dynamics time steps, or from
another source. The key component of this algorithm is that multiple conformations
of the protein can be used as a docking target simultaneously. In this algorithm,
the receptor structures are merged, with regions of similar conformations reduced
to a single structure and regions with dissimilar conformation constituting alternate
positions. While this algorithm may handle some backbone movement in addition
to side-chain rotations, the authors claim it is not able to work with large domain

movements and limit their test set to protein ensembles with similar backbone traces.

1.4 Side-Chain Modeling in SLIDE

The docking and screening software SLIDE (Screening for Ligands by Induced-ﬁt
Docking, Efﬁciently) was developed in our laboratory for screening and docking small
molecules into speciﬁed binding sites of target proteins [43, 41, 42]. Figure 1.2 pictori-
ally describes the algorithm used in SLIDE for screening and docking small molecules.
Surfaces of both the protein and the molecule being screened are represented by sets
of strategically located interaction points, each point representing the binding site’s
or the ligand’s polar atom or hydrophobic characteristics. The protein’s interaction
points are called template points, while the potential ligand’s interaction points are
called interaction centers. Each of these points is assigned one of the descriptor types
: hydrogen-bond donor if the protein or ligand atom can donate a proton in an hy-
drogen bond, acceptor if the atom can accept a hydrogen bond, donor/acceptor if
the atom can donate and/or accept a hydrogen bond, and hydrophobic if the atom
or point represents a hydrophobic site in the ligand or protein. Unlike the potential
ligand’s interaction-centers, template points represent the binding-site’s “negative”
image, or ideal ligand atom types to bind at the position. Screening and docking is
based on matching triplets of ligand’s interaction centers onto triangles Of template
points based on geometry as well as descriptor types. Since there can be 0(n!) tem-
plate triangles from n template points, SLIDE uses a multi-level geometric hashing
for matching interaction-point triangles to pre-computed template triangles stored in
the hash table, indexed using the triangle’s descriptor types and geometry. Geomet-
ric hashing was originally applied in Object matching and identiﬁcation in computer
vision [49]. Combined with an adequate molecular surface representation, geometric
hashing yields a state-of-the-art toolkit for docking[10, 35]. This indexing empowers
SLIDE to exhaustively explore orientational space for the ligand with respect to the
protein speedily and efﬁciently. The part of the ligand within the triangle of inter-

action centers is treated as rigid, and called an anchor fragment. It serves as the

10

anchor for docking the ligand into the binding site. Any remaining fragments of the
ligand are considered ﬂexible. Single bonds in the ﬂexible parts of both the protein
and ligand candidates are rotated as needed to remove inter-atomic collisions and
to generate a shape-complementary interface, before the complex is scored by the
number of intermolecular hydrogen bonds and the hydrophobic complementarity of
the contact surfaces.

SLIDE was the ﬁrst method to balance protein and ligand ﬂexibility in dock-
ing while developing induced ﬁt between the protein and ligand surfaces during the
docking step. It has identiﬁed and correctly docked diverse, known ligands into the
ligand-free conformation of the binding site for a variety of proteins, e.g., subtilisin,
cyclodextrin glycosyltransferase, uracil DNA glycosylase, rhizopuspepsin, HIV pro-

tease, estrogen receptor, and Asn tRNA synthetase [43, 41, 42].

1.5 Motivation and Focus in this Thesis Work

Various groups have studied protein side-chain ﬂexibility modeling in docking.
Docking techniques using MD or MC simulations can model side-chain ﬂexibility
with good accuracy, however they are slow. Besides, MD is poor at crossing energetic
barriers at reasonable temperatures, hence limited in its sampling conformational
space. Genetic algorithms encoding ﬂexibility tend to be too slow for use in screening
many molecules by docking. Exhaustive search space sampling methods using dis-
crete rotamer libraries are fast and efﬁcient if the number of side chains are limited.
Methods like SCRWL using rotamer libraries have good accuracies for predicting 20.2
x angles, however such predictions have been conﬁned to homology modeling or ab
initio protein design rather than docking. Because rotamer libraries specify rotamers
as average conformations representing clusters of similar side-chain conformations,

a realistic use of libraries requires expanding the rotamers by sampling around the

11

 

For All Possible Anchor Fragments Deﬁned by All Trlplets of
lntersctlon Centers In Each of the Screened Molecules
Identliy Chemically and
Geornelrlcally Feasible
Superposition of Ligand Triangle

onto Template TriangIeA
————> .

    

Dock Rigid Anchor
Model Induced Md Ligand Side Fragment based on
Complementarity Chains in Triengle'e
with

 

Figure 1.2: This image is presented in color. Screening and docking algorithm im-
plemented in SLIDE [39]. SLIDEs docking of potential ligands into the binding site
is based on mapping triplets of ligand interaction centers (hydrogen-bond donors,
acceptors donor/ acceptors, or hydrophobic atom centers) onto triangles of template
points located above the protein surface. Feasible template triangles for each possible
triplet in a screened molecule are directly accessed via a multi-level hash table, and
the corresponding mapping is used to dock the rigid anchor fragment of the potential
ligand. Single bonds in the ﬂexible parts of both molecules are rotated to generate a
shape-complementary interface, before the complex is scored by the number of inter-
molecular hydrogen bonds and hydrophobic complementarity of the contact surfaces.
In all steps, the ligand triplets or dockings that do not meet a particular threshold
are discarded.

 

12

dihedral angles within speciﬁed deviations[24]. DEE-based side-chain conformational
search techniques are efﬁcient but can miss global energy minima. The fuzzy-end
elimination theorem[28, 25] corrects this problem but the search space becomes huge.

Some groups have studied speciﬁc side-chain ﬂexibility aspects, such as the num-
ber of interfacial side chains undergoing large conformational changes or whether the
side chains remain rotameric after ligand binding. For instance, [14, 15] claim that
interfacial side chains are not always rotameric as ligand binding could induce non-
rotamericity in the binding-site residues. Najmanovich and colleagues[34] show that
for 85% of the binding pockets they studied, 3 or fewer side chains underwent a dihe-
dral rotation of more than 30°. Knowledge about such proclivity might help restrict
the conformational sampling in an exhaustive search without loosing the ability to
model substantial side-chain conformational changes; and hence is very valuable for
to improve docking algorithms.

This research was motivated to develop techniques that accurately model side
chain positioning during docking to optimize the interactions in the protein-ligand
interfaces. Optimal interactions, which may be deﬁned as interactions observed in
crystal complexes, play signiﬁcant role in molecular recognition. Ability to predict
Optimal interactions in SLIDE would require developing an algorithm that not only
samples larger side-chain conformational space than sampled by the current induced-
ﬁt collision resolution paradigm, but can also correctly identify those side chains for

which bigger motions need to be sampled.

1.6 Organization of the Thesis

This thesis addresses the questions of which side chains are observed to undergo
large rotations upon ligand binding (and why), how to sample such motions and the

extent to which rotameric sampling and protein-ligand scoring facilitate or limit our

13

ability to model these motions accurately. Chapter 2 presents ana analysis of intra-
protein noncovalent bond preservation upon ligand binding, conclusions from which
help identify the side chains which undergo large motions during docking. Chapter 3
analyzes the rotamer-based sampling for these side chains. Of the many conformations
available for a collection of side chains, a docking tool should be able to identify
conformations that Optimize interactions and rank them highly. Chapter 4 presents
developing a new scoring function to predict the afﬁnity of a protein-ligand complex
with state-of-the-art accuracy and applies it to select from high-probability side-chain
rotamers. Chapter 5 presents the side-chain interaction optimization algorithm in
SLIDE and subsequent results. Chapter 6 concludes the thesis with a brief summary

and a discussion on future directions of this work.

14

Chapter 2

Hydrogen Bond Preservation in

Protein Binding Sites

2.1 Motivation

Predicting the structure of the complex of a small ligand with a protein is still
a complex task, two major problems being the deﬁnition of an appropriate scoring
function to discriminate good binding modes of the ligand among all possible binding
modes and the huge size of the binding-mode search space itself. Assuming that one
knows the correct scoring function, a successful search procedure should consider the

three factors that give rise to the size of search space:
0 sampling the relative orientations of ligand and receptor,
0 sampling the low-energy ligand conformations, and
e sampling the low-energy protein conformations

SLIDE can sample different relative positions of ligand. Moreover it also ad-
dresses small-scale receptor and ligand ﬂexibility through resolving steric overlaps

and developing a shape complementarity between the two. However, while enhancing

I5

shape complementarity, it does not yet take into account how chemical complemen-
tarity can also be enhanced during the docking. Enhancing chemical complementarity
in the binding site through exhaustive conformational search for both protein and lig-
and leads to a very large conformational search space. If there are ‘n’ conﬁgurations
for each of ‘m’ side chains in the protein-ligand interface, then there are 12'" possible
conﬁgurations for the protein alone to be assessed by scoring.

Such expensive spatial sampling is infeasible for high throughput screening algo-
rithms. To perform such afﬁnity enhancing sampling, we ﬁrst focus on what can be
learned from known changes in a receptor’s non-covalent bond network upon ligand
binding in order to limit the choices to those that are reasonable. This can help
us identify circumstances in which side chains do not undergo much rearrangement.
Hence, the knowledge of the circumstances under which side-chain motions occur,
and their extent of motion, can provide guidance for developing improved docking

prediction algorithms.

2.2 Methods

We selected 30 non-homologous protein crystal structures, both ligand free (apo-
protein) and ligand bound (holo—protein) from the PDB macromolecular structural

database.

16

Table 2.1: List Of protein-structure pairs studied for hydrogen bond preservation
upon ligand binding.

 

 

PDB CodeI Protein / Ligand Complex R.ee(A)2 RMSD3

1ahc/lahb a-momorcharin / formycin 2.0/2.2 0.23
5’ -monophosphate

lapm/latp CAMP-dependent protein kinase / 2.0/2.2 0.35
MnATP

1ajz/ 1aj2 dihydropteroate synthsse / 2.0/2.0 0.29
diphospbate

1ca2/1bcd carbonic anhydrase ll / 2.0/ 1.9 0.20
triﬂuoromethane sulphonamide

1cgf/1hfc ﬁbroblast collagenase / HAP4 2.1 / 1.56 0.39

lgmq/ lgmr ribonuclease/2’ -guanosine- 1.8 / 1.77 0.25
monophosphate

lkem/lkel catalytic antibody/hapten 2.2/1.90 0.89

2hvm/lllo hevamine a/allosamidin 1.8/ 1.85 0.12

1swa/1swd apo-core-streptavidin / biotin 2.0/ 1.90 0.97

2ptn/1tps trypsin / a90720a 1.55/1.9 0.3

1xib/1xid d-xylose isomerase / l-ascorbic acid 1.6/ 1.7 0.19

lydc/lydb carbonic anhydrase ii / acetamlamide 1.95/ 1.9 0.13

1tli/3tmn thermolysin / val-trp 2.05/ 1.7 0.22

6taa/7taa family 13 alpha amylase / acarbose 2.1 / 1.98 0.30

lgta/lgtb glutathione S-transferase / 2.4/2.6 0.20
praziquantel

lhel/lmlc hen egg—white lysozyme / monoclonal 1.7/2.1 0.49
antibody Fab D44.1

llib/llic adipocyte lipid-binding protein / 1.7/1.6 0.32
hexadecanesulfonic acid

lnsb/lnsc neuraminidase / N-acetyl neuraminic 2.2/ 1.7 0.12
acid

lpoa/lpob phospholipase A2 / transition-state 1.5/2.0 0.72
analogue

lsyc/ lsyd staphylococcal nuclease / 1.8/ 1.7 0.41
2’ -deoxy-3’ -5’ -diphosphothymidine

ludg/ludh uracil-DNA glycosylase / uracil 1.75/1.75 0.19

2sct/1aec actinidin / E645 1.7/1.86 0.11

2apr/3apr acid proteinase / reduced peptide 1.8/1.8 0.13
inhibitor

2cla/3cla chloramphenicol acetyltransferase / 2.35/ 1.75 0.41
chloramphenicol

2ctv/5cna concanavalin A / 1.95/20 0.42
a-methyl-D—mannopyranoside

2sgs/5sga proteinase A / tetrapeptide 1.5/ 1.8 0.08
AcePro-AlapPro-Tyr

2wrp/1tro 'Ii'p repressor / synthetic operator 1.65/ 1.9 2.18

3cox/lcoy cholesterol oxidase / 1.8/1.8 0.24
3-B-hydrmry-5—androsten-17—one

3dni/2dnj deoxyribonuclease I / DNA 2.0/2.0 0.37

3enl/5enl enolase / 2-phospho-D-glyceric acid 2.25/22 0.21

3grs/1gra glutathione reductase / glutathione 1.54/20 0.12
disulﬁde

5cpa/6cpa carboxypeptidase A / phosphonate 1.54 /2.0 0.36

 

‘ Ligand-free/ligand-bound

2Resolution of the crystallographic structures in Angstroms.

3Main-chain RMS positional deviation from superposition of the ligand-bound and free structures.
‘HAP is (N-(2-hydrouramatemethylene-4—methyl-pentoyl)phenylalanyl) methylarnine.

5E64 is [N-(l-3-trans-carboxyoxirane-2-carbonyl)-l-leucyl]-amido(4-guanido)butane.

17

These structures are a set of diverse, non-homologous structures with resolution
better than 2.2 A . Positional root-mean-square deviation after superimposition of
ligand-free and ligand-bound structures was less than 1.0 A (mostly smaller than 0.5

A ) . The structures had no missing residues in the binding site.

Hydrogen bonds were identiﬁed between donor and acceptor groups according to

the following geometric criteria [45, 31], shown graphically in Figure 2.1:

1. Donor-Acceptor distance, d S 3.6A
2. Hydrogen-Acceptor distance, r 5 2.6A
3. Donor-Hydrogen-Acceptor angle, 90° 3 0 5 180°

 

 

Figure 2.1: This image is presented in color. Geometric parameters used to identify
hydrogen bonds and measure their energy. The hydrogen bond is depicted as a
dashed line between the hydrogen and the acceptor oxygen. r is the hydrogen-acceptor
distance, d is the donor-acceptor distance, 0 is the donor-hydrogen-acceptor angle and
d: is the hydrogen-acceptor-base atom angle, where the carbon is the base atom in
this example.

 

The energy of each hydrogen bond was measured using a modiﬁed Mayo poten-
tial [16, 4]. The function evaluates the favorability of the observed hydrogen-bond

18

length relative to the optimal, equilibrium length for that pair Of atoms based on
their electron orbital hybridization, as well as the favorability of the angles between
the donor and acceptor groups. The modiﬁcation avoids non-physical H-bonds with
angles near 90 deg (e.g., between C=O(i) and NH(i+3), rather than the important
C=O(i)4—>NH(i+4) interactions in the middle of a-helices). The energies of hydrogen

bonds, Egg were calculated using equations 2.1.

Ens = Vo {5 (%)12 - 6 (%)lo} F09. «A w) (2-1)

with
V0 = 8 kcal/mol R0 = 2.80 A
sp3 donor - sp3 acceptor F = cos2l9e'(*‘“)°cos2 (qi — 109.5)
sp3 donor - sp2 acceptor F = cos”0e"*‘”l°cos"’¢
sp2 donor - sp3 acceptor F = c0340(e‘2(*‘9)6)

sp2 donor - sp2 acceptor F = cos20e’(""’lﬁcos2 (max [(1), <p])

R is the distance between the donor and acceptor atoms. The 0 angle is the
donor—hydrogen—acceptor angle, and {/2 is the hydrogen—acceptor-base atom angle,
where the base atom is the atom bonded to the acceptor (e.g., carbonyl carbon for
a carbonyl oxygen acceptor atom). The angle go is an out-of-plane angle that arises
when both the donor and acceptor have sp2 hybridization.

First, both ligand free and ligand bound structures were preprocessed for FIRST
[20] analysis to identify the hydrogen bond and hydrophobic interaction interactions
within the protein. To correctly identify only intra-protein interactions, ligand was re-
moved from the ligand-bound structure after adding hydrogen atoms by WHATIF [47]
program. Adding hydrogen atoms before removing the ligand is needed to correctly

assign hydrogen positions in the presence of ligand. Any other non-ligand non-water

19

hetero atoms, like 304, P04 or co-factors which were far from the binding site were
also removed. Then FIRST analysis was run on both ligand-free and ligand-bound
structures with hydrogen bond minimum energy cutoff of -0.1 kcal/mol as recom-
mended in [20]. This energy cutoff helps in excluding large number of very hydrogen
bonds. A hydrophobic interaction is determined between any two hydrophobic atoms
whose inter-atomic distance is within the sum of their van der Waals radii plus 0.5
A. FIRST analysis resulted in deﬁning the network of non-covalent interactions, in-
cluding hydrogen bonds, hydrophobic interactions and salt bridges.

The binding site was identiﬁed from the ligand bound structure. Any residue
having any of its atoms within 4.0 A of any ligand atom or any interfacial water atom
is considered as part of the binding site. Interfacial waters were determined to be any
water atoms within hydrogen-bonding distance (3.5 A) of both ligand and protein
structure, or any other water within hydrogen-bonding distance of any water atom
that meets the previous criterion. Corresponding to residues Of a binding site from
ligand-bound structure, residues from the ligand-free structure were identiﬁed as part
of binding site before ligand binding. These two sets of binding site residues, without
ligand or any other non-water hetero-atoms, were used for experiments.

Using the binding site residues and list Of interactions generated by FIRST , all
those interactions were identiﬁed which had any atom in the binding site, for both
ligand-free and ligand-bound structures. Using this list, intra—protein hydrophobic
interactions, and direct, one-water-mediated hydrogen bonds and two-water hydrogen
bonds are determined. These bonds are then categorized according to either atom

categories according to the rules speciﬁed in table Table 2.2

20

Table 2.2: Rules for categorization of atom types. These categories are used to classify
hydrogen bonds in the bond preservation analysis using FIRST.

 

 

Residue Type Atom Name Atom Category Category Description
ALA C" ALPH Aliphatic carbon
ARG NE N_Pos Positive nitrogen
ARG CZ ARMT Aromatic carbon
ARG C“ ALPH Aliphatic carbon
ASN OD Keto Neutral keto group
ASN ND N_Ntrl Neutral nitrogen
ASN C* ALPH Aliphatic carbon
ASP OD O_Neg Negative oxygen
ASP C* ALPH Aliphatic carbon
CY S C* ALPH Aliphatic carbon
GLN OE Keto Neutral keto group
GLN NE N_Ntrl Neutral nitrogen
GLN C“ ALPH Aliphatic carbon
GLU OE O_Neg Negative oxygen
GLU C“ ALPH Aliphatic carbon
HIS ND His_ntrl_Pos Histidine neutral or positive nitrogen
HIS NE His_ntrl_Pos Histidine neutral or positive nitrogen
HIS C* ARMT Aromatic carbon
ILE C* ALPH Aliphatic carbon
LEU C* ALPH Aliphatic carbon
LYS NZ N.Pos Positive nitrogen
LYS NZ N_Pos Positive nitrogen
LYS C“ ALPH Aliphatic carbon
MET C“ ALPH Aliphatic carbon
PHE CB ALPH Aliphatic carbon
PHE C* ARMT Aromatic carbon
P04 0 Ion Ion
PRO 0* ALPH Aliphatic carbon
SER OG O_N tr1_I-lxyl Neutral hydroxyl oxygen
S04 0 Ion Ion
THR OG O_Ntrl_nyl Neutral hydroxyl oxygen
THR C“ ALPH Aliphatic carbon
TRP NE N_Ntrl Neutral nitrogen
TRP CB ALPH Aliphatic carbon
TRP C“ ARMT Aromatic carbon
TYR OH O_Ntrl_nyl Neutral hydroxyl oxygen
TYR CB ALPH Aliphatic carbon
TYR C“ ARMT Aromatic carbon
VAL C“ ALPH Aliphatic carbon

* NHl N_Pos Positive nitrogen
* NH2 N_Pos Positive nitrogen
* OT1 O_Neg Negative oxygen
* 0T2 O_Neg Negative oxygen
* OXT O_Neg Negative oxygen
* O MCHN-O Main chain oxygen
* N MCHN_N Main chain oxygen

 

21

2.3 Results and Discussion

Figure 2.2 displays preservation percentages of direct hydrogen bonds between
different atom-category pairs or in short, “Bond Category”. For each bond category,
the label at far right indicates the bond preservation occurrences both in percentage
terms and in terms of actual cases found. The red and green bars for each category
also show the average displacement that was experienced by the bond centroid before
and after ligation, bond centroid being the mid-point between the bonding pair of
atoms.

Figure 2.3 shows the bond preservation percentages (in absolute counts) by
residue type including both side-chain and main-chain bonds.

Two key trends stand out. 70% or more of the direct hydrogen bonds between
two protein atoms are preserved for most atoms. Taking bond centroid displacement
as an approximate measure of how much atoms move in preserved hydrogen bonds,
it’s clear that almost always their displacement is less than 0.5 A. Both these ob-
servations may prove to be quite useful for docking algorithms. They help restrict
the number of sidechains which might require rearrangements upon ligand binding.
The displacement Observations also measure the extent to which bonded interfacial
sidechains can move while still preserving their hydrogen bond interactions. In fact,
those groups that maintain intra-protein hydrogen bonds upon ligand binding also
do not move signiﬁcantly upon ligand binding.

Pair of interfacial receptor atoms may also participate in water mediated hydro-
gen bonds apart from direct hydrogen bonds . There is a higher chance of displacing
the water upon ligand binding, hence a smaller percentage of water-mediated hydro-
gen bonds are preserved upon ligation, as presented in Figures 2.4 and 2.5.

Hydrophobic interactions are key to both protein folding as well as protein-
ligand docking. Exposing hydrophobic patches to solvent has a high energy cost via

unfavorable entropy. Figure 2.6 displays the percentages of hydrophobic interactions

22

 

Direct H-bonds : Bond preservation percentage
and bond centroid displacement

 

   

His_ntrl_Pos:MCHN_N
His_ntrl_Pos:O_Ntr1_nyl
N_Pos:0_Ntrl_nyl
O_Ntrl_nyl:O_Ntr1_nyl 501““)
N_Ntr1:O_Ntrl_nyt
Hie_nlrl_Pos:MCHN_O
MCHN_0:N_Pos 68%(13/19)
N_Pos:O_Neg

MCHN_N:O_Neg
s_ntrl_Pos:O_Neg
O

- Broken
50“"2) - Preserved

luau;

luau,

 

Hl
O_Neg:O_Ntrl_nyl
N_Ntr1:0_Neg
Keto:N_Pos
MCHN_O:O_Ntr1_nyl
MCHN_N:O_Ntr1_nyI
MCHN__O:N_Ntrl
MCHN_N:MCHN_O
Keto:N_Ntrl
Hls_ntrl_Pos:N_Pos
Hls_nlr1_Pos:Keto
Keto:MCHN_N , , , . I . . . .

l
0.0 0.5 1 .0 1.5 2.0
Displacement (A)

‘III‘II

 

 

 

I u v v

Figure 2.2: This image is presented in color. Percentages and displacements, by atom
category, of preserved direct intra-protein hydrogen bonds in protein-ligand interfaces.

 

23

 

Direct Hydrgen bonds: Preservation trends
by residue type

 

  
  
   
 
  

LYS 70.97%(22/31)
TYR 71 05%(27/38)

HIS 75.00%(24/32)
ARG 75.20%(94/125)

75.86%(88/116)

   

77.92%(60/77)

 
 
  

84.21%(32/38)
85.00%(17/20)

 
     
   

ASN 86.96%(40/46)

87.50%(7/8)

  

87 50%(35/40)
88.24%(30/34)

Residue Type

   
  

90.00%(9/10)
95.24%(20/21)

PHE
PRO
Y
3E: - Broken
- Preserved

 

100.00%(16/16)

'lIIVIIIIvlvaI'IIIIIIIII'IVII'IIIV

0 20 40 60 80 100 120 140 160
Count

Figure 2.3: This image is presented in color. Percentages, by residue type, of pre-
served direct intra-protein hydrogen bonds in protein-ligand interfaces.

 

24

 

H-bonds mediated by one water-molecule:
Preservation trends by residue type

- Broken
- Preserved

 

21 .95%(18/82)

  

45.45%(15/33)

 
  

53.85%(21/39)
22.22%(2/9)
26.09%(6/23)
53.33%(8/15)
30.77%(8/26)

21 .62%(8/37)

 
   
 
   

Residue

GLY

38.46%(5/13)
62.50%(5/8)
PRO 75.00%(3/4)
86.67%(2/3)
38.10%(8/21)

0 20 40 60 80 100 120 140 160
Count of one-water mediated h-bonds

 
   
 

 

Figure 2.4: This image is presented in color. Percentages, by residue type, of pre-
served interfacial intra—protein hydrogen bonds mediated by one water molecule.

 

25

 

H-bonds mediated by two water-molecules:
Preservation trends by residue type

0.00%(0/1) - Broken
13.33%(2/15) - Preserved

25.00%(5/20)

 
     
 
 
 
  
  
   
   

 

 

”swam/27)

19.23%(5/26)

GLU 11.11%(3/27)
a SER 21.05%(4/19)
'—

g ALA 55.56%(5/9)
E ASN 20.83%(5/24)
a:

12.50%(1/8)
GLY 20.00%(3/15)
7.41%(2/27)
11.11%(1/9)

35.71%(5/14)

 

35.71%(5/14)
,...,.,

0 20 40 60 80 100 120 140 160
Count of two-water mediated h-bonds

Figure 2.5: This image is presented in color. Percentages, by residue type, of pre-
served interfacial intra—protein hydrogen bonds mediated by two water molecules.

 

26

that stay preserved between the same pair of residues. Hydrophobic interactions
are less speciﬁc, unlike hydrogen bonds. Hence, while hydrogen bondsare counted
preserved if they persist between same pair of atoms upon ligand binding, hydrophobic
interactions are evaluated preserved if the persist between the same residues. The

criteria of evaluating hydrophobic interactions are :-

1. inter-atom distance 3 sum of van der Waals radii of the atoms + 0.5
2. Both the participating atoms are connected to hydrogen, carbon or sulphur

atoms only.

This focuses on more hydrophobic interactions rather than simple C-C van der

Waals interactions.

2.4 Conclusions

The presented results provide powerful ways to guide spatial sampling during the
docking process. Hydrogen-bond conservation probabilities can be utilized to limit
the choices of rotatable bonds that can be rotated to resolve steric-clashes, determin-
istically or probabilistically, that are part of a sidechain participating in a hydrogen
bond . Such limiting not only brings down the combinatorial effort involved in resolv-
ing clashes, but can also boost the chemical complementarity of the ﬁnal docking by
keeping the number of unsatisﬁed atoms buried in the interface low, thus improving
the overall quality of dockings. Strong preservation trends of hydrophobic interactions
can similarly guide docking algorithms to make wiser choices for developing induced

ﬁt.

27

 

Hydrophobic Interactions : Preservation

 

 
  

 

 

 

trends by atom category
Broken
73.61%(53/72) 1 Preserved
ARMTzARMT
L; J.
8
E 66.67%(100/150)
g W... i
it
i
.c ..
D.
2
E1 71.33%(51/71)
ALPHALPH
.....,....,.........,....,..............,....
40 so so 100 120 140 160 180

Count of hydrophobic interactions

Figure 2.6: This image is presented in color. Percentages, by atom category, of
preserved interfacial intra—protein hydrophobic interactions.

 

Chapter 3

Sampling Side-Chain Positions in

Protein-Ligand Interfaces in SLIDE

The extent of intra-protein hydrogen bond preservation and the average displace-
ment of the centroid of preserved and broken hydrogen bonds in Figure 2.2 indicate
the tendency of most interfacial side chains to minimally adjust upon ligand bind-
ing. This chapter presents a detailed analysis of side-chain displacements as observed
from ligand-free to ligand-bound conformations and how such displacements might be
modeled in the docking tool SLIDE. Section 3.1 characterizes in detail the side-chain
displacements observed in the 30 structures speciﬁed in Table 2.1. Displacement
characterization is presented as the extent of rotations experienced by the different
x angles of side chains, as well as the extent of positional deviation (RMSD) the in-
terfacial residues undergo upon ligation. Thereafter, the induced-ﬁt model currently

encoded in SLIDE is explained, as well as how it can be augmented.

29

3.1 Side-Chain Displacements upon Ligand Bind-
ing

Ligand binding can change the protein conformation by moving not only side
chains, but also the protein backbone. A simultaneous conformational search for both
backbone and side chains is infeasible in screening software due to the exponential
complexity of the conformations to sample and test. However, sampling side-chain
conformations only is relatively easier, though still a computationally daunting task;
protein side-chain motion is still ignored in most docking software or modeled in a
biased way based on known crystal complexes with different ligands.

To study the type of side-chain displacements through dihedral side—chain rota
tions, all residues with at least one rotatable bondbetween heavy atoms were con-
sidered. Hence alanine, glycine and proline were not considered. Rotation of the
NE—CZ bond of arginine was not considered because the CD, NE, CZ, NHl and N H2
atoms form a planar, partial double—bonded structure, severely restricting rotation.
All the ligand-bound side chains analyzed were within 4.0 A of the ligand in the 30
ligand-bound structures, deﬁning the set of interfacial side chains; corresponding side
chains from the ligand-free structures were also considered.

Because angular rotations can be compensatory and even side chains with signiﬁ-
cant x-angular differences can be almost superimposible in Cartesian coordinates[29],
the amplitude of side chain motions was also studied. Figure 3.1 shows the root-mean-
square positional deviation (RMSD) experienced by interfacial side chains upon ligand
binding across 30 structures. Up to 70% of the side-chain deviations were less than
0.5 A, while another 15% moved within 0.5 A to 1.0 A, while the remaining 15% ex-
perienced somewhat bigger motion between 1.0 to 5.5 A. This observation is in good
agreement with the bond-preservation observations, where 70% of the intra-protein

hydrogen bonds were preserved and, on average, the hydrogen bond preserving side

30

chains moved less than 0.5 A. Observations of the prominent tendency for intra-
protein hydrogen bond preservation is further supported by [50], where analysis of 63
ligand-bound and ligand-free structure pairs shows that 85% of side-chain motions
in the interface are small rotations that typically do not lead to another rotameric

conformation.

 

 

67.9%

 

 

200

Frequency of Occurence
8 8

8
8
§

\\\\\‘§-
\Vi
\‘i

 

 

 

 

 

 

 

 

 

 

 

0 // / 3% 97.7% 98.4% 95.4% 3232‘, 99.7% 99.71% 100.091
1' I e I I l e l I l y I

0 1 2 3 4 5 6
Residue RMSD (A) between ligand-free and ligand-bound conformations

Figure 3.1: Frequency distribution of RMSD Of ligand-bound interfacial side-chain
positional shifts upon ligand binding in 30 structures. Percentages atop each bar are
cumulative.

 

While RMSD measures the positional deviation of the entire side chain, changes

31

in each of the X angle of all the interfacial side chains were also studied. These
changes in x angles are presented in Figures 3.2 and 3.3. As would be intuitive,
higher x angles, which were further from the main chain, experience not only larger
changes when compared to lower x angles, but the changes have a broader angular
distributions, too. Possible explanation is that terminal side-chain bonds have more
space to sample small as well as large rotations with a lower probability of causing

steric overlaps, as compared to rotatable bonds closer to the backbone.

3.2 Current Rotation Paradigm in SLIDE

Here we present SLIDE’s current paradigm for modeling protein side-chain and
ligand ﬂexibility, and assess the extent to which it models the kinds of side-chain
motions observed in Figures 3.2 and 3.3.

As described earlier in Section 1.4, after matching a triangle from any three
ligand-interaction points to a template-point triangle using geometric hashing and
the chemical-complementarity criteria, SLIDE transforms the ligand triangle onto the
template triangle using least-square ﬁtting. Any ligand atoms within the perimeter
of the ligand triangle are also transformed, thus docking what is now considered to
be a rigid anchor fragment of the ligand, into the binding site. Transforming anchor-
fragment atoms not representing the interaction centers can lead to inter-molecular
bumps with the protein. SLIDE discards any ligand whose anchor fragment cannot
escape clashing with protein backbone, after attempting small translations of the
anchor fragment away from the clashing protein atoms.

If the anchor fragment has no steric clashes with the protein, the rest Of the
ligand is then transformed into the reference frame of the ligand anchor fragment to
complete the ligand structure in the binding site. Reconstructing the ligand in the

binding site can lead to new steric clashes between the protein and the ligand. Since

32

 

200

 

X xi
0 O
0 It,
, O
150- 2 O . C
, x l o X
100- x ' x X X

0

8
l

WOX X
>rer<x
XCDCII
onmo
MO
cocoons» -
‘MOOX»
OWOX
m
osmoer~
> “XX
XXX)“
m:
X0 0 0mm
>181.

Dihedral Angle Change (degrees)
0
I
X (X ram
mo 0

 

 

 

-50—l 5
o o X
.350
400- o x 33
.X.
38
-150...
O o
“ZOOIIIIIIIIIIIIIIIIIII
02amzagurarnr-ruarrra.
easesalIesassserséi
ResidueType

Figure 3.2: Change in X1 and x2 dihedral angles of all interfacial side chains within
4.0 A of the ligand, across 30 structures listed in Table 2.1. The change is calculated
as the difference of x angle of interfacial side chains between their ligand-free and
ligand-bound conformations. The standard deviation for X1 changes is 25.8 and for
x2 changes is 34.7. Note that most changes in x”) fall within the range of —20° to
20°.

 

33

 

200

 

 

 

 

 

150"" O O
' 8
A100- " 9
I x O
8’ o
2.50- 52 O X
3 o.
s 8 o
6 o- i M g.
0 .
i“ x ’8 :
§-50- g 00
5-100-
‘ %
450- 6 .
1 O L
'200 TiillTIIIlllllliliI
02am aggawr-urrraco.
rasaésx-ssessaefi

Figure 3.3: Change in x3 and x4 dihedral angles of all interfacial side chains within
4.0 A of the ligand, across 30 structures listed in Table 2.1. The change is calculated
as the difference of x angle of interfacial side chains between their ligand-free and
ligand-bound conformations. The standard deviation for x3 changes is 42.9 while
for x4 changes is 46.4. Note the somewhat broader range of change in x3 and x4
compared to changes in X1 and x2 dihedral angles in Figure 3.2.

 

34

these steric clashes involve the ﬂexible portion of the ligand, it is possible to resolve
each steric clash independently by rotating either the ligand’s ﬂexible side chain, or
a ﬂexible group in the protein with which it overlaps. In this context, a ligand side
chain is any group connected to the anchor fragment by a rotatable bond. When the
target (protein) atom clashing with the ligand belongs to a ﬂexible protein side chain,
that side chain’s rotation also presents another option to resolve the steric clash.

Geometrically, any collision between two atoms, which are modeled as va der
Waals spheres can be resolved in multiple ways, given that each atom as 3 degrees of
translational freedom. However, the colliding atoms are themselves bonded to other
atoms by single or double bonds. For each single-bond rotation that can help resolve
a steric clash, overlapping atom displaced by the bond rotation is considered mobile,
while the other overlapping atom is considered ﬁxed. When both the overlapping
atoms are independently connected to single bonds, than their overlap resolution is
evaluated with each of them being considered mobile and ﬁxed, one at a time.

Single bond rotations allow moving the rotating atoms in a circular trajectory in
planes perpendicular to the rotation axis, which is along the single bond. To resolve
a collision through a speciﬁc bond rotation, the rotatable single bond is transformed
along the Y-axis, with the atom of the bond further from the backbone forming the
new origin. The same transformation is applied to all the atoms in the side-chain
beyond, including both the atoms having the steric clash. The transformations are
calculated in a way such that, on application, they would move the ﬁxed atom into
the X-Y plane, as shown in Figure 3.4. This later helps calculations for determining
the rotatable bond’s rotation angle for collision resolution.

While there are potentially inﬁnite positions on the trajectory where mobile
atom can be placed to resolve a clash, most of these are usually infeasible due to the
high probability of causing new collisions. SLIDE’s paradigm to choose a position

to resolve collisions during docking is based on developing “induced ﬁt” between the

35

 

Circular trajectory available to
mobile atom through rotation around
a rotatable bond aligned along Y- “
axis. which points downward.
perpendicular to the plane or the Z

     
 

'a' reflects the minimum

 

angular rotation for mobile \ , ~ Sterlc
atom to resolve the steric ‘ \ ‘ overlq:
clash \ Mobile Atom I
0 Fixed
Atom X

 

Figure 3.4: Induced ﬁt development mechanism implemented in SLIDE performs di-
rected rotations to resolve atomic collisions. Directed rotations are performed around
a rotatable bond not adjacent to the mobile atom. The rotations are performed in a
plane parallel to X—Z plane, by aligning rotatable bond along the Y-axis. The hinge-
atom of the bond is new origin for the rotation. The ﬁxed atom is transformed into
the positive quadrant of the X-Y plane, while the mobile atom is rotated through the
minimum angle of rotation, ‘a’ to resolve the steric overlap.

 

36

ligand and the protein binding site[23]. To mimic induced ﬁt, SLIDE chooses the
smallest angle through which the mobile atom can be rotated to resolve the collision.

However, choosing the smallest rotation angle for a bond rotation is not sufﬁcient
since there may be multiple single bonds in a side chain in residues like lysine, arginine
which could be rotated to resolve a collision. SLIDE’s approach for deciding which
rotations to apply for resolving a set of collisions is based on mean-ﬁeld theory[26, 19],
implemented to optimize bond rotations[42]. The key feature of this approach is a
probability matrix P(i, j), which describes the probability that a particular collision
2' will be resolved by a rotation of bond j . First, all intermolecular collisions in the
complex are identiﬁed. They form one dimension of the matrix. Only complexes
with up to 20 collisions between atom pairs undergo side-chain collision resolution in
SLIDE; ligand dockings with more collisions are discarded. All rotatable bonds that
can be used to resolve at least one of the collisions build the other dimension of the
matrix. Only those rotations are considered that do not result in an intramolecular
collision based on the current conformation. Note that there is no differentiation
between ligand and protein side chains in this matrix. All rotations that can resolve
a particular collision are initialized with equal probability values. For each probability
entry P(z', j), a cost value E’ (z', j) is computed that reﬂects the cost of rotating bond
j to resolve collision i. This cost is simply the product of the number of displaced
non-hydrogen atoms and the absolute value of the rotation angle. This makes large
rotation anng or the rotation of large side chains more costly, as they are more likely
to cause steric problems elsewhere.

During the iterative cycles of the mean-ﬁeld optimization process, the probability
matrix P is updated to converge to high probabilities for those rotations that provide
the lowest-cost conformational change of both molecules to resolve a maximal number

of the observed collisions (Figure ??). In each cycle a mean cost, E (i, j), is computed

37

 

 

Find all side-chain collisions 2' and all rotatable bonds j;
While there are between 1 and 20 side-chain collisions do:
Compute probability matrix P and cost matrix E’;
For 10 cycles do:
Compute mean cost E(i, j);
update probability matrix P;
Do feasible highest probability rotations;
Find all remaining side-chain collisions i;

 

 

 

Figure 3.5: The algorithm for resolving side-chain collisions using the mean-ﬁeld
optimization technique. When there are still collisions exceeding the threshold after
10 iterations of the outer loop, this ligand orientation is discarded.

 

for each rotation, as follows:

E(i.j) = E'(z°.j) + Z depi(t,j),(h,k)i-P(h.k)~E’(h,k)
haéi, I:
The value of dep[(i, j), (h, k)] is set to -1.0 if j = k, i.e., both entries refer to the
same bond and both rotations are in the same direction. Hence, two collisions can
be resolved at once by a rotation of this bond, and the —1.0 value results in a lower
mean cost E(i, j). The value of dep[(z', 3'), (h, k)] is set to 1.0, with a resulting increase
in E(z’, j), in two cases. The ﬁrst case is when both entries refer to the same bond
(j = 1:), but the corresponding rotation directions are opposed to each other. In
the other case, bond j lies on the path from bond 1: to the anchor fragment or the
main chain, respectively. Here, the mean cost of rotating bond j is increased, since if
rotation (i, j) were applied, then bond k would be moved, and this would invalidate
the assumptions made in the current iteration regarding rotations involving this bond.

At the end of each cycle, the entries in the probability matrix are updated based

38

on the mean costs E (2', j) using the Boltzmann principle:

. . e-Eﬁajvl‘
P(2,J) = Tl: e-E(i.k)/u

where p is the average value of all computed mean costs. Convergence of the values in
the probability matrix is usually observed in fewer than ten cycles, and those rotations
with the highest probability are chosen to resolve the collisions. At this point, it is
again necessary to check for negative correlations between bonds. Although this
was already considered during the computation of the mean cost, two correlated
bonds can receive high probabilities if they are the only bonds to resolve particular
collisions, or if alternative rotations are much more expensive. During the mean-
ﬁeld optimization process, it is not possible to anticipate complex dependencies, e.g.,
which ligand rotations inﬂuence protein bonds related to other collisions. Rotations
are only accepted if they do not cause any intramolecular collisions. Since it is likely
that not all collisions can be resolved in one application of the mean-ﬁeld Optimization
technique, up to 10 iterations of this process are executed, as outlined in Figure ??.
Ligand conformations are discarded if they have more than 20 collisions at any time

during the optimization or have collisions exceeding the threshold after 10 iterations.

3.2.1 Motions Modeling in SLIDE

Mean-ﬁeld optimization helps choose bond rotations to minimize the confor-
mational cost of developing shape complementarity. Figures 3.6 and 3.7 present
a displacement comparison of side chains that were moved by SLIDE for collision
resolution with displacements of corresponding side chains from their ligand-free to
ligand-bound conformations. (Hence, these ﬁgures present a subset of interfacial side
chains which were found to have moved in Figure 3.1).

To examine how well SLIDE models known side—chain motions, the distribution

39

 

 

 

 

 

 

 

 

 

 

Ligand-free to ligand-bound
side-chain RMSD
40 " Ligand-free to SLIDE's best docking's

side—chain RMSD

Z

/ V

a A
E ZS

M .

/\/§

/§/\

Zﬁﬁ

/§%\

/\/\

/\/\

Zhé§ a .

Mt

as, \T . l3“. .

-2 0 2 4 6

RMSD (A)

(negaMmRMSD:MienSUDEmvedasldednhmmermyﬁunllgmdbwndomt)

Figure 3.6: Frequency distributions of RMSD between ligand-free and ligand-bound
side-chain conformation, as well as RMSD between ligand-free side-chain conforma-
tion and sidechain conformation in the best docking generated by SLIDE. The best
docking was determined using the RMSD of the docked ligand relative to the crystal
structure orientation. Positive values indicate the RMSD by which SLIDE’s con-
formation of side chain got closer to crystal complex conformation, while negative
values indicates that SLIDE moved the side chain further from crystal complex con-
formation. Observations are derived from SLIDE’s best dockings of 24 structures,
comparing only those interfacial side chains that were moved by SLIDE.

 

40

of root-mean-square deviations (RMSDs) between ligand-free and docked side-chain
conformations found in the best dockings by SLIDE were compared with the distribu-
tion of RMSDs between ligand-free and ligand-bound side-chain conformations from
crystal structures (Figure 3.6). This distribution data was derived from 24 of the 30
structures speciﬁed in Table 2.1. The remaining 6 structures did not permit dock-
ing without an increase in inter- atomic interpenetration parameters, so they were
excluded from this analysis. The RMSD value of a docked side-chain conformation
relative to its ligand-free conformation is displayed negative when it is greater than
the RMSD value between the ligand-free and ligand-bound conformation in crystal
structures. Hence, this ﬁgure compares side-chain displacement distributions between
modeled and observed structures, both in magnitude and direction. While SLIDE’s
distribution of RMSD magnitudes was quite similar to nature, as most of the motions
were restricted between 0-1 A, a few large displacements in ligand-bound conforma-
tions are also observed. These observations are in agreement with [50], where it has
been found that in 85% of the cases, interfacial side chains rotate through less than
45° upon ligand binding.

While Figure 3.6 compares displacement distributions, Figure 3.7 compares dis-
placements on individual side-chain basis. Presented as a scatter plot is the compar-
ison of side chain RMSDs between ligand-free and ligand-bound conformations, and
between the ligand-free conformation and the conformation in the best dockings for
24 structures by SLIDE. Comparing with Figure 3.1, it is clear that SLIDE moves
very few side chains compared to nature. The correlation between the ligand-free to
ligand-bound side chain RMSD and ligand-free to docked side chain RMSD is small
(0.28), indicating the extent to which SLIDE misjudges the magnitudes of motion,
as is clear by the degree of scatter. The color coding indicates the quality of motion,
answering the question: “ did SLIDE move the side chain closer to ligand-bound

conformation or further away ? ” SLIDE, while removing steric overlaps, tends to

41

 

5 Closer to

 

 

 

 

 

:, llg-bound oonf.
O . -1
I 0 0.6000
‘0 0
c: 4 o
g ‘ 0.3375 5
g 5
l 0 07500 9..
§ 5 §.
2 3‘ ,-
u, 0.1875 8.
i " m.
g 0.4500
.2 .E 2 'l .0 g
.3 “ . ° . ' c.7125 _
_ E . . 3
g . . 0 0.9750 g
1 - . e re
0 0 ..:~ 4.238
(O O .9. 0
z '0 ' ° 1 500
m 0 -l - '
Further from
I - I . I ' I . I . liq-bound cont.
0 1 2 3 4 5
RMSD between Ligand-free and SLIDE‘s best docking's
side-chain conformation

Figure 3.7: This image is presented in color. Comparison of side chain RMSDs
between ligand-free to ligand-bound conformation, and between ligand-free confor-
mation and conformation in the best docking. The best docking was determined
using the RMSD of the docked ligand versus the crystal structure ligand orientation.
The color scale indicates the quality of motion. Positive values indicate the RMSD
by which the SLIDE-generated best docking’s side-chain conformation became closer
to crystal complex conformation, while negative values indicate that SLIDE moved
the side chain further from the crystal complex conformation. Observations are de-
rived from the best dockings of 24 structures, with only those interfacial side chains
being compared which were moved by SLIDE. Correlation between the two RMSD
distributions is 0.28.

 

42

move the side chains away from the ligand-bound conformation (colors ranging from
cyan to red). For only 18 of 58 observations did SLIDE move side chain closer to the
ligand-bound conformation (blue).

Having observed that most of the SLIDE motions are minimal to resolve colli-
sions, that each collision can usually be resolved by either atom’s rotation around a
rotatable bond, and that often, SLIDE moves atoms further away from ligand-bound
positions, it is postulated that both directions of rotation should be explored for col-
lision resolution. While the angle of rotation in one direction will be bigger than the
angle in the opposite direction, the ability to make better interactions in one direc-
tion versus the other should inﬂuence the direction of rotation. Furthermore, from the
observed intra—protein hydrogen-bond preservation tendency presented in Chapter 2,
any rotations which might disrupt direct hydrogen bonds between the binding-site
side chains may be discouraged so that SLIDE’s dockings will have a similar bias

towards preserving most of the intra-protein hydrogen bonds.

3.3 Employing Hydrogen-Bond Preservation Bias
in Mean-Field Optimization

As explained in previous section 3.2, the key feature of mean-ﬁeld optimization
process is the probability matrix P(i, j), which describes the probability that a par-
ticular collision i will be resolved by a rotation of bond j. At the beginning of the
optimization, each rotation that can resolve a collision is assigned equal probabil-
ity. Subsequently, these probabilities are updated iteratively during the optimization
process, depending upon the conformational cost of rotation and dependencies of re-
solving other collisions. To bias against those speciﬁc side-chain rotations that disrupt
a hydrogen bond, initial probabilities can be reduced according to the hydrogen-bond

preservation probabilities derived from the statistics shown in Figure 2.2. This should

43

effectively discourage rotations which can break a hydrogen bond. Nevertheless, de-
spite low probability, a particular rotation that disrupts a hydrogen bond can still be
effected if it is the only one that can resolve a particular collision, hence ensuring that
ﬁnal docking will be free from atomic overlaps. Biasing against breaking intra-protein
hydrogen bonds would likely improve the chemical complementarity of the docking
as well.

To measure the effect of incorporating hydrogen-bond preservation bias, a pilot
study of 5 ligand-free and ligand-bound structure pairs was conducted. The templates
representing the binding sites of the ligand-free structures included random sampling
of potential interaction points as well as known interaction points with the ligand.
Known points were included to ensure that at least one docking would result in placing
the ligand close to its native position and orientation, so that side-chain motion would
not be required to compensate for inaccurate ligand placement.

Figures 3.8, 3.9, and 3.10 present a comparison of SLIDE and the new SLIDE
variant with a hydrogen-bond preservation bias based on the statistics from Figure 2.2.
Data presented for both SLIDE versions include only the best dockings according to
the ligand RMSD (relative to its crystal complex position), which must be less than
1.0 A. The native ligand conformations from ligand-bound crystal complexes were
used for docking into apo structures. Figure 3.8 presents the total number of intra-
target (protein binding site) hydrogen bonds. The crystal complexes have more intra-
protein hydrogen bonds upon ligand binding compared to the ligand-free structures.
Both SLIDE versions preserve a similar number of hydrogen bonds, but fewer than
what nature preserves in the crystal complexes. This indicates that new intra-target
hydrogen-bonding opportunities are missed in both versions of SLIDE. Figure 3.9
presents intra—target hydrogen bonds that were lost upon ligation in the crystal com-
plex and in the best dockings by SLIDE. Interestingly, while more hydrogen bonds

were lost in the crystal complexes, Figure 3.8 indicates that these complexes form

44

new intra-target hydrogen bonds that were not present in the ligand-free structures,
as shown in Figure 3.10. Even though the sample set of 5 structures is small, it nev-
ertheless suggests that nature breaks and makes more intretarget hydrogen bonds
upon ligation through optimal arrangement of side chains. SLIDE, on the other hand,
due to its minimal rotation collision resolution model, tends to keep the binding site
side-chain positions and their interactions more or less the same, breaking and making
fewer intra-target hydrogen bonds compared to nature. Considering that each hydro-
gen bond has an energy of about -5 Kcal/mol, and that typical complex formation
involves a favorable energy change of only ~3 times this number, careful modeling of
interfacial hydrogen bonds is likely to be crucial to correctly sample and assess the

best dockings.

3.4 Sampling Large Side-Chain Motions

Comparison of the extent of displacement in Figures 3.6 and 3.7, as well as
hydrogen-bond preservation results from best dockings by SLIDE presented in Fig-
ures 3.8, 3.9, and 3.10 indicate that aside from developing good shape comple-
mentarity through small side chain adjustments, optimal rearrangement of the intra-
target hydrogen-bond network is important for favorable binding afﬁnity between the
molecules. This rearrangement may involve not only small but also large rotations,
which SLIDE’s induced-ﬁt mechanism does not encourage.

To explore if there are any prominent reasons for rearrangements through large
side-chain rotations, all interfacial residues that had undergone a side—chain dihedral
rotation greater than 60° upon ligand binding were analyzed by molecular graphics
to understand the reasons for these large rotations. Any binding sites having gaps
between residues were excluded from this analysis using the molecular graphics tool

InsightII, since those mobile residues could inﬂuence motions in unpredictable ways.

45

 

lntra-protein direct hydrogen—bond comparison
between ligand-free, ligand-bound structures
and best dockings by old and H-bond-preservationist SLIDE

+0

3 rs H-bond-preservatlonist SLIDE
9 +1 223 Old SLIDE

E Ligand-bound crsytal complex
DID Ligand-free structure

+0
1syc +3

1nsb

Complex

1 ajz

1ahb

 

0 1O 20 30 4O 50 60
Total number of intra-protein hydrogen bonds in the binding site

Figure 3.8: Comparison of the number of intra-protein hydrogen bonds before and af-
ter ligand binding in crystal complexes, in best dockings by SLIDE and the hydrogen-

bond-preservationist version SLIDE.

 

46

 

Comparison of intra-protein direct hydrogen bonds
lost upon docking by old SLIDE and H-bond-preservationist
SLIDE versions, compared against H-bonds lost in crystal complexes

H-bond preservationist SLIDE
agrs Old SLIDE
E Crystal complex

1 eye

1nsb

Complex

1ajz

Iahb

 

0 1 2 3 4 5 6 7 8 9
Number of intra-protein hydrogen bonds lost in the binding site

Figure 3.9: Comparison of the number of intra—protein hydrogen-bonds lost upon
ligand binding in nature (between ligand-free and ligand-bound structures), in best
dockings by SLIDE, and in the best dockings by hydrogen-bond-preservationist ver-

sion SLIDE.

 

47

 

Comparison of intra-protein direct hydrogen bonds
gained upon docking by SLIDE and by H-bond-preservationist SLIDE,
compared to hydrogen bonds lost in the crystal complexes

 

H-bond preservationist SLIDE

398 I222 Old SLIDE
E Crystal complex

 

1syc

1ahb

 

O 1 2 3 4 5 6 7 8 9
Number of intra-protein hydrogen bonds gained in the binding site

Figure 3.10: Comparison of the number of intra—protein hydrogen bonds gained upon
ligand binding in crystal complexes, in the best dockings by SLIDE and by the

hydrogen-bond—preservationist version of SLIDE.

 

48

Table 3.1 presents the conclusions from visual inspection of such large rotations. The
process involved super-imposing the ligand-free binding site onto ligand-bound site,
followed by analyzing steric and chemical factors that could encourage large rotations.

Several causes for rotations or displacements were found - main-chain movements
of more than 1.0 A (for residues like L33 in PDB entry lkel L33, residues A45 and
A49 in lswd), side-chain movements caused by motion in adjacent residues, aromatic
interaction with the ligand (lahb, residue 70), 1r—cation interaction with the ligand
(lgmr residue B40, lsyd residue 115 ), surface exposure or even crystal packing
effects (ludh residue 87). Nevertheless, in two-thirds of the cases, side chains moved
not to resolve any steric clashes upon ligation. Rather, side chains moved to satisfy
polar atoms (in 10 cases, excluding cases with aromatic or 1r-cation interactions)
which would have remained unsatisﬁed and buried if they had remained in ligand-
free conformations.

Visual inspection and results trends indicate that, in 50% of the cases, the reason
to satisfying the hydrogen—bond potential of a polar buried atom encouraged large
side-chain rotations in the interface upon ligand-binding. Exhaustive modeling of pos-
sible large side-chain rotations in SLIDE around single bonds would mean sampling
300° of dihedral-angular space for each single bond. Given that a side chain can have
from zero to four rotatable bonds, the combinatorial sampling, depending upon the
ﬁneness of sampling and the number of side chains to model, would be prohibitively

time-consuming to do in docking.

3.4.1 Rotamer Libraries

Fortunately, for most of the bond-rotation angles, protein side chains adopt pri-
marily a staggered dihedral angle such that covalently bonded, sp3-hybridized atoms
tend to stagger, rather than eclipse, their substituent atoms. As the number of

solved protein structures has increased over the years, statistical distributions of the

49

Table 3.1: Visual analysis of reasons behind large dihedral rotations (> 60°) in inter-
facial side chains.

 

Structure Residue# Residue Type Did side chain clash Would side-chain
with the ligand ? H-bond potential
have remained if side

chain had not moved
I?

 

1aj2 221 LYS N 0 YES
latp El 20 MET NO YES
1 atp E53 SER NO YES
lcgu 257 GLU NO YES
lgmr B40 ARG NO YES
lpob A52 ASN N 0 YES
lsyd 43 GLU N 0 YES
lsyd 1 1 5 TYR NO YES
ltps 192 GLN NO YES
7taa 296 HIS NO YES
lgmr B38 GLN N O NO
1ke1 L33 ASN NO NO
lbib 1 83 LYS NO YES
llic 58 LYS NO NO
1ahb 70 TYR YES YES
lgmr 865 ARG YES YES
1kel H56 LYS YES YES
lpob A30 ARG YES NO
laj2 115 ASN YES NO
llic 57 PHE YES NO
lpob A2 LEU YES NO
lcoy 122 MET YES NO

 

5O

side chain bond-rotational angles for each residue type have been characterized[24].
These bond-rotation angles are labelled x angles by convention, as explained in Fig-
ure 3.11. Side-chain X angles have been found to occur in tight clusters around certain
values. This is both because of the hybridization of the bonded atoms, as well as to
prevent collisions with the main-chain atoms of the residue and its neighbors. Each
single bond in the side chain can sample its dihedral degrees of freedom subject to
these constraints. The resulting side chain conformation is called ‘rotational isomer’
of the side chain. Abbreviated as ‘rotamer’, this favored orientation is represented
as a set of values, one for each dihedral angle degree of freedom. Since bond angles
and bond lengths in proteins have rather small variances, they are usually not in-
cluded in the deﬁnition. A library of such favored sets of side-chain x-angle values for
each residue type is called a rotamer library. Rotamer libraries usually also contain
information about the frequency of each rotamer, as some conformations are more
likely than others due to side chain stereochemistry, besides the information about
the variance around the dihedral angle mean or mode.

Using rotamers to sample side chain conformation can help avoid exhaustive
spatial sampling and make tractable the identiﬁcation of a suitable combination of
dihedral angles. With known geometries derived from experimentally solved crystal
structures, not only can time be saved during sampling, it is also more likely to arrive
at a side chain conformation that is natural and low in energy.

Using rotamer library for side chain modeling does have certain drawbacks.
Firstly, rotamers are typically averaged conformations representing a cluster of con-
formations; they may not be real conformation themselves. Besides, granularity of
dihedral space sampling of side chains within the rotamer library can itself be a limit,
especially because rotamers largely reﬂect favored side-chain orientations outside of
interfaces. Another aspect is that even though rotamers are thought to represent

local energy minima on a potential energy landscape, not all rotamers may be local

51

 

 

Figure 3.11: Illustration of bond-rotation angles associated with single bonds in an
arginine side chain. The number of single bonds in a side chain, ranging from 0 to 5,
depends on the residue type. A single bond is free to sample dihedral rotations and
its associated dihedral angle is called a x angle. The x angles are labelled X1» x2, x3
and so on, according to the level of the single bond from the residue’s backbone.

 

52

extrema since distributions Of side chain dihedral angles are fairly broad (i20° or 30°
relative to the average value presented).

Nevertheless, rotameric conformations can deﬁnitely be considered candidate
conformations to sample, and the extent to which they approach Optimal confor-
mation can be assessed. For sampling larger conformational space for side chains,
rotamer libraries Offer time savings while providing ready-tO-use, pre-calculated rO-
tameric side chain conformations.

Rotamer libraries can be backbone-independent, backbone-dependent, or even
secondary-structure-dependent. Backbone-dependent rotamers have dihedral angles
and/ or frequencies that are binned according to the local backbone conformation.
Backbone-independent rotamers are calculated using all available side chains of a
residue type. Backbone-dependent rotamers have the advantage that no rotameric
side chain can have steric clashes with its own backbone.

TO see if one can ﬁnd rotamers from a library similar to ligand-bound conforma-
tions Of few side chains which rotated through more than 60° on ligand-binding, the
Dunbrack May 2002 backbone dependent rotamer library was searched (downloaded
from http://dunbrack.fccc.edu/bbdep). Rotamer searches were done in incremental
fashion - ﬁrst comparing only X1, then X1 and x2 and so on up to x4, depending
upon the number Of x angles in the side chain. A rotamer was considered similar to
a side chain if each Of the side-chain x angles were within the range x :l: a, where a is
the standard deviation of respective x angle speciﬁed for the rotamer in the rotamer
library.

As is clear from search results presented in Figure 3.12, for side chains with x3
and x4 angles, fewer suitably close rotamers were found. Furthermore, for a few Of the
side chains, like Tyr70 in 1ahc and Tyr115 in lsyc, no rotamer was found. This result,
in effect, places an upper bound on how close, in x-angular space, tO a ligand-bound

side chain conformation can rotamers from this particular library reach.

53

 

Rotamers found by x—angie search for selected
ligand-bound side chain conformations

 

 

1ahc_ TYR _701
1syc_ TYR_ 1151
1swa_ A _SER _451
1apm: E _SER _531
1Iib_ ”PHE _571
3cox_ MET_ 1221
1apm_ E_ MET: 1201
1kem_ H L_YS _561
1ajz_LYS_2211
1bia_ us: 1831
1Iib_ LYS _581
1kem_ L_ LYS 501
6taa:_ HIS _2961
1syc: GLU _431
1cgt_ GLU _2571
1udg_GLN_ 871
1gmq__BG GLN _381
GLN_ 1921

1swa_ A_ ASN _491
1kem: L _ASN :331
1p0a_ ASN :521
1poa_ ARG :301
1gmq_B_ ARG_ 651
13m mcLB _ARG: 401
chs_ A_ ARG :901

WMmemk
Matching upto 13
Matching upto x4

4

Matching upto x,

4

 

00290-0000...
DO
4

Sidechain
4
B 8

4

3'

A V
'l'l'Ijjinl'l'l'lejT

510152025303540455055
Number Of rotamers found In the Dunbrack May 2002 rotamer library

 

 

 

010090000000.
4

-5

Figure 3.12: Number Of rotamers, from Dunbrack May 2002 backbone-dependent
rotamer library (http://dunbrack.fccc.edu/bbdep), approximating dihedral angles of
target conformations for 25 interfacial side chains undergoing large rotation upon
ligand binding. Selected side chains across structures are those that had rotated
more than 60° from the ligand-free tO ligand-bound conformation. Side chain names
(Y-axis) include the PDB code, chain ID (if present), residue type and residue number.
Backbone d, 21) angles were used to locate the bins in which rotamers were searched,
according the x angles Of the side chain in question. In the rotamer library, the d, w
resolution is 10°. The neighboring 8 bins, using 45 :t 10 , 1,0 :l: 10, were also searched.
Rotamer searches were done in incremental fashion - ﬁrst comparing only X1, then X1
and x2, and so on, up tO x4, depending upon the number Of x angles in the side chain.
A rotamer was considered similar to a side chain if each Of the side-chain x angles
were within the range x i a, where a is the standard deviation Of respective x angle
speciﬁed for the rotamer in the rotamer library. Rotamers searched consisted of only
those rotamers which had the probability Of at least 0.05 times the probability Of the
highest probable rotamer within the same 45, ll) bin. This helps exclude rotamers that
were very rare, since many Of them could be poorly resolved side-chain conformations
in PDB.

 

54

Another worthwhile aspect to explore is to determine how close rotameric side
chains, which are the 3D side-chain orientations generated from the rotamer library’s
05, w and x angles, can approach ligand-bound conformations in terms Of RMSD.
The reason for this is that single-bond rotations through adjacent x-angle changes
can either compensate for each other or augment each other in terms of the resulting
side-chain motion. The closest rotamers found, in 3D Cartesian space, for ligand-
bound conformations are presented Figure 3.13. Rotarneric conformations within
1 A Of the ligand-bound conformations were found for about half Of the 22 side
chain cases presented here. This again effectively deﬁnes the limit on how close
rotameric modeling approach can approach ligand-bound side-chain conformations
using Dunbrack’s backbone-dependent rotamer library.

In Figure 3.13, the relationship between B-values for the large-motion side chains
and the probability Of their having a rotameric conformation is presented. The moti-
vation for this is that poor resolution Of high B-value residues could be reﬂected in the
unlikelihood Of ﬁnding these conformations in the rotamer library. On a color-scale
representing the maximum B—factor value found for any atom Of the side chain, black
to yellow indicate that the atom coordinates are low in mobility, while light-green to
grey indicate that the side chain had at least one atom whose position was highly mo-
bile and thus likely to be poorly resolved in the crystal complex. SO, while rotamers
close to ligand-bound conformations were found (RMSD < 0.7 A) for lbib-Ly8183
and 1gmr-ArgB40, high B-factor values indicate that these side chains do not have
a well-deﬁned conformation. Hence reliable conclusions may not be drawn about the
eﬂ'ectiveness Of a rotamer library for its ability or inability to ﬁnd a close rotamer for
side chains that consist Of atoms having high B-factor values.

However, for most (13 out of 19) side chains having low B-factor values, a ro-
tamer within an RMSD rang of 0.28 A to 0.83 A was found in each case. For 5 side

chains having low B-factor values for which no close rotamers were found, three (1ahb-

55

Tyr70, 1cgu-G1u257 and lpoa-A52 ) moved to interact with the ligand, while 1poa—A2
moved due tO excessive steric clashes with the ligand, indicating that their favored
conformations are inﬂuenced by the ligand as well as the side-chain conformational

energetics.

3.5 Conclusions

In this chapter, the bond-preservation bias discovered in 2 chapter was im-
plemented in SLIDE. However, on further experiments it was found that SLIDE
models small motions well enough during docking to preserve most Of the hydrogen-
bond interactions and, thus, additional bias towards preserving hydrogen bonds is
not warranted. Relatively large side—chain rotations in ligand-bound conformations
and causes for them were further investigated. Trying to satisfy buried polar groups
was determined as a predominant reason. Investigating how large rotations can be
modeled in SLIDE, the effectiveness Of using Dunbrack backbone-dependent rotamer
library to sample larger conformational space was analyzed. For X1 and x2 angles, the
rotamer libraries always contained entries reﬂecting the conformations Of side chains
that underwent large side-chain motions upon ligand binding. However, for long side
chains there were Often no rotamers that also reﬂected their ﬁnal ligand-bound x3 and
x4 angles. In Cartesian RMSD space, 13 out Of221igand-bound side-chain conformer
were represented in the rotamer library tO within 1.0 A RMSD and 8 Of the remain-
der to within 2.0 A RMSD. For all cases where side-chain mobility values were low,
rotamers close to the ligand-bound conformations were found. At other times, side
chains were found to be non-rotameric either because Of ligand-induced strain[14, 15]

in the interface or because they moved to enhance interactions with the ligand.

56

 

RMSD of the ligand—bound side chain conformations from
closest rotamers found in Dunbrack's backbone dependent rotamer library

 

 

 

 

 

 

 

120 a. i
1001
so-
q
so-
401 / a\
a m1z/Aaj:A/\\z/./X:z\n
E - An
I:
g ’ . +clloast—rotamuzerMSDmiativeto
a l.
8 25‘ +MaximumB-factorofanysidechainatom
3 —A—AverageB—factorofdlsIde—ctuinatonu
s 2.0- 3250
I .
o \ I
(I) _
2 1.5 /\I_
I!
0.5- u/'
an
"" IIIrIIIIIIIrIIIIIIII

 

 

9:» ﬁé $5069 3’ éeée o 06’ ~89
“its“ .9. 3W . 33:55:55 73* 1...:ij ti 5’5: :«t

Large dihedral rotation residues

Figure 3.13: This image is presented in color. RMSD of rotamers from Dunbrack’s
May 2002 backbone-dependent rotamer library found to be closest to selected ligand-
bound side-chain conformations in Cartesian space. Selected side chains across struc-
tures are those that had dihedral rotations more than 60° from the ligand-free to
ligand-bound conformation. Side-chain observation names include the PDB code,
chain ID (if present), residue type, and the residue number. Backbone dz, i/) angles
were used to locate the bins in which rotamers were searched, using a it, w resolution
of 10°. The neighboring 8 bins, using :15 :l: 10 , 1,!) d: 10 were also searched. Each ro-
tamer in these bins was converted to a side—chain orientation in Cartesian space and
transformed into the reference frame of the selected side chain. The rotamer library
consisted of only those rotamers which had a probability of at least 0.05 times the
probability of the highest probable rotamer within the same (1), 1/1 bin. This helped
exclude rotamers that were rare, since they could be poorly resolved side-chain con-
formations in the PDB. Maximum and average B-factor values over all the side-chain
atoms are also shown in the top half of the ﬁgure. SO, while close rotamers (RMSD
< 0.7 A) were found for lbib-Ly5183 and 1gmr-ArgB40, high B-factor values convey
that some target side-chain positions are in fact poorly deﬁned. In general, there is
no trend between the B-value mobility of a side-chain and the probability that this
conformation is present in the rotamer library.

57

Chapter 4

Scoring SLIDE Dockings

4.1 Introduction

Computer-aided structure-based methods for virtual screening are aimed at pre-
dicting the binding mode Of a ligand in the binding site of a protein or molecular
target, and estimating the resulting binding afﬁnity. A conformational search algo-
rithm can produce an immense number Of solutions from which the right solution(s)
must be selected by an energy-based or other scoring procedure. Thus, it is extremely
valuable tO predict binding afﬁnity accurately so the user may select good conforma-
tions and orientations from poor ones with conﬁdence.

For ranking protein-ligand dockings, a class Of programs, namely ‘scoring func-
tions’ has been actively researched for many years. Since docking algorithms search
for potentially good binding modes between a speciﬁed target and, and in some cases,
thousands Of small molecules, they must have the ability to discriminate not only be-
tween promising and poor binders, but also between different binding modes Of the
same ligand. Theoretically, free—energy simulations can be a reliable check for dis-
criminating good from bad dockings, but this approach is too expensive for screening

large libraries Of small molecules and remains a techniques performed by relatively

58

few experts.

Scoring functions can be broadly categorized into three categories: force—ﬁeld
based, knowledge-based and empirical. Force-ﬁeld based scoring functions apply clas-
sical molecular-mechanics based energy functions to approximate the binding free en-
ergy Of the protein-ligand using a sum Of van der Waals, electrostatic, bond length,
and bond angle terms. Solvation is usually taken into account using a distance de-
pendent dielectric or solvent-accessible surface term, although explicit modeling of
discrete water molecules is also done. Non-polar contributions are usually assumed
to be proportional to the solvent-accessible surface area. A drawback is that energy
landscapes associated with the force-ﬁeld potentials are usually rugged, and therefore
minimization is required prior to any energy evaluation. Furthermore, small inac-
curacies in positioning atoms during docking can lead to large discrepancies in the
energy score.

Knowledge-based scoring functions represent the binding afﬁnity as a sum Of
protein-ligand pairwise atomic interactions. Using the protein-ligand complexes de-
posited in the Protein Data Bank (PDB), knowledge-based potentials are derived
from the Observed preferences for particular atom-pair interactions tO occur at given
distances pr distance ranges between ligands and proteins. However, structures in
PDB do not provide a thermodynamic ensemble at equilibrium. Hence knowledge-
based potentials should be considered as a statistical preference rather than an actual
potential.

Recently, many empirical functions have emerged as alternatives to force-ﬁeld-
based and knowledge-based scoring functions. Unlike force ﬁelds, the weights Of
terms in empirical scoring functions are directly calibrated using a set Of protein-
ligand complexes with experimentally determined structures and binding afﬁnities
through multivariate regression analysis. These scoring functions may also include

derived terms that include interactions that would otherwise involve several terms in

59

a force-ﬁeld based function, e.g. terms for hydrogen bonds and hydrophobic inter-
actions. Empirical scoring functions have several appealing features. They can be
calibrated against a set Of diverse protein-ligand complexes. Secondly, each term in
an empirical function has an intuitive physical meaning. The resulting regression co-
efﬁcients before each term conveys the relative contribution Of each interaction in the
ligand-binding process. Thirdly, empirical scoring functions are extremely fast with
reasonable accuracy, which makes them acceptable for structure-based drug design
applications like virtual database screening and de novo ligand generation. Finally,
most force-ﬁeld-based energy functions have been parameterized solely tO ﬁt pep-
tidyl data, and are just beginning to be parameterized to recognize and accurately

represent atom and bond types that occur in non-protein ligand molecules.

4.2 SLIDE Scoring Function

The current SLIDE scoring function is an empirical scoring function trained tO
score a collision-free complex using a linear combination of hydrophobic and hydrogen-

bond interactions terms:
SCORE(P, L) = A - HPHOB(P, L) + B - HBONDS(P, L)

The weights A and B have been ﬁt to Optimize SCORE(P, L) to match the
afﬁnities Of 89 high-resolution complexes listed in reference [8]. For more information
about the terms and their tuning in current version Of SLIDE, please see reference
[42].

While the current SLIDE scoring function is based on hydrogen-bonding and
hydrophobic terms, more detailed terms tO represent both stabilizing and destabilizing
thermodynamic interactions can be explored for developing new scoring functions for

SLIDE. Inclusion Of such terms is also desirable to help discriminate among different

60

binding orientations generated while sampling rotamers for buried unsatisﬁed side

chains and to detect interactions that could be improved through further Optimization.

4.3 Methods

Empirical scoring functions are typically trained and tested against series of
experimentally determined protein-ligand crystal complexes whose binding afﬁnities
are known. Unfortunately, no experimental data exists for suboptimal orientations Of
ligands with proteins. For designing a new empirical scoring function for SLIDE to be
used during and after docking, we developed and tested a series of scoring functions
resulting from linear combinations Of individual terms listed in Table 4.1. The choice
Of which terms to combine was driven by low-correlation among the terms, as well

their ability to represent different interaction types.

Table 4.2: Scoring function variants evaluated to predict binding afﬁnities. 269 crystal complexes,
listed in Table 4.3, with experimentally known affinities were used for training and testing the
scoring functions. Terms are as deﬁned in Table 4.1. Linear regression constant a, and coefﬁcients
6 through 1) were determined to best fit the binding afﬁnity values.

 

 

 

 

Scoring Function Expression
f1 (Id-BA
f2 a+ﬂB
f3 (Id-BC
f4 a+ﬂD
f5 O+ﬂE
fa a-i-BF
f7 a-i-ﬂG
f3 a-l-BH
f9 0+3]
fro 0+1”
fit 0+5K
in 01-514
f13 0+5M
fti 0+5N
fits 04-50
fie 0+3!)
In 04-50
he 0+3}?
fro 04'53
Continued on next page

 

 

 

61

Table 4.2 — continued from previous page

 

 

 

 

 

Scoring Function Expression

fro a+BT

f21 0+5U

fn 0+3V

f23 01-5-7443

f24 a+l6(J+K)+vB

f25 0+5(H+I)+’YB

fee 04-130

f2? 0+5D+7I

fzs 0+BD+7(I+K)

129 a+ﬂI+7(G+H)+6(J+K+L+M)+eP+(R
[30 a+ﬂI+7(G+H)+6(J+K+L+M)+eP
f3, a+ﬂI+7(G+H)+6(P)

f32 a+ﬂB+7(I+G+H)+6P+eR

f33 a+ﬂB+7(I+G+H)+6R+e

I34 0+ﬂB+7I+6R+E

fss Owl-33+?!

fag a+ﬂO+7(I+G+H)+6(J+K+L+M)

I37 a+BO+7(I+G+H)+6(J+K+L+M)+eR
I33 a+ﬂO+7(I+G+H)+6(J+K+L+M)+eP
f39 a+BS+71+6(G+H)+e(J+K+L+M)+(P+nR
f4o a+BS+7I+6(G+H)+6(J+K+L+M)+CP
f“ a+ﬁS+7I+6(G+H)+eP

I42 a+ﬂS+7(I+G+H)+6P+eR

I43 a+ﬁS+7(I+G+H)+6P

f“ a+ﬂS+7(I+G+H)+6R

f45 a+ﬂS+7I+6R

f4s 044934-71

I47 a+ﬂS+7(I-i-G+H)+6(J+K+L+M)

I48 a+ﬂS+ty(I+G+H)+6(J+K+L+M)+eR
f“ a+ﬂS+7(I-i-G+H)+6(J+K+L+M)+eP
I50 a+ﬂB+7I+dG+€H

f51 O+BS+7I+6G+€H

f52 a+ﬂB+7I+6(G+H)

f53 a+BS+7I+6(G+H)

f54 0+BT+VI+6(G+H)

f55 a+ﬂT+7I+6(G+H)+cP

I53 a+ﬂT+7(I-l-G+H)+6(J+K+L+M)+eR
I57 a+ﬂS+7G+6(H+I)+eL+(J

f“ a+ﬂT+7G+6(H+I)+eL+(J

[59 a+ﬂS+7G+6(H+I)+eL+(J+nV

foo a+ﬂU+7G+6(H+I)+eL+<J

fel a+BU+7G+6(H+I)+eL+CJ+nV

 

 

4.3.1 Calculation Of Terms for Scoring Function

The overall free energy change in a protein-ligand binding can be represented as

62

Table 4.1: Potential terms to capture hydrophobic and polar interactions as well as
molecular size and entropy while scoring a ligand-protein complex.

 

Ligand-size

 

terms

A number Of ligand atoms

O number Of interfacial ligand atoms

N percentage Of ligand atoms that are interfacial

Hydrophobic

terms

B sum Of atomic hydrophobicity values2 for P-Ll hydrophobic contacts

C difference Of atomic hydrophobicity values2 for P-L hydrophobic con-
tacts

S number Of P-L hydrophobic contacts (without considering hydropho-
bicity values)

U hydrophobic term from current SLIDE score [42]

 

Polar terms

E

sum Of atomic hydrophilicity values for P-L hydrophilic contacts

 

 

F difference of atomic hydrophilicity values for P-L hydrophilic contacts

G number Of favorable protein-bound metal interactions with ligand

H number Of P-L salt bridges

I number Of P-L hydrogen bonds

Mismatch

terms

P number Of exposed hydrophobic ligand atoms

D difference Of atomic hydrophobicity values for hydrOphobic-hydrOphilic
contacts

J number Of interfacial, unsatisﬁed polar atoms

K number Of interfacial, unsatisﬁed charged atoms

L number Of P-L repulsive polar interactions

M number Of P-L repulsive charge interactions

W number Of hydrOphobic-hydrophilic P-L contacts

Entropic

terms

Q number Of rotatable bonds in the ligand

R number Of rotatable interfacial bonds in both ligand and protein

V number Of rotatable interfacial bonds in ligand

 

1 Protein-ligand.

2 from Kuhn et a1 [27].

63

A(:ln'nd = AGprotein—ligand h—bond + AG'salt—bridge
+AGhydrophobic + AG'unsatisfied buried polar atoms (4'1)

+AGent1-opy + Cl

Here, AGh_bmd accounts for the hydrogen bonding between the ligand and
the protein and AGhyd,ophob,-c accounts for the protein-ligand hydrophobic interac-
tions. AGun,at,-,f,-ed bun-ed ”in, am, penalizes the score if dockings have unsatisﬁed
polar atoms buried in the interface, while AGmtrm coarsely approximates the rota-
tional entropy by counting the number Of interfacial bonds, in either the ligand only
or both the ligand and the protein. a is the regression constant which implicitly
includes other effects for which no explicit scoring function term is included such as
solvation and rotational / translational main-chain entropy changes.

The complexes being scored for training or testing are assumed to be free from
any van der Waals overlaps among atoms. This is true for PDB structures in the
data set. When dockings are scored from within SLIDE, van der Waals overlaps are
resolved by directed rotations before scoring a docking. Interactions are determined

in the following order:

1. Intermolecular repulsive polar contacts If any ligand and protein atom-
pair consists Of both atoms as hydrogen-bond acceptors or donors, and their
interatomic distance is between 2.5 A and 3.5 A, the interaction is considered

as repulsive.

2. Protein-ligand metal-bonds If the interaction is not a repulsive polar con-
tact, and the protein-ligand atom-pair includes an acceptor or a donor or a
donor/acceptor, that will match either donor or acceptor, ligand-atom that is

within Of 2.6 A for CO, Cu, Fe, Mg, Mn, Ni, or Zn metal ions or within 2.9 A

64

for Ca, Na, or K metal ions bound to the protein, the interaction is considered

as a favorable metal interaction.

. Protein-ligand salt bridges If an interaction is not characterized as meeting
the criteria in (1) or (2), and the atoms are complementary (acceptor matched
to donor, or a donor/ acceptor that will match either donor or acceptor), inter-
atomic distance is between 2.5 A and 4.5 A, and the charges Of the atoms are

non-zero and complementary, then the interaction is counted as a salt bridge.

. Protein-ligand hydrogen bond If none Of the above, the atom-pair is eval-
uated for an hydrogen bond . TO calculate intermolecular hydrogen bonds, the
position Of the shared hydrogen in each intermolecular hydrogen bond is com-
puted, if not provided in the protein or ligand structure. Hydrogen atoms and
partial charges were added tO the ligand structures by utilizing molcharge Op-
tion in the AMIBCC [18]. If a potential hydrogen bond needed to be explored
when nO hydrogen atom was present in the protein structure, the position Of
the hydrogen atom was analytically determined. This position is well-deﬁned by
bond lengths and angles for all but the terminal hydrogen atoms in lysine and
hydroxyl side chains; for these side chains, the optimal hydrogen-atom position
is chosen, with respect to maximizing hydrogen bonds, on the circle of possible
positions deﬁned by covalent bonding constraints. All hydrogen bonds with a
donor—acceptor distance up to 3.5 A and a donor-hydrogen-acceptor angle larger
than 120° contribute equally to the score. If water molecules are included in
the interface, all water-mediated hydrogen bonds between protein and ligand

are also counted [42].

. Hydrophobic interactions A protein-ligand atom-pair is evaluated for mak-
ing a hydrophobic interaction if it has been found not to participate in any

Of the previous interactions. The interatomic distance for the atom pair must

65

be within 4.5 A to qualify for a hydrophobic interaction. Calculations for the
hydrophobicity measure, term U, is explained in detail in [42]. Other hydropho-
bic terms are either counts Of hydrophobic contacts, or sums or differences in

hydrophobicity values.

. Intra-molecular polar interactions At this point intremolecular salt
bridges, direct hydrogen bonds, and water-mediated hydrogen bonds are eval-
uated, in this order, within the protein and the ligand for atoms that did not

contribute to any Of the previous protein-ligand interactions.

. Unsatisﬁed buried polar atoms If any interfacial polar or charged protein
or ligand atom remains unpaired in any Of the above interactions, it is counted

as unsatisﬁed.

. Entropic terms Counts Of single (rotatable) bonds in the entire ligand and
in the interfacial residues Of the ligand and protein are used to generate terms

that partially measure loss Of degrees Of freedom upon ligand binding.

When evaluating score of a docking from within SLIDE, interactions that are lost

due to docking are also considered. For instance, unsatisﬁed protein atoms which pre-

viously had intra-protein hydrogen bonds that were lost due to side-chain rotations,

or hydrogen bonds to waters that were displaced upon ligand docking, are considered

in the penalty terms J and K.

4.3.2 Preparation Of Test Set

The data set used in this study is constructed from 269 protein-ligand complexes

speciﬁed in Table 4.3. The resolution range Of the crystal complexes was from 0.95 A

to 3.16 A with 85% Of the complexes having resolution 3 2.5 A. This set was assembled

from training sets used by other empirical scoring function studies[8, 37, 48]. All Of

66

these complexes have crystal structures and experimentally measured afﬁnity values.
Coordinates Of all the complexes were downloaded from the Protein Data Bank. NO
energy minimization was performed on structures. Each complex was split into a
protein molecule, which was saved in PDB ﬁle format, and a ligand molecule, which
was saved in M012 ﬁle format. Metal ions, if present in the protein-ligand interface,
were kept in the protein ﬁle. Water molecules were excluded and will be considered
in future work. Any other organic or inorganic cofactors were kept with the protein.

Values were generated for each Of the terms mentioned in Table 4.1 for each Of
the complexes in Table 4.3. Before combining various terms, the linear correlation
coefﬁcients between the terms were calculated. Combining highly correlated terms

was avoided when deﬁning new combinations as candidate scoring functions.

4.3.3 Training and Testing

Since scoring functions evaluated were linear expressions, training for a scoring
function involved performing linear multiple regression between the binding afﬁnity
and scoring function terms to determine the weights Of each terms and the constant, a,
and to minimize the error between the predicted and actual afﬁnity values. Raw terms
speciﬁed in Table 4.1 were calculated for each Of the complex speciﬁed in Table 4.3.
Then the set Of 23 terms for 269 complexes was randomly divided into halves. One half
was used for training to derive regression coefﬁcients (term weights) and the constant,
then these weights were used with the values of terms to predict binding afﬁnities
for the complexes in second half. TO avoid over-ﬁtting or training on any speciﬁc
subset Of the complexes, random repartitioning Of the 269 complexes into halves was
performed 10 times, and the training and testing sets were also interchanged for
each repartition. For each scoring function, its weights, constants and the resulting
correlation coefﬁcient with experimentally determined binding afﬁnity values were

determined across these 20 training and testing sets. These values are presented in

67

Table 4.3: PDB codes Of crystal complexes used for training and testing the scoring
functions

 

1a46 1bmn 1fq8 1rus 2xis 6tim
1a4w lbxo 1g6n lsre 2ypi 6tmn
1a5g lbxq lhbv ltet 3cla 7abp
1a94 1bzm lhdt ltha 3cpa 7acn
laeq lcbx lhew ltlp 3csc 7cat
labe lcil lhih ltmn 3er3 7cpp
labf lcim lhos ltmt 3er5 7dfr
lac4 lcin lhpv ltng 3fx2 7est
1ac8 lcla lhpx ltnh 3pgm 7hvp
lacj 1cnw 1hri ltni 3ptb 7tim
ladb 1cnx lhsg ltnj 3tmn 7up j
ladd lcny lhsl ltnk 3tsl 8abp
ladf ‘ lcps lhtf ltnl 4cla 8acn
1ae8 lcsc lhtg ltpp 4dfr 8atc
laeb lctt lhvi lulb 4erl 8cpa
laed ld3d lhvj luvs 4&2 8cpp
laee ld3p lhvk lxig 4er4 8hvp
laef ld3q lhvl lxli 4fab 8icd
laeg 1d3t lhvr 2ak3 4grl 8xia
laeb ldbb lhvs 2cgr 4hvp 9aat
laej ldb j 1183 2cpp 4mdh 9abp
lack ldbk 1187 2csc 4phv 9hvp
laem ldbm lldm 2dbl 4sga Qicd
laen ldif llgr 2dri 4tim 9rub
laeo 1dih lmbi 2er0 4tln

laeq ldog lmcf 2er6 4tmn

lacs ldrl lmcj 2er7 4tsl

laet 1drf lmcs 2er9 4xia

laeu ldwb lmdq 2gbp 5abp

laev ldwc lmfc 2ifb Sacn

1af2 ldwd lmfe 2ldb 5cna

lajv leed lmnc 2mcp 5cpp

lajx lela lnnb 2msb 5enl

lanf lelc 1nsc 2phh 5er2

1apt lent 1nsd 2pk4 5hvp

lapu lepo lokl 2qwc 5icd

lapv lepp lpgp 2qwd 51dh

1apw letr lphf 2qwc 5p21

lavn lets lphg 2qwf Ssga

1b5g lett lppc 2qwg 5tim

1ba8 1fbc lpph 2r04 5tln

1bai lfbf lppk 2mt 5tmn

1bap lfbp lppl 2tmn 5xia

lbbz lfkb lppm 2tsc 6abp

lbcu lfkf lrbp 2up j 6apr

lbhf lfmo lrgk 2wea 6cpa

lbid lfq4 lrne 2web 6enl

lbll lfq5 lrnt 2wec 6gst

 

68

Table 4.4.

 

 

 

 

 

-4 1 R = 0.672, R2 = 0.451

 

 

 

 

 

 

 

 

 

.5 _
-8 _
i -10 1
1.“ cl
0
3 -14 1
ll)
-16 - "
I SLIDE Score #61
—— Linear ﬁt
-13 q - ---------- Upper 95% Prediction Limit
------ Lower 95% Prediction Limit
-20 l ' I ' I ' ' f '

' l

, . r . .
-16 -14 -12 -10 -8 —6 14 -2 0
Experimental AG (keel/moi)

Score = (4.848) + (10.122‘U) + (0.181‘G) + (~0.164°(H+I) ) +
(-0.536*L) + (0.098’J) + (0.053'V)

Figure 4.1: Correlation between experimental binding afﬁnity and score values deter-
mined by SLIDE scoring function 61.

 

4.4 Results

Scoring functions 58, 60 and 61 have the highest correlations with binding afﬁnity

afﬁnities. Scoring function 61 was chosen since the standard deviation of its weights

69

Table 4.4: Predicted and experimental binding-afﬁnity linear correlation coefﬁcients
and average weights derived from linear multiple regression. 20 rounds Of training
and testing were performed to best ﬁt the experimental afﬁnities.

 

 

Scoring func. Avg Corr. 0 l3 '7 6 e C r)
1 0.511 -5.972 -0.085
2 0.522 -5.506 0.259
3 0.485 -5.57 11.172
4 0.474 -5.212 -0.109
5 0.339 45.389 -0.254
6 0.324 -6.769 -0.549
7 0.041 -8.51 0.404
8 0.001 ~8.429 ~0.007
9 0.329 -6.917 -0.274

10 0.252 -7.217 -0. 194

11 0.221 ~7.777 10.457

12 0.286 -7.583 -0.765

13 0.104 -8.268 —0. 192

14 0 ~8.857 0.431

15 0.56 -5.485 -0.111

16 0.102 18.293 -0. 132

17 0.415 16.811 -O. 125

18 0.432 -5.716 -0.1

19 0.51 -5.435 -0.053

20 0.593 -5.804 0.752

21 0.542 16.192 -0.097

22 0.404 -6.705 -0.163

23 0.596 -5.509 -0.1 0.242 0.16

24 0.606 -5.529 -0.122 0.246 0.141

25 0.579 «1.535 -0.043 -0.098 0.171

26 0.528 14.75 -0.734 -0.064 0

27 0.526 14.689 -0.828 -0.039 -0. 104

28 0.525 -4.74 -0.791 -0.045 -0.06

29 0.56 -4.323 0.285 -0.206 10.148 10.004 -0.001 0.027
30 0.561 14.36 0.267 10.161 10.119 0.017 0.009
31 0.567 14.335 0.255 -0. 16 -0.114 0.01
32 0.565 14.315 0.292 -0.168 0.002 0.022
33 0.565 14.319 0.293 -0.169 0.023

34 0.551 -4.784 0.219 -0. 145 ~0.014

35 0.55 -4.884 0.231 -0. 169 0

36 0.571 -5.631 -0.144 -0.025 0.124

37 0 58 -5.798 -0.187 -0.105 0.103 0.08

0.573 -5.613 -O.149 -0.025 0.125 0.091
0.56 -4.055 -0.057 -0.206 -0.145 10.006 -0.018 0.017
0.562 -4.085 -0.054 -0.l77 -0. 126 0.008 -0.01
0.568 -4.084 -0.053 -0.175 10.123 10.011
0.565 14.044 -0.058 -0. 165 -0.014 0.012
0.569 -4.053 -0.055 10.148 -0.009
0.565 14.049 -0.058 -0. 164 0.011
0.552 -4.567 -0.044 -0.143 10.021
0.547 -4.722 -0.047 -0. 184 0
0.564 -4.071 -0.057 -0.149 0.011
0.562 14.045 -0.058 -0. 163 0.003 0.009
0.563 -4.061 -0.056 -0. 15 0.011 -0.01
0.562 -4.336 0.255 -0.16 -0.177 -0. 106
0.563 14.089 10.053 -0.175 -0.169 -0.117
0.567 -4.344 0.254 -0.16 -0.112
0.568 -4.094 -0.053 -0.175 -0. 122
0.604 -5. 178 0.727 10.082 -0.085
0.605 -5. l 75 0.743 -0.081 -0.089 0.049
0.611 -5.551 0.977 -0.155 0.031 0.057
0.596 -4. 184 -0.064 0.081 -0. 128 -0.465 0.149
0.628 -5.58 0.839 0.126 10.055 10.399 0.154
0.594 -4.163 -0.064 0.087 -0.13 -0.466 0.147 0.002
0.626 -5.074 -0.111 0.19 ~0.128 -0.505 0.129
0.627 -4.848 -0.l22 0.181 -0.164 -0.536 0.098 0.053

28$838828$SS$$338£6$2888

 

70

 

 

 

 

 

 

 

 

 

160 1 I Old SLIDE Score
. — Linear ﬁt I
140 _ ----------- Upper 95% Prediction Limit
. ------- Lower 95% Prediction Limit
12° ‘ R = 0.485, R2 = 0.235
‘ I ..............
sti- . ...................
a) .
LU
Q
_1
co
'0
6

 

 

 

 

Experimental -AG (kcallmol)

Figure 4.2: Correlation between experimental binding afﬁnity and score values deter-
mined by the previous SLIDE scoring function.

 

71

 

 

 

 

R = 0.631. R2 = 0.400

 

O
l

 

1'»
o
I .

1

DrugScore(x10‘)
s

 

 

 

Predicted afﬁnity by
s
l

 

 

 

 

 

l '4‘"
l u 5 u. ‘1'. I DrugScore
-100- —Linearflt
,5' ----------- Upper 95% Prediction Limit
‘ ------- Lower 95% Prediction Limit
'120 T:’ I ' I . I ' I f I 1 J1

. . . . .
-1s -14 -12 -10 -s -s -4 -2 0
Experimental AG (kcallmol)

Figure 4.3: Correlation between afﬁnity values known experimentally and those de-
termined by DrugScore [12].

 

72

and the regression constant was least among the three (data not shown), indicating
that these weights do not vary much as training set varies. Apart from representing
hydrogen bonds and hydrophobic interactions, function 61 also includes terms for
metal interactions, the entropic cost Of ligand binding as well as penalizing the score
for unsatisﬁed polar or charged atoms buried in the interface.

Figures 4.1 and 4.2 present the correlation Of experimentally known afﬁni-
ties against scores predicted by scoring function 61 and Old-SLIDE scoring function
respectively. Not only scoring function 61 perform much better than Old scoring func-
tion, but it performs comparable to DrugScore [12], whose correlation with known

afﬁnities Of same 269 complexes is presented in Figure 4.3.

4.5 Conclusions

When sampling larger rotations to model large side-chain displacements in SLIDE
is desirable, a suitable scoring function representing favorable as well unfavorable
interactions is needed tO discriminate among the candidate side-chain conformations.
Scoring function 61 developed by our team not only predicts afﬁnities more accurately
than our previous scoring function, but it also performs somewhat better than the
popular knowledge-based scoring function DrugScore and is considerably (lo-fold)
faster. Thus, this scoring function has been employed to evaluate how side-chain
rotamers inﬂuence the afﬁnity Of the protein-ligand complex, and select among the

rotamers, as described in the following chapter.

73

Chapter 5

Guiding Sampling by Score

While rotamers present a mechanism to model side chain flexibility through
a range Of motions, a scoring function is used to determine which speciﬁc rotamers
enhance the interactions over the existing side chain conformation. Two requirements
need to be met. One is tO identify which side chains provide deﬁnite Opportunities
to enhance interactions through rotations after a steric overlap-free docking has been
achieved by SLIDE. Second is to determine which orientations for these side chains
collectively enhance the interactions the most. The work and results described in
the previous two chapters provide tools to address these requirements. While buried
unsatisﬁed groups are typically not Observed in known protein-ligand complexes and
provide deﬁnite interaction-enhancing Opportunities, rotamers provide a reasonable
method to sample larger rotations needed to satisfy hydrogen bonding groups. Using a
scoring function that rewards favorable orientations as well as penalizes unsatisfactory
or unfavorable orientations would help choose from several rotamers for a side chain.

If there are multiple rotamers for multiple side chains which improve interactions
for unsatisﬁed groups buried in the interface, there are exponentially many possible
conﬁguration sets to choose from. If each single bond can sample 3 low-energy x

angles, gauche“, gauche“ and trans, the number Of possible rotamers for a side chain

74

with ‘c’ x angles is 3‘. Further, if there are ‘r’ residues tO sample for rotamers, the
total number Of possible conformations for the ‘r residues is 3". For instance, if there
were four side chains to be Optimized with three x angles each, there would be 312
conﬁgurations to test, which is unfeasible except if many conﬁgurations can be ruled

out by dead-end elimination or other pruning approaches.

5.1 Sampling Choices for Maximizing Score

Thus, generating and evaluating ensembles Of side-chain conﬁgurations for en-
hancing dockings is expensive. TO reduce the conﬁgurational search space, the
number Of rotamers sampled from the rotamer library is limited to only rotamers
with backbone conformation similar to the side chain being sampled and having a
backbone-dependent probability Of at least 5% of that of the highest probability ro-
tamer. Rotamers with probability lower than 5% of the threshold represent either
poorly resolved side-chain conformations or rare geometries which are unlikely to oc-
cur frequently enough tO be considered as backbone dependent rotamers (personal
communication with Roland Dunbrack, developer Of this rotamer library).

Since more than one rotamer could potentially enhance the interactions for a
residue, measured by the change in the score, one could keep the rotamer that in-
creases the score the most with respect to the initial steric overlap-free docked con-
formation of the ligand and protein achieved by SLIDE. However, this may lead tO a
set Of rotamers which may not be compatible together due tO van der Waals overlaps
or which may even decrease the overall score due to repulsive contacts amongst the
repositioned side chains. Moreover, the backbone-dependent probability for a rotamer
may be low when compared to other acceptable rotamers. Alternatively, two or more
rotamers may enhance the score equally, but one may be more likely Of all.

To avoid local minima as well as choose high-probability rotameric conformations

75

for side chains, self-consistent mean-ﬁeld Optimization is used to maximize the chem-
ical complementarity in the interface after the induced-ﬁt docking has been achieved
by SLIDE (see section 3.2). The mean-ﬁeld Optimization is based on multi-copy
sampling. For each residue being Optimized, these multiple copies are generated by
sampling through high-probability rotamers with their backbone conformations same
as the residue’s. Prior to sampling, the induced-ﬁt docked complex is recorded as the
base conformation. This base conformation is restored before any rotamer is sampled
for any residue. This ensures that every rotamer that is modeled for a side chain
is evaluated in the same environment. More importantly, this also ensures that the
rotamers for all side chains are evaluated in the same environment, thus avoiding
any dependence on the order in which side chains are Optimized. As each rotamer
is modeled as a side chain for each residue, any resulting steric overlaps are resolved
through the induced-ﬁt mechanism. Any substituted rotamer causing unresolvable
overlaps is discarded. From these overlap-free rotameric conformations for a side
chain, maximum n are retained based on the magnitude of score improvement. In
our experiments, 12 is 5, though it can be easily scaled to larger number Of the rotamers
used.

With these multiple copies, or rotameric conformations, for each residue being
Optimized, the probability and cost matrices for the mean-ﬁeld Optimization are con-
structed. Each element, P(i, j), Of probability matrix, P, describes the probability
that rotamer j is optimal for residue 2'. TO encourage ﬁnal side-chain conformations to
be highly likely rotamers, each element is assigned the respective backbone-dependent
probability Of the rotamer derived from the rotamer library. For each probability en-
try P(i, j), an initial cost value .5" (i, j) is assigned the value Of score improvement

brought by rotamer j for residue i over the base conformation. In each iteration Of

76

the Optimization, the mean cost for each rotamer is computed as:

At the end Of each cycle, the entries in the probability matrix are updated based
on the mean costs S(i, j) using the Boltzmann principle:

e—S(ilj)/ﬂ

H“) = Zke-swm

where p is the average value of all computed mean costs. In eﬂ'ect, probability for each
rotamer for a residue is reﬁned iteratively by Boltzmann weighting each rotamer by a
term encoding both the score-improvement as well as the rotamericity bias introduced
by the initial backbone-dependent probabilities. Convergence Of the values in the
probability matrix is usually Observed in fewer than ten cycles, and those rotamers
with the highest probability are chosen to model the side chains for the selected
residues.

This class of mean-ﬁeld methods, called self-consistent mean-ﬁeld Optimization,
used to determine side-chain conformations to achieve the global energy minimum (in
our case, the global score minimum) is inspired from [42, 26, 17]. However, the quality
Of solution depends critically on the choice Of conformations for sampling - here,
the rotamers - which serves as the basis for the probability distribution. Similarly,
the probability function, which includes the score improvement as well as rotamer
probabilities, also critically affects the quality Of results.

In Figure 5.1, the ﬂowchart of the algorithm used to select Optimal combination
Of rotamers for a set side chains is presented. A protein-ligand docking free Of van
der Waals overlaps, achieved through induced-ﬁt docking by SLIDE, deﬁnes the base
conformation Of the interface for which rotamers are sampled. After achieving the

base conﬁguration, residues consisting Of buried unsatisﬁed polar groups are identiﬁed

77

by scoring function 61 (see previous chapter. Then, rotamer conformations having
the same at — lb angles are sampled for each Of these residues, keeping top 5 acceptable
rotamers. Acceptable rotamers are those side-chain conformations that are free from
steric collisions with other atoms, besides improving the score. TOp 5 rotamers are
selected based on steric compatibility and maximum score improvements over the
base conﬁguration. Iterative self-consistent mean-ﬁeld Optimization is then started,
with backbone probabilities of the rotamers used as initial probabilities, and score
improvement used as the cost. The goal is converge to high probabilities for rotamers
which together maximize the cost. This Optimization is very similar to the mean-ﬁeld
technique described in section 3.2, except for three differences. One, is that the
score improvement here replaces force. Two, the initial probabilities are backbone
dependent rather than equal probabilities, since keeping side chain conformations
rotameric and appropriate for the backbone conformation is desirable. Finally, no
inter-residue rotamer dependencies are modeled yet.

The minimization yields an Optimal set Of rotamers with respect to rotamer
probabilities and score improvement, one per residue, which together maximize the
overall score. These chosen rotamers are transformed into the binding site and the
entire docked complex is again checked and overlap resolution is performed by SLIDE
to remove any new van der Waals overlaps. Since induced-ﬁt motions are small in
nature, it is expected that very few key interactions would be lost during the induced-
ﬁt collision resolution. In fact, none were Observed in our earlier studies Of how
overlap resolution inﬂuences scoring. If the collisions can be resolved and the ﬁnal
score is better than the pre-rotamer-search score, the new conformation is recorded.
Otherwise, the pre-rotamer search conﬁguration is restored and SLIDE moves to
generate the next docking. In future, this can be modiﬁed to explore the next-best

set Of rotamers that are compatible while enhancing the score.

78

 

 

1. Generate an overlap-free docking from ligand-free protein structure and ligand. Record these protein,
ligand docking conformations as base conformations and their score as base score.

2. Use scoring function to determine protein side chains with interfacial, unsatisﬁed polar atoms.
3. if no unsatisﬁed residues exist then
I Exit rotamer samming. Write the docking ﬁles. Continue to generate and score next binding mode.
also
Loop over the unsatisﬁed residues.

if currently considered residue is N- or C-terminal then
| Skip this residue, since it or it: cannot be calculated. Continue sampling rotamers for next residue.

else
a. Determine 05,11: for the residue. Determine the 10°- spaced 4: - rb bin in the rotamer library in

which these values of rim/I fall.

b. LOOp over the rotamers in this ¢ — w bin.

c. Restore protein, ligand to their base conformations before sampling a rotamer.

d. Transform the rotamer into the binding site in place Of original side-chain conformation.

if steric overlaps are found them
Try to resolve steric overlaps through SLIDE’s induced-ﬁt collision resolution.

if overlaps cannot be resolved then
| Discard this rotamer, continue by sampling the next rotamer.

else
Evaluate score of the entire docking.

If score is worse compared to base score than
I Discard this rotamer, continue by sampling the next rotamer.
also
record current conformation of side chain, keeping at most top 5 score-enhancing
conformations for this residue.
and

 

end

else

Evaluate score of the entire docking.

if score is worse compared to base score than

I Discard this rotamer, continue by sampling the next rotamer.

else
Record current conformation of side chain, keeping at most tOp 5 score-enhancing
conformations for this residue.

and

 

and

 

end

if more than one unsatisﬁed residue then

a. By performing mean-ﬁeld optimization on recorded candidate rotameric conformations for the
unsatisﬁed residues, identify the rotamers, one per residue, that together maximise the score.

b. Transform these rotamers onto their residues in the interface, replacing their side chains.

c. if steric overlaps are found then

i. Try to resolve steric overlaps through SLIDE’s induced-ﬁt collision resolution.

ii. If overlaps cannot be resolved, restore the protein, ligand base conformations, write the
docking ﬁles and continue to generate next binding mode.

also
i. Evaluate score of the entire docking.

ii. If score is worse compared base score, restore the protein, ligand base conformations.

 

and
d. Write docking ﬁles and continue to generate next binding mode.

 

also
I Identify the best-scoring rotamer conformation the single unsatisﬁed residue.

and
Substitute the most score-enhancing set of rotamer(s). If steric overlaps are found, attempt to resolve

them; if not possible, revert to pro-rotamer search conformation.

 

end

 

 

Algorithm 1: Pseudo-code for rotamer sampling for unsatisﬁed side chains and mean-ﬁeld
Optimization for maximizing score. Flowchart Of implementation is presented in Figure 5.1.

79

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Any
4 a” Heeolveany
f Yes coll“? stericcollieions
Generateanewbinding
modefortheligandthrough
geornetrlcandchemical
trienglematching
ForeechreelweJubetituto
l itsoriginalside-chein
AchieveavanderWaaIs No conforrnation'withmo
ovedap-freebindingmodetor conformationotttshighest
theiigand'sanchortragment L_Reoordthedocldng A WWW
InafiIe
I
Preparealistofwhoseside-chain
“W needoptimization ““09 Perlormecore-improvermntbaeed
the scoring function mean field minimization‘ involving
top'n'rotarnersforemlored
£ residues.
heretoforeach
reeldueintheliat
Picknext residue v“
fromthislst ""
Yes
”mind? ' r HIM-ll
Y“ ‘ rotamersbeen
sampled?
No
* Yes
Calculate ml.“ anabofortheresidue. tfneodedwecordthieoverlm-
Using phi-psi .andtheresiduetype. freerotarneric side-chain
index Into appropriate phi-psi bin of the conformation among the top
Dunbrack'sbackbonedependent 'n'rotamericcontormatione
rotamerllbrary. formic II
Q
1 Yes Y“
Seisctnextrotemerfromthisbin ‘
5:! 11 :'111' t5:'t:ll A 1L l
in who met j. No Didscore
ty inprovereiatlveto
pro—rotamer
1 score?
Transformtherotamerinto
target'sreferenoetrame.
we: U Scoreoorrplexagainwith
overture using the SLIDE's
lrrdr.rced-fltmechanlsm WWW
oontormation
Doesligandhas No]

 

stericoverlepe?

 

Figure 5.1: Flowchart of algorithm used to guide selection Of a rotamer from a pOOl
Of rotamers. Pseudo-code is presented in algorithm 1.

 

80

5.2 Methods

Experiments involved 30 pairs Of ligand-free and ligand-bound crystal structures
speciﬁed in Table 2.1. TO be able to compare a docked ligand’s RMSD from its crystal
complex conformation and orientations, the ligand-free protein backbone structure.
The 30 proteins were chosen tO have a backbone RMSD less than 1.0 A, so that side-
chain motion would not be required to compensate for incorrect backbone positions.

After superposition, the binding site in the ligand-free structure was initially
identiﬁed as the set Of all residues having any atom within 9.0 A Of the ligand.
Corresponding to this set Of ligand-free residues, ligand-bound residues were identiﬁed
as the binding site Of the ligand-bound protein structure.

Each ligand was then separated from the ligand-bound complex. Hydrogen atoms
were added to the ligand structures by the AMIBCC [18] program to determine partial
charges and add protons as needed to polar atoms of the ligand. A biased template
representation of the active site generated by SLIDE to represent strategic points
of potential interactions between protein and this ligand was generated using the
ligand conformation from ligand-bound structure [51]. This ensures that most of the
dockings would place ligand in its correct position, so that side-chain conformations
Of the docked complexes could then be compared with side-chain conformations in
the ligand-bound crystal complex.

Scoring function #61 was implemented in both current version Of SLIDE (re-
ferred to as Old in ﬁgures) as well as the version with rotamer sampling (re-
ferred to as new). Default SLIDE parameters were used for docking most of the
known ligands into respective apo protein binding sites[36]. However, the param-
eter SIDE-CHAIN_OVERLAP was relaxed from 0.3 to 0.5 A, and parameter IN-
TRA-OVERLAP was relaxed from 0.1 to 0.2 A to allow ligand dockings for struc-
tures lpoa, 1apm, 3cox, llib. Water molecules were not included at any stage Of the

docking or analysis.

81

Of the 30 structures experimented, no successful dockings were generated by
either SLIDE version due to excessive steric overlaps for structures llib, 3on1, 1udg,
1ca2, lxib, and 2ctv. The predominant reason was that the ligand anchor fragment
had unresolvable van der Waals overlaps with the protein main chain. Since dockings
were not generated by either SLIDE version, these cases were not pursued further, and
will require other reﬁnements in ligand or main-chain ﬂexibility modeling in SLIDE.

TO observe the effects Of rotamer sampling, dockings generated by different
SLIDE versions need to be compared. Only those dockings that matched the same
anchor fragment Of the ligand to the same template-point triangle and resulted in
an accurate docking were compared. Comparing such dockings, henceforth called
common dockings, ensures fair evaluation of side-chain modeling performance across
SLIDE versions. Since new SLIDE version, unlike the Old version, Optimizes side-
chain conformations for chemical complementarity, the side-chain conformations in a
common docking generated by new SLIDE may be different than those in the cor-
responding docking by Old SLIDE, even though the ligand orientation in both the

dockings are the same.

5.3 Results

This section presents results from the score-Optimizing mean-ﬁeld method im-
plemented in SLIDE for sampling and substituting rotamers in the protein-ligand
interface. To analyze the performance Of the new SLIDE version, there were some

speciﬁc measures Of success:
0 Chemical Improvements

1. Does the score improve for dockings generated by new SLIDE as compared
to the same dockings by old SLIDE ?

This provides insight into the effectiveness Of rotamer sampling and subse-

82

quent mean-ﬁeld Optimization to choose a chemically Optimal rotamer set
for residues with unsatisﬁed hydrogen-bond potential. In Figure 5.2, we
present the distribution Of improvement in score for all common dockings,
generated by both Old and new SLIDE versions for the 24 protein-ligand
complexes. New SLIDE samples rotamers for unsatisﬁed side chains and
Optimizes rotamer choices for maximizing score improvement. The score
improvement for each common docking is deﬁned as (score Of docking by
new SLIDE - score Of identical docking by Old SLIDE). As shown, for
more than 90% Of almost 2500 common dockings, the score improves. The
inset graph, provided tO show the same distribution while excluding the
most frequent score-improvement bin Of 0 — 0.5, shows that bigger score
improvements (greater than 0.5) are also signiﬁcant in number. Note that
the score improvement is an estimation Of improvement in AGh-ndgng, since
the new scoring function # 61 was developed to predict protein-ligand
binding afﬁnities. Through Figure 5.2, we summarize that by combining
rotamer-sampling for unsatisﬁed polar groups and subsequent score-based
mean-ﬁeld maximization, we are able to enhance the score for most Of the

dockings generated by the new version Of SLIDE.

. Does the score improve for the best docking by new SLIDE as compared to
best docking by old SLIDE ?

While rotamer sampling and score maximization was effective for most Of
the dockings, the typical usage Of SLIDE aims at identifying best dockings
from a multitude Of binding modes generated for each ligand. When dock-
ing known ligands, as in this research work, the best docking, involving the
ligand-free conformation Of the protein’s active site docked with the known
ligand, implies the docking that has the ligand position and conformation

closest to that in the ligand-bound crystal structure. Focusing on the best

83

 

 

2000
Score-improvement distribution, excluding interval [00.5]

 

      

   
 

       

  
  

 

 

     

 

 

M; ,r

im‘ in

£12001 5 150.

§....- .

"5 800: gloo-

r . , 1 1

gm' ” Ns\\.

in: r ....... §§§§§m

mi msmmmmmmm

 

 

 

I 1 I
-2 0 2 4 6 8 10
Score difference between identical ligand
orientations docked by new and old SLIDE (new - Old)

Figure 5.2: Distribution Of change in score across all dockings with triangle matches
in common, generated by both Old and new SLIDE versions. Both Old and new SLIDE
use scoring function #61. The common triangle matches ensure ligand is in the same
orientation in dockings by both Old and new SLIDE versions. However, side-chain
positions may differ, since the new SLIDE samples rotamers for unsatisﬁed side chains
and Optimizes rotamer choices for score improvement. Score improvement for each
common docking is deﬁned as (score of docking by new SLIDE - score Of docking with
same orientation by Old SLIDE). Inset shows the same distribution with expanded
y-axis scale, omitting the largest score-improvement bin.

 

84

docking is needed since the effectiveness of side-chain Optimization can be
evaluated fairly only with dockings that have the ligand in the correct pO-
sition as well as conformation. In Figure 5.3, we present scores Of the best
dockings achieved by Old and new SLIDE versions, comparing against the
score evaluated for the crystal complex itself. In 50% Of the best dock-
ings generated by new SLIDE (12 out Of 24 cases), the scores improved as
compared to the best dockings generated by Old SLIDE, and in no case
was it signiﬁcantly worse. In about one-third Of the cases, the new SLIDE
dockings scored better than the crystal complex, which indicates that the

scoring function still requires improvement.

3. Does the number of unsatisfied polar atoms decrease in best dockings gen-
erated by new SLIDE as compared to old SLIDE .9
Since the rotamer sampling was restricted to only those residues that had
unsatisﬁed polar groups buried in the interface, we compared the number
Of such groups remaining in the best dockings by both Old and new SLIDE
in Figure 5.4. Restricting analysis tO the best dockings, for the reasons
cited above, we Observe that as compared to best dockings generated by
Old SLIDE, the number of unsatisﬁed polar groups buried in the interface in
new SLIDE decreased in close to 50% (11 out Of 24 cases) of the cases. The
decrease in number Of unsatisﬁed polar groups varied from 3 to 1. Another
encouraging fact is that the criterion for detecting candidate residues for
rotamer sampling could easily be expanded to include repulsive contacts

and exposed hydrophobic groups, as well as unsatisﬁed polar groups.

0 Conformational Improvements

While the score-guided rotamer sampling aims to maximize the side-chain
interactions in the binding interface, another criterion to measure the ef-

fectiveness of rotamer sampling is how closely the sampled side-chain con-

85

 

Comparison Of scores Of top dockings from old and new SLIDE
with score of the crystal complex

 

 

 

 

 

 

 

 

 

14 1 —I— Crystal complex
. --o-- Old SLIDE's top docking
13 - A---- New SLIDE's top docking
u]
12-
E 11-
* .
. - ...A
g 10 O AHA 9 __
a ' 3 -' .
v 9.. l“; 59/“:
5 ' l’ --—-1-t.. ~. . .
(D 8 " I, 1’ ’84“. 5:7: ‘. J A "i
‘/-.'I ‘ :f O l I'
5 1 I41"7 4'" 81A“? 11'
«t 7- 44°C t4 0 ii
. '/..0 x 5: A
I" ‘\.4
6- 6 0‘ ;
.1 Of— \0'
5-
IIIIIIIIIIIIIITIIIIIIII
.92 EXOE‘E .0ng 5 me& a
£s§'§ése§?§2§g§§a§§§2§§a
Structure

Figure 5.3: Comparison Of scores Of best dockings by Old and new SLIDE versions
for each protein, as well as scores evaluated for the corresponding crystal complexes.
The best docking is identiﬁed using the lowest RMSD docked ligand conformation
from the ligand’s crystal-complex conformation. Scoring function #61 was used to
score the dockings after they were recorded by Old and new versions Of SLIDE.

 

86

 

Comparison of the number Of unsatisﬁed target atoms
for best docking by old and new SLIDE versions

 

 

    

-------~A Old SLIDE
_V_ New SLIDE

  

Number Of interfacial unsatisfied polar side-chain atoms

 

 

 

 

Figure 5.4: Comparison Of the number Of interfacial unsatisﬁed polar groups in the
best dockings generated by Old and new SLIDE versions. New SLIDE sampled ro-
tamers for only those residues which had at least one interfacial polar atom unsatisﬁed.

 

87

formation approaches the to ligand-bound conformation upon docking. Is
new SLIDE able to model motions, both in magnitude and direction, that
are similar to the motions from ligand-free to ligand-bound conformations
found in crystal complexes ? This study was organized to answer the

following key questions:

. Does new SLIDE correctly detect the set of side chains that undergo large

motions from ligand-free to ligand-bound conformations .9

A large motion is deﬁned here as side-chain’s positional displacement, mea-
sured as RMSD between ligand-free and ligand-bound conformations, Of at
least 0.5 A . While chapter 3 focused on large rotations, here focus is on
large displacements since evaluation Of docking performance is measured

using positional deviation measures like RMSD.

In Figures 5.5, 5.6, and 5.7, all side chains are presented that were either
moved by SLIDE or underwent a large motion from ligand-free to ligand-
bound conformation, across best dockings. This set Of side chains was
selected to compare any side-chain motion by any SLIDE version relative
to any large interfacial side-chain motions observed from the ligand-free
and ligand-bound crystal structures. There were 93 individual side-chain
cases. Figure 5.5 presents side chains moved to resolve collisions. As
expected, new SLIDE moves almost the same set Of side chains (52 in
number) as Old SLIDE does (55). Since collision resolution and induced-
ﬁt development are handled the same as before, therefore Old and new
SLIDE versions perform almost identically.The few differences that arise
between motions by Old and new SLIDE versions are the cases where a
side chain that was rotated tO resolve a steric overlap was also detected

having unsatisﬁed hydrogen-bond potential by the new SLIDE, as in the

88

.188 2: ﬁg. 8:853 osoaoa 8 sea 3 858.8 same. a 858588 assess
2: .8388 ~38 on... 5 685 .8ch83 a 9:8 .8th 38.3.3 05 83c: .35 8388.88 a 828 33385 3 8 some
55 new 8888 8388 85 823 .mn—Em Be: on» .3 8388: 8 888.83% 83 8&8 _883 5.888 some on» x_adonvoansm
803:8 s 2:88 3 meqm Bod was Bo £3 .3 858 83 833 .3~b§..3m~ 838 on» 38.88 .o. 8:23 8388 88.028
one 52588.8 558-188 2: son 5:8 588588 55552.. 88-558: 2: 858 88 as cases as. 8:88
a 233 8288.888 vasonﬁnsg 2: 3 Homo—u 8.58 52m sounds—om: 2: 858 meqm 8888 $88» so am: 8c88oEmmc
mo 88 03:88 < .mn—Sm Bo: 98 So 3 83.88» .8388 388 one 8 83:88 2: 3 538—8 Gaza 88m: .3 888.88%
amen—coo Son 3 8888.“ E 888888 23. .8058888 enconénuwm 3 8b68m: 88m 8338 .«c 83.8 :3? maﬁa Ben was
So 8 8588282. 5.838: on.» 83388 83:8 83 8e 83 maize 88.8.“an 8338 88.8% .«o 88.38800 "mat. 88mm

 

 

 

 

 

 

 

 

 

 

Q/ In.
.1
.sva
em;
arm
..~.0mm
83382325838888 ..".m L
.......F.............
.. . . .
.............::.:.:. r
..... ............,..........,........._..._.....,......._......_:c w
.. ..:.. .. .. .._ ...._.. .. ... ...‘.
LEI? .:.:;_ : _ :::Z::_ :.. m
:3 . L“. _ . .. u. Li...
_ . .. “u _“ ...m...m
. “_ ._ _. “.mmua
“ u 7me
n “ “com
moascasssgeazm' “ n u . w w
333559885558sz “ n n 4
sags. agaaoecmzmg . .
a

§8§§s§§_§§§§assaasua§as5355§o§
23888328838332.8353;

89

 

Side-chain motions in new SLIDE for satisfying buried unsatisﬁed groups
compared against motions from ligand-free to ligand-bound oonfon'nations

 

 

RMSD of crystal complex conformation
- RMSD of top-docking by new SLIDE

 

 

 

Towards
Lig-bound Con?

 

 

RMSD from lig-free conformation (A)

 

 

 

Side chains moved to satisfy buried unsatisﬁed polar groups

Figure 5.6: Side-chain motions performed in new SLIDE to optimize interactions
in the same 24 complexes. For side chains with unsatisﬁed buried polar groups,
rotamers were sampled and steric overlap-free, score-maximizing conformations were
selected. As explained earlier, positive displacement indicates that the motion moved
the side-chain conformation closer to the ligand-bound conformation from the ligand-
free conformation; negative displacement indicates the opposite.

 

90

case of 1gta-Tyr104, which was subsequently optimized by new SLIDE for

better interactions.

The new SLIDE also moves side chains through rotamer sampling for un-
satisﬁed interfacial groups after the induced-ﬁt docking. In eight cases,
where side chains were moved only for optimizing interactions, are pre-
sented in Figure 5.6. While many more side chains were detected and
sampled for better rotamers (data not shown for brevity), if those rotamers
caused unresolvable collisions, they were reverted to the initial state and

re-sampled.

Another set of 30 large-motion cases presented in Figure 5.7 remained un-
detected by both SLIDE versions; new SLIDE did not detect them because
they did not involve unsatisﬁed polar groups. This presents an opportunity
for investigating into other causes for large motions. Based on molecular
graphics inspections of these complexes, possible reasons include the form-
ing of n—cation or aromatic interactions, large main-chain induced motions
or even high B—factor values contributing to imprecise side—chain atom po-
sitions.

New SLIDE identiﬁed similar side chains as old SLIDE (primarily to de-
velop protein-ligand induced ﬁt), and was also able to identify some of
the large-motion cases which old SLIDE did not detect. Some additional,
experimentally observed large motions may be detected if the candidate
side chain selection criterion were expanded to consider groups besides un-
satisﬁed polar side chains, including side chains having repulsive contacts

or with solvent-exposed hydrophobic regions.

. Does rotamer sampling in new SLIDE model side-chain motions similar to

motions from ligand-ﬂee to ligand-bound conformations ?

While identifying the correct set of side-chains for rotamer sampling is

91

 

Side-chain motions undetected by old nor by new SLIDE versions,
but observed from ligand-free and ligand-bound conformations

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

RMSD of crystal complex conformation
3.0+ ;
F
a I
2 I
W2.5-- I .
8 5 I ‘
s - : I :
E I 1 II
.2 2'0“ I : 3,
§ .-F : I _ :
Mr I i I ,I
E I“ I ; .I
..,_1.5-,; 1 “I l 'I "
g if F j. ’ 'I :.
E ‘5’ I ”I I '5 "’
MM :5:: III
0 .2: I fIII! II :l::
(I) II I III ,I III
2 .,, I III ,I III
a: II : ::: ,I III
0.5-:: I:IIII,3 ill
II I .rII‘ Ij III
.;; 1 Hi I. l':
“j IHII II 'l:|
0'0"" lllll 1T "" ' ““
agasgaagsassessasagssassﬁzasaa
'I'I'I I'I'I Ii' ' 'a' 'a':' ' ' ' I'I "I'I'Ia ' ' ' "I
El
iazzaiaaa,iaiaqiiigiiraaIiiita
3;- iis:,§=«§§§-§§§ ”am-5..-,
Side chains moved from ligand-free to ligand-bound conformations

Figure 5.7: Large natural side-chain motions undetected by old or new SLIDE ver-
sions. These side-chains appear to have undergone large displacements for reasons
other than inter-atomic collisions or to satisfy unsatisﬁed atoms. Some reasons ob-
served in molecular graphics inspection of the complexes relative to ligand-free con-
formations, include achieving n-cation or aromatic interactions, large main-chain in-
duced motions, even high B-factor values contributing to imprecise side-chain atom
positions.

 

92

crucial, it is also important to compare the motions of these side chains
in the crystal structures against the motions modeled by SLIDE. Two
aspects of side-chain motions were analyzed - displacement and direction

of the motion.

0 Displacement Magnitude

Does rotamer sampling achieve motions of similar magnitude as ob-
served between ligand-free and ligand-bound crystal structures ?

The RMSD between ligand-free and ligand-bound or docked, side-
chain conformation is taken as the measure of side-chain displace-
ment. For induced-ﬁt side-chain motions presented in Figure 5.5,
while displacement magnitudes are similar for old and new SLIDE
versions, these magnitudes do not match those from crystal struc-
tures consistently; experimentally observed displacements vary more
in magnitude. For score-enhancing rotamer sampling for unsatisﬁed
side chains, again the side-chain displacements achieved by SLIDE do
not agree with displacements from crystal structures Investigations of
the large displacements achieved through rotamer substitutions in Fig-
ure 5.6 show that most of the cases are tyrosine residues which were
detected as unsatisﬁed after induced-ﬁt docking. Rotamers helped
satisfy the hydrogen-bond potential of the hydroxyl oxygen. While
scores improved due to improved interactions, sampling also moved
the tyrosine side chains further away from their true ligand-bound
conformations.

o Direction of Motion

Does rotamer sampling model side-chain motion closer to the ligand-
bound conformation, as observed from comparing ligand-free and

ligand-bound crystal structures .9

93

More importantly, through Figures 5.5,and 5.6, we learn that side
chains often were moved further from their crystal structure positions.
The motion was deduced to be more correct if the RMSD between
docked and ligand-bound side-chain conformations was less than the
RMSD between ligand-free and ligand-bound side-chain conforma-
tions; otherwise the motion was considered incorrect. For induced-ﬁt
docking, the smallest angle of the available angles is usually taken to
resolve a collision, as explained in Figure 3.4. However, the choice of
the smallest to minimize conformational effort while resolving the col-
lisions often leads to angular choices that move side chains away from
the desired position, as observed in Figure 5.5. In Figure 5.6 too, most
of the side chains were positioned further from their ligand-bound po-

sitions, though the score improved by satisfying the unsatisﬁed polar

groups.

5.4 Analysis

An interesting observation is that while new SLIDE successfully enhances scores
across all dockings, and often for the best dockings too, many of its motions, whether
driven by collision-resolution or score-maximization, move side chains through larger
than necessary distances. There are, in effect, many more ways of moving side chains
incorrectly than correctly, which leads to the observed results in Figures 5.5 and 5.6.
The main reason is that the unsatisﬁed side chains were moved too much.

Reasons why other rotamers were rejected were investigated for each of the sam-
pled rotamer for any unsatisﬁed side chains in the 24 structures. Each sampled
rotamer is denoted as a ﬁlled circle in Figure 5.8, with its color conveying whether

the rotamer was selected or rejected due to steric overlaps or decrease in score. As

94

various rotamers were substituted and collision resolution was attempted, rotamers
were rejected either due to decrease in score or unresolvable collisions. The base con-
formation for each rotamer sampling represents the collision free state of the docked
complex before the start of rotamer sampling.

Valuable insights can be drawn which help explain score improvement despite

the increase in RMSD relative to the ligand-bound conformation:

o A is an example, in Figure 5.8, of side-chain rotamer samplings where none of
the rotamers, be they closer or further relative to ligand-bound conformation,
could achieve an overlap free conformation. Nature may resolve such cases with

through more complex motions.

o B is an example of a side chain where the base conformation ‘b’ was far from
the ligand-bound conformation, and a rotamer was found that was closer to
the ligand-bound conformation. However, the score improvement for the most
correctly positioned rotamer was equivalent to the score improvements for other
overlap-free rotamers that were further from ligand-bound conformation. The
ﬁnal rotamer choice is based on the maximum probability after score-based
mean-ﬁeld optimization, which depends on both the rotamer probability and
the score improvement. The higher-probability rotamer happened to be less

native-like.

a C is an example of side-chain cases where a score-improving rotamer close to the
ligand-bound conformation is sampled, but rejected due to unresolvable steric
overlaps. This can happen in SLIDE’s minimal rotation paradigm, since a small
rotation for any of the single bonds may cause difﬁcult-to-resolve new collisions
in a tightly packed interface, while nature may resolve such collisions by a slight

backbone motion.

a D is an example of side chain case where scoring function evaluated the same

95

 

 

, _ 0 Rotamer sam Ie
Rotamer chouces for unsatisﬁed Label Legend?
side chains based on score improvement ” b - base conf. before rotamer sampling
3 - rotamer rejected due to steric clashes

 

 

 

 

 

   
 

 

 

' - rotamer was steric overlap free
j 3%," .3. .- .1. $1 .- . Lower score
— 0' a" 0' h .-
2 _ .b . .- .- .- .I-
g _ ,. .s' 1 ,, -o.7oo
_ -b0‘ .s' e '
‘8 - -b °' '6 's -s l -o.4oo
a - Q' o' 0' e' a" 0" D i l
*: ?b-ss-sossms as as i !-0.100
B - ob ﬁrsts-s Os ‘ '
a - .b .s- » ! loose 2
— e' e'
8 _ .. .r ’5’. «99' -0.006 §
._ - ab 0' 0““ 08' e' d" o' o' a
In _ ob ,. .. .. .. o ‘8
5 - C Ob 0' d'
3 - \ os os ~s - Is - 0.003 g
.3 _ .0 .
—I d) 0' 0'”
'8 _ cbs .. -s .95 0.500
ﬁ - A —> -b us-s-s ~96 G -s -s
a - ab .0 us mos «s «as 05 030°
5 — .gb .‘ e' a' a" J'e' "BR 1.1%
_ 0% e'
' 05 es .Qj 1.400
.. 0' \B/ I“)
. Higher score

I I l I I l I I
-1.0 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0
Side-chain RMSD relative to ligand-bound conformation (A)

Figure 5.8: This image is presented in color. Selection or rejection reasons for
each of the sampled rotamers for the unsatisﬁed side chains in the best dockings of
the 24 structures used in experiments. X-axis represents the RMSD between the
docked side-chain conformation and the ligand-bound conformation. Y-axis entries
are the cases of unsatisﬁed side chains; their names have been omitted for clarity.
For each side chain, the sampled rotamers are represented as ﬁlled circles. Symbol
‘b’ represents side-chain conformation after the induced-ﬁt docking and before the
rotamer sampling, ‘3’ indicates the rotamer was rejected due to unresolvable steric
overlaps, while ‘*’ means that steric overlaps, if any, were resolved. The color of
the ﬁlled circle represents the change in score - from brown towards blue indicates
the rotameric conformation enhanced the score compared to starting or the base
conformation ‘b’, while brown towards red depicts a decrease in score.

 

96

score for each overlap-free rotamer. In such cases, the likelihood of a rotamer
being chosen depends on the backbone probability used in the score—based mean-
ﬁeld maximization. A rotamer further from the ligand-bound conformation may

be chosen due to having a high probability.

5.5 Conclusion

One of the most important observation involves “B” cases. Representing the
side-chain conformation after preliminary induced-ﬁt docking, these conformations
are already very close, within 1.0 A, of the ligand—bound conformations in most of
the cases. This is observed to be true for almost all of the unsatisﬁed polar side chains.
This implies that while rotamers help sample a larger 3D space, instead a local 3D
sampling of rotations should ﬁrst be explored to satisfy unsatisﬁed polar groups, as
this approach is more likely to stay close to the true ligand-bound conformation.
This also conveys that nature too may resist big rotameric shifts in conformation,
since that involves overcoming high strain energy barriers due for the rotamer tran-
sitions, and substantial repacking of the interface. Hence though rotamer sampling
helped improve scores and hydrogen-bond interactions, across a substantial number
of dockings, it was observed that side-chain conformation prediction actually wors-
ened due to ambiguity in choice of the correct rotamer given that several rotamers
could result in the same score improvement. Furthermore, even the most positionally
correct rotamer was often further from the ligand-bound position than was the initial,
collision-resolved side chain. The metrics for choosing a rotamer were to maximize
the interactions as measured by the new scoring function, and to choose rotamers
that are observed the most frequently in nature. There can be no a priori knowledge
within rotamer sampling or optimization to recognize which side-chain conformations

are close to ligand-bound conformations. Recognizing that the target side—chain con-

97

formations are typically within 1 A RMSD of the initial position, ﬁner and more local

conformational sampling may be all that is required.

98

Chapter 6

Summary and Future Directions

Even though enhancing protein side-chain interactions remains the goal of this
work, the ultimate goal of docking tools is to model motions to predict crystal complex
conﬁguration using apo protein conformation and unbound ligands. Towards these

ends, our key contributions have been:

0 Discovering that the intra—protein hydrogen bond network in the binding site
largely remains conserved upon ligand binding. Most of the rearrangements in
the hydrogen bond network increase the number of hydrogen bonds or serve
as a ligand-recognition mechanism by replacing protein-water hydrogen bonds

with protein-ligand hydrogen bonds.

0 Observing, and corroborating, that SLIDE’s paradigm of minimal rotations
for induced-ﬁt docking[50] performs well for preserving most of the hydrogen
bond network. However, favoring minimal rotations misses 15% of the cases
where larger motions are needed to improve docking chemistry, as found from

comparing ligand-free and ligand-bound crystal structures.

0 Analyzing ligand-free and ligand-bound crystal structures for the reasons be-

hind large—angle rotations in side chains upon ligand binding. For half of the

99

cases, large rotations aided in satisfying buried polar groups which would have
remained unpaired in the protein-ligand interface if they had remained in their
ligand-free conformations. The current criteria for selecting side chains for
conformational optimization - steric complementarity and satisfying unpaired
hydrogen-bond donors and acceptors - could be easily expanded to include other
cases of suboptimal chemical complementarity like side chains with repulsive

contacts or side chains with solvent exposed hydrophobic groups.

0 Implementing in SLIDE an infrastructure for modeling these larger motions

through the use of rotamer libraries.

0 Determining that rotamer-based sampling typically moves side chains too much,
and it can be difﬁcult to determine the correct rotamer when several rotamer
choices improve the protein-ligand complementarity score equivalently. In gen-
eral, the induced-ﬁt docked conformation proved to be closer to the ligand-
bound conformation than were any of the choices in the rotamer library. Thus,
local rotational sampling (going beyond minimal rotations for removing van der
Waals collisions) is likely to be a better approach for repositioning interfacial

side chains to satisfy their hydrogen-bonding potential.

6.1 Future Directions

Future work can build upon the insights and algorithmic and geometric frame-

work that has resulted from this work. Speciﬁcally:

0 Local, directed rotations. Akin to directed rotations in induced-ﬁt to resolve
collisions, local rotations can be used to explore potential hydrogen bonds and

salt bridges with nearby protein and ligand atoms.

100

e Ligand ﬂexibility. While optimizing side-chain placement has been the focus
in this work, the ligand’s rotatable bonds present equally probable opportuni-
ties to enhance binding-site chemical complementarity. Exploring motions in
the ligand in balance with those in the protein would also help model ligand

ﬂexibility in the docking process.

0 Unbiased protein template. While we conducted experiments using an active-
site template derived from ligands known to bind the protein in nature, SLIDE’s
typical use is to ﬁnd new potential ligands. Hence unbiased templates (which
include no knowledge of known ligands’ interactions) should also be used to

validate performance improvements from such local sampling.

0 Residue selection criteria. As mentioned before, about half of the large side-
chain motions in crystal structures were not detected by the current criteria to
select residues for ﬂexible modeling. These motions can be investigated further

to learn the reasons for their motions and deﬁne criteria to identify such residues.

0 Handling water molecules. Experiments in this work did not consider water
molecules during the docking process. Including interfacial conserved waters
should help in correctly identifying if side chains have unsatisﬁed polar groups
or not. This should improve results by avoiding search for better geometries of
polar side chains already satisﬁed by water, as well as recognizing water-exposed

hydrophobic side chains in the protein-ligand interface.

0 Scoring function discrimination between alternative conformations. The current
scoring function was trained on crystal complexes with known binding afﬁni-
ties. These complexes are unlikely to offer many cases with repulsive contacts
or unsatisﬁed polar atoms left due to rotations or water displacements, even
though such scenarios are bound to occur frequently during docking. A scoring

function, trained on data that includes favorable as well as unfavorable con-

101

tacts may better differentiate good conformations from bad ones in the same

neighborhood.

With these thoughts on future opportunities, we would like to conclude this
thesis. This work is still in progress, and we are excited to pursue the above directions

and continue working towards improving state of the art in protein-ligand docking.

102

BIBLIOGRAPHY

[1] Adrian A. Canutescu, Andrew A. Shelenkov, and Roland L. Dunbrack Jr. A
graph-theory algorithm for rapid protein side-chain prediction. Protein Science,
12:2001—2014, 2003.

[2] Heather A. Carlson, Kevin M. Musukawa, Kathleen Rubins, Fredric D. Bush-
man, William L. Jorgensen, Robert D. Lins, James M. Briggs, and J. Andrew
McCammon. Developing a Dynamic Pharamacophore Model for HIV -1 Integrase.
Journal of Medicinal Chemistry, 43:2100—2114, 2000.

[3] H. Claussen, C. Buning, M. Rarey, and T. Lengauer. FlexE: Efﬁcient molecular
docking considering protein structure variations. Journal of Molecular Biology,
308(2):377—395, 2001.

[4] Bassil I. Dahiyat, D. Benjamin Gordon, and Stephen L. Mayo. Automated design
of the surface positions of protein helices. Protein Science, 6:1333—1337, 1997.

[5] R. L. DesJarlais, R. P. Sheridan, G. L. Seibel, J. S. Dixon, 1. D. Kuntz, and
R. Venkataraghavan. Using shape complementarity as an initial screen in de-
signing ligands for a receptor binding site of known three-dimensional structure.
Journal of Medicinal Chemistry, 31(4):722—729, 1988.

[6] Johan Desmet, Marc De Maeyer, and Ignace Lasters. The Dead-End Elimination
Theorem: A New Approach to the Side-Chain Packing Problem. The Protein
Folding Problem and Tertiary Structure Prediction, Birkhauser, pages 307—338,
1994.

[7] I. D.Kuntz, J. M. Blaney, S. J. Oatley, R. Langridge, and T. E. Ferrin. A geo-
metric approach to macromolecule—ligand interactions. Journal of Computational
Chemistry, 19:269—288, 1982.

[8] Matthew D. Eldridge, Christopher W. Murray, Timothy R. Anton, Gaia V.
Paolini, and Roger P. Mee. Empirical scoring functions: I. The development of
a fast empirical scoring function to estimate the binding afﬁnity of ligands in
receptor complexes. Journal of Computer-Aided Molecular Design, 11:425—445,
1997.

[9] T. Ewing and I. D. Kuntz. Critical evaluation of search algortihms for automated
molecular docking and database screening. Journal of Computational Chemistry,
18:1175—1189, 1997.

[10] D. Fischer, S. L. Lin, H. L. Wolfson, and R. Nussinov. A geometry based suite of
molecular docking processes. Journal of Molecular Biology, 248:459—477, 1995.

103

[11] D. K. Gehlhaar, G. M. Verkhivker, P. A. chto, C. J. Sherman, D. B. Fogel,
L. J. Fogel, and S. T. Freer. Molecular recognition of the inhibitor AG—1343
by HIV-1 protease: Conformationally ﬂexible docking by evolutionary prograrn-
ming. Chem. Biol, 2(5):317—324, 1995b.

[12] Holger Gohlke, Manfred Hendlich, and Gerhard Klebe. Knowledge-based Scoring
anction to Predict Protein-Ligand Interactions. Journal of Molecular Biology,
295:337—356, 2000.

[13] P. J. Goodford. A computational procedure for determining energetically favor-
able binding sites on biologically important macromolecules. Journal of Medici-
nal Chemistry, 13(7):849—857, 1985.

[14] Jeep Heringa and Patrick Argos. Strain in Protein Structures as Viewed Through
Nonrotameric Side Chains: 1. Their Position and Interaction. Proteins: Struc-
ture, Function, and Genetics, 37:30—43, 1999.

[15] J aap Heringa and Patrick Argos. Strain in Protein Structures as Viewed Through
Nonrotameric Side Chains: II. Upon Ligand Binding. Proteins: Structure, Func-
tion, and Genetics, 37:44—55, 1999.

[16] Brandon M. Hespenheide, A. J. Rader, M. F. Thorpe, and Leslie A. Kuhn.
Identifying protein folding cores from the evolution of ﬂexible regions during
unfolding. Journal of Molecular Graphics and Modelling, 21:195—207, 2002.

[17] Thomas Huber, Andrew E. Torda, and Wilfred F. van Gunsteren. Optimization
Methods for Conformational Sampling Using a Boltzmann-Weighted Mean Field
Approach. Biopolymers, 39:103—114, 1996.

[18] OpenEye Scientiﬁc Software Inc. Quality Atomic Charges, Proton Assignment
and Canonicalization . http://www.eyesopen.com/docs/html/quacpac, 2004.

[19] Richard M. Jackson, Henry A. Gabb, and Michael J. E. Sternberg. Rapid Reﬁne
ment of Protein Interfaces Incorporating Solvation: Application to the Docking
Problem. Journal of Molecular Biology, 276:265—285, 1998.

[20] D. J. Jacobs, A. J. Rader, L. A. Kuhn, and M. F. Thorpe. Protein ﬂexibility
predictions using graph theory. Proteins: Structure, Function and Genetics,
44:150—165, 2001.

[21] G. Jones, P. Willett, and R. C. Glen. Molecular recognition of receptor sites
using a genetic algorithm with a description of desolvation. Journal of Molecular
Biology, 245:43—53, 1995.

[22] G. Jones, P. Willett, R. C. Glen, A. R. Leach, and R. Taylor. Development
and validation of a genetic algorithm for ﬂexible docking. Journal of Molecular
Biology, 3:727—748, 1997.

104

[23] D. E. Koshland Jr. Application of a Theory of Enzyme Speciﬁcity to Protein
Synthesis. Proceedings of National Academy of Science, USA, 44:98-123, 1958.

[24] Roland L. Dunbrack Jr. Rotamer libraries in the 21" century. Current Opinion
in Structural Biology, 12:431—440, 2002.

[25] D. Keller, M. Shibata, E. Markus, R. Ornstein, and R. Rein. Finding the global
minimum: A fuzzy end elimination implementation. Protein Engineering, 8:893—
904, 1995.

[26] P. Koch] and M. Delarue. Mean-ﬁeld minimization methods for biological macro-
molecules. Current Opinion in Structural Biology, 6:222—226, 1996.

[27] L. A. Kuhn, C. A. Swanson, M. E. Pique, J. A. Tainer, , and E. D. Getzoﬂ'.
Atomic and Residue Hydrophilicity in the Context of Folded Protein Structures.
Proteins: Structure, Function, and Genetics, 23:536—547, 1995.

[28] I. Lasters and J. Desmet. The fuzzy-end elimination theorem: correctly imple-
menting the side chain placement algorithm based on the dead-end elimination
theorem. Protein Engineering, 6:717—722, 1993.

[29] Ming Lei, Maria I. Zavodszky, Leslie A. Kuhn, and Michael F. Thorpe. Sampling
Protein Conformations and Pathways. Journal of Computational Chemistry,
25(9):]133—48, 2004.

[30] D. M. Lorber and B. K. Shoichet. Flexible ligand docking using conformational
ensembles. Protein Science, 7:938—950, 1998.

[31] I. K. McDonald and J. M. Thornton. Satisfying hydrogen bonding protein in
proteins. J. Mol. Biol, 238:777—793, 1994.

[32] E. C. Meng, B. K. Shoichet, and I. D. Kuntz. A geometric approach
to macromolecule—ligand interactions. Journal of Computational Chemistry,
13(6):505-524, 1992.

[33] G. M. Morris, D. S. Goodsell, R. S. Halliday, R. Huey, W. E. Hart, R. K. Belew,
and A. J. Olson. Automated docking using a Lamarckian genetic algorithm and
an empirical binding free energy function. Journal of Computational Chemistry,
19:1639—1662, 1998.

[34] Rafael Najmanovich, Joseph Kuttner, Vladimir Sobolev, and Marvin Edelman.
Side-Chain Flexibility in Proteins Upon Ligand Binding. Proteins: Structure,
Function, and Genetics, 39:261-268, 2000.

[35] R. Norel, D. Fischer, H. Wolfson H, and R. N ussinov. Molecular surface recogni-
tion by a computer vision based technique. Protein Engineering, 7:39—46, 1994.

[36] Protein Structural Analysis and Design Lab, MSU. Quick Guide to SLIDE
Version 2.30a. http://www.bch.msu.edu/kuhn/projects/slide/user_manual.html,
2005.

105

[37] Olivier Roche, Ryuichi Kiyama, and III Charles L. Brooks. Ligand-Protein
DataBase: Linking Protein-Ligand Complex Structures to Binding Data. Jour-
nal of Medicinal Chemistry, 44:3592—3598, 2001.

[38] Adrian Roitberg and Ron Elber. Modeling side chains in peptides and proteins:
Application of the locally enhanced sampling and the simulated annealing meth-
ods to ﬁnd the minimum energy conformations. Journal of Chemical Physics,
95:9277—9287, 1991.

[39] Paul C. Sanschagrin. Computational Techniques for Modeling Protein-Ligand
Interactions and their Application to Serine Proteases and Asparaginyl-tRN A
Synthetase. Ph. D. dissertation, Michigan State University, 2001.

[40] Lana Schaffer and Gennady M. Verkhivker. Predicting Structural Effects on
HIV-1 Protease Mutant Complexes With Flexible Ligand Docking and Protein
Side-Chain Optimization. Proteins: Structure, Function, and Genetics, 33:295-
310, 1998.

[41] Volker Schnecke and Leslie A. Kuhn. Flexibly Screening for Molecules Interacting
with Proteins. Rigidity in Theory and Applications, Plenum Publishing, New
York, 385-400, 1999.

[42] Volker Schnecke and Leslie A. Kuhn. Virtual screening with salvation and ligand-
induced complementarity. Perspectives in Drug Discovery and Design, 20:171-
190, 2000.

[43] Volker Schnecke, Craig A. Swanson, Elizabeth D. Getzoff, John A. Tainer, and
Leslie A. Kuhn. Screening a Peptidyl Database for Potential Ligands to Proteins
With Side-Chain Flexibility. Proteins: Structure, Function, and Genetics, 33:74-
87, 1998.

[44] B. K. Shoichet, D. L. Bodian, and I. D. Kuntz. Molecular docking using shape
descriptors. Journal of Computational Chemistry, 13:380—397, 1992.

[45] D. Stickle, L. Presta, K. Dill, and G. Rose. Hydrogen bonding in globular
proteins. J. Mol. Biol, 226:1143—1159, 1992.

[46] P. Tuffrey, C. Etchebest, S. Hazout, and R. Lavery. A new approach to the
rapid determination of protein side-chain conformations. Journal of Biomolecular
Structural Dynamics, 8(6):1267—89, 1991.

[47] G. Vriend. WHAT IF: A molecular modeling and drug design program. Journal
of Mololecular Graphics, 8:52-56, 1990.

[48] Renxiao Wang, Xueliang Fang, Yipin Lu, and Shaomeng Wang. The PDBbind
Database: Collection of Binding Afﬁnities for Protein-Ligand Complexes with
Known Three-Dimensional Structures. Journal of Medicinal Chemistry, 47:2977—
2980, 2004.

106

[49] H. J. Wolfson and Y. Lamdan. Geometric hashing: A general and efﬁcient model-
based recognition scheme. IEEE Internation Conference on Computer Vision,
Tampa, Fl, pages 238-249, 1988.

[50] Maria I. Zavodszky and Leslie A. Kuhn. Side-Chain Flexibility in Protein—Ligand
Binding: The Minimal Rotation Hypothesis. Protein Science, 1421104 — 1114,
2005.

[51] Maria I. Zavodszky, Paul C. Sanschagrin, Rajesh S. Korde, and Leslie A. Kuhn.
Distilling the Essential Features of a Protein Surface for Improving Protein-
Ligand Docking, Scoring, and Virtual Screening. Journal of Computer-Aided
Molecular Design, 16:883—902, 2002.

107

   

[[[r[[[[[[[[[[[[[[ii