MODELING (UN)BINDING KINETICS OF BIOLOGICALLY RELEVANT SYSTEMS
   USING RESAMPLING OF ENSEMBLES BY VARIATION OPTIMIZATION
                                        By
                                 Thomas Dixon
                              A DISSERTATION
                                  Submitted to
                          Michigan State University
                   in partial fulfillment of the requirements
                                for the degree of
   Computational Mathematics Science and Engineering – Doctor of Philosophy
              Biochemistry and Molecular Biology – Dual Major
                                       2021


                                        ABSTRACT
  MODELING (UN)BINDING KINETICS OF BIOLOGICALLY RELEVANT SYSTEMS
       USING RESAMPLING OF ENSEMBLES BY VARIATION OPTIMIZATION
                                              By
                                        Thomas Dixon
    Conventional drug design optimizes binding affinity when designing molecules to maxi-
mize efficacy. However, recent studies show that taking kinetics into account when designing
drugs is necessary in some systems where the drug efficacy does not correlate with binding
affinity, instead correlating with residence time (RT). To maximize the RT, knowledge of
the kinetic pathway is required, but not currently feasible to determine experimentally due
to the instability of the transition state. Molecular dynamics (MD) allows us to simulate
these pathways with atomic resolution. However, the rare events of interest often occur at
timescales as long as milliseconds to hours, and most MD trajectories are computationally
limited to the microsecond timescale. In this thesis we use a variant of the Weighted Ensem-
ble (WE) enhanced sampling algorithm, Resampling of Ensembles by Variation Optimization
(REVO), to overcome the limitations of MD. This approach is more computationally effi-
cient than conventional MD and does not alter the system’s Hamiltonian nor does it affect
the force field parameters used in simulation. We use REVO simulations to produce full
binding and unbinding trajectories of biologically relevant systems such as the unbinding of
a radioligand bound to Translocator Protein (18kDa) (TSPO), a potential drug target in the
treatment of neurodegenerative diseases. We validate these pathways by predicting kinetic
rate constants and binding free energies and comparing these results to experiment. Finally,
we developed new distance metrics that use experimental data to help guide simulations to
a desired conformation. We tested these new distance metrics using Hydrogen deuterium
exchange (HDX) data to form the ternary complex between a ligase-proteolysis-targeting
chimera (PROTAC) dimer and a target protein.


Copyright by
THOMAS DIXON
2021


To my supportive fiancée and family
                iv


                                 ACKNOWLEDGMENTS
Firstly I would like to thank my advisor Dr Alex Dickson for all of his guidance and support.
He has been a kind and understanding advisor who has pushed me to be a better scientist
and researcher. Secondly, I’d like to thank the members of the Dickson Lab. Working with
you has been an amazing experience and I have learned so much from you all. Next I would
like to thank my committee for their guidance and insight to help guide my research. I would
also like to thank the Department of Computational Mathematics, Science and Engineering
as well as the Biochemistry and Molecular Biology Department for hosting my studies as I
pursued this degree. Thank you to my undergraduate advisors: Dr. Michelle Ammerman
and Dr. Johnathan Wenzel for allowing me to experience the research process for the first
time in your labs, the mentorship and guidance you gave me as I prepared for graduate
school. I would finally like to thank my friends and family for all of their love, support, and
for giving me a place to decompress over the last few years. I could not have done any of
this without you.
                                              v


                             TABLE OF CONTENTS
LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .         ix
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .           x
KEY TO ABBREVIATIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . xx
CHAPTER 1 INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . .                    1
   1.1 Importance of Kinetics in Drug Design . . . . . . . . . . . . . . . . . .            1
   1.2 Computational Methods to Determine Kinetics . . . . . . . . . . . .                  3
       1.2.1 Molecular Dynamics . . . . . . . . . . . . . . . . . . . . . . . . . .         4
       1.2.2 Enhanced Sampling . . . . . . . . . . . . . . . . . . . . . . . . . .          6
             1.2.2.1 Parallel Tempering Methods . . . . . . . . . . . . . . .               7
             1.2.2.2 Altered Potential Energy Methods . . . . . . . . . . .                 8
             1.2.2.3 Trajectory Parallelization Enhanced Sampling Meth-
                       ods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    9
       1.2.3 Weighted Ensemble . . . . . . . . . . . . . . . . . . . . . . . . . .         10
       1.2.4 Resampling of Ensembles by Variation Optimization . . . . .                   11
       1.2.5 Rate Calculations by Ensemble Splitting . . . . . . . . . . . . .             12
       1.2.6 Markov State Models . . . . . . . . . . . . . . . . . . . . . . . . .         13
   1.3 Outline of Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     15
CHAPTER 2 PREDICTING LIGAND BINDING AFFINITY FOR THE
              SAMPL6 CHALLENGE FROM ON- AND OFF-RATES
              USING WEIGHTED ENSEMBLES OF TRAJECTORIES .                                   18
   2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  18
   2.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   20
       2.2.1 Host-guest systems . . . . . . . . . . . . . . . . . . . . . . . . . .        20
       2.2.2 Dynamics Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . .        22
       2.2.3 Reweighting of Ensembles by Variance Optimization . . . . .                   24
       2.2.4 Calculating rates by ensemble splitting . . . . . . . . . . . . . .           26
       2.2.5 REVO simulation details . . . . . . . . . . . . . . . . . . . . . . .         28
             2.2.5.1 Note about CB8-G3-0 and CB8-G3-4 . . . . . . . . . .                  29
       2.2.6 Visualization of trajectory trees . . . . . . . . . . . . . . . . . .         29
       2.2.7 Clustering and visualization of conformation space networks                   30
   2.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
       2.3.1 Warped walkers . . . . . . . . . . . . . . . . . . . . . . . . . . . .        32
       2.3.2 Kinetics and free energies . . . . . . . . . . . . . . . . . . . . . .        32
       2.3.3 Trajectory trees reveal correlation between exit points . . . .               36
       2.3.4 Conformation space networks reveal connection between
             starting poses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    38
   2.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  40
                                            vi


CHAPTER 3 ON CALCULATING FREE ENERGY DIFFERENCES
              USING ENSEMBLES OF TRANSITION PATHS . . . . . .                             42
  3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  42
  3.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   44
      3.2.1 Host-guest systems . . . . . . . . . . . . . . . . . . . . . . . . . .        44
      3.2.2 Molecular dynamics . . . . . . . . . . . . . . . . . . . . . . . . . .        45
      3.2.3 Reweighting of Ensembles by Variance Optimization . . . . .                   46
      3.2.4 Calculating rates by ensemble splitting . . . . . . . . . . . . . .           48
      3.2.5 Calculating electrostatic interaction energies . . . . . . . . . .            50
  3.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
      3.3.1 Derivation of correction terms . . . . . . . . . . . . . . . . . . .          50
      3.3.2 Extended trajectory ensembles with lower friction coefficients                54
      3.3.3 Free energy estimates, correction terms and comparison
             with previous benchmarks . . . . . . . . . . . . . . . . . . . . . .         58
  3.4 Discussion and Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . .       61
CHAPTER 4 MEMBRANE-MEDIATED LIGAND UNBINDING OF
              THE PK-11195 LIGAND FROM TSPO . . . . . . . . . . . .                       65
  4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  65
  4.2 Materials and Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . .       67
      4.2.1 Protein Preparation . . . . . . . . . . . . . . . . . . . . . . . . . .       67
      4.2.2 Docking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     68
      4.2.3 Molecular Dynamics . . . . . . . . . . . . . . . . . . . . . . . . . .        68
      4.2.4 REVO Resampling . . . . . . . . . . . . . . . . . . . . . . . . . . .         69
      4.2.5 Boundary Conditions . . . . . . . . . . . . . . . . . . . . . . . . .         72
      4.2.6 Clustering and Network Layout . . . . . . . . . . . . . . . . . .             72
      4.2.7 Quantifying Unbinding Pathways . . . . . . . . . . . . . . . . .              73
      4.2.8 Calculating Non-bonded Energies . . . . . . . . . . . . . . . . .             73
      4.2.9 Calculating Off-Rates and Mean First Passage Times using
             Hill Relation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    74
      4.2.10 Calculating Mean First Passage Times using Markov State
             Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   75
      4.2.11 Selecting Poses for Straightforward MD Simulations . . . . .                 76
  4.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
      4.3.1 PK-11195 Unbinding Pathway . . . . . . . . . . . . . . . . . . .              77
      4.3.2 PK-11195 Rates and Residence Times . . . . . . . . . . . . . .                85
      4.3.3 PK-11195 Transition State . . . . . . . . . . . . . . . . . . . . . .         87
  4.4 Discussion and Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . .       98
CHAPTER 5 ATOMIC-RESOLUTION PREDICTION OF DEGRADER-
              MEDIATED TERNARY COMPLEX STRUCTURES BY
              COMBINING MOLECULAR SIMULATIONS WITH HY-
              DROGEN DEUTERIUM EXCHANGE . . . . . . . . . . . . 101
  5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
  5.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
                                           vii


      5.2.1 Experimental Methods . . . . . . . . . . . . . . . . . . . . . . . .          106
            5.2.1.1 Cloning, expression and purification of SMARCA2
                      and VHL/EloB/C . . . . . . . . . . . . . . . . . . . . . .          106
            5.2.1.2 Hydrogen Deuterium Exchange Mass Spectrometry                         108
            5.2.1.3 Structural Determination of SMARCA2:ACBI1:VHL
                      Complex . . . . . . . . . . . . . . . . . . . . . . . . . . . .     110
      5.2.2 Computational Methods . . . . . . . . . . . . . . . . . . . . . . .           111
            5.2.2.1 Unbound System Preparation . . . . . . . . . . . . . .                111
            5.2.2.2 Molecular Dynamics . . . . . . . . . . . . . . . . . . . .            112
            5.2.2.3 Generating Bound Ensemble . . . . . . . . . . . . . . .               112
            5.2.2.4 REVO-epsilon Weighted Ensemble method . . . . . .                     113
            5.2.2.5 Distance Metrics . . . . . . . . . . . . . . . . . . . . . . .        114
            5.2.2.6 Ternary complex docking protocol . . . . . . . . . . . .              115
            5.2.2.7 HREMD simulation . . . . . . . . . . . . . . . . . . . . .            116
            5.2.2.8 Conformational free energy landscape determination                    117
            5.2.2.9 Calculating Interface RMSD . . . . . . . . . . . . . . .              120
  5.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
      5.3.1 Degraders with different efficiency induce similar ternary
            complex structures in X-ray crystallography. . . . . . . . . . .              122
      5.3.2 Hydrogen Deuterium Exchange Reveals Extended Protein-
            Protein Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . .      125
      5.3.3 Efficient simulation of ternary complex formation using REVO
            Weighted Ensemble simulations . . . . . . . . . . . . . . . . . .             130
      5.3.4 HDX improves prediction of ternary complex using docking                      136
      5.3.5 Conformational sampling of ternary complexes . . . . . . . . .                137
  5.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  144
CHAPTER 6 SUMMARY OUTLOOK AND IMPACT . . . . . . . . . . . . 147
BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
                                          viii


                                     LIST OF TABLES
Table 2.1: Pose-averaged rates and affinities . . . . . . . . . . . . . . . . . . . . . . .     33
Table 3.1: Binding and unbinding rates as a function of friction coefficient (γ). The
           uncertainties shown use the standard error of the mean calculated from 5
           and 10 independent REVO runs for binding and unbinding, respectively.
           The quantities from Chapter 2 were obtained with 5 REVO runs that
           used different initial conformations, each of which were 2000 cycles in length. 56
Table 3.2: Raw (∆G0 ) and corrected (∆Gcorr ) free energy values using simulation
           data from three different friction coefficients. Values are in kcal/mol and
           uncertainties are calculated using propagation of the standard error of
           the mean. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    59
Table 5.1: Binding affinity (Kd ), efficiencies (IC50, DC50), and cooperativity (α)
           of PROTAC 1, PROTAC 2, and ACBI1 degraders. Ternary IC50 and
           binary (SMARCA2) DC50 values are reported; the cooperativity is the
           ratio of binary over ternary IC50. Table adapted from Farnaby et al. [208]. 105
Table 5.2: Details of Hamiltonian Replica Exchange Molecular Dynamics (HREMD)
           simulations. Protein complexes, number of atoms in a simulation box,
           number of replicas used and the aggregate length of the simulations are
           listed. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
Table 5.3: Details of HREMD simulations. Effective temperatures and average ex-
           change probabilities of neighboring replicas are listed. . . . . . . . . . . . 121
Table 5.4: Crystallographic table for protein crystal structure 7S4E SMARCA2-
           iso2:ACBI1:von Hippel-Lindeu protein (VHL). . . . . . . . . . . . . . . . 123
Table 5.5: A summary of the performance of REVO simulations run with differ-
           ent distance metrics. Each REVO simulation ran with 48 walkers. The
           number of binding events (Nbinding ) counts the barrier crossings into the
           bound state, defined using an interface root mean square deviation (I-
           RMSD) < 2.0 Å. The number of simulations with binding events (Sims.
           w/ binding) shows the probability of binding success. The total simu-
           lation time (Sim. time) aggregates the length of all trajectories in each
           REVO simulation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
Table 5.6: Comparison of kon rates between simulation and experiment for the
           ACBI1 PROTAC 1, and PROTAC 2 systems. The experimental rate
           for PROTAC 2 has not been determined yet. . . . . . . . . . . . . . . . . 136
                                                ix


                                   LIST OF FIGURES
Figure 1.1: Amount of drug concentration in the blood stream over time. Panel
            A shows this relationship on a linear scale. The effective minimum
            concentration is shown in blue and the minimal toxic concentration is
            shown in red. Panel B shows the log of drug concentration vs time after
            the maximum concentration has been reached. From the semilog plot,
            we can determine the elimination rate from the slope. . . . . . . . . . . .      2
Figure 1.2: (Left) The equation to calculate potential energy of a molecular system.
            r is the bond length. θ is the bond angle, φ is the dihedral angle and
            rij is the atomic distance between atoms i and j. kr , kθ , and kφ are
            force constants. req , θeq , and φeq are equilibrium positions. The n is
            multiplicity, γ is a phase shift to describe a periodic dihedral term. The
            eij is the Lennard-Jones well depth and rm is the distance the potential
            reaches its minimum. qi and qj are charges for atoms i and j and 0 is
            the dielectric constant. (Center) Each summation calculates a separate
            type of energy. (Right) A pictorial representation of each type of energy.
            The parameters are defined by the molecular dynamic force field. This
            figure is modified from Ref [42]. . . . . . . . . . . . . . . . . . . . . . . .  5
Figure 2.1: Structure of the ligands used in this study. (Top) Quinine, referred
            to herein as cucurbit[8]uril (CB8)-G3. (Middle) 5-hexenoic acid (de-
            protonated form), referred to herein as octa acid (OA)-G3. (Bottom)
            4-methyl pentanoic acid (deprotonated form), referred to here ohhh in
            as OA-G6. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   21
Figure 2.2: Starting poses for CB8-G3. Side and top views are shown. Coloring for
            pose indices is consistent with Figures 2.7, 2.8 and 2.9. . . . . . . . . . .   22
Figure 2.3: Starting poses for OA-G3. Side and top views are shown. Coloring for
            pose indices is consistent with Figures 2.7, 2.8 and 2.9. . . . . . . . . . .   23
Figure 2.4: Starting poses for OA-G6. Side and top views are shown. Coloring for
            pose indices is consistent with Figures 2.7, 2.8 and 2.9. . . . . . . . . . .   23
Figure 2.5: The REVO algorithm. Each cycle begins by running an ensemble of
            walkers forward in time using unbiased dynamics. The distances be-
            tween the walkers are used to calculate a variance (Eq. 2.2). In the
            resampling loop (blue), coupled cloning and merging operations are pro-
            posed, and they are accepted only if they result in a higher V . If the
            proposed V is lower, the resampling loop is terminated and dynamics
            are continued for the next cycle. . . . . . . . . . . . . . . . . . . . . . .   25
                                                x


Figure 2.6: Ensemble splitting. An equilibrium host-guest binding system is split
             into two non-equilibrium ensembles for the calculation of on and off-
             rates. This is done by defining “bound” and “unbound” basins (left
             and right of each ensemble). The “unbinding” ensemble (top) is the
             set of trajectories that have most recently visited the bound basin. The
             “binding” ensemble (bottom) is the set of trajectories that most recently
             visited the unbound basin. The on and off-rates are directly computed
             using the time averaged trajectory flux (φ̄b or φ̄u ) between the ensembles.   27
Figure 2.7: Weights of warped walkers. Weights of warping events for the unbinding
             (top row) and rebinding (bottom row) simulations. In both cases the
             points are colored according to the index of the corresponding starting
             pose (0, blue; 1, red; 2, yellow; 3, green; 4, brown). . . . . . . . . . . . . 33
Figure 2.8: Spatial distribution of warped walkers. Structures of warping events
             for the unbinding simulations viewed from the front and back. Guest
             ligands are colored according to the index of the corresponding starting
             pose (0, blue; 1, red; 2, yellow; 3, green; 4, brown). . . . . . . . . . . . . 34
Figure 2.9: Predicted kinetics and free energies. The calculated free energies (top),
             off-rates (middle), and on-rates (bottom) are shown as a function of
             simulation time for each starting pose in each host-guest system. The
             curves are colored according to the index of the starting pose as in
             Figures 2.7 and 2.8. The calculated binding free energies are compared
             with experimental measurements (horizontal red line) [123], and the
             computational reference (dashed black line) for each system. . . . . . .       35
Figure 2.10: Trajectory trees show all cloning and merging events in a simulation.
             The trajectory tree for the first 1329 cycles of the OA-G3-0 unbinding
             simulation is shown. Each horizontal row in this tree represents a cycle,
             and the placement of all 48 nodes in the row is determined by minimizing
             an energy function (see “Visualization of trajectory trees” in Methods).
             solvent accessible surface area (SASA) is used to color the nodes, with
             blue and dark green indicating bound structures, and yellow to orange
             indicating unbound. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  37
Figure 2.11: Conformation space networks for the unbinding simulations. Each node
             in a conformation space network (CSN) represents a cluster of host-guest
             structures. Edges in the networks connect clusters that are seen to inter-
             convert in the REVO simulations. The size of each node is proportional
             to the number of times it was observed in the unbinding simulations.
             Nodes are colored according to the solvent accessible surface area of the
             guest molecule, as shown in the color-bars on the right. The clusters
             corresponding to the starting poses are labeled in each network. . . . . .     39
                                                xi


Figure 3.1: (A) The initial pose for the OA-G6 system (side view: left, top view:
            right). Note that some atoms from the host are removed in the side view
            for clarity. The carboxyl oxygens are shown in sphere representation.
            (B) The chemical structure of the G6 ligand in the deprotonated form. .         45
Figure 3.2: Splitting an equilibrium ensemble into two history-dependent ensembles
            using basins. The bound and unbound basins are shown in grey and
            light orange, respectively. The unbinding ensemble (B, top) contains all
            trajectories that last visited the bound basin, which are shown in black.
            The binding ensemble (B, bottom, also referred to as the “rebinding”
            ensemble) contains all trajectories that last visited the unbound basin,
            shown in red. Simulations in a given ensemble are terminated once they
            reach the destination basin and thus switch ensembles. The trajectory
            flux between ensembles is denoted by φu→b and φb→u . The quantity πb
            refers to the probability of the entire top ensemble, and the quantity fb
            denotes the probability of the bound basin within the unbinding ensemble. 48
Figure 3.3: (A) Average temperatures observed in short simulations for different
            friction coefficients (γ). (B) Probability distributions of observed tem-
            peratures from ensembles of longer simulations with different γ. . . . . .      55
Figure 3.4: Predicted on- (top) and off-rates (bottom) as a function of simulation
            time. Each panel is labeled according to the friction coefficient used
            for that set of simulations. The independent simulations are shown in
            shades of orange (kon ) and blue (koff ), and the averages are depicted by
            bold black lines. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
Figure 3.5: Binding (top) and unbinding (bottom) fluxes for γ = 0.001 ps−1 . Fluxes
            are shown for each simulation individually. Parameters are the same as
            those used for higher γ values in the main text. Average fluxes over the
            simulations are shown as thick black lines. . . . . . . . . . . . . . . . . .   57
Figure 3.6: Weights of warped walkers in unbinding (top) and binding (bottom)
            REVO simulations for γ = 0.01, 0.1 and 1.0 ps −1 . Each simulation is
            shown in a different color. . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Figure 3.7: Weights of warped walkers in unbinding (top) and binding (bottom)
            REVO simulations for γ = 0.001 ps−1 . Each simulation is shown in
            a different color. Parameters are the same as those used for higher γ
            values in the main text. . . . . . . . . . . . . . . . . . . . . . . . . . . .  59
                                               xii


Figure 3.8: Free energies as a function of friction coefficient. The dark blue line
            shows the uncorrected free energies calculated at three different γ values.
            The light blue line shows the corrected values, which are shifted upwards
            by 2.72 kcal/mol. The thin red line shows the value reported in Chapter
            2, which employed a friction coefficient of 1.0 ps−1 and used a smaller
            dataset than is reported here. The black horizontal line shows the value
            of a computational reference computed using alchemical perturbation,
            reported in Ref. [148]. The dashed grey line shows the experimental
            measurement, reported in Ref. [153]. . . . . . . . . . . . . . . . . . . . .  61
Figure 4.1: TSPO-PK-11195 system. (A) Front view of the TSPO dimer in the
            membrane with PK-11195 bound. (B) All six starting poses are shown
            from the side view, along the inter-dimer axis. To compare poses, two
            moeities of PK-11195 are colored in black (o-chlorophenyl) and magenta
            (1-methylpropyl), with the rest of the molecule colored according to
            atom name. TM-2 is shown as transparent for clarity. . . . . . . . . . . .    69
Figure 4.2: Protein-ligand interaction plots for the six starting conformations. The
            red suns indicate that the residue has a hydrophobic contact with PK-
            11195. The green dashed lines show hydrogen bonds. . . . . . . . . . . .      70
Figure 4.3: The energy of non-bonded interactions between PK-11195 and TSPO
            as a function of minimum distance between PK-11195 and TSPO. . . . .          75
Figure 4.4: Combined CSN of all REVO simulations from each starting pose. Each
            node in the network represents a cluster of ligand poses and is sized
            according to the cluster weight. Nodes are connected by edges if the
            ligand poses are observed to interconvert in the REVO trajectory seg-
            ments. Nodes are colored according to the lipid accessible surface area
            (LASA). Starting poses are marked in bold and transition state poses
            shown in Fig. 4.5D are marked in italics. . . . . . . . . . . . . . . . . . . 78
                                             xiii


Figure 4.5: Analysis of membrane-mediated exit paths. (A) The coordinate Qij
            is defined as the x-y distance between the center of mass of PK-11195,
            shown as sticks and colored by atom type, and the line that connects the
            centers of mass of helix i and helix j. LP1 is not shown here for clarity.
            (B) The expectation values of the interaction energy between PK-11195
            and TSPO (blue) and between PK-11195 and the membrane (black) are
            shown as a function of Q. In each case the solid line shows Q12 and the
            dashed line shows Q25 . The shaded region indicates the standard error
            over the ensemble of measurements at each Q value. (C) Probability
            curves projected onto Q12 for simulations initialized in Pose D1 (blue)
            and D2 (orange). Q12 values of the starting structures are marked with
            (*). (D) Poses from transition pathways with Q ≈ 0. These poses are
            also labeled in the CSN of Fig. 4.4. Phe46 is shown in purple and Trp50
            is shown in orange. (E) A set of poses along the Q12 pathway colored
            from bound (red) to unbound (blue). Top view is shown on the left and
            a front view is shown on the right. (F) The minimum PK-11195-TSPO
            distance and the Q12 value is shown for each pose in panel (E). (G) The
            z center of mass (COM)position as a function of Q12 . The red lines
            indicate the upper and lower bounds of the membrane as defined by the
            maximum and minimum z coordinate of the lipid membrane. . . . . . . .           79
Figure 4.6: CSN networks indicating the clusters that were observed from each ini-
            tial pose. Red nodes indicate the simulations observed a TSPO-PK-
            11195 conformation that was clustered into that node. . . . . . . . . . .       80
Figure 4.7: Expectation value for Eint as a function of Q12 . The lines are colored by
            residue. Only residues who have a minimum interaction energy below
            −3.5 kcal/mol are shown. The standard error is shown in the lighter
            shaded regions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
Figure 4.8: Expectation value for Eint as a function of Q25 . The lines are colored by
            residue. Only residues who have a minimum interaction energy below
            −3.5 kcal/mol are shown. The standard error is shown in the lighter
            shaded regions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
Figure 4.9: The residues with the strongest non-bonded interactions with PK-11195
            on the Q12 pathway. This summarizes the curves in Fig. 4.7, plotting
            the minimum Eint against the Q12 value for which this minimum value is
            observed. The colors indicate the region of TSPO, blue for residues on
            TM-1 and black for residues on TM-2. Only residues with a non-bonded
            energy below -3.5 kcal/mol are shown. . . . . . . . . . . . . . . . . . . .     83
                                             xiv


Figure 4.10: The residues with the strongest non-bonded interactions with PK-11195
             on the Q12 pathway. This summarizes the curves in Fig. 4.8, plotting
             the minimum Eint against the Q25 value for which this minimum value is
             observed. The colors indicate the region of TSPO, red indicates residues
             on the LP1 loop, black for residues on TM-2 and orange for residues on
             TM-5. Only residues with a non-bonded energy below −3.5 kcal/mol
             are shown. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  83
Figure 4.11: Residues moving along with the ligand during dissociation. Expectation
             values of Q12 for individual residues are shown as a function of the Q12
             of PK-11195. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    84
Figure 4.12: Residues moving along with the ligand during dissociation. Expectation
             values of Q25 for individual residues are shown as a function of the Q25
             of PK-11195. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    84
Figure 4.13: The average dihedral angles for the Markov state model (MSM) states
             for four different rotatable bonds on the PK-11195 ligand. . . . . . . . .        88
Figure 4.14: The standard deviation of the dihedral angles for the MSM states for
             four different rotatable bonds on the PK-11195 ligand. . . . . . . . . . .        89
Figure 4.15: The range of the dihedral angles for the MSM states for four different
             rotatable bonds on the PK-11195 ligand. . . . . . . . . . . . . . . . . . .       90
Figure 4.16: (A) mean first passage time (MFPT) estimates using unbinding fluxes
             observed over the course of REVO simulations. The light shaded area
             shows the standard error across the three simulations conducted for each
             pose. (B) A bar graph of the final MFPTs comparing the Hill Relation
             (green), MSM simulations before (grey), and after (black) the addition
             of new straight forward MD simulations. Pose-specific MFPTs were
             computed from MSMs that were built using only trajectories generated
             from that starting pose. Simulations starting from pose R never entered
             the unbound basin and thus MFPTs could not be determined by either
             method. The experimental MFPT of 34 min is shown as a dashed blue
             line in each panel. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
Figure 4.17: Combined conformation space network of all REVO simulations from
             each starting pose with the addition of frames from straightforward
             MD simulations, colored by (A) LASA and (B) committor probability.
             Starting poses are marked in bold in panel A. . . . . . . . . . . . . . . .       92
                                                xv


Figure 4.18: An MSM network including both straight forward and REVO trajec-
             tories colored by pose specific committor probability values calculated
             using trajectories beginning in pose 4RYI. States that were not visited
             by these simulations are colored grey. . . . . . . . . . . . . . . . . . . . .  93
Figure 4.19: An MSM network including both straight forward and REVO trajec-
             tories colored by pose specific committor probability values calculated
             using trajectories beginning in pose D1. States that were not visited by
             these simulations are colored grey. . . . . . . . . . . . . . . . . . . . . . . 94
Figure 4.20: An MSM network including both straight forward and REVO trajec-
             tories colored by pose specific committor probability values calculated
             using trajectories beginning in pose D2. States that were not visited by
             these simulations are colored grey. . . . . . . . . . . . . . . . . . . . . . . 95
Figure 4.21: An MSM network including both straight forward and REVO trajec-
             tories colored by pose specific committor probability values calculated
             using trajectories beginning in pose D3. States that were not visited by
             these simulations are colored grey. . . . . . . . . . . . . . . . . . . . . . . 96
Figure 4.22: An MSM network including both straight forward and REVO trajec-
             tories colored by pose specific committor probability values calculated
             using trajectories beginning in pose D4. States that were not visited by
             these simulations are colored grey. . . . . . . . . . . . . . . . . . . . . . . 97
Figure 5.1: Potential energy of all replicas from HREMD simulation of Sys7. Left
             to right: rank0 to rank19. A good overal between adjacent replicas sug-
             gests a sufficient number of replicas were employed and also confirmed
             no phase transition took place during the HREMD simulation. . . . . . . 117
Figure 5.2: Effective temperature trajectories of replicas 0 (red), 5 (blue), 10 (green)
             and 19 (grey) from HREMD simulation of Sys7 . . . . . . . . . . . . . . 118
                                              xvi


Figure 5.3: Ternary complex of SMARCA2 and VCB induced by ACBI1 shows
            structural similarities with PROTAC 1 and PROTAC 2: a Overall per-
            spective of SMARCA2 Isoform 2 (green) and VHL/ElonginC/ElonginB
            (grey) induced by degrader molecule ACBI1 (bright orange). b ACBI1-
            induced interface contacts between SMARCA2 and VCB. The proteins
            are shown in space-filling, the colors are as in a, annotated residues are
            among those that make the highest number of contacts (see c). c A
            contact map for the interface of the crystal structure. The circle size
            reflects the number of atoms (including hydrogen atoms) participating
            in interactions. d Superposition of 6HAY (purple), 6HAX (salmon),
            7S4E (green) by aligning VHL (grey) shows varied conformations of the
            warheads of the three degraders PROTAC 1, PROTAC 2, or ACBI1 (up
            to 1.7 Å) resulting in alterations of SMARCA2 within the ternary complex.124
Figure 5.4: Peptic coverage map of proteolyzed proteins SMARCA2, VHL, Elongin
            C and Elongin B. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
Figure 5.5: Relative uptake heat map of HDX exchange data of all PROTAC molecules
            1, 2 and ACBI1 bound to binary and Ternary State SMARCA2 isoform
            2 bromo domain. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
Figure 5.6: Relative uptake heat map of HDX exchange data of all PROTAC molecules
            1, 2 and ACBI1 bound to binary and Ternary State VHL. . . . . . . . . 129
Figure 5.7: Relative uptake heat map of HDX exchange data of all PROTAC molecules
            1, 2 and ACBI1 bound to binary and Ternary State Elongin C. . . . . . 130
Figure 5.8: ACBI-induced ternary complex formation of SMARCA2 isoform 2:VCB
            leads to protection of specific sites:a-d, SMARCA2 isoform 2(a), VHL(b),
            Elongin C(c), and Elongin B(d) monitored for hydrogen-deuterium ex-
            change over time. The difference plots of each protein in the binary and
            ternary states are generated by subtracting the deuterium exchange of
            like peptides of the APO or binary from the binary or ternary states
            (defined as Binary∆APO and Ternary∆Binary), respectively. Regions
            that exchange significantly less than the comparative state are depicted
            in blue (negative), whereas regions that exchange significantly more ap-
            pear in red (positive). The resultant difference plots of the binary (e),
            or ternary complex (f) were mapped onto the structure 7S4E. The ex-
            periments were repeated on 2 separate days. . . . . . . . . . . . . . . . . 131
                                               xvii


Figure 5.9: Comparing the w-RMSD, number of target-ligase contacts, and triple
             distance metrics (Linear combination of w-RMSD, target-ligase contacts
             and number of target-PROTAC contacts). (a) The minimum I-RMSD
             over time during the simulation for the triple distance metric. Each
             green line indicates one replica and the black line is the average be-
             tween all runs. The blue line is a straightforward MD simulation run
             on Folding@home. (b) The minimum I-RMSD for each distance metric.
             (c) A scatter plot of the free energy vs the I-RMSD after clustering
             the 6HAX simulations. The circles are colored by w-RMSD. (d) The
             predicted binding rates for PROTAC 1 system (purple) and the ACBI1
             system (green). The black line is the experimental on-rate determined
             via Surface Plasmon Resonance (SPR). . . . . . . . . . . . . . . . . . . . 134
Figure 5.10: Illustration of the representative prediction produced by REVO sim-
             ulation and its comparison to the co-crystallized structure (Protein
             Data Bank (PDB) ID: 6HAX) (a) predicted ternary structure with I-
             RMSD=1.1 Å; (b) detail of the binding interface; (c) contact maps
             for the interfaces of co-crystallized and predicted structures. The cir-
             cle size reflects the number of atoms (including hydrogens) participat-
             ing in interactions; (d) structurally aligned prediction (green) and co-
             crystallized structure (pink) with a detailed PROTAC 2 comparison shown.135
Figure 5.11: Comparing the bound ensembles determined by docking and REVO sim-
             ulations with and without information from HDX for the PDB ID 6HAX
             ternary complex. The REVO bound ensemble is defined as structures
             below a warhead RMSD of 2 Å and more than 30 contacts between the
             target and ligase interface. The docking bound basin is defined as the
             100 top structures as determined by Rosetta-scoring. (a) Probability
             density function distributions of I-RMSD values for the bound ensem-
             bles. (b) The percent of structures in the predicted bound ensembles
             below specific I-RMSD thresholds (2 Å, 2.5 Å, and 3 Å). . . . . . . . . . 138
Figure 5.12: Most populated structures of SMARCA2 bound to VHL with different
             degrader molecules, identified by dimension reduction and clustering
             of HREMD simulation data. (a-d) Colors of VHL and SMARCA2
             represent HDX protection in the presence of the degrader molecules
             relative to the situation in the absence of the degrader. The second
             ranked structures of c PROTAC 2 and d isoform 1 SMARCA2 are
             displayed that support HDX data, whereas the top three structures are
             included in Figure 5.13. Elongin B and Elongin C are also included
             in panel d. e The top structures of ternary complexes are compared
             after aligning VHL to illustrate conformational differences among top
             structures of ternary complexes. . . . . . . . . . . . . . . . . . . . . . . . 139
                                              xviii


Figure 5.13: Cluster centroids from the three highest populated structures of SMARCA2-
             iso2 bound to VHL via (a) ACBI1, (b) PROTAC 1, and (c) PROTAC
             2, along with their populations. Less populated structures are omitted. . 140
Figure 5.14: Free energy landscapes determined from Principle Component Analysis
             (PCA) projections of SMARCA2-iso2 bound to VHL via (a) ACBI1, (b)
             PROTAC 1, and (c) PROTAC 2. Red points indicate k-means centroids.             140
Figure 5.15: a Conformational free energy landscape as a function of the first two
             Time-structure independent components analysis (TICA) features of
             SMARCA2-PROTAC2-VHL ternary complex inferred from a MSM.
             The ensemble of bound states from REVO simulations is shown as blue
             points; the crystal structure (PDB ID 6HAX) is shown as a red X. In
             this projection, states II and V are close to state I. b Network dia-
             gram of the coarse-grained MSM calculated using a lag time of 50 ns,
             with the stationary probabilities associated with each state indicated.
             c mean first passage time (MFPT) from one state in the MSM to an-
             other. Numbers indicate predicted MFPTs in µs. d-e Comparison of
             the crystal structure (gray) with the lowest free energy state (cyan) and
             a metastable state (orange) predicted by the MSM. Arrows indicate a
             change of orientation relative to d. . . . . . . . . . . . . . . . . . . . . . 142
Figure 5.16: Contact maps from the (a) co-crystallized structure 6HAX ; (b) global
             minimum state and (c) metastable state identified by our MSM. . . . . . 143
                                              xix


                              KEY TO ABBREVIATIONS
RT residence time. ii, 1, 3, 16, 18, 34, 40, 43, 65, 67, 75, 85, 86, 99, 100, 148, 150
MD Molecular dynamics. ii, xv, 3, 4, 5, 6, 7, 8, 10, 12, 13, 14, 15, 19, 43, 46, 50, 64, 66, 68,
    69, 71, 76, 86, 91, 92, 100, 103, 105, 112, 113, 116, 120, 132, 141, 144, 149
WE Weighted Ensemble. ii, 10, 11, 12, 13, 24, 46, 103, 104, 130, 148, 149
REVO Resampling of Ensembles by Variation Optimization. ii, ix, x, xi, xii, xiii, xv, xvi,
    xviii, xix, 11, 12, 16, 17, 20, 24, 25, 26, 27, 28, 29, 32, 33, 34, 35, 36, 38, 39, 46, 47,
    48, 49, 54, 56, 58, 59, 67, 69, 71, 72, 75, 77, 78, 91, 92, 93, 94, 95, 96, 97, 99, 101, 104,
    105, 113, 114, 117, 120, 130, 132, 133, 134, 135, 136, 137, 138, 141, 142, 145, 146, 147,
    149, 150
TSPO Translocator Protein (18kDa). ii, xiii, xiv, xv, 16, 65, 66, 67, 68, 69, 71, 72, 74, 75,
    77, 79, 80, 81, 83, 84, 87, 98, 99, 100, 147, 150
HDX Hydrogen deuterium exchange. ii, xvii, xviii, 17, 101, 103, 104, 105, 108, 109, 110,
    114, 115, 126, 128, 129, 130, 132, 133, 134, 137, 138, 139, 141, 144, 146, 149
PROTAC proteolysis-targeting chimera. ii, ix, xvii, xviii, xix, 17, 101, 102, 104, 105, 109,
    111, 112, 113, 114, 115, 119, 122, 124, 125, 128, 129, 130, 132, 134, 135, 136, 139, 140,
    141, 142, 144, 145, 148, 149, 150
HREMD Hamiltonian Replica Exchange Molecular Dynamics. ix, xvi, xviii, 101, 106, 116,
    117, 118, 119, 121, 137, 139, 141, 145
VHL von Hippel-Lindeu protein. ix, xvii, xviii, xix, 102, 104, 105, 106, 107, 108, 110, 111,
    112, 113, 115, 117, 119, 120, 122, 123, 124, 126, 127, 128, 129, 131, 132, 139, 140, 141,
    142, 144, 145, 146, 150
I-RMSD interface root mean square deviation. ix, xviii, 101, 120, 132, 133, 134, 135, 136,
    137, 138, 144, 145, 149
CB8 cucurbit[8]uril. x, 20, 21, 22, 27, 29, 31, 32, 33, 34, 38
OA octa acid. x, xi, xii, 21, 22, 23, 31, 32, 33, 36, 37, 38, 41, 44, 45, 48, 54, 55, 62
SASA solvent accessible surface area. xi, 36, 37, 38, 41
CSN conformation space network. xi, xiii, xiv, 16, 31, 38, 39, 67, 72, 75, 77, 78, 79, 85, 86
LASA lipid accessible surface area. xiii, xv, 76, 77, 78, 82, 86, 92
                                              xx


COM center of mass. xiv, 68, 73, 79
MSM Markov state model. xv, xvi, xix, 13, 14, 15, 16, 67, 85, 86, 88, 89, 90, 91, 93, 94,
    95, 96, 97, 106, 120, 141, 142, 143, 145, 147
MFPT mean first passage time. xv, xix, 29, 33, 34, 75, 76, 85, 86, 91, 100, 142
SPR Surface Plasmon Resonance. xviii, 3, 108, 134
PDB Protein Data Bank. xviii, xix, 66, 67, 68, 98, 104, 105, 111, 112, 116, 122, 132, 135,
    138, 142, 144
PCA Principle Component Analysis. xix, 14, 117, 118, 119, 137, 140
TICA Time-structure independent components analysis. xix, 14, 120, 141, 142, 145
GPU graphics processing unit. 6
CV collective variable. 8, 9, 11, 103, 130, 145, 149
WHAM Weighted Histogram Analysis Method. 8
RMSD root mean square deviation. 26, 27, 47, 48, 71, 103, 120, 145, 149
NAM Northrup-Allison-McCammon. 41
FEP free energy perturbation. 62
NMR nuclear magnetic resonance. 65, 66, 98, 102, 125
RAMD random accelerated molecular dynamics. 66
CGENFF CHARMM Generalized Force Field. 67
VDAC voltage dependent ion channel. 99
POI protein of interest. 101, 102, 104, 149
TPD targeted protein degradation. 102
                                             xxi


                                       CHAPTER 1
                                     INTRODUCTION
1.1     Importance of Kinetics in Drug Design
Historically, ligand efficacy has been predicted by measuring the thermodynamics of lig-
and binding. Examples include measurements of the half maximal inhibitory concentration
(IC50 ), equilibrium dissociation, and the change in binding free energy (∆Gbind ) [1, 2]. Al-
though the binding affinity has been successfully used to guide drug design, [3, 4, 5] these
experiments are conducted under closed equilibrium conditions. Living organisms do not
match these conditions as the body is in constant flux to maintain homeostasis. For example
the concentration of the ligand takes time to distribute throughout the body after a dose
is taken and does not necessarily ever reach equilibrium, as the body is also working to
metabolize and eliminate the ligand from the body. To address the above, Copeland has
suggested the key parameter to maximize the ligand efficacy should not be binding affinity,
but rather the residence time (RT), or the average time it takes for the ligand to unbind
from the target [2]. This motivates taking kinetics into consideration when developing new
drugs.
    After a drug is administered, it is absorbed and distributed throughout the body. The
body then metabolizes and excretes the drug. When the first two processes are dominant,
the concentration of drug in the blood plasma increases and when metabolizing and excreting
the drug becomes dominant, the concentration decays (Figure 1.1 A). In order to observe a
therapeutic effect, the drug concentration must be above a minimum threshold, called the
minimum effective concentration. The amount of drug introduced into the body is bound
from above by the minimal toxic concentration, or the concentration where an organism
starts observing harmful effects by the drug. The range between these two thresholds makes
up the therapeutic window. The goal of drug design is to widen the therapeutic window and
                                              1


 A                         Minimal Toxic Concentration
                                                                             B
                                                                             Log Drug Concentration
   Drug Concentration
                                                                                                      Slope = kelim
                                             Therapeutic Window
                        Eﬀective Minimum Concentration
                                         Time                                                          Time
Figure 1.1: Amount of drug concentration in the blood stream over time. Panel A shows
this relationship on a linear scale. The effective minimum concentration is shown in blue
and the minimal toxic concentration is shown in red. Panel B shows the log of drug
concentration vs time after the maximum concentration has been reached. From the
semilog plot, we can determine the elimination rate from the slope.
maximize the amount of time that a given dose of a drug remains effective.
       One method to maximize the time within the therapeutic window is to minimize the rate
of decay, i.e. increasing its half-life [6]. If a drug has a fast half-life then doses will either
need to be at higher concentrations, making it more likely that toxic effects are observed,
or be taken more frequently, which requires more effort on the part of the person taking
the medication. We can determine the half-life of our drug (τ1/2 ) by calculating the rate of
elimination (kelim ), which is done by taking the log of the drug concentration in the blood
after the maximum concentration has been reached and calculating the slope of the semilog
plot [7] (Figure 1.1 B). The half-life is then calculated by the following relation:
                                                                  log(2)
                                                         τ1/2 =          .                                            (1.1)
                                                                   kelim
       However, there is often an observed time lag between the plasma concentration and the
observed therapeutic effect [8]. To take this delay into account it was hypothesized that the
drug took time to move from the blood into the tissue of interest and bind to the target
of interest [9, 10]. Several previous studies have linked the change in drug concentration
                                                                  2


and therapeutic effects with the (un)binding rate between the drug and the target protein
[11, 12, 13]. Pharmacokinetic-pharmacodynamic models have been developed that integrate
drug binding kinetics with thermodynamic information to predict drug activity [14]. These
models significantly improve the prediction in cases of long drug RTs as previous models
assumed the existence of a rapid equilibrium between drug and target and under predicted
drug activity in this scenario.
    The optimization of ligand kinetic rates, also known as kinetics-orientated drug design,
has recently been gaining traction when designing new drugs [15, 16, 17]. This is due to
identifying systems where binding affinity does not correlate with drug efficacy, rather the
RT does.[18, 19, 20]. By developing drugs with longer RTs, we can ensure the therapeutic
effects of the drug are observed for longer periods of time.
1.2     Computational Methods to Determine Kinetics
We have established the importance of kinetics and RTs in drug design. There are several
methods to experimentally determine the kinetics from radioligand binding [21, 22], Surface
Plasmon Resonance (SPR) [23, 24] and florescence assays [25, 26, 27]. However, developing
drugs to optimize the ligand kinetics experimentally is difficult because we can not under-
stand the mechanism with which the (un)binding takes place. In particular, identifying the
transition state – the maximum free energy state along the unbinding pathway – is critical
to determine in order to develop drugs with longer RTs. Unfortunately, characterizing these
states experimentally is not feasible because the system is at this state for only an instant
during the unbinding event. Therefore, we turn to computational methods to help us inves-
tigate the kinetic pathways. In particular in this thesis we will use an algorithm which can
give us atomic resolution, molecular dynamics (MD), to observe these pathways.
                                              3


1.2.1    Molecular Dynamics
MD is a computational algorithm that simulates molecular interactions and motion at the
atomic level using classical mechanics [28]. First developed to study phase transitions using
hard spheres [29] and radiation damage [30], modern computing allows simulations to study
more complex phenomena such as molecular diffusion through biological membranes [31, 32],
DNA supercoiling [33], and stability of intermolecular conformations [16, 34, 35, 28]. The
MD algorithm works by first setting initial conditions of the system (atomic positions and
velocities). Then the net force on each atom (F ) is calculated by:
                                        F (t) = −∇U (r(t)),                                  (1.2)
where U is the potential energy for a set of atomic positions (r), at time t. For MD simula-
tions the force is in kcal
                      mol
                           , potential energy is in the units of kcal/mol, the positions are in Å,
and time is in fs.
    The potential energy function and its associated parameters are determined by the force
field used in the simulation. This potential energy function includes terms for bonds, angles,
dihedrals between covalently bonded atoms, and non-bonded energies (electrostatics and
Lennard-Jones) for non-covalently bonded atom pairs (Figure 1.2). Some common force
fields used in biomolecular MD simulations are AMBER[36, 37, 38], CHARMM [39, 40], and
GROMOS [41].
    After the force has been calculated, we then solve Newton’s equations to determine the
change in atomic positions
                                            dr(t)
                                                  = v(t),                                    (1.3)
                                              dt
and change in atomic velocities (v)
                                      d2 r(t)    dv(t)   F (t)
                                              =        =       ,                             (1.4)
                                        dt2       dt       m
over the time step. The mass, m, has units of grams/mol, and velocities are in Å/fs. Once
the changes are known, we update the positions and velocities for the atoms. We repeat this
                                                  4


                                                      Bond
                                                       Angle
                                                      Dihedral
                                                       Van der Waals
                                                      Electrostatics       -        +
Figure 1.2: (Left) The equation to calculate potential energy of a molecular system. r is
the bond length. θ is the bond angle, φ is the dihedral angle and rij is the atomic distance
between atoms i and j. kr , kθ , and kφ are force constants. req , θeq , and φeq are equilibrium
positions. The n is multiplicity, γ is a phase shift to describe a periodic dihedral term. The
eij is the Lennard-Jones well depth and rm is the distance the potential reaches its
minimum. qi and qj are charges for atoms i and j and 0 is the dielectric constant.
(Center) Each summation calculates a separate type of energy. (Right) A pictorial
representation of each type of energy. The parameters are defined by the molecular
dynamic force field. This figure is modified from Ref [42].
process as many times as necessary to observe the phenomenon of interest with statistical
significance.
    The above formulation for MD simulations was designed to keep the energy of the system
constant. However, many biological experiments are not conducted under constant energy,
rather at constant temperature. To keep MD simulations at constant temperature, a thermo-
stat is required such as a Langevin thermostat using Langevin dynamics [43]. This algorithm
uses the above formulation for MD as explained above, but adds two additional terms:
                                                                                               (1.5)
                                                           p
                        F (t) = −∇U (r(t)) − γmv(t) +         2mγkb T R(t),
where γ is the friction coefficient in units of ps−1 and is generally set to 1, kb is the Boltzmann
constant in units of  kcal
                      molK
                           , T is the absolute temperature in K, and R(t) is a Gaussian random
process centered at 0 and has the following properties:
                                            hR(t)i = 0,                                        (1.6)
                                                  5


                                    hR(t)R(t0 )i = δ(t − t0 ),                             (1.7)
where δ is the Dirac delta function. The second term is designed to take energy out of the
system and damp the force [44]. The third term represents random thermal fluctuations
from small particles that can add energy back into the system [44]. The sum of these two
terms are what maintains the temperature of the simulation [45].
    MD is an excellent tool to study the biological pathways of binding and unbinding,
create statistical ensembles to describe these events and make hypotheses that can be tested
experimentally. However, it is difficult to observe long timescale phenomena in simulation,
due to the discrepancy between the simulation time steps and the natural timescales with
which these events take place. MD time steps are constrained to 1-2 fs in order to capture
the fastest molecular motion (oscillation along covalent bonds). Typical ligand (un)binding
events take place in the millisecond to minute timescales. This means that it would take on
the order of 1015 time steps to simulate an event that is 1-2 seconds in real time.
    Advancements in computing power and computer architecture have led to improvements
in MD simulation speed. Running simulations on supercomputers using graphics processing
unit (GPU)s have been able to simulate on the order of tens to hundreds of ns per day
[16, 34, 35, 46, 47]. The creation of specialized supercomputers, the Anton series, specifically
designed to run MD simulations can simulate on the order of µs per day[48, 49, 50]. In
another approach, the Pande lab used idle computing power from personal computers of
volunteers on the Folding@home network to simulate a comparable time scale [51]. However,
these improvements are still not enough to be able to reach the timescales to simulate
biological phenomena of interest.
1.2.2    Enhanced Sampling
MD simulations are able to model (un)binding events, but computational limitations prevent
these simulations from reaching the natural timescales to simulate rare events. Many en-
hanced sampling algorithms have been developed to overcome this limitation and simulate
                                                 6


long timescale events. These can be put into three broad categories: parallel tempering,
modified potential energy, and trajectory parallelization enhanced sampling methods. All
these methods are capable of simulating long time events.
1.2.2.1     Parallel Tempering Methods
Parallel tempering[52] (also known as temperature replica exchange), aims to improve the
breadth of MD sampling by running several copies of the system ("replicas") at different
temperatures. After running for a given number of time steps, the energy is calculated
for each replica. Temperature swaps between systems are suggested the energy difference
between the swap is calculated by:
                                      ∆ = (βi − βj )(Ei − Ej ),                              (1.8)
where Ei is the energy of the simulation at the conformation associated with temperature i
(Ti ) and Ej is the energy of the simulation at the conformation associated with temperature
j (Tj ). βi is defined as:  1
                           kb Ti
                                 , and βj is defined as   1
                                                        kb Tj
                                                              respectively. The energy is in units
of kcal/mol and β has units of mol/kcal. It is worth noting that the relation Ti < Tj is
always true when calculating the energy difference, making the first term always positive. If
the temperature swap lowers the overall energy (∆ < 0) then the temperature swap occurs.
However, if ∆ is positive, then we can perform the temperature swap with a probability of
e−∆ .
     The idea behind parallel tempering is that it is easy to get trapped in local minima
in the free energy landscape at lower temperatures. However, by having simulations at
elevated temperatures, the simulation can more easily cross large energy barriers, then cool
the simulations down to explore this new region at the desired temperature.
     In this category of enhanced sampling algorithms, the mechanism for (un)binding does
not need to be known before the simulation begins, nor is a reaction coordinate needed.
However, by allowing the simulations to heat and cool in such an unnatural way, it is difficult
                                                  7


to generate the kinetics because when the trajectories swap energies, they are no longer
continuous[53]. Additionally, there can be poor mixing of hot and cold trajectories resulting
in an inaccurate landscape[54]. Finally, there needs to be an adequate number of energy levels
to make sure the probability distributions between the trajectories sufficiently overlap to
ensure a reasonable acceptance probability for trajectory swaps[55]. This can require several
different energy levels to ensure sufficient overlap prohibitively increasing the computational
cost.
1.2.2.2    Altered Potential Energy Methods
The altered potential energy category of enhanced sampling is a broad category, including but
not limited to metadynamics[56, 57, 58], umbrella sampling[59] and temperature accelerated
MD[60, 61], but all the algorithms bias the system along a small set of collective variable
(CV) to help cross energy barriers more easily. These algorithms bias the potential energy
by the following equation:
                                        Usim = U + Ubias ,                                (1.9)
where U is the unbiased potential energy and Ubias is the potential energy that is biasing
the simulations. However, the method-specific means of biasing the system is quite varied.
For example, in umbrella sampling the CV is divided into a series of independent windows
and a harmonic potential is added to the existing Hamiltonian; this helps flatten the energy
barriers in the window. MD simulations are then performed in each of the windows. Analysis
is performed using the Weighted Histogram Analysis Method (WHAM) [62, 63], which works
by combining data from all windows to obtain the original, unbiased free energy profile.
    In contrast, in the metadynamics algorithm only one simulation is run over the entire
landscape. Each time the simulation visits a location along the CV, a Gaussian, centered at
that CV value is added to the Hamiltonian. As the simulation progresses, these Gaussians
accumulate and allow the simulation to gradually fill up the free energy basin the simulation
is stuck in, allowing it to cross energy barriers. A common analogy for this process is filling
                                                8


up the free energy landscape with sand. As basins fill with sand, it becomes easier for the
simulation to cross over a previously large energy barrier.
    Similar to the parallel tempering algorithms, the altered potential methods make explor-
ing the free energy landscape easier. However, there are a couple of drawbacks in using these
methods. First is the need for a CV, which requires the prior knowledge about potentially
critical variables that lead to the biological phenomenon of interest[64]. These variables
are not trivial to determine and can lead to unphysical results if critical variables are not
included. Additionally, by adding the bias to the potential energy function, the bias needs
to be removed to calculate observables[56]. While there are ways to remove the bias and
approximate transition rates, this process assumes that the deposited bias did not affect the
dynamics at the transition state [65]. Finally, these simulations assume the system is in
equilibrium, which is not always justifiable in biomolecular systems.
1.2.2.3    Trajectory Parallelization Enhanced Sampling Methods
The two categories described above enhance sampling of the free energy landscape by altering
the Hamiltonian of the system being simulated. In the case of parallel tempering, the tem-
perature swaps make determining realistic pathways impossible due to the unrealistic energy
jumps between simulations. Additionally the altered potential energy methods require prior
knowledge to determine good collective variables to sample along. Here we discuss a group
of algorithms that do not change the Hamiltonian and thus do not bias the system, produce
continuous trajectories, and give us the ability to compute path dependent observables such
as kinetic rates.
    Algorithms that fall into this category, such as milestoning[66, 67, 68], forward flux
sampling[69, 70] and weighted ensemble[71, 72, 73], use sampling over many parallel tra-
jectories to gain a full understanding of the energy landscape. Both the milestoning and
forward flux sampling algorithms construct a number of hypersurfaces that are arranged be-
tween two basins of interest. By simulating the flux from one hypersurface to another they
                                               9


get the local kinetics between the surfaces and then combine these local rates to determine
the global rate of transition between the basins. However to form these surfaces, prior infor-
mation is needed about the pathway to know how to construct them. In this thesis work, we
have used the weighted ensemble algorithm [71, 73], described in detail in the next section,
to simulate the unbinding and binding pathways because it has variants (developed in our
research group)[34, 35, 74] that do not require any prior information about how to get from
one basin to another.
1.2.3   Weighted Ensemble
The previous section described different algorithms to enhance the breadth of sampling
from MD simulations. Here we go into detail about the enhanced sampling algorithm we
use for all simulations in this thesis: weighted ensemble (WE). WE is a path sampling
algorithm that attempts to focus computational resources on the low-probability states that
are relevant to an observable or process of interest. An example of these states is the set
of "transition states", conformations corresponding to the highest point on the free energy
surface separating two basins, along an unbinding pathway. The original WE approach was
outlined in 1996 by Huber and Kim [71] to simulate Brownian motion between a product
and reactant basin. However, WE has been used to simulate a variety of processes from
protein folding [75], to ligand (un)binding [76, 77, 78, 16, 35, 34], large scale conformational
changes[79, 80, 81], ion permeation[72] and protein-protein binding[82, 83].
    The WE algorithm has two distinct steps: dynamics and resampling, the combination of
these two steps is called a cycle. In the dynamics step, we have a set of simulations (called
walkers) that each have an associated statistical weight (w), which sum to 1. The walkers
undergo MD independently for a certain amount of simulation time (τ ). After each walker
has completed the dynamics step, we perform resampling. In resampling we can perform
two operations on the walkers called merging and cloning. Merging involves taking two
walkers and choosing one to remain and one to kill. The survival probability of each walker
                                              10


is proportional to its weight. The weight of the surviving walker then increases by the weight
of the one that was killed. Cloning involves making exact replicas of a single walker. The
weight of the original walker is divided evenly among its clones.
    In the original WE algorithm, resampling is performed using bins, which are subdivided
regions defined on a progress coordinate. Walkers are then cloned and merged in order to
achieve a constant number of walkers in each bin (M ). If there are more than M walkers
in a bin, then we perform merging between these walkers in order to bring the number of
walkers back down. If a bin has fewer than M walkers, then clones are made in the bin
until we have the necessary number of walkers. This method increases the computational
cost of the simulation when new bins are found. Another issue of using the WE algorithm
when applied to simulating biologically relevant events, is that the dimensionality of these
processes is typically high, and the number of bins increases exponentially with the number
of dimensions[84]. This means that we would need to spend a lot of computational resources
to simulate these events.
1.2.4    Resampling of Ensembles by Variation Optimization
WE is able to simulate (un)binding pathways and calculate path dependent observables.
However, the algorithm divides the landscape into bins which guide the resampling. De-
termining a low dimensional set of CVs that sufficiently describes the pathway of interest
is not trival, especially when multiple pathways exist. Additionally, binning in high dimen-
sional spaces becomes exponentially more computationally expensive as more dimensions are
required to describe the pathway. Here we discuss a binless weighted ensemble algorithm de-
veloped by our research group: Resampling of Ensembles by Variation Optimization (REVO)
[35]. In the REVO algorithm, merging and cloning is guided by an objective function called
the trajectory variation (V). The variation is defined as:
                                   X        X X  dij α
                               V =     Vi =                 φi φj ,                     (1.10)
                                    i         i    j
                                                      d0
                                                11


where Vi is the trajectory variation contribution from walker i, dij is the distance between
walkers i and j, d0 is a characteristic distance used to normalize the variation when comparing
between different distance metrics and keep the distance term unitless, and φi is the novelty
term that describes the importance of individual walkers. REVO balances the exploration
(distance) term with the exploitation (novelty) term using α. In this thesis the novelty is
defined in terms of walker weights and can be mathematically described as:
                                                        p     
                                                           min
                                    φi = log(wi ) − log          ,                       (1.11)
                                                          100
where wi is the weight of walker i and pmin is the minimum weight a walker is allowed to be.
pmin is generally set to 1 ∗ 10−12 .
    V is initially calculated and then walkers are proposed to be merged and cloned. The
walker that is proposed to be cloned (walker i) is the one that has the highest walker
trajectory variation, Vi , and whose cloned weight would be greater than pmin . A pair of
walkers (walkers j and k) are selected to clone based on minimizing the expected trajectory
variation loss (Vloss ) which is defined as:
                                               Vk wj + Vj wk
                                       Vloss =               .                           (1.12)
                                                  wj + wk
For two walkers to be eligible for merging, they need to be within a distance cutoff of each
other, called the "merge distance", and the sum of their weights needs to be lower than the
maximum allowed weight (pmax ), set to 0.1 for this thesis. Once these walkers are selected,
V is recalculated as though the merging and cloning operations have been performed. If
V increases, the merging and cloning operations are performed. This resampling process
repeats until V has been maximized. Once V is maximized, a new cycle begins and more
MD is performed.
1.2.5    Rate Calculations by Ensemble Splitting
WE is capable of directly calculating observables such as on and off-rates using a technique
called ensemble splitting[85, 86, 87, 88, 89]. Using this technique an equilibrium ensemble
                                                 12


is split into two non-equilibrium ensembles and an unbound and bound basin are defined.
Starting trajectories from one basin, the rate between the two basins is computed as the flux
of trajectories between the two basins.
    The on-rate is defined as:
                                                       wiB
                                                  P
                                          kon =     i
                                                           ,                               (1.13)
                                                    CT
where wiB is the weight of a walker that transitioned from the unbound basin into the bound
basin, C is the concentration of the ligand that is binding in units M of and T is the aggregate
simulation time in µs. The sum is over all walkers that transitioned.
    Similarly the off-rate can be calculated by:
                                                       wiU
                                                  P
                                         kof f =     i
                                                            ,                              (1.14)
                                                     T
where wiU is the weight of a walker that transitioned from the bound to unbound basin.
    To determine the fluxes into each basin, two sets of simulations are run: a binding simula-
tion where trajectories are initialized in the unbound state and are terminated in the bound
state; and an unbinding simulation where trajectories are initialized in the bound state and
terminated in the unbound state. To achieve this, once a walker has crossed the termina-
tion boundary, it is reinitialized in the initial state by resetting its atomic conformation and
velocities. The walker’s weight however, is not affected.
1.2.6     Markov State Models
Although WE does a good job in sampling low-probability states and transitions, the rate of
convergence to the equilibrium probability distribution is slow and therefore, it can take a lot
of computational time to get adequate sampling of the space. An additional issue is the vast
amounts of data the simulation produces, which makes performing a quantitative analysis
expensive in terms of computer memory and disk space. Here we describe the formulation
of a Markov State Model (MSM), which uses the results from MD simulations to help build
a statistically robust and efficient model that can help solve the above issues.
                                                 13


    A MSM is a model that describes long timescale dynamics among a set of m macrostates
[90]. It is based on the Markovian assumption: that the model is a memoryless description
of an ergodic process[90, 91]. This means that the history of a trajectory does not affect
its dynamics, the evolution of a trajectory only depends on its current position. An ergodic
process implies that given infinite time any macrostate can be reached from any other state.
The evolution of a trajectory from one state to another is entirely described by a transition
matrix T .
    To generate these macrostates, the results of the MD simulation are clustered together.
The issue with clustering the atomic positions directly is that the dimensionality of molecular
systems is large: it is equal to 3N , where N is the number of atoms, typically on the order
of 104 to 105 . This makes the clustering a computationally expensive operation and can
overshadow important long timescale dynamics with many sources of noise. Therefore it is
common to reduce the dimensions of the system by dimension reduction algorithms such as
Principal Component Analysis (PCA) [92, 93] and Time-structure Independent Components
Analysis (TICA)[91, 92] in the construction of MSM. Additionally, the dimensions can be
reduced to a set of features that describe the pathway of interest[94]. Example features can
be atomic distances of a subset of atom pairs or dihedral angles. In this thesis, the MSMs did
not use TICA for dimension reduction but we did use sets of distances between key atoms
to featurize the space in Chapters 2 and 4, and we used PCA in Chapter 5. After the states
have been defined, T can be generated by counting the transitions between states that occur
for a specific lag time (τ ).
    From an initial probability distribution on the MSM, which we call p0 , we can evolve the
system by applying T as follows:
                                           p0 T = p τ ,                                  (1.15)
where pτ is the probability distribution after time τ . Additionally we can evolve the simu-
lation by multiple time steps by:
                                          p0 T n = pnτ                                   (1.16)
                                               14


where n is the number of lag times for which we are evolving the system. If the system
is Markovian and ergodic, then eventually the system no longer evolves when applying T
(i.e. when pnτ ≈ p(n+1)τ ) and the system reaches its stationary distribution. We define
this distribution as π. Mathematically π is an eigenvector of the transition matrix with a
corresponding eigenvalue of 1, which is the maximum eigenvalue. If the eigenvalues and
eigenvectors of T are sorted from the largest to the smallest eigenvalues, excluding the
eigenvalue equal to 1, the sorted eigenvectors correspond to dynamical motions in the MSM
and the larger the eigenvalue, the slower the motion.
    From the MSM, we can determine the kinetics of going from one state to another by
determining the relaxation time of the model. The errors for determining kinetics from
MSMs come from discretization and spectral errors [90]. To minimize these errors one needs
to ensure that: i) features being used to cluster the trajectory data need to capture the
slow dynamical motions, ii) there are enough macrostates in the model to approximate the
dynamics, and iii) to use a large enough lag time. The third requirement is challenging as
longer lag times reduce the discretization error[90], but require longer MD simulations to
construct the MSM which might not be feasible. To determine an appropriate lag time it
is common to satisfy the Chapman-Kolmogorov test [95]. This test compares the relaxation
time with respect to the lag time, and is satisfied when the relaxation time is essentially
constant with respect to changes in the lag time.
1.3     Outline of Work
The overarching goal of this thesis is to apply molecular dynamics trajectories, guided by
weighted ensemble algorithms, to investigate (un)binding pathways for biologically relevant
systems. The specific goals are:
    • Characterize pathways of host-guest (un)binding reactions.
    • Estimate physical quantities such as binding and unbinding rates and binding free
      energies.
                                              15


    • Develop methodologies to apply experimental results to guide simulations.
    We first test the REVO algorithm using small host-guest systems in Chapter 2. The
host-guest systems is comprised of two small molecules that form a complex, the larger
of two is defined as the host and the smaller is the guest. These systems are used as
validation for our methodologies as they are simpler than biological systems. We determine
the unbinding and binding pathways from several different initial host-guest conformations.
We introduce the use of conformational space networks(CSNs) to visualize these pathways
in a graph visualization to help understand the complex pathways which the guest can
transition between the bound and unbound basins. From these simulations we calculate
the rates and from these the ∆Gbinding energies and compare the results to those previously
reported.
    Chapter 3 will expand on the work in Chapter 2. While the previous work used a correct
relationship to describe a macroscopic state in equilibrium, we introduce correction terms
to take into account a finite box volume, electrostatic interactions between the host and
guest, and the volume of the unbound basin. Additionally we investigate the effect the
Langevin dynamics friction coefficient has on the kinetics and ∆Gbinding with and without
the correction terms.
    In Chapter 4, we increase the complexity to a membrane bound protein-ligand system
(Translocator protein (18 kDa)-PK-11195) (TSPO). This system has a significantly longer
unbinding RT which pushes the limits of the REVO algorithm. Additionally there is an
interest in developing specific ligands that target TSPO for possible treatments of neurode-
generative diseases. In this chapter we simulate the unbinding process for PK-11195 from 5
different starting poses and quantify two distinct unbinding pathways for PK-11195 disso-
ciation. We determine key residues that have strong interactions with PK-11195 along the
unbinding pathways. We computed unbinding rates similarly to Chapter 2 for each pose
however, we also constructed an MSM to verify these results. Finally we used committor
probabilities alongside the MSM to determine the transition state for unbinding.
                                               16


    In Chapter 5, we investigate the binding process for a ternary complex comprised of a
ligase, a proteolysis-targeting chimera (PROTAC), and a target protein. This is a more com-
plicated process than previously explored in this thesis as we are trying to form a ternary
complex involving two proteins and a small linker molecule. Since REVO is naturally de-
signed to maximize the exploration of the landscape, we alter the resampling algorithm by
only cloning walkers that have sufficiently progressed toward the target state. Additionally,
we develop new distance metrics that take hydrogen deuterium exchange (HDX) experimen-
tal data to help drive the simulation to the bound state.
    In Chapter 6 we take a high level look at the goals of this thesis and discuss the progress
and describe next steps for improvement.
                                              17


                                          CHAPTER 2
      PREDICTING LIGAND BINDING AFFINITY FOR THE SAMPL6
      CHALLENGE FROM ON- AND OFF-RATES USING WEIGHTED
                           ENSEMBLES OF TRAJECTORIES
This work was published in Journal of Computer-Aided Molecular Design volume 13, pages
1001-1012 in 2018. The work is presented here as published except that the supplemental
figures are worked into the text.
2.1     Introduction
Binding affinity has long been seen as the crucial parameter for drug discovery, as it de-
termines the proportion of drug that is bound to a receptor in solution. A wide variety of
methods have emerged to predict both absolute and relative binding affinities, each with
its own domain of applicability, and tradeoff between efficiency and accuracy [96, 97]. The
SAMPL challenge is playing an important role to compare tools that predict affinities using
blind predictions [98]. Importantly, errors can arise from both the physical model used to
describe the system (e.g. forcefield, thermostat, dynamics engine), and from the sampling
methodology used. The SAMPLing challenge, described in this issue, thus serves an im-
portant role in comparing the accuracy of computational methods that all employ the same
physical model [99].
    While the binding affinity is all that is needed to describe the action of a ligand at
equilibrium, the on (kon ) and off-rates (koff ) are necessary to model drug action in general
[2]. For instance, in many systems it has been observed that drug residence time (RT)
(RT = 1/koff ) is the critical factor governing efficacy in living cells [18, 20, 19]. This is due to
the number of factors that drive the system out of equilibrium, such as drug metabolism and
elimination, the turnover of target protein, and the periodic nature of drug administration.
Although KD = koff /kon , and lower KD can be correlated to lower koff , this relationship is
governed by the free energy along the ligand binding pathway, particularly the ligand binding
                                                18


transition state, which is the highest point in free energy between the bound and unbound
states [100]. Though the binding rate has an upper bound of 109 M−1 s−1 , which corresponds
to the “diffusion limit”, binding rates of ligands to the same target have been shown to vary
over 4 orders of magnitude, which disrupts the correlation between KD and koff [101].
    Prediction of koff and kon is challenging, as they are not state functions: they depend
fundamentally on the transition path ensemble between the bound and unbound states.
Computational sampling of these transition paths is in general a great challenge for molecular
dynamics (MD) due to the long timescales of ligand binding and release, although in recent
years, a variety of enhanced sampling methods have rose up to meet this challenge [102]. The
trypsin-benzamidine system has served as a common benchmark application for enhanced
sampling methods such as Adaptive Multilevel Splitting [103], SEEKR [104], adaptive [105]
and traditional [106, 107] Markov state modeling, funnel metadynamics [108], as well as the
WExplore method developed by our group [109]. Recently these efforts have been expanded
to more challenging systems such as the unbinding of inhibitors from c-Src kinase [110] and
p38 MAP kinase [56] using metadynamics, and the unbinding of the TPPU ligand from
the target soluble epoxide hydrolase with WExplore [16]. The diversity of computational
approaches to handle long timescale ligand binding and release events is a promising sign
for the field, but comparison of methodologies is complicated – even for applications to the
same system – due to differences in forcefields, boundary conditions, and integrators.
    As a step toward the robust comparison of different computational methods for simu-
lation of binding pathways, we participated in the SAMPLing challenge for the prediction
of binding affinities. The SAMPLing challenge required participants to compute free ener-
gies as a function of simulation time, to compare the convergence properties and relative
computational cost of different free energy calculation methods. Instead of computing free
energies through alchemical perturbation, here we achieve this by explicitly simulating the
binding and release processes, determining the absolute rates kon and koff , and computing
                                                19


the binding affinity as the ratio koff /kon . We calculate the binding free energy as follows:
                                                           
                                                      kof f
                                     ∆G = kT ln                                             (2.1)
                                                     C0 kon
where kT = 0.597 kcal/mol corresponding to a temperature of 300 K and C0 is the reference
concentration of 1 mol/L. As we broadly sample unbinding pathways from multiple starting
points, we can also synthesize these results and examine how these poses are connected in
the binding network.
    We efficiently determine unbinding and binding rates using a further developed variant of
the WExplore sampling method [84]. This is the first application of this new method, which
we call Reweighting of Ensembles by Variance Optimization (REVO). This new method
is also based in the weighted ensemble framework [71], where trajectories are merged and
cloned, but it is the first to completely eschew the idea of dividing a space into a set of sam-
pling regions (the possibility has previously been recognized however [73]). REVO instead
directs merging and cloning operations by maximizing a measure of variance that describes
the instantaneous spread of the ensemble of trajectories, which is described in the Methods
section below. We visualize our REVO simulations using a branching tree network diagram,
whose layout uses an energy function that takes into account the distances between the tra-
jectories. This allows for the easy visualization of the correlation of exit point ensembles
within a weighted ensemble simulation. We compare our binding affinities to computational
reference values, and observe that the affinities from REVO are systematically tighter than
the reference. We conclude the manuscript with a discussion of possible sources of error.
2.2     Methods
2.2.1    Host-guest systems
The host-guest systems were selected from the main SAMPL6 challenge. One system is
a cucurbit[8]uril (CB8) host [111, 112], using quinine as a guest ligand (Figure 2.1). The
host is a ring-shaped structure, with 8-fold rotational symmetry about the vertical axis,
                                                20


and two-fold symmetry about the horizontal axis. There are thus 16 symmetry-equivalent
atom mappings for this system. The second and third systems both use a Gibb deep cavity
cavitand, referred to as OA, as a host [113]. Here there is only 4-fold symmetry about the
vertical axis. Binding and release of two ligands is examined: 5-hexenoic acid and 4-methyl
pentanoic acid, referred to as OA-G3 and OA-G6, respectively. Both of these ligands carry
an explicit negative charge.
                                            N
                        O
                                                          H
                        CH3                           N
                                      HO
                                                                   CH2
                      CB8-G3
                                     H2 C                            O
                                                                   -
                      OA-G3                                    O
                                                 CH3
                                                                     O
                                          H3C
                                                                   -
                      OA-G6                                    O
Figure 2.1: Structure of the ligands used in this study. (Top) Quinine, referred to herein
as CB8-G3. (Middle) 5-hexenoic acid (deprotonated form), referred to herein as OA-G3.
(Bottom) 4-methyl pentanoic acid (deprotonated form), referred to here ohhh in as OA-G6.
                                              21


2.2.2    Dynamics Setup
The fifteen initial configurations (five for each host-guest system) were used as prepared by
the organizers of the SAMPLing challenge without modification (Figure 2.2-2.4). The two
OA systems had a cubic box with a box length of 45 Å solvated with 2586 water molecules,
and contained 12 sodium ions and 3 chloride ions to neutralize the system. The CB8 system
had a cubic box with a box length of 42.5 Å solvated with 2149 water molecules, and
contained 6 sodium ions and 6 chloride ions to neutralize the system. OpenMM v7.1.1 [114]
was used to run dynamics on the CUDA v8.0 platform. We use a Langevin integrator, with
a thermostat at 300 K, a friction coefficient of 1.0 ps−1 , a Monte Carlo barostat to keep
pressure constant at 1 atm, and a time step of 2 fs. The non-bonded forces had a cutoff of
1 nm, and were calculated using partial mesh Ewald. The simulation temperature differs
slightly from that used to calculate the reference free energies (298.15 K), although we expect
the resulting differences in free energy will be negligible.
  CB8-G3
  side
  top
             Pose 0            Pose 1             Pose 2            Pose 3           Pose 4
Figure 2.2: Starting poses for CB8-G3. Side and top views are shown. Coloring for pose
indices is consistent with Figures 2.7, 2.8 and 2.9.
                                                22


 OA-G3
 side
 top
            Pose 0           Pose 1            Pose 2       Pose 3           Pose 4
Figure 2.3: Starting poses for OA-G3. Side and top views are shown. Coloring for pose
indices is consistent with Figures 2.7, 2.8 and 2.9.
 OA-G6
 side
 top
            Pose 0           Pose 1            Pose 2       Pose 3           Pose 4
Figure 2.4: Starting poses for OA-G6. Side and top views are shown. Coloring for pose
indices is consistent with Figures 2.7, 2.8 and 2.9.
                                              23


2.2.3    Reweighting of Ensembles by Variance Optimization
To encourage the sampling of rare events, we developed a method based on the weighted
ensemble (WE) framework [71] that we call “Reweighting of Ensembles by Variance Opti-
mization”, or REVO. WE methods use an ensemble of trajectories (called “walkers”) that are
each assigned a statistical weight, and enhance sampling through the introduction of cloning
and merging steps. Initially the weights of all the walkers are equal, and are defined as
1/Nwalk , where Nwalk is the total number of walkers. When walkers are cloned, their weight
is divided among the progeny. The cloned trajectories are identical replicas of the original,
with the same atomic positions and velocities. This is typically done in under-sampled re-
gions of space, in order to boost the probability of observing rare events in the simulation.
Walkers are also merged together, and their summed weight is given to the resulting merged
walker. In practice, merging walkers A and B is accomplished by choosing a survivor (walker
A is chosen with probability     wA
                               wA +wB
                                       ),  and discarding the other walker. Merging is typically
done in over-sampled regions, with walkers that can be seen as “redundant”. The trajectory
weights are only changed due to merging and cloning operations.
    Previous applications of the weighted ensemble methods proceed by constructing a set of
sampling regions, determining their occupancies, and using cloning and merging operations
to make the occupancies as even as possible. In general, the free energy landscapes of interest
are inherently high-dimensional, which makes it difficult to construct an appropriate set of
regions. For this reason we were motivated to discard the notion of “regions” entirely, and
direct cloning and merging operations instead by the optimization of a variance measure, V :
                                    X          X X  dij α
                              V =         Vi =                 φi φj ,                     (2.2)
                                     i          i    j
                                                          d0
where the double sum is over all pairs of walkers, dij is some measure of distance between
walkers i and j, d0 is the characteristic distance, the exponent α is a parameter set here to
4, and φa is a weighting function for walker a:
                                                          
                                                    100wa
                                       φa = log              ,                             (2.3)
                                                     pmin
                                                  24


where wa is the weight of trajectory a, and pmin is the lowest probability attainable by a
walker, set here to be 10−12 . The weighting function φ was designed to be largest for high
wa , and to smoothly decay to a low value as wa approaches pmin .
                                                            Run Dynamics
                                                          Calculate all-to-all
                                                                distances
                                                         Calculate trajectory
                                                             variance (Vold)
                                                   Determine walkers that would
                 Perform clone + merge              be best to clone and merge
                       Adjust weights                  Clone: least central walker   End clone + merge
                      set Vold = Vnew                  Merge: most central walker
                                                        (with its closest neighbor)
                                    if Vnew > Vold   Calculate new trajectory       else
                                                            variance (Vnew)
Figure 2.5: The REVO algorithm. Each cycle begins by running an ensemble of walkers
forward in time using unbiased dynamics. The distances between the walkers are used to
calculate a variance (Eq. 2.2). In the resampling loop (blue), coupled cloning and merging
operations are proposed, and they are accepted only if they result in a higher V . If the
proposed V is lower, the resampling loop is terminated and dynamics are continued for the
next cycle.
    The structure of the REVO “resampling” algorithm proceeds as follows (see also Fig. 2.5).
Eq. 2.2 is used to compute the variance function, and the walker with the highest (“H”) and
lowest (“L”) contributions to the variance are identified (e.g. with the highest and lowest Vi
values). The closest walker to “L” is identified, called “C”. A coupled cloning and merging
event is proposed, where “C” and “L” would be merged and “H” would be cloned. Eq. 2.2 is
again used to recompute the variance, and this coupled cloning and merging move is only
accepted if V increases. Further moves are proposed after recomputing “H”, “L” and “C”, and
the process continues until V decreases, and the move is rejected. This way, the algorithm
automatically determines the optimal number of cloning and merging events. In fact, if the
                                                                25


system is already in an optimal configuration, no cloning and merging operations will take
place, and REVO will skip to the next dynamics step.
     As in previous WExplore applications [16], a minimum and maximum walker weight was
enforced (pmin and pmax , respectively). Note that only walkers which will not violate the
walker probability boundaries (pmin and pmax ) are eligible to be chosen as walkers “H”, “L”
and “C”. In these simulations, pmin = 10−12 and pmax = 10−1 , following previous work[16].
     This process is general to any dynamics engine, and to any form of the distance function
dij . Here we use two different distance functions to describe the unbinding and rebinding
processes. For unbinding, dUij is defined as the root mean square deviation (root mean square
deviation (RMSD)), in Å, of the guest ligand between structures i and j, after aligning to
the host. As mentioned in Section 2.2.1, there are multiple symmetry-equivalent mappings
of the host atoms. We thus compute this distance after alignment of j to each symmetry-
equivalent mapping of host i, and use the smallest such value as dUij . For rebinding, dR    ij is
computed using the RMSD of both i and j to the reference starting structure:
                                      dR        U      U
                                       ij = 1/di0 − 1/dj0 ,                                 (2.4)
where dUa0 is the distance from walker a to the reference structure. The difference between
the inverse of these two quantities is used to highlight differences between small values of
this quantity (e.g. between RMSD = 1.5 Å and RMSD = 2.0 Å).
2.2.4     Calculating rates by ensemble splitting
REVO, like other weighted ensemble methods, can calculate kinetic quantities on the fly,
through a technique we call “ensemble splitting” [85, 115] (also referred to as “tilting” [87], or
“coloring” [88, 116]). An equilibrium ensemble is split into two non-equilibrium ensembles by
defining two basins, in this case the “bound” basin and the “unbound” basin (Fig. 2.6). The
unbinding ensemble is defined as the set of trajectories that have most recently visited the
bound basin, and the rebinding ensemble is the set of trajectories that have most recently
                                               26


visited the unbound basin. The unbound basin is the set of structures where the closest
host-guest interatomic distance exceeds 10 Å, as in previous work[16]. The bound basin is
defined as the set of structures with guest RMSD < 1.0 Å, computed after aligning to the
host. Note that a sweep over symmetry-equivalent atom mappings of the host was again
conducted, so a binding event can be registered by binding to either the top or bottom of
the CB8 host, for example.
                            Unbinding trajectories                         unbound
       bound
                               Binding trajectories                        unbound
       bound
Figure 2.6: Ensemble splitting. An equilibrium host-guest binding system is split into two
non-equilibrium ensembles for the calculation of on and off-rates. This is done by defining
“bound” and “unbound” basins (left and right of each ensemble). The “unbinding” ensemble
(top) is the set of trajectories that have most recently visited the bound basin. The
“binding” ensemble (bottom) is the set of trajectories that most recently visited the
unbound basin. The on and off-rates are directly computed using the time averaged
trajectory flux (φ̄b or φ̄u ) between the ensembles.
    In this work, REVO simulations are conducted explicitly either in the unbinding en-
                                               27


semble, or the rebinding ensemble. After each dynamics step, any walker that has exited
its ensemble (by entering the opposite basin) is identified. Its weight is recorded, and its
structure is “warped” back to the starting structure. The structure recorded before warping
is known as an exit point. When a walker “warps”, the atomic coordinates and velocities of
the trajectory return to the starting structure. The weight of a walker does not change as
a result of warping. In the unbinding ensemble, the starting structure is the initial bound
pose. In the rebinding ensemble, the starting structures are exit points that were generated
by the unbinding simulations.
    As shown in Figure 2.6, the rates are simply calculated using the flux of trajectories
(sometimes referred to as the Hill relation [117, 88]) that leave the ensemble:
                                                       wiU
                                                   P
                                     koff = φ¯u =    i
                                                           ,                              (2.5)
                                                     T
                                                   P R
                                            φ̄b        w
                                     kon  =      = i i ,                                  (2.6)
                                            C       CT
where φ̄u and φ̄b are the unbinding and binding flux, T is the elapsed time, the sums are
over the set of exit points observed before time T , and C is the concentration of the ligand,
computed as 1/V where V is the box volume.
2.2.5   REVO simulation details
Unbinding REVO simulations were run for 2000 cycles, with 48 walkers run for ∆t = 20 ps
each cycle. The exit points registered after 1000 cycles were used to initialize the rebinding
REVO simulations. In some cases, fewer than 48 exit points were obtained at this point, and
the walkers were randomly cloned in order to create a full set of 48 walkers. The rebinding
REVO simulations were run for 200 cycles, with ∆t = 200 ps per cycle. Five simulations
were run for each ligand, one from each starting pose (see Figures 2.2-2.4 for starting poses).
In aggregate, we ran 1.92 µs for each of the unbinding and rebinding simulations, 3.84 µs
for each starting pose, or 57.6 µs over the entire set of results presented here.
                                                28


2.2.5.1    Note about CB8-G3-0 and CB8-G3-4
After the conclusion of the SAMPLing challenge we found an error in the weight normaliza-
tion procedure that was used to initialize the weights of the rebinding walkers when fewer
than 48 exit points were observed. This affected only two simulations: CB8-G3-0 and CB8-
G3-4, where only 5 and 7 exit points were observed, respectively, in the first 1000 cycles of
the unbinding simulation. Due to an error, the initial weights in these rebinding simulations
summed to a value greater than 1, and while this could be accounted for in the rate calcula-
tions, it was compounded by the fact that no walker in these simulations had a weight value
less than pmax = 0.1, and thus no cloning/merging moves could occur.
    Surprisingly, this did not affect the calculation of the binding rate. Although the number
of binding events observed in CB8-G3-0 and CB8-G3-4 (32 and 25, respectively), was much
lower than the number observed in CB8-G3-1, CB8-G3-2 and CB8-G3-3 (289, 427 and 190),
the total amount of wegiht that exited was comparable (0.62, 0.43, 0.66, 0.14, 0.50, for
starting poses 0 through 4). This goes to show the downhill nature of binding in host-guest
systems, as confirmed by the almost diffusion-limited kon (see Table 3.1. The calculated
mean first passage time (MFPT) of binding for the CB8-G3 system was 91 ns, which is well
within the aggregate sampling time of each rebinding simulation (1.92 µs), again indicating
why a group of straight-forward trajectories was able to produce over two dozen binding
events each.
2.2.6    Visualization of trajectory trees
To visualize the correlation between exit points, we visualize REVO cloning events in a tree
graph, where each node represents a walker at a given time point and the edges indicate how
walkers are connected through time as can be seen in Figure 2.10. Each level (y-position)
on the tree represents walkers at the same time step. The initial horizontal placement (x-
position) of each node is a direct result of its parent’s position in the previous time step. If
no cloning events occurred for that walker, then the node is placed directly above its parent.
                                                29


If the parent was cloned, then the walkers are spread out in a fan pattern. Once the nodes
are initially placed, their x-positions are minimized with a steepest descent algorithm using
the following energy function:
                              "                                                  #
                         X                                    X rep (xti −xtj )2
                    E=          b(xti − xt−1   2       t 2
                                           i ) + cwi (xi ) +       Eij e ro        ,             (2.7)
                           i                                   j
where xti and xtj are the positions on the tree of walkers i and j at time t, xt−1   i is the position
of the parent at the previous time step, and wi is the walker weight obtained from the
simulation. The variables b, c, r0 are parameters set here to 0.01, 5 and 1000 respectively.
The first term causes the nodes to stay close to their parent’s position, allowing trajectories
to be visually tracked through the tree more easily. The second term encourages the higher
weight trajectories to stay close to x = 0. The third term is a pairwise repulsion term, which
gives the nodes a “radius” of r0 , and is scaled by a repulsion energy (Eijrep ) that takes into
account the molecular distance between the walkers in the simulation (dij ):
                                     Eijrep = a ∗ max(0, dij − d0 ),                             (2.8)
where a and d0 are parameters set here to 2.5 and 2.0. dij can refer to either dUij or dR            ij
when making trees of unbinding and rebinding simulations respectively. However, only trees
generated from unbinding simulations are shown here. The parameters for Eq. 2.7 were
selected to keep the branches generated by cloning events in the same region on the tree, as
well as to keep larger weighted walkers towards the center. It is important to note that this
energy minimization only affects the x-position of each node. The y-position is determined
by the time step and is not used in the steepest descent algorithm. The graphs were made
using NetworkX 2.2 library [118] and visualized using Gephi 0.9.2 [119].
2.2.7    Clustering and visualization of conformation space networks
All of the trajectory frames for the five starting poses of each system were clustered together
using the MSMBuilder 3.8.0 library [120]. The datasets were first featurized using a vector of
                                                   30


host-guest distances for each system. These vectors contain 7056, 3128, and 3496 distances
for the CB8-G3, OA-G3 and OA-G6 systems, respectively. A k-centers clustering algorithm
was used to generate 1000 clusters using the featurized space and assign each frame of the
trajectories to a cluster. The clustering was done using the Canberra distance:
                                                n
                                               X    |pi − qi |
                                   D(p, q) =                                               (2.9)
                                               i=1
                                                   |pi |, +|qi |
where p and q are host-guest distance vectors from different frames and n is the total number
of distance pairs. A count matrix describing the cluster-cluster transitions was calculated.
This corresponds to a Markov state model with a lag-time equal to the cycle length ∆t = 20
ps.
    We then construct Conformation Space Networks (CSNs) from the count matrices, which
are graphical models of the transition matrix, with a node representing each row, and edges
representing non-zero off-diagonal elements using CSNAnalysis [121]. Gephi 0.9.2 was used
to visualize the CSN. The size of each node is proportional to the statistical population of the
cluster. The smallest node was 20 times smaller than the largest node. The topology of the
network was determined using a force minimization algorithm, Force Atlas, included in Gephi
[122]. This algorithm includes repulsive forces for nodes that are not connected and attractive
forces proportional to the weight of the edges. The directed edge weights were values between
0.1 and 100 as determined by wij = 100pij where pij is the transition probability of cluster
i transitioning to cluster j. Undirected edge weights were then determined as the average
between the two directed edge weights. Force Atlas was applied twice, first without adjusting
for node sizes which enabled the nodes to overlap, and then a second minimization adjusted
for node size which prevented overlap. For visualization, all edge weights were given a
uniform value. A CSN of each system is shown in Figure 2.11.
                                               31


2.3    Results
2.3.1   Warped walkers
For each host-guest system we run both unbinding and rebinding REVO simulations orig-
inating from five different starting poses (Figure S1-S3), making 30 simulations total. All
of these REVO simulations generated a substantial number of warping events. In general
these are distributed across a wide range of weight values (Figure 2.7). For all systems it
is observed that rebinding can occur with very high probability walkers (p > 0.1), but that
unbinding occurs with much lower probability. Indeed it is the probability of the unbinding
warped walkers that largely governs differences in KD and koff between the systems. The
minimum weight that is achievable by a walker, pmin , was set to 10−12 in all cases. As shown
in Figure 2.7, this could be increased substantially (e.g. to 10−3 ) in the rebinding case to
avoid the propagation of low-weight trajectories that will not meaningfully contribute to the
binding flux.
    The warping points for the unbinding simulation are shown in Figure 2.8, again using
color to indicate the starting pose. Although they exhibit some strong correlation within
a REVO run, together they comprise a broad distribution. For CB8-G3, both upward and
downward exit pathways are sampled with roughly equal frequency, whereas for the Octa-
Acid systems, the exit points are clustered towards the top of the cavitand.
2.3.2   Kinetics and free energies
The binding and unbinding rates are calculated using the sum of the weights of the warped
walkers, divided by the elapsed time (see “Calculating rates by ensemble splitting” in Meth-
ods). The binding rate is calculated by dividing the binding trajectory flux by the concen-
tration of the guest in mol/L, calculated as C =      1
                                                    NA V
                                                         , where V is the box volume. The
concentration ranged from 0.021 M for OA-G6 to 0.025 M for CB8-G3 and OA-G3, resulting
from unit cells with side-length ranging from 4.1 nm to 4.3 nm. Running estimates of kon
                                              32


                                                     CB8-G3                                   OA-G3                                     OA-G6
                                 1e-05
                                 1e-06
  Unbinding trajectory weight
                                 1e-07
                                 1e-08
                                 1e-09
                                 1e-10
                                 1e-11
                                 1e-12
                                 1e-13
                                    1
  Rebinding trajectory weight
                                  0.01
                                0.0001
                                1e-06
                                1e-08
                                1e-10
                                1e-12
                                         0      10        20        30       40 0        10        20        30        40 0        10        20        30      40
                                             Simulation time elapsed (ns)             Simulation time elapsed (ns)              Simulation time elapsed (ns)
Figure 2.7: Weights of warped walkers. Weights of warping events for the unbinding (top
row) and rebinding (bottom row) simulations. In both cases the points are colored
according to the index of the corresponding starting pose (0, blue; 1, red; 2, yellow; 3,
green; 4, brown).
                                                           Table 2.1: Pose-averaged rates and affinities
                                          koff (s−1 )      MFPToff (s)       kon (s−1 M−1 )    MFPTon (ns)         ∆G (calc.)   ∆G (ref.) [99]    ∆G (exp.) [123]
CB8-G3                             0.0012 ± 0.0003            860 ± 230     4.7 ± 0.8 × 108        92 ± 16        −16.0 ± 0.2   −10.90 ± 0.16       −6.45 ± 0.06
OA-G3                                    160 ± 110      0.0064 ± 0.0044     1.2 ± 0.2 × 109         36 ± 6         −9.5 ± 0.4    −6.70 ± 0.02       −5.18 ± 0.02
OA-G6                                  0.48 ± 0.11             2.1 ± 0.5    2.8 ± 1.0 × 108       150 ± 50        −12.1 ± 0.3    −7.18 ± 0.05       −4.97 ± 0.02
and koff are shown individually for each REVO simulation in Figure 2.9, along with their
average, which is calculated by averaging the trajectory flux over the set of five simulations.
Large, upward jumps are observed in the rate curves whenever an exit point is recorded that
has a higher weight than was previously observed.
             The final average rate values, as well as the corresponding mean first passage times, are
given in Table 2.1. The uncertainties of kon and koff (δon and δoff ) are determined using the
standard error across the five trajectories. The uncertainties in the MFPT of binding and
unbinding are calculated as δon /Ckon
                                   2
                                      and δoff /koff
                                                 2
                                                     , respectively. Finally, the uncertainty in
                                                                                              33


                CB8-G3
                OA-G3
                OA-G6
Figure 2.8: Spatial distribution of warped walkers. Structures of warping events for the
unbinding simulations viewed from the front and back. Guest ligands are colored according
to the index of the corresponding starting pose (0, blue; 1, red; 2, yellow; 3, green; 4,
brown).
∆G is as follows:                        s        2        2
                                              δoff        δon
                               δ∆G = kT               +          .                        (2.10)
                                             koff         kon
    The MFPT of unbinding demonstrate the power and scope of the REVO method: we
estimate that the CB8-G3 system has an average ligand RT of 860 seconds, and we obtain
multiple ligand release events for each of the five starting poses. In total, we used 9.6 µs of
sampling in the CB8-G3 unbinding ensemble, resulting in an acceleration factor of ≈ 9 × 107 .
    With koff and kon in hand, the binding affinity is calculated using Eq. 2.1. This bind-
ing affinity is compared to both the experimentally measured binding affinity [123], and
                                              34


a computational reference computed using alchemical free energy calculations with YANK
(see [99] for more details). As shown in Figure 2.9, the host-guest affinity calculated by the
rate ratio in REVO is systematically too tight when compared both the experimental and
reference values. This is possibly due to finite box size effects, which is discussed further in
the Discussion and Conclusions section.
                                    CB8-G3                                     OA-G3                                 OA-G6
                        0
                        -5                         Experimental
    ΔG (kcal/mol)     -10                          Reference
                      -15
                      -20
                      100
                        1
   koff (1/s)
                      0.01
                    0.0001
                    1e+10
                    1e+08
  kon (s-1 M-1 )
                    1e+06
                    10000
                      100
                        1
                             0    10        20        30          40   0    10        20        30      40   0    10        20        30      40
                                 Total simulation time (ns)                Total simulation time (ns)            Total simulation time (ns)
Figure 2.9: Predicted kinetics and free energies. The calculated free energies (top),
off-rates (middle), and on-rates (bottom) are shown as a function of simulation time for
each starting pose in each host-guest system. The curves are colored according to the index
of the starting pose as in Figures 2.7 and 2.8. The calculated binding free energies are
compared with experimental measurements (horizontal red line) [123], and the
computational reference (dashed black line) for each system.
                                                                               35


     Moderate variation in kon and koff is observed across the sets of simulations for each host-
guest system, which contributes to some uncertainty in the predicted rates and affinities.
However, the average standard deviation in the log10 final rates (log 10(k)) is 0.28 for on-rates
and 0.56 for off-rates, both well under an order of magnitude. This compares very favorably
with recent studies using WExplore [102, 16], where rates from individual simulations varied
over several orders of magnitude.
2.3.3     Trajectory trees reveal correlation between exit points
Rates are derived from exit points, and while points from different starting poses are guar-
anteed to be independent, it is unclear how correlated the observations are within a given
REVO simulation. We can use a tree network to observe the entire set of merging and
cloning events that occur during a simulation, and to determine how closely related walkers
are to one another. Additionally, one can visualize the state of the walkers through coloring
the tree based on physical properties observed during the simulation, such as the solvent
accessible surface area (SASA) of the guest molecule, which can help evaluate how close the
guest is to unbinding from or rebinding to the host. Using this coloring, and how closely re-
lated walkers are to one another, we can visualize the correlation between a set of unbinding
or rebinding events.
     Figure 2.11 shows a trajectory tree for the OA-G3-0 unbinding simulation. From the tree
it is clear that the majority of sampling time is spent sampling the bound state (dark green
structures). However, the top inset shows that this sampling is still very active, with outliers
being detected and cloned nearly every cycle, although the vast majority of these clones are
merged one or two cycles later, which implies that the outlying property corresponded to a
fast degree of freedom. The middle inset shows a breakout event that led to a series of exit
points. The vertical “branches” show individual trajectories. Termination of a branch with
high SASA (orange) correspond to exit points.
     The OA-G3-0 simulation generated 966 exit points, 534 of which can be seen in Figure
                                                36


Figure 2.10: Trajectory trees show all cloning and merging events in a simulation. The
trajectory tree for the first 1329 cycles of the OA-G3-0 unbinding simulation is shown.
Each horizontal row in this tree represents a cycle, and the placement of all 48 nodes in the
row is determined by minimizing an energy function (see “Visualization of trajectory trees”
in Methods). SASA is used to color the nodes, with blue and dark green indicating bound
structures, and yellow to orange indicating unbound.
                                               37


2.10, which captures only the first 1329 cycles. From the tree it can be seen that many of
these exit points are correlated, as they were recently cloned from common ancestors. Using
the tree analysis one can observe that there are likely at least seven distinct groups of exit
points that can be treated as independent observations of unbinding pathways.
    In the bottom inset we see a trajectory that demonstrates transient rebinding behavior.
That is, the SASA goes high (≈ 320, orange), to medium (≈ 160, light green), back to high
again. This behavior results from a transient, loose association with the exterior of the host
molecule.
2.3.4    Conformation space networks reveal connection between starting poses
Here we obtain combined estimates of kon , koff and KD by averaging the transition flux
from simulations with different starting poses, and in the case of the rebinding simulations,
different boundary conditions. This is only appropriate if the five starting poses are all part
of the same basin of attraction, and can interconvert on timescales much faster than the
unbinding process. If two poses form distinct basins of attraction, then we cannot expect
that the poses will have the same koff , kon , or KD . To examine the connectivity of starting
poses, we use the REVO trajectory segments to construct a Markov state model. We then
visualize CSNs to examine how the starting poses are connected, whether they are in the
same basin of attraction, and whether they share the same (un)binding pathways.
    Figure 2.11 shows CSNs for the unbinding simulations of all three host-guest systems.
For both OA systems a large, densely-connected ensemble of bound states is observed. As
the entire set of host-guest distances was used to featurize our dataset, this heterogeneity
arises from motions of the flexible chemical groups on the bottom and around the rim of
the OA host molecule. Starting structure 3 in CB8-G3 is bound in the opposite orientation
from the others (see Figure 2.2), although the host-molecule is symmetric to inversion about
the horizontal plane. While this did not affect the kinetic measurements (which took into
account symmetry-equivalent atom mappings of the host), in the CSN it forms a distinct
                                                38


Figure 2.11: Conformation space networks for the unbinding simulations. Each node in a
CSN represents a cluster of host-guest structures. Edges in the networks connect clusters
that are seen to interconvert in the REVO simulations. The size of each node is
proportional to the number of times it was observed in the unbinding simulations. Nodes
are colored according to the solvent accessible surface area of the guest molecule, as shown
in the color-bars on the right. The clusters corresponding to the starting poses are labeled
in each network.
                                              39


basin from the other starting poses. This allows us to observe that the ligand cannot flip be-
tween these two structures inside the host, and instead converts between the two poses only
through the quasi-bound and unbound states (yellow and orange). Although here we con-
clude that all structures are part of the same (or symmetry-equivalent) fast-interconverting
bound ensemble, this type of analysis is useful to reveal the interconversion of binding poses,
and whether we should expect them to have the same calculated RTs.
2.4     Discussion
Although we obtained much information about the binding and release processes of these
host-guest systems, our predicted ∆G values were systematically lower than those of a ref-
erence calculation employing the same forcefield (average 4.2 kcal/mol). These reference
∆G values were themselves systematically lower than the experimentally calculated dissoci-
ation constants (average 2.7 kcal/mol), likely arising from inaccuracies in the forcefield. The
nature of the SAMPLing challenge gives us a unique opportunity to isolate these different
sources of error. Below we discuss different possible sources of error in light of the analyses
presented above.
    In weighted ensemble simulations that calculate kinetic quantities, convergence is often
the first question. Here we devoted the same amount of sampling time to the binding and
unbinding processes (1.92 µs per system per starting pose). This is more than sufficient
to capture the binding process, which has a mean first passage time ranging from 36 to
160 ns. The unbinding process was much more challenging, and it is possible that longer
simulations would have captured higher weight walkers exiting from the bound state. This
would increase our koff estimates, and KD as well. Significantly extending the unbinding
simulations and monitoring their exit rates could provide additional insight.
    We also have concerns related to the size of the simulation box. This was chosen to be
appropriate for standard alchemical free energy perturbations, and not for simulations of
full unbinding and binding pathways. A more accurate determination of the binding rate
                                              40


could be obtained with the Northrup-Allison-McCammon (NAM) method, which combines
the rate of first hitting points with a committor probability to determine the binding rate
[82]. Diffusion at long distances is typically efficiently simulated using Brownian dynamics.
This approach has been used successfully to determine binding rates with both the weighted
ensemble method [124, 83], and the SEEKR method (Simulation Enabled Estimation of
Kinetic Rates) [104].
    An important point is that although the reference calculations were performed with the
same forcefield, the rates can sensitively depend on aspects of the forcefield that are not
relevant to alchemical measurements of the affinity. As an example, in OA-G3 unbinding
trajectory trees we observe long “tendrils” of unbound trajectories that are stuck in interme-
diate SASA values, where the guest ligand is bound to the outer surface of the host. The
strength of these interactions can significantly affect our calculations of koff , although they
will not affect the alchemical KD calculations.
    In general, to successfully predict kon and koff will require optimizing the ligand forcefield
terms that govern interactions that occur along binding pathways. By analogy, it is known
that protein forcefields that are only trained on folded protein structures have difficulties
representing unfolded and intrinsically disordered structures. As a community we must take
care not to over-emphasize the ligand bound state in forcefield development. An exten-
sion of the SAMPL challenge to include the prediction of kinetic quantities would thus be
tremendously valuable to the development of both sampling methodologies and forcefields.
                                               41


                                        CHAPTER 3
ON CALCULATING FREE ENERGY DIFFERENCES USING ENSEMBLES
                                OF TRANSITION PATHS
This work was done in collaboration with Robert Hall, a postbaccalaureate student working
in our research group. Robert ran simulations and performed analysis. I acted as a mentor
and co-advisor to help Robert analyze the simulations, make meaningful conclusions and
prepare the manuscript.
    This work was published in Frontiers in Molecular Biosciences volume 6 page 106 in 2020.
The work is presented here as published except that the supplemental figures are worked
into the text.
3.1     Introduction
In recent years there is a growing appreciation for the utility of binding kinetics in the
prediction of drug efficacy [20, 125, 126, 127, 128, 2, 129, 130, 131, 132]. Pharmacokinetic
and pharmacodynamic models of drug activity in the body are inherently out of equilibrium:
a drug is administered, it is absorbed, distributed to different tissues, metabolized and
eliminated from the body. As such, kinetic constants of binding and release – beyond just
the equilibrium constants of binding – are required to model drug action when the timescales
of binding and release cannot be separated from the other competing processes [133]. The
relationship between molecular structure and the kinetics of binding (also called “structure-
kinetic relationships" or SKR) is complicated, as small changes to structure can change
kinetic constants by orders of magnitude [128]. It is important to note that changes in
kinetics are not always tied to changes in affinity [134], and that to accurately predict
changes in kinetics, models of the ligand-binding transition state are needed to estimate
transition-state stabilization or destabilization [135].
    Computational methods that reveal structures of transition states and calculate binding
                                                42


(kon ) and unbinding (koff ) rate constants for real compounds are in their infancy, but are
quickly developing [102]. It is a tremendous challenge to obtain reliable values for these
quantities, as 1) they depend on the entire (un)binding pathway, not just its endpoints,
and 2) the timescales of ligand binding and release often exceed the capabilities of molec-
ular dynamics (MD) simulations by orders of magnitude. Specialized computing platforms
have been applied to generate continuous binding pathways [136], although the unbinding
process is typically beyond the reach of MD simulation for compounds beyond millimolar
drug fragments [130, 137]. Recent studies have used enhanced sampling methods in MD
to simulate ligand (un)binding pathways and determine mechanisms and rate constants kon
and koff [56, 110, 138, 101, 16, 139, 140, 141]. Some of these rate constants have shown
surprisingly good agreement with experiment – given the extraordinarily long timescales in-
volved – however these have the confounding uncertainty of force field accuracy [142, 143],
there is a possibility for fortuitous cancellation of error. Unfortunately, the computational
cost required to predict these quantities is typically massive [143], especially for large pro-
tein systems and ligands with extremely long residence times (RTs), precluding the study of
these events under a series of different simulation conditions (e.g. forcefields, water models,
polarizability).
    In the field of biomolecular modeling, blind challenges – where a series of objectives are
released by the organizers, and participants entries are directly judged by their agreement
with experiment – have been useful catalysts for the development of predictive algorithms
[144, 145, 146, 147]. Although no blind challenge currently exists for the prediction of kon
and koff , we recently participated in the SAMPL6 SAMPLing challenge, which required
participants to compute free energies as a function of simulation time and to compare the
computational cost of different free energy calculation methods [148, 99, 74]. This challenge
allows sampling methods to be assessed independently of force field accuracy, as all entries
used the same initial coordinates, force field parameters and partial charges. Importantly,
the challenge makes use of very small model systems (host-guest) that require considerably
                                               43


less computational resources to simulate, which allowed us to efficiently simulate binding and
release for a number of systems, determine kon and koff , and predict values for the binding
free energy (∆G) that would then be compared to experimental observables, as well as results
from alchemical free energy perturbation methods [149, 150].
    The binding free energy was determined as a function of rate constants:
                                                     C 0 kon
                                                            
                                   ∆G = −kB T ln               ,                          (3.1)
                                                      koff
where C 0 is a reference concentration of 1 mol/L. In this chapter, we revisit this equation
in detail and explicitly examine the assumptions made when the rate constants used in Eq.
3.1 are computed through typical simulations with finite box-size and periodic boundary
conditions. In Section 3.3.1 we derive three correction terms that can be easily computed and
facilitate a better connection with both experiment and alchemical computational free energy
calculations. To examine questions of convergence, we reproduce our binding and unbinding
simulations with larger numbers of replicas and longer simulation times. We also explore
the effects of the Langevin integrator on the prediction of unbinding and binding rates;
in particular, how altering the friction coefficient (γ), defined in the Langevin integrator,
impacts the binding and release processes. Although γ does not appear in the internal
energy function, and hence cannot affect thermodynamic properties such as the binding
free energy, we examine whether lower friction coefficients can accelerate the convergence of
unbinding simulations.
3.2     Methods
3.2.1    Host-guest systems
The host-guest system utilized in this study is referred to as OA-G6 (Fig. 3.1), where the
host, octa acid (OA) is a Gibb deep cavity cavitand, referred to as an octa acid (OA [113].
OA forms a basket-like structure with 4-fold symmetry, functionalized with four benzoic-acid
substituents on the top rim of the basket and four more on the bottom. The guest ligand we
                                              44


study here is 4-methyl pentanoic acid (referred to as “G6”). This ligand harbors a negative
charge at the carboxyl end of the alkyl chain.
  A                                                               B
                                                                        CH3
                                                                                          O
                                                                   H3C
                                                                                        -
                                                                                    O
                    side                              top
Figure 3.1: (A) The initial pose for the OA-G6 system (side view: left, top view: right).
Note that some atoms from the host are removed in the side view for clarity. The carboxyl
oxygens are shown in sphere representation. (B) The chemical structure of the G6 ligand in
the deprotonated form.
3.2.2   Molecular dynamics
The OA-G6 configuration was obtained from the organizers of the SAMPLing challenge [148].
The system was solvated in a (roughly) cubic box with box length 4.28, 4.33 and 4.33 nm
in the x, y and z dimensions, respectively. The system provided had a total of 7976 atoms:
2586 water molecules to solvate the system, 12 sodium and 3 chloride ions to neutralize the
system, and the remaining atoms belonging to either the host or the guest. OpenMM v7.2.1
[114] was used to run dynamics with the CUDA v9.0.176 platform. A Monte Carlo barostat
is used to maintain a constant pressure of 1 atm. A time step of 2 fs was used across all
simulations.
    We utilize the Langevin integrator, which uses a drag term and a noise term to account
for the friction of solvent molecules and high velocity collisions that perturb the system.
Langevin dynamics allows for the temperature to be controlled and can be used as a ther-
mostat; we run all dynamics here at 300 K. Our host-guest system follows the Langevin
                                             45


equation, shown below:
                                                                                             (3.2)
                                                           p
                        F (t) = −∇U (r(t)) − γmv(t) +        2mγkb T R(t),
where U (~r(t) is the particle interaction potential, R(t)
                                                         ~    is a random Gaussian noise term,
T is the temperature, kB is the Boltzmann constant, and γ is the friction coefficient. The
friction term plays two different roles here, both modulating the second “drag” term, and
the Gaussian noise. As γ approaches zero, the noise gets weaker and the dynamics becomes
more deterministic. Here we run binding and unbinding simulations with γ values of 1.0, 0.1
and 0.01 ps−1 .
3.2.3    Reweighting of Ensembles by Variance Optimization
To generate an ensemble of ligand unbinding events, we need to employ enhanced sampling
as the timescale of ligand unbinding events in this system is prohibitively long: we found
in previous studies a mean first passage time of 2.1 s (Chapter 2), which is six orders of
magnitude longer than the reach of conventional MD sampling. In this work, we implement
the Resampling Ensembles by Variation Optimization (REVO) method, based on weighted
ensemble (WE) framework, to encourage the sampling of rare unbinding/rebinding events.
WE accelerates the sampling of rare events using an ensemble of trajectories that are each
assigned a statistical weight [71]. The ensemble is integrated forward in time in a parallel
fashion, and periodically “resampled” by cloning certain trajectories and merging others.
When a trajectory is cloned, its weight is divided amongst the clones, but the multiple
copies of the trajectory go on to evolve independently. By repeatedly cloning trajectories
that are in undersampled regions of space we can obtain statistics on very long-timescale
events using only short-timescale simulations.
    The REVO was designed to efficiently perform cloning and merging operations on small
ensembles of trajectories that are evolving in high-dimensional spaces [35]. This is valuable
in situations where it is difficult to define one or two progress variables that capture the long-
                                                 46


timescale events of interest. In REVO, coupled cloning and merging operations are proposed
(e.g. clone trajectory i, and merge trajectories j and k) and are accepted or rejected based
on an objective function called the “trajectory variation”:
                                   X          XX
                               V =      Vi =          (dij /d0 )α φi φj ,                     (3.3)
                                     i         i    j
where dij is the distance between trajectories i and j, α and d0 are parameters, and φx is a
function that measures the importance, or “novelty" of a trajectory x, which in our work here
is strictly a function of the weight of the trajectory: φi = log wi − C, where wi is the weight
of trajectory i and C is a constant. Trajectories with the highest Vi values in Eq. 3.3 are
chosen for cloning, and those with the lowest Vi are chosen for merging. More information
about the algorithm can be found in previous work [35].
    We run separate simulations for the binding and unbinding processes. In our unbinding
simulations, the ligands start in the bound state and are terminated as they unbind. In the
rebinding simulations, the ligands start in the unbound state and are terminated as they
bind. The distance function (dij ) we use in Eq. 3.3 is different for these two simulation
types. For the unbinding simulations, we superimpose the hosts from trajectories i and j,
and then compute the root mean square deviation (RMSD) between the guest molecules,
without any further alignment [77, 109],. As there is 4-fold symmetry in this system, we
perform the alignment four times (once for each symmetrically-equivalent mapping) and use
the smallest such distance as dij . For the rebinding simulations, we calculate the distance
to the native state for each trajectory (dnative (Xi )), which again takes into account the four
symmetry mappings, using the lowest such distance. The distance between trajectories i
and j is then calculated as dij = |1/dnative (Xi ) − 1/dnative (Xj )|, where the inverse is used to
prioritize differences between small values of dnative .
                                                 47


3.2.4   Calculating rates by ensemble splitting
A major advantage of the REVO method, much like other weighted ensemble methods, is
that it can calculate kinetic parameters in real time as the simulation progresses. This is
achieved by running separate simulations for the binding and unbinding processes, and in
each case, measuring the trajectory flux into the opposite basin [85, 86, 87, 88, 89]. The
unbound basin is defined as the set of structures where the closest host-guest interatomic
distance is > 1 nm, following previous work [77, 109, 16]. The bound basin is defined as
the set of structures where the guest RMSD (compared to the native structure) is < 0.1
nm after aligning to the host. Again, this RMSD measurement takes into account the four
symmetry-equivalent mappings of OA.
      A                                       B
                                                     Unbinding ensemble
                             Equilibrium
                                                                   Binding ensemble
Figure 3.2: Splitting an equilibrium ensemble into two history-dependent ensembles using
basins. The bound and unbound basins are shown in grey and light orange, respectively.
The unbinding ensemble (B, top) contains all trajectories that last visited the bound basin,
which are shown in black. The binding ensemble (B, bottom, also referred to as the
“rebinding” ensemble) contains all trajectories that last visited the unbound basin, shown
in red. Simulations in a given ensemble are terminated once they reach the destination
basin and thus switch ensembles. The trajectory flux between ensembles is denoted by
φu→b and φb→u . The quantity πb refers to the probability of the entire top ensemble, and
the quantity fb denotes the probability of the bound basin within the unbinding ensemble.
    In our studies, the binding and rebinding REVO simulations are conducted separately.
However, the methodology of obtaining on and off-rates is essentially the same. After each
                                              48


dynamics step, if a walker has entered the opposite basin, as described above, its weight is
recorded and its structure is “warped” back to the starting structure at the beginning of the
simulation. The atomic coordinates are set to the starting structure and the velocities are
reinitialized; however, the weight of the trajectory remains the same. Before the warping
event to the starting structure, the structure of the walker is recorded and is referred to
as an “exit point”. In our unbinding simulations, the initial starting structure is the initial
bound pose provided. In our rebinding simulations, the initial starting structure is chosen
from a set of exit points generated from the unbinding simulations. Therefore, the unbinding
analyses were performed prior to initialization and the subsequent running of our rebinding
simulations.
    The off and on-rates are calculated by using the flux of trajectories into either the un-
bound or bound state respectively, and mathematically calculated by:
                                                   P
                                                       wi
                                       koff (t) =    i
                                                          ,                              (3.4)
                                                    T
                                                   P
                                                       wi
                                       kon (t) =    i
                                                          ,                              (3.5)
                                                   CT
where the sum is over the set of “warped” trajectories, T is the elapsed simulation time, and
C is the concentration of ligand, computed as 1/V where V is the box volume. The box
volume was approximately 80.2 nm3 , corresponding to a concentration of ligand of 0.0207
M.
    There are a few key differences between the REVO simulations discussed here and in
Chapter 2. For both the unbinding and rebinding simulations in this study, the total sim-
ulation time is 2.25 times longer compared to our previous study, as our current unbinding
and rebinding simulations were run for 4500 and 450 cycles, respectively. Additionally, ten
independent unbinding simulations were run for each of the four friction coefficients, whereas
our previous study only ran five independent simulations for each starting pose. However,
only five independent rebinding simulations were run for each of the coefficients, as we ob-
serve much less variation in the kon estimates. Finally, 48 walkers were used in both studies
                                                49


and the time per cycle is consistent, where the unbinding simulations are 20 ps/cycle and
the rebinding simulations are 200 ps/cycle.
3.2.5     Calculating electrostatic interaction energies
The electrostatic energy between the host and guest molecules for use in the second correction
                                   1 Qi Qj
term was calculated as: Eint =   4πw rij
                                             where Qa is the partial charge of atom a used in the
force field during simulation. rij is the interatomic distance between atoms i and j calculated
by using the minimum image convention. w = 6.88 × 10−10 F/m is the permittivity of water
at 300 K calculated by linear interpolation of the water dielectric constant at 298.15 and
303.15 K [151].
3.3     Results
3.3.1     Derivation of correction terms
The binding free energy can be calculated using the rate constants kon and koff as ∆G =
                                                              
Gbound − Gunbound = −kT ln (Keq C0 ) = −kT ln koff , where Keq is the binding equilibrium
                                                       C0 kon
constant, C0 is the reference concentration of 1 mol/L, k is Boltzmann’s constant and T is
the temperature in Kelvin. While this relationship is correct in the macroscopic limit, it
fails to account for the box size and the volume of the unbound state in finite simulation
environments with periodic boundary conditions. Here we derive a more accurate expression
for the binding free energy that accounts for the finite box size in a typical MD simulation.
    Our starting point is an expression for Keq , which is valid for a dilute solution in ther-
modynamic equilibrium. We use the notation of Woo and Roux (see Eq. 4 from Ref. [152]):
                                                    d1 dXe−βU
                                            R           R
                              Keq = R         bound
                                                           ∗
                                                               R        ,                   (3.6)
                                        bulk
                                             d1δ(r  1 −  r 1 )   dXe−βU
where U is the internal energy of the system, β = 1/kT is the inverse temperature, r1 is the
center of mass of the ligand (referred to as a “guest" molecule) and r∗1 is an arbitrary position
of the guest in the bulk. Note that d1 integrates over the guest positions, and dX integrates
                                                   50


over everything else: the host and the solvent degrees of freedom. Note also that Keq has
units of volume, as the delta function constraining the center of mass in the denominator
removes three spatial degrees of freedom.
    Here we examine the calculation of free energies using rates determined from split en-
semble calculations (Fig. 3.2, see Section 3.2.4 for more details). We denote the probability
of these two ensembles as πb and πu , where πb + πu = 1, and:
                                              πb      φu→b
                                                   =         ,                         (3.7)
                                              πu      φb→u
where φa→b is the time-averaged flux from the a ensemble to the b ensemble (i.e. across the
dotted lines in Fig. 3.2). The equilibrium probability of a position X can be obtained by
combining estimates from both ensembles:
                                    p(X) = pu (X)πu + pb (X)πb ,                       (3.8)
where pa (X) is the probability of conformation X in ensemble a, which is normalized such
that pa (X)dX = 1.
     R
    Let us define the bound state as the domain of the integral in the numerator of Eq. 3.6,
and the unbound state as a set of structures considered unbound in simulation (not the same
as the bulk state in Eq. 3.6). These states are shown as shaded regions in Fig. 3.2. The
ratio of the probabilities of these two states, at equilibrium, is given by:
                                                               dXe−βU
                                                R            R
                                   pbound               d1
                                            =  R  bound       R           ,            (3.9)
                                  punbound       unbound
                                                          d1 dXe−βU
which can also be calculated in our ensemble splitting simulations:
                                               R
                             pbound         πb bound pb (X)dX           πb f b
                                        =     R                       =        ,      (3.10)
                            punbound       πu unbound pu (X)dX          π u fu
where fa is the probability of the basin state within ensemble a.
    Expanding Eq. 3.6 we have:
                                    d1 dXe−βU                        d1 dXe−βU
                            R           R                  R            R
                             bound                          unbound
                   Keq = R               R
                                                 −βU
                                                      R                  ∗
                                                                             R
                            unbound
                                     d1    dXe               d1δ(r 1  − r1 )   dXe−βU
                                      R            R bulk−βU
                           πb f b               d1 dXe
                        =         R unbound             R            .                (3.11)
                           πu fu bulk d1δ(r1 − r∗1 ) dXe−βU
                                                    51


The unbound state in simulation is far enough that the host and guest do not interact directly
through van der Waals interactions, although if both molecules carry an explicit charge –
as in the example considered here – there could still be significant host-guest electrostatic
interactions. To account for these, we introduce another intermediate state with an altered
energy function (U ∗ ) which is the same as U except that it does not include electrostatic
interactions between the host and the guest:
                                                                                           ∗
                         πb fb unbound d1 dXe−βU                                    dXe−βU
                                R              R                  R               R
                                                                              d1
                Keq =                                               unbound
                                                                                                   (3.12)
                         πu fu unbound d1 dXe−βU ∗ bulk d1δ(r1 − r∗1 ) dXe−βU
                                R              R              R                        R
                                                                                ∗
                                                              d1 dXe−βU
                                                   R              R
                         πb fb βEint −1
                     =            e        unb
                                               R     unbound
                                                                    ∗
                                                                        R
                                                                                −βU
                                                                                     .             (3.13)
                         π u fu                  bulk
                                                      d1δ(r  1 −   r1 )    dXe
where Eint = U − U ∗ and the subscript “unb" indicates an ensemble average over structures
in the unbound state obtained with the normal energy function U . Note the final step used
the relation:
                                     −βU ∗
                                                                 dXeβEint e−βU
              R             R                  R              R
                        d1     dXe                        d1
              Runbound       R
                                     −βU
                                            =    unbound
                                                   R              R
                                                                            −βU
                                                                                    = eβEint unb
                                                                                                 . (3.14)
                unbound
                        d1     dXe                   unbound
                                                              d1     dXe
    We can now reasonably assume that the guest in the unbound state is non-interacting
with the host. This allows us to write e−βU as e−βUG e−βUHS , where UG are the terms in the
energy function that depend only on the coordinates of the guest, and UHS are terms that
only depend on the host and the solvent. We can then pull the integral dXe−βUHS out of
                                                                                           R
the numerator and denominator of the last term of Eq. 3.13:
                                                     ∗
                                    d1 dXe−βU                                  d1e−βUG
                        R              R                            R
                    R     unbound            R             =   R      unbound
                                                                                           .       (3.15)
                      bulk
                           d1δ(r1 − r∗1 ) dXe−βU                 bulk
                                                                      d1δ(r1 − r∗1 )e−βUG
The bottom integral has the center of mass of the ligand fixed and is only over internal and
rotational degrees of freedom of the ligand. This can also be separated and removed from
the numerator, which simplifies the ratio to be the volume of the unbound state, defined as:
                                                     d1e−βUG
                                          R                           Z
                           Vunbound = R     unbound
                                                                  =         dRφu (R),              (3.16)
                                            guest
                                                  dG1 e−βUG             box
where we use G1 to denote the internal and rotational degrees of freedom of the guest that
remain after specification of r1 . The quantity φu (R) is the fraction of conformers with center
                                                        52


of mass R that satisfy the unbound boundary conditions: here, that the guest atoms are all
farther than a cutoff distance of 1 nm away from the host. This integral can be calculated
by Monte Carlo, where a center of mass position and orientation of the ligand is randomly
generated, and the number of successful unbound conformers is recorded:
                                                      Nunbound
                                    Vunbound = Vbox            .                             (3.17)
                                                       Ntrials
Note that for large boxes Vunbound ≈ Vbox .
     Putting this all together we have:
                                         πb fb βEint   −1
                                Keq =           e      unb
                                                           Vunbound ,                        (3.18)
                                        πu f u
which differs from the straightforward interpretation used in Chapter 2:
                                               πb      πb
                                       0
                                     Keq  =          = Vbox .                                (3.19)
                                              πu [L]   πu
Using ∆G = −kT ln(Keq C0 ), we have:
                                                                                  
                                     fb                                     Vunbound
                          0
              ∆G = ∆G − kT ln              + kT ln e  βEint
                                                            unb
                                                                 − kT ln               ,     (3.20)
                                     fu                                       Vbox
which explicitly shows ∆G as the sum of ∆G0 = −kT ln(Keq         0
                                                                   C0 ) and the three newly derived
correction terms. The first term will go to zero in the limit that the basin states are chosen to
represent the vast majority of the probability in both the binding and unbinding ensembles.
In other words, this term goes to zero when both fb and fu approach one. The second term
is likely to only be non-negligible in the case of explicitly charged host and guest molecules
and regardless would go to zero as the definition of the unbound state is moved to farther
and farther distances. The third term would also go to zero for large simulation boxes,
but in practice this is often not feasible due to computational constraints. Consequently,
Vunbound /Vbox could be much less than one, introducing a correction in the positive direction.
Below we calculate these three correction terms and apply them to free energy calculations.
                                                  53


3.3.2     Extended trajectory ensembles with lower friction coefficients
In previous work, we used a Langevin integrator with a value of γ = 1 ps−1 for the friction
coefficient. As the simulations already have explicit solvent, this adds extra friction into the
system that is not physical. Here we investigate whether reducing γ to values less than one
will significantly affect our rate calculations. We thus run a set of trajectory ensembles at
multiple values of γ and extend each ensemble to be longer than those discussed in Chapter
3 to more fully examine questions of convergence.
    As γ governs the coupling to the Langevin thermostat, we determine the minimum value
of γ where our target temperature (300 K) is maintained. We first ran a series of short simu-
lations (one 10 ns trajectory for each γ) and find that temperature control is completely lost
for friction coefficients less than γ = 0.001 (Figure 3.3A). We then ran longer simulations for
γ = 1, 0.1, 0.01 and 0.001, examining not only the mean temperature, but the probability of
significant temperature fluctuations, which could spur anomalous results in our ligand disso-
ciation simulations. Figure 3.3B shows the probability distribution of observed temperatures
over an ensemble of 240 trajectories run for 90 ns each. For γ = 0.01, 0.1 and 1 ps−1 , the
temperature distribution is normally distributed around the mean (300 K) as seen by the
parabolic curves on a log scale. Temperature control is not fully maintained for γ = 0.001
ps−1 , as shown by a rightward shift and slight widening of the parabolic distribution. We
thus restrict our analysis to three values of the friction coefficient: γ = 0.01, 0.1 and 1 ps−1 .
    We run both unbinding and rebinding REVO simulations for the OA-G6 system. For
unbinding, we ran 10 simulations for each of the three friction coefficients; for rebinding, we
ran 5 simulations for each coefficient, yielding a total of 30 simulations for unbinding and
15 simulations for rebinding. A set of binding and unbinding simulations were also run for
γ = 0.001 – despite the impaired temperature control – which are reported in Figure 3.5.
The estimates for the unbinding and binding fluxes are depicted in Figure 3.4, where each
curve represents an individual REVO simulation. The averages, illustrated with a bolded
line, are calculated by averaging the trajectory flux over the entire set of simulations for
                                                54


     A                      320                                                      B
                                                                                                            γ = 0.001
                                                                                                   10-2     γ = 0.01
                            315
     Mean temperature (K)
                                                                                                            γ = 0.1
                                                                                                            γ=1
                            310
                                                                                     Probability
                                                                                                   10-3
                            305
                                                                                                   10-4
                            300
                                                                                                   10-5
                                  0   1e-5 1e-4    0.001 0.01         0.1   1                             260          280      300   320
                                           Friction coefficient (γ)                                               Temperature (K)
Figure 3.3: (A) Average temperatures observed in short simulations for different friction
coefficients (γ). (B) Probability distributions of observed temperatures from ensembles of
longer simulations with different γ.
that value of γ. The upward jumps on these plots indicate that an exit point was recorded
that has a higher weight than was previously observed. A set of binding and unbinding
simulations were also run for γ = 0.001 – despite the impaired temperature control – the
rates of which are depicted in Figure 3.5
   By reducing γ to values less than 1, we observed no change in the binding rates, and
small changes to the unbinding rates which are on the border of significance. With regard
to unbinding rates, the two largest friction coefficients yielded the smallest error and similar
koff values, where γ = 1 yielded an average off-rate of 16.4 s−1 and γ = 0.1 yielded an off-rate
of 11.5 s−1 . The off-rate increased by 10-fold for γ = 0.01, although this is mostly driven by
exit points observed in a single simulation. In our previous OA-G6 results using γ = 1, we
calculated an unbinding rate of 0.48 s−1 which slightly differs from the value calculated in this
study using γ = 1 (Table 3.1). Unbinding rates for γ = 0.001 ps−1 were approximately 1000-
fold higher, although these are known to be affected by a higher average temperature (SI).
Taking a closer look at the binding rates, we saw no discernible difference across the friction
coefficients. The binding rate was approximately 109 s−1 M−1 , for all friction coefficients,
which was about 5-fold larger when compared to our previous study using γ = 1. For both
                                                                                55


           1010
                 109
 kon (s-1 M-1)
                 108
                 107
                 106                    γ = 0.01 s-1                        γ = 0.1 s-1                        γ = 1 s-1
                 105
                       0   20     40      60     80    0     20     40      60     80     0   20     40      60     80
                            Simulation time (ns)              Simulation time (ns)             Simulation time (ns)
             104
             102
 koff (s-1)  100
           10-2
                                        γ = 0.01 s-1                        γ = 0.1 s-1                        γ = 1 s-1
           10-4
                       0   20     40      60     80    0     20     40      60     80     0   20     40      60     80
                            Simulation time (ns)              Simulation time (ns)             Simulation time (ns)
Figure 3.4: Predicted on- (top) and off-rates (bottom) as a function of simulation time.
Each panel is labeled according to the friction coefficient used for that set of simulations.
The independent simulations are shown in shades of orange (kon ) and blue (koff ), and the
averages are depicted by bold black lines.
binding and unbinding rates we have more confidence in the results obtained here, as they
are based on more extensive simulation data.
Table 3.1: Binding and unbinding rates as a function of friction coefficient (γ). The
uncertainties shown use the standard error of the mean calculated from 5 and 10
independent REVO runs for binding and unbinding, respectively. The quantities from
Chapter 2 were obtained with 5 REVO runs that used different initial conformations, each
of which were 2000 cycles in length.
                                       kon (108 M −1 s−1 )    kof f (s−1 )
 γ = 0.01                              17 ± 1                 122 ± 94
 γ = 0.1                               16 ± 2                 22 ± 12
 γ=1                                   13 ± 1                 16.4 ± 9.4
 Chapter 5 (γ = 1)                     2.8 ± 1.0              0.48 ± 0.11
             For both the unbinding and rebinding simulations, across all friction coefficients, we
                                                                 56


                                    1010
                                          109
                          kon (s-1 M-1)
                                          108
                                          107
                                          106                   γ = 0.001 s-1
                                          105
                                                0   20     40      60     80
                                                     Simulation time (ns)
                                      106
                                      104
                                      102
                          koff (s-1)
                                      100
                                    10-2
                                                                γ = 0.001 s-1
                                    10-4
                                                0   20     40      60     80
                                                     Simulation time (ns)
Figure 3.5: Binding (top) and unbinding (bottom) fluxes for γ = 0.001 ps−1 . Fluxes are
shown for each simulation individually. Parameters are the same as those used for higher γ
values in the main text. Average fluxes over the simulations are shown as thick black lines.
observed at least 1000 warping events (Figure 3.7). As expected, we observe that rebinding
occurs with a much higher probability when compared to unbinding, by several orders of
magnitude. The unbinding walker weights are limited at the low end by the minimum
walker probability (pmin ), which is set to 10−12 . The rebinding walker weights are limited at
the high end by the maximum walker probability (pmax ), which is set to 10−1 . respectively.
Figure 3.7 shows that the 10-fold larger unbinding rate for γ = 0.01 was largely due to a single
unbinding point in a single simulation, which underscores the sensitivity and uncertainty of
rate calculations using trajectory fluxes. Figure 3.6 shows unbinding fluxes for γ = 0.001,
which is known to have elevated temperatures. There we see a large number of high-weight
                                                          57


unbinding events in two different simulations, leading to the 1000-fold increase in koff .
                                                   γ = 0.01 ps-1                                                                      γ = 0.1 ps-1                                                                       γ = 1 ps-1
                                  10− 4                                                                            10− 4                                                                            10− 4
   Unbinding trajectory weight                                                      Unbinding trajectory weight                                                      Unbinding trajectory weight
                                  10− 6                                                                            10− 6                                                                            10− 6
                                  10− 8                                                                            10− 8                                                                            10− 8
                                 1 0 − 10                                                                         1 0 − 10                                                                         1 0 − 10
                                 1 0 − 12                                                                         1 0 − 12                                                                         1 0 − 12
                                            0      20        40       60       80                                            0      20        40       60       80                                            0      20        40       60       80
                                                Simulation time elapsed (ns)                                                     Simulation time elapsed (ns)                                                     Simulation time elapsed (ns)
                                  100                                                                              100                                                                              100
                                 10− 1                                                                            10− 1                                                                            10− 1
   Rebinding trajectory weight                                                      Rebinding trajectory weight                                                      Rebinding trajectory weight
                                 10− 2                                                                            10− 2                                                                            10− 2
                                 10− 3                                                                            10− 3                                                                            10− 3
                                 10− 4                                                                            10− 4                                                                            10− 4
                                 10− 5                                                                            10− 5                                                                            10− 5
                                            0      20        40       60       80                                            0      20        40       60       80                                            0      20        40       60       80
                                                Simulation time elapsed (ns)                                                     Simulation time elapsed (ns)                                                     Simulation time elapsed (ns)
Figure 3.6: Weights of warped walkers in unbinding (top) and binding (bottom) REVO
simulations for γ = 0.01, 0.1 and 1.0 ps −1 . Each simulation is shown in a different color.
3.3.3                                       Free energy estimates, correction terms and comparison with previous
                                            benchmarks
As the friction coefficient unevenly affected the rates of binding and unbinding, there was a
net effect on the binding free energies. As shown in Figure 3.8 and Table 3.2, the binding
free energy increases as the friction coefficient is lowered, independent of the free energy
correction terms derived in Section 3.3.1. Table 3.2 shows the free energies computed using
the averaged fluxes across all simulations at each γ value. For all friction coefficients, the
calculated free energy was always higher than that from our previous study (−12.1 kcal/mol;
red line), even for γ = 1, signifying that extending the simulation time aided in predicting
experimentally determined binding free energies.
           The correction terms are calculated using data obtained from the simulations, but they
are mostly functions of geometric properties of the simulation box and boundary condi-
                                                                                                                                          58


                                                                       γ = 0.001 ps-1
                                                         10−4
                           Unbinding trajectory weight
                                                         10−6
                                                         10−8
                                                    10−10
                                                    10−12
                                                                  0    20        40        60        80
                                                                        Simulation time elapsed (ns)
                                                          10 0
                                                         10 − 1
                           Rebinding trajectory weight
                                                         10 − 2
                                                         10 − 3
                                                         10 − 4
                                                         10 − 5
                                                                  0    20       40        60        80
                                                                       Simulation time elapsed (ns)
Figure 3.7: Weights of warped walkers in unbinding (top) and binding (bottom) REVO
simulations for γ = 0.001 ps−1 . Each simulation is shown in a different color. Parameters
are the same as those used for higher γ values in the main text.
Table 3.2: Raw (∆G0 ) and corrected (∆Gcorr ) free energy values using simulation data
from three different friction coefficients. Values are in kcal/mol and uncertainties are
calculated using propagation of the standard error of the mean.
                                                                      ∆G0 (kcal/mol)             ∆Gcorr (kcal/mol)
                     γ = 0.01                                          −9.83 ± 0.46               −7.11 ± 0.47
                     γ = 0.1                                           −10.78 ± 0.32              −8.06 ± 0.33
                      γ=1                                              −10.85 ± 0.34              −8.13 ± 0.36
                 Chapter 2 (γ = 1)                                      −12.1 ± 1.0                −9.38 ± 1.0
                 Comp. ref. [148]                                            -                     −7.0 ± 0.1
                    Exp. [153]                                               -                    −4.97 ± 0.02
tions, and are not expected to change as a function of γ. The first term, −kT ln fb /fu , was
calculated to be 0.74 ± 0.10 kcal/mol, with fb and fu taking on values of 0.157 and 0.54
                                                                               59


respectively. As described in Section 3.3.1, fb is the probability of the being in the bound
basin given that you are in the unbinding ensemble, which is calculated using the sum of the
weights of trajectories in the bound basin, divided by the total sum of the weights of the
trajectories considered. The fb value in particular was lower than expected, indicating that
our definition of the bound state might be too restrictive, even though we did account for
all symmetry-equivalent conformations in our calculation of fb .
    The second term, +kT ln eβEint    unb
                                          , was calculated to be 1.64 ± 0.002 kcal/mol. This
was calculated by determining the electrostatic interaction energies (see Section 3.2.5) for
the set of unbound states observed in the rebinding simulations. The expectation value in
the correction term again accounted for trajectory weights and was computed using 71428
interaction energy measurements that were selected from the unbound ensemble. The uncer-
tainty was computed as the standard error of the mean of this set of energies. To calculate
                                            
the third correction term, −kT ln Vunbound
                                       Vbox
                                               , we directly estimated Vunbound /Vbox using the
Monte Carlo procedure described in Section 3.3.1. The ratio was computed as 0.56 ± 0.0037
using five batches of 10000 trials each, where the uncertainty is the standard error of the
mean across the sets of trials.
    Together these three terms sum to 2.72 kcal/mol, which is a significant correction to the
binding free energies computed here. Over half of this comes from the residual electrostatic
interaction energy between the host and the guest. Note that both the host and the guest
have negative charges, and the residual interaction between the two molecules is repulsive.
Turning this interaction off releases 1.64 kcal/mol of energy, which lowers the free energy
gap between the bound and unbound states. The corrected and uncorrected free energies are
shown as a function of γ in Figure 3.8. For γ ≥ 0.01 the calculated free energies are almost
equal to within standard error and the correction terms significantly reduce the error with
respect to the computational reference value [99, 148].
                                               60


                                                             Experimental
                                    −6
                                                             Reference
                   ΔG (kcal/mol)
                                    −8
                                                                  ΔGcorr
                                   −10
                                                                  ΔG0
                                                             Dixon 2018
                                   −12
                                         10 − 2    10 − 1            10 0
                                                  γ (ps-1)
Figure 3.8: Free energies as a function of friction coefficient. The dark blue line shows the
uncorrected free energies calculated at three different γ values. The light blue line shows
the corrected values, which are shifted upwards by 2.72 kcal/mol. The thin red line shows
the value reported in Chapter 2, which employed a friction coefficient of 1.0 ps−1 and used
a smaller dataset than is reported here. The black horizontal line shows the value of a
computational reference computed using alchemical perturbation, reported in Ref. [148].
The dashed grey line shows the experimental measurement, reported in Ref. [153].
3.4    Discussion and Conclusion
In this study, we sought to better connect the calculation of binding and unbinding rates
with the calculation of binding free energies. The rate calculations measured the microscopic
fluxes of trajectories from one basin to another. These fluxes can be visualized in an extended
history-dependent conformation space, where trajectories change their “color" based on which
basin (“bound” or “unbound”) they have most recently visited [85, 86, 87, 88, 89]. The ratio
of these rates gives a ratio of two populations: the trajectories that have most recently
visited the “bound” basin and the trajectories that have most recently visited the “unbound”
basin. The first correction term adjusts this ratio to instead only account for the probability
contained within the basins themselves and is particular to rates that are calculated using
this history-dependent formalism. The third term can be seen as a volume correction term,
                                                  61


which is used to accurately account for the volume in the unbound state. This is done in
other approaches where restraints are used, such as umbrella sampling [154, 155, 156]. In
our case the unbound state cannot be easily approximated by a geometric object, such as
the volume of a spherical shell.
    The second term accounts for residual interactions in the unbound ensemble. This could
be used by other approaches that directly determine free energy differences between bound
and unbound conformations, such as umbrella sampling. The conventional approach is to
define a simulation box that is large enough such that the interactions between the host and
guest are negligible in the unbound state. However, this can significantly increase the cost
of the simulation. It is worth noting that umbrella sampling results for this system (OA-G6)
obtained by Song et al. [156], −8.50 kcal/mol, were also below both the computational
benchmark and the experimental value. Their unbound state was defined as a 20 Å distance
between an atom in the guest and a dummy atom in the center of the host, which is roughly
comparable to our unbound basin of 10 Å of clearance between the host and the guest.
Assuming a similar value for the electrostatic correction term, it would have brought their
prediction to −6.86 kcal/mol, which is in line with the computational benchmarks [148].
    The electrostatic term can also be viewed as a sort of “decoupling" between the host and
the guest, and it is warranted to discuss similarities and differences with similar procedures
in alchemical free energy methods. They are similar in that we are computing a free en-
ergy between two Hamiltonians, one in which an interaction is turned off. We could thus
use similar techniques for computing these free energy differences, such as thermodynamic
integration [157, 158], BAR [159], MBAR [150, 158], or MM/PBSA [160], although here we
effectively use a simple free energy perturbation (FEP) expression [161, 162]. The approaches
are different in that we are only considering ensembles of structures where the interactions
being turned off are relatively weak. We are assuming here – as is always the case with FEP
– that the conformational ensembles of both the host and the guest are highly overlapping
between the two Hamiltonians, which considerably simplifies the problem. We also note that
                                               62


although we employ electrostatic decoupling to compute free energies, our simulations still
reveal important information about the (un)binding kinetics and mechanism.
    We also examined the role that the Langevin integrator plays in the prediction of kinetic
and thermodynamic quantities. In particular, we adjusted the friction coefficient (γ), defined
in the Langevin integrator, while maintaining the stability of temperature at 300 K. We did
not expect that altering the friction coefficient would have an impact on the calculation of
equilibrium quantities. As γ does not appear in the Hamiltonian of the system, it should not
affect the probability of a given microstate P (X), which is given by the Canonical probability
density exp(−βU (X)). While we did expect it to affect rates, we expected that these effects
would offset: that if unbinding was accelerated 10-fold, we would observe the binding process
to be sped up by the same factor. However, we observe that the on-rate was very stable
as a function of γ, while the off-rate changed slightly. One explanation is that unbinding is
much more rare event when compared to rebinding, and estimates of koff were not converged.
Lower friction coefficients could be accelerating sampling of these events and making it easier
to observe higher probability walkers unbind in our simulations.
    Convergence is of utmost priority in weighted ensemble simulations that calculate kinetic
quantities. In our previous study, we hypothesized that it was possible that extending the
time of the unbinding simulations could capture more high weight walkers exiting from the
bound state. Indeed, we observe a higher unbinding flux in this study across all friction
coefficients. In Figure 3.4, we observe large upward jumps, for all γ values, even after 40
ns of simulation time per walker, which was sampling limit in our previous study. These
upward jumps, as previously described, signify that an exit point was recorded that has a
higher weight than previously observed. This highlights the challenges involved in accurate
determination of rate fluxes for rare events. It is worth noting that by using our correction
terms to account for small unbound volumes and persistent but small electrostatic interac-
tions in the unbound state, we can keep box sizes small, allowing for better convergence of
rate fluxes at fixed computational cost.
                                                63


    Of course the binding free energy alone is still an important quantity for drug design
[163]. If one is only interested in the absolute binding free energy, calculating it through the
ratio of rates is needlessly complicated; free energy is a state function and thus only depends
on the endpoints of the binding pathway. The prediction of koff and kon themselves is
challenging, since they are not state functions: they depend on the transition path ensemble
between the bound and unbound state. Sampling of these physical pathways is a large
challenge for MD, largely due to the long timescales of the binding and release processes.
Ensuring that the ratio of rates is consistent with binding free energy calculations - as done
here - provides an additional, powerful consistency check. In particular, comparing to well-
converged computational benchmarks is more useful than experimental quantities, as we
avoid an additional layer of uncertainty associated with the force field used to describe the
system.
                                                64


                                          CHAPTER 4
     MEMBRANE-MEDIATED LIGAND UNBINDING OF THE PK-11195
                                  LIGAND FROM TSPO
This work was published in the Biophysical Journal volume 120 pages 158-167 in 2021. The
work is presented here as published except that the supplemental figures are worked into the
text.
4.1     Introduction
The binding affinity of a ligand to its protein target has long been viewed as the key parameter
determining its efficacy. However, recent studies have shown that in some protein-ligand
systems residence time (RT) correlates more strongly with efficacy than binding affinity [2].
But unlike the binding affinity, RT is not a state function; it depends on the height of the
free energy barrier separating the bound and unbound states. In order to rationally design
ligands for longer RTs we need to understand the (un)binding mechanism and what molecular
interactions occur along the ligand (un)binding pathway.
    Previous studies have shown that the Translocator protein (18 kDa) (TSPO) is one
such protein where RT is important for predicting efficacy [18]. TSPO is a well-conserved
membrane protein, being present in all kingdoms including prokaryotes as well as in the outer
mitochondrial membrane of eukaryotes [164]. TSPO has five transmembrane α-helices (TM1-
5) along with a small helical region in a 20-residue loop (hereafter denoted as the LP1 region)
connecting TM-1 and TM-2 on the cytosolic side (Fig. 4.1A). While in the membrane, TSPO
is largely found in a dimeric state [165]. To date, four different structures have been solved
for TSPO, for both bacterial [165, 166] and mammalian [167, 168] organisms the former by
X-Ray crystallography, the latter by nuclear magnetic resonance (nuclear magnetic resonance
(NMR)).
    While the structure of TSPO have been solved, its function remains unknown. In humans,
                                                 65


TSPO is highly expressed in steroidogenic tissues, consistent with the hypothesis that it
is involved in the regulation of cholesterol transport across the mitochondrial membrane.
Indeed, TSPO has been shown to have a high binding affinity for cholesterol [169]. There are
other studies linking it to apoptosis [170, 171] and cellular stress regulation in TSPO knockout
mice [172, 173], although evidence for this is mixed [174, 175]. Increased TSPO expression has
also been observed in cases of neurodegenerative diseases such as Alzheimer’s and Parkinson’s
diseases [176]. Relatedly, due to its high expression in areas of inflammation TSPO serves as
a biomarker for neurodegenerative disease and brain trauma, and radiolabeled ligands such
as [H3]-PK-11195, are commonly used in positron emission tomography (PET) scans [177].
PK-11195 is an isoquinoline carboxamide with no known therapeutic effect [175] and a RT
of 34 min in the human TSPO sequence [18, 178].
    Molecular dynamics (MD) simulations have been previously performed using a bound
TSPO-PK-11195 complex. Researchers recently determined the unbinding pathway of PK-
11195 from a rat TSPO model generated from the Protein Data Bank (PDB) 2MGY structure
[139]. To generate unbinding paths they used a combination of random accelerated molec-
ular dynamics (RAMD) [179] and steered MD[180] and determined that PK-11195 unbinds
into the cytosol through the largely disordered LP1 region (Fig. 4.1A). Unfortunately, this
starting structure, determined by NMR, was significantly destabilized by the detergent used
in the purification [181, 182]. Also, the methods used to determine the unbinding pathway
RAMD have the potential to impart bias on the predicted (un)binding path. Another group
performed an induced-fit docking of PK-11195 using Glide [183] with a homology model
to resemble the mammalian (mouse) TSPO structure using the PDB 4UC1 Rhodobacter
sphaeroides structure. They simulated the TSPO-PK-11195 complex for 700 ns and did not
observe significant ligand displacement, which is expected due to the extremely long RT of
the TSPO-PK-11195 complex.
    Here we study the unbinding mechanism for the TSPO-PK-11195 complex, using PDB
4UC1 as the TSPO starting structure [165] and using a weighted ensemble algorithm: Re-
                                                 66


sampling Ensembles by Variation Optimization (REVO) to generate continuous unbinding
pathways without perturbing the underlying dynamics [35]. REVO has been previously
applied to study ligand unbinding on a series of host-guest systems (Chapter 2 and the
trypsin-benzamidine system [35]. In the next section we discuss the methodology used for
the simulations: the REVO resampling algorithm, the clustering algorithm used to make the
Markov State Model (MSM) and the conformational space networks (CSN)s representation,
and rate calculations. In the Results and Discussion we analyze pathways found for dissocia-
tion of PK-11195 from TSPO, residues which bound strongly to PK-11195 along the observed
pathways, and we compare RTs between different starting poses. We then summarize our
findings and discuss how they relate to existing research.
4.2     Materials and Methods
4.2.1    Protein Preparation
The initial TSPO dimer structure is comprised of chains A and B from PDB 4UC1[165]. This
x-ray crystal structure comes from the Rhodobacter sphaeroides with an A139T mutation
to resemble human TSPO. CHARMM-GUI membrane generator[184] was used to place the
TSPO complex into a membrane comprised of 174 phospholipids consisting of 53.4% phos-
phatidylcholine, 28.2% phosphatidylethanolamine, and 18.4% phosphatidylinositol lipids.
10268 TIP3 water molecules were inserted up to a cutoff of 10 Å from the complex and 121
potassium ions and 27 chloride ions were added to reach a salt concentration of 150 mM and
to neutralize the system. The system was placed into a rectangular box with dimensions
96.4 Å x 96.4 Å x 91.8 Å. The protein was simulated using the CHARMM36 forcefield [185]
and parameters for the PK-11195 ligand were obtained with CHARMM Generalized Force
Field (CGENFF) [39, 40].
                                             67


4.2.2    Docking
Six different PK-11195 poses were used in the simulations. Docking was carried out with
Extra Precision (XP) by using Schrödinger Glide[186]. The center of mass (COM) of PK-
11195 was placed at the COM of the bound Protoporphyrin IX in the chain A monomer
of TSPO protein from PDB 4UC1 without any constraints. The XP docking yielded four
poses (D1-D4) and the XP Gscores for the resultant poses can be seen in Fig 4.1B. A
homology model of PK-11195-bound TSPO (Pose R) was generated by Xia et al. as a Rosetta
comparative model of the mouse TSPO structure constructed using TSPO structures from
Mus musculus (PDB 2MGY[167]), R. sphaeroides (PDB 4UC1[165]), and Bacillus cereus
(PDB 4RYI [166]); more details found in Ref. [187]. The TSPO monomer bound to PK-
11195 from this model was then aligned to chain A of the 4UC1 structure using PyMol
1.7.2.1 [188], and the ligand coordinates from the D1 pose were changed to reflect the new
pose. The 4RYI pose was generated by X-Ray crystallography and the coordinates of the
PK-11195 ligand were added to the 4UC1 structure in the same way as pose R. The system’s
energy was minimized using a series of constraints with scripts provided by CHARMM-GUI
for all poses. The molecular structure for each pose is shown in Fig. 4.1B and pose view
diagrams are shown in Fig. 4.2.
4.2.3    Molecular Dynamics
All MD simulations were performed using OpenMM[114] v7.1.1. The time step for every
simulation was 2 fs. To enforce constant temperature and pressure, a Langevin heat bath
was used with a set temperature of 300K and a friction coefficient of 1 ps−1 was coupled to
a Monte Carlo barostat set to 1 atm and volume moves were attempted every 50 time steps.
The non-bonded forces were computed using the CutoffPeriodic function in OpenMM with
a cutoff of 10 Å. The atomic positions and velocities are saved every 15, 000 time steps, or
every 30 ps of simulation time, which is the resampling period (τ ) used here.
                                             68


                        A              LP1
                                                  TM-1
                                             TM-2
                                      TM-5
                        B     LP1      pose 4RYI            pose D3
                                                         Gscore: -7.22
                                          TM-1
                            TM-5
                                        pose D1              pose D4
                                     Gscore: -8.74       Gscore: -7.20
                                        pose D2               pose R
                                     Gscore: -8.49
Figure 4.1: TSPO-PK-11195 system. (A) Front view of the TSPO dimer in the membrane
with PK-11195 bound. (B) All six starting poses are shown from the side view, along the
inter-dimer axis. To compare poses, two moeities of PK-11195 are colored in black
(o-chlorophenyl) and magenta (1-methylpropyl), with the rest of the molecule colored
according to atom name. TM-2 is shown as transparent for clarity.
4.2.4   REVO Resampling
To observe long timescale unbinding of PK-11195, we used a variant of the weighted en-
semble algorithm: REVO[35]. In this algorithm, we perform unbiased MD simulation on 48
separate trajectories in a parallel fashion. Each of these trajectories (called "walkers") has
                                                   69


     A             O
                                                4RYI                                                                                        B                          D1
                                                                                  Phe91
                  C
                                CB
                                                   CE3
                                                                                                                                                           Tip1074                                                 Pro47
                                  Trp50                     CZ3
                                                                                                                                                                                         Phe92
                                     CG CD2
                   CA
                       N                                      CH2                                                                       Tyr54
                              CD1              CE2                                                                                                          OH2     Tip1050
                                                                     C5
                                                                                      C1            Leu142                                                             C5                C2
                                                                                                                                                    OH2                                            C1
                                                  CZ2
                                    NE1
                                  CH2
                                            3.00                                         C2                                                              3.09             N1 C3                                 Trp50
                                                                       N1
                                                                                                                                                      2.83
                      CZ3               CZ2              O                                                                           Tip1183                   O C6                             CL
                                                                                                                                                                                         C4
                          CE3
                                                                           C3                                                                                                                                                       Phe46
                                                          C6
            Trp135
                                       CE2
                                              2.74                                C4
                                                                 C7                                                                                                       C7     N2                 C21      C20
              O
                  C                         NE1                           N2                                Phe46
                           CD2                       C8
                                                                                                                                                                C8                    C15 C16                    C19
                                  CG
                                    CD1                                    C15
                                                                                            C17
                                                                                                   C18
                                                                                                                                                 Gly22
                    CA
                                                            C9
                                                                                                                                                                        C9 C14                      C17C18
           N                                                                                                                                                                                                              Thr139
                                              C10                              C16                   C19
                       CB                                     C14
                                                                                      C21                                                                     C10                    C13
                                                                    C13
                                                     C11                                                          Gly22
                                                                         CL
                                                                                         C20                                                                           C11C12
                          Trp87                         C12
                                                                                                                                                                                                     Trp87
                                                                                                                                                  Asn84
                                             Thr139                             Thr88
                                                                                                                                                                            Thr88
                                                                                                                                                        Ala19
                                                                 Asn84                             Tyr54                                                                                        Tip1045
     C                                              D2                                                                                      D       Tip1268
                                                                                                                                                                        D3    Leu25
                                                      Ala19                                            Tip1153                                                                                               Tip1088
                     Tip1092
      Thr88
                                                                               Thr51
                                                             C4
                                                                                                                                         Tip1044             Pro47                                                         Tip263
                                      Pro47                                 C2
                                                                  C3
                                                                                                                                                                                   C1                Tyr31
                                                                                  C1            Tip1079
                                                         N1
                                                C5
                                                                                                                                                                            C2
         Thr21                                                     C6
                                                                            O
                                                                                                                                Tip1153                                                        C4
                                                                                                                                                     Arg43
                          Trp50                            C7                                                                                                                     C3                                                     Gly22
                                                  C8
                                                                                                                                                                          O           N1
                                                                     N2           Gly22                                                                                                      C5
                                           C10
                                              C9                                                                                                                                                         Phe46              Phe92
                                                                                                                                                                          C6
                                                            C15
                                                                                                                                                                              C7
                                    C11                                                                                                                                                            CL
                                                                                                                                                                   C8
                                                                                 C17                                                                                                 N2
                                                         C14            C16
                                                C13                                                                                              Trp87
                                            C12
             Leu25                                                                     C18
                                                                                                                                                                      C9           C15
                                                               C21                                  Tyr54                                                                                         C21
                                                     CL                                                                                                    C10
                                                                                                                                                                                                         C20
                                                                               C19
                                                                                                                                                                                                                        Asn40
                                                                                                                                                                        C14            C16
                                                                                                                                                               C11          C13
                                                                     C20                                                                                                                    C17          C19
                                                                                                                     Tip3185                  Thr88
                                   Phe46                                                                                                                         C12
                                                                                                                                                                                              C18
                                                                 Arg43
                                                                                                                                                                               Trp50
                                                                                             Tip1075                                                                                                      Leu142
                                                                                                                                                                Thr139
                                                                                                                                                                                          Trp135
     E                                              D4                                                                                      F                             R
                                                                                 Dpp4
                                                                                                                                                        Tyr31                                                    Phe91
                                                     Gly22                                                                                                                    Leu142
                                                                              C1
                                                                                                C4
                                                                                                         Tip1126
                                                                                 C2                                                                                                                   C1
                                                                                     C3
                                Pro47
                                                                                                                                                       Asn40                                                                      Phe92
                                                                                              N1                                                                                      C4                C2
                                                                                                      C5
                                                        Ala23
                                                                                 O C6
                Ala19                                                                                             Tip1025                                                               C3
                                                                                             C7     C8
                                                                                                                                                                              O               N1
                                                                                   N2                                                                                                                  C5
                                                                        CL
                                                                                                        C9    C10                                                                                                    Trp87
   Pro18                                                                                                                                                                       C6
                                     Thr88                                                C15
                                                                                                                 C11
                                                                                                                                                     Phe46
                                                                                                                                                                                     C7
                                                                           C21
                                                                                              C14
                                                                                                                                                                                                                                      Asn84
                                                                               C16
                                                                                                      C13C12              Phe46                                       C8                      N2
                                                                C20
                                                                                        C17
                                                                                                                                                                             C9              C15              C17
                                                                         C19
                                                                                                                                                                                                                     C18
                                            Tyr54                                                                                                            C10
         Thr21                                                              C18
                                                                                                        Phe92                            Trp135                               C14               C16
                                                                                                                                                                    C11              C13               C21            C19
                                                                                                                                                                                                                                Thr88
                         Asn84                                   Trp50                                                                                                                 CL
                                                                                                                                                                      C12                                C20
                                                                                          Trp87
                                                                                                                                                     Trp50
                                                                                                                                 Leu53
                                                                                                                                                                                                             Tip1117
                                                                                                                                                                              Thr139
Figure 4.2: Protein-ligand interaction plots for the six starting conformations. The red
suns indicate that the residue has a hydrophobic contact with PK-11195. The green dashed
lines show hydrogen bonds.
                                                                                                                                         70


a statistical weight (w) that governs the probability with which it contributes to statistical
observables. With periodicity τ , a resampling procedure is performed, where similar walkers
are merged together and unique walkers are cloned, as defined by a distance metric. During
cloning, weights are split, and during merging, weights are added, to ensure conservation of
probability.
    Below we briefly describe the REVO method, focusing on the details of its application in
this work. More information on the algorithm can be found in previous work [35]. In REVO,
merging and cloning is done to maximize a variation function:
                                   X          X X  dij α
                              V =       Vi =                  φi φj ,                      (4.1)
                                     i         i    j
                                                        d0
where dij is the distance between walker i and walker j determined using a distance metric of
choice. For these simulations the distance metric used was the root mean square deviation
(RMSD) of the PK-11195 atoms between each walker, following alignment to a selection
of binding site atoms in TSPO. The exponent α is used to modulate the influence of the
distances in the variation calculation and was set to 4 for all simulations. d0 = 0.148 nm
is a characteristic distance used to make V dimensionless and to normalize the variance for
comparison between different distance metrics. φ is a novelty and here is defined as:
                                                       p 
                                                          min
                                  φi = log(wi ) − log         .                            (4.2)
                                                         100
The minimum weight, pmin , allowed during the simulation was 10−12 . The walker that is
selected for cloning is the one that has the highest Vi and the resultant weight of the clones
is larger than pmin . The two walkers selected for merging are at most 2 Å away, have a
combined weight lower than the maximum allowed weight pmax = 0.1, and is the walker pair
j, k that minimizes the variation loss (Vloss ) defined as:
                                              Vj wk + Vk wj
                                      Vloss =               .                              (4.3)
                                                  wj + wk
Once the walkers (i, j, k) are selected, the new variation is calculated: if it increases, then
these operations are performed and another (i, j, k) is proposed; if it decreases then resam-
pling for that cycle is terminated and a new cycle of MDis performed. Three simulations
                                                 71


were run for each docked pose using 48 walkers and 1200 cycles, for 1.728 µs of simulation
time per simulation. In total each pose was simulated for 5.184 µs.
4.2.5    Boundary Conditions
The overall goal of the simulations was to determine the pathways along which PK-11195
can transition from the initial starting poses to an unbound state. During the simulations,
we defined PK-11195 as being unbound when the minimum distance between the ligand and
TSPO was at least 10 Å. When the ligand crossed this boundary, the weight is recorded and
the walker was "warped" back to the initial conformation. The structure recorded before
warping is known as an exit point. When the walker warps back, the atomic positions and
velocities are reset to their initial values before the simulation began. The walker weight
does not change as a result of warping.
4.2.6    Clustering and Network Layout
The trajectory frames of all 18 REVO runs were clustered together using the MSMBuilder
3.8.0 python library. The frames were featurized using a vector of atomic distances between
TSPO and PK-11195 atoms initially within 8 Å of each other from the 4RYI starting pose
for a total of 7527 distances. A k-centers clustering algorithm was used to generate 2000
clusters using the featurized space and each frame was assigned to a cluster. The clustering
was done using the Canberra distance metric. A count matrix describing the cluster-cluster
transitions was calculated for a lag time of 30 ps.
    We then construct a CSNs from the count matrix, which is a graphical representation of
the transition matrix. Each node, representing each row of the transition, and the edges,
representing non-zero off diagonal elements of the transition matrix, were determined using
the CSNAnalysis package [121]. Gephi 0.9.2[119] was used to visualize the CSN. The size of
each node is proportional to the statistical population of the cluster. For visualization, the
smallest node was set to be 20 times smaller than the largest node. The layout of the network
                                               72


was determined using a force minimization algorithm, Force Atlas included in Gephi. The
algorithm repulses nodes that are not connected and attracts nodes that are connected via
an edge. The strength of the attractive force is proportional to the weight of the edges.
The directed edge weights were values between 0.1 and 100 as determined by wij = 100pij ,
where pij is the transition probability of cluster i transitioning to cluster j. Unidirectional
edge weights were then determined using the average between the two directed edge weights.
Force Atlas was applied twice. The first minimization was done without adjusting for node
sizes, allowing the nodes to overlap. The second minimization adjusted for the node size and
prevented overlap. For visualization, all edges are shown with a uniform line weight.
4.2.7     Quantifying Unbinding Pathways
Upon analysis of the simulation results, the only unbinding pathways observed in our sim-
ulations were PK-11195 dissociating through pairs of transmembrane helices. We therefore
introduce the coordinate Qij which measures the minimum x-y distance from the COMof
PK-11195 to the line formed by the COMs of helices i and j to measure the dissociation
progress of PK-11195 into the membrane. Negative values indicate the COMof the ligand
is closer to the center of the helical bundle, and positive values indicate the COMis closer
to being fully dispersed in the membrane. All six poses had trajectories where PK-11195
traveled between transmembrane helices 1 and 2 and only pose R had trajectories where
PK-11195 went between transmembrane helices 2 and 5. For pose R analysis we separate
the conformations according to which value (Q12 or Q25 ) is largest. Projections onto a given
Q value will only use conformations for which that Q value is the largest.
4.2.8     Calculating Non-bonded Energies
We calculated the non-bonded interaction energies (Eint ) by:
                                       Eint = VLJ + VES ,                                 (4.4)
                                               73


where VLJ is the Lennard-Jones potential energy and VES is the potential energy from electro-
static interactions. The Lennard-Jones interactions were determined using a 12 − 6 potential
given as:
                                                           
                                               σ 12  σ 6
                                  VLJ  = 4           −         ,                          (4.5)
                                                r          r
where r is the atomic distance between atoms, σ is the inter-atomic distance at which the
potential is 0, and  is the depth of the potential well. To calculate σ and  we used the
Lorentz-Berthelot combining rule. There was a hard cutoff distance of 10 Å when calculating
the Lennard-Jones potential. The electrostatic energy was calculated using:
                                                  1 Qi Qj
                                       VES =                ,                              (4.6)
                                               4π0 rij
where Qa is the charge of atom a, rij is the interatomic distance between atoms i and j,
0 = 8.854 ∗ 10−12  F
                    m
                       is the permittivity of free space in farads per meter. The specific σ, ,
and Q, for each atom type was provided by CHARMM-36 parameter files obtained through
CHARMM-GUI. Two sets of non-bonded energies were calculated: between PK-11195 and
TSPO, and between PK-11195 and lipids in the membrane.
4.2.9    Calculating Off-Rates and Mean First Passage Times using Hill Relation
The rates are calculated using the flux of trajectories into the unbound basin, also known as
the Hill relation[117, 85, 88], defined as
                                                  P
                                                      wi
                                         kof f =    i
                                                         ,                                 (4.7)
                                                    T
where wi is the weight of the walker entering the unbound basin, and T is the total simulation
time. During the simulations the unbound basin was defined by the 10 Å boundary condition.
However, although many walkers had dissociated into the membrane, no walkers made it to
the boundary. Therefore, to obtain estimates of unbinding rates, after the simulations were
completed the unbound basin was redefined using a minimum distance of 5 Å as we found
negligible interaction energy between PK-11195 and TSPO at this distance (Fig. 4.3). In
                                                74


our simulations we observed a total of 2285 instances of trajectory crossings into the 5 Å
unbound basin. This is broken down by starting pose as follows: 4RYI (47), D1 (4), D2
(1804), D3 (278), D4 (152) and R (0). In our analysis, once a walker entered the unbound
basin, we ignored all future trajectories associated with that walker. This was done to
prevent double-counting of unbinding transitions. The mean first passage time (MFPT),
synonymous with the RT, was calculated as
                                                           1
                                               MFPT =             .                      (4.8)
                                                          kof f
The uncertainty of off-rates and MFPT for each pose is the standard error across each set
of simulations.
                       0
 <Eint> (kcal/mol)
                     −10
                     −20
                     −30
                     −40
                           2      4       6       8          10
                               Min PK-TSPO Distance (Å)
Figure 4.3: The energy of non-bonded interactions between PK-11195 and TSPO as a
function of minimum distance between PK-11195 and TSPO.
4.2.10                 Calculating Mean First Passage Times using Markov State Models
We create transition matrices, T (τ ), for various lag times (τ ) using the cluster identities
from the CSN and tracking walkers through merging and cloning operations in the REVO
resampler. We alter these matrices to include a probability sink for states that are unbound,
defined as when PK-11195 is at least 5 Å away from the TSPO dimer. We run a Markov
                                                    75


chain simulation for a given starting pose and lag time by initializing a probability vector,
P , where all of the probability starts at the state of a given starting pose. To progress the
simulation we use the following: Pk = P0 T (τ )k where P0 is the initial probability vector,
and Pk is the probability vector after k time steps. We continue the simulations until all
the probability accumulates in the unbound basin. We then calculate the MFPT using the
following formula:
                                                        tk + tk−1
                              MFPT = Σk (pk − pk−1 )(             )                      (4.9)
                                                            2
where pk is the probability of being unbound at time step k and tk is the time associated
with time step k. We repeat this for all initial poses and lag times to determine MFPT as a
function of lag time.
4.2.11    Selecting Poses for Straightforward MD Simulations
To strengthen the accuracy of our Markov state model, we run straightforward simulations
at weak points in the network. To determine these weak points, we randomly multiplied the
elements of a row on the transition matrix with numbers drawn from a Gaussian distribution
with a mean (µ) at 1 with a standard deviation (σ) of 0.2 and we re-normalized the row after
perturbation. We rerun the Markov chain simulations to calculate the MFPT. To get a sense
of how consistently the cluster alters the MFPT, we randomly perturb the transition matrix
10 times independently. Weak points in the network are determined by the clusters whose
perturbations affect the MFPT the most, using the following formula: δMFPT /MFPT, where
δMFPT and MFPT are the standard deviation and average of the perturbed MFPT values,
respectively. For two poses, this ratio was greater than 0.2; we identified these clusters as
weak points and reran straightforward MD simulations from the highest weighted structure
in that cluster. From each weak point we launched 144 independent straightforward MD
simulations for a length of 500 cycles (15 ns). In addition we launched trajectories from
high lipid accessible surface area (LASA) clusters in the central unbound region and each
                                               76


of the high-LASA states originating from pose R. In total we ran 10.8 µs of supplemental
trajectories to bolster our Markov state model.
4.3     Results
4.3.1    PK-11195 Unbinding Pathway
We comprehensively studied the TSPO-PK-11195 interaction landscape using a set of REVO
simulations initialized at six different starting poses (Fig. 4.1B), simulating 5.184 µs per pose.
After the simulations were completed, all frames were clustered together into a CSN shown
in Fig. 4.4, where each node represents a PK-11195 pose and the edges reveal which poses
interconvert in our simulations within a 30 ps lag time. All of the starting poses form a
connected network, though pose R is only connected via two low probability edges to the
pose 4RYI ensemble (Fig. 4.6). The 4RYI pose is similarly connected to pose D4, but is
also connected to the other docked poses via the high LASA clusters. It is worth noting
that both pose 4RYI and pose R were the only poses that were not designed for this specific
protein structure and were instead inserted from other protein structures after alignment.
Consistent with this fact, both of these regions in the CSN do not show accumulation of
probability into one or more high-probability states. Instead we observe a broader distribu-
tion among many low probability states indicating a lack of a local funneling in the energy
landscape. Interestingly, all of the docked poses (D1-D4) show at least one high-probability
state, although this is not necessarily at the initial docked pose itself, indicating that some
relaxation is required from the docked poses to reach the true local minima.
    Contrary to what was observed in previous work [139], PK-11195 did not dissociate into
the solvent via the LP1 region: it instead dissociates into the membrane. The CSN shows
that all the poses, besides pose R, connect directly to the unbound states, shown in yellow
and orange, where PK-11195 is fully dissolved into the lipid membrane. In all of these
pathways, PK-11195 exits between TM-1 and TM-2. The pose R trajectories show two
different pathways that have a moderate LASA – one between TM-1 and TM-2 and another
                                                 77


    Lipid-Accessible Surface Area (Å2)
   0                  275                    550
                                                                                    R
                    D3
                                                      4RYI
   D1
        ‡                                      ‡
    D1                                  4RYI
           D2
                                                                                     R‡
                 ‡
             D2
                                        D4
Figure 4.4: Combined CSN of all REVO simulations from each starting pose. Each node
in the network represents a cluster of ligand poses and is sized according to the cluster
weight. Nodes are connected by edges if the ligand poses are observed to interconvert in
the REVO trajectory segments. Nodes are colored according to the LASA. Starting poses
are marked in bold and transition state poses shown in Fig. 4.5D are marked in italics.
                                              78


A                                            B                                                                                                          C
                                                                 0                                                    Q25                                             100                          * *
 Helix 5                                                       −10
                                                                                                           Q12                                                       10-4
                                                                                                                                                       Probability
                                             Eint (kcal/mol)
                                                               −20
                                                                                                                                                                                                                             D2
                                                                                                                                                                     10-8
                                                               −30                                         Q12                    Membrane-PK
                                                                      TSPO-PK
                                                                                                                      Q25
                Q12                                            −40                                                                                                   10-12
                                                                                                                                                                                                                      D1
                       -ve
                                                               −50
    Helix 2           +ve                                            −10                                    0                10             20                                             −10          0             10          20
                                   Helix 1                                                                       Q (Å)                                                                                      Q12 (Å)
D
                        D1‡                                                4RYI‡                                                                 D2‡                                                                       R‡
                        Q12 = -0.6 Å                                       Q12 = -0.1 Å                                                          Q12 = -1.3 Å                                                              Q12 = 0.1 Å
E                       Helix 4                                                        F                                                                             G
     Helix 5                                                                                                                                                                                     Membrane upper bound
                                                                                Min TSPO-PK Distance (Å)
                             Helix 3                                                                       8
                                                                                                                                                                       PK Z-COM (Å)
                                                                                                                                                                                      40
                                                                                                           6
           Helix 2                                                                                         4
                                                                                                                                                                                      20
                             Helix 1
                                                                                                                                                                                                 Membrane lower bound
                                                                                                           2
                                                                                                                 −5      0    5       10    15    20                                       -10         0              10          20
                             Top                                      Front
                                                                                                                                  Q12 (Å)                                                                      Q12 (Å)
Figure 4.5: Analysis of membrane-mediated exit paths. (A) The coordinate Qij is defined
as the x-y distance between the center of mass of PK-11195, shown as sticks and colored by
atom type, and the line that connects the centers of mass of helix i and helix j. LP1 is not
shown here for clarity. (B) The expectation values of the interaction energy between
PK-11195 and TSPO (blue) and between PK-11195 and the membrane (black) are shown
as a function of Q. In each case the solid line shows Q12 and the dashed line shows Q25 .
The shaded region indicates the standard error over the ensemble of measurements at each
Q value. (C) Probability curves projected onto Q12 for simulations initialized in Pose D1
(blue) and D2 (orange). Q12 values of the starting structures are marked with (*). (D)
Poses from transition pathways with Q ≈ 0. These poses are also labeled in the CSN of
Fig. 4.4. Phe46 is shown in purple and Trp50 is shown in orange. (E) A set of poses along
the Q12 pathway colored from bound (red) to unbound (blue). Top view is shown on the
left and a front view is shown on the right. (F) The minimum PK-11195-TSPO distance
and the Q12 value is shown for each pose in panel (E). (G) The z COMposition as a
function of Q12 . The red lines indicate the upper and lower bounds of the membrane as
defined by the maximum and minimum z coordinate of the lipid membrane.
                                                                                                                 79


                      4RYI                            D1                              D2
                       D3                             D4                               R
Figure 4.6: CSN networks indicating the clusters that were observed from each initial pose.
Red nodes indicate the simulations observed a TSPO-PK-11195 conformation that was
clustered into that node.
between TM-2 and TM-5 – where PK-11195 forms direct interactions with membrane lipids.
    We introduce the coordinate Qij , which measures the minimum x-y distance from the
center of mass of the ligand to the line connecting the centers of mass of helix i and helix j
(Fig. 4.5A). Negative Q values indicate the ligand is within the helical bundle and positive
values indicate the ligand is outside the bundle. This provides a basis to compare between
different pathways and a means of obtaining general information about membrane-mediated
ligand unbinding pathways. Fig. 4.5B compares the TSPO-PK-11195 interaction energy
(Eint ) with membrane-PK-11195 interaction energy. In the Q12 pathway (solid lines), PK-
11195 interacts more closely with the lipid membrane than TSPO after about 5 Å. For
                                             80


the Q25 pathway (dashed lines) this crossover occurs at 7.5 Å. The difference is due to
differences in the orientation of PK-11195 along the two pathways. Fig. 4.5D shows the
transition states labelled in Fig. 4.4 where the Q values are approximately equal to zero
along each dissociation pathway. We see that each structure is still heavily informed by
its starting pose, with very different PK-11195 orientations. Fig. 4.5C shows probability
distributions projected onto Q12 for starting poses D1 and D2. This shows that although
D1 started further backward on the unbinding pathway, the simulations discovered another
high-probability basin around Q12 = 0, which can also be seen by the high-probability states
around D1‡ . A representative Q12 dissociation pathway is shown and analyzed in Fig. 4.5E
and 4.5F. Note that while the Q12 value increases steadily along the pathway, the minimum
distance between TSPO and PK-11195 (used to define the unbound state) rises rapidly only
as PK-11195 reaches a Q12 of about 15 Å. Additionally, we track the PK-11195 center of
mass as a function of Q12 (Fig. 4.5G). Once it gets fully dissociated into the membrane,
PK-11195 does not travel closer towards the solvent in either direction. Rather it interacts
strongly with the hydrophobic tails and remains at approximately the membrane midpoint
over the course of our simulations.
    We also measure interaction energies between PK-11195 and individual residues for all
residues on TM1, TM2, TM5 and LP1 (Fig. 4.7-4.10). Early in both the Q12 and Q25
pathways, PK-11195 strongly interacts with aromatic residues Phe46 and Trp50 forming
π-π interactions. These aromatic residues with long side chains follow PK-11195 along the
unbinding pathway, which is observed by plotting the Q value of individual residues as a
function of Q-PK-11195 (Fig. 4.11 and Fig. 4.12). Interestingly, this phenomenon occurs for
smaller amino acid side chains as well: Gly22 and Pro47 both change Q value significantly
over the Q12 pathway, indicating significant distention of the helices during unbinding.
    Finally, we investigated the similarity of the PK-11195 conformations within each cluster
with respect to the dihedral angles along four different rotatable bonds (see Fig. 4.13-4.15).
The standard deviation for all the angles is generally low (below 85 degrees), however there
                                                81


                         0
  < Eint > (kcal/mol)
                                   Gly22
                                           Phe46
                        −2                                  Pro47
                        −4
                        −6                 Trp50
                             −10              0             10            20
                                                  Q12 (Å)
Figure 4.7: Expectation value for Eint as a function of Q12 . The lines are colored by
residue. Only residues who have a minimum interaction energy below −3.5 kcal/mol are
shown. The standard error is shown in the lighter shaded regions.
                         0
  < Eint > (kcal/mol)
                                                Asn40
                        −2
                                     Leu142
                        −4                                          Pro41
                        −6
                             Trp50
                                        Phe46                       Trp135
                        −8
                               −2.5        0.0       2.5     5.0         7.5
                                                  Q25 (Å)
Figure 4.8: Expectation value for Eint as a function of Q25 . The lines are colored by
residue. Only residues who have a minimum interaction energy below −3.5 kcal/mol are
shown. The standard error is shown in the lighter shaded regions.
are clusters in high LASA regions on the network that have a higher standard deviation.
This indicates that PK-11195 has more degrees of freedom when it comes to rotation when
it is within the membrane. However, when looking at the overall angle range for the network
clusters, there are several clusters with a high overall range, indicating that different ligand
conformations are occasionally being clustered together. In particular, rotatable bond 1
                                                                    82


                          −4.0
  Min <Eint> (kcal/mol)
                                                                             Gly22
                          −4.5
                                                                        Pro47
                          −5.0
                                       Phe46
                          −5.5
                                                   Trp50
                          −6.0
                              −10              −5              0                    5
                                                    Q12 (Å)
Figure 4.9: The residues with the strongest non-bonded interactions with PK-11195 on the
Q12 pathway. This summarizes the curves in Fig. 4.7, plotting the minimum Eint against
the Q12 value for which this minimum value is observed. The colors indicate the region of
TSPO, blue for residues on TM-1 and black for residues on TM-2. Only residues with a
non-bonded energy below -3.5 kcal/mol are shown.
  Min <Eint> (kcal/mol)
                                                                   Pro40
                          −4
                                                                         Pro41
                                                               Leu142
                          −5
                               Trp50
                          −6
                                        Phe46
                                                                           Trp135
                          −7
                                   −2          0          2         4           6
                                                   Q25   (Å)
Figure 4.10: The residues with the strongest non-bonded interactions with PK-11195 on
the Q12 pathway. This summarizes the curves in Fig. 4.8, plotting the minimum Eint
against the Q25 value for which this minimum value is observed. The colors indicate the
region of TSPO, red indicates residues on the LP1 loop, black for residues on TM-2 and
orange for residues on TM-5. Only residues with a non-bonded energy below −3.5
kcal/mol are shown.
has a range of 90 degrees or higher for most states, and rotatable bond 2 has a range over
150 degrees in the D1-D3, D4, and R basins as well as in the states where PK-11195 has
                                                                        83


                        4
   <Q12> residue (Å)
                                    Pro47
                        2
                        0           Gly22
                       −2           Phe46
                       −4           Trp50
                            −10        0             10         20
                                           Q12   PK Å
Figure 4.11: Residues moving along with the ligand during dissociation. Expectation
values of Q12 for individual residues are shown as a function of the Q12 of PK-11195.
   <Q25> residue (Å)
                       10
                                     Pro41
                                     Asn40
                        5            Phe46
                                     Trp135
                        0            Leu142
                                      Trp50
                             −2.5    0.0    2.5  5.0           7.5
                                        Q25 PK Å
Figure 4.12: Residues moving along with the ligand during dissociation. Expectation
values of Q25 for individual residues are shown as a function of the Q25 of PK-11195.
dissociated into the membrane. Therefore, it is likely that the distance metric, defined as a
set of atomic distances from PK-11195 to the TSPO binding site, is good at distinguishing the
PK-11195 location but not necessarily good at defining the internal coordinates of PK-11195.
It is thus possible that the clustering procedure introduced some unphysical connections and
the network should be seen as representing an upper bound of the connectivity between the
bound states.
                                                          84


4.3.2    PK-11195 Rates and Residence Times
We directly estimate the unbinding rates (kof f ) by summing the weights of the unbinding
trajectories and we calculate the MFPT by inverting the unbinding rate for each starting
pose (Fig. 4.16A). Pose D2 had a high unbinding flux and a predicted MFPT of less than
0.02 s, indicating a clear lack of stability with respect to the other poses. Poses D3 and D4
had predicted MFPTs of 2.6 and 4.1 minutes, respectively, still lower than the experimental
measurements; these estimates are likely to continue to decrease with further simulation
time. Poses 4RYI and D1 had MFPT estimates near or above the experimental MFPT (28
and 260 min, respectively). No unbinding events were observed for Pose R, implying an even
longer MFPT than 260 min.
    One of the issues with performing simulations via weighted ensemble is ensuring the
simulations converge. A lack of convergence introduces additional uncertainty into kof f and
MFPT calculations. To address this issue, we launch a set of Markov chain simulations using
the transition matrix that constructed the CSN. Due to the unphysical connections between
various clusters, we constructed pose-specific networks by only including states that were
visited by trajectories that were generated from a given starting pose. We again find that
the D2 pose has a low MFPT, though 2 orders of magnitude less than that calculated by
the Hill relation. Calculating the MFPT using the MSM showed that all poses besides D2
were on the same order of magnitude as the experimental RT, and were within an order of
magnitude of that determined by the Hill Relation. Since there were no trajectories starting
from pose R that entered the unbound basin, a MFPT could not be computed for this
starting pose without additional simulations.
    The accuracy of the MFPT calculations however, assumes that the transition matrix
determined from the simulations has converged. To test for convergence, we run additional
straightforward simulations at the bottlenecks of the network and rerun the MFPT calcula-
tions by combining the old and new trajectory data. Two such bottlenecks were identifed:
the connections between pose R and pose 4RYI as well as between poses D2 and D4. In
                                                85


order to better sample the unbound state, we also ran straightforward MD simulations from
high-LASA poses in the Q12 and Q25 pathways that were seen by pose R as well as the most
probable state in the unbound basin. We then reclustered and remade the CSN network to
include the new frames (Fig. 4.17). Several connections were formed between pose 4RYI
and pose R, which also gained connections the other poses after reclustering. Additionally,
the most probable region in the network was once again the D1-D3 basin as determined by
the steady state probabilities of each state.
    With the addition of the straightforward simulations, we recalculated the pose-specific
MFPTs from each starting pose. The new simulations did not show pose R progress enough
along the unbinding pathway to enter the unbound basin, and therefore we again could not
compute a RT for this pose. The MFPT for pose D2 increased by five orders of magnitude
in the new D2 MSM, but it is still the pose with the fastest unbinding pathway. This is
likely a result of reclustering after the addition of the new trajectories. Accordingly, when
we recalculate a new MSM that uses the new clusters but excludes the new trajectories from
the transition matrix, we find only an additional slight increase of the D2 MFPT from 0.13
to 0.16 minutes.
    Poses 4RYI, D1, and D4 all had MFPTs on the same order of magnitude as the original
MSM simulations and D3 had an MFPT that was lower by a factor of ∼10. The lack of
change between the MFPT calculations for slow unbinding events indicates that the original
MSM had converged enough to produce a reliable estimation for those poses. In terms
of stability, D2 consistently has the fastest unbinding events and is consistently the most
unstable pose we simulated. Poses 4RYI, D1, D3, and D4 all have similar levels of stability,
as can be seen by their similar RTs. Due to the lack of unbinding events for pose R we can
not measure how stable it is in comparison to the other starting poses, but we can say that
starting in the pose R basin is more stable than the other poses we simulated.
                                               86


4.3.3    PK-11195 Transition State
Our final goal was to determine the location of the transition state along the unbinding
pathway. Fig 4.17B, shows the committor probability of each state in the final network.
The vast majority of the states have a near zero committor to the unbound state. Only
once PK-11195 is dissociated into the membrane does the committor probability begin to
significantly increase. We built an ensemble of transition states using the centroid structures
for the two nodes with committor probabilities between 0.4 and 0.6. In this way, we estimate
that the transition state – where the committor equals 0.5 – occurs when PK-11195 has
begun dissociating into the membrane and has reached a Q12 of ∼ 10 Å. For these states
we find that the non-bonded interaction energy between TSPO and PK-11195 is roughly −5
kcal/mol (compared to −45 kcal/mol in the bound state), whereas the interaction energy
between PK-11195 and the lipid membrane has increased to −40 kcal/mol at this Q12 (Fig.
4.5B).
    To ensure that this result is not affected by any unphysical connections between bound
poses, we also calculated the committor probability for each state in the pose-specific net-
works (Figs. 4.18-4.22). We determined pose-specific transition states for each of the initial
poses that had unbinding events (e.g. all except pose R) and found that they were all located
in the membrane after PK-11195 had dissociated from TSPO. This confirms the results from
the committor probability analysis of the full network. Further, these transition states all
demonstrated a mix of direct PK-11195-TSPO interactions and PK-11195-lipid interactions.
Together these results suggest that the membrane presents a physical barrier that acts to
trap PK-11195 near TSPO and forms the rate-limiting step of PK-11195 dissociation into
the membrane.
                                              87


                                                    Average Dihedral Angle(Degrees)
                                                                   90
                                                       0                        180
                                                                  270
      Rotatable Bond 1                                  Rotatable Bond 2
      Rotatable Bond 3                                  Rotatable Bond 4
Figure 4.13: The average dihedral angles for the MSM states for four different rotatable
bonds on the PK-11195 ligand.
                                            88


                                                    Dihedral Angle Standard Deviation (Degree)
                                                   0                    85                170
         Rotatable Bond 1                             Rotatable Bond 2
         Rotatable Bond 3                             Rotatable Bond 4
Figure 4.14: The standard deviation of the dihedral angles for the MSM states for four
different rotatable bonds on the PK-11195 ligand.
                                            89


                                                        Dihedral Angle Range (Degree)
                                                    0                  90             180
        Rotatable Bond 1                              Rotatable Bond 2
        Rotatable Bond 3                              Rotatable Bond 4
Figure 4.15: The range of the dihedral angles for the MSM states for four different
rotatable bonds on the PK-11195 ligand.
                                            90


           A
                         103
                                                                            D1 (260 min)
                                   Exp. (34 min)
                                                                            4RYI (28 min)
                         101
                                                                            D4 (4.1 min)
           MFPT(min)
                                                             D3 (2.6 min)
                         10   -1
                                   No unbinding events               D2 (0.015 s)
                         10-3
                                   observed for pose R
                                           500         1000         1500
                                            Total Simulation Time (ns)
           B
                         102
                         100
            MFPT (min)
                         10-2
                         10-4
                                    4RYI        D1       D2        D3       D4
                                                        Pose
Figure 4.16: (A) MFPT estimates using unbinding fluxes observed over the course of
REVO simulations. The light shaded area shows the standard error across the three
simulations conducted for each pose. (B) A bar graph of the final MFPTs comparing the
Hill Relation (green), MSM simulations before (grey), and after (black) the addition of new
straight forward MD simulations. Pose-specific MFPTs were computed from MSMs that
were built using only trajectories generated from that starting pose. Simulations starting
from pose R never entered the unbound basin and thus MFPTs could not be determined
by either method. The experimental MFPT of 34 min is shown as a dashed blue line in
each panel.
                                                        91


            A                   4RYI                  Lipid Accessible Surface Area (Å2)
                                                     0               287             575
              R
                                                       D2
                                                                   D4
                                       D1
                                    D3
            B                                             Committor Probability (%)
                                                     0               50              100
Figure 4.17: Combined conformation space network of all REVO simulations from each
starting pose with the addition of frames from straightforward MD simulations, colored by
(A) LASA and (B) committor probability. Starting poses are marked in bold in panel A.
                                             92


     4RYI
                                                              Committor Probability (%)
                                                          0             50              100
Figure 4.18: An MSM network including both straight forward and REVO trajectories
colored by pose specific committor probability values calculated using trajectories
beginning in pose 4RYI. States that were not visited by these simulations are colored grey.
                                            93


     D1
                                                              Committor Probability (%)
                                                          0             50              100
Figure 4.19: An MSM network including both straight forward and REVO trajectories
colored by pose specific committor probability values calculated using trajectories
beginning in pose D1. States that were not visited by these simulations are colored grey.
                                            94


     D2
                                                              Committor Probability (%)
                                                          0             50              100
Figure 4.20: An MSM network including both straight forward and REVO trajectories
colored by pose specific committor probability values calculated using trajectories
beginning in pose D2. States that were not visited by these simulations are colored grey.
                                            95


     D3
                                                              Committor Probability (%)
                                                          0             50              100
Figure 4.21: An MSM network including both straight forward and REVO trajectories
colored by pose specific committor probability values calculated using trajectories
beginning in pose D3. States that were not visited by these simulations are colored grey.
                                            96


     D4
                                                              Committor Probability (%)
                                                          0             50              100
Figure 4.22: An MSM network including both straight forward and REVO trajectories
colored by pose specific committor probability values calculated using trajectories
beginning in pose D4. States that were not visited by these simulations are colored grey.
                                            97


4.4     Discussion and Conclusion
The results of our simulation show that from all six initial PK-11195 poses, using the R.
sphaeroides TSPO structure, the ligand dissociates into the membrane through the trans-
membrane helices. We found a pathway between TM1 and TM2 and a lower probability
pathway between TM2 and TM5. These pathways identify residues with which PK-11195
has high interaction energy. Among them are aromatic residues: Phe46 and Trp50 which
form π-π interactions with the ligand. The interactions with the Trp50 rings are also found
in different bound states. We note that the Trp50 residue happens to be highly conserved
across organisms of several species and kingdoms. These stabilizing interactions could lower
the barrier to entry for other TSPO ligands such as protoporphyrin-IX and heme, which are
also largely aromatic.
    Previous results [139] using a different starting pose and TSPO structure showed PK-
11195 dissociating into the cytosol through the LP1 loop region. The TSPO structure
used in the previous study was built from a homology model based on the mouse NMR
TSPO structure and used the rat sequence, whereas our structure was determined from
X-Ray crystallography from R. sphaeroides TSPO. As mentioned in the Introduction, this
NMR structure was destabilized by the detergent used in the purification [181, 182], which
likely affected the homology model structure as well. This, in addition to the differences
in sequence, results in several key structural differences between the mouse (PDB 2MGY
[167]) and R. sphaeroides (PDB 4UC1 [165]) structures. TM1 in the mouse structure is
significantly longer and the top portion of the helix is at a drastically different angle than
the helix in the structure we used in our simulations. While the LP1 region is present in both
structures, the R. sphaeroides sequence has a small α-helix which in the mouse structure
is incorporated into TM1. Finally, the LP1 region in R. sphaeroides has several stabilizing
interactions [165] between non-bonded residues such as between Trp30-Met97, Asp32-Arg43
and Trp39-Gly141 that are not present in the mouse structure. This stabilization limits
the freedom of motion of the LP1 loop, sterically hindering PK-11195 from leaving via the
                                              98


LP1 pathway. In addition to TSPO structural differences, previous results were obtained
using a 2:1 POPC:cholesterol lipid bilayer, while our results used an approximately 2.9:1.6:1
mixture of POPC:POPE:POPI lipids. Cholesterol is known to bind to TSPO, although
known binding sites are not close to the TM1-TM2 pathway found here. Differences in lipid
composition could also affect membrane fluidity, which could impact the relative probabilities
of the LP1 and TM1-TM2 pathways. It will be an important goal of future work to parse
the relative impact of these differences (protein sequence, protein structure and membrane
composition) in determining ligand dissociation pathways.
    There is interest in designing new TSPO ligands with longer RTs [18, 178]. The ligand
binding transition state is the rate-limiting step of ligand binding and release, which can
also be identified in simulations by a committor probability of 0.5 between the bound and
unbound basins. Here we find that the ligand binding transition state occurs when the ligand
has only minimal direct contact with TSPO, with a Q12 of ∼ 10 Å. In addition to details of
the bound state, this implies that TSPO ligand RT is primarily affected by properties related
to membrane permittivity and diffusivity, such as hydrophobicity. These results lead to the
hypothesis that the membrane composition could have a direct impact on ligand binding
kinetics of PK-11195.
    This work also raises questions about membrane insertion and removal along ligand
binding paths. Additional REVO simulations with only PK-11195 and the lipid membrane
could reveal the membrane diffusion coefficient of PK-11195 as well as rate constants for
insertion and removal to form holistic models of membrane-mediated binding that stretch
from solvent to binding site. A larger question is how the presence of other proteins known to
interact with TSPO, such as voltage dependent ion channel (VDAC) [170] and cytochrome
P450s [189] affect the unbinding/binding and insertion/removal pathways. Cholesterol could
also affect the binding pathways of PK-11195, either by binding to TSPO and affecting a
conformational change, or through membrane fluidity, which could affect the (un)binding
rate of PK-11195 as it interacts with the membrane [190].
                                              99


    Although it is exciting that our predicted RTs come so close to experimental quantities,
some caution should be exercised in making this comparison. First, it has been previously
shown that some simulations using traditional MD force fields do not produce reliable estima-
tions for RT [58]. However, we note that this result was mainly due to a lack of polarizability
in the force field and errors in parameters that overestimate the electrostatic interactions.
In our system, PK-11195 is uncharged and we do not expect these errors from the force field
to dramatically influence the RT. Another thing to note is the experimental MFPT reported
by Costa [18, 178] was determined using human TSPO while our simulations used the struc-
ture from R. sphaeroides containing a A139T mutation. While the mutation was designed
to mimic the human TSPO structure [165], the human and R. sphaeroides sequences have
low homology (30%) which could potentially result in different transition paths, transition
states and unbinding rates. Furthermore, these results emphasize that we should take care
to ensure consistency of the “unbound” state from simulation and experiment. In radioligand
displacement assays, any ligand pose that is not sterically blocking entry of the radiolabelled
competitor ligand would be considered “unbound” [191]. However, in surface plasmon reso-
nance, a ligand would still be considered bound until it dissociated from the detergent that is
bound to the chip along with TSPO. Our simulations show how differences in the definition
of the unbound state can lead to significant differences in RT, and could help rationalize
differences between experimental RTs obtained with different methods.
                                             100


                                      CHAPTER 5
    ATOMIC-RESOLUTION PREDICTION OF DEGRADER-MEDIATED
   TERNARY COMPLEX STRUCTURES BY COMBINING MOLECULAR
        SIMULATIONS WITH HYDROGEN DEUTERIUM EXCHANGE
The research conducted in this chapter was done in collaboration with Roivant Discovery. I
prepared, simulated and analyzed the REVO simulations and compared my results with other
methods. Roivant Discovery performed the crystallography, hydrogen deuterium exchange,
docking, and HREMD experiments and analyzed the HREMD experiments to construct the
free energy landscape.
    This work is not yet published but is available on BioRxiv [192]. This chapter is an
excerpt from that work, presenting only what pertains to the REVO simulations. The
crystallography section denotes how the reference structure for the ACBI1 PROTAC was
determined and REVO attempts to replicate the warhead conformation from this structure.
The HDX experiments were used to determine what residues are protected from hydrogen-
deuterium exchange when the ternary complex was formed and we created a distance metric
used in REVO to maximize contacts between these residues. The I-RMSD of the bound
states from docking and REVO simulations were compared. The most probable states from
the REVO simulations were projected onto the free energy landscape constructed via the
HREMD simulations.
5.1     Introduction
Heterobifunctional degrader molecules are a class of ligands that induce proximity between
a target protein of interest (POI) and a E3 ubiquitin ligase, which can ultimately lead to
ubiquitination of the POI and its subsequent proteosomal degradation through a complex
machinery of proteins[193]. These degrader molecules provide the opportunity of a novel
therapeutic modality, single molecules induce catalytic turnover of the POI, and potentially
offer an avenue for modulation of targets traditionally labeled as undruggable by classical
                                            101


therapeutic strategies [194, 195, 196]. The subset of degrader molecules classified as het-
erobifunctionals, also known as proteolysis-targeting chimera (PROTAC) molecules, consist
of two separate moieties joined by a “linker”; the “warhead” binds to the and the “ligand”
binds to an E3 ligase such as Cereblon [197, 198, 199], cIAP [200], Keap1[201], and the von
Hippel-Lindau disease tumor suppressor (VHL) [202, 203, 204]. In each case it is the ability
of the warhead-linker-ligand degrader molecule to induce a ternary complex that is critical
for bridging the interaction between the POI and an E3 ligase (which can be the native or
non-native degradation partner for the POI).
     The formation of the POI-degrader-E3 ternary complex is central to the targeted protein
degradation (TPD) process, but how the formation of the ternary structure impacts protein
degradation is still poorly understood, especially given the dynamic nature of the non-native
induced proximity complex[23]. X-ray crystallography, the primary experimental technique
for determining 3-dimensional structures of the ternary complex [205], provides a high res-
olution structure of a single conformational state, but a growing body of evidence suggests
that the dynamic nature of the ternary structure is integral to the binding cooperativity (the
term used to describe degree to which the binding affinity of ternary complexes are ther-
modynamically different than the binary counterparts) and degradation efficiency. A study
targeting the degradation of Burton Tyrosine Kinase by Cereblon found that optimal protein
removal was achieved through a molecule that induced a non-cooperative ternary complex,
demonstrating a disconnect between binding affinity and degradation efficiency[206]. Simi-
larly, Burton Tyrosine Kinase was also found to non-cooperatively interact with cIAP but
still led to high degradation efficiency[194]. Interestingly, NMR and crystallography revealed
a structural ensemble being sampled by this ternary complex, suggesting specific conforma-
tions could be responsible for efficient downstream ubiquitination[207]. In contrast, studies
with SMARCA2 and VHL found that more cooperative molecules led to higher degradation
efficiency[208]. Furthermore, analysis of the ternary structures revealed a high degree of
similarity despite the fact that the heterobifunctional molecules displayed different degrees
                                                102


of degradation efficiency[208], raising questions about relationship between static structural
representations of the the ternary complex and degradation efficiency. These findings and
others[209, 210, 211] suggest that degradation efficiency is more complex than can be un-
derstood through the thermodynamics of binding or static structural analysis. As such,
determining the dynamic ensemble of the ternary complex can reveal mechanistic insights
to facilitate the design of more effective degrader molecules, especially to understand the
relationship between linkers and degradation. [205, 208, 212, 213, 214].
    Previous work to computationally predict ternary structures mostly consists of protein-
protein docking protocols, perhaps followed by refinement of the initial structures with molec-
ular dynamics (MD) simulations to assess the stability of the predicted models [208, 214,
215, 216, 217, 218]. However, these docking protocols fail to predict high-resolution struc-
tures (sub-2.0 Å) with high fidelity. That is, while protein-protein docking protocols have
shown some promise in generating structural models of ternary complexes with reasonable
resolution (often characterized as sub-10 Å root mean square deviation (RMSD) to an x-
ray structure), the best structures typically fall somewhere within a long list of possible
poses (often in the hundreds or thousands), demonstrating the challenge associated with the
selection of high-accuracy ternary structure models.
    Here, we present an integrated workflow that combines solution-state biophysical tech-
niques with advanced MD simulations to produce atomic resolution structural ensembles
of the ternary complex. Hydrogen-deuterium exchange (HDX) protection data is used as
a collective variable (CV) in the MD simulation, enhancing both the speed and accuracy
of the computational predictions. Furthermore, HDX data is also used as constraints for
protein-protein docking when higher throughput and lower resolution models are sought,
such as when screening many degrader molecules.
    We use the weighted ensemble (WE) approach to perform MD simulations at biologically
relevant timescales (from microseconds to milliseconds) across multiple graphics processing
units (from dozens to thousands of simultaneous GPUs). This approach allows for the speed
                                              103


and throughput needed to sample the conformational free energy landscape at a sufficient
level to generate robust, high accuracy predictions of the ternary complex structural ensem-
ble. WE utilizes an adaptive sampling procedure where an ensemble of unbiased trajectories
are iteratively simulated and analysed so that computational resources can be optimally
reallocated to regions of interest (e.g. unexplored regions of conformational phase space or
regions of interest based on data from HDX experiments). Trajectories in sparsely populated
regions (i.e. limited data for statistical thermodynamic calculations) are cloned in order to
enhance sampling and high-probability regions with sufficient data for computing statistical
thermodynamic quantities are merged so computational resources can be reallocated to the
sparsely populated regions [71]. The resampling is done such that the probability of the whole
simulation ensemble of “walkers” is tracked in a statistically rigorous manner [44, 76, 75, 219].
We modified a bin-less algorithm called Resampling Ensembles by Variation Optimization
(REVO) [35] to more efficiently sample ternary complex formation, which is implemented in
the open source software package wepy [34]. The work presented here relies on knowledge of
the binding pose of the warhead to the POI and that of the ligand to the E3 ligase, which are
typically known from prior experiments or can be generated with computational tools like
docking or shape-based alignment. Experimental HDX data is used to determine the level
at which residues are shielded from solvent upon ternary complex formation, as compared
to the binary complexes (POI plus degrader or E3 ligase plus degrader).
     Our ultimate goal is to understand the structural and dynamic basis for differences in
degradation among a set of degrader molecules. Here, we focus on three different degrader
molecules of the BAF ATPase subunit SMARCA2 isoform 2 that recruit the E3 ligase VHL.
The binding affinities and cooperativity of ternary complex formation and the degradation
efficiency for these three degrader molecules are summarized in Table 5.1.
     Ternary complex crystal structures of PROTAC 1 (Protein Data Bank (PDB) ID: 6HAY)
and PROTAC 2 (PDB ID: 6HAX) show slight variations in the interactions and orientation of
the proteins in the ternary structure. In addition, we obtain the crystal structure of the highly
                                               104


Table 5.1: Binding affinity (Kd ), efficiencies (IC50, DC50), and cooperativity (α) of
PROTAC 1, PROTAC 2, and ACBI1 degraders. Ternary IC50 and binary (SMARCA2)
DC50 values are reported; the cooperativity is the ratio of binary over ternary IC50. Table
adapted from Farnaby et al. [208].
                  KdVHL (nM )     KdSMARCA2 (nM )      IC50 (nM)      DC50 (nM)         α
  PROTAC 1          98 ± 26          4500 ± 480          205 ± 15          300         12
  PROTAC 2         100 ± 10            770 ± 51           45 ± 9           N/A         18
     ACBI1         250 ± 64          1800 ± 980           26 ± 3          6/3.3        30
cooperative and more efficient degrader ACBI1 (PDB ID 7S4E from the work presented here)
(Section 5.3.1). A static analysis of these crystal structures does not explain the difference in
cooperativity and degradation of these heterobifunctional degrader molecules. To explain the
different degradation profiles of these molecules, we carry out MD simulations and solution
experiments, which reveal insights beyond what is defined by the crystal structure alone
(Section 5.3.2).
    Our results show that by including experimental solution-phase HDX data into the REVO
simulations (REVO+HDX) we obtain improved throughput and accuracy of the ternary
structure predictions. Starting from unbound SMARCA2 and VHL structures, REVO+HDX
is able to produce structural models of ternary complexes with Interface-RMSD below 2 Å
from the experimental x-ray crystal structures (Section 5.3.3). Additionally, REVO+HDX
generates an ensemble of bound conformations spanning a free energy basin within 3 kcal/-
mol from the crystal structure (Section 5.3.5). These dynamic models describe an ensemble of
energetically viable structures that could be used to study multiple aspects of the targeted
protein degradation process, including binding kinetics, affinity, selectivity, cooperativity,
ubiquitination, and degradation. We make prospective ternary structure predictions of the
SMARCA2 isoform 1, ACBI1 and VHL:Elongin C:Elongin B, where SMARCA2 isoform 1
has a 17 amino acid extension compared to isoform 2. Our prediction reconciles the HDX
data showing interaction of the isoform 1 extension with a beta-strand from VHL.
    We also introduce methodology to determine the conformational free energy landscapes of
these ternary complexes, which is the foundation for quantifying the populations of different
                                               105


conformational states. Starting from the crystal structures, we first sample conformations
using a HREMD simulation similar to solute tempering. From these simulations, we choose
structures as seeds to run 10,000 simulations on Folding@Home, totalling approximately 6 ms
of accumulated simulation time. We build a Markov State Model (MSM) [220] that identify
the most probable structures along with their conformational free energies and kinetics of
interconversion, all of which can be used to guide the design of novel degrader molecules.
5.2    Methods
5.2.1   Experimental Methods
5.2.1.1   Cloning, expression and purification of SMARCA2 and VHL/EloB/C
The SMARCA2 gene from Homo sapiens was custom-synthesized at Genscript with N-
terminal GST tag (Ciulli 2019 Nature ChemBio) and thrombin protease cleavage site. The
synthetic gene comprising the SMARCA2 (UniProt accession number P51531-1; residues
1373-1511) was cloned into pET28 vector to create plasmid pL-477. The second construct
of SMARCA2 with deletion 1400-1417 (UniProt accession number P51531-2) was created
as pL-478. For biotinylated SMARCA2, AVI-tag was gene synthesized at C-terminus of
pL-478 to create pL-479. The VHL gene from Homo sapiens was custom-synthesized with
N-terminal His6 tag [208] and thrombin protease cleavage site. The synthetic gene comprising
the VHL (UniProt accession number P40337; residues 54-213) was cloned into pET28 vector
to create plasmid pL-476. ElonginB and ElonginC gene from Homo sapiens was custom-
synthesized with AVI-tag at C-terminus of EloB [213]. The synthetic genes comprising the
EloB (UniProt accession number Q15370; residues 1-104) and EloC (UniProt accession num-
ber Q15369; residues 17-112) were cloned into pCDFDuet vector to create plasmid pL-474.
For protein structural study, AVI-tag was deleted in pL-474 to create pL-524.
    For SMARCA2 protein expression, the plasmid was transformed into BL21(DE3) and
plated on Luria-Bertani (LB) medium containing 50 µg/ml kanamycin at 37 °C overnight.
                                              106


A single colony of BL21(DE3)/pL-477 or BL21(DE3)/pL-478 was inoculated into a 100-ml
culture of LB containing 50 µg/ml kanamycin and grown overnight at 37 °C. The overnight
culture was diluted to OD600=0.1 in 2 x 1-liter of Terrific Broth medium containing 50 µg/ml
kanamycin and grown at 37 °C with aeration to mid-logarithmic phase (OD600 = 1). The
culture was incubated on ice for 30 minutes and transferred to 16 °C. IPTG was then added to
a final concentration in each culture of 0.3 mM. After overnight induction at 16 °C, the cells
were harvested by centrifugation at 5,000 xg for 15 min at 4 °C. The frozen cell paste from 2
L of cell culture was suspended in 50 ml of Buffer A consisting of 50 mM HEPES (pH 7.5),
0.5 M NaCl, 5 mM DTT, 5% (v/v) glycerol, supplemented with 1 protease inhibitor cocktail
tablet (Roche Molecular Biochemical) per 50 ml buffer. Cells were disrupted by Avestin C3
at 20,000 psi twice at 4 ºC, and the crude extract was centrifuged at 39,000 xg (JA-17 rotor,
Beckman-Coulter) for 30 min at 4 ºC. Two ml Glutathione Sepharose 4 B (Cytiva) was added
into the supernatant and mixed at 4 ºC for 1 hour, washed with Buffer A and eluted with
20 mM reduced glutathione (Sigma). The protein concentration was measured by Bradford
assay, and GST-tag was cleaved by thrombin (1:100) at 4 ºC overnight during dialysis against
1 L of Buffer B (20 mM HEPES, pH 7.5, 150 mM NaCl, 1mM DTT). The sample was
concentrated to 3 ml and applied at a flow rate of 1.0 ml/min to a 120-ml Superdex 75 (HR
16/60) (Cytiva) pre-equilibrated with Buffer B. The fractions containing SMARCA2 were
pooled and concentrated by Amicon® Ultracel-3K (Millipore). The protein concentration
was determined by OD280 and characterized by SDS-PAGE analysis and analytical LC-MS.
The protein was stored at –80 ºC.
    For VHL/EloB/C protein expression, the plasmids were co-transformed into BL21(DE3)
and plated on Luria-Bertani (LB) medium containing 50 µg/ml kanamycin and 50 µg/ml
streptomycin at 37 °C overnight. A single colony of BL21(DE3)/pL-476/474 or BL21(DE3)/pL-
476/524 was inoculated into a 100-ml culture of LB containing 50 µg/ml kanamycin and 50
µg/ml streptomycin and grown overnight at 37 °C. The overnight culture was diluted to
OD600=0.1 in 6 x 1-liter of Terrific Broth medium containing 50 µg/ml kanamycin and 50
                                              107


µg/ml streptomycin and grown at 37 °C with aeration to mid-logarithmic phase (OD600 =
1). The culture was incubated on ice for 30 minutes and transferred to 18 °C. IPTG was
then added to a final concentration of 0.3 mM in each culture. After overnight induction
at 18 °C, the cells were harvested by centrifugation at 5,000 g for 15 min at 4 °C. The
frozen cell paste from 6 L of cell culture was suspended in 150 ml of Buffer C consisting
of 50 mM HEPES (pH 7.5), 0.5 M NaCl, 10 mM imidazole, 1 mM TCEP, 5% (v/v) glyc-
erol, supplemented with 1 protease inhibitor cocktail tablet (Roche Molecular Biochemical)
per 50 ml buffer. Cells were disrupted by Avestin C3 at 20,000 psi twice at 4 ºC, and the
crude extract was centrifuged at 17000 g (JA-17 rotor, Beckman-Coulter) for 30 min at 4
ºC. Ten ml Ni Sepharose 6 FastFlow (Cytiva) was added into the supernatant and mixed
at 4 ºC for 1 hour, washed with Buffer C containing 25 mM imidazole and eluted with 300
mM imidazole. The protein concentration was measured by Bradford assay. For protein
crystallization, His-tag was cleaved by thrombin (1:100) at 4 ºC overnight during dialysis
against 1 L of Buffer D (20 mM HEPES, pH 7.5, 150 mM NaCl, 1 mM DTT). The sample
was concentrated to 3ml and applied at a flow rate of 1.0 ml/min to a 120-ml Superdex 75
(HR 16/60) (Cytiva) pre-equilibrated with Buffer D. The fractions containing VHL/EloB/C
were pooled and concentrated by Amicon® Ultracel-10K (Millipore). The protein concen-
tration was determined by OD280 and characterized by SDS-PAGE analysis and analytical
LC-MS. The protein was stored at –80 ºC. For the Surface plasmon resonance (SPR) assay,
10 mg VHL/EloB/C protein complex was incubated with BirA (1:20), 1 mM ATP and 0.5
mM Biotin and 10mM MgCl2 at 4 ºC overnight, removed free ATP and Biotin by 120-ml
Superdex 75 (HR 16/60) with the same procedure as above, and confirmed the biotinylation
by LC/MS.
5.2.1.2    Hydrogen Deuterium Exchange Mass Spectrometry
Our HDX analyses were performed as reported previously with minor modifications [221,
222, 223]. HDX experiments were performed using a protein stock at the initial concentra-
                                            108


tion of 200 µM of SMARCA2, VCB in the APO, binary (200 µM PROTAC ACBI1) and
ternary (200 µM PROTAC ACBI1) states in 50 mM HEPES, pH 7.4, 150 mM NaCl, 1 mM
TCEP, 2% DMSO in H2O. The protein samples were injected into the nanoACQUITY sys-
tem equipped with HDX technology for UPLC separation (Waters Corp. [224]) to generate
mapping experiments used to assess sequence coverage. Generated maps were used for all
subsequent exchange experiments. HDX was performed by diluting the initial 200 µM pro-
tein stock 13-fold with D2O (Cambridge Isotopes) containing buffer (10 mM phosphate, pD
7.4, 150 mM NaCl) and incubated at 10 °C for various time points (0.5, 5, 30 min). At the
designated time point, an aliquot from the exchanging experiment was sampled and diluted
1:13 into D2O quenching buffer containing (100 mM phosphate, pH 2.1, 50 mM NaCl, 3M
GuHCl) at 1 °C. The process was repeated at all time points, including for non-deuterated
samples in H2O-containing buffers. Quenched samples were injected into a 5-µm BEH 2.1 X
30-mm Enzymate-immobilized pepsin column (Waters Corp.) at 100 µl/min in 0.1% formic
acid at 10 °C and then incubated for 4.5 min for on-column digestion. Peptides were collected
at 0 °C on a C18 VanGuard trap column (1.7 µm X 30 mm) (Waters Corp.) for desalting
with 0.1% formic acid in H2O and then subsequently separated with an in-line 1.8µMHss
T3 C18 2.1 X 30-mm nanoACQUITY UPLC column (Waters Corp.) for a 10-min gradient
ranging from 0.1% formic acid to acetonitrile (7 min, 5–35%; 1 min, 35–85%; 2 min hold 85%
acetonitrile) at 40 µl/min at 0 °C. Fragments were mass-analyzed using the Synapt G2Si
ESL-Q-ToF mass spectrometer (Waters Corp.). Between injections, a pepsin-wash step was
performed to minimize peptide carryover. Mass and collision-induced dissociation in data-
independent acquisition mode (MSE) and ProteinLynx Global Server (PLGS) version 3.0
software (Waters Corp.) were used to identify the peptides in the non-deuterated mapping
experiments and analyzed in the same fashion as HDX experiments. Mapping experiments
generated from PLGS were imported into the DynamX version 3.0 (Waters Corp.) with
quality thresholds of MS1 signal intensity of 5000, maximum sequence length of 25 amino
acids, minimum products 2.0, minimum products per amino acid of 0.3, minimum PLGS
                                             109


score of 6.0. Automated results were inspected manually to ensure the corresponding m/z
and isotopic distributions at various charge states were assigned to the corresponding pep-
tides in all proteins (SMARCA2, VHL, ElonC, ElonB). DynamX was utilized to generate
the relative deuterium incorporation plots and HDX heat map for each peptide. The relative
deuterium uptake of common peptides was determined by subtracting the weighted-average
mass of the centroid of the non-deuterated control samples from the deuterated samples
at each time point. All experiments were made under the same experimental conditions
negating the need for back-exchange calculations but therefore are reported as relative [225].
All HDX experiments were performed twice, on 2 separate days, and a 98 and 95% con-
fidence limit of uncertainty was applied to calculate the mean relative deuterium uptake
of each data set. Mean relative deuterium uptake thresholds were calculated as described
previously [221, 222, 223]. Differences in deuterium uptake that exceeded the error of the
datasets were considered significant.
5.2.1.3    Structural Determination of SMARCA2:ACBI1:VHL Complex
Purified SMARCA2 and VCB in 50 mM HEPES, pH 7.5, 150 mM NaCl, 1 mM DTT were
incubated in a 1:1:1 molar ratio with ACBI1 for 1 hour at room temperature. Incubated com-
plex was subsequently injected on to a Superdex 10/300 GL increase (Cytiva) pre-incubated
with 50 mM HEPES, pH 7.5, 150 mM NaCl, 1 mM DTT, 2% DMSO at a rate of 0.5 mL/min
to separate any noncomplexed partners from the properly formed ternary complex. Eluted
fractions corresponding to the full ternary complex were gathered and spun concentrated to
14.5 mg/mL using an Amicon Ultrafree 10K NMWL Membrane Concentrator (Millipore).
Crystals were grown 1-3 µL hanging drops by varying the ratio of protein to mother liquor
from 0.5-2:0.5-2 respectively. Crystals were obtained in buffer consisting of 0.1 M HEPES,
pH 7.85, 13% PEG 3350, 0.2 M sodium formate incubated at 4 ◦ C. Crystals grew within the
first 24 hours but remained at 4 ◦ C for 5 days until they were harvested, cryo protected in an
equivalent buffer containing 20% glycerol and snap frozen in LN2. Diffraction data was col-
                                              110


lected at NSLS2 beamline FMX (λ=0.97932 Å) using an Eiger X 9M detector. Crystals were
found to be in the P 21 21 21 space group with unit cell dimensions of a= 80.14, b= 116.57, c=
122.23 Å, where α= β= γ=90◦ . Crystal contained two copies of the SMARCA2:ACBI1:VCB
(VHL, ElonC, ElonB) complex within the asymmetric unit cell. The structure was solved
by performing molecular replacement with CCP4i2 [226] PHASER using PDB ID 6HAX as
the replacement model. MR was followed by iterative rounds of modeling (COOT [227]) and
refinement (REFMAC5 [228, 229, 230, 231, 232, 233, 234, 235, 236]) by standard methods
also within the CCP4i2 suite. Structures were refined to Rwork /RF ree of 23.7%/27.5%.
5.2.2    Computational Methods
5.2.2.1    Unbound System Preparation
In this chapter, we will be using weighted ensemble to predict the Probable global tran-
scription activator SNF2L2 (SMARCA2) and the VHL-PROTAC ternary complex for three
different PROTACS: ACBI1, PROTAC1, and PROTAC2. The ternary complexs for PRO-
TAC1 and PROTAC2 can be found in PDB 6HAY and PDB 6HAX respectively and ternary
complex for ACBI1 was solved in this chapter.
    The simulation box of the unbound complex was solvated with explicit waters and counter
ions were added to neutralize the net charge of the system. The ACBI1 system has 24, 093
water molecules, 9 chlorine ions. The PROTAC 1 system has 21191 water molecules and 10
chlorine ions. The PROTAC 2 simulations has 31, 567 water atoms and 9 chlorine ions. We
used the Amber ff14SB force field for the proteins and TIP3 water model. All systems were
placed in rectangular boxes, with dimensions: 123 Å x 76 Å x 98 Å for the ACBI1 system
131Å x 84Å x 84Å for the PROTAC 1 system and 144Å x 89Å x 91Å for the PROTAC
2 system. We used Amber ff14SB force fields for the protein and a TIP3 water model.
The PROTAC molecular parameters were generated using in-house FFGEN/FFEngine tool.
The PROTACs began each simulation bound to the VHL protein with the goal to bind the
                                             111


VHL-PROTAC complex to SMARCA2.
5.2.2.2    Molecular Dynamics
All MD simulations were performed using OpenMM[114] v7.5.1. The time step for every
simulation was 2 fs. To enforce constant temperature and pressure, a Langevin heat bath
was used with a set temperature of 300K and a friction coefficient of 1 ps−1 was coupled to
a Monte Carlo barostat set to 1 atm and volume moves were attempted every 50 time steps.
The non-bonded forces were computed using the CutoffPeriodic function in OpenMM with
a cutoff of 10 Å. The atomic positions and velocities are saved every 10, 000 time steps, or
every 20 ps of simulation time, which is the resampling period (τ ) used here. The degrader-
VHL complex was constrained to maintain the complex during the simulation by using a
OpenMM custom centroid force defined as:
                             Centroid Force = k ∗ (dist − edist)2 ,                    (5.1)
where the dist is the distance between the center of mass of PROTAC and the center of mass
of VHL and the edist is the distance between the center of mass of PROTAC and center of
mass of VHL of the crystal structure, and k is a constant set to 2 kcal/mol ∗ Å2 .
5.2.2.3    Generating Bound Ensemble
The bound ternary complex was made using the same procedure as described in Section
5.2.2.1 The PROTAC1 system used PDB:HAY for its starting conformation, the PROTAC2
system used PDB:HAX as its initial conformation and the ACBI1 system used PDB:7S4E.
Straight forward MD was performed on the bound structures for 1µs each using the same
parameters as described in Section 5.2.2.2 without the constraining force between VHL and
the PROTAC. We then cluster the simulations into 25 cluster representatives using a vector
describing if two residues are within 4.5 Å of each other. These 25 cluster representatives
are the bound ensemble and will be used as reference structures for future calculations.
                                             112


5.2.2.4     REVO-epsilon Weighted Ensemble method
To observe binding of the VHL-PROTAC complex and SMARCA2 we apply a variant of
the weighted ensemble algorithm REVO. Each cycle of the REVO algorithm is comprised
of two parts: semi-independent MD trajectories performed in parallel and resampling. Each
of the MD trajectories (called "walkers") has a statistical weight (w) that contributes to
statistical observables. All simulations ran with 48 walkers. After a trajectory time of τ we
perform resampling. In resampling similar walkers are merged together and unique walkers
are cloned, as defined by a distance metric. During cloning, the weight is evenly divided
between the resultant clones and, when walkers are merged, the weights are combined to
ensure the conservation of probability.
    We will describe the application of the REVO algorithm as it pertains to this study, but
a more detailed explanation can be found in previous works[35, 34] and in Chapters 2 and 4.
The goal of the REVO resampling algorithm is to maximize the variation function defined
as:
                                    X         X X  dij α
                               V =      Vi =                   φi φj ,                      (5.2)
                                      i        i    j
                                                       d0
where Vi is the walker variation, dij is the distance between walkers i and j determined using
a specific distance metric, d0 is the characteristic distance used to make the distance term
dimensionless, set to 0.148 for all simulations, the α is used to determine how influential the
distances are to the walker variation and was set to 6 for all the simulations. The novelty
terms φi and φj are defined as: φi = log(wi )−log p100      . The minimum weight, pmin , allowed
                                                      min
                                                          
during the simulation was 10−50 . The walker with the highest variance, Vi and when the
weights of the resultant clones would be larger than pmin , and is within distance  of the
walker with the maximal progress towards binding of the ternary complex was proposed to
be cloned. The two walkers selected for merging were within a distance of 2 ˚   (A) and have a
combined weight larger than the maximal weight allowed, pmax , which was set to 0.1 for all
                                                113


REVO simulations. The merge pair also needed to minimize:
                                         Vj wi − Vi wj
                                                       ,                                  (5.3)
                                            wi + wj
If the proposed merging and cloning operations increase the total variance of the simulation,
the operations are performed and we repeat this process until the variation can no longer be
increased. After resampling is complete, we begin a new cycle.
5.2.2.5    Distance Metrics
Three different distance metrics were used while simulating the PROTAC 2 system: Using the
warhead RMSD to the crystal structure, maximizing the contact strength between protected
residues identified by HDX data, and a linear combination of the warhead RMSD, contact
strength between HDX-protected residues, and the contact strength between SMARCA2 and
the degrader. The simulations for the other systems used the last distance metric exclusively.
To compute the warhead RMSD distance metric, we aligned to the binding site atoms on
SMARCA2, defined as atoms that were within 8 Å of the warhead in the crystal structure.
Then the RMSD was calculated between the warhead in each frame and the crystal structure.
The distance between a set of walkers i and j is defined as: d = | RM1SDi −        1
                                                                                RM SDj
                                                                                       |.  The
contact strength is defined by determining the distances between residues. We calculate
the minimum distance between the residues and use the following to determine the contact
strength:
                                                     1
                                  strength =                   ,                          (5.4)
                                                1 + e−k(r−r0 )
where k is the steepness of the curve, r is the minimum distance between any 2 residues and
r0 is the distance we want a contact strength of 0.5. We used 10 for k and 5 Å for r0 . The
total contact strength was the sum of all residue-residue contact strengths. The distance
between walkers i and j was calculated by: d = |csi − csj | where cs is the contact strength
of a given walker.
                                              114


5.2.2.6    Ternary complex docking protocol
For the purpose of quick filtering through a large number of degrader designs, we take
advantage of the conventional restriction of molecular flexibility used in molecular docking
methods. Following [237, 217] (Methods 4 and 4b) and [216], we assume that high fidelity
structures of :warhead (i.e., SMARCA2 isoform 2:PROTAC binding moiety) and E3:ligand
(i.e., VHL:PROTAC binding moiety) are known and available to be used in protein–protein
docking. This docking of two proteins with bound PROTAC moieties is performed in the
absence of the linker. The conformations of linker are sampled independently with an in-
house developed protocol that uses implementation of fast quantum mechanical methods,
CREST [238, 239, 240]. Differently from the docking protocols described in [237, 217, 216],
we make use of the distance restraints derived either from the end-to-end distances of the
sampled conformations of linker, or from the HDX data. Thus, before running the protein-
protein docking, we generate an ensemble of conformers for linkers and calculate the values
of mean (x0 ) and standard deviation (sd) for the end-to-end distance. This information is
then used to set the distance restraints in the RosettaDock software [241, 242]:
                                                  x − x0 2
                                       f1 (x) = (       ),                               (5.5)
                                                     sd
where x is the distance between a pair of atoms in a candidate docking pose (the pair of
atoms is specified as the attachment points of the linker to warhead and ligand).
    When information about the protected residues is available from HDX experiments, we
used them to set up a set of additional distance restraints:
                                                   1
                           f2,i (x) =                         − 0.5,                     (5.6)
                                      1 + exp(−m · (x − x0))
where i is the index of a protected residue, x0 is the center of the sigmoid function and m is
its slope. As above, x0 value was set to be the mean end-to-end distance calculated over the
ensemble of linker conformers. The value of m was set to be 2.0 in all the performed docking
                                               115


experiments. The type of RosettaDock-restraint is SiteConstraint, with specification of Cα
atom for each protected residue and the chain-ID of partnering protein (i.e., x in Eq.(5.6) is
the distance of Cα atom from the partnering protein). Thus, the total restraint-term used
in docking takes the form:
                                                          X
                               frestr (x) = w · (f1 (x) +   f2,i (x)),                    (5.7)
                                                          i
where w = 10 is the weight of this additional score function term.
    RosettaDock implements a Monte Carlo-based multi-scale docking algorithm that sam-
ples both rigid-body orientation and side-chain conformations. The distance-based scoring
terms, Eq. (5.7), bias sampling towards those docking poses that are compatible with spec-
ified restraints. This allows to limit the number of output docking structures, as only those
ones that pass the Metropolis criterion with the additional term of Eq. (5.7) will be consid-
ered.
    Once the docking poses are generated with RosettaDock, all the pre-generated conforma-
tions of the linker are structurally aligned onto each of the docking predictions [216]. Only
those structures that satisfy the RMS-threshold value of ≤ 0.3 Å are saved as PDB files.
All the docking predictions are re-ranked by the values of Rosetta Interface score (Isc ). The
produced ternary structures are examined for clashes, minimized and submitted for further
investigations with MD methods.
5.2.2.7    HREMD simulation
The details of HREMD [243, 244] are shown in Figures 5.1 and 5.2, and Tables 5.2 and 5.3.
For all HREMD simulations, we chose the effective temperatures, T0 = 300 K and Tmax = 425
K such that the Hamiltonian scaling parameter, λ0 = 1.00 and λmin = 0.71 for the lowest and
the highest rank replicas respectively. The effective temperatures of intermediate replicas
are listed in Table 5.3. We estimated the number of replicas (n) in such a way that the
average exchange probabilities (p) between neighboring replicas were in the range of 0.3 to
                                                 116


0.4. We used n=20 and n=24 for SMARCA2:degrader:VHL and SMARCA2:degrader:VCB
respectively. Each simulation was run for 0.5 µs/replica, and a snapshot of a complex was
saved every 5 ps (total 100,001 frames per replica). Finally, we performed all the analyses
on only the loREVOt rank replica that ran with original/unscaled Hamiltonian.
Figure 5.1: Potential energy of all replicas from HREMD simulation of Sys7. Left to right:
rank0 to rank19. A good overal between adjacent replicas suggests a sufficient number of
replicas were employed and also confirmed no phase transition took place during the
HREMD simulation.
    We assessed the efficiency of sampling by observing (i) the values of p (Tables 5.2 and
5.3), (ii) a good overlap of histograms of potential energy between adjacent replicas ( Figure
5.1), and (iii) a mixing of exchange of coordinates across all the replicas (Figure 5.2).
5.2.2.8     Conformational free energy landscape determination
In order to quantify to the conformational free energy landscape, we performed dimension
reduction of our simulation trajectories using Principal Component Analysis (PCA). First,
the simulation trajectories were featurized by calculating interfacial residue contact distances.
Pairs of residues were identified as part of the interface if they passed within 5 Å of each other
                                                117


Figure 5.2: Effective temperature trajectories of replicas 0 (red), 5 (blue), 10 (green) and
19 (grey) from HREMD simulation of Sys7
                                                .
during the simulation trajectory, where the distance between two residues was defined as the
distance between their closest heavy atoms. PCA was then used to identify the features that
contributed most to the variance by diagonalizing the covariance matrix; for each simulated
system, the number of features used in our analysis was chosen as that which explained at
least 95% of the variance.
    After projecting the simulation data onto the resultant feature space, snapshots were
clustered using the k-means algorithm. The number of clusters k was chosen using the
“elbow-method”, i.e. by visually identifying the point at which the marginal effect of an ad-
ditional cluster was significantly reduced. In cases where no “elbow” could be unambiguously
                                              118


Table 5.2: Details of HREMD simulations. Protein complexes, number of atoms in a
simulation box, number of replicas used and the aggregate length of the simulations are
listed.
   ID                Complex                    # of atoms      # of replicas     Aggregate
                                                                                 length (µs)
  Sys1      SMARCA2-iso1:ACBI1:VHL                116,254             20              10
  Sys2      SMARCA2-iso1:ACBI1:VCB                220,573             24              12
  Sys3      SMARCA2-iso2:ACBI1:VHL                117,256             20              10
  Sys4      SMARCA2-iso2:ACBI1:VCB                234,724             24              12
  Sys5   SMARCA2-iso2:PROTAC 1:VHL                137,347             20              10
  Sys6   SMARCA2-iso1:PROTAC 2:VHL                 69,696             20              10
  Sys7   SMARCA2-iso2:PROTAC 2:VHL                 68,820             20              10
  Sys8   SMARCA2-iso2:PROTAC 2:VCB                119,082             24              12
identified, k was chosen to be the number of local maxima of the probability distribution in
the PCA feature space. Interestingly, the centroids determined by k-means approximately
coincided with such local maxima, consistent with the interpretation of the centroids as local
minima in the free energy landscape.
    To prepare the Folding@home simulations, HREMD data was featurized with interface
distances and its dimensionality reduced with PCA as described above. The trajectory
was then clustered into 98 k-means states, whose cluster centers were selected as ’seeds’
for Folding@home massively parallel simulations. The simulation systems and parameters
were kept the same as for HREMD and loaded into OpenMM where they were energy
minimized and equilibrated for 5 ns in the NPT ensemble (T = 310 K, p = 1 atm) using
the openmmtools Langevin BAOAB integrator with 2 fs time step. 100 trajectories with
random starting velocities were then initialized on Folding@home for each of the seeds. The
final dataset consists of 9800 trajectories, 5.7 milliseconds of aggregate simulation time, and
650 ns median trajectory length. This dataset is made publicly available at:
  https://console.cloud.google.com/storage/browser/paperdata.
    For computational efficiency, the data was strided to 5 ns/frame, featurized with closest
heavy atom interface distances (as described above), and projected into Time-structure Inde-
                                               119


pendent Components Analysis (TICA) space at lag time 5 ns using commute mapping. The
dimensionality of the dataset was reduced to 339 dimensions, keeping the number of TICA
necessary to explain 95% of kinetic variance. The resulting TICA space was discretized into
1000 microstates using k-means. The MSM was then estimated from the resulting discretized
trajectories at lag time 50 ns using a minimum number of counts for ergodic trimming (i.e.
the ’mincount_connectivity’ argument in PyEMMA) of 4, as the default setting resulted
in a trapped state whose connectivity between simulation sub-ensembles starting from two
different seeds was observed only due to clustering noise. The validity of the MSM was
confirmed by plotting the populations from raw MDcounts vs. equilibrium populations from
the MSM, which is a useful test, especially when multiple seeds are used and the issue of
connectivity is paramount. A hidden Markov model (HMM) was then computed using 5
macrostates to coarse-grain the transition matrix.
5.2.2.9    Calculating Interface RMSD
The quality of the structures produced by the REVO simulations were judged based on the
similarity between the interface of the simulated SMARCA2-VHL complex to the bound
ensemble. The SMARCA2-VHL interface was defined as residues between the two proteins
with a maximum distance of 10 Å. The backbone of these residues are superimposed onto
the reference structure. The interface root mean square deviation (I-RMSD) is then defined
as the RMSD of Cα atoms between the simulated structure and the reference structures.
This calculation is calculated for every structure in the bound ensemble and the minimum
value is reported.
                                             120


Table 5.3: Details of HREMD simulations. Effective temperatures and average exchange
probabilities of neighboring replicas are listed.
                 ID   Eff. temperature, Ti (K)     Avg. exchange prob. (p)
                Sys1       300, 306, 311, 317,       0.30, 0.30, 0.30, 0.29,
                           323, 329, 335, 341,       0.29, 0.29, 0.29, 0.30,
                           347, 354, 360, 367,       0.31, 0.29, 0.29, 0.31,
                           374, 381, 388, 395,       0.32, 0.31, 0.29, 0.31,
                           402, 410, 417, 425            0.29, 0.31, 0.33
                Sys2       300, 305, 309, 314,       0.27, 0.29, 0.29, 0.32,
                           319, 324, 329, 334,       0.31, 0.29, 0.32, 0.32,
                           339, 344, 349, 354,       0.29, 0.39, 0.36, 0.29,
                           360, 365, 371, 377,       0.30, 0.30, 0.30, 0.30,
                           382, 388, 394, 400,       0.31, 0.31, 0.29, 0.31,
                           406, 412, 419, 425            0.31, 0.31, 0.34
                Sys3       300, 306, 311, 317,       0.35, 0.31, 0.31, 0.29,
                           323, 329, 335, 341,       0.30, 0.32, 0.30, 0.32,
                           347, 354, 360, 367,       0.33, 0.34, 0.33, 0.35,
                           374, 381, 388, 395,       0.35, 0.35, 0.32, 0.34,
                           402, 410, 417, 425            0.33 0.35, 0.39
                Sys4       300, 305, 309, 314,       0.37, 0.38, 0.30, 0.30,
                           319, 324, 329, 334,       0.39, 0.39, 0.30, 0.34,
                           339, 344, 349, 354,       0.30, 0.30, 0.31, 0.28,
                           360, 365, 371, 377,       0.37, 0.39, 0.39, 0.36,
                           382, 388, 394, 400,       0.38, 0.31, 0.31, 0.31,
                           406, 412, 419, 425            0.41, 0.32, 0.33
                Sys5       300, 306, 311, 317,       0.31, 0.32, 0.33, 0.34,
                           323, 329, 335, 341,        0.31 0.35, 0.35, 0.30,
                           347, 354, 360, 367,       0.36, 0.32, 0.36, 0.36,
                           374, 381, 388, 395,       0.32, 0.31, 0.31, 0.35,
                           402, 410, 417, 425            0.35, 0.33, 0.38
                Sys6       300, 306, 311, 317,       0.28, 0.27, 0.30, 0.28,
                           323, 329, 335, 341,        0.28 0.28, 0.29, 0.29,
                           347, 354, 360, 367,       0.30, 0.29, 0.31, 0.31,
                           374, 381, 388, 395,       0.31, 0.29, 0.30, 0.29,
                           402, 410, 417, 425            0.32, 0.32, 0.30
                Sys7       300, 306, 311, 317,       0.29, 0.30, 0.30, 0.29,
                           323, 329, 335, 341,       0.31, 0.30, 0.29, 0.29,
                           347, 354, 360, 367,       0.32, 0.30, 0.34, 0.31,
                           374, 381, 388, 395,       0.30, 0.32, 0.31, 0.34,
                           402, 410, 417, 425            0.33, 0.34, 0.32
                Sys8       300, 305, 309, 314,       0.26, 0.28, 0.30, 0.28,
                           319, 324, 329, 334,       0.34, 0.27, 0.34, 0.32,
                           339, 344, 349, 354,       0.25, 0.34, 0.32, 0.33,
                           360, 365, 371, 377,       0.31, 0.30, 0.31, 0.27,
                           382, 388, 394, 400,       0.23, 0.25, 0.32, 0.31,
                           406, 412, 419, 425            0.31, 0.29, 0.27
                                               121


5.3     Results
5.3.1    Degraders with different efficiency induce similar ternary complex struc-
         tures in X-ray crystallography.
The ternary complexes of SMARCA2 isoform 2 and the VHL/ElonginC/ElonginB (VCB)
induced by different heterobifunctional degraders have been studied extensively [245, 208]. In
particular, PROTAC 1, PROTAC 2, and ACBI1 are three prominent degrader molecules that
induce a ternary SMARCA2 isoform 2:VCB complex and have quite different degradation
efficiencies (see Table 5.1). Whereas crystal structures of the ternary complexes induced
by PROTAC 1 (PDB ID: 6HAY) and PROTAC 2 (PDB ID: 6HAX) exist, none has been
reported to date for ACBI1, the most potent degrader among them. Thus, to study the effect
of different degraders on the ternary complex, we determined, as a first step, the structure
of SMARCA2 isoform 2:VHL liganded by ACBI1 via X-ray crystallography. The structure
was obtained using similar conditions as reported before (see Methods) [208] and solved by
molecular replacement to 2.25 Å in the highest resolution shell (Table 5.4), using 6HAX as
the search model (Figure 5.3a).
     The degrader molecule bridges the induced interface, forming contacts with both proteins.
Importantly, the ligand induces “cooperative contacts” between several amino acids of the
two proteins, such as VCB:ARG69 and SMARCA2 isoform 2:PHE1463 (Figure 5.3 b,c).
SMARCA2 isoform 2:ASN1464 makes critical bivalent contacts to the aminopyridazine group
of ACBI1, positioning the terminal phenol group for pi stacking interactions with residues
PHE1409 and TYR1421 (Figure 5.3b,c).
     On the VHL side of the interface, the interactions between TYR98 and ACBI1 are consis-
tent with those between the same residue and PROTAC 1 or PROTAC 2 (Figure 5.3b,c) [208].
     The three degraders PROTAC 1, PROTAC 2, and ACBI1 bind to VHL in a near-identical
fashion as their superposition reveals, upon aligning the VHL protein (see Figure 5.3d).
                                               122


Table 5.4: Crystallographic table for protein crystal structure 7S4E
SMARCA2-iso2:ACBI1:VHL.
                                                  Smarca 2BD: ACBi1 :
                                                           VCB
             Data collection
             Space Group                               P 21 21 21
             Cell Dimension
             a, b, c, (Å)                        80.14, 116.57, 122.32
             α, β, Ɣ (˚)                                90, 90, 90
             Resolution (Å)                        2.25 (2.31-2.25)*
             Rmerge                                        0.15
             <I/𝛔I>                                       1.27*
             Completeness (%)                      99.9(37.89-2.25)
             Redundancy                                     7.4
             Refinement
             Resolution (Å)                                2.25
             No. Reflections                              52206
             Rwork/Rmerge                               21.9/25.9
             No. Atoms                                    7356
                         Protein                          7070
                   Ligand/Ions                             196
                          Water                             90
             B factors
                Mean B value                              61.19
                   Ligand/Ions                            57.66
                          Water                           53.56
             R.M.S deviations
             Bond length (Å)                              0.009
             Bond angles (Å)                              1.519
             * Denotes values obtained for the highest-resolution shell.
                                             123


Figure 5.3: Ternary complex of SMARCA2 and VCB induced by ACBI1 shows structural
similarities with PROTAC 1 and PROTAC 2: a Overall perspective of SMARCA2 Isoform
2 (green) and VHL/ElonginC/ElonginB (grey) induced by degrader molecule ACBI1
(bright orange). b ACBI1-induced interface contacts between SMARCA2 and VCB. The
proteins are shown in space-filling, the colors are as in a, annotated residues are among
those that make the highest number of contacts (see c). c A contact map for the interface
of the crystal structure. The circle size reflects the number of atoms (including hydrogen
atoms) participating in interactions. d Superposition of 6HAY (purple), 6HAX (salmon),
7S4E (green) by aligning VHL (grey) shows varied conformations of the warheads of the
three degraders PROTAC 1, PROTAC 2, or ACBI1 (up to 1.7 Å) resulting in alterations of
SMARCA2 within the ternary complex.
                                               124


Nonetheless, the minor differences in the linker compositions, e.g. the ACBI1 linker has one
additional ether group compared to the PROTAC 2 linker, yields a slight 1.7 Å twist of
ACBI1 compared to the other two degraders, resulting in a subtle 5 Å “swing” of the protein
(Figure 5.3d).
    Our results show that, despite the differences in the linker compositions, the protein-
protein interface induced by ACBI1 is structurally similar to that induced by PROTACs 1
or 2 [208]. Notwithstanding, the markedly different degradation efficiencies between these
degraders[208] suggest that the (dynamic) ensemble of ternary complex structures may be
fairly different among them. Consistent with other studies [207, 194], this implies that
“crystallographic snapshots” are not suitable to provide a holistic view of the ensemble of
all possible ternary complex structures in solution, but merely represent a subset of the
relevant conformations favored by crystallization [246]. Consequently, such X-ray structures
cannot fully capture the dynamic nature of the degrader-induced ternary complexes, which
ultimately determines their activities and degradation efficiencies [207, 194].
5.3.2    Hydrogen Deuterium Exchange Reveals Extended Protein-Protein Inter-
         faces
In order to assess the impact of different degrader molecules on the dynamic nature of
the SMARCA2 isoform 2:VCB interactions, we performed hydrogen-deuterium exchange of
the respective APO, binary and ternary (complex) species, thus characterizing the protein-
protein interface in solution. This approach is a promising alternative to previous attempts
at characterizing degrader ternary complexes that employed multiple crystal structures [207],
nuclear magnetic resonance (NMR) [194]. Based on previously established protocols [221,
247, 223], and with the knowledge of binding constants for each of the three degraders,
complex formation was determined and each protein was also found to have a 100 % sequence
coverage (see 5.4) and stable deuterium exchange (see 5.5 and 5.6). To ascertain the degree
of protection from solvent in the binary or ternary complex, the residue-specific uptake of
                                             125


the APO or binary species was subtracted from that of the corresponding residues in the
binary or ternary state (referred to as Binary∆APO and Ternary∆Binary), respectively. The
results are summarized in difference plots that highlight the statistically significant (95% or
98% confidence interval) protection of distinct sites (see Figure 5.8a-d for the SMARCA2
isoform 2:VCB complex induced by ACBI). Importantly, protection during HDX arises due
to changes in the environment around the observed residues, which could be a result of direct
occlusion of solvent or conformational changes [225].
    Figure 5.8a reveals that large regions of SMARCA2 isoform 2 become protected upon
ternary complex formation (see Ternary∆Binary difference plot). These stretches of pro-
tected residues, e.g. amino acids 1409-1422 and 1456-1470, overlap with the ligand bind-
ing site based on the ternary complex structure published in this work (7S4E) and those
published previously (6HAY, 6HAX), which confirms the similarity of the ternary complex
interface among the three degrader molecules discussed above. Additionally, there are also
stretches of protected amino acids, 1394-1407, that are too distant from the established
binding interface to result from complex formation (Figure 5.8a and f). Interestingly, the
Binary∆APO difference plot suggests that the SMARCA2 isoform 2-ligand binary complex
was unstable under our experimental conditions (the ligand concentration is close to the
dissociation constant), as there is no substantial difference between the HDX of SMARCA2
isoform 2 in presence and in absence of the ligand (Figure 5.8a and e).
    On the VCB side of the interface, large regions of VHL, which resides at the direct
interface of the protein complex, are protected in the presence of the ligand as indicated
by the Binary∆APO difference plot (Figure 5.8b and e). The most protected residues in
the binary state are centered around amino acids 87-116, which include all 9 residues in the
ligand binding site of VHL. However, there is a large amount of protection across the entire
protein indicating the presence of a large allosteric network across the protein [248]. In the
presence of SMARCA2 isoform 2 (see Figure 5.8b, Ternary∆Binary difference plot), much
of the allosteric network due to ligand binding can be subtracted away leaving only the most
                                              126


             a                  SMC2 coverage
             b                  VHL coverage
             c                   EloC coverage
             d
                                 EloB coverage
Figure 5.4: Peptic coverage map of proteolyzed proteins SMARCA2, VHL, Elongin C and
Elongin B.
                                             127


Figure 5.5: Relative uptake heat map of HDX exchange data of all PROTAC molecules 1,
2 and ACBI1 bound to binary and Ternary State SMARCA2 isoform 2 bromo domain.
significantly protected residues due to ternary complex formation (Figure 5.8b and f). In
particular, residues 60-72, which house the critical interaction of ARG69 show significant
protection due to ternary complex formation (Figure 5.8b and f).
    Additionally, we observe continued protection of residues 166-176 and residues 187-201 on
VHL (Figure 5.8b and f) as well as some regions on Elongin B and C that show protection
upon ternary complex formation (see Figure 5.8c and d). Although these sites are distal
from the binding interface, they spatially align with one another when mapped onto the
structure (Figure 5.8c) potentially uncovering a critical network of allosteric changes induced
by ACBI1.
    These changes that are not observed in the snapshot provided by crystallography help
explain how this molecule vastly improves biological function, i.e., perhaps by changing the
orientation of the ligase to target lysine residues on SMARCA2 isoform 2, as suggested by
our simulations of the SMARCA2 isoform 2:VCB complex (see below).
                                              128


Figure 5.6: Relative uptake heat map of HDX exchange data of all PROTAC molecules 1,
2 and ACBI1 bound to binary and Ternary State VHL.
                                          129


Figure 5.7: Relative uptake heat map of HDX exchange data of all PROTAC molecules 1,
2 and ACBI1 bound to binary and Ternary State Elongin C.
5.3.3    Efficient simulation of ternary complex formation using REVO Weighted
         Ensemble simulations
We simulate the formation of a degrader ternary complex with the weighted ensemble path
sampling approach. This method has been employed before for tasks such as protein-protein
binding [249]. It is noteworthy, however, that the pre-defined CVs in the current simulations
are not informed by structural data about the ternary complex interface from X-ray crystal-
lography experiments. In particular, we employ the WE variant: REVO[35], which optimizes
an objective function called the trajectory variation (see Methods). Here we compare the
performance of the REVO algorithm with three different distance metrics: 1) differences in
the warhead RMSD, (RMSD between the target binding domain of the PROTAC between
the reference structure and a simulation frame, hereby called w-RMSD); 2) differences in
target-ligase contacts; 3) a weighted combination of metrics 1, 2, and the differences in con-
tacts between the target and the PROTAC for the PROTAC 2 system, hereby called the
                                             130


Figure 5.8: ACBI-induced ternary complex formation of SMARCA2 isoform 2:VCB leads
to protection of specific sites:a-d, SMARCA2 isoform 2(a), VHL(b), Elongin C(c), and
Elongin B(d) monitored for hydrogen-deuterium exchange over time. The difference plots
of each protein in the binary and ternary states are generated by subtracting the deuterium
exchange of like peptides of the APO or binary from the binary or ternary states (defined
as Binary∆APO and Ternary∆Binary), respectively. Regions that exchange significantly
less than the comparative state are depicted in blue (negative), whereas regions that
exchange significantly more appear in red (positive). The resultant difference plots of the
binary (e), or ternary complex (f) were mapped onto the structure 7S4E. The experiments
were repeated on 2 separate days.
                                            131


triple distance metric. The residues selected to compute the target-ligase contacts used in
distance metrics 2 and 3 were selected based on residues that showed increased protection
from hydrogen-deuterium exchange based on the HDX experiments. Between three and
seven REVO simulations are run for each distance metric on the SMARCA2-VHL-PROTAC
2 system, and a summary of their performance is given in Table 5.5.
Table 5.5: A summary of the performance of REVO simulations run with different distance
metrics. Each REVO simulation ran with 48 walkers. The number of binding events
(Nbinding ) counts the barrier crossings into the bound state, defined using an I-RMSD
< 2.0 Å. The number of simulations with binding events (Sims. w/ binding) shows the
probability of binding success. The total simulation time (Sim. time) aggregates the length
of all trajectories in each REVO simulation.
  Distance Metric            Nbinding  Sims. w/ binding    Sim. time (µs)
  W-RMSD                     5327      6/7                 13.44
  Target-Ligase Contacts     5876      2/7                 13.44
  Triple                     3278      6/7                 12.5
    Encouragingly, we observe a large number of binding events with all three distance metrics
examined here. The triple distance metric found the least number of binding events at 3278,
whereas the target-ligase contacts distance metric sees the most binding events (5876 binding
events).
    We find that the low I-RMSDs are achieved early in the REVO simulations, with the
minimum I-RMSD reaching ∼2.5 Å and stabilizing after about 500 ns of aggregate simulation
time, whereas the vanilla MD simulation we ran plateaued at 10 Å after 1.35µs of simulation
(Figure 5.9a,b). When comparing between the distance metrics both w-RMSD and the triple
distance metrics were able to find sub-2 Å structures within an aggregate simulation time of
200 ns, whereas it took the target-ligase contacts distance metric 800 ns to find structures
of the same quality (Figure 5.9b). This indicates that using the degrader orientation to the
binding site may be required to quickly and consistently generate low I-RMSD structures.
    Figure 5.10 shows an example of a structural prediction obtained for SMARCA2 isoform
2-PROTAC 2-VHL (PDB ID 6HAX [208]). The contact maps presented in Figure 5.10c
have been obtained by the Arpeggio software [250] applied to the ternary interfaces. Each
                                               132


point on the contact maps reflects the degree of interaction. As can be seen from both the
aligned prediction and co-crystallized structure (Figure 5.10d) and from the contact map
(panel (Figure 5.10c), the accuracy of prediction is very high.
    We performed clustering on all structures produced by the 6HAX simulations by a k-
means clustering algorithm into 500 macrostates using the Cα − Cα distances on residues
determined from the HDX experiments. Low I-RMSD states all have low values of w-RMSD
(as expected) (Figure 5.9c). High free energy states have a large range of both I-RMSD
values. However, the low free energy states are coincidentally below 10 Å. Five of the lowest
free energy states have I-RMSD values below 3 Å.
    Determining the I-RMSD of the ensemble is only possible when we have a crystal struc-
ture for the ternary complex. When trying to filter many possible degraders, it is not always
feasible to solve these structures for every compound. Therefore, we need to rely on other
physical quantities for predicting acceptable structures in such cases. We developed two
definitions of the bound ensemble for the REVO simulations: 1) Using w-RMSD below
2 Å and 2) Using w-RMSD below 2 Å and more than 30 residue-residue contacts between
the target and ligase residues that showed increased protection from hydrogen-deuterium
exchange obtained from the HDX experiments. The first definition was used on the simu-
lations where we used the warhead distance metric and the second definition was used for
the target-ligase contacts and triple distance metrics. Using these definitions of the bound
ensemble to filter our simulations, REVO simulations with and without HDX were able to
sample low I-RMSD regions (below 2 Å), and the probability distributions had peaks below
this I-RMSD threshold (Figure 5.11a). However, using the HDX data limited the bound
ensemble to a maximum I-RMSD of about 4 Å whereas not including the HDX data allowed
a broad bound distribution that had a maximum at 8 Å. Using this definition for the bound
ensemble, 43% of structures that met this criteria had I-RMSD values below 2 Å, whereas
the bound ensemble generated from REVO simulations not using HDX only selected con-
formations with an I-RMSD below 2 Å at 38% (Figure 5.11b). This definition of the bound
                                              133


                  a                                                               b
                            12.5                                                                                                      Warhead RMSD
       Minimum I-RMSD (Å)                                                                       Minimum I-RMSD (Å)
                                                                                                                     10
                                                                                                                                      Target-Ligase Contacts
                            10.0                                                                                      8
                                                                 Triple DM Runs                                                       Triple
                              7.5                                Average                                              6
                              5.0                                Vanilla MD
                                                                                                                      4
                              2.5                                                                                     2
                              0.0
                                        0.0     0.5     1.0      1.5      2.0                                             0.0     0.5     1.0       1.5     2.0
                                           Aggregate Simulation Time (μs)                                                    Aggregate Simulation Time (μs)
                                                      w-RMSD (Å)
                  c                              20    40   60     80
                                                                                  d
                                   30                                                                   105
                                                                                                10−4
                      I-RMSD (Å)                                                  kon (1/M*s)
                                   20                                                           10−13
                                                                                                10−22                                             PROTAC 1
                                   10
                                                                                                            −31                                   ACBI1
                                                                                                10
                                                                                                                                                  SPR Rate
                                    0
                                        0.0     0.5    1.0     1.5      2.0                                               0.0    0.5     1.0      1.5     2.0
                                               Free Energy (kcal/mol)                                                       Aggregate Simulation Time (μs)
Figure 5.9: Comparing the w-RMSD, number of target-ligase contacts, and triple distance
metrics (Linear combination of w-RMSD, target-ligase contacts and number of
target-PROTAC contacts). (a) The minimum I-RMSD over time during the simulation for
the triple distance metric. Each green line indicates one replica and the black line is the
average between all runs. The blue line is a straightforward MD simulation run on
Folding@home. (b) The minimum I-RMSD for each distance metric. (c) A scatter plot of
the free energy vs the I-RMSD after clustering the 6HAX simulations. The circles are
colored by w-RMSD. (d) The predicted binding rates for PROTAC 1 system (purple) and
the ACBI1 system (green). The black line is the experimental on-rate determined via SPR.
ensemble had 90% of structures below 3 Å I-RMSD for REVO+HDX simulations. However,
only 51% of structures in the REVO bound ensemble were below 3 Å. It is worth noting that
both the target-ligase contacts and triple distance metric both use HDX data to help guide
the simulations. However, the target-ligase contacts metric did not produce low w-RMSD
structures, the lowest being just below 6 Å and thus did not contribute the the bound ensem-
ble via this definition. Using this definition for the bound ensemble, we find that adding the
HDX data during REVO, we have a higher likelihood of finding structures at lower I-RMSD
compared to when we run simulations without HDX.
                                                                                        134


    a                                               b
    c                                               d
               PDB: 6HAX
               REVO + HDX
                                                        I-RMSD = 1.1 Å
Figure 5.10: Illustration of the representative prediction produced by REVO simulation
and its comparison to the co-crystallized structure (PDB ID: 6HAX) (a) predicted ternary
structure with I-RMSD=1.1 Å; (b) detail of the binding interface; (c) contact maps for the
interfaces of co-crystallized and predicted structures. The circle size reflects the number of
atoms (including hydrogens) participating in interactions; (d) structurally aligned
prediction (green) and co-crystallized structure (pink) with a detailed PROTAC 2
comparison shown.
                                              135


    All the above analysis was done using the PROTAC 2 system. We also performed three
1.96 µs simulations for the PROTAC 1 and ACBI1 systems using the triple distance metric,
totaling 5.88 µs for each system. All the simulations for these two systems were able to
produce structures of quality I-RMSD, the lowest being 0.69 Å for PROTAC 1 and 0.47 Å
for ACBI1.
    We next predict the on-rates on the three different PROTACs (Table 5.6) using the flux
into the bound state as defined when the state reaches an I-RMSD below 2 Å. For PROTAC
1 and ACBI1, our predicted rates are on the same order of magnitude as experiment (Fig-
ure 5.9d). For PROTAC 2, we were unable to experimentally determine the on-rate so we
simply report the rate predicted by simulation. However, for all three rates there were large
errors. This is due to the weighted ensemble algorithm being slow to converge. To obtain
better statistics, more simulation time is needed.
Table 5.6: Comparison of kon rates between simulation and experiment for the ACBI1
PROTAC 1, and PROTAC 2 systems. The experimental rate for PROTAC 2 has not been
determined yet.
 PROTAC          Predicted Rate (M −1 s−1 )   Experimental Rate (M −1 s−1 )
 ACBI1           3 ∗ 105 ± 2 ∗ 105            2.4 ∗ 105
 PROTAC 1        10 ∗ 105 ± 8 ∗ 105           2.9 ∗ 105
 PROTAC 2        2.2 ∗ 102 ± 1.7 ∗ 102        N/A
5.3.4    HDX improves prediction of ternary complex using docking
Molecular docking is a very popular method for high-throughput predictions of binding poses,
that follows a protocol of sampling, searching, and scoring these predictions. Considering
the computational cost of the REVO+HDX method described above, docking is a viable
alternative to the simulation approach in obtaining different conformations of the flexible
degrader ternary complexes in a less resource-intensive and more timely fashion.
    To demonstrate the usefulness of HDX data for more accurate structural predictions,
we show that incorporating experimentally retrieved distance restraints into the docking
protocol significantly improve its sensitivity (see Figure 5.11). Importantly, unlike Zhang et
                                               136


al. [251], who derived restraints from chemical cross-linking experiments, or Eron [252], who
revised the post-sampling scoring, our approach imposes distance restraints based on the
statistics of the linker length in a degrader molecule – at the sampling stage (see Methods).
    Figure 5.11a shows the improvement in docking predictions when augmented by HDX
data as the distribution of I-RMSDs (with respect to the crystal structure) for the top-100
predictions is shifted toward smaller values upon incorporating the experimentally derived
restraints (green compared to orange profile). In particular, when focusing on subsets of
highly accurate structure predictions, i.e., I-RMSD < 2 Å, 2.5 Å, or 3 Å (see Figure 5.11b),
for which, as described above, the improvement of REVO upon adding information from
HDX was measurable, the performance of the docking protocol is significantly improved.
Although REVO+HDX consistently outperforms the HDX-enhanced docking routine, it is
striking how strongly the incorporation of HDX data can boost the accuracy of this docking
protocol. Therefore, considering the significantly less computational cost of this approach
(75 CPU hours for 3 independent replicas) compared to the REVO method (300 A40 GPU
hours), docking, in combination with HDX, is a useful tool for the quick filtering of a large
number of degrader designs.
5.3.5    Conformational sampling of ternary complexes
Our HDX data suggest that the protein complexes studied here are dynamic and sample sev-
eral distinct bound conformations. We use HREMD simulations to identify these structures
and quantify the free energy landscape of these complexes. First, we perform PCA of the in-
terface distances observed in our HREMD simulations in order to reduce the dimensionality
of the simulation data. The probability distribution of the highest-variance features allow us
to measure a more easily interpretable free energy landscape from our simulation data than
would be possible otherwise. We find that the landscape of each protein complex contained
several local minima differing by only a few kcal/mol.
    Using k-means clustering in the PCA feature space, we then identify distinct clusters
                                               137


                                           REVO + HDX        Docking + HDX
                                           REVO              Docking
                    a
                                     0.8
                                     0.6
               Density               0.4
                                     0.2
                                     0.0
                                           2    4   6   8 10     12   14
                                                    I-RMSD (Å)
                   b
                                     100
               % of Bound Ensemble
                                      80
                                      60
                                      40
                                      20
                                       0
                                               <2      < 2.5      <3
                                                    I-RMSD (Å)
Figure 5.11: Comparing the bound ensembles determined by docking and REVO
simulations with and without information from HDX for the PDB ID 6HAX ternary
complex. The REVO bound ensemble is defined as structures below a warhead RMSD of
2 Å and more than 30 contacts between the target and ligase interface. The docking bound
basin is defined as the 100 top structures as determined by Rosetta-scoring. (a) Probability
density function distributions of I-RMSD values for the bound ensembles. (b) The percent
of structures in the predicted bound ensembles below specific I-RMSD thresholds (2 Å,
2.5 Å, and 3 Å).
                                                     138


Figure 5.12: Most populated structures of SMARCA2 bound to VHL with different
degrader molecules, identified by dimension reduction and clustering of HREMD simulation
data. (a-d) Colors of VHL and SMARCA2 represent HDX protection in the presence of
the degrader molecules relative to the situation in the absence of the degrader. The second
ranked structures of c PROTAC 2 and d isoform 1 SMARCA2 are displayed that support
HDX data, whereas the top three structures are included in Figure 5.13. Elongin B and
Elongin C are also included in panel d. e The top structures of ternary complexes are
compared after aligning VHL to illustrate conformational differences among top structures
of ternary complexes.
                                             139


Figure 5.13: Cluster centroids from the three highest populated structures of
SMARCA2-iso2 bound to VHL via (a) ACBI1, (b) PROTAC 1, and (c) PROTAC 2, along
with their populations. Less populated structures are omitted.
Figure 5.14: Free energy landscapes determined from PCA projections of SMARCA2-iso2
bound to VHL via (a) ACBI1, (b) PROTAC 1, and (c) PROTAC 2. Red points indicate
k-means centroids.
                                            140


of conformations. Cluster centers roughly correspond to local minima in the free energy
landscape, see Figure 5.14. The clusters identified by k-means are consistent with our HDX
protection data. Figure 5.12 shows that interface residues that were found to be protected
in HDX experiments are observed to interact in either the most populated or second most
populated cluster identified by k-means. Notably, this analysis shows that in the second
most populated structure of Iso1-ACBI1-VCB, the helix formed by the 17 residue extension
of isoform 1-SMARCA2 interacts with a beta sheet of VHL, Figure 5.12d, in accordance
with HDX experiments that found this beta sheet to be protected in presence of Iso1, but
not in the presence of Iso2. Similarly, highly populated structures of Iso2-ACBI1-VHL and
Iso2-PROTAC2-VHL show contact between residues that were observed to be protected in
HDX experiments with these PROTACs, but not with PROTAC 1, while the most populated
structure of PROTAC 1 does not show these contacts.
    We selected 98 representative structures from HREMD data to use as initial configura-
tions for Folding@home (F@H) simulations of Iso2-PROTAC2-VHL. Each initial condition
was cloned 100 times and run for ∼ 650 ns, for a total of ∼ 6 ms of simulation data. These
independent MD trajectories provide the basis for fitting a MSM[253], which provides a full
thermodynamic and kinetic description of the system and allows for the prediction of ex-
perimental observables of interest [220]. First, we used time-lagged independent component
analysis (tICA) [254] to determine a projection of the underlying MD data. The distance
between points in the TICA feature space corresponds roughly to kinetic distance.
    The MSM predicts a stationary probability distribution on TICA space that is in general
different from the empirical distribution of our simulation data. Interestingly, the MSM
predicts that the the crystal structure is 1.5 kcal/mol higher in free energy than the global
free energy minimum, while the bound structures obtained from our REVO simulations are
∼ 1.5 − 3.5 kcal/mol above the global minimum, Figure 5.15a-c. The model also predicts a
metastable state with free energy 2.2 kcal/mol (Figure 5.15e).
    This model is coarse-grained to obtain a five-state MSM, of which the following three
                                              141


Figure 5.15: a Conformational free energy landscape as a function of the first two TICA
features of SMARCA2-PROTAC2-VHL ternary complex inferred from a MSM. The
ensemble of bound states from REVO simulations is shown as blue points; the crystal
structure (PDB ID 6HAX) is shown as a red X. In this projection, states II and V are close
to state I. b Network diagram of the coarse-grained MSM calculated using a lag time of 50
ns, with the stationary probabilities associated with each state indicated. c mean first
passage time (MFPT) from one state in the MSM to another. Numbers indicate predicted
MFPTs in µs. d-e Comparison of the crystal structure (gray) with the lowest free energy
state (cyan) and a metastable state (orange) predicted by the MSM. Arrows indicate a
change of orientation relative to d.
                                             142


Figure 5.16: Contact maps from the (a) co-crystallized structure 6HAX ; (b) global
minimum state and (c) metastable state identified by our MSM.
                                          143


states are of particular interest: the global minimum state (or state I) with a stationary
probability of 0.63, the metastable state III with 0.10 probability, and state IV, to which
the experimental crystal structure can be assigned and which has a stationary probability
of 0.05. The global minimum state differs from the crystal structure 6HAX by an I-RMSD
of 3.6 Å, while the metastable state has an I-RMSD of 4.8 Å relative to the crystal struc-
ture. The global minimum state is stabilized by a large number of protein-protein contacts
(Figure 5.16). Contacts between VHL and PROTAC 2 are largely unchanged between the
metastable and global minimum states, likely due to the tight interaction between VHL and
the PROTAC. On the other hand, the metastable state lacks contacts between PROTAC
2 and ARG29, ASN90, and ILE96 of SMARCA2. The area of the binding interface was
substantially increased in both the metastable and global minimum states relative to the
                                                                                       2
crystal structure: the global minimum state had a buried surface area of 2962 Å , compared
           2                                       2
to 2800 Å for the metastable state and 2369 Å for the crystal structure.
5.4     Discussion
Ternary complex formation is a critical step in the process of targeted protein degradation.
However, studying the dynamic nature of ternary degrader complexes has posed challenges to
the field due to the size of the system, degrees of flexibility, timescales for biological motions,
and limited solution-phase experimental data. Here, we studied the structure and dynamics
of three different degrader molecules of SMARCA2 that have similar thermodynamic binding
profiles and crystal structure conformations but different degradation efficiencies. We solved
the crystal structure of the ternary complex bridged by ACBI1 (PDB ID: 7S4E), which re-
vealed a potential structural and dynamic aspect of the degradation profiles, which led to
a series of solution-phase experimental studies and MD simulations in an effort to explain
and predict the degradation profiles. HDX coupled with extensive MD simulations enabled
the generation of atomic resolution representations of the dynamic ensembles. We high-
lighted the conformational landscape of SMARCA2:PROTAC 2:VHL, for which we carried
                                               144


out 6 ms of accumulated simulation time using the Folding@home distributed supercom-
puter cluster. We propose an enhanced-seeding method that includes HREMD simulations
of the ternary complex, extraction of a modest number of seed structures (100 in the work
presented here), and independent simulations starting from each seed using Folding@home.
We applied TICA for dimensionality reduction and built a MSM to assess the conforma-
tional free energy landscape and transitions between low-energy states. We found that the
experimentally determined crystal structure is close to, but not coinciding with, the global
minimum in the free energy landscape, although it is within 2 kcal/mol from the global
minimum. By coarse-graining the MSM, we were able to identify five low-energy structures
that contain different protein-protein interfaces between SMARCA2 and VHL. A less com-
putationally costly solution was also explored, where we found that only 0.5 µs of lowest
rank/unbiased replica of HREMD (aggregate 12 µs with 24 replicas) gave us a qualitatively
similar conformational landscape with the same global minimum, albeit a lower resolution
free energy surface. Thus, we proceeded to run HREMD on other systems of interest (PRO-
TAC 1, ACBI1, and isoform 1 of SMARCA2; see Table 5.2 for number of replicas used and
aggregate length for each system). Simulation analyses of the most probable structures for
each of these ternary complexes show that ACBI1 and PROTAC 1 have different orientations
of SMARCA2 relative to VHL, with PROTAC 2 sampling both orientations. We propose
that this sampling procedure can be replicated for other target and E3 ligase complexes,
since no target-specific information was used for the simulations in this work other than the
starting x-ray structure coordinates.
    We then explored the prediction of the ternary complex itself, using the binary structures
of the known binding mode of the ligase ligand binding to VHL and warhead binding to
SMARCA2, which is typically known in advance of designing heterobifunctional degrader
molecules. We developed a protocol using REVO that is able to produce a 3-dimensional
structural ensemble of high accuracy structures using the RMSD to the bound pose of the
warhead as CV: we find structures less than 2 Å I-RMSD within 2-3 kcal/mol from the most
                                              145


probable conformation of the ternary complex. The addition of experimental solution-phase
HDX protection factor data, which identifies residues that are most likely to be precluded
from solvent interactions in the context of the ternary structure, further increased the quality
of the predictions. Similarly, we discovered that HDX data can improve docking results,
although, overall docking has lower quality than the REVO simulations, likely due to the
minimal sampling of internal protein degrees of freedom and lack of explicit solvent. We made
publicly available all relevant simulation data and the source code for running REVO+HDX.
    To further validate the REVO+HDX procedure, we performed a prospective prediction of
the ternary complex of isoform 1 SMARCA2, with a 17 amino acid extension, the structure
of which was previously unknown. Our ternary complex predictions suggest ways that the
SMARCA2 extension is interacting with a beta-strand from VHL, explaining the observed
HDX protection pattern. The REVO+HDX method provides an opportunity to visualize,
at an atomistic level, the molecular interactions that guide ternary structure formation for
complexes with previously known crystal structures. The ultimate goal is to uncover the
solution-phase structural ensembles adopted by the target ligase pair, which appear to ex-
tend beyond what is observed using conventional structure determination methods. This
information may provide a better representation of the factors that influence ternary com-
plex formation, ultimately leading to downstream ubiquitination. Moreover, knowledge of
critical atomic interactions provides a basis for alternative strategies to improve degrader
designs, such as modifying linkers to induce specific conformational ensembles of the ternary
complex that are associated with higher degradation. REVO+HDX is also able to recapit-
ulate experimental kon for ternary and binary complex formation.
                                              146


                                         CHAPTER 6
                        SUMMARY OUTLOOK AND IMPACT
In section 1.3 we outline the goals of this thesis. Now we go back to these overall goals and
discuss the progress, what improvements can be made and the impact on the field in general.
    The first goal of the thesis is to model and characterize the binding and unbinding events
for systems of interest, which was successful for all systems studied in this thesis. It is
worth noting however that the REVO algorithm was modified for each system. This is not
surprising as the complexity of the simulation goal increases the parameters and algorithm
might need to be optimized to be able to successfully simulate the desired phenomena.
    In Chapters 2 and 4 we characterized these pathways with detailed Markov State Models
(MSM)s constructed from high dimensional data. For the TSPO system, we are able to
model multiple pathways between the bound and unbound states. We also were able to
identify key residues the ligand interacted with along these unbinding pathways.
    While we were able to simulate (un)binding pathways, it is unlikely that we were able to
sample all likely pathways. To obtain a more complete picture of the (un)binding landscape
for a given force field we can:
    • Run longer simulations.
    • Run more replicate simulations.
    • Optimize the REVO parameters.
    • Use distance metrics that better represent the system.
    • Run simulations with more walkers.
    • Develop or utilize more efficient algorithms.
                                               147


    Knowledge of the (un)binding pathway can help guide ligand design with more opti-
mal kinetic properties. In particular, the results from Chapter 4 suggest a new pathway
for PK-11195 unbinding through the membrane, where only dissociation into the solvent
was previously published. The pathway determined in this thesis indicates that altering
the lipophobicity of PK-11195 based compounds would affect the unbinding pathway and
could significantly alter the residence time (RT). Additionally, an Roivant Discovery is using
the results presented in Chapter 5 to design new proteolysis-targeting chimera (PROTAC)
molecules.
    The second goal was to accurately predict observables, such as rate constants, RT and
∆Gbind that can be compared to experiments. It is difficult to determine if the rates were
accurate for the systems present in the SAMPL6 challenge as, to our knowledge, these
quantities have not been published. However, we do know the ∆Gbind . Using the conventional
definition for ∆Gbind we were off between 2.80 (OA-G4) and 5.10 (CB8-G3) kcal/mol from
experiment. After applying the correction factors and running additional simulations on the
OG-G6 system we were able to reduce the error to a maximum of 1.13 kcal/mol. Simulating
the binding and unbinding is not the most computationally efficient method to predict free
energy, but our simulations provide additional insight to the (un)binding mechanisms. The
rates calculated for TSPO are hard to compare against experiment because the only published
TSPO-PK-11195 unbinding rates are for the human protein, and we used a bacterial strain
from Rhodobacter sphaeroides in our simulations. However, multiple starting poses showed
unbinding rates within an order of magnitude over different rate calculation methods.
    While we were able to predict kinetic rates, there are several sources of error. The first
is that the use of the WE algorithm leads to high variation between runs as this method
requires long simulations to be able to produce adequate sampling to confidently predict
rates. Additional there are errors associated with the force fields, which dictates the atomic
motion in these simulations which in turn affects the path dependent rate prediction. Finally
there is a potential inconsistency between the boundary conditions used in our work with
                                              148


those used in experiment to decide if a ligand is (un)bound. For example in radioligand
displacement assays, any ligand pose that is not sterically blocking entry of the radiolabelled
competitor ligand would be considered “unbound” [191]. This does not indicate that the
ligand has stopped interacting with the POI, merely that it is not blocking the radioligand
from binding. Therefore, caution should be used when comparing the kinetic rates between
different methods.
    The final goal was to be able to use experimental data to help guide the REVO simulations
to simulate a rare event, in particular binding of protein-PROTAC dimer to the target
protein. When performing binding simulations, it is not always feasible to experimentally
determine the bound structure as a reference beforehand, especially if there is a large set of
ligand candidates. Furthermore, a particular crystal structure is not necessarily indicative of
the conformations a given complex will take in solution. We need other forms of experimental
data to help determine if the simulation has successfully discovered the bound basin. In
Chapter 5, we were able to use protected residues identified by hydrogen deuterium exchange
(HDX) experiments to predict which residues would be in contact in the ternary complex,
and developed distance metrics to guide the simulations to maximize those contacts and
compared this metric to a ligand root mean square deviation (RMSD) based distance metric.
Both distance metrics were able to simulate the ternary complex formation. However, the
simulations guided by HDX data had a higher probability of predicting bound structures
with lower I-RMSDs after filtering using features from docking and HDX experiments.
    Implementing experimental results into simulations is not just limited to distance metrics
constructed in this thesis. Experimental data could also be helpful in designing a binning
scheme in WE simulations. In the case of HDX data, the number of contacts between a pro-
tein and ligand could be used in ligand (un)binding simulations. Additionally, experimental
data can be used to determine the CV being simulated along in potential energy biasing
simulations such as metadynamics.
    This thesis set out to use MD simulations to help model potential (un)binding pathways of
                                               149


biologically relevant systems and test the validity of these pathways by predicting quantities
such as kinetics that can be compared to experiment, which we did successfully. We first
began with a test system to verify the REVO algorithm, and we were able to determine these
pathways and use the kinetics to accurately calculate the binding free energy. We then moved
to simulate the unbinding of PK-11195 from TSPO, an event that takes about 30 minutes
experimentally and were able to model it dissociating into the membrane and predicted RTs
that were on the same timescale as experiment. We finally simulated the formation of a
ternary complex between a VHL-PROTAC dimer and SMARCA2 and evaluated how well
different distance metrics were able to reach the bound state.
                                             150


BIBLIOGRAPHY
      151


                                    BIBLIOGRAPHY
[1]  Robert A. Copeland, David L. Pompliano, and Thomas D. Meek. Drug–target resi-
     dence time and its implications for lead optimization. Nature Reviews Drug Discovery,
     5:730–739, 2006.
[2]  Robert .A Copeland. The drug–target residence time model: a 10-year retrospective.
     Nature Reviews Drug Discovery, 15:87–95, 2015.
[3]  Visvaldas Kairys, Lina Baranauskiene, Migle Kazlauskiene, Daumantas Matulis, and
     Egidijus Kazlauskas. Binding affinity in drug design: experimental and computational
     techniques. Expert Opinion on Drug Discovery, 14(8):755–768, 2019.
[4]  T. Tanaji Talele, A. Santosh Khedkar, and C. Alan Rigby. Successful applications
     of computer aided drug discovery: Moving drugs from concept to the clinic. Current
     Topics in Medicinal Chemistry, 10:127–141, 2010.
[5]  Vijay Kumar Bhardwaj and Rituraj Purohit. Targeting the protein-protein interface
     pocket of aurora-a-tpx2 complex: rational drug design and validation. Journal of
     Biomolecular Structure and Dynamics, 39(11):3882–3891, 2021.
[6]  Ian M Hastings, William M Watkins, and Nicholas J White. The evolution of drug-
     resistant malaria: the role of drug elimination half-life. Philosophical Transactions B,
     357:505–519, 2002.
[7]  K. Sandy Pang. A review of metabolite kinetics. Journal of Pharmacokinetics and
     Biopharmaceutics, 13:633–662, 1985.
[8]  Georges Vauquelin and Steven J. Charlton. Long-lasting target binding and rebind-
     ing as mechanisms to prolong in vivo drug action. British Journal of Pharmacology,
     161:488–503, 2010.
[9]  L. B Sheiner, D. R. Stanski, S. Vozeh, D. Miller, and J. Ham. Simultaneous modeling
     of pharmacokinetics and pharmacodynamics: application to d-tubocurarine. Clinical
     Pharmacology and Therapeutics, 25:358–371, 1979.
[10] Nicholas H.G. Holford and Lewis B. Sheiner. Kinetics of pharmacologic response.
     Pharmacology & Therapeutics, 16(2):143–166, 1982.
[11] Angela Äbelö, Magdalena Andersson, Ann Aurell Holmberg, and Mats O. Karlsson.
     Application of a combined effect compartment and binding model for gastric acid inhi-
     bition of ar-ho47108: A potassium competitive acid blocker, and its active metabolite
     ar-ho47116 in the dog. European Journal of Pharmaceutical Sciences, 29(2):91–101,
                                             152


     2006.
[12] Ashraf Yassen, Erik Olofsen, Albert Dahan, and Meindert Danhof. Pharmacokinetic-
     pharmacodynamic modeling of the antinociceptive effect of buprenorphine and fentanyl
     in rats: Role of receptor equilibration kinetics. Journal of Pharmacology and Experi-
     mental Therapeutics, 313(3):1136–1149, 2005.
[13] H.-Y. Yun, M.-H. Yun, W. Kang, and K.-I. Kwon. Pharmacokinetics and pharma-
     codynamics of benidipine using a slow receptor-binding model. Journal of Clinical
     Pharmacy and Therapeutics, 30(6):541–547, 2005.
[14] Peter J. Tongue. Drug–target kinetics in drug discovery. ACS Chemical Neuroscience,
     9(1):29–39, 2018.
[15] Yu Zhou, Yan Fu, Wanchao Yin, Jian Li, Wei Wang, Fang Bai, Shengtao Xu, Qi Gong,
     Tao Peng, Yu Hong, Dong Zhang, Dan Zhang, Qiufeng Liu, Yechun Xu, H. Eric Xu,
     Haiyan Zhang, Hualiang Jiang, and Hong Liu. Kinetics-driven drug design strategy
     for next-generation acetylcholinesterase inhibitors to clinical candidate. Journal of
     Medicinal Chemistry, 64(4):1844–1855, 2021.
[16] Samuel D Lotz and Alex Dickson. Unbiased molecular dynamics of 11 min timescale
     drug unbinding reveals transition state stabilizing interactions. Journal of the Ameri-
     can Chemical Society, 140(2):618–628, 2018.
[17] Kruti B. Patel, Olga Kononova, Shuowei Cai, Valeri Barsegov, Virinder S. Parmar,
     Raj Kumar, and Bal Ram Singh. Botulinum neurotoxin inhibitor binding dynamics
     and kinetics relevant for drug design. Biochimica et Biophysica Acta (BBA) - General
     Subjects, 1865(9):129933, 2021.
[18] Barbra Costa, Elenora Da Pozzo, Chiara Giacomelli, Elisabetta Barresi, Sabrina Tal-
     iani, Federico Da Settimo, and Claudia Martini. Tspo ligand residence time: a new
     parameter to predict compound neurosteroidogenic efficacy. Scientific Reports, 6, 2016.
[19] Dong Guo, Thea Mulder-Kriger, Adriaan P. IJzerman, and Laura H Heitman. Func-
     tional efficacy of adenosine a2 a receptor agonists is positively correlated to their recep-
     tor residence time. British Journal of Pharmacology, 166:1846–1859, 2012.
[20] Hao Lu and Peter J Tonge. Drug-target residence time: critical information for lead
     optimization. Current Opinion in Chemical Biology, 14(4):467–474, aug 2010.
[21] Sai Kiran Sharma, Serge K. Lyashchenko, Hijin A. Park, Nagavarakishore Pillarsetty,
     Yorann Roux, Jiong Wu, Sophie Poty, Kathryn M. Tully, John T. Poirier, and Jason S.
     Lewis. A rapid bead-based radioligand binding assay for the determination of target-
     binding fraction and quality control of radiopharmaceuticals. Nuclear Medicine and
     Biology, 71:32–38, 2019.
                                              153


[22] Anni Allikalt and Ago Rinken. Budded baculovirus particles as a source of membrane
     proteins for radioligand binding assay: The case of dopamine d1 receptor. Journal of
     Pharmacological and Toxicological Methods, 86:81–86, 2017.
[23] Michael J. Roy, Sandra Winkler, Scott J. Hughes, Claire Whitworth, Michael Galant,
     Will̃iam Farnaby, Klaus Rumpel, and Alessio Ciulli. SPR-Measured Dissociation Ki-
     netics of PROTAC Ternary Complexes Influence Target Degradation Rate. ACS Chem-
     ical Biology, 14(3):361–368, 2019.
[24] Bojun Xiong, Guilin Jin, Ying Xu, Wenbing You, Yufei Luo, Menghan Fang, Bing
     Chen, Huihui Huang, Jian Yang, Xu Lin, and Changxi Yu. Identification of koumine
     as a translocator protein 18 kda positive allosteric modulator for the treatment of
     inflammatory and neuropathic pain. Frontiers in Pharmacology, 12:1536, 2021.
[25] David A. Sykes, Steven J. Charlton, and Tahsin F. Kellici. Single Step Determination
     of Unlabeled Compound Kinetics Using a Competition Association Binding Method
     Employing Time-Resolved FRET, pages 177–194. Springer New York, New York, NY,
     2018.
[26] David A. Sykes, Leire Borrega-Roman, Clare R. Harwood, Bradley Hoare, Jack M.
     Lochray, Thais Gazzi, Stephen J. Briddon, Marc Nazaré, Uwe Grether, Stephen J.
     Hill, Steven J. Charlton, and Dmitry B. Veprintsev. Kinetic Profiling of Ligands
     and Fragments Binding to GPCRs by TR-FRET, pages 1–32. Springer International
     Publishing, Cham, 2021.
[27] Kunal Khanna, Shankar Mandal, Aaron T. Blanchard, Muneesh Tewari, Alexander
     Johnson-Buck, and Nils G. Walter. Rapid kinetic fingerprinting of single nucleic
     acid molecules by a fret-based dynamic nanosensor. Biosensors and Bioelectronics,
     190:113433, 2021.
[28] Jacob D. Durrant and J. Andrew McCammon. Molecular dynamics simulations and
     drug discovery. BMC Biology, 9:1–9, 2011.
[29] B.J. Alder and T.E Wainwright. Phase transition for a hard sphere system. Journal
     of Chemical Physics, 27:1208–1209, 1957.
[30] J. B. Gibson, A. N. Goland, M. Milgram, and G. H. Vineyard. Dynamics of radiation
     damage. Physical Review, 120:1229–1253, 1960.
[31] Alper T. Celebi, Seyed Hossein Jamali, André Bardow, Thijs J. H. Vlugt, and Oth-
     onas A. Moultos. Finite-size effects of diffusion coefficients computed from molecular
     dynamics: a review of what we have learned so far. Molecular Simulation, 47(10-
     11):831–845, 2021.
[32] Richard M. Venable, Andreas Krämer, and Richard W. Pastor. Molecular dynamics
                                            154


     simulations of membrane permeability. Chemical Reviews, 119(9):5954–5997, 2019.
[33] Arman Fathizadeh, Helmut Schiessel, and Mohammad Reza Ejtehadi. Molecular dy-
     namics simulation of supercoiled dna rings. Macromolecules, 48(1):164–172, 2015.
[34] Samuel D. Lotz and Alex Dickson. Wepy: A flexible software framework for simulating
     rare events with weighted ensemble resampling. ACS Omega, 5(49):31608–31623, 2020.
[35] Nazanin Donyapour, Nicole M. Roussey, and Alex Dickson. REVO: Resampling of
     ensembles by variation optimization. Journal of Chemical Physics, 150(24), 2019.
[36] James A. Maier, Carmenza Martinez, Koushik Kasavajhala, Lauren Wickstrom,
     Kevin E. Hauser, and Carlos Simmerling. ff14sb: Improving the accuracy of pro-
     tein side chain and backbone parameters from ff99sb. Journal of Chemical Theory and
     Computation, 11(8):3696–3713, 2015.
[37] Chuan Tian, Koushik Kasavajhala, Kellon A. A. Belfon, Lauren Raguette, He Huang,
     Angela N. Migues, John Bickel, Yuzhang Wang, Jorge Pincay, Qin Wu, and Carlos
     Simmerling. ff19sb: Amino-acid-specific protein backbone parameters trained against
     quantum mechanics energy surfaces in solution. Journal of Chemical Theory and Com-
     putation, 16(1):528–552, 2020.
[38] Zhe Huai, Zhaoxi Shen, and Zhaoxi Sun. Binding thermodynamics and interaction
     patterns of inhibitor-major urinary protein-i binding from extensive free-energy cal-
     culations: Benchmarking amber force fields. Journal of Chemical Information and
     Modeling, 61(1):284–297, 2021.
[39] K. Vanommeslaeghe and A. D. MacKerell. Automation of the charmm general force
     field (cgenff) i: Bond perception and atom typing. Journal of Chemical Information
     and Modeling, 52(12):3144–3154, 2012.
[40] K. Vanommeslaeghe, E. Prabhu Raman, and A. D. MacKerell. Automation of the
     charmm general force field (cgenff) ii: Assignment of bonded parameters and partial
     atomic charges. Journal of Chemical Information and Modeling, 52(12):3155–3168,
     2012.
[41] Chris Oostenbrink, Alessandra Villa, Alan E. Mark, and Wilfred F. Van Gunsteren.
     A biomolecular force field based on the free enthalpy of hydration and solvation: The
     gromos force-field parameter sets 53a5 and 53a6. Journal of Computational Chemistry,
     25:1656–1676, 2004.
[42] Chia-en A. Chang, Yu-ming M. Huang, Leonard J. Mueller, and Wanli You. Investi-
     gation of structural dynamics of enzymes and protonation states of substrates using
     computational tools. Catalysts, 6(6), 2016.
                                            155


[43] T. Schneider and E. Stoll. Molecular-dynamics study of a three-dimensional one-
     component model for distortive phase transitions. Phys. Rev. B, 17:1302–1322, 1978.
[44] Bin W. Zhang, David Jasnow, and Daniel M. Zuckerman. The “weighted ensemble”
     path sampling method is statistically exact for a broad class of stochastic processes
     and binning procedures. The Journal of Chemical Physics, 132:054107, 2010.
[45] José Ruiz-Franco, Lorenzo Rovigatti, and Emanuela Zaccarelli. On the effect of the
     thermostat in non-equilibrium molecular dynamics simulations. The European Physical
     Journal E, 41(80):1302–1322, 2018.
[46] Michael Gecht, Marc Siggel, Max Linke, Gerhard Hummer, and Jürgen Köfinger. Md-
     benchmark: A toolkit to optimize the performance of molecular dynamics simulations.
     The Journal of Chemical Physics, 153:144105, 2020.
[47] Ada Sedova, John D. Eblen, Reuben Budiardja, Arnold Tharrington, and Jeremy C.
     Smith. High-performance molecular dynamics simulation for biological and materials
     sciences: Challenges of performance portability. In 2018 IEEE/ACM International
     Workshop on Performance, Portability and Productivity in HPC (P3HPC), pages 1–
     13, 2018.
[48] David E. Shaw, Martin M. Deneroff, Ron O. Dror, Jeffrey S. Kuskin, Richard H.
     Larson, John K. Salmon, Cliff Young, Brannon Batson, Kevin J. Bowers, Jack C. Chao,
     Michael P. Eastwood, Joseph Gagliardo, J. P. Grossman, C. Richard Ho, Douglas J.
     Ierardi, István Kolossváry, John L. Klepeis, Timothy Layman, Christine McLeavey,
     Mark A. Moraes, Rolf Mueller, Edward C. Priest, Yibing Shan, Jochen Spengler,
     Michael Theobald, Brian Towles, and Stanley C. Wang. Anton, a special-purpose
     machine for molecular dynamics simulation. Communications of the ACM, 51(7):91–97,
     2008.
[49] David E. Shaw, Ron O. Dror, John K. Salmon, J. P. Grossman, Kenneth M. Mackenzie,
     Joseph A. Bank, Cliff Young, Martin M. Deneroff, Brannon Batson, Kevin J. Bowers,
     Edmond Chow, Michael P. Eastwood, Douglas J. Ierardi, John L. Klepeis, Jeffrey S.
     Kuskin, Richard H. Larson, Kresten Lindorff-Larsen, Paul Maragakis, Mark A. Moraes,
     Stefano Piana, Yibing Shan, and Brian Towles. Millisecond-scale molecular dynamics
     simulations on anton. In Proceedings of the Conference on High Performance Comput-
     ing Networking, Storage and Analysis, SC ’09, New York, NY, USA, 2009. Association
     for Computing Machinery.
[50] David E. Shaw, J.P. Grossman, Joseph A. Bank, Brannon Batson, J. Adam Butts,
     Jack C. Chao, Martin M. Deneroff, Ron O. Dror, Amos Even, Christopher H. Fenton,
     Anthony Forte, Joseph Gagliardo, Gennette Gill, Brian Greskamp, C. Richard Ho,
     Douglas J. Ierardi, Lev Iserovich, Jeffrey S. Kuskin, Richard H. Larson, Timothy Lay-
     man, Li-Siang Lee, Adam K. Lerer, Chester Li, Daniel Killebrew, Kenneth M. Macken-
     zie, Shark Yeuk-Hai Mok, Mark A. Moraes, Rolf Mueller, Lawrence J. Nociolo, Jon L.
                                             156


     Peticolas, Terry Quan, Daniel Ramot, John K. Salmon, Daniele P. Scarpazza, U. Ben
     Schafer, Naseer Siddique, Christopher W. Snyder, Jochen Spengler, Ping Tak Peter
     Tang, Michael Theobald, Horia Toma, Brian Towles, Benjamin Vitale, Stanley C.
     Wang, and Cliff Young. Anton 2: Raising the bar for performance and programmabil-
     ity in a special purpose molecular dynamics supercomputer. In SC ’14: Proceedings of
     the International Conference for High Performance Computing, Networking, Storage
     and Analysis, pages 41–53, 2014.
[51] Vincent A. Voelz, Gregory R. Bowman, Kyle Beauchamp, and Vijay S. Pande. Molec-
     ular simulation of ab initio protein folding for a millisecond folder ntl9. Journal of the
     American Chemical Society, 132(5):1526–1528, 2010. PMID: 20070076.
[52] David J. Earl and Michael W. Deem. Parallel tempering: Theory, applications, and
     new perspectives. Phys. Chem. Chem. Phys., 7:3910–3916, 2005.
[53] Lukas S. Stelzl and Gerhard Hummer. Kinetics from replica exchange molecular dy-
     namics simulations. Journal of Chemical Theory and Computation, 13(8):3927–3935,
     2017.
[54] Mark J. Abraham and Jill E. Gready. Ensuring mixing efficiency of replica-exchange
     molecular dynamics simulations. Journal of Chemical Theory and Computation,
     4(7):1119–1128, 2008.
[55] Angel E. Garcia, Henry Herce, and Dietmar Paschek. Chapter 5 simulations of temper-
     ature and pressure unfolding of peptides and proteins with replica exchange molecular
     dynamics. In David C. Spellmeyer, editor, Annual Reports in Computational Chem-
     istry, volume 2 of Annual Reports in Computational Chemistry, pages 83–95. Elsevier,
     2006.
[56] Rodrigo Casasnovas, Vittorio Limongelli, Pratyush Tiwary, Paolo Carloni, and Michele
     Parrinello. Unbinding kinetics of a p38 map kinase type ii inhibitor from metadynamics
     simulations. Journal of the American Chemical Society, 139(13):4780–4788, 2017.
[57] Riccardo Capelli, Anna Bochicchio, GiovanniMaria Piccini, Rodrigo Casasnovas, Paolo
     Carloni, and Michele Parrinello. Chasing the full free energy landscape of neurorecep-
     tor/ligand unbinding by metadynamics simulations. Journal of Chemical Theory and
     Computation, 15(5):3354–3361, 2019.
[58] Riccardo Capelli, Wenping Lyu, Viacheslav Bolnykh, Simone Meloni, Jógvan Mag-
     nus Haugaard Olsen, Ursula Rothlisberger, Michele Parrinello, and Paolo Carloni.
     Accuracy of molecular simulation-based predictions of koff values: A metadynamics
     study. The Journal of Physical Chemistry Letters, 11(15):6373–6381, 2020.
[59] Rilei Yu, Nargis Tabassum, and Tao Jiang. Investigation of α-conotoxin unbinding
     using umbrella sampling. Bioorganic & Medicinal Chemistry Letters, 26(4):1296–1300,
                                             157


     2016.
[60] Cameron F. Abrams and Eric Vanden-Eijnden. Large-scale conformational sampling
     of proteins using temperature-accelerated molecular dynamics. Proceedings of the Na-
     tional Academy of Sciences, 107(11):4961–4966, 2010.
[61] Gabriel Stoltz and Eric Vanden-Eijnden. Longtime convergence of the temperature-
     accelerated molecular dynamics method. Nonlinearity, 31(8):3748–3769, 2018.
[62] Shankar Kumar, John M. Rosenberg, Djamal Bouzida, Robert H. Swendsen, and Pe-
     ter A. Kollman. The weighted histogram analysis method for free-energy calculations
     on biomolecules. i. the method. Journal of Computational Chemistry, 13(8):1011–1021,
     1992.
[63] Marc Souaille and Benoıt Roux. Extension to the weighted histogram analysis method:
     combining umbrella sampling with free energy calculations. Computer Physics Com-
     munications, 135(1):40–57, 2001.
[64] Giovanni Bussi and Alessandro Laoi. Using metadynamics to explore complex free-
     energy landscapes. Nature Review Physics, 2:200–212, 2020.
[65] Pratyush Tiwary and Michele Parrinello. From metadynamics to dynamics. Phys. Rev.
     Lett., 111:230602, 2013.
[66] Anton K. Faradjian and Ron Elber. Computing time scales from reaction coordinates
     by milestoning. The Journal of Chemical Physics, 120(23):10880–10889, 2004.
[67] Ron Elber. Long-timescale simulation methods. Current Opinion in Structural Biology,
     15(2):151–156, 2005. Theory and simulation/Macromolecular assemblages.
[68] Surl-Hee Ahn, Benjamin R. Jagger, and Rommie E. Amaro. Ranking of ligand binding
     kinetics using a weighted ensemble approach and comparison with a multiscale mile-
     stoning approach. Journal of Chemical Information and Modeling, 60(11):5340–5352,
     2020.
[69] Camilo Velez-Vega, Ernesto E. Borrero, and Fernando A. Escobedo. Kinetics and re-
     action coordinate for the isomerization of alanine dipeptide by a forward flux sampling
     protocol. The Journal of Chemical Physics, 130(22):225101, 2009.
[70] David Richard and Thomas Speck. Crystallization of hard spheres revisited. i. ex-
     tracting kinetics and free energy landscape from forward flux sampling. The Journal
     of Chemical Physics, 148(12):124110, 2018.
[71] Gary A. Huber and Sangtae Kim. Weighted-ensemble brownian dynamics simulations
     for protein association reactions. Biophysical Journal, 70:97–110, 1996.
                                            158


[72] Daniel M. Zuckerman and Lillian T. Chong. Weighted ensemble simulation: Review
     of methodology, applications, and software. ANN REV BIOPHYS, 46:43–57, 2017.
[73] Daniel M. Zuckerman and Lillian T. Chong. Weighted ensemble simulation: Review of
     methodology, applications, and software. Annual Review of Biophysics, 46(1):43–57,
     2017.
[74] Tom Dixon, Samuel D. Lotz, and Alex Dickson. Predicting ligand binding affinity using
     on- and off-rates for the SAMPL6 SAMPLing challenge. Journal of Computer-Aided
     Molecular Design, 32(10):1001–1012, 2018.
[75] Badi’ Abdul-Wahid, Haoyun Feng, Dinesh Rajan, Ronan Costaouec, Eric Darve, Dou-
     glas Thain, and Jesús A. Izaguirre. AWE-WQ: Fast-Forwarding Molecular Dynamics
     Using the Accelerated Weighted Ensemble. Journal of Chemical Information and Mod-
     eling, 54(10):3033–3043, 2014.
[76] Matthew C. Zwier, Joshua L. Adelman, Joseph W. Kaus, Adam J. Pratt, Kim F.
     Wong, Nicholas B. Rego, Ernesto Suarez, Steveb Lettieri, David W. Wang, Michael
     Grabe, Daniel M. Zuckerman, and Lillian T. Chong. Westpa: An interoperable, highly
     scalable software package for weighted ensemble simulation and analysis. Journal of
     Chemical Theory and Computation, 11(2):800–809, 2015.
[77] Alex Dickson and Samuel D. Lotz. Ligand Release Pathways Obtained with WExplore:
     Residence Times and Mechanisms. Journal of Physical Chemistry B, 120(24):5377–
     5385, 2016.
[78] Matthew C. Zwier, Adam J. Pratt, Joshua L. Adelman, Joseph W. Kaus, Daniel M.
     Zuckerman, and Lillian T. Chong. Efficient atomistic simulation of pathways and
     calculation of rate constants for a protein–peptide binding process: Application to the
     mdm2 protein and an intrinsically disordered p53 peptide. The Journal of Physical
     Chemistry Letters, 7(17):3440–3445, 2016.
[79] Bin W. Zhang, David Jasnow, and Daniel M. Zuckerman. Efficient and verified sim-
     ulation of a path ensemble for conformational change in a united-residue model of
     calmodulin. Proceedings of the National Academy of Sciences, 104(46):18043–18048,
     2007.
[80] Hiroshi Fujisaki, Kei Moritsugu, Ayori Mitsutake, and Hiromichi Suetani. Confor-
     mational change of a biomolecule studied by the weighted ensemble method: Use of
     the diffusion map method to extract reaction coordinates. The Journal of Chemical
     Physics, 149(13):134112, 2018.
[81] Hiroshi Fujisaki, Yasuhiro Matsunaga, and Kei Moritsugu. Weighted ensemble
     simulations for conformational changes of proteins. AIP Conference Proceedings,
     2343(1):020016, 2021.
                                             159


[82] Scott H. Northrup, Stuart A. Allison, and J. Andrew McCammon. Brownian dynamics
     simulation of diffusion-influenced bimolecular reactions. Journal of Chemical Physics,
     80:1517, 1984.
[83] Ali S. Saglam and Lillian T. Chong. Highly efficient computation of the basal kon
     using direct simulation of protein–protein association with flexible molecular models.
     The Journal of Physical Chemistry B, 120(1):117–122, 2016.
[84] Alex Dickson and Charles L. Brooks. Wexplore: Hierarchical exploration of high-
     dimensional spaces using the weighted ensemble algorithm. The Journal of Physical
     Chemistry B, 118(13):3532–3542, 2014.
[85] Alex Dickson, Aryeh Warmflash, and Aaron R. Dinner. Separating forward and back-
     ward pathways in nonequilibrium umbrella sampling. Journal of Chemical Physics,
     131:154104, 2009.
[86] Alex Dickson, Mark Maienschein-Cline, Allison Tovo-Dwyer, Jeff R. Hammond, and
     Aaron R. Dinner. Flow-dependent unfolding and refolding of an rna by nonequilibrium
     umbrella sampling. Journal of Chemical Theory and Computation, 7:2710–2720, 2011.
[87] Eric Vanden-Eijnden and Maddalena Venturoli. Exact rate calculations by trajectory
     parallelization and tilting. Journal of Chemical Physics, 131:044120, 2009.
[88] Ernesto Suárez, Steven Lettieri, Matthew C. Zwier, Carsen A. Stringer, Sundar Raman
     Subramanian, Lillian T. Chong, and Daniel M. Zuckerman. Simultaneous computation
     of dynamical and equilibrium information using a weighted ensemble of trajectories.
     Journal of Chemical Theory and Computation, 10(7):2658–2667, 2014.
[89] Ronan Costaouec, Haoyun Feng, Jesús Izaguirre, and Eric Darve. Analysis of the accel-
     erated weighted ensemble methodology. Discrete and Continuous Dynamical Systems,
     pages 171–181, 2013.
[90] Jan-Hendrik Prinz, Hao Wu, Marco Sarich, Bettina Keller, Martin Senne, Martin
     Held, John D. Chodera, Christof Schütte, and Frank Noé. Markov models of molecular
     kinetics: Generation and validation. The Journal of Chemical Physics, 134(17):174105,
     2011.
[91] C. R. Schwantes, R. T. McGibbon, and V. S. Pande. Perspective: Markov mod-
     els for long-timescale biomolecular dynamics. The Journal of Chemical Physics,
     141(9):090901, 2014.
[92] Robert T. McGibbon, Christian R. Schwantes, and Vijay S. Pande. Statistical model
     selection for markov models of biomolecular dynamics. The Journal of Physical Chem-
     istry B, 118(24):6475–6481, 2014.
                                             160


[93] Yuguang Mu, Phuong H. Nguyen, and Gerhard Stock. Energy landscape of a small
      peptide revealed by dihedral angle principal component analysis. Proteins: Structure,
      Function, and Bioinformatics, 58(1):45–52, 2005.
[94] Ushnish Sengupta, Martín Carballo-Pacheco, and Birgit Strodel. Automated markov
      state models for molecular dynamics simulations of aggregation and self-assembly. The
      Journal of Chemical Physics, 150(11):115101, 2019.
[95] Adam Kells, Alessia Annibale, and Edina Rosta. Limiting relaxation times from
      markov state models. The Journal of Chemical Physics, 149:072324, 2018.
[96] Anita de Ruiter and Chris Oostenbrink. Free energy calculations of protein–ligand
      interactions. Current Opinion in Chemical Biology, 15:547–552, 2011.
[97] Vytautas Gapsys, Servaas Michielssens, Jan Henning Peters, Bert L. de Groot, and
      Hadas Leonov. Calculation of binding free energies. In Molecular Modeling of Proteins
      (Methods and Protocols), pages 173–209. Humana Press, New York, NY, 2014.
[98] Matthew T. Geballe, A. Geoffrey Skillman, Anthony Nicholls, J. Peter Guthrie, and
      Peter J. Taylor. The sampl2 blind prediction challenge: introduction and overview.
      Journal of Computer-Aided Molecular Design, 24:259–279, 2010.
[99] Andrea Rizzi, Steven Murkli, John McNeill, Wei Yao, Mathew Sullivan, Michael K.
      Gilson, Michael W. Chiu, Lyle Isaacs, Bruce C. Gibb, David L. Mobley, and John D.
      Chodera. Overview of the sampl6 host–guest binding affinity prediction challenge.
      Journal of Computer-Aided Molecular Design, 32:937–963, 2018.
[100] Albert C. Pan, David W. Borhani, Ron O. Dror, and David E. Shaw. Molecular
      determinants of drug–receptor binding kinetics. Drug Discovery Today, 18:667–673,
      2013.
[101] Daria B. Kokh, Marta Amaral, Joerg Bomke, Ulrich Grädler, Djordje Musil, Hans-
      Peter Buchstaller, Matthias K. Dreyer, Matthias Frech, Maryse Lowinski, Francois
      Vallee, Marc Bianciotto, Alexey Rak, and Rebecca C. Wade. Estimation of drug-target
      residence times by τ -random acceleration molecular dynamics simulations. Journal of
      Chemical Theory and Computation, 14:3859–3869, 2018.
[102] Alex Dickson, Pratyush Tiwary, and Harish Vashisth. Kinetics of ligand binding
      through advanced computational approaches: A review. Current Topics in Medici-
      nal Chemistry, 17:2626–2641, 2017.
[103] Ivan Teo, Christopher G. Mayne, Klaus Schulten, and Tony Lelièvre. Adaptive mul-
      tilevel splitting method for molecular dynamics calculation of benzamidine-trypsin
      dissociation time. Journal of Chemical Theory and Computation, 12:2983–2989, 2016.
                                            161


[104] Lane W. Votapka, Benjamin R. Jagger, Alexandra L. Heyneman, and Rommie E.
      Amaro. Seekr: Simulation enabled estimation of kinetic rates, a computational tool to
      estimate molecular kinetics and its application to trypsin–benzamidine binding. The
      Journal of Physical Chemistry B, 121(15):3597–3606, 2017.
[105] S. Doerr and G. De Fabritiis. On-the-fly learning and sampling of ligand binding by
      high-throughput molecular simulations. Journal of Chemical Theory and Computation,
      10(5):2064–2069, 2014.
[106] Ignasi Buch, Toni Giorgino, and Gianni De Fabritiis. Complete reconstruction of an
      enzyme-inhibitor binding process by molecular dynamics simulations. Proceedings of
      the National Academy of Sciences of the United States of America, 108(25):10184–
      10189, 2011.
[107] Nuria Plattner and Frank Noé. Protein conformational plasticity and complex ligand-
      binding kinetics explored by atomistic simulations and markov models. Nature Com-
      munications, 6:7653, 2015.
[108] Vittorio Limongelli, Massimiliano Bonomi, and Michele Parrinello. Funnel metady-
      namics as accurate binding free-energy method. Proceedings of the National Academy
      of Sciences of the United States of America, 16:6358–6363, 2013.
[109] Alex Dickson and Samuel D. Lotz. Multiple ligand unbinding pathways and ligand-
      induced destabilization revealed by wexplore. Biophysical Journal, 112:620–629, 2017.
[110] Pratyush Tiwary, Jagannath Mondal, and B. J. Berne. How and when does an anti-
      cancer drug leave its binding site? Science Advances, 3(5):e1700014, 2017.
[111] Hari S. Muddana, C. Daniel Varnado, Christopher W. Bielawski, Adam R. Ur-
      bach, Lyle Issacs, Matthew T. Geballe, and Michael K. Gilson. Bli,nd prediction
      of host–guest binding affinities: a new sampl3 challenge. Journal of Computer-Aided
      Molecular Design, 26:475–487, 2012.
[112] Frank Biedermann and Oren A. Scherman. Cucurbit[8]uril mediated donor–acceptor
      ternary complexes: A model system for studying charge-transfer interactions. The
      Journal of Physical Chemistry B, 116(9):2842–2849, 2012.
[113] Haiying Gan, Christopher J. Benjamin, and Bruce C. Gibb. Nonmonotonic assembly of
      a deep-cavity cavitand. Journal of the American Chemical Society, 133(13):4770–4773,
      2011.
[114] Peter Eastman, Jason Swails, John D. Chodera, Robert T. McGibbon, Yutong Zhao,
      Kyle A. Beauchamp, Lee-Ping Wang, Andrew C. Simmonett, Matthew P. Harrin,
      Chaya D. Stern, Rafal P. Wiewiora, Bernard R. Brooks, and Vijay S. Pande. Openmm
      7: Rapid development of high performance algorithms for molecular dynamics. PLOS
                                             162


      Computational Biology, 13(7):1–17, 2017.
[115] Alex Dickson, Mark Maienschein-Cline, Allison Tovo-Dwyer, Jeff R. Hammond, and
      Aaron R. Dinner. Flow-dependent unfolding and refolding of an rna by nonequilibrium
      umbrella sampling. Journal of Chemical Theory and Computation, 7(9):2710–2720,
      2011.
[116] Ronan Costaouec, Haoyun Feng, Jesús Izaguirre, and Eric Darve. Analysis of the ac-
      celerated weighted ensemble methodology. Discrete & Continuous Dynamical Systems,
      2013:171–181, 2013.
[117] T. Hill. Free energy transduction and biochemical cycle kinetics. Academic Press, 1989.
[118] Aric Hagberg, Pieter Swart, and Daniel S Chult. Exploring network structure, dy-
      namics, and function using networkx. In Proceedings of the 7th Python in Science
      conference (SciPy 2008), pages 11–15, United States, 2009.
[119] Mathieu Bastian, Sebastien Heymann, and Mathieu Jacomy. Gephi: An open source
      software for exploring and manipulating networks. Proceedings of the International
      AAAI Conference on Web and Social Media, 3:361–362, 2009.
[120] Matthew P. Harrigan, Mohammad M. Sultan, Carlos X. Hernández, Brooke E. Husic,
      Peter Eastman, Christian R. Schwantes, Kyle A. Beauchamp, Robert T. McGibbon,
      and Vijay S. Pande. Msmbuilder: Statistical models for biomolecular dynamics. Bio-
      physical Journal, 112:10–15, 2017.
[121] Alex Dickson. Csnanalysis. https://github.com/ADicksonLab/CSNAnalysis, 2018.
[122] Ken Cherven. Network Graph Analysis and Visualization with Gephi. Packt Publishing,
      2013.
[123] Steven Murkli, John N. McNeill, and Lyle Isaacs. Cucurbit[8]uril•guest complexes:
      blinded dataset for the sampl6 challenge. Supramolecular Chemistry, 31:150–158, 2019.
[124] Atipat Rojnuckarin, Dennis R. Livesay, and Shankar Subramaniam. Bimolecular re-
      action simulation using weighted ensemble brownian dynamics and the university of
      houston brownian dynamics program. Biophysical Journal, 79:686–693, 2000.
[125] Mary J Carroll, Randall V Mauldin, Anna V Gromova, Scott F Singleton, J Edward,
      and Andrew L Lee. Evidence for dynamics in proteins as a mechanism for ligand
      dissociation. Nature Chemical Biology, 8(3):246–252, 2012.
[126] Georges Vauquelin, Sophie Bostoen, Patrick Vanderheyden, and Philip Seeman. Cloza-
      pine, atypical antipsychotics, and the benefits of fast-off D2 dopamine receptor antag-
      onism, volume 385. Springer, 2012.
                                             163


[127] Jianfeng Pei, Ning Yin, Xiaomin Ma, and Luhua Lai. Systems biology brings new
      dimensions for structure-based drug design. Journal of the American Chemical Society,
      136(33):11556–11565, 2014.
[128] Pelin Ayaz, Dorothee Andres, Dennis A. Kwiatkowski, Carl Christian Kolbe, Philip
      Lienau, Gerhard Siemeister, Ulrich Lücking, and Christian M. Stegmann. Confor-
      mational Adaption May Explain the Slow Dissociation Kinetics of Roniciclib (BAY
      1000394), a Type i CDK Inhibitor with Kinetic Selectivity for CDK2 and CDK9. ACS
      Chemical Biology, 11(6):1710–1719, 2016.
[129] Barbara Costa, Eleonora Da Pozzo, Chiara Giacomelli, Elisabetta Barresi, Sabrina
      Taliani, Federico Da Settimo, and Claudia Martini. TSPO ligand residence time: a
      new parameter to predict compound neurosteroidogenic efficacy. Scientific Reports,
      6(August 2015):18164, 2016.
[130] Dong Guo, Laura H. Heitman, and Adriaan P. Ijzerman. The Added Value of Assess-
      ing Ligand-Receptor Binding Kinetics in Drug Discovery. ACS Medicinal Chemistry
      Letters, 7(9):819–821, 2016.
[131] Peter J. Tonge. Drug–Target Kinetics in Drug Discovery. ACS Chemical Neuroscience,
      page acschemneuro.7b00185, 2017.
[132] Kin Sing Stephen Lee, Jun Yang, Jun Niu, Connie J. Ng, Karen M. Wagner, Hua Dong,
      Sean D. Kodani, Debin Wan, Christophe Morisseau, and Bruce D. Hammock. Drug-
      Target Residence Time Affects in Vivo Target Occupancy through Multiple Pathways.
      ACS Central Science, 5(9):1614–1624, 2019.
[133] M. Bernetti, A. Cavalli, and L. Mollica. Protein-ligand (un)binding kinetics as a new
      paradigm for drug discovery at the crossroad between experiments and modelling.
      MedChemComm, 2017.
[134] Dong Guo, Lizi Xia, Jacobus P. D. van Veldhoven, Marc Hazeu, Tamara Mocking,
      Johannes Brussee, Adriaan P. IJzerman, and Laura H. Heitman. Binding Kinetics
      of ZM241385 Derivatives at the Human Adenosine A 2A Receptor. ChemMedChem,
      9(4):752–761, 2014.
[135] Lauren A. Spagnuolo, Sandra Eltschkner, Weixuan Yu, Fereidoon Daryaee, Shabnam
      Davoodi, Susan E. Knudson, Eleanor K H Allen, Jonathan Merino, Annica Pschibul,
      Ben Moree, Neil Thivalapill, James J. Truglio, Joshua Salafsky, Richard A. Slayden,
      Caroline Kisker, and Peter J. Tonge. Evaluating the Contribution of Transition-State
      Destabilization to Changes in the Residence Time of Triazole-Based InhA Inhibitors.
      Journal of the American Chemical Society, 139(9):3417–3429, 2017.
[136] R O Dror, A C Pan, D H Arlow, D W Borhani, P Maragakis, Y Shan, H Xu, and
      D E Shaw. Pathway and mechanism of drug binding to G-protein-coupled receptors.
                                            164


      Proceedings of the National Academy of Sciences of the United States of America,
      108(32):13118–13123, 2011.
[137] Albert C. Pan, Huafeng Xu, Timothy Palpant, and David E. Shaw. Quantitative
      characterization of the binding and unbinding of millimolar drug fragments with
      molecular dynamics simulations. Journal of Chemical Theory and Computation, page
      acs.jctc.7b00172, 2017.
[138] Alex Dickson. Mapping the Ligand Binding Landscape.             Biophysical Journal,
      115(9):1707–1719, 2018.
[139] Agostino Bruno, Elisabetta Barresi, Nicola Simola, Eleonora Da Pozzo, Barbara Costa,
      Ettore Novellino, Federico Da Settimo, Claudia Martini, Sabrina Taliani, and Sandro
      Cosconati. Unbinding of Translocator Protein 18 kDa (TSPO) Ligands: From in
      Vitro Residence Time to in Vivo Efficacy via in Silico Simulations. ACS Chemical
      Neuroscience, 10(8):3805–3814, 2019.
[140] Indrajit Deb and Aaron T. Frank. Accelerating Rare Dissociative Processes in
      Biomolecules Using Selectively Scaled MD Simulations. Journal of Chemical Theory
      and Computation, 15(11):5817–5828, 2019.
[141] Steven E. Kirberger, Peter D. Ycas, Jorden A. Johnson, Chen Chen, Michael F. Ci-
      ccone, Rinette W.L. Woo, Andrew K. Urick, Huda Zahid, Ke Shi, Hideki Aihara,
      Sean D. McAllister, Mohammed Kashani-Sabet, Junwei Shi, Alex Dickson, Camila O.
      Dos Santos, and William C.K. Pomerantz. Selectivity, ligand deconstruction, and cel-
      lular activity analysis of a BPTF bromodomain inhibitor. Organic and Biomolecular
      Chemistry, 17(7):2020–2027, 2019.
[142] Jian Yin, Niel M. Henriksen, David R. Slochower, Michael R. Shirts, Michael W. Chiu,
      David L. Mobley, and Michael K. Gilson. Overview of the sampl5 host–guest challenge:
      Are we doing better? Journal of Computer-Aided Molecular Design, 1:1–19, 2017.
[143] Carlo Camilloni and Fabio Pietrucci. Advanced simulation techniques for the thermo-
      dynamic and kinetic characterization of biological systems. Advances in Physics: X,
      2018.
[144] Marc F. Lensink, Sameer Velankar, and Shoshana J. Wodak. Modeling protein–protein
      and protein–peptide complexes: CAPRI 6th edition. Proteins: Structure, Function and
      Bioinformatics, 85(3):359–377, 2017.
[145] Tristan I. Croll, Massimo D. Sammito, Andriy Kryshtafovych, and Randy J. Read.
      Evaluation of template-based modeling in CASP13. Proteins: Structure, Function and
      Bioinformatics, 87(12):1113–1127, 2019.
[146] Conor D Parks, Zied Gaieb, Michael Chiu, Huanwang Yang, Chenghua Shao,
                                            165


      W. Patrick Walters, Johanna M Jansen, G McGaughey, Richard A Lewis, Scott D Bem-
      benek, Michael K Ameriks, Tara Mirzadegan, Stephen K. Burley, Rommie E. Amaro,
      and Michael K. Gilson. D3R grand challenge 4: blind prediction of protein–ligand
      poses, affinity rankings, and relative binding free energies. Journal of Computer-Aided
      Molecular Design, 34:99–119, 2020.
[147] Synapse. IDG-DREAM Drug-Kinase Binding Prediction Challenge.
[148] Andrea Rizzi, Travis Jensen, David R Slochower, Matteo Aldeghi, Vytautas Gap-
      sys, Dimitris Ntekoumes, Stefano Bosisio, Michail Papadourakis, Niel M Henriksen,
      L De Groot, Zoe Cournia, Alex Dickson, Julien Michel, Michael K Gilson, R Michael,
      David L Mobley, and John D Chodera. The SAMPL6 SAMPLing challenge : As-
      sessing the reliability and efficiency of binding free energy calculations. Journal of
      Computer-Aided Molecular Design, pages 1–33, 2020.
[149] Michael K. Gilson, James A. Given, Bruce L. Bush, and J. Andrew McCammon. The
      statistical-thermodynamic basis for computation of binding affinities: A critical review.
      Biophysical Journal, 72(3):1047–1069, 1997.
[150] Michael R. Shirts and John D. Chodera. Statistically optimal analysis of samples from
      multiple equilibrium states. Journal of Chemical Physics, 129(12):124105, 2008.
[151] Donald Archer and Peiming Wang. The Dielectric Constant of Water and Debye-Hückel
      Limiting Law Slopes. Journal of Physical and Chemical Reference Data, 19(2):371–411,
      1990.
[152] Hyung June Woo and Benoît Roux. Calculation of absolute protein-ligand binding free
      energy from computer simulations. Proceedings of the National Academy of Sciences
      of the United States of America, 102(19):6825–6830, 2005.
[153] Matthew R. Sullivan, Wei Yao, and Bruce C. Gibb. The thermodynamics of guest
      complexation to octa-acid and tetra-endo-methyl octa-acid: reference data for the sixth
      statistical assessment of modeling of proteins and ligands (SAMPL6). Supramolecular
      Chemistry, 31(3):184–189, 2019.
[154] J M Torrie and J P Valleau. Non-physical sampling distributions in Monte-Carlo free-
      energy estimation- umbrella sampling. Journal of Computational Physics, 23:187–199,
      1977.
[155] Naohiro Nishikawa, Kyungreem Han, Xiongwu Wu, Florentina Tofoleanu, and
      Bernard R. Brooks. Comparison of the umbrella sampling and the double decou-
      pling method in binding free energy predictions for SAMPL6 octa-acid host–guest
      challenges. Journal of Computer-Aided Molecular Design, 32(10):1075–1086, 2018.
[156] Lin Frank Song, Nupur Bansal, Zheng Zheng, and Kenneth M. Merz. Detailed potential
                                              166


      of mean force studies on host–guest systems from the SAMPL6 challenge. Journal of
      Computer-Aided Molecular Design, 32(10):1013–1026, 2018.
[157] John G. Kirkwood. Statistical mechanics of fluid mixtures. The Journal of Chemical
      Physics, 3:300–313, 1935.
[158] Agastya P. Bhati, Shunzhou Wan, and Peter V. Coveney. Ensemble-based replica
      exchange alchemical free energy methods: The effect of protein mutations on inhibitor
      binding. Journal of Chemical Theory and Computation, 15:1265–1277, 2019.
[159] Margarita Gutiérrez, Gabriel A. Vallejos, Magdalena P. Cortés, and Carlos Bustos.
      Bennett acceptance ratio method to calculate the binding free energy of bace1 in-
      hibitors: Theoretical model and design of new ligands of the enzyme. Chemical Biology
      & Drug Design, 93:1117–1128, 2019.
[160] Eko Aditya Rifai, Marc van Dijk, Nico P. E. Vermeulen, Arry Yanuar, and Daan P.
      Geerke. A comparative linear interaction energy and mm/pbsa study on sirt1–ligand
      binding free energy calculation. Journal of Chemical Information and Modeling,
      59:4018–4033, 2019.
[161] Robert W. Zwanzig. High-Temperature Equation of State by a Perturbation Method.
      I. Nonpolar Gases. Journal of Chemical Physics, 22:1420–1426, 1954.
[162] William L. Jorgensen and Laura L. Thomas. Perspective on free-energy perturbation
      calculations for chemical equilibria. Journal of Chemical Theory and Computation,
      2008.
[163] Nadine Homeyer, Friederike Stoll, Alexander Hillisch, and Holger Gohlke. Binding free
      energy calculations for lead optimization: Assessment of their accuracy in an industrial
      drug design context. Journal of Chemical Theory and Computation, 10(8):3331–3344,
      2014.
[164] Frederick Bonsack and Sangeetha Sukumari-Ramesh. Tspo: An evolutionarily con-
      served protein with elusive functions. International Journal of Molecular Sciences,
      19(1694), 2018.
[165] F. Li, J. Liu, Y. Zheng, R. M. Garavito, and S. Ferguson-Miller. Crystal structures
      of translocator protein (tspo) and mutant mimic of a human polymorphism. Science,
      347:555–558, 2015.
[166] Y. Guo, R. C. Kalathur, Q. Liu, R. Bruni, C. Ginter, E. Kloppmann, B. Rost, and
      W. A. Hendrickson. Structure and activity of tryptophan-rich tspo proteins. Science,
      347:551–555, 2015.
[167] M. Jaremko, Ł. Jaremko, K. Giller, S. Becker, and M. Zweckstetter. Structure of
                                             167


      the mitochondrial translocator protein in complex with a diagnostic ligand. Science,
      343:1363–1366, 2014.
[168] M. Jaremko, Ł. Jaremko, K. Giller, S. Becker, and M. Zweckstetter. Structural integrity
      of the a147t polymorph of mammalian tspo. ChemBioChem, 16(10):1483–1489, 2015.
[169] H. Li and V. Papadopoulos. Peripheral-type benzodiazepine receptor function in choles-
      terol transport. identification of a putative cholesterol recognition/interaction amino
      acid sequence and consensus pattern. Endocrinology, 139(12):4991–4997, 1998.
[170] Alana M. Scarf, Lars M. Ittner, and Michael Kassiou. The translocator protein (18
      kda): Central nervous system disease and drug design. Journal of Medical Chemistry,
      52:581–592, 2009.
[171] L. Veenman, V. Papadopoulos, and M. Gavish. Channel-like functions of the 18-kda
      translocator protein (tspo): regulation of apoptosis and steroidogenesis as part of the
      host-defense response. Current Pharmaceutical Design, 13(23):2385–2405, 2007.
[172] H. Batoko, V. Veljanovski, and P. Jurkiewicz. Enigmatic translocator protein (tspo)
      and cellular stress regulation. Trends Biochem Sci., 40:497–503, 2015.
[173] Jemma Gatliff, Daniel A. East, Aarti Singh, Maria Soledad Alvarez, Michele Frison,
      Ivana Matic, Caterina Ferraina, Natalie Sampson, Federico Turkheimer, and Michelan-
      gelo Campanella. A role for tspo in mitochondrial ca2+ homeostasis and redox stress
      signaling. Cell Death & Disease, 8:e2896, 2017.
[174] Lan N. Tu, K. Morohaku, P. R. Manna, S. H. Pelton, W. R. Butler, D. M. Stocco, and
      V. Selvaraj. Peripheral benzodiazepine receptor/translocator protein global knock-out
      mice are viable with no effects on steroid hormone biosynthesis. Journal of Biological
      Chemistry, 289:27444–27454, 2014.
[175] Lan N. Tu, Amy H. Zhao, Douglas M. Stocco, and Vimal Selvaraj. Pk11195 effect on
      steroidogenesis is not mediated through the translocator protein (tspo). Endocrinology,
      156:1033–1039, 2015.
[176] Rainer Rupprecht, Vasslilos Papadopoulos, Gerhard Rammes, Thomas C. Baghai, Jin-
      jiang Fan, Nagaraju Akula, Ghislaine Groyer, David Adams, and Michael Schumacher.
      Translocator protein (18 kda) (tspo) as a therapeutic target for neurological and psy-
      chiatric disorders. Nature Reviews Drug Discovery, 9:971–988, 2010.
[177] Mara Perrone, Byung Seook Moon, Hyun Soo Park, Valentino Laquintana, Jae Ho
      Jung, Annalisa Cutrignelli, Angela Lopedota, Massimo Franco, Sang Eun Kim,
      Byung Chul Lee, and Nunzio Denora. A novel pet imaging probe for the detection
      and monitoring of translocator protein 18 kda expression in pathological disorders.
      Scientific Reports, 6(20422), 2016.
                                              168


[178] Barbra Costa, Chiara Giacomelli, Eleonora Da Pozzo, Sabrinia Taliani, Federico Da
      Settimo, and Claudia Martini. The anxiolytic etifoxine binds to tspo ro5-4864 binding
      site with long residence time showing a high neurosteroidogenic activity. ACS Chemical
      Neuroscience, 8:1448–1454, 2017.
[179] Donald Hamelberg, John Mongan, and J McCammon. Accelerated molecular dynam-
      ics: A promising and efficient simulation method for biomolecules. The Journal of
      Chemical Physics, 120:11919–11929, 2004.
[180] B. Isralewitz, M. Gao, and K. Schulten. Steered molecular dynamics and mechanical
      functions of proteins. Current Opinion in Structural Biology, 11(2):224 – 230, 2001.
[181] Fei Li, Jian Liu, Nan Liu, Leslie A. Kuhn, R. Michael Garavito, and Shelagh Ferguson-
      Miller. Translocator protein 18 kda (tspo): An old protein with new functions? Bio-
      chemistry, 55:2821–2831, 2016.
[182] Christophe Chipot, François Dehez, Jason R. Schnell, Nicole Zitzmann, Eva Pebay-
      Peyroula, Laurent J. Catoire, Bruno Miroux, Edmund R. S. Kunji, Gianlugi Veglia,
      Timothy A. Cross, and Paul Schanda. Perturbations of native membrane protein struc-
      ture in alkyl phosphocholine detergents: A critical assessment of nmr and biophysical
      studies. Chemical Reviews, 118:3559–3607, 2018.
[183] Juan Zeng, Riccardo Guareschi, Mangesh Damre, Ruyin Cao, Achim Kless, Bernard
      Neumaier, Andreas Bauer, Alejandro Giorgetti, Paolo Carloni, and Giulia Rossetti.
      Structural prediction of the dimeric form of the mammalian translocator membrane
      protein tspo: A key target for brain diagnostics. International Journal of Molecular
      Sciences, 19(2588), 2018.
[184] Emilia L. Wu, Xi Cheng, Sunhwan Jo, Huan Rui, Kevin C. Song, Eder M. Dávila-
      Contreras, Yifei Qi, Jumin Lee, Viviana Monje-Galvan, Richard M. Venable, Jeffery B.
      Klauda, and Wonpil Im. Charmm-gui membrane builder toward realistic biological
      membrane simulations. Journal of Computational Chemistry., 35:1997–2004, 2014.
[185] Jing Huang and Alexander D MacKerell Jr. Charmm36 all-atom additive protein
      force field: validation based on comparison to nmr data. Journal of Computational
      Chemistry, 34:2135–2145, 2013.
[186] Richard A. Friesner, Jay L. Banks, Robert B. Murphy, Thomas A. Halgren, Jasna J.
      Klicic, Daniel T. Mainz, Matthew P. Repasky, Eric H. Knoll, Mee Shelley, Jason K.
      Perry, David E. Shaw, Perry Francis, and Peter S. Shenkin. Glide: A new approach for
      rapid, accurate docking and scoring. 1. method and assessment of docking accuracy.
      Journal of Medical Chemistry, 47(7):1739–1749, 2004.
[187] Yan Xia, Kaitlyn Ledwitch, Georg Kuenze, Amanda Duran, Jun Li, Charles R.
      Sanders, Charles Manning, and Jems Meiler. A unified structural model of the mam-
                                             169


      malian translocator protein (tspo). Journal of Biomolecular NMR, 73:347–364, 2019.
[188] Schrödinger, LLC. The PyMOL molecular graphics system, version 1.8. November
      2015.
[189] Andrew Midzak, Nagaraju Akula, Laurent Lecanu, and Vassilos Papadopoulos. Novel
      androstenetriol interacts with the mitochondrial translocator protein and controls
      steroidogenesis. Journal of Biological Chemistry, 286:9875–9887, 2011.
[190] Garima Jaipuria, Andrei Leonov, Karin Giller, Suresh Kumar Vasa, Łukasz
      Jaremko, Mariusz Jaremko, Rasmus Linser, Stefan Becker, and Markus Zweckstetter.
      Cholesterol-mediated allosteric regulation of the mitochondrial translocator protein
      structure. Nature Communications, 8:14893, 2017.
[191] H J Motulsky and L C Mahan. The kinetics of competitive radioligand binding pre-
      dicted by the law of mass action. Molecular Pharmacology, 25(1):1–9, 1984.
[192] Tom Dixon, Derek MacPherson, Barmak Mostofian, Taras Dauzhenka, Samuel Lotz,
      Dwight McGee, Sharon Shechter, Utsab R. Shrestha, Rafal Wiewiora, Zachary A.
      McDargh, Fen Pei, Rajat Pal, João V. Ribeiro, Tanner Wilkerson, Vipin Sachdeva,
      Ning Gao, Shourya Jain, Samuel Sparks, Yunxing Li, Alexander Vinitsky, Asghar M.
      Razavi, István Kolossváry, Jason Imbriglio, Artem Evdokimov, Louise Bergeron, Alex
      Dickson, Huafeng Xu, Woody Sherman, and Jesus A. Izaguirre. Atomic-resolution
      prediction of degrader-mediated ternary complex structures by combining molecular
      simulations with hydrogen deuterium exchange. bioRxiv, 2021.
[193] Tao Wu, Hojong Yoon, Yuan Xiong, Sarah E. Dixon-Clarke, Radosław P. Nowak, and
      Eric S. Fischer. Targeted protein degradation as a powerful research tool in basic
      biology and drug target discovery. Nature Structural & Molecular Biology, 27(7):605–
      614, 07 2020.
[194] James Schiemer, Reto Horst, Yilin Meng, Justin I. Montgomery, Yingrong Xu, Xidong
      Feng, Kris Borzilleri, Daniel P. Uccello, Carolyn Leverett, Stephen Brown, Ye Che,
      Matthew F. Brown, Matthew M. Hayward, Adam M. Gilbert, Mark C. Noe, and
      Matthew F. Calabrese. Snapshots and ensembles of btk and ciap1 protein degrader
      ternary complexes. Nature Chemical Biology, 17:152–160, 2021.
[195] Matthieu Schapira, Matthew F. Calabrese, Alex N. Bullock, and Craig M. Crews.
      Targeted protein degradation: expanding the toolbox. Nature Reviews Drug Discovery,
      18(12):949–963, 2019.
[196] Kevin G. Coleman and Craig M. Crews. Proteolysis–Targeting Chimeras: Harnessing
      the Ubiquitin–Proteasome System to Induce Degradation of Specific Target Proteins.
      Annual Review of Cancer Biology, 2(1):1–18, 2017.
                                             170


[197] Mary E. Matyskiela, Weihong Zhang, Hon-Wah Man, George Muller, Godrej Kham-
      batta, Frans Baculi, Matthew Hickman, Laurie LeBrun, Barbra Pagarigan, Gilles
      Carmel, Chin-Chun Lu, Gang Lu, Mariko Riley, Yoshitaka Satoh, Peter Schafer,
      Thomas O. Daniel, James Carmichael, Brian E. Cathers, and Philip P. Chamberlain.
      A Cereblon Modulator (CC-220) with Improved Degradation of Ikaros and Aiolos.
      Journal of Medicinal Chemistry, 61(2):535–542, 2018.
[198] Philip P Chamberlain, Antonia Lopez-Girona, Karen Miller, Gilles Carmel, Barbra
      Pagarigan, Barbara Chie-Leon, Emily Rychak, Laura G Corral, Yan J Ren, Maria
      Wang, Mariko Riley, Silvia L Delker, Takumi Ito, Hideki Ando, Tomoyuki Mori,
      Yoshinori Hirano, Hiroshi Handa, Toshio Hakoshima, Thomas O Daniel, and Brian E
      Cathers. Structure of the human Cereblon–DDB1–lenalidomide complex reveals ba-
      sis for responsiveness to thalidomide analogs. Nature Structural & Molecular Biology,
      21(9):803–809, 2014.
[199] B. A. Kochert, R. E. Iacob, T. E. Wales, A. Makriyannis, and J. R. Engen. Hydrogen-
      deuterium exchange mass spectrometry to study protein complexes. Methods in Molec-
      ular Biology, 1764:153–171, 2018.
[200] Nobumichi Ohoka, Keiichiro Okuhira, Masahiro Ito, Katsunori Nagai, Norihito Shi-
      bata, Takayuki Hattori, Osamu Ujikawa, Kenichiro Shimokawa, Osamu Sano, Ryo-
      kichi Koyama, Hisashi Fujita, Mika Teratani, Hirokazu Matsumoto, Yasuhiro Imaeda,
      Hiroshi Nara, Nobuo Cho, and Mikihiko Naito. In Vivo Knockdown of Pathogenic
      Proteins via Specific and Nongenetic Inhibitor of Apoptosis Protein (IAP)-dependent
      Protein Erasers (SNIPERs)*. Journal of Biological Chemistry, 292(11):4556–4570,
      2017.
[201] Jieli Wei, Fanye Meng, Kwang-Su Park, Hyerin Yim, Julia Velez, Prashasti Kumar,
      Li Wang, Ling Xie, He Chen, Yudao Shen, Emily Teichman, Dongxu Li, Gang Greg
      Wang, Xian Chen, H. Üm̈mit Kaniskan, and Jian Jin. Harnessing the E3 Ligase
      KEAP1 for Targeted Protein Degradation. Journal of the American Chemical Society,
      143(37):15073–15083, 2021.
[202] A Rodriguez-Gonzalez, K Cyrus, M Salcius, K Kim, C M Crews, R J Deshaies, and K M
      Sakamoto. Targeting steroid hormone receptors for ubiquitination and degradation in
      breast and prostate cancer. Oncogene, 27(57):7201–7211, 2008.
[203] Wai-Ching Hon, Michael I Wilson, Karl Harlos, Timothy DW Claridge, Christopher J
      Schofield, Christopher W Pugh, Patrick H Maxwell, Peter J Ratcliffe, David I Stuart,
      and E Yvonne Jones. Structural basis for the recognition of hydroxyproline in hif-1α
      by pvhl. Nature, 417(6892):975–978, 2002.
[204] Kathleen M. Sakamoto, Kyung B. Kim, Akiko Kumagai, Frank Mercurio, Craig M.
      Crews, and Raymond J. Deshaies. Protacs: Chimeric molecules that target proteins
      to the Skp1–Cullin–F box complex for ubiquitination and degradation. Proceedings of
                                             171


      the National Academy of Sciences, 98(15):8554–8559, 2001.
[205] Scott J Hughes and Alessio Ciulli. Molecular recognition of ternary complexes: a new
      dimension in the structure-guided design of chemical degraders. Essays in Biochem-
      istry, 61(5):505–516, 2017.
[206] Adelajda Zorba, Chuong Nguyen, Yingrong Xu, Jeremy Starr, Kris Borzilleri, James
      Smith, Hongyao Zhu, Kathleen A. Farley, WeiDong Ding, James Schiemer, Xi-
      dong Feng, Jeanne S. Chang, Daniel P. Uccello, Jennifer A. Young, Carmen N.
      Garcia-Irrizary, Lara Czabaniuk, Brandon Schuff, Robert Oliver, Justin Montgomery,
      Matthew M. Hayward, Jotham Coe, Jinshan Chen, Mark Niosi, Suman Luthra,
      Jaymin C. Shah, Ayman El-Kattan, Xiayang Qiu, Graham M. West, Mark C.
      Noe, Veerabahu Shanmugasundaram, Adam M. Gilbert, Matthew F. Brown, and
      Matthew F. Calabrese. Delineating the role of cooperativity in the design of
      potent PROTACs for BTK. Proceedings of the National Academy of Sciences,
      115(31):201803662, 2018.
[207] R. P. Nowak, S. L. DeAngelo, D. Buckley, Z. He, K. A. Donovan, J. An, N. Safaee, M. P.
      Jedrychowski, C. M. Ponthier, M. Ishoey, T. Zhang, J. D. Mancias, N. S. Gray, and
      E. S. Bradner, J. E. Fischer. Plasticity in binding confers selectivity in ligand-induced
      protein degradation. Nature Chemical Biology, 14(7):706–714, 2018.
[208] William Farnaby, Manfred Koegl, Michael J Roy, Claire Whitworth, Emelyne Diers,
      Nicole Trainor, David Zollman, Steffen Steurer, Jale Karolyi-Oezguer, Carina Ried-
      mueller, et al. Baf complex vulnerabilities in cancer demonstrated via structure-based
      protac design. Nature chemical biology, 15(7):672–680, 2019.
[209] Hai-Tsang Huang, Dennis Dobrovolsky, Joshiawa Paulk, Guang Yang, Ellen L Weis-
      berg, Zainab M Doctor, Dennis L Buckley, Joong-Heui Cho, Eunhwa Ko, Jaebong
      Jang, et al. A chemoproteomic approach to query the degradable kinome using a
      multi-kinase degrader. Cell chemical biology, 25(1):88–99, 2018.
[210] Daniel P Bondeson, Blake E Smith, George M Burslem, Alexandru D Buhimschi, John
      Hines, Saul Jaime-Figueroa, Jing Wang, Brian D Hamman, Alexey Ishchenko, and
      Craig M Crews. Lessons in protac design from selective degradation with a promiscuous
      warhead. Cell chemical biology, 25(1):78–87, 2018.
[211] Carl C Ward, Jordan I Kleinman, Scott M Brittain, Patrick S Lee, Clive Yik Sham
      Chung, Kenneth Kim, Yana Petri, Jason R Thomas, John A Tallarico, Jeffrey M
      McKenna, et al. Covalent ligand screening uncovers a rnf4 e3 ligase recruiter for
      targeted protein degradation applications. ACS chemical biology, 14(11):2430–2440,
      2019.
[212] Michael Zengerle, Kwok-Ho Chan, and Alessio Ciulli. Selective Small Molecule In-
      duced Degradation of the BET Bromodomain Protein BRD4. ACS Chemical Biology,
                                              172


      10(8):1770–1777, 2015.
[213] Morgan S Gadd, Andrea Testa, Xavier Lucas, Kwok-Ho Chan, Wenzhang Chen, Dou-
      glas J Lamont, Michael Zẽngerle, and Alessio Ciulli. Structural basis of PROTAC
      cooperative recognition for selective protein degradation. Nature Chemical Biology,
      13(5):514–521, 2017.
[214] Andrea Testa, Scott J. Hughes, Xavier Lucas, Jane E. Wright, and Alessio Ciulli.
      Structure-Based Design of a Macrocyclic PROTAC. Angewandte Chemie International
      Edition, 59(4):1727–1734, 2020.
[215] Daniel Zaidman, Jaime Prilusky, and Nir London. PRosettaC: Rosetta Based Modeling
      of PROTAC Mediated Ternary Complexes. Journal of Chemical Information and
      Modeling, 60(10):4894–4903, 2020.
[216] Nan Bai, Palani Kirubakaran, and John Karanicolas. Rationalizing PROTAC-
      mediated ternary complex formation using Rosetta. Journal of Chemical Information
      and Modeling, 61(3):1368–1382, 2021.
[217] Michael L Drummond, Andrew Henry, Huifang Li, and Christopher I Williams. Im-
      proved Accuracy for Modeling PROTAC-Mediated Ternary Complex Formation and
      Targeted Protein Degradation via New In Silico Methodologies. Journal of Chemical
      Information and Modeling, 60(10):5234–5254, 2020.
[218] Muhammed Shaheer, Ravi Singh, and M Elizabeth Sobhia. Protein degradation: a
      novel computational approach to design protein degrader probes for main protease of
      SARS-CoV-2. Journal of Biomolecular Structure and Dynamics, pages 1–13, 2021.
[219] Haoyun Feng, Ronan Costaouec, Eric Darve, and Jesús A. Izaguirre. A comparison of
      weighted ensemble and Markov state model methodologies. The Journal of Chemical
      Physics, 142(21):214113, 2015.
[220] Brooke E Husic and Vijay S Pande. Markov state models: From an art to a science.
      Journal of the American Chemical Society, 140(7):2386–2396, 2018.
[221] Kevin B. Dagbay, Nicolas Bolik-Coulon, Sergey N. Savinov, and Jeanne A. Hardy.
      Caspase-6 Undergoes a Distinct Helix-Strand Interconversion upon Substrate Bind-
      ing*. Journal of Biological Chemistry, 292(12):4885–4897, 2017.
[222] Kevin B. Dagbay and Jeanne A. Hardy. Multiple proteolytic events in caspase-6 self-
      activation impact conformations of discrete structural regions. Proceedings of the Na-
      tional Academy of Sciences of the United States of America, 114(38):E7977–E7986,
      2017.
[223] Derek J. MacPherson, Caitlyn L. Mills, Mary Jo Ondrechen, and Jeanne A. Hardy.
                                            173


      Tri-arginine exosite patch of caspase-6 recruits substrates for hydrolysis. Journal of
      Biological Chemistry, 294(1):71–88, 2019.
[224] Thomas E. Wales, Keith E. Fadgen, Geoff C. Gerhardt, and John R. Engen. High-
      Speed and High-Resolution UPLC Separation at Zero Degrees Celsius. Analytical and
      Bioanalytical Chemistry, 80(17):6815–6820, 2008.
[225] Thomas E. Wales and John R. Engen. Hydrogen exchange mass spectrometry for the
      analysis of protein dynamics. Mass Spectrometry Reviews, 25(1):158–170, 2006.
[226] M. D. Winn, C. C. Ballard, K. D. Cowtan, E. J. Dodson, P. Emsley, P. R. Evans,
      R. M. Keegan, E. B. Krissinel, A. G. W. Leslie, A. McCoy, S. J. McNicholas, G. N.
      Murshudov, N. S. Pannu, E. A. Potterton, H. R. Powell, R. J. Read, A. Vagin, and K. S.
      Wilson. Overview of the ccp4 suite and current developments. Acta Crystallographica
      Section D: Biological Crystallography, 67(4):235–242, 2011.
[227] P. Emsley and K. Cowtan. Coot: model-building tools for molecular graphics. Acta
      Crystallographica Section D: Biological Crystallography, 60(12):2126–2132, 2004.
[228] G. N. Murshudov, A. A. Vagin, and E. J. Dodson. Application of maximum likelihood
      refinement. Refinement of Protein structures, Proceedings of Daresbury Study Weekend,
      1996.
[229] G. N. Murshudov, A. A. Vagin, and E. J. Dodson. Refinement of macromolecular struc-
      tures by the maximum-likelihood method. Acta Crystallographica Section D: Biological
      Crystallography, 53:240–255, 1997.
[230] N. J. Pannu, G. N. Murshudov, E. J. Dodson, and R. A. Read. Incorporation of
      prior phase information strengthen maximum-likelihood structure refinement. Acta
      Crystallographica Section D: Biological Crystallography, 54:1285–1294, 1998.
[231] G. N. Murshudov, A. Lebedev, A. A. Vagin, K. S. Wilson, and E. J. Dodson. Efficient
      anisotropic refinement of macromolecular structures using fft. Acta Crystallographica
      Section D: Biological Crystallography, 55:247–255, 1999.
[232] M. Winn, M. Isupov, and G. N. Murshudov. Use of tls parameters to model anisotropic
      displacements in macromolecular refinement. Acta Crystallographica Section D: Bio-
      logical Crystallography, 57:122–133, 2001.
[233] R. Steiner, A. Lebedev, and G. N. Murshudov. Fisher’s information matrix in max-
      imum likelihood molecular refinement. Acta Crystallographica Section D: Biological
      Crystallography, 59:2114–2124, 2003.
[234] M. Winn, G. N. Murshudov, and M. Z. Papiz. Macromolecular tls refinement in refmac
      at moderate resolutions. Methods in Enzymology, 374:300–321, 2003.
                                             174


[235] P. Skubak, G. N. Murshudov, and N. S. Pannu. Direct incorporation of experimental
      phase information in model refinement. Acta Crystallographica Section D: Biological
      Crystallography, 60:2196–2201, 2004.
[236] A. A. Vagin, R. S. Steiner, A. A. Lebedev, L. Potterton, S. McNicholas, F. Long, and
      G. N. Murshudov. Refmac5 dictionary: organisation of prior chemical knowledge and
      guidelines for its use. Acta Crystallographica Section D: Biological Crystallography,
      60:2284–2295, 2004.
[237] Michael L. Drummond, Andrew Henry, Huifang Li, and Christopher I Williams. Im-
      proved Accuracy for Modeling PROTAC-Mediated Ternary Complex Formation and
      Targeted Protein Degradation via New In Silico Methodologies. Journal of Chemical
      Information and Modeling, 60(10):5234–5254, 2020.
[238] Philipp Pracht, Fabian Bohle, and Stefan Grimme. Automated exploration of the
      low-energy chemical space with fast quantum chemical methods. Phys. Chem. Chem.
      Phys., 22:7169–7192, 2020.
[239] Stefan Grimme. Exploration of chemical compound, conformer, and reaction space
      with meta-dynamics simulations based on tight-binding quantum chemical calcula-
      tions. Journal of Chemical Theory and Computation, 15(5):2847–2862, 2019.
[240] Christoph Bannwarth, Sebastian Ehlert, and Stefan Grimme. Gfn2-xtb—an accurate
      and broadly parametrized self-consistent tight-binding quantum chemical method with
      multipole electrostatics and density-dependent dispersion contributions. Journal of
      Chemical Theory and Computation, 15(3):1652–1671, 2019.
[241] Jeffrey J. Gray, Stewart Moughon, Chu Wang, Ora Schueler-Furman, Brian Kuhlman,
      Carol A. Rohl, and David Baker. Protein–protein docking with simultaneous optimiza-
      tion of rigid-body displacement and side-chain conformations. Journal of Molecular
      Biology, 331(1):281–299, 2003.
[242] Nicholas A Marze, Shourya S Roy Burman, William Sheffler, and Jeffrey J Gray. Effi-
      cient flexible backbone protein–protein docking for challenging targets. Bioinformatics,
      34(20):3461–3469, 2018.
[243] Giovanni Bussi. Hamiltonian replica exchange in gromacs: a flexible implementation.
      Molecular Physics, 112(3-4):379–384, 2014.
[244] Lingle Wang, Richard A. Friesner, and B. J. Berne. Replica exchange with solute
      scaling: a more efficient version of replica exchange with solute tempering (rest2). J
      The Journal of Physical Chemistry B, 115(30):9431–9438, 08 2011.
[245] Xingui Liu, Xuan Zhang, Dongwen Lv, Yaxia Yuan, Guangrong Zheng, and Daohong
      Zhou. Assays and technologies for developing proteolysis targeting chimera degraders.
                                              175


      Future Medicinal Chemistry, 12(12):1155–1179, 2020.
[246] M. C. Deller, L. Kong, and B. Rupp. Protein stability: A crystallographer’s perspec-
      tive. Acta Chrystallographica Section F Structural Biology Communications, 72(2):72–
      95, 2016.
[247] K. B. Dagbay and J. A. Hardy. Multiple proteolytic events in caspase-6 self-activation
      impact conformations of discrete structural regions. Proceedings of the National
      Academy of Sciences of the United States of America, 114(38):E7977–E7986, 2017.
[248] E. S. Gallagher and J. W. Hudgens. Mapping protein-ligand interactions with pro-
      teolytic fragmentation, hydrogen/deuterium exchange-mass spectrometry. Methods in
      Enzymology, 566, 2016.
[249] Ali S. Saglam and Lillian T. Chong. Protein–protein binding pathways and calculations
      of rate constants using fully-continuous, explicit-solvent simulations. Chemical Science,
      10(8):2360–2372, 2018.
[250] Harry C Jubb, Alicia P Higueruelo, Bernardo Ochoa-Montaño, Will R Pitt, David B
      Ascher, and Tom L Blundell. Arpeggio: A web server for calculating and visual-
      ising interatomic interactions in protein structures. Journal of Molecular Biology,
      429(3):365–371, 2017.
[251] Mengru Mira Zhang, Brett R. Beno, Richard Y.-C. Huang, Jagat Adhikari, Ekate-
      rina G. Deyanova, Jing Li, Guodong Chen, and Michael L. Gross. An integrated
      approach for determining a protein–protein binding interface in solution and an eval-
      uation of hydrogen–deuterium exchange kinetics for adjudicating candidate docking
      models. Analytical Chemistry, 91(24):15709–15717, 2019.
[252] S. Eron. Finding a way out of the labyrinth: degrader-induced ternary complex model-
      ing. Finding a way out of the labyrinth: degrader-induced ternary complex modeling.
      July 14, 2021 The Protein Society 35th Anniversary Symposium, July 7-9, 12-14, 2021,
      2021.
[253] Martin K. Scherer, Benjamin Trendelkamp-Schroer, Fabian Paul, Guillermo Pérez-
      Hernández, Moritz Hoffmann, Nuria Plattner, Christoph Wehmeyer, Jan-Hendrik
      Prinz, and Frank Noé. PyEMMA 2: A Software Package for Estimation, Valida-
      tion, and Analysis of Markov Models. Journal of Chemical Theory and Computation,
      11:5525–5542, 2015.
[254] L. Molgedey and H. G. Schuster. Separation of a mixture of independent signals using
      time delayed correlations. Physical Review Letters, 72:3634–3637, 1994.
                                              176