COMPUTATIONAL MOLECULAR DESIGN AND INNOVATION: FROM DRUG
          DISCOVERY TO EMERGING CONTAMINANTS
                                   By
                             Yiğitcan Eken
                          A DISSERTATION
                              Submitted to
                      Michigan State University
              in partial fulfillment of the requirements
                            for the degree of
                 Chemistry⎯Doctor of Philosophy
                                  2021


                                          ABSTRACT
      COMPUTATIONAL MOLECULAR DESIGN AND INNOVATION: FROM DRUG
                        DISCOVERY TO EMERGING CONTAMINANTS
                                                By
                                          Yiğitcan Eken
    Computational approaches have found great utility in areas including drug discovery and
environmental contamination by investigating protein dynamics, binding and interaction
patterns. For drug discovery, in silico biophysical methods serve an important role in reducing
the cost of and accelerating the discovery process, as such methods aid in facilitating the
identification, optimization and screening of potential drug candidates and in providing
important understanding of drug mechanisms of actions and structure activity relationships at the
atomic level. For computational drug discovery and protein modelling strategies, probable
binding conformations of the ligand to its target can be predicted, and these conformations can
be further evaluated by using scoring functions, molecular dynamics and free energy calculations
to determine binding affinities and understand how a ligand recognizes its host.
    Despite the utility of computational approaches in areas such as drug design and the study of
protein functioning, the choice of methods is not straightforward. Because of this, a series of
international blinded host-guest binding prediction challenges are available to identify the most
effective approaches to predict a variety of properties. Some of the methods available for
calculating free energies include free energy perturbation, replica exchange free energy
perturbation and thermodynamic integration approaches, and end-state methods. The later are
most promising due to their reduced computational cost and because there is no need for
intermediate                                   state                                   simulation.


    In this dissertation initially, performance of end-state approaches is considered. Then,
computer simulations and modeling techniques combined with the optimal end-state parameter
choices were used in application studies including; the ligand preference and biological function
of three enzymes (Arthrobacter endoβ-N-acteylglucosaminidase, fibroblast growth factor-2 and
heparanase), the effects of per- and polyfluoroalkyl substance binding on human pregnane X
receptor and peroxisome proliferator-activated receptor gamma, and Ca2+ dependent activation of
protein kinase C.


This dissertation is dedicated to family.
                    iv


                                  ACKNOWLEDGEMENTS
    I am truly grateful for everyone who has supported me throughout my academic career.
Firstly, I would like to thank my Ph.D. advisor and mentor, Professor Angela K. Wilson for her
guidance and support over the years. I would also like to thank the past and current members of
the Wilson group, for the insightful discussions, their support through the years and for amazing
group activities including but not limited to Timothe, Lucas, Michael, Nuno, Semiha, Thanh,
Zack, Hailey, Bradley, Guangyao, Sasha, Jared, Narasimhan, Prajay, Lenin, John D., John P.,
Zainab and Thomas. I would like to thank my committee members at Michigan State University;
Professor Kenneth M. Merz Jr., Professor Xuefei Huang, Professor Edmund Ellsworth and my
former committee member Professor Benjamin G. Levine.
    I would like to thank all my friends. This includes but is certainly not limited to Hadi N.,
Christian S., Aslı Y., Erkan O., Kendal Ş., Şükrü A., Kaya V., Ufuk D., Yavuz B., Refik B.,
Emily C., Chelsea V., Oleksii K., Arial F., for the encouragement and the time, we have spent
from dancing classes, exercises, gamming tournaments to kayaking. I also I want to thank my
dancing teachers Richard and Alejandro for their positive and motivating classes during my time
in Michigan.
    Thank you to my large extended family and all their support from across the U.S. regardless
of the distance your presence and support were always with me. To keep this list succinct: Erhan
S., Ayhan S., Murat S., Müge S., Sabiha E., Gürol Ç., Yasemin Ç., Pelin A., Eric A., Tuba S.G.,
Aslı K., Burcu K. I would like to thank to my cousins; Neslihan Ç., Doruk S., Alper Ç., Ozan B.
I am grateful for the time spent and memories. I also want to specifically acknowledge my
cousin Ugur B. and my aunt Gülşen B. who passed away in 2018.
                                                v


    Special thanks to Lucila Garcia Lopez for her unconditional love, support. Your presence
motivated me during my long nights of study and thank you for making sure I stayed focus and
healthy during my doctoral candidacy exams. I also want to thank to the Garcia family for
accepting/treating me as a family member.
    Finally, I thank to my father, Muhammet E., sister Özgücan E., brother-in-law Burç T. and
special thanks to my mother, Didem E. whom I lost recently in 2021: With our weekly and
sometimes daily conversations you always kept me inspired, focused, and admired me
throughout my life. While there is not enough words to describe how much love and support you
have given to me my whole life, I am grateful for everything you have done for me forever.
                                               vi


                                               TABLE OF CONTENTS
LIST OF TABLES ......................................................................................................................... x
LIST OF FIGURES .................................................................................................................... xiii
CHAPTER ONE ............................................................................................................................ 1
  Introduction ............................................................................................................................... 1
  1.1 Introduction ......................................................................................................................... 2
REFERENCES.............................................................................................................................. 5
CHAPTER TWO ............................................................................................................................ 7
  Theory and Methods in Molecular Modeling ......................................................................... 7
  2.1 Theoretical Background ..................................................................................................... 8
  2.2 Molecular Dynamics ........................................................................................................... 8
  2.3 Binding Free Energy Calculations .................................................................................. 12
  2.4 Sequence Alignment.......................................................................................................... 13
REFERENCES............................................................................................................................ 16
CHAPTER THREE ..................................................................................................................... 19
  SAMPL6 Host–Guest Challenge: Binding Free Energies via a Multistep Approach ...... 19
  3.1 Introduction ....................................................................................................................... 20
  3.2 Methods .............................................................................................................................. 25
     3.2.1 System Preparation ...................................................................................................... 27
     3.2.2 Simulation Protocol ..................................................................................................... 31
     3.2.3 Quantum Mechanical Methods .................................................................................... 32
  3.3 Results ................................................................................................................................ 34
     3.3.1 Cucurbit[8]uril (CB8) .................................................................................................. 36
     3.3.2 Octa acid (OA) ............................................................................................................. 37
     3.3.3 Tetramethyl octa acid (TEMOA) ................................................................................. 39
     3.3.4 Quantum Mechanical Calculations .............................................................................. 40
  3.4 Discussion........................................................................................................................... 44
     3.4.1 Submission Analysis .................................................................................................... 45
     3.4.2 Impact of Truncated Basis Sets.................................................................................... 48
     3.4.3 Impact of the Extrapolation Scheme B-parameter ....................................................... 49
     3.4.4 Impact of representative geometries ............................................................................ 52
  3.5 Conclusions ........................................................................................................................ 56
APPENDIX .................................................................................................................................. 58
REFERENCES............................................................................................................................ 61
CHAPTER FOUR ........................................................................................................................ 70
  SAMPL7: Host–Guest Binding Prediction by Molecular Dynamics and Quantum
  Mechanics ................................................................................................................................ 70
                                                                   vii


 4.1 Introduction ....................................................................................................................... 71
 4.2 Methods .............................................................................................................................. 74
    4.2.1 Molecular dynamics protocol ...................................................................................... 76
    4.2.2 MMPBSA/MMGBSA calculations ............................................................................. 77
    4.2.3 Quantum Mechanical Methods .................................................................................... 78
 4.3 Results ................................................................................................................................ 81
    4.3.1 OA and exoOA Binding Cavities ................................................................................ 81
    4.3.2 Host Guest Binding Poses............................................................................................ 82
 4.4 Discussion........................................................................................................................... 85
    4.4.1 Molecular Dynamics .................................................................................................... 85
    4.4.2 Comparison of Poisson Boltzmann and Generalized Born Solvation Models ............ 88
    4.4.3 Comparison of RESP and AM1 charges ...................................................................... 89
    4.4.4 Solute Entropies ........................................................................................................... 90
    4.4.5 Quantum Mechanics .................................................................................................... 91
    4.4.6 OA Discussion of Results ............................................................................................ 93
    4.4.7 exoOA Discussion of Results ...................................................................................... 94
    4.4.8 Comparison of Gas Phase and Solvated Structures ..................................................... 95
 4.5 Conclusion ......................................................................................................................... 97
APPENDIX ................................................................................................................................ 100
REFERENCES.......................................................................................................................... 109
CHAPTER FIVE ....................................................................................................................... 119
 Chemoenzymatic Synthesis of Glycopeptides Bearing Rare N-Glycan Sequences with or
 without Bisecting GlcNAc .................................................................................................... 119
 5.1 Introduction ..................................................................................................................... 120
 5.2 Computational Methodology ......................................................................................... 123
 5.3 Results and Discussion .................................................................................................... 124
 5.4 Conclusions ...................................................................................................................... 127
REFERENCES.......................................................................................................................... 128
CHAPTER SIX .......................................................................................................................... 131
 Chemical Synthesis of Human Syndecan-4 Glycopeptide Bearing O-, N- Sulfation and
 Multiple Aspartic Acids for Probing Impacts of the Glycan Chain and the Core Peptide
 on Biological Functions......................................................................................................... 131
 6.1 Introduction ..................................................................................................................... 132
 6.2 Computational Methodology ......................................................................................... 134
 6.3 Computational Results and Analyses of the Interactions ........................................... 136
    6.3.1 FGF-2 Binding ........................................................................................................... 136
    6.3.2 Heparanase Binding ................................................................................................... 139
 6.4 Conclusion ....................................................................................................................... 141
APPENDIX ................................................................................................................................ 142
REFERENCES.......................................................................................................................... 144
CHAPTER SEVEN .................................................................................................................... 147
 Binding of Per- and Polyfluoroalkyl Substances to the Human Pregnane X Receptor . 147
 7.1 Introduction ..................................................................................................................... 148
                                                                 viii


 7.2 Materials and Methods ................................................................................................... 152
    7.2.1 Site Analysis and Molecular Docking ....................................................................... 153
    7.2.2 Simulation Protocol ................................................................................................... 153
    7.2.3 Binding Energy Calculations ..................................................................................... 154
    7.2.4 Hydrogen Bond Analysis ........................................................................................... 157
 7.3 Results and Discussion .................................................................................................... 157
    7.3.1 Molecular Docking and MD Simulations .................................................................. 157
    7.3.2 Binding Free Energy Calculations ............................................................................. 158
    7.3.3 PFAS Recognition on hPXR...................................................................................... 160
APPENDIX ................................................................................................................................ 166
REFERENCES.......................................................................................................................... 176
CHAPTER EIGHT .................................................................................................................... 185
 Binding of Per- and Polyfluoro-Alkyl Substances (PFASs) to Peroxisome Proliferator-
 Activated Receptor Gamma (PPAR) ................................................................................. 185
 8.1 Introduction ..................................................................................................................... 186
 8.2 Computational Methods ................................................................................................. 191
    8.2.1 Site Analysis and Molecular Docking ....................................................................... 191
    8.2.2 Simulation Protocol ................................................................................................... 192
    8.2.3 Binding Energy Calculations ..................................................................................... 193
    8.2.4 Hydrogen Bond Analysis ........................................................................................... 194
 8.3 Results and Discussion .................................................................................................... 194
    8.3.1 Binding pockets on PPARγ........................................................................................ 194
    8.3.2 Binding Poses of PFASs ............................................................................................ 195
    8.3.3 Binding Free Energy Calculations (MM-GBSA/MM-PBSA) and Correlation Plots 196
    8.3.4 Residue decomposition analysis ................................................................................ 202
    8.3.5 Hydrogen bonding ..................................................................................................... 209
 8.4 Conclusions ...................................................................................................................... 211
APPENDIX ................................................................................................................................ 213
REFERENCES.......................................................................................................................... 229
CHAPTER NINE ....................................................................................................................... 237
 Mechanisms behind Protein Kinase C (PKC) Activation ................................................. 237
 9.1 Introduction ..................................................................................................................... 238
 9.2 Methods ............................................................................................................................ 241
 9.2 Results and Discussion .................................................................................................... 242
    9.2.1 Sequence Alignment .................................................................................................. 242
    9.2.2 Binding Site Environment Comparison ..................................................................... 244
    9.2.3 Molecular Dynamics Simulations .............................................................................. 245
REFERENCES.......................................................................................................................... 252
CHAPTER TEN ......................................................................................................................... 255
 Conclusions and Future Directions ..................................................................................... 255
                                                                ix


                                                         LIST OF TABLES
Table 3. 1 The binding free energies in kcal mol-1 for the CB8 host–guest systems.................... 36
Table 3. 2 The binding free energies in kcal mol-1 for the OA host–guest systems. .................... 37
Table 3. 3 The binding free energies in kcal mol-1 for the TEMOA host– guest systems. ........... 39
Table 3. 4 The binding free energies for CB8 complexes in kcal mol-1 with various schemes
       involving not using the RI approximation, changing the dielectric constant of the implicit
       solvent with the truncated correlation consistent basis sets for hydrogen. ....................... 41
Table 3. 5 The binding free energies for the CB8 complexes in kcal mol-1 with various schemes
       involving not using the RI approximation, changing the dielectric constant of the implicit
       solvent, and two options for basis set choice when extrapolating to the Kohn–Sham limit.
       ........................................................................................................................................... 43
Table 3. 6 The predicted binding energies for OA and TEMOA using MMPBSA and RI-
       B3PW91 after the removal of mean signed error (MSE) ................................................. 51
Table 3. 7 The predicted binding energies when using different values for B in Eq. 1 for two-
       point extrapolations using cc-pVDZ and cc-pVTZ with RI-B3PW91-D3. ...................... 52
Table 3. 8 Van der Waals volumes in Å3 of CB8 guest molecules are calculated using connection
       table approximation. ......................................................................................................... 59
Table 3. 9 Van der Waals volumes in Å3 of OA and TEMOA guest molecules are calculated
       using connection table approximation. ............................................................................. 60
Table 3. 10 Fitting parameter values obtained when using Jensen’s extrapolation scheme for each
       component in calculating the binding energy (Equation 1). The host and guest are
       counterpoise-corrected before the extrapolation was performed. ..................................... 60
Table 4. 1 The binding free energies in kcal mol−1 for the OA and exoOA host–guest systems
       predicted from MMPBSA/MMGBSA. ............................................................................. 87
Table 4. 2 Calculated binding energies using B2PLYP-D3 vs experimental binding energies,
       using a range of basis sets. The geometry was optimized in the gas phase. Values shown
       are in kcal mol-1. ............................................................................................................... 92
Table 4. 3 SAMPL6-OA host guest binding data used during linear correction. Units are in in
       kcal mol-1. ....................................................................................................................... 101
                                                                        x


Table 4. 4 Calculated binding energies using B2PLYP-D3 vs experimental binding energies,
       using cc-pV(D+d)Z and cc-pV(T+d)Z. The geometry was optimized in the gas phase.
       Values shown are in kcal mol-1. ...................................................................................... 101
Table 4. 5 Root mean square errors (RMSE), mean absolute errors (MAE), mean errors (ME), r2
       correlation coefficients, slope of the correlation plots (m), and Kendall’s Tau (τ) rank
       correlation coefficients for OA and exoOA for the ranked submission. Values shown are
       in kcal mol-1. ................................................................................................................... 102
Table 5. 1 Endo-A Binding energies of various binding poses of 39 and 41 ............................. 125
Table 6. 1 Inhibitory activities of glycopeptide, glycan and peptide towards heparanase (5 nM)
       and their dissociation constant respect to FGF-2 binding measure through biolayer
       interferometry. ................................................................................................................ 134
Table 6. 2 Binding free energy for glycopeptide, glycan and peptide with FGF-2 calculated for
       various poses. .................................................................................................................. 137
Table 6. 3 Binding free energy for glycopeptide 2, peptide 29 and glycan 28 with heparanase
       calculated for various poses. ........................................................................................... 139
Table 6. 4 Average binding free energies and standard deviations calculated for glycan 28,
       peptide 29 and glycopeptide 2 on 3 potential binding sites. ........................................... 143
Table 7. 1 Nomenclature for Perfluoroalkyl Substances (PFASs) Studieda ............................... 152
Table 7. 2 hPXR Residues Interact with PFASs Upon Binding ................................................. 160
Table 7. 3 All PFAS ligands tested. ............................................................................................ 167
Table 7. 4 MMPBSA and MMGBSA relative binding energies of every PFAS tested. ............ 168
Table 7. 5 Long-chain PFAS average per-residue decomposition energies (kcal mol-1). .......... 169
Table 7. 6 Short-chain/alternative PFAS average per-residue decomposition energies. ............ 170
Table 7. 7 Total electrostatic energies of various mutant PFAS-hPXR complexes. .................. 171
Table 8. 1 The PFASs used in this study are listed and are categorized based on their structural
       families: perfluoroalkyl carboxylic acids (PFCAs), perfluorosulfonic acids (PFSAs),
       fluoro telomer alcohols (FTOH), fluoro telomer sulfonic acids (FTSA), fluoro telomer
       carboxylic acids (FTCA)................................................................................................. 214
Table 8. 2 PFASs chemical structures used in this study. .......................................................... 216
Table 8. 3 Binding energies for the dimer pocket and standard deviations in kcal mol-1 for all
       PFASs and L-carnitine. ................................................................................................... 218
                                                                xi


Table 8. 4 Binding energies for the ligand binding pocket (LBP) and standard deviations in kcal
       mol-1 for all PFASs and L-carnitine. .............................................................................. 219
Table 9. 1 Character of PKCα-C2 and PKCδ-C2 binding site residues as obtained from a
       comparison of potential binding site residues. ................................................................ 245
                                                 xii


                                                     LIST OF FIGURES
Figure 2. 1 Leapfrog algorithm steps. In this algorithm velocities are calculated on the midpoints
       of ∆t, whereas positions are calculated explicitly at each ∆t. ............................................. 9
Figure 2. 2 Alignment of two short protein sequences. ................................................................ 13
Figure 2. 3 Blossom 62 matrix is a commonly used substitution matrix. In this matrix arginine to
       arginine substitutions scores +5 and arginine to lysine substitution scores +2, indicating
       substitution of these two positively charged amino acids frequently occurs within the
       functionally related proteins. Whereas arginine to aspartic acid substitution scores -2,
       meaning this substitution is not frequent among functionally related proteins. ............... 14
Figure 3. 1 The guest molecules for the cucurbit[8]uril (CB8). ................................................... 25
Figure 3. 2 The guest molecules for the octa-acid (OA) and tetra methyl octa-acid (TEMOA)
       hosts. ................................................................................................................................. 26
Figure 3. 3 The host molecules: cucurbit[8]uril (CB8), octa-acid (OA), and tetramethyl octa-acid
       (TEMOA). ......................................................................................................................... 26
Figure 3. 4 The structures of the CB8 guest molecules inside the binding pocket. These structures
       are generated from the clustering analysis. ....................................................................... 29
Figure 3. 5 The structures of the OA guest molecules inside the binding pocket. These structures
       are generated from the clustering analysis. ....................................................................... 30
Figure 3. 6 The structures of the TEMOA guest molecules inside the binding pocket. These
       structures are generated from the clustering analysis. ...................................................... 31
Figure 3. 7 Plots for calculated results in Tables 3.1, 3.2 and 3.3 versus experimental results in
       kcal mol-1 for (a) CB8, (b) OA, and (c) TEMOA for MMPBSA (blue), RI-B3PW91-D3
       (black), and RI-B3PW91 (green). The dashed lines in each corresponding color refers to
       the best fit line where the statistical outlier (OA-G2) for RI-B3PW91 and RI B3PW91-
       D3 is removed for b and c. The dashed gray line is the y=x line. .................................... 47
Figure 3. 8 Error plots from experimental results in kcal mol-1 for (a) CB8 (b) OA, and (c)
       TEMOA for MMPBSA (blue), RI-B3PW91- D3 (black), and RI-B3PW91 (green) for the
       submitted results from Tables 3.1, 3.2 and 3.3. ................................................................ 55
Figure 4. 1 Guest molecules in the SAMPL7 GDCC host–guest binding challenge. The binding
       of these eight guest molecules is considered for both OA and exoOA hosts. .................. 75
                                                                   xiii


Figure 4. 2 The guest molecules for the octa-acid (OA) and tetra methyl octa-acid (TEMOA)
       hosts. ................................................................................................................................. 76
Figure 4. 3 a Binding cavity of OA together with G1 (shown in green). b Binding cavity of
       exoOA together with G1 (shown in green). ...................................................................... 82
Figure 4. 4 Binding modes of guest to OA host generated with docking. .................................... 84
Figure 4. 5 Binding modes of guest to exoOA host generated with docking. .............................. 84
Figure 4. 6 a MMPBSA-RESP correlation with experiment. b MMPBSA-RESP correlation with
       experiment after linear correction. The linear correction shifted the y-values (△G
       Calculated) closer to the x-values (experimental) without changing the correlation
       coefficient (r2). .................................................................................................................. 89
Figure 4. 7 Comparison between gas-phase (green) and solvent (blue) optimized structures of
       exoOA-G2. ........................................................................................................................ 96
Figure 4. 8 Geometry optimized structures of OA and exoOA host/guess with B3PW91-D3/cc-
       pVDZ. ............................................................................................................................. 103
Figure 4. 9 RMSD plots of exoOA-G1 and exoOA-G2 MD simulations. ................................. 104
Figure 4. 10 RMSD plots of exoOA-G3 and exoOA-G4 MD simulations. ............................... 104
Figure 4. 11 RMSD plots of exoOA-G5 and exoOA-G6 MD simulations. ............................... 105
Figure 4. 12 RMSD plots of exoOA-G7 and exoOA-G8 MD simulations. ............................... 105
Figure 4. 13 RMSD plots of OA-G1 and OA-G2 MD simulations. ........................................... 106
Figure 4. 14 RMSD plots of OA-G3 and OA-G4 MD simulations. ........................................... 106
Figure 4. 15 RMSD plots of OA-G5 and OA-G6 MD simulations. ........................................... 107
Figure 4. 16 RMSD plots of OA-G7 and OA-G8 MD simulations. ........................................... 107
Figure 4. 17 Correlation plot of SAMPL6-OA host-guest binding. The x-axis provides the
       experimental binding energies and the y-axis contains binding energies predicted by
       RESP-MMPBSA method without solute entropies. A trendline equation is used to correct
       the predicted SAMPL7 binding energies. ....................................................................... 108
Figure 5. 1 Glycan 39 treated with Endo-A enzyme and GlcNAc (unit A) bearing haptoglobin as
       the acceptor glycopeptide leads to a reaction yield of 65% glycopeptide 45. Potential
       branching sites are indicated on the figure with corresponding carbon numbers they
       associate within the saccharide unit shown through letters A, B, C, D, E. 2 .................. 121
                                                                   xiv


Figure 5. 2 Structures of two glycan substrates. Glycan 39 is shown on the left and glycan 41 is
       shown on the right.The additional LewisX trisaccharide thioglycosyl donor group is
       marked with red and the oxazoline ring, where the transglycosylation occurs, is marked
       with blue.......................................................................................................................... 122
Figure 5. 3 Binding pose representations for the two glycans investigated. The figure on the left
       is a snapshot taken from the MD simulation of glycan 39 with Endo-A and the indole
       rings of W216 and W244 are in the perpendicular position. Snapshot taken from the MD
       simulation of glycan 41 with Endo-A is shown on the right, indole rings of W216 and
       W244 are in the parallel position because of the hindrance caused by the additional
       antenna. ........................................................................................................................... 126
Figure 6. 1 Chemical structures of glycopeptide, glycan and peptide synthesized by our
       collaborators. ................................................................................................................... 133
Figure 6. 2 Potential binding Sites on the FGF2 structure. ......................................................... 135
Figure 6. 3 Representative binding pose of glycopeptide to FGF2. ........................................... 138
Figure 6. 4 Comparison of (a) glycan 28 and (b) glycopeptide 2 binding to the site 1 of
       heparanase (heparin binding site). .................................................................................. 140
Figure 7. 1 Binding modes of PFASs to the hPXR ligand binding pocket. ................................ 156
Figure 7. 2 (a) Correlation observed between experimental EC50 values from Zhang et al. and
       predicted binding free energies from MM-GBSA. (b) Correlation observed between EC50
       values from Zhang et al. and predicted binding free energies from MM-PBSA. Error bars
       indicate standard deviations. ........................................................................................... 158
Figure 7. 3 Binding energies of PFASs to hPXR calculated with MM-GBSA in comparison to
       EC50 values measured by Zhang et al. (the predicted binding energies are listed in Table
       7.4). ................................................................................................................................. 159
Figure 7. 4 Hydrogen bond lifetimes observed during MD simulations. ................................... 161
Figure 7. 5 Important residues that mediate ligand stability through hydrogen bonding. ......... 163
Figure 7. 6 Total electrostatic energy (EEL) contribution of various PFASs on binding to mutant
       hPXR complexes. ............................................................................................................ 164
Figure 7. 7 Average residue contributions to the PFAS binding to hPXR calculated from residue
       decomposition. ................................................................................................................ 171
Figure 7. 8 Arg-410 and Lys-210 positioned outside of the binding cavity. .............................. 172
Figure 7. 9 Comparison of VDW and electrostatic energies of every tested ligand. .................. 172
                                                                     xv


Figure 7. 10 Electrostatic energies + energy of solvation calculated by MMGBSA for every
       tested ligand. ................................................................................................................... 173
Figure 7. 11 Binding modes of PFASs to mutant hPXR ligand binding pocket. ....................... 174
Figure 7. 12 Root mean square deviation (RMSD) plots of the highest affinity PFAS poses from
       30ns MD simulations. ..................................................................................................... 175
Figure 8. 1 Binding pockets detected on the PPARγ dimer structure (PDB ID: 3ADV) using
       MOE’s Site Finder. Two potential binding sites are identified and their entrances are
       shown. The surface and area of the binding sites are depicted. The red spheres indicate a
       hydrophilic, while silver depicts hydrophobic surfaces. ................................................. 195
Figure 8. 2 Binding poses of PFASs and L-carnitine on PPARγ. The binding modes that have the
       highest binding affinity determined from MM-PBSA are shown. Residues depicted
       belong to Chain A. .......................................................................................................... 199
Figure 8. 3 Average binding energies of PFASs and L-carnitine calculated with MM-GBSA and
       MM-PBSA for the LBP. PFASs are divided into subgroups: perfluoroalkyl carboxylic
       acids (PFCAs), followed by perfluoroalkyl sulfonic acids (PFSAs), fluorotelomer
       alcohols (FTOHs), fluorotelomer carboxylic acids (FTCAs), fluorotelomer sulfonic acids
       (FTSAs) and then alternatives. Each subgroup was listed from shortest chain length to
       longest (Tables 8.1 and 8.2 for acronyms and structures). ............................................. 201
Figure 8. 4 Average calculated binding energies of PFASs with MM-PBSA in comparison with
       IC50 values determined experimentally by Zhang et. al. On the y-axis, the average
       calculated binding energies are plotted, and along the x-axis, the experimental IC50 values
       are provided. Error bars are depicted in black (MM-PBSA) and red (experimental)..... 202
Figure 8. 5 Binding contribution of each nearby residue for PFASs and L-carnitine (LBP). For
       PFASs, highest affinity poses are averaged and for L-carnitine the highest affinity pose is
       used. ................................................................................................................................ 204
Figure 8. 6 Binding contributions of the acidic and basic residues for PFASs (LBP) in Chain A
       and Chain B..................................................................................................................... 207
Figure 8. 7 Binding contributions of the acidic and basic residues for L-carnitine (LBP) in Chain
       A and Chain B. ................................................................................................................ 208
Figure 8. 8 Hydrogen bond lifetimes for the LBP. The y-axis depicts the chain and residue
       number from the receptor, and in brackets, the atom from the ligand performing the
       hydrogen bonding is shown. Acceptors are portrayed by “(O), (F), (N)”, and donors by
       “(H)”. In the x-axis the different PFASs and L-carnitine are shown. ............................. 209
Figure 8. 9 Binding poses of PFASs and L-carnitine on the PPARγ dimer pocket. The binding
       modes that have the highest binding affinity determined from MM-PBSA are shown.. 220
                                                                   xvi


Figure 8. 10 Average binding energies of PFASs and L-carnitine calculated with MM-GBSA and
       MM-PBSA for the dimer pocket..................................................................................... 222
Figure 8. 11 MM-GBSA in comparison with IC50 values measured experimentally by Zhang et.
       al. for the LBP.17 On the y-axis, average calculated binding energies are plotted, and
       along the x-axis, the experimental IC50 values are provided. Error bars are depicted in
       black (MM-GBSA) and red (experimental). ................................................................... 222
Figure 8. 12 Binding contribution of each nearby residue for PFASs and L-carnitine (dimer
       pocket)............................................................................................................................. 223
Figure 8. 13 Binding contributions of the acidic and basic residues for PFASs (dimer pocket) in
       Chain A and Chain B. ..................................................................................................... 224
Figure 8. 14 Binding contributions of the acidic and basic residues for L-carnitine (dimer pocket)
       in Chain A and Chain B. ................................................................................................. 225
Figure 8. 15 Hydrogen bond lifetimes for the dimer pocket. The y-axis depicts the chain and
       residue number from the receptor, and in brackets, the atom from the ligand performing
       the hydrogen bonding is shown. Acceptors are portrayed by “(O), (F), (N)”, and donors
       by “(H)”. In the x-axis the different PFASs and L-Carnitine are shown ........................ 226
Figure 8. 16 PFOS RMSD plots for the dimer pocket. ............................................................... 226
Figure 8. 17 L-Carnitine RMSD plots for the dimer pocket. ...................................................... 227
Figure 8. 18 PFOS RMSD plots for the LBP pocket. ................................................................. 227
Figure 8. 19 L-Carnitine RMSD plots for the LBP pocket. ........................................................ 228
Figure 9. 1 A schematic of the PKC activation pathway. In the first activation step the Ca2+ binds
       to the C2 domain, increasing the membrane affinity of the enzyme and PKC drifts to the
       membrane. Next, PIP2 that is present in the membrane binds to the C2 domain and
       loosens the C1-C2 domain interaction causing the C1 domain to move inside the
       membrane where it can bind to DAG. After Ca2+, PIP2 and DAG binding is established,
       the pseudo substrate domain leaves the active site in the kinase domain completing the
       activation of the enzyme.4,6 ............................................................................................. 239
Figure 9. 2 PKC subgroups have slightly varying structures and regulators. All isoforms carry a
       kinase domain with an activation loop shown as blue. Both conventional and novel PKCs
       contain a C1 domain that can be regulated by DAG, PS as shown in orange, whereas
       atypical PKC C1 domain can only be regulated by PS. The C2 domain that can be
       regulated by Ca2+ and PIP2 is only present in the conventional subgroup, novel PKCs
       contain a modified C2 domain that lacks the necessary residues for binding. Atypical
       PKCs carry Phox and Bem 1 (PB1) domain instead of the C2 domain present in the other
       subgroups.6 ...................................................................................................................... 240
                                                                 xvii


Figure 9. 3 Comparison of kinase domain of different PKC family isoforms with sequence
       alignment......................................................................................................................... 242
Figure 9. 4 PKC family C1 domain sequence alignment............................................................ 243
Figure 9. 5 PKC family C2 domain sequence alignment............................................................ 243
Figure 9. 6 PKCα-C2 and PKCδ-C2 binding site comparison. Potential sites for hydrogen
       bonding are in purple, hydrophobic regions in green, and neutral regions in white....... 244
Figure 9. 7 PKCα-C2 domain RMSD for the systems in different salt concentrations. ............. 246
Figure 9. 8 PKCδ-C2 domain RMSD for the systems in different salt concentrations. ............. 246
Figure 9. 9 Coulombic and Lennard-Jones interaction energy between PKCα-C2 binding site and
       Ca2+ ions in the system for extended simulation of PKCα-C2 in 150 mM CaCl2. ......... 247
Figure 9. 10 Interaction energy between PKCα-C2 binding site residues and the first Ca2+
       entering the site for extended simulation of PKCα-C2 in 150 mM CaCl2. .................... 248
Figure 9. 11 Interaction energy between PKCα-C2 binding site residues and the second Ca2+
       entering the site for extended simulation of PKCα-C2 in 150 mM CaCl2. .................... 249
Figure 9. 12 Minimum energy frame of PKCα-C2 in 150mM CaCl2. Two Ca2+ and the important
       residues are also shown. .................................................................................................. 250
                                                              xviii


             CHAPTER ONE
Introduction
                  1


1.1 Introduction
    The growth in computational capabilities over the past decade has enabled computational
biochemistry to be used on large biological systems such as proteins, viruses1, and even whole
cells.2 The dynamic nature of proteins are linked to their function and small structural
perturbations in an enzyme can affect proteins’ activities across several orders of magnitude.3
These structural perturbations, also referred to as conformational changes, occur on a timescale
ranging from microseconds to seconds. Molecular dynamics simulations (MD) can be used to
provide insight about the dynamics of proteins in order to understand specific biological
phenomena. MD simulations can also be followed by binding free energy calculations to
generate mechanistic hypotheses or activity predictions which can be further validated by
laboratory experiments.4 In the second chapter of this dissertation, computer-simulations and
modeling methodologies used in the work presented in later chapter are overviewed.
    In the third and fourth chapters of this dissertation, computer simulations and modeling were
used to investigate host-guest binding on Statistical Assessment of the Modeling of Proteins and
Ligands challenges (SAMPL6 and SAMPL7, respectively).5,6 Host-guest structures are smaller
in size and structurally less complex compared to ligand bound proteins. Due to their simplicity,
host-guest systems are utilized in Statistical Assessment of Modeling of Proteins and Ligands
(SAMPL) challenges where the physicochemical properties predicted computationally are
compared with experimental data to assess the reliability of different methods. During SAMPL
challenges, a number of parameter choices have been considered including charge and solvation
schemes and the reliability of Molecular Mechanics Poisson-Boltzmann Surface Area
(MMPBSA) method in the prediction of binding free energies has been assessed. Due to the
success of the methodology, it has been utilized in multiple studies here.
                                                  2


    The research included in the fifth and sixth chapters of this dissertation was performed in
collaboration with Professor Xuefei Huang and his research group from MSU. Chapter 5
describes computational investigation of the substrate binding preference between Arthrobacter
endo-β-N-acteylglucosaminidase (Endo-A) enzyme and rare N-glycans synthesized by the
Huang group. The computational results showed that inactive glycan hinders the gate amino
acids in Endo-A and prohibits active site formation which is consistent with the compound’s low
glycosylation yield.7
    In Chapter 6, the biological activity of human syndecan-4 glycopeptide bearing O-, N-
sulfation and multiple aspartic acids upon heparanase and Fibroblast Growth Factor-2 (FGF-2)
binding was studied computationally. Heparan sulfates (HS) are sulfated polysaccharides that
have a range of biological functions including blood clothing prevention, growth factor and
chemokine binding and controlling activity levels of various enzymes. In vivo, HS exists as a
heterogenous mixtures where the length of their backbone and location of sulfates varies.
Additionally, they can form proteoglycans where HS is covalently linked to a core protein or a
core peptide. Originally, the core peptides are considered as do not possess any biological
function. However, experiments performed by our collaborators and the modelling results shows
that free HS and HS proteoglycan can poses different biological function.8
    PFASs are man-made chemicals that are widely used in industrial products for food
wrappers, fire-fighting foams, carpets, furniture, boots, clothes, non-stick cookware to name only
a few. Regardless of extensive usage of PFAS for more than 50 years, recently environmental
and health concerns related with PFAS exposure is recognized. As the EPA has banned some of
the most common long-chain PFASs such as PFOA and PFOS alternatives such as ADONA,
GenX and PFBS are now commonly used. In chapter 7 and chapter 8 research performed on per-
                                                  3


and polyfluoroalkyl substances (PFASs) are included. PFAS exposure has been linked to a
number of serious health problems ranging from cancer to thyroid disease. In chapter 7, human
pregnane X receptor (hPXR), a known PFAS targets that is important for sensing toxic
substances within body, is studied for PFASs binding.9 In Chapter 8 the same range of PFASs
were studied along with recently discovered alternatives for peroxisome proliferator activated
receptor γ (PPARγ) binding, a type II nuclear receptor fundamental in the regulation of genes,
glucose metabolism, and insulin sensitization.10 The models explain PFASs recognition on hPXR
and PPARγ and potential effects of alternative PFASs on these targets.
    In the nineth chapter of this dissertation, Protein Kinase C (PKC), a family of
serine/threonine kinases involve in controlling various signaling pathways that regulate cell
proliferation, survival, apoptosis, migration, invasion, differentiation, angiogenesis, and drug
resistance is studied. PKC is known to be regulated by Ca2+ ions. By modelling PKC within
various ion and Ca2+ concentrations successive binding of Ca2+ ions are displayed, and
conformational changes related to this process is explained.
    And, finally, the last chapter of this dissertation ends with concluding remarks and possible
future directions stemming from the work described herein.
                                                  4


REFERENCES
     5


                                        REFERENCES
(1)  Zhao, G.; Perilla, J. R.; Yufenyuy, E. L.; Meng, X.; Chen, B.; Ning, J.; Ahn, J.;
     Gronenborn, A. M.; Schulten, K.; Aiken, C.; Zhang, P. Mature HIV-1 Capsid Structure by
     Cryo-Electron Microscopy and All-Atom Molecular Dynamics. Nature 2013, 497 (7451),
     643–646. https://doi.org/10.1038/nature12162.
(2)  Perilla, J. R.; Goh, B. C.; Cassidy, C. K.; Liu, B.; Bernardi, R. C.; Rudack, T.; Yu, H.;
     Wu, Z.; Schulten, K. Molecular Dynamics Simulations of Large Macromolecular
     Complexes.         Curr.      Opin.      Struct.     Biol.     2015,      31,     64–74.
     https://doi.org/10.1016/j.sbi.2015.03.007.
(3)  Mesecar, A. D.; Stoddard, B. L.; Koshland, D. E. Orbital Steering in the Catalytic Power
     of Enzymes: Small Structural Changes with Large Catalytic Consequences. Sci. (80-. ).
     1997, 277 (5323), 202–206. https://doi.org/10.1126/science.277.5323.202.
(4)  Klepeis, J. L.; Lindorff-Larsen, K.; Dror, R. O.; Shaw, D. E. Long-Timescale Molecular
     Dynamics Simulations of Protein Structure and Function. Curr. Opin. Struct. Biol. 2009,
     19 (2), 120–127. https://doi.org/10.1016/j.sbi.2009.03.004.
(5)  Eken, Y.; Patel, P.; Díaz, T.; Jones, M. R.; Wilson, A. K. SAMPL6 Host–Guest
     Challenge: Binding Free Energies via a Multistep Approach. J. Comput. Aided. Mol. Des.
     2018, 32 (10), 1097–1115. https://doi.org/10.1007/s10822-018-0159-1.
(6)  Eken, Y.; Almeida, N. M. S.; Wang, C.; Wilson, A. K. SAMPL7: Host–Guest Binding
     Prediction by Molecular Dynamics and Quantum Mechanics. J. Comput. Aided. Mol. Des.
     2021, 35 (1), 63–77. https://doi.org/10.1007/s10822-020-00357-3.
(7)  Yang, W.; Ramadan, S.; Orwenyo, J.; Kakeshpour, T.; Diaz, T.; Eken, Y.; Sanda, M.;
     Jackson, J. E.; Wilson, A. K.; Huang, X. Chemoenzymatic Synthesis of Glycopeptides
     Bearing Rare N-Glycan Sequences with or without Bisecting GlcNAc. Chem. Sci. 2018, 9
     (43), 8194–8206. https://doi.org/10.1039/c8sc02457j.
(8)  Yang, W.; Eken, Y.; Zhang, J.; Cole, L. E.; Ramadan, S.; Xu, Y.; Zhang, Z.; Liu, J.;
     Wilson, A. K.; Huang, X. Chemical Synthesis of Human Syndecan-4 Glycopeptide
     Bearing O-, N-Sulfation and Multiple Aspartic Acids for Probing Impacts of the Glycan
     Chain and the Core Peptide on Biological Functions. Chem. Sci. 2020, 11 (25), 6393–
     6404. https://doi.org/10.1039/d0sc01140a.
(9)  Lai, T. T.; Eken, Y.; Wilson, A. K. Binding of Per- and Polyfluoroalkyl Substances to the
     Human Pregnane X Receptor. Environ. Sci. Technol. 2020, 54 (24), 15986–15995.
     https://doi.org/10.1021/acs.est.0c04651.
(10) Nuno, A.; Eken, Y.; Wilson, A. K. Binding of Per- and Polyfluoro-Alkyl Substances
     (PFASs) to Peroxisome Proliferator-Activated Receptor Gamma (PPARγ). ACS Omega
     2021, 6 (23), 15103-15114. https://doi.org/10.1021/acsomega.1c01304.
                                               6


                                 CHAPTER TWO
Theory and Methods in Molecular Modeling
                                         7


2.1 Theoretical Background
    Computational chemistry has broad reach, from the description of detailed electronic
manifolds of the smallest of molecules, to the modeling of biological systems. There are a
number of branches of computational chemistry, and two primary areas that are used in the
computational study of biological systems are molecular dynamics (MD) and quantum
mechanics (QM). In QM, the electronic structure of the systems is solved using Schrödinger
equation and electron wavefunctions to insight events like bond breakage or forming. However,
as the system size increases, the computational cost of the calculations with respect to memory
and time becomes impractical. Due to the enormous size of biological macromolecules, QM can
only be applied to a limited number of conformers or treat only part of large systems.1 Therefore,
many studies of biological macromolecules use classical molecular dynamics (MD) to
investigate their systems, and is used in this work. With this focus, the current chapter addresses
the theory behind MD, statistics, force field parameters, bioinformatics and binding free energy
calculations.
2.2 Molecular Dynamics
    Molecular dynamics (MD) is commonly used to simulate macromolecular structures and
dynamics. Biological and chemical systems at the atomistic level on timescales ranging from
femtoseconds to milliseconds can be studied.2 In classical MD, Newtonian mechanics are used to
study the motions and interactions of atoms and molecules within the system.
                                                          ∂V(𝐫(𝑡))
                     𝐹𝑖 (𝑡) = 𝑚𝑖 a𝑖 (𝑡) = 𝑚𝑖 𝒓̈ 𝑖 (𝑡) = −                           (2.1)
                                                           ∂𝒓𝑖 (𝑡)
    Here, Fi (t) represents the total force on particle i at time t, 𝒓̈ 𝑖 (𝑡) as the second derivative of
the position represents the corresponding acceleration ai(t), mi the particle’s mass, ri(t) the
                                                     8


position vector of the particle i. In equation (2.1) V(r(t)) describes the potential energy of an
entire system of N-particles. The initial velocity of the atoms in the system is assigned randomly
according to the Boltzmann distribution function and the accelerations are calculated by the
forces acting on each atom. Versatile second order algorithms are developed to solve Newton’s
equations of motion such as Verlet,3 velocity Verlet4 and the "leapfrog"5 algorithms. Among
those the "leapfrog" algorithm is particularly suitable for solving Newton’s equations because of
its simplicity and stability. Additionally, the "leapfrog" algorithm preserves the time
reversibility.
     Figure 2. 1 Leapfrog algorithm steps. In this algorithm velocities are calculated on the
midpoints of ∆t, whereas positions are calculated explicitly at each ∆t.
                              ∆t            ∆t             ∆t
                      𝐫̇ (𝑡 + 2 ) = 𝐫̇ (𝑡 − 2 ) + 𝐫̈ (𝑡 − 2 ) ∆t                (2.2)
                                                     ∆t             ∆t
                      𝐫(𝑡 + ∆t) = 𝐫(𝑡) + 𝐫̇ (𝑡 + 2 ) ∆t + 𝐫̈ (𝑡 − 2 ) ∆t        (2.3)
     Equation (2.2) defines the velocity calculations in the "leapfrog" algorithm. In this equation,
        ∆t               ∆t
𝐫̇ (𝑡 − 2 ) and 𝐫̇ (𝑡 + 2 ) represents the velocities after and before time step t of the propagation,
        ∆t                                              ∆t
𝐫̈ (𝑡 − 2 ) represents the acceleration at time (𝑡 − 2 ) and ∆t is the selected time step. In the next
time step, velocities are calculated. Then, velocities and accelerations are used to solve equation
(2.3) and find the positions of each particle. In this equation, r(t) and r(t +∆t) stand for the
positions before and after time-step, respectively. In the "leapfrog" algorithm, velocities are not
calculated at the same time as the positions as shown in Figure 2.1. Instead, velocities are
                                                     9


calculated at the midpoints of ∆t using accelerations, determined by the force as shown in
equation (2.3) and those velocities are used to find the positions at t + ∆t. In other words,
velocities at each ∆t are not explicitly calculated in this method but velocities at each ∆t can be
                                                ∆t           ∆t
found by averaging the velocities at (𝑡 − 2 ) and (𝑡 + 2 ). Numerical integration of equations
(2.2) and (2.3) generates the simulation trajectories in which the position of each particle in the
system is evolving in time.
    In MD, the system is represented by the "Ball and Stick Model" where the nuclei are shown
as balls and the bonds between them are represented with springs. Forces are calculated using
classical dynamics, which is only applicable to nuclei. Additionally, MD calculations utilize
force fields (FF), where representative models of empirical potential energy function are used to
estimate interactions and the total energy. The FFs utilized in MD and solving only for nuclei
decreases the required computation time significantly when compared with the time that would
be required with QM. Even though electrons are not represented in MD, their effect is included
in the FF because FFs are parameterized from quantum mechanical calculations and
experimental data such as crystal structures, vibrational frequencies, and molecular geometries.
The typical mathematical expression of force field can be described as follows;6,7
                                                                           𝑉𝑛
     V(𝑟 𝑁 ) = ∑𝑏𝑜𝑛𝑑𝑠 𝑘𝑏 (𝑙 − 𝑙0 )2 + ∑𝑎𝑛𝑔𝑙𝑒𝑠 𝑘𝑎 (𝜃 − 𝜃0 )2 + ∑𝑡𝑜𝑟𝑠𝑖𝑜𝑛𝑠 ∑𝑛  2
                                                                              [1 + cos 𝑛φ − δ𝑛 ] +
                                      12             6
                               𝑟0,𝑖𝑗          𝑟0,𝑖𝑗               𝑞𝑖 𝑞𝑗
                  ∑𝑝𝑎𝑖𝑟𝑠
                   𝑖𝑗    𝜀𝑖𝑗 [(      )   −2 (       ) ] + ∑𝑝𝑎𝑖𝑟𝑠
                                                           𝑖𝑗                  (2.4)
                                𝑟𝑖𝑗            𝑟𝑖𝑗               4𝜋𝜀0 𝑟𝑖𝑗
    The first three terms in the equation (2.4) are known as the bonded terms. Their values
depend on the intramolecular interactions between the atoms such as distances, bending and
torsions. The final two terms represent the potential energy between non-bonded atoms; this
includes electrostatic and van der Waals (vdw) interactions. The most common way of treating
                                                     10


vdw interactions is by using a 12-6 Lennard-Jones potential and the electrostatic energy
described with a Coulomb potential.
    In molecular systems each atom is bonded to only a few other atoms; bonded terms can be
calculated entirely, but there are NN non-bonded interactions for the N-particle system. The non-
bonded terms are commonly treated with cutoff schemes to preserve the cost effectiveness,
which can be done in number of ways. One way is using a truncated non-bonded potentials
where the contribution is set to zero when the distance between two particle is higher than a
designated cutoff distance. Another way is via the switching form, where a second distance is set
to gradually alter the potential and smoothly decrease it to zero on the cut off distance. Yet, in
large systems these approximations may lead to poor results because they often lead to artificial
minima and potential energy of the particles can change suddenly. The particle mesh Ewald
(PME) method is an efficient alternative and is described in the following manner:
                  𝐸𝑡𝑜𝑡𝑎𝑙 (𝑡) = ∑𝑖𝑗 𝜑(𝐫𝑗 − 𝐫𝑖 )                                       (2.5)
                  𝜑(𝐫) ≝ 𝜑𝑠𝑟 (𝒓) + 𝜑𝑙𝑟 (𝒓)                                           (2.6)
                  𝐸𝑡𝑜𝑡𝑎𝑙 = 𝐸𝑠𝑟 + 𝐸𝑙𝑟 = ∑𝑖𝑗 𝜑𝑠𝑟 (𝐫𝑗 − 𝐫𝑖 ) + ∑𝑘 𝚽 ̃ 𝑙𝑟 (𝐤)|𝑝̃(𝐤)|2    (2.7)
    In the PME method the total energy is calculated by the sum of interactions between all atom
pairs as shown in equation (2.5). However, PME methods divide these interactions into long
range (Elr ) and short range (Esr ) interactions and treat them differently (2.6). The last equation of
PME (2.9) shows that short range interactions 𝜑𝑠𝑟 (𝒓) are treated in the direct space sum.
Whereas, long range interactions 𝜑𝑙𝑟 (𝒓) are Fourier transformed and included in the frequency
space sum that leads to the ∑𝑘 𝚽      ̃ 𝑙𝑟 (𝐤)|𝑝̃(𝐤)|2 term in equation (2.7).8 vdw interactions as
calculated with a 12-6 Lennard-Jones potential in the equation (2.4) vanish quickly as the
distance between pairs increases, as a result vdw interactions are commonly calculated till a
                                                    11


cutoff distance (mostly around 10 Å). However, electrostatic interactions are known as long
range interactions, and do not vanish quickly with respect to distance. The PME method is
commonly used for calculating long range interactions and it is particularly useful for lattice
structures with periodic boundary conditions (PBC). In PBC, the system is defined as a unit cell
and replicated infinitely many times in 3D space. In periodic calculations, an atom that crosses
one boundary enters again from the other side of the cell to preserve the unit cell charge and
atom numbers.
2.3 Binding Free Energy Calculations
    Free energy calculations are useful in multiple areas of computational biology such as drug
design, determination of ligand binding energies, and protein structure determination.5 There are
several methods available such as Free Energy Perturbation (FEP),9 Replica exchange Free
Energy Perturbation(REMD)10 and Thermodynamic Integration (TI).11 However, these methods
are computationally demanding with their cost swiftly increasing with respect to system size.
Another effective route via end-state free energy methods, which have reduced computation cost
compared to FEP, REMD and TI.12 Molecular mechanics combined with Poisson–Boltzmann or
generalized Born surface area solvation (MMPBSA/MMGBSA) approaches are arguably the
most popular end state free energy methods, and are frequently used to determine binding free
energies in non-covalently bound receptor-ligand complexes.12,13 MMPBSA/MMGBSA
approaches are also commonly used during our studies to predict the binding energies of protein-
ligand and host-guest systems.14–18 These binding free energies are calculated by subtracting the
free energies of the unbound receptor and ligand from the bound complex as shown below where
solvation free energies are approximated through implicit solvation models;
  ∆𝐺𝐵𝑖𝑛𝑑𝑖𝑛𝑔,𝑆𝑜𝑙𝑣𝑎𝑡𝑒𝑑 = ∆𝐺𝐶𝑜𝑚𝑝𝑙𝑒𝑥,𝑆𝑜𝑙𝑣𝑎𝑡𝑒𝑑 − [∆𝐺𝑅𝑒𝑐𝑒𝑝𝑡𝑜𝑟,𝑆𝑜𝑙𝑣𝑎𝑡𝑒𝑑 + ∆𝐺𝐿𝑖𝑔𝑎𝑛𝑑,𝑆𝑜𝑙𝑣𝑎𝑡𝑒𝑑 ]     (2.8)
                                                12


    More details about MMPBSA/MMGBSA approaches and on how various parameter choices
affect the prediction accuracy are included and discussed in Chapters 3 and 4 where
methodologies are evaluated during SAMPL (Statistical Assessment of Modeling Proteins and
Ligands) challenges.
2.4 Sequence Alignment
    Figure 2. 2 Alignment of two short protein sequences.
    Sequence alignment of proteins is among the most useful computational-based approaches
applied in protein studies. In Figure 2.2 two short protein sequences are aligned to one another.
The structure of a protein comes from the amino acid sequence and these alignments can be used
to extract important insights about genes, or the protein’s function. Additionally, these methods
can be used to compare and evaluate similarities between multiple proteins and detect the
domains which are less conserved within the family of proteins. Sequence alignment approaches
are designed based on probability and statistics. Conserved regions of the protein families with
similar structure and same function are used to calculate the frequency of changing amino acid a
to amino acid b.19 After, the frequencies are determined, these values are converted into the
scoring matrix where the scores for each pair are summed to find the total score. A high score
indicates there is a considerable similarity between the sequences that are compared, and that
                                                13


these proteins can be functionally or evolutionarily related. A low score indicates that the
sequences or proteins are different.
    Figure 2. 3 Blossom 62 matrix is a commonly used substitution matrix. In this matrix
arginine to arginine substitutions scores +5 and arginine to lysine substitution scores +2,
indicating substitution of these two positively charged amino acids frequently occurs within the
functionally related proteins. Whereas arginine to aspartic acid substitution scores -2, meaning
this substitution is not frequent among functionally related proteins.
              𝑆 = ∑(𝑖𝑑𝑒𝑛𝑡𝑖𝑡𝑖𝑒𝑠, 𝑚𝑖𝑠𝑚𝑎𝑡𝑐ℎ𝑒𝑠) − ∑(𝑔𝑎𝑝 𝑝𝑒𝑛𝑎𝑙𝑡𝑖𝑒𝑠)                    (2.9)
             𝑆𝑐𝑜𝑟𝑒 = 𝑀𝑎𝑥(𝑆)                                                      (2.10)
    The score of an alignment S is calculated as the sum of substitutions determined according to
the substitution matrix used (Figure 2.3) minus the sum of the gap penalty as shown in equation
(2.9). Gap scores typically have different values for opening and extension and the highest score
is considered as the result. Using these alignment scores, the similarity of the protein sequences
                                                14


can be explored. However, the scoring does not include information about the structural
arrangements in the protein.
                                         15


REFERENCES
    16


                                         REFERENCES
(1)  Hornak, V.; Abel, R.; Okur, A.; Strockbine, B.; Roitberg, A.; Simmerling, C.; Brook, S.;
     Brook, S.; Brook, S. Comparison of Multiple AMBER Force Fields and Development of
     Improved Protien Backbone Parameters. Proteins 2006, 65 (3), 712–725.
     https://doi.org/10.1002/prot.21123.Comparison.
(2)  Salomon-Ferrer, R.; Case, D. A.; Walker, R. C. An Overview of the Amber Biomolecular
     Simulation Package. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2013, 3 (2), 198–210.
     https://doi.org/10.1002/wcms.1121.
(3)  Verlet, L. Computer “Experiments” on Classical Fluids. I. Thermodynamical Properties of
     Lennard-Jones        Molecules.      Phys.      Rev.     1967,    159      (1),    98–103.
     https://doi.org/10.1103/PhysRev.159.98.
(4)  Swope, W. C.; Andersen, H. C.; Berens, P. H.; Wilson, K. R. A Computer Simulation
     Method for the Calculation of Equilibrium Constants for the Formation of Physical
     Clusters of Molecules: Application to Small Water Clusters. J. Chem. Phys. 1982, 76 (1),
     637–649. https://doi.org/10.1063/1.442716.
(5)  Ganesan, A.; Coote, M. L.; Barakat, K. Molecular Dynamics-Driven Drug Discovery:
     Leaping Forward with Confidence. Drug Discov. Today 2017, 22 (2), 249–269.
     https://doi.org/10.1016/j.drudis.2016.11.001.
(6)  Hansson, T.; Oostenbrink, C.; Van Gunsteren, W. F. Molecular Dynamics Simulations.
     Curr. Opin. Struct. Biol. 2002, 12 (2), 190–196. https://doi.org/10.1016/S0959-
     440X(02)00308-1.
(7)  Karplus, M.; McCammon, J. A. Molecular Dynamics Simulations of Biomolecules. Nat.
     Struct. Biol. 2002, 9 (9), 646–652. https://doi.org/10.1038/nsb0902-646.
(8)  Darden, T.; York, D.; Pedersen, L. Particle Mesh Ewald: An N·log(N) Method for Ewald
     Sums in Large Systems. J. Chem. Phys. 1993, 98 (12), 10089–10092.
     https://doi.org/10.1063/1.464397.
(9)  Zwanzig, R. W. High‐temperature Equation of State by a Perturbation Method. I.
     Nonpolar       Gases.      J.    Chem.       Phys.     1954,    22    (8),      1420–1426.
     https://doi.org/10.1063/1.1740409.
(10) Meng, F.; Liu, L.; Chin, P. C.; D’Mello, S. R. Akt Is a Downstream Target of NF-Kappa
     B. J. Biol. Chem. 2002, 277 (33), 29674–29680. https://doi.org/10.1074/jbc.M112464200.
(11) Kubo, R.; Ichimura, H.; Usui, T.; Hashizume, N. Statistical Mechanics; Harper & Row,
     Publishers, Inc.: New York, NY, 1965.
(12) Miller, B. R.; McGee, T. D.; Swails, J. M.; Homeyer, N.; Gohlke, H.; Roitberg, A. E.
     MMPBSA.Py: An Efficient Program for End-State Free Energy Calculations. J. Chem.
                                                17


     Theory Comput. 2012, 8 (9), 3314–3321. https://doi.org/10.1021/ct300418h.
(13) Homeyer, N.; Gohlke, H. Free Energy Calculations by the Molecular Mechanics
     Poisson−Boltzmann Surface Area Method. Mol. Inform. 2012, 31 (2), 114–122.
     https://doi.org/10.1002/minf.201100135.
(14) Yang, W.; Ramadan, S.; Orwenyo, J.; Kakeshpour, T.; Diaz, T.; Eken, Y.; Sanda, M.;
     Jackson, J. E.; Wilson, A. K.; Huang, X. Chemoenzymatic Synthesis of Glycopeptides
     Bearing Rare N-Glycan Sequences with or without Bisecting GlcNAc. Chem. Sci. 2018,
     8194–8206. https://doi.org/10.1039/c8sc02457j.
(15) Eken, Y.; Patel, P.; Díaz, T.; Jones, M. R.; Wilson, A. K. SAMPL6 Host-Guest Challenge:
     Binding Free Energies Via a Multistep Approach. J. Comput. Aided. Mol. Des. 2018, 32
     (10), 1097–1115.
(16) Yang, W.; Eken, Y.; Zhang, J.; Cole, L. E.; Ramadan, S.; Xu, Y.; Zhang, Z.; Liu, J.;
     Wilson, A. K.; Huang, X. Chemical Synthesis of Human Syndecan-4 Glycopeptide
     Bearing O-, N-Sulfation and Multiple Aspartic Acids for Probing Impacts of the Glycan
     Chain and the Core Peptide on Biological Functions. Chem. Sci. 2020, 11 (25), 6393–
     6404. https://doi.org/10.1039/d0sc01140a.
(17) Lai, T. T.; Eken, Y.; Wilson, A. K. Binding of Per- and Polyfluoroalkyl Substances to the
     Human Pregnane X Receptor. Environ. Sci. Technol. 2020, 54 (24), 15986–15995.
     https://doi.org/10.1021/acs.est.0c04651.
(18) Eken, Y.; Almeida, N. M. S.; Wang, C.; Wilson, A. K. SAMPL7: Host–Guest Binding
     Prediction by Molecular Dynamics and Quantum Mechanics. J. Comput. Aided. Mol. Des.
     2021, 35 (1), 63–77. https://doi.org/10.1007/s10822-020-00357-3.
(19) Henikoff, S.; Henikoff, J. G. Amino Acid Substitution Matrices from Protein Blocks.
     Proc.        Natl.      Acad.       Sci.      1992,      89     (22),     10915–10919.
     https://doi.org/10.1073/pnas.89.22.10915.
                                               18


                               CHAPTER THREE
SAMPL6 Host–Guest Challenge: Binding Free Energies via a Multistep Approach
                                       19


About this chapter: This chapter is reprinted from Eken, Y.; Patel, P.; Díaz, T.; Jones, M. R.;
Wilson, A. K. SAMPL6 Host – Guest Challenge: Binding Free Energies via a Multistep
Approach. J. Comput. Aided. Mol. Des. 2018, 32 (10), 1097–1115. with permission of the
Springer Nature. The docking, molecular dynamics simulations and MMPBSA calculations
mentioned in this chapter are performed by Yiğitcan Eken, clustering analysis is done by
Thomas Diaz and quantum mechanical calculations are done by co-authors Prajay Patel, Michael
Jones and Thomas Diaz.
3.1 Introduction
    Tremendous advances in technological capabilities have enabled computational approaches
to be applied to discern a broad range of physical, chemical, and biological phenomena across
scales in molecular science.1–6 With emphasis on molecular design, computational approaches
have found great utility towards innovation in drug discovery. Considering the time and cost of
the drug pipeline, from the discovery process to market, in silico biophysical methods serve an
important role in expediting and reducing the cost of the discovery process, facilitating the
identification, optimization, and refinement of potential drug candidates and providing
comprehensive insight into the mechanism of action and structure–property relationships at the
atomic level that are ultimately critical to a drug’s efficacy.7–12
    In computational strategies towards structure-based design, an important step is the
prediction of probable conformations of a ligand bound to the host. To identify better possible
candidate binding modes, they can be ranked via scoring functions and further evaluated via
molecular simulation and free energy calculations. From free energy calculations, selectivity
profiles may be constructed not only to determine binding affinities but also to provide
understanding into how the ligand recognizes its host.
                                                  20


    Because of the complexity that occurs in ligand-bound protein systems, relatively smaller
representative models such as polymer-based host–guest systems are used to assess free energy
methods.13–18 Although host structures selected to represent proteins are typically much smaller
than proteins, they are large enough to possess a cavity or binding pocket that allows non-
covalent binding of multiple guest molecules. The advantage of using host–guest systems for
assessing free energy methods is that they tend to be more rigid and symmetric than proteins,
which results in fewer conformations that need to be sampled.19–23 Even in the representation of
proteins by more simplistic models, modeling binding free energies for these smaller models is
challenging since no clear “best” computational chemistry approach has been identified; efforts
are needed to better resolve strategies towards predictions of binding free energies. Statistical
Assessment of the Modeling of Proteins and Ligands (SAMPL) blind challenges provide a
unique platform to validate available methods and stimulate the development of new methods for
quantitative predictions.13,16,18,24–26 In these challenges, binding affinities        and other
physicochemical properties are predicted, using computational models without the benefit of
insight from experiment; they are then later compared to unpublished experimental
measurements that allow the comparison of different computational prediction methods.
    For the prediction of free energies, there are many methods available such as free energy
perturbation27, replica exchange free energy perturbation28, and thermodynamic integration29.
However, these methods are computationally demanding (memory, disk space, CPU time) as
they converge poorly for large systems and require dividing the model into multiple intermediate
steps or multiple configurations for accurate predictions.30–32 In contrast, end state free energy
methods are path-independent and do not require sampling of multiple configurations. These
methods offer a simpler, computationally less costly approach to predict the binding free
                                               21


energy.33–38 Moreover, implicit solvation models can be used to further reduce the computational
cost while predicting binding free energies similar to these more sophisticated and
computationally demanding methods.
    While classical molecular dynamics (MD) methods are commonly used to investigate host–
guest interactions, molecular mechanics (MM) force fields result in a limited treatment of effects
resulting from polarization, charge transfer, and many body effects which can impact the
description of properties such as binding free energies.9,39–43 To better account for these effects,
quantum mechanical (QM) approaches, which are more costly, are commonly used in drug
discovery research9,44, and have been used in previous SAMPL competitions.17,45–47 For example,
in the SAMPL5 competition for host–guest binding, Caldarau et al.45 used density functional
theory (DFT) with an added dispersion correction (DFT-D3) and the wavefunction-based
domain-based local pair-natural orbital coupled cluster (DLPNO-CCSD(T)) method to predict
the binding energies for octa-acid (OA) host–guest systems. In this approach, they used TPSS-
D3/def2-SVP optimized structures and host structures are constrained during MD simulations to
reduce the flexibility of the host and limit the structural distortions resulting from the repulsion
between the negative charge of the ligands and the large negative charge of the OA hosts. This
approach yielded binding energies approximately 12.0 kcal mol-1 greater than the experimental
binding affinities, with a low correlation coefficient (r2≈0), and a statistically insignificant
Kendall’s rank correlation coefficient (τ ≤0.20) for all attempts for the host–guest systems in the
SAMPL5 blind challenge due to incorrect representative structures, not sampling enough
conformational binding positions for ligands, and thermochemical corrections that yielded up to
a 7.2 kcal mol-1 difference depending on the method of choice. This performance demonstrates
                                                 22


the limited sampling capabilities of current QM methods compared to MD methods, obtained
representative structures, as well as thermodynamic and solvation corrections.
    Contrary to this, in the SAMPL4 competition for host–guest binding, Mikulskis et al.47 were
successful with both MM- and QM-based approaches for OA hosts with mean absolute
deviations (MADs) less than 2.0 kcal mol-1. Their MM approach, which utilized free energy
perturbation (FEP) calculations, yielded MADs of approximately 1.0 kcal mol-1 while their QM
approaches with DFTD3 optimized structures yielded MADs of approximately 1.0–2.0 kcal mol-
1
  depending on the implementation of a solvent in the calculations, i.e. no solvent, implicit
solvent, or a combined implicit-explicit solvent. However, the combination of FEP and DFT-D3
did not yield favorable results due to the large difference between the MM and DFT potential
energy functions. Sure et al.46 provided another successful attempt at using DFT-D3 for the
SAMPL4 competition for host–guest binding of a macrocyclic cucurbit[7]uril host by optimizing
the geometry at the TPSS-D3/def2-TZVP level of theory after pre-optimizing possible binding
scenarios with the HF-3c semiempirical method. These optimizations were followed by single
point calculations using PW6B95-D3/def2-QZVP with the g- and f-functions for non-hydrogen
and hydrogen atoms removed, respectively, with the COSMO-RS implicit solvent model, which
yielded an MAD of 2.0±0.5 kcal mol-1. These two studies highlight that for the SAMPL4
competition, host–guest structure optimization and higher-level MM-based approaches like FEP
can be vital in characterizing correct binding interactions at the QM level.
    In this work, efforts in MD and QM methods are combined to predict binding affinities for
fourteen ligands to a macrocyclic cucurbit[8]uril host19,21,22,48 and eight ligands to two variants of
the OA deep-cavity cavitands.20,23 Using MD simulations to obtain representative structures,
MM- and QM-based methods are utilized to predict binding free energies. Within the QM
                                                 23


methods, the use of a resolution-of-the-identity (RI) approximation designed for larger
molecules49, Grimme’s D3 atom-pairwise dispersion corrections with Becke-Johnson damping50,
and truncated correlation consistent basis sets for the hydrogen atoms51 are evaluated to probe
how different electronic structure approaches that reduce the computational cost contribute to
predicting binding affinities. Insights into what strategies are more favorable for host guest-
binding will help to build a framework for predicting host–guest binding affinities using QM
approaches.
                                                24


3.2 Methods
    Figure 3. 1 The guest molecules for the cucurbit[8]uril (CB8).
                                               25


    Figure 3. 2 The guest molecules for the octa-acid (OA) and tetra methyl octa-acid (TEMOA)
hosts.
    Figure 3. 3 The host molecules: cucurbit[8]uril (CB8), octa-acid (OA), and tetramethyl octa-
acid (TEMOA).
                                               26


3.2.1 System Preparation
    The initial structures for the guest molecules, shown in Figures. 3.1 and 3.2, and the three
host molecules, shown in Figure 3.3, cucurbit[8]uril (CB8), octa-acid (OA), and tetramethyl
octaacid (TEMOA), that were issued with the SAMPL6 challenge dataset were used to generate
the host–guest systems. The CB8 molecule has no formal charge whereas the octaacids
(OA/TEMOA) have eight deprotonated carboxylic acid groups and thus a formal charge of −8.
Even though OA and TEMOA are water-soluble structurally similar deep-cavity cavitands, the
TEMOA host has four methyl groups in place of four hydrogen atoms present in the OA host
located on the upper rim of the cavitand that enclose the hydrophobic binding pocket.
    Initial binding poses of guest molecules binding to the host were generated using the docking
feature implemented in Molecular Operating Environment (MOE) v2016.0852. The London ΔG
scoring function53 was used to estimate ligand placement in the pocket. The top 100 poses given
by the London ΔG scoring function for each host–guest complex were refined to a list of ten
poses by rescoring the flexible receptor and ligand conformation using the GBVI/WSA ΔG
scoring algorithm.52,53 Among these ten poses, those with minor structural differences were
discarded from the list of ten poses and the chemically relevant poses with the highest
GBVI/WSA ΔG scores were selected for further investigation. The chosen host–guest poses
were minimized under the AMBER10: Extended Hückel Theory (EHT) potential implemented in
MOE, which employs Amber ff10 and EHT bonded parameters.54–56
    To generate force field parameters, the AM1-BCC scheme57 was used for generating partial
charges for the guest and host molecules using the Antechamber suite. As the guest molecule
CB8-G13 contains a platinum atom, the Mulliken charges were calculated from a geometry
optimization using B3LYP58,59 in conjunction with the 6-31G(d) basis set60 and the effective core
                                                27


potential basis set Lan2L2DZ61 for the platinum atom. All electronic structure calculations were
performed in Gaussian 1662 The host–guest systems were further prepared for simulation using
the Leap module of AmberTools63 under the General Amber Force Field (GAFF).56 Each system
was neutralized with counterions using parameters from Joung and Cheatham64, and solvated in a
14.0 Å cube of TIP4P-Ew water65 beyond the solute. To mimic the ionic strength of the
experimental buffers, additional counter ions were added to create a buffer of 150 mM sodium
chloride for the CB8 complexes and 60 mM sodium chloride for the OA/TEMOA complexes.
                                              28


    Figure 3. 4 The structures of the CB8 guest molecules inside the binding pocket. These
structures are generated from the clustering analysis.
                                                29


    Figure 3. 5 The structures of the OA guest molecules inside the binding pocket. These
structures are generated from the clustering analysis.
                                                30


    Figure 3. 6 The structures of the TEMOA guest molecules inside the binding pocket. These
structures are generated from the clustering analysis.
3.2.2 Simulation Protocol
    The host–guest systems were relaxed using NVT ensembles over six minimization
procedures with decreasing restraints on the host of 500.0, 200.0, 20.0, 10.0, 5.0 kcal/ mol (Å2),
and then were heated to 300 K over 30 ps. The temperature was maintained at 300 K using
Langevin dynamics and the pressure was coupled to 1 atm using isotropic position scaling.
Atomistic molecular dynamics simulations were performed for 10 ns in triplicate to account for
                                                31


randomized parameters that affect the MD trajectories. Nonbonded interactions were truncated
with a 10.0 Å cutoff, whereas long-range electrostatics were handled with the particle-mesh
Ewald (PME) method. Bonds involving hydrogen were constrained using SHAKE, and the
simulation time step was set to 2 fs. All simulations were performed with AMBER16.7.63
    The binding free energies were calculated with MMPBSA approach using the built-in PBSA-
solver.66 The internal and external dielectric constants were set to 1.0 and 80.0, respectively. The
solvent accessible surface area (SASA) was determined with the default LCPO method using the
modified Bondi atomic radii. Calculations for solute entropic contributions were not considered.
For each system, the binding free energy was determined using the final 100 frames from the
simulation.
    Clusters were formed using the density-based spatial clustering of applications with noise
(DBSCAN) algorithm based on two parameters, which are epsilon (Eps) and the minimum
number of points in an Eps-neighborhood (MinPts). MinPts was set to 4 and the Eps value for
DBSCAN was determined from the threshold point of a sorted 4-dist graph.67 The cluster
conformation representing the greatest number of frames from the MD simulations was used for
further analyses. Additional QSAR (quantitative structure–activity relationship) calculations
were performed on each guest molecule to determine the van der Waals volume each molecule
occupies by using the connection table approximation descriptor in MOE (Tables 3.9 and 3.10).
3.2.3 Quantum Mechanical Methods
    The individual structures generated from the clustering of MD trajectories, shown in Figures.
3.4, 3.5 and 3.6, for each host–guest complex were used for all quantum chemical calculations.
The host and guest molecules were analyzed with the same geometry as from the complex. The
thermal corrections for all molecules were calculated at the HF/6- 31G(d) level of theory in
                                                 32


Gaussian 16 and the vibrational contributions were scaled by 0.8953.68 Single point energies
were obtained using ORCA 4.069 with the B3PW91 density functional58,70,71 since B3PW91 has
been shown to properly treat long-range covalent interactions. In the treatment of the exact
exchange in the functional, the RIJCOSX approximation49 was used with the def2 auxiliary basis
set72 to reduce the computational cost associated with the number of atoms in the host–guest
complex since the RIJCOSX approximation has been shown to be five times as efficient for
molecules of similar size to the host–guest systems. To mimic the aqueous solution, the SMD
implicit solvation model73 was used with water (ε = 78.4) as the implicit solvent. Grimme’s D3
dispersion correction with Becke-Johnson damping was used to investigate long-range covalent
interactions as the inclusion of D3 dispersion improves intermolecular interaction energies
predicted with DFT.46,50,74,75
    The correlation consistent basis set family (cc-pVnZ)76 was used for all single point
calculations since these basis sets were developed to exhibit convergence behavior to the
complete basis set (CBS) limit for wavefunction-based methods through extrapolation.77–80
Knowing the CBS limit, which removes basis set incompleteness error, the error for the property
of interest, i.e. binding free energy, only corresponds to the intrinsic error of the chosen QM
method. Therefore, to extrapolate to the Kohn–Sham limit for DFT methods, analogous to the
CBS limit for wavefunction-based methods, the cc-pVnZ basis sets were used (n=D, T) with the
following extrapolation scheme proposed by Jensen
                      𝐸(𝑙𝑚𝑎𝑥 ) = 𝐸𝐶𝐵𝑆 + 𝐴(𝑙𝑚𝑎𝑥 + 1)𝑒 −𝐵√𝜋𝑠                   (3.1)
    where lmax is the maximum angular momentum function in the basis set and ns is the number
of s functions in the basis set.81 The B-parameter was set to 5.5 in agreement with Jensen for use
as a two-point extrapolation scheme. Due to the abundance of weak molecular interactions in
                                                33


biomolecules, the calculated binding energies were counterpoise corrected before the
extrapolations were performed on each host, guest, and host–guest complex.82,83
     Additional electronic structure modeling techniques were applied to the CB8 host–guest
systems to examine the impact of various approximations on the binding free energy. Targeting
reduction in computational time, the correlation consistent basis sets were truncated via the
removal of higher angular momentum basis functions for hydrogen atoms. This has been shown
to reduce the computational time by approximately 42.9% and 57.8% when removing 1 d
function from the cc-pVTZ basis set, denoted as cc-pVTZ(−1d), and 2 d functions and 1 f
function from the cc-pVQZ basis set, denoted as cc-pVQZ(−1f2d), respectively, and yielded the
results closest to the atomization energies generated with the full basis sets at the complete basis
set limit.51
     Binding free energies calculated with and without the use of the resolution-of-the-identity
(RI) approximation were examined to gauge how the RI approximation, which leads to a
reduction in CPU time, affects the accuracy. To characterize the ionic strength of the solution
used in experiment, the dielectric constant for the implicit water solvent was also altered from
78.4 for pure water to 76.4 given the concentration of the sodium chloride solution used in the
MD simulations and the experimentally determined relation between the concentration of an
ionic solution and the dielectric constant.84
3.3 Results
     The binding free energies submitted as part of the SAMPL6 competition are shown in Tables
3.1, 3.2 and 3.3 for CB8, OA, and TEMOA host–guest systems, respectively. For each host–
guest complex, statistical measurements were used to gauge the effectiveness of each of the three
methods, which are MMPBSA, RI-B3PW91-D3, and RI-B3PW91, in predicting experimental
                                                34


binding free energies. These include the mean absolute error (MAE), the root mean square error
(RMSE), Kendall’s Tau (τ) rank correlation coefficient, which measures how well a method
ranked calculated binding free energies relative to experimental binding free energies where τ
values closer to one correspond to increased qualitative accuracy of the prediction, and the
correlation coefficient (r2). To demonstrate there is no correlation in ranking between the
calculated binding free energies and the experimental binding free energies, τ values are
compared against τcrit, a cutoff value obtained through a table of critical values generated by
Monte Carlo simulations of a τ distribution, which is similar to the normal Z distribution, used to
reject the null hypothesis.85,86
                                               35


3.3.1 Cucurbit[8]uril (CB8)
    Table 3. 1 The binding free energies in kcal mol-1 for the CB8 host–guest systems.
     Complex              Exp           MMPBSA               RIB3PW91-D3           RI-B3PW91
     CB8-G0           −6.69±0.05         −29.4±0.3              −49.89                  6.75
     CB8-G1           −7.65±0.04         −31.5±0.3              −57.22                  12.7
     CB8-G2           −7.66±0.05         −25.6±0.3              −36.86                 10.34
     CB8-G3           −6.45±0.06         −34.2±0.5              −44.53                 26.61
     CB8-G4           −7.80±0.04         −30.8±0.3              −68.09                −11.11
     CB8-G5           −8.18±0.05         −18.6±0.3              −35.92                  2.39
     CB8-G6           −8.34±0.05         −19.8±0.2              −31.95                  1.26
     CB8-G7          −10.00±0.10         −17.6±0.4              −14.90                 18.09
     CB8-G8          −13.50±0.04         −30.4±0.2              −50.34                  4.49
     CB8-G9           −8.68±0.08         −19.9±0.5              −37.07                 −2.46
     CB8-G10          −8.22±0.07         −19.6±0.3              −39.30                  0.61
     CB8-G11          −7.77±0.05         −17.5±0.4              −25.75                 −1.07
     CB8-G12          −9.86±0.03         −31.5±0.4              −62.05                   15
     CB8-G13          −7.11±0.03         −25.4±0.3              −44.04                  0.17
                                                  a
       MAE                               16.7±0.3                34.29                 14.88
      RMSE                               17.8±0.8b               36.99                 17.26
         τ                                 −0.19                 −0.14                  0.05
         2
        r                                   0.00                  0.00                  0.00
The mean absolute error (MAE), root mean square error (RMSE), Kendall’s Tau (τ), and r2 are
shown. These results correspond to those submitted for the competition.
a
  The uncertainty reported for MAE is the average of the absolute uncertainties.
b
  The uncertainty reported for RMSE is the uncertainty of the RMSE with the experimental and
calculated uncertainties.
    The binding free energy predictions for the CB8 host with the three methods submitted were
compared to experiment (Table 3.1). The predicted values were significantly more negative than
experimental binding free energies with an MAE of 16.69, 33.58, and 15.54 kcal mol-1 for
MMPBSA, RI-B3PW91-D3, and RI-B3PW91, respectively.
    When the binding affinities of the guests to CB8 are ranked from the lowest to the highest
binding affinity, MMPBSA did not correctly rank any of the systems but predicted CB8-G12 to
have a stronger binding affinity relative to the other complexes, which correlates to experiment
                                                 36


well. RI-B3PW91-D3 correctly ranked CB8-G2 as the tenth strongest bound host–guest complex
and predicted that CB8-G12 was more tightly bound relative to the other CB8 host–guest
systems. RI-B3PW91 correctly ranked CB8-G6, CB8-G2, CB8-G1, and CB8-G3 as fifth, tenth,
eleventh, and fourteenth, respectively, while the remaining systems were ranked incorrectly.
Unlike both MMPBSA and RI-B3PW91- D3, RI-B3PW91 predicted CB8-G12 to have a lower
binding affinity relative to the other CB8 host–guest systems.
3.3.2 Octa acid (OA)
    Table 3. 2 The binding free energies in kcal mol-1 for the OA host–guest systems.
     Complex             Exp            MMPBSA          RI-B3PW91-D3             RI-B3PW91
     OA-G0           −5.68±0.03         −12.6±0.2           −41.36                  −16.57
     OA-G1           −4.65±0.02         −11.6±0.1           −40.67                  −17.15
     OA-G2           −8.38±0.02         −18.2±0.2            6.54                    44.53
     OA-G3           −5.18±0.02         −10.0±0.2           −47.94                  −17.62
     OA-G4           −7.11±0.02         −17.0±0.2           −48.19                  −13.49
     OA-G5           −4.59±0.02         −9.1±0.2            −38.40                  −16.42
     OA-G6           −4.97±0.02         −11.3±0.2           −43.19                  −23.31
     OA-G7           −6.22±0.02         −11.4±0.1           −47.37                  −23.78
       MAE                               6.8±0.2a       35.46[38.39]             17.86[12.85]
                                                 b
      RMSE                               7.1±0.4        36.41[38.52]             22.51[13.39]
        τ                                  0.64          0.29[0.71]              −0.21[0.05]
         2
        r                                  0.84          0.44[0.52]                0.6[0.03]
The mean absolute error (MAE), root mean square error (RMSE), Kendall’s Tau (τ), and r2 are
shown. Bracketed values indicate the values after the removal of the statistical outlier (OA-G2).
These results correspond to those submitted for the competition.
a
  The uncertainty reported for MAE is the average of the absolute uncertainties.
b
  The uncertainty reported for RMSE is the uncertainty of the RMSE with the experimental and
calculated uncertainties.
    The three sets of submitted binding free energy predictions for OA are reported in Table 3.2.
All values predicted using MMPBSA were significantly more negative than experimental
measurements with an MAE of 6.8±0.2 kcal mol-1. When ranking the binding affinities of the
                                                   37


guest to the host from lowest to highest binding affinity, MMPBSA correctly placed OA-G2,
OA-G4, OA-G6, OA-G5 as first, second, sixth, and eighth, respectively. The other systems were
not ranked correctly; OA-G0, OA-G1, OA-G7 and OA-G3 ranked third, fourth, fifth, and
seventh, respectively, whereas experimentally ranked fourth, seventh, third, and fifth,
respectively.
    For RI-B3PW91-D3 and RI-B3PW91, the binding free energy predicted for OA-G2 was
determined as a statistical outlier with 99% confidence, visualized in Figure 3.8, using Dixon’s
Q-Test.87 When the statistical outlier (OA-G2) was excluded from the RI-B3PW91-D3 set, the
MAE, RMSE, Kendall’s Tau (τ), and the correlation coefficient (r2) increased from 35.46 to
38.39 kcal mol-1, 36.41 to 38.52 kcal mol-1, 0.29 to 0.71, and 0.44 to 0.52, respectively. When
the binding free energy for OA-G2 was excluded from the set of binding free energies obtained
with RI-B3PW91, the MAE, RMSE, and r2 decreased from 17.87 to 12.85 kcal mol-1, 22.51 to
13.39 kcal mol-1, and 0.60 to 0.03, respectively, as shown in Table 3.2. In Figure 3.7b, the
statistical outlier was removed, which improved and worsened the linear regression model
comparing experiment to RI-B3PW91-D3 and RI-B3PW91, respectively. With the exclusion of
OA-G2, ranking the binding affinities from lowest to highest, RI-B3PW91-D3 correctly ranked
OA-G4, OA-G1, and OA-G5, as first, sixth, and seventh, respectively, while RI-B3PW91 did not
correctly ranked any of the systems.
                                               38


3.3.3 Tetramethyl octa acid (TEMOA)
     TEMOA is structurally different from OA because of the substitution of four hydrogens
around the portal to the binding pocket of OA with four methyl groups. While the same guests
bound to TEMOA and OA with similar binding energies, G7 weakly binds to TEMOA relative
to the other guests whereas it binds stronger to OA experimentally.
     Table 3. 3 The binding free energies in kcal mol-1 for the TEMOA host– guest systems.
     Complex              Exp                 MMPBSA            RI-B3PW91-D3       RI-B3PW91
     TEMOA-G0             −6.06±0.02          −12.0±0.2         −43.75             −12.80
     TEMOA-G1             −5.97±0.04          −11.3±0.2         −41.98             −10.18
     TEMOA-G2             −6.81±0.02          −19.3±0.2         −51.23             −7.22
     TEMOA-G3             −5.60±0.04          −8.3±0.2          −43.56             −15.29
     TEMOA-G4             −7.79±0.02          −19.2±0.3         −51.98             −12.39
     TEMOA-G5             −4.16±0.02          −6.1±0.2          −37.04             −10.66
     TEMOA-G6             −5.40±0.03          −10.4±0.2         −41.05             −16.94
     TEMOA-G7             −4.13±0.02          −6.8±0.3          −45.98             −10.29
                                                      a
     MAE                                      5.9±0.2           38.83              6.23
     RMSE                                     7.0±0.5b          39.03              7.00
     τ                                        0.79              0.57               −0.14
      2                                       0.86              0.55               0.00
     r
 The mean absolute error (MAE), root mean square error (RMSE), Kendall’s Tau (τ), and r 2 are
 shown. These results correspond to those submitted to the competition.
 a
   The uncertainty reported for MAE is the average of the absolute uncertainties.
 b
   The uncertainty reported for RMSE is the uncertainty of the RMSE with the experimental and
 calculated uncertainties.
     Binding free energy predictions using the submitted methods for the TEMOA host are
reported in Table 3. Similar to OA, all three methods overestimated the binding free energies
relative to experiment. RI-B3PW91-D3 overestimated the binding free energies with an MAE of
38.83 kcal mol-1. Of the three methods considered, the MMPBSA method yielded better binding
free energies, both quantitatively (MAE of 5.9±0.2 kcal mol-1) and qualitatively (τ = 0.79), than
                                                 39


the QM-based calculations. MMPBSA ranked TEMOA-G0 and TEMOA-G1 as the third and
fourth strongest bound complexes, respectively. Additionally, MMPBSA predicted that
TEMOA-G4 and TEMOA-G2 were the most tightly bound complexes while TEMOA-G7 and
TEMOA-G5 were the most loosely bound complexes. RI-B3PW91-D3 correctly predicted that
TEMOA-G4, TEMOA-G2, and TEMOA-G3 were the first, second, and fifth most tightly bound
complexes, respectively. Like MMPBSA, RI-B3PW91-D3 predicted that TEMOA-G5 was a
weakly bound host–guest complex relative to the other TEMOA host–guest systems. RI-
B3PW91 correctly predicted TEMOA-G0 as the third strongest bound host–guest complex and
yielded the lowest deviation from experiment (0.41 kcal mol-1) for TEMOA-G2.
3.3.4 Quantum Mechanical Calculations
    The CB8 host–guest systems were used to probe approaches for improving the binding free
energy prediction. Specifically, the effects of (1) utilizing truncated correlation consistent basis
sets as opposed to standard correlation consistent basis sets; (2) utilizing traditional DFT
calculations (neglecting the RI approximation); and (3) modifying the dielectric constant used in
the continuum solvation model to reflect the ionic strength of the solution used in experiment
were examined.
    As shown in Tables 3.1, 3.2 and 3.3, for CB8, OA without the statistical outlier (OA-G2),
and TEMOA, the MAE, and RMSE increased by approximately 19.4, 25.5, and 32.6 kcal/ mol
when using Grimme’s D3 dispersion with RI-B3PW91, respectively, away from experiment.
However, when using Grimme’s D3 dispersion, the τ value decreases from 0.05 to −0.14 for
CB8 but increases from −0.05 to 0.71 when the statistical outlier is removed for OA and
increases from −0.14 to 0.57 for TEMOA. This shows the importance of using a dispersion
correction for qualitative ranking of binding affinities.
                                                 40


   Table 3. 4 The binding free energies for CB8 complexes in kcal mol-1 with various schemes involving not using the RI
approximation, changing the dielectric constant of the implicit solvent with the truncated correlation consistent basis sets for
hydrogen.
                                      B3PW91-D3                       RI-B3PW91-D3                      RI-B3PW91-D3
                                     (SMD, ε=78.4)                    (SMD, ε=78.4)                     (SMD, ε=76.4)
  Complex          Exp
                                TZ                    QZ         TZ                    QZ         TZ                   QZ
                                          TZ                               TZ                              TZ
                               (−1d)               (−1f2d)      (−1d)               (−1f2d)      (−1d)               (−1f2d)
  CB8-G0        −6.69±0.05    −49.85 −49.91         −49.27     −49.84 −49.89         −49.25     −49.84    −49.82     −36.26
  CB8-G1        −7.65±0.04    −54.54 −57.22         −56.61     −57.21 −57.22         −56.61     −57.21    −57.24     −56.62
  CB8-G2        −7.66±0.05    −37.32 −36.86         −36.39     −36.82 −36.86         −36.39     −36.82    −36.87     −36.40
  CB8-G3        −6.45±0.06    −45.01 −44.54         −44.38     −44.51 −44.53         −44.38     −44.51    −44.55     −44.40
  CB8-G4        −7.80±0.04    −69.19 −68.10         −67.50     −68.07 −68.09         −67.49     −68.07    −68.12     −67.52
  CB8-G5        −8.18±0.05    −36.17 −16.10         −35.53     −35.89 −35.92         −35.52     −35.89    −35.95     −35.54
  CB8-G6        −8.34±0.05    −31.95 −31.96         −31.63     −31.93 −31.95         −31.62     −31.95    −31.97     −31.64
  CB8-G7       −10.00±0.10 −14.92 −14.95            −12.89     −14.88 −14.90         −12.89     −14.91    −14.92     −12.91
  CB8-G8       −13.50±0.04 −50.61 −27.26            −49.89     −50.30 −50.34         −49.90     −50.30    −50.36     −49.92
  CB8-G9        −8.68±0.08    −37.31 −19.22         −36.73     −37.05 −37.07         −36.71     −37.05    −37.09     −36.74
  CB8-G10       −8.22±0.07    −42.27 −15.29         −38.92     −39.28 −39.30         −38.90     −39.28    −39.32     −38.91
  CB8-G11       −7.77±0.05    −28.63 −10.21         −25.37     −25.74 −25.75         −25.36     −25.74    −25.80     −25.41
  CB8-G12       −9.86±0.03    −62.53 −62.08         −61.43     −61.99 −62.05         −61.40     −61.99    −62.07     −61.41
  CB8-G13       −7.11±0.03    −52.30 −51.72         −50.03     −51.73 −44.04         −50.00     −51.74    −51.75     −50.04
    MAE                        35.33     27.68       34.19      34.81     34.29       34.18      34.81    34.85       33.27
   RMSE                        37.96     33.79       37.03      37.56     36.99       37.02      37.56    37.60       36.13
      τ                        −0.14     −0.21      −0.12       −0.12     −0.14       −0.12      −0.12    −0.12       −0.08
       2
      r                         0.00      0.07        0.00       0.00      0.00        0.00       0.00     0.00        0.00
The mean absolute error (MAE), root mean square error (RMSE), Kendall’s Tau (τ), and r are shown.
                                                                                      2
                                                              41


    The binding free energies as a result of utilizing truncated basis sets individually and
extrapolated to the Kohn–Sham limit with a two-point extrapolation using cc-pVDZ and cc-
pVTZ (cc-pV∞Z[D,T]) and a three-point extrapolation using cc-pVDZ and truncated triple and
quadruple correlation consistent basis sets, cc-pVTZ(− 1d) and ccpVQZ(−1f2d), denoted as
cc(0,−1,−2), are reported in Tables 4 and 5, respectively.
                                                42


   Table 3. 5 The binding free energies for the CB8 complexes in kcal mol-1 with various schemes involving not using the RI
approximation, changing the dielectric constant of the implicit solvent, and two options for basis set choice when extrapolating to the
Kohn–Sham limit.
                                         B3PW91-D3                          RI-B3PW91-D3                        RI-B3PW91-D3
                                        (SMD, ε=78.4)                       (SMD, ε=78.4)                       (SMD, ε=76.4)
 Complex           Exp
                                  cc-pV∞Z                cc             cc-pV∞Z             Cc             cc-pV∞Z             cc
                                   [D, T]            (0,−1,−2)            [D, T]        (0,−1,−2)            [D, T]        (0,−1,−2)
  CB8-G0       −6.69±0.05          −49.91             −47.62             −49.89          −47.58             −49.82          −16.15
  CB8-G1       −7.65±0.04          −57.22             −60.08             −57.22          −55.85             −57.24          −55.88
  CB8-G2       −7.66±0.05          −36.86             −35.25             −36.86          −36.00             −36.87          −36.04
  CB8-G3       −6.45±0.06          −44.54             −43.50             −44.53          −44.20             −44.55          −44.26
  CB8-G4       −7.80±0.04          −68.10             −64.83             −68.09          −66.23             −68.12          −66.27
  CB8-G5       −8.18±0.05          −16.10             −34.89             −35.92          −35.28             −35.95          −35.32
  CB8-G6       −8.34±0.05          −31.96             −31.33             −31.95          −31.32             −31.96          −31.34
  CB8-G7      −10.00±0.10          −14.95             −11.27             −14.90          −11.30             −14.92          −11.31
  CB8-G8      −13.50±0.04          −27.26             −24.47             −50.34          −49.31             −50.36          −49.38
  CB8-G9       −8.68±0.08          −19.22             −36.14             −37.07          −36.50             −37.09          −36.58
 CB8-G10       −8.22±0.07          −15.29             −33.97             −39.30          −38.63             −39.32          −38.67
 CB8-G11       −7.77±0.05          −10.21             −20.40             −25.75          −24.88             −25.80          −25.02
 CB8-G12       −9.86±0.03          −62.08             −60.01             −62.05          −60.67             −62.07          −60.70
 CB8-G13       −7.11±0.03          −51.72             −47.18             −44.04          −37.82             −51.75          −40.12
   MAE                              27.68              30.93              34.29           32.69              34.85           30.65
  RMSE                              33.79              34.71              36.99           35.56              37.60           34.12
     τ                              −0.21              −0.34              −0.14           −0.12              −0.12           −0.01
     r2                              0.07               0.15               0.00            0.00               0.00            0.02
These options are cc-pV∞Z [D, T], which use cc-pVDZ and cc-pVTZ to extrapolate to the Kohn–Sham limit, and cc(0,−1, −2), which
uses cc-pVDZ, cc-pVTZ(−1d), and cc-pVQZ(−1f2d) to extrapolate to the Kohn–Sham limit. The binding energies obtained with RI-
B3PW91-D3 (SMD, ε=78.4)/cc-pV∞Z [D, T] were submitted. The mean absolute error (MAE), root mean square error (RMSE),
Kendall’s Tau (τ), and r2 are shown.
                                                                  43


    For the CB8 complexes in Table 3.4, using standard DFT (B3PW91-D3) yielded a MAE of
35.33 kcal/ mol and 34.19 kcal mol-1 with cc-pVTZ(− 1d) and ccpVQZ(−1f2d), respectively,
while RI-DFT (RI-B3PW91- D3) yielded a MAE of 34.81 and 34.18 kcal mol-1 for ccpVTZ(−
1d) and cc-pVQZ(− 1f2d), respectively. When changing ε from 78.4 for pure water to 76.4 to
account for the ionic strength of the solution (RI-B3PW91-D3 (ε = 76.4)), all metrics (MAE,
RMSE, τ, and r2) used to gauge the method’s predictive qualities for the binding free energies
did not significantly change with respect to the binding free energies predicted in pure water
(RIB3PW91-D3 (ε = 78.4)).
    Table 3.5 shows the predicted binding free energies for B3PW91-D3 (ε = 78.4), RI-
B3PW91-D3 (ε = 78.4), and RI-B3PW91-D3 (ε=76.4) at the Kohn–Sham limit using cc-
pV∞Z[D,T], a two-point extrapolation using cc-pVDZ and cc-pVTZ, and cc(0,−1,−2), a three-
point extrapolation using cc-pVDZ, cc-pVTZ(−1d) and cc-pVQZ(−1f2d) for the CB8 complexes.
Using the cc(0,−1,−2) basis set choice for extrapolation, the binding free energies predicted by
RIB3PW91-D3 (ε=78.4) and RI-B3PW91-D3 (ε=76.4) lowered the MAE by approximately
1.6 kcal mol-1 and 4.2 kcal/ mol, respectively, in regards to using the cc-pV∞Z[D,T] scheme.
3.4 Discussion
    Calculating end-state binding free energies with MMPBSA is relatively fast and simple but
results of a loss in accuracy and reliance compared to other free energy methods. It has been
known that various factors affect the performance of the MMPBSA method such as the force
field, solute dielectric constant, as well as sampling.33 In our model, we employed the AM1-BCC
partial charge scheme for the guest and host molecules for use with the GAFF force field to
increase computational efficiency. GAFF was designed to use partial charges calculated from the
restrained electrostatic potential fit (RESP) method.56,88 Although, the AM1-BCC scheme was
                                                  44


parameterized to reproduce RESP charges, this may only be appropriate for the guest molecules
rather than the larger host molecules. The interactions between the host and guest molecules may
have been overestimated or underestimated as a result using the AM1-BCC charge scheme,
hence the binding affinity predictions may be improved by using the RESP charge model.
3.4.1 Submission Analysis
    For the methods submitted to the SAMPL6 competition, using RI-B3PW91-D3 yielded
higher τ values for OA and TEMOA than using RI-B3PW91 for predicting binding free energies.
Since there are eight guests that are bound to OA and TEMOA, τcrit for α=0.05 is 0.57 for 8 data
points. Only MMPBSA correlates with experiment (|τ| > τcrit), as the τ values are 0.64, 0.29, and
−0.21 for MMPBSA, RI-B3PW91-D3, and RI-B3PW91, respectively. However, after removing
the statistical outlier, OA-G2, from the dataset, τ increases from 0.29 to 0.71, which implies that
RIB3PW91-D3 also correlates with experiment. As shown in Table 3.2, RI-B3PW91-D3 ranked
the binding free energies more correctly than MMPBSA when the outlier is excluded. For
TEMOA, both MMPBSA and RI-B3PW91-D3 correlate with experiment with τ values of 0.79
and 0.57, respectively, which are greater than τcrit.
    As shown in Figure 3.7a, there is no correlation between experimental and predicted binding
free energies for the CB8 host–guest systems. This is supported by r2 ≈ 0 and τ values of − 0.19,
− 0.14, 0.12 for MMPBSA, RI-B3PW91-D3, and RI-B3PW91, respectively, which are smaller in
magnitude than τcrit for α=0.05 for 14 data points, which is 0.36. This also shows an
inconsistency when using Grimme’s dispersion correction, which may be due to the abundance
of N and O atoms present in the CB8 host and empirical descriptors for those atoms. For all sets
of the host–guest systems, RI-B3PW91 had a lower MAE and RMSE than RIB3PW91-D3 by
approximately 19.4–32.6 kcal mol-1, but as a tradeoff, resulted in qualitatively better predictions
                                                 45


of the binding affinities (Figure 3.8). This implies that using a dispersion correction overbinds
the guest to the host but is needed for proper ranking.
    To estimate the relative performance of the methods, the mean signed error (MSE) was used
to offset the calculated binding free energies. After the removal of MSE from the MMPBSA and
RI-B3PW91-D3 predicted binding free energies for OA and TEMOA, the MAE and the RMSE
values are recalculated to estimate the performance of methods in relative terms as shown in
Table 3.6. This correction improved the MAE and RMSE for MMPBSA by 6.8 and 5.9 kcal/ mol
for OA and TEMOA, respectively. The correction improved the RI-B3PW91-D3 MAE and
RMSE by 38.39 and 38.83 kcal mol-1 for OA without the OA-G2 outlier and TEMOA,
respectively.
                                                 46


     Figure 3. 7 Plots for calculated results in Tables 3.1, 3.2 and 3.3 versus experimental results
in kcal mol-1 for (a) CB8, (b) OA, and (c) TEMOA for MMPBSA (blue), RI-B3PW91-D3
(black), and RI-B3PW91 (green). The dashed lines in each corresponding color refers to the best
fit line where the statistical outlier (OA-G2) for RI-B3PW91 and RI B3PW91-D3 is removed for
b and c. The dashed gray line is the y=x line.
                                                  47


3.4.2 Impact of Truncated Basis Sets
    For the QM calculations, the subset of the CB8 host–guest systems was chosen because the
size of these systems is smaller compared to the octa-acid host–guest systems investigated.
While using the RI approximation, lowering ε from 78.4 for pure water to 76.4 to account for the
ionic strength of the solution increased the MAE by 0.56 kcal mol-1. However, altering the
dielectric constant from 78.4 to 76.4 to account for the ionic strength of the solution lowered the
MAE from 34.85 to 30.65 kcal mol-1 for the three-point extrapolation with truncated triple-ζ and
quadruple-ζ correlation consistent basis sets, yet for RI-B3PW91-D3 (ε=78.4), the MAE only
decreased from 34.29 to 32.69 kcal mol-1 (Table 3.5). Therefore, factors that can change the
dielectric constant should be considered when using implicit solvent models for binding free
energy predictions.
    The use of the cc(0,−1,−2) basis set scheme lowered the MAE for CB8 complexes by 1.60
kcal mol-1 relative to using cc-pV∞Z[D,T] (Table 3.5) for RI-B3PW91-D3 (ε=78.4). In contrast,
when using truncated basis sets and standard basis sets for binding free energies (Table 3.4), the
MAE decreased by 0.51 kcal mol-1 for the CB8 complexes when using cc-pVTZ as opposed to
cc-pVTZ(−1d) for RIB3PW91-D3 (ε=78.4). The MAE decreased by 0.31 kcal/ mol when
increasing the basis set quality of truncated basis sets for RI-B3PW91-D3 (ε=78.4). Therefore,
within the RI approximation, the decrease in MAE when using ccpVQZ(−1f2d) highlights the
importance of using higher quality basis sets when extrapolating to the Kohn–Sham limit.
    For predictions without the RI approximation, the binding free energies determined using
B3PW91-D3/cc-pVTZ yielded a decrease in the MAE by 7.65 kcal mol-1 relative to B3PW91-
D3/cc-pVTZ(−1d) as shown in Table 3.4. This is believed to be a result from including the four-
center two-electron electron repulsion integrals removed via the RI approximation and the need
                                                48


for additional polarization when describing interactions with hydrogens between the host and the
guest. This effect also contributes to the increase of 3.25 kcal mol-1 in the MAE between
B3PW91-D3/ cc-pV∞Z[D,T] and B3PW91-D3/cc(0,−1,−2). However, as shown in Table 3.5,
when employing truncated basis sets (cc(0,−1,−2)), binding free energy predictions when using
RI-B3PW91-D3 (ε=76.4) are more positive and yield a MAE of 0.28 kcal mol-1 lower than
B3PW91-D3 (ε=78.4). This illustrates that within the RI approximation, changing the dielectric
constant is as beneficial to predicting binding free energies as utilizing standard DFT, which is
more computationally costly than RI-DFT.
     For the CB8-G6 host–guest complex, which was one of the smaller systems in the set of
host–guest systems, the number of basis functions decreased from 4016 to 3696 with the
truncation of 1 d basis function from the cc-pVTZ basis set for hydrogen and decreased from
7640 to 6872 with the truncation of 1 f and 2 d basis functions from the cc-pVQZ basis set for
hydrogen. Since DFT scales approximately N3 to N5 depending on the complexity of the
functional where N is the number of basis functions, truncated basis sets become a practical
option for further decreasing the computational cost while improving the quantitative prediction
of binding free energies for these host–guest systems as truncating 1 d basis function from cc-
pVTZ only affected the binding energy predicted with cc-pVTZ by ≤0.06 kcal mol-1 as shown in
Table 3.4 for RI-B3PW91-D3.
3.4.3 Impact of the Extrapolation Scheme B-parameter
     Another factor that can account for the large deviations between host–guest binding energies
is the parameter used to fit Equation 3.1 for two-point extrapolations. The value of 5.5 proposed
by Jensen for the B-parameter, which was used for atoms and diatomics, caused the extrapolation
curve to converge at a very rapid rate and is reflected in the predictions for the CB8 complexes,
                                                 49


as the binding affinities in Table 3.1 are identical to those predicted with the ccpVTZ basis set
with the respective method in Table 3.4. Also, when using the three-point extrapolations with
truncated basis sets for the CB8 complexes, the B-parameter yielded an average value of 0.37
(Table 3.10). Therefore, the value of 0.37 for the B-parameter was applied to two-point
extrapolations with cc-pVDZ and cc-pVTZ to gauge how changing the B-parameter affects the
extrapolated binding free energies (Table 3.7). The results from using 0.37 as the B-parameter in
a two-point extrapolation show that the MAE decreased by 0.84 and 0.42 kcal mol-1 for the CB8
and TEMOA complexes, respectively. The MAE did not change for the OA complexes. Setting
the B-parameter to 0.37 did not change the τ values for CB8 and OA complexes, however, did
increase the τ value from 0.57 to 0.71 for TEMOA.
    In addition to applying 0.37 for the B-parameter to predict binding free energies for all host–
guest systems using two-point extrapolations with cc-pVDZ and ccpVTZ, the value of the B-
parameter was optimized to the value of 0.12 via minimizing the MAE and was applied (Table
3.7). For the CB8 host–guest systems, shifting the B-parameter from 5.5 to 0.12 had a noticeable
impact on the MAE, which decreased from 34.29 to 29.84 kcal/ mol for RI-BWPW91-D3. A
similar effect was observed for TEMOA with a decrease in the MAE of 5.07 kcal/ mol. There is
no notable change in MAE, RMSE, or τ for the OA complexes with the change in the B-
parameter. Furthermore, τ increases from 0.57 to 0.93 when the B-parameter is changed from 5.5
to 0.12 for TEMOA with RI-B3PW91-D3, which provides more evidence that dispersion-
corrected functionals should be used for qualitative predictions of binding free energies since |τ|
> τcrit. The observed trends imply that the value of the B-parameter should be reoptimized when
using Equation 3.1 for macromolecules.
                                                 50


    Table 3. 6 The predicted binding energies for OA and TEMOA using MMPBSA and RI-
B3PW91 after the removal of mean signed error (MSE)
                                   OA                                     TEMOA
                    MMPBSA           RI-B3PW91-D3           MMPBSA             RI-B3PW91-D3
                                                                      a
    MAE              1.6±0.2a           11.66 [2.81]          3.0±0.2               3.49
                                                                      b
    RMSE             1.9±0.4b           17.87 [3.12]          3.7±0.5               3.95
       τ               0.64              0.29 [0.71]            0.79                0.57
      r2               0.84              0.44 [0.52]            0.86                0.55
Bracketed values indicate the values after the removal of the statistical outlier (OA-G2). The
mean absolute error (MAE) in kcal mol-1, root mean square error (RMSE) in kcal mol-1,
Kendall’s Tau (τ), and r2 are shown.
a
  The uncertainty reported for MAE is the average of the absolute uncertainties.
b
  The uncertainty reported for RMSE is the uncertainty of the RMSE with the experimental and
calculated uncertainties.
    Compared to other submissions employing QM methods in the SAMPL6 host–guest binding
challenge, our approach yielded quantitatively poorer predictions that may have resulted from
the approximations considered in this work. In our approach, only a single conformational state
of the guest binding to the host system was considered. Additionally, the representative
structures of the individual host–guest systems obtained from clustering the MD trajectories were
not optimized with QM methods and is reflected in our model chemistries.
                                                 51


    Table 3. 7 The predicted binding energies when using different values for B in Eq. 1 for two-
point extrapolations using cc-pVDZ and cc-pVTZ with RI-B3PW91-D3.
                              B=5.5                    B=0.37                    B=0.12
       CB8
       MAE                    34.29                     33.45                     29.84
      RMSE                    36.99                     36.33                     33.34
         τ                    −0.14                     −0.14                     −0.03
          2
         r                       0                         0                        0
       OA
       MAE                35.46[38.39]              35.46[38.42]               35.43[38.74]
      RMSE                36.41[38.52]              36.43[38.54]               36.70[38.86]
         τ                 0.29[0.71]                0.29[0.71]                 0.29[0.71]
         r2                0.44[0.52]                0.43[0.52]                 0.43[0.54]
    TEMOA
       MAE                    38.83                     38.41                     33.76
      RMSE                    39.03                      38.6                      36.3
         τ                     0.57                      0.71                      0.93
          2
         r                     0.55                      0.75                      0.58
Bracketed values indicate the values after the removal of the statistical outlier (OA-G2). The
mean absolute error (MAE) in kcal mol-1, the root mean square error (RMSE) in kcal mol-1, the
Kendall’s Tau (τ), and r2 are shown.
3.4.4 Impact of representative geometries
    The cause of OA-G2 being a statistical outlier is suspected to be from the orientation of the
substituted cyclohexene ring relative to the OA host (Figure 3.5). Comparing OA-G2 and
TEMOA-G2 in Figures 3.5 and 3.6, where the only difference is the four methyl groups on the
host, the structure of the OA-G2 complex has a smaller binding pocket than the TEMOA-G2
complex. While the experimental data suggests that G2 has a stronger binding affinity towards
OA than TEMOA, MMPBSA suggests the opposite. More sampling of representative structures
would aid in depicting whether the anomalous binding behavior of OA-G2 correlates with the
positive binding free energies predicted with DFT.
                                               52


    Although the only difference between CB8-G6 and CB8- G7 was the expansion of the ring
for the guest by one CH2 group, the predicted binding affinities for the CB8-G6 and CB8-G7
complexes differed by approximately 17.0 kcal/ mol. This may be due to the binding poses of
CB8-G6 and CB8-G7 complexes, as G6 bound in a perpendicular fashion inside the binding
pocket relative to the host whereas G7 bound in a parallel fashion inside the binding pocket. This
would affect nearby electrostatic interactions and why for B3PW91-D3 (ε=78.4), RI-B3PW91-
D3 (ε=78.4), and RI-B3PW91-D3 (ε=76.4), there was a 3.00 kcal mol-1 difference in the change
of binding energies between CB8-G6 and CB8-G7 when improving basis set quality via the basis
set scheme used for extrapolation (Table 3.5). Ergo, more sampling of chemically relevant
structures or enhanced sampling methods can provide a more robust depiction of the host–guest
binding environment.
    The volumes of guest molecules for OA and CB8 molecules were compared to each other.
The volumes of the guests bound to CB8 are larger than those bound to OA and TEMOA as
shown in Tables 3.9 and 3.10. The guests CB8-G0, CB8-G1, CB8-G2, CB8-G3, CB8-G4, and
CB8- G12 are among the largest ligands for this year’s competition with volumes of 462, 518,
432, 468, 817, and 553 Å3, respectively. These values are more than twice the average volume of
OA guests and the absolute error between the experimental and the predicted binding free
energies for the larger CB8 guests are among the highest for all our methods (MMPBSA, RI-
B3W91-D3 and RI-B3PW91) as shown in Figure 3.8. The MMPBSA and RI-B3PW91-D3
methods have a definite correlation with the experiment based on ranking the binding affinities
of the octa-acid guest molecules, which were smaller in volume on average compared to the CB8
guests. This correlation is evident from the τ values of 0.64, 0.79, 0.71, 0.57 for MMPBSA (OA),
                                                 53


MMPBSA (TEMOA), RI-B3PW91-D3 (OA without OA-G2 outlier), and RI-B3PW91-D3
(TEMOA), respectively.
    However, these two methods do not correlate to the CB8 binding free energies since the τ
values are −0.19 and −0.14 for MMPBSA and RI-B3PW91-D3, respectively. This may result
from insufficient sampling as the CB8 guests are larger molecules with higher conformational
flexibility. For example, the size of CB8-G4 does not allow the guest to fit entirely into the
binding cavity. As a result, most of the CB8-G4 molecule is weakly bound to the host from
outside of the binding pocket and only one of the three triethyl amines within the guest can fit
into the pocket as shown in Figure 3.4. Each triethyl amine group could bind to the host from
inside the binding cavity, which would result in alternative binding conformations and affect the
overall binding free energy. To better understand binding free energies of these large structures,
more sampling of the different binding modes is needed to generate weighted averages based on
the thermodynamic stability of predicted poses.
    The results for OA and TEMOA systems illustrate that MMPBSA and RI-B3PW91-D3
methods can be used to qualitatively rank binding energies of small molecules. Among those two
methods, MMPBSA is computationally less expensive, but RI-B3PW91-D3 predicted the
relative binding affinities better for OA and TEMOA host–guest systems. However, the MAE
and the corresponding error plots (Figure 3.8) indicate that both methods overestimated the
binding free energies. The MAE reported for the OA and TEMOA complexes state that
MMPBSA and RI-B3PW91-D3 predict overbinding by 6.8 and 35.5 kcal mol-1, respectively, for
OA complexes and 5.9 and 38.8 kcal mol-1, respectively, for TEMOA complexes. For all
systems, the MMPBSA method was the best approach overall in terms of quantitative
predictions.
                                                54


   Figure 3. 8 Error plots from experimental results in kcal mol-1 for (a) CB8 (b) OA, and (c)
TEMOA for MMPBSA (blue), RI-B3PW91- D3 (black), and RI-B3PW91 (green) for the
submitted results from Tables 3.1, 3.2 and 3.3.
                                                55


3.5 Conclusions
    When implementing DFT for predicting host–guest binding affinities, the use of Grimme’s
D3 dispersion correction was essential for qualitatively predicting the binding free energies for
the OA and TEMOA systems even though the MAE exceeded 35.0 kcal mol-1 for both the OA
and TEMOA systems. When using implicit solvent models, factors that can change the dielectric
constant, such as the ionic strength of the solution, are relevant for predicting binding free
energies, as lowering the dielectric constant lowered the MAE. While RI-B3PW91-D3 reduced
the computational cost relative to B3PW91-D3, B3PW91-D3 yielded a lower MAE. To attain
more quantitatively favorable results, using cc-pVQZ(−1f2d) for hydrogen atoms reduces the
computational cost relative to using cc-pVQZ while simultaneously providing a better standard
for extrapolating to the Kohn–Sham limit than only utilizing cc-pVDZ and cc-pVTZ for
extrapolations. Also, truncating 1 d basis function for hydrogen atoms had a very small effect on
predicted binding free energies obtained with cc-pVTZ, indicating that truncated basis sets are a
viable option to reduce the computational cost while yielding near-identical binding free
energies. With the extrapolation scheme utilized, the B-parameter should be revised for
macromolecules since reducing the value of the B-parameter from the proposed 5.5 to 0.12
reduced the MAE while providing extrapolated binding energies that were in alignment with
those predicted using quadruple-ζ level basis sets. Sampling of different binding poses becomes
pertinent for future investigations as binding orientation in the pocket affected the predicted
binding free energies by approximately 17.0 kcal mol-1 when using RI-B3PW91-D3 for guests
that only differed by one CH2 group.
    All methods presented predict over binding character for these host–guest systems except for
RI-B3PW91 for CB8 host–guest systems. MMPBSA and RI-B3PW91- D3 worked well at
                                                 56


ranking binding affinities for smaller guests regardless of the size of the host. The CB8 guest
molecules with a larger van der Waals volume yielded poor prediction of binding free energy
due to their higher conformational flexibility, which can complicate predicting binding poses. To
better understand binding free energies of these large structures, enhanced sampling methods can
be used, and multiple host–guest binding poses can be sampled.
                                                 57


APPENDIX
   58


   Table 3. 8 Van der Waals volumes in Å3 of CB8 guest molecules are calculated using
connection table approximation.
                                 Guest        Volume
                                CB8-G0         462
                                CB8-G1         518
                                CB8-G2         432
                                CB8-G3         468
                                CB8-G4         817
                                CB8-G5         249
                                CB8-G6         190
                                CB8-G7         214
                                CB8-G8         312
                                CB8-G9         211
                                CB8-G10        244
                                CB8-G11        184
                                CB8-G12        553
                                CB8-G13        265
                                Average        366
                                        59


    Table 3. 9 Van der Waals volumes in Å3 of OA and TEMOA guest molecules are calculated
using connection table approximation.
                                    Guest            Volume
                                   OA-G0                176
                                   OA-G1                160
                                   OA-G2                238
                                   OA-G3                160
                                   OA-G4                258
                                   OA-G5                160
                                   OA-G6                166
                                   OA-G7                184
                                   Average              188
    Table 3. 10 Fitting parameter values obtained when using Jensen’s extrapolation scheme for
each component in calculating the binding energy (Equation 1). The host and guest are
counterpoise-corrected before the extrapolation was performed.
               Complex            Complex              Host             Guest
                CB8-G0               0.37              0.36              0.41
                CB8-G1               0.36              0.35              0.37
                CB8-G2               0.36              0.36              0.37
                CB8-G3               0.36              0.36              0.37
                CB8-G4               0.32              0.32              0.34
                CB8-G5               0.38              0.38              0.39
                CB8-G6               0.39              0.39              0.40
                CB8-G7               0.38              0.38              0.40
                CB8-G8               0.37              0.37              0.39
                CB8-G9               0.39              0.39              0.39
               CB8-G10               0.38              0.38              0.38
               CB8-G11               0.39              0.39              0.40
               CB8-G12               0.36              0.35              0.37
               CB8-G13               0.39              0.38              0.40
                Average              0.37              0.37              0.38
                                               60


REFERENCES
    61


                                        REFERENCES
(1)  Klepeis, J. L.; Lindorff-Larsen, K.; Dror, R. O.; Shaw, D. E. Long-Timescale Molecular
     Dynamics Simulations of Protein Structure and Function. Curr. Opin. Struct. Biol. 2009,
     19 (2), 120–127. https://doi.org/10.1016/j.sbi.2009.03.004.
(2)  Shan, Y.; Seeliger, M. A.; Eastwood, M. P.; Frank, F.; Xu, H.; Jensen, M. O.; Dror, R. O.;
     Kuriyan, J.; Shaw, D. E. A Conserved Protonation-Dependent Switch Controls Drug
     Binding in the Abl Kinase. Proc. Natl. Acad. Sci. 2009, 106 (1), 139–144.
     https://doi.org/10.1073/pnas.0811223106.
(3)  Zhao, G.; Perilla, J. R.; Yufenyuy, E. L.; Meng, X.; Chen, B.; Ning, J.; Ahn, J.;
     Gronenborn, A. M.; Schulten, K.; Aiken, C.; Zhang, P. Mature HIV-1 Capsid Structure by
     Cryo-Electron Microscopy and All-Atom Molecular Dynamics. Nature 2013, 497 (7451),
     643–646. https://doi.org/10.1038/nature12162.
(4)  Perilla, J. R.; Goh, B. C.; Cassidy, C. K.; Liu, B.; Bernardi, R. C.; Rudack, T.; Yu, H.;
     Wu, Z.; Schulten, K. Molecular Dynamics Simulations of Large Macromolecular
     Complexes.         Curr.      Opin.      Struct.     Biol.     2015,       31,     64–74.
     https://doi.org/10.1016/j.sbi.2015.03.007.
(5)  Walkowicz, W. E.; Fernández-Tejada, A.; George, C.; Corzana, F.; Jiménez-Barbero, J.;
     Ragupathi, G.; Tan, D. S.; Gin, D. Y. Quillaja Saponin Variants with Central Glycosidic
     Linkage Modifications Exhibit Distinct Conformations and Adjuvant Activities. Chem.
     Sci. 2016, 7 (3), 2371–2380. https://doi.org/10.1039/C5SC02978C.
(6)  Hadden, J. A.; Perilla, J. R.; Schlicksup, C. J.; Venkatakrishnan, B.; Zlotnick, A.;
     Schulten, K. All-Atom Molecular Dynamics of the HBV Capsid Reveals Insights into
     Biological Function and Cryo-EM Resolution Limits. Elife 2018, 7, e32478.
     https://doi.org/10.7554/eLife.32478.
(7)  García, M. A.; Meurs, E. F.; Esteban, M. The DsRNA Protein Kinase PKR: Virus and
     Cell         Control.       Biochimie         2007,        89      (6–7),        799–811.
     https://doi.org/10.1016/j.biochi.2007.03.001.
(8)  Tripathi, R. B.; Pande, M.; Garg, G.; Sharma, D. In-Silico Expectations of Pharmaceutical
     Industry to Design of New Drug Molecules. J. Innov. Pharm. Biol. Sci. 2016, 3 (3), 95–
     103.
(9)  Ryde, U.; Söderhjelm, P. Ligand-Binding Affinity Estimates Supported by Quantum-
     Mechanical        Methods.      Chem.      Rev.     2016,     116     (9),     5520–5566.
     https://doi.org/10.1021/acs.chemrev.5b00630.
(10) Ganesan, A.; Coote, M. L.; Barakat, K. Molecular Dynamics-Driven Drug Discovery:
     Leaping Forward with Confidence. Drug Discov. Today 2017, 22 (2), 249–269.
     https://doi.org/10.1016/j.drudis.2016.11.001.
                                               62


(11) Mobley, D. L.; Gilson, M. K. Predicting Binding Free Energies: Frontiers and
     Benchmarks. Annu. Rev. Biophys. 2017, 46 (1), 531–558. https://doi.org/10.1146/annurev-
     biophys-070816-033654.
(12) Huggins, D. J.; Sherman, W.; Tidor, B. Rational Approaches to Improving Selectivity in
     Drug       Design.     J.     Med.      Chem.       2012,     55      (4),    1424–1444.
     https://doi.org/10.1021/jm2010332.
(13) Muddana, H. S.; Daniel Varnado, C.; Bielawski, C. W.; Urbach, A. R.; Isaacs, L.; Geballe,
     M. T.; Gilson, M. K. Blind Prediction of Host–Guest Binding Affinities: A New SAMPL3
     Challenge. J. Comput. Aided. Mol. Des. 2012, 26 (5), 475–487.
     https://doi.org/10.1007/s10822-012-9554-1.
(14) Rogers, K. E.; Ortiz-Sánchez, J. M.; Baron, R.; Fajer, M.; De Oliveira, C. A. F.;
     McCammon, J. A. On the Role of Dewetting Transitions in Host-Guest Binding Free
     Energy Calculations. J. Chem. Theory Comput. 2013, 9 (1), 46–53.
     https://doi.org/10.1021/ct300515n.
(15) Yang, H.; Yuan, B.; Zhang, X.; Scherman, O. A. Supramolecular Chemistry at Interfaces:
     Host-Guest Interactions for Fabricating Multifunctional Biointerfaces. Acc. Chem. Res.
     2014, 47 (7), 2106–2115. https://doi.org/10.1021/ar500105t.
(16) Muddana, H. S.; Fenley, A. T.; Mobley, D. L.; Gilson, M. K. The SAMPL4 Host–Guest
     Blind Prediction Challenge: An Overview. J. Comput. Aided. Mol. Des. 2014, 28 (4),
     305–317. https://doi.org/10.1007/s10822-014-9735-1.
(17) Gallicchio, E.; Chen, H.; Chen, H.; Fitzgerald, M.; Gao, Y.; He, P.; Kalyanikar, M.; Kao,
     C.; Lu, B.; Niu, Y.; Pethe, M.; Zhu, J.; Levy, R. M. BEDAM Binding Free Energy
     Predictions for the SAMPL4 Octa-Acid Host Challenge. J. Comput. Aided. Mol. Des.
     2015, 29 (4), 315–325. https://doi.org/10.1007/s10822-014-9795-2.
(18) Yin, J.; Henriksen, N. M.; Slochower, D. R.; Shirts, M. R.; Chiu, M. W.; Mobley, D. L.;
     Gilson, M. K. Overview of the SAMPL5 Host–Guest Challenge: Are We Doing Better? J.
     Comput. Aided. Mol. Des. 2017, 31 (1), 1–19. https://doi.org/10.1007/s10822-016-9974-4.
(19) Liu, S.; Ruspic, C.; Mukhopadhyay, P.; Chakrabarti, S.; Zavalij, P. Y.; Isaacs, L. The
     Cucurbit[n]Uril Family: Prime Components for Self-Sorting Systems. J. Am. Chem. Soc.
     2005, 127 (45), 15959–15967. https://doi.org/10.1021/ja055013x.
(20) Gan, H.; Benjamin, C. J.; Gibb, B. C. Nonmonotonic Assembly of a Deep-Cavity
     Cavitand.       J.    Am.     Chem.       Soc.    2011,     133      (13),    4770–4773.
     https://doi.org/10.1021/ja200633d.
(21) Biedermann, F.; Scherman, O. A. Cucurbit[8]Uril Mediated Donor–Acceptor Ternary
     Complexes: A Model System for Studying Charge-Transfer Interactions. J. Phys. Chem. B
     2012, 116 (9), 2842–2849. https://doi.org/10.1021/jp2110067.
(22) Vázquez, J.; Remón, P.; Dsouza, R. N.; Lazar, A. I.; Arteaga, J. F.; Nau, W. M.; Pischel,
                                              63


     U. A Simple Assay for Quality Binders to Cucurbiturils. Chem. - A Eur. J. 2014, 20 (32),
     9897–9901. https://doi.org/10.1002/chem.201403405.
(23) Gibb, C. L. D.; Gibb, B. C. Binding of Cyclic Carboxylates to Octa-Acid Deep-Cavity
     Cavitand.      J.   Comput.     Aided.    Mol.      Des.    2014,    28      (4),   319–325.
     https://doi.org/10.1007/s10822-013-9690-2.
(24) Nicholls, A.; Wlodek, S.; Grant, J. A. The SAMP1 Solvation Challenge: Further Lessons
     Regarding the Pitfalls of Parametrization. J. Phys. Chem. B 2009, 113 (14), 4521–4532.
     https://doi.org/10.1021/jp806855q.
(25) Mobley, D. L.; Bayly, C. I.; Cooper, M. D.; Dill, K. A. Predictions of Hydration Free
     Energies from All-Atom Molecular Dynamics Simulations †. J. Phys. Chem. B 2009, 113
     (14), 4533–4537. https://doi.org/10.1021/jp806838b.
(26) Geballe, M. T.; Skillman, a. G.; Nicholls, A.; Guthrie, J. P.; Taylor, P. J. The SAMPL2
     Blind Prediction Challenge: Introduction and Overview. J. Comput. Aided. Mol. Des.
     2010, 24 (4), 259–279. https://doi.org/10.1007/s10822-010-9350-8.
(27) Zwanzig, R. W. High‐temperature Equation of State by a Perturbation Method. I.
     Nonpolar        Gases.    J.     Chem.     Phys.       1954,     22      (8),     1420–1426.
     https://doi.org/10.1063/1.1740409.
(28) Jiang, W.; Hodoscek, M.; Roux, B. Computation of Absolute Hydration and Binding Free
     Energy with Free Energy Perturbation Distributed Replica-Exchange Molecular
     Dynamics.        J.   Chem.     Theory     Comput.       2009,     5    (10),     2583–2588.
     https://doi.org/10.1021/ct900223z.
(29) Mitchell, M. J.; McCammon, J. A. Free Energy Difference Calculations by
     Thermodynamic Integration: Difficulties in Obtaining a Precise Value. J. Comput. Chem.
     1991, 12 (2), 271–275. https://doi.org/10.1002/jcc.540120218.
(30) Chodera, J. D.; Mobley, D. L.; Shirts, M. R.; Dixon, R. W.; Branson, K.; Pande, V. S.
     Alchemical Free Energy Methods for Drug Discovery: Progress and Challenges. Curr.
     Opin. Struct. Biol. 2011, 21 (2), 150–160. https://doi.org/10.1016/j.sbi.2011.01.011.
(31) Hansen, N.; van Gunsteren, W. F. Practical Aspects of Free-Energy Calculations: A
     Review.       J.    Chem.     Theory     Comput.        2014,     10     (7),     2632–2647.
     https://doi.org/10.1021/ct500161f.
(32) Williams-Noonan, B. J.; Yuriev, E.; Chalmers, D. K. Free Energy Methods in Drug
     Design: Prospects of “Alchemical Perturbation” in Medicinal Chemistry. J. Med. Chem.
     2018, 61 (3), 638–649. https://doi.org/10.1021/acs.jmedchem.7b00681.
(33) Hou, T.; Wang, J.; Li, Y.; Wang, W. Assessing the Performance of the MM/PBSA and
     MM/GBSA Methods. 1. The Accuracy of Binding Free Energy Calculations Based on
     Molecular Dynamics Simulations. J. Chem. Inf. Model. 2011, 51 (1), 69–82.
     https://doi.org/10.1021/ci100275a.
                                              64


(34) Homeyer, N.; Gohlke, H. Free Energy Calculations by the Molecular Mechanics
     Poisson−Boltzmann Surface Area Method. Mol. Inform. 2012, 31 (2), 114–122.
     https://doi.org/10.1002/minf.201100135.
(35) Genheden, S.; Ryde, U. The MM / PBSA and MM / GBSA Methods to Estimate Ligand-
     Binding Affinities. 2015.
(36) Wang, C.; Greene, D.; Xiao, L.; Qi, R.; Luo, R. Recent Developments and Applications of
     the       MMPBSA           Method.       Front.       Mol.       Biosci.       2018,      4.
     https://doi.org/10.3389/fmolb.2017.00087.
(37) Genheden, S.; Ryde, U. Comparison of the Efficiency of the LIE and MM/GBSA
     Methods to Calculate Ligand-Binding Energies. J. Chem. Theory Comput. 2011, 7 (11),
     3768–3778. https://doi.org/10.1021/ct200163c.
(38) Hansson, T.; Marelius, J.; Aqvist, J. Ligand Binding Affinity Prediction by Linear
     Interaction Energy Methods. J. Comput. Aided. Mol. Des. 1998, 12 (1), 27–35.
     https://doi.org/10.1023/A:1007930623000.
(39) Steinmann, C.; Olsson, M. A.; Ryde, U. Relative Ligand-Binding Free Energies
     Calculated from Multiple Short QM/MM MD Simulations. J. Chem. Theory Comput.
     2018, Article ASAP. https://doi.org/10.1021/acs.jctc.8b00081.
(40) Curutchet, C.; Cupellini, L.; Kongsted, J.; Corni, S.; Frediani, L.; Steindal, A. H.; Guido,
     C. A.; Scalmani, G.; Mennucci, B. Density-Dependent Formulation of Dispersion-
     Repulsion Interactions in Hybrid Multiscale Quantum/Molecular Mechanics (QM/MM)
     Models.       J.    Chem.     Theory      Comput.      2018,     14      (3),   1671–1681.
     https://doi.org/10.1021/acs.jctc.7b00912.
(41) Sellers, B. D.; James, N. C.; Gobbi, A. A Comparison of Quantum and Molecular
     Mechanical Methods to Estimate Strain Energy in Druglike Fragments. J. Chem. Inf.
     Model. 2017, 57 (6), 1265–1275. https://doi.org/10.1021/acs.jcim.6b00614.
(42) Lu, Y.; Yang, C. Y.; Wang, S. Binding Free Energy Contributions of Interfacial Waters in
     HIV-1 Protease/Inhibitor Complexes. J. Am. Chem. Soc. 2006, 128 (36), 11830–11839.
     https://doi.org/10.1021/ja058042g.
(43) Bonnet, P.; Bryce, R. A. Molecular Dynamics and Free Energy Analysis of
     Neuraminidase – Ligand Interactions. Protein Sci. 2004, 13, 946–957.
     https://doi.org/10.1110/ps.03129704.four-hydroxyl.
(44) Kitamura, K.; Tamura, Y.; Ueki, T.; Ogata, K.; Noda, S.; Himeno, R.; Chuman, H.
     Binding Free-Energy Calculation Is a Powerful Tool for Drug Optimization: Calculation
     and Measurement of Binding Free Energy for 7-Azaindole Derivatives to Glycogen
     Synthase Kinase-3β. J. Chem. Inf. Model. 2014, 54 (6), 1653–1660.
     https://doi.org/10.1021/ci400719v.
(45) Caldararu, O.; Olsson, M. A.; Riplinger, C.; Neese, F.; Ryde, U. Binding Free Energies in
                                              65


     the SAMPL5 Octa-Acid Host–Guest Challenge Calculated with DFT-D3 and CCSD(T). J.
     Comput. Aided. Mol. Des. 2017, 31 (1), 87–106. https://doi.org/10.1007/s10822-016-
     9957-5.
(46) Sure, R.; Antony, J.; Grimme, S. Blind Prediction of Binding Affinities for Charged
     Supramolecular Host-Guest Systems: Achievements and Shortcomings of DFT-D3. J.
     Phys. Chem. B 2014, 118 (12), 3431–3440. https://doi.org/10.1021/jp411616b.
(47) Mikulskis, P.; Cioloboc, D.; Andrejić, M.; Khare, S.; Brorsson, J.; Genheden, S.; Mata, R.
     A.; Söderhjelm, P.; Ryde, U. Free-Energy Perturbation and Quantum Mechanical Study of
     SAMPL4 Octa-Acid Host-Guest Binding Energies. J. Comput. Aided. Mol. Des. 2014, 28
     (4), 375–400. https://doi.org/10.1007/s10822-014-9739-x.
(48) Murkli, S.; McNeil, J.; Isaacs, L. CB[8]-Guest Binding Affinities: A Blinded Dataset for
     the SAMPL6 Challenge. Supramol. Chem. 2018, (Submitted).
(49) Neese, F.; Wennmohs, F.; Hansen, A.; Becker, U. Efficient, Approximate and Parallel
     Hartree–Fock and Hybrid DFT Calculations. A ‘Chain-of-Spheres’ Algorithm for the
     Hartree–Fock       Exchange.      Chem.     Phys.     2009,     356     (1–3),    98–109.
     https://doi.org/10.1016/j.chemphys.2008.10.036.
(50) Grimme, S.; Antony, J.; Ehrlich, S.; Krieg, H. A Consistent and Accurate Ab Initio
     Parametrization of Density Functional Dispersion Correction (DFT-D) for the 94
     Elements       H-Pu.      J.     Chem.     Phys.      2010,     132      (15),    154104.
     https://doi.org/10.1063/1.3382344.
(51) Mintz, B.; Lennox, K. P.; Wilson, A. K. Truncation of the Correlation Consistent Basis
     Sets: An Effective Approach to the Reduction of Computational Cost? J. Chem. Phys.
     2004, 121 (12), 5629–5634. https://doi.org/10.1063/1.1785145.
(52) Chemical Computing Group Inc. Molecular Operating Environment (MOE). Montreal
     2016.
(53) Corbeil, C. R.; Williams, C. I.; Labute, P. Variability in Docking Success Rates Due to
     Dataset Preparation. J. Comput. Aided. Mol. Des. 2012, 26 (6), 775–786.
     https://doi.org/10.1007/s10822-012-9570-1.
(54) Hoffmann, R. An Extended Hückel Theory. I. Hydrocarbons. J. Chem. Phys. 1963, 39 (6),
     1397–1412. https://doi.org/10.1063/1.1734456.
(55) Hornak, V.; Abel, R.; Okur, A.; Strockbine, B.; Roitberg, A.; Simmerling, C. Comparison
     of Multiple Amber Force Fields and Development of Improved Protein Backbone
     Parameters. Proteins 2006, 65 (3), 712–725. https://doi.org/10.1002/prot.21123.
(56) Wang, J.; Wolf, R. M.; Caldwell, J. W.; Kollman, P. a; Case, D. a. Development and
     Testing of a General Amber Force Field. J. Comput. Chem. 2004, 25 (9), 1157–1174.
     https://doi.org/10.1002/jcc.20035.
                                              66


(57) Jakalian, A.; Jack, D. B.; Bayly, C. I. Fast, Efficient Generation of High-Quality Atomic
     Charges. AM1-BCC Model: II. Parameterization and Validation. J. Comput. Chem. 2002,
     23 (16), 1623–1641. https://doi.org/10.1002/jcc.10128.
(58) Becke, A. D. Density‐functional Thermochemistry. III. The Role of Exact Exchange. J.
     Chem. Phys. 1993, 98 (7), 5648–5652. https://doi.org/10.1063/1.464913.
(59) Lee, C.; Yang, W.; Parr, R. G. Development of the Colle-Salvetti Correlation-Energy
     Formula into a Functional of the Electron Density. Phys. Rev. B 1988, 37 (2), 785–789.
     https://doi.org/10.1103/PhysRevB.37.785.
(60) Hariharan, P. C.; Pople, J. A. The Influence of Polarization Functions on Molecular
     Orbital      Hydrogenation        Energies;     Springer-Verlag,     1973;      Vol.     28.
     https://doi.org/10.1007/BF00533485.
(61) Hay, P. J.; Wadt, W. R. Ab Initio Effective Core Potentials for Molecular Calculations.
     Potentials for the Transition Metal Atoms Sc to Hg. J. Chem. Phys. 1985, 82 (1), 270–283.
     https://doi.org/10.1063/1.448799.
(62) Frisch, M. J.; Trucks, G. W.; Schlegel, H. B.; Scuseria, G. E.; Robb, M. A.; Cheeseman, J.
     R.; Scalmani, G.; Barone, V.; Petersson, G. A.; Nakatsuji, H.; Li, X.; Caricato, M.;
     Marenich, A. V; Bloino, J.; Janesko, B. G.; Gomperts, R.; Mennucci, B.; Hratchian, H. P.;
     Ortiz, J. V; Izmaylov, A. F.; Sonnenberg, J. L.; Williams-Young, D.; Ding, F.; Lipparini,
     F.; Egidi, F.; Goings, J.; Peng, B.; Petrone, A.; Henderson, T.; Ranasinghe, D.;
     Zakrzewski, V. G.; Gao, J.; Rega, N.; Zheng, G.; Liang, W.; Hada, M.; Ehara, M.; Toyota,
     K.; Fukuda, R.; Hasegawa, J.; Ishida, M.; Nakajima, T.; Honda, Y.; Kitao, O.; Nakai, H.;
     Vreven, T.; Throssell, K.; Montgomery Jr., J. A.; Peralta, J. E.; Ogliaro, F.; Bearpark, M.
     J.; Heyd, J. J.; Brothers, E. N.; Kudin, K. N.; Staroverov, V. N.; Keith, T. A.; Kobayashi,
     R.; Normand, J.; Raghavachari, K.; Rendell, A. P.; Burant, J. C.; Iyengar, S. S.; Tomasi,
     J.; Cossi, M.; Millam, J. M.; Klene, M.; Adamo, C.; Cammi, R.; Ochterski, J. W.; Martin,
     R. L.; Morokuma, K.; Farkas, O.; Foresman, J. B.; Fox, D. J. Gaussian 16 Revision A.03.
     2016.
(63) Case, D. A.; Betz, R. M.; Botello-Smith, W.; Cerutti, D. S.; Cheatham III, T. E.; Darden,
     T. A.; Duke, R. E.; Giese, T. J.; Gohlke, H.; Goetz, A. W.; Homeyer, N.; Izadi, S.;
     Janowski, P.; Kaus, J.; Kovalenko, A.; Lee, T. S.; LeGrand, S.; Li, P.; Lin, C.; Luchko, T.;
     Luo, R.; Madej, B.; York, D. M.; Kollman, P. A. Amber 16. 2016.
     https://doi.org/10.1002/jcc.23031.
(64) Joung, I. S.; Cheatham, T. E. Determination of Alkali and Halide Monovalent Ion
     Parameters for Use in Explicitly Solvated Biomolecular Simulations. J. Phys. Chem. B
     2008, 112 (30), 9020–9041. https://doi.org/10.1021/jp8001614.
(65) Horn, H. W.; Swope, W. C.; Pitera, J. W.; Madura, J. D.; Dick, T. J.; Hura, G. L.; Head-
     Gordon, T. Development of an Improved Four-Site Water Model for Biomolecular
     Simulations: TIP4P-Ew. J. Chem. Phys. 2004, 120 (20), 9665–9678.
     https://doi.org/10.1063/1.1683075.
                                               67


(66) Miller, B. R.; McGee, T. D.; Swails, J. M.; Homeyer, N.; Gohlke, H.; Roitberg, A. E.
     MMPBSA.Py: An Efficient Program for End-State Free Energy Calculations. J. Chem.
     Theory Comput. 2012, 8 (9), 3314–3321. https://doi.org/10.1021/ct300418h.
(67) Ester, M.; Kriegel, H.-P.; Sander, J.; Xu, X. A Density-Based Algorithm for Discovering
     Clusters in Large Spatial Databases with Noise. In KDD-96; 1996; pp 226–231.
(68) Merrick, J. P.; Moran, D.; Radom, L. An Evaluation of Harmonic Vibrational Frequency
     Scale Factors.        J.   Phys.    Chem. A         2007,    111    (45), 11683–11700.
     https://doi.org/10.1021/jp073974n.
(69) Neese, F. Software Update: The ORCA Program System, Version 4.0. Wiley Interdiscip.
     Rev. Comput. Mol. Sci. 2018, 8 (1), e1327. https://doi.org/10.1002/wcms.1327.
(70) Perdew, J. P.; Wang, Y. Accurate and Simple Analytic Representation of the Electron-Gas
     Correlation      Energy.    Phys.     Rev.    B     1992,    45    (23),   13244–13249.
     https://doi.org/10.1103/PhysRevB.45.13244.
(71) Perdew, J. P.; Chevary, J. A.; Vosko, S. H.; Jackson, K. A.; Pederson, M. R.; Singh, D. J.;
     Fiolhais, C. Atoms, Molecules, Solids, and Surfaces: Applications of the Generalized
     Gradient Approximation for Exchange and Correlation. Phys. Rev. B 1992, 46 (11), 6671–
     6687. https://doi.org/10.1103/PhysRevB.46.6671.
(72) Eichkorn, K.; Treutler, O.; Öhm, H.; Häser, M.; Ahlrichs, R. Auxiliary Basis Sets to
     Approximate Coulomb Potentials. Chem. Phys. Lett. 1995, 240 (4), 283–290.
     https://doi.org/10.1016/0009-2614(95)00621-A.
(73) Marenich, A. V.; Cramer, C. J.; Truhlar, D. G. Universal Solvation Model Based on
     Solute Electron Density and on a Continuum Model of the Solvent Defined by the Bulk
     Dielectric Constant and Atomic Surface Tensions. J. Phys. Chem. B 2009, 113 (18),
     6378–6396. https://doi.org/10.1021/jp810292n.
(74) Goerigk, L.; Grimme, S. A General Database for Main Group Thermochemistry, Kinetics,
     and Noncovalent Interactions - Assessment of Common and Reparameterized (Meta-
     )GGA        Density      Functionals.     J.    Chem.       Theory     Comput.       2010.
     https://doi.org/10.1021/ct900489g.
(75) Goerigk, L.; Grimme, S. A Thorough Benchmark of Density Functional Methods for
     General Main Group Thermochemistry, Kinetics, and Noncovalent Interactions. Phys.
     Chem. Chem. Phys. 2011, 13 (14), 6670. https://doi.org/10.1039/c0cp02984j.
(76) Dunning, T. H. Gaussian Basis Sets for Use in Correlated Molecular Calculations. I. The
     Atoms Boron through Neon and Hydrogen. J. Chem. Phys. 1989, 90 (2), 1007–1023.
     https://doi.org/10.1063/1.456153.
(77) Feller, D. Application of Systematic Sequences of Wave Functions to the Water Dimer. J.
     Chem. Phys. 1992, 96 (8), 6104–6114. https://doi.org/10.1063/1.462652.
                                              68


(78) Martin, J. M. L. Ab Initio Total Atomization Energies of Small Molecules — towards the
     Basis     Set    Limit.    Chem.     Phys.    Lett.   1996,     259     (5–6),  669–678.
     https://doi.org/10.1016/0009-2614(96)00898-6.
(79) Wilson, A. K.; Dunning Jr., T. H. Benchmark Calculations with Correlated Molecular
     Wave Functions. X. Comparison with “Exact” MP2 Calculations on Ne, HF, H2O, and
     N2. J. Chem. Phys. 1997, 106 (21), 8718–8726. https://doi.org/10.1063/1.473932.
(80) Feller, D.; Peterson, K. A.; Crawford, T. D. Sources of Error in Electronic Structure
     Calculations on Small Chemical Systems. J. Chem. Phys. 2006, 124 (5), 54107.
     https://doi.org/10.1063/1.2137323.
(81) Jensen, F. Polarization Consistent Basis Sets. II. Estimating the Kohn–Sham Basis Set
     Limit. J. Chem. Phys. 2002, 116 (17), 7372–7379. https://doi.org/10.1063/1.1465405.
(82) Faver, J. C.; Zheng, Z.; Merz, K. M. Model for the Fast Estimation of Basis Set
     Superposition Error in Biomolecular Systems. J. Chem. Phys. 2011, 135 (14).
     https://doi.org/10.1063/1.3641894.
(83) Boys, S. F.; Bernardi, F. The Calculation of Small Molecular Interactions by the
     Differences of Separate Total Energies. Some Procedures with Reduced Errors. Mol.
     Phys. 1970, 19 (4), 553–566. https://doi.org/10.1080/00268977000101561.
(84) Gavish, N.; Promislow, K. Dependence of the Dielectric Constant of Electrolyte Solutions
     on Ionic Concentration: A Microfield Approach. Phys. Rev. E 2016, 94 (1), 012611.
     https://doi.org/10.1103/PhysRevE.94.012611.
(85) Kendall, M. G. A New Measure of Rank Correlation. Biometrika 1938, 30 (1/2), 81.
     https://doi.org/10.2307/2332226.
(86) Berry, K. J.; Johnston, J. E.; Zahran, S.; Mielke, P. W. Stuart ’ s Tau Measure of Effect
     Size for Ordinal Variables : Some Methodological Considerations. 2009, 41 (4), 1144–
     1148. https://doi.org/10.3758/BRM.41.4.1144.
(87) Dean, R. B.; Dixon, W. J. Simplified Statistics for Small Numbers of Observations. Anal.
     Chem. 1951, 23 (4), 636–638. https://doi.org/10.1021/ac60052a025.
(88) Bayly, C. I.; Cieplak, P.; Cornell, W. D.; Kollman, P. A. A Well-Behaved Electrostatic
     Potential Based Method Using Charge Restraints for Deriving Atomic Charges: The
     RESP        Model.     J.    Phys.     Chem.      1993,    97      (40),    10269–10280.
     https://doi.org/10.1021/j100142a004.
                                              69


                            CHAPTER FOUR
SAMPL7: Host–Guest Binding Prediction by Molecular Dynamics and Quantum
Mechanics
                                    70


About this chapter: This chapter is reprinted from Eken, Y.; Almeida, N. M. S.; Wang, C.;
Wilson, A. K. SAMPL7: Host–Guest Binding Prediction by Molecular Dynamics and Quantum
Mechanics. J. Comput. Aided. Mol. Des. 2021, 35 (1), 63–77 with permission of the Springer
Natures. The docking, molecular dynamics simulations and MMPBSA/MMGBSA calculations
mentioned in this chapter are performed by Yiğitcan Eken and quantum mechanics calculations
are done by co-authors Nuno M.S. Almeida and Cong Wang.
4.1 Introduction
    Computer-aided drug design (CADD) has become a fundamental approach for the
pharmaceutical industry and medicinal community.1–4 Even though CADD methods are used in
various phases of drug design to help enhance, accelerate, and reduce the cost of the discovery of
pharmaceuticals, the balance between accuracy and computational cost of the prediction method
with respect to computing time and memory is still being scrutinized.
    SAMPL challenges provide opportunities for large-scale investigations of computational
strategies for the prediction of physicochemical properties (i.e. solvation free energies,
distribution coefficient, pKa values, and binding free energies) of a series of compounds.5–12 In
SAMPL challenges, the physicochemical properties of the compounds are measured
experimentally prior to the challenge, and the results from experiment are not provided until the
computational predictions have been submitted. Then, the predicted properties are compared
with the experimental measurements to assess the submissions. One of the central challenges of
CADD is the accurate prediction of ligand binding to a target protein, which can significantly
reduce the number of compounds synthesized and considered experimentally. The biggest barrier
in developing and applying such methods are large protein structures. The proteins have flexible
configurations along with multiple binding sites and conformations, which make the study
                                                71


computationally time consuming and difficult to sample all binding configurations. On the other
hand, host molecules, which have uses such as reaction vessels, separation devices, and
modulators for redox-active or fluorescent guests, have smaller structures and less conformations
in comparison to proteins.13 Due to their low complexity, host–guest systems are commonly used
for the study of ligand binding predictions.5–8,10
    In 2019, the SAMPL7 Gibb Deep Cavity Cavitand (GDCC) challenge prompted the study of
the binding of eight guest molecules on two GDCC hosts, which are the Octa Acid (OA) and
exo-Octa Acid (exoOA) hosts. The OA host and guests 1 to 6 (G1–G6) also were used in the
previous SAMPL competitions.6,7,10 The previously studied OA systems were considered as
reference, two new guests were included, and studied for the OA, and a total of eight guests were
studied on the new exoOA host. For the prediction of binding free energies there are numerous
computational methods available, and examples include empirical ligand docking scoring
functions, and methods that use molecular dynamics (MD) trajectories, such as free-energy
perturbation14, replica exchange free energy perturbation  15
                                                              , thermodynamic integration  16
                                                                                              and
end-state free energy methods 17. During previous SAMPL challenges, predictions based on MD
simulations are commonly adopted.7,10,18–25 QM approaches have also been valuable, but the size
of the system can make computational calculations impractical. However, methods such as DFT
have been useful, because their computational cost is lower relative to ab initio quantum
mechanical methods. MMPBSA or MMGBSA methods 17 are among the most popular end-state
free energy methods, which can be used to calculate the free energy change between two states
(such as bound and a free state of host–guest or protein–ligand). As these end-state methods do
not require simulation of the intermediate states, they are among the computationally least
demanding of the free energy methods that can be used without the loss of accuracy as compared
                                                   72


to more rigorous techniques.26 The MMPBSA/MMGBSA methods are built upon a series of
calculations: free energies of the solvated unbound receptor, unbound ligand, and the ligand
bound protein complex systems (Eq. 4.1). The free energy of each system defined in Eq. 4.2 is
calculated by approximating gas-phase energies (Egas) from molecular mechanics, and the
solvation free energies (∆Gsolvation) by PBSA, or GBSA. The solute entropies (Ssolute) are
determined from N-mode frequencies, or via a quasi-harmonic approximation, or are sometimes
neglected (Eq. 4.2).17 Then, the calculated free energies of the unbound receptor and ligand are
subtracted from the free energy of the ligand bound protein complex to determine the free energy
of binding as described on Eq. 4.2. In the MD study described herein, the effect of different
solvation models, entropy calculations, and charge models on accuracy and efficiency is
evaluated.
                    ∆𝐺𝐵𝑖𝑛𝑑𝑖𝑛𝑔 = 𝐺𝑐𝑜𝑚𝑝𝑙𝑒𝑥 − (𝐺𝑃𝑟𝑜𝑡𝑒𝑖𝑛 + 𝐺𝐿𝑖𝑔𝑎𝑛𝑑 )            (4.1)
                    𝐺𝑆𝑦𝑠𝑡𝑒𝑚 = 𝐸𝑔𝑎𝑠 + ∆𝐺𝑆𝑜𝑙𝑣𝑎𝑡𝑖𝑜𝑛 − 𝑇𝑆𝑆𝑜𝑙𝑢𝑡𝑒                 (4.2)
                                                             27
    In terms of the SAMPL competitions, since SAMPL3            , quantum mechanical calculations
have been used as valuable prediction methods for host–guest binding energies. QM methods
have advantages in describing stationary systems with high accuracy, but typically address far
fewer configurations than is possible with MD related approaches. From initial inputs which are
often based on crystal structures, chemical experience, or MD poses, complexes are optimized to
stationary points on potential energy surfaces, and calculation are done to obtain frequencies.
The QM approaches involved in the prior SAMPL competitions, to a certain extent, reflect the
development of QM methods throughout the years. In the SAMPL3 competition, the Lee–Yang–
Parr three-parameter hybrid functional (B3LYP)28–30 was employed in single-point calculations.
Structure optimization and frequency calculations were performed with the MMFF94 force
                                                73


field.27 In SAMPL431, dispersion corrections32–34 were included in the DFT calculations. The
structures were optimized with density functional methods (TPSS-D3, PW6B95-D3) and the
frequency calculations were carried out using a semi-empirical HF3c level.35 In another
submission of SAMPL4, local correlated coupled-cluster singles and doubles with perturbatively
corrected triples (LCCSD(T)) was adopted on dispersion corrected functional, TPSS-D3,
optimized structures. The frequency calculation calculations were carried out at a force field
level.36 In SAMPL537, the domain-based local pair-natural orbital coupled cluster singles and
doubles with perturbatively corrected triples (DLPNO-CCSD(T))38 method was used while the
frequency calculations remain at HF3c level. In SAMPL6, our research group performed
dispersion corrected DFT calculations (B3PW91-D328,33,39–41) with the solvation model based on
the density (SMD) model, with structures generated from MD simulations.23,42 In the QM effort
described here, a double-hybrid functional (B2PLYP-D3)43,44 was considered. For the exoOA-G2
complex, the effect of geometry optimization and thermochemistry corrections in solvent was
evaluated. Single-point calculations were performed on the resulting structure using the recently
developed functional ωB97M-V, which has resulted in excellent predictions of binding energies
in earlier studies.45,46
4.2 Methods
    The initial structures of the eight guest molecules are shown in Figure. 4.1 and the two host
molecules OA and exoOA are shown in Figure 4.2 (SAMPL7 GDCC dataset). The protonation
state of both hosts and guests were determined by the Protonate3D module implemented in
Molecular Operating Environment version 2019.01 (MOE).47,48 Host–guest binding complexes
were generated by the docking feature of MOE. The placement step of the docking was
performed by the triangle matcher algorithm, and the refinement step performed by the induced
                                                74


fit method which also accounts for the changes in the host structure upon binding.48,49 The
resulting host–guest binding poses with the highest generalized-Born volume integral/weighted
surface area score (GBVI/WSA ΔG) were minimized with molecular mechanics using
AMBER10: Extended Hückel Theory (EHT) force field implemented in MOE, which employs
Amber ff10 and EHT bond parameters.50–53 The minimized structures (Figures. 4.4, 4.5) are
further investigated using MD and QM. More details about the MD and QM calculations are
explained in the “Molecular dynamics protocol” and “Quantum mechanical methods” sections,
respectively. The QM optimized structures are shown in Figure. 4.8.
    Figure 4. 1 Guest molecules in the SAMPL7 GDCC host–guest binding challenge. The
binding of these eight guest molecules is considered for both OA and exoOA hosts.
                                                75


    Figure 4. 2 The guest molecules for the octa-acid (OA) and tetra methyl octa-acid (TEMOA)
hosts.
4.2.1 Molecular dynamics protocol
    In order to assess the effect of the charge scheme on the accuracy of the binding predictions,
partial charges of the host and guest atoms were calculated with two different methods: AM1-
BCC and HF/6-31G* restrained electrostatic potential charges (RESP) and the resulting binding
free energies are compared. The AM1-BCC charges are generated using the Antechamber
module of Amber.54 The RESP charges are generated using RED server.55,56 The calculated
partial charges were fitted using the general Amber force field (GAFF) to generate MD
parameters for the host and the guest molecules.52 The simulation box for each host–guest
system was generated using the “leap” module of Amber tools.57 Then, each system was
                                                 76


neutralized with sodium counterions with parameters from Joung and Cheatham.58 After this
step, the host–guest systems were solvated in a 14.0 Å cube of TIP4P-Ew water, which has been
previously shown to be in good agreement with experiment, in terms of the ion hydration free
energy, hydration radius, and coordination numbers.59,60 Finally, additional sodium chloride ions
were added to the simulation box in order to mimic the experimental ionic strength of the 10 mM
sodium phosphate buffer.
    The host–guest systems were minimized with decreasing energy restraints on the host
molecules (500.0, 200.0, 20.0, 10.0, 5.0, 0.0 kcal mol−1). Then, the systems were heated
gradually to 300 K over 30 ps. After heating, 10 ns production simulations at 300 K and 1 atm
pressure were performed. The production simulations were done in triplicate to account for
randomized parameters that affect the MD trajectories such as initial velocities. During all
simulations, the temperature was controlled by Langevin dynamics and the pressure was
controlled by isotropic position scaling.57,61,62 Nonbonded interactions were truncated with a
10.0 Å cutoff, whereas long-range electrostatics were handled with the particle-mesh Ewald
(PME) method.63 Bonds involving hydrogen were constrained using SHAKE, and the simulation
time step was set to 2 fs.64 All simulations were performed with AMBER18 and 500 snapshots
are extracted from each of the production runs for further use in MMPBSA/MMGBSA
calculations. The RMSD plots for the MD simulations can be found in Figures. 4.9–4.16.
4.2.2 MMPBSA/MMGBSA calculations
    The binding free energies of the host-guest complexes were calculated using both MMPBSA
and MMGBSA methods with a modified General Born solvation model by Onufriev et al.65 to
consider the effect of solvation models on the accuracy. MMPBSA and MMGBSA approaches
are implemented in the Amber PBSA-solver. For all calculations, the default internal and
                                                 77


external dielectric constants were used (1.0 and 80.0, respectively), the solvent accessible surface
area (SASA) was determined with the default Linear Combinations of Pairwise Overlaps
(LCPO) method using modified Bondi atomic radii. For both MMPBSA and MMGBSA, the
initial 500 snapshots of the MD simulations were used to calculate the binding energies. It has
been shown by Hou et al., that such simulations are useful, and longer timeframes do not
necessarily correspond to better accuracy in the calculated binding energies relative to the
experimental binding energies.66
     To consider how solute entropies affect the accuracy of the calculations, the solute entropies
were determined using N-mode approximation and were compared to the neglected solute
entropy results. To correct the calculated binding energies, the OA host–guest binding dataset
from SAMPL6 was used. The binding energies of the SAMPL6 guests to OA were predicted
with the RESP-MMPBSA method with the neglected solute entropies and the results are
provided in Table 4.3. The SAMPL6 predictions were plotted against their experimental values
to create a linear fitting curve (Figure 4.17). The fitting curve equation was used to correct
SAMPL7 RESP-MMPBSA results.
4.2.3 Quantum Mechanical Methods
     For QM calculations, docking poses were used as initial guess structures and the geometries
of the host–guest systems were optimized using the Gaussian 16 software package.67 Due to the
constraints associated with the size of the system, DFT was employed for quantum mechanical
calculations. The B3PW91 functional was chosen along with GD3BJ to describe the dispersion
forces.28,33,39–41 This method was also considered for SAMPL6.23 For exoOA-G3, there were
difficulties reaching convergence with B3PW91 in the time constraints of the competition. Thus,
B97D was considered partnered with SMD, which did reach convergence. 39 cc-pVDZ basis sets
                                                78


were used for all atoms in each of the complexes.68 Frequencies were calculated for all geometry
optimization steps, guaranteeing they were at a minimum on the potential energy surface. Note
that while a double-ζ level basis set is not ideal for small molecules, because of the size of the
host–guest systems, it is used here.
    After this step, single-point energy calculations were carried out using B2PLYP-D343,44,
which includes GD3BJ dispersion.33 B2PLYP is a double-hybrid functional that includes
Hartree–Fock exchange and MP2-like correlation, and has been shown previously to provide
lower overall errors as compared with other DFT functionals in terms of long-, short-range, and
side chain-side chain interactions.69 Due to the size of the system, an MP2 correction was not
considered. The non-double-hybrid part of B2PLYP includes gradient approximations of GGA
methods with Becke exchange, Lee, Yang and Parr correlation, along with Hartree–Fock
exchange.43 To account for the role of the solvent (water), the SMD solvation model was
employed.42 The inclusion of an implicit solvation model was deemed essential to mimic the
stabilization that water molecules have on the system and produce reliable binding energies.
    Regarding basis sets, a set of double-, triple-, and quadruple-ζ correlation consistent basis
sets were used for single-point calculations.68,70,71 For oxygen and chlorine atoms, the augmented
form of the basis sets was important, due to the negative charges located on the oxygens and
electronegative nature of the chlorine.70,71 As a modified version of the correlation consistent
basis was recommended to replace the original correlation consistent basis sets for second-row
atoms, for species that included chlorine, the tight-d forms of the basis sets (cc-pV(D+d)Z and
cc-pV(T+d)Z72) were considered for OA-G2 and exoOA-G2 and the predictions have been
included in Table 4.4.
                                                  79


    Due to inaccuracies associated with the G2 guest binding predictions (described in the next
section), an investigation of the influence of the solvent on molecular complexes was performed
for the exoOA-G2 complex. Geometry optimizations and frequency calculations were conducted
with B3LYP-D3 (GD3BJ)/6-31G*28–30,73,74 in combination with the integral equation formalism
variant of polarizable continuum model (IEF-PCM)75 solvation model using Gaussian 16 for the
complex and monomers. The reason for adopting the IEF-PCM model is that a converged
structure was not obtained in geometry optimization that employed SMD. Thermochemistry
corrections were carried out at 298.15 K and scaled by 0.96 for anharmonicity.76 The standard
state corrections were applied.42,77 The SMD solvation method was combined with the conductor
like PCM (C-PCM)75,78 for single-point calculations with the ωB97M-V functional45,46 in
conjunction with a range of correlation consistent basis sets using ORCA 4.2.1.79 An exponential
form of a basis set extrapolation scheme to the complete basis set (CBS) limit—Kohn–Sham
(KS) limit for DFT—was adopted80 and the extrapolation exponent (5.46) was considered from
Neese et al..80
    In this combined approach, the Gibbs free energy is calculated as;
      ∆𝐸 = 𝐸𝑐𝑜𝑚𝑝𝑙𝑒𝑥 − 𝐸ℎ𝑜𝑠𝑡 − 𝐸𝑔𝑢𝑒𝑠𝑡                                                       (4.3)
                                         𝐵3𝐿𝑌𝑃−𝐷3(𝐺𝐷3𝐵𝐽)/6−31𝐺∗/𝐼𝐸𝐹−𝑃𝐶𝑀       0
      ∆𝐺 = ∆𝐸 𝑤𝐵97𝑀−𝑉/𝐶𝐵𝑆/𝑆𝑀𝐷 + ∆𝐺𝑠𝑐𝑎𝑙𝑒𝑑 𝑅𝑅𝐻𝑂                            + ∆𝐺𝑔𝑎𝑠/𝑠𝑜𝑙𝑢𝑡𝑒    (4.4)
        𝑋       𝐶𝐵𝑆
      𝐸𝑆𝐶𝐹  = 𝐸𝑆𝐶𝐹   + Aexp(−α√𝑥)                                                          (4.5)
    Here RRHO represents the rigid–rotor harmonic-oscillator. In Eq. (4.3), ΔE stands for the
difference of electronic energies between the complex, guest, and host. In Eq. (4.4), ΔG is the
                                                    𝐵3𝐿𝑌𝑃−𝐷3(𝐺𝐷3𝐵𝐽)/6−31𝐺∗/𝐼𝐸𝐹−𝑃𝐶𝑀
difference of Gibbs free energy corrections; ∆𝐺𝑠𝑐𝑎𝑙𝑒𝑑 𝑅𝑅𝐻𝑂                         represents the
thermochemistry corrections from B3LYP-D3 with a smaller basis set, 6-31G*, along the solvent
correction, IEF-PCM. Since the electronic contribution is the leading term in molecular energy, it
                                                 80


is typically adopted to calculate the single-point energy with a higher-level method and the
thermochemical corrections using a lower-level method ΔGogas∕solute represents the -
1.89 kcal mol−1 correction due to difference in the standard state in gas phase and solvent.
Eq. (4.5) has been adopted to extrapolate to the CBS limit of HF energies. Similar convergence
patterns were found for DFT energies.81
    In prior work, it has been suggested that the solvation energy should be calculated at the level
where the solvation model was parameterized.77 This method led to less accurate energies than
using the ωB97M-V functional45,46 with SMD for the single-point calculations in Eq. (4). For
instance, the binding energy from gas phase ωB97M-V/cc-pVDZ combined with the solvation
energy from B3LYP/6- 31G* led to − 17.77 kcal mol−1, which is ~ 5 kcal mol−1 lower than for
ωB97M-V/cc-pVDZ with SMD directly, −12.69 kcal mol−1 in Table 4.2.
    Since the spin-contamination from an unrestricted Hartree–Fock (UHF) wavefunction may
indicate the inappropriateness of the ground state description82, a stability analysis was
performed with Gaussian 16 on the host exoOA.74,83 To further consider possible multireference
character, the complete active space self-consistent field (CASSCF)84 method was used to
calculate the partial occupation numbers in natural orbitals and conduct a T1 diagnosis38,85,86
with ORCA 4.2.1.
4.3 Results
4.3.1 OA and exoOA Binding Cavities
    The structures and binding cavities of OA and exoOA hosts for the SAMPL7 GDCC
challenge are provided in Figures. 4.2 and 4.3, respectively. The hosts differ by four carboxylic
acid groups which are located on the rim of the binding cavity. On the OA host the carboxylic
                                                 81


acid groups are placed further away from the cavity opening whereas on the exoOA structure, the
carboxylic acids are located next to the opening.
4.3.2 Host Guest Binding Poses
    The structures of the SAMPL7 GDCC guests, binding poses of guests on the OA host, and
binding poses of guests on the exoOA host predicted by docking are provided in Figures. 4.1,
4.4, and 4.5, respectively. During the SAMPL7 GDCC challenge four negatively charged guest
molecules (G1–G4) with carboxylic acid functional groups and four positively charged guest
molecules with amino groups (G5–G8) were investigated. The docking results show that
carboxylic acid and amino groups prefer to orient toward the opening of the cavity rather than
deeper in the cavity.
    Figure 4. 3 a Binding cavity of OA together with G1 (shown in green). b Binding cavity of
exoOA together with G1 (shown in green).
    The binding free energies studied with different models and levels of theory as a part of
SAMPL7-GDCC challenge are given in Tables 4.1 and 4.2. Table 4.1 includes the results
obtained from MMPBSA and MMGBSA binding free energy calculations using the MD
simulation frames. Table 4.2 contains predictions made by B2PLYP-D3, and ωB97M-V. To
                                                82


assess the binding energies determined using each method with respect to the experimental
values; root mean square errors (RMSE), mean absolute errors (MAE), mean errors (ME), r2
correlation coefficients, slope of the correlation plots (m), and Kendall’s Tau (τ) rank correlation
coefficients, which measures how well the method ranks the binding free energy of the guest
compounds with respect to experiment, are also included in Tables 4.1 and 4.2.
    Table 4.1 shows MMPBSA/MMGBSA binding free energy predictions. In addition, RESP
and AM1 partial charges were evaluated, and the influence of adding N-mode solute entropies
are compared to the experimental values. For the ranked submission, SAMPL6 OA systems are
used to perform a linear correction of the binding free energies calculated with RESP charges,
the Poison–Boltzmann (PB) solvation model, and neglected entropies (Table 4.1, RESP-
MMPBSA-Cor). The PB solvation model leads to smaller errors and better correlation when
compared to the Generalized-Born (GB) solvation model with a RMSE of 8.66 and
11.43 kcal mol−1, and r2 of 0.70 and 0.51, respectively. When RESP predictions are considered,
the binding free energy predictions have smaller errors and slightly better correlation as
compared to AM1 charges with a RMSE of 8.66 and 10.67 kcal mol−1, and r2 of 0.70 and 0.63,
respectively. Additionally, the effect of the N-mode solute entropy corrections on the binding
free energy predictions are also assessed. In all other MMPBSA/MMGBSA calculations, the
solute entropies are neglected, with the exception of the RESP-MMPBSA-Nmode calculations
(Table 4.1). Within the RESP-MMPBSA-Nmode method, the solute entropies are calculated
with an N-mode analysis of the harmonic frequencies. The binding energy prediction with the
RESP-MMPBSA-Nmode of G5 to the OA and G1-G5 to the exoOA is positive, which suggests
that these guests do not bind to their hosts.
                                                  83


    Figure 4. 4 Binding modes of guest to OA host generated with docking.
    Figure 4. 5 Binding modes of guest to exoOA host generated with docking.
    The experimental and predicted binding free energies, and the plot used during correction can
be found in Figure 4.17. When the results obtained after linear correction are compared to the
results without correction, both results have the same correlation with the experiment (r2 =0.70,
Figure 4.6). However, the linear correction shifts the predicted values and puts them closer to
                                                84


experimental values, resulted in a decrease in the RMSE from 8.66 to 1.45 kcal mol−1
(Table 4.1).
    Table 4.2 shows calculated binding energies using B2PLYP-D3. In addition, comparison of
binding energies determined when the double-, triple-, and quadruple-ζ levels of basis sets were
used, and a structural optimization in the solvent was considered. When the DZ, TZ and QZ
predictions are compared to the experimental binding energies, little correlation is found (r2
values are 0.25, 0.30 and 0.29, respectively). However, when the r2 values of OA and exoOA
systems are calculated separately, better correlation is obtained for exoOA predictions (see
“Quantum mechanics” section in discussion). A gradual improvement is observed in the RMSE
and MAE occurs as the basis set size is increased from DZ to QZ. The resulting RMSEs are 7.11,
6.70 and 3.92 kcal mol−1 and the MAE are 6.16, 4.84 and 3.92 kcal mol−1 for DZ, TZ and QZ
respectively. The smallest deviation from the experimental binding energies was observed with
QZ basis set evidenced by its lower RMSE and MAE compared to the others.
4.4 Discussion
4.4.1 Molecular Dynamics
    Of the wide variety of available molecular dynamics methods, the MMPBSA and MMGBSA
approaches are considered to be an intermediate option between semi-empirical docking scoring
approaches and computationally more rigorous methods (i.e. free energy perturbation, replica
exchange and thermodynamic integration). However, there are number of factors that can impact
the performance of the MMPBSA/MMGBSA methods including atomic partial charge method,
dielectric constant, force field, solvation model, and solute entropy correction method. In the
current study, the impact of the solvation model, partial charges, and the N-mode solute entropy
correction upon the utility of the method for predicting binding energies was investigated.
                                                 85


    When comparing the ranked submissions of OA and exoOA host–guest complexes (RESP-
MMPBSA-Cor) several trends can be noted. The exoOA host–guest systems have better
correlation with experiment then the OS systems (r 2=0.95 vs 0.26 respectively). A similar
correlation arises for QM calculations (see “QM discussion”). In terms of error analysis, for
RESP-MMPBSA-Cor, OA and exoOA do not have significant differences. Considering the OA
host–guest systems, the RMSE value is 1.55 and MAE is 1.28 kcal mol−1 respectively. For
exoOA, RMSE is 1.32 and MAE 1.03 kcal mol−1.
                                             86


Table 4. 1 The binding free energies in kcal mol−1 for the OA and exoOA host–guest systems predicted from MMPBSA/MMGBSA.
                                  RESP-              RESP-              AM1-                                       RESP-
 Complex           Exp                                                              RESP-MMPBSA-Nmode
                                MMPBSA             MMGBSA            MMPBSA                                   MMPBSA-Cor
  OA-G1        − 4.97±0.02     − 13.01±0.04       − 14.26±0.04      − 12.55±0.05         − 1.41±0.04           − 6.18±0.04
  OA-G2        − 6.91±0.02     − 12.38±0.04       − 12.90±0.04      − 11.66±0.04         − 1.49±0.04           − 5.90±0.04
  OA-G3        − 8.10±0.05     − 17.34±0.05       − 17.09±0.04      − 18.34±0.05         − 2.79±0.05           − 8.13±0.05
  OA-G4        − 6.76±0.05     − 18.49±0.05       − 18.81±0.04      − 18.55±0.05         − 3.80±0.05           − 8.65±0.05
  OA-G5        − 4.73±0.02     − 14.03±0.06       − 17.15±0.07      − 15.84±0.06          0.28±0.06            − 6.64±0.06
  OA-G6        − 4.97±0.02     − 16.56±0.05       − 18.91±0.07      − 18.36±0.05         − 3.21±0.05           − 7.78±0.05
  OA-G7        − 6.07±0.05     − 15.57±0.05       − 21.55±0.08      − 19.94±0.05         − 1.81±0.05           − 7.33±0.05
  OA-G8        − 8.25±0.02     − 17.89±0.04       − 20.91±0.05      − 21.70±0.05         − 4.71±0.04           − 8.36±0.04
exoOA-G1        0.00±0.00       − 7.84±0.08       − 11.54±0.06       − 8.80±0.06          4.37±0.08            − 3.84±0.08
exoOA-G2       − 1.31±0.02      − 7.60±0.07       − 10.25±0.06       − 7.68±0.06          3.48±0.07            − 3.73±0.07
exoOA-G3       − 3.37±0.05     − 10.62±0.08       − 12.65±0.07      − 12.66±0.10          4.57±0.08            − 5.10±0.08
exoOA-G4       − 3.61±0.05     − 10.32±0.12       − 13.91±0.08      − 10.21±0.11          4.52±0.12            − 4.96±0.12
exoOA-G5       − 5.57±0.02     − 12.41±0.07       − 16.26±0.06      − 14.40±0.07          1.78±0.07            − 5.91±0.07
exoOA-G6       − 5.83±0.02     − 14.88±0.07       − 17.91±0.07      − 18.06±0.06         − 1.14±0.07           − 7.02±0.07
exoOA-G7       − 6.98±0.05     − 14.64±0.05       − 20.53±0.07      − 18.44±0.06         − 1.33±0.05           − 6.91±0.05
exoOA-G8       − 7.67±0.02     − 16.54±0.05       − 19.82±0.05      − 21.02±0.06         − 3.61±0.05           − 7.77±0.05
  RMSE                              8.66              11.43             10.67                5.26                  1.45
   MAE                              8.48              11.19             10.29                4.96                  1.16
    ME                              8.48              11.19             10.29               − 4.96                 1.02
     r2                              0.7               0.51              0.63                0.68                   0.7
     m                              1.36               1.27              1.74                 1.3                  0.61
     τ                              0.57               0.52              0.57                0.61                  0.57
                                                                  87


4.4.2 Comparison of Poisson Boltzmann and Generalized Born Solvation Models
    The Poisson-Boltzmann (PB) model is a detailed description of the electrostatic environment
of a solute in an ion containing solvent. On the other hand, the Generalized-Born model is built
upon approximating the linearized PB model, to achieve a computationally less demanding
solution for the solvation.17,87 However, the predictions arising from MMPBSA and MMGBSA
methods are system dependent, when compared to the experimental binding energies.66 In this
section, a comparison between RESP-MMPBSA and RESP-MMGBSA is performed (Table 4.1).
Binding energies predicted using the PB solvation model were closer to the experimental values,
and led to smaller RMSE, MAE and ME as compared to energies predicted using the GB model.
Moreover, PB binding energies also showed better correlation with experimental energies, as
demonstrated by higher r2 as compared to GB binding energies (r2 =0.70 and 0.51, respectively
for PB and GB). Finally, the PB solvation model performed slightly better in the correct ranking
of host–guest systems relative to their experimental binding energies as compared to the ranking
provided by GB, as evidenced by the higher τ of the PB model (τ=0.57 and 0.52, respectively for
PB and GB). Overall, the results demonstrated the superiority of the PB model relative to the GB
model in the predictions of the binding energies of the SAMPL7-GDCC dataset.
                                                88


     Figure 4. 6 a MMPBSA-RESP correlation with experiment. b MMPBSA-RESP correlation
with experiment after linear correction. The linear correction shifted the y-values (△G
Calculated) closer to the x-values (experimental) without changing the correlation coefficient
(r2).
4.4.3 Comparison of RESP and AM1 charges
     Both RESP and AM1-bcc charges were used during MMPBSA/MMGBSA calculations.
Among the two, AM1-bcc is parameterized to generate atomic charges efficiently that emulate
the HF/6-31G* electrostatic potential (RESP), and the charge generation is fully automatized on
Amber tools.54,57 However, the calculation of RESP charges requires additional steps, including
the extraction of electrostatic potential from GAMESS or Gaussian output files, though it results
in more accurate charges. To understand the impact of the RESP and AM1 charge models, both
methods were examined in this SAMPL7-GDCC challenge, and the binding energy predictions
are provided in the RESP-MMPBSA and AM1-MMPBSA columns in Table 4.1. In general,
using RESP charges resulted in the prediction of binding energies that are closer to the
experimental values and resulted in smaller RMSE, MAE and ME as compared to those arising
from the use of AM1- bcc charges. Additionally, binding energies predicted with RESP-charges
                                                89


resulted in slightly better correlation with experimental values compared to the AM1-bcc
prediction (r2=0.70 and 0.63, respectively for RESP and AM1-bcc). However, the two methods
showed the same performance with respect to ranking the binding energies of host–guest systems
(τ = 0.57 for both RESP and AM1-bcc results). Overall, RESP charges quantitatively worked
better, but qualitatively, the two charge methods resulted in similar predictions. The complexity
and computational demand of obtaining RESP charges are higher as compared to obtaining
AM1-bcc charges, so using the latter during MMPBSA/MMGBSA calculations might be
advantageous.
4.4.4 Solute Entropies
    For the prediction of absolute binding energies using MMPBSA/MMGBSA methods, solute
entropies of the ligand and the target in the bound and unbound states were calculated. Among
the methods available, a normal mode analysis of harmonic frequencies (N-mode) from
minimized snapshots of MD frames is commonly used. However, N-mode calculations are
demanding with respect to computing time and memory. Due to these constraints, N-mode
calculations can only be performed on a few snapshots of the MD simulation, which limits the
possible conformational space that can be studied. On similar systems with respect to size and
complexity (i.e. binding to the same protein), the solute entropy contribution to the binding is
considered to be similar. For this reason, solute entropies are commonly neglected when relative
binding energies to the same, or similar targets are studied. To understand the difference between
binding energy predictions both methods were considered: When the solute entropies were
neglected or calculated through N-mode (RESP-MMPBSA and RESP-MMPBSA-Nmode
columns of Table 4.1, respectively). The results showed that N-mode analysis overestimated the
                                                 90


solute entropy difference between the bound and unbound systems and led to unfeasible binding
energies for G5 binding to the OA and G1–G5 binding energies for the exoOA.
    Due to approximations used within the MMPBSA/MMGBSA methodology, and the lack of a
fast method of calculating accurate solute entropies, MMPBSA/MMGBSA commonly
overestimate binding energies, even though the predictions are qualitatively correct. In order to
improve the quantitative predictions from MMPBSA/ MMGBSA, linear corrections from similar
systems are typically used (Figure 4.6b). In our ranked submission for the SAMPL7-GDCC
challenge, a linear correction was also beneficial. Due to the structural resemblance between OA
and exoOA, the linear fitting curve obtained from SAMPL6 OA systems also improved the
exoOA predictions. Correlation plots with and without correction are provided in Figure 4.6. The
linear correction shifted the predicted results closer to the experimental values without changing
the correlation, or the binding affinity ranking of the host–guest systems. In other words, even
though the correction improved MMPBSA/MMGBSA results quantitatively, and it brought the
predictions closer to their absolute values, it did not change the quality of the predictions.
4.4.5 Quantum Mechanics
    The binding energies submitted for SAMPL7 using QM methods are shown in Table 4.2 for
the host–guest systems. As mentioned previously in the “Results” section, the correlation in the
binding energies for exoOA in comparison with experiment was better than for OA. In addition,
for OA and exoOA, the binding energy of the anions (G1–G4 ligands) was nearer that of
experiment than for the cations (G5–G8 ligands). The only exception was for the binding of the
G2 ligand. Quadruple-level basis sets improved the accuracy for most guest–host systems, but
for the G2 ligand there were Journal of Computer-Aided Molecular Design 1 3 still some
discrepancy with respect to experimental results. To address this issue, a number of methods
                                                   91


were considered for both the structural and energetic predictions (see “Comparison of gas phase
and solvated structures”).
    Table 4. 2 Calculated binding energies using B2PLYP-D3 vs experimental binding energies,
using a range of basis sets. The geometry was optimized in the gas phase. Values shown are in
kcal mol-1.
                                                              B2PLYP-D3
    Complex               Exp
                                               DZ                  TZ                 QZ
     OA-G1            − 4.97±0.02               0.4               3.58                4.16
     OA-G2            − 6.91±0.02             − 7.29             − 25.24        − 26.83[− 7.29]a
     OA-G3            − 8.10±0.05            − 11.39             − 7.95              − 7.21
     OA-G4            − 6.76±0.05            − 11.33             − 7.92              − 7.20
     OA-G5            − 4.73±0.02            − 10.56             − 6.34              − 5.52
     OA-G6            − 4.97±0.02            − 13.34             − 9.44              − 8.86
     OA-G7            − 6.07±0.05            − 15.57             − 11.61            − 11.04
     OA-G8            − 8.25±0.02            − 11.29             − 7.82              − 7.19
   exoOA-G1            0.00±0.00               2.66               5.79                6.41
   exoOA-G2           − 1.31±0.02             − 2.16               1.4                1.94
                                             − 12.69 b
                                                                 − 5.47 b
                                                                                − 2.01b[− 0.96]c
   exoOA-G3           − 3.37±0.05            − 10.37             − 6.23              − 5.40
   exoOA-G4           − 3.61±0.05             − 6.97             − 2.89              − 2.15
   exoOA-G5           − 5.57±0.02            − 14.89             − 11.13            − 10.32
   exoOA-G6           − 5.83±0.02            − 17.87             − 14.08            − 13.49
   exoOA-G7           − 6.98±0.05            − 19.43             − 15.65            − 15.04
   exoOA-G8           − 7.67±0.02            − 14.52             − 10.90            − 10.23
     RMSE                                      7.11                6.7                3.92
      MAE                                      6.16               4.84                 3
      ME                                       5.44               3.09                1.84
       r2                                      0.25                0.3                0.29
       m                                       1.41                 2                 1.17
       τ                                       0.33               0.38                0.35
a
  Parenthesis value indicate calculation done with DZ basis set and used as ranked submission.
b
  Value indicate calculated binding energies using ωB97M-V. The geometry optimized in a
solvated environment.
c
  Parenthesis value indicate calculated binding energies using ωB97M-V extrapolated to CBS.
                                                 92


4.4.6 OA Discussion of Results
    The ranked submission results were performed using a quadruple-ζ level basis set (cc-pVQZ
(C, N, H) and aug-cc-pVQZ (O, Cl). The only exception was for OA-G2, where a double-ζ level
basis set was used. The quadruple-ζ value for the binding energy was − 26.83 kcal mol−1, which
deviates from experiment. For OA-G2, the chlorine atom seems to present a challenge, as the
binding energy predictions deviate significantly from experiment when considering any level
basis set. Though the tight-d basis sets are the recommended sets for chlorine, the cc-pV(D+d)Z
and cc-pV(T+d) Z sets also resulted in similar deviations, and did not resolve the differences
(Table 4.4).
    In addition, to the quadruple-ζ level binding energies, double, and triple-ζ quality results
were also submitted as non-ranked. Last year’s submission from our research group included
single-point calculations using B3PW91-D3, which resulted in a large overestimation of the
binding energy. Single-point calculations with B3PW91-D3 were performed for SAMPL7 as a
check, and the same outcome was observed, so these results were not included in the non-ranked
submission. The RMSE and MAE for the octa-acid are 2.99 kcal mol−1 and 2.22 kcal mol−1
respectively, which means binding energies close to experiment were obtained (Table 4.5).
    The G1 guest performs differently for OA and exoOA. B2PLYP-D3 predicts that the
complex OA-G1 does not form. Increasing the basis set size from double-ζ to quadruple-ζ, the
bonding energy increases from 0.40 to 4.16 kcal mol−1. The difference in chemical structure
between G1 and G4 is small, however, they have starkly different binding patterns. For OA-G4,
the best submitted results are 0.44 kcal mol−1 different from experiment, while for OA-G1,
B2PLYP-D3 predicts a non-binding interaction.
                                                 93


    From OA-G3 to OA-G8, B2PLYP-D3/QZ the binding energy predictions are
within~1 kcal mol−1 agreement from experimental measurements, with the exception of OA-G6
and OA-G7. The guests G5 to G8 are positively charged and affect the binding energies
differently. OA-G5 and OA-G8 are very close to experiment, but OA-G6 and OA-G7
overestimate the binding energy by~4–5 kcal mol−1.
4.4.7 exoOA Discussion of Results
    For the ranked submission which entailed the use of quadruple-ζ level basis sets (cc-pVQZ
(C, N, H) and aug-cc-pVQZ (O, Cl)), exoOA complexes have a higher RMSD and MAE than for
the OA complexes (4.76 and 3.90 kcal mol−1 compared to 2.99 and 2.22 kcal mol−1, respectively
(Table 4.5)). In terms of correlation (r2 and Kendall’s Tau) the values are quite different for OA
and exoOA. For exoOA, the r2 and Kendall’s Tau values are 0.72 and 0.58. For OA, the
correlation is much less significant (r2 =0.09 and Kendall’s Tau=0.076).
    For exoOA-G1, B2PLYP-D3/cc-pVXZ correctly predicted a positive binding energy,
indicating that the complex does not form. It is interesting to note the difference in binding
energies for OA-G2 and exoOA-G2. The binding energy prediction for OA-G2 led to large
negative values with a quadruple-ζ basis set, but the exoOA-G2 prediction indicates that the
complex does not form. For exoOA-G3 and exoOA-G4, the difference between the calculated
results at the QZ level is~2 kcal mol−1. Similar to OA-G5 to G8, B2PLYP-D3 overestimates the
predicted binding energy compared to the experiment; guests that are positively charged
overshoot the binding energy (~6–7 kcal mol−1).
    Since the predicted binding energies of OA-G2 and exoOA-G2 present large deviations from
the experimental values, further investigations were performed. The following results were not
submitted to the competition, but they are included in the present work to provide additional
                                                 94


insight about what is (and is not) needed to describe these systems, or similar systems, in the
future. ExoOA-G2 was optimized at the B3LYP-D3 (GD3BJ)/6-31G* level with the IEF-PCM
solvent model for water. The performance of B3PW91-D3 and B3LYP-D3 led to similar binding
energies as compared to the reference values provided by CCSD(T)/complete basis set (CBS)
limit from Mardirossian et al..46
4.4.8 Comparison of Gas Phase and Solvated Structures
    The comparison between the structure in the gas phase and solvent is shown in Figure 4.7. In
the solvent, the host slightly bends to the guest. This difference may come from a competition
between electrostatic repulsion and dispersion interaction. The electrostatic repulsion between
the negative charges in the host and guest may be screened by the dielectric constant of the
solvent. The dispersion attraction may be less influenced by the solvent, since it does not come
from the net charge. The G2 guest in the gas phase has C2v symmetry. In the solvent, the
optimization of the geometry within the same symmetry leads to an imaginary frequency of 45i
cm−1. The local minimum is found to be of C2 symmetry. For the exoOA host, it was noticed
that both in gas phase and solvent (water), the C1 structure has a lower Gibbs free energy than
the C4 isomer (-0.9 and -2.3 kcal mol−1, respectively). This difference may arise because it is a
large system that can have near-fat potential surfaces. In addition, in the solvent optimized hosts,
both C4 and C1 have small. imaginary frequencies (7i and 15i for C4 and C1, respectively).
Similar results have been reported by Grimme et al. in host–guest complexes.88
                                                95


    Figure 4. 7 Comparison between gas-phase (green) and solvent (blue) optimized structures of
exoOA-G2.
    Additionally, the exoOA host presented symmetry breaking solutions at HF levels, though
not with the present functional B3LYP. An instability analysis at the HF/ cc-pVDZ level on the
gas-phase optimized structure leads to an 〈 S2 〉 =5.18. This large value of spin contamination
may indicate the need to account for non-dynamical correlation.82 To further consider possible
multireference character, a CASSCF(6,6)/STO-3G calculation for the occupation number and
DLPNO-CCSD/def2-SVP for the T1 diagnostic value were done.89 (Though the STO-3G basis
set is far too small for reasonable calculations, it is used here just to provide a quick,
approximate assessment in regards to potential need to account for non-dynamical correlation.)
The active space in the CASSCF calculation included the long pairs of negatively charged
oxygen atoms and anti-bonding orbitals of benzene rings as a starting point, along with 6
electrons (6,6). The same active space has been adopted from symmetry reasons in a fullerene
system. The occupation numbers (1.94 and 0.05), and T1 value (0.014) point to less
multireference nature than suggested by instability analysis (〈S2 〉 =5.18). This may be an
example of the artificial symmetry breaking of spin state proposed by Head-Gordon et al..90,91 It
                                              96


has been found for this type of system, single-reference methods can provide reasonable energies
as compared to those from experimental data and from multi-reference calculations. Hence, it is
expected that the present density functional approaches provide a reasonable choice for the
present systems.
4.5 Conclusion
    In the SAMPL7 competition, MD and QM simulations were performed to predict the binding
free energies of host–guest systems. In this MD study, MMPBSA/MMGBSA approaches were
used, and the effects of PB and GB solvation models, RESP, AM1-bcc partial charges, and N-
mode solute entropy were considered to determine the best route for the prediction of binding
energies. Simulations with the PB solvation models led to better agreement with experiment than
GB solvation models, which resulted in lower RMSE values and higher correlation coefficients.
The comparison between the two charge methods showed that RESP charges led to
quantitatively slightly better results with a lower RMSE value. However, r2 and τ values for the
predictions made with RESP and AM1-bcc charges were similar. As the complexity and the cost
required for obtaining RESP charges were also considered, using AM1-bcc charges may be
advantageous for systems of increased size. Comparison of the binding energy predictions with
and without N-mode solute entropies showed that, N-mode calculations overestimate the solute
entropy difference, and may led to unfeasible binding energies. In contrast, qualitatively and
quantitatively better results can be obtained by using neglected solute entropies with a correction
on the predicted results using a similar dataset.
    For QM simulations, two strategies were adopted to compute the binding free energy:
    (i)     Using the non-double-hybrid part of the functional B2PLYP-D3 with the SMD
            implicit solvent model, with single-point energy differences (complex–host– guest) as
                                                  97


            the final values. Our predictions yield substantially higher accuracy than SAMPL6
            QM predictions.
    (ii)    For the exoOA-G2 system, we performed a combined approach including geometry
            optimizations, frequency calculations within solvent models for the guest, host, and
            complex, scaled thermochemistry corrections, the standard state correction, and basis
            set extrapolation with the recently developed functional ωB97M-V. The prediction
            agrees well with the experimental value.
    DFT studies were performed on SAMPL7 guest–host binding systems. Since the host system
presents instabilities for a spin restricted Hartree–Fock wavefunction, determining binding
energies from orbital optimizations in correlated approaches may be considered92,93, especially as
local correlated coupled cluster methods have been adopted in SAMPL4 and 5 host-guest
systems36,37, for which chemical accuracy (1–2 kcal mol−1) has not been reached. Considering
the thermochemical correction, it has been suggested to adopt rotation-type formalism for low
vibrational frequency contributions.94,95 Li et al., showed that low-vibration corrections led to
better agreement with experimental data.95 However, scaling factors have not been explored or
suggested on this type of approaches which may be a future interest. Moreover, the MP2 part of
B2PLYP-D3 approach may be evaluated with density-fitting, incorporating additional
correlation, in order to improve predictions.
    In summary, the routes investigated in this study provided better results than in our previous
SAMPL6 efforts for both MD and QM simulations. In general, the exoOA host–guest systems
correlated better with experiment in comparison to OA. Considering the MD study, using the
RESP partial charges, which were not used for SAMPL6, along with a linear correction led to
better correlation and accuracy for the SAMPL7 approach. In addition, the inclusion of a linear ft
                                                98


correction, yielded very accurate predictions, however the approach is limited to the availability
of similar types of structures.
    For QM predictions, higher accuracy for the binding energies can be achieved with a number
of approaches. A geometry optimization was initially performed from generated poses, which
guaranteed a minimum at the potential energy surface. The additional single-point calculations at
B2PLYP-D3 level rendered more accurate binding energies than our SAMPL6 approach, and did
not overestimate the binding energies. On the other hand, low correlation was obtained for the
OA-systems. The binding energy obtained for the G2 ligand for OA and exoOA can be attributed
the susceptibility of the DFT functional B2PLYP to strongly electronegative atoms, such as
chlorine. The utility of the newly considered approaches (B2PLYP-D3 and combined method),
should be examined for a broader range of systems. Vibrational corrections and explicit solvation
models could also be considered.
                                                99


APPENDIX
   100


   Table 4. 3 SAMPL6-OA host guest binding data used during linear correction. Units are in in
kcal mol-1.
                       Guest       Predicted Binding           Experimental Binding
                        G1                 -12.60                     -5.68
                        G2                 -10.98                     -4.65
         SAMPL 6
                        G3                 -17.18                     -8.38
                        G4                 -9.93                      -5.18
                        G5                 -17.32                     -7.11
                        G6                 -8.67                      -4.59
                        G7                 -10.49                     -4.97
                        G8                 -11.02                     -6.22
   Table 4. 4 Calculated binding energies using B2PLYP-D3 vs experimental binding energies,
using cc-pV(D+d)Z and cc-pV(T+d)Z. The geometry was optimized in the gas phase. Values
shown are in kcal mol-1.
                               Experimental                    B2PLYP-D3
            Complex
                                 Binding                DZ                     TZ
               OA-G2           -4.97 ± 0.02            -7.47                  -25.33
          exoOA-G2             -1.31 ± 0.02            -2.21                  1.16
                                              101


    Table 4. 5 Root mean square errors (RMSE), mean absolute errors (MAE), mean errors
(ME), r2 correlation coefficients, slope of the correlation plots (m), and Kendall’s Tau (τ) rank
correlation coefficients for OA and exoOA for the ranked submission. Values shown are in kcal
mol-1.
                                               OA          exoOA
                                RMSE           2.99          4.76
                                 MAE           2.22          3.90
                                  ME           0.39          3.50
                                   r2         0.093          0.72
                                   m           0.72          1.97
                                   τ          0.077          0.59
                                               102


   Figure 4. 8 Geometry optimized structures of OA and exoOA host/guess with B3PW91-
D3/cc-pVDZ.
                                          103


Figure 4. 9 RMSD plots of exoOA-G1 and exoOA-G2 MD simulations.
Figure 4. 10 RMSD plots of exoOA-G3 and exoOA-G4 MD simulations.
                                       104


Figure 4. 11 RMSD plots of exoOA-G5 and exoOA-G6 MD simulations.
Figure 4. 12 RMSD plots of exoOA-G7 and exoOA-G8 MD simulations.
                                      105


Figure 4. 13 RMSD plots of OA-G1 and OA-G2 MD simulations.
Figure 4. 14 RMSD plots of OA-G3 and OA-G4 MD simulations.
                                      106


Figure 4. 15 RMSD plots of OA-G5 and OA-G6 MD simulations.
Figure 4. 16 RMSD plots of OA-G7 and OA-G8 MD simulations.
                                      107


   Figure 4. 17 Correlation plot of SAMPL6-OA host-guest binding. The x-axis provides the
experimental binding energies and the y-axis contains binding energies predicted by RESP-
MMPBSA method without solute entropies. A trendline equation is used to correct the predicted
SAMPL7 binding energies.
                                           108


REFERENCES
    109


                                       REFERENCES
(1)  Alonso, H.; Bliznyuk, A. A.; Gready, J. E. Combining Docking and Molecular Dynamic
     Simulations in Drug Design. Med. Res. Rev. 2006, 26 (5), 531–568.
     https://doi.org/10.1002/med.20067.
(2)  Brown, F. K.; Sherer, E. C.; Johnson, S. A.; Holloway, M. K.; Sherborne, B. S. The
     Evolution of Drug Design at Merck Research Laboratories. J. Comput. Aided. Mol. Des.
     2017, 31 (3), 255–266. https://doi.org/10.1007/s10822-016-9993-1.
(3)  Cerchietti, L. C.; Ghetu, A. F.; Zhu, X.; Da Silva, G. F.; Zhong, S.; Matthews, M.;
     Bunting, K. L.; Polo, J. M.; Farès, C.; Arrowsmith, C. H.; Yang, S. N.; Garcia, M.; Coop,
     A.; MacKerell, A. D.; Privé, G. G.; Melnick, A. A Small-Molecule Inhibitor of BCL6
     Kills DLBCL Cells In Vitro and In Vivo. Cancer Cell 2010, 17 (4), 400–411.
     https://doi.org/10.1016/j.ccr.2009.12.050.
(4)  Jiang, X.; Dulubova, I.; Reisman, S. A.; Hotema, M.; Lee, C. Y. I.; Liu, L.; McCauley, L.;
     Trevino, I.; Ferguson, D. A.; Eken, Y.; Wilson, A. K.; Wigley, W. C.; Visnick, M. A
     Novel Series of Cysteine-Dependent, Allosteric Inverse Agonists of the Nuclear Receptor
     RORγt.       Bioorganic      Med.     Chem.     Lett.     2020,     30     (6),   126967.
     https://doi.org/10.1016/j.bmcl.2020.126967.
(5)  Muddana, H. S.; Daniel Varnado, C.; Bielawski, C. W.; Urbach, A. R.; Isaacs, L.; Geballe,
     M. T.; Gilson, M. K. Blind Prediction of Host–Guest Binding Affinities: A New SAMPL3
     Challenge. J. Comput. Aided. Mol. Des. 2012, 26 (5), 475–487.
     https://doi.org/10.1007/s10822-012-9554-1.
(6)  Muddana, H. S.; Fenley, A. T.; Mobley, D. L.; Gilson, M. K. The SAMPL4 Host–Guest
     Blind Prediction Challenge: An Overview. J. Comput. Aided. Mol. Des. 2014, 28 (4),
     305–317. https://doi.org/10.1007/s10822-014-9735-1.
(7)  Yin, J.; Henriksen, N. M.; Slochower, D. R.; Shirts, M. R.; Chiu, M. W.; Mobley, D. L.;
     Gilson, M. K. Overview of the SAMPL5 Host–Guest Challenge: Are We Doing Better? J.
     Comput. Aided. Mol. Des. 2017, 31 (1), 1–19. https://doi.org/10.1007/s10822-016-9974-4.
(8)  Nicholls, A.; Wlodek, S.; Grant, J. A. The SAMP1 Solvation Challenge: Further Lessons
     Regarding the Pitfalls of Parametrization. J. Phys. Chem. B 2009, 113 (14), 4521–4532.
     https://doi.org/10.1021/jp806855q.
(9)  Geballe, M. T.; Skillman, a. G.; Nicholls, A.; Guthrie, J. P.; Taylor, P. J. The SAMPL2
     Blind Prediction Challenge: Introduction and Overview. J. Comput. Aided. Mol. Des.
     2010, 24 (4), 259–279. https://doi.org/10.1007/s10822-010-9350-8.
(10) Rizzi, A.; Murkli, S.; McNeill, J. N.; Yao, W.; Sullivan, M.; Gilson, M. K.; Chiu, M. W.;
     Isaacs, L.; Gibb, B. C.; Mobley, D. L.; Chodera, J. D. Overview of the SAMPL6 Host-
     Guest Binding Affinity Prediction Challenge. 2018. https://doi.org/10.1101/371724.
                                              110


(11) Işık, M.; Bergazin, T. D.; Fox, T.; Rizzi, A.; Chodera, J. D.; Mobley, D. L. Assessing the
     Accuracy of Octanol–Water Partition Coefficient Predictions in the SAMPL6 Part II
     Log P      Challenge;     Springer    International    Publishing,   2020;      Vol.   34.
     https://doi.org/10.1007/s10822-020-00295-0.
(12) Işık, M.; Levorse, D.; Rustenburg, A. S.; Ndukwe, I. E.; Wang, H.; Wang, X.; Reibarkh,
     M.; Martin, G. E.; Makarov, A. A.; Mobley, D. L.; Rhodes, T.; Chodera, J. D. PK
     a Measurements for the SAMPL6 Prediction Challenge for a Set of Kinase Inhibitor-like
     Fragments. J. Comput. Aided. Mol. Des. 2018, 32 (10), 1117–1138.
     https://doi.org/10.1007/s10822-018-0168-0.
(13) Gibb, C. L. D.; Gibb, B. C. Binding of Cyclic Carboxylates to Octa-Acid Deep-Cavity
     Cavitand.     J.    Comput.     Aided.    Mol.     Des.    2014,   28      (4),   319–325.
     https://doi.org/10.1007/s10822-013-9690-2.
(14) Zwanzig, R. W. High‐temperature Equation of State by a Perturbation Method. I.
     Nonpolar       Gases.     J.    Chem.       Phys.    1954,      22     (8),     1420–1426.
     https://doi.org/10.1063/1.1740409.
(15) Jiang, W.; Hodoscek, M.; Roux, B. Computation of Absolute Hydration and Binding Free
     Energy with Free Energy Perturbation Distributed Replica-Exchange Molecular
     Dynamics.       J.    Chem.     Theory     Comput.      2009,    5    (10),     2583–2588.
     https://doi.org/10.1021/ct900223z.
(16) Mitchell, M. J.; McCammon, J. A. Free Energy Difference Calculations by
     Thermodynamic Integration: Difficulties in Obtaining a Precise Value. J. Comput. Chem.
     1991, 12 (2), 271–275. https://doi.org/10.1002/jcc.540120218.
(17) Miller, B. R.; McGee, T. D.; Swails, J. M.; Homeyer, N.; Gohlke, H.; Roitberg, A. E.
     MMPBSA.Py: An Efficient Program for End-State Free Energy Calculations. J. Chem.
     Theory Comput. 2012, 8 (9), 3314–3321. https://doi.org/10.1021/ct300418h.
(18) Procacci, P.; Guarrasi, M.; Guarnieri, G. SAMPL6 Host–Guest Blind Predictions Using a
     Non Equilibrium Alchemical Approach. J. Comput. Aided. Mol. Des. 2018, 32 (10), 965–
     982. https://doi.org/10.1007/s10822-018-0151-9.
(19) Frank, L.; Nupur, S.; Zheng, Z.; Merz, K. M. Detailed Potential of Mean Force Studies on
     Host – Guest Systems from the SAMPL6 Challenge. J. Comput. Aided. Mol. Des. 2018,
     32 (10), 1013–1026. https://doi.org/10.1007/s10822-018-0153-7.
(20) Caldararu, O.; Olsson, M. A.; Misini Ignjatović, M.; Wang, M.; Ryde, U. Binding Free
     Energies in the SAMPL6 Octa-Acid Host–Guest Challenge Calculated with MM and QM
     Methods. J. Comput. Aided. Mol. Des. 2018, 32 (10), 1027–1046.
     https://doi.org/10.1007/s10822-018-0158-2.
(21) Nishikawa, N.; Han, K.; Wu, X.; Tofoleanu, F.; Brooks, B. R. Comparison of the
     Umbrella Sampling and the Double Decoupling Method in Binding Free Energy
     Predictions for SAMPL6 Octa-Acid Host–Guest Challenges. J. Comput. Aided. Mol. Des.
                                              111


     2018, 32 (10), 1075–1086. https://doi.org/10.1007/s10822-018-0166-2.
(22) Laury, M. L.; Wang, Z.; Gordon, A. S.; Ponder, J. W. Absolute Binding Free Energies for
     the SAMPL6 Cucurbit[8]Uril Host–Guest Challenge via the AMOEBA Polarizable Force
     Field.     J.   Comput.     Aided.    Mol.    Des.    2018,    32    (10),     1087–1095.
     https://doi.org/10.1007/s10822-018-0147-5.
(23) Eken, Y.; Patel, P.; Díaz, T.; Jones, M. R.; Wilson, A. K. SAMPL6 Host–Guest
     Challenge: Binding Free Energies via a Multistep Approach. J. Comput. Aided. Mol. Des.
     2018, 32 (10), 1097–1115. https://doi.org/10.1007/s10822-018-0159-1.
(24) Dixon, T.; Lotz, S. D.; Dickson, A. Predicting Ligand Binding Affinity Using On- and
     off-Rates for the SAMPL6 SAMPLing Challenge. J. Comput. Aided. Mol. Des. 2018, 32
     (10), 1001–1012. https://doi.org/10.1007/s10822-018-0149-3.
(25) Hudson, P. S.; Han, K.; Woodcock, H. L.; Brooks, B. R. Force Matching as a Stepping
     Stone to QM/MM CB[8] Host/Guest Binding Free Energies: A SAMPL6 Cautionary Tale.
     J. Comput. Aided. Mol. Des. 2018, 32 (10), 983–999. https://doi.org/10.1007/s10822-018-
     0165-3.
(26) Sun, H.; Duan, L.; Chen, F.; Liu, H.; Wang, Z.; Pan, P.; Zhu, F.; Zhang, J. Z. H.; Hou, T.
     Assessing the Performance of MM/PBSA and MM/GBSA Methods. 7. Entropy Effects on
     the Performance of End-Point Binding Free Energy Calculation Approaches. 2018,
     14450–14460. https://doi.org/10.1039/c7cp07623a.
(27) Hamaguchi, N.; Fusti-Molnar, L.; Wlodek, S. Force-Field and Quantum-Mechanical
     Binding Study of Selected SAMPL3 Host-Guest Complexes. J. Comput. Aided. Mol. Des.
     2012, 26 (5), 577–582. https://doi.org/10.1007/s10822-012-9553-2.
(28) Becke, A. D. Density-Functional Thermochemistry. III. The Role of Exact Exchange. J.
     Chem. Phys. 1993, 98 (7), 5648. https://doi.org/10.1063/1.464913.
(29) Lee, C.; Yang, eitao; Parr, R. G. Development of the Colic-Salvetti Correlation-Energy
     Formula into a Functional of the Electron Density; Vol. 37.
(30) Stephens, P. J.; Devlin, F. J.; Chabalowski, C. F.; Frisch, M. J. Ab Initio Calculation of
     Vibrational Absorption and Circular Dichroism Spectra Using Density Functional Force
     Fields. J. Phys. Chem. 1994, 98 (45), 11623–11627. https://doi.org/10.1021/j100096a001.
(31) Sure, R.; Antony, J.; Grimme, S. Blind Prediction of Binding Affinities for Charged
     Supramolecular Host-Guest Systems: Achievements and Shortcomings of DFT-D3. J.
     Phys. Chem. B 2014, 118 (12), 3431–3440. https://doi.org/10.1021/jp411616b.
(32) Grimme, S.; Antony, J.; Ehrlich, S.; Krieg, H. A Consistent and Accurate Ab Initio
     Parametrization of Density Functional Dispersion Correction (DFT-D) for the 94
     Elements       H-Pu.     J.      Chem.     Phys.     2010,      132     (15),      154104.
     https://doi.org/10.1063/1.3382344.
                                             112


(33) Grimme, S.; Ehrlich, S.; Goerigk, L. Effect of the Damping Function in Dispersion
     Corrected Density Functional Theory. J. Comput. Chem. 2011, 32 (7), 1456–1465.
     https://doi.org/10.1002/jcc.21759.
(34) Becke, A. D.; Johnson, E. R. A Density-Functional Model of the Dispersion Interaction. J.
     Chem. Phys. 2005, 123 (15). https://doi.org/10.1063/1.2065267.
(35) Sure, R.; Grimme, S. Corrected Small Basis Set Hartree-Fock Method for Large Systems.
     J. Comput. Chem. 2013, 34 (19), 1672–1685. https://doi.org/10.1002/jcc.23317.
(36) Mikulskis, P.; Cioloboc, D.; Andrejić, M.; Khare, S.; Brorsson, J.; Genheden, S.; Mata, R.
     A.; Söderhjelm, P.; Ryde, U. Free-Energy Perturbation and Quantum Mechanical Study of
     SAMPL4 Octa-Acid Host-Guest Binding Energies. J. Comput. Aided. Mol. Des. 2014, 28
     (4), 375–400. https://doi.org/10.1007/s10822-014-9739-x.
(37) Caldararu, O.; Olsson, M. A.; Riplinger, C.; Neese, F.; Ryde, U. Binding Free Energies in
     the SAMPL5 Octa-Acid Host–Guest Challenge Calculated with DFT-D3 and CCSD(T). J.
     Comput. Aided. Mol. Des. 2017, 31 (1), 87–106. https://doi.org/10.1007/s10822-016-
     9957-5.
(38) Riplinger, C.; Pinski, P.; Becker, U.; Valeev, E. F.; Neese, F. Sparse Maps - A Systematic
     Infrastructure for Reduced-Scaling Electronic Structure Methods. II. Linear Scaling
     Domain Based Pair Natural Orbital Coupled Cluster Theory. J. Chem. Phys. 2016, 144
     (2). https://doi.org/10.1063/1.4939030.
(39) Grimme, S. Semiempirical GGA-Type Density Functional Constructed With A Long-
     Range Dispersion Correction. J. Comput. Chem. 2006, 27 (15), 1787–1799.
     https://doi.org/10.1002/jcc.20495.
(40) Perdew, J. P.; Wang, Y. Accurate and Simple Analytic Representation of the Electron-Gas
     Correlation      Energy.     Phys.    Rev.    B      1992,    45     (23),   13244–13249.
     https://doi.org/10.1103/PhysRevB.45.13244.
(41) Perdew, J. P.; Chevary, J. A.; Vosko, S. H.; Jackson, K. A.; Pederson, M. R.; Singh, D. J.;
     Fiolhais, C. Atoms, Molecules, Solids, and Surfaces: Applications of the Generalized
     Gradient Approximation for Exchange and Correlation. Phys. Rev. B 1992, 46 (11), 6671–
     6687. https://doi.org/10.1103/PhysRevB.46.6671.
(42) Marenich, A. V.; Cramer, C. J.; Truhlar, D. G. Universal Solvation Model Based on
     Solute Electron Density and on a Continuum Model of the Solvent Defined by the Bulk
     Dielectric Constant and Atomic Surface Tensions. J. Phys. Chem. B 2009, 113 (18),
     6378–6396. https://doi.org/10.1021/jp810292n.
(43) Grimme, S. Semiempirical Hybrid Density Functional with Perturbative Second-Order
     Correlation. J. Chem. Phys. 2006, 124 (3). https://doi.org/10.1063/1.2148954.
(44) Goerigk, L.; Grimme, S. Efficient and Accurate Double-Hybrid-Meta-GGA Density
     Functionals- Evaluation with the Extended GMTKN30 Database for General Main Group
                                              113


     Thermochemistry, Kinetics, and Noncovalent Interactions. J. Chem. Theory Comput.
     2011, 7 (2), 291–309. https://doi.org/10.1021/ct100466k.
(45) Mardirossian, N.; Head-Gordon, M. ω B97M-V: A Combinatorially Optimized, Range-
     Separated Hybrid, Meta-GGA Density Functional with VV10 Nonlocal Correlation. J.
     Chem. Phys. 2016, 144 (21). https://doi.org/10.1063/1.4952647.
(46) Mardirossian, N.; Head-Gordon, M. Thirty Years of Density Functional Theory in
     Computational Chemistry: An Overview and Extensive Assessment of 200 Density
     Functionals.        Mol.       Phys.        2017,        115        (19),     2315–2372.
     https://doi.org/10.1080/00268976.2017.1333644.
(47) Labute, P. Protonate3D: Assignment of Ionization States and Hydrogen Coordinates to
     Macromolecular Structures. Proteins Struct. Funct. Bioinforma. 2009, 75 (1), 187–205.
     https://doi.org/10.1002/prot.22234.
(48) Chemical Computing Group Inc. Molecular Operating Environment (MOE). Montreal
     2016.
(49) Anthony, W. J.; Bender, A.; Kaya, T.; Clemons, P. A. Alpha Shapes Applied to Molecular
     Shape Characterization Exhibit Novel Properties Compared to Established Shape
     Descriptors.     J.    Chem.      Inf.    Model.      2009,     49    (10),   2231–2241.
     https://doi.org/10.1021/ci900190z.
(50) Hoffmann, R. An Extended Hückel Theory. I. Hydrocarbons. J. Chem. Phys. 1963, 39 (6),
     1397–1412. https://doi.org/10.1063/1.1734456.
(51) Hornak, V.; Abel, R.; Okur, A.; Strockbine, B.; Roitberg, A.; Simmerling, C. Comparison
     of Multiple Amber Force Fields and Development of Improved Protein Backbone
     Parameters. Proteins 2006, 65 (3), 712–725. https://doi.org/10.1002/prot.21123.
(52) Wang, J.; Wolf, R. M.; Caldwell, J. W.; Kollman, P. A.; Case, D. A. Development and
     Testing of a General Amber Force Field. J. Comput. Chem. 2004, 25 (9), 1157–1174.
     https://doi.org/10.1002/jcc.20035.
(53) Corbeil, C. R.; Williams, C. I.; Labute, P. Variability in Docking Success Rates Due to
     Dataset Preparation. J. Comput. Aided. Mol. Des. 2012, 26 (6), 775–786.
     https://doi.org/10.1007/s10822-012-9570-1.
(54) Jakalian, A.; Jack, D. B.; Bayly, C. I. Fast, Efficient Generation of High-Quality Atomic
     Charges. AM1-BCC Model: II. Parameterization and Validation. J. Comput. Chem. 2002,
     23 (16), 1623–1641. https://doi.org/10.1002/jcc.10128.
(55) Vanquelef, E.; Simon, S.; Marquant, G.; Garcia, E.; Klimerak, G.; Delepine, J. C.;
     Cieplak, P.; Dupradeau, F. Y. R.E.D. Server: A Web Service for Deriving RESP and ESP
     Charges and Building Force Field Libraries for New Molecules and Molecular Fragments.
     Nucleic Acids Res. 2011, 39 (SUPPL. 2), 511–517. https://doi.org/10.1093/nar/gkr288.
                                              114


(56) Bayly, C. I.; Cieplak, P.; Cornell, W. D.; Kollman, P. A. A Well-Behaved Electrostatic
     Potential Based Method Using Charge Restraints for Deriving Atomic Charges: The
     RESP        Model.     J.     Phys.    Chem.      1993,    97       (40),     10269–10280.
     https://doi.org/10.1021/j100142a004.
(57) Case, D. A.; Betz, R. M.; Botello-Smith, W.; Cerutti, D. S.; Cheatham III, T. E.; Darden,
     T. A.; Duke, R. E.; Giese, T. J.; Gohlke, H.; Goetz, A. W.; Homeyer, N.; Izadi, S.;
     Janowski, P.; Kaus, J.; Kovalenko, A.; Lee, T. S.; LeGrand, S.; Li, P.; Lin, C.; Luchko, T.;
     Luo, R.; Madej, B.; York, D. M.; Kollman, P. A. Amber 18. 2018.
     https://doi.org/10.1002/jcc.23031.
(58) Joung, I. S.; Cheatham, T. E. Determination of Alkali and Halide Monovalent Ion
     Parameters for Use in Explicitly Solvated Biomolecular Simulations. J. Phys. Chem. B
     2008, 112 (30), 9020–9041. https://doi.org/10.1021/jp8001614.
(59) Horn, H. W.; Swope, W. C.; Pitera, J. W.; Madura, J. D.; Dick, T. J.; Hura, G. L.; Head-
     Gordon, T. Development of an Improved Four-Site Water Model for Biomolecular
     Simulations: TIP4P-Ew. J. Chem. Phys. 2004, 120 (20), 9665–9678.
     https://doi.org/10.1063/1.1683075.
(60) Döpke, M. F.; Moultos, O. A.; Hartkamp, R. On the Transferability of Ion Parameters to
     the TIP4P/2005 Water Model Using Molecular Dynamics Simulations. J. Chem. Phys.
     2020, 152 (2). https://doi.org/10.1063/1.5124448.
(61) Loncharich, R. J.; Brooks, B. R.; Pastor, R. W. Langevin Dynamics of Peptides: The
     Frictional Dependence of Isomerization Rates of N‐acetylalanyl‐N′‐methylamide.
     Biopolymers 1992, 32 (5), 523–535. https://doi.org/10.1002/bip.360320508.
(62) Berendsen, H. J. C.; Postma, J. P. M.; Van Gunsteren, W. F.; Dinola, A.; Haak, J. R.
     Molecular Dynamics with Coupling to an External Bath. J. Chem. Phys. 1984, 81 (8),
     3684–3690. https://doi.org/10.1063/1.448118.
(63) Cerutti, D. S.; Case, D. A. Multi-Level Ewald: A Hybrid Multigrid/Fast Fourier
     Transform Approach to the Electrostatic Particle-Mesh Problem. J. Chem. Theory
     Comput. 2010, 6 (2), 443–458. https://doi.org/10.1021/ct900522g.
(64) Ryckaert, J. P.; Ciccotti, G.; Berendsen, H. J. C. Numerical Integration of the Cartesian
     Equations of Motion of a System with Constraints: Molecular Dynamics of n-Alkanes. J.
     Comput. Phys. 1977, 23 (3), 327–341. https://doi.org/10.1016/0021-9991(77)90098-5.
(65) Onufriev, A.; Bashford, D.; Case, D. A. Exploring Protein Native States and Large-Scale
     Conformational Changes with a Modified Generalized Born Model. Proteins Struct.
     Funct. Genet. 2004, 55 (2), 383–394. https://doi.org/10.1002/prot.20033.
(66) Hou, T.; Wang, J.; Li, Y.; Wang, W. Assessing the Performance of the MM/PBSA and
     MM/GBSA Methods. 1. The Accuracy of Binding Free Energy Calculations Based on
     Molecular Dynamics Simulations. J. Chem. Inf. Model. 2011, 51 (1), 69–82.
     https://doi.org/10.1021/ci100275a.
                                              115


(67) Frisch, M. J.; Trucks, G. W.; Schlegel, H. B.; Scuseria, G. E.; Robb, M. A.; Cheeseman, J.
     R.; Scalmani, G.; Barone, V.; Petersson, G. A.; Nakatsuji, H.; Li, X.; Caricato, M.;
     Marenich, A. V; Bloino, J.; Janesko, B. G.; Gomperts, R.; Mennucci, B.; Hratchian, H. P.;
     Ortiz, J. V; Izmaylov, A. F.; Sonnenberg, J. L.; Williams-Young, D.; Ding, F.; Lipparini,
     F.; Egidi, F.; Goings, J.; Peng, B.; Petrone, A.; Henderson, T.; Ranasinghe, D.;
     Zakrzewski, V. G.; Gao, J.; Rega, N.; Zheng, G.; Liang, W.; Hada, M.; Ehara, M.; Toyota,
     K.; Fukuda, R.; Hasegawa, J.; Ishida, M.; Nakajima, T.; Honda, Y.; Kitao, O.; Nakai, H.;
     Vreven, T.; Throssell, K.; Montgomery Jr., J. A.; Peralta, J. E.; Ogliaro, F.; Bearpark, M.
     J.; Heyd, J. J.; Brothers, E. N.; Kudin, K. N.; Staroverov, V. N.; Keith, T. A.; Kobayashi,
     R.; Normand, J.; Raghavachari, K.; Rendell, A. P.; Burant, J. C.; Iyengar, S. S.; Tomasi,
     J.; Cossi, M.; Millam, J. M.; Klene, M.; Adamo, C.; Cammi, R.; Ochterski, J. W.; Martin,
     R. L.; Morokuma, K.; Farkas, O.; Foresman, J. B.; Fox, D. J. Gaussian 16 Revision A.03.
     2016.
(68) Dunning, T. H. Gaussian Basis Sets for Use in Correlated Molecular Calculations. I. The
     Atoms Boron through Neon and Hydrogen. J. Chem. Phys. 1989, 90 (2), 1007–1023.
     https://doi.org/10.1063/1.456153.
(69) Smith, D. G. A.; Burns, L. A.; Patkowski, K.; Sherrill, C. D. Revised Damping Parameters
     for the D3 Dispersion Correction to Density Functional Theory. J. Phys. Chem. Lett. 2016,
     7 (12), 2197–2203. https://doi.org/10.1021/acs.jpclett.6b00780.
(70) Kendall, R. A.; Dunning, T. H.; Harrison, R. J. Electron Affinities of the First-Row Atoms
     Revisited. Systematic Basis Sets and Wave Functions. J. Chem. Phys. 1992, 96 (9), 6796–
     6806. https://doi.org/10.1063/1.462569.
(71) Woon, D. E.; Dunning, T. H. Gaussian Basis Sets for Use in Correlated Molecular
     Calculations. III. The Atoms Aluminum through Argon. J. Chem. Phys. 1993, 98 (2),
     1358–1371. https://doi.org/10.1063/1.464303.
(72) Dunning, J.; Peterson, K. A.; Wilson, A. K. Gaussian Basis Sets for Use in Correlated
     Molecular Calculations. X. The Atoms Aluminum through Argon Revisited. J. Chem.
     Phys. 2001, 114 (21), 9244–9253. https://doi.org/10.1063/1.1367373.
(73) Hariharan, P. C.; Pople, J. A. The Influence of Polarization Functions on Molecular
     Orbital Hydrogenation Energies. Theor. Chim. Acta 1973, 28 (3), 213–222.
     https://doi.org/10.1007/BF00533485.
(74) Hehre, W. J.; Ditchfield, R.; Pople, J. A. Self — Consistent Molecular Orbital Methods .
     XII . Further Extensions of Gaussian — Type Basis Sets for Use in Molecular Orbital
     Studies of Organic Molecules Published by the AIP Publishing Articles You May Be
     Interested in Self ‐ Consistent Molecular Or. J.Chem. Phys. 1972, 56 (5), 2257–2261.
     https://doi.org/10.1063/1.1677527.
(75) Tomasi, J.; Mennucci, B.; Cammi, R. Quantum Mechanical Continuum Solvation Models.
     Chem. Rev. 2005, 105 (8), 2999–3093. https://doi.org/10.1021/cr9904009.
(76) Merrick, J. P.; Moran, D.; Radom, L. An Evaluation of Harmonic Vibrational Frequency
                                              116


     Scale Factors.        J.     Phys.  Chem.      A   2007,     111     (45),    11683–11700.
     https://doi.org/10.1021/jp073974n.
(77) Jensen, J. H. Predicting Accurate Absolute Binding Energies in Aqueous Solution:
     Thermodynamic Considerations for Electronic Structure Methods. Phys. Chem. Chem.
     Phys. 2015, 17 (19), 12441–12451. https://doi.org/10.1039/c5cp00628g.
(78) Klamt, A.; Moya, C.; Palomar, J. A Comprehensive Comparison of the IEFPCM and
     SS(V)PE Continuum Solvation Methods with the COSMO Approach. J. Chem. Theory
     Comput. 2015, 11 (9), 4220–4225. https://doi.org/10.1021/acs.jctc.5b00601.
(79) Riplinger, C.; Neese, F. An Efficient and near Linear Scaling Pair Natural Orbital Based
     Local     Coupled      Cluster     Method.    J.   Chem.     Phys.      2013,    138    (3).
     https://doi.org/10.1063/1.4773581.
(80) Neese, F.; Valeev, E. F. Revisiting the Atomic Natural Orbital Approach for Basis Sets:
     Robust Systematic Basis Sets for Explicitly Correlated and Conventional Correlated Ab
     Initio    Methods?       J.   Chem.     Theory    Comput.     2011,      7    (1),   33–43.
     https://doi.org/10.1021/ct100396y.
(81) Jensen, F. Polarization Consistent Basis Sets. II. Estimating the Kohn–Sham Basis Set
     Limit. J. Chem. Phys. 2002, 116 (17), 7372–7379. https://doi.org/10.1063/1.1465405.
(82) Stück, D.; Baker, T. A.; Zimmerman, P.; Kurlancheek, W.; Head-Gordon, M. On the
     Nature of Electron Correlation in C60. J. Chem. Phys. 2011, 135 (19).
     https://doi.org/10.1063/1.3661158.
(83) Bauernschmitt, R.; Ahlrichs, R. Stability Analysis for Solutions of the Closed Shell Kohn-
     Sham       Equation.      J.    Chem.      Phys.    1996,     104      (22),    9047–9052.
     https://doi.org/10.1063/1.471637.
(84) Roos, B. O.; Taylor, P. R.; Sigbahn, P. E. M. A Complete Active Space SCF Method
     (CASSCF) Using a Density Matrix Formulated Super-CI Approach. Chem. Phys. 1980, 48
     (2), 157–173. https://doi.org/10.1016/0301-0104(80)80045-0.
(85) Lee, T. J.; Taylor, P. R. A Diagnostic for Determining the Quality of Single‐reference
     Electron Correlation Methods. Int. J. Quantum Chem. 1989, 36 (23 S), 199–207.
     https://doi.org/10.1002/qua.560360824.
(86) Lee, T. J.; Rice, J. E.; Scuseria, G. E.; Schaefer, H. F. Theoretical Investigations of
     Molecules Composed Only of Fluorine, Oxygen and Nitrogen: Determination of the
     Equilibrium Structures of FOOF, (NO)2 and FNNF and the Transition State Structure for
     FNNF Cis-Trans Isomerization. Theor. Chim. Acta 1989, 75 (2), 81–98.
     https://doi.org/10.1007/BF00527711.
(87) Lee, M. C.; Yang, R.; Duan, Y. Comparison between Generalized-Born and Poisson-
     Boltzmann Methods in Physics-Based Scoring Functions for Protein Structure Prediction.
     J. Mol. Model. 2005, 12 (1), 101–110. https://doi.org/10.1007/s00894-005-0013-y.
                                              117


(88) Sure, R.; Grimme, S. Comprehensive Benchmark of Association (Free) Energies of
     Realistic Host−Guest Complexes. 2015. https://doi.org/10.1021/acs.jctc.5b00296.
(89) Weigend, F.; Ahlrichs, R. Balanced Basis Sets of Split Valence, Triple Zeta Valence and
     Quadruple Zeta Valence Quality for H to Rn: Design and Assessment of Accuracy. Phys.
     Chem. Chem. Phys. 2005, 7 (18), 3297–3305. https://doi.org/10.1039/b508541a.
(90) Lee, J.; Head-Gordon, M. Distinguishing Artificial and Essential Symmetry Breaking in a
     Single Determinant: Approach and Application to the C60, C36, and C20 Fullerenes.
     Phys. Chem. Chem. Phys. 2019, 21 (9), 4763–4778. https://doi.org/10.1039/c8cp07613h.
(91) Sherrill, C. D.; Lee, M. S.; Head-Gordon, M. On the Performance of Density Functional
     Theory for Symmetry-Breaking Problems. Chem. Phys. Lett. 1999, 302 (5–6), 425–430.
     https://doi.org/10.1016/S0009-2614(99)00206-7.
(92) Sherrill, C. D.; Krylov, A. I.; Byrd, E. F. C.; Head-Gordon, M. Energies and Analytic
     Gradients for a Coupled-Cluster Doubles Model Using Variational Brueckner Orbitals:
     Application to Symmetry Breaking in O4+. J. Chem. Phys. 1998, 109 (11), 4171–4181.
     https://doi.org/10.1063/1.477023.
(93) Lee, J.; Head-Gordon, M. Regularized Orbital-Optimized Second-Order Møller-Plesset
     Perturbation Theory: A Reliable Fifth-Order-Scaling Electron Correlation Model with
     Orbital Energy Dependent Regularizers. J. Chem. Theory Comput. 2018, 14 (10), 5203–
     5219. https://doi.org/10.1021/acs.jctc.8b00731.
(94) Grimme, S. Supramolecular Binding Thermodynamics by Dispersion-Corrected Density
     Functional Theory. Chem. - A Eur. J. 2012, 18 (32), 9955–9964.
     https://doi.org/10.1002/chem.201200497.
(95) Li, Y. P.; Gomes, J.; Sharada, S. M.; Bell, A. T.; Head-Gordon, M. Improved Force-Field
     Parameters for QM/MM Simulations of the Energies of Adsorption for Molecules in
     Zeolites and a Free Rotor Correction to the Rigid Rotor Harmonic Oscillator Model for
     Adsorption Enthalpies. J. Phys. Chem. C 2015, 119 (4), 1840–1850.
     https://doi.org/10.1021/jp509921r.
                                              118


                                CHAPTER FIVE
Chemoenzymatic Synthesis of Glycopeptides Bearing Rare N-Glycan Sequences with or
without Bisecting GlcNAc
                                       119


About this chapter: This chapter is reprinted from Yang, W.; Ramadan, S.; Orwenyo, J.;
Kakeshpour, T.; Diaz, T.; Eken, Y.; Sanda, M.; Jackson, J. E.; Wilson, A. K.; Huang, X.
Chemoenzymatic Synthesis of Glycopeptides Bearing Rare N-Glycan Sequences with or without
Bisecting GlcNAc. Chem. Sci. 2018, 9 (43), 8194–8206 with the permission of the Royal
Society of Chemistry. The experiments mentioned in this chapter are performed by our
collaborators from Huang Group, simulations on Glycan 39 are performed by Thomas Diaz, and
simulations on Glycan 41 are performed by Yiğitcan Eken.
5.1 Introduction
    The glycosylation of proteins is one of the most common post translational modifications and
results in the great diversity of glycoprotein structures. Various attached carbohydrate groups
play critical roles in directing biological functions, structure stability, and conformations.
However, natural glycopeptides typically exist as a mixture of glycoforms with different
oligosaccharide groups attached. This makes it difficult to isolate pure individual forms of
glycopeptide. Due to that, synthetic methods are developed to produce pure glycopeptide forms.
The chemoenzymatic approach using the Arthrobacter endoβ-N-acteylglucosaminidase (Endo-
A) enzyme that has transglycosylation activity and can transfer free N-glycans to N-acetyl
glucosamine (GlcNAC) bearing acceptors is a method widely used and shown in Figure 5.1.1
                                                120


    Figure 5. 1 Glycan 39 treated with Endo-A enzyme and GlcNAc (unit A) bearing
haptoglobin as the acceptor glycopeptide leads to a reaction yield of 65% glycopeptide 45.
Potential branching sites are indicated on the figure with corresponding carbon numbers they
associate within the saccharide unit shown through letters A, B, C, D, E. 2
    This transformation results in N-linked glycoproteins, where the carbohydrate binds to the
protein backbone by using an asparagine residue. During the free N-glycan synthesis for
transglycosylation, most common points of attachments appear in the OH groups of C2/C6
carbons of mannose D and C2/C4 carbons of mannose E or introducing a GlcNAC structure on
the hydroxyl group attached to the C4 carbon of mannose C. Our collaborators from Huang
                                               121


group synthesized glycan 39 and glycan 412 oxazolines, which bear a branch at 6-OH of
mannose E. The structures of glycan 39 and glycan 41 are shown in Figure 5.2.
    Figure 5. 2 Structures of two glycan substrates. Glycan 39 is shown on the left and glycan 41
is shown on the right.The additional LewisX trisaccharide thioglycosyl donor group is marked
with red and the oxazoline ring, where the transglycosylation occurs, is marked with blue.
    The synthesis was performed by the Huang group and it is beyond the scope of this
dissertation; the focus here will be on the computational work performed. The only difference
between these two glycans is the additional LewisX trisaccharide thioglycosyl group present at
the C2 carbon of mannose D on glycan 41 as indicated in Figure 5.2. These two rare GlcNAc
containing oxazolines were tested for transglycosylation reaction using Endo-A enzyme and
GlcNAC bearing haptoglobin glycopeptide as an acceptor.2 The experimental results show that
when glycan 39 is used as a donor, the reaction results in glycopeptide 45 with 65% yield (Figure
5.1). In contrast, when glycan 41 is used, it does not participate in this reaction and the reaction
                                                122


does not lead to the desired product. The expected reaction site for transglycosylation is the
oxazoline rings present in B saccharides of both glycans (highlighted with blue in Figure 5.2).
Considering the similarity between the two glycans with the additional branching being far from
the reaction site, this divergent behavior of the transglycosylation reaction was unexpected. To
better understand this behavior and detect potential sources for low transglycosylation of glycan
41 between two free oxazolines that are docked to the Endo-A enzyme active site, molecular
dynamics simulations are performed in primary poses. The binding energies are calculated via
end-state free energy calculations and used to assess glycan 39 and glycan 41’s preference to the
Endo-A binding site, which might be the potential cause of reduced glycosylation yield in glycan
41.
5.2 Computational Methodology
    Initial coordinates of Endo-A were obtained from the Protein Data Bank3 (PDB ID: 3FHA)4.
As the focus of the study was on pocket residue-ligand interaction, missing segments and
residues outside the pocket region were capped using Molecular Operating Environment
v.2016.08 (MOE).5 Gate-keeper residues, W216 and W244 are positioned parallel to one another
during transglycoslyation.4 W244 was rotated from its original perpendicular orientation to
parallel with W216. The protein structure was initially minimized in MOE under the AMBER
ff10 force field6 and Extended Hückel Theory. The compounds were then non-covalently docked
with the docking program in MOE. Binding poses were refined using an induced fit refinement
method. The geometries of the N-glycan oxazoline compounds were optimized using the
Gaussian 16 program package7. The optimizations were performed using the AM1 method.8 The
obtained Mulliken charges were used with the antechamber of Amber 16 in the generation of
parameters for the N-glycan compounds. The systems were prepared using the Leap module of
                                                123


AmberTools169 under the AMBER ff14SB10 and GAFF force fields. Each enzyme complex was
solvated in a 14 Å cube of TIP4P-Ew water beyond the solute and 100 mM sodium chloride. The
systems were relaxed under NVT conditions over six minimization procedures with decreasing
restraints on the protein of 500.0, 200.0, 20.0, 10.0, 5.0 kcal/(mol Å2) to no restraints. The
systems were then heated to 300 K over 30 ps. Atomistic molecular dynamics simulations were
performed for 30 ns at 300 K and 1 atm using AMBER 16. The SHAKE algorithm constrained
bonds involving hydrogen.11 The trajectories were produced using Langevin dynamics and the
pressure of the system was regulated with isotropic position scaling. Long-range electrostatic
effects were modeled using the particle-mesh Ewald method with a cutoff of 10 Å. The resulting
trajectories were analyzed using AMBER 16 and visualized with MOE and the UCSF Chimera
package. Free energy of binding was calculated for every picosecond using the Poisson
Boltzmann model form the MMPBSA.py module of AmberTools and AMBER 16.12 The relative
free energy trends between models were compared, so solute entropy was neglected.
5.3 Results and Discussion
     Endo-A catalyzed transglycosylation reaction occurs in the binding site where active residues
W93, N171, E173, Y205, F125, W216, F243, W244,Y299 are present. During the reaction, these
critical residues surround the substrate and stimulate the oxazoline ion intermediate formation
and nucleophilic attack on this intermediate. The reaction mechanism requires a strong
interaction between the free oxazolines and the Endo-A enzyme.4 The experimental results
showed that when oxazoline 39 is treated with Endo-A enzyme and GlcNAc bearing haptoglobin
glycopeptide as a the acceptor, 65% transglycosylation yield is obtained. However, when the
experiment repeated with oxazoline 41, no desired glycopeptides where produced. To better
understand this differing behavior of 39 and 41 docking, MD and free energy calculations were
                                               124


performed. First, the two glycans docked into Endo-A (representative poses are shown in Figure
5.3). Poses with the oxazoline ring position within the active site are simulated using atomistic
MD and the binding energy of each pose is calculated from free energy calculations.
    Table 5. 1 Endo-A Binding energies of various binding poses of 39 and 41
          Compound                      Binding Poses              Binding energy (kcal mol-1)
                                               1                          −72.97 ± 6.04
              39                               2                          −94.00 ± 9.15
                                               3                          −77.36 ± 7.96
            Average                                      −81.44 ± 11.10
                                               1                         −52.08 ± 11.26
                                               2                         −60.17 ± 11.56
              41                               3                          −55.26 ± 7.95
                                               4                          −58.80 ± 7.83
                                               5                         −54.86 ± 11.17
            Average                                      −56.24 ± 9.95
    From the docking of glycan 39 to Endo-A, three different poses that are in the range of the
active site were obtained, whereas glycan 41 poses exhibit more diversity which might be due to
the lack of strong interactions with the protein. As a result, five poses are analyzed in order to
investigate the binding of each conformer. Next, 30 ns MD simulations were performed on these
systems and free energy calculations are performed. The results show an average binding energy
of 39 with Endo-A of −81.44 ± 11.10 kcal mol-1. Yet, binding of glycan 41 and Endo-A is
                                               125


significantly weaker at −56.24 ± 9.95 kcal mol-1 which might account for the lack of
transglycosylation.
     Figure 5. 3 Binding pose representations for the two glycans investigated. The figure on the
left is a snapshot taken from the MD simulation of glycan 39 with Endo-A and the indole rings
of W216 and W244 are in the perpendicular position. Snapshot taken from the MD simulation of
glycan 41 with Endo-A is shown on the right, indole rings of W216 and W244 are in the parallel
position because of the hindrance caused by the additional antenna.
     In the crystal structure of Endo-A and Endo-A complexed with tetrasaccharide oxazoline
substrate, the indole rings of W244 and W216 are perpendicular to each other. It is known that
the indole rings of W244 and W216 should be in parallel position and act as a gate to allow
substrate entry to the active site.4 These rings were moved to be parallel in order to allow
substrate entry to the active site. In all simulations with oxazoline 39, these rings turned back
into their original perpendicular orientation (Figure 5.3a). However, in the complex with
additional tri-antennae bearing donor 41, the additional antenna stays between W244 and W216
and hinders the rotation of the indole rings. This may prohibit the closed active site formation
and account for the reducing yield of glycosylation.
                                                126


5.4 Conclusions
    In this study molecular modeling is used to study interactions between Endo-A enzyme and
glycans 39, 41 that are synthesized and experimentally evaluated for trans glycosylation reaction
yields. Experimentally, Endo-A enzyme shows substrate preference towards glycan 39.
Supportive of experiment, when the Endo-A binding energy predicted for glycan 39 and 41;
glycan 41 showed significantly weaker binding compared to glycan 39. This indicates glycan 39
have higher affinity towards the active site of Endo-A. Additionally, simulations showed active
site gate residues W244 and W216 are hindered when the glycan 41 binds to the Endo-A active
site which can mechanistically explain why transglycosylation reaction did not occur on glycan
41.
                                               127


REFERENCES
    128


                                        REFERENCES
(1) Li, B.; Zeng, Y.; Hauser, S.; Song, H.; Wang, L. X. Highly Efficient Endoglycosidase-
    Catalyzed Synthesis of Glycopeptides Using Oligosaccharide Oxazolines as Donor
    Substrates.      J.    Am.      Chem.       Soc.    2005,     127     (27),     9692–9693.
    https://doi.org/10.1021/ja051715a.
(2) Yang, W.; Ramadan, S.; Orwenyo, J.; Kakeshpour, T.; Diaz, T.; Eken, Y.; Sanda, M.;
    Jackson, J. E.; Wilson, A. K.; Huang, X. Chemoenzymatic Synthesis of Glycopeptides
    Bearing Rare N-Glycan Sequences with or without Bisecting GlcNAc. Chem. Sci. 2018,
    8194–8206. https://doi.org/10.1039/c8sc02457j.
(3) Berman, H. M. The Protein Data Bank. Nucleic Acids Res. 2000, 28 (1), 235–242.
    https://doi.org/10.1093/nar/28.1.235.
(4) Yin, J.; Li, L.; Shaw, N.; Li., Y.; Song, J. K.; Zhang, W.; Xia, C.; Zhang, R.; Joachimiak,
    A.; Zhang, H. C.; Wang, L. X.; Liu, Z. J.; Wang, P. Structural Basis and Catalytic
    Mechanism for the Dual Functional Endo-$β$-N-Acetylglucosaminidase A. PLoS One
    2009, 4 (3). https://doi.org/10.1371/journal.pone.0004658.
(5) Chemical Computing Group Inc. Molecular Operating Environment (MOE). Montreal
    2016.
(6) Case, D. A.; Cheatham, T. E.; Darden, T.; Gohlke, H.; Luo, R.; Merz, K. M.; Onufriev,
    A.; Simmerling, C.; Wang, B.; Woods, R. J. The Amber Biomolecular Simulation
    Programs. J. Comput. Chem. 2005, 26 (16), 1668–1688. https://doi.org/10.1002/jcc.20290.
(7) Frisch, M. J.; Trucks, G. W.; Schlegel, H. B.; Scuseria, G. E.; Robb, M. A.; Cheeseman, J.
    R.; Scalmani, G.; Barone, V.; Petersson, G. A.; Nakatsuji, H.; Li, X.; Caricato, M.;
    Marenich, A. V; Bloino, J.; Janesko, B. G.; Gomperts, R.; Mennucci, B.; Hratchian, H. P.;
    Ortiz, J. V; Izmaylov, A. F.; Sonnenberg, J. L.; Williams-Young, D.; Ding, F.; Lipparini,
    F.; Egidi, F.; Goings, J.; Peng, B.; Petrone, A.; Henderson, T.; Ranasinghe, D.;
    Zakrzewski, V. G.; Gao, J.; Rega, N.; Zheng, G.; Liang, W.; Hada, M.; Ehara, M.; Toyota,
    K.; Fukuda, R.; Hasegawa, J.; Ishida, M.; Nakajima, T.; Honda, Y.; Kitao, O.; Nakai, H.;
    Vreven, T.; Throssell, K.; Montgomery Jr., J. A.; Peralta, J. E.; Ogliaro, F.; Bearpark, M.
    J.; Heyd, J. J.; Brothers, E. N.; Kudin, K. N.; Staroverov, V. N.; Keith, T. A.; Kobayashi,
    R.; Normand, J.; Raghavachari, K.; Rendell, A. P.; Burant, J. C.; Iyengar, S. S.; Tomasi,
    J.; Cossi, M.; Millam, J. M.; Klene, M.; Adamo, C.; Cammi, R.; Ochterski, J. W.; Martin,
    R. L.; Morokuma, K.; Farkas, O.; Foresman, J. B.; Fox, D. J. Gaussian 16 Revision A.03.
    2016.
(8) Dewar, M. J. S.; Zoebisch, E. G.; Healy, E. F.; Stewart, J. J. P. Development and Use of
    Quantum Mechanical Molecular Models. 76. AM1: A New General Purpose Quantum
    Mechanical Molecular Model. J. Am. Chem. Soc. 1985, 107 (13), 3902–3909.
    https://doi.org/10.1021/ja00299a024.
                                              129


(9)  Case, D. A.; Cerutti, D. S.; T.E. Cheatham, I. I. I.; Darden, T. A.; Duke, R. E.; Giese, T.
     J.; Gohlke, H.; Goetz, A. W.; Greene, D.; Homeyer, N.; Izadi, S.; Kovalenko, A.; Lee, T.
     S.; LeGrand, S.; Li, P.; Lin, C.; Liu, J.; Luchko, T.; Luo, R.; Mermelstein, D.; Merz, K.
     M.; Monard, G.; Nguyen, H.; Omelyan, I.; Onufriev, A.; Pan, F.; Qi, R.; Roe, D. R.;
     Roitberg, A. E.; Sagui, C.; Simmerling, C. L.; Botello-Smith, W. M.; Swails, J.; Walker,
     R. C.; Wang, J.; Wolf, R. M.; Wu, X.; Xiao, L.; York, D. M.; Kollman, P. A. Amber 2016.
     2016, No. April. https://doi.org/10.13140/RG.2.2.36172.41606.
(10) Maier, J. A.; Martinez, C.; Kasavajhala, K.; Wickstrom, L.; Hauser, K. E.; Simmerling, C.
     Ff14SB: Improving the Accuracy of Protein Side Chain and Backbone Parameters from
     Ff99SB.       J.    Chem.      Theory      Comput.     2015,     11    (8),    3696–3713.
     https://doi.org/10.1021/acs.jctc.5b00255.
(11) Ryckaert, J. P.; Ciccotti, G.; Berendsen, H. J. C. Numerical Integration of the Cartesian
     Equations of Motion of a System with Constraints: Molecular Dynamics of n-Alkanes. J.
     Comput. Phys. 1977, 23 (3), 327–341. https://doi.org/10.1016/0021-9991(77)90098-5.
(12) Miller, B. R.; McGee, T. D.; Swails, J. M.; Homeyer, N.; Gohlke, H.; Roitberg, A. E.
     MMPBSA.Py: An Efficient Program for End-State Free Energy Calculations. J. Chem.
     Theory Comput. 2012, 8 (9), 3314–3321. https://doi.org/10.1021/ct300418h.
                                              130


                                     CHAPTER SIX
Chemical Synthesis of Human Syndecan-4 Glycopeptide Bearing O-, N- Sulfation and
Multiple Aspartic Acids for Probing Impacts of the Glycan Chain and the Core Peptide on
Biological Functions
                                          131


About this chapter: This chapter is reprinted from Yang, W.; Eken, Y.; Zhang, J.; Cole, L. E.;
Ramadan, S.; Xu, Y.; Zhang, Z.; Liu, J.; Wilson, A. K.; Huang, X. Chemical Synthesis of
Human Syndecan-4 Glycopeptide Bearing O-, N-Sulfation and Multiple Aspartic Acids for
Probing Impacts of the Glycan Chain and the Core Peptide on Biological Functions. Chem. Sci.
2020, 11 (25), 6393–6404 with the permission of the Royal Society of Chemistry. The
experiments mentioned in this chapter are performed by our collaborator Huang Group and all
the theoretical work is performed by Yiğitcan Eken.
6.1 Introduction
Heparan sulfates (HS) are linear sulfated polysaccharides who are found in all animal tissues.
They have a range of biological functions including blood clothing prevention, growth factor and
chemokine binding and controlling activity levels of various enzymes.1–3 In nature, HS exists as
a heterogenous mixtures where the length of their backbone and location of sulfates varies. In
order to produce pure forms of HS for therapeutic purposes or to study HS structure activity
relationship, synthetic routes are commonly adopted.
    Within the tissues HS are originally exist as a proteoglycan where HS is covalently linked to
a core protein or a core peptide and form heparan sulfate proteoglycan (HSPG).4,5 These core
proteins originally thought as carriers and do not possess any biological activity. However,
recent studies suggest that the core protein itself can also be biologically active.6–8 To further
understand the role of the core protein on the biological function of HSPG glycopeptides,
synthesis and study of well-defined homogenous glycans and glycopeptides are vital.
    During this study glycan 28 bearing an N- and O-sulfated glycan chain and glycopeptide 2,
where glycan 28 covalently linked to a human syndecan-49 (amino acids 60–71) with four
                                                132


aspartic acids in the peptide backbone was synthesized by our collaborators from Huang group
(Figure 6.1).10
    Figure 6. 1 Chemical structures of glycopeptide, glycan and peptide synthesized by our
collaborators.
                                             133


    Table 6. 1 Inhibitory activities of glycopeptide, glycan and peptide towards heparanase (5
nM) and their dissociation constant respect to FGF-2 binding measure through biolayer
interferometry.
                                    Heparanase % Inhibition
           Compound                                                       FGF-2 KD (nm)
                            3.3μM          10μM           33μM
       Glycopeptide         NA             NA             NA             5
       Glycan               NA             32%            61%            14.5
       Peptide              NA             NA             NA             17
    After the synthesis, glycopeptide 2 and glycan 28 were experimentally tested for heparanase
and FGF-2 biological activity along with the peptide 29 backbone by itself. The activity data
showed that function of the glycan chain is affected by the peptide. During the heparanase study,
having glycan by itself showed inhibitory activity while glycopeptide and peptide showed no
inhibition (Table 6.1). However, the FGF-2 dissociation constants showed glycopeptide 3-fold
enhanced binding compared to glycan and peptide (Table 6.1). To insight activity data and
understand how peptide backbone impacts HS functions molecular modeling is used.10
6.2 Computational Methodology
    FGF-2 modeling studies were performed on the FGF-2 complexes with the glycopeptide 2,
peptide 29 and glycan 28 respectively, using crystal structure of the FGF2 protein (PDB11 ID:
4OEE).12 The potential ligand binding sites on the protein were detected by the Site Finder
program implemented in Molecular Operating Environment (MOE).13,14 The results showed
three potential ligand binding sites on FGF-2 with a positive Propensity of Ligand Binding
(PLB) score (Figure 6.2). Glycopeptide 2, peptide 29 and glycan 28 structures were docked
                                               134


individually into each of these potential binding sites. Molecular dynamics (MD) simulations and
binding free energy calculations were performed on the distinct binding poses with highest
GBVI/WSA △G scores.
    Figure 6. 2 Potential binding Sites on the FGF2 structure.
    Similar to FGF-2 study, the binding behavior of glycopeptide 2, peptide 29 and glycan 28 on
heparanase has also been investigated computationally. For this purpose, molecules and the
biotin tag used during experiments were docked into the heparin binding site of the heparanase
(PDB ID: 5E9C)15 using MOE. The distinct poses with highest GBVI/WSA △G scores were
further studied with molecular dynamics and binding free energy calculations. The average
binding energies and energies calculated from individual poses can be found in Table 6.2. The
experiments were performed on the glycopeptide 2, peptide 29 and glycan 28 structures with a
biotin tag. Due to that, the biotin tag was also included in this study to assess its contribution to
                                                135


the binding. The biotin tag gave little binding energy with heparanase, indicating that majority of
the binding energy results from interactions between glycan and heparanase.
6.3 Computational Results and Analyses of the Interactions
6.3.1 FGF-2 Binding
    The scan of FGF-2 structure led to identification of 3 potential ligand binding sites. Each of
these binding sites are evaluated for glycopeptide 2, peptide 29 and glycan 28 binding through
MD simulations and binding free energy calculations. The average binding energy results of
glycopeptide 2, peptide 29 and glycan 28 for each site can be found in Table 6.4. The results
showed site 1 had the highest affinity for both glycopeptide, glycan and peptide. The X-ray
crystal structure of complexes of FGF-2 and heparin oligosaccharides from literature showed that
the glycans reside in the site 1,12,16 which is consistent with our computation results. The average
binding energies and their experimental KD values for FGF2 are listed in Table 6.1, and energies
calculated from individual poses can be found in Table 6.2.
                                                  136


    Table 6. 2 Binding free energy for glycopeptide, glycan and peptide with FGF-2 calculated
for various poses.
                   FGF-2 Site 1 Glycan Binding Comparison (kcal mol-1)
   Compound              Pose           ∆G Binding             STD           Average ∆G
                   1                  -34.37             ± 8.60
                   2                  -35.25             ± 6.59
Glycan             3                  -36.40             ± 8.96            -35.09 ± 8.01
                   4                  -31.72             ± 6.45
                   5                  -37.74             ± 9.45
                   1                  -30.92             ± 10.78
                   2                  -25.55             ± 8.97
Peptide            3                  -26.84             ± 7.88            -30.40 ± 10.55
                   4                  -35.85             ± 10.16
                   5                  -32.86             ± 14.94
                   1                  -53.51             ± 14.79
                   2                  -77.67             ± 16.44
                   3                  -69.51             ± 17.63
Glycopeptide       4                  -47.82             ± 12.29           -60.04 ± 13.65
                   5                  -53.81             ± 5.36
                   6                  -51.88             ± 18.17
                   7                  -66.07             ± 10.85
                                             137


    Binding site 1 of FGF-2 is lined with many basic residues including Asn27, Arg44, Lys 119,
Arg120, Lys125, Lys129, Gln134 and Lys135 (Fig. 4). MD simulations of FGF-2 complex with
glycopeptide showed that these residues formed hydrogen bonds with glycopeptide. The
distances between the side chains of Lys125 and Lys119 are within 5 Å from the sulfates on the
glycan, indicating potential electrostatic interactions. In all glycopeptide 2 binding poses, the
glycan is located within binding site 1 while the peptide extends out of the pocket and towards
the protein’s surface. Meanwhile, glycan 28 binds site 1 with an analogous conformation as that
of glycopeptide (Fig. 6.3). The binding poses and the MD simulations showed the peptide
portion of glycopeptide extends out of site 1 towards the surface of FGF-2. This leads to the
formation of additional salt bridges with the basic residues outside of binding site 1 including
Arg22 and Lys21 (Figure 6.3). These additional salt bridges are presumably responsible for
improved binding to FGF-2 as observed in glycopeptide as compared to glycan.
    Figure 6. 3 Representative binding pose of glycopeptide to FGF2.
                                                138


6.3.2 Heparanase Binding
    Table 6. 3 Binding free energy for glycopeptide 2, peptide 29 and glycan 28 with heparanase
calculated for various poses.
                              Heparanase Binding (kcal mol-1)
   Compound               Pose           ∆G Binding             STD            Average ∆G
                    1                  -59.97             14.44
                    2                  -53.89             10.00
Glycan 28           3                  -49.84             12.50              -57.36 ± 12.19
                    4                  -72.58             10.83
                    5                  -50.52             13.20
                    1                  -34.62             12.47
                    2                  -61.90             20.10
Peptide 29          3                  -32.25             10.36              -43.14 ± 14.45
                    4                  -35.28             12.51
                    5                  -51.66             16.80
                    1                  -45.65             13.18
                    2                  -55.65             15.23
Glycopeptide 2      3                  -49.41             15.53              -50.55 ± 15.67
                    4                  -46.77             15.42
                    5                  -55.26             18.99
                                              139


    The average binding energies to heparanase and energies calculated from individual poses
are included in Table 6.3. The binding energy results show that glycan 28 has a higher affinity to
heparanase compared to peptide 29 and glycopeptide 2, respectively. The glycan 28’s higher
binding affinity towards heparanase compared to glycopeptide 2 is contrary to the results
observed with FGF-2. When inhibitory activities of glycan, peptide and glycopeptide toward
heparanase is considered (Table 6.1), glycan showed 32% inhibitory activity in 10μM
concentration and 61% inhibition on 33μM concentration. Peptide and glycopeptide showed no
inhibitory activity towards heparanase.
    Figure 6. 4 Comparison of (a) glycan 28 and (b) glycopeptide 2 binding to the site 1 of
heparanase (heparin binding site).
    Heparanase binding site consists of many basic residues including Lys159, Arg272, Lys231,
Lys232, Arg303. Glycan 28 is oriented within the binding site by interacting with these basic
residues through hydrogen bonds and ionic bonds (Fig. 6.4a). In glycopeptide 2 complex with
heparanase, the glycan is situated within the binding site, while the peptide backbone extends
toward the solvent (Figure 6.4b). The comparison of glycan 28 and glycopeptide 2 binding
                                              140


shows that core H-bonds and ionic interactions in the binding pocket are weakened in the
glycopeptide complex. For example, the interaction between Lys231 and N-sulfate group
observed in glycan 28/heparanase is lost in the glycopeptide 2/heparanase complex. Furthermore,
in glycan 28/heparanase complex vs. glycopeptide 2/heparanase, the distance between Lys232
and N-sulfate group increased from 2.64 Å to 2.71 Å, the distance between Arg272 and O-sulfate
group increased from 2.75 Å to 2.89 Å, and H-bond distance between Arg303 and a hydroxyl
group increased from 2.94 Å to 3.06 Å (Fig. 5). This weakening of glycan/protein interactions
can be explained by the peptide backbone of glycopeptide not fitting in the pocket, thus
disrupting the glycan interactions with heparanase, which presumably leads to reduced affinity
and inhibitory activity of glycopeptide 2 on heparanase.
6.4 Conclusion
    With this study for the first time, HSPG glycopeptides bearing multiple Asp residues in the
peptide backbone and O- and N-sulfation on the glycan chain have been successfully synthesized
and tested for biological functions by the Huang group. The results showed the glycan inhibited
the activities of heparanase, while the glycopeptide did not alter the heparanase activity.
Additionally, the glycopeptide showed enhanced binding comparison to glycan and peptide by
itself in FGF-2 systems. The molecular dynamics simulations are used to insight functioning of
these ligands with respect to heparanase and FGF-2 binding. The simulations showed the peptide
portion of the glycopeptide 2 can led to additional salt bridges in FGF-2 systems, whereas in
heparanase it tends to pull the glycan core towards solvent which may explain opposite effect of
peptide attachment in activity. The experimental results combined with the structural insights
gained from molecular modeling, suggests that transferring HS to a core protein as in
proteoglycans may be used to modulate HS functions.
                                               141


APPENDIX
   142


   Table 6. 4 Average binding free energies and standard deviations calculated for glycan 28,
peptide 29 and glycopeptide 2 on 3 potential binding sites.
   Compound                Site                     Average Binding Energies (kcal mol-1)
   Glycan 28               1                        -35.09               ± 8.01
                           2                        -26.75               ± 10.55
                           3                        -20.04               ± 7.17
   Peptide 29              1                        -30.40               ± 10.55
                           2                        -25.59               ± 10.24
                           3                        -26.26               ± 11.07
   Glycopeptide 2          1                        -60.04               ± 13.65
                           2                        -37.40               ± 13.88
                           3                        -41.48               ± 12.03
                                               143


REFERENCES
    144


                                       REFERENCES
(1)  Petitou, M.; van Boeckel, C. A. A. A Synthetic Antithrombin III Binding Pentasaccharide
     Is Now a Drug! What Comes Next? Angew. Chemie Int. Ed. 2004, 43 (24), 3118–3133.
     https://doi.org/10.1002/anie.200300640.
(2)  Bernfield, M.; Götte, M.; Park, P. W.; Reizes, O.; Fitzgerald, M. L.; Lincecum, J.; Zako,
     M. Functions of Cell Surface Heparan Sulfate Proteoglycans. Annu. Rev. Biochem. 1999,
     68 (1), 729–777. https://doi.org/10.1146/annurev.biochem.68.1.729.
(3)  Lindahl, U.; Kusche-Gullberg, M.; Kjellén, L. Regulated Diversity of Heparan Sulfate. J.
     Biol. Chem. 1998, 273 (39), 24979–24982. https://doi.org/10.1074/jbc.273.39.24979.
(4)  Häcker, U.; Nybakken, K.; Perrimon, N. Heparan Sulphate Proteoglycans: The Sweet Side
     of Development. Nat. Rev. Mol. Cell Biol. 2005, 6 (7), 530–541.
     https://doi.org/10.1038/nrm1681.
(5)  Bishop, J. R.; Schuksz, M.; Esko, J. D. Heparan Sulphate Proteoglycans Fine-Tune
     Mammalian         Physiology.      Nature       2007,    446     (7139),      1030–1037.
     https://doi.org/10.1038/nature05817.
(6)  Choi, S.-J.; Lee, H.-W.; Choi, J.-R.; Oh, E.-S. Shedding; towards a New Paradigm of
     Syndecan Function in Cancer. BMB Rep. 2010, 43 (5), 305–310.
     https://doi.org/10.5483/BMBRep.2010.43.5.305.
(7)  Morgan, M. R.; Humphries, M. J.; Bass, M. D. Synergistic Control of Cell Adhesion by
     Integrins and Syndecans. Nat. Rev. Mol. Cell Biol. 2007, 8 (12), 957–969.
     https://doi.org/10.1038/nrm2289.
(8)  Iozzo, R. V. Series Introduction: Heparan Sulfate Proteoglycans: Intricate Molecules with
     Intriguing     Functions.     J.    Clin.     Invest.  2001,    108      (2),   165–167.
     https://doi.org/10.1172/JCI13560.
(9)  David, G.; van der Schueren, B.; Marynen, P.; Cassiman, J. J.; van den Berghe, H.
     Molecular Cloning of Amphiglycan, a Novel Integral Membrane Heparan Sulfate
     Proteoglycan Expressed by Epithelial and Fibroblastic Cells. J. Cell Biol. 1992, 118 (4),
     961–969. https://doi.org/10.1083/jcb.118.4.961.
(10) Yang, W.; Eken, Y.; Zhang, J.; Cole, L. E.; Ramadan, S.; Xu, Y.; Zhang, Z.; Liu, J.;
     Wilson, A. K.; Huang, X. Chemical Synthesis of Human Syndecan-4 Glycopeptide
     Bearing O-, N-Sulfation and Multiple Aspartic Acids for Probing Impacts of the Glycan
     Chain and the Core Peptide on Biological Functions. Chem. Sci. 2020, 11 (25), 6393–
     6404. https://doi.org/10.1039/d0sc01140a.
(11) Berman, H. M. The Protein Data Bank. Nucleic Acids Res. 2000, 28 (1), 235–242.
     https://doi.org/10.1093/nar/28.1.235.
                                               145


(12) Li, Y.; Ho, I.; Ku, C.; Zhong, Y.; Hu, Y.; Chen, Z.; Wang, C.; Hsiao, C. Interactions That
     In Fl Uence the Binding of Synthetic Heparan Sulfate Based Disaccharides to Fibroblast
     Growth Factor ‑ 2. 2014, 6–11.
(13) Chemical Computing Group Inc. Molecular Operating Environment (MOE). Montreal
     2016.
(14) Labute, P.; Santavy, M. SiteFinder-Locating Binding Sites in Protein Structures
     http://www.chempcomp.com/journal/sitefind.htm%5Cnhttps://www.chemcomp.com/journ
     al/sitefind.htm.
(15) Wu, L.; Viola, C. M.; Brzozowski, A. M.; Davies, G. J. Structural Characterization of
     Human Heparanase Reveals Insights into Substrate Recognition. Nat. Struct. Mol. Biol.
     2015, 22 (12), 1016–1022. https://doi.org/10.1038/nsmb.3136.
(16) Faham, S.; Hileman, R. E.; Fromm, J. R.; Linhardt, R. J.; Rees, D. C. Heparin Structure
     and Interactions with Basic Fibroblast Growth Factor. 1996, 271 (5252), 1116–1120.
                                             146


                                   CHAPTER SEVEN
Binding of Per- and Polyfluoroalkyl Substances to the Human Pregnane X Receptor
                                           147


About this chapter: This chapter is reprinted from Lai, T. T.; Eken, Y.; Wilson, A. K. Binding of
Per- and Polyfluoroalkyl Substances to the Human Pregnane X Receptor. Environ. Sci. Technol.
2020, 54 (24), 15986–15995 with the permission of American Chemical Society. Lai Thanh and
Yiğitcan Eken contributed equally to this research by investigating interactions of half of the
PFASs for hPXR.
7.1 Introduction
    The production of perfluoroalkyl substances (PFASs) in the 1940s and 1950s is credited as
an industrial breakthrough due to the unique properties of PFASs including water and oil
repellency, high surface activity, and durability.1 The use of these compounds has been
widespread for food packaging, fire-fighting foams, carpet, furniture, boots, clothes, nonstick
cookware, to name only a few.2–4 PFASs are synthetic organofluorine compounds that have most
(poly-) or all (per-) of their carbon-bonded hydrogens replaced with fluorine. PFASs are
colloquially referred to as “zombie chemicals” or “forever chemicals,” for their known resistance
to degradation, which is caused by the strong electronegativity difference between their
carbon−fluorine bonds.5–7 Environmental and health concerns over the past two decades8–15 have
led to actions such as 3 M’s voluntary perfluorooctane sulfonic acid (PFOS) phase-out in 200016
and EPA’s 2006 perfluorooctanoic acid (PFOA) Stewardship Program.17 “Long-chain” PFASs,
defined as perfluoroalkyl carboxylic acids (PFCAs) with seven or more carbons and
perfluorosulfonic acids (PFSAs) with six or more carbons forming their carbon backbone, have
been slowly replaced with alternative PFASs, both “short-chain” variants and fluorinated
alternatives, which typically have different functionalities.18 Most common replacements are
ADONA (trade name for 4,8-dioxa-3Hperfluorononanoic acid)19 and Gen-X (trade name for
2,3,3,3- tetrafluoro-2-heptafluoropropoxy propanoic acid), which are used as alternatives to
                                                148


PFOA.20 6:2 Fluorotelomer carboxylic acid (6:2 FTCA) is considered to be another alternative to
PFOA, even though there has been no reported large-scale usage of the compound.19,21–24 A
limited number of studies have been done on the potential impact of alternative PFASs on the
environment and to human health (see refs19,20,25–27). These studies suggest that alternative
PFASs may exhibit comparable or even greater adverse health effects than their counterparts.
The adverse effects of PFASs are believed to be chain length and functional group dependent,
such that shorter PFASs or differently functionalized PFASs (such as ether groups in place of a
number of fluorinated carbons) may be less toxic.18,28,29 Thus, it is crucial to study molecular
recognition of PFASs together with alternative PFASs in a fast and efficient manner. Yet, few
studies have been performed on the structural differences of various PFASs and how the
structure correlates to their binding.
    PFASs are shown to interact with various human proteins such as thyroid hormone transport
proteins, plasma proteins, liver fatty acid-binding protein, and also nuclear receptors such as
pregnane X receptor (PXR), peroxisome proliferator activated receptors (PPARs), etc.30–37 A
number of epidemiological studies have suggested links between PFASs and adverse health
effects such as adult thyroid problems, early childhood immunosuppression, as well as non-high-
density lipoprotein (HDL)/total cholesterol.38–40 Furthermore, PFASs such as PFOA have been
shown to induce hepatic toxicity in mice as well as liver cancer in rodents. 33,41 As the PFAS
chemical space (>4000 compounds) and the number of proteins that might interact with PFASs
are considered, computational approaches are important.
    Both the interaction and binding of PFASs to proteins play essential roles on their toxicity
and bioaccumulation potential, and the prediction of their binding and interaction can be used as
a proximity for their bioaccumulation and toxicity assessment.42 In this study, we utilized in
                                                149


silico methods based on molecular dynamics (MD) to investigate protein−PFAS interactions. To
calculate binding affinities between PFAS and proteins, end-state approaches are selected due to
their good balance between computational cost and accuracy.43,44 To be more specific, molecular
mechanics combined with Poisson−Boltzmann surface area (MM-PBSA) and molecular
mechanics combined with Generalized Born surface area (MM-GBSA) are used in this
investigation to predict relative binding energies.
     hPXR is involved in a variety of biological and clinical functions such as xenobiotic and bile
acid metabolism, steroid hormone homeostasis, and mediation of various drug−drug
interactions.45–47 Due to its large (1150 A3 ) and flexible ligand binding cavity present on its
ligand binding domain,48 the hPXR is able to bind to a variety of ligands including naturally
occurring steroids such as progestins, glucocorticoids, bile acids, and estrogens.49 The binding of
ligands to this domain is associated with an increased stability of the receptor, which mediates
coactivator binding to the ligand-dependent activation function 2 (AF-2) surface and ultimately
leads to the induction of hPXR. However, the exact molecular mechanisms are still elusive.50,51
The induction of hPXR has been associated with hepatic steatosis, atherosclerosis, oxidative
stress, lipid homeostasis, endocrine disruption effects, carcinogenesis, and adverse drug
interactions.46,52–56
     In this study, the molecular basis of the PFAS-induced activation on hPXR as well as the
differences and similarities between how legacy and alternative (replacement) PFASs interact
with hPXR are studied computationally (Table 7.1). Particularly useful for this study is the
availability of both the crystal structure for the ligand binding domain (LBD) and experimental
bioactivity data for a number of the PFASs investigated here for human pregnane X receptor
(hPXR).57,58 Molecular dynamics simulations (MD), residue−ligand interaction energy
                                                 150


calculations, alanine mutation studies, free energy of binding calculations, and hydrogen bond
(H-bond) analysis are used to investigate relative binding energies of PFAS−hPXR complexes,
hydrogen bond frequencies, and key residue−ligand interactions to produce a quantitative
molecular-level description of PFAS−hPXR interactions. The various interaction patterns of
PFAS−hPXR are compared, focusing on structural differences. Additionally, several PFOA
alternatives, ADONA, Gen-X, 6:2 FTCA, and a short-chain PFSA variant, PFBS, are also
included in this study to consider interactions with hPXR, as the agonistic activity of these
species on hPXR was not previously determined.
                                               151


7.2 Materials and Methods
    Table 7. 1 Nomenclature for Perfluoroalkyl Substances (PFASs) Studieda
                               Perfluorinated
    Type        Acronym                                       Name             Chemical Formula
                                   Carbon
   PFCA          PFBA                  3            perfluorobutanoic acid     CF3-(CF2)2-COOH
   PFCA           PFPA                 4            perfluoropentanoic acid    CF3-(CF2)3-COOH
   PFCA          PFHxA                 5            perfluorohexanoic acid     CF3-(CF2)4-COOH
   PFCA          PFHpA                 6            perfluoroheptanoic acid    CF3-(CF2)5-COOH
   PFCA          PFOA                  7             perfluorooctanoic acid    CF3-(CF2)6-COOH
   PFCA          PFNA                  8            perfluorononanoic acid     CF3-(CF2)7-COOH
   PFCA          PFDA                  9            perfluorodecanoic acid     CF3-(CF2)8-COOH
   PFCA          PFDoA                11           perfluorododecanoic acid    CF3-(CF2)10-COOH
                                                   perfluorobutane sulfonic
   PFSA           PFBS                 4                                        CF3-(CF2)3-SO3H
                                                               acid
                                                   perfluorooctane sulfonic
   PFSA           PFOS                 8                                        CF3-(CF2)7-SO3H
                                                               acid
   FTOH        6:2 FTOH                6           6:2 fluorotelomer alcohol   CF3-(CF2)5-CH2-OH
                                                 6:2 fluorotelomer carboxylic   CF3-(CF2)5-CH2-
   FTCA        6:2 FTCA                6
                                                               acid                  COOH
                                                      2,3,3,3-tetrafluoro-2-
                                                                                 CF3-(CF2)2-O-
 Alternative     Gen-X                 5               heptafluoropropoxy
                                                                                (CF3)CF-COOH
                                                         propanoic acid
                                                          4,8-dioxa-3H-       CF3-O-(CF2)3-O-CHF-
 Alternative    ADONA                  6
                                                    perfluorononanoic acid         CF2-COOH
                                        PFSA = CF3-(CF2)n-SO3H
                                       PFCA = CF3-(CF2)n-COOH
                                     FTOH = CF3-(CF2)n-(CH2)m-OH
                                  FTCA = CF3-(CF2)n-(CH2)m-COOH
  Note: The chemical structures of the compounds are provided in Table 7.3
                                                 152


7.2.1 Site Analysis and Molecular Docking
    The hPXR protein structure was taken from the RSCB Protein Data Bank (PDB ID:
6DUP).58 The Molecular Operating Environment’s (MOE) Site Finder program was used to
detect potential binding sites in the hPXR structure.59 The site finder method detects α shapes on
the protein structure and evaluates them according to their propensity of ligand binding (PLB).60
The site that had the highest PLB score - a proven binding site for T1317 and rifampicin ligands
- was used as the PFAS binding site.61,62 Starting PFAS structures were obtained from
PubChem.63 Protonation states of the PFASs under the physiological conditions are determined
by Protonate3D module implemented in MOE.59,64 The resulting PFAS structures were
minimized in MOE with the AMBER10: Extended Hückel Theory (EHT) force field, which uses
Amber ff10 for macromolecules and Extended Hückel Theory for the ligands.65–67 Ligand
binding poses were determined by docking PFASs to the binding site using MOE. The London
ΔG scoring function68 was used to evaluate 100 initial ligand placements. Then, the initial
placements were further refined to 10 poses via the Generalized Born Volume Integral/Weighted
Surface area scoring function (GBVI/WSA) ΔG with induced-fit protein settings.59,68 From these
10 refined poses, structurally distinct ones with the highest (GBVI/WSA) ΔG scores were
selected for further studies.
7.2.2 Simulation Protocol
    The selected complex structures were minimized via AMBER10:EHT in MOE. The
topologies and the parameters for the minimized structures were created using the Leap module
of Amber Tools69 under the General Amber Force Field (GAFF), AMBER ff14sb force fields.70
The AM1-BCC charge scheme71 was used to calculate partial charges of the ligand atoms, and
these partial charges were fit to GAFF using the Antechamber69 suite to generate ligand
                                               153


parameters. The protein−ligand complex structures were placed in a 14 Å3 beyond the solute
box, neutralized and ionized with 100 mM NaCl ions using the parameters from Joung and
Cheatham.72
    The systems were minimized with decreasing energy restraints on the protein (500.0, 200.0,
20.0, 10.0, 5.0, 0.0 kcal mol-1). Then, the systems were heated from 100 to 300 K in 30 ps MD
simulation and equilibrated for 100 ps at 300 K. After equilibration, 30 ns MD simulations were
performed to ensure the convergence of the system at 300 K and 1 atm pressure. During all
simulations, the pressure was controlled by isotropic position scaling, the temperature was
controlled by Langevin dynamics, and the time step was set to 2 fs. Furthermore, SHAKE
algorithm73 was used to constrain hydrogen bonds to allow the use of the 2 fs time step.
Nonbonded interactions were truncated to 10 Å, while the particle-mesh Ewald (PME) method
was used to efficiently approximate long-range electrostatic interactions.
7.2.3 Binding Energy Calculations
    MM-PBSA and MM-GBSA methods are used for predicting the binding energies between
PFASs and hPXR. These methods are based on subtracting the free energies of the unbound
receptor and the ligand from the free energy of the ligand bound protein complex using the
structures generated during MD simulations.74
                              ΔGBind = GComplex – GProtein – GLigand (1)
    Many studies have demonstrated the success of these methods for finding relative binding
affinities and ranking binding energies of molecules,75,76 though very few of the studies have
focused on PFASs.37 While methods such as MM-PBSA and MM-GBSA have been useful, the
methods are built upon different thermochemical approximations, and, thus, the predictions
arising from these methods can be system dependent. For example, when the MM-GBSA and
                                                154


MM-PBSA binding energies for six different protein−ligand systems including α-thrombin (7
ligands), avidin (7 ligands), cytochrome C peroxidase (18 ligands), neuraminidase (8 ligands),
P450cam (12 ligands), and penicillopepsin (7 ligands) are compared, MM-GBSA results in:
better correlation with experiments for α-thrombin, penicillopepsin, neuraminidase, similar
correlation for avidin, and poorer correlation for cytochrome C peroxidase and P450cam in
comparison to MM-PBSA.76 Therefore, since the performances of MM-PBSA and MM-GBSA
cannot be determined a priori, it is necessary to consider both methods for the PFAS− hPXR
system and compare the results with the experiment, when available.
     In this study, the binding free energies of the ligand−protein complexes were calculated using
both MM-PBSA and MM-GBSA with a modified General Born solvation model by Onufriev et
al.,77 approaches implemented in the Amber PBSA-solver.74 Default internal and external
dielectric constants were used (1.0 and 80.0, respectively). The solvent-accessible surface area
(SASA) was determined with the default linear combinations of pairwise overlap (LCPO)
method using modified Bondi atomic radii. For both MM-PBSA and MM-GBSA, the frames
from the first nanosecond of the MD simulations were used to calculate binding energies since it
has been shown that such simulations can be useful, and that longer simulations do not
necessarily correspond to a better accuracy.76 Solute entropies were neglected because the
primary focus of this effort was on the relative binding energies of PFASs on hPXR. Binding
contributions of the residues at the binding site were calculated by per-residue decomposition
and the energy contribution for each residue averaged from all poses tested.69 Additionally,
mutagenesis studies were performed by replacing target residues with the alanine from the
complex structure, followed by MD and MM-GBSA efforts. The MM-GBSA electrostatic
energies of these mutant complexes were compared with their wild-type counterparts.
                                                   155


Figure 7. 1 Binding modes of PFASs to the hPXR ligand binding pocket.
                                         156


7.2.4 Hydrogen Bond Analysis
    Hydrogen bond lifetime analyses were performed via CPPTRAJ for every PFAS ligand. 78
For each PFAS, the PFAS−hPXR complex with the lowest MM-GBSA relative binding energy
was selected for analysis. Ser-247, Gln-285, His-327, and His-407 were analyzed for hydrogen
bond lifetimes.
7.3 Results and Discussion
7.3.1 Molecular Docking and MD Simulations
    The binding poses of 14 PFASs that have the highest affinity to hPXR LBD, as determined
by MM-GBSA free energy results, are provided in Figure 7.1. To account for the changes that
occur in the binding domain upon PFAS binding, induced-fit docking is used for the generation
of the binding poses. The docking algorithm allows for movement of the protein side chains
together with the ligand in its refinement step, which ensures that the protein side chains are
adjusted in accordance to the ligand structure. This type of approach is commonly used in
computer-aided drug design with success,68,79–85 especially for protein targets with a flexible
binding domain such as the flexible binding domain encountered in this study for hPXR. Most of
the PFAS binding modes have the carboxylate/ sulfonate group hydrogen bonding with Gln-285
and His-327, or Ser-247. It should be noted that Zhang et al. reported PFAS binding modes that
hydrogen bond to Ser-247.57 In the current effort, 30 ns MD simulations are adopted to ensure
system convergence. MD simulations show that poses that hydrogen bond with Gln-285 and His-
327 are still able to hydrogen bond to Ser-247 with minor movements to the
carboxylate/sulfonate group. For the most part, docking poses are preserved in MD simulations,
and any ligand movements are often attributed to changes in hydrogen bonding partners of the
carboxylate/sulfonate functional group.
                                              157


    Figure 7. 2 (a) Correlation observed between experimental EC50 values from Zhang et al. and
predicted binding free energies from MM-GBSA. (b) Correlation observed between EC50 values
from Zhang et al. and predicted binding free energies from MM-PBSA. Error bars indicate
standard deviations.
7.3.2 Binding Free Energy Calculations
    The utilities of both MM-GBSA and MM-PBSA for PFAS−hPXR systems are first evaluated
by comparing the predicted binding energies of PFBA, FPPA, PFHxA, PFHpA, PFOA, PFNA,
PFDA, PFDoA, PFOS, and 6:2 FTOH with available experimental half maximal effective
concentration (EC50) data from Zhang et al. (Figure 7.2a,b, respectively).57 Strong correlation
between experimental EC50 values and predicted binding free energies is observed with both
MM-GBSA and MM-PBSA methods with correlation coefficients of 0.95 and 0.86, respectively,
and Kendal’s Tau values of 0.96 and 0.69, respectively. Yet, MM-GBSA performed better on the
PFAS−hPXR systems, proven by its slightly higher correlation coefficient and Kendal’s Tau
results compared to those of MM-PBSA. To assess the affinity and potential impact of ADONA,
                                              158


6:2 FTCA, Gen-X, and PFBS upon hPXR binding, MM-GBSA calculations were expanded to
include these alternative PFASs whose agonistic activity on hPXR has not been reported
previously (Figure 7.3).
    Figure 7. 3 Binding energies of PFASs to hPXR calculated with MM-GBSA in comparison
to EC50 values measured by Zhang et al. (the predicted binding energies are listed in Table 7.4).
    Predicted ΔGs of PFCAs (PFBA, PFPA, PFHxA, PFHpA, PFOA, PFNA PFDA, and
PFDoA) suggest that as the perfluorinated carbon number increases, the affinity of the PFCAs to
hPXR LBD also increases, explaining the relationship between the increased agonistic activity
(EC50 values) measured with respect to increased perfluorinated carbon chain length (Figure 7.
3). When comparing PFSAs, increasing carbon chain length also leads to decreased affinity such
that PFBS (four carbons) is higher in relative binding energy (+11.6 kcal mol-1) than PFOS (8
carbons). Finally, binding energies of ADONA, Gen-X, and 6:2 FTOH show a lower affinity to
hPXR compared to PFOA. However, ADONA, Gen-X, and 6:2 FTOH’s binding to hPXR are
predicted as similar to PFPA and PFHxA, indicating even though the binding energies are lower
than PFOA, they still exhibit binding and may show agonistic activity.
                                              159


    Table 7. 2 hPXR Residues Interact with PFASs Upon Binding
Ligand           H-Bonding Residues              Largest Energy Contributors
PFBA             Ser-247, His-407                Arg-410, Lys-210, Lys-226, His-407, Ser-247
PFPA             Gln-285, His-327, His-407       Arg-410, Lys-210, Lys-226, His-407, Gln-285
PFHxA            Ser-247, Gln-285                Arg-410, Lys-210, Lys-226, His-407, Gln-285
PFHpA            Ser-247, His-407                Arg-410, Lys-210, Lys-226, His-407, Ser-247
PFOA             Ser-247, His-327                Arg-410, Lys-210, Lys-226, His-407, Ser-247
PFNA             Gln-285, His-327                Arg-410, Lys-210, Lys-226, His-407, Gln-285
PFDA             Gln-285, His-327, His-407       Arg-410, Lys-210, Lys-226, His-407, Gln-285
PFDoA            Ser-247, His-407                Arg-410, Lys-210, Lys-226, His-407, Gln-285
PFBS             Ser-247, His-407                Arg-410, Lys-210, Lys-226, His-407, Gln-285
PFOS             Ser-247, Gln-285, His-407       Arg-410, Lys-210, Lys-226, His-407, Gln-285
6:2 FTOH         —                               Trp-299, Ser-208, Phe-288, Tyr-306, Gln-285
6:2 FTCA         Ser-247                         Arg-410, Lys-210, Lys-226, His-407, Ser-247
Gen-X            Ser-247, His-407                Arg-410, Lys-210, Lys-226, His-407, Ser-247
ADONA            —                               Arg-410, Lys-210, Lys-226, Met-323, His-327
7.3.3 PFAS Recognition on hPXR
    Residue decomposition is employed to understand molecular recognition of PFASs on hPXR
and can also be used to provide insight about the activity of untested PFASs on hPXR. Residue
decomposition shows that Lys-210, Lys-226, Ser-247, Gln-285, His-327, His-407, and Arg-410
are among the largest energy contributors for PFAS−hPXR binding for all PFASs tested, except
6:2 FTOH, which does not possess an acidic functional group (Table 7.2 and Figure 7.7).
    The binding energies for the top three residues are quite similar between the short/alternative
PFASs and long-chain PFASs. Among the binding site residues, Arg-410 has the lowest
                                                160


interaction energy for both PFASs at ∼−40 kcal mol-1, followed by Lys-210 at ∼−25 kcal mol-1,
and Lys-226 at ∼−17 kcal mol-1, with the exception of 6:2 FTOH, where the functional group is
an alcohol rather than an acid. On the contrary, the contribution to the binding from Ser-247,
Gln285, His-327, and His-407 varies according to the ligand with a range from −5 to −12 kcal
mol-1. The energy contributions of Lys-210 and Arg-410 tend to increase as the carbon chain
length increases. Since both Lys-210 and Arg-410 are located near the entrance of the cavity
(Figure 7.1 and Figure 7.8), their interaction primarily arises from long-range electrostatic forces,
rather than from short-range hydrogen bonding or van der Waals interaction. Unlike Lys-210,
Lys-226, and Arg-410, which interact strongly with almost every PFAS studied, binding energy
contribution of Ser-247, Gln-285, His-327, and His-407 is alternating for different PFASs. To
better understand the H-bonding behavior of hPXR residues interact with the PFASs, insight is
gained about the hydrogen bond lifetimes using MD trajectories and the results showed that Ser-
247, Gln-285, His-327, and His-407 commonly make hydrogen bonds with PFASs.
    Figure 7. 4 Hydrogen bond lifetimes observed during MD simulations.
                                                 161


    Hydrogen bond lifetime analysis shows that the stability of the H-bond between Gln-285,
Ser-247, His-327, His-407, and PFASs is ligand dependent as the H-bonding lifetimes vary
(Figure 7.4). The PFASs’ carboxylic acid and sulfonic acid functional groups often engage in
hydrogen bonding with the hydroxyl, amide, and imidazole groups in Gln-285, Ser-247, His-327,
and His-407, respectively (Figure 7.5). On the other hand, the fluorine atoms present on the
PFAS’s perfluorinated carbon chains do not form any significant hydrogen bonds during
simulations (Figure 7.4).
    Finally, despite residue decomposition which shows that Lys-210, Lys-226, and Arg-410
contribute significantly to the binding of PFASs to hPXR, Lys-210, Lys-226, and Arg-410 do not
form hydrogen bond with PFASs. This further supports that PFASs interact mainly through long-
range electrostatics rather than short-range interactions such as through hydrogen bonding with
hPXR. The lack of hydrogen bonding for the Lys-210, Lys-226, and Arg-410 may be attributed
to the orientation of PFASs within the hPXR binding site. Carboxylic acid and sulfonic acid
functional groups facing inside of the binding cavity is commonly observed upon PFAS−hPXR
binding, allowing PFASs to hydrogen bond with cavity residues such as Ser-247, Gln-285, His-
327, and His-407. In contrast, Lys-210, Lys-226, and Arg-410 are located near the entrance of
the binding cavity and could not form hydrogen bonds. 6:2 FTOH, which contains an alcohol
rather than an acidic group, does not form any significant hydrogen bond throughout the
simulations (Figure 7. 4).
                                                162


   Figure 7. 5 Important residues that mediate ligand stability through hydrogen bonding.
   Residue decomposition results showed that Asp-205, Asp-245, and Glu-321 destabilize the
binding of PFASs to the hPXR LBD. The destabilizations most likely arise from the repulsion
between the negative charge of aspartic acid, glutamic acid residues, and the negative charge of
PFASs present on the carboxylic acid or sulfonic acid functional groups. Mutagenesis of Asp-
205, Asp-245, and Glu-321 to alanine in selected ligand−protein complexes (PFOS, PFOA,
ADONA, Gen-X, 6:2 FTCA, and PFBS) showed an overall decrease in total electrostatic energy
(EEL) contribution for every ligand mutant complex (Figure 7. 6).
                                              163


    Figure 7. 6 Total electrostatic energy (EEL) contribution of various PFASs on binding to
mutant hPXR complexes.
    When compared to the EEL energies of the wild-type ligand−protein complexes, the
presence of Glu-321 reduces the favorable EEL contribution by an average of −39.51 kcal/ mol,
Asp-245 reduces EEL contribution by an average of −16.73 kcal mol-1, and Asp-205 reduces it
by an average of −25.53 kcal mol-1. This implies that the net negative charges of Asp-205, Asp-
245, and Glu-321 destabilize the binding of PFASs to hPXR, and proteins with more acidic
residues in their binding pockets are less likely to be PFAS targets, which has implications on the
evaluation of potential PFAS protein targets.
    Residue decomposition and hydrogen bond analysis provide an understanding about how the
chemical structure of PFASs affects their binding behavior. The results indicate that
carboxylate/sulfonate functional groups on the PFAS’s structure contribute strongly to its hPXR
binding through long-range electrostatic interactions with Arg-410, Lys-210, and Lys-226 and
                                                 164


H-bonding with Gln-285, Ser-247, His-327, and His-407, and that ADONA, Gen-X, PFBS, and
6:2 FTCA are potential hPXR agonists. Thus, at least for hPXR, these efforts suggest that further
insight about the impact of PFAS without a carboxylic acid or sulfonic acid functional group
should be garnered to identify alternative PFASs that are less potent to hPXR and other proteins.
                                              165


APPENDIX
   166


Table 7. 3 All PFAS ligands tested.
                                    167


Table 7. 4 MMPBSA and MMGBSA relative binding energies of every PFAS tested.
                  Ligands       MMPBSA            MMGBSA
                   PFBA         -18.91±7.9         -19.31±4.4
                   PFPA         -23.97±6.2         -26.19±4.6
                  PFHxA        -22.71±11.4         -27.84±7.0
                  PFHpA         -21.60±8.4         -29.74±6.9
                   PFOA         -26.72±7.3         -34.51±6.7
                   PFNA         -24.51±6.7         -38.81±6.2
                   PFDA        -28.85±11.6        -40.52±10.3
                  PFDoA        -27.27±10.2         -40.38±8.3
                   PFOS         -26.14±6.1         -35.61±5.4
                   PFBS         -22.99±6.9         -23.96±5.2
                6:2 FTOH       -21.98±5.39        -30.37±4.59
                 6:2 FTCA       -19.72±7.9         -28.31±6.4
                  GEN X         -23.27±7.6         -25.64±4.7
                  ADONA        -22.68±10.7         -26.45±8.0
                                   168


    Table 7. 5 Long-chain PFAS average per-residue decomposition energies (kcal mol-1).
       PFOA                   PFNA                    PFDA                  PFDoA
            Average                Average                 Average                Average
Residue                 Residue                Residue               Residue
            ∆G Bind                ∆G Bind                 ∆G Bind                ∆G Bind
Lys-210      -20.97     Lys-210     -23.94     Lys-210      -21.59   Lys-210       -24.85
Lys-226      -16.86     Lys-226     -15.60     Lys-226      -15.94   Lys-226       -16.54
 Ser-247     -13.59     Ser-247      -6.55     Ser-247       -7.56   Ser-247        -5.22
Gln-285       -7.61     Gln-285      -9.21     Gln-285       -8.40   Gln-285        -7.63
 His-327      -4.05     His-327      -8.23     His-327       -6.40   His-327        -4.82
 His-407     -15.32     His-407      -9.73     His-407      -14.20   His-407       -12.75
Arg-410      -33.53     Arg-410     -36.67     Arg-410      -42.49   Arg-410       -48.64
       PFOS                 6:2 FTOH                     Average PFAS Binding
            Average                Average
Residue                 Residue                      Residue          Average ∆G Bind
            ∆G Bind                ∆G Bind
Lys-210      -22.75     Ser-208      -4.75           Lys-210                -18.96
Lys-226      -17.12     Leu-209      -2.01           Lys-226                -13.65
 Ser-247      -7.57     Gln-285      -2.14           Ser-247                 -6.81
Gln-285       -9.30     Phe-288      -3.46           Gln-285                 -7.38
 His-327      -4.56     Trp-299      -4.86           His-327                 -4.82
 His-407     -11.45     Tyr-306      -2.99           His-407                -10.90
Arg-410      -46.82     Met-323      -2.07           Arg-410                -34.65
Residues with interactions lower than -5 kcal mol-1 are shown. The major residues (Lys-210,
Lys-226, Ser-247, Gln-285, His-327, and Arg-410) are listed regardless of their interaction
energy.
                                               169


     Table 7. 6 Short-chain/alternative PFAS average per-residue decomposition energies.
        PFBA                     PFPA                PFHxA                   PFHpA
             Average                  Average              Average                Average
 Residue                  Residue              Residue                 Residue
             ∆G Bind                  ∆G Bind             ∆G Bind                 ∆G Bind
 Lys-210      -21.38      Lys-210      -21.42  Lys-210      -21.29     Lys-210     -22.54
 Lys-226      -15.78      Lys-226      -16.51  Lys-226      -17.49     Lys-226     -17.66
 Ser-247       -9.12      Ser-247       -2.26  Ser-247      -8.271     Ser-247      -8.65
 Gln-285       -7.91      Gln-285      -10.53  Gln-285       -9.68     Gln-285      -7.61
 His-327       -4.19      His-327       -8.22  His-327       -4.98     His-327      -4.85
 His-407      -15.26      His-407      -14.12  His-407      -11.90     His-407     -10.48
Arg-410       -35.20     Arg-410       -38.03  Arg-410      -37.49     Arg-410     -36.69
       ADONA                     GEN X              6:2 FTCA                  PFBS
             Average                  Average              Average                Average
 Residue                  Residue              Residue                 Residue
             ∆G Bind                  ∆G Bind             ∆G Bind                 ∆G Bind
 Lys-210      -29.70      Lys-210      -23.19  Lys-210      -22.69     Lys-210     -22.26
 Lys-226      -19.51      Lys-226      -17.87  Lys-226      -17.46     Lys-226     -16.24
 Ser-247       -0.89      Ser-247       -8.00  Ser-247       -7.95     Ser-247     -12.17
 Gln-285       -3.64      Gln-285       -5.63  Gln-285       -6.67     Gln-285     -12.62
 His-327      -5.386      His-327       -4.53  His-327       -4.79     His-327      -3.60
 His-407       -3.92      His-407      -15.52  His-407      -11.08     His-407     -14.38
Arg-410       -55.74     Arg-410       -42.20  Arg-410      -44.30     Arg-410     -34.15
                                    Average PFAS Binding
 Residue Lys-210          Lys-226 Ser-247 Gln-285 His-327              His-407 Arg-410
Average
              -22.81     -17.3141       -7.16    -8.04       -5.07      -12.08     -40.48
∆G Bind
ADONA, GEN X, and 6:2 FTCA are alternatives of PFOA. PFBS is a short chain variant of
PFOS. PFBA, PFPA, PFHxA, and PFHpA are short chain variants of the long chain
perfluoroalkyl carboxylic acids (Table 1). Residues with interactions lower than -5 kcal mol-1 are
shown. The major residues (Lys-210, Lys-226, Ser-247, Gln-285, His-327, and Arg-410) are
listed regardless of their interaction energy.
                                               170


    Table 7. 7 Total electrostatic energies of various mutant PFAS-hPXR complexes.
        Mutagenesis of Asp-250, Asp-245, Glu-321 MMGBSA Total Electrostatic Energies
                        Wild Type           Asp205Ala          Asp245Ala      Glu321Ala
      PFOA               -102.96              -119.91           -126.25         -141.48
      PFOS                -96.23              -115.21           -109.64         -125.50
      PFBS               -100.09              -122.87           -115.48         -124.49
     ADONA               -106.59              -125.45           -106.25         -150.38
    6:2 FTCA              -91.58              -124.97           -112.12         -148.52
    Figure 7. 7 Average residue contributions to the PFAS binding to hPXR calculated from
residue decomposition.
                                                 171


Figure 7. 8 Arg-410 and Lys-210 positioned outside of the binding cavity.
Figure 7. 9 Comparison of VDW and electrostatic energies of every tested ligand.
                                         172


    Figure 7. 10 Electrostatic energies + energy of solvation calculated by MMGBSA for every
tested ligand.
                                               173


Figure 7. 11 Binding modes of PFASs to mutant hPXR ligand binding pocket.
                                        174


   Figure 7. 12 Root mean square deviation (RMSD) plots of the highest affinity PFAS poses
from 30ns MD simulations.
                                           175


REFERENCES
    176


                                        REFERENCES
(1)  Kissa, E. Fluorinated Surfactants and Repellents, 2nd ed.; Schick, M., Hubbard, A., Eds.;
     Marcel Dekker: New York, 2001; Vol. 97.
(2)  Schaider, L. A.; Balan, S. A.; Blum, A.; Andrews, D. Q.; Strynar, M. J.; Dickinson, M. E.;
     Lunderberg, D. M.; Lang, J. R.; Peaslee, G. F. Fluorinated Compounds in U.S. Fast Food
     Packaging.      Environ.     Sci.     Technol.   Lett.    2017,    4     (3),    105–111.
     https://doi.org/10.1021/acs.estlett.6b00435.
(3)  Rao, N. S.; Baker, B. E. Textile Finishes and Fluorosurfactants. In Organofluorine
     Chemistry; Banks, R. E., Smart, B. E., Tatlow, J. C., Eds.; Springer US: Boston, MA,
     1994; pp 321–338. https://doi.org/10.1007/978-1-4899-1202-2_15.
(4)  Sajid, M.; Ilyas, M. PTFE-Coated Non-Stick Cookware and Toxicity Concerns: A
     Perspective. Environ. Sci. Pollut. Res. 2017, 24 (30), 23436–23440.
     https://doi.org/10.1007/s11356-017-0095-y.
(5)  Matheny, K. PFAS contamination is Michigan’s biggest environmental crisis in 40 years.
(6)  Gardner, P.; Ellison, G. Michigan’s next water crisis is PFAS - and you may already be
     affected.
(7)  O’Hagan, D. Understanding Organofluorine Chemistry. An Introduction to the C–F Bond.
     Chem. Soc. Rev. 2008, 37 (2), 308–319. https://doi.org/10.1039/B711844A.
(8)  Conder, J. M.; Hoke, R. A.; Wolf, W. de; Russell, M. H.; Buck, R. C. Are PFCAs
     Bioaccumulative? A Critical Review and Comparison with Regulatory Criteria and
     Persistent Lipophilic Compounds. Environ. Sci. Technol. 2008, 42 (4), 995–1003.
     https://doi.org/10.1021/es070895g.
(9)  Biege, L. B.; Hurtt, M. E.; Frame, S. R.; O’Connor, J. C.; Cook, J. C. Mechanisms of
     Extrahepatic Tumor Induction by Peroxisome Proliferators in Male CD Rats. Toxicol. Sci.
     2001, 60 (1), 44–55. https://doi.org/10.1093/toxsci/60.1.44.
(10) Yang, Q.; Xie, Y.; Depierre, J. W. Effects of Peroxisome Proliferators on the Thymus and
     Spleen      of   Mice.     Clin.     Exp.    Immunol.    2000,    122     (2),   219–226.
     https://doi.org/10.1046/j.1365-2249.2000.01367.x.
(11) Yang, Q.; Xie, Y.; Eriksson, A. M.; Nelson, B. D.; DePierre, J. W. Further Evidence for
     the Involvement of Inhibition of Cell Proliferation and Development in Thymic and
     Splenic Atrophy Induced by the Peroxisome Proliferator Perfluoroctanoic Acid in Mice.
     Biochem. Pharmacol. 2001, 62 (8), 1133–1140. https://doi.org/10.1016/S0006-
     2952(01)00752-3.
(12) Yang, Q.; Abedi-Valugerdi, M.; Xie, Y.; Zhao, X.-Y.; Möller, G.; Dean Nelson, B.;
     DePierre, J. W. Potent Suppression of the Adaptive Immune Response in Mice upon
                                               177


     Dietary Exposure to the Potent Peroxisome Proliferator, Perfluorooctanoic Acid. Int.
     Immunopharmacol. 2002, 2 (2–3), 389–397. https://doi.org/10.1016/S1567-
     5769(01)00164-3.
(13) Yang, Q.; Xie, Y.; Alexson, S. E. H.; Dean Nelson, B.; DePierre, J. W. Involvement of the
     Peroxisome Proliferator-Activated Receptor Alpha in the Immunomodulation Caused by
     Peroxisome Proliferators in Mice. Biochem. Pharmacol. 2002, 63 (10), 1893–1900.
     https://doi.org/10.1016/S0006-2952(02)00923-1.
(14) Giesy, J. P.; Kannan, K. Global Distribution of Perfluorooctane Sulfonate in Wildlife.
     Environ. Sci. Technol. 2001, 35 (7), 1339–1342. https://doi.org/10.1021/es001834k.
(15) Langley, A. E.; Pilcher, G. D. Thyroid, Bradycardic and Hypothermic Effects of
     Perfluoro‐n‐decanoic Acid in Rats. J. Toxicol. Environ. Health 1985, 15 (3–4), 485–491.
     https://doi.org/10.1080/15287398509530675.
(16) US EPA. EPA and 3M announce phase out of PFOS.
(17) US       EPA.      Fact     Sheet:     2010/2015      PFOA        Stewardship     Program
     https://www.epa.gov/assessing-and-managing-chemicals-under-tsca/%0Afact-sheet-
     20102015-pfoa-stewardship-program (accessed Apr 29, 2019).
(18) Buck, R. C.; Franklin, J.; Berger, U.; Conder, J. M.; Cousins, I. T.; Voogt, P. De; Jensen,
     A. A.; Kannan, K.; Mabury, S. A.; van Leeuwen, S. P. J. Perfluoroalkyl and
     Polyfluoroalkyl Substances in the Environment: Terminology, Classification, and Origins.
     Integr. Environ. Assess. Manag. 2011, 7 (4), 513–541. https://doi.org/10.1002/ieam.258.
(19) Wang, Y.; Chang, W.; Wang, L.; Zhang, Y.; Zhang, Y.; Wang, M.; Wang, Y.; Li, P. A
     Review of Sources, Multimedia Distribution and Health Risks of Novel Fluorinated
     Alternatives.       Ecotoxicol.      Environ.      Saf.      2019,      182,       109402.
     https://doi.org/10.1016/j.ecoenv.2019.109402.
(20) Ahearn, A. A Regrettable Substitute: The Story of GenX. Pod. Res. Perspect. 2019, 2019
     (1), EHP5134. https://doi.org/10.1289/EHP5134.
(21) Poulsen, P. B.; Jensen, A. A.; Wallström, E.; Aps, E. More Environmentally Friendly
     Alternatives to PFOS-Compounds and PFOA; 2005.
(22) Wang, Z.; Cousins, I. T.; Scheringer, M.; Hungerbühler, K. Fluorinated Alternatives to
     Long-Chain Perfluoroalkyl Carboxylic Acids (PFCAs), Perfluoroalkane Sulfonic Acids
     (PFSAs) and Their Potential Precursors. Environ. Int. 2013, 60, 242–248.
     https://doi.org/10.1016/j.envint.2013.08.021.
(23) Sagisaka, M.; Ito, A.; Kondo, Y.; Yoshino, N.; Ok Kwon, K.; Sakai, H.; Abe, M. Effects
     of Fluoroalkyl Chain Length and Added Moles of Oxyethylene on Aggregate Formation
     of Branched-Tail Fluorinated Anionic Surfactants. Colloids Surfaces A Physicochem. Eng.
     Asp. 2001, 183–185, 749–755. https://doi.org/10.1016/S0927-7757(01)00501-5.
                                              178


(24) Buck, R. C.; Murphy, P. M.; Pabon, M. Chemistry, Properties, and Uses of Commercial
     Fluorinated Surfactants; 2012; pp 1–24. https://doi.org/10.1007/978-3-642-21872-9_1.
(25) Sunderland, E. M.; Hu, X. C.; Dassuncao, C.; Tokranov, A. K.; Wagner, C. C.; Allen, J.
     G. A Review of the Pathways of Human Exposure to Poly- and Perfluoroalkyl Substances
     (PFASs) and Present Understanding of Health Effects. J. Expo. Sci. Environ. Epidemiol.
     2019, 29 (2), 131–147. https://doi.org/10.1038/s41370-018-0094-1.
(26) Gomis, M. I.; Vestergren, R.; Borg, D.; Cousins, I. T. Comparing the Toxic Potency in
     Vivo of Long-Chain Perfluoroalkyl Acids and Fluorinated Alternatives. Environ. Int.
     2018, 113, 1–9. https://doi.org/10.1016/j.envint.2018.01.011.
(27) Conley, J. M.; Lambright, C. S.; Evans, N.; Strynar, M. J.; McCord, J.; McIntyre, B. S.;
     Travlos, G. S.; Cardon, M. C.; Medlock-Kakaley, E.; Hartig, P. C.; Wilson, V. S.; Gray,
     L. E. Adverse Maternal, Fetal, and Postnatal Effects of Hexafluoropropylene Oxide Dimer
     Acid (GenX) from Oral Gestational Exposure in Sprague-Dawley Rats. Environ. Health
     Perspect. 2019, 127 (3), 37008. https://doi.org/10.1289/EHP4372.
(28) Qin, P.; Liu, R.; Pan, X.; Fang, X.; Mou, Y. Impact of Carbon Chain Length on Binding
     of Perfluoroalkyl Acids to Bovine Serum Albumin Determined by Spectroscopic
     Methods.       J.    Agric.     Food      Chem.       2010,     58      (9),   5561–5567.
     https://doi.org/10.1021/jf100412q.
(29) Kudo, N.; Suzuki-Nakajima, E.; Mitsumoto, A.; Kawashima, Y. Responses of the Liver to
     Perfluorinated Fatty Acids with Different Carbon Chain Length in Male and Female Mice:
     In Relation to Induction of Hepatomegaly, Peroxisomal β-Oxidation and Microsomal 1-
     Acylglycerophosphocholine Acyltransferase. Biol. Pharm. Bull. 2006, 29 (9), 1952–1957.
     https://doi.org/10.1248/bpb.29.1952.
(30) Takacs, M. L.; Abbott, B. D. Activation of Mouse and Human Peroxisome Proliferator–
     Activated Receptors (α, β/δ, γ) by Perfluorooctanoic Acid and Perfluorooctane Sulfonate.
     Toxicol. Sci. 2007, 95 (1), 108–117. https://doi.org/10.1093/toxsci/kfl135.
(31) Ikeda, T.; Aiba, K.; Fukuda, K.; Tanaka, M. The Induction of Peroxisome Proliferation in
     Rat Liver by Perfluorinated Fatty Acids, Metabolically Inert Derivatives of Fatty Acids. J.
     Biochem. 1985, 98 (2), 475–482.
(32) Pastoor, T. P.; Lee, K. P.; Perri, M. A.; Gillies, P. J. Biochemical and Morphological
     Studies of Ammonium Perfluorooctanoate-Induced Hepatomegaly and Peroxisome
     Proliferation. Exp. Mol. Pathol. 1987, 47 (1), 98–109. https://doi.org/10.1016/0014-
     4800(87)90011-6.
(33) Abdellatif, A.; Preat, V.; Taper, H. S.; Roberfroid, M. The Modulation of Rat Liver
     Carcinogenesis by Perfluorooctanoic Acid, a Peroxisome Proliferator. Toxicol. Appl.
     Pharmacol. 1991, 111 (3), 530–537. https://doi.org/10.1016/0041-008X(91)90257-F.
(34) Ren, X. M.; Qin, W. P.; Cao, L. Y.; Zhang, J.; Yang, Y.; Wan, B.; Guo, L. H. Binding
     Interactions of Perfluoroalkyl Substances with Thyroid Hormone Transport Proteins and
                                              179


     Potential Toxicological Implications.           Toxicology    2016,    366–367,    32–42.
     https://doi.org/10.1016/j.tox.2016.08.011.
(35) Zhang, L.; Ren, X. M.; Guo, L. H. Structure-Based Investigation on the Interaction of
     Perfluorinated Compounds with Human Liver Fatty Acid Binding Protein. Environ. Sci.
     Technol. 2013, 47 (19), 11293–11301. https://doi.org/10.1021/es4026722.
(36) Han, X.; Snow, T. A.; Kemper, R. A.; Jepson, G. W. Binding of Perfluorooctanoic Acid to
     Rat and Human Plasma Proteins. Chem. Res. Toxicol. 2003, 16 (6), 775–781.
     https://doi.org/10.1021/tx034005w.
(37) Cheng, W.; Ng, C. A. Predicting Relative Protein Affinity of Novel Per- and
     Polyfluoroalkyl Substances (PFASs) by An Efficient Molecular Dynamics Approach.
     Environ. Sci. Technol. 2018, 52 (14), 7972–7980. https://doi.org/10.1021/acs.est.8b01268.
(38) Shrestha, S.; Bloom, M. S.; Yucel, R.; Seegal, R. F.; Wu, Q.; Kannan, K.; Rej, R.;
     Fitzgerald, E. F. Perfluoroalkyl Substances and Thyroid Function in Older Adults.
     Environ. Int. 2015, 75, 206–214. https://doi.org/10.1016/j.envint.2014.11.018.
(39) Nelson, J. W.; Hatch, E. E.; Webster, T. F. Exposure to Polyfluoroalkyl Chemicals and
     Cholesterol, Body Weight, and Insulin Resistance in the General U.S. Population.
     Environ. Health Perspect. 2010, 118 (2), 197–202. https://doi.org/10.1289/ehp.0901165.
(40) Granum, B.; Haug, L. S.; Namork, E.; Stølevik, S. B.; Thomsen, C.; Aaberge, I. S.; Van
     Loveren, H.; Løvik, M.; Nygaard, U. C. Pre-Natal Exposure to Perfluoroalkyl Substances
     May Be Associated with Altered Vaccine Antibody Levels and Immune-Related Health
     Outcomes in Early Childhood. J. Immunotoxicol. 2013, 10 (4), 373–379.
     https://doi.org/10.3109/1547691X.2012.755580.
(41) Son, H.-Y.; Kim, S.-H.; Shin, H.-I.; Bae, H. I.; Yang, J.-H. Perfluorooctanoic Acid-
     Induced Hepatic Toxicity Following 21-Day Oral Exposure in Mice. Arch. Toxicol. 2008,
     82 (4), 239–246. https://doi.org/10.1007/s00204-007-0246-x.
(42) Ng, C. A.; Hungerbühler, K. Bioconcentration of Perfluorinated Alkyl Acids: How
     Important Is Specific Binding? Environ. Sci. Technol. 2013, 47 (13), 7214–7223.
     https://doi.org/10.1021/es400981a.
(43) Rastelli, G.; Rio, D. A.; Degliesposti, G.; Sgobba, M. Fast and Accurate Predictions of
     Binding Free Energies Using MM-PBSA and MM-GBSA. J. Comput. Chem. 2009, NA--
     NA. https://doi.org/10.1002/jcc.21372.
(44) de Ruiter, A.; Oostenbrink, C. Free Energy Calculations of Protein–Ligand Interactions.
     Curr.        Opin.        Chem.       Biol.       2011,       15       (4),      547–552.
     https://doi.org/10.1016/j.cbpa.2011.05.021.
(45) Kliewer, S. A.; Willson, T. M. Regulation of Xenobiotic and Bile Acid Metabolism by the
     Nuclear Pregnane X Receptor. J. Lipid Res. 2002, 43 (3), 359–364.
                                              180


(46) Ma, X.; Idle, J. R.; Gonzalez, F. J. The Pregnane X Receptor: From Bench to Bedside.
     Expert       Opin.    Drug      Metab.      Toxicol.      2008,      4      (7),   895–908.
     https://doi.org/10.1517/17425255.4.7.895.
(47) Blumberg, B.; Sabbagh, W.; Juguilon, H.; Bolado, J.; Van Meter, C. M.; Ong, E. S.;
     Evans, R. M. SXR, a Novel Steroid and Xenobiotic-Sensing Nuclear Receptor. Genes
     Dev. 1998, 12 (20), 3195–3205. https://doi.org/10.1101/gad.12.20.3195.
(48) Watkins, R. E. The Human Nuclear Xenobiotic Receptor PXR: Structural Determinants of
     Directed Promiscuity. Science (80-. ). 2001, 292 (5525), 2329–2333.
     https://doi.org/10.1126/science.1060762.
(49) Kliewer, S. A.; Moore, J. T.; Wade, L.; Staudinger, J. L.; Watson, M. A.; Jones, S. A.;
     McKee, D. D.; Oliver, B. B.; Willson, T. M.; Zetterström, R. H.; Perlmann, T.; Lehmann,
     J. M. An Orphan Nuclear Receptor Activated by Pregnanes Defines a Novel Steroid
     Signaling Pathway. Cell 1998, 92 (1), 73–82. https://doi.org/10.1016/S0092-
     8674(00)80900-9.
(50) Mani, S.; Dou, W.; Redinbo, M. R. PXR Antagonists and Implication in Drug
     Metabolism.         Drug      Metab.       Rev.       2013,         45       (1),     60–72.
     https://doi.org/10.3109/03602532.2012.746363.
(51) Navaratnarajah, P.; Steele, B. L.; Redinbo, M. R.; Thompson, N. L. Rifampicin-
     Independent Interactions between the Pregnane X Receptor Ligand Binding Domain and
     Peptide Fragments of Coactivator and Corepressor Proteins. Biochemistry 2012, 51 (1),
     19–31. https://doi.org/10.1021/bi2011674.
(52) Zhai, Y.; Pai, H. V.; Zhou, J.; Amico, J. A.; Vollmer, R. R.; Xie, W. Activation of
     Pregnane X Receptor Disrupts Glucocorticoid and Mineralocorticoid Homeostasis. Mol.
     Endocrinol. 2007, 21 (1), 138–147. https://doi.org/10.1210/me.2006-0291.
(53) Zhou, J.; Febbraio, M.; Wada, T.; Zhai, Y.; Kuruba, R.; He, J.; Lee, J. H.; Khadem, S.;
     Ren, S.; Li, S.; Silverstein, R. L.; Xie, W. Hepatic Fatty Acid Transporter Cd36 Is a
     Common Target of LXR, PXR, and PPARγ in Promoting Steatosis. Gastroenterology
     2008, 134 (2), 556–567. https://doi.org/10.1053/j.gastro.2007.11.037.
(54) Gong, H.; Singh, S. V.; Singh, S. P.; Mu, Y.; Lee, J. H.; Saini, S. P. S.; Toma, D.; Ren, S.;
     Kagan, V. E.; Day, B. W.; Zimniak, P.; Xie, W. Orphan Nuclear Receptor Pregnane X
     Receptor Sensitizes Oxidative Stress Responses in Transgenic Mice and Cancerous Cells.
     Mol. Endocrinol. 2006, 20 (2), 279–290. https://doi.org/10.1210/me.2005-0205.
(55) Lehmann, J. M.; McKee, D. D.; Watson, M. A.; Willson, T. M.; Moore, J. T.; Kliewer, S.
     A. The Human Orphan Nuclear Receptor PXR Is Activated by Compounds That Regulate
     CYP3A4 Gene Expression and Cause Drug Interactions. J. Clin. Invest. 1998, 102 (5),
     1016–1023. https://doi.org/10.1172/JCI3703.
(56) Zhang, Y.-M.; Wang, T.; Yang, X.-S. An in Vitro and in Silico Investigation of Human
     Pregnane X Receptor Agonistic Activity of Poly- and Perfluorinated Compounds Using
                                             181


     the Heuristic Method–Best Subset and Comparative Similarity Indices Analysis.
     Chemosphere 2020, 240, 124789. https://doi.org/10.1016/j.chemosphere.2019.124789.
(57) Zhang, Y. M.; Dong, X. Y.; Fan, L. J.; Zhang, Z. L.; Wang, Q.; Jiang, N.; Yang, X. S.
     Poly- and Perfluorinated Compounds Activate Human Pregnane X Receptor. Toxicology
     2017, 380, 23–29. https://doi.org/10.1016/j.tox.2017.01.012.
(58) Vaz, R. J.; Li, Y.; Chellaraj, V.; Reiling, S.; Kuntzweiler, T.; Yang, D.; Shen, H.;
     Batchelor, J. D.; Zhang, Y.; Chen, X.; McLean, L. R.; Kosley Jr., R. Amelioration of
     PXR-Mediated CYP3A4 Induction by MGluR2 Modulators. Bioorg. Med. Chem. Lett.
     2018, 28, 3194–3196. https://doi.org/10.2210/PDB6DUP/PDB.
(59) Chemical Computing Group Inc. Molecular Operating Environment (MOE). 2016.
(60) Labute, P.; Santavy, M. SiteFinder-Locating Binding Sites in Protein Structures
     http://www.chempcomp.com/journal/sitefind.htm%5Cnhttps://www.chemcomp.com/journ
     al/sitefind.htm.
(61) Xue, Y.; Chao, E.; Zuercher, W. J.; Willson, T. M.; Collins, J. L.; Redinbo, M. R. Crystal
     Structure of the PXR–T1317 Complex Provides a Scaffold to Examine the Potential for
     Receptor Antagonism. Bioorg. Med. Chem. 2007, 15 (5), 2156–2166.
     https://doi.org/10.1016/j.bmc.2006.12.026.
(62) Chrencik, J. E.; Orans, J.; Moore, L. B.; Xue, Y.; Peng, L.; Collins, J. L.; Wisely, G. B.;
     Lambert, M. H.; Kliewer, S. A.; Redinbo, M. R. Structural Disorder in the Complex of
     Human Pregnane X Receptor and the Macrolide Antibiotic Rifampicin. Mol. Endocrinol.
     2005, 19 (5), 1125–1134. https://doi.org/10.1210/me.2004-0346.
(63) Kim, S.; Chen, J.; Cheng, T.; Gindulyte, A.; He, J.; He, S.; Li, Q.; Shoemaker, B. A.;
     Thiessen, P. A.; Yu, B.; Zaslavsky, L.; Zhang, J.; Bolton, E. E. PubChem 2019 Update:
     Improved Access to Chemical Data. Nucleic Acids Res. 2018, 47 (D1), D1102--D1109.
     https://doi.org/10.1093/nar/gky1033.
(64) Labute, P. Protonate3D: Assignment of Ionization States and Hydrogen Coordinates to
     Macromolecular Structures. Proteins Struct. Funct. Bioinforma. 2009, 75 (1), 187–205.
     https://doi.org/10.1002/prot.22234.
(65) Hoffmann, R. An Extended Hückel Theory. I. Hydrocarbons. J. Chem. Phys. 1963, 39 (6),
     1397–1412. https://doi.org/10.1063/1.1734456.
(66) Hornak, V.; Abel, R.; Okur, A.; Strockbine, B.; Roitberg, A.; Simmerling, C.; Brook, S.;
     Brook, S.; Brook, S. Comparison of Multiple AMBER Force Fields and Development of
     Improved Protien Backbone Parameters. Proteins 2006, 65 (3), 712–725.
     https://doi.org/10.1002/prot.21123.Comparison.
(67) Wang, J.; Wolf, R. M.; Caldwell, J. W.; Kollman, P. A.; Case, D. A. Development and
     Testing of a General Amber Force Field. J. Comput. Chem. 2004, 25 (9), 1157–1174.
     https://doi.org/10.1002/jcc.20035.
                                             182


(68) Corbeil, C. R.; Williams, C. I.; Labute, P. Variability in Docking Success Rates Due to
     Dataset Preparation. J. Comput. Aided. Mol. Des. 2012, 26 (6), 775–786.
     https://doi.org/10.1007/s10822-012-9570-1.
(69) Case, D. A.; Cerutti, D. S.; T.E. Cheatham, I. I. I.; Darden, T. A.; Duke, R. E.; Giese, T.
     J.; Gohlke, H.; Goetz, A. W.; Greene, D.; Homeyer, N.; Izadi, S.; Kovalenko, A.; Lee, T.
     S.; LeGrand, S.; Li, P.; Lin, C.; Liu, J.; Luchko, T.; Luo, R.; Mermelstein, D.; Merz, K.
     M.; Monard, G.; Nguyen, H.; Omelyan, I.; Onufriev, A.; Pan, F.; Qi, R.; Roe, D. R.;
     Roitberg, A. E.; Sagui, C.; Simmerling, C. L.; Botello-Smith, W. M.; Swails, J.; Walker,
     R. C.; Wang, J.; Wolf, R. M.; Wu, X.; Xiao, L.; York, D. M.; Kollman, P. A. Amber17.
     2017, No. April. https://doi.org/10.13140/RG.2.2.36172.41606.
(70) Maier, J. A.; Martinez, C.; Kasavajhala, K.; Wickstrom, L.; Hauser, K. E.; Simmerling, C.
     Ff14SB: Improving the Accuracy of Protein Side Chain and Backbone Parameters from
     Ff99SB.       J.    Chem.      Theory      Comput.      2015,    11    (8),    3696–3713.
     https://doi.org/10.1021/acs.jctc.5b00255.
(71) Jakalian, A.; Jack, D. B.; Bayly, C. I. Fast, Efficient Generation of High-Quality Atomic
     Charges. AM1-BCC Model: II. Parameterization and Validation. J. Comput. Chem. 2002,
     23 (16), 1623–1641. https://doi.org/10.1002/jcc.10128.
(72) Joung, I. S.; Cheatham, T. E. Determination of Alkali and Halide Monovalent Ion
     Parameters for Use in Explicitly Solvated Biomolecular Simulations. J. Phys. Chem. B
     2008, 112 (30), 9020–9041. https://doi.org/10.1021/jp8001614.
(73) Ryckaert, J. P.; Ciccotti, G.; Berendsen, H. J. C. Numerical Integration of the Cartesian
     Equations of Motion of a System with Constraints: Molecular Dynamics of n-Alkanes. J.
     Comput. Phys. 1977, 23 (3), 327–341. https://doi.org/10.1016/0021-9991(77)90098-5.
(74) Miller, B. R.; McGee, T. D.; Swails, J. M.; Homeyer, N.; Gohlke, H.; Roitberg, A. E.
     MMPBSA.Py: An Efficient Program for End-State Free Energy Calculations. J. Chem.
     Theory Comput. 2012, 8 (9), 3314–3321. https://doi.org/10.1021/ct300418h.
(75) Eken, Y.; Patel, P.; Díaz, T.; Jones, M. R.; Wilson, A. K. SAMPL6 Host–Guest
     Challenge: Binding Free Energies via a Multistep Approach. J. Comput. Aided. Mol. Des.
     2018, 32 (10), 1097–1115. https://doi.org/10.1007/s10822-018-0159-1.
(76) Hou, T.; Wang, J.; Li, Y.; Wang, W. Assessing the Performance of the MM/PBSA and
     MM/GBSA Methods. 1. The Accuracy of Binding Free Energy Calculations Based on
     Molecular Dynamics Simulations. J. Chem. Inf. Model. 2011, 51 (1), 69–82.
     https://doi.org/10.1021/ci100275a.
(77) Onufriev, A.; Bashford, D.; Case, D. A. Exploring Protein Native States and Large-Scale
     Conformational Changes with a Modified Generalized Born Model. Proteins Struct.
     Funct. Genet. 2004, 55 (2), 383–394. https://doi.org/10.1002/prot.20033.
(78) Roe, D. R. Introduction to hydrogen bond analysis https://amber.utah.edu/AMBER-
     workshop/London-2015/Hbond/ (accessed Apr 19, 2019).
                                              183


(79) Durdagi, S.; Şentürk, M.; Ekinci, D.; Balaydın, H. T.; Göksu, S.; Küfrevioğlu, Ö. İ.;
     Innocenti, A.; Scozzafava, A.; Supuran, C. T. Kinetic and Docking Studies of Phenol-
     Based Inhibitors of Carbonic Anhydrase Isoforms I, II, IX and XII Evidence a New
     Binding Mode within the Enzyme Active Site. Bioorg. Med. Chem. 2011, 19 (4), 1381–
     1389. https://doi.org/10.1016/j.bmc.2011.01.016.
(80) Singh, N.; Tiwari, S.; Srivastava, K. K.; Siddiqi, M. I. Identification of Novel Inhibitors of
     Mycobacterium Tuberculosis PknG Using Pharmacophore Based Virtual Screening,
     Docking, Molecular Dynamics Simulation, and Their Biological Evaluation. J. Chem. Inf.
     Model. 2015, 55 (6), 1120–1129. https://doi.org/10.1021/acs.jcim.5b00150.
(81) Khan, K. M.; Rahim, F.; Halim, S. A.; Taha, M.; Khan, M.; Perveen, S.; Zaheer-ul-Haq;
     Mesaik, M. A.; Iqbal Choudhary, M. Synthesis of Novel Inhibitors of β-Glucuronidase
     Based on Benzothiazole Skeleton and Study of Their Binding Affinity by Molecular
     Docking.        Bioorg.     Med.       Chem.       2011,       19       (14),      4286–4294.
     https://doi.org/10.1016/j.bmc.2011.05.052.
(82) Salam, N. K.; Huang, T. H.-W.; Kota, B. P.; Kim, M. S.; Li, Y.; Hibbs, D. E. Novel
     PPAR-Gamma Agonists Identified from a Natural Product Library: A Virtual Screening,
     Induced-Fit Docking and Biological Assay Study. Chem. Biol. Drug Des. 2007, 71 (1),
     57–70. https://doi.org/10.1111/j.1747-0285.2007.00606.x.
(83) Jiang, X.; Dulubova, I.; Reisman, S. A.; Hotema, M.; Lee, C. Y. I.; Liu, L.; McCauley, L.;
     Trevino, I.; Ferguson, D. A.; Eken, Y.; Wilson, A. K.; Wigley, W. C.; Visnick, M. A
     Novel Series of Cysteine-Dependent, Allosteric Inverse Agonists of the Nuclear Receptor
     RORγt.       Bioorganic     Med.      Chem.       Lett.     2020,      30     (6),    126967.
     https://doi.org/10.1016/j.bmcl.2020.126967.
(84) Yang, W.; Eken, Y.; Zhang, J.; Cole, L. E.; Ramadan, S.; Xu, Y.; Zhang, Z.; Liu, J.;
     Wilson, A. K.; Huang, X. Chemical Synthesis of Human Syndecan-4 Glycopeptide
     Bearing O-, N-Sulfation and Multiple Aspartic Acids for Probing Impacts of the Glycan
     Chain and the Core Peptide on Biological Functions. Chem. Sci. 2020, 11 (25), 6393–
     6404. https://doi.org/10.1039/d0sc01140a.
(85) Yang, W.; Ramadan, S.; Orwenyo, J.; Kakeshpour, T.; Diaz, T.; Eken, Y.; Sanda, M.;
     Jackson, J. E.; Wilson, A. K.; Huang, X. Chemoenzymatic Synthesis of Glycopeptides
     Bearing Rare N-Glycan Sequences with or without Bisecting GlcNAc. Chem. Sci. 2018, 9
     (43), 8194–8206. https://doi.org/10.1039/c8sc02457j.
                                              184


                                CHAPTER EIGHT
Binding of Per- and Polyfluoro-Alkyl Substances (PFASs) to Peroxisome Proliferator-
Activated Receptor Gamma (PPAR)
                                       185


About this chapter: This chapter is reprinted from Nuno, A.; Eken, Y.; Wilson, A. K. Binding of
Per- and Polyfluoro-Alkyl Substances (PFASs) to Peroxisome Proliferator-Activated Receptor
Gamma (PPARγ). ACS Omega 2021, 6 (23), 15103-15114 with the permission of American
Chemical Society. Both, Nuno M.S. Almeida and Yiğitcan Eken investigated interactions of half
of the compounds with PPARγ included in this chapter.
8.1 Introduction
     Per- and poly-fluoroalkyl substances (PFASs) are “forever chemicals”, a number of which
have been implicated with long lasting effects on humans, animals and the environment.1 The
first report of PFASs dates back to 1940.2 Due to their oil and fat repellent properties along with
their resilient nature, these chemicals were initially used for military purposes. Later, they were
applied to industrial products, such as coating agents, oil repellents, and firefighting foam.3–5
     Perfluorooctane sulfonate acid (PFOS) and perfluorooctanoic acid (PFOA) are the two most
well-known PFASs. PFOA was initially used in commercial products to produce
polytetrafluoroethylene (PTFE), for non-stick coatings.3 Several studies in the 1990s confirmed
the presence of PFOS in blood serum. Eight chemical companies agreed to stop the production of
PFOA and PFOS in 2006.6 In 2015, the production of PFOS, PFOA, perfluorosulfonic acids
with six or more carbon atoms, and perfluorocarboxylic acids with eight, or more carbon atoms
in the United States ended.6,7 Despite safety concerns, which has stopped U.S. production and
use, the manufacturing of these chemicals has continued in other countries.8
     Recently, concerns have been raised about the possible levels of PFAS compounds in water
sources, and, mitigation efforts are underway in many states.9 In 2016, the EPA released a health
advisory recommending that the combined concentration of PFOS and PFOA in water should be
less than 70 ng/L.10 Despite the health advisory, there are no mandatory federal standards, and
                                                186


each state in the U.S. has its own regulations, or guidelines for the safety of drinking water,
ranging from 11 to 1000 ng/L.10
    Assessing the impact of PFASs on organisms at the molecular level is fundamental to
understanding their possible effects and identifying routes to mitigate them. The hepatotoxicity,
neurotoxicity, reproductive toxicity, immunotoxicity, thyroid disruption, and cardiovascular
toxicity of PFOS has been discussed by Zeng et. al.11 For a number of affected proteins linked to
such toxicological impacts, there is crystal structure data available, facilitating molecular level
studies. In addition, recent in-vivo and in-vitro studies have been conducted to study the
interactions between human and animal proteins with PFASs (see, e.g., Ref 12–26).
    In recent studies, PFOS was implicated in renal fibrosis.27,28 The mechanism by which PFOS
can cause renal injury, involves the deacetylation and inactivation of PPARg, playing a very
important role in cell signaling processes. Liu et. al. studied the associations of different PFASs
and serum biochemical markers for uremic patients under hemodialysis.29 They found that the
effects of PFOS and PFOA on the kidneys are long-lasting, and provided an explanation for the
long half-life that PFASs have in humans.
    PPARg functions as a regulator for fatty acid storage and glucose metabolism by binding to
DNA and acting as a transcription factor. The homodimerization of PPARγ and its biological
relevance have been discussed in the literature.30–35 Fulton et. al. provides direct evidence that
PPARγ homodimerizes by using yeast two-hybrid experiments, where the physical interaction
between the two PPARγ monomers, and formation of homodimers, has been shown by reporter
activation.30 Todorov et. al. studied nuclear receptor proteins from CaLu-6 cells probed with 33P-
labeled human renin Pal3 sequence using electrophoretic mobility-shift assay.31 The addition of
anti-PPARγ antibody in these assays resulted in retardation of two separate protein complex
                                                 187


bands. In other words, the anti-PPARγ antibody bound and slowed down two different PPARγ
containing protein complexes present in the cells. Since RXRα is the standard interaction partner
for PPARγ, Todorov et. al. suggested that these two bands might correspond to PPARγ/RXRα
heterodimer and PPARγ/PPARγ homodimer.31 Estany et. al. found two inverted half site DNA
motifs which may allow two PPARγ proteins to bind to each half site as a homodimer.32 Okuno
et. al. utilized gel shift analysis showing that PPARγ might bind to the Pal3 DNA motif as a
homodimer, in comparison to the DR1 motif, which is a commonly known PPARγ/RXR
heterodimer binding site.33 Many PPARγ crystal structures including the one reported by Nolte
et. al. and the one studied here (PDB ID:3ADV) by Waku et. al. shows that PPARγ has a
homodimer interface and can form a homodimer complex similar to other nuclear receptors (i.e.
estrogen receptor-α and RXR-α).34,35 Due, to the possible biological relevance of the PPARγ
homodimer, the homodimer was considered in this study.
    The activation of PPARg causes insulin sensitization and regulates glucose metabolism, and,
the intake of any kinds of sugar is a fundamental process for the body to regulate. Chou et. al.
investigated how L-carnitine plays an essential role in attenuating the effects of PFOS in the
kidneys via PPARg and Sirt1 mechanisms.27 Additionally, L-carnitine can be synthetized on a
cellular level by methionine and lysine, and in prior studies, it is shown to diminish the effects of
gentamicin-induced apoptosis in PPARa.27,28
    To better understand PFAS structure/protein activity relationships, computational studies are
important, although they are scarce. One of the first such studies was performed by Salvalaglio
et. al.36 They examined the binding energies and binding sites in human serum albumin,
describing how PFOS and PFOA bind to this protein. The authors utilized molecular dynamics
simulations along with molecular mechanics generalized Born solvation area (MM-GBSA)
                                                188


calculations to predict free binding energies36, and describe guidelines for PFASs with lower bio
accumulative potential. Other studies have utilized computation to investigate the interaction of
different PFASs with human or animal proteins and analyze possible binding sites and poses.37–40
     Takacs et. al. investigated the interaction between PPARg and PFOS and PFOA.12 They
observed that there was no PPARg activity alteration in both mice and humans in the presence of
these PFASs. Zhang et. al. determined half maximum inhibition concentrations (IC50) for twelve
PFASs with PPARg, providing docking and activity studies, and concluded that hydrogen
bonding of the ligands to Tyr 473, and interactions with His 323 and His 449 were deemed
essential for PPARg activation. Additionally, the authors identified key residues and important
hydrogen bond pairs on PPARg for the ligand binding pocket (LBP) using molecular docking.17
For PPARg, different studies identify His 323, His 499 and Tyr 473 as key for PPARg’s activity,
along with the size and length of the carbon chain (see example references 41 and 42). In terms of
structural properties, the importance of helixes AF-2, 3, 7 and 10 has been documented prior for
PPARg. The position of PFASs within the ligand binding pocket and AF-2 helix, along with key
residue interactions are of paramount importance for PPARg’s activity. 17,43
     Activity and docking studies were also performed on PPARb/d using a range of PFASs by Li
et. al.44 The authors found that the binding geometries of selected PFASs were similar to those of
fatty acids, fitting in the ligand binding pocket of PPARβ/δ. Furthermore, Li et. al. found that
both isoforms of PPAR are activated by PFASs, and that the transcriptional activity was
associated with the carbon length.44 Recently, Behr et. al. probed the activation of nuclear
receptors with PFAS.18 Although PPARα could activate several PFASs, PPARγ was shown to
only be activated by perfluoro-2-methyl-3-oxahexanoic acid (PMOH) and 3H-perfluoro-3-[(3-
methoxypropoxy) propanoic acid (PMPP). In comparison with in vitro experimental results by
                                                 189


Zhang et. al., Behr et. al. reported much different PPARγ activity. These inconsistencies were
attributed to the selected PPARγ constructs and different cell lines used in the experiment.    17,18
Due to the conflicting conclusions from the prior studies, a better understanding of how PPARg
interacts with different residues at a molecular level is needed.
     In this study, different binding pockets are investigated, as well as the interactions between
PPARg and 27 widely used PFASs. Herein, in addition to the orthosteric binding pocket present
in the PPARg ligand binding domain (LBD), a new binding site present in the PPARg
homodimer is identified: dimer pocket and studied as a potential bio accumulative target. The
dimer pocket is situated between the two PPARg LBD monomers, and computational predictions
showed binding to a variety of PFASs.
     The PFASs investigated here represent a variety of carbon chain lengths and functional
groups (amines, carboxylic groups, alcohols, and sulfonic groups) to provide insight about how
structural modifications affect the binding of PFAS species to the receptor. A number of “short
chain” PFAS alternatives are considered including 2,3,3,3-tetrafluoro-2-heptafluoropropoxy
propanoic acid (GenX), 4,8-dioxa-3H-perfluorononanoic acid (ADONA), 6:2 fluorotelomer
carboxylic acid (6:2 FTCA), and 6:2 fluorotelomer alcohol (6:2 FTOH). “Short chain”
alternatives to PFOS and PFOA are perfluoroalkyl carboxylic acids (PFCAs) with six or less
fluorinated carbons and perfluorosulfonic acids (PFSAs) with five or less fluorinated carbons.
“Short chain” PFASs are generally thought to be less harmful; however, their effects on the
human body and environment are less understood.45–47 The influence of basic and acidic residues
upon the interactions has been investigated, as has the impact of L-carnitine and its interaction
with different binding pockets.
                                                 190


8.2 Computational Methods
8.2.1 Site Analysis and Molecular Docking
    The PPARγ dimer structure was taken from the RSCB Protein Data Bank (PDB ID:
3ADV35), and was protonated using the Protonate 3D48 program from the Molecular Operating
Environment’s (MOE).49 3ADV structure is a PPARγ homodimer, which has seen less attention
in the literature and allowed us to identify a new binding site for PFASs (dimer pocket).
Additionally, 3ADV has a fatty acid metabolite, which has an amphiphilic nature similar to
PFASs and also has good X-ray resolution (2.27 Å), which allows for detecting positions of the
side chain atoms confidently.35 The protonated PPARγ dimer was scanned for potential binding
pockets using MOE’s “site finder program”. The site finder program detects alpha shapes on the
protein surface and evaluates them according to their propensity of ligand binding (PLB) score.50
    The initial structures of the PFASs and L-carnitine were obtained from PubChem.51 The
chemical formulas and acronyms for the PFASs can be found on Table 8.1 and the chemical
structures of the compounds are included in Table 8.2. The protonation states of the PFASs and
L-carnitine under physiological conditions (pH 7, 300K and 1 atm) were determined using the
Protonate3D module and the structures were minimized in MOE with the AMBER10: Extended
Hückel Theory (EHT) force field, which uses Amber ff10 for macromolecules and Extended
Hückel Theory for the ligands.52–54 PFASs’ and L-carnitine binding modes to the dimer pocket
and LBP were determined by docking to the binding sites using MOE.49 During the generation of
L-carnitine binding poses to the LBP, hydrogen bond to the Tyr 473 was implemented as a query
for a pharmacophore approach, which is associated with PPARγ activity.
    The London ΔG scoring function was used to evaluate 100 initial ligand placements.55 Then,
these initial 100 placements were further refined to ten poses via the Generalized-Born Volume
                                               191


Integral/Weighted Surface area scoring function (GBVI/WSA) ∆G with induced fit protein
settings. The structurally distinct refined poses with the highest (GBVI/WSA) ∆G scores were
selected for further studies.
8.2.2 Simulation Protocol
    The selected complex structures were minimized using molecular mechanics (MM) with the
AMBER10:EHT forcefield in MOE.52–54 The topologies and the parameters for the minimized
structures were created using the Leap module of Amber Tools56 by using General Amber Force
Field (GAFF), AMBER ff14sb force fields.57 The AM1-BCC charge scheme58 was used to
calculate partial charges of the ligand atoms, and these partial charges were fit to GAFF by using
the Antechamber56 suite to generate ligand parameters. The protein-ligand complex structures
were placed in a 14 Å cube beyond the solute box, neutralized and ionized with 100mM NaCl
ions using parameters from Joung and Cheatham in order to replicate a biological ionic
environment.59
    In the minimization protocol, a series of harmonic potentials (500.0, 200.0, 20.0, 10.0, 5.0,
0.0 kcal mol-1) were used, which restrain the protein structure, and allow water molecules, ions
and the ligand to relax. Then, the systems were heated from 100 K to 300 K in 30 picosecond
MD simulations. After heating, 30 ns, MD simulations were performed to ensure the
convergence of the system at 300 K and 1 atm pressure (see example RMSD plots Figures 8.16-
8.19). During all simulations, the pressure and temperature were controlled by isotropic position
scaling and Langevin dynamics, respectively. Furthermore, the SHAKE algorithm60 was used to
constrain hydrogen bonds which allowed the use of a 2-femtosecond time step. Non-bonded
interactions were truncated to 10 Å, while the particle-mesh Ewald (PME) method was used to
                                                192


efficiently approximate long-range electrostatic interactions. The minimization protocol and MD
simulations were performed with Amber.56
8.2.3 Binding Energy Calculations
    The binding free energies of the ligand-protein complexes were calculated using both
Molecular Mechanics Poisson–Boltzmann Surface Area (MM-PBSA) and Molecular Mechanics
General Born Surface Area (MM-GBSA) with a modified General Born solvation model61
implemented in the Amber PBSA-solver.62 The default internal and external dielectric constants
were used (1.0 and 80.0, respectively). The solvent accessible surface area (SASA) was
determined with the default Linear Combinations of Pairwise Overlaps (LCPO) method using
modified Bondi atomic radii. Due to the high computational cost of the methodology, initial 500
frames of the simulation were used for the MM-GBSA and MM-PBSA calculations. As shown
in Figures 8.16-8.19, the overall protein RMSD has reached stability by this point, so longer
simulations are not necessary. A prior study has demonstrated, that choice of different/longer
time frames will have little impact on the binding energy predictions.63 The solute entropies were
not considered, because the primary focus of this effort was on the relative binding energies of
the ligands on PPARγ. The binding contributions of the residues were calculated by per-residue
decomposition56 and the energy contribution for each acidic and basic residues were averaged
from all of the poses tested. The residue decomposition was performed using CPPTRAJ from
Amber was used and the full length of the simulation was considered.56,64 This step is important
to understand specific interactions, selectivity and recognition in PPARγ.
                                                 193


8.2.4 Hydrogen Bond Analysis
    Hydrogen bond lifetime analyses were performed via CPPTRAJ for every ligand tested.64
The ligand-PPARγ complex with the strongest MM-PBSA relative binding energy was selected
for analysis.
8.3 Results and Discussion
8.3.1 Binding pockets on PPARγ
    The two potential binding sites with the highest PLB scores, referred to here as the dimer
pocket and the Ligand Binding Pocket (LBP), were investigated and are shown in Figure 8.1.
The dimer pocket, not previously studied, has the highest PLB score in comparison to other
pockets. It is located between the two PPARγ dimer structures and is ~1900 Å3 in size. This is in
contrast to the LBP, which is ~ 1300 Å3 in size. The LBP is known to bind to a variety of
ligands (i.e. medium chain fatty acids, thiazolidinediones, phenyl acetic acids and phenyl
propanoic acids).65–67 In this study, both the dimer pocket and the LBP were considered as
potential binding sites for the PFASs (Table 8.1) and L-carnitine.
                                               194


                                                                    PDB ID: 3ADV
                                                                                                    E
                                                                                               NC
                                                                                          RA
                                                                                    E   NT
                                                                                P
                                                                             LB
                                               Dimer
                                                                   Ligand Binding
                                               Pocket
                                                                       Pocket
                                                 DIMER POCKET
                                                   ENTRANCE
   Figure 8. 1 Binding pockets detected on the PPARγ dimer structure (PDB ID: 3ADV) using
MOE’s Site Finder. Two potential binding sites are identified and their entrances are shown. The
surface and area of the binding sites are depicted. The red spheres indicate a hydrophilic, while
silver depicts hydrophobic surfaces.
8.3.2 Binding Poses of PFASs
   To determine how PFASs orient within the potential binding sites, molecular docking was
used. The ligand binding to PPARγ is a complex process. The PPARγ receptor contains flexible
binding cavities and can host a variety of structurally distinct ligands.68 Due to the complexity of
binding, induced-fit docking is used during the pose generation. Induced-fit docking accounts for
the movements in the protein structure upon ligand binding and multiple binding possess
generated during this step are further evaluated through MD and binding free energy
calculations. The binding poses with highest affinity are evaluated through the residue
decomposition schemes and hydrogen bond analysis. The highest affinity binding poses of the
                                                195


ligands into the LBP and the dimer pocket are shown in Figures 8.2 and 8.9, respectively. PFASs
which have more than six, and less than 14 per-fluorinated carbon orient their functional groups
towards Tyr 473, His 449 and His 323, which have previously been proposed as important
residues for PPARγ activity. 17
8.3.3 Binding Free Energy Calculations (MM-GBSA/MM-PBSA) and Correlation Plots
    The binding modes of PFASs and L-carnitine to the LBP and dimer pocket were studied
using MM-GBSA and MM-PBSA and the resulting binding energies are depicted in Figures 8.3
and 8.10, respectively. The binding energies were determined by averaging the results for
different PPARγ binding poses for each compound. In comparing the experimental IC50 values
by Zhang et. al. (see, Ref 17) to our predicted PFASs to LBP binding energies, better correlation
was obtained using MM-PBSA rather than MM-GBSA.
    The binding energy values correlate directly with the carbon chain length; however, the
effects of the carbon chain length differ for the dimer pocket and the LBP. On average, the
binding energies for the dimer pocket were lower than for the LBP. Et-PFOSA-AcOH and Me-
PFOSA-AcOH showed high affinity towards the dimer pocket. Their chain lengths in addition to
their sulfonic and carboxylic functional groups enabled very strong interactions (~25 kcal mol-1).
L-Carnitine also showed strong binding to the dimer pocket and strong residue interactions (see
Section 8.3.4).
    The PFASs showed stronger binding to the LBP than to the dimer pocket while L-carnitine
showed similar binding to both pockets according to MM-PBSA. This indicates that PFASs are
prone to bind more strongly to the LBP, although the dimer pocket can still have a role on the
accumulation of PFASs. Ligand binding to LBP is important for the activity of PPARγ (see, e.g.,
Ref. 17). In order to assess how the calculated binding energies for LBP correlate to the PPARγ
                                                196


activity, IC50 values of PFDA, PFNA, PFHxS, PFOA, PFOS, PFHxDA, PFOcDa, PFTeDA, and
PFDoA determined by Zhang et. al. are used for comparison, as shown in Figure 8.4. The
binding energies of PFOcDA and PFHxS were calculated only for the LBP to compare with
                                                         17
respective experimental IC50 values by Zhang et. al.        The predicted binding energies of L-
carnitine show that it can compete to replace PFASs from both binding sites.
    On average, the affinity of PFASs to LBP increased with the size of the carbon chain length.
There is a rise in binding energy from PFBA to PFOcDA, which is consistent with the increasing
size of the carbon chain length. The LBP is approximately three times larger than other nuclear
receptors’ ligand pockets, which allows for compounds as large as PFOcDA to bind strongly.65
PFASs with sulfonic acid groups (PFSAs) showed higher affinity to the LBP in comparison to
the carboxylic acids, fluoro telomer alcohols (FTOHs), and fluoro telomer carboxylic acids
(FTCAs), with the same number of per-fluorinated carbons. The PFASs that have a 6-8 per-
fluorinated carbons along with both sulfonic acid and carboxylic acid groups (Et-PFOSA-AcOH
and Me-PFOSA-AcOH) showed strong binding to LBP and to the dimer pocket.
    In recent work, MM-GBSA and MM-PBSA binding energy predictions were evaluated for
PFASs and the hPXR protein.69 In this prior study, both MM-PBSA and MM-GBSA correlate
well with the experimental EC50, though the MM-GBSA correlation was slightly better.69
However large PFAS molecules such as PFTeDA, PFHxDA and PFOcDA were not studied for
the hPXR receptor and for these larger molecules, MM-GBSA and MM-PBSA differ. As shown
previously, the utility of MM-GBSA and MM-PBSA can vary with respect to the studied
system.70 Factors such as hydrophobicity, lipophilicity, and electrostatics of the ligand and
choice of binding site, all play an important role on the performance of the theoretical methods,
directly influencing computed predictions. For the large PFASs (PFTeDA, PFHxDA and
                                                197


PFOcDA), the tail portion of the compound is more solvent exposed and MM-PBSA provides a
more rigorous treatment of these solvent effects, thus, MM-PBSA results in better correlation
with experimental IC50 values. For this reason, only the MM-PBSA correlation plot (Figure 8.4)
has been included. MM-GBSA correlation is shown in Figure 8.11. The r2 between calculated
binding energies and experimental IC50 values is 0.6, which indicates that the calculated binding
energies for LBP correlate with the activity data, although some variance is observed. This
variance is associated with both experimental and calculated standard deviations. Another
element that contributes to lower correlation is the fact that experimental IC50 values relate to the
structure activity data, which is not the case for MM-GBSA or MM-PBSA. For example, for 6:2
FTOH, or 8:2 FTOH, Zhang et. al. does not detect any activity experimentally, however, in the
current study, these species do bind, though they do not contribute to the receptor’s activity.
PFHpA is an outlier and has not been included in Figure 8.4, due to its large IC50 value and large
experimental uncertainty for PPARγ activation (192.4 ± 17.2).
                                                 198


    Figure 8. 2 Binding poses of PFASs and L-carnitine on PPARγ. The binding modes that have
the highest binding affinity determined from MM-PBSA are shown. Residues depicted belong to
Chain A.
                                              199


Figure 8. 2 (cont’d)
                     200


    Figure 8. 3 Average binding energies of PFASs and L-carnitine calculated with MM-GBSA
and MM-PBSA for the LBP. PFASs are divided into subgroups: perfluoroalkyl carboxylic acids
(PFCAs), followed by perfluoroalkyl sulfonic acids (PFSAs), fluorotelomer alcohols (FTOHs),
fluorotelomer carboxylic acids (FTCAs), fluorotelomer sulfonic acids (FTSAs) and then
alternatives. Each subgroup was listed from shortest chain length to longest (Tables 8.1 and 8.2
for acronyms and structures).
                                              201


    Figure 8. 4 Average calculated binding energies of PFASs with MM-PBSA in comparison
with IC50 values determined experimentally by Zhang et. al. On the y-axis, the average
calculated binding energies are plotted, and along the x-axis, the experimental IC50 values are
provided. Error bars are depicted in black (MM-PBSA) and red (experimental).
8.3.4 Residue decomposition analysis
8.3.4.1 Binding contribution from nearby residues to PFASs and L-carnitine
    To evaluate the contribution of nearby residues to the Gibbs free energy of binding, a space
of 5-6 Å around PFASs and L-carnitine was selected. The binding energy contribution within
this space was determined via a per-residue decomposition, which accounts for electrostatic and
van der Waals contributions to the binding. The average residue contributions for PFASs (red)
and L-carnitine (green) were determined from the highest affinity poses for the LBP and dimer
pocket, and are compared in Figures 8.5 and 8.12, respectively. At pH 7, L-carnitine is neutral,
                                              202


but it has two charged groups. One side of the molecule is positively charged (N+C3H9) and the
other side has a deprotonated carboxylic group (COO-). It also has an OH group which can serve
as a hydrogen donor (Section 8.3.5). As discussed in Section 8.3.3, L-carnitine shows similar
binding energies to the dimer pocket and LBP, with average binding energies of -19.0 kcal mol-1
from MMPBSA (Tables 8.3 and 8.4).
    For the dimer pocket, the acidic residues such as Glu 324, Asp 396, Glu 407 and Asp 441
repel PFASs derivatives very strongly, as demonstrated by the average binding contributions of ~
30 kcal mol-1 (Figure 8.12). For L-carnitine, the acidic residues contribute positively, or
negatively to the overall energy depending on their orientation towards the NH3+ and COO-
groups in the molecule. For example, in the dimer pocket, L-carnitine is repelled by Glu 324 (15
kcal mol-1), whereas Glu 407 has a negative contribution to the binding energy (-15 kcal mol-1).
The interaction energy of L-carnitine with basic residues, especially arginines and lysines is
significant, but not as strong as for PFASs.
    Figure 8.5 shows the interaction energy of PFASs’ with close residues within the LBP. As
shown, Arg 288 and Lys 367 have the strongest contributions to the binding, whereas Glu 295
and Glu 343 repel PFASs from binding to the LBP. In contrast, L-carnitine is not repelled by Glu
295 and Glu 343, and additionally showed strong interaction energy with Lys 367. Tyr 473
contributes slightly to the binding of PFASs and L-carnitine to the LBP, due to the hydrogen
bonding observed with the long carbon chain molecules. (Zhang et. al. proposed hydrogen
bonding to Tyr 473 as key to the PPARγ activity.17 The hydrogen bonding interaction is
discussed in Section 8.3.5). L-Carnitine has a -6.6 kcal mol-1 interaction energy to Tyr 473,
compared to a slightly lower value of average PFASs. PFASs that are shorter in length such as
PFBA and PFPA did not form a hydrogen bond with Tyr 473 (Figure 8.8).
                                              203


    As the importance of His 449 and His 323 PPARγ activity has been reported41,65, the role of
these residues is examined. His 449 has an interaction energy of ~ -5 kcal mol-1, with the PFASs
and L-carnitine. For His 323 the calculated interaction energy was -5.3 kcal mol-1 for L-carnitine,
but positive for PFASs.
    Figure 8. 5 Binding contribution of each nearby residue for PFASs and L-carnitine (LBP).
For PFASs, highest affinity poses are averaged and for L-carnitine the highest affinity pose is
used.
8.3.4.2 Binding energy contribution from acidic and basic residues to PFASs and L-
carnitine
    A residue decomposition of PPARg in terms of long-range electrostatic interaction was done.
To date, there is no such study done for PPAR receptors. Here, we consider two questions: How
                                               204


are ligands affected by long range interactions? How is the LBP affected by residues on the other
side of the protein?
    To investigate these questions, basic residues (arginines, lysines, histidines) and acidic
(glutamate, aspartate) residues within the PPARγ dimer were studied from the A and B chains.
All ligand poses were considered for the dimer pocket and LBP. Average interaction energies for
all of the PFASs investigated were compared with the L-carnitine interaction energy. In Figures
8.6 and 8.7, the average interaction energies for LBP are shown for PFASs and L-carnitine,
respectively.
    The average interaction energies for the dimer pocket can be found on Figures 8.13 and 8.14.
As the dimer pocket is situated between the two monomers (Figure 8.1), it is able to interact with
both chains of the protein (almost symmetrically, when comparing the energies of Chain A and
Chain B). For basic residues, the strongest interactions are observed with Arg 397, Arg 443, Lys
373, Lys 434 and Lys 438, and for acidic residues the strongest repulsion is observed with Asp
396, Glu 324, Glu 407 and Asp 441 (> ±25 kcal mol-1). The short-range electrostatic interactions
within the chains of the protein, can stabilize the ligand, or repel it. When comparing PFASs
with L-carnitine, the average interaction energies for the PFASs with Asp 396, Glu 324, Glu 407
and Asp 441 reveal a different trend than for L-carnitine. PFASs are strongly repelled by these
residues, while L-Carnitine is only slightly repelled (~ 5 kcal mol-1) by Glu 324 but attracted by
the other ones.
    Considering the LBP, the strongest interactions correspond to residues in Chain A (Arg 288,
Lys 367, Glu 291, Glu 295 and Glu 343), which are situated mainly in the LBP (Figures 8.6 and
8.7). There are large contributions from the residues on the other chain, that range from -5 to -15
kcal mol-1 for the basic residues and 5 to 15 kcal mol-1 for the acidic residues.
                                                205


       For L-carnitine, considering the acidic residues’ interaction energy, there is a different
trend compared to PFASs (Figure 8.7). The acidic residue energies vary from positive to
negative, which shows that not all are repulsive towards L-carnitine. Regarding basic residues,
Lys 367 is the major contributor towards its affinity in the pocket and contributes strongly to the
LBP binding.
                                               206


   Figure 8. 6 Binding contributions of the acidic and basic residues for PFASs (LBP) in Chain
A and Chain B.
                                              207


   Figure 8. 7 Binding contributions of the acidic and basic residues for L-carnitine (LBP) in
Chain A and Chain B.
                                             208


8.3.5 Hydrogen bonding
    Figure 8. 8 Hydrogen bond lifetimes for the LBP. The y-axis depicts the chain and residue
number from the receptor, and in brackets, the atom from the ligand performing the hydrogen
bonding is shown. Acceptors are portrayed by “(O), (F), (N)”, and donors by “(H)”. In the x-axis
the different PFASs and L-carnitine are shown.
    A detailed analysis of the propensity of the dimer pocket and LBP to hydrogen bond is
fundamental for understanding the intermolecular interactions between ligands and residues. By
using MD trajectories, it is possible to understand fundamental binding properties, and the
activity of the receptor/protein. Herein, some of the ligands; 6:2 FTOH, 8:2 FTOH, L-carnitine,
Et-PFOSA and Met-PFOSA can be hydrogen donors or acceptors (Figures 8.8 and 8.15).
    In Figure 8.15, the hydrogen bonding percentage is shown for the dimer pocket. Lys 438,
Arg 443 and Arg 397 have the highest percentage of hydrogen bonding. These residues were
noted earlier (Section 3.4.1) as being in close proximity to the ligands in the binding cavity. L-
                                                209


Carnitine is stabilized in this pocket by three hydrogen bonds with Gln 437, Arg 443 and Ser
394. L-Carnitine’s positive and negative charged groups allow for different bonding with
residues in the dimer pocket. Et-PFOSA-AcOH and Met-PFOSA-AcOH have very strong
affinity to the dimer pocket and form strong hydrogen bonding with Arg 443. The sulfonic and
carboxylic functional groups interact strongly with nearby residues. In addition, Et-PFOSA-
AcOH and Met-PFOSA-AcOH are also stabilized by the interaction with Asp 396 and Gln 444.
In the dimer pocket, hydrogen bonding from fluorines can occur, though it is minimal.
    In Figure 8.8, the LBP hydrogen bonding is described for PFASs and L-carnitine. As
mentioned earlier, hydrogen bonding to Tyr 473 is directly associated to the activity of the
receptor. PFASs with 7-12 perfluorinated carbons such as PFHpA, PFOA, PFNA, PFDA,
PFDoA, PFOS, Et-PFOSA-AcOH, Met-PFOSA-AcOH show high affinity to this residue. PFOS,
Et-PFOSA-AcOH, Met-PFOSA-AcOH and PFDS have a sulfonic group, which enables them to
undergo strong hydrogen bonding, occurring for nearly the entire simulation. From the literature,
6:2 FTOH, 8:2 FTOH, 6:2 FTCA, PFBS and PFBA show no activity against PPARg, which is
corroborated in Figure 8.8, there is no hydrogen bonding to Tyr 473.17 Even though PFTeDA,
PFHxDA, and PFOcDA, show activity experimentally, the MD simulations do not show
hydrogen bond formation with Tyr 473. There are examples of PPARg agonists that do not form
H-bonds with Tyr 473 but are still able to activate a receptor through immobilization of the H12
helix.17,43 Due to the size of these larger PFASs, the binding poses obtained for them were more
distant from Tyr 473 and more solvent exposed and thus the hydrogen bonding with Tyr 473 is
not demonstrated. Also, the scope of this study was to compare relative binding energies of
various PFASs and understand the molecular interactions behind the PPARg recognition. For
this purpose, 30ns MD simulations were performed, allowing more PFAS molecules and poses
                                                210


to be considered. PFASs alternatives such as ADONA, GenX, 6:2 FTOH, 6:2 FTCA, Et-PFOSA-
AcOH and Met-PFOSA-AcOH have large binding energies, but not all of them showed
hydrogen bonding with Tyr 473 during MD simulations. Short-chain PFASs exhibit binding
towards PPARg, yet they show limited hydrogen bonding with Tyr 473. PFASs that have
between six and twelve carbons form strong hydrogen bonds with Tyr 473 and alter PPARγ’s
activi8ty. L-Carnitine forms strong hydrogen bonds as an acceptor with Tyr 327, Lys 367, His
449 and Tyr 473 (Figure 8.8). As a donor, it also interacts with Ser 289. ADONA is a proposed
alternative to PFASs and also forms a hydrogen bond with Tyr 473, which shows its ability to
activate PPARγ. Tyr 327 and Lys 367 form a hydrogen bond with a range of PFASs.
8.4 Conclusions
    The interactions of twenty-seven PFAS molecules and one of its natural ligands, L-carnitine
with two potential binding pockets on the PPARγ dimer were investigated. Possible poses for the
PFASs and L-carnitine, their binding energies, and important residue interactions, including
hydrogen bond analysis were evaluated. The role of the dimer pocket is discussed and shown to
be important for binding PFASs and L-carnitine. The PFASs’ binding energies predicted for the
dimer pocket show evidence for potential bioaccumulation of PFASs at this site. Significant
correlation is observed between the predicted binding energies for the LBP and experimental
IC50 values of PFASs in PPARγ, which allowed the activity of the remaining PFASs to be
estimated.
    Shorter-chain PFASs, such as PFBA, PFPA, 6:2 FTCA, Met-PFOSA-AcOH and Et-PFOSA-
AcOH bind strongly to the dimer pocket, which indicates their potential bioaccumulation at this
site. The PFASs in this study that have between six and twelve carbons form strong hydrogen
bonds with Tyr 473 and alter the activity of PPARγ. PFAS alternatives such as ADONA, GENX,
                                               211


6:2 FTOH, 6:2 FTCA, Et-PFOSA-AcOH and Met-PFOSA-AcOH also have large binding
energies, but not all of them showed hydrogen bonding with Tyr 473 during MD simulations,
which is deemed essential for PPARγ activation. L-Carnitine also showed hydrogen bonding
with Tyr 473.
    The affinity of L-carnitine to LBP determined by MMPBSA is -19.0 kcal mol-1, which shows
similar binding in comparison to most of the PFASs. In addition, acid/base, and short distance
residue interactions contribute more towards the L-carnitine binding affinity than towards the
studied PFASs. For the dimer pocket the binding affinity of L-carnitine is one of the largest
binding energies. The high affinity of L-carnitine to both pockets, demonstrates that it could
viably be used to compete/replace PFASs from the binding sites. The important interactions
detailed here can provide useful insight about how these species may interact with other proteins,
and about traits that may be important in building an inhibitor that can help to alleviate the
effects of these “forever chemicals” on PPARγ.
                                               212


APPENDIX
   213


    Table 8. 1 The PFASs used in this study are listed and are categorized based on their structural families: perfluoroalkyl carboxylic
acids (PFCAs), perfluorosulfonic acids (PFSAs), fluoro telomer alcohols (FTOH), fluoro telomer sulfonic acids (FTSA), fluoro
telomer carboxylic acids (FTCA).
                                         Perfluorinate
         Type             Acronym                                           Name                           Chemical Formula
                                          d Carbon
        PFCA                 PFBA              3                   perfluorobutanoic acid                   CF3-(CF2)2-COOH
        PFCA                 PFPA              4                   perfluoropentanoic acid                  CF3-(CF2)3-COOH
        PFCA                PFHxA              5                   perfluorohexanoic acid                   CF3-(CF2)4-COOH
        PFCA                PFHpA              6                   perfluoroheptanoic acid                  CF3-(CF2)5-COOH
        PFCA                PFOA               7                    perfluorooctanoic acid                  CF3-(CF2)6-COOH
        PFCA                PFNA               8                   perfluorononanoic acid                   CF3-(CF2)7-COOH
        PFCA                PFDA               9                   perfluorodecanoic acid                   CF3-(CF2)8-COOH
        PFCA              PFUnDA              10                  perfluoroundecanoic acid                  CF3-(CF2)9-COOH
        PFCA                PFDoA             11                  perfluorododecanoic acid                 CF3-(CF2)10-COOH
        PFCA               PFTeDA             13                 perfluorotetradecanoic acid               CF3-(CF2)12-COOH
        PFCA              PFHxDA              15                 perfluorohexadecanoic acid                CF3-(CF2)14-COOH
        PFCA              PFOcDA              17                 perfluorooctadecanoic acid                CF3-(CF2)16-COOH
        PFSA                 PFBS              4                perfluorobutane sulfonic acid               CF3-(CF2)3-SO3H
        PFSA                PFHxS              6                 perfluorohexa sulfonic acid                CF3-(CF2)5-SO3H
        PFSA                PFHpS              7               perfluoroheptane sulfonic acid               CF3-(CF2)6-SO3H
        PFSA                 PFOS              8                perfluorooctane sulfonic acid               CF3-(CF2)7-SO3H
        PFSA                 PFDS             10               perfluorodecane sulfonic acid                CF3-(CF2)9-SO3H
        FTOH              6:2 FTOH             6                  6:2 fluorotelomer alcohol              CF3-(CF2)5-(CH2)2-OH
        FTOH              8:2 FTOH             8                  8:2 fluorotelomer alcohol              CF3-(CF2)7-(CH2)2-OH
        FTCA              5:3 FTCA             5            5:3 Fluorotelomer Carboxylic Acid           CF3-(CF2)4-(CH2)2-COOH
        FTCA              6:2 FTCA             6            6:2 Fluorotelomer Carboxylic Acid            CF3-(CF2)5-CH2-COOH
        FTSA              6:2 FTSA             6              6:2 Fluorotelomer Sulfonic Acid           CF3-(CF2)5-(CH2)2- SO3H
                                                                  214


Table 8. 1 (cont’d)
                                                2,3,3,3-tetrafluoro-2-
   Alternative         GenX   5                                             CF3-(CF2)2-O-(CF3)CF-COOH
                                         heptafluoropropoxy Propanoic Acid
                                                                             CF3-O-(CF2)3-O-CHF-CF2-
   Alternative       ADONA    6         4,8-dioxa-3H-perfluorononanoic acid
                                                                                       COOH
         -            PFOSA   8             Perfluorooctane Sulfanamido          CF3-(CF2)7-SO2NH2
                    Et-PFOSA-          2-(N-Ethylperfluorooctanesulfoamido) CF3-(CF2)7-SO2N(C2H5)-CH2-
         -                    8
                      AcOH                           Acetic Acid                       COOH
                                                        2-(N-
                    Me-PFOSA-                                               CF3-(CF2)7-SO2N(CH3)-CH2-
         -                    6          Methylperfluorooctanesulfoamido)
                      AcOH                                                             COOH
                                                     acetic acid
                                    PFSA = CF3-(CF2)n-SO3H
                                   PFCA = CF3-(CF2)n-COOH
                                 FTOH = CF3-(CF2)n-(CH2)m-OH
                                FTSA = CF3-(CF2)n-(CH2)m- SO3H
                                FTCA = CF3-(CF2)n-(CH2)m-COOH
                                              215


Table 8. 2 PFASs chemical structures used in this study.
             Structure                           Name            Structure           Name
                                                                             2,3,3,3-tetrafluoro-2-
                                          Perfluorobutanoic
                                                                           heptafluoropropoxypro
                                          Acid (PFBA, CAS
                                                                              panoic acid (GenX,
                                             No. 375-22-4)
                                                                            CAS No. 62037-80-3)
                                          Perfluoropentanoi
                                             c Acid (PFPA,                    (ADONA, CAS No.
                                          CAS No. 2706-90-                       958445-448)
                                                   3)
                                          Perfluorohexanoic
                                                                                Perfluorooctane
                                             Acid (PFHxA,
                                                                            Sulfanamido (PFOSA,
                                           CAS No. 307-24-
                                                                              CAS No. 754-91-6)
                                                   4)
                                          Perfluoroheptanoi
                                                                             Perfluoroundecanoic
                                            c Acid (PFHpA,
                                                                            Acid (PFUnDA CAS
                                           CAS No. 375-85-
                                                                                No. 2058-94-8)
                                                   9)
                                           Perfluorooctanoic               Perfluoroheptanesulfoni
                                          Acid (PFOA, CAS                    cAcid (PFHpS, CAS
                                             No. 335-67-1)                       No. 375-92-8)
                                                                                      2-N-
                                          Perfluorononanoic                Ethylperfluoroocatensul
                                          Acid (PFNA, CAS                    fanomido-Aceticacid
                                             No. 375-95-1)                    (Et-PFOSA-AcOH,
                                                                             CAS No. 2991-50-6)
                                                                                       6:2
                                          Perfluorodecanoic
                                                                           FluorotelomerSulfonic
                                          Acid (PFDA, CAS
                                                                            Acid (6:2 FTSA, CAS
                                             No. 335-76-2)
                                                                               No. 27619-97-2)
                                                             216


Table 8. 2 (cont’d)
                                           2NMethylperfluoroocta
                    Perfluorododecano
                                               nesulfonamido
                     ic Acid (PFDoA,
                                               Aceticacid (Me-
                     CAS No. 307-55-
                                            PFOSA-AcOH, CAS
                             1)
                                               No. 2355-31-9)
                    Perfluorooctanesul         2H,2H,3H,3H-
                    fonic Acid (PFOS,      Perfluorooctanoic Acid
                    CAS No. 1763-23-        (5:3 FTCA, CAS No.
                             1)                 914637-49-3)
                    Perfluorobutanesu
                                           Perfluorodecanesulfoni
                        lfonic Acid
                                           cAcid (PFDS, CAS No.
                     (PFBS, CAS No.
                                                  335-77-3)
                         375-73-5)
                    6:2 Fluorotelomer                8:2
                        Alcohol (6:2        FluorotelomerAlcohol
                     FTOH, CAS No.          (8:2 FTOH, CAS No.
                         647-42-7)                678-39-7)
                    6:2 Fluorotelomer
                     Carboxylic Acid
                     (6:2 FTCA, CAS
                       No. 647-42-7)
                                       217


   Table 8. 3 Binding energies for the dimer pocket and standard deviations in kcal mol-1 for all
PFASs and L-carnitine.
                       Average                                 Average
Compound name      MMGBSA binding        STD MMGBSA        MMPBSA binding      STD MMPBSA
                        energy                                  energy
      PFBA               -17.8                3.2                -15.2              4.0
       PFPA              -12.8                3.9                -11.3              4.7
      PFHxA               -7.4                3.5                 -9.3              4.4
      PFHpA               -8.9                3.4                -14.6              4.1
      PFOA                -6.6                3.6                -13.5              4.2
      PFNA                -8.2                4.0                -16.9              4.3
      PFDA                -1.3                4.2                -12.8              3.7
    PFUnDA                -8.2                3.7                -17.8              4.3
      PFDoA                3.5                4.4                -12.4              4.7
     PFTeDA               -2.5                4.2                -19.7              4.7
    PFHxDA                -0.8                4.4                -21.8              4.6
       PFBS              -18.1                3.7                -15.7              3.7
      PFHpS               -9.1                3.3                -14.0              3.9
       PFOS               -9.9                4.0                -16.8              4.3
       PFDS               -8.8                4.0                -16.9              4.0
    6:2 FTOH              -9.2                3.6                -17.5              3.7
    8:2 FTOH              -2.2                4.2                -10.9              4.1
    5:3 FTCA             -18.0                5.0                -16.7              5.1
    6:2 FTCA             -13.3                4.3                -19.3              5.7
    6:2 FTSA             -27.8                5.3                -18.2              5.1
       GenX              -18.8                3.6                -19.6              4.3
     ADONA               -14.2                4.2                -11.8              5.4
     PFOSA               -16.7                4.7                -15.0              5.2
Et-PFOSA-AcOH            -31.1                5.5                -26.9              5.2
   Me-PFOSA-
                         -25.3                5.0                -25.9              5.4
      AcOH
   L-carnitine           -19.0                5.4                -19.0              5.6
                                              218


   Table 8. 4 Binding energies for the ligand binding pocket (LBP) and standard deviations in
kcal mol-1 for all PFASs and L-carnitine.
                        Average                               Average
Compound name       MMGBSA binding       STD MMGBSA      MMPBSA binding     STD MMPBSA
                         energy                                energy
       PFBA               -17.7               2.5               -20.9             4.7
       PFPA               -16.9               2.9               -18.4             4.0
      PFHxA               -19.5               2.7               -21.6             4.4
      PFHpA               -18.1               2.6               -21.7             4.2
       PFOA               -17.1               3.0               -23.4             3.8
       PFNA               -22.1               3.2               -28.7             3.9
       PFDA               -23.8               3.8               -31.0             4.8
    PFUnDA                -19.4               3.2               -28.3             3.7
      PFDoA               -21.6               3.9               -27.9             4.2
     PFTeDA               -14.0               3.4               -29.2             3.5
    PFHxDA                -16.1               4.1               -35.5             4.1
     PFOcDA               -15.3               4.1               -36.9             3.9
       PFBS               -17.7               2.8               -17.6             4.2
      PFHxS               -22.4               3.2               -21.6             4.4
      PFHpS               -25.7               3.4               -26.6             4.2
       PFOS               -24.9               3.9               -28.7             4.2
       PFDS              -24.32               3.8               -29.7             3.8
    6:2 FTOH              -14.1               2.7               -20.1             3.1
    8:2 FTOH              -14.4               3.2               -23.3             2.6
    5:3 FTCA              -17.4               3.2               -19.1             4.3
    6:2 FTCA              -23.0               3.3               -27.0             4.5
    6:2 FTSA              -21.9               3.7               -21.7             4.5
       GenX               -17.2               2.7               -21.1             4.2
     ADONA                -23.2               2.6               -24.7             3.9
      PFOSA               -29.6               4.6               -27.7             4.9
Et-PFOSA-AcOH             -34.7               3.6               -30.8             4.2
   Me-PFOSA-
                          -27.3               3.6               -27.6             4.2
      AcOH
   L-carnitine            -31.8               3.3               -19.0             4.1
                                             219


   Figure 8. 9 Binding poses of PFASs and L-carnitine on the PPARγ dimer pocket. The
binding modes that have the highest binding affinity determined from MM-PBSA are shown.
                                              220


Figure 8. 9 (cont’d)
                     221


     Figure 8. 10 Average binding energies of PFASs and L-carnitine calculated with MM-GBSA
and MM-PBSA for the dimer pocket.
     Figure 8. 11 MM-GBSA in comparison with IC50 values measured experimentally by Zhang
et. al. for the LBP.17 On the y-axis, average calculated binding energies are plotted, and along the
x-axis, the experimental IC50 values are provided. Error bars are depicted in black (MM-GBSA)
and red (experimental).
                                                 222


   Figure 8. 12 Binding contribution of each nearby residue for PFASs and L-carnitine (dimer
pocket).
                                            223


    Figure 8. 13 Binding contributions of the acidic and basic residues for PFASs (dimer pocket)
in Chain A and Chain B.
                                               224


   Figure 8. 14 Binding contributions of the acidic and basic residues for L-carnitine (dimer
pocket) in Chain A and Chain B.
                                            225


    Figure 8. 15 Hydrogen bond lifetimes for the dimer pocket. The y-axis depicts the chain and
residue number from the receptor, and in brackets, the atom from the ligand performing the
hydrogen bonding is shown. Acceptors are portrayed by “(O), (F), (N)”, and donors by “(H)”. In
the x-axis the different PFASs and L-Carnitine are shown
    Figure 8. 16 PFOS RMSD plots for the dimer pocket.
                                              226


Figure 8. 17 L-Carnitine RMSD plots for the dimer pocket.
Figure 8. 18 PFOS RMSD plots for the LBP pocket.
                                          227


Figure 8. 19 L-Carnitine RMSD plots for the LBP pocket.
                                          228


REFERENCES
    229


                                        REFERENCES
(1)  Sinclair, G. M.; Long, S. M.; Jones, O. A. H. What Are the Effects of PFAS Exposure at
     Environmentally Relevant Concentrations? Chemosphere 2020, 258, 127340.
     https://doi.org/10.1016/j.chemosphere.2020.127340.
(2)  Paul, A. G.; Jones, K. C.; Sweetman, A. J. A First Global Production, Emission, and
     Environmental Inventory for Perfluorooctane Sulfonate. Environ. Sci. Technol. 2009, 43
     (2), 386–392. https://doi.org/10.1021/es802216n.
(3)  Sajid, M.; Ilyas, M. PTFE-Coated Non-Stick Cookware and Toxicity Concerns: A
     Perspective. Environ. Sci. Pollut. Res. 2017, 24 (30), 23436–23440.
     https://doi.org/10.1007/s11356-017-0095-y.
(4)  Rao, N. S.; Baker, B. E. Textile Finishes and Fluorosurfactants. In Organofluorine
     Chemistry; Banks, R. E., Smart, B. E., Tatlow, J. C., Eds.; Springer US: Boston, MA,
     1994; pp 321–338. https://doi.org/10.1007/978-1-4899-1202-2_15.
(5)  Schaider, L. A.; Balan, S. A.; Blum, A.; Andrews, D. Q.; Strynar, M. J.; Dickinson, M. E.;
     Lunderberg, D. M.; Lang, J. R.; Peaslee, G. F. Fluorinated Compounds in U.S. Fast Food
     Packaging.      Environ.     Sci.     Technol.      Lett.   2017,   4     (3),    105–111.
     https://doi.org/10.1021/acs.estlett.6b00435.
(6)  State of Minnesota. Civil Action No. 27-CV-10-28862, State of Minnesota, et Al. v. 3M
     Company. Expert Report of Philippe Grandjean, MD, DMSc. Prepared on Behalf of
     Plaintiff State of Minnesota; State of Minnesota District Court for the County of
     Hennepin; 2017.
(7)  US       EPA.      EPA       and      3M      announce       phase    out      of    PFOS
     https://yosemite.epa.gov/opa/admpress.nsf/0/33aa946e6cb11f35852568e1005246b4.
(8)  Wang, Z.; Dewitt, J. C.; Higgins, C. P.; Cousins, I. T. A Never-Ending Story of Per- and
     Polyfluoroalkyl Substances (PFASs)? Environ. Sci. Technol. 2017, 51 (5), 2508–2518.
     https://doi.org/10.1021/acs.est.6b04806.
(9)  Post, G. B.; Gleason, J. A.; Cooper, K. R. Key Scientific Issues in Developing Drinking
     Water Guidelines for Perfluoroalkyl Acids: Contaminants of Emerging Concern. PLoS
     Biol. 2017, 15 (12), e2002855. https://doi.org/10.1371/journal.pbio.2002855.
(10) Cordner, A.; De La Rosa, V. Y.; Schaider, L. A.; Rudel, R. A.; Richter, L.; Brown, P.
     Guideline Levels for PFOA and PFOS in Drinking Water: The Role of Scientific
     Uncertainty, Risk Assessment Decisions, and Social Factors. J. Expo. Sci. Environ.
     Epidemiol. 2019, 29 (2), 157–171. https://doi.org/10.1038/s41370-018-0099-9.
(11) Zeng, Z.; Song, B.; Xiao, R.; Zeng, G.; Gong, J.; Chen, M.; Xu, P.; Zhang, P.; Shen, M.;
     Yi, H. Assessing the Human Health Risks of Perfluorooctane Sulfonate by in Vivo and in
     Vitro         Studies.        Environ.         Int.       2019,       126,        598–610.
                                              230


     https://doi.org/10.1016/j.envint.2019.03.002.
(12) Takacs, M. L.; Abbott, B. D. Activation of Mouse and Human Peroxisome Proliferator–
     Activated Receptors (α, β/δ, γ) by Perfluorooctanoic Acid and Perfluorooctane Sulfonate.
     Toxicol. Sci. 2007, 95 (1), 108–117. https://doi.org/10.1093/toxsci/kfl135.
(13) Ikeda, T.; Aiba, K.; Fukuda, K.; Tanaka, M. The Induction of Peroxisome Proliferation in
     Rat Liver by Perfluorinated Fatty Acids, Metabolically Inert Derivatives of Fatty Acids. J.
     Biochem. 1985, 98 (2), 475–482.
(14) Butenhoff, J. L.; Pieterman, E.; Ehresman, D. J.; Gorman, G. S.; Olsen, G. W.; Chang, S.
     C.; Princen, H. M. G. Distribution of Perfluorooctanesulfonate and Perfluorooctanoate
     into Human Plasma Lipoprotein Fractions. Toxicol. Lett. 2012, 210 (3), 360–365.
     https://doi.org/10.1016/j.toxlet.2012.02.013.
(15) MacManus-Spencer, L. A.; Tse, M. L.; Hebert, P. C.; Bischel, H. N.; Luthy, R. G. Binding
     of Perfluorocarboxylates to Serum Albumin: A Comparison of Analytical Methods. Anal.
     Chem. 2010, 82 (3), 974–981. https://doi.org/10.1021/ac902238u.
(16) Zhang, X.; Chen, L.; Fei, X. C.; Ma, Y. S.; Gao, H. W. Binding of PFOS to Serum
     Albumin and DNA: Insight into the Molecular Toxicity of Perfluorochemicals. BMC Mol.
     Biol. 2009, 10 (1), 16. https://doi.org/10.1186/1471-2199-10-16.
(17) Zhang, L.; Ren, X. M.; Wan, B.; Guo, L. H. Structure-Dependent Binding and Activation
     of Perfluorinated Compounds on Human Peroxisome Proliferator-Activated Receptor γ.
     Toxicol.        Appl.         Pharmacol.        2014,       279         (3),     275–283.
     https://doi.org/10.1016/j.taap.2014.06.020.
(18) Behr, A. C.; Plinsch, C.; Braeuning, A.; Buhrke, T. Activation of Human Nuclear
     Receptors by Perfluoroalkylated Substances (PFAS). Toxicol. Vitr. 2020.
     https://doi.org/10.1016/j.tiv.2019.104700.
(19) Pastoor, T. P.; Lee, K. P.; Perri, M. A.; Gillies, P. J. Biochemical and Morphological
     Studies of Ammonium Perfluorooctanoate-Induced Hepatomegaly and Peroxisome
     Proliferation. Exp. Mol. Pathol. 1987, 47 (1), 98–109. https://doi.org/10.1016/0014-
     4800(87)90011-6.
(20) Abdellatif, A.; Preat, V.; Taper, H. S.; Roberfroid, M. The Modulation of Rat Liver
     Carcinogenesis by Perfluorooctanoic Acid, a Peroxisome Proliferator. Toxicol. Appl.
     Pharmacol. 1991, 111 (3), 530–537. https://doi.org/10.1016/0041-008X(91)90257-F.
(21) Ren, X. M.; Qin, W. P.; Cao, L. Y.; Zhang, J.; Yang, Y.; Wan, B.; Guo, L. H. Binding
     Interactions of Perfluoroalkyl Substances with Thyroid Hormone Transport Proteins and
     Potential Toxicological Implications. Toxicology 2016, 366–367, 32–42.
     https://doi.org/10.1016/j.tox.2016.08.011.
(22) Zhang, L.; Ren, X. M.; Guo, L. H. Structure-Based Investigation on the Interaction of
     Perfluorinated Compounds with Human Liver Fatty Acid Binding Protein. Environ. Sci.
                                               231


     Technol. 2013, 47 (19), 11293–11301. https://doi.org/10.1021/es4026722.
(23) Han, X.; Snow, T. A.; Kemper, R. A.; Jepson, G. W. Binding of Perfluorooctanoic Acid to
     Rat and Human Plasma Proteins. Chem. Res. Toxicol. 2003, 16 (6), 775–781.
     https://doi.org/10.1021/tx034005w.
(24) Wang, Y.; Zhang, H.; Kang, Y.; Cao, J. Effects of Perfluorooctane Sulfonate on the
     Conformation and Activity of Bovine Serum Albumin. J. Photochem. Photobiol. B Biol.
     2016, 159, 66–73. https://doi.org/10.1016/j.jphotobiol.2016.03.024.
(25) Beesoon, S.; Martin, J. W. Isomer-Specific Binding Affinity of Perfluorooctanesulfonate
     (PFOS) and Perfluorooctanoate (PFOA) to Serum Proteins. Environ. Sci. Technol. 2015,
     49 (9), 5722–5731. https://doi.org/10.1021/es505399w.
(26) Honda, M.; Muta, A.; Akasaka, T.; Inoue, Y.; Shimasaki, Y.; Kannan, K.; Okino, N.;
     Oshima, Y. Identification of Perfluorooctane Sulfonate Binding Protein in the Plasma of
     Tiger Pufferfish Takifugu Rubripes. Ecotoxicol. Environ. Saf. 2014, 104 (1), 409–413.
     https://doi.org/10.1016/j.ecoenv.2013.11.010.
(27) Chou, H. C.; Wen, L. L.; Chang, C. C.; Lin, C. Y.; Jin, L.; Juan, S. H. L-Carnitine via
     PPARγ- and Sirt1-Dependent Mechanisms Attenuates Epithelial-Mesenchymal Transition
     and Renal Fibrosis Caused by Perfluorooctanesulfonate. Toxicol. Sci. 2017, 160 (2), 217–
     229. https://doi.org/10.1093/toxsci/kfx183.
(28) Wen, L. L.; Lin, C. Y.; Chou, H. C.; Chang, C. C.; Lo, H. Y.; Juan, S. H.
     Perfluorooctanesulfonate Mediates Renal Tubular Cell Apoptosis through PPARgamma
     Inactivation.        PLoS          One        2016,         11        (5),       e0155190.
     https://doi.org/10.1371/journal.pone.0155190.
(29) Liu, W. S.; Lai, Y. T.; Chan, H. L.; Li, S. Y.; Lin, C. C.; Liu, C. K.; Tsou, H. H.; Liu, T.
     Y. Associations between Perfluorinated Chemicals and Serum Biochemical Markers and
     Performance Status in Uremic Patients under Hemodialysis. PLoS One 2018, 13 (7),
     e0200271. https://doi.org/10.1371/journal.pone.0200271.
(30) Fulton, J.; Mazumder, B.; Whitchurch, J. B.; Monteiro, C. J.; Collins, H. M.; Chan, C. M.;
     Clemente, M. P.; Hernandez-Quiles, M.; Stewart, E. A.; Amoaku, W. M.; Moran, P. M.;
     Mongan, N. P.; Persson, J. L.; Ali, S.; Heery, D. M. Heterodimers of Photoreceptor-
     Specific Nuclear Receptor (PNR/NR2E3) and Peroxisome Proliferator-Activated
     Receptor-γ (PPARγ) Are Disrupted by Retinal Disease-Associated Mutations. Cell Death
     Dis. 2017, 8 (3), e2677–e2677. https://doi.org/10.1038/cddis.2017.98.
(31) Todorov, V. T.; Desch, M.; Schmitt-Nilson, N.; Todorova, A.; Kurtz, A. Peroxisome
     Proliferator-Activated Receptor-γ Is Involved in the Control of Renin Gene Expression.
     Hypertension 2007, 50 (5), 939–944. https://doi.org/10.1161/hypertensionaha.107.092817.
(32) Estany, J.; Ros-Freixedes, R.; Tor, M.; Pena, R. N. A Functional Variant in the Stearoyl-
     CoA Desaturase Gene Promoter Enhances Fatty Acid Desaturation in Pork. PLoS One
     2014, 9 (1), e86177. https://doi.org/10.1371/journal.pone.0086177.
                                              232


(33) Okuno, M.; Arimoto, E.; Ikenobu, Y.; Nishihara, T.; Imagawa, M. Dual DNA-Binding
     Specificity of Peroxisome-Proliferator-Activated Receptor γ Controlled by Heterodimer
     Formation with Retinoid X Receptor α. Biochem. J. 2001, 353 (2), 193–198.
     https://doi.org/10.1042/bj3530193.
(34) Nolte, R. T.; Wisely, G. B.; Westin, S.; Cobb, J. E.; Lambert, M. H.; Kurokawa, R.;
     Rosenfeld, M. G.; Willson, T. M.; Glass, C. K.; Milburn, M. V. Ligand Binding and Co-
     Activator Assembly of the Peroxisome Proliferator-Activated Receptor-γ. Nature 1998,
     395 (6698), 137–143. https://doi.org/10.1038/25931.
(35) Waku, T.; Shiraki, T.; Oyama, T.; Maebara, K.; Nakamori, R.; Morikawa, K. The Nuclear
     Receptor PPARγ Individually Responds to Serotonin-and Fatty Acid-Metabolites. EMBO
     J. 2010, 29 (19), 3395–3407. https://doi.org/10.1038/emboj.2010.197.
(36) Salvalaglio, M.; Muscionico, I.; Cavallotti, C. Determination of Energies and Sites of
     Binding of PFOA and PFOS to Human Serum Albumin. J. Phys. Chem. B 2010, 114 (46),
     14860–14874. https://doi.org/10.1021/jp106584b.
(37) Ng, C. A.; Hungerbuehler, K. Exploring the Use of Molecular Docking to Identify
     Bioaccumulative Perfluorinated Alkyl Acids (PFAAs). Environ. Sci. Technol. 2015, 49
     (20), 12306–12314. https://doi.org/10.1021/acs.est.5b03000.
(38) Chen, H.; He, P.; Rao, H.; Wang, F.; Liu, H.; Yao, J. Systematic Investigation of the
     Toxic Mechanism of PFOA and PFOS on Bovine Serum Albumin by Spectroscopic and
     Molecular          Modeling.        Chemosphere         2015,        129,        217–224.
     https://doi.org/10.1016/j.chemosphere.2014.11.040.
(39) Zhang, W.; Xiong, X.; Wang, F.; Ge, Y.; Liu, Y. Studies of the Interaction between
     Ronidazole and Human Serum Albumin by Spectroscopic and Molecular Docking
     Methods. J. Solution Chem. 2013, 42 (6), 1194–1206. https://doi.org/10.1007/s10953-013-
     0027-5.
(40) Cheng, W.; Ng, C. A. Predicting Relative Protein Affinity of Novel Per- and
     Polyfluoroalkyl Substances (PFASs) by An Efficient Molecular Dynamics Approach.
     Environ. Sci. Technol. 2018, 52 (14), 7972–7980. https://doi.org/10.1021/acs.est.8b01268.
(41) Tsukahara, T.; Tsukahara, R.; Yasuda, S.; Makarova, N.; Valentine, W. J.; Allison, P.;
     Yuan, H.; Baker, D. L.; Li, Z.; Bittman, R.; Parrill, A.; Tigyi, G. Different Residues
     Mediate Recognition of 1-O-Oleyl-Lysophosphatidic Acid and Rosiglitazone in the
     Ligand Binding Domain of Peroxisome Proliferator-Activated Receptor. J. Biol. Chem.
     2006, 281 (6), 3398–3407. https://doi.org/10.1074/jbc.M510843200.
(42) Uppenberg, J.; Svensson, C.; Jaki, M.; Bertilsson, G.; Jendeberg, L.; Berkenstam, A.
     Crystal Structure of the Ligand Binding Domain of the Human Nuclear Receptor
     PPARgamma.          J.     Biol.    Chem.      1998,      273     (47),    31108–31112.
     https://doi.org/10.1074/jbc.273.47.31108.
(43) Zoete, V.; Grosdidier, A.; Michelin, O. Peroxisome Proliferator-Activated Receptor
                                              233


     Structures: Ligand Specificity, Molecular Switch and Interactions with Regulators.
     Biochim. Biophys. Acta - Mol. Cell Biol. Lipids 2007, 1771 (8), 915–925.
     https://doi.org/10.1016/j.bbalip.2007.01.007.
(44) Li, C. H.; Ren, X. M.; Cao, L. Y.; Qin, W. P.; Guo, L. H. Investigation of Binding and
     Activity of Perfluoroalkyl Substances to the Human Peroxisome Proliferator-Activated
     Receptor β/δ. Environ. Sci. Process. Impacts 2019, 21 (11), 1908–1914.
     https://doi.org/10.1039/c9em00218a.
(45) Wang, Z.; Cousins, I. T.; Scheringer, M.; Hungerbühler, K. Fluorinated Alternatives to
     Long-Chain Perfluoroalkyl Carboxylic Acids (PFCAs), Perfluoroalkane Sulfonic Acids
     (PFSAs) and Their Potential Precursors. Environ. Int. 2013, 60, 242–248.
     https://doi.org/10.1016/j.envint.2013.08.021.
(46) Poulsen, P. B.; Jensen, A. A.; Wallström, E.; Aps, E. More Environmentally Friendly
     Alternatives to PFOS-Compounds and PFOA; 2005.
(47) Wang, Y.; Chang, W.; Wang, L.; Zhang, Y.; Zhang, Y.; Wang, M.; Wang, Y.; Li, P. A
     Review of Sources, Multimedia Distribution and Health Risks of Novel Fluorinated
     Alternatives.       Ecotoxicol.     Environ.      Saf.      2019,     182,      109402.
     https://doi.org/10.1016/j.ecoenv.2019.109402.
(48) Labute, P. Protonate3D: Assignment of Ionization States and Hydrogen Coordinates to
     Macromolecular Structures. Proteins Struct. Funct. Bioinforma. 2009, 75 (1), 187–205.
     https://doi.org/10.1002/prot.22234.
(49) Tobergte, D. R.; Curtis, S. MOE Molecular Operating Environment. Journal of Chemical
     Information        and      Modeling.       Montreal      2013,      pp      1689–1699.
     https://doi.org/10.1017/CBO9781107415324.004.
(50) Labute, P.; Santavy, M. SiteFinder-Locating Binding Sites in Protein Structures
     http://www.chempcomp.com/journal/sitefind.htm%5Cnhttps://www.chemcomp.com/journ
     al/sitefind.htm.
(51) Kim, S.; Chen, J.; Cheng, T.; Gindulyte, A.; He, J.; He, S.; Li, Q.; Shoemaker, B. A.;
     Thiessen, P. A.; Yu, B.; Zaslavsky, L.; Zhang, J.; Bolton, E. E. PubChem 2019 Update:
     Improved Access to Chemical Data. Nucleic Acids Res. 2019, 47 (D1), D1102–D1109.
     https://doi.org/10.1093/nar/gky1033.
(52) Hoffmann, R. An Extended Hückel Theory. I. Hydrocarbons. J. Chem. Phys. 1963, 39 (6),
     1397–1412. https://doi.org/10.1063/1.1734456.
(53) Hornak, V.; Abel, R.; Okur, A.; Strockbine, B.; Roitberg, A.; Simmerling, C. Comparison
     of Multiple Amber Force Fields and Development of Improved Protein Backbone
     Parameters. Proteins        Struct.    Funct. Genet.      2006,   65    (3), 712–725.
     https://doi.org/10.1002/prot.21123.
(54) Wang, J.; Wolf, R. M.; Caldwell, J. W.; Kollman, P. A.; Case, D. A. Development and
                                              234


     Testing of a General Amber Force Field. J. Comput. Chem. 2004, 25 (9), 1157–1174.
     https://doi.org/10.1002/jcc.20035.
(55) Corbeil, C. R.; Williams, C. I.; Labute, P. Variability in Docking Success Rates Due to
     Dataset Preparation. J. Comput. Aided. Mol. Des. 2012, 26 (6), 775–786.
     https://doi.org/10.1007/s10822-012-9570-1.
(56) D.A. Case, D.S. Cerutti, T.E. Cheatham, III, T.A. Darden, R.E. Duke, T.J. Giese, H.
     Gohlke, A.W. Goetz, D.; Greene, N. Homeyer, S. Izadi, A. Kovalenko, T.S. Lee, S.
     LeGrand, P. Li, C. Lin, J. Liu, T. Luchko, R. L.; D. Mermelstein, K.M. Merz, G. Monard,
     H. Nguyen, I. Omelyan, A. Onufriev, F. Pan, R. Qi, D.R. Roe, A.; Roitberg, C. Sagui,
     C.L. Simmerling, W.M. Botello-Smith, J. Swails, R.C. Walker, J. Wang, R.M. Wolf, X.;
     Wu,      L.      Xiao,   D.     M.    Y.     and     P.    A.    K.    Amber17.     2017.
     https://doi.org/10.13140/RG.2.2.36172.41606.
(57) Maier, J. A.; Martinez, C.; Kasavajhala, K.; Wickstrom, L.; Hauser, K. E.; Simmerling, C.
     Ff14SB: Improving the Accuracy of Protein Side Chain and Backbone Parameters from
     Ff99SB.       J.     Chem.     Theory     Comput.       2015,    11    (8),   3696–3713.
     https://doi.org/10.1021/acs.jctc.5b00255.
(58) Jakalian, A.; Jack, D. B.; Bayly, C. I. Fast, Efficient Generation of High-Quality Atomic
     Charges. AM1-BCC Model: II. Parameterization and Validation. J. Comput. Chem. 2002,
     23 (16), 1623–1641. https://doi.org/10.1002/jcc.10128.
(59) Joung, I. S.; Cheatham, T. E. Determination of Alkali and Halide Monovalent Ion
     Parameters for Use in Explicitly Solvated Biomolecular Simulations. J. Phys. Chem. B
     2008, 112 (30), 9020–9041. https://doi.org/10.1021/jp8001614.
(60) Ryckaert, J. P.; Ciccotti, G.; Berendsen, H. J. C. Numerical Integration of the Cartesian
     Equations of Motion of a System with Constraints: Molecular Dynamics of n-Alkanes. J.
     Comput. Phys. 1977, 23 (3), 327–341. https://doi.org/10.1016/0021-9991(77)90098-5.
(61) Onufriev, A.; Bashford, D.; Case, D. A. Exploring Protein Native States and Large-Scale
     Conformational Changes with a Modified Generalized Born Model. Proteins Struct.
     Funct. Genet. 2004, 55 (2), 383–394. https://doi.org/10.1002/prot.20033.
(62) Miller, B. R.; McGee, T. D.; Swails, J. M.; Homeyer, N.; Gohlke, H.; Roitberg, A. E.
     MMPBSA.Py: An Efficient Program for End-State Free Energy Calculations. J. Chem.
     Theory Comput. 2012, 8 (9), 3314–3321. https://doi.org/10.1021/ct300418h.
(63) Hou, T.; Wang, J.; Li, Y.; Wang, W. Assessing the Performance of the MM/PBSA and
     MM/GBSA Methods. 1. The Accuracy of Binding Free Energy Calculations Based on
     Molecular Dynamics Simulations. J. Chem. Inf. Model. 2011, 51 (1), 69–82.
     https://doi.org/10.1021/ci100275a.
(64) Roe, D. R. Introduction to hydrogen bond analysis https://amber.utah.edu/AMBER-
     workshop/London-2015/Hbond/ (accessed Apr 19, 2019).
                                              235


(65) Liberato, M. V.; Nascimento, A. S.; Ayers, S. D.; Lin, J. Z.; Cvoro, A.; Silveira, R. L.;
     Martínez, L.; Souza, P. C. T.; Saidemberg, D.; Deng, T.; Amato, A. A.; Togashi, M.;
     Hsueh, W. A.; Phillips, K.; Palma, M. S.; Neves, F. A. R.; Skaf, M. S.; Webb, P.;
     Polikarpov, I. Medium Chain Fatty Acids Are Selective Peroxisome Proliferator Activated
     Receptor (PPAR) γ Activators and Pan-PPAR Partial Agonists. PLoS One 2012, 7 (5), 1–
     10. https://doi.org/10.1371/journal.pone.0036297.
(66) Shi, G. Q.; Dropinski, J. F.; McKeever, B. M.; Xu, S.; Becker, J. W.; Berger, J. P.;
     MacNaul, K. L.; Eibrecht, A.; Zhou, G.; Doebber, T. W.; Wang, P.; Chao, Y. S.; Forrest,
     M.; Heck, J. V.; Moller, D. E.; Jones, A. B. Design and Synthesis of α-
     Aryloxyphenylacetic Acid Derivatives: A Novel Class of PPARα/γ Dual Agonists with
     Potent Antihyperglycemic and Lipid Modulating Activity. J. Med. Chem. 2005, 48 (13),
     4457–4468. https://doi.org/10.1021/jm0502135.
(67) Kuwabara, N.; Oyama, T.; Tomioka, D.; Ohashi, M.; Yanagisawa, J.; Shimizu, T.;
     Miyachi, H. Peroxisome Proliferator-Activated Receptors (PPARs) Have Multiple
     Binding Points That Accommodate Ligands in Various Conformations: Phenylpropanoic
     Acid-Type PPAR Ligands Bind to PPAR in Different Conformations, Depending on the
     Subtype. J. Med. Chem. 2012, 55 (2), 893–902. https://doi.org/10.1021/jm2014293.
(68) Hughes, T. S.; Giri, P. K.; de Vera, I. M. S.; Marciano, D. P.; Kuruvilla, D. S.; Shin, Y.;
     Blayo, A.-L.; Kamenecka, T. M.; Burris, T. P.; Griffin, P. R.; Kojetin, D. J. An Alternate
     Binding Site for PPARγ Ligands. Nat. Commun. 2014, 5 (1), 3571.
     https://doi.org/10.1038/ncomms4571.
(69) Lai, T. T.; Eken, Y.; Wilson, A. K. Binding of Per- and Polyfluoroalkyl Substances to the
     Human Pregnane X Receptor. Environ. Sci. Technol. 2020, 54 (24), 15986–15995.
     https://doi.org/10.1021/acs.est.0c04651.
(70) Genheden, S.; Ryde, U. The MM/PBSA and MM/GBSA Methods to Estimate Ligand-
     Binding Affinities. Expert Opinion on Drug Discovery. 2015, pp 449–461.
     https://doi.org/10.1517/17460441.2015.1032936.
                                              236


                                 CHAPTER NINE
Mechanisms behind Protein Kinase C (PKC) Activation
                                       237


9.1 Introduction
    Protein kinase C (PKC) encompass a family of serine/threonine kinases involved in
controlling various signaling pathways that regulate cell proliferation, survival, apoptosis,
migration, invasion, differentiation, angiogenesis, and drug resistance.1 PKC acts by changing
the activities of other PKC family members and proteins within signaling pathways by
phosphorylation of the hydroxyl groups of serine and threonine residues. Members of the PKC
family are considered promising targets for several diseases including multiple types of cancer,
cardiovascular diseases, immune and inflammatory diseases, neurological and metabolic
disorders due to their essential role in the cell cycle.1 PKCs are considered to be suitable
therapeutic targets, as there are no mutations in PKC encoding genes, thus, eliminating failures
anticipated due to mutations.2 While it has been a goal of academic and industrial researchers to
develop PKC-specific inhibitors, a major challenge is targeting a specific kinase resulting from
the highly similar structures of different PKC isoforms.3
    Early studies have shown that in the absence of Ca2+, PKCα weakly interacts with lipid
bilayer.4 As shown in Figure 9.1, the first step of activation is dependent on intracellular Ca2+
binding to the PKCα-C2 domain which increases its affinity for membranes and causes the
enzyme to drift to the cell membrane (even though this initial electrostatic interaction is still low
in affinity. After PKCα-C2 is docked to the membrane, it moves deeper into the membrane and
interacts with phosphatidylinositol 4,5-bisphosphate (PIP2) completing the second step of
activation.5
                                                238


    Figure 9. 1 A schematic of the PKC activation pathway. In the first activation step the Ca2+
binds to the C2 domain, increasing the membrane affinity of the enzyme and PKC drifts to the
membrane. Next, PIP2 that is present in the membrane binds to the C2 domain and loosens the
C1-C2 domain interaction causing the C1 domain to move inside the membrane where it can
bind to DAG. After Ca2+, PIP2 and DAG binding is established, the pseudo substrate domain
leaves the active site in the kinase domain completing the activation of the enzyme.4,6
    The third step occurs after this secondary interaction interrupts electrostatic C1/C2 inter
domain binding and allows the C1 domain to penetrate the membrane and bind to the
diacylglycerol (DAG). Even though both C1-DAG and C2-PIP2 interactions are relatively low in
affinity, the combined energetics from the two leads to a strong binding to the membrane.
                                               239


Establishing this strong binding is the key for the final activation step. The PKCα structure goes
through a final conformational change, where the auto-inhibitory pseudo-substrate (PSub)
domain is expelled from the kinase domain, leaving the active site of the enzyme available for
substrate binding and thus completing the activation.4
    Figure 9. 2 PKC subgroups have slightly varying structures and regulators. All isoforms
carry a kinase domain with an activation loop shown as blue. Both conventional and novel PKCs
contain a C1 domain that can be regulated by DAG, PS as shown in orange, whereas atypical
PKC C1 domain can only be regulated by PS. The C2 domain that can be regulated by Ca2+ and
PIP2 is only present in the conventional subgroup, novel PKCs contain a modified C2 domain
that lacks the necessary residues for binding. Atypical PKCs carry Phox and Bem 1 (PB1)
domain instead of the C2 domain present in the other subgroups.6
    Based on structure and cofactor regulation, PKC isozymes can be classified into three
groups: conventional (cPKC α, β, γ), novel (nPKC ε, η, θ, δ), and atypical (aPKC ι, ζ).2 As
shown in Figure 9.2, all isoforms contain the kinase domain with an activation loop in the
middle. This part is also known as the catalytic domain and contains the necessary motifs for
ATP, substrate binding and also the residues that catalyze the kinase reaction.4 In its inactive
form the PSub domain blocks this active loop and prevents substrates to reach the active site.7
Interaction between the PSub domain and the kinase domain must be interrupted by regulation of
                                                240


C1 and C2 domains in order to have an active form of the enzyme. Both conventional, novel and
atypical PKC structures contain a C1 domain regulated by the phosphatidylserine (PS), for
cPKCs and nPKCs there is an additional DAG binding site present. C2 domain regulation is
unique to cPKCs for two reasons. First, aPKCs do not have this domain, instead they carry the
Phox and Bem 1 domain (PB1) where interactions between protein scaffolds are mediated.
Second, nPKCs have C2 domains but their C2 domain lack the necessary amino acid residues
that can stimulate Ca2+ or PIP2 binding that are essential for the activation of cPKCs. Thus,
revealing the mechanisms behind C2 regulation covered, will bring out factors and potent target
sites that can be used in the design of new therapeutics.
9.2 Methods
The domains of PKCs are compared using the NCBI Basic Local Alignment Search Tool
(BLAST). Results are scored using the BLOSUM62 matrix with a gap cost 11 and extension 1.8
Initial coordinates of PKCα-C2(PDB ID: 4DNL9) and PKCδ-C2(PDB ID: 1YRK10) are obtained
from the Protein Data Bank (PDB ID: 4DNL); missing residues and hydrogens are added using
Molecular Operating Environment v.2016.08 (MOE).11 These structures are initially minimized
in MOE with the AMBER ff10 force field. The systems were prepared using Gromacs-5.0.112
with the amber99sb13 force field, and placed into a triclinic unit cell with a 1 nm solute box
distance. The unit cell is solvated in SPC/E-type waters and ions corresponding to 150 mM
NaCl, 150 mM CaCl2, 100 mM CaCl2, and 50 mM CaCl2 solutions are explicitly replaced with
random water molecules. First the systems are minimized using conjugate gradient algorithm for
10,000 steps, and the steepest descent method is used every tenth step. Then, water around the
protein is equilibrated for 20 picoseconds by restraining protein atoms to their initial position.
Next, the production simulations are performed by removing protein restraints. The trajectories
                                                241


were produced using velocity rescale thermo couple to keep the temperature at 300 K, and
Berendsen barostat to keep the pressure at 1 atm. The SHAKE algorithm was used to constrain
bonds involving hydrogens,14 vdw interactions treated with a 10 Å cutoff and long-range
electrostatic interactions were modeled with PME also with a 10 Å cutoff.
9.2 Results and Discussion
9.2.1 Sequence Alignment
    As noted earlier, based on structure and cofactor regulation, these isozymes can be classified
into three groups: conventional (α, β, γ), novel (ε, η, θ, δ), and atypical (ι, ζ) PKCs. 7 In order to
understand differences between these subgroups, sequence alignment was performed on different
domains of PKCs. During these alignments, PKCα was used as a reference for alignment.
    Figure 9. 3 Comparison of kinase domain of different PKC family isoforms with sequence
alignment.
    Sequence alignments of kinase domains of different PKCs resulted in scores higher than 200
for both isoforms in the same group and outside of the group which are compared (Figure 9.3).
These results show that the kinase domain has the highest similarity among PKC isozymes. This
domain carries an ATP binding domain and a kinase active site. Possible inhibitors of active sites
have been developed but selective inhibition is extremely difficult because of this high similarity.
                                                242


    Figure 9. 4 PKC family C1 domain sequence alignment.
    The C1 domain sequence alignment scores are higher than 200 for the conventional subgroup
members, as shown in Fig. 9.4. This indicates that the C1 domain shows high similarity between
the same members of the group. Whereas, when the PKCα-C1 sequence is aligned with members
of the other groups, the score slightly decreases. This domain shows slight variation between
members of different subgroups. This moderately similar part of the protein contains the
potential binding site for DAG and phosphatidylserine.4
    Figure 9. 5 PKC family C2 domain sequence alignment.
    Sequence alignment scores of the C2 domain show that this domain contains slight
differences among other members of the conventional PKC subgroup with a scores between 80-
200 (Figure 9.5). However, the sequence alignment for others subgroup members results in very
small scores, indicating that this domain which holds the Ca2+ and PIP2 binding site, is
                                              243


significantly different among the different subgroups. The results of sequence alignment suggest
that studying the C2 domain activation might hold a solution for the target specificity problem.
9.2.2 Binding Site Environment Comparison
    Figure 9. 6 PKCα-C2 and PKCδ-C2 binding site comparison. Potential sites for hydrogen
bonding are in purple, hydrophobic regions in green, and neutral regions in white.
    The sequence alignment results indicate that there is a significant number of differences
between the C2 domains of the PKC subgroups. Our focus, thus, shifts to understanding how the
C2 domain is regulated and what are the differences among these subgroups. Variations among
PKCs C2 domain crystal structures of different subgroups are compared to one another
(conventional PKCα and novel PKCδ. atypical PKCs lack this domain.) It is known that
conventional PKCs hold a Ca2+ binding site at this domain (Figure 9.6 left). Investigation of
potential binding sites using a geometrical approach15,16 showed that novel PKCs also have a
binding site similar in tertiary structure in place (Figure 9.6 right). The comparison of molecular
surfaces are generated using a grid-based method17 calculated in MOE program,15 show that
interior of the PKCα pocket is hydrophilic, whereas this site is hydrophobic in PKCδ.
                                                  244


    Table 9. 1 Character of PKCα-C2 and PKCδ-C2 binding site residues as obtained from a
comparison of potential binding site residues.
  Site     Nonpolar (%)      Polar (%)     Positively charged (%)       Negatively charged (%)
PKCα             14              79                   14                            36
 PKCδ            56              44                   11                            11
    A comparison of residue contents of these two sites shows that the PKCα-C2 binding site
residues are predominantly hydrophilic and make up 79% of the binding site residues, 50% of
the residues are charged with 36% to 14% negative and positive, respectively and there is an
overall negative charge due to the higher percentage of negatively charged residue (Table 9.1). In
contrary, the PKCδ-C2 binding site consists of 56% hydrophobic residues, and there is no net
charge within this site. These results show that even though PKCα-C2 and PKCδ-C2 are
structurally similar by exhibiting a potential binding site in the same region, these binding sites
have very different character.
9.2.3 Molecular Dynamics Simulations
    To better understand the behavior of different PKCs in varying environments, both PKCα-C2
and PKCδ-C2 are placed in 150 mM NaCl, 100 mM CaCl2, 50 mM CaCl2, 150 mM CaCl2 salt
solutions and these structures are simulated for 100 ns using atomistic MD (Figures 9.7 and 9.8).
                                                245


Figure 9. 7 PKCα-C2 domain RMSD for the systems in different salt concentrations.
Figure 9. 8 PKCδ-C2 domain RMSD for the systems in different salt concentrations.
                                      246


    Figure 9. 9 Coulombic and Lennard-Jones interaction energy between PKCα-C2 binding site
and Ca2+ ions in the system for extended simulation of PKCα-C2 in 150 mM CaCl2.
    In order to better understand the interactions between the two Ca2+ that are bound to the
PKCα-C2, the MD simulations are extended to 100 ns for the PKCα-C2 in 150 mM CaCl2, and
Lennard Jones and Coulombic interaction energies between pocket residues and Ca2+ on the
systems are analyzed. Figure 9.9 shows that electrostatic interaction is the dominant interaction
type and even though the first Ca2+ binds to the system at 18 ns and the second Ca2+ enters the
site at 58 ns, the electrostatic energy fluctuates during the entire simulation.
    The electrostatic interaction with the first Ca2+ starts at 18 ns where the ion enters the pocket;
at 50 ns its interaction energy increases corresponding to the Ca2+ moving deeper into the pocket.
The second Ca2+ enters the binding site at 58 ns after the relocation of first Ca2+. During the later
                                                  247


stages of simulation, it moves to a place where it can establish stronger binding as indicated by
the increase of second Ca2+ interaction energy at 78 ns.
    Figure 9. 10 Interaction energy between PKCα-C2 binding site residues and the first Ca2+
entering the site for extended simulation of PKCα-C2 in 150 mM CaCl2.
    Figure 9.10 shows the important interactions between the first Ca2+ and the residues in the
binding site. The first interaction that is established occurs between Asp248 and the Ca2+ as
shown in red in Figure 9.10. During the Ca2+ relocation at 50 ns the ion starts to interact with two
other residues (Arg252 and Asp254) as shown in blue and yellow, in the Figure 9.10. The initial
interaction with Asp248 is lost when the second Ca2+ enters the site (Figure 9.11).
                                                248


    Figure 9. 11 Interaction energy between PKCα-C2 binding site residues and the second Ca2+
entering the site for extended simulation of PKCα-C2 in 150 mM CaCl2.
    Figure 9.11 shows the important interactions between the second Ca2+ and the binding site.
When the second Ca2+ first entered, it mostly interacts with Asp187 which is lost at later stages
of the simulation to Asp246. It is also important to note that Asp248 stimulates first Ca2+ binding
during the early stages of simulations then starts switches to interacting with second Ca2+,
suggesting the two ions are competing for this interaction.
                                                249


    Figure 9. 12 Minimum energy frame of PKCα-C2 in 150mM CaCl2. Two Ca2+ and the
important residues are also shown.
    The present study on highly conserved PKC family of enzymes suggests that the C2 domain
has the most significant difference among different isoforms sequentially. The C2 domain might
also embrace a solution for the target specificity problem that occurs in therapeutic applications
targeting these enzymes. The binding site comparison of PKCα-C2 PKCδ-C2 shows that the two
binding sites exhibit a very different environment especially on the site where the Ca2+ binding
                                                250


occurs in the conventional PKCs. The Ca2+ binding site of PKCα takes an overall negative
charge and five aspartic acid residues present in this site (Asp248, Asp254, Asp246, Asp193,
Asp187, Figure 9.12) that are involved in the Ca2+ binding activation mechanism. Not having
these residues might result a lack of Ca2+ regulation in PKCδ-C2. This is the first time that the
PKC Ca2+ binding activation mechanism has been investigated using molecular dynamics.
                                              251


REFERENCES
    252


                                        REFERENCES
(1)  Garg, R.; Benedetti, L. G.; Abera, M. B.; Wang, H.; Abba, M.; Kazanietz, M. G. Protein
     Kinase C and Cancer: What We Know and What We Do Not. Oncogene 2014, 33 (45),
     5225–5237. https://doi.org/10.1038/onc.2013.524.
(2)  Koivunen, J.; Aaltonen, V.; Peltonen, J. Protein Kinase C (PKC) Family in Cancer
     Progression. Cancer Lett. 2006, pp 1–10. https://doi.org/10.1016/j.canlet.2005.03.033.
(3)  Mochly-Rosen, D.; Das, K.; Grimes, K. V. Protein Kinase C, an Elusive Therapeutic
     Target? Nat. Rev. Drug Discov. 2012, 11 (12), 937–957. https://doi.org/10.1038/nrd3871.
(4)  Steinberg, S. F. Structural Basis of Protein Kinase C Isoform Function. Physiol. Rev.
     2008, 88 (4), 1341–1378. https://doi.org/10.1152/physrev.00034.2007.
(5)  Alwarawrah, M.; Wereszczynski, J. Investigation of the Effect of Bilayer Composition on
     PKC$α$-C2 Domain Docking Using Molecular Dynamics Simulations. J. Phys. Chem. B
     2017, 121 (1), 78–88. https://doi.org/10.1021/acs.jpcb.6b10188.
(6)  Newton, A. C.; Antal, C. E.; Steinberg, S. F. Protein Kinase C Mechanisms That
     Contribute to Cardiac Remodelling. Clin. Sci. 2016, 130 (17), 1499–1510.
     https://doi.org/10.1042/CS20160036.
(7)  Newton, A. C. Protein Kinase C: Poised to Signal. AJP Endocrinol. Metab. 2010, 298 (3),
     E395--E402. https://doi.org/10.1152/ajpendo.00477.2009.
(8)  Ramsay, L.; Macaulay, M.; Degli Ivanissevich, S.; MacLean, K.; Cardle, L.; Fuller, J.;
     Edwards, K. J.; Tuvesson, S.; Morgante, M.; Massari, A.; Maestri, E.; Marmiroli, N.;
     Sjakste, T.; Ganal, M.; Powell, W.; Waugh, R. A Simple Sequence Repeat-Based Linkage
     Map         of      Barley.       Genetics     2000,       156       (4),      1997–2005.
     https://doi.org/10.1093/nar/25.17.3389.
(9)  (TCELL) Joint Center For Structural Genomics (JCSG), P. F. T.-C. B. X-Ray Diffraction
     Data for the Crystal Structure of a C2 Domain of a Protein Kinase C Alpha (PRKCA)
     from      Homo       Sapiens      at     1.90    A      Resolution      (4DNL).      2012.
     https://doi.org/10.18430/M34DNL.
(10) Funabiki, H. Two Birds with One Stone - Dealing with Nuclear Transport and Spindle
     Assembly. Cell 2005, 121 (2), 157–158. https://doi.org/10.1016/j.cell.2005.04.003.
(11) Chemical Computing Group Inc. Molecular Operating Environment (MOE). Montreal
     2016.
(12) Abraham, M. J.; Murtola, T.; Schulz, R.; Páll, S.; Smith, J. C.; Hess, B.; Lindahl, E.
     GROMACS: High Performance Molecular Simulations through Multi-Level Parallelism
     from      Laptops     to      Supercomputers.     SoftwareX     2015,      1–2,     19–25.
     https://doi.org/10.1016/j.softx.2015.06.001.
                                              253


(13) Hornak, V.; Abel, R.; Okur, A.; Strockbine, B.; Roitberg, A.; Simmerling, C.; Brook, S.;
     Brook, S.; Brook, S. Comparison of Multiple AMBER Force Fields and Development of
     Improved Protien Backbone Parameters. Proteins 2006, 65 (3), 712–725.
     https://doi.org/10.1002/prot.21123.Comparison.
(14) Ryckaert, J. P.; Ciccotti, G.; Berendsen, H. J. C. Numerical Integration of the Cartesian
     Equations of Motion of a System with Constraints: Molecular Dynamics of n-Alkanes. J.
     Comput. Phys. 1977, 23 (3), 327–341. https://doi.org/10.1016/0021-9991(77)90098-5.
(15) Labute, P.; Santavy, M. SiteFinder-Locating Binding Sites in Protein Structures
     http://www.chempcomp.com/journal/sitefind.htm%5Cnhttps://www.chemcomp.com/journ
     al/sitefind.htm.
(16) Anthony, W. J.; Bender, A.; Kaya, T.; Clemons, P. A. Alpha Shapes Applied to Molecular
     Shape Characterization Exhibit Novel Properties Compared to Established Shape
     Descriptors.     J.    Chem.      Inf.   Model.      2009,    49    (10),     2231–2241.
     https://doi.org/10.1021/ci900190z.
(17) Sethian, J. Advancing Interfaces: Level Set and Fast Marching Methods. 1999, 12.
                                             254


                                  CHAPTER TEN
Conclusions and Future Directions
                                      255


    Conclusions and Future Directions
    With ever-evolving technological developments and advancement of computational
modelling techniques, computational biochemistry can be used to study the dynamics of large
systems. Proteins are dynamic systems in nature and their activity depends on their
conformational states (i.e active/inactive), which may be affected by ligand binding.
Understanding ligand binding phenomena, protein dynamics, and structural perturbations
triggered by the binding is critical to understand biology. Computational modelling allows the
study of these dynamics and simulation/analysis of ligand binding at a molecular level. For this
dissertation, molecular dynamics, binding free energy calculations and bioinformatic tools were
used to study binding and dynamics of a number of host-guest, protein-ligand and protein-ion
systems.
    Statistical Assessment of the Modeling of Proteins and Ligands (SAMPL) blind challenges
provide a platform to validate/gauge current modelling techniques to predict physicochemical
properties. In chapters 3 and 4 molecular dynamics and quantum dynamics were used to predict
binding energies between the host-guest molecules. The results showed that MD followed by
MMPBSA/MMGBSA calculations can be used to qualitatively rank binding energies of small
molecules with low computational cost and memory, even though the predictions result in
systematically higher binding energies than experiments. Due to its success, the MD simulations
followed by MMPBSA/MMGBSA calculations can be applied to various applications where
relative binding energies are important. To predict absolute binding energies without corrections
based on similar systems, implementation of a better solute entropy model with respect to
performance and accuracy on MMPBSA/MMGBSA solver should be considered in the future.
                                               256


    In Chapter 5, molecular modeling was used to study interactions between the Endo-A
enzyme and glycans 39, 41 which were synthesized and experimentally studied by Huang group.
Experimentally, Endo-A enzyme shows substrate preference towards glycan 39. The simulations
showed significantly weaker binding of glycan 41 toward Endo-A which can explain the lack of
glycosylation. In addition, the simulations also pointed out a mechanistic explanation on the
Endo-A substrate preference: In all glycan 41- Endo-A simulations, active site gate residues
W244 and W216 are prohibited from closing, which can account for the reduced yield from
glycosylation reaction by preventing the formation of the closed active site.
    In another collaborative effort, the Huang group synthesized HS glycopeptide and HS glycan,
discussed in chapter 6, by using total synthesis and experimentally measured FGF-2 dissociation
constants and heparanase inhibition percentages of these compounds along with the peptide
backbone. The experimental studies showed that only the glycan showed inhibitory activity
against heparanase, while the glycopeptide showed a three-fold enhanced binding in comparison
to the glycan binding to FGF-2. To understand different biological functions of glycan and
glycopeptide on the FGF-2 and heparanase systems, molecular modeling was used. HS glycan,
HS glycopeptide and peptide binding to the FGF-2 and heparanase enzymes were studied
through molecular dynamics and free energy calculations. The simulations showed the peptide
portion of the glycopeptide can lead to additional salt bridges in FGF-2 systems, whereas in
heparanase, the glycopeptide tends to pull the glycan core towards solvent and loosen the
hydrogen bonds. Both experiments and simulations showed that HS and HS proteoglycan can
possess different biological functions. As highlighted through simulations depending on the
target, peptide backbone can loosen binding of the core or result in additional interactions with
the targets. For future directions, these interactions can be further analyzed by simulation of
                                                257


mutant heparanase, and FGF-2 complexes bound to HS and HS proteoglycan to quantitively
assess each interaction and its contribution to the binding.
        In chapter 7 and 8, the MD/MMPBSA approach was used in SAMPL challenges to
investigate the binding of per and poly fluoroalkyl substances (PFASs) to a number of human
receptors. PFASs are emerging contaminants with a large and quickly growing chemical space.
Human Pregnane X receptor (hPXR) and Peroxisome Proliferator Activated Receptor γ (PPARγ)
are known targets for legacy PFASs with available toxicity data. However, toxicity of recently
emerged PFAS alternatives is not measured for these systems. Molecular modeling was used to
predict alternative PFASs’ toxicity on hPXR and PPARγ and showed they still exhibit binding
and may show toxicity. Additionally, long- and short-range interactions between the amino acids
within the binding sites and PFASs were investigated to understand how PFAS is recognized on
these receptors. The results outlined key residues that contribute strongly to the binding. The
pioneer studies detailed in chapters in 7 and 8 show how biophysical tools can be a fundamental
part of understanding PFASs at a molecular level, and guide scientist to find solutions for PFASs
related environmental issues. The methodologies discussed can be further used to investigate
other known PFAS targets and recently developed PFASs alternatives, which can also have
damaging effects on the environment. Moreover, the molecular recognition patterns identified
through models, can be used to develop environment friendly PFASs or PFAS inhibitors, such as
L-carnitine has proven to be for PPARγ.
    In chapter 9, Protein Kinase Cs (PKCs), a family of serine/threonine kinases, have been
studied. The Ca2+ binding induced activation of conventional and novel PKCs were studied
through bioinformatics and molecular dynamics simulations. The simulations displayed
successive binding of multiple Ca2+ to the C2 domain of PKCα. Additionally, interaction
                                                 258


energies identified five aspartic acid residues important to attract and hold calcium ions in this
domain. As shown with the sequence alignments and bioinformatic results the C2 domain of
PKC is a promising drug target. Future investigations should include further understanding of the
C2 domain’s role on the remaining steps of PKC activation such as phosphatidylinositol 4,5-
bisphosphate (PIP2) binding to the C2 domain. With sufficient understanding of the activity and
potential binding sites, the therapeutic view of this protein can be used to develop new therapies
for several diseases that are known to be affected by active PKC levels, including multiple types
of cancer, cardiovascular diseases, immune and inflammatory diseases, neurological and
metabolic disorders.
                                                259