INSIGHTS INTO HUMAN HEALTH THROUGH COMPUTATIONAL MODELING: TARGETING TUBERCULOSIS AND UNDERSTANDING PFAS TOXICITY By Semiha Kevser Bali A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Chemistryβ€”Doctor of Philosophy 2024 ABSTRACT Computational modeling approaches have been instrumental to biological and biochemical research for over 50 years, providing insight that can impact advances in human, animal, and environmental health. The use of in silico methods in drug discovery can reduce financial cost and time required for the development of new drugs by providing molecular-level insight and information about structure-activity relationships. Computational chemistry approaches can provide compound iden- tification, optimization and screening towards the development of more potent and novel molecules. Computational biophysical methods are not only useful in drug discovery, but they are also useful in environmental science as well, in areas such as determining the toxicity of pollutants. The validation of computational protocols is an important step in computational modeling. SAMPL blind challenges provide host-guest systems with known binding affinities and physical properties to benchmark developed methods and protocols against experiment. A benchmark for the protocols used throughout the dissertation has been provided. Per- and polyfluoroalkyl substances (PFAS) are man-made molecules that have very interesting chemical features. These compounds are both water and oil-resistant therefore, they have been utilized in many industrial processes and household products, including fast food packaging, fire- fighting foams, dental floss, water-resistant garments, batteries, and non-stick cookware. However, research conducted in recent years has shown that some PFAS can cause significant health problems thyroid problems, cholesterol and lipid issues, and cancer in living organisms upon consistent exposure and bioaccumulation. Several of the chapters in this dissertation highlight investigations of three different protein targets of PFAS: β€’ human Peroxisome proliferator receptor gamma - retinoid x receptor alpha /DNA (PPAR𝛾- RXR𝛼/DNA) is an important protein for regulation of glucose metabolism and fat cell differentiation, β€’ human thyroglobulin protein (hTG) which is responsible for producing the thyroid hormones, β€’ rainbow trout estrogen receptor 𝛼 and 𝛽 (ER𝛼, ER𝛽) controlling the reproduction. The goal of each of the studies was to gain a molecular-level understanding of the impact of selected common and alternative PFAS on the proteins. In considering specific disease, tuberculosis (TB) is a persistent disease largely observed in rural parts of the world. While there are available treatment regimens for both drug-susceptible as well as drug-resistant TB, these treatment protocols require use of many different drugs and take from six months to up to two years. Therefore, there is a need for the development of better treatment strategies. A chapter in this dissertation highlights how computational chemistry approaches have been used in studies to develop compounds targeting the treatment of Tuberculosis via two different protein targets, mycobacterium membrane protein large 3 (mmpL3) and DosS in collaboration with medicinal chemists. Homology modeling, binding energy estimations, as well as conformational dynamics of these proteins were investigated. Copyright by SEMIHA KEVSER BALI 2024 ACKNOWLEDGEMENTS I would like to extend my thanks to my advisor, Dr. Angela K. Wilson, allowing me to be part of her group and engaging me in thrilling research projects. I also would like to thank my committee members Dr. Kenneth Merz, Dr. Katharine Hunt, and Dr. Edmund Ellsworth for their invaluable feedback and support. To the past and present members of the Wilson group: your friendship and stimulating discussions have been truly appreciated. I also would like to thank the Chemistry department members and staff for their assistance. A special thank you goes out to all my friends who have stood by me throughout this degree. I am deeply appreciative of our collaborators, Dr. Edmund Ellsworth, Dr. Robert Abramovitch, and Dr. Xuefei Huang, along with their respective groups, as well as the computational and medicinal chemistry team at Reata Pharmaceuticals, and our experimental collaborators at MSU PFAS Center, for their invaluable insights and discussions. I also would like to thank Dr. Viktorya Aviyente, for her continuous support and mentorship throughout my career. She is a great source of inspiration, and I consider myself fortunate to have her guidance. Lastly, I will be forever grateful for: β€’ my husband, Hoa, you carried the second biggest burden of my Ph.D., and your presence and support made this possible. Thank you for always being there. β€’ our cats, Effie and Babycakes, you made the long Michigan winters more entertaining. β€’ my parents, your endless support and encouragement in my pursuit of science have been the foundation of my journey. β€’ myself, for the resilience, determination, and perseverance that have brought me to this moment. TΓ΄i lΓ  tΓ΄i. v CHAPTER 1 INTRODUCTION TO MOLECULAR MODELING . . . . . . . . . . . 1 TABLE OF CONTENTS CHAPTER 2 BIBLIOGRAPHY . . . . . MOLECULAR MODELING: THEORY AND METHODOLOGY . . . . 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 CHAPTER 3 BINDING OF PER-AND POLYFLUOROALKYL SUBSTANCES (PFAS) TO THE PPAR/RXRA–DNA COMPLEX . . . . . . . . . . . . . . . . . 18 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 SUPPORTING TABLES . . . . . . . . . . . . . . . . . . . . . 61 SUPPORTING FIGURES . . . . . . . . . . . . . . . . . . . . . 62 . . . BIBLIOGRAPHY . APPENDIX A APPENDIX B CHAPTER 4 INFLUENCE OF PFAS ON HUMAN THYROGLOBULIN PROTEIN: IMPACT ON THYROID HORMONE SYNTHESIS . . . . . 78 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93 SUPPORTING TABLES . . . . . . . . . . . . . . . . . . . . . 98 SUPPORTING FIGURES . . . . . . . . . . . . . . . . . . . . . 105 . . . BIBLIOGRAPHY . APPENDIX A APPENDIX B CHAPTER 5 FISHING FOR ANSWERS: DIFFERENT BINDING MODES OF PFAS TARGETING RAINBOW TROUT ESTROGEN RECEPTORS . . . . . . . . . . . . . . . . . . . . . . . . 113 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128 . SUPPORTING TABLES . . . . . . . . . . . . . . . . . . . . . 135 SUPPORTING FIGURES . . . . . . . . . . . . . . . . . . . . 142 . . . BIBLIOGRAPHY . APPENDIX A APPENDIX B CHAPTER 6 COMPUTATIONAL PATHWAYS TOWARDS NEW THERAPEUTIC COMPOUNDS: ADDRESSING TUBERCULOSIS VIA MMPL3 INHIBITION . . . . . . . . . . . . . . . . . . . . . . . . 162 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190 . SUPPORTING TABLES . . . . . . . . . . . . . . . . . . . . . 193 SUPPORTING FIGURES . . . . . . . . . . . . . . . . . . . . . 194 . . . BIBLIOGRAPHY . APPENDIX A APPENDIX B CHAPTER 7 MODELING OF DOSS INTERACTIONS WITH SMALL MOLECULE INHIBITORS AS A SUPPLEMENTARY TREATMENT STRATEGY AGAINST TB . . . . . . . . . . . . . . . . . . . . . . . . 199 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221 . SUPPORTING TABLES . . . . . . . . . . . . . . . . . . . . . 223 SUPPORTING FIGURES . . . . . . . . . . . . . . . . . . . . . 224 . . . BIBLIOGRAPHY . APPENDIX A APPENDIX B CHAPTER 8 INVESTIGATION OF HOST-GUEST BINDING AFFINITIES WITH GEOMETRIC AND END-POINT BINDING FREE ENERGY CALCULATIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245 . SUPPORTING TABLES . . . . . . . . . . . . . . . . . . . . . 249 SUPPORTING FIGURES . . . . . . . . . . . . . . . . . . . . . 251 . . . BIBLIOGRAPHY . APPENDIX A APPENDIX B vi CHAPTER 9 CONCLUDING REMARKS AND FUTURE DIRECTIONS . . . . . . . 257 vii CHAPTER 1 INTRODUCTION TO MOLECULAR MODELING 1 During the past 50 years, computational modeling has become an important component of bio- chemical research. from drug discovery programs to toxicology studies. The improvements and the growth of computational capabilities now allow the study of large biological complexes. As the dynamics of the system is related to its function, the conformational changes observed in these biomolecules are crucial to understand processes such as ligand binding and cell signaling. Molecular dynamics simulations (MD) provide a useful route to study these phenomena in biological systems. With MD simulations, the conformational changes associated with protein activity and ligand binding can be investigated, and furthermore, the binding affinities can be predicted. This dissertation covers various applications of computational modeling for different biological systems. In Chapter 3, a class of environmental pollutants (PFAS) and their toxic effects on PPAR𝛾- RXR𝛼/DNA complex is investigated. The allosteric mechanism in which PFAS can activate this complex was explained. In the fourth Chapter, PFAS binding to ER𝛼 and ER𝛽 from rainbow trout was investigated to elucidate the impact of PFAS toxicity in aquatic species. The different binding modes of PFAS in ER𝛼 and ER𝛽 were discovered and the important residues contributing to their binding affinities were investigated. Chapter 5 highlights how PFAS can interfere with the thyroid hormone synthesis by binding to hTG protein in humans. Due to the conformational rigidity caused by the PFAS binding, we showed that the specific types of PFAS can have more impact on the thyroid hormone synthesis. The research shown in Chapters 6 and 7 highlights the efforts for the development against two different targets, mmpL3 and DosS, are shown. In addition to elucidating the binding modes and estimated binding affinities of these compounds, the activation mechanism of DosS protein was also investigated. In Chapter 8, the utility of the modeling procedures used in these studies was evaluated using the host-guest dataset from the Statistical Assessment of Modeling of Proteins and Ligands9 (SAMPL9) challenge. The detailed protocols for estimating binding free energies and their statis- tical performance were evaluated against the experimental binding affinities. 2 CHAPTER 2 MOLECULAR MODELING: THEORY AND METHODOLOGY 3 2.1 Molecular Mechanics and the Concept of Force Field Molecular Mechanics (MM) is an approach that uses classical mechanics and is based on Newtonian dynamics. In this method, the atoms are represented as spheres and the bonds are defined as strings. MM is suitable for systems that can be difficult to model using quantum mechanics due to its substantial computational cost - such as for systems with more than 1,000 atoms. For this reason, a classical approach is utilized to study proteins. The combination of the parameters which are used for MM to describe a system and its total energy is called force field. A force field has two main components: bonded and non-bonded terms. Bonded terms are bond stretching, angle bending, torsion (bond rotation), while non- bonded interactions include electrostatic and van der Waals terms. These parameters are derived from empirical data to predict the particular parameters of the molecules. Two of the bonded terms, the bond stretching and angle bending can be expresses using simple harmonic motion: π‘‰π‘π‘œπ‘›π‘‘π‘’π‘‘ = βˆ‘οΈ π‘π‘œπ‘›π‘‘ 1 2 π‘˜ π‘π‘œπ‘›π‘‘ (𝑙0 βˆ’ 𝑙)2 π‘‰π‘Žπ‘›π‘”π‘™π‘’ = βˆ‘οΈ π‘Žπ‘›π‘”π‘™π‘’ 1 2 π‘˜ π‘Žπ‘›π‘”π‘™π‘’ (πœƒ0 βˆ’ πœƒ)2 (2.1) (2.2) where kbond and kangle are force constants for bonds and angles, respectively. l0 and πœƒ0 are the reference values for the bond length and angle, respectively. The force needed to change the bond length between two bonded atoms is large, hence any significant changes in both bond lengths and bond angles are prevented. The torsional term provides the largest contribution to the total energy, and is described as following: π‘‰π‘‘π‘œπ‘Ÿ π‘ π‘–π‘œπ‘› = 𝑁 βˆ‘οΈ 𝑛=0 1 2 𝑉𝑛 [1 + π‘π‘œπ‘ (π‘›πœ” βˆ’ 𝛾)] (2.3) The term Vn describes the depth of the potential energy surface of rotations about πœ”, over the periodicity of n, with the minimum angle of 𝛾. 4 Improper torsions are used to define the out-of-plane bending motions, and the following equation shows its functional form: π‘‰π‘–π‘š π‘π‘Ÿπ‘œ π‘π‘’π‘Ÿ = 1 2 π‘˜πœ” [1 βˆ’ π‘π‘œπ‘ 2πœ”] (2.4) in which πœ” is used to define the angle between four atoms that are not bonded together in sequential order. The charges on the atoms are considered as point charges, therefore, the electrostatic interactions can be defined using the Coulomb’s Law: βˆ‘οΈ βˆ‘οΈ 𝑉𝑒𝑙 = 𝑖 𝑗 π‘žπ‘–π‘ž 𝑗 4πœ‹πœ–0π‘Ÿπ‘– 𝑗 (2.5) where qi and qj define the point charges on the atoms, and rΔ³ is the distance between them. There are different approaches used to calculate the point charges, including quantum mechan- ical methods and experimental approaches. AMBER force field, which is used in this study, uses the point charges derived from electrostatic potentials 1. The final term, the van der Waals interaction energy, is defined as the sum of all interactions of the molecules while considering their positions as well as the relative orientations. Lennard-Jones function is the most widely used representations to define vdW interactions: 𝑉𝑣𝑑𝑀 = 4πœ– 𝐴𝐡 (cid:20) (cid:16) 𝜎𝐴𝐡 π‘Ÿ (cid:17) 12 βˆ’ (cid:16) 2 𝜎𝐴𝐡 π‘Ÿ (cid:17) 6(cid:21) (2.6) where πœ– AB is the amplitude of the potential, and 𝜎AB called as collision diameter, which is the arithmetic mean of individual diameters of pure species (𝜎A and 𝜎B). 2.2 Molecular Dynamics Molecular Dynamics (MD) uses the force field terms to calculate forces for each particle in the system, and with the help of statistical mechanical approach it models the dynamics of particles, hence gives a prediction for the position of the particles within a given system. This prediction is obtained by numerically solving Newton’s equations of motion. 5 Classical Hamiltonian (H(pi(t),ri(t))) can be used to describe the time evaluated motion of a system with N particles: H(p𝑖 (t), r𝑖 (t)) = 𝑁 βˆ‘οΈ 1 2π‘šπ‘– 𝑖=1 in this expression pi(t) is the momentum vector and V(ri) is the potential energy. The Hamiltonian 𝑖 + 𝑉 (r𝑖) p2 (2.7) is the sum of potential and kinetic energies of all particles in the system, and its partial derivation yields the equations of motion, i.e. velocity of the particle and the force acting on the particle. πœ•r𝑖 πœ•π‘‘ = πœ•p𝑖 πœ•π‘‘ = βˆ’ πœ•π» πœ•p𝑖 πœ•π» πœ•r𝑖 = p𝑖 π‘šπ‘– = v𝑖 = βˆ’ πœ•π‘‰ πœ•r𝑖 = 𝐹 which leads to the Newton’s second law: πœ•2r𝑖 πœ•π‘‘2 = 𝐹 π‘šπ‘– (2.8) (2.9) (2.10) for a particle with mass mi to move along coordinate ri under the influence of an external force F. The last equation above is used to obtain the coordinates of the particles. When the position of a particle changes, the force acting on it also changes. F(t), the total force at time t, is obtained by the vector sum of all interactions between the individual particles, and if the time step, πœ•t, is small, it is assumed to be constant. There are many methods implemented in MD software to integrate the equations of motion, and Velocity Verlet is one of them 2. In the Velocity Verlet algorithm, at time (t + πœ•t), the velocity of a particle (v) and its position (r) are defined as follows: r(t + πœ•π‘‘) = r(𝑑) + πœ•π‘‘vt + πœ•π‘‘2π‘šβˆ’1𝐹 (𝑑) 1 2 v(t + πœ•π‘‘) = v(𝑑) + 1 2 + πœ•π‘‘π‘šβˆ’1(𝐹 (𝑑) + 𝐹 (𝑑 + πœ•π‘‘)) (2.11) (2.12) To initiate an MD simulation, an initial set of coordinates are required. These coordinates can be obtained from various sources. For instance, the coordinates of biological systems can be obtained via X-Ray crystallography studies or NMR structures. Maxwell-Boltzmann distribution is used to derive the initial velocities of the particles, v(0), and it is adjusted so that the total momentum would be zero for the whole system. Initial forces at t=0 are obtained using the Eqn. 9. 6 For a system that has all of the requirements (initial coordinates, r(0); initial velocities, v(0); initial forces, F(0)) the simulation cycle follows these steps: 1. Using Eqn.11, the displacement of coordinates are calculated with respect to the initial positions within a time interval πœ•t. 2. With the help of Eqn.9, the forces on the particles are calculated using the positions from step(i). 3. Using Eqn.12, new velocities are calculated with the initial force and the new force that is obtained in step(ii). 4. Steps above are repeated until reaching a specified amount of simulation time. According to ergodic hypothesis, within an infinite time of simulation, all possible states of a system can be obtained; however, an infinitely long simulation is impossible, and with the current computational limitations, there is a trade-off between long simulation time and the cost, which needs to be considered. 3To be able to reach a sufficient simulation time without increasing the computational cost too much, the time step (πœ•t) should be taken as high as possible (usually (πœ•t is around 1fs to 2fs for classical MD simulations). However, πœ•t is also limited by the integration algorithm that is used by the software. When considering the MD simulations of biological systems, one should bear in mind that the proteins usually exist in a continuous solvent environment. However, the dimensions of the solvent boxes in MD simulations are limited. Therefore, to mimic the bulk effect Periodic Boundary Conditions(PBC) is employed. In PBC, the simulation box is copied infinitely in each direction, and only the coordinates of the original simulation box are followed throughout the simulation. If an atom in the original box moves to the outside of the box, its periodic image from the other box moves in the same direction, allowing the total number of particles in a box at a given time is constant always. In PBC, to treat the electrostatic interaction between particles, generally the Ewald sum method is employed 4. 7 Statistical ensembles are used to obtain the thermodynamic properties of the system of interest. These ensembles are constructed based on the temperature(T), volume(V), number of particles(N) and pressure(P). The most common ones are: β€’ NVE: microcanonical ensemble - has constant N, V and E; β€’ NVT: canonical ensemble - has constant N, V and T; β€’ NPT: isothermal isobaric ensemble - has constant N, P and T 2.3 Thermostats For biological system simulations, the most popular choice is the NVT ensemble due to its computational efficiency. When using the canonical ensemble in simulations, the exchange of energy is contained with different thermostat models by adjusting the temperature to the desired value. The classical definition of average kinetic energy in NVT ensemble is follows: < πΈπ‘˜ >𝑁𝑉𝑇 = 1 2 𝑁 βˆ‘οΈ 𝑖=1 π‘šπ‘–π‘£2 𝑖 (2.13) and the average kinetic energy also can be rewritten such that it is related to the temperature using the classical equipartition theory: < πΈπ‘˜ >𝑁𝑉𝑇 = 3𝑁 βˆ’ 6 2 π‘˜ 𝛽𝑇 (2.14) in which k𝛽 is the Boltzmann constant, and T corresponds to the temperature. Velocities of each step are scaled as vnew=πœ†vi, then the temperature change (Ξ”T = Ti-T(t)) can be calculated using the correlation between vi and T from Eqn 13 and 14 as follows: 𝑁 βˆ‘οΈ 𝑁 βˆ‘οΈ Δ𝑇 = 1 2 𝑖=1 π‘šπ‘– (πœ†π‘£π‘–)2 𝑁 π‘˜ 𝛽 2 3 βˆ’ 1 2 𝑖=1 which gives Δ𝑇 = (πœ†2 βˆ’ 1)𝑇 (𝑑) π‘šπ‘–π‘£2 𝑖 𝑁 π‘˜ 𝛽 2 3 (2.15) (2.16) To obtain the value of πœ† with respect to the target temperature Tmax, and the instantaneous temperature T(t): πœ† = βˆšοΈπ‘‡π‘›π‘’π‘€/𝑇 (𝑑) 8 (2.17) This is the procedure of the simplest thermostat 5:, in which the Tmax can be obtained by multiplying the velocity with πœ† and using the temperature obtained from the kinetic energy, T (t). However, the drawback of this method is that along the simulation, the temperature difference between the solute and solvent may occur. To solve the problems with the previous thermostat model, Andersen thermostat, which is based on the stochastic collision model, was developed 6. In the Andersen thermostat, the system is immersed in a heat bath, and the velocity of the particles in a random time interval is assigned using Maxwell-Boltzmann distribution at temperature T(t). At each step, the simulation is performed with constant energy so no thermal difference within the system occurs. In addition, the calculated velocities follow the Gaussian distribution. Langevin thermostat is another model in which all particles obtain a random force at each time step, and their velocities are lowered by using a constant friction. To obey the "fluctuation- dissipation" theorem, average strength of the random forces and the friction are related. The equations of motion are modified as: πœ•pi πœ•π‘‘ = βˆ’ πœ•π» πœ•qi βˆ’ 𝛾pi + πœŽπœ–π‘– 𝜎2 = 2π›Ύπ‘šπ‘– π‘˜π‘‡ (2.18) (2.19) in which 𝛾 is used to create a damping force to the momenta, and 𝜎 and 𝛾 particles have a relation that is defined by the fluctuation-dissipation relation to recover the canonical ensemble distribution (Eqn. 19). In the Langevin method, it is assumed that big particles (solute) exist in a pool of smaller particles (solvent), and the smaller particles usually randomly collide with the solute molecules and influence their dynamics. Moreover, solvent molecules also have dampening effect on the solute molecules described as a fictional drag force. Langevin thermostat incorporates these two factors. 2.4 Barostats The isothermal-isobaric ensemble is also frequently used in MD simulations, and the methods that are used to control temperature can be adapted for pressure control as well. One method among 9 those is Berendsen barostat. The pressure tensor can be calculated as follows: P = 2 𝑉 (E𝑑 π‘˜π‘–π‘› βˆ’ Ξ) (2.20) in which V is the box volume, Ekin is the kinetic energy and Ξ is the inner virial tensor, which is used to describe the behavior of diluted gases and it is described as follows: Ξ = βˆ’ 1 2 βˆ‘οΈ 𝑖< 𝑗 rΔ³F𝑖 𝑗 (2.21) In the Berendsen barostat, the system is coupled weakly to an external bath. In the equations of motion, an extra term is needed for the pressure change: (cid:19) (cid:18) πœ• 𝑝 πœ•π‘‘ π‘π‘Žπ‘‘β„Ž = 𝑝0 βˆ’ 𝑝 πœπ‘ (2.22) where 𝜏p is the time constant for the coupling. The pressure change is proportional to the isothermal compressibility 𝛽: And 𝛼 can be calculated as: πœ•π‘ƒ πœ•π‘‘ = βˆ’ 1 𝛽𝑉 𝑑𝑉 𝑑𝑑 = βˆ’ 3𝛼 𝛽 𝛼 = βˆ’ 𝛽( 𝑝0 βˆ’ 𝑝) 3πœπ‘ Hence, the equation of the motion is: πœ•π‘Ÿ (𝑑) πœ•π‘‘ = 𝑣 βˆ’ 𝛽( 𝑝0) βˆ’ 𝑝 3πœπ‘ π‘Ÿ that represents the proportional scaling of coordinates. 2.5 Solvation Models (2.23) (2.24) (2.25) To be able to mimic the natural conditions of biological systems, simulation systems need to be immersed in suitable solvent environments. In MD simulations, the solvent is usually explicit water molecules that are defined by certain water models. The most common solvation model is the rigid 3-site TIP3P water model, in which Coulomb’s law and Lennard-Jones potentials are used to describe the electrostatic interactions. On the other hand, 4-site TIP4P-EW model is found to be better at describing the bulk properties of water as maintaining the geometric parameters from TIP4P. In TIP4P-EW, the long-range interactions (Coulomb and LJ) are incorporated 7,8. 10 2.6 Homology Modeling Homology modeling essentially targets building a three-dimensional structure for proteins by using the available structures of closely related proteins. Since not all proteins have their 3-D structures experimentally determined, being able to predict them successfully with in silico methods is extremely useful in drug discovery studies. There are currently many available tools for homology modeling, and each of them uses a different approach. One of the most successful one is I-TASSER by Zhang Lab 9–11. The amino acid sequence is first matched with the sequence of available crystal structures in Protein Data Bank, producing fragments. Then, these fragments obtained from PDB templates are combined to form full-length structure models with Monte Carlo simulations, and the clustering is used to obtain a model. In the final step, this model is used to re-assemble the structures to obtain the final model with the lowest energy. Given the success of this approach in CASP (Critical Assessment of Techniques for Protein Structure Prediction) competitions, it was used to model M. tuberculosis MmpL3 protein structure and Estrogen receptors from rainbow trout (rERs). 2.7 Molecular Docking Another approach that is commonly used in drug discovery research is molecular docking. It provides a relatively low-cost virtual screening for determining potential drug molecules. The docking approaches that are currently used can be classified into two main groups: (i) ligand-based, (ii) structure-based. While the former is used when there is no structural information regarding the target system/protein, latter is used when the 3-D structure of the protein is available. Ligand-based methods include pharmacophore modeling and QSAR (quantitative structure-activity relationship). Molecular docking falls into the structure-based docking, and there are many different open-source (i.e. AutoDock Vina) and commercial tools (i.e. MOE, Maestro) available. A docking process generally follows a two-step approach. First, the ligand conformation as well as the orientation and position are determined, then, the binding affinity for a specific orientation (pose) is calculated after further refinement. The success of docking depends on the scoring algorithms and the sampling methods used in these steps. In addition, the docking methodologies include rigid docking where 11 the ligand/protein is treated as rigid body, and induced fit docking in which the flexibility of ligand/protein are taken into account. MOE (Molecular Operating Environment) is a commercial software that can be used to study small molecules and proteins. The docking suite of MOE is capable of performing induced fit docking with a user-friendly GUI. The algorithm used for ligand placement is Triangle Matcher which uses alpha-spheres to define the binding site 12. The ligand is positioned so that the triplets of ligand atoms are superimposed on alpha spheres, and if there is a clash with protein atoms, that pose is removed. The scoring function for placement step is called London dG that includes the terms for ligand flexibility, hydrogen bonds and desolvation 13,14. In Eq. 26, Eflex corresponds to the estimated ligand entropy; c, chb and cm are constants trained on more than 400 proteins. fhb and fm used to account for the geometric imperfections for ligand-protein and metal-ligand interactions, respectively. And the last term indicates the approximated desolvation energy. After the placement step, specified number of poses are refined for final ranking. For the refinement step, force-field based GBVI/WSA dG (Generalized-Born Volume Integral/Weighted Surface area) scoring function is used (Eq.27) 14. GBVI/WSA dG scoring function was trained using MMFF94x and AMBER99 force-fields on 99 protein-ligand complexes training set. 𝛼 and 𝛽 are the constants that were determined during the training, Esol is the solvated electrostatic term, and SAweighted corresponds to the weighted solvent-accessible area scaled with 𝛽 14. Δ𝐺 𝐿𝑑𝐺 = 𝑐 + 𝐸 𝑓 𝑙𝑒π‘₯ + βˆ‘οΈ π‘β„Žπ‘ π‘“β„Žπ‘ + βˆ‘οΈ π‘π‘š π‘“π‘š + βˆ‘οΈ Δ𝐷𝑖 (2.26) β„Žβˆ’π‘π‘œπ‘›π‘‘π‘  π‘šπ‘’π‘‘π‘Žπ‘™βˆ’π‘™π‘–π‘” π‘Žπ‘‘π‘œπ‘šπ‘ π‘– Δ𝐺𝐺 𝐡𝑉 𝐼 = 𝑐 + 𝛼 (cid:20) 2 3 (Ξ”πΈπ‘ π‘œπ‘™ + Ξ”πΈπ‘π‘œπ‘’π‘™) + Δ𝐸𝑣𝑑𝑀 + 𝛽Δ𝑆 π΄π‘€π‘’π‘–π‘”β„Žπ‘‘π‘’π‘‘ (cid:21) (2.27) 2.8 Free Energy Calculations Free energy drives all molecular processes, including the ligand-protein interactions, folding, and chemical reactions. Therefore, having in silico methods that describe the free energies as accurately as possible is crucial. In drug design, the binding free energy is considered as an indication of the binding strength between the ligand and protein. In a real system where ligand 12 binds to protein under NPT condition, the free energy is expressed as: 𝐹 = βˆ’π‘˜ 𝑏𝑇 𝑙𝑛𝑍 where Z is the partition function, and it is given by: 𝑍 = 1 𝑉0𝑁!β„Ž3𝑁 ∫ 𝑒π‘₯ 𝑝 (cid:19) (cid:18) βˆ’ 𝑃𝑉 π‘˜ 𝑏𝑇 𝑑𝑉 ∫ ∫ (cid:18) (cid:19) βˆ’ 𝐻 ( 𝑝, π‘Ÿ) π‘˜ 𝑏𝑇 π‘‘π‘π‘‘π‘Ÿ (2.28) (2.29) In the equation above, the Hamiltonian is the total energy of the system for particular position and momentum. The computational free energy calculation methods such as thermodynamic integration (TI), free energy perturbation (FEP), and molecular mechanics Poisson-Boltzmann surface area (MM- PBSA) and generalized Born surface are (MM-GBSA) are widely used in virtual screening and lead optimization steps during drug development 15. The first two methods, TI and FEP, are called pathway methods, and they obtain the free energy by converting the system from an initial state to final state via very small changes of the energy function 15. This yields an accurate result; however, those methods are also computationally expensive, and sometimes the convergence can be an issue as well. On the other hand, MM-GBSA/PBSA methods, also called end-point methods, are based on the sampling of the final states therefore they are less expensive 16,17. MM-GBSA/PBSA has been used for the analysis of docking poses, estimation of binding affinities. In addition, it can also help analyzing the individual residue contributions or different energy terms 15,18. However, there can be issues with this method. One major source of error for MM-PBSA/GBSA is the lack of a conformational entropy term. This term can be computed using an additional calculation called normal-mode analysis, which is computationally costly. 2.9 Constant pH Molecular Dynamics Simulations As is known, the environment in which the protein exists significantly impacts the protonation states of the residues and hence, the activity of the protein. While classical MD methods assume only a single protonation state for a given residue, there are cases where more than one protonation states may be possible or need to be considered. With a method developed by Mongan et al. in 2004, that is currently available in Amber20, an implicit solvent (generalized Born) can be used to 13 perform along with periodic Monte Carlo (MC) sampling for the protonation states. 19 At each of the MC step, the protonation state for a given residue is randomly chosen, and then, the free energy associated with the transition to deprotonation or protonation is calculated: 𝑑𝐺 = π‘˜ 𝐡𝑇 ( 𝑝𝐻 βˆ’ π‘πΎπ‘Ž,π‘Ÿπ‘’ 𝑓 )𝑙𝑛10 + 𝑑𝐺 𝑒𝑙𝑒𝑐 βˆ’ 𝑑𝐺 𝑒𝑙𝑒𝑐,π‘Ÿπ‘’ 𝑓 (2.30) where pH is the solvent pH that is specified, pKa,ref is the pKa of the reference residue, dGelec is the electrostatic portion of the calculated free energy for the residue, and lastly, dGelec,ref is the electrostatic portion of the calculated free energy for the reference residue. The non-electrostatic component of this equation includes all free energy components but the GB electrostatics, with an assumption that it would be very similar to the value obtained independently from electrostatic environment. The electrostatic component of the equation is calculated using the difference between the current and proposed protonation state in a single step as an implicit solvent is being used. The dG is used as a criterion whether to accept the transition or reject it. If accepted, the simulation will continue with the new protonation state, and if rejected, the protonation state will not be changed. 2.10 Steered Molecular Dynamics Steered molecular dynamics (SMD) creates changes in coordinates within a given specific time by applying an external force onto the system 20. The way it is implemented in Amber20 uses a constant velocity. It is an approach that is similar to the umbrella sampling, with a difference in which the center of the restraint is now time-dependent: π‘‰π‘Ÿπ‘’π‘ π‘‘ (𝑑) = (1/2)π‘˜ [π‘₯ βˆ’ π‘₯0(𝑑)]2 (2.31) Here, x can be any quantity such as distance, angle or torsion. The generalized work can be computed by integrating the force over time, which then can be used to compute the free energy differences with Jarzynski equation 21. If we have two states A and B, and their generalized coordinates differ in x: 𝑒π‘₯ 𝑝(βˆ’Ξ”πΊ/π‘˜ 𝐡𝑇) = βŸ¨π‘’π‘₯ 𝑝(βˆ’π‘Š/π‘˜ 𝐡𝑇)⟩𝐴 (2.32) 14 This indicates that the computing the work between states A and B, and averaging over the initial state (A), the equilibrium free energies can be calculated from the non-equilibrium calculations 22,23. One method to apply forces to a system is to apply a harmonic restraint and shift it in a specific direction. If we assume a generalized reaction coordinate x again: π‘ˆ = 𝐾 (π‘₯ βˆ’ π‘₯0)2/2 (2.33) where K is the force constant determining how "stiff" the restraint is, and X0 is the initial position of the restraint moving at a constant speed v. Then, the external force on the system can be expressed as: 𝐹 = 𝐾 (π‘₯0 + 𝑣𝑑 βˆ’ π‘₯) (2.34) 15 [1] et al Case, D., J. Berryman, R. B. (2018). AMBER18. BIBLIOGRAPHY [2] Verlet, L. (1967). Computer "experiments" on classical fluids. I. Thermodynamical properties of Lennard-Jones molecules. Physical Review, 159(1):98–103. [3] Genheden, S. and Ryde, U. (2012). Will molecular dynamics simulations of proteins ever reach equilibrium? Physical Chemistry Chemical Physics, 14(24):8662–8677. [4] Cerutti, D. S. and Case, D. A. (2010). Multi-level ewald: A hybrid multigrid/fast fourier transform approach to the electrostatic particle-mesh problem. Journal of Chemical Theory and Computation, 6(2):443–458. [5] Woodcock, L. V. (1971). Isothermal molecular dynamics calculations for liquid salts. Chemical Physics Letters, 10(3):257–261. [6] Andersen, H. C. (1980). Molecular dynamics simulations at constant pressure and/or temper- ature. The Journal of Chemical Physics, 72(4):2384–2393. [7] Horn, H. W., Swope, W. C., Pitera, J. W., Madura, J. D., Dick, T. J., Hura, G. L., and Head-Gordon, T. (2004). Development of an improved four-site water model for biomolecular simulations: TIP4P-Ew. Journal of Chemical Physics, 120(20):9665–9678. [8] Horn, H. W., Swope, W. C., and Pitera, J. W. (2005). Characterization of the TIP4P-Ew water model: Vapor pressure and boiling point. Journal of Chemical Physics, 123(19):194504. [9] Zhang, Y. (2008). I-TASSER server for protein 3D structure prediction. [10] Yang, J., Yan, R., Roy, A., Xu, D., Poisson, J., and Zhang, Y. (2014). The I-TASSER suite: Protein structure and function prediction. Nature Methods, 12(1):7–8. [11] Roy, A., Kucukural, A., and Zhang, Y. (2010). I-TASSER: A unified platform for automated protein structure and function prediction. Nature Protocols, 5(4):725–738. [12] Edelsbrunner, H. (1992). Weighted alpha shapes. Technical report, Technical paper of theDepartment of Computer Science of the University of Illinois atUrbana-Champaign, Urbana, Illinois. [13] Corbeil, C. R., Williams, C. I., and Labute, P. (2012). Variability in docking success rates due to dataset preparation. [14] Labute, P. (2008). The generalized born/volume integral implicit solvent model: Estimation of the free energy of hydration using London dispersion instead of atomic surface area. Journal of Computational Chemistry, 29(10):1693–1698. 16 [15] Wang, E., Sun, H., Wang, J., Wang, Z., Liu, H., Zhang, J. Z., and Hou, T. (2019). End-Point Binding Free Energy Calculation with MM/PBSA and MM/GBSA: Strategies and Applications in Drug Design. [16] Kollman, P. A., Massova, I., Reyes, C., Kuhn, B., Huo, S., Chong, L., Lee, M., Lee, T., Duan, Y., Wang, W., Donini, O., Cieplak, P., Srinivasan, J., Case, D. A., and Cheatham, T. E. (2000). Calculating structures and free energies of complex molecules: Combining molecular mechanics and continuum models. Accounts of Chemical Research, 33(12):889–897. [17] Massova, I. and Kollman, P. A. (2000). Combined molecular mechanical and continuum solvent approach (MM- PBSA/GBSA) to predict ligand binding. [18] Hou, T., Wang, J., Li, Y., and Wang, W. (2011). Assessing the performance of the MM/PBSA and MM/GBSA methods. 1. The accuracy of binding free energy calculations based on molecular dynamics simulations. Journal of Chemical Information and Modeling, 51(1):69–82. [19] Mongan, J., Case, D. A., and McCammon, J. A. (2004). Constant ph molecular dynamics in generalized born implicit solvent. Journal of computational chemistry, 25:2038–2048. [20] Hummer, G. and Szabo, A. (2001). Free energy reconstruction from nonequilibrium single- molecule pulling experiments. Proceedings of the National Academy of Sciences of the United States of America, 98:3658–3661. [21] Jarzynski, C. (1997). Nonequilibrium equality for free energy differences. Physical Review Letters, 78:2690. [22] Izrailev, S., Stepaniants, S., Isralewitz, B., Kosztin, D., Lu, H., Molnar, F., Wriggers, W., and Schulten, K. (1999). Steered molecular dynamics. pages 39–65. [23] Jensen, M., Park, S., Tajkhorshid, E., and Schulten, K. (2002). Energetics of glycerol conduction through aquaglyceroporin glpf. Proceedings of the National Academy of Sciences of the United States of America, 99:6731–6736. 17 CHAPTER 3 BINDING OF PER-AND POLYFLUOROALKYL SUBSTANCES (PFAS) TO THE PPAR/RXRA–DNA COMPLEX 18 About this chapter: This chapter is reprinted from Almeida NMS; Bali, SK; James, D; Wang, C; Wilson, AK, J. Chem. Inf. Model. 2023, 63, 23, 7423–7443. with permission of the American Chemical Society. Only results pertaining to PPAR𝛾 presented here. 3.1 Introduction Per- and polyfluoroalkyl substances (PFAS) are a group of chemical compounds used as fluo- ropolymers, which have industrial applications ranging from coatings, adhesives, firefighting foam, to oil repellants, due to their high heat resistance. 1–3 PFAS are considered β€œforever chemicals” due to their resistance to degradation 4,5 and their persistence in the environment (soil and water) 6,7, and in humans and animals (see e.g. Ref. 8,9). To illustrate their prevalence, in the U.S., it is estimated that PFAS can be found in more than 99% of the population. 10 The persistence can be attributed at least in part to the strength of the carbon fluorine bond, one of nature’s strongest bonds. The number of PFAS compounds is quite large. The U.S. Environmental Protection Agency (EPA) has more than 14,000 PFAS listed in the PFASTRUCT database as of March 2023, and in a recent study, over 140 compounds were shown to be potentially harmful in in vitro assays. EPA,12 Overall, the compounds can be classified into two groups: legacy and emerging PFAS compounds. The most common legacy PFAS are perfluorooctanoic acid (PFOA) and perfluorooctane sulfonate acid (PFOS). These are two of the earliest known PFAS to be produced on a large scale. 1 The emerging PFAS, which are usually created to offer β€œbetter alternatives” to replace legacy PFAS, must be well understood, not only at the molecular, but also at a mechanistic level. However, recent investigations have linked not only legacy PFAS, but also emerging PFAS, with effects on the environment and living organisms. 7,8,13–15 Nuclear receptors (NRs) are a superfamily of ligand-activated transcription factors and have been the focus of many drug discovery programs. One of the most studied NRs is peroxisome proliferator activator gamma (PPAR𝛾) duetoitsroleinglucosemetabolism,regulation of adipogenesis, and lipid metabolism. 16–18 Even though the biological relevance of PPAR𝛾 in its homodimer form has been discussed in the literature 19–23, the known biologically relevant form of PPAR𝛾 that controls gene transcription is the heterodimer form, i.e, peroxisome proliferator-activator gamma/retinoic 19 X receptor (PPAR𝛾/RXR𝛼). 16,22,24The PPAR𝛾 and RXR𝛼 proteins, similar to other NR proteins, consist of three main domains: a ligand-binding domain (LBD), a DNA-binding domain (DBD), and a hinge domain that connects the DNA binding domain (DBD) and the ligand binding domain (LBD). The ligand molecules can bind to the LBD, causing the reorientation of the Helix-12 and consequently aiding in the recruitment of coactivators. 20 Before undergoing ligand binding, PPAR𝛾 complexes with its corepressor peptides. Upon dimerization with RXR𝛼, the ligands can bind to the LBD and initiate the dissociation of the corepressor by promoting a conformational change. Then, when coactivators are recruited, the transcriptional activities of the dimer can occur. 25The DBD includes two 4-cysteine (Cys4) zinc-finger motifs that are vital to the sequence- specific binding to DNA. 26,27 The DBD domain, as well as the zinc-finger domain, are present among other heterodimers complexed with DNA. 28 It has also found that there are bridging water molecules that facilitate the interactions between DNA bases and DBD residues. 27 The solvent accessibility of the zinc-finger domains may enable the structure and the activity of zinc-finger domains to be impacted by the presence of compounds, such as PFAS in solvents. 29The interaction of zinc-finger domains with various metals that cause metal toxicities, or with small molecules that are used for cancer treatments, including cisplatin, have been investigated using experimental and computational approaches. 29–36Quantum chemistry calculations have shown that the zinc cation can assist in deprotonating cysteines in the zinc-fingers, and this deprotonation is also thought to play an important role to keep the zinc-finger domain in the functional folded conformation. 36However, the stability of the zinc-finger domains is system-dependent and protein backbone motions can stabilize, or destabilize, the cysteine cores. To the best of our knowledge, there is little or no insight about how PFAS molecules can affect/interact with the zinc-finger, or how PFAS can stabilize/destabilize the interaction of zinc-finger domains with DNA. To investigate nuclear receptors, it is important to understand how their activity can be affected by structural and conformational shifts that occur upon ligand binding. To fully understand the effect of PFAS on the PPAR𝛾/RXR𝛼 complex, the structural route towards activation/inactivation, and the interactions between the two proteins as well as with the DNA need to be investigated. 20 The binding of agonist compounds to RXR𝛼-LBD monomer can trigger structural motions for activating the receptor. 37 However, for the PPAR𝛾/RXR𝛼 heterodimer, the activation of the RXR𝛼 nuclear receptor can also activate PPAR𝛾, regardless of the occupation of the PPAR𝛾-LBD. 17,38–40 It is known that the PPAR𝛾-LBD can be activated via hydrogen bonding to Tyr473. Usually, this mechanism can occur for full agonists, which interact with the activation function 2 (AF-2) region and Helix-12. 23,41 Partial agonists have been postulated to have an activation mechanism through water bridging, i.e, they do not directly interact with Helix-12, and their transcriptional activity may not be entirely structural, or connected to movements of Helix-12. 42–48 For the PPAR𝛾/RXR𝛼 heterodimer, allosteric pathways also have been found for their activation, which can elucidate ligand-dependent transcription factors. 49,50 Several studies have linked the effects of PFAS in humans to several types of toxicities (i.e., hepa, neuro, reproductive, immuno, and cardiovascular) and thyroid disruption, among other health issues. 51–58 Recently, PPAR proteins have been investigated for potential binding to PFAS, which carries nefarious outcomes. For example, PPAR𝛾 has been shown to be affected by PFOS, leading to renal fibrosis. 59–61 Liu et al. reported that PFOA and PFOS exposure can cause long lasting effects on uremic patients. 62 The LBD of PPAR𝛾 was also investigated in vitro, and insight was gained about how 16 PFAS bind to this receptor. 63 Among the 16 PFAS compounds that were investigated, several of them bind to PPAR𝛾, which results in activation of PPAR𝛾. The aforementioned study reports the maximum inhibition concentrations, or IC50s, obtained through in vitro experiments. The authors found that the size of the carbon chain and functional groups had a clear influence on how strongly PFAS bind to PPAR𝛾. 63 More recently, Khazee et al. calculated dissociation constants (Kd) of short chain PFAS, and it was also the first time Kd for short chain PFAS were reported in sub-micromolar concentration. 64 Interestingly, an investigation by Chou et al. showed that L-carnitine is able to attenuate the effects of PFOS on PPAR𝛾 via Sirt1 mechanisms. 59 L-carnitine is a molecule that is absorbed from diet and also is synthesized in the brain, kidney, and liver. It can also be easily purchased commercially. 65 In previous investigations, L-carnitine has been reported to decrease the level of 21 apoptosis in kidney cells through a PPAR𝛾-dependent mechanism. 59,60,66 Experimentally, there is little information about how PFAS bind to RXRs. Heuval et al. reported that RXR𝛼 can be activated by PFAS in mice. 67 In the same study, only mild activation was found for PPAR𝛾. More recently, it was shown that PFAS can bind to RXR𝛽 and target a particular agonistic bioactivity of this receptor. 12 Although there are many in vitro experimental studies on PPAR𝛾, there is not much known about how PFAS interacts with PPAR𝛾/RXR𝛼 complex and how this interaction can affect DNA binding, on a molecular level. In one of the first computational studies on PPAR𝛾, the authors reported binding sites and binding energies for PFOA and PFOS. 68 Other efforts have focused on how PFAS bind to different human and animal proteins, and the prediction of binding pockets and poses. 9,69–72 Zhang et al., performed molecular docking simulations for PFAS on PPAR𝛾. The authors showed that Tyr473, His323 and His449 were important residues towards binding in the PPAR𝛾 binding pocket. 63 Li et al. reported PPAR𝛽/𝛿 activities and performed docking studies for both receptors. 73 In a more recent study, Behr et al., concluded that PPAR𝛼 could be activated by a range of PFAS. However, PPAR𝛾 was only activated by perfluoro-2-methyl- 3-oxahexanoic acid and 3H-perfluoro-3-(3-methoxypropoxy) propanoic acid. In one of our recent studies, the interactions of 27 PFAS and L-carnitine with PPAR𝛾, and the roles of the acidic and basic residues in two binding pockets were investigated. A new binding pocket (dimer pocket) was postulated for the PPAR𝛾 homodimer structure. L-carnitine was shown to have the potential to bioaccumulate in the dimer pocket as well as similar binding to most of the studied PFAS. The acid/base and residue decomposition indicated that interactions with PPAR𝛾 were more favorable towards L-carnitine than the PFAS, indicating that L-carnitine can competitively replace PFAS from both of the investigated binding pockets. Because of the large number of PFAS compounds, recently, machine learning (ML) approaches have also been utilized to predict the binding between PFAS and nuclear receptors. 74 One of the most recent approaches considered the binding of 4,464 PFAS to PPAR𝛼 and 𝛾, and the thyroid hormone receptor. 75 The authors concluded that the binding energies of PFAS to thyroid hormone receptors are 2-3 kcal mol-1 stronger than to PPAR𝛾. As well, a machine learning strategy was 22 utilized to identify novel PFAS compounds that may be less toxic than current PFAS such as GenX. 74 Herein, a variety of PFAS were investigated to consider their effect on the activity of the PPAR𝛾/RXR𝛼-DNA complex. In total, nine PFAS with different chain lengths and functional groups were selected, along with L-carnitine. The characteristics of the PFAS and the spe- cific species include those with: (a) one sulfonic group, perfluorooctane sulfonic acid (PFOS); (b) an amino group, perfluorooctane sulfonamido (PFOSA), and (c) acidic groups, PFOA and PFHxDA. For alternative PFAS, 2,3,3,3-tetrafluoro-2-heptafluoropropoxy propanoic acid (GenX) and 4,8-dioxa-3H-perfluorononanoic acid (ADONA) were considered. Furthermore, the alco- hol and carboxylic acid fluorotelomers investigated herein were 6:2 fluorotelomer alcohol (6:2 FTOH) and 6:2 fluorotelomer sulfonic acid (6:2 FTSA). In addition, the PFAS which showed the largest binding affinity in our previous study, 2-(N-Ethyl-perfluorooctane sulfonamido) acetic acid, Et-PFOSAAcOH, was also included in this investigation. 2 Molecular dynamics simulations and binding free energy calculations have been fundamental approaches to study small molecule-protein interactions. 76–78 Molecular docking approaches, along with molecular dynamics simulations were used to investigate the effects of selected compounds on the PPAR𝛾/RXR𝛼-DNA complex. Fur- ther binding analysis using the Poisson - Boltzmann surface area (MM-PBSA) and molecular mechanics with a modified general born solvation model (MM-GBSA) methodologies were used to assess the binding strength of selected PFAS for the PPAR𝛾 and RXR𝛼 ligand binding domains. Structural changes upon PFAS binding were investigated as well. For the RXR𝛼 DNA binding domain located near the PPAR𝛾-LBD, quantum mechanical calculations, using several different density functional approaches and DLPNO-CCSD(T) calculations were performed, providing a more robust assessment of the binding trends of the pocket. 3.2 Computational Protocols 3.2.1 System Preparation The DNA-bound PPAR𝛾/RXR𝛼 (PPAR𝛾/RXR𝛼-DNA) structure was obtained from the RSCB Protein Data Bank (PDB ID: 3DZU). 24 Before the docking procedure, the protein and DNA 23 structures were prepared with the Molecular Operating Environment (MOE) software 79 using the Protonate 3D at the physiological pH. 80,81 All solvent molecules, ions, and co-activator peptides were removed from the structure. For the apo simulation, the co-activator peptide (NCOA2) was included in the simulation, due to the lack of ligands in the binding pockets, and to maintain the stability of the secondary protein structure. With these modifications, the overall structure does not change from its activated state, nor does it significantly change the secondary structure of the heterodimer. To identify the possible binding pockets, MOE’s β€œsite finder” algorithm was employed. 82 Three different binding pockets were considered for docking and molecular dynamic (MD) simulations: PPAR𝛾 -LBD for Pocket 1, RXR𝛼-LBD for Pocket 2, and one of the DBD for Pocket 3 (Figure S1). The protonation states of the different PFAS and L-carnitine were obtained under physiological conditions (pH=7, 300K and 1atm). For this step, the Protonate 3D module was utilized.81 For the generation of poses, the London Ξ” G scoring function was employed obtaining 100 initial placements. 83 The GBVI/WSA Ξ” G scoring function, with the induced fit protein method was utilized to refine the final ten poses. The poses with different functional group orientations and with the highest scoring functions were selected for molecular dynamics (MD) simulations. For Pocket 1, a pharmacophore approach was utilized, which features a hydrogen bond to Tyr473, consistent with PPAR𝛾’s activation. 41 3.2.2 Molecular Docking Protocol and Pose Selection For the PPAR𝛾-LBD binding pocket, or Pocket 1, Tyr473 plays an important role in PFAS binding and towards the activation of the PPAR𝛾 protein. 63 A pharmacophore approach was used to place the functional groups of PFAS molecules near the -OH on the side chain of Tyr473 residue, and for each PFAS compound, two poses that are distinct from one another were selected for further analysis. The selected poses differ from each other by the orientation of their tail ends. In Figure S1-S2, the binding pocket locations, as well as the binding orientations of selected poses are shown. The docking scores of the selected poses are also reported in Table S1. Overall, the binding orientations of the selected poses are classified into two different categories based on the orientation of the tail end of PFAS, either pointing towards Tyr473 or in the opposite direction 24 towards Tyr473. For the RXR𝛼-LBD, Pocket 2, there is no experimental evidence suggesting the importance of any residue interaction with the PFAS compounds, hence the docking was performed without a pharmacophore model. Instead, the Triangle Matcher algorithm was used at the pose generation step and the GBVI/WSA Ξ”G scoring function was used for ranking the poses.80,83 Most of the poses obtained with this approach had PFAS head groups oriented towards Arg316. For each PFAS, two distinct poses were selected for further investigation (Figure S3). Regarding Pocket 3, two distinct poses were selected for molecular dynamics simulations. The binding pocket in the DBD was identified by site finder in MOE, and this pocket location is shown in Figure S1. This binding pocket is interesting as it is at the interface of the PPAR𝛾-LBD and the RXR𝛼-DBD. Due to the difference in charge between Zn2+ and the PFAS head groups, there is a strong electrostatic interaction which makes the binding possible. (It was later found that without a strong electrostatic interaction between an atom from the functional groups of the PFAS, and the zinc ion, the pose was not stable and moved away from the pocket.) 3.2.3 Molecular Dynamics Simulations and Binding Free Energy Calculations To prepare the PFAS and L-carnitine for the MD simulations, restrained electrostatic potential charges (RESP) were calculated with the RED server. 84,85 For each compound, a short MD simu- lation was performed to sample conformations at 350 K with a 4-fs time step. The trajectories were clustered and the top three representative frames were used in the calculation of the partial charges using the RESP method. The simulation box for each complex was generated by using the leap module as featured in Amber Tools. 86 For the simulations, ff14SB, OL15, and gaff2 were used for the protein, DNA, and small molecules (i.e., PFAS), respectively. 87,88 Each system was neutralized in accordance with Joung and Cheatham parameters in 0.1M NaCl. In addition, the TIP4P-Ew water model was considered for all simulations. 89–91 Due to the presence of the zinc-finger motif in the DBD, a Leonard-Jones 12-6-4 potential was used to describe the Zn-Cys4 interactions.92–94 In addition, the cysteines coordinated to Zn2+ were deprotonated. In total, ∼130,000 water molecules were added for each PPAR𝛾/RXR𝛼-DNA and PFAS system. 92 For the minimization, a series of harmonic potentials were selected (100.0, 50.0, 10.0 and 0.0 25 kcal mol-1 Γ…-2), which restrain all atoms with the exception of the water molecules and ions. Then, the PPAR𝛾/RXR𝛼-DNA and PFAS complex was heated from 0 K to 300 K in a stepwise manner. The systems were gradually heated with restraints that were released in gradually. After the heating step, a 500 ps equilibrium simulation was performed with a time step of 1 fs. For the production run, a 75 ns long MD simulation was performed for PPAR𝛾/RXR𝛼-DNA. For each PFAS and L-carnitine bound to Pockets 1 and 2, two poses were considered to sample various conformation of the ligands. In addition, for a given compound, the values for two poses were averaged for residue decomposition, binding free energy, and hydrogen bonding analyses. For pocket 3 (DBD), the one pose obtained from docking was submitted to optimization with DFT, and all of them converged to a minimum on the potential energy surface, with real frequencies. Furthermore, per the SI, after ∼10 ns of every simulation, the root mean square deviations (RMSD) plateaued, which indicate the stabilization of the protein structure. For this study, 75 ns is enough to provide all the information needed to analyze the binding of different PFAS and L-carnitine to PPAR𝛾/RXR𝛼-DNA. A 1 fs time step was considered for all simulations, and 1000 frames per nanosecond were written out to disk. This frame collection allows for a large sampling of trajectories and an in-depth hydrogen bonding analysis. The SHAKE algorithm 93 was utilized for covalent bonds with hydrogen atoms. The particle-mesh Ewald approach was utilized to approximate long-range electrostatic interactions. The molecular dynamic simulations were performed with AMBER 2020 using the pmemd module with CUDA. 86 Binding free binding energy calculations were performed for Pocket 1 and Pocket 2. These calculations were carried out using molecular mechanics with a Poisson - Boltzmann surface area (MM-PBSA) and molecular mechanics with a modified general born solvation model (MM-GBSA) as implemented in the Amber 2020/AmberTools21. 94,95,86 In prior work on a single NR (PPAR𝛾) and 27 different PFAS, the applicability of MM-GBSA and MM-PBSA was demonstrated, and, thus, the approaches have been utilized for the current study. Since an energetic assessment of PFAS binding strengths is investigated, both MM-GBSA and MM-PBSA yield a relatively fast assessment of different points of the simulation, providing useful data sampling. While methods 26 such as free energy perturbation (FEP) or thermodynamic integration (TI) may provide an even more useful assessment, these methods are too demanding for the present study, due to the numbers and sizes of the systems. To achieve a better sampling of the results, 7500 frames (equally spaced) from 75 ns long trajectories of the MM-GBSA and MM-GBSA simulations for each PFAS were selected for the binding free energy calculations. The simulations with the highest binding affinities for each PFAS were averaged. The residue decomposition analysis for each binding pocket was performed for the amino acids within ∼10 Γ…of the PFAS.86 Moreover, root-mean-square-distance (RMSD), root- mean-square fluctuations (RMSF), residue decomposition analysis, and hydrogen bond analysis were performed with the CPPTRAJ module as implemented in AmberTools21 using the default settings. 96 All data were plotted using the matplotlib module and the figures were obtained using UCSF Chimera and MOE. 97,79 3.2.4 QM-cluster Approach for Pocket 3 Due to the presence of metallic atoms in the DNA binding domain (DBD), an alternative to MM-PBSA and MM-GBSA methodologies is needed. To calculate the binding energies of PFAS coordinating to zinc in this pocket, a quantum mechanics-clustering (QM-clustering) approach was used for stable poses. Such an approach has been useful in the study of metal-protein coordination as well as enzymatic reactions, as shown in prior studies. 81,98–100 Because of the size of the system, density functional theory (DFT) approaches can provide useful insight while maintaining an affordable computational cost. However, a far more expansive ab initio method, the domain- based local pair natural orbital (DLPNO) form of coupled cluster single, double, and perturbative triple excitation (CCSD(T)) method (DLPNO-CCSD(T)) was also employed to predict binding energies. 83 For the QM calculations, the investigated binding site consists of Zn-Cys4 along with the residues near the bound PFAS compounds (Figure 1). For the clustering of MD simulations, a hierarchical agglomerative algorithm with an epsilon value of 3.0 was chosen, and for each PFAS, ten clusters were calculated r. Each tenth frame of 75000 frames was considered for clustering 27 Figure 3.1 Example of the DBD binding site with the Zn-Cym4 motif and nearby residues with capped backbones (shown with an asterisk) and PFOA, which was utilized for the binding energy calculations. This structure was optimized with B3LYP-D3BJ/def2-SV(P) in a PCM water environment). The Cym residues are from PPAR𝛾-DBD, the residues Arg211 to Lys213 are from the RXR𝛼 protein. Atoms with asterisks (βˆ—) are fixed in their original positions. the trajectories, and the cpptraj module was used as implemented in AmberTools21. 101,96 For the selected PFAS (PFOS, PFOA, Et-PFOSAAcOH, 6:2 FTSA, GenX, and ADONA), the first rendered cluster was prevalent for almost 100% of the simulations, therefore this first cluster was selected for each of the DFT calculations. For the geometry optimization step, the dispersion corrected density functional, Becke, 3-parameter, Lee–Yang–Parr, B3LYP-D3(GD3BJ) in conjunction with def2-SV(P) basis sets were utilized. 102–106 In prior studies, the B3LYP-D3(GD3BJ) approach with the def2-SV(P) basis set has resulted in valid equilibrium structures for structures of this size.110 This basis set was also used previously for protein-ligand interactions. 106,107The complex, the protein, and PFAS were each optimized separately. The corresponding structures are provided in Table S2. To simulate water solvation within a biological environment, the implicit-solvent polarizable continuum model (PCM) including non-electrostatic contributions (solute-solvent dis- persion, solute-solvent repulsion, and solute cavitation) was considered. 108–111 To calculate binding 28 energies (Be), single point calculations were performed based on Equation (1), where Ecomplex, Eprotein, and EPFAS correspond to the energies of the complex, protein, and PFAS, respectively. The complex, the protein, and PFAS were each optimized separately. The corresponding struc- tures are provided in Table S2. To simulate water solvation within a biological environment, the implicit-solvent polarizable continuum model (PCM) including non-electrostatic contributions (solute-solvent dispersion, solute-solvent repulsion, and solute cavitation) was considered. 108–111 To calculate binding energies (Be), single point calculations were performed based on Equation (1), where Ecomplex, Eprotein, and EPFAS correspond to the energies of the complex, protein, and PFAS, respectively: 𝐡𝑒 = πΈπ‘π‘œπ‘š 𝑝𝑙𝑒π‘₯ βˆ’ 𝐸 π‘π‘Ÿπ‘œπ‘‘π‘’π‘–π‘› βˆ’ 𝐸𝑃𝐹 𝐴𝑆 (3.1) For the binding energy calculations (Equation (1)), B3LYP-D3BJ/def2-SV(P) and B3LYP- D3BJ/def2-TZVPP calculations were performed, incorporating PCM. To provide a comparison to B3LYP-D3BJ, the Minnesota 15 (MN15) functional was also considered. MN15 is known to be useful for noncovalent interactions and includes some level of parametrization for transition metals. 112 This functional was also partnered with the def2-TZVPP basis sets and the PCM implicit solvation model. To probe the effect of the more electronegative atoms on the binding energies, the def2-TZVPPD basis sets, which include additional diffuse functions, were also employed for the B3LYP-D3BJ cal- culations. In addition, the SMD implicit model was utilized for comparison with PCM. 113 To probe the effect of the more electronegative atoms on the binding energies, the def2-TZVPPD basis sets, which include additional diffuse functions, was also employed for the B3LYP-D3BJ calculations. In addition, the SMD implicit model was utilized for comparison with PCM.119 To better account for electron correlation, DLPNO-CCSD(T) was considered, though at a triple-𝜁 basis set level (with def2-TZVP(-f)), due to its computational cost.79 For the DLPNO-CCSD(T) calculations, two implicit solvation environments were considered: the conductor-like polarizable continuum model (C-PCM) and SMD. 114–116 For these calculations, ORCA 5.0.3 was utilized. 117,115 Finally, 29 as energy convergence with respect to basis set is often not reached until the quadruple-𝜁 level for DFT methods for transition metals,123 B3LYP-D3BJ with PCM and SMD were utilized with def2-QZVPP for hydrogen, carbon, and zinc, and def2-QZVPPD for N, O, F and S to calculate the binding energies. To simplify the notation of the DFT calculations, D3BJ will be omitted when referring to a DFT functional. To simplify the calculations, the protein backbone in the complexes was replaced by -CH3 and -CH2 groups, reducing the size of the model systems (as shown in Figure 1). The selected residues coordinating to Zn cation were truncated from the C𝛼 of the adjacent residues to preserve the peptide bonds. 99,117–119 For Cym164 and Cym167, only the side chains of the residues between them were removed and the peptide backbone was kept intact. The substituted functional groups were frozen during the constrained geometry optimizations. Namely, the positions of two types of atoms were fixed: (a) the external -C(𝛼)H3 group and (b) the -C(𝛼)H2 groups between two connected cysteines. All DFT calculations were performed using the Gaussian 16 software package, revision C01. 3.3 Results and Discussion 3.3.1 Structural convergence and fluctuations The structural movement of proteins allows for correlation analysis between different structures, allowing for the analysis between the protein with no ligands bound (apo) and the protein bound with (a) its co-crystallized ligands, (b) PFAS, and (c) L-carnitine. Assessing the differences between the structures allows for insight into the movement of the secondary structure of proteins which is important, because the movement influences the activation, or inactivation of the protein. To check that structural convergence of the PPAR𝛾/RXR𝛼-DNA complex was achieved throughout the simulation, the RMSD was monitored for convergence. As 6:2 FTOH did not have a stable docking pose in Pocket 1, it was not simulated, or included in the analysis for this pocket. For Pocket 2, all of the PFAS poses remained in the pocket. For Pocket 3, L-carnitine was not stable in the pocket, and, hence, it was not included in the analysis. A structural comparison among PFAS-bound PPAR𝛾/RXR𝛼-DNA, co-crystallized ligands (2-[(2,4-dicholorobenzoyl) amino]-5-(pyrimidin-2- yloxy) benzoic acid for PPAR𝛾, and (9cis)-retinoic acid for RXR𝛼) bound complexes, and the apo 30 Figure 3.2 RMSF plot for all protein residues and DNA, for nine different PFAS and L-carnitine (LCN) for Pocket 1. The values are calculated for 75 ns MD simulations in Pocket 1. structure was done. 3.3.1.1 Pocket 1 residue fluctuations and stability The time-series RMSD plots for Pocket 1 (PFAS and L-carnitine) are shown in Figures S4-S13. Overall, all protein RMSDs converged within the 75 ns simulation time. When the LBDs, DBDs, and hinge domains are compared for both PPAR𝛾 and RXR𝛼 proteins separately, the hinge domains resulted in the largest overall RMSD. A large RMSD for the hinge domains is expected due to the lack of a definitive secondary structure. Furthermore, in all of the simulations, the DBDs of both proteins resulted in the lowest RMSDs in comparison to LDBs. On the other hand, LBDs resulted in a variety of conformational changes throughout the simulation time. The low RMSDs observed for the DBDs may indicate that strong interactions with DNA stabilize the domain movements. For the majority of the simulations, the PFAS remained stable in the pocket (i.e. small RMSD); however, there were poses in which the PFAS was observed to change conformations within Pocket 1. In Figure S14, S15, S16, S17, S18 the small conformation changes of PFOA, PFHxDA, 6:2 FTSA, and Et-PFOSAAcOH are shown, respectively. Other PFAS including PFOS, PFOSA, GenX, and ADONA did not have any significant conformation changes and their RMSDs were stable throughout the simulation time. 31 With the presence of PFAS in Pocket 1, the RMSD time-series plots showed that the DNA oligomer reaches a stable RMS distance early in the simulations, with the exception of the PFOS- bound complex. The apo simulation (Figure S19) also has a stable RMSD for the first 75 ns of the simulation with an average RMSD of ∼3 Γ…. In the presence of co-crystallized ligands (Figure S20), the RMSD of DNA is ∼2 Γ…until 50 ns, and there is an increase observed after that. In all of the simulations, PFAS in Pocket 1 led to a very stable PPAR𝛾-DBD; however, the hinge domain and PPAR𝛾-LBD showed differences depending on which PFAS is bound to the binding pocket. Both the apo complex and simulation with co-crystallized ligands result in a very stable PPAR𝛾-DBD which could indicate that the stability of PPAR𝛾-DBD may not be directly influenced by the ligand binding to PPAR𝛾-LBD, within the time frame considered. The presence of co-crystallized ligands led to more stable and lower RMSD hinge and LBD domains overall, with the highest RMSD being ∼2 Γ…. On the contrary, the apo system displayed a high RMSD for the hinge (∼ 3Γ…), while the PPAR𝛾-LDB domain was ∼2 Γ…. The L-carnitine compound also had various conformational changes throughout the trajectory; however, these conformational changes were relatively small and did not result in large motions of LCN. In all of the PFAS simulations, the PPAR𝛾 hinge domain resulted in the largest RMSD within the PPAR𝛾 protein, with an average value of ∼3 Γ…. The PPAR𝛾-LDB domain, similarly, shows very little deviation in RMSDs and is quite stable in all of the Pocket 1 simulations. The RXR𝛼 protein had a very stable DBD in all simulations with a PFAS bound to Pocket 1, whereas the hinge and RXR𝛼 LBD had different convergence times. Both the apo complex and system with co-crystallized ligands have a stable RXR𝛼-DBD domain with RMSD less than 1 Γ…. The presence of co-crystallized ligands, however, reduced the RMSD of the RXR𝛼-LDB domain to ∼1.5 Γ…, on average. The RMSF plots of Pocket 1 shown in Figure 2 illustrate the impact of the binding of PFAS on the overall protein and DNA motions. The apo simulation as well as the PFAS indicate a general trend in which the hinge domains always have a high fluctuation while the DBD and LBD domains have less. This observation is in parallel with the RMSD analysis, where the RMSD of the hinge domain was the largest among all investigated domains. An RMSF value between 4 to 8 32 Γ…was observed for the RXR𝛼 hinge domain, which is higher than what is observed for the PPAR𝛾 hinge loop. Another region with high RMSF was the Ξ© loop on PPAR𝛾 (residues Lys261-Glu276). The Ξ© loop of PPAR𝛾 showed the largest fluctuations in apo simulations, and the presence of co-crystallized ligands was observed to lower the RMSF of the Ξ© loop. Of the PFAS, ADONA resulted in the highest RMSF for the Ξ© loop, and the PFOSA had the lowest. The Ξ© loop is thought to be important for the allosteric activation mechanism of PPAR𝛾 and affecting the conformation of H12 helix. 131 While these observations were also present in the RMSF plot for L-carnitine, in general, the RMSF values are lower than other simulated systems and are comparable to apo and co-crystallized ligand-bound systems. This would indicate that binding of a small compound like L-carnitine did not structurally affect the complex. 3.3.1.2 Pocket 3 residue fluctuations and stability RMSD and RMSF plots for the investigated DBD pocket are shown in Figures S21-S29 and Figure 3, respectively. For all PFAS that coordinated to the zinc finger and stayed complexed with four cysteines and a zinc ion have a very stable RMSD (PFOSA and 6:2 FTOH did not show coordination to zinc). These poses are quite stable and did not shift away from the pocket. Throughout the 75 ns simulations, the RMSD of PPAR𝛾 was smaller than that of RXR𝛼, when PFAS are bound to the DBD. This outcome is also consistent with observations made for Pocket 1 and Pocket 2. In addition, all complexes converged during the simulation. The hinge regions of both PPAR𝛾 and RXR𝛼 resulted in the largest RMSD values when PFAS are bound to the DBD (Pocket 3). For RMSFs, the hinge regions of both PPAR𝛾-LBD and RXR𝛼-LBD also had the highest RMSF values and were affected by a range of PFAS in this pocket. All PFAS, except PFOSA and 6:2 FTOH, do not coordinate to the zinc ion, and their constant movement within the pocket altered the RMSF at the hinge regions for both proteins. This is an important outcome, because the hinges connect the DNA binding domains to RXR𝛼 and PPAR𝛾 LBDs. Having higher fluctuations in the hinge domains affects the communication of the nuclear receptors and DNA.49 Herein, the area which showed the highest fluctuations when PFAS are bound to the DBD is the RXR𝛼 region, namely residues ranging from Glu233-Asp273. For example, the most extreme case 33 Figure 3.3 RMSF plot for all protein residues and DNA bases. The values are determined for 75ns MD simulations in the DBD pocket near the zinc finger domain. of fluctuations for this region occurred for PFOA, with RMSF close to 12 Γ…, versus an average value of 7 Γ…for other PFAS. The apo and the protein structures with native ligands had RMSFs of ∼5 Γ…, i.e., showing less fluctuations/movement. 3.3.2 Binding free energy calculations In order to understand the binding strengths of PFAS in each pocket, MM-GBSA/PBSA method- ology was employed. Previously, this approach has shown good agreement with the experimental IC50 values for PFAS bound to Pocket 1. 3.3.2.1 PPAR𝛾 ligand binding pocket – Pocket 1 The MM-PBSA/GBSA methodologies were used to calculate the binding energies of the in- vestigated compounds in the binding pockets. In Figure 4, the average binding energies of the compounds in Pocket 1 are shown. The MM-PBSA results resulted in a ranking of the compounds as follows (from highest binding energy to lowest): Et-PFOSAAcOH and PFHxDA (∼-44 kcal mol-1 ); PFOS (∼-31 kcal mol-1 ); PFOSA (∼-25 kcal mol-1 ); 6:2 FTSA, GenX and ADONA (∼-22 kcal mol-1 ); PFOA (∼-20 kcal mol-1 ); L-carnitine (∼-15 kcal mol-1 ). Among those, PFOSA, Et-PFOSAAcOH, and PFOS have eight perfluorinated carbons, PFOA has six perfluronated car- 34 Figure 3.4 Average MM-PBSA/GBSA binding energies for Pocket 1. The binding energies were averaged over a 75 ns long MD simulation. bons, while 6:2 FTSA and ADONA have six perfluorinated carbons. The comparison between the MM-PBSA binding affinities and the chain length of the perfluorinated carbons indicates that the longer chain PFAS binds more strongly than the shorter chain PFAS. PFHxDA (16 carbons) and Et-PFOSAAcOH (12 carbons) are the strongest binders, while PFOA and L-carnitine are the weakest. The alternative PFAS (GenX and ADONA) have binding strengths that are comparable to PFOSA and 6:2 FTSA. 3.3.2.2 DNA binding pocket – Pocket 3 The DBD binding energies calculated with DFT methodologies are included in Table 1. Of the ten MD simulations performed for Pocket 3, only seven PFAS stayed within the pocket. 6:2 FTOH and PFOSA moved out of the pocket, and L-carnitine travelled into the solvent, so it was not considered further towards analysis. For the seven PFAS which remained stable within the binding domain, the binding energies were calculated as per Equation 1. The cartesian coordinates of the final optimized structures are included in the SI (Table S2). When comparing the final optimized structures to the highest populated cluster from MD simulations, only PFOA and Et-PFOSAAcOH maintained coordination to the zinc ion. For these two structures, a five-ligand coordinated structure was formed with four cysteines and the zinc ion. For the PFOA structure, zinc’s covalent bond length to the deprotonated cysteines (Cym) increased from ∼2.2 Γ…to ∼2.3-2.4 Γ…, and PFOA moved 35 from a distance of ∼2.2 Γ…to 2.86 Γ…. PFOA was still coordinated to the zinc dication, even though it was repelled from the first coordination sphere. PFOA was also stabilized by interactions with Lys213 and Asp214 from the RXR𝛼 protein. Even though PFOA coordinated with the zinc and the four deprotonated cysteines, its binding energy was still positive for all DFT functionals and basis set combinations, with the exception of def2-SV(P). Et-PFOSAAcOH formed a hydrogen bond with Lys213, which maintained the PFAS coordination to the zinc dication. The distance between sulfonate oxygen to zinc is 2.53 Γ…and, the zinc-Cym4 bond length averages were ∼2.4 Γ…. The binding energy for this complex was positive for all methodologies when considering triple-𝜁 and quadruple-𝜁 basis sets. For PFOA and Et-PFOSAAcOH, it is shown later (section 3.4.3) in the residue decomposition that most of the residues around these two PFAS contributed positively (repel). However, both of these PFAS formed stabilizing electrostatic interactions with the Zn2+ ion, through the negatively charged head group oxygens. Even though none of the other five PFAS maintained coordination to the zinc dication, electrostatic interactions with the Lys213, Asp214, Gly212, Arg211, and Gln210 residues allowed for these PFAS to stay in the pocket. For example, PFOS formed a strong hydrogen bond with Gln216 and Arg147. With a similar size to PFOS, 6:2 FTSA had the same orientation as PFOS within the pocket. In addition, it also bonded to Gln216 through hydrogen bonding, but not with Arg147. However, 6:2 FTSA forms a hydrogen bond with a cysteine (Cym162) coordinated to Zn2+. The 6:2 FTSA structure has two carbons with four hydrogens, which allows for a hydrogen bond donation to this negatively charge cysteine. For these two PFAS, all of the DFT methods in Table 1 predicted a negative binding energy, for all but the prediction for PFOS utilizing SMD and the triple-𝜁 basis set. In addition, DLPNO-CCSD(T) predicted a positive binding energy for PFOS in both a C-PCM and a SMD environment. From a triple- to quadruple-𝜁 basis set, B3LYP-D3BJ/PCM dropped 0.9 kcal mol-1 in binding energy, though the energy was still negative. However, B3LYP/SMD predicted a positive binding energy. For 6:2 FTSA with DLPNO-CCSD(T) in a SMD and C-PCM environment, the binding energy was -1.3 kcal mol-1 and 0.4 kcal mol-1, respectively. For the quadruple-𝜁 calculation in both solvation environments, slight binding was still predicted. 36 Table 3.1 Binding energies were calculated for the DNA binding pocket (DBD) using a range of DFT functionals and basis sets with PCM and SMD implicit solvation models. At the triple-𝜁 level, DLPNO-CCSD(T)/def2-TZVP(-f) was utilized with C-PCM and SMD. The geometry of each PFAS was optimized at the B3LYP-D3BJ/def2-SV(P) level, utilizing the PCM implicit solvation model. Units are in kcal mol-1. The largest PFAS studied, PFHxDA, formed a hydrogen bond interaction with Gly212 and Gln210, keeping this PFAS compound in the pocket. In addition, Cym162 interacted directly with the oxygen from the carboxylic acid functional group of PFHxDA. From the double- to triple-𝜁 basis set, the PCM binding energy predictions resulted a drop of 2 kcal mol-1. The SMD solvation model and MN15-PCM at the triple-𝜁 level basis set level predicted negative binding energies, demonstrating affinity towards this pocket. DLPNO-CCSD(T) predicted a negative binding energy for this PFAS with each of the implicit solvation methods investigated. Negative binding energies were also predicted with B3LYP and quadruple-𝜁 basis sets. For the two alternative PFAS investigated, GenX and ADONA, two different poses were identified within the pocket. GenX was not as close to Gly212 and Gln210 as ADONA, so it did not interact as strongly with these residues. It was also not close enough to the zinc dication in order to coordinate to it or interact with any of the deprotonated cysteines. Furthermore, due to the loss of these two important interactions, the binding energies at triple- and quadruple-𝜁 levels are positive. On the other hand, ADONA was far more stable and interacted favorably with Gly212, Gln210, and Cym162, as demonstrated by its negative binding energy. B3LYP-PCM, using a combination of quadruple-𝜁 basis sets rendered the largest binding energy for the complexes at -10.9 kcal mol-1. For DLPNO-CCSD(T), the binding 37 energy was -3.4 and 3.0 kcal mol-1 for C-PCM and SMD predictions. DLPNO is a powerful method for binding energy predictions but can only be paired with a smaller basis set, due to its computational cost. Even though explicit solvation is not possible to utilize due to the system size, implicit solvation is crucial to obtain valid and meaningful binding energies. In addition, DFT allows a great balance between computational cost and accuracy, estimating binding energies up to quadruple-𝜁 level. Regarding the different DFT methods, it should be noted that the basis set used for the geometry optimization step is not appropriate for the energetics (def2-SV(P)). B3LYP at a quadruple-𝜁 level is our most robust functional/basis set combination. When directly compared to DLPNO-CCSD(T) at a triple-𝜁 level, with different solvation methods, both methods demonstrate that PFHxDA and ADONA bind to this pocket. However, only B3LYP-D3BJ/PCM shows affinity for PFOS, but not B3LYP-D3BJ/PCM or DLPNO-CCSD(T)/SMD or C-CPCM. Even though the binding energies for the DBD are less negative than for the other two binding pockets considered, PFAS can still bind in the DBD. One of the reasons for the lower binding energies relies on the fact that the zinc finger domain is coordinated by four deprotonated cysteines, however the docking algorithm places all the PFAS in the first coordination shell of the cysteines. Since the Zn2+ has a full 3d shell, it does not want to accept another ligand. After optimizing the geometries with DFT for the different fragments separately, most of the PFAS move further into the pocket, or stay in the second coordination shell of the Zn2+ atom. 3.3.3 Residue interactions and hydrogen bonding Residue interaction analyses provide insight about how the binding pocket residues interact with PFAS and about the strengths of these interactions. Together with the hydrogen bonding patterns, these analyses provide insight into the role of different residues in the stabilization of PFAS in the investigated binding pockets. 3.3.3.1 PPAR𝛾 ligand binding domain – Pocket 1 The interaction patterns of PFAS with the surrounding residues in Pocket 1 provides critical insight about the binding patterns of these compounds. The interaction energies of the different PFAS versus L-carnitine in Pocket 1 with each residue were averaged and are plotted in Figure 38 Figure 3.5 Average residue decomposition energies for Pocket 1. Averaged energies of PFAS (PFOS, PFOA, PFHxDA, ADONA, GenX and Et-PFOSAAcOH, and 6:2 FTSA) vs L-carnitine (LCN). Only the residues that have contributions above +5 kcal mol-1 and below -5 kcal mol-1 are shown. 5. The hydrogen bond percentages for each PFAS and L-carnitine in Pocket 1 are reported in Figure 8 (A). Even though the side chains of the basic residues do not directly interact with the PFAS compounds, Arg288, as an example, has stabilizing energetic contributions to the binding. Arginines and lysines have negative, i.e, stabilizing, effects on PFAS due to their positively charged side chains. On the other hand, the amino acids with acidic side chains have non-stabilizing effects on PFAS binding, due to negative-negative charge repulsions. This observation can be attributed to the total charge of the functional groups of the PFAS. These PFAS compounds, with the exception of 6:2 FTOH and PFOSA, have a net -1 charge, which enables salt bridges to be formed with nearby basic residues such as Lys367. These salt bridges are very strong and persistent, with large hydrogen 39 bond percentages (Figure 8 (A)). For instance, the Lys367 residue formed a strong hydrogen bond interaction with each PFAS that has a net charge but did not interact with PFOSA which is a neutral compound (Figure 5). Interestingly, the only hydrogen bond that PFOSA made was with His449, which has an interaction strength of ∼ -10 kcal mol-1. The largest negative electrostatic interaction came from Lys367, resulting in a -75 kcal mol-1, on average. In addition, Lys367 forms the strongest hydrogen bonds to investigated compounds, with the exception of PFOSA and L-carnitine. Tyr327 also formed a hydrogen bond with most of the PFAS; although this residue did not have a very large negative electrostatic interaction on average (∼ -10 kcal mol-1), the hydrogen bonding with the PFAS species was quite strong. His449 also formed interactions with PFAS at ∼1 kcal mol-1, on average, while forming persistent hydrogen bonding with PFOSA and L-carnitine. An interesting observation for Pocket 1 was that L-carnitine only had one repulsive interaction with Lys457, while the rest of the PFAS have strong electrostatic interactions with this residue. Due to the zwitterionic nature of L-carnitine, the positively charged moiety orients towards Lys457 and results in repulsive interaction with Lys457. 3.3.3.2 DNA binding domain (DBD) – Pocket 3 For Pocket 3, the largest contributing residues towards binding were calculated and analyzed. With the exception of Et-PFOSAAcOHEt, no other PFAS formed hydrogen bonds with the sur- rounding residues. However, there were still strong electrostatic interactions with some of the residues, as detailed in Figure 7. The per-residue decomposition of the PFAS that coordinate to the zinc finger domain was plotted against PFAS that did not coordinate to zinc (Figure 7). The PFAS that kept their coordination to zinc were PFOS, PFOA, PFHxDA, GenX, ADONA, Et- PFOSAAcOHEt, and 6:2 FTSA. The other two PFAS (PFOSA and 6:2 FTOH), did not coordinate to zinc, but stayed in the pocket and in the vicinity of Zn2+. Even though PFOSA and 6:2 FTOH did not coordinate to the Zn2+ and moved substantially within the binding pocket, they remained bound to the protein, albeit near the DNA instead of the zinc ion. The PFAS that did coordinate to Zn2+ were very stable within the binding pocket and did not show large conformational changes due to the strong interaction with the zinc ion. The average PFAS interaction with the zinc ion 40 Figure 3.6 Hydrogen bond lifetimes of the PFAS and L-carnitine (LCN) in Pocket 1. The x-axis shows the simulated systems and y-axis shows the residue/atom information. is -211 kcal mol-1. Other residues that formed stabilizing interactions in this pocket are Arg147, Arg209, and Arg211, along with Lys161 and Lys213 from PPAR𝛾. As per Figure S3, the other four residues that coordinate to zinc are deprotonated cysteines. These cysteines repel PFAS in the binding pocket, with average interaction energies of ∼75-80 kcal mol-1 for Cym148, Cym152, and Cym162. However, for Cym165, the repulsion energy dropped to 54.5 kcal mol-1 versus the energy of other cysteines. The non-stabilizing contributions from these cysteine residues resulted in the largest contributions among the binding pocket residues. In general, aspartate residues, due to the negative charge on their side chain, also repelled the PFAS. It is interesting to note the role of DNA bases in binding. The energetic contribution from the DNA bases was always positive and ranged from 18.1 kcal mol-1 for DG471 to 24.2 kcal mol-1 for DT485. The PFAS that did not coordinate to the zinc dication remained in the proximity of the DNA bases during the simulations. The 41 Figure 3.7 Average residue decomposition energies for the DBD pocket. Averaged energies of PFAS (PFOS, PFOA, PFHxDA, ADONA, GenX and Et-PFOSAAc-OHEt and 6:2 FTSA) coordinated to zinc versus energies of PFAS not coordinated to zinc (6:2 FTOH and PFOSA). interactions with DG472, DC482, DT483 and DT484 base pairs were quite weak (all below -5 kcal mol-1). In addition, Asp146 resulted in negative interactions with PFOSA and 6:2 FTOH relative to its interactions with the other seven PFAS that coordinate to Zn2+. The other two residues that contribute towards binding were Gln163 and Arg209. 3.3.4 Interactions of DNA binding domains with the DNA molecule The hydrogen bonding lifetimes of PPAR𝛾-DNA and RXR𝛼-DNA for the apo structures, in the presence of the co-crystallized ligands, bound PFAS, and L-carnitine in Pockets 1, 2, and 3 are depicted in Figures 8 and 9. The hydrogen bond network between the DBDs and DNA is crucial for the communication between the nuclear receptors and DNA. Comparing the interaction patterns in the apo structures and co-crystallized ligands against PFAS and L-carnitine bound complexes for the different pockets provides insight about the changes in the hydrogen bonding network with DNA. In addition, this analysis provides clarity about how PFAS binding affects the communication between PPAR𝛾/RXR𝛼 receptors and the DNA. Most of the hydrogen bonds between the PPAR𝛾-DBD and the DNA remained the same between the apo and co-crystallized ligand complexes (Figure 8). For these residue pairs, the most persistent interaction was DG486/Arg166 with a 100% lifetime, 42 followed by DT464/Tyr123 (∼50%), DT464/Arg132 (∼50%), and DG468/Arg159 (∼50%). When the ligand binding domains did not include a ligand (apo), DG465 formed a hydrogen bond with Arg140 (∼70% of the simulation time); however, in the presence of co-crystallized ligands (9CR and PLB), this interaction was not observed. Furthermore, in apo simulations, the DNA base DC488 forms a hydrogen bond with Glu129, which no longer occurs in the presence of co- crystallized ligands. On the contrary, the DG486/Arg137 interaction was only observed for the protein with the co-crystallized ligands, but not for the apo system. For the majority of the PFAS simulations in Pocket 1, the most striking differences were observed for the hydrogen bonding of DT485/Arg137 and DG486/Arg166 pairs, between the apo and co-crystalized ligand complexes simulation. While DT485/R137 had a low persistence in apo and co-crystallized ligand systems, in the majority of the PFAS simulations in Pocket 1, apart from PFOA, DT585 formed a strong hydrogen bond with Arg137. Another exception was observed for GenX where the DG486/Arg166 interaction was no longer present throughout the simulation. In addition, the interaction between the DG486/R159 occurred for a longer timeframe in the simulation. The hydrogen bonds in Pocket 2 showed similarities to what was observed in Pocket 1. DT485/R137 had a strong presence (∼90% of simulation time) for all PFAS in Pocket 2 apart from PFOA, and the DG486/Arg159 and DG486/Arg166 interactions persisted for all PFAS in Pocket 2 with a higher hydrogen bonding percentage in L-carnitine. On the other hand, the DG464/Tyr123 and DG464/Lys132 interactions displayed an interaction strength similar to that of apo and co-crystallized ligand systems for PFOA, PFHxDA, PFOS, and 6:2 FTSA. For the rest of the compounds, the hydrogen bond lifetimes were very short. Another interesting interaction observed in Pocket 2 was the interaction between DG464 and Arg140 only for PFOA, PFHxDA, and PFOS, for almost ∼80% of the simulation time. This interaction strength was not observed for the same PFAS in Pocket 1. And lastly, the presence of ADONA prompted the interaction between DT484 and Glu163 residue with a lifetime of ∼90% of the simulation. The Pocket 3 hydrogen bond patterns between PPAR𝛾-DBD and DNA bases present similar interactions for DG486/Arg159 and DG486/Arg166 pairs. In contrast to Pockets 1 and 2, compounds in Pocket 3 showed higher hydrogen bond percentages for 6:2 FTSA, PFOSA, 43 Figure 3.8 Hydrogen bond lifetimes of the PPAR𝛾-DNA binding domain considering the protein’s: apo structure, with its co-crystallized ligands (PLB, 9CR), PFAS, and L-carnitine (LCN) bound to Pockets 1, 2, and 3. The y-axis shows the hydrogen bond pairs between DNA and PPAR𝛾 residues involved in hydrogen bonding. The interactions that persist more than 10% of the simulation time are reported. DA, DC, DG, and DT represent the DNA bases. The notation on the y-axis represents the DNA base/Protein residue. Et-PFOSAAcOH, GenX, and ADONA simulations. Another significant difference for Pocket 3 is that, overall, the hydrogen bonding persistence of DT485/Arg137 is lower than what was observed in Pocket 1 and Pocket 2. When comparing the apo and the co-crystallized ligand complexes with PFAS bound structures, there are notable differences. The most persistent hydrogen bond of all simulations, DG486/Arg166, was hindered by the presence of GenX, which went from 100% persistence to 0% in Pocket 1. For Pockets 1 and 2, there was an increase in the hydrogen bond lifetime for DT485/Arg137 (doubled) for all PFAS in these pockets, except for PFOA. For Pocket 3, there was also an increase in the hydrogen bond lifetime of DT485/Arg137 pair. As mentioned previously, for the apo structure, the 44 base pair DC488 forms a hydrogen bond with Glu129. In addition, for Pocket 1 and 3, the DC488 and Glu129 bond also forms hydrogen bonding upon PFAS and L-carnitine binding, (except for PFOA in Pocket 1). However, for Pocket 2, the DC488 and Glu129 bond hydrogen bond either did not occur, or it occurred for less than 10% of the simulation time for all PFAS and L-carnitine. Furthermore, when the co-crystalized ligands are present, the hydrogen bond does not form for DC488/Glu129 pair. Another interesting feature when comparing the three pockets occurs for the DG464 and Arg140 interaction. For the simulation with co-crystallized ligands and the apo complex, there was a very small percentage of hydrogen bonding between the DNA base and Arg140. On the other hand, in Pockets 1, 2, and 3, there were numerous simulations that indicated an increased percentage of this hydrogen bonding, while L-carnitine showed a negligible hydrogen bonding percentage. Furthermore, an average hydrogen bond between DG465/Arg140 lasted for ∼60% of the apo simulation, but for all of the PFAS simulations, it was a lot weaker or nonexistent in all pockets. The effects of the apo structure, co-crystallized ligands, PFAS and L-carnitine bound to the RXR𝛼-LBD on the hydrogen bonding network with the DNA (DBD) are depicted in Figure 9. The interactions between the RXR𝛼-DBD and the DNA molecule show that the DT478/Arg161 (∼100% of time), DG479/Arg191 (∼90%) and DG479/Arg184 (∼80%), DG471/Tyr147(∼60%) interactions persisted very strongly in both the apo and co-crystallized ligand complexes. On the other hand, DG472/Arg164 formed a hydrogen bond interaction only for the natural ligand system for ∼80% of the simulation time. Similarly, DC481/Arg141 only occurred in the apo simulation for ∼40% of the simulation time. DG471/Arg164 interaction resulted in a ∼60% hydrogen bonding lifetime for apo, and 40% for the co-crystallized ligand simulations; however, there was a large increase for PFAS bound in Pockets 1, 2, and 3. For instance, PFOSA bound systems (for all pockets investigated) showed a higher hydrogen bonding percentage for the DG471/Arg164 interaction. Generally, PFAS bound to Pocket 3 had a higher persistence than the two other binding pockets for this residue pair. Another example of a large change in the hydrogen bonding network was observed for DC488/Glu129. When the co-crystallized ligands are bound to RXR𝛼 and PPAR𝛾, there was no hydrogen bonding between this base pair and Glu129. For Pockets 1 and 3, this hydrogen bond 45 Figure 3.9 Hydrogen bond lifetimes of the RXR𝛼-DNA binding domain considering the apo structure, with its co-crystallized ligands (PLB, 9CR), PFAS, and L-carnitine (LCN) bound to Pockets 1, 2, and 3. The y-axis shows the hydrogen bond pairs between RXR𝛼 residues, and in parenthesis, the residue involved in hydrogen bonding pertaining PFAS/L-carnitine. The interactions that persist more than 10% of the simulation time are reported. was formed for all PFAS (except PFOA in Pocket 1) and the apo structure. However, for Pocket 2, this interaction was not formed for any PFAS bound to the RXR𝛼 binding domain. 3.3.5 Effect of PFAS binding on the DNA motion In the previous section, how the binding of PFAS to Pocket 1, 2 and 3 has a direct impact on the interaction between proteins and the DNA molecule was discussed. To further understand the effects of the PFAS on DNA motions, the bending of the DNA was investigated for all simula- tions.49,132 49,120 Skaf et al. investigated the effect on an isolated DNA stretch and discovered that the apo structure is prone to bend up to 50Β°with a most dominant bending angle of ∼15-20Β°. 49 In this work, a similar analysis was performed, i.e, the whole structure of the heterodimer, DNA, and the co-activator (NCOA2) were considered in its apo form. For this 200 ns simulation, the 46 DNA bending was calculated to be ∼42Β°(Figures S30, S31). In addition, a comparison with the co-crystallized ligand simulation was also performed, without the presence of the co-activator as the structure was already in its activated form. For the latter, the average DNA bending was ∼ 9Β°. The last analysis performed was the comparison of the DNA bending among the three pockets upon PFAS and L-carnitine binding. Overall, the bending increased between 1 to 2Β°with PFAS binding for all considered binding pockets with respect to the co-crystallized ligand complex. Compounds in Pocket 1 showed bending in a range of 9.0 to 9.5Β°in the presence of PFAS. ADONA had the smallest bending angle, and L-carnitine had the largest. For Pocket 2, the DNA bending upon PFAS binding was not as pronounced as Pocket 1. Et-PFOSAAcOH, ADONA, GenX and PFHxDA led to bending of the DNA molecule around 9 degrees, while PFOSA, PFOS, 6:2 FTSA, L-carnitine changed the angle to 9.5Β°. And finally, in Pocket 3, some of the PFAS had a more pronounced effect on the DNA bending compared to Pocket 1 and 2. Since Pocket 3 corresponds to one of the DNA binding domains, it is expected to have a stronger effect on the DNA interaction. Et-PFOSAAcOH had the most pronounced effect on the bending, very close to 10 degrees bending. The effect from other PFAS was 9 to 10Β°. It is important to note that for all investigated binding pockets, larger DNA bending was observed for PFAS and L-carnitine when compared to the co-crystallized ligand complex. The co-crystallized ligands are agonists for PPAR𝛾 and RXR𝛼, therefore, observing similar DNA bending angles in the presence of PFAS implies that these molecules can replicate the downstream effects as agonist compounds. Furthermore, interestingly, for Pocket 3, the binding of PFAS results in similar behaviors as for Pockets 1 and 2, indicating that Pocket 3 could be another potential binding location for these PFAS compounds. 3.4 Conclusion Herein, detailed structural analyses of the PPAR𝛾/RXR𝛼-DNA structures bound to PFAS, and L-carnitine were performed showing the potential of the selected PFAS as agonists. In addition, a comparison of the co-crystallized ligands with the apo structure was conducted. RMSF analysis indicated clearly that PFAS binding to the investigated binding pockets affects the movements of the LBDs and DBDs. More specifically, the hinge regions of RXR𝛼 and PPAR𝛾 have higher 47 Figure 3.10 Distribution of the DNA bending angle in the Pocket 1 simulations with PFAS and L-carnitine. LCN: L-carnitine, PLB, 9CR: Natural ligands. fluctuations upon PFAS binding, when compared to the apo and co-crystallized ligand simulations. A direct comparison of different PFAS and their binding energies indicated that the size of the carbon chain is proportional to the binding energies. Furthermore, the longer the carbon chain, the stronger the interaction energies, suggesting that the not only electrostatic interactions formed by the functional groups of PFAS, but the hydrophobic interactions with the PFAS tail are crucial for the strength of binding. In terms of the RXR𝛼/PPAR𝛾 binding domains (Pockets 1 and 2), both PFHxDA and Et-PFOSAAcEtOH resulted in the highest binding affinities. Emergent PFAS such as ADONA and GenX can be competitively replaced by L-carnitine in Pocket 2. Considering the DNA binding domain, DFT and DLPNO calculations predicted that ADONA and PFOS are the strongest binding PFAS within the pocket. Moreover, some of the PFAS showed that by moving from the initial zinc and cysteine coordination, the PFAS can still be buried in the pocket and be stabilized by other key residues, with no disruption to the overall secondary structure packing of the proteins or to the interaction with the DNA oligomer. This is the first time in literature such a discovery has been made. It is known that the ligand binding domains of PPAR𝛾 and RXR𝛼 proteins are the primary binding sites for the investigated PFAS, with stronger preference for the RXR𝛼 LBD, based 48 on our overall binding energies. The third investigated site, near to a Zinc finger domain, can be a secondary or non-specific binding of PFAS to the PPAR𝛾/RXR𝛼-DNA complex. Furthermore, for all of the investigated binding pockets, the key residues have been identified, which are fundamental for developing compounds that can competitively replace PFAS from NRs. Finally, disruptions of the hydrogen bonding network of RXR𝛼-DNA and PPAR𝛾-DNA upon PFAS binding have been carried out for the three pockets. For DNA-residue pairs such as DC488/Glu129 (PPAR𝛾-DNA) and DC481/Glu153 (RXR𝛼-DNA), there is a decrease in hydrogen bonding when PFAS are bound to Pockets 1, 2 and 3. On the other hand, for residues such as DT485/Arg137 and DG472/Arg164, there is an increase in the hydrogen bond network of these DNA base pairs and protein residues for certain PFAS. DNA bending is associated with activation of PPAR𝛾/RXR𝛼 complex. Upon PFAS and L-carnitine binding to the three different pockets, a bending angle of ∼9Β°, similar to that shown in co-crystallized ligand bound simulations, was observed. However, with the removal of all ligands from the binding pockets, the bending angle of the DNA reaches the highest value at ∼42Β°. It is important to note that, along with the interactions made with Helix 12, the observed bending angles provide evidence that PFAS acts as an agonist and may trigger the same downstream effects as natural ligands upon binding to the PPAR𝛾/RXR𝛼 complex. The results presented here for PFAS binding to a biologically relevant nuclear receptor complex provides important insight towards establishing PFAS mitigation strategies and better understanding the health implications of PFAS exposure. Furthermore, by identifying where and how strong PFAS bind, and which residues are responsible for molecular recognition, insight can be gained towards potential in vivo mitigation strategies - for the rational design of a mitigator compound which could help to alleviate the effects of PFAS in humans. 49 BIBLIOGRAPHY [1] Sajid, M. and Ilyas, M. (2017). Ptfe-coated non-stick cookware and toxicity concerns: A perspective. Environmental Science and Pollution Research, 24:23436–23440. [2] Rao, N. S. and Baker, B. E. (1994). Textile Finishes and Fluorosurfactants, pages 321–338. Springer US. [3] Schaider, L. A., Balan, S. A., Blum, A., Andrews, D. Q., Strynar, M. J., Dickinson, M. E., Lunderberg, D. M., Lang, J. R., and Peaslee, G. F. (2017). Fluorinated compounds in u.s. fast food packaging. Environmental Science & Technology Letters, 4:105–111. doi: 10.1021/acs.estlett.6b00435. [4] Buck, R. C., Franklin, J., Berger, U., Conder, J. M., Cousins, I. T., de Voogt, P., Jensen, A. A., Kannan, K., Mabury, S. A., and van Leeuwen, S. P. (2011). Perfluoroalkyl and polyflu- oroalkyl substances in the environment: Terminology, classification, and origins. Integrated Environmental Assessment and Management, 7:513–541. [5] Gagliano, E., Sgroi, M., Falciglia, P. P., Vagliasindi, F. G., and Roccaro, P. (2020). Removal of poly- and perfluoroalkyl substances (pfas) from water by adsorption: Role of pfas chain length, effect of organic matter and challenges in adsorbent regeneration. Water Research, 171:115381. [6] Schumm, C. E., Loganathan, N., and Wilson, A. K. (2023). Influence of soil minerals on the adsorption, structure, and dynamics of genx. ACS ES&T Water, 3:2659–2670. doi: 10.1021/ac- sestwater.3c00171. [7] Loganathan, N. and Wilson, A. K. (2022). Adsorption, structure, and dynamics of short- and long-chain pfas molecules in kaolinite: Molecular-level insights. Environmental Science & Technology, 56:8043–8052. doi: 10.1021/acs.est.2c01054. [8] Almeida, N. M. S., Eken, Y., and Wilson, A. K. (2021). Binding of per- and polyfluoro-alkyl substances to peroxisome proliferator-activated receptor gamma. ACS Omega, 6:15103–15114. doi: 10.1021/acsomega.1c01304. [9] Lai, T. T., Eken, Y., and Wilson, A. K. (2020). Binding of per- and polyfluoroalkyl substances to the human pregnane x receptor. Environmental Science & Technology, 54:15986–15995. [10] Yu, C. H., Riker, C. D., en Lu, S., and Fan, Z. T. (2020). Biomonitoring of emerging contaminants, perfluoroalkyl and polyfluoroalkyl substances (pfas), in new jersey adults in 2016–2018. International Journal of Hygiene and Environmental Health, 223:34–44. [EPA] Us environmental protection agency epa’s per- and polyfluoroalkyl substances (pfas) action plan 2019 no. february. [12] Houck, K. A., Patlewicz, G., Richard, A. M., Williams, A. J., Shobair, M. A., Smeltz, M., 50 Clifton, M. S., Wetmore, B., Medvedev, A., and Makarov, S. (2021). Bioactivity profiling of per- and polyfluoroalkyl substances (pfas) identifies potential toxicity pathways related to molecular structure. Toxicology, 457:152789. [13] Munoz, G., Liu, J., Duy, S. V., and SauvΓ©, S. (2019). Analysis of f-53b, gen-x, adona, and emerging fluoroalkylether substances in environmental and biomonitoring samples: A review. Trends in Environmental Analytical Chemistry, 23:e00066. [14] Guo, H., Chen, J., Zhang, H., Yao, J., Sheng, N., Li, Q., Guo, Y., Wu, C., Xie, W., and Dai, J. (2022). Exposure to genx and its novel analogs disrupts hepatic bile acid metabolism in male mice. Environmental Science & Technology, 56:6133–6143. doi: 10.1021/acs.est.1c02471. [15] Robarts, D. R., Venneman, K. K., Gunewardena, S., and Apte, U. (2022). Genx induces fibroinflammatory gene expression in primary human hepatocytes. Toxicology, 477:153259. [16] Weikum, E. R., Liu, X., and Ortlund, E. A. (2018). The nuclear receptor superfamily: A structural perspective. Protein Science, 27:1876–1892. [17] Desvergne, B. and Wahli, W. (1999). Peroxisome proliferator-activated receptors: Nuclear control of metabolism. Endocrine Reviews, 20:649–688. [18] Lemotte, P. K., Keidel, S., and Apfel, C. M. (1996). Phytanic acid is a retinoid x recep- tor ligand. European Journal of Biochemistry, 236:328–333. https://doi.org/10.1111/j.1432- 1033.1996.00328.x. [19] Fulton, J., Mazumder, B., Whitchurch, J. B., Monteiro, C. J., Collins, H. M., Chan, C. M., Clemente, M. P., Hernandez-Quiles, M., Stewart, E. A., Amoaku, W. M., Moran, P. M., Mongan, N. P., Persson, J. L., Ali, S., and Heery, D. M. (2017). Heterodimers of photoreceptor-specific nuclear receptor (pnr/nr2e3) and peroxisome proliferator-activated receptor- 𝛾 (ppar 𝛾) are disrupted by retinal disease-associated mutations. Cell Death & Disease, 8:e2677–e2677. [20] Todorov, V. T., Desch, M., Schmitt-Nilson, N., Todorova, A., and Kurtz, A. (2007). Per- oxisome proliferator-activated receptor- 𝛾 is involved in the control of renin gene expression. Hypertension, 50:939–944. doi: 10.1161/HYPERTENSIONAHA.107.092817. [21] Estany, J., Ros-Freixedes, R., Tor, M., and Pena, R. N. (2014). A functional variant in the stearoyl-coa desaturase gene promoter enhances fatty acid desaturation in pork. PLoS ONE, 9:e86177. [22] Okuno, M., Arimoto, E., Ikenobu, Y., Nishihara, T., and Imagawa, M. (2001). Dual dna- binding specificity of peroxisome-proliferator-activated receptor 𝛾 controlled by heterodimer formation with retinoid x rceptor 𝛼. Biochemical Journal, 353:193–198. [23] Nolte, R. T., Wisely, G. B., Westin, S., Cobb, J. E., Lambert, M. H., Kurokawa, R., Rosenfeld, M. G., Willson, T. M., Glass, C. K., and Milburn, M. V. (1998). Ligand binding and co-activator 51 assembly of the peroxisome proliferator-activated receptor-𝛾. Nature, 395:137–143. [24] Chandra, V., Huang, P., Hamuro, Y., Raghuram, S., Wang, Y., Burris, T. P., and Rastinejad, F. (2008). Structure of the intact ppar-𝛾–rxr-𝛼 nuclear receptor complex on dna. Nature, 456:350–356. [25] Hernandez-Quiles, M., Broekema, M. F., and Kalkhoven, E. (2021). Ppargamma in metabolism, immunity, and cancer: Unified and diverse mechanisms of action. Frontiers in Endocrinology, 12:36. [26] Bain, D. L., Heneghan, A. F., Connaghan-Jones, K. D., and Miura, M. T. (2007). Nuclear receptor structure: Implications for function. Annual Review of Physiology, 69:201–220. [27] Khorasanizadeh, S. and Rastinejad, F. (2001). Nuclear-receptor interactions on dna-response elements. Trends in Biochemical Sciences, 26:384–390. [28] Jeninga, E. H., Gurnell, M., and Kalkhoven, E. (2009). Functional implications of genetic variation in human ppar𝛾. Trends in Endocrinology & Metabolism, 20:380–387. [29] Krezel, A. and Maret, W. (2016). The biological inorganic chemistry of zinc ions. Archives of Biochemistry and Biophysics, 611:3–19. [30] Harney, A. S., Lee, J., Manus, L. M., Wang, P., Ballweg, D. M., LaBonne, C., and Meade, T. J. (2009). Targeted inhibition of snail family zinc finger transcription factors by oligonucleotide- co(iii) schiff base conjugate. Proceedings of the National Academy of Sciences, 106:13667– 13672. doi: 10.1073/pnas.0906423106. [31] Yuan, S., Ding, X., Cui, Y., Wei, K., Zheng, Y., and Liu, Y. (2017). Cisplatin preferentially binds to zinc finger proteins containing c3h1 or c4 motifs. European Journal of Inorganic Chemistry, 2017:1778–1784. https://doi.org/10.1002/ejic.201601140. [32] Sheng, Y., Cao, K., Li, J., Hou, Z., Yuan, S., Huang, G., Liu, H., and Liu, Y. (2018). Selective targeting of the zinc finger domain of hiv nucleocapsid protein ncp7 with ruthenium complexes. Chemistry – A European Journal, 24:19146–19151. https://doi.org/10.1002/chem.201803917. [33] Kluska, K., Adamczyk, J., and Krezel, A. (2018). Metal binding properties, stability and reactivity of zinc fingers. Coordination Chemistry Reviews, 367:18–64. [34] Quintal, S. M., DePaula, Q. A., and Farrell, N. P. (2011). Zinc finger proteins as templates for metal ion exchange and ligand reactivity. chemical and biological consequences. Metallomics, 3:121. [35] Baglivo, I., Russo, L., Esposito, S., Malgieri, G., Renda, M., Salluzzo, A., Blasio, B. D., Isernia, C., Fattorusso, R., and Pedone, P. V. (2009). The structural role of the zinc ion can be dispensable in prokaryotic zinc-finger domains. Proceedings of the National Academy of 52 Sciences, 106:6933–6938. [36] Dudev, T. and Lim, C. (2002). Factors governing the protonation state of cysteines in proteins: An ab initio/cdm study. Journal of the American Chemical Society, 124:6759–6766. doi: 10.1021/ja012620l. [37] Issemann, I., Prince, R. A., Tugwood, J. D., and Green, S. (1993). The peroxisome proliferator- activated receptor: Retinoid x receptor heterodimer is activated by fatty acids and fibrate hy- polipidaemic drugs. Journal of Molecular Endocrinology, 11:37–47. [38] Mangelsdorf, D. J. and Evans, R. M. (1995). The rxr heterodimers and orphan receptors. Cell, 83:841–850. [39] Kliewer, S. A., Umesono, K., Noonan, D. J., Heyman, R. A., and Evans, R. M. (1992). Convergence of 9-cis retinoic acid and peroxisome proliferator signalling pathways through heterodimer formation of their receptors. Nature, 358:771–774. [40] Schulman, I. G., Shao, G., and Heyman, R. A. (1998). Transactivation by retinoid x recep- tor–peroxisome proliferator-activated receptor 𝛾 (ppar𝛾) heterodimers: Intermolecular synergy requires only the ppar]𝛾 hormone-dependent activation function. Molecular and Cellular Biol- ogy, 18:3483–3494. [41] Xu, H., Lambert, M. H., Montana, V. G., Parks, D. J., Blanchard, S. G., Brown, P. J., Sternbach, D. D., Lehmann, J. M., Wisely, G., Willson, T. M., Kliewer, S. A., and Milburn, M. V. (1999). Molecular recognition of fatty acids by peroxisome proliferator–activated receptors. Molecular Cell, 3:397–403. [42] Oberfield, J. L., Collins, J. L., Holmes, C. P., Goreham, D. M., Cooper, J. P., Cobb, J. E., Lenhard, J. M., Hull-Ryde, E. A., Mohr, C. P., Blanchard, S. G., Parks, D. J., Moore, L. B., Lehmann, J. M., Plunket, K., Miller, A. B., Milburn, M. V., Kliewer, S. A., and Willson, T. M. (1999). A peroxisome proliferator-activated receptor 𝛾 ligand inhibits adipocyte differentiation. Proceedings of the National Academy of Sciences, 96:6102–6106. doi: 10.1073/pnas.96.11.6102. [43] Ostberg, T., Svensson, S., SelΓ©n, G., Uppenberg, J., Thor, M., Sundbom, M., Sydow-BΓ€ckman, M., Gustavsson, A.-L., and Jendeberg, L. (2004). A new class of peroxisome proliferator- activated receptor agonists with a novel binding epitope shows antidiabetic effects. Journal of Biological Chemistry, 279:41124–41130. [44] Burgermeister, E., Schnoebelen, A., Flament, A., Benz, J., Stihle, M., Gsell, B., Rufer, A., Ruf, A., Kuhn, B., Marki, H. P., Mizrahi, J., Sebokova, E., Niesor, E., and Meyer, M. (2006). A novel partial agonist of peroxisome proliferator-activated receptor-𝛾 (ppar𝛾) recruits ppar𝛾- coactivator-1𝛼, prevents triglyceride accumulation, and potentiates insulin signaling in vitro. Molecular Endocrinology, 20:809–830. 53 [45] Pochetti, G., Godio, C., Mitro, N., Caruso, D., Galmozzi, A., Scurati, S., Loiodice, F., Fracchiolla, G., Tortorella, P., Laghezza, A., Lavecchia, A., Novellino, E., Mazza, F., and Crestani, M. (2007). Insights into the mechanism of partial agonism: Crystal structures of the peroxisome proliferator-activated receptor gamma ligand-binding domain in the complex with two enantiomeric ligands. The Journal of biological chemistry, 282:17314–24. [46] Li, Y., Wang, Z., Furukawa, N., Escaron, P., Weiszmann, J., Lee, G., Lindstrom, M., Liu, J., Liu, X., Xu, H., Plotnikova, O., Prasad, V., Walker, N., Learned, R. M., and Chen, J.-L. (2008). T2384, a novel antidiabetic agent with unique peroxisome proliferator-activated receptor 𝛾 binding properties. Journal of Biological Chemistry, 283:9168–9176. [47] Motani, A., Wang, Z., Weiszmann, J., McGee, L. R., Lee, G., Liu, Q., Staunton, J., Fang, Z., Fuentes, H., Lindstrom, M., Liu, J., Biermann, D. H. T., Jaen, J., Walker, N. P. C., Learned, R. M., Chen, J.-L., and Li, Y. (2009). Int131: A selective modulator of ppar gamma. Journal of molecular biology, 386:1301–11. [48] Bruning, J. B., Chalmers, M. J., Prasad, S., Busby, S. A., Kamenecka, T. M., He, Y., Nettles, K. W., and Griffin, P. R. (2007). Partial agonists activate ppargamma using a helix 12 independent mechanism. Structure (London, England : 1993), 15:1258–71. [49] Ricci, C. G., Silveira, R. L., Rivalta, I., Batista, V. S., and Skaf, M. S. (2016). Allosteric pathways in the ppar𝛾-rxr𝛼 nuclear receptor complex. Scientific Reports, 6:19940. [50] Levin, A. A., Sturzenbecker, L. J., Kazmer, S., Bosakowski, T., Huselton, C., Allenby, G., Speck, J., Ratzeisen, C., Rosenberger, M., Lovey, A., and Grippo, J. F. (1992). 9-cis retinoic acid stereoisomer binds and activates the nuclear receptor rxr𝛼. Nature, 355:359–361. [51] Zeng, Z., Song, B., Xiao, R., Zeng, G., Gong, J., Chen, M., Xu, P., Zhang, P., Shen, M., and Yi, H. (2019). Assessing the human health risks of perfluorooctane sulfonate by in vivo and in vitro studies. Environment international, 126:598–610. [52] Sunderland, E. M., Hu, X. C., Dassuncao, C., Tokranov, A. K., Wagner, C. C., and Allen, J. G. (2019). A review of the pathways of human exposure to poly- and perfluoroalkyl sub- stances (pfass) and present understanding of health effects. Journal of Exposure Science & Environmental Epidemiology, 29:131–147. [53] Rappazzo, K., Coffman, E., and Hines, E. (2017). Exposure to perfluorinated alkyl sub- stances and health outcomes in children: A systematic review of the epidemiologic literature. International Journal of Environmental Research and Public Health, 14:691. [54] Szilagyi, J. T., Avula, V., and Fry, R. C. (2020). Perfluoroalkyl substances (pfas) and their effects on the placenta, pregnancy, and child development: a potential mechanistic role for placental peroxisome proliferator–activated receptors (ppars). Current Environmental Health Reports, 7:222–230. 54 [55] Anderko, L. and Pennea, E. (2020). Exposures to per-and polyfluoroalkyl substances (pfas): Potential risks to reproductive and children’s health. Current Problems in Pediatric and Adoles- cent Health Care, 50:100760. [56] Roth, J., Abusallout, I., Hill, T., Holton, C., Thapa, U., and Hanigan, D. (2020). Release of volatile per- and polyfluoroalkyl substances from aqueous film-forming foam. Environmental Science & Technology Letters, 7:164–170. doi: 10.1021/acs.estlett.0c00052. [57] Xu, Y., Jurkovic-Mlakar, S., Li, Y., Wahlberg, K., Scott, K., Pineda, D., Lindh, C. H., Jakobs- son, K., and EngstrΓΆm, K. (2020). Association between serum concentrations of perfluoroalkyl substances (pfas) and expression of serum micrornas in a cohort highly exposed to pfas from drinking water. Environment International, 136:105446. [58] Hu, X. C., Andrews, D. Q., Lindstrom, A. B., Bruton, T. A., Schaider, L. A., Grandjean, P., Lohmann, R., Carignan, C. C., Blum, A., Balan, S. A., Higgins, C. P., and Sunderland, E. M. (2016). Detection of poly- and perfluoroalkyl substances (pfass) in u.s. drinking water linked to industrial sites, military fire training areas, and wastewater treatment plants. Environmental Science & Technology Letters, 3:344–350. doi: 10.1021/acs.estlett.6b00260. [59] Chou, H.-C., Wen, L.-L., Chang, C.-C., Lin, C.-Y., Jin, L., and Juan, S.-H. (2017). From the cover: l-carnitine via ppar𝛾- and sirt1-dependent mechanisms attenuates epithelial- mesenchymal transition and renal fibrosis caused by perfluorooctanesulfonate. Toxicological Sciences, 160:217–229. [60] Wen, L.-L., Lin, C.-Y., Chou, H.-C., Chang, C.-C., Lo, H.-Y., and Juan, S.-H. (2016). Perfluorooctanesulfonate mediates renal tubular cell apoptosis through ppargamma inactivation. PLOS ONE, 11:e0155190. [61] Duan, X., Sun, W., Sun, H., and Zhang, L. (2021). Perfluorooctane sulfonate continual exposure impairs glucose-stimulated insulin secretion via sirt1-induced upregulation of ucp2 expression. Environmental Pollution, 278:116840. [62] Liu, W.-S., Lai, Y.-T., Chan, H.-L., Li, S.-Y., Lin, C.-C., Liu, C.-K., Tsou, H.-H., and Liu, T.-Y. (2018). Associations between perfluorinated chemicals and serum biochemical markers and performance status in uremic patients under hemodialysis. PloS one, 13:e0200271. [63] Zhang, L., Ren, X.-M., Wan, B., and Guo, L.-H. (2014). Structure-dependent binding and activation of perfluorinated compounds on human peroxisome proliferator-activated receptor 𝛾. Toxicology and Applied Pharmacology, 279:275–283. [64] Khazaee, M., Christie, E., Cheng, W., Michalsen, M., Field, J., and Ng, C. (2021). Perfluo- roalkyl acid binding with peroxisome proliferator-activated receptors 𝛼, 𝛾, and 𝛿, and fatty acid binding proteins by equilibrium dialysis with a comparison of methods. Toxics, 9:45. [65] Soderstrom, S., Lille-Langoy, R., Yadetie, F., Rauch, M., Milinski, A., Dejaegere, A., Stote, 55 R. H., Goksoyr, A., and Karlsen, O. A. (2022). Agonistic and potentiating effects of perfluo- roalkyl substances (pfas) on the atlantic cod (gadus morhua) peroxisome proliferator-activated receptors (ppars). Environment International, 163:107203. [66] DΓΆpke, M. F., Moultos, O. A., and Hartkamp, R. (2020). On the transferability of ion parameters to the tip4p/2005 water model using molecular dynamics simulations. The Journal of Chemical Physics, 152:024501. doi: 10.1063/1.5124448. [67] Dale, K., Yadetie, F., Horvli, T., Zhang, X., Froysa, H. G., Karlsen, O. A., and Goksoyr, A. (2022). Single pfas and pfas mixtures affect nuclear receptor- and oxidative stress-related pathways in precision-cut liver slices of atlantic cod (gadus morhua). Science of The Total Environment, 814:152732. [68] Sun, X., Xie, Y., Zhang, X., Song, J., and Wu, Y. (2023). Estimation of per- and polyfluorinated alkyl substance induction equivalency factors for humpback dolphins by transactivation potencies of peroxisome proliferator-activated receptors. Environmental science & technology, 57:3713– 3721. [69] Flanagan, J. L., Simmons, P. A., Vehige, J., Willcox, M. D., and Garrett, Q. (2010). Role of carnitine in disease. Nutrition and Metabolism, 7:30. [70] Heuvel, J. P. V., Thompson, J. T., Frame, S. R., and Gillies, P. J. (2006). Differential activation of nuclear receptors by perfluorinated fatty acid analogs and natural fatty acids: A comparison of human, mouse, and rat peroxisome proliferator-activated receptor-𝛼, -𝛽, and -𝛾, liver x receptor-𝛽, and retinoid x receptor-𝛼. Toxicological Sciences, 92:476–489. [71] Salvalaglio, M., Muscionico, I., and Cavallotti, C. (2010). Determination of energies and sites of binding of pfoa and pfos to human serum albumin. Journal of Physical Chemistry B, 114:14860–14874. doi: 10.1021/jp106584b. [72] Ng, C. A. and Hungerbuehler, K. (2015). Exploring the use of molecular docking to iden- tify bioaccumulative perfluorinated alkyl acids (pfaas). Environmental Science & Technology, 49:12306–12314. doi: 10.1021/acs.est.5b03000. [73] Chen, H., He, P., Rao, H., Wang, F., Liu, H., and Yao, J. (2015). Systematic investigation of the toxic mechanism of pfoa and pfos on bovine serum albumin by spectroscopic and molecular modeling. Chemosphere, 129:217–224. [74] Cheng, W. and Ng, C. A. (2018). Predicting relative protein affinity of novel per- and polyfluoroalkyl substances (pfass) by an efficient molecular dynamics approach. Environmental Science & Technology, 52:7972–7980. doi: 10.1021/acs.est.8b01268. [75] Li, C.-H., Ren, X.-M., Cao, L.-Y., Qin, W.-P., and Guo, L.-H. (2019). Investigation of binding and activity of perfluoroalkyl substances to the human peroxisome proliferator-activated receptor 𝛽/ 𝛿. Environmental Science: Processes & Impacts, 21:1908–1914. 56 [76] Behr, A.-C., Plinsch, C., Braeuning, A., and Buhrke, T. (2020). Activation of human nuclear receptors by perfluoroalkylated substances (pfas). Toxicology in Vitro, 62:104700. [77] Evans, N., Conley, J. M., Cardon, M., Hartig, P., Medlock-Kakaley, E., and Gray, L. E. (2022). In vitro activity of a panel of per- and polyfluoroalkyl substances (pfas), fatty acids, and pharmaceuticals in peroxisome proliferator-activated receptor (ppar) alpha, ppar gamma, and estrogen receptor assays. Toxicology and Applied Pharmacology, 449:116136. [78] Lai, T. T., Kuntz, D., and Wilson, A. K. (2022). Molecular screening and toxicity estimation of 260,000 perfluoroalkyl and polyfluoroalkyl substances (pfass) through machine learning. Journal of Chemical Information and Modeling, 62:4569–4578. [79] (2022). Molecular operating environment (moe), 2022.02 chemical computing group ulc, 1010 sherbooke st. west, suite 910, montreal, qc, canada, h3a 2r7. [80] Eken, Y., Almeida, N. M., Wang, C., and Wilson, A. K. (2021). Sampl7: Host–guest binding prediction by molecular dynamics and quantum mechanics. Journal of Computer-Aided Molecular Design, 35:63–77. [81] Bali, S. K., Marion, A., Ugur, I., Dikmenli, A. K., Catak, S., and Aviyente, V. (2018). Activity of topotecan toward the dna/topoisomerase i complex: A theoretical rationalization. Biochemistry, 57:1542–1551. [82] Labute, P. and Santavy, M. (2010). Sitefinder - locating binding sites in protein structures. [83] Riplinger, C., Pinski, P., Becker, U., Valeev, E. F., and Neese, F. (2016). Sparse mapsβ€”a systematic infrastructure for reduced-scaling electronic structure methods. ii. linear scaling domain based pair natural orbital coupled cluster theory. The Journal of Chemical Physics, 144:024109. [84] Vanquelef, E., Simon, S., Marquant, G., Garcia, E., Klimerak, G., Delepine, J. C., Cieplak, P., and Dupradeau, F.-Y. (2011). R.e.d. server: A web service for deriving resp and esp charges and building force field libraries for new molecules and molecular fragments. Nucleic Acids Research, 39:W511–W517. [85] Bayly, C. I., Cieplak, P., Cornell, W., and Kollman, P. A. (1993). A well-behaved electrostatic potential based method using charge restraints for deriving atomic charges: the resp model. The Journal of Physical Chemistry, 97:10269–10280. [86] York, D. and P.A. Kollman, D. e. a. (2020). Amber 2020. [87] Galindo-Murillo, R., Robertson, J. C., ZgarbovΓ‘, M., Ε poner, J., Otyepka, M., Jurečka, P., and Cheatham, T. E. (2016). Assessing the current state of amber force field modifications for dna. Journal of Chemical Theory and Computation, 12:4114–4127. doi: 10.1021/acs.jctc.6b00186. 57 [88] He, X., Man, V. H., Yang, W., Lee, T.-S., and Wang, J. (2020). A fast and high-quality charge model for the next generation general amber force field. The Journal of Chemical Physics, 153:114502. [89] Li, P., Song, L. F., and Merz, K. M. (2015). Parameterization of highly charged metal ions using the 12-6-4 lj-type nonbonded model in explicit water. The Journal of Physical Chemistry B, 119:883–895. doi: 10.1021/jp505875v. [90] Li, P. and Merz, K. M. (2017). Metal ion modeling using classical mechanics. Chemical Reviews, 117:1564–1686. doi: 10.1021/acs.chemrev.6b00440. [91] Li, P., Song, L. F., and Merz, K. M. (2015). Systematic parameterization of monovalent ions employing the nonbonded model. Journal of Chemical Theory and Computation, 11:1645–1657. doi: 10.1021/ct500918t. [92] Horn, H. W., Swope, W. C., Pitera, J. W., Madura, J. D., Dick, T. J., Hura, G. L., and Head-Gordon, T. (2004). Development of an improved four-site water model for biomolecular simulations: Tip4p-ew. The Journal of Chemical Physics, 120:9665–9678. [93] Ryckaert, J.-P., Ciccotti, G., and Berendsen, H. J. (1977). Numerical integration of the cartesian equations of motion of a system with constraints: Molecular dynamics of n-alkanes. Journal of Computational Physics, 23:327–341. [94] Onufriev, A., Bashford, D., and Case, D. A. (2004). Exploring protein native states and large- scale conformational changes with a modified generalized born model. Proteins: Structure, Function and Genetics, 55:383–394. [95] Miller, B. R., McGee, T. D., Swails, J. M., Homeyer, N., Gohlke, H., and Roitberg, A. E. (2012). Mmpbsa.py : An efficient program for end-state free energy calculations. Journal of Chemical Theory and Computation, 8:3314–3321. [96] Roe, D. R. and Cheatham, T. E. (2013). Ptraj and cpptraj: Software for processing and analysis of molecular dynamics trajectory data. Journal of Chemical Theory and Computation, 9:3084–3095. doi: 10.1021/ct400341p. [97] Pettersen, E. F., Goddard, T. D., Huang, C. C., Couch, G. S., Greenblatt, D. M., Meng, E. C., and Ferrin, T. E. (2004). Ucsf chimera: A visualization system for exploratory research and analysis. Journal of Computational Chemistry, 25:1605–1612. [98] Blomberg, M. R. A., Borowski, T., Himo, F., Liao, R.-Z., and Siegbahn, P. E. M. (2014). Quantum chemical studies of mechanisms for metalloenzymes. Chemical Reviews, 114:3601– 3658. doi: 10.1021/cr400388t. [99] Findik, B. K., Cilesiz, U., Bali, S. K., Atilgan, C., Aviyente, V., and Dedeoglu, B. (2022). Investigation of iron release from the n- and c-lobes of human serum transferrin by quantum 58 chemical calculations. Organic & Biomolecular Chemistry, 20:8766–8774. [100] Tzeliou, C. E., Mermigki, M. A., and Tzeli, D. (2022). Review on the qm/mm methodologies and their application to metalloproteins. Molecules, 27:2660. [101] Roe, D. R. (2015). Introduction to hydrogen bond analysis. [102] Becke, A. D. (1993). A new mixing of hartree–fock and local density-functional theories. The Journal of Chemical Physics, 98:1372–1377. [103] Perdew, J. P. (1986). Erratum: Density-functional approximation for the correlation energy of the inhomogeneous electron gas. Physical Review B, 34:7406–7406. [104] Grimme, S., Ehrlich, S., and Goerigk, L. (2011). Effect of the damping function in dispersion corrected density functional theory. Journal of Computational Chemistry, 32:1456–1465. [105] Weigend, F. and Ahlrichs, R. (2005). Balanced basis sets of split valence, triple zeta valence and quadruple zeta valence quality for h to rn: Design and assessment of accuracy. Physical Chemistry Chemical Physics, 7:3297–3305. [106] Weigend, F. (2006). Accurate coulomb-fitting basis sets for h to rn. Physical Chemistry Chemical Physics, 8:1057. [107] Rydberg, P. and Olsen, L. (2009). The accuracy of geometries for iron porphyrin complexes from density functional theory. The Journal of Physical Chemistry A, 113:11949–11953. doi: 10.1021/jp9035716. [108] Tomasi, J., Mennucci, B., and Cammi, R. (2005). Quantum mechanical continuum solvation models. Chemical Reviews, 105:2999–3094. [109] Floris, F. and Tomasi, J. (1989). Evaluation of the dispersion contribution to the solvation en- ergy. a simple computational model in the continuum approximation. Journal of Computational Chemistry, 10:616–627. [110] Floris, F. M., Tomasi, J., and Ahuir, J. L. P. (1991). Dispersion and repulsion contribu- tions to the solvation energy: Refinements to a simple computational model in the continuum approximation. Journal of Computational Chemistry, 12:784–791. [111] Pierotti, R. A. (1976). A scaled particle theory of aqueous and nonaqueous solutions. Chemical Reviews, 76:717–726. doi: 10.1021/cr60304a002. [112] Yu, H. S., He, X., Li, S. L., and Truhlar, D. G. (2016). Mn15: A kohn–sham global- hybrid exchange–correlation density functional with broad accuracy for multi-reference and single-reference systems and noncovalent interactions. Chemical Science, 7:5032–5051. 59 [113] Marenich, A. V., Cramer, C. J., and Truhlar, D. G. (2009). Universal solvation model based on solute electron density and on a continuum model of the solvent defined by the bulk dielectric constant and atomic surface tensions. The Journal of Physical Chemistry B, 113:6378–6396. [114] Barone, V. and Cossi, M. (1998). Quantum calculation of molecular energies and energy gradients in solution by a conductor solvent model. The Journal of Physical Chemistry A, 102:1995–2001. doi: 10.1021/jp9716997. [115] Neese, F. (2022). Software update: The orca program systemβ€”version 5.0. WIREs Compu- tational Molecular Science, 12:e1606. [116] Tekarli, S. M., Drummond, M. L., Williams, T. G., Cundari, T. R., and Wilson, A. K. (2009). Performance of density functional theory for 3d transition metal-containing complexes: Utilization of the correlation consistent basis sets. The Journal of Physical Chemistry A, 113:8607–8614. doi: 10.1021/jp811503v. [117] Himo, F. and de Visser, S. P. (2022). Status report on the quantum chemical cluster approach for modeling enzyme reactions. Communications Chemistry, 5:29. [118] Himo, F. (2017). Recent trends in quantum chemical modeling of enzymatic reactions. Journal of the American Chemical Society, 139:6780–6786. [119] Siegbahn, P. E. M. (2011). The effect of backbone constraints: The case of water oxidation by the oxygen-evolving complex in psii. ChemPhysChem, 12:3274–3280. [120] Robinson, C. E., Wu, X., Morris, D. C., and Gimble, J. M. (1998). Dna bending is induced by binding of the peroxisome proliferator-activated receptor 𝛾2 heterodimer to its response element in the murine lipoprotein lipase promoter. Biochemical and Biophysical Research Communications, 244:671–677. 60 APPENDIX A SUPPORTING TABLES Table S3.1 Docking scores of selected poses of PFAS compounds using MOE (LCN: L-carnitine). 61 APPENDIX B SUPPORTING FIGURES Figure S3.1 The docking pockets for PPAR𝛾 and RXR𝛼 LBD are shown with yellow surfaces. The other colors and their representations are as follows: Red: RXR𝛼, green: PPAR𝛾, tan: DNA, blue: Zn2+ ions. A: The view from the side; B; the view from the above. Figure S3.2 A: Overlap of the selected primary poses in the PPAR𝛾-LBD pocket. The Tyr473 residue is shown in green ball-and-stick representation, and the PFAS compounds are shown in stick representation. B: The overlap of the selected primary poses in the RXR-LBD pocket. The Arg316 residue is shown in yellow ball-and-stick representation, The Phe 313 is shown in pink ball-and-stick representation and the PFAS compounds are shown in stick representation as well. 62 Figure S3.3 Overlap of the selected primary poses in the DBD pocket. Coordinating cysteine residues are shown in ball-and-stick representation, and the PFAS compounds are shown in stick representation. The zinc coordinating atom is shown in pink. Figure S3.4 PFOA RMSD plots for Pocket 1. 63 Figure S3.5 PFHxDA RMSD plots for Pocket 1. Figure S3.6 PFOS RMSD plots for Pocket 1. 64 Figure S3.7 6:2 FTSA RMSD plots for Pocket 1. Figure S3.8 PFOA RMSD plots for Pocket 1. 65 Figure S3.9 PFOA RMSD plots for Pocket 1. Figure S3.10 PFOSA RMSD plots for Pocket 1. 66 Figure S3.11 Et-PFOSAAcOH RMSD plots for Pocket 1. Figure S3.12 GenX RMSD plots for Pocket 1. 67 Figure S3.13 ADONA RMSD plots for Pocket 1. Figure S3.14 L-carnitine (LCN) RMSD plots for Pocket 1. 68 Figure S3.15 The orientations of PFOA at 1 ns (A) and 60 ns (B) of the simulation. The PPAR𝛾 is shown in green and RXR𝛼 is shown in red. The H12 helix from PPAR𝛾 is highlighted, along with important residues around PFOA (shown in yellow). The hydrogen bonding is indicated with a pink line. Figure S3.16 The orientations of PFHxDA at 1 ns (A) and 60 ns (B) of the simulation. The PPAR𝛾 is shown in green and RXR𝛼 is shown in red. The H12 helix from PPAR𝛾 is highlighted, along with important residues around PFHxDA (shown in yellow). The hydrogen bonding is indicated with a pink line. 69 Figure S3.17 The orientations of 6:2 FTSA at 1 ns (A), 20 ns (B), and 75 ns (C) of the simulation. The PPAR𝛾 is shown in green and RXR𝛼 is shown in red. The H12 helix from PPAR𝛾 is highlighted, along with important residues around 6:2 FTSA (shown in yellow). The hydrogen bonding is indicated with a pink line. Figure S3.18 The orientations of Et-PFOSAAcOH at 1 ns (A), and 75 ns (B) of the simulation. The PPAR𝛾 is shown in green and RXR𝛼 is shown in red. The H12 helix from PPAR𝛾 is highlighted, along with important residues around Et-PFOSAAcOH (shown in yellow). The hydrogen bonding is indicated with a pink line. 70 Figure S3.19 RMSD plots for the apo PPAR𝛾-RXR𝛼/DNA complex. Figure S3.20 RMSD plots for the PPAR𝛾-RXR𝛼/DNA complex with its corresponding respective natural ligands. 71 Figure S3.21 PFOA RMSD plots for the investigated DBD pocket (Pocket 3). Figure S3.22 PFHxDA RMSD plots for the investigated DBD pocket (Pocket 3). 72 Figure S3.23 PFOS RMSD plots for the investigated DBD pocket (Pocket 3). Figure S3.24 6:2 FTOH RMSD plots for the investigated DBD pocket (Pocket 3). 73 Figure S3.25 6:2 FTSA RMSD plots for the investigated DBD pocket (Pocket 3). Figure S3.26 PFOSA RMSD plots for the investigated DBD pocket (Pocket 3). 74 Figure S3.27 Et-PFOSAAcEtOH RMSD plots for the investigated DBD pocket (Pocket 3). Figure S3.28 GenX RMSD plots for the investigated DBD pocket (Pocket 3). 75 Figure S3.29 ADONA RMSD plots for the investigated DBD pocket (Pocket 3). Figure S3.30 Distribution of the DNA bending angle in the Pocket 1 simulations with PFAS and L-carnitine. LCN: L-carnitine, PLB, 9CR: Natural ligands. 76 Figure S3.31 Distribution of the DNA bending angle in the Pocket 3 simulations with PFAS. PLB, 9CR: Natural ligands. 77 CHAPTER 4 INFLUENCE OF PFAS ON HUMAN THYROGLOBULIN PROTEIN: IMPACT ON THYROID HORMONE SYNTHESIS 78 4.1 Introduction Environmental pollutants can significantly impact the health of living organisms in the ecosys- tem and human populations. Some of the most recent health concerns related to environmental pollutants are attributed to per- and polyfluoroalkyl substances (PFAS), a group of man-made chemicals with wide industrial applications due to their unmatched water- and oil-repellent prop- erties as well as heat-resistance. 1–3 There are more than 14,000 compounds listed in the EPA PFASTRUCT database as of June 2023, however, remarkably, only approximately one percent of them have been tested for their toxicities. EPA,5 PFAS can be found in many products with non- stick and water-repellent surfaces, including food packaging, water-resistant clothing and shoes, and firefighting foams, to provide only a small number of examples, and are often referred to as β€œforever chemicals” or β€œzombie chemicals” due to their resistance to degradation. The resistance to degradation, consequently, has resulted in bioaccumulation of PFAS compounds in humans and animals, which has been linked to disruptions of glucose and bile acid metabolisms, immune, reproductive, and thyroid systems, and lipid homeostasis. 6–13 To provide a backdrop for the impact of PFAS on thyroid systems, a functional thyroid gland is crucial for neurodevelopment, cogni- tive and behavioral growth, as well as for regulating metabolic rate. 11 The synthesis of thyroid hormones, thyroxine (T4) and triiodothyronine (T3) is performed by thyroglobulin, which is a highly conserved protein in vertebrates, and thyroglobulin is located in the lumen of the thyroid follicles. 14 In humans, the thyroglobulin protein – called the human thyroglobulin (hTG) protein - is a homodimer and has four hormonogenic sites (sites A to D as shown in Figure 1) - the four sites where the T4 hormone is produced. 15–17 These sites on hTG are the locations where the thyroid hormones are synthesized. Although the exact mechanism is still not fully understood, the available cryo-EM structures indicate that the orientation of ITY residues as well as neighboring lysine and phenylalanine residues are crucial for the mechanism to take place. 15,17 Current research on the impact of PFAS on thyroid function is mainly based on epidemiological studies and clinical data, with mixed conclusions as to whether PFAS leads to an increase or decrease of the thyroid hormone levels. One study investigating the associations between PFAS exposure during pregnancy and the 79 neurodevelopment in infants indicated a relationship with PFHxS and PFBS exposure, linking to thyroid hormone-mediated neurodevelopment problems. 12 Prior studies have observed that during pregnancy, there is an association between the maternal levels of thyroid stimulating hormone and the PFHxS, PFNA, and PFOA concentrations. 18–23 Animal studies in rats indicate a decrease in T3 and T4 levels upon PFOA and PFOS exposure, 24? ,25 while long-term exposure to PFNA was linked to an increase in T3 levels in zebrafish. 25,26 While there is no single mechanism in which PFAS could disrupt the thyroid system, there are in silico and in vitro studies addressing various potential targets. 27 One study investigated the sodium-iodide symporters for rat and human thyroid cell lines and found that PFOS and PFHxS inhibited this protein. 27? ,28 In a number of prior studies, PFAS exposure has been proposed to alter the expression of proteins important for iodide removal and thyroid hormone signaling. 24,29–31 A study of PFAS’ effects on the thyroid was performed on common carp fish, 32? and Manera et al. suggested that the PFOA concentration can cause significant effects on the thyroid follicles of carp by disrupting production as well as reabsorption of thyroglobulin. As the PFAS toxicity on thyroid chemistry is a complicated and mainly uncharted process, the source of the thyroid hormone production, namely hTG, and the influence of PFAS on the thyroid hormone synthesis has been investigated. Understanding how PFAS can impact home- ostasis in humans will provide insight towards the development of potential mitigation strategies, such as targeted treatments and interventions for thyroid-related health issues. 4.2 Computational Protocols The dimeric human hTG protein atomistic structure (PDB ID: 6SCJ, 3.6Γ… resolution) was obtained from the RSCB Protein Data Bank. 15 The missing loops of the structure were modeled using the I-TASSER server separately by including ten amino acids from each end of the missing loops. 33 The prepared structure was solvated and then minimized using Amber20 as described in the Simulation Details section. 4.2.1 Simulation details The initial step of this investigation involved selecting a list of carboxylic PFAS (PFCA) and sulphonic PFAS (PFSA) with carbon chains varying from four to twelve, and their structures are 80 Figure S3.1 (a) The dimeric structure of human thyroglobulin (hTG) and the three hormonogenic sites on Chain A are shown. Among the identified hormogenic sites, Site A has two potential donor residues, and Site D has hormonogenic tyrosines from both chains. (b) The docking poses for PFAS in Site B along with ITY residues. (c) The binding energies for investigated PFAS, calculated with MM-GBSA and MM-PBSA methods are shown, along with the standard deviations. Carboxylic and sulphonic PFAS are listed. provided in Table S1. Molecular Operating Environment (MOE) software was used for the docking procedures and protonation state determination. 34 The minimized hTG dimer structure was used for the docking procedures. The binding pocket was defined by using a pharmacophore docking strategy to place the PFAS head groups near hormonogenic Tyr residues. The pharmacophore consisted of two features to place the head group. For the short-chain PFAS, a docking procedure with no pharmacophore was also performed. The pharmacophore was used for the initial placement process with a London dG scoring function to obtain 100 poses, which were further refined to five poses with an induced fit method and Generalized Born Volume integral/Weighted Surface Area (GBVI/WSA) scoring function 34 The highest scoring poses for each PFAS were selected 81 for Molecular Dynamics (MD) simulations. For the docking procedure without pharmacophore placement, Triangle Matcher method was used for the initial PFAS placement. Both monomeric and dimeric structures were considered in the modeling of the binding of PFAS to Site B in hTG protein to understand the effects of the dimer structure. The binding energies of the PFPA and PFBA compounds to the dimer hTG structure is reported in Table S4.10. The dimer hTG apo, monomeric apo, and PFAS-bound hTG monomeric systems were pre- pared using the tleap module of Amber20/AmberTools22 software. 35 The partial charges of PFAS compounds and iodinated Tyr residues (ITY) were calculated using the AM1-BCC method as im- plemented in the antechamber module of AmberTools22 with gaff2. 36,37 The protein, PFAS, and waters were modeled using ff14SB, gaff2, and TIP4P-EW force fields, respectively. 35? –39 NaCl ions were added to each a 0.15M of salt concentration to mimic the natural environment. On average, a monomeric system consisted of 680,000 atoms while a dimeric system had 1,020,000 atoms, including the solvent molecules. The minimization and heating steps were performed in a stepwise fashion as follows: (i) The minimization was done in four steps with the following restraints (100, 50, 10 ,0 kcal mol-1 Γ…-2), with each step having 20,000 cycles. (ii) The systems were heated up from 0 K to 20 K in 160 ps with a 3 kcal mol-1 Γ…-2 restraint applied on all atoms. Then, the systems were heated to 200 K for 250 ps with restraints applied to the backbone atoms only, followed by a short equilibrium simulation at 200 K for 200 ps with no restraints. Finally, heating to 300 K was done for 900 ps with no restraints applied. (iii) Before the production step, a 500 ps long equilibrium simulation was performed at 300 K. The minimized and equilibrated structures were simulated for 20 ns at 300 K and 1 atm using 1 fs timesteps with the SHAKE algorithm. 40 A duplicate set of simulations was performed by reinitializing the velocities after the heating step. The Langevin thermostat and isotropic position scaling were selected for the temperature and pressure controls, respectively. 41,42 All simulations were performed using the pmemd.cuda module as implemented in Amber20 suite 35,43 82 4.2.1.1 Analysis The binding energies were calculated by selecting every tenth frame for the last 5 ns of simula- tions, resulting in a total of 500 frames for a single simulation. The Molecular Mechanics-Poisson Boltzmann Surface Area/Generalized Born Surface Area (MM-PBSA/GBSA) method was used to estimate the binding strengths of the PFAS, as implemented in Amber20/AmberTools22. 44 As the focus of this work is not the exact estimation of the binding energies, but rather to provide a rank- ing of the binding strengths, MM-PBSA/GBSA methodology is useful in providing insight about binding pockets with partial solvent exposure. 45–50 The root-mean-square distances (RMSD), per- residue root-mean-square fluctuations (RMSF), and hydrogen bond analysis were calculated using the cpptraj module from AmberTools22 with the default parameters. 50 The per-residue decompo- sition energies were calculated by taking the non-bonded interactions into account for the residues within 10 Γ… of the PFASs. Clustering was performed to obtain the most dominant orientations of the ITY residues and PFAS using a hierarchical agglomerative algorithm with epsilon value of 3.0. The total energy convergence, as well as the structural convergence of the simulated systems were considered, and the last 5 ns of the trajectories were utilized for all analysis. Clustering of the trajectories was performed using dbscan (Density-Based Spatial Clustering of Applications with Noise) method as implemented in the cpptraj module, using six minimum samples and an epsilon value of 2. 50 The number of minimum samples were determined by a k-distribution plot. Only the last 5 ns of the trajectories were considered for this analysis. 4.2.1.2 Tyr orientation The positions of ITY residues were clustered and the angle and distance between them were measured for the last 5 ns of each simulation, as shown in Figure S4.5. The distance between the center-of-mass of the side chain atoms of ITY residues and the distance between the reactive atoms were measured. To measure the angle between the ITY residues, a plane for each ITY residue was described by two vectors: each extending from the CG atom to the iodine atoms (Figure S4.5). Then, the angle between the two planes was calculated to estimate the relative orientations of ITY residues. 83 4.3 Results and Discussion 4.3.1 PFAS Binding The location of Sites A, B, and D, and the docking poses of PFAS are shown; all of the functional groups that point towards the selected PFAS ITY2573 residue are shown in Figure 1(a). Site B of the hTG protein was selected for the suitability of the initial positioning of tyrosine residues, as Site A and Site D have either two donor ITY residues or have tyrosine residues from different chains. As the hTG monomer is a large protein with 2,700 residues, the RMSDs and total energies were calculated for the whole simulation length to assess if the simulated systems converged structurally and energetically (Figure S4.9-4.10 for RMSD, Figure S4.11-4.12 for total energies). For the majority of the hTG simulations, the total energy reached a plateau within the last 5 ns of the simulations as well as the RMSD time-series reported in the SI; hence, the last 5 ns of the PFAS systems were considered for analysis. Both PFAS and binding site showed no significant conformation change during this simulation period. The binding energies for each hTG/PFAS complex were calculated using end-point methods (MM-GBSA/PBSA) to estimate the relative binding strength of carboxylic and sulphonic PFAS with various fluorinated carbon chain lengths, as per Figure 1(c). Current literature indicates that the thyroid hormone synthesis in hTG can be affected negatively by the exposure to PFAS with eight to nine fluorinated carbons. 18–23 In our simulations, carboxylic PFAS showed an increase in binding strengths as the fluorinated carbons increased from PFBA to PFNA. However, this observation was different for PFCA with more than nine carbons. For PFDA, PFUnDA, PFDoDA, and PFTrDA, the binding energies were -10 and -13 kcal/mol (MM-PBSA). The binding energy analyses of PFSA are quite different than for PFCA. Among the investigated sulphonic PFAS, the strongest binding energy was observed for PFNS. Interestingly, PFBS was also among the strong binders, which was previously noted by Yao et al. 12 The binding strengths of longer-chain PFSA were also higher than their PFCA counterparts with the same fluorinated carbon chain length. These differences in binding strengths indicate that PFCA and PFSA compounds have different impacts on the binding site, and consequently, to the thyroid hormone synthesis. 84 Figure S3.2 The angle/distance distribution of ITY side chains when PFCA (a) or PFSA (b) is present in the pocket. The angles are calculated between the normal vectors of the planes, as described in (c) and Figure S4.5. The most dominant orientations for each PFAS are also shown. The horizontal and vertical line intersection indicates the angle/distance calculated from the cryo-EM orientation (6.4 Γ… and 76Β°). 85 Residue decomposition can help understand some of the energetic differences observed in Figure 1a), so they were calculated for each simulation of PFCA and PFSA compounds with the binding pocket residues, as shown in Table 1, and Table S2-S5. The pocket residues are separated into four groups based on their polarity and acidity: polar, non-polar, basic, and acidic residues. The strengths of the electrostatic interactions and van der Waals interactions made by PFAS and residues within 10 Γ… radius suggest that the basic residues showed the strongest interaction among all, specifically the K2536 had the highest interaction energy with all of the PFAS. The acidic residues mainly had weak and non-stabilizing interactions, with values larger than zero. In general, electrostatic interactions with charged residues had stronger interactions with PFCA, while PFSA molecules interacted with polar and non-polar residues in the binding site through the C-F tail. Considering the fact that PFSA showed slightly higher MM-PBSA energies, the tail group of PFAS provides stronger anchoring to surrounding residues than the head groups of PFAS. The contributions from the ITY residues were also identified, as they play a pivotal role in thyroid hormone synthesis. PFAS primarily formed stabilizing interactions with ITY residues, although these interactions were weaker than those with charged residues. Among the PFAS with highest ITY interaction energies, PFNA, PFNS, PF12SA also exhibited a high MM-PBSA energies, suggesting that ITY interactions could be the determining factor in predicting PFAS binding to Site B. The interactions between the PFAS and ITY were established between the diiodotyrosine side chains and the C-F tails (Figure S4.7), and the total contribution from ITY residues increased as the fluorinated carbon chain length increased in PFCA molecules, with the exception of PFNA and PFTrDA. This finding provides further evidence supporting the crucial role of the C-F tail in stabilizing the PFAS binding. The hydrogen bond interactions that were formed by PFAS during the simulations were also investigated and are reported in Table S6. PFAS with 8 to 10 carbons predominantly formed direct hydrogen bonds, and as the chain length increased or decreased, the number of hydrogen bonds formed by PFAS with the protein decreased. During the simulations, PFDS exhibited the highest number of interactions with pocket residues, followed by PFNS and PFBS. All three compounds formed hydrogen bonds with S430, Q431, and ITY2573 residues, which also had high interaction energies with PFDS, 86 PFNS, and PFBS. Among the PFCA compounds, the highest number of interactions were observed for PFNA. These observations further support the notion that for PFAS with certain carbon chain length, the local interactions made with the head group and, more importantly, through the C-F tail are important determinants of being the strong binders, compared to other PFAS. Furthermore, the strong interactions between C-F tail of PFSA compounds and polar/nonpolar residues allowed sulphonic PFAS to have higher binding energies than PFCA. 4.3.2 Changes in local interaction patterns upon PFAS binding Understanding how the presence of PFAS in the selected positions of the thyroglobulin protein change structural interactions, the residues located nearby PFAS were divided into three regions: Regions 1, 2, and 3 (Figure S4.4). Region 1 has a loop secondary structure, and Regions 2 and 3 have alpha helix structures, and the calculated hydrogen bond percentages are reported Table S4.7-S4.9. In Region 1, the interactions observed in the presence of PFAS were not significantly different: the apo system has interactions that were not observed in PFAS bound cases, or vice versa. For this region, the interactions did not show a distinct pattern either for head group type or the carbon chain length. On the other hand, the interactions within Region 2, clearly showed a noteworthy pattern: as the fluorinated carbon chain length increased, the number of interactions observed within the binding site increased (Table S4.8). The highest interaction percentage in apo simulations was for the S2534/A2538 residue pair, which is part of the alpha helix in Region 2, with the interaction occurring through their backbone atoms. S2534/A2538 interaction persisted in the majority of PFAS simulations, with the exception of the PFOA, PFNA, PFOS, and PFDA simulations. The rest of the dominant interactions in the apo system persisted for 25 to 35% of the simulation and were observed in the majority of PFAS simulations as well. The ITY2540/K2536 interaction persisted in 25% of the simulations in the apo system, and the interaction percentage increased as the carbon chain length of PFAS increased. A higher number of interactions among the residues in Region 2 results in a more stabilized helix-loop-helix structure in the presence of longer PFAS only. The importance of Region 2 for the thyroid hormone synthesis was observed in a recent study where the crystal structure of bovine TG was obtained after the formation of 87 T4 hormone. 17 Upon comparison of hTG and bovine TG with T4, one significant difference was observed for Region 2: to allow for T4 formation, Region 2 was shifted and three residues from the helix were unfolded and become part of the loop: S2534, S2535, and K2536. These are the residues that formed new hydrogen bonds in the presence of longer chain PFAS; in essence, the binding of longer chain PFAS triggers the formation of more interactions within Region 2, making it more rigid. Hence, by preventing the required flexibility, PFAS would be able to interfere with thyroid hormone synthesis. The interaction pattern observed for Region 3 is similar for Region 1. While the interactions observed in the apo system were protected in most PFAS-bound simulations, the percentages were generally higher in the presence of PFAS (Table S4.9). In general, however, the hydrogen bond percentages did not show significant interaction differences between the Apo system and PFAS simulations within this region. One interesting observation for Region 3 is that the orientations of PFSA compounds were usually towards the residues within this region (Figure S4.7), however, PFCA compounds showed preferences towards the Lysine residues in Region 2. This orientation preference, as will be explained in the following section, results in a characteristic distribution of distance and angles between ITY residues in the presence of PFCA and PFSA (Figure S4.5). 4.3.3 Impact of PFAS Binding on ITY orientations As the disturbance of thyroid hormone levels has been identified as one of the health con- sequences of PFAS exposure, understanding how the presence of PFAS could affect the thyroid hormone synthesis in the investigated hormonogenic site is fundamental. The proposed mechanism for T4 synthesis indicates that the acceptor and donor ITY residues should be within 6 Γ… distance and nearly be parallel to one another, based on the available cryo-EM structures of hTG 15. While the angle between the ITY planes provides insight about the respective positioning of the side chains of these hormonogenic residues, the distances between the donor and acceptor atoms are also an important feature in assessing the thyroid hormone formation. Therefore, the distance between the oxygen from the donor ITY2540 and the carbon from the acceptor ITY2573 were tracked for all simulations. The angle between the ITY side chains was also calculated, and their distributions as 88 well as the dominant orientations of residues are shown in Figure 2 The angle/distance distribution plots show that the PFCA and PFSA compounds impact the ITY orientation. The ideal positioning of ITY residues in Site B which would allow for the formation of T3 and T4 hormones have a 76Β° angle and 6 Γ… distance, based on the available cryo-EM structure of human Thyroglobulin (Figure 2). The presence of PFAS generally limited the conformational space of ITY residues, in terms of the distance and angle tracked here. The apo system has a single peak at 145Β° along with a shoulder at 120Β° with a wide distribution. The distance range of the apo simulation was observed to be between 6-14 Γ…. PFCAs had a broader distance distribution ( 3-14 Γ…), while PFSA compounds displayed a narrower one, around 6 to 12 Γ…, with the exception of PFPrS. The correla- tion between the fluorinated carbon chain length and angle, however, shows different preferences between PFCA and PFSA compounds. The smallest angle in the distribution observed for PFCA was 60Β° (small peak of PFNA), and it was 70Β° for PFSA (PF11SA). On the other hand, the largest angles observed were for PFDoDA and PFTrDA ( 140Β°), and PFPrS, PFHpS, and PF12SA ( 170Β°) among the PFSA compounds. Overall, the smallest angle/distance distribution among PFCA was observed for PFBA, PFOA, and PFUnDA, while among PFSA, it was PFBS, PFNS, and PF11SA. The two clusters formed by PFCA compounds (Fig. 3) can be distinguished by the distance threshold of 6 Γ…. Only three PFCA compounds had distances smaller than 6 Γ…: PFUnDA, PFBA, and PFOA. However, only in PFBA bound simulations, which is a weak binder, ITY residues show distance/angle distribution that would allow for the formation of thyroid hormones. On the other hand, PFOA has an average binding strength, as per MM-PBSA energies, and it has strong interactions with the ITY2573 residue. Similarly, PFUnDA has strong binding energy and strong interactions with ITY2540. The strong interactions with ITY residues could prevent them from forming T3 and T4 thyroid hormones. The other cluster seen in Figure 2(a) has large distance (8-14 Γ…) and angle (100-140Β°). PFNA, among those compounds, showed a strong peak around 120Β° with a smaller peak around 70Β°. The interesting fact about PFNA interactions that played a role in the bimodal distribution is the stronger interactions with ITY2540, instead of ITY2573, as mentioned before (Table S4.2). The interaction preference also contributes to the β€˜sandwiching’ 89 Table S3.1 The sum of per-residue decomposition energies for charged residues and polar & non-polar residues (in kcal mol -1). behavior that was seen in PFNA simulations, where PFAS places itself in-between two ITY residues (Fig. S4.7). Furthermore, the stronger MM-PBSA binding energy of PFNA can also be attributed to sandwiching interaction. Other PFCA has mainly stronger interaction with ITY2573 through their tail groups, and do not show β€˜sandwiching’ behavior. Among the PFSA species, PFNS and PF12SA had a similar interaction type where the intercalation between ITY residues happened (Fig. S4.7). In this case, however, PFNS exhibited strong interaction with ITY2573, while PF12SA had interactions with both ITY, with comparable strengths (Table S4.77). Many of the PFAS bound systems did not show any distances and/or angles closer to those observed in cryo-EM structure, except for PFUnDA, which had a 80Β° angle and 5 Γ… distance. For the PFSA, no system had values close to those of the experimental structure, indicating that the presence of various PFAS near hormonogenic site B can prevent the conformational space that would allow the formation of thyroid hormones. Furthermore, based on our analysis of the investigated PFAS-bound systems, the degree in which the PFAS can impact this conformational space depends on (i) the interaction mode of PFAS with the surrounding residues, including ITYs, (ii) the length of the tail group of PFAS, and (iii) the hydrogen bond interactions of head group of PFAS. The binding energies indicate that PFSA molecules have stronger interactions with the investi- gated site than with PFCA compounds, as shown in Figure 1(c). Furthermore, there is a chain-length dependent impact on the binding strength, although this dependence is not completely linear. As the chain length increased from three to eight or nine fluorinated carbons (PFNA and PFNS, respec- 90 tively), the binding energies showed a linear increase. And as the chain gets longer than eight or nine carbons, however, there is a drop in the binding strength, indicating that PFAS with eight and nine carbons can impact the hTG Site B by binding more strongly than shorter chain PFAS and forming key interactions with surrounding residues. 7,12,18,19,31 A 2023 study by Vollmar et al. suggests that PFOS and PFOA have the potential to disrupt the T4 levels. 51 Our study for the first time shows that the disruption by PFAS occurs through binding to the hTG protein and, thus, interfering with the thyroid hormone synthesis. The presence of PFAS, overall, causes the conformational space of the distance and angle between two ITY residues to narrow, as compared to the distribution observed for the apo system. While ITY residues do require the Thyroid Peroxidase (TPO) enzyme to form the thyroid hormones through a mechanism that is still unknown, the proximity and relative orientation of ITY residues are still important for successfully producing T3/T4 hormones. 15,16,52 The majority of PFAS-bound systems did not show distance and angle distribution was close to cryo-EM structure. Moreover, the influence of PFCA compounds on the conformational space of ITY residues suggests a wider range of distances compared to of PFSA molecules, pointing that these two series formed interactions with the different residues. While PFCA head groups prefer to orient towards ITY2540, PFSA compounds pointed towards the loop structure near binding site. The interactions of PFAS also impacted the distance and angle distributions significantly. PFNA and PFNS, for instance, showed a particular β€˜sandwiching’ behavior between two ITY residues, that was not observed for any other PFAS investigated. These two PFAS also had strong hydrogen bonds with the surrounding residues. Together, these different types of interactions lead them to have strong binding energies, and consequently, have more pronounced adverse effects on thyroid hormone synthesis on Site B. The local interaction changes within the binding area indicate that the longer chain PFAS could lead to more rigid helix structure in Region 2. A comparison with a recent cryo-EM structure of the bovine TG with T4 formed in Site B shows that there is a shift in Region 2 associated with the formation of the thyroid hormone. 18 Therefore, for the first time, we suggest that the changes to the hydrogen bond network within Region 2 upon long-chain PFAS binding could inhibit the required motion for the formation of thyroid hormones. As the linkages 91 between the PFAS exposure and health problems are increasing and the governments in both the United States and the European Union are proposing restrictions on PFAS production due to these adverse health effects, a detailed molecular understanding of PFAS toxicity through computational methods is necessary to establish effective mitigation strategies. To the best of our knowledge, this work is the first of its kind to investigate the influence of PFAS binding to Site B of hTG and the potential impact of PFAS on thyroid hormone synthesis by causing rigidity in binding region. We observed that PFAS with eight to nine carbons with a distinct binding mode showed higher binding energies. The longer chain PFAS, on the other hand, resulted in a change in the rigidity of Region 2, which is important for thyroid hormone synthesis. Understanding these governing factors of PFAS toxicity on thyroid hormone synthesis would help enable the development of effective mitigation strategies and understand harmful influences of PFAS in humans better. 92 BIBLIOGRAPHY [1] Sajid, M. and Ilyas, M. (2017). Ptfe-coated non-stick cookware and toxicity concerns: a perspective. Environmental Science and Pollution Research, 24:23436–23440. [2] Schaider, L. A., Balan, S. A., Blum, A., Andrews, D. Q., Strynar, M. J., Dickinson, M. E., Lunderberg, D. M., Lang, J. R., and Peaslee, G. F. (2017). Fluorinated compounds in u.s. fast food packaging. Environmental Science & Technology Letters, 4:105–111. [3] Rao, N. S. and Baker, B. E. (1994). Textile finishes and fluorosurfactants, pages 321–338. Springer US. [EPA] Us environmental protection agency epa’s per- and polyfluoroalkyl substances (pfas) action plan 2019 no. february. [5] Houck, K. A., Patlewicz, G., Richard, A. M., Williams, A. J., Shobair, M. A., Smeltz, M., Clifton, M. S., Wetmore, B., Medvedev, A., and Makarov, S. (2021). Bioactivity profiling of per- and polyfluoroalkyl substances (pfas) identifies potential toxicity pathways related to molecular structure. Toxicology, 457:152789. [6] Sunderland, E. M., Hu, X. C., Dassuncao, C., Tokranov, A. K., Wagner, C. C., and Allen, J. G. (2019). A review of the pathways of human exposure to poly- and perfluoroalkyl sub- stances (pfass) and present understanding of health effects. Journal of Exposure Science & Environmental Epidemiology, 29:131–147. [7] Rappazzo, K., Coffman, E., and Hines, E. (2017). Exposure to perfluorinated alkyl sub- stances and health outcomes in children: a systematic review of the epidemiologic literature. International Journal of Environmental Research and Public Health, 14:691. [8] Duan, X., Sun, W., Sun, H., and Zhang, L. (2021). Perfluorooctane sulfonate continual exposure impairs glucose-stimulated insulin secretion via sirt1-induced upregulation of ucp2 expression. Environmental Pollution, 278:116840. [9] Anderko, L. and Pennea, E. (2020). Exposures to per-and polyfluoroalkyl substances (pfas): po- tential risks to reproductive and children’s health. Current Problems in Pediatric and Adolescent Health Care, 50:100760. [10] Guo, H., Chen, J., Zhang, H., Yao, J., Sheng, N., Li, Q., Guo, Y., Wu, C., Xie, W., and Dai, J. (2022). Exposure to genx and its novel analogs disrupts hepatic bile acid metabolism in male mice. Environmental Science & Technology, 56:6133–6143. [11] Coperchini, F., Croce, L., Ricci, G., Magri, F., Rotondi, M., Imbriani, M., and Chiovato, L. (2021). Thyroid disrupting effects of old and new generation pfas. Frontiers in Endocrinology, 11:1077. 93 [12] Yao, Q., Vinturache, A., Lei, X., Wang, Z., Pan, C., Shi, R., Yuan, T., Gao, Y., and Tian, Y. (2022). Prenatal exposure to per- and polyfluoroalkyl substances, fetal thyroid hormones, and infant neurodevelopment. Environmental Research, 206:112561. [13] Byrne, S. C., Miller, P., Seguinot-Medina, S., Waghiyi, V., Buck, C. L., von Hippel, F. A., and Carpenter, D. O. (2018). Exposure to perfluoroalkyl substances and associations with serum thyroid hormones in a remote population of alaska natives. Environmental Research, 166:537–543. [14] Luo, Y., Ishido, Y., Hiroi, N., Ishii, N., and Suzuki, K. (2014). The emerging roles of thyroglobulin. Advances in Endocrinology, 2014:1–7. [15] Coscia, F., Taler-Verčič, A., Chang, V. T., Sinn, L., O’Reilly, F. J., IzorΓ©, T., Renko, M., Berger, I., Rappsilber, J., Turk, D., and LΓΆwe, J. (2020). The structure of human thyroglobulin. Nature, 578:627–630. [16] ul Kim, H., Jeong, H., Chung, J. M., Jeoung, D., Hyun, J., and Jung, H. S. (2022). Comparative analysis of human and bovine thyroglobulin structures. Journal of Analytical Science and Technology, 13:1–8. [17] Marechal, N., Serrano, B. P., Zhang, X., and Weitz, C. J. (2022). Formation of thyroid hormone revealed by a cryo-em structure of native bovine thyroglobulin. Nature Communications, 13:1–7. [18] Wang, Y., Rogan, W. J., Chen, P. C., Lien, G. W., Chen, H. Y., Tseng, Y. C., Longnecker, M. P., and Wang, S. L. (2014). Association between maternal serum perfluoroalkyl substances during pregnancy and maternal and cord thyroid hormones: Taiwan maternal and infant cohort study. Environmental Health Perspectives, 122:529–534. [19] Webster, G. M., Venners, S. A., Mattman, A., and Martin, J. W. (2014). Associations between perfluoroalkyl acids (pfass) and maternal thyroid hormones in early pregnancy: a population- based cohort study. Environmental Research, 133:338–347. [20] Lewis, R. C., Johns, L. E., and Meeker, J. D. (2015). Serum biomarkers of exposure to perfluoroalkyl substances in relation to serum testosterone and measures of thyroid function among adults and adolescents from nhanes 2011–2012. International Journal of Environmental Research and Public Health 2015, Vol. 12, Pages 6098-6114, 12:6098–6114. [21] Lopez-Espinosa, M. J., Fitz-Simon, N., Bloom, M. S., Calafat, A. M., and Fletcher, T. (2012). Comparison between free serum thyroxine levels, measured by analog and dialysis methods, in the presence of perfluorooctane sulfonate and perfluorooctanoate. Reproductive Toxicology, 33:552–555. [22] Wang, Y., Starling, A. P., Haug, L. S., Eggesbo, M., Becher, G., Thomsen, C., Travlos, G., King, D., Hoppin, J. A., Rogan, W. J., and Longnecker, M. P. (2013). Association between perfluoroalkyl substances and thyroid stimulating hormone among pregnant women: a cross- 94 sectional study. Environmental Health: A Global Access Science Source, 12:1–7. [23] Kim, S., Choi, K., Ji, K., Seo, J., Kho, Y., Park, J., Kim, S., Park, S., Hwang, I., Jeon, J., Yang, H., and Giesy, J. P. (2011). Trans-placental transfer of thirteen perfluorinated compounds and relations with fetal thyroid hormones. Environmental Science and Technology, 45:7465–7472. [24] Yu, W. G., Liu, W., and Jin, Y. H. (2009). Effects of perfluorooctane sulfonate on rat thyroid hormone biosynthesis and metabolism. Environmental Toxicology and Chemistry, 28:990–996. [25] Boas, M., Feldt-Rasmussen, U., and Main, K. M. (2012). Thyroid effects of endocrine disrupting chemicals. Molecular and Cellular Endocrinology, 355:240–248. [26] Liu, Y., Wang, J., Fang, X., Zhang, H., and Dai, J. (2011). The thyroid-disrupting effects of long-term perfluorononanoate exposure on zebrafish (danio rerio). Ecotoxicology, 20:47–55. [27] Buckalew, A. R., Wang, J., Murr, A. S., Deisenroth, C., Stewart, W. M., Stoker, T. E., and Laws, S. C. (2020). Evaluation of potential sodium-iodide symporter (nis) inhibitors using a secondary fischer rat thyroid follicular cell (frtl-5) radioactive iodide uptake (raiu) assay. Archives of Toxicology, 94:873–885. [28] Conti, A., Strazzeri, C., and Rhoden, K. J. (2020). Perfluorooctane sulfonic acid, a persistent organic pollutant, inhibits iodide accumulation by thyroid follicular cells in vitro. Molecular and Cellular Endocrinology, 515:110922. [29] Du, G., Hu, J., Huang, H., Qin, Y., Han, X., Wu, D., Song, L., Xia, Y., and Wang, X. (2013). Perfluorooctane sulfonate (pfos) affects hormone receptor activity, steroidogenesis, and expression of endocrine-related genes in vitro and in vivo. Environmental Toxicology and Chemistry, 32:353–360. [30] Spachmo, B. and Arukwe, A. (2012). Endocrine and developmental effects in atlantic salmon (salmo salar) exposed to perfluorooctane sulfonic or perfluorooctane carboxylic acids. Aquatic Toxicology, 108:112–124. [31] Ballesteros, V., Costa, O., IΓ±iguez, C., Fletcher, T., Ballester, F., and Lopez-Espinosa, M. J. (2017). Exposure to perfluoroalkyl substances and thyroid function in pregnant women and children: a systematic review of epidemiologic studies. Environment International, 99:15–28. [32] Manera, M., Castaldelli, G., and Giari, L. (2022). Perfluorooctanoic acid affects thyroid follicles in common carp (cyprinus carpio). International Journal of Environmental Research and Public Health 2022, Vol. 19, Page 9049, 19:9049. [33] Yang, J. and Zhang, Y. (2015). I-tasser server: new development for protein structure and function predictions. Nucleic acids research, 43:W174–W181. [34] (2022). Molecular operating environment (moe), 2022.02 chemical computing group ulc, 95 1010 sherbooke st. west, suite 910, montreal, qc, canada, h3a 2r7. [35] (2020). Amber 2020. [36] He, X., Man, V. H., Yang, W., Lee, T.-S., and Wang, J. (2020). A fast and high-quality charge model for the next generation general amber force field. The Journal of Chemical Physics, 153:114502. [37] Wang, J., Wolf, R. M., Caldwell, J. W., Kollman, P. A., and Case, D. A. (2004). Development and testing of a general amber force field. Journal of Computational Chemistry, 25:1157–1174. [38] Dopke, M. F., Moultos, O. A., and Hartkamp, R. (2020). On the transferability of ion parameters to the tip4p/2005 water model using molecular dynamics simulations. The Journal of Chemical Physics, 152:024501. [39] Horn, H. W., Swope, W. C., Pitera, J. W., Madura, J. D., Dick, T. J., Hura, G. L., and Head-Gordon, T. (2004). Development of an improved four-site water model for biomolecular simulations: Tip4p-ew. The Journal of Chemical Physics, 120:9665–9678. [40] Ryckaert, J.-P., Ciccotti, G., and Berendsen, H. J. (1977). Numerical integration of the cartesian equations of motion of a system with constraints: molecular dynamics of n-alkanes. Journal of Computational Physics, 23:327–341. [41] Wu, X., Brooks, B. R., and Vanden-EΔ³nden, E. (2016). Self-guided langevin dynamics via generalized langevin equation. Journal of Computational Chemistry, 37:595–601. [42] Berendsen, H. J., Postma, J. P., Gunsteren, W. F. V., Dinola, A., and Haak, J. R. (1984). Molecular dynamics with coupling to an external bath. The Journal of Chemical Physics, 81:3684–3690. [43] Mermelstein, D. J., Lin, C., Nelson, G., Kretsch, R., McCammon, J. A., and Walker, R. C. (2018). Fast and flexible gpu accelerated binding free energy calculations within the amber molecular dynamics package. Journal of Computational Chemistry, 39:1354–1358. [44] Onufriev, A., Bashford, D., and Case, D. A. (2004). Exploring protein native states and large- scale conformational changes with a modified generalized born model. Proteins: Structure, Function and Genetics, 55:383–394. [45] Almeida, N. M. S., Eken, Y., and Wilson, A. K. (2021). Binding of per- and polyfluoro-alkyl substances to peroxisome proliferator-activated receptor gamma. ACS Omega, 6:15103–15114. [46] Lai, T. T., Eken, Y., and Wilson, A. K. (2020). Binding of per- and polyfluoroalkyl substances to the human pregnane x receptor. Environmental Science & Technology, 54:15986–15995. [47] Bali, S. K., Marion, A., Ugur, I., Dikmenli, A. K., Catak, S., and Aviyente, V. (2018). 96 Activity of topotecan toward the dna/topoisomerase i complex: a theoretical rationalization. Biochemistry, 57:1542–1551. [48] Findik, B. K., Cilesiz, U., Bali, S. K., Atilgan, C., Aviyente, V., and Dedeoglu, B. (2022). Investigation of iron release from the n- and c-lobes of human serum transferrin by quantum chemical calculations. Organic & Biomolecular Chemistry, 20:8766–8774. [49] Bali, S. K., Haslak, Z. P., Cifci, G., and Aviyente, V. (2023). Dna preference of indenoiso- quinolines: a computational approach. Organic & Biomolecular Chemistry, 21:4518–4528. [50] Roe, D. R. and Cheatham, T. E. (2013). Ptraj and cpptraj: software for processing and analysis of molecular dynamics trajectory data. Journal of Chemical Theory and Computation, 9:3084–3095. [51] Vollmar, A. K. R., Lin, E. Z., Nason, S. L., Santiago, K., Johnson, C. H., Ma, X., Pollitt, K. J. G., and Deziel, N. C. (2023). Per- and polyfluoroalkyl substances (pfas) and thyroid hormone measurements in dried blood spots and neonatal characteristics: a pilot study. Journal of Exposure Science & Environmental Epidemiology 2023 33:5, 33:737–747. [52] Kim, K., Kopylov, M., Bobe, D., Kelley, K., Eng, E. T., Arvan, P., and Clarke, O. B. (2021). The structure of natively iodinated bovine thyroglobulin. Acta Crystallographica Section D: Structural Biology, 77:1451–1459. 97 APPENDIX A SUPPORTING TABLES Table S4.1 The list of PFAS used in this work, the total number of fluorinated carbons, and their 2D structures are shown. 98 Table S4.2 Average per-residue decomposition energies for each PFAS simulation of non-polar pocket residues along with ITY residues. Table S4.3 Average per-residue decomposition energies for each PFAS simulation of polar pocket residues. Table S4.4 Average per-residue decomposition energies for each PFAS simulation of basic pocket residues. 99 Table S4.5 Average per-residue decomposition energies for each PFAS simulation of acidic pocket residues. Table S4.6 Hydrogen bond percentages of PFAS head group oxygen atoms. Res. ID: Residue ID of the amino acids. 100 Table S4.7 Average hydrogen bond %100 fractions of Region 1. From left to right, the fluorinated carbon chain length increases. 101 Table S4.8 Average hydrogen bond %100 fractions of Region 2. From left to right, the fluorinated carbon chain length increases. 102 Table S4.9 Average hydrogen bond %100 fractions of Region 3. From left to right, the fluorinated carbon chain length increases. 103 Table S4.10 The MM-GBSA/PBSA binding energies of PFBA and PFPA with dimer hTG protein. The compounds did not form strong interactions with the binding site residues and did not reside within the region. These systems were simulated for 10 ns in two different poses, and their average is reported here. 104 APPENDIX B SUPPORTING FIGURES Figure S4.1 Formation of T4 by hormonogenic Tyrosine residues. Iodine is shown with yellow spheres. Figure S4.2 Per-residue RMSF plot of first simulation set. 105 Figure S4.3 Per-residue RMSF plot of first simulation set. Figure S4.4 The regions for which the hydrogen bond patterns were investigated. Region 1 is shown in blue and includes S675, Q676, P677, A678, G679, and S680 residues. Region 2 is shown in green and includes V2523, K2524, Q2525, F2526, E2527, E2528, S2529, R2530, G2531, R2532, T2533, S2534, S2535, K2536, T2537, A2538, F2539, and ITY2540. Region 3 is depicted in pink loop representation and has the following residues: H2568, S2569, T2570, D2571, D2572, ITY2573, A2574, S2575, F2576, S2577, and R2578. The rest of the Thyroglobulin protein is shown as cartoon in grey color. PFOA is shown in wire representation, and ITY residues are shown in stick representation. 106 Figure S4.5 Representation of angle and distance measurements between ITY2540 and ITY2573. The residues are shown in stick representation, and the planes, shown as disks, are created by considering the side chain ring atoms. The normal of planes are shown as sticks. The distance between the OH atom of ITY2540 and CB atom of ITY2573 is shown with a dashed black arrow. CG atom is indicated by asterisk (*) on ITY2573, for reference. The structure of ITY residues is taken from the cryo-EM structure (PDB ID: 6SCJ). The distance calculated for the structure is 6.4 and the angle is 76 degrees. Figure S4.6 The kernel density plot of the distribution of calculated angles between the ITY residues. Above: PFCA compounds, below: PFSA compounds. Apo system is shown in solid red line in both plots. 107 Figure S4.7 The dominant orientations of ITY residues and PFAS compounds, extracted by clustering the last 5ns of the simulations. The key residues that have either the highest/lowest interaction with PFAS or that make hydrogen bonds with PFAS were shown in stick representation. The coloring of the secondary structures was based on the scheme shown in Figure S4.4. PFCA: per-fluoroalkyl carboxylic acid, PFSA: per-fluoroalkyl sulphonic acid. The figure is obtained using Chimera. 108 Figure S4.7 (cont’d) 109 Figure S4.8 Surface Area calculations of ITY residues in the presence of PFAS. 110 Figure S4.9 The RMSD time-series of apo system and carboxylic acid PFAS simulations. Figure S4.10 The RMSD time-series of apo system and sulphonic acid PFAS simulations. 111 Figure S4.11 The total energy of carboxylic acid PFAS simulations. Figure S4.12 The total energy of sulphonic acid PFAS simulations. 112 CHAPTER 5 FISHING FOR ANSWERS: DIFFERENT BINDING MODES OF PFAS TARGETING RAINBOW TROUT ESTROGEN RECEPTORS 113 5.1 Introduction Per- and polyfluoroalkyl substances (PFAS) are a class of synthetic organic fluorinated chemicals that were first created in the 1940s and quickly gained popularity in consumer and industrial products, such as food packaging, nonstick cookware, and water- and stain-proof textiles, due to their desirable water and oil repellent properties 1. The high stability of PFAS in a variety of environments and their resistance to heat and degradation has resulted in their use in applications including firefighting foams. 2–4 In fact, PFAS are widely referred to as β€˜forever chemicals’ or β€˜zombie chemicals’ due to their high stability and resistance to degradation in the environment. As a direct consequence of this feature, PFAS show high levels of accumulation in water, soil, and living organisms, including humans. 5 The most well-known PFAS, perfluorooctanoic acid (PFOA) and perfluorooctane sulfonic acid (PFOS), were phased out of production by U.S. manufacturers in the mid-2000’s and in 2022, were proposed to be considered as hazardous by the U.S. Environmental Protection Agency (EPA) under the Comprehensive Environmental Response, Compensation, and Liability Act (CERCLA) as they present substantial danger to human health. EPA However, though banned for some time, PFOA and PFOS still persist in living organisms and in the environment, including the Great Lakes. 7,8 Despite the widespread use of PFAS over the past 70 years, only in the past decade have the health and environmental impacts of PFAS been widely studied. PFAS exposure in humans was shown to be linked to health problems, including high cholesterol levels, thyroid problems, certain types of cancers, and disruptions of the endocrine system. 9–15 According to the Public Health and Safety Organization, drinking PFAS-contaminated water may result in developmental problems in embryos of pregnant women. 16–19 Prior studies have shown that a major biological implication of the presence of PFAS in blood serum is the activation of certain nuclear receptors, such as Pregnane X Receptor. 9,10,14,20 A recent in vitro study testing for PFAS activation of human peroxisome proliferator-activated receptor 𝛼 (PPAR 𝛼), peroxisome proliferator-activated receptor- 𝛾 (PPAR 𝛾), and estrogen receptors (ER) indicated that multiple PFAS, both legacy and new, can result in activation of PPAR 𝛼, PPAR 𝛾, and ER at certain concentrations. 21 Due to their roles in regulation of growth and lipid metabolism, the premature activation of these proteins can 114 have adverse effects on hormonal regulation and lipid metabolism. Though the direct mechanisms of PFAS toxicity have not been fully elucidated, the existing literature mentions several adverse effects that they have on human health through various nuclear receptors. 22,23 As PFAS contamination has become a growing public health concern, attention has turned to ecological areas of great significance, including the Great Lakes. The levels of PFAS contamination among the Great Lakes does vary – e.g., the northern lakes, such as Lake Superior, have the lowest concentrations of PFAS, while the highest concentrations were found in Lake Erie and Ontario, which lie in close proximity to areas of high industrial activity. 7,24–36 Furthermore, PFOS was the most common contaminant found to bioaccumulate in Great Lakes fish, such as lake trout, due to years of prior widespread use of PFOS. However, the bioaccumulation potential of PFAS was observed to vary based on the functional group and the carbon chain length. 7? In terms of PFAS exposure to estrogen receptors in fish species, there are a limited number of studies. 37–44 A recent in vivo study highlights that several PFAS, including FC8-diol and HFPO-DA, exhibited varying levels of estrogenic activities in Fathead minnows. 45 In another study on the effect of PFAS on zebra fish, it was found that PFDA or PFTrDA can modulate the sex hormone balance by altering the steroidogenesis in a sex-dependent way; in male zebra fish, estradiol concentrations significantly increased upon PFAS exposure, but no such increase was observed for females. 46 Furthermore, a mixture of PFOS, PFNA, PFBA and PFOA was shown to cause an increase in endocrine-disruption biomarker levels, which was hypothesized to occur through either estrogen receptor binding and/or induction of estrogen expression. 47 In Tilapia, the exposure to PFOS, PFOA, and FTOHs resulted in anti- estrogenic activities in the presence of Estradiol. 41 However, despite the accumulating evidence on PFAS toxicity, the molecular-level details of the PFAS exposure in Estrogen receptors is not fully understood. The two subtypes of ERs, Estrogen receptor alpha and Estrogen receptor beta, have distinct roles in mammals and other vertebrates, including fish. In mammals, it is known that Estrogen receptor alpha is dominantly present in reproductive, bone, liver, and breast tissues, and involved in the development of secondary sexual characteristics; Estrogen receptor beta is found in the central nervous system, the immune system, and the cardiovascular system, playing an important 115 role in cardiac function. 48,49 In rainbow trout, on the other hand, ER alpha is found dominantly in the testes, liver, and spleen, and the ER beta was expressed more prominently in the kidney and liver. 50 Moreover, it has been shown that these two ERs also have different affinities towards the natural ligand, estradiol, bringing the question of whether the impact of PFAS exposure would affect the ER alpha and ER beta differently. 50–52 Given the pivotal involvement of ERs in essential processes, exploring how PFAS may interfere with the functions of ERs in fish is crucial for elucidating the endocrine-disrupting impact of these substances on aquatic organisms and the ecosystem. In this work, due to significant presence of PFAS contaminants in critical environmental areas including the Great Lakes, detailed insight into PFAS binding and toxicity towards the two Estrogen receptor subtypes, rainbow trout Estrogen receptor alpha (rER 𝛼) and Estrogen receptor beta (rER 𝛽), will be obtained about rainbow trout using molecular dynamics (MD) simulations and structural analysis. As a predatory fish, rainbow trout consume other organisms which makes them increasingly susceptible to higher doses of PFAS exposure and potential harmful effects. Understanding the impact of PFAS exposure on Estrogen receptors specifically, which are responsible for not only reproductive systems of fish but also many other important physiological functions including the immune system, enables a more comprehensive view of the impact of PFAS on the endocrine system, and may lead to the development of in vivo mitigation strategies. The observations from this work can also provide more insight on how endocrine disruption through ERs can occur in humans as well. 5.2 Computational Protocols 5.2.1 System preparation and docking protocols The rER 𝛼 and rER 𝛽 sequences of rainbow trout (sp. Oncorhynchus mykiss) were obtained from the UniProt database with the accession numbers P16058 and P57782, respectively. 53I- TASSER server was used for homology modeling of the protein structures. 54–56 The resulting structures were overlapped with the human ER 𝛼 (PDB ID: 1G50) and ER 𝛽 (PDB ID: 2J7X) structures co-crystallized with 17-𝛽-Estradiol (E2) structures to determine the binding pocket residues in the ligand binding pockets. 57,2j7 Molecular Operating Environment (MOE) was used 116 Figure S3.1 a. Sequence alignment of ligand binding domains of fish (rER 𝛼, rER 𝛽) and human (hER 𝛼, hER 𝛽) Estrogen receptors. The R407(rER 𝛼)/R273(rER 𝛽) residue used for the pharmacophore modeling is indicated with a black arrow. The blue arrow shows the mutated residue that causes the conformation change for R407(rER 𝛼)/R273(rER 𝛽) in rainbow trout estrogen receptors: A339/E205. Green arrows indicate the pocket residues that are not conserved between hER 𝛼 and hER 𝛽 four proteins: L384/M291 and M421/I328, for hER 𝛼 and hER 𝛽, respectively. b. The ClustalW percent identity matrix for the multiple sequence alignment of rER 𝛼, rER 𝛽, hER 𝛼, and hER 𝛽 ligand binding domains ( LBDs).76,77 c. Superimposed structure of ligand binding domains of rER 𝛼 (pink) and rER 𝛽 (blue). R408(rER 𝛼)/R274(rER 𝛽) residue is shownwith van der Waals surface representation. d. The MM-PBSA/GBSA binding energies for rER 𝛼 LBD are reported. e. The MM-PBSA/GBSA binding energies for rER 𝛽 LBD are reported. f. The correlation between the experimental IC50 values and the calculated binding energies with MM-PBSA and MM-GBSA methods for PFAS bound to rER 𝛼 LBD. 37 Kendall’s tau is 0.66 for both correlation plots. The values of binding energies and the standard deviations are provided in Table S1. 117 for the minimization of homology models, determination of protonation states, and for docking procedures.59,60 59,60 rER 𝛼 and rER 𝛽 models were minimized with the AMBER10:Extended Huckel Theory (EHT) forcefield where Amber ff10 was used for the protein structure. 61–63 Once the binding pocket residues were identified by overlapping the homology models with human proteins, the selected PFAS compounds (Table S1) were docked using a pharmacophore approach that places the negatively charged head group of PFASs near R407 (rER 𝛼)/R273 (rER 𝛽) residues. R407 (rER 𝛼)/R273 (rER 𝛽) residues (R394 in hER 𝛼, R301 in hER 𝛽) is known to orient towards the OH group of the E2 ligand, as seen in the crystal structures, therefore it was selected for the orientation of the PFASs within the pocket. 57,2j7The pharmacophore approach was used during the initial placement process with a London dG scoring function to obtain 100 poses.64 The further refinement was performed with an induced fit method and Generalized Born Volume integral/Weighted Surface Area (GBVI/WSA) scoring function, and the top 10 poses were reported. 59,64The highest scoring pose for each PFAS were selected for Molecular Dynamics (MD) simulations. 5.2.2 Simulation details AM1-BCC partial charges were calculated using antechamber module of Amber18/AmberTools20 using Generalized Amber Force Field (gaff2) to obtain the partial charges for the PFAS molecules. 65,62,66 The simulation boxes for each system were generated using the tleap module. 65 ff14SB, gaff2, and TIP4PEW force fields were selected for protein, PFAS, and water molecules, respectively. 62,67–69 Each system was neutralized using 0.1 M NaCl salt. 70 The minimization and heating steps were per- formed in a stepwise fashion as follows: (i) The systems were minimized with restraints (500, 200, 20, 10, 5, 0 kcal mol-1 Γ…-2). (ii) Heating from 100 K to 283.15 K was performed in 30 ps. (iii)The systems were equilibrated for 100 ps at 283.15 K. The production step consisted of a 30 ns long equilibrium MD simulation using 2 fs timestep at 300 K and 1 atm. A set of duplicate simulations were also performed by randomizing the initial velocities after the heating step. The temperature and pressure during the simulations were controlled by Langevin thermostat and isotropic position scaling. The bonds involving hydrogen atoms were constrained using the SHAKE algorithm. 71 The MD simulations were performed using AMBER 2018 with pmemd.cuda module. 65 118 5.2.3 Analysis of the trajectories Molecular Mechanics - Generalized Born Surface Area and Molecular Mechanics - Pois- son Boltzmann Surface Area (MM-GBSA/PBSA) methods were used to estimate the binding strength of PFASs in rER 𝛼 and rER 𝛽 ligand binding domains (LBDs), as implemented in Am- ber18/AmberTools20. 65 The binding energies were calculated for the last 1 ns of the simulations, including the duplicate simulations, and averaged for each PFAS. Root mean square distances (RMSD) and per-residue root mean square fluctuations (RMSF) were calculated with default set- tings as implemented in cpptraj module of AmberTools20. 72 Hydrogen bonds between the PFASs and the binding pocket residues were analyzed using cpptraj as well. The last 5 ns of the simulations were clustered using k-means clustering algorithm to obtain a representative frame. All time-series data were plotted using Python’s matplotlib library, and the figures were obtained using UCSF Chimera 1.13.1 and MOE 2022.02. 59,73 5.3 Results and Discussion 5.3.1 Stability of Investigated Complexes The common Arginine residue used in pharmacophore docking has been identified based on the existing co-crystal structures of human ER 𝛼 and ER 𝛽 LBDs. The docking of E2 ligand yielded a pose similar to what was observed in human Estrogen receptors, and the charged head group of PFAS positioned near the side chain of arginine during the docking. 55,56 These poses yielded a comparable starting point for the PFAS simulations. In order to draw meaningful conclusions from our simulations, assessing both structural and energetic stability of the simulated systems is important. The structural stabilities of the systems were measured by RMSD analysis, as per Figure S2-S5, and this analysis indicated that the RMSD of simulations reached a plateau during the last 5 ns of the simulations. Similarly, the time-series data of the total energies that were tracked and reported in Figure S6-S7 indicated the systems reach an energetically equilibrated state during the first 10 ns and continued to stay stable until the end of the simulations. Therefore, only the last 5 ns of each simulation were considered for further analysis. 119 5.3.2 Binding strength and modes of PFAS The sequence comparison of rER 𝛼 and rER 𝛽 from rainbow trout indicated that the two sequences have 50% identity for the whole sequence, and 58% identity for LBDs only. 50–52 In Figure 1, the sequence overlap and the structural superposition of both LBDs is shown. The binding pocket and the surrounding regions of both proteins are highly similar to each other, except for the residues shown and mutations listed in Figure S1. While some mutations do not change the chemical nature of the amino acid, others such as Glu to Gly or Lys to Met can cause changes in either the charge or polarity of the sidechains, which, in turn, can impact the binding strength as well as the binding mode of the investigated PFAS. While these mutations do not seem to alter the volume of the binding pockets significantly, 85 Γ…3 and 92 Γ…3 for rER 𝛼 and rER 𝛽, respectively, the orientations of pocket residues including R407/R273 is highly impacted. The obtained MM- PBSA/GBSA binding energies for the E2 and PFAS are reported in Figure 1(d) and 1(e) for rER 𝛼 and rER 𝛽, respectively. The prediction powers of MM-PBSA/GBSA approaches for PFAS binding to the rER 𝛼 protein was analyzed by comparing the calculated binding affinities to experimentally available half-maximal inhibitory concentration (IC50) values from studies by Benninghoff et al., and reported in Figure 1(e), for both MM-PBSA and MM-GBSA values for PFHxA, PFHpA, PFOA, PFNA, PFDA, PFUnA, PFDoA, and PFOS. 37 A strong correlation between the experimental IC50 values and calculated binding affinities of MM-PBSA and MM-GBSA methods were observed with coefficients of determination (R2) of 0.70 and 0.73, respectively. While the MM-GBSA method has a slightly higher R2, PFDA, PFNA, and E2 were outliers based on the correlation plot. Meanwhile, the MM-PBSA approach with 0.70 R2 only had two outliers, PFDA and PFHxA, and the binding strength of E2 was consistent with the IC50 value, prompting the consideration of MM-PBSA binding energy values for further analysis. Prior studies indicate that the binding affinity of the natural agonist E2 differs between the two subtypes. 51,52,74 Our calculations show that the binding energy differences between rER 𝛼 and rER 𝛽 slightly favor the E2 binding for rER 𝛼 protein, as shown in Figure 1. For both subtypes, E2 is among the strongest binders, indicating that the preference for the natural ligand is higher than for PFAS. Still, especially for rER 𝛼, the 120 predicted binding strength of certain PFAS, such as PFOS, PFOSA-AcOH, and Et-PFOSA-AcOH, were observed to be strong. PFDA and PFDoA for rER 𝛽 protein were among the strongest binders after the E2 ligand. Weaker binding PFAS, however, were common between the two subtypes. PFBS, PFPeA, and PFHxA were the top three weak binders in rER 𝛼, and PFHxA, PFPeA, and PFOA were predicted to have the weakest binding energies in rER 𝛽 (Figure 1). Interestingly, for carboxylic PFAS, having a higher number of fluorinated carbons resulted in a stronger binding energy when binding to both isoformsn.Still, due to the limited size of binding pockets, there was no increase in binding affinities after PFNA/PFDA when binding to rER 𝛼 and rER 𝛽, respectively. For the PFAS with sulphonic acid group, however, only the binding energies with rER 𝛼 pointed a relationship between the binding strength and chain length. The binding energies with rER 𝛽, on the other hand, showed no correlation with the length of the fluorinated carbon chains for sulphonic PFAS. Another interesting observation was related to the PFAS head group type and its relation to the binding strength. The type of head group of the PFAS impacted the binding strength, as shown in the comparison of the PFOS, PFOSA, PFOSA-AcOH, and Et-PFOSA-AcOH molecules. All of these compounds have eight fluorinated carbons, with different head groups (Table S1). However, the predicted binding strengths for rER 𝛼 protein are different, pointing to the importance of the head groups and the interactions they are forming. The aforementioned four PFAS have stronger binding affinities than the majority of carboxylic PFAS for rER 𝛼 LBD, but their affinities towards rER 𝛽 protein were on par with those of long-chain carboxylic PFAS, such as PFDA, due to the mutations in the binding site, hence forming different interactions. These differences in binding affinities highlight the fact that binding free energies of PFAS compounds depend on (i) the carbon chain length, which is limited by the pocket size, and (ii) the type of the functional group. For the specific case of binding to rER 𝛼 and rER 𝛽 LBDs, overall binding affinity of carboxylic PFAS is quite comparable between two subtypes while the sulphonic and sulphonamide PFAS showed strong affinity towards rER 𝛼. 37 121 Table S3.1 List of residues with largest energy contribution to the binding of E2 and PFAS. Compound Name E2 PFPeA PFHxA PFHpA PFOA PFNA PFDA PFUnA PFDoA PFBS PFHxS PFOS PFOSA PFOSA-AcOH Et-PFOSA-AcOH 8:2 FTOH GenX Compound Name E2 PFPeA PFHxA PFHpA PFOA PFNA PFDA PFUnA PFDoA PFBS PFHxS PFOS PFOSA PFOSA-AcOH Et-PFOSA-AcOH 8:2 FTOH 1 GenX rER𝛼 E365, H537, L400 R407, K365, K414, K542, K544, K546, R561 R407, K365, K414, K542, K544, K546, R561 R407, K365, K414, K542, K544, K546, R561 R407, K365, K414, K542, K544, K546, R561 R407, K365, K414, K542, K544, K546, R561 R407, K365, K414, K542, K544, K546, R561 R407, K365, K414, K542, K544, K546, R561 R407, K365, K414, K542, K544, K546, R561 R407, K365, K414, K542, K544, K546, R561 R407, K365, K414, K542, K544, K546, R561 R407, K365, K414, K542, K544, K546, R561 E366, P417, I437, L538 R407, K365, K414, K542, K544, K546, R561 R407, K365, K414, K542, K544, K546, R561 R407, K365, K414, K542, K544, K546, R561 R407, K365, K414, K542, K544, K546, R561 rER𝛽 L225, L266, L270, H403 K231, R273, K280, H403, K408, K410, K411 K231, R273, K280, H403, K408, K410, K411 K231, R274, K280, K408, K410, K411 K231, R273, K280, H403, K408, K410, K411 K231, R273, K280, H403, K408, K410, K411 K231, R273, K280, H403, K408, K410, K411, G294, S295 K231, R273, K280, H403, K408, K410, K411 K231, R273, K280, K408, K410, K411 K231, R273, K280, H403, K408, K410, K411 K231, R273, K280, K408, K410, K411 K231, R273, K280, K408, K410, K411 L225, T226, H403 K231, R273, K280, K408, K410, K411 K231, R273, K280, K408, K410, K411 E232, L266, H403, L404 K231, R273, K280, H403, K408, K410, K411 122 5.3.3 Changes in interaction patterns upon PFAS binding The binding energy predictions provided insight about the overall ranking of the affinities of investigated PFAS, and investigating the local interactions made by these fluorinated compounds is key for deciphering the molecular recognition by the rER 𝛼 and rER 𝛽 proteins. The results of the residue decomposition analysis that was employed to obtain the energetic contributions from the surrounding residues are reported in Table S8-S13, and the direct interactions between the PFAS and binding pocket amino acids were identified using hydrogen bond analysis, as shown in Figure 2. In the rER 𝛼 and rER 𝛽 LBD pockets, the binding of E2 is mainly driven by hydrogen bonds with E366/E232 and H537/H403, as shown in Figure 2(c), indicating that the orientation of the E2 molecule within the binding pockets as well as the recognition may be similar for both proteins. The histidine residue in two subtypes, H537/H403, provided an anchor point for the hydroxyl group of E2, while the other end is oriented towards the E366/E232 (Figure 2(c)). The largest energy contributions among the pocket residues for E2 binding were with E365, H537, L400 for rER 𝛼, and L225, L266, L270, H403 for rER 𝛽, as shown in Table 1, pointing out that although the anchor residues are the same, the strongest interactions with surrounding residues were different for rER 𝛼 and rER 𝛽, potentially due to the mutations within the binding pocket. In general, however, the interaction energies of E2 ligand with the pocket residues were between 0 kcal mol-1 and -10 kcal mol-1, for both LBDs. The majority of PFAS had stabilizing interaction energies (smaller than zero) with the surrounding basic residues and unfavorable interaction energies (larger than zero) with acidic residues. This observation was valid for both rER proteins (Table S8, S11). One notable exception to this is PFOSA in rER 𝛼, and 8:2FTOH and PFOSA in rER 𝛽. PFOSA In rER 𝛽 LBD, however, PFOSA did not form any notable interactions with either basic or acidic residues, and 8:2FTOH had strong interaction with E232 residue (Table S11). As these two PFAS lack a charged group, it was expected not to have any prominent interactions with acidic and basic residues.The binding of PFAS in rER 𝛼 LBD was mainly facilitated by a direct hydrogen bonding between R407 side chain and the negatively charged head group of PFAS, while majority of PFAS formed stabilizing interactions with the surrounding basic residues and unfavorable interactions 123 with the acidic ones (Table S8-S11). For sulfonamide (PFOSA) and fluorotelomer (8:2 FTOH) compounds, the anchor residue was E366 (Figure 2(a)) as those two PFAS have sulphonamide and alcohol head groups, respectively. PFOSA also formed interactions that are very similar to E2 ligand, i.e. strong stabilizing interaction with E366 in rER 𝛼 (Table S8).To the contrary, the recognition of PFAS in rER 𝛽 mainly involved the hydrogen bonding with the side chain of H403, located on Helix-11 (Figure 2(d)), and there was no direct interaction between R273 and PFAS head groups, except for the PFDoA compound. Moreover, the strongest interaction was also with H403 (Table S9, S12), and the interaction strength with this histidine was generally stronger for rER 𝛽, due to the direct hydrogen bonding between PFAS head group and the H403 residue. Interestingly, the majority of sulphonic PFAS, namely PFDA, PFUnA, PFHxS, PFOS, PFOSA, PFOSA-AcOH, and Et-PFOSA-AcOH, did not form any direct hydrogen bonds with the pocket residues of rER 𝛽. As shown in Figure S13, these sulphonic PFAS went through a slight rotation within the rER 𝛽 binding pocket and their head groups were oriented towards the Helix-7 and Helix-11 (Figure S6). The other PFAS, specifically short chain sulphonic compound PFBS and carboxylic PFAS with up to eight fluorinated carbons, were able to fully rotate their head groups to interact with H403. This β€˜tumbling’ motion of PFAS that was observed only within rER 𝛽 binding pocket and not within rER 𝛼 is a direct consequence of the amino acid differences in the pocket residues. In apo rER 𝛼 binding pocket, R407 is in proximity of two glutamate residues, E336 and E366, forming a direct hydrogen bond with the latter, as depicted in Figure 2(e). On the other hand, there are three glutamate residues, E202, E205 and E232, in the proximity of R273 of apo rER 𝛽 binding pocket. The arginine interacts directly with E205, and consequently shifting the orientation of R273 side chain. This E205 residue of rER 𝛽 is modified to an alanine (A339) in rER 𝛼 protein, explaining why (i) there is no direct interaction with R273 when natural ligand and PFAS are present in the pocket, and (ii) the charged PFAS undergo the β€˜tumbling’ motion in rER 𝛽 pocket. The reorientation within the pocket of rER 𝛽 is not just limited to the R273 residue. Upon PFAS binding, the phenylalanine residue (F283) located near the arginine also goes through a conformational change, as shown in Figure 2(f). 124 In rER 𝛼, the orientation of the PFAS as well as E2 allowed an interaction between F417 side chain and the PFAS tail group to from stabilizing interactions (Table S10), and the orientation of F417 side chain further away from R407. Meanwhile, the strength of the interactions with F283 was weaker in rER 𝛽 pocket (Table S13) and the F283 side chain had a similar orientation to what was observed in apo rER 𝛽 simulations (Figure 2(f)). The role of this phenylalanine residue is not well-defined in the literature.; However, our simulations indicate that it may have a role in stabilizing the ligand within the binding pocket for both subtypes. To the best of our knowledge, this is the first study assessing molecular-level details of PFAS binding and toxicity in rainbow trout Estrogen receptors alpha and beta. The two subtypes have slightly different affinities against the E2 ligand and various PFAS due to the modifications in amino acid sequences within the binding pocket. The most commonly known mutations identified in human Estrogen receptor studies, L397/L263 and M434/F300 for rER 𝛼 to rER 𝛽, respectively, were found to impact the binding strengths and the orientations of PFAS in rainbow trout. The most striking observation, however, was the amino acid modification of A339 to E205 from rER 𝛼 to rER 𝛽, resulting in the complete reorientation of R273/R407 and F417/F283 residues and causing PFAS to β€˜tumble’ within the binding pocket of rER 𝛽. This orientation change may explain the affinity differences between two subtypes, and further, it may indicate different downstream impacts of PFAS exposure. It is the first time in the literature that this amino acid modification was identified to impact the PFAS binding for Estrogen receptors. Human ER 𝛼 and ER 𝛽 proteins also do not have a conserved residue at this location with Ile to Asn modification, respectively (Figure 1(a), blue arrow). The impact of this residue on the mobility and orientation of important and conserved arginine amino acid may have a role in developing subtype-selective binders for both human and rainbow trout estrogen receptors. 5.3.4 Environmental Impact Understanding how PFAS exert toxic effects on living organisms and ecosystems is crucial for developing effective mitigation strategies. The Great Lakes, with its central role for local biodiversity, faces a significant threat due to persistent accumulation of PFAS. This accumulation not only poses a risk to the ecosystem and biodiversity, but also to human health through the 125 Figure S3.2 The hydrogen bond percentage heatmap of the PFAS in (a) rER 𝛼 LBD and (b) rER 𝛽 LBD pockets. The y-axis shows the pocket residue names and the x-axis shows the PFAS bound to the pocket. c. The locations of the residues that form direct hydrogen bond with E2 in rER 𝛼 and rER 𝛽, overlapped. d. The locations of the residues that form direct hydrogen bond with PFAS in rER 𝛼 and rER 𝛽. Helix 12 (H12) is also shown in light pink and light blue colors for rER 𝛼 and rER 𝛽, respectively. e. The detailed depiction of the orientations of binding pocket residues in apo rER 𝛼 and rER 𝛽 LBDs is pictured. The distances between hydrogen bonding heavy atoms were shown in green dash lines. f. The detailed depiction of the orientations of binding pocket residues in PFNA-bound rER 𝛼 and rER 𝛽 LBDs is shown. The distances between hydrogen bonding heavy atoms were shown in green dash lines. PFNA was selected as a representative of the majority of PFAS simulations. 126 consumption of contaminated fish. Therefore, addressing the impact of PFAS on fish health and ecosystems is vital for protecting the environment and human health. As a first step in understanding PFAS toxicity, molecular details of how PFAS binds to target proteins in fish needs to be addressed. Here, we focused on Estrogen receptors: as one of the nuclear receptors, they play a fundamental role not only in reproductive system but also cytoplasmic signal transduction as well as in regulating the immune system. The current study sheds light to the different binding modes of PFAS within rER 𝛼 to rER 𝛽 LBDs and molecular details of PFAS interactions within the binding pocket. Significantly, the mutations of binding pocket residues not only caused PFAS to bind differently in Estrogen receptor subtypes, but also in different orientations, emphasizing that PFAS exert its impact through different mechanisms. This understanding is central for devising targeted interventions for PFAS toxicity and creating regulatory mechanisms that can effectively mitigate PFAS-associated risks. 127 BIBLIOGRAPHY [1] Brennan, N. M., Evans, A. T., Fritz, M. K., Peak, S. A., and von Holst, H. E. (2021). Trends in the regulation of per-and polyfluoroalkyl substances (pfas): A scoping review. International Journal of Environmental Research and Public Health, 18:10900. [2] Coggan, T. L., Moodie, D., Kolobaric, A., Szabo, D., Shimeta, J., Crosbie, N. D., Lee, E., Fernandes, M., and Clarke, B. O. (2019). An investigation into per- and polyfluoroalkyl substances (pfas) in nineteen australian wastewater treatment plants (wwtps). Heliyon, 5:e02316. [3] Pelch, K. E., Reade, A., Wolffe, T. A., and Kwiatkowski, C. F. (2019). Pfas health effects database: Protocol for a systematic evidence map. Environment International, 130:104851. [4] Gaines, L. G. T. and Gaines, C. G. L. T. (2023). Historical and current usage of per- and polyfluoroalkyl substances (pfas): A literature review. American Journal of Industrial Medicine, 66:353–378. [5] Calafat, A. M., Kuklenyik, Z., Reidy, J. A., Caudill, S. P., Tully, J. S., and Needham, L. L. (2007). Serum concentrations of 11 polyfluoroalkyl compounds in the u.s. population: Data from the national health and nutrition examination survey (nhanes) 1999-2000. Environmental Science and Technology, 41:2237–2242. [EPA] Epa proposes designating certain pfas chemicals as hazardous substances under superfund to protect people’s health | us epa. [7] Remucal, C. K. (2019). Spatial and temporal variability of perfluoroalkyl substances in the laurentian great lakes. Environmental Science: Processes & Impacts, 21:1816–1834. [8] Point, A. D., Holsen, T. M., Fernando, S., Hopke, P. K., and Crimmins, B. S. (2021). Trends (2005–2016) of perfluoroalkyl acids in top predator fish of the laurentian great lakes. Science of The Total Environment, 778:146151. [9] Rappazzo, K., Coffman, E., and Hines, E. (2017). Exposure to perfluorinated alkyl sub- stances and health outcomes in children: A systematic review of the epidemiologic literature. International Journal of Environmental Research and Public Health, 14:691. [10] Duan, X., Sun, W., Sun, H., and Zhang, L. (2021). Perfluorooctane sulfonate continual exposure impairs glucose-stimulated insulin secretion via sirt1-induced upregulation of ucp2 expression. Environmental Pollution, 278:116840. [11] Sunderland, E. M., Hu, X. C., Dassuncao, C., Tokranov, A. K., Wagner, C. C., and Allen, J. G. (2019). A review of the pathways of human exposure to poly- and perfluoroalkyl substances (pfass) and present understanding of health effects. Journal of Exposure Science & Environmental Epidemiology, 29:131–147. 128 [12] Anderko, L. and Pennea, E. (2020). Exposures to per-and polyfluoroalkyl substances (pfas): Potential risks to reproductive and children’s health. Current Problems in Pediatric and Adoles- cent Health Care, 50:100760. [13] Guo, H., Chen, J., Zhang, H., Yao, J., Sheng, N., Li, Q., Guo, Y., Wu, C., Xie, W., and Dai, J. (2022). Exposure to genx and its novel analogs disrupts hepatic bile acid metabolism in male mice. Environmental Science & Technology, 56:6133–6143. [14] Almeida, N. M. S., Eken, Y., and Wilson, A. K. (2021). Binding of per- and polyfluoro-alkyl substances to peroxisome proliferator-activated receptor gamma. ACS Omega, 6:15103–15114. [15] Munoz, G., Liu, J., Duy, S. V., and SauvΓ©, S. (2019). Analysis of f-53b, gen-x, adona, and emerging fluoroalkylether substances in environmental and biomonitoring samples: A review. Trends in Environmental Analytical Chemistry, 23:e00066. [16] Chen, M. H., Ha, E. H., Wen, T. W., Su, Y. N., Lien, G. W., Chen, C. Y., Chen, P. C., and Hsieh, W. S. (2012). Perfluorinated compounds in umbilical cord blood and adverse birth outcomes. PLOS ONE, 7:e42474. [17] Sagiv, S. K., Rifas-Shiman, S. L., Fleisch, A. F., Webster, T. F., Calafat, A. M., Ye, X., Gillman, M. W., and Oken, E. (2018). Early-pregnancy plasma concentrations of perfluoroalkyl substances and birth outcomes in project viva: Confounded by pregnancy hemodynamics? American Journal of Epidemiology, 187:793–802. [18] (2018). Prenatal exposure to perfluoroalkyl substances and birth outcomes; an updated analysis from the danish national birth cohort. International Journal of Environmental Research and Public Health 2018, Vol. 15, Page 1832, 15:1832. [19] Johnson, P. I., Sutton, P., Atchley, D. S., Koustas, E., Lam, J., Sen, S., Robinson, K. A., Axelrad, D. A., and Woodruff, T. J. (2014). The navigation guideβ€”evidence-based medicine meets environmental health: Systematic review of human evidence for pfoa effects on fetal growth. Environmental Health Perspectives, 122:1028–1039. [20] Wen, L.-L., Lin, C.-Y., Chou, H.-C., Chang, C.-C., Lo, H.-Y., and Juan, S.-H. (2016). Perfluorooctanesulfonate mediates renal tubular cell apoptosis through ppargamma inactivation. PLOS ONE, 11:e0155190. [21] Evans, N., Conley, J. M., Cardon, M., Hartig, P., Medlock-Kakaley, E., and Gray, L. E. (2022). In vitro activity of a panel of per- and polyfluoroalkyl substances (pfas), fatty acids, and pharmaceuticals in peroxisome proliferator-activated receptor (ppar) alpha, ppar gamma, and estrogen receptor assays. Toxicology and Applied Pharmacology, 449:116136. [22] Amenyogbe, E., Chen, G., Wang, Z., Lu, X., Lin, M., and Lin, A. Y. (2020). A review on sex steroid hormone estrogen receptors in mammals and fish. 129 [23] Davidsen, N., RamhΓΈj, L., Lykkebo, C. A., Kugathas, I., Poulsen, R., Rosenmai, A. K., Evrard, B., Darde, T. A., Axelstad, M., Bahl, M. I., Hansen, M., Chalmel, F., Licht, T. R., and Svingen, T. (2022). Pfos-induced thyroid hormone system disrupted rats display organ-specific changes in their transcriptomes. Environmental Pollution, 305:119340. [24] Furdui, V. I., Stock, N. L., Ellis, D. A., Butt, C. M., Whittle, D. M., Crozier, P. W., Reiner, E. J., Muir, D. C., and Mabury, S. A. (2007). Spatial distribution of perfluoroalkyl contaminants in lake trout from the great lakes. Environmental Science and Technology, 41:1554–1559. [25] Houde, M., Czub, G., Small, J. M., Backus, S., Wang, X., Alaee, M., and Muir, D. C. (2008). Fractionation and bioaccumulation of perfluorooctane sulfonate (pfos) isomers in a lake ontario food web. Environmental Science and Technology, 42:9397–9403. [26] Silva, A. O. D., Spencer, C., Scott, B. F., Backus, S., and Muir, D. C. (2011). Detection of a cyclic perfluorinated acid, perfluoroethylcyclohexane sulfonate, in the great lakes of north america. Environmental Science and Technology, 45:8060–8066. [27] Myers, A. L., Crozier, P. W., Helm, P. A., Brimacombe, C., Furdui, V. I., Reiner, E. J., Burniston, D., and Marvin, C. H. (2012). Fate, distribution, and contrasting temporal trends of perfluoroalkyl substances (pfass) in lake ontario, canada. Environment International, 44:92–99. [28] Martin, J. W., Whittle, D. M., Muir, D. C., and Mabury, S. A. (2004). Perfluoroalkyl contaminants in a food web from lake ontario. Environmental Science and Technology, 38:5379– 5385. [29] Silva, A. O. D., Muir, D. C., and Mabury, S. A. (2009). Distribution of perfluorocarboxylate isomers in select samples from the north american environment. Environmental Toxicology and Chemistry, 28:1801–1814. [30] Codling, G., Vogt, A., Jones, P. D., Wang, T., Wang, P., Lu, Y. L., Corcoran, M., Bonina, S., Li, A., Sturchio, N. C., Rockne, K. J., Ji, K., Khim, J. S., Naile, J. E., and Giesy, J. P. (2014). Historical trends of inorganic and organic fluorine in sediments of lake michigan. Chemosphere, 114:203–209. [31] Codling, G., Sturchio, N. C., Rockne, K. J., Li, A., Peng, H., Tse, T. J., Jones, P. D., and Giesy, J. P. (2018). Spatial and temporal trends in poly- and per-fluorinated compounds in the laurentian great lakes erie, ontario and st. clair. Environmental Pollution, 237:396–405. [32] Yeung, L. W., Silva, A. O. D., Loi, E. I., Marvin, C. H., Taniyasu, S., Yamashita, N., Mabury, S. A., Muir, D. C., and Lam, P. K. (2013). Perfluoroalkyl substances and extractable organic fluorine in surface sediments and cores from lake ontario. Environment International, 59:389–397. [33] Guo, R., Megson, D., Myers, A. L., Helm, P. A., Marvin, C., Crozier, P., Mabury, S., Bhavsar, S. P., Tomy, G., Simcik, M., McCarry, B., and Reiner, E. J. (2016). Application of a 130 comprehensive extraction technique for the determination of poly- and perfluoroalkyl substances (pfass) in great lakes region sediments. Chemosphere, 164:535–546. [34] McGoldrick, D. J. and Murphy, E. W. (2016). Concentration and distribution of contaminants in lake trout and walleye from the laurentian great lakes (2008–2012). Environmental Pollution, 217:85–96. [35] Asher, B. J., Wang, Y., Silva, A. O. D., Backus, S., Muir, D. C., Wong, C. S., and Martin, J. W. (2012). Enantiospecific perfluorooctane sulfonate (pfos) analysis reveals evidence for the source contribution of pfos-precursors to the lake ontario foodweb. Environmental Science & Technology, 46:7653–7660. [36] Gewurtz, S. B., Silva, A. O. D., Backus, S. M., McGoldrick, D. J., Keir, M. J., Small, J., Melymuk, L., and Muir, D. C. (2012). Perfluoroalkyl contaminants in lake ontario lake trout: Detailed examination of current status and long-term trends. Environmental Science and Technology, 46:5842–5850. [37] Benninghoff, A. D., Bisson, W. H., Koch, D. C., Ehresman, D. J., Kolluri, S. K., and Williams, D. E. (2011). Estrogen-like activity of perfluoroalkyl acids in vivo and interaction with human and rainbow trout estrogen receptors in vitro. Toxicological Sciences, 120:42–58. [38] Wei, Y., Dai, J., Liu, M., Wang, J., Xu, M., Zha, J., and Wang, Z. (2007). Estrogen-like properties of perfluorooctanoic acid as revealed by expressing hepatic estrogen-responsive genes in rare minnows (gobiocypris rarus). Environmental Toxicology and Chemistry, 26:2440–2447. [39] Xin, Y., Ren, X. M., Wan, B., and Guo, L. H. (2019). Comparative in vitro and in vivo evaluation of the estrogenic effect of hexafluoropropylene oxide homologues. Environmental Science and Technology, 53:8371–8380. [40] Han, J. and Fang, Z. (2010). Estrogenic effects, reproductive impairment and developmental toxicity in ovoviparous swordtail fish (xiphophorus helleri) exposed to perfluorooctane sulfonate (pfos). Aquatic Toxicology, 99:281–290. [41] Liu, C., Du, Y., and Zhou, B. (2007). Evaluation of estrogenic activities and mechanism of action of perfluorinated chemicals determined by vitellogenin induction in primary cultured tilapia hepatocytes. Aquatic Toxicology, 85:267–277. [42] Qiu, Z., Qu, K., Luan, F., Liu, Y., Zhu, Y., Yuan, Y., Li, H., Zhang, H., Hai, Y., and Zhao, C. (2020). Binding specificities of estrogen receptor with perfluorinated compounds: A cross species comparison. Environment International, 134:105284. [43] Qu, K., Song, J., Zhu, Y., Liu, Y., and Zhao, C. (2019). Perfluorinated compounds binding to estrogen receptor of different species: a molecular dynamic modeling. Journal of Molecular Modeling, 25:1–10. 131 [44] Cocci, P., Mosconi, G., and Palermo, F. A. (2021). An in silico and in vitro study for investi- gating estrogenic endocrine effects of emerging persistent pollutants using primary hepatocytes from grey mullet (mugil cephalus). Environments - MDPI, 8:58. [45] Villeneuve, D. L., Blackwell, B. R., Cavallin, J. E., Collins, J., Hoang, J. X., Hofer, R. N., Houck, K. A., Jensen, K. M., Kahl, M. D., Kutsi, R. N., Opseth, A. S., Rodriguez, K. J. S., Schaupp, C., Stacy, E. H., and Ankley, G. T. (2023). Verification of in vivo estrogenic activity for four per- and polyfluoroalkyl substances (pfas) identified as estrogen receptor agonists via new approach methodologies. Environmental Science and Technology, 57:3794–3803. [46] Jo, A., Ji, K., and Choi, K. (2014). Endocrine disruption effects of long-term exposure to perfluorodecanoic acid (pfda) and perfluorotridecanoic acid (pftrda) in zebrafish (danio rerio) and related mechanisms. Chemosphere, 108:360–366. [47] Lee, J. W., Lee, J.-W., Shin, Y.-J., Kim, J.-E., Ryu, T.-K., Ryu, J., Lee, J., Kim, P., Choi, K., and Park, K. (2017). Multi-generational xenoestrogenic effects of perfluoroalkyl acids (pfaas) mixture on oryzias latipes using a flow-through exposure system. Chemosphere, 169:212–223. [48] Jia, M., Dahlman-Wright, K., and Γ…ke Gustafsson, J. (2015). Estrogen receptor alpha and beta in health and disease. Best Practice & Research Clinical Endocrinology & Metabolism, 29:557–568. [49] Chen, P., Li, B., and Ou-Yang, L. (2022). Role of estrogen receptors in health and disease. Frontiers in Endocrinology, 13:839005. [50] Nagler, J. J., Cavileer, T., Sullivan, J., Cyr, D. G., and Rexroad, C. (2007). The complete nuclear estrogen receptor family in the rainbow trout: Discovery of the novel er 𝛼2 and both er 𝛽 isoforms. Gene, 392:164–173. [51] Shyu, C., Cavileer, T. D., Nagler, J. J., and Ytreberg, F. M. (2011). Computational estimation of rainbow trout estrogen receptor binding affinities for environmental estrogens. Toxicology and Applied Pharmacology, 250:322–326. [52] Shyu, C., Brown, C. J., and Ytreberg, F. M. (2010). Computational study of evolutionary selection pressure on rainbow trout estrogen receptors. PLOS ONE, 5:e9392. [53] Consortium, T. U. (2023). Uniprot: the universal protein knowledgebase in 2023. Nucleic Acids Research, 51:D523–D531. [54] Zheng, W., Zhang, C., Li, Y., Pearce, R., Bell, E. W., and Zhang, Y. (2021). Folding non- homologous proteins by coupling deep-learning contact maps with i-tasser assembly simulations. Cell Reports Methods, 1:100014. [55] Yang, J., Yan, R., Roy, A., Xu, D., Poisson, J., and Zhang, Y. (2014). The i-tasser suite: 132 protein structure and function prediction. Nature Methods 2015 12:1, 12:7–8. [56] Yang, J. and Zhang, Y. (2015). I-tasser server: new development for protein structure and function predictions. Nucleic Acids Research, 43:W174–W181. [57] Eiler, S., Gangloff, M., Duclaud, S., Moras, D., and Ruff, M. (2001). Overexpression, purification, and crystal structure of native er 𝛼 lbd. Protein Expression and Purification, 22:165–173. [2j7] Rcsb pdb - 2j7x: Structure of estradiol-bound estrogen receptor beta lbd in complex with lxxll motif from ncoa5. [59] (2022). Molecular operating environment (moe). [60] Labute, P. (2009). Protonate3d: Assignment of ionization states and hydrogen coordinates to macromolecular structures. Proteins: Structure, Function, and Bioinformatics, 75:187–205. [61] Hoffmann, R. (2004). An extended hΓΌckel theory. i. hydrocarbons. The Journal of Chemical Physics, 39:1397. [62] Wang, J., Wolf, R. M., Caldwell, J. W., Kollman, P. A., and Case, D. A. (2004). Development and testing of a general amber force field. Journal of Computational Chemistry, 25:1157–1174. [63] Hornak, V., Abel, R., Okur, A., Strockbine, B., Roitberg, A., and Simmerling, C. (2006). Comparison of multiple amber force fields and development of improved protein backbone parameters. Proteins: Structure, Function, and Bioinformatics, 65:712–725. [64] Corbeil, C. R., Williams, C. I., and Labute, P. (2012). Variability in docking success rates due to dataset preparation. Journal of Computer-Aided Molecular Design, 26:775–786. [65] York, D. and P.A. Kollman, D.A. Case, e. a. (2020). Amber 2018. [66] He, X., Man, V. H., Yang, W., Lee, T.-S., and Wang, J. (2020). A fast and high-quality charge model for the next generation general amber force field. The Journal of Chemical Physics, 153:114502. [67] Maier, J. A., Martinez, C., Kasavajhala, K., Wickstrom, L., Hauser, K. E., and Simmerling, C. (2015). ff14sb: Improving the accuracy of protein side chain and backbone parameters from ff99sb. Journal of Chemical Theory and Computation, 11:3696–3713. [68] DΓΆpke, M. F., Moultos, O. A., and Hartkamp, R. (2020). On the transferability of ion parameters to the tip4p/2005 water model using molecular dynamics simulations. The Journal of Chemical Physics, 152:024501. [69] Horn, H. W., Swope, W. C., Pitera, J. W., Madura, J. D., Dick, T. J., Hura, G. L., and 133 Head-Gordon, T. (2004). Development of an improved four-site water model for biomolecular simulations: Tip4p-ew. The Journal of Chemical Physics, 120:9665–9678. [70] Joung, I. S. and Cheatham, T. E. (2008). Determination of alkali and halide monovalent ion parameters for use in explicitly solvated biomolecular simulations. The Journal of Physical Chemistry B, 112:9020–9041. [71] Ryckaert, J.-P., Ciccotti, G., and Berendsen, H. J. (1977). Numerical integration of the cartesian equations of motion of a system with constraints: Molecular dynamics of n-alkanes. Journal of Computational Physics, 23:327–341. [72] Roe, D. R. and Cheatham, T. E. (2013). Ptraj and cpptraj: Software for processing and analysis of molecular dynamics trajectory data. Journal of Chemical Theory and Computation, 9:3084–3095. [73] Pettersen, E. F., Goddard, T. D., Huang, C. C., Couch, G. S., Greenblatt, D. M., Meng, E. C., and Ferrin, T. E. (2004). Ucsf chimera: A visualization system for exploratory research and analysis. Journal of Computational Chemistry, 25:1605–1612. [74] Petit, F., Valotaire, Y., and Pakdel, F. (1995). Differential functional activities of rainbow trout and human estrogen receptors expressed in the yeast saccharomyces cerevisiae. European Journal of Biochemistry, 233:584–592. 134 APPENDIX A SUPPORTING TABLES Table S5.1 The list of PFAS used in this study. The average calculated binding energies and standard deviations in kcal/mol, and experimental IC50 obtained from Ref. 16 are provided as well. 135 Table S5.2 Average residue decomposition energies of charged rER𝛼 pocket residues. The color gradient goes from blue to red as the values change from negative to positive. 136 Table S5.3 Average residue decomposition energies of polar rER𝛼 pocket residues. The color gradient goes from blue to red as the values change from negative to positive. 137 Table S5.4 Average residue decomposition energies of non-polar rER𝛼 pocket residues. The color gradient goes from blue to red as the values change from negative to positive. 138 Table S5.5 Average residue decomposition energies of charged rER𝛽 pocket residues. The color gradient goes from blue to red as the values change from negative to positive. 139 Table S5.6 Average residue decomposition energies of polar rER𝛽 pocket residues. The color gradient goes from blue to red as the values change from negative to positive. 140 Table S5.7 Average residue decomposition energies of non-polar rER𝛽 pocket residues. The color gradient goes from blue to red as the values change from negative to positive. 141 APPENDIX B SUPPORTING FIGURES Figure S5.1 The overlap of rER𝛼 and rER𝛽 LBDs is shown. Van der Walls ball representation was used for the arginine residues used in pharmacophore docking. The locations of mutated residues are shown in yellow. The volume of the binding pockets is 85 Γ…3 and 92 Γ…3 for rER𝛼 and rER𝛽, respectively. The mutated residues between two isoforms with numbering of rER𝛼 /rER𝛽 are: V353/A219, T354/N220, M355/V221, T357/M223, L358/S224, S361/N227, M362/L228, S394/C260, S395/C261, I402/L268, I405/M271, I409/V275, H410/N276, C411/H277, A418/S284, Q419/P285, I422/S288, D424/S290, S426/D292, D429/S295, E432/Q298, M434/F300, A435/V301, T444/A310, V445/T311, E536/D402, Y539/H405, S540/C406, I541/M407, C553/M409, N545/K411, K546/M412, G559/A418, R561/I420,L562/E421, Q563/M422. 142 Figure S5.2 (a) MM-GBSA binding energies of rER𝛼 and rER𝛽 proteins. (b) The kde distribution of MM-GBSA energies with respect to the PFAS type: carboxylic, and sulphonic along with the rest of the PFAS. The pink dashed line corresponds to E2 binding energy to rER𝛼 and blue dotted line indicates the binding energy of E2 to rER𝛽. 143 Figure S5.3 Per-residue root-mean square fluctuation (RMSF) of rER𝛼 residues of the first simulation sets. 144 Figure S5.4 Per-residue root-mean square fluctuation (RMSF) of rER𝛼 residues of the second simulation sets. 145 Figure S5.5 Per-residue root-mean square fluctuation (RMSF) of rER𝛽 residues of the first simulation sets. 146 Figure S5.6 Per-residue root-mean square fluctuation (RMSF) of rER𝛽 residues of the second simulation sets. 147 Figure S5.7 The helix numbering of (a) rER𝛼 and (b) rER𝛽 LBDs is used for hydrogen bond analysis. 148 Figure S5.8 Hydrogen bond heatmap for rER𝛼 Helix 3,5, and 6. The residue and atom pairs that form hydrogen bonding are shown with the following nomenclature: Res1@Atom1/Res2@Atom2. 149 Figure S5.9 Hydrogen bond heatmap for rER𝛼 Helix 3,5, and 6. The residue and atom pairs that form hydrogen bonding are shown with the following nomenclature: Res1@Atom1/Res2@Atom2. 150 Figure S5.10 Hydrogen bond heatmap for rER𝛼 Helix 7 and 8. The residue and atom pairs that form hydrogen bonding are shown with the following nomenclature: Res1@Atom1/Res2@Atom2. 151 Figure S5.11 Hydrogen bond heatmap for rER𝛼 Helix 11 and 12. The residue and atom pairs that form hydrogen bonding are shown with the following nomenclature: Res1@Atom1/Res2@Atom2. 152 Figure S5.12 Hydrogen bond heatmap for rER𝛽 Helix 3,5, and 6. The residue and atom pairs that form hydrogen bonding are shown with the following nomenclature: Res1@Atom1/Res2@Atom2. 153 Figure S5.13 Hydrogen bond heatmap for rER𝛽 Helix 7, 8, 11, and 12. The residue and atom pairs that form hydrogen bonding are shown with the following nomenclature: Res1@Atom1/Res2@Atom2. 154 Figure S5.14 Hydrogen bond heatmap of loop regions of (a) rER𝛼 and (b) rER𝛽. The residue and atom pairs that form hydrogen bonding are shown with the following nomenclature: Res1@Atom1/Res2@Atom2. 155 Figure S5.15 Comparison of the orientation of investigated PFAS in rER𝛼 and rER𝛽 binding pockets. The poses were obtained by clustering the last 5 ns of the simulations, and the most populated cluster was selected. The beta ones are not looking towards the Arg. 156 Figure S5.16 RMSD of PFAS bound to ER𝛼-LBD, primary simulation set. 157 Figure S5.17 RMSD of PFAS bound to ER𝛼-LBD, duplicate simulation set. 158 Figure S5.18 RMSD of PFAS bound to ER𝛽-LBD, primary simulation set. Figure S5.19 RMSD of PFAS bound to ER𝛽-LBD, duplicate simulation set. 159 Figure S5.20 Total energies of PFAS bound simulations of ER𝛼-LBD. 160 Figure S5.21 Total energies of PFAS bound simulations of ER𝛽-LBD. 161 CHAPTER 6 COMPUTATIONAL PATHWAYS TOWARDS NEW THERAPEUTIC COMPOUNDS: ADDRESSING TUBERCULOSIS VIA MMPL3 INHIBITION 162 6.1 Introduction Tuberculosis (TB) has been one of the most widespread infections around the world 1. According to the World Health Organization (WHO) Report, TB is still one of the top 10 causes of death worldwide. It is an airborne disease and continues to infect around 10 million people each year around the world. The main cause of TB is Mycobacterium Tuberculosis, a bacterium which was first isolated by Robert Koch in 1882 2. Once an individual has the bacteria in their body, the disease lasts for their lifetime, and the bacteria can also result in formation of tubercules 2. The bacteria mainly infect the lungs causing pulmonary TB. However, TB is not a completely new disease. Fossil and skeletal records showed abnormalities in the skeletons that are characteristics of TB, indicating that the infection has existed for a very long time. However, TB is mostly known for turning into an epidemic in 1700s and 1800s. The first vaccine was developed by French scientists following the isolation of M. tuberculosis, called BCG (Bacillus Calmette–GuΓ©rin) vaccine 2. The vaccine is currently being used in countries with high TB prevalence 3. Although the BCG vaccine is protective against meningitis and dissem- inated TB if it is administrated during infancy, its effectiveness in adults is variable 1,3. In addition, the vaccine does not inhibit the primary infection and reactivation. WHO Report also shows that the highest incidences are occurring in rural areas such as sub-Saharan Africa and Asia. However, WHO also indicates that almost one third of the world population is infected but only 10% of those who are infected show active symptoms 1. Currently, TB treatment includes the combination of various drugs to be taken for 6-9 months 4. The first line of treatment includes the use of Isoniazid, Rifampicin, Pyrazinamide and Ethambutol with various doses, and their structures can be seen in Figure 1. Rifampicin acts by inhibiting the DNA-dependent RNA synthesis by binding to the RNA polymerase 5. The rest of the drugs are known to disrupt the cell wall, although their mechanisms are still not completely known 6–8. This treatment is suitable for patients that have drug-susceptible pulmonary TB. The treatment should be followed rigorously, and if not, the bacteria could gain resistance to the first-line treatment drugs, resulting in drug-resistant bacteria. The drug-resistant TB, described as an infection caused by 163 Figure S5.1 Compounds that are being used as first-line treatment against TB. bacteria resistant to at least Isoniazid and Rifampicin, or one other TB treatment drug, requires a more extensive treatment regimen 4. In addition, the cost associated with TB treatment is high, CDC reports that the treatment of a patient with drug-susceptible TB is approximately $ 20.000 9. Apart from the cost of treatment, the side effects are a common concern, as are the interactions with other drugs. For instance, liver, ocular, skin and peripheral nerve toxicities can be seen in patients who take Ethambutol and Isoniazid. Given the prominence of the disease and existence of only a few options for treatment, WHO started an initiative in 2006 to reduce TB worldwide and to cure almost 85% of TB positive cases 1. With this initiative, the efforts to find a better treatment for TB gained momentum. Currently, there are many different treatment strategies that include targeting new proteins with new mechanism of actions (MOA) and creating new treatment regiments with known drugs. The treatment strategies that are currently under clinical trial are shown in Figure 2. Among those, linezolid, lavofloxacin and ofloxacin are repurposed drugs, and TMC-207 (Bedaquilin) as well as SQ109 have new MOA. While TMC-207 targets ATP synthase, SQ109 inhibits the activity of Mycobacterium Membrane Protein Large 3 (MmpL3) 1. However, the bottleneck when developing new TB inhibitory drugs is to find a target with whole-cell active compounds. MmpL3, at the time being, fills this gap due to its role in TMM relocation. In addition, the drug candidates should also aim to reduce the treatment duration and occurrence of resistance. 6.1.1 Treating TB by Targeting MmpL3 MmpL proteins belong to the RND (resistance, nodulation and cell division) superfamily which exists in bacteria, archaea, and eukaryotes. The RND family consists of multidrug resistance 164 Figure S5.2 The global clinical development pipeline for new anti-TB drugs and drug regimens to treat TB disease. Reproduced with permission from 10. pumps which are used to transport drugs, heavy metals, fatty acids and detergents, and in general, they use the electrochemical proton gradient across the mycobacterium inner membrane. In M. tuberculosis, 13 genes are found to code MmpL proteins. In a prior study, in order to understand the roles of those MmpL proteins, they were knocked out one by one, and cells were grown without the gene 11. However, when the MmpL3 gene was removed, the cells did not grow. This led to the conclusion that MmpL3 protein is important for cell growth and viability in M. tuberculosis, and it is conserved within all mycobacterial genome. Later, it was determined that the main function of MmpL3 protein is to act as a flipase for lipids called trehalose monomycolates (TMM) 12. Mycobacteria has two membranes, an inner membrane (IM) and an outer membrane (OM) (Fig. 3). The IM mainly consists of Ac2PIM2 , and other major phospholipids, with Ac2PIM2 being the most abundant 14. The OM, on the other hand, does not have any common lipids, but has mainly glycopeptidolipids and mycolic acid containing lipids 14. The current treatment strategy with Isoniazid and Ethambutol focuses on the inhibition of mycolic acid synthesis to disrupt the cell membrane composition 13. However, the MmpL3 protein is further in the mycolic acid pathway (Fig. 4). The MmpL3 is located in the IM and thought to be responsible for trehalose monomycolate translocation to the periplasmic domain in between the IM and OM. Once trehalose monomycolates 165 Figure S5.3 Schematic representation of the Mtb cell envelope. Reproduced with permission from 13. reaches the OM, it dimerizes and forms trehalose dimycolates (TDM) 13. trehalose dimycolates and other mycolic acids are important for the permeability of the OM (by making the membrane extremely hydrophobic) as well as the formation of biofilms 13,15,16. Therefore, the transportation of trehalose monomycolates has great importance. As stated, MmpL3 protein exports trehalose monomycolates, and it does so with the help of proton motive force (PMF). The crystal structure obtained from Mycobacterium smegmatis revealed that there are two Asp-Tyr pairs located in helices IV and X, and they are hypothesized to play an important role in PMF, showed by mutation studies 17,18. This structure can be seen in Fig.3.5. The C-terminal domain, which is located on the cytoplasmic side, however is not shown. High-throughput screening (HTS) resulted in the identification of an adamantyl urea (AU1235) against both drug-susceptible and drug-resistant M. tuberclosis by targeting MmpL3 19. Later, two other compounds, BM212 and SQ109, are found to inhibit MmpL3 protein (Fig. 5) 20,21. The 166 Figure S5.4 Biosynthetic pathway of mycolic acids in Mtb and site of action of anti-TB drugs 13. The drugs that are used for TB treatment currently are indicated with red text. β€˜Reproduced with permission from 13 MmpL3 M. smegmatis structure co-crystallized with these compounds showed that despite different chemical scaffold, they all bind to the same pocket, and inhibit PMF by disrupting hydrogen bonds between Asp-Tyr pairs 17. Since then, many groups have been working on creating compounds that target MmpL3 protein. A recent study showed that HTS helped to identify new types of MmpL3 inhibitors, and the study was also successful in identifying the known hit compounds 22. In order to understand the mechanism of action of those hit compounds, currently, computational investigation of the effect of the identified compounds was done. 6.2 Computational Details 6.2.1 Homolog Modeling Homology modeling essentially targets building a three-dimensional structure for proteins by using the available structures of closely related proteins. Since not all proteins have their 3- D structures experimentally determined, being able to predict them successfully with in silico methods is extremely useful in drug discovery studies. There are currently many available tools for homology modeling, and each of them uses a different approach. One of the most successful one is I-TASSER by Zhang Lab 23–25. The amino acid sequence is first matched with the sequence 167 Figure S5.5 Left: Overall Structure of MmpL3 from M. Smegmatis (PDB ID: 6AJG) 17. A: The side view of the crystal structure. Both periplasmic domain as well as transmembrane domain are divided into two regions, N and C. B: The top view of the transmembrane domain. Two Asp-Tyr pairs are shown in stick representation. The star indicates the pocket inhibitor molecules bind to. Right: Structures of SQ109, BMB212, and AU1235. of available crystal structures in Protein Data Bank, producing fragments. Then, these fragments obtained from PDB templates are combined to form full-length structure models with Monte Carlo simulations, and the clustering is used to obtain a model. In the final step, this model is used to re-assemble the structures to obtain the final model with the lowest energy. Given the success of this approach in CASP (Critical Assessment of Techniques for Protein Structure Prediction) competitions, it was used to model M. tuberculosis the MmpL3 protein structure. 6.2.2 Molecular Docking MOE (Molecular Operating Environment) is a commercial software that is designed for in silico research. The docking suite of MOE is capable of performing induced fit docking with a user-friendly GUI. The algorithm used for ligand placement is Triangle Matcher which uses alpha- spheres to define the binding site 26. The ligand is positioned so that the triplets of ligand atoms are superposed on alpha spheres, and if there is a steric clash with protein, that pose is removed. The scoring function for placement step is called London dG that includes the terms for ligand flexibility, hydrogen bonds and desolvation 27,28. After the placement step, a specified number of 168 poses (usually 10% of the initial placements) are refined for final ranking. For the refinement step, force-field based GBVI/WSA dG (Generalized-Born Volume Integral/Weighted Surface area) scoring function is used 28. The docking procedure is applied as follows unless stated otherwise: β€’ Compounds are drawn in MOE software and the structures are minimized at Amber10:ETH level as implemented in MOE. β€’ The binding pocket is selected using "SiteFinder" package that utilizes alpha-spheres. β€’ For placement, Triangle Matcher method is used with London dG scoring function. 100 poses are generated. β€’ For the refinement step, induced fit method is used with GBVI/WSA dG scoring function. Top 10 poses are reported. 6.2.3 Molecualr Dynamics Simulations and Binding Free Energy Calculations The MD simulations systems are prepared as following unless stated otherwise: β€’ Protein crystal structures are prepared in MOE. β€’ The partial charges of the ligand molecules are calculated using AM1-BCC with antechamber module. β€’ The protein and ligand are combined, ff14SB, gaff2 and TIP4P-EW force-fields are used for protein, ligand and water molecules, respectively. β€’ Minimization is performed in 4 steps. 1. All heavy atoms are restrained (100 kcal/mol/A2) and the system is minimized for 20000 steps. 2. All heavy atoms are restrained (50 kcal/mol/A2) and the system is minimized for 20000 steps. 169 3. Only ligand is restrained (10 kcal/mol/A2) and the system is minimized for 20000 steps. 4. The system is minimized with no restraint. β€’ Heating is performed in a step-wise fashion from 0K to 300K. Langevin thermostat is used for temperature control. The time step is 1fs. 1. All atoms are restrained (3 kcal/mol/A2) and the system is simulated at 0K for 10ps. 2. All atoms are restrained (3 kcal/mol/A2) and the system is heated up to 5K for 50ps. 3. All atoms are restrained (3 kcal/mol/A2) and the system is heated up to 10K for 50ps. 4. All atoms are restrained (3 kcal/mol/A2) and the system is heated up to 20K for 50ps. 5. All heavy atoms except for the solvent atoms are restrained and the system is heated up to 50K for 50ps. 6. All heavy atoms except for the solvent atoms are restrained and the system is heated up to 100K for 100ps. 7. All heavy atoms except for the solvent atoms are restrained and the system is heated up to 200K for 100ps. 8. No restraint is applied, and the system is equilibrated at 200K for 200ps. 9. No restraint is applied, and the system is heated up to 300K for 400ps. 10. No restraint is applied, and the system is equilibrated at 300K for 500ps. 11. A short 500ps long simulation is performed before the production run at 300K. β€’ The production run is performed with 1fs time step under NPT conditions. SHAKE is applied, and the Langevin thermostat is used. The MM-GBSA/PBSA methods were used to estimate the binding energies and rank the affinities of the investigated compounds by selecting every tenth frame from the simulation. The root-mean-square-distances (RMSD), root-mean-square-fluctuations (RMSF), hydrogen bonds, and residue decompositions were calculated using the cpptraj module. 170 6.2.4 Fragment Search To grow the ligands within the binding pocket, a selected molecule was selected to for fragment addition within MAB mmpL3 protein. Two available fragment libraries within MOE were selected: ChEMBL fragment library (778760 fragments) and MOE fragment library (40626 fragments). MM/GBVI values were calculated for each generated compound and used for ranking them. The compounds with lowest MM/GBVI values were selected for docking. 6.3 Results and Discussion 6.3.1 Preliminary investigation of MmpL3 inhibition with computational modeling The only available MmpL3 crystal structure is for Mycobacterium smegmatis 17, as of 2021. M. smegmatis (Mtb) and M. tuberculosis MmpL3 proteins share 60% identity and 76% similarity. Therefore, I-TASSER server is used to obtain the model structure for M. tuberculosis. The model has a 1.3Γ…overall similarity (Fig. 6). As stated in the Introduction, the binding pocket for SQ109, for example, lies in the middle of the transmembrane domain 17. The model structure is simulated for 20ns to sample a better orientation for the two Phe residues that rest at the bottom of the binding pocket that would allow docking of the compounds. Performing a short simulation also allows for the model system to equilibrate better. As can be observed in Fig.6, a better orientation for Phe side chains is successfully obtained, and it was also similar to the orientations observed in structure co-crystallized with an inhibitor, SQ109. The orientation of Phe residues is important since when they have positioned upwards, they do block the binding pocket. However, a downwards orientation allows for the docking of the molecules into the pocket. For the following studies, 20ns simulated model structure is used. The binding pocket overall is hydrophobic at the top part, polar in the middle due to Asp-Tyr pairs, and slightly hydrophobic at the bottom part again. After obtaining an appropriate structure, co-crystallized ligands introduced in Fig. 3 are docked to make sure that the docking procedure is working successfully. For SQ109, the orientation as well as the ligand interactions of the compound in the pocket is the same when compared to co- crystallized SQ109 (Fig. 7). Both docked pose and the co-crystallized SQ109 are seen to directly interact with Asp640 residue in the binding pocket. The 50ns long simulation of the docked system 171 Figure S5.6 Left: The model structure (orange) and 6AJF crystal (green) structure are overlapped 17. The total RMSD can be seen at the top of the figure. PC/PN is the periplasmic region, TM is the transmembrane and C-terminal is the cytoplasmic region of MmpL3. Right: Overlap of the binding pocket Phe side groups with 6AJF (top right, green) and 6AJG (bottom right, cyan) crystal structures. 20ns simulated model structure is shown in red. Figure S5.7 A. Comparison of SQ019 orientation between model (red) and 6AJG (cyan) structures. B,C. Ligand interaction map of docked and co-crystallized SQ109, respectively. 172 Figure S5.8 Left: RMSD (top) and distance (bottom) plots. Right: Residue interaction plot for the pocket residues. shows a stable N-terminal and binding pocket, however, the C-terminal RMSD keeps increasing. The distances between Asp-Tyr pairs are also tracked throughout the simulation. While one pair stays very close, the interaction between the other pair is completely broken due to the interference of SQ109 (Fig. 8). The residue interactions suggest that Asp640, Tyr252 and Tyr641 provide the highest stabilizing contributions for SQ109, while there is no positive contribution from any surrounding residues. This indicates that the compound is very well positioned and stable in the pocket, and it is successful in disrupting one Asp-Tyr pair. The binding free energy results (MM-PBSA) are reported in Table 1, and SQ109 results align with the experimental energies. Following the success of the docking procedure, the ligands presented in Williams et al. are investigated. The names of the compounds are given in Table 1, and their structures along with EC50 values are reported in the cited paper 22. The docking of HC2060 ligand resulted in two different poses. In order to understand which pose is the preferred one, both are simulated and analyzed. Based on free energy calculations, pose 2 is more stable in the pocket than pose 1. During the simulations, it is also observed that the pose 1 is folding within the pocket, trying to maximize the interaction between the polar region and the carbonyl azepane & piperidine rings. In the pose 2, they are located closer to Asp-Tyr pairs and can form interactions easily (Fig. 9). Therefore, pose 2 is considered for further analysis. This observation also gives support regarding 173 Figure S5.9 Orientation of HC2060 pose 1 and pose 2. A. The snapshot from the simulation of HC2060 pose 1. B. The comparison of docking orientations of pose 1(red) and pose 2 (cyan). C. The snapshot from the simulation of HC2060 pose 2. HC2060 compound is shown in vdW representation, Asp-Tyr pairs are shown in stick representation. the pose selection. Further analysis shows that the interactions with the surrounding residues are favorable, and Asp-Tyr interactions are disrupted throughout the simulation. Similar analysis is performed for HC2183 compound as well. It is selected due to the low differences between EC50s of the mutant pool and wild-type 22. HC2183 docking provided two different poses with very similar scores. Binding free energies, however, indicates that pose 2 is more stable in the pocket (Table 1). Furthermore, the disruption of Asp-Tyr interactions is lot more stable in pose 2 simulations. The pose 1 orientation forces the acetamide group to face the hydrophobic residues; however, in pose 2, acetamide is positioned near to polar region in the pocket (Fig. 10). Comparison of binding pocket surfaces of SQ109, HC2060 and HC2183 show that the pocket can be extended upwards easily, but the bottom part is blocked by two Phe residues. At this point, it was assumed that the pocket has more or less a cylindrical shape with a small protrusion to the middle region between Asp-Tyr pair. This results in a rotation of the compounds especially during docking procedure. However, docking results of the derivatives of compounds published by Zheng et al. revealed that a small extension in-between the Asp-Tyr pair can accommodate if functional group (oxane or cyclopentane) is bulky for the pocket 29. In Figure 11, the comparison of 174 Figure S5.10 Orientation of HC2183 pose 1 and pose 2. A. The snapshot from the simulation of HC2183 pose 1. B. The comparison of docking orientations of pose 1(red) and pose 2 (cyan). C. The snapshot from the simulation of HC2060 pose 2. HC2183 compound is shown in vdW representation, Asp-Tyr pairs are shown in stick representation. D,E: The residue interaction plot for pose1 and pose 2, respectively. pockets for SQ109 and a derivative indicates that the pocket has a capacity to expand upwards. We also see that the middle section of the pocket is negatively charged while the upper side is mostly neutral. In conclusion, the orientation of the compounds can be determined based on the binding free energies, which is further supported by the electrostatic surface of the pocket. The key interactions are determined for the pocket residues based on the MD simulations. The pocket has a flexibility to expand in one direction, and this can be utilized when designing new compounds. Although a direct comparison of EC50 values with MM-GBSA/PBSA values is not feasible, a similar trend was observed (Table 1). 175 Table S5.1 MM-GBSA energy values for simulated systems. 176 Figure S5.11 The pocket surface is shown for SQ109 and a derivative compound. A,C. Electrostatic surface of the pocket when SQ109 and a derivate are present, respectively. B. The overlap of the pockets. SQ109 is shown with light pink mesh surface, and other compound is shown in green solid surface. 6.3.2 Inhibitory differences among mycobacterium species: Mtb vs MAB While the mmpl3 gene is conserved among the mycobacterium species, there are certain dif- ferences of the protein sequences that lead to different reposes to the inhibitor compounds that are being tested. The sequence alignment of Mycobacterium tuberculosis(Mtb) and Mycobacterium abscessus (MAB) mmpL3 proteins share 62% sequence identity, as can be seen in Figure 12. Similarly, Mycobacterium tuberculosis (Mtb) and Mycobacterium avium complex (MAC) mmpL3 proteins have 73% sequence identity. When the TM helices around the binding pockets were com- pared, Mtb and MAB Helix 4 sequences were the most similar and Helix 5 was the least similar. Similarly, we do observe that there is a high sequence similarity for Helix 4 and 12, the Helix 5 had the lowest similarity between Mtb and MAC (Figure 12). When only the binding pocket residues were compared, all three species show high identity with some differences including Ser295 (Mtb) (cid:25) Ala (MAB); Ala296 (Mtb) (cid:25) Ser (MAC) and Leu (MAB); Leu299 (Mtb) (cid:25) Met (MAB); Thr314 (Mtb) (cid:25) Gly (MAB) and Ile (MAC); Ser317 (Mtb) (cid:25) Ala (MAC); Ala632 (Mtb) (cid:25) Val (MAB); Leu633 (Mtb) (cid:25) Val (MAB, MAC); Ile673 (Mtb) (cid:25) Leu (MAC). Mutations such as Ile to Leu may not affect the binding of the inhibitor compounds significantly, however, Ser to Ala, or Thr to Gly mutations could affect the binding strength or the ability of the inhibitor compounds. Despite the differences of binding site TM helices, the RMSD of the pocket residues highlights the fact that the residue orientations were similar (Fig. 12(C)). In addition, the three MmpL3 structures 177 Figure S5.12 The sequence comparison of Mtb, MAB, and MAC mmpL3 proteins. A. Sequence overlap of transmembrane helices around the binding pocket of mmpL3 protein. B. The top-down view of the TM region of Mtb mmpL3 protein. The Asp-Tyr pairs and Phe residues are shown in vdW representation, and the inhibitor binding site was shown with star. The colors of helices correspond to the colors used in A. C. The RMSD of superimposing the binding pocket residues of Mtb, MAB, and MAC. obtained with homology modeling suggest that the initial pocket opening between the TM4-5-6 and TM10-11-12 for inhibitor binding is larger in MAB than of Mtb and MAC (Fig. S1(B,C)). This might play a role in the accessibility of the pocket to the inhibitors. The apo mmpL3 proteins from both MAB and Mtb were also simulated analyzed. The apo MAB mmpL3 protein simulations show that the C-terminal is very flexible, similar to the Mtb mmpL3 C-terminal. On the other hand, the distances between the atoms interacting on the Asp-Tyr pairs have larger distances in MAB mmpL3 protein with more stability throughout the simulations, compared to the Mtb mmpL3 case (Figure 13). In addition, the most dominant orientations of Asp- Tyr pairs during the apo simulations can be seen, and they indicate that the Asp-Tyr positionings are not very different from each other. As shown in Figure 14, the RMSD of the transmembrane helices 4-6, and 10-12 are also investigated and found to be slightly different between Mtb and 178 Figure S5.13 The distance time series plots for Asp-Tyr pairs from Mtb and MAB mmpL3 apo simulations. The average values for each distance is provided with the corresponding color within the plots. MAB simulations, potentially due to the residue differences between the two proteins. Next, to understand the different inhibitory effects of the selected compounds on these two species, all were docked to both Mtb and MAB mmpL3 model proteins, two poses for each compound were selected and simulated for 20 ns. While the molecular docking gives an idea about the possible orientations of a compound in the binding pocket and surrounding residues, MD simulations are useful to observe the interactions between the inhibitor and the protein as well as the overall behavior of the protein over time. For the selected four compounds, MSU43107, 179 Figure S5.14 The RMSD and radius of gyration time series plots for TM helices 4,5,6,10,11, and 12 from Mtb and MAB mmpL3 apo simulations. 180 MSU43165, MSU43557, and MSU43644, molecular docking was used to obtain binding poses using MOE. On average, the MM-PBSA binding energies of the investigated compounds are lower for MAB MmpL3 protein that is accompanied by lower residue interaction energies with Asp-Tyr pairs in MAC system (Fig. 15). This could be a result of the amino acid differences around the binding pocket which could affect the inhibitor binding. The MM-PBSA energies suggested that certain binding poses are energetically more stable than others, but the other orientations cannot be completely discarded given the shape of the pocket. This preference, however, is most likely due to the placement of -NH on the imidazole ring, which prefers to be closer to Asp residues in the pocket to form hydrogen bonds. The hydrogen bond analysis also showed that the Asp251 is one of the residues that forms strong interactions with the inhibitors through aforementioned -NH group, which consequently implicate that the loss of -NH in this position would hinder the compound’s ability to form strong interactions with the pocket residues. In the absence of a such hydrogen-donating group, the hydrogen bond percentages with Asp residues dropped almost 50% when compared to MSU43557. In addition, in the presence of an inhibitor, while Asp641- Tyr251 interaction persisted at different percentages throughout the simulations, the hydrogen bond percentage for Asp252-Tyr640 interaction was dropped from 80% in apo-MmpL3 to 0%, except for MSU43557. This could be attributed to the orientation of the amine group towards the Asp252 and the positioning of the cyclohexane towards the Asp251 causing it to change orientation to interact with -NH on the imidazole as well as the backbone of Ile248. However, in all cases, the interaction between Ser288 and Asp640 was not disturbed in all simulations, except for HC2099 where the interaction percentage is lower than the other cases. Furthermore, the interaction strengths with each pocket residue were also calculated and the results can be seen in Figure 15. The dominant interaction seen was with Asp residues from the Asp-Tyr pairs, supporting the observations of the hydrogen bond analysis. The simulations of HC2091 and HC2099 provided an interesting situation. While the interaction pattern on HC2099 was similar to MSU-43644 except for the interaction with Ser288, HC2091 showed no hydrogen bonding with the surrounding. A close inspection of the simulation shows that 181 Figure S5.15 Residue decomposition energies of compounds MSU43107, MSU43165, MSU43557, and MSU43644 for Mtb and MAB mmpL3 protein. 182 Figure S5.16 Comparison of binding poses of SQ109 (grey) with MSU43107 (blue), MSU43557 (orange), and MSU42644 (green). HC2091 shifts in the binding pocket towards Tyr252 residue and pushing it away from the original position. This shift also causes Phe644 side group to move towards the pocket area that is usually occupied by cyclohexane in the MSU-43557 case. The presence of the tetrahydropyran ring as well as thiophene might cause the compound to have a polar region near the Phe residues, hence causing a shift in the pocket and creating space for Phe644 to occupy. In addition, this orientation of Phe644 is also observed in the MmpL3 proteins with no inhibitor compound. The calculated MM-GBSA energies for Mtb and MAB mmpL3 protein are shown in Table 2. Overall, the MAB binding affinities measured with MM-GBSA method provided a reasonable ranking of the compounds, when compared to the EC50 values. To validate, we compared binding free energies to our EC50 (Mtb and MAB) values for a small set of analogs (Table X). Despite EC50 being subject to features such as cell wall permeability and protein-binding, the data roughly rank similar within series and from compound-to-compound for both Mtb and Mab with a few exceptions. Analysis of the core of the binding domain showed that combinations of Asp251 and Tyr252 and their partner residues Asp640 and Tyr641 make similar interactions with each inhibitor with either the benzimidazole of MSU-43107 and -43557 or the amide of MSU-43644. Relative to SQ109, each inhibitor does not fully fill the lipophilic binding space available. A fragment search to investigate different moieties that can fill the periplasmic side of the pocket, as will be shown in the next section. Another interesting observation from the apo versus inhibitor-bound simulations is the water 183 access to the channel of mmpL3 proteins from Mtb and MAB species as well as the periplasmic region of the protein. One of the most potent inhibitors (MSU43085) was selected against Mtb mmpL3 for comparison with apo Mtb mmpL3 simulations. The 20 ns long simulations were clustered and the highest populated cluster was selected for the analysis of the water access to the channel where the inhibitors are bound, and the results are shown in Figure S3 . The periplasmic region of both the apo Mtb and MAB mmpL3 proteins clearly indicate the pockets where the TMM lipid can bind and be moved from the inner membrane to the periplasmic region. The water density around these regions was not impacted by the presence of the inhibitor in the channel. On the other hand, the water access to the channel in Mtb mmpL3 is significantly blocked by the presence of the inhibitor molecule (Fig. S3 A,B). One interesting difference between Mtb and MAB channels is that the water occupancy in MAB case covers a slightly larger volume than the Mtb case, supporting the results that the pocket volume is larger in MAB mmpL3 protein. When MSU43085 compound is bound to the protein, however, while both proteins have limited water access to the channel due to the bulky presence of the inhibitor in the channel, MAB mmpL3 protein seems to have slightly more access for the water molecule around the pocket helices, as shown in Figure S3(D). These observations support the fact that the binding pockets have different sizes between Mtb and MAB proteins, and also provide insight about the mechanism of inhibition: the physical presence of the inhibitor molecule can act as a "bottle stopper" to prevent the water access between the periplasmic and cytoplasmic sides and hence, inhibiting the H+ transport. 6.3.3 Understanding the residue mutations in Mtb mmpL3 and its influence on inhibitor binding Using our identified Mtb mmpL3 mutant library (Fig. 2) of resistant organisms, 15 of the mutant positions of both Clade 1 and 2 were modeled into the binding domain of Mtb MmpL3 to provide a 3D model. Most mutations are concentrated on TM5 and TM10, the remaining were on TM helices (TM4, 6, 11, 12). The Clade 1 mutants grouped on the cytosolic side of the binding domain and most Clade 2 mutants grouped on the periplasmic end or on the periphery (R373W). Based on the docking experiments for HC2099 and HC2091 (Fig. 7), it is anticipated that many Clade 1 mutants 184 (Fig 2) would be and are resistant to treatment with HC2099 and HC2091. An alanine scan of both Clade 1 and 2 mutants was also performed. Overall, Clade 1 mutations have a greater effect on ligand binding energies than those of Clade 2, supporting the data reported in Fig. 2. The biggest impact is observed for Tyr252Ala and Phe644Ala mutations. For Tyr252Ala system located on TM4, HC2099 showed loss of binding strength around -6 kcal/mol, while this loss was -4 kcal/mol, -3.5 kcal/mol, and -2 kcal/mol for HC2091, MSU-43664, and MSU-43557, respectively. Fo the Phe644Ala located on TM10, both HC2099 and MSU-43557 have -2.5 kcal mol-1 less interaction strength, and it was -2 kcal/mol and -1.8 kcal-1 for MSU-43664 and HC-2091, respectively. Both Tyr252 and Phe644 are residues located in the binding pocket, and in the close proximity of the compound. The simulation results show that there is a trend in which the investigated compounds are forming strong hydrogen bonds with Tyr residues in the pocket, and mutation to Ala would cause the loss of this prominent interaction. Similarly, Phe644 is one of the two Phe residues that is located at the cytosolic side of the pocket, and they determine the border of the pocket. During the simulations, interactions between the compound and the Phe644 residue are observed, although transient. Nevertheless, Phe644 and Phe252 stay in the close proximity of the compound forming long range interactions with the phenyl group or isopropyl groups, and the loss of the functional group of Phe residue would disrupt the interactions. Another interesting mutation from Clade I is Val285Ala located on TM5. The residue decom- position analysis showed that the energy contribution of Val285 to the binding of MSU-43557 and MSU-43644 is ˜-0.2 kcal/mol and -0.8 kcal/mol, respectively. However, the interaction loss from Val285Ala mutation is -0.6 kcal/mol and -1kcal/mol for MSU-43557 and MSU-43644, respectively. This would indicate that although there is not much of a direct contact between Val285 and the compounds, however, as it can be seen from Figure 2, the positioning of this residue is near Tyr252, and it can form long range interactions with the compounds in the pocket. In Clade II, the highest change of interaction energies was observed for the Ile244Ala mutations. Ile244 is located on TM4, and the side chain of the residue is oriented towards the binding pocket where it is interacting with the inhibitor compound. In addition, the energy contribution of Ile244 for 185 Figure S5.17 The residues tested from the cross-resistance Z-matrix data from CITE. Blue corresponds to the residues selected from Clade I and the red ones correspond to the Clade II resdiues. The Mtb mmpL3 protein is shown with grey ribbon representation,and the binding pocket is indicated with a star. both MSU-43557 and MSU-43644 is 1.4 kcal/mol, indicating that the residue is forming interactions with both compounds. The mutation of Ile244 to Ala showed that the biggest loss of interaction energy was for HC2091 with -1.7 kcal/mol, followed by -1 kcal/mol for MSU-43644, -0.7 kcal/mol for MSU-43557, and -0.3 kcal/mol for HC2099. This observation indicates that the compounds from the HC-2091 series with the chlorobenzene group tends to have stronger interaction energies than the molecules with benzimidazole. Finally, for MSU-43557 and MSU-43644 compounds, the same mutations on MAB MmpL3 were testes. While all of the investigated residues have a similar interaction loss upon mutated to Ala, only Ile244 (Mtb)/Ile255(MAB) showed a difference in MSU-43644. The simulations of this compound for MAB and Mtb MmpL3 proteins revealed that Ile244(Mtb) moves towards the compound during the simulations, however, Ile255 (MAB) moves towards the opposite direction. The interaction energies with this residue also observed to be lower (-0.4 kcal/mol) for MAB MmpL3 simulations, while the Mtb protein had -1.4 kcal/mol. 186 Table S5.2 MM-GBSA energy values for simulated systems in Mtb and MAB mmpL3 binding pockets. While the results show that the mutations of the pocket residues can have a detrimental effect on the interaction energies of the investigated compounds, more sophisticated methods, including but not limited to Molecular Dynamics and enhanced sampling, would be needed to further understand the effect caused by the residues located away from the binding site. Furthermore, with the publi- cation of a crystal structure of mmpL3 protein with TMM lipid (PDB ID: 7N6B), the importance of some resistant mutations that are positioned further away from the binding site can be explained (Figure S2). 6.3.4 Modeling new compounds targeting Mtb/MAB MmpL3 with fragment search To expand the chemical space of the compounds, a fragment search was performed by taking the selected compounds of interest as basis. For the fragment search, the focus was the periplasmic side of the pocket on the MAB mmpL3 protein as it was observed that the pocket can accommodate larger compounds than the pocket of Mtb mmpL3 (Fig. S1). Two different fragment databases available on MOE were used: ChEMBL fragment database and MOE fragment database. The obtained fragments were rescored using the GBVI/WSA dG scoring function, and the top 25 compounds were selected for docking with Mtb and MAB mmpL3 proteins. One common theme that was observed for the top 25 compounds was the presence of a hydrogen donor attached to the phenyl ring. Docking of these compounds revealed that this -OH group prefers to orient towards 187 Val618 and interact with the backbone of the amino acid in MAB mmpL3 in the majority of the poses, however, this was not observed as commonly in Mtb mmpL3, indicating that this residue may not form the hydrogen bond (Fig. 18). While the docking scores were favorable with both protein pockets, the orientations of the compounds within the pocket were observed to be different. most likely due to the shape as well as the pocket volume differences caused by the mutations in the protein (Fig. S1(B)). The poses obtained for MAB mmpL3 pocket indicated a "narrower" and more extended conformations for the compounds while Mtb mmpL3 pocket had a wider conformations around the middle region of the pocket (Figure 18). The added fragments extend towards the same region within the pocket in MAB and coordinate to Val618 backbone carbonyl. When the compounds generated with MOE fragment database were docked to MAB mmpL3 protein, a similar trend in which there is a hydrogen donor (not an alcohol necessarily this time) that interacts with Val618 backbone upon docking (Fig. S5). These results support the fact that the pocket of MAB mmpL3 protein is able to fit larger compounds due to mutations and side chain orientations, and more polar groups can be used to grow the inhibitors towards that region. 6.4 Conclusions Computational modeling approaches are instrumental in understanding the affinities and mo- tions of compounds of interest in their binding sites. Here, docking and molecular dynamics simulations were used to explain and investigate the selected mmpL3 inhibitor molecules and pro- vided a basis for their inhibitory mechanisms. Furthermore, the affinity differences observed for Mtb and MAB species towards certain candidate compounds were also analyzed. 188 Figure S5.18 A. The scaffold used for the fragment search using ChEMBL fragment database on MOE. B. Superimposed docked poses of selected compounds obtained from fragment search, docked on Mtb mmpL3 pocket. The Phe residues and Asp-Tyr pairs are shown in line representation and the docked compounds shown in stick representation. C. Superimposed docked poses of selected compounds obtained from fragment search, docked on MAB mmpL3 pocket. The Phe residues and Asp-Tyr pairs are shown in line representation and the docked compounds shown in stick representation. D. Overlay of Mtb (dark cyan) and MAB (purple) docked poses. Compounds can be seen in Figure S4. 189 [1] Suarez, L. Y. T. (2020). Global tuberculosis report 2020. BIBLIOGRAPHY [2] Barberis, I., Bragazzi, N. L., Galluzzo, L., and Martini, M. (2017). The history of tuberculosis: From the first historical records to the isolation of koch’s bacillus. Journal of Preventive Medicine and Hygiene, 58:E9–E12. [3] CDC (2018). Infection control & prevention, fact sheet - bcg vaccine. [4] (WHO), W. H. O. (2016). Treatment of tuberculosis: guidelines, 4th edition. [5] Wehrli, W. (1983). Rifampin: Mechanisms of action and resistance. Reviews of Infectious Diseases, 5:S407–S411. [6] Schubert, K., Sieger, B., Meyer, F., Giacomelli, G., BΓΆhm, K., Rieblinger, A., Lindenthal, L., Sachs, N., Wanner, G., and Bramkamp, M. (2017). The antituberculosis drug ethambutol selectively blocks apical growth in CMN group bacteria. mBio, 8(1). [7] Zhang, Y., Shi, W., Zhang, W., and Mitchison, D. (2014). Mechanisms of Pyrazinamide Action and Resistance. Microbiology Spectrum, 2(4):1. [8] VilchΓ¨ze, C. and Jacobs, W. R. (2019). The Isoniazid Paradigm of Killing, Resistance, and Persistence in Mycobacterium tuberculosis. [9] Marks, S. M., Flood, J., Seaworth, B., Hirsch-Moverman, Y., Armstrong, L., Mase, S., Salcedo, K., Oh, P., Graviss, E. A., Colson, P. W., Armitige, L., Revuelta, M., and Sheeran, K. (2014). Treatment practices, outcomes, and costs of multidrug-resistant and extensively drug-resistant tuberculosis, United States, 2005-2007. Emerging Infectious Diseases, 20(5):812–821. [10] Edwards, B. D. and Field, S. K. (2022). The struggle to end a millennia-long pandemic: Novel candidate and repurposed drugs for the treatment of tuberculosis. Drugs 2022 82:18, 82:1695–1715. [11] Domenech, P., Reed, M. B., and Barry, C. E. (2005). Contribution of the Mycobacterium tuberculosis MmpL protein family to virulence and drug resistance. Infection and Immunity, 73(6):3492–3501. [12] Su, C. C., Klenotic, P. A., Bolla, J. R., Purdy, G. E., Robinson, C. V., and Yu, E. W. (2019). MmpL3 is a lipid transporter that binds trehalose monomycolate and phos- phatidylethanolamine. Proceedings of the National Academy of Sciences of the United States of America, 166(23):11241–11246. [13] Jackson, M. (2014). The mycobacterial cell envelope-lipids. Cold Spring Harbor Perspectives in Medicine, 4(10). 190 [14] Bansal-Mutalik, R. and Nikaido, H. (2014). Mycobacterial outer membrane is a lipid bilayer and the inner membrane is unusually rich in diacyl phosphatidylinositol dimannosides. Proceed- ings of the National Academy of Sciences of the United States of America, 111(13):4958–4963. [15] DaffΓ©, M., Crick, D. C., and Jackson, M. (2014). Genetics of Capsular Polysaccharides and Cell Envelope (Glyco)lipids. Microbiology Spectrum, 2(4). [16] Yang, X., Hu, T., Yang, X., Xu, W., Yang, H., Guddat, L. W., Zhang, B., and Rao, Z. (2020). Structural Basis for the Inhibition of Mycobacterial MmpL3 by NITD-349 and SPIRO. Journal of Molecular Biology, 432(16):4426–4434. [17] Zhang, B., Li, J., Yang, X., Wu, L., Zhang, J., Yang, Y., Zhao, Y., Zhang, L., Yang, X., Yang, X., Cheng, X., Liu, Z., Jiang, B., Jiang, H., Guddat, L. W., Yang, H., and Rao, Z. (2019). Crystal Structures of Membrane Transporter MmpL3, an Anti-TB Drug Target. Cell, 176(3):636–648.e13. [18] Xu, Z., Meshcheryakov, V. A., Poce, G., and Chng, S. S. (2017). MmpL3 is the flippase for mycolic acids in mycobacteria. Proceedings of the National Academy of Sciences of the United States of America, 114(30):7993–7998. [19] Grzegorzewicz, A. E., Pham, H., Gundi, V. A., Scherman, M. S., North, E. J., Hess, T., Jones, V., Gruppo, V., Born, S. E., KordulΓ‘kovΓ‘, J., Chavadi, S. S., Morisseau, C., Lenaerts, A. J., Lee, R. E., McNeil, M. R., and Jackson, M. (2012). Inhibition of mycolic acid transport across the Mycobacterium tuberculosis plasma membrane. Nature Chemical Biology, 8(4):334–341. [20] Rosa, V. L., Poce, G., Canseco, J. O., Buroni, S., Pasca, M. R., Biava, M., Raju, R. M., Porretta, G. C., Alfonso, S., Battilocchio, C., Javid, B., Sorrentino, F., Ioerger, T. R., Sacchettini, J. C., Manetti, F., Botta, M., Logu, A. D., Rubin, E. J., and Rossi, E. D. (2012). Mmpl3 is the cellular target of the antitubercular pyrrole derivative bm212. Antimicrobial Agents and Chemotherapy, 56(1):324–331. [21] Tahlan, K., Wilson, R., Kastrinsky, D. B., Arora, K., Nair, V., Fischer, E., Barnes, S. W., Walker, J. R., Alland, D., Barry, C. E., and Boshoff, H. I. (2012). SQ109 Targets MmpL3, a Membrane Transporter of Trehalose Monomycolate Involved in Mycolic Acid Donation to the Cell Wall Core of Mycobacterium tuberculosis. Antimicrobial Agents and Chemotherapy, 56(4):1797–1809. [22] Williams, J. T., Haiderer, E. R., Coulson, G. B., Conner, K. N., Ellsworth, E., Chen, C., Alvarez-Cabrera, N., Li, W., Jackson, M., Dick, T., and Abramovitch, R. B. (2019). Identification of new MMPL3 inhibitors by untargeted and targeted mutant screens defines MMPL3 domains with differential resistance. Antimicrobial Agents and Chemotherapy, 63(10). [23] Zhang, Y. (2008). I-tasser server for protein 3d structure prediction. BMC Bioinformatics, 9:1–8. 191 [24] Yang, J., Yan, R., Roy, A., Xu, D., Poisson, J., and Zhang, Y. (2014). The I-TASSER suite: Protein structure and function prediction. Nature Methods, 12(1):7–8. [25] Roy, A., Kucukural, A., and Zhang, Y. (2010). I-TASSER: A unified platform for automated protein structure and function prediction. Nature Protocols, 5(4):725–738. [26] Edelsbrunner, H. (1992). Weighted alpha shapes. Technical report, Technical paper of theDepartment of Computer Science of the University of Illinois atUrbana-Champaign, Urbana, Illinois. [27] Corbeil, C. R., Williams, C. I., and Labute, P. (2012). Variability in docking success rates due to dataset preparation. [28] Labute, P. (2008). The generalized born/volume integral implicit solvent model: Estimation of the free energy of hydration using London dispersion instead of atomic surface area. Journal of Computational Chemistry, 29(10):1693–1698. [29] Zheng, H., Williams, J. T., Coulson, G. B., Haiderer, E. R., and Abramovitch, R. B. (2018). HC2091 kills mycobacterium tuberculosis by targeting the MmpL3 mycolic acid transporter. Antimicrobial Agents and Chemotherapy, 62(7). 192 APPENDIX A SUPPORTING TABLES Table S6.1 MM-GBSA energy values for addtionoal simulated systems in Mtb and MAB mmpL3 binding pockets. 193 APPENDIX B SUPPORTING FIGURES Figure S6.1 A. The pocket surfaces for Mtb, MAC, and MAB mmpL3 proteins along with TM4 and TM10 are shown. The residues are shown in stick and line representation. B. The distances between the TM4 and TM10 helices in Mtb and MAB mmpL3 proteins. The cytoplasmic side of the pocket was defined by the two Phe residues, and the periplasmic side was defined by the Ile-leu residues. C. The top view of the pockets of Mtb and MAB mmpL3 proteins. The binding pocket was shown in green mesh, and the pocket residues are shown in vdW representation. Due to the mutations showed above, MAB mmpL3 has more accessible space on the periplasmic side of the pocket. 194 Figure S6.2 A. The residues tested from the cross-resistance Z-matrix data from CITE. Blue corresponds to the residues selected from Clade I and the red ones correspond to the Clade II resdiues. The Mtb mmpL3 protein is shown with grey ribbon representation,and the binding pocket is indicated with a star. B. The superimposed structure of the Mtb mmpL3 protein from (A) with mmpL3 protein co-srystallized with TMM lipids. The positions of some identified resistant mutations and TMM lipid binding sites do overlap. 195 Figure S6.3 Water exposure surfaces for the periplasmic domains and the TM channel of Mtb and MAB proteins. A. Apo Mtb mmpL3 protein, B. MSU43085-bound Mtb mmpL3 protein, C. Apo MAB mmpL3 protein, and D. MSU43085-bound MAB mmpL3 protein. PC/PN:perilasmic C/periplasmic N terminal. TM: transmembrane region. 196 Figure S6.4 Examples of compounds generated using fragment search with ChEMBL database in MOE. 197 Figure S6.5 A. The scaffold used for the fragment search using MOE fragment database in MOE. B. Superimposed docked poses of selected compounds obtained from fragment search, docked on MAB mmpL3 pocket. The Phe residues and Asp-Tyr pairs are shown in line representation and the docked compounds shown in stick representation. C.Examples of compounds generated using fragment search with MOE database in MOE. 198 CHAPTER 7 MODELING OF DOSS INTERACTIONS WITH SMALL MOLECULE INHIBITORS AS A SUPPLEMENTARY TREATMENT STRATEGY AGAINST TB 199 7.1 Introduction 7.1.1 Sensing the Environment: DosRST Proteins While many of the relevant references have already been provided in Chapter 7, for this chapter, only new references relevant to DosRST have been included. Another critical target for TB treatment that has been identified belongs to the DosRST two- component system. Two-component systems (TCS) generally consist of one sensing protein, histidine kinase, and one response regulatory element. A significant amount of homology is shared among the TCS proteins in different bacteria, indicating that TCS is an important regulatory system. In Mycobacterium species, TCS is involved in the regulation of intracellular multiplication during the early infection period, the regulation of genes that are involved in pathogenesis, and the adaptation to pH and hypoxia hence controlling the non-replicating persistence (NRP) 1. TCS changes the expression of the NRP genes that would allow the bacteria to survive non-optimal conditions, and this plays an important role in TB pathogenicity and treatment length. In M. tuberculosis, there are 11 known TCS, among them two are essential (MtrAB and DosRST) 1. Disrupting the environmental sensing for TB treatment has been the focus for some time, and in this report, we will cover DosRST systems as well. DosRST is a member of TCS, however, it differs by having two sensing histidine kinases instead of one. These kinases, DosS and DosT, react to the change in hypoxia (lack of oxygen) and redox change in the environment by binding to O2, CO and NO through the heme group. Upon binding, the proteins switch to the "active" mode and autophosphorylate themselves. Then, the phosphate is transferred to the regulatory response element, DosR. Phosphorylated DosR dimerizes and binds to a specific conserved region on DNA to regulate the expression of almost 50 genes 1. DosS and DosT share about 60% sequence identity, and they also have structural homology. Both DosS and DosT have GAF domain containing heme group in N-terminal, histidine kinase domain, and ATP- binding domain in the C-terminal. Although the specifics of the sensing through heme and how the autophosphorylation is triggered are not known currently, the hypothesis is that ligand-heme interactions induce a conformational change that would cause changes in the overall structure to 200 Figure S6.1 A: Schematic for the DosRST signaling pathway, with examples of where small molecules and peptides interfere with DosRST signaling 1. B: Proposed mechanism for the role of M. tuberculosis DosS and DosT in the shift down of tubercle bacilli to the persistent state 2,3. Normoxia: Normal oxygen conditions, hypoxia: low oxygen condition. trigger the autophosphorylation. The specific ligands of DosS and DosR are not exactly known, but a recent study shows that DosS functions in reduced Fe2+ state, indicating that it could be a redox sensor 2,3. The proposed mechanisms of activation for DosS and DosR is shown in Fig.3.6. The current proposed mechanism for DosS is as follows: in the off state, Fe is oxidized, and with the help of a reducing agent (such as flavin nucleotides), Fe is reduced to Fe2+ switching the protein "on". During an "on" state, either NO or CO can bind to the reduced iron causing a conformational change that will trigger the autophosphorylation 2. For DosT, the off-state corresponds to O2-bound Fe. Under hypoxia, O2 is released, and iron is reduced, triggering the autophosphorylation 4. After the transfer of phosphate from either DosS or DosT to DosR, DosR dimerizes and binds to the regulatory sequence on DNA. The mutation studies with DosRST proteins showed dosR mutation did not inhibit the virulence. However, dosRS mutants caused growth defects 1. There are a number of molecules that have been linked to the DosRST system using HTS. "Compound 10" has been found to inhibit DosR-regulated gene expression under normoxia - normal oxygen conditions. In another HTS study that scanned more than 540,000 compounds, six distinct molecules were discovered to inhibit DosRST 5. Among them, Artemisinin acted by oxidizing the heme group in both DosS and DosT, causing loss of sensing ability 5. HC102A and 201 Figure S6.2 A: The structures of the compounds that target DosRST system. B: Crystal structure of the ferric DosS GAF domain (PDB ID: 2W3E). HC103A that are shown in Figure 2 did not act on the GAF domains, but they mainly inhibited the autophosphorylation activity in DosS and DosS/T, respectively. HC104 is found to interfere with the DNA-binding mechanism of DosR, but only for a specific operon hence does not impact the overall bacterial survival. In a UV-visible spectroscopy assay for DosS protein, the presence of HC106A caused a shift in Fe2+ Soret peak, which is also observed when NO or CO binds to heme. However, it is believed that the mechanism of action of HC106 is different than of Artemisinin. When a residue located in the heme channel, G117, is mutated to Leucine, a resistance to both compounds has been observed. The aforementioned study also showed that both compounds access to heme through the same path 5. 7.2 Computational Details 7.2.1 Protein preparation and Docking Procedures The existing crystal structures of DosS protein (PDB ID: 2W3D, 2W3F, 2W3E, 4YNR) were prepared using the Molecular Operating Environment (MOE) protein preparation suite at pH 7 at their appropriate iron oxidation states. 2,6,7 The iron metal center was selected as an anchor for the docking using a pharmacophore approach, which was utilized to place the investigated molecules. 202 Then, the top 100 poses were refined using induced fit approach and the final docking scores were calculated with Generalized-Born volume integral/weighted surface area score (GBVI/WSA dG). 7 The poses were visually analyzed and selected for further investigation. For the mutation docking studies, the residues of interest were mutated and the docking procedure was repeated. 7.2.2 Parametrization of heme group To model the non-bonded iron interactions, 12-6 LJ parameters were used for iron-isoxazole interactions. For the bonded heme model, MCPB.py was used for streamlined parametrization of the heme and iron at different oxidation states and different number of coordinations. 8 The small model containing iron, heme, and axial coordinating histidine was optimized using B3LYP/6-31G* to calculate the force constant. The large model with the second axial coordinating moiety was used to do the RESP charge calculations 9. Then, the Seminario method was used to generate the force field parameters. 10 As a final step, the RESP charge fitting was performed. 7.2.3 Constant pH Molecular Dynamics Simulations To determine the pKa values hence the correct protonation states of pocket residues, CpHMD 11 simulations were performed for the following systems: Fe+2, Fe+3, CO-bound Fe, and Fe+2-isoxazole ring. The protein was modeled with constph ff based on the ff10 force field with PBradii mbondi2 and parameters for titratable residues (histidine, aspartic acid, and glutamic acid). The CpHMD method uses Monte Carlo sampling of discrete protonation states along with a molecular dynamics simulation in an implicit solvent defined by igb = 2 ("OBC" model). 12 The leaprc.constph force field was used for protein and gaff2 force field was used for small molecules. 13 Heme and iron were parametrized based on the oxidation state of the iron, as described in the previous section. Each prepared system was simulated for the pKa calculations from pH 1 to 14 with an increment of one. For each pH, the simulations were performed for 5 ns with 2 fs timestep with an attempt to change the protonation steps at every 5 steps. The minimization is performed in four steps with decreasing positional restraints on the heavy atoms (100, 50, 10, 0 kcal mol -1 Γ…-2 ) and for each step, minimization was performed for 200000 steps with steepest descent algorithm. The systems were then heated up to 300 K using the same 203 ten-step heating procedure used in the mmpL3 simulations, with the addition of Gibbs implicit solvent. No protonation state change was attempted during the minimization and the heating processes. 7.2.4 Classical Molecular Dynamics Simulations The MD simulations were performed for 100 ns with an explicit solvent described by the TIP3P water model. 14 The ff14SB force field was used for protein, and gaff2 force field was used for small molecules. 15 AM1-BCC was used to calculate the partial charges on the inhibitor compounds. 16 The minimization is performed in four steps with decreasing positional restraints on the heavy atoms (100, 50, 10, 0 kcal mol -1 Γ…-2 ) and for each step, minimization was performed for 200000 steps with steepest descent algorithm. The systems were then heated up to 300 K using the same ten-step heating procedure used in the mmpL3 simulations. In the production run, 1 fs timestep was used along with Langevin thermostat and isobaric barostat. The 100 trajectories were saved for each nanosecond of the simulation. 7.2.5 Analysis of Trajectories The cpptraj suite of AmberTools20 was used for analysis of the trajectories. RMSD, RMSF, radial distribution plots, and residue decomposition energies were calculated with cpptraj module and plotted using gnuplot. 17,18 7.3 Results and Discussion 7.3.1 New Therapeutics for DosRST Inhibition with Docking Studies In this section, the docking results for HC106A compound and its derivatives to test the docking parameters used are shown. The DosS crystal structure GAF domain (PDB ID: 2W3E) is selected and prepared using MOE. The potential binding site is found using MOE Site Finder tool by taking the mechanism of action of known compounds into account. This resulted in a pocket where a compound can coordinate with the iron in the heme group. The compounds are docked in the pocket using a pharmacophore approach. The docking approach is exactly the same as described in the Methodology section, with the exception of the placement step in which a pharmacophore is used. The docking of HC106A shows that the compound coordinates with iron through the 204 Figure S6.3 Overlapping of binding pockets with respect to WT structure. Mutated residues are shown with vdW sphere representation. WT is shown in red color in A-C. A. Gly117Leu mutation is shown in green. B. Gly122Leu mutation is shown in gray. C. His93Ala mutation is shown in cyan. isoxazole ring, and the urea group coordinates with the carboxyl groups in the heme (Fig.4). The pocket consists of a relatively hydrophobic region around the isoxazole ring, and a more solvent exposed area located near aromatic ring. It is known in the literature that Gly117 plays an important role for DosS resistance against HC106A. Therefore, Gly117Leu mutation is used for docking. In addition, Gly122Leu and His93Ala mutations are tested for their effect on docking of HC106A compound. The docking results show that the Gly177Leu mutation is indeed causing an unstable binding of HC106A compound (Fig. 4). The docking score is above zero and compared to WT and other mutations, it is very high. The reason for this is that Leu is has a bulkier side chain than Gly, therefore, it occupies the binding pocket (Fig. 3), making it harder for a compound to dock. Furthermore, the interaction with iron is completely lost due to the rotation of the isoxazole ring. On the other hand, when compared to Gly117Leu mutation, Gly122 to Leu mutation did not cause a significant difference for docking of HC106A in terms of binding orientation. Similarly, the His93 residue located at the top of the pocket is mutated into Ala, and it did not cause any change in the binding of HC106A compound. One problem with H93A docking is that the hydrogens in the urea group in the WT interacts with carboxyl of heme, however, in His93Ala, one of them rotates and interacts with the Glu87 residue. The modeling of the synthesized compounds (Table 1) was performed using molecular docking. 205 Figure S6.4 The pocket surface is shown for HC106A in WT and mutated structures. The coloring follows the same scheme used in Figure 13. A. Wild-type DosS (docking score: -0.87). B. G117L mutation (docking score: 4.55). C. G122L mutation (docking score: -1.31). D. H93A mutation (docking score: -1.33). 206 The binding pocket consists of mainly non-polar residues (Phe98, Val95, Pro115, Gly117, Ile121, Ile125, and Tyr171) and three polar residues (Glu87 and His 89 and His93). For all of the compounds that are shown to be active, coordination with a heme iron through the isoxazole ring was observed. The urea linker region provided another contact with the heme carboxylate groups by forming hydrogen bonds, and this particular orientation was not observed for most of the inactive compounds (Fig. 5). The lipophilic-binding domain for each compound was located at the interface between the protein and the solvent, therefore, as observed from the SAR studies, the addition of a hydrogen bond donor or acceptor group provides better interactions with the polar backbone of the protein (Fig. 6). The compounds MSU-43572, MSU-43419, MSU-43424, for example, include a phenol or benzyl alcohol that is found to form interactions with the polar backbone atoms from His93 or Gly117/Lys116, depending on the orientation of the ring. The presence of the benzyl alcohol group increased the hydrophobic contact area with Ile125. Furthermore, the meta positioning of the –OH group on the phenyl ring was found to provide less polar contact area with the Val95 residue, as compared to para positioning. Replacing the –OH group with an amine group, MSU-43423, hydrogen bonding with the surrounding residues increases, however, the polar contact area with Ile125 also increases, which is not favorable. Overall, although the docking scores are not reliable in ranking the affinities, the molecular docking studies provided a stable initial conformation for the further modeling of the compounds of interest. The orientation of key moieties and interactions with the surrounding residues, how the compounds would "sit" in the pocket, and preliminary information regarding the key residues and their mutations. 7.3.2 Determining the protonation states of the pocket residues The next step in understanding the mechanism of inhibition of these compounds is to run a molecular dynamics simulation to obtain the time-dependent behaviors of the protein and the inhibitor molecules. However, as the oxidation state of the iron center in the DosS protein is known to change to trigger the "on" switch in the protein, the titratable residues within or near the iron center in heme may be changing their protonation states. The comparison of the crystal 207 Table S6.1 The docking scores and EC50s for the compounds of interest. 208 Figure S6.5 The overlapped docking orientations of selected compounds. The heme group and the residues are shown with stick representation, and the compounds are shown with line representation. Figure S6.6 (A) The electrostatic surface of the binding pocket. (B) The lipophilic surfaces of MSU-43419 and MSU-43424 with Ile125. The residues, the heme group and the compounds are shown with stick representation. 209 structures of DosS at "on" and "off" states indicate that some pocket residues, for example Glu87 and His90, go through a conformation change that accompanies to the oxidation state change of iron. Constant-pH Molecular Dynamics (CpHMD) simulations are useful in determining the pKa values of titratable residues within the protein. As the iron center changes the oxidation states, accompanied by the potential conformation changes of pocket residues, during the on-off shift of the DosS protein, it is reasonable to hypothesize that the protonation states of certain titratable pocket residues, i.e. histidines, can change. Four systems were investigated to understand the pKa change in the His,Glu, and Asp residues in the protein: Fe+2 (5-coordinated on-state; PDB ID 2W3F), water-coordinated Fe+3 (6-coordinated off-state; PDB ID 2W3D), CO-Fe+2 (CO-bound on-state; PDB ID 4YNR), and isoxazole-coordinated Fe+2 (6-coordinated inhibited protein). 210 Figure S6.7 The titration plots obtained with CpHMD simulations for four systems: Fe+2 (5-coordinated on-state; PDB ID 2W3F), water-coordinated Fe+3 (6-coordinated off-state; PDB ID 2W3D), CO-Fe+2 (CO-bound on-state; PDB ID 4YNR), and isoxazole-coordinated Fe+2 (6-coordinated inhibited protein). 211 Comparison of 2w3f ("on" state) with 2w3d ("off" state) of the pKa values in Figure 7 reveals that the protonation state changes occur for His117 and His139. At neutral pH, His117 had very close populations for the protonated state at the Nπœ– and both N𝛿 and Nπœ–, whereas His139 was mainly protonated at Nπœ–. In the "on" state, however, His117 was predominantly protonated at N𝛿 while His139 was double-protonated. The presence of the CO molecule coordinated to iron further changes the protonation states of these histidines. Going from the "on" state to CO-bound state, His117 has close populations for Nπœ– protonated and double-protonated states, and His139 lost N𝛿 proton. If we compare the "on" state with isoxazole-bound state, His117 has similar populations for N𝛿 and doubly protonated states, while His139 loses the N𝛿 proton. The addition of the isoxazole-based compounds when the DosS protein is in the "on" state was shown to shift the oxidation state of iron towards +3 from +2 state. ? Although isoxazole itself may not be considered an inhibitor, its coordination to iron may trigger pKa shifts and conformation changes in the pocket residues. If we compare the protonation states of "off" state, which is Fe+3 with isoxazole-bound state, His92 is doubly protonated and in "off" state, it only has proton in N𝛿. The simulation of apo systems were performed using the corresponding protonation states of titratable residues. The "off" state and the CO-bound state of the DosS protein were analyzed to gain a better understanding of the differences between the two states (Fig. S1). While the per-residue fluctuations are quite similar in both cases, in "off" state, the RMSD of the protein is more stable than the "on" state. Furthermore, the distance between Glu87 and His89 is quite stable in both while the His87-His93 distance showed more instability for the "on" state (Fig. S1(D,H)). The water density around the aforementioned histidines is also very similar in both systems. The water presence around the iron center on the other hand is different. While the water entry to the pocket in "off" state seem to occur through where His89 and His93 are located as well as near the loop of Gly117 residue, in CO-bound systems, there was no water bridge towards the pocket (Fig. S1). One drawback of this CpHMD approach is that the simulations were performed in implicit water, therefore, the impact of the water molecules in the pocket residues cannot be well understood. The crystal structures of both "on" and "off" states of the DosS protein has crystal water molecules, 212 which leads to hypothesize that to understand the conformation changes associated with the on-off switch, the CpHMD simulations with explicit waters need be performed. 7.3.3 Modeling of Heme group for Inhibitor Binding After determining the protonation states of specific residues that may be involved in the sequence of events to change the oxidation state of iron, the methods of modeling the iron-inhibitor interactions for calculating the binding affinities of the compounds of interest were studied. The choice of modeling approach for iron metal-inhibitor coordination determines the approach that can be used for binding energy approximations. Here, two different approaches were investigated: unbound metal coordination and bound metal coordination approaches. The different oxidation states of iron also were tested with each approach. First, the observations from the unbound modeling approach will be addressed. With the unbound modeling of the inhibitor compounds, methods such as absolute binding free energies or MM-GBSA can be used to estimate the binding affinities. However, during the simulations of unbound model, most compounds did not stay coordinated to iron during the 100 ns and either left the pocket or moved in different areas within the pocket, as can be seen in Figure 8 and Figure 9. These systems were useful in understanding the conformations that the inhibitor compounds may take before forming the interactions with iron. (+3) oxidation state of iron led to more stable complexes with the inhibitor molecule and iron, for some of the tested compounds. For instance, MSU-43686 compound with Fe+3 state (Fig. S2) was more stable and sustained a shorter distance between iron and isoxazole for a longer time than the Fe+2 counterpart. However, overall, the majority of the compounds were either leaving the pocket during the 100 ns simulation time or losing the interactions and the coordination with heme group and trying to leave the pocket. Therefore, this type of unbound modeling with iron, even with the correct oxidation state, was deemed not to be an appropriate approach to investigate the inhibitor molecules. On the other hand, it provided some evidence in terms of which paths can be used to enter/exit the pocket. Two unique paths were identified: one near the space between the Gly117 and His93 is located, and the other one is where the water channel was formed in the apo simulations. This is 213 shown in the Fe+2 unbound modeling of MSU-43683 (Fig.8) molecule where it is first losing the coordination and interaction with iron and heme, respectively, and then, exit the pocket through a path between the His89 and His93 residues. This path was identified as a water entry path to the pocket in the simulations of the "off" state. Another potential exit path was seen in HC106A simulations (Fig. 9). The molecule is quite stable during the frst 30 ns of the simulation time, however, after that, HC106A starts to lose its interaction with the heme and the iron and assumes an almost parallel placement with the heme plane by orienting the upper end of the molecule towards the Pro115. Following the non-bonded models, next, the bonded model using the MCPB.py package to model iron-isoxazole interaction was teste; it was then used it to model the rest of the compounds. With this methodology, to estimate the binding affinities, the ABFE approach is not suitable due to the bond existing between the nitrogen of isoxazole and the iron atom, therefore, either MM-GBSA or Relative binding free energy (RBFE) methods can be applied. The rationale behind selecting the isoxazole for parametrization instead of the whole compound is (i) to save time and computational cost as these parametrization procedures involve QM-based optimization and charge calculations, (ii) any other ring replacement in place of isoxazole does not provide any activity against DosS. Therefore, the iron-isoxazole group for the bonded model was parametrized. To understand the dynamics of the inhibitor compounds within the pocket, 100 ns-long simulations were performed, and the interaction energies were calculated for the pocket residues, including the heme group. For all compounds listed in Figure 10, the coordination with Fe was stable throughout the simulations. Overall, the interactions of the compounds with the surrounding pocket residues (shown in Figure S3) suggest that the –NH on the indazole ring provides a strong H-bond with His93 backbone. When hydrogen on indazole –NH is removed, the interaction is lost. The presence of the inhibitor molecules did not prevent waters from entering the pocket, and those water molecules act as water bridges between compounds and the pocket residues, including providing a coordination between heme with the urea group occurs through a water bridge. The nitrogen closer to the isoxazole group provides the majority of the interaction with water (it shows that the second nitrogen may not be as 214 Figure S6.8 Fe+2 and Fe+3 unbound modeling for MSU-43683 molecule. A, E: RMSD time-series of the DosS protein, heme and iron, and inhibitor molecule. B, F: Per-residue RMSF plot. B, G: The distance between the iron and nitrogen from isoxazole. D, H: Per-residue interaction energies. The MM-GBSA binding energies are also given as dG values. 215 Figure S6.9 Fe+3 unbound modeling for HC106A molecule. A: RMSD time-series of the DosS protein, heme and iron, and inhibitor molecule. B: Per-residue RMSF plot. C: The distance between the iron and nitrogen from isoxazole. D: Per-residue interaction energies. The MM-GBSA binding energy are also given as the dG value. The binding orientation of HC106A at two different timepoints during the simulation are shown at the bottom of the figure. 216 Figure S6.10 Fe+3 simulations with isoxazole parameters and corresponding binding affinities estimated with different methods. important). Furthermore, no conformation change was observed for Glu87 residue in the presence of the investigated compounds. The majority of the compounds formed a stable interaction with His89. The RMSD time-series data from HC106A shown in Figure 11 and MSU-43672 shown in Figure 12 indicate that both the protein and the ligands have reached an equilibrium within the investigated simulation time. Furthermore, their per-residue RMSF values are quite similar as well. The most interesting difference between these two simulations is the interaction energies with the pocket residues. While HC106A formed a very strong interaction with the heme group due to the direct hydrogen bond formation with its urea group, MSU-43672 has multiple strong interactions with other pocket residues. For both compounds, however, interaction with the iron has a destabilizing contribution (larger than zero). 7.4 Conclusions Modeling of metal-containing protein active sites using classical methods is indeed a challenging problem. Here, a protein sensor with a heme group that changes the oxidation state of its iron center 217 Figure S6.11 Results for HC106A bonded-Fe+3 state modeling for DosS protein. A : RMSD time-series of the DosS protein, heme and iron, and HC106A. B: Per-residue RMSF plot. C: Calculated water density around the given residues. D: Distances between Glu87-His89 and His89-His93 residues. E. Per-residue interaction energies. F. The most populated cluster from the HC106A simulations. The interactions between the water around the pocket residues are shown with pink lines. 218 Figure S6.12 Results for MSU-43672 bonded-Fe+3 state modeling for DosS protein. A : RMSD time-series of the DosS protein, heme and iron, and MSU-43672. B: Per-residue RMSF plot. C: Calculated water density around the given residues. D: Distances between Glu87-His89 and His89-His93 residues. E. Per-residue interaction energies. F. The most populated cluster from the MSU-43672 simulations. The interactions between the water around the pocket residues are shown with pink lines. 219 when the protein switches to "on" state was studied. The reduction of iron is accompanied by a proposed confirmation change of the side chains of the pocket residues and a water network rearragement of the active site. The water rearrangement within the active may indicate that the certain titratable residues may change their protonation states during the switch from "off" to "on" states of DosS protein. In addition, the protonation states of histidines do change when the protein is in the "on" state. The binding of the selected inhibitor compounds with various models was also investigated. While the docking approach gave initial structures for use in further simulations, the scores were not reliable enough to rank the compounds based on their affinities. Next, different methods for modeling the interaction with the iron center were tested. The bonded model was the most reliable method, and while it does not allow for ABFE to estimate binding affinities, RBFE methods can be used in this setting. 220 BIBLIOGRAPHY [1] Zheng, H. and Abramovitch, R. B. (2020). Inhibiting DosRST as a new approach to tuberculosis therapy. [2] Cho, H. Y., Cho, H. J., Kim, Y. M., Oh, J. I., and Kang, B. S. (2009). Structural insight into the Heme-based redox sensing by DosS from Mycobacterium tuberculosis. Journal of Biological Chemistry, 284(19):13057–13067. [3] Kumar, A., Toledo, J. C., Patel, R. P., Lancaster, J. R., and Steyn, A. J. (2007). Mycobacterium tuberculosis DosS is a redox sensor and DosT is a hypoxia sensor. Proceedings of the National Academy of Sciences of the United States of America, 104(28):11568–11573. [4] Podust, L. M., Ioanoviciu, A., and Ortiz De Montellano, P. R. (2008). 2.3 Γ…X-ray structure of the heme-bound GAF domain of sensory histidine kinase DosT of Mycobacterium tuberculosis. Biochemistry, 47(47):12523–12531. [5] Zheng, H., Colvin, C. J., Johnson, B. K., Kirchhoff, P. D., Wilson, M., Jorgensen-Muga, K., Larsen, S. D., and Abramovitch, R. B. (2017). Inhibitors of Mycobacterium tuberculosis DosRST signaling and persistence. Nature Chemical Biology, 13(2):218–225. [6] Basudhar, D., Madrona, Y., Kandel, S., Lampe, J. N., Nishida, C. R., and Montellano, P. R. O. D. (2015). Analysis of cytochrome p450 cyp119 ligand-dependent conformational dynamics by two-dimensional nmr and x-ray crystallography. Journal of Biological Chemistry, 290:10000– 10017. [7] ULC, C. C. G. (2022). Molecular operating environment (moe). [8] Li, P. and Merz, K. M. (2016). Mcpb.py: A python based metal center parameter builder. Journal of Chemical Information and Modeling, 56:599–604. [9] Vanquelef, E., Simon, S., Marquant, G., Garcia, E., Klimerak, G., Delepine, J. C., Cieplak, P., and Dupradeau, F.-Y. (2011). R.e.d. server: A web service for deriving resp and esp charges and building force field libraries for new molecules and molecular fragments. Nucleic Acids Research, 39:W511–W517. [10] Seminario, J. M. (1996). Calculation of intramolecular force fields from second-derivative tensors. International Journal of Quantum Chemistry, 60. [11] Mongan, J., Case, D. A., and McCammon, J. A. (2004). Constant ph molecular dynamics in generalized born implicit solvent. Journal of computational chemistry, 25:2038–2048. [12] Nguyen, H., Roe, D. R., and Simmerling, C. (2013). Improved generalized born solvent model parameters for protein simulations. Journal of chemical theory and computation, 9:2020. 221 [13] Wang, J., Wolf, R. M., Caldwell, J. W., Kollman, P. A., and Case, D. A. (2004). Development and testing of a general amber force field. Journal of Computational Chemistry, 25:1157–1174. [14] Jorgensen, W. L., Chandrasekhar, J., Madura, J. D., Impey, R. W., and Klein, M. L. (1983). Comparison of simple potential functions for simulating liquid water. The Journal of Chemical Physics, 79:926–935. [15] Maier, J. A., Martinez, C., Kasavajhala, K., Wickstrom, L., Hauser, K. E., and Simmerling, C. (2015). ff14sb: Improving the accuracy of protein side chain and backbone parameters from ff99sb. Journal of Chemical Theory and Computation, 11:3696–3713. [16] Jakalian, A., Jack, D. B., and Bayly, C. I. (2002). Fast, efficient generation of high-quality atomic charges. am1-bcc model: Ii. parameterization and validation. Journal of Computational Chemistry, 23:1623–1641. [17] York, D. and Case, P. K. . D. (2020). Amber 2020. [18] Roe, D. R. and Cheatham, T. E. (2013). Ptraj and cpptraj: Software for processing and analysis of molecular dynamics trajectory data. Journal of Chemical Theory and Computation, 9:3084–3095. 222 APPENDIX A SUPPORTING TABLES Table S8.1 The docking scores and EC50s for the compounds of interest. 223 APPENDIX B SUPPORTING FIGURES Figure S8.1 "Off" state and CO-bound "on" state modeling for DosS protein. A, E: RMSD time-series of the DosS protein, and heme and iron. B, F: Per-residue RMSF plot. C, G: Calculated water density around the given residues. D, H: Distances between Glu87-His89 and His89-His93 residues. 224 Figure S8.2 Fe+2 and Fe+3 unbound modeling for MSU-43686 molecule. A, E: RMSD time-series of the DosS protein, heme and iron, and inhibitor molecule. B, F: Per-residue RMSF plot. C, G: The distance between the iron and nitrogen from isoxazole. D, H: Per-residue interaction energies, this was not provided for Fe+2 simulations as the compound left the pocket. The MM-GBSA binding energies are also given as dG values for Fe+3. 225 Figure S8.3 Direct interactions observed during the simulations of highlighted compounds. (A) MSU-43672, (B) MSU-43682, (C) MSU-43686, (D) MSU-43686-B5 derivative, (E) HC106A, (F) MSU-43686-B1 derivative, (G) MSU-43243, (H) MSU-43683, (I) MSU-44244, (J) Ale-9-165. 226 Figure S8.4 Results for MSU-43674 bonded-Fe+3 state modeling for DosS protein. A : RMSD time-series of the DosS protein, heme and iron, and MSU-43674. B: Per-residue RMSF plot. C: Calculated water density around the given residues. D: Distances between Glu87-His89 and His89-His93 residues. E. Per-residue interaction energies. F. The most populated cluster from the MSU-43674 simulations. The interactions between the water around the pocket residues are shown with pink lines. 227 Figure S8.5 Results for MSU-43419 bonded-Fe+3 state modeling for DosS protein. A : RMSD time-series of the DosS protein, heme and iron, and MSU-43419. B: Per-residue RMSF plot. C: Calculated water density around the given residues. D: Distances between Glu87-His89 and His89-His93 residues. E. Per-residue interaction energies. F. The most populated cluster from the MSU-43419 simulations. The interactions between the water around the pocket residues are shown with pink lines. 228 Figure S8.6 Results for MSU-44245 bonded-Fe+3 state modeling for DosS protein. A : RMSD time-series of the DosS protein, heme and iron, and MSU-44245. B: Per-residue RMSF plot. C: Calculated water density around the given residues. D: Distances between Glu87-His89 and His89-His93 residues. E. Per-residue interaction energies. F. The most populated cluster from the MSU-44245 simulations. The interactions between the water around the pocket residues are shown with pink lines. 229 Figure S8.7 Results for MSU-43859 bonded-Fe+3 state modeling for DosS protein. A : RMSD time-series of the DosS protein, heme and iron, and MSU-43859. B: Per-residue RMSF plot. C: Calculated water density around the given residues. D: Distances between Glu87-His89 and His89-His93 residues. E. Per-residue interaction energies. F. The most populated cluster from the MSU-43859 simulations. The interactions between the water around the pocket residues are shown with pink lines. 230 Figure S8.8 Results for MSU-44239 bonded-Fe+3 state modeling for DosS protein. A : RMSD time-series of the DosS protein, heme and iron, and MSU-44239. B: Per-residue RMSF plot. C: Calculated water density around the given residues. D: Distances between Glu87-His89 and His89-His93 residues. E. Per-residue interaction energies. F. The most populated cluster from the MSU-44239 simulations. The interactions between the water around the pocket residues are shown with pink lines. 231 Figure S8.9 Results for MSU-44247 bonded-Fe+3 state modeling for DosS protein. A : RMSD time-series of the DosS protein, heme and iron, and MSU-44247. B: Per-residue RMSF plot. C: Calculated water density around the given residues. D: Distances between Glu87-His89 and His89-His93 residues. E. Per-residue interaction energies. F. The most populated cluster from the MSU-44247 simulations. The interactions between the water around the pocket residues are shown with pink lines. 232 CHAPTER 8 INVESTIGATION OF HOST-GUEST BINDING AFFINITIES WITH GEOMETRIC AND END-POINT BINDING FREE ENERGY CALCULATIONS 233 8.1 Introduction Statistical Assessment of Modeling of Proteins and Ligands (SAMPL) is a blind challenge that provides computational researchers the opportunity to gauge and improve computational methods that are can be applied to drug discovery. 1–5 The existing and newly developed strategies to investigate properties such as solvation free energies, binding affinities, pKa values, and other physicochemical properties can be assessed for a series of given compounds by participants, then the predicted values are compared with the experimentally determined values to assess the performance of each approach. One challenge that is assessed by the SAMPL9 competition is the estimation of the binding affinities between a small molecule (guest) and its target molecule (host). 6,7,SAM Binding affinity prediction is one of the cornerstones of computer-aided drug discovery (CADD) 9,10. Being able to predict the binding energies accurately for a given target helps to reduce the number of compounds to be investigated in a wet-lab setting and hence can help to reduce the cost and the timeframe of a drug discovery program. Therefore, it is critical to assess the reliability of various approaches and schemes used for binding affinities. The SAMPL9 challenge includes two host and five guest molecules, as are shown in Figure 1 SAM. Host molecules are 𝛽-cyclodextrin (bCD) and Hexakis-2,6-dimethyl-𝛽-cyclodextrin (HbCD) compounds that are heptasaccharides made of glucose connected by a 1-4 glycosidic bond 6. Cyclodextrins are used as a reactant involved in inclusion complexes, assessing the diffusion and single-molecule interactions with certain proteins, and improving the solubility and stability of drugs. The five guest molecules that are given in this challenge share the same phenothiazine core with substitutions and a cationic arm (Fig. 1(B)). Among the guest molecules, Promethazine hydrochloride (PMT) is a drug used to treat nausea and vomiting, Thioridazine hydrochloride (TDZ), Chlorpromazine hydrochloride (CPZ), and Trifluoperazine dihydrochloride (TFP) are used for certain mood disorders, Promazine hydrochloride (PMZ) is used as a tranquillizer used in veterinary medicine. 11–15 The goal of the SAMPL9 host-guest challenge is to predict the binding affinities of these host-guest molecules. There are numerous computational methods with varying levels of complexity and cost that can be used to estimate binding affinities: docking methods, 234 alchemical methods such as free-energy perturbation and thermodynamic integration, geometric methods including steered molecular dynamics (SMD) and umbrella sampling, and end-state methods. The majority of these methods rely on atomistic simulations such as equilibrium or non-equilibrium MD. In this work, a number of schemes involving end-state methods are investigated: Molecular Mechanics – Gibbs Born Surface Area/Poisson Boltzmann Surface Area (MM-GBSA/PBSA). Furthermore, an SMD approach was also considered with two different pulling speeds: 5 Γ…/ns and 10 Γ…/ns, with a positional restraint only on the host molecules along with a cylindrical restraint to keep the guest molecules on the axis of the collective variable. Approaches tested here aimed to answer the following questions using the host-guest dataset: β€’ How does the frame sampling impact the performance of end-state methods, namely MM- GBSA and MM-PBSA? β€’ How does the orientation of the guest molecules affect the performance of MM-GBSA/PBSA methods? β€’ Can a simple alternative SMD protocol be created to capture the binding affinities? The end-state methods used here have been the method of choice of many drug discovery programs to perform a quick compound screen, as they have been shown to perform well in ranking the compounds based on their affinities. 16 However, as the success of MM-GBSA/PBSA methods are heavily dependent on the frame selection from the MD trajectories, it is important to test and assess the accuracies of the frame sampling schemes. Furthermore, as an extension of that, the selection of the correct binding pose is another significant criterion that changes the performance of the end-state methods for affinity predictions. Geometric methods used in free energy estimations are based on creating a potential-of-mean force (PMF) to calculate the strength of interaction between two molecules. 17,18These methods require a selection of a collective variable (can be distance, angle, dihedral angle) that is used to perturb the system, and there are well-established schemes such as pAPRika that can be used. 17,18 235 Here, the aim was to create an SMD scheme with the lowest number of restraints possible on the host molecules to estimate the binding affinities. 8.2 Computational Details 8.2.1 Preparation of systems The host and guest molecules were obtained from the SAMPL9 GitHub page (SAMPL9 CD dataset), and their 2D structures are shown in Figure 1(A,B). SAM The protonation states of host and guest molecules were determined by Protonate3D module as implemented in Molecular Operating Environment version 2019.01 (MOE). 19 The partial charges of the molecules were calculated using the RESP method with REDSERVER. 20? 8.2.2 Docking Scheme The host-guest complexes were obtained with the docking using the MOE software. 19 The docking procedure involves two steps: placement and refinement. Triangle matcher was used to place the guest molecules in the host cavities and scores were calculated using the London dG method. Then, the top 100 poses were selected and refined with induced fit approach and the final docking scores were calculated with Generalized-Born volume integral/weighted surface area score (GBVI/WSA dG). 21 The selected poses were minimized with molecular mechanics using AMBER10: Extended HΓΌckel Theory (EHT) force field implemented in MOE. For each host-guest complex, along with the highest scoring poses, a pose that orients differently was also selected to investigate the impact of the guests’ orientation with the within the host molecules. 8.2.3 Classical Molecular Dynamics Procedure Each selected pose was prepared for MD simulations using tleap module of Amber18. 22 The host and guest molecules were modeled using the gaff2 force field, and TIP4P-EW was used as water model to solvate the complexes in a 14Γ… cubic box. 23,24 Required counter ions, Na+ and Cl-, were added to neutralize the systems. 1fs timestep was applied during the heating and the production steps. Langevin thermostat and isotropic position scaling was selected for the thermostat and barostat, respectively. 25–27 Each system was minimized in four steps with decreasing positional restraints on the solute molecules (100, 50, 10, 0 (kcal mol-1 Γ… -2). Then, the systems were heated 236 up to 300 K in a stepwise fashion in 1.51 ns, as described in previous papers. 28–31 The production runs were 20 ns long at 300K, 1atm pressure in triplicates. Nonbonded interactions were truncated with a 10.0 Γ… cutoff value. Particle-mesh Ewald was used for long-range electrostatic interactions. The pmemd.cuda module was used to perform the simulations, as implemented in Amber 18. 22 8.2.4 Binding Energy Estimations The binding energies for the host-guest complexes were calculated using end-state methods, i.e. MM-GBSA/PBSA. 32 Three different frame sampling schemes were tested: (i) sampling 500 frames from first 5 ns, (ii) sampling 500 frames from last 5 ns, and (iii) sampling 1000 frames from the 20 ns trajectory. These calculations were performed for the duplicate simulations and the values are averaged. 8.2.5 Steered Molecular Dynamics Procedure Amber18 along with Plumed was used for the Steered Molecular Dynamics (SMD) calcula- tions. 33–35 The final frames for each set of simulations were used as a starting point for SMD, and the host-guest molecules were re-solvated in 30 Γ… TIP4P-EW water box. The pulling direction was set to be on the z-axis and the guest molecules were oriented accordingly. The pulling direction is determined by the orientation of the cationic side chains of guest molecules. During the SMD, a small positional restraint (5 kcal mol-1 Γ… -2) was applied to the host molecules to keep them at the center of the box. The collective variable is defined as the distance between the center-of-mass of sulfur and nitrogen atoms of phenothiazine core (COM-guest) for each guest molecule and a stationary point within the host molecule that is positioned initially 2 Γ… away from the COM-guest. A wall constraint was added to the mean distance between the COM-guest and the center-of-mass of the oxygen atoms from glycosidic bond (COM-host) (Figure S2). A time-dependent harmonic restraint with force constant of 10 was applied to the collective variable on the z axis to move the guest molecule 20 Γ… away from the host with the speed of 10 Γ…/ns for 2 ns followed by equilibrium at 20 Γ… for 600 ps and 5 Γ…/ns for 3.6 ns followed by 1.2 ns equilibrium. The calculations were performed independently four times for each replica of a pose, and the final energies were obtained by averaging based on 237 Table S8.1 The estimated binding energies and statistics for primary selected poses for bCD and listed guest molecules are shown. RMSE: root mean square errors, MAE: mean absolute errors, ME: mean errors, r2: correlation coefficient, m: slope of the correlation plots, 𝜏 : Kendall’s Tau rank correlation coefficient. the Jarzynski equality. 8.2.6 Analysis The estimated binding affinities were compared with the experimental values and the root mean square errors (RMSE), mean absolute errors (MAE), mean errors (ME), correlation coefficient (r2), slope of the correlation plots (m), and Kendall’s Tau rank correlation coefficient (𝜏) were calculated by bootstrapping with replacement. The statistical analysis was performed for bCD and HbCD systems separately and together as well to assess the performance of methods with respect to each host. The root mean square deviations (RMSD), root mean square fluctuations (RMSF), hydrogen bonds (HBOND), and contact maps were obtained using cpptraj module of Amber18. 22,36 238 Table S8.2 The estimated binding energies and statistics for primary selected poses for HbCD and listed guest molecules are shown. RMSE: root mean square errors, MAE: mean absolute errors, ME: mean errors, r2: correlation coefficient, m: slope of the correlation plots, 𝜏 : Kendall’s Tau rank correlation coefficient. 8.3 Results and Discussion 8.3.1 Host-guest docking poses The structures of the investigated host and guest molecules from SAMPL8 CD challenge are provided in Figure 1 (A) and (B), respectively. The alcohol groups in bCD are modified to methoxy in HbCD host, and an example of these modifications is highlighted with red and blue circles in Fig. 1(A). These modifications change the charge distribution and hydrogen bonding ability of the host molecules, as shown in Fig. 1 (C) and (D) and impact the orientation as well as interactions with the guest molecules. The evaluated docking poses highlight that there are preferred orientations for each host molecule. The primary poses (dominant orientation obtained from docking) in both bCD and HbCD docking, the bulky cationic side chains of guest molecules reside at the secondary face of host molecules, as can be seen in Figure 2. On the other hand, the phenothiazine core substituent did not have a specific preference. In bCD: TDZ and TFP have phenothiazine core substituent that orients towards primary face, and CPZ has the substituent pointing towards the secondary face (Fig. 2(A)). In HbCD, however, the core substituents in TDZ and TFP are oriented towards the secondary face and the CPZ substituent occupies the primary face (Fig. 2(B)). The secondary poses observed 239 Figure S8.1 2D structures of (A) host and (B) guest molecules for SAMPL9 challenge. The electrostatic and hydrophobic surfaces of host molecules are shown in (C) and (D) for bCD and HbCD, respectively. for each system as shown in Figure S1 shows that the cationic side chains still prefer the secondary face of the host molecules. The charge distribution and the hydrophobicity of each face in the host molecules can be attributed for this preference: the dominant negative charge distribution of the secondary face can lead the cationic side chain of guest molecules to orient towards this face (Fig. 1(A, B)). 8.3.2 Orientations of guest molecules The selected orientations of the guest molecules bound to bCD and HbCD are depicted in Figure S1. In general, within the primary poses, a preference is observed: for both bCD-bound and HbCD-bound molecules, their cationic side chains are oriented towards the secondary face. This orientation towards the side chains also occurs for the secondary poses as well. On the other hand, as mentioned in the previous section, the core substituents show different orientation preferences 240 between the bCD and HbCD hosts. As PMZ and PMT do not have a substituent on the phenothiazine core, their orientations are mainly based on the cationic side chains – and in all cases, PMZ and PMT poses face the secondary face consistently. bCD-PMZ and bCD-PMT also have very similar binding affinities, except for the initial 5 ns of the PMZ simulations. HbCD-PMZ poses on average have very similar MM- GBSA/PBSA energies, however, at the beginning and end of the simulations, the second pose of PMZ (HbCD-PMZ-p2) has a binding energy 2 kcal mol-1 stronger. HbCD-PMT poses, on the other hand, have very similar MM-GBSA/PBSA energies regardless of the frame sampling. The primary poses obtained for TFP have a single binding conformation in both bCD and HbCD with a difference in orientation for their phenothiazine core substituent (Fig. S1). For SP vs SS orientations obtained for bCD-TFP-p1 and HbCD-TFP-p1, respectively, their average MM-GBSA/PBSA values align with the experimental binding affinities. Another guest molecule with a different orientation is for the phenothiazine core substituent is the primary poses of CPZ. The experimental binding affinities of CPZ with bCD and HbCD are very similar (-5.42 vs -5.40 kcal mol-1. The MM-GBSA/PBSA results also follow a similar order regardless of the frame sampling. The observations from TFP and CPZ compounds and their respective binding orientations with bCD and HbCD host molecules leads to the conclusion that the preferred orientation of the phenothiazine core substituent is different in each host molecule, and also depends on the substituent type. The phenothiazine core substituent in bCD-CPZ allows more hydrogen bonds to be made with the secondary face of bCD, and in HbCD-CPZ system it allows -Cl to be positioned away from the highly charged HbCD secondary face. All in all, the orientation preferences for guest molecules depend on the substituent changes on the host as well. 8.3.3 Binding energy estimations with end-state methods The selected poses, primary poses and an alternative pose for each host-guest system, were prepared and minimized according to the protocol outlined in Methods section, and 20 ns long classical MD simulations were run in duplicates for each pose. The binding affinities of each pose were calculated for both simulations and averaged. The final results are listed in Table 1 and Table 241 2 for bCD and HbCD systems, respectively. For both MM-GBSA and MM-PBSA binding energy calculations, a number of frame samplings were tested: frames sampled from first 5 ns and last 5 ns of the simulations, or frames sampled throughout the entirety of the simulations. In addition, the SMD approach was considered with two different pulling speeds: 5 Γ…/ns and 10 Γ…/ns. The estimated binding affinities listed in Table 1 and their statistical analysis show that all MM-GBSA estimations have high RMSE (˜17 kcal mol-1), and MM-PBSA results RMSE values were calculated to be ˜13 kcal mol-1. These observations do not change significantly upon changing the frame sampling (avg. vs first 5ns vs last 5ns in Table 1). However, the predicted binding energies are still able to capture the ranking and correlate with the experimental affinities well, as is demonstrated from the Kendall’s tau and r2 values, respectively. Specifically, the approach with the lowest RMSE, MM-PBSA (first 5ns), has a 0.96 r2 and a Kendall’s tau score of 0.79. This indicates that although the exact binding affinities were not predicted, the ranking of them can be calculated with this approach. Another observation from Table 1 is that the selection of frames does not seem to impact the statistics significantly, with the exception of MM-GBSA (last 5 ns). The results obtained for HbCD host are shown in Table 2. The RMSE values are approximately 2 kcal mol-1 higher than the bCD results, highlighting the fact that predictions for HbCD are slightly more difficult for the end-state approach used here. Considering the RMSE results, the lowest RMSE was for MM-PBSA sampled from the last 5 ns. On the other hand, the highest r2 value was observed for MM-GBSA sampled from the last 5 ns, with Kendall’s tau of 0.6. The best ranking of the binding affinities however was observed for MM-GBSA energies averaged from 20 ns of the simulations. The cumulative statistics of the results (Table 3) shows that: β€’ MM-PBSA energies have 2-3 kcal mol-1 lower RMSE values than the MM-GBSA energies. β€’ The best r2 value is obtained with MM-PBSA method with frames sampled from the 20 ns-long trajectories. This is followed by the value obtained with the MM-PBSA method with frames sampled from the initial 5 ns of the simulations. β€’ Both of these schemes have comparable Kendall’s tau values: 0.45 and 0.47, respectively 242 Figure S8.2 The primary poses considered for (A) bCD and (B) HbCD molecules. Oreintations (arm/core) are as follows: bCD: SP; SP; S-; S-; SS. HbCD: SS; SS; S-; S-; SP. Comparing these values with the statistics of the secondary poses (Table S3), it is clear that the pose selection has great significance. While the RMSE values did not improve or deteriorate with pose selection, the highest values obtained for r2 (0.24) and Kendall’s tau (0.38) values were quite low. Even when the lowest energy values for each binding energy calculation scheme was selected for each complex, as reported in Table S4, the correlation coefficient and tau values are still not able to compare with the results from the primary poses. 8.4 Conclusions Based on the challenge results of the submitted computational protocols in SAMPL9 compe- tition, MM-GBSA and MM-PBSA methods provide a high Kendall’s tau (0.45-0.65) along with good r2 estimations (0.78-0.65) with MAE values ˜17 kcal mol-1, highlighting the fact that these end-state methods are successful in obtaining trends in the binding affinities, but not necessarily the exact binding energies. Results obtained in this study also showed that the success of these end-state methods is dependent on the choice of the initial poses and the sampling of the frames from MD simulations. 243 Table S8.3 Cumulative statistics of both bCD and HbCD results with primary poses are listed. RMSE: root mean square errors, MAE: mean absolute errors, ME: mean errors, r2: correlation coefficient, m: slope of the correlation plots, 𝜏 : Kendall’s Tau rank correlation coefficient. 244 BIBLIOGRAPHY [1] Nicholls, A., Wlodek, S., and Grant, J. A. (2009). The samp1 solvation challenge: Further lessons regarding the pitfalls of parametrization†. Journal of Physical Chemistry B, 113:4521– 4532. [2] Geballe, M. T., Skillman, A. G., Nicholls, A., Guthrie, J. P., and Taylor, P. J. (2010). The sampl2 blind prediction challenge: Introduction and overview. Journal of Computer-Aided Molecular Design, 24:259–279. [3] Yin, J., Henriksen, N. M., Slochower, D. R., Shirts, M. R., Chiu, M. W., Mobley, D. L., and Gilson, M. K. (2017). Overview of the sampl5 host–guest challenge: Are we doing better? Journal of Computer-Aided Molecular Design, 31:1–19. [4] Muddana, H. S., Fenley, A. T., Mobley, D. L., and Gilson, M. K. (2014). The sampl4 host- guest blind prediction challenge: An overview. Journal of Computer-Aided Molecular Design, 28:305–317. [5] Muddana, H. S., Varnado, C. D., Bielawski, C. W., Urbach, A. R., Isaacs, L., Geballe, M. T., and Gilson, M. K. (2012). Blind prediction of host-guest binding affinities: A new sampl3 challenge. Journal of Computer-Aided Molecular Design, 26:475–487. [6] Andrade, B., Chen, A., and Gilson, M. K. (2024). Host-guest systems for the sampl9 blinded prediction challenge: phenothiazine as a privileged scaffold for binding to cyclodextrins. Phys- ical chemistry chemical physics : PCCP, 26:2035–2043. [7] Amezcua, M., Setiadi, J., and Mobley, D. L. (2024). The sampl9 host–guest blind challenge: an overview of binding free energy predictive accuracy. Physical Chemistry Chemical Physics, 26:9207–9225. [SAM] samplchallenges/sampl9: 0.8. [9] Beveridge, D. L. and DiCapua, F. M. (1989). Free energy via molecular simulation: applications to chemical and biomolecular systems. Annual review of biophysics and biophysical chemistry, 18:431–492. [10] DiMasi, J. A., Grabowski, H. G., and Hansen, R. W. (2016). Innovation in the pharmaceutical industry: New estimates of r&d costs. Journal of Health Economics, 47:20–33. [11] Kiningham, K. K. (2007). Promethazine. xPharm: The Comprehensive Pharmacology Reference, pages 1–6. [12] Cheng, H. W., Liang, Y. H., Kuo, Y. L., Chuu, C. P., Lin, C. Y., Lee, M. H., Wu, A. T., Yeh, C. T., Chen, E. T., Whang-Peng, J., Su, C. L., and Huang, C. Y. (2015). Identification of thioridazine, an antipsychotic drug, as an antiglioblastoma and anticancer stem cell agent using 245 public gene expression data. Cell Death & Disease 2015 6:5, 6:e1753–e1753. [13] Mann, S. K. and Marwaha, R. (2023). Chlorpromazine. Encyclopedia of Toxicology: Third Edition, pages 925–929. [14] Koch, K., Mansi, K., Haynes, E., Adams, C. E., Sampson, S., and Furtado, V. A. (2014). Trifluoperazine versus placebo for schizophrenia. The Cochrane Database of Systematic Reviews, 2014. [15] Sibilio, J. P., Andrew, G., Stehman, V. A., Dart, D., and Moore, K. B. (1957). Treatment of chronic schizophrenia with promazine hydrochloride. A.M.A. Archives of Neurology & Psychiatry, 78:419–424. [16] Eken, Y., Almeida, N. M., Wang, C., and Wilson, A. K. (2021). Sampl7: Host–guest binding prediction by molecular dynamics and quantum mechanics. Journal of Computer-Aided Molecular Design, 35:63–77. [17] Henriksen, N. M., Fenley, A. T., and Gilson, M. K. (2015). Computational calorimetry: High-precision calculation of host-guest binding thermodynamics. Journal of Chemical Theory and Computation, 11:4377–4394. [18] Velez-Vega, C. and Gilson, M. K. (2013). Overcoming dissipation in the calculation of standard binding free energies by ligand extraction. Journal of Computational Chemistry, 34:2360–2371. [19] (2022). Molecular operating environment (moe), 2022.02 chemical computing group ulc, 1010 sherbooke st. west, suite 910, montreal, qc, canada, h3a 2r7. [20] Vanquelef, E., Simon, S., Marquant, G., Garcia, E., Klimerak, G., Delepine, J. C., Cieplak, P., and Dupradeau, F.-Y. (2011). R.e.d. server: A web service for deriving resp and esp charges and building force field libraries for new molecules and molecular fragments. Nucleic Acids Research, 39:W511–W517. [21] Corbeil, C. R., Williams, C. I., and Labute, P. (2012). Variability in docking success rates due to dataset preparation. Journal of Computer-Aided Molecular Design, 26:775–786. [22] York, D., Kollman, P., and et al, D. C. (20218). Amber 2018. [23] He, X., Man, V. H., Yang, W., Lee, T.-S., and Wang, J. (2020). A fast and high-quality charge model for the next generation general amber force field. The Journal of Chemical Physics, 153:114502. [24] Wang, J., Wolf, R. M., Caldwell, J. W., Kollman, P. A., and Case, D. A. (2004). Development and testing of a general amber force field. Journal of Computational Chemistry, 25:1157–1174. 246 [25] Woodcock, L. V. (1971). Isothermal molecular dynamics calculations for liquid salts. Chem- ical Physics Letters, 10:257–261. [26] Verlet, L. (1967). Computer "experiments" on classical fluids. i. thermodynamical properties of lennard-jones molecules. Physical Review, 159:98–103. [27] Horn, H. W., Swope, W. C., and Pitera, J. W. (2005). Characterization of the tip4p-ew water model: Vapor pressure and boiling point. Journal of Chemical Physics, 123:194504. [28] Almeida, N. M., Bali, S. K., James, D., Wang, C., and Wilson, A. K. (2023). Binding of per- and polyfluoroalkyl substances (pfas) to the ppar𝛾/rxr𝛼-dna complex. Journal of Chemical Information and Modeling, 63:7423–7443. [29] Bali, S. K., Marion, A., Ugur, I., Dikmenli, A. K., Catak, S., and Aviyente, V. (2018). Activity of topotecan toward the dna/topoisomerase i complex: A theoretical rationalization. Biochemistry, 57:1542–1551. [30] Bali, S. K., Haslak, Z. P., Cifci, G., and Aviyente, V. (2023). Dna preference of indenoiso- quinolines: a computational approach. Organic & Biomolecular Chemistry, 21:4518–4528. [31] Findik, B. K., Cilesiz, U., Bali, S. K., Atilgan, C., Aviyente, V., and Dedeoglu, B. (2022). Investigation of iron release from the n- and c-lobes of human serum transferrin by quantum chemical calculations. Organic & Biomolecular Chemistry, 20:8766–8774. [32] Miller, B. R., McGee, T. D., Swails, J. M., Homeyer, N., Gohlke, H., and Roitberg, A. E. (2012). Mmpbsa.py : An efficient program for end-state free energy calculations. Journal of Chemical Theory and Computation, 8:3314–3321. [33] Bonomi, M., Bussi, G., Camilloni, C., Tribello, G. A., BanΓ‘Ε‘, P., Barducci, A., Bernetti, M., Bolhuis, P. G., Bottaro, S., Branduardi, D., Capelli, R., Carloni, P., Ceriotti, M., Cesari, A., Chen, H., Chen, W., Colizzi, F., De, S., Pierre, M. D. L., Donadio, D., Drobot, V., Ensing, B., Ferguson, A. L., Filizola, M., Fraser, J. S., Fu, H., Gasparotto, P., Gervasio, F. L., Giberti, F., Gil-Ley, A., Giorgino, T., Heller, G. T., Hocky, G. M., Iannuzzi, M., Invernizzi, M., Jelfs, K. E., Jussupow, A., Kirilin, E., Laio, A., Limongelli, V., Lindorff-Larsen, K., LΓΆhr, T., Marinelli, F., Martin-Samos, L., Masetti, M., Meyer, R., Michaelides, A., Molteni, C., Morishita, T., Nava, M., Paissoni, C., Papaleo, E., Parrinello, M., Pfaendtner, J., Piaggi, P., Piccini, G. M., Pietropaolo, A., Pietrucci, F., Pipolo, S., Provasi, D., Quigley, D., Raiteri, P., Raniolo, S., Rydzewski, J., Salvalaglio, M., Sosso, G. C., Spiwok, V., Ε poner, J., Swenson, D. W., Tiwary, P., Valsson, O., Vendruscolo, M., Voth, G. A., and White, A. (2019). Promoting transparency and reproducibility in enhanced molecular simulations. Nature Methods 2019 16:8, 16:670–673. [34] Tribello, G. A., Bonomi, M., Branduardi, D., Camilloni, C., and Bussi, G. (2013). Plumed 2: New feathers for an old bird. Computer Physics Communications, 185:604–613. [35] Bonomi, M., Branduardi, D., Bussi, G., Camilloni, C., Provasi, D., Raiteri, P., Donadio, D., 247 Marinelli, F., Pietrucci, F., Broglia, R. A., and Parrinello, M. (2009). Plumed: A portable plugin for free-energy calculations with molecular dynamics. Computer Physics Communications, 180:1961–1972. [36] Roe, D. R. and Cheatham, T. E. (2013). Ptraj and cpptraj: Software for processing and analysis of molecular dynamics trajectory data. Journal of Chemical Theory and Computation, 9:3084–3095. 248 APPENDIX A SUPPORTING TABLES Table S8.1 The estimated binding energies and statistics for secondary selected poses for bCD and listed guest molecules are shown. RMSE: root mean square errors, MAE: mean absolute errors, ME: mean errors, r2: correlation coefficient, m: slope of the correlation plots, 𝜏 : Kendall’s Tau rank correlation coefficient. Table S8.2 The estimated binding energies and statistics for secondary selected poses for HbCD and listed guest molecules are shown. RMSE: root mean square errors, MAE: mean absolute errors, ME: mean errors, r2: correlation coefficient, m: slope of the correlation plots, 𝜏 : Kendall’s Tau rank correlation coefficient. 249 Table S8.3 Cumulative statistics of both bCD and HbCD results with secondary poses. RMSE: root mean square errors, MAE: mean absolute errors, ME: mean errors, r2: correlation coefficient, m: slope of the correlation plots, 𝜏 : Kendall’s Tau rank correlation coefficient. Table S8.4 Cumulative statistics of both bCD and HbCD results with lowest energy poses for each calculation type. RMSE: root mean square errors, MAE: mean absolute errors, ME: mean errors, r2: correlation coefficient, m: slope of the correlation plots, 𝜏 : Kendall’s Tau rank correlation coefficient. 250 APPENDIX B SUPPORTING FIGURES Figure S8.1 The primary and secondary poses considered for (A) bCD and (B) HbCD molecules. Oreintations (arm/core) are as follows: bCD: SP/SP; SP/SP; S-/S-; S-/S-; SS/PS. HbCD: SS; SS/SP; S-/S-; S-/S-; SP/SP 251 Figure S8.2 An overview of SMD protocol. A. Atoms selected for the COM-guest calculation. CPZ is given here as an example. B. Atoms selected for COM-host calculation. bCD is given here as an example.C. Details of the starting point for SMD calculations. (D) SMD path of the guest molecules. A cylindrical restraint was applied to the mean of the xy distance between center-of-mass of sulfur and nitrogen atoms of phenothiazine core and center-of-mass of oxygen atoms from glycosidic bond. CPZ is shown here as an example. 252 Figure S8.3 The maps for the total contact numbers (native and non-native) between host and guest molecules for primary poses. 253 Figure S8.4 The atom contact maps (non-native) between host and guest molecules for primary poses. 254 Figure S8.5 The RMSD time-series of host and guest molecules for primary poses. 255 Figure S8.6 The per-atom RMSF plots of host and guest molecules for primary poses. 256 CHAPTER 9 CONCLUDING REMARKS AND FUTURE DIRECTIONS 257 Computational chemistry provides us a set of versatile tools study various phenomenons, including developing of new compounds to target certain diseases and understanding toxicological effects of specific chemicals on various organism. Describing the dynamics of the proteins and their interactions with ligands or inhibitor compounds allows us to have better understanding of the conformational and energetic spaces in which ligands can bind to proteins. With the appropriate use of these tools, we can gain insight for the ligand binding phenomena and the protein dynamics associated with it. In Chapters 3, 4, and 5, the impact of a class of environmental pollutants (PFAS) on human and fish proteins were investigated. We have targeted three different proteins: PPAR𝛾-RXR𝛼-DNA complex, Estrogen receptors 𝛼 and 𝛽, and thyroglobulin protein. The results highlight that the PFAS are binding strong enough to be able to exert toxic effects and impact the conformational flexibility of the target proteins. Our results outline the important residues in PFAS recognition and binding in these protein pockets. In Chapter 6 and 7, in collaboration with Dr. Robert Abramovitch’s and Dr. Edmund Ellsworth’s groups, compounds aimed to kill Mycobacterium tuberculosis infections were developed and their interactions with the target proteins were investigated. One of the target proteins, DosS, is a heme protein hypothesized to sense the redox change in the environment. Our calculations sug- gested that the isoxazole moiety is favored for iron coordination. Furthermore, to elucidate the mechanism of inhibition of DosS, we employed constant pH simulations as well as classical MD simulations at different states of the protein, ad provided insight into the protonation state changes and conformational differences of the pocket residues. The other target, mmpL3, is a membrane protein responsible for relocating TMM lipid from inner membrane to periplasmic region of Mtb bacteria. The compounds developed by Dr. Ellsworth’s group were modeled and key residues and interactions were identified. Furthermore, we also investigated the other tuberculous bacteria and compared their binding sites to allow for the more efficient targeting with the developed compounds. In Chapter 8, we tested the performance of our methods as well as new SMD protocols using the dataset from Statistical Assessment of the Modeling of Proteins and Ligands 9 (SAMPL9) blind 258 challenge. Our results show good performance in terms of ranking the binding affinities of tested host-guest systems. Besides the work that was presented here, we also worked collaboratively with Reata Pharma- ceuticals until October 2023 to assist their drug discovery programs, and the contents of those work cannot be shared. Going forward, PFAS still continues to be a threat to human well-being and ecological health. Understanding the detailed mechanisms in which PFAS exerts its toxicity is crucial to develop appropriate mitigation strategies at different exposure levels. Computational modeling is a very powerful approach in providing molecular level understanding of toxicities and can be used to uncover the impact of many PFAS towards important target proteins. The nuclear receptors have been the main focus for the PFAS toxicity in humans and other vertebrae, and comparison of all the data available on these nuclear receptors would highlight the key characteristics of the protein pocket features as well PFAS features that is associated with different degrees of toxicities. Besides the nuclear receptors that are known targets for PFAS, understanding how specific signaling mechanisms are disturbed by PFAS exposure is needed. For instance, as mentioned in Chapter 4, thyroid hormone levels in human body is impacted by PFAS exposure. However, there are many layers to understand the molecular details of the cascade of events that would lead to such health problems. Investigation of thyroid hormone signaling after their production by hTG protein is also a key consideration to obtain a better picture on PFAS toxicity on thyroid system. While TB is not an immediate public health threat in developed countries, it is still a concerning problem in most parts of the rural areas. Considering the shortcomings of the current treatment strategies, novel and more effective drugs are required. mmpL3 and DosS are two crucial targets in combating TB, and understanding how the current compounds interact with these two targets are crucial in developing candidate compounds. Methods for which the computational challenges associated with the two aforementioned targets were explained in this dissertation. Moving this one step further, a bigger model system together with the relevant lipid molecules is needed to fully grasp the impact of inhibitor compounds on the mmpL3 protein. Furthermore, addition of TMM 259 lipid to the membrane will also be a key step to further uncover how the lipid binding to mmpL3 would be impacted by the inhibitory compounds. DosS protein is a challenging target to study using classical MD modeling due to the presence of an iron center. A workflow that can be built on top the work that has been presented in this thesis is required to accurately predict the binding affinities of investigated compounds. 260