MATHEMATICAL MODELING AND COMPUTATION OF MOLECULAR SOLVATION AND BINDING

By

Bao Wang

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Applied Mathematics - Doctor of Philosophy

2016

ABSTRACT

This dissertation contains several results on biophysics modeling and computation, ranging from solvated molecular conformation modeling to molecular solvation and binding modeling in the solvent environment.

We study the solvent excluded surface in the Eulerian representation, provide surface area and enclosed volume calculations, and address the molecular topological analysis. We further analyze the electrostatics of solvated molecules with the Eulerian solvent excluded surface. We show that our surface is analytical, without any numerical approximation.

We study the coarse grid Poisson Boltzmann solver. Our software enables extremely accurate numerical solution of the Poisson Boltzmann equation even at very large grid spacing. As a consequence, our software provides reliable electrostatic calculations for solvation and protein ligand binding related problems.

We study the blind solvation free energy prediction problem. A hybrid physical and statistical protocol is proposed for highly accurate solvation free energy prediction. Furthermore, to mediate the influence of the force field parametrization on the solvation free energy prediction, we propose a learning to rank based solvation free energy prediction paradigm.

We explore protein ligand binding free energy prediction and docking scoring via the learning to rank approach, in which a learning to rank based scoring function is proposed for accurate protein ligand binding scoring.

Copyright by
BAO WANG
2016

ACKNOWLEDGMENTS

I would like to thank Professor Yongluo Cao for recommending and Professor Zhengfang Zhou for admitting me to this Ph.D. program. My sincere thanks go to my advisor, Professor Guowei Wei, for the academic guidance and financial support. His own scientific work and insight on many of the related problems have had a tremendous influence on me and have shaped the outcome of this thesis.

I want to thank Professor Yiying Tong for many stimulating discussions and encouragements during my thesis work. I also thank Professors Peter Bates and Moxun Tang for serving on my thesis committee. My special thanks go to Professor Andrea L. Bertozzi for many helpful discussions and for offering a great academic position at the University of California, Los Angeles. I thank Professor Alexandre J. Chorin for the academic help in my graduate studies as well.

Further, I would like to thank many collaborators and friends, Zixuan Cang, Yin Cao, Chi Jin, Kedi Wu, Menglun Wang, Weicong Zhou, Dr. Wenbin Li, Professor Ming Yan, and many others. I also appreciate all the help from Metin Ozsarfati.

Last but not least, I would like to thank Huimin Yan and my parents Gaoming Wang and Taiyu Jiang for their love and support.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
Chapter 1 Introduction
Chapter 2 Solvation Models
  2.1 Explicit Solvent Models
  2.2 Implicit Solvent Models
    2.2.1 Introduction
    2.2.2 Poisson Boltzmann (PB) Model
    2.2.3 Generalized Born (GB) Model
      2.2.3.1 Generalized Born Equations
    2.2.4 Polarizable Continuum Model
      2.2.4.1 Quantum Mechanical Problem
      2.2.4.2 Electrostatic Problem
      2.2.4.3 Dielectric Polarizable Continuum Model (DPCM)
  2.3 Integral Equation Based Solvation Models
    2.3.1 Ornstein-Zernike Equation
    2.3.2 Closures
    2.3.3 1D-RISM
    2.3.4 3D-RISM
  2.4 Conclusion
Chapter 3 Self Consistent Coupling of Polar and Nonpolar Solvation Free Energy
  3.1 The DG based solvation model
  3.2 Parametrization methods and algorithms
    3.2.1 Stability conditions
    3.2.2 Self-consistent approach for solving the coupled PDEs
    3.2.3 Convex optimization for parameter learning
    3.2.4 Algorithm for parameter optimization and solution of the coupled PDEs
  3.3 Numerical results
    3.3.1 Solvent radius
    3.3.2 Optimization results
    3.3.3 Five-fold cross validation
  3.4 Conclusion
Chapter 4 Molecular Conformation in Solvent
  4.1 Introduction
  4.2 Eulerian Solvent Excluded Surface (ESES)
    4.2.1 Construction of Solvent Excluded Surface
    4.2.2 Classification of Cartesian Grid Points
      4.2.2.1 Inside Atom
      4.2.2.2 Inside VS and Inside Saddle Probe
      4.2.2.3 Inside VS and Inside Concave Probe
      4.2.2.4 Ray Tracing
      4.2.2.5 Computation of the Intersection Coordinates
    4.2.3 Surface Morphology
  4.3 ESES: Area, Volume Calculation
    4.3.1 Atomic Area
  4.4 ESES: Topological Analysis
    4.4.1 Loops and Cavities Detections
      4.4.1.1 Geometric Building Block
      4.4.1.2 Algebraic Operations
      4.4.1.3 Homology on ESES
    4.4.2 Persistence
  4.5 ESES: Electrostatics Analysis
  4.6 Conclusion
Chapter 5 Coarse Grid Poisson Boltzmann Solver
  5.1 Introduction
  5.2 Poisson Boltzmann model
  5.3 Numerical Method
    5.3.1 Solvent Solute Boundary
      5.3.1.1 MSMS Surface
      5.3.1.2 ESES surface
    5.3.2 Solving the Poisson Boltzmann equation
    5.3.3 Electrostatic Solvation Free Energy Calculation
    5.3.4 Reaction potential representation of the solvation free energy
    5.3.5 Approximate the atomic center reaction potential
      5.3.5.1 The extension of the reaction potential
    5.3.6 Electrostatic Binding Free Energy Calculation
  5.4 Numerical Results - Solvation Free Energy
    5.4.1 Analytical Tests
      5.4.1.1 Single center-distributed dielectric sphere
      5.4.1.2 Multiple charge dielectric sphere
    5.4.2 Robust on different SES
    5.4.3 Comparison between different PB Solvers
    5.4.4 Convergence Test on PBSA Test Set
  5.5 Numerical Results - Binding Free Energy
    5.5.1 Data Sets
    5.5.2 Results and Discussion
  5.6 Conclusion
Chapter 6 Hybrid Physical and Statistical Models for Solvation Free Energy Prediction
  6.1 Introduction
  6.2 Polar Solvation Models
    6.2.1 Free Energy Functional
    6.2.2 Poisson model
    6.2.3 Polarizable Poisson model
    6.2.4 Boundary and interface conditions
    6.2.5 Numerical Method
    6.2.6 Reaction Field Energy Calculation
  6.3 Nonpolar solvation models
    6.3.1 Modeling of nonpolar solvation free energy - scoring functional groups
      6.3.1.1 Functional group modeling
      6.3.1.2 Scoring the functional groups
    6.3.2 Modeling of nonpolar solvation free energy - nearest neighbor approach
      6.3.2.1 Atomic feature based molecular ranking
      6.3.2.2 Nearest neighbor solvation free energy prediction
    6.3.3 [...] protocol for blind solvation free energy predictions
  6.4 Results and discussions
    6.4.1 Data processing and model validation
      6.4.1.1 Data sets and force fields
      6.4.1.2 Atomic surface area and molecular volume calculation
      6.4.1.3 Validation of atomic surface area based nonpolar model
    6.4.2 Solvation predictions
      6.4.2.1 Leave-one-out prediction
      6.4.2.2 SAMPLx blind predictions
  6.5 Concluding Remarks
Chapter 7 Learning to Rank for Solvation Prediction
  7.1 Introduction
  7.2 Methods and algorithms
    7.2.1 Basic assumptions
    7.2.2 Learning-to-rank algorithm
      7.2.2.1 Query construction
      7.2.2.2 Feature selection
      7.2.2.3 LambdaMART for ranking
        7.2.2.3.1 Information retrieval measures
        7.2.2.3.2 LambdaRank
        7.2.2.3.3 Gradient boosting
        7.2.2.3.4 Gradient boosted regression trees
        7.2.2.3.5 LambdaMART
        7.2.2.3.6 LambdaMART for molecules ranking
    7.2.3 Functional estimation for solvation free energy prediction
  7.3 Numerical results and discussions
    7.3.1 Dataset and feature parametrization
      7.3.1.1 Dataset
      7.3.1.2 Atomic feature parametrization
    7.3.2 Leave-one-out prediction
      7.3.2.1 Number of neighbors involved
      7.3.2.2 Accuracy and sensitivity analysis
    7.3.3 Blind prediction of SAMPLx challenges
  7.4 Concluding Remarks
Chapter 8 Protein Ligand Binding Free Energy Modeling
  8.1 Introduction
  8.2 Theory and algorithm
    8.2.1 Basic assumptions
      8.2.1.1 Representability assumption
      8.2.1.2 Feature-function relationship assumption
      8.2.1.3 Similarity assumption
    8.2.2 Microscopic features
      8.2.2.1 Reaction features
      8.2.2.2 Electrostatic binding features
      8.2.2.3 Atomic Coulombic interaction
      8.2.2.4 Atomic van der Waals interaction
      8.2.2.5 Atomic solvent excluded surface area and molecular volume
      8.2.2.6 Summary of microscopic features
    8.2.3 MART ranking algorithm
    8.2.4 Method for binding affinity prediction
  8.3 Numerical results
    8.3.1 Dataset preparation
      8.3.1.1 Datasets
        8.3.1.1.1 Validation set (N=1322)
        8.3.1.1.2 Training set (N=3589)
        8.3.1.1.3 Test sets
      8.3.1.2 Data pre-processing
    8.3.2 Validation
      8.3.2.1 Validation on the validation set (N=1322)
      8.3.2.2 Validation on the training set (N=3589)
    8.3.3 Blind predictions on three test sets
      8.3.3.1 Prediction on the benchmark set (N=100)
      8.3.3.2 Prediction on the PDBBind v2007 core set (N=195)
      8.3.3.3 Prediction on the PDBBind v2015 core set (N=195)
  8.4 Concluding remarks
Chapter 9 Dissertation Contribution
BIBLIOGRAPHY

LIST OF TABLES

Table 3.1: The solvation free energy prediction for the SAMPL0 set. Energies are in units of kcal/mol.
Table 3.2: The solvation free energy prediction for the alkane set. All energies are in units of kcal/mol.
Table 3.3: The solvation free energy prediction for the alkene set. All energies are in units of kcal/mol.
Table 3.4: The solvation free energy prediction for the ether set. All energies are in units of kcal/mol.
Table 3.5: The solvation free energy prediction for the alcohol set. All energies are in units of kcal/mol.
Table 3.6: The solvation free energy prediction for the phenol set. All energies are in units of kcal/mol.
Table 3.7: The partition of the molecules into sub-groups.
Table 4.1: The grid refinement analysis of the area and volume calculation for the sphere with radius 2.
Table 5.1: Electrostatic solvation free energies of dielectric spheres of different radii with a centrally located unit positive charge.
Table 5.2: Electrostatic solvation free energies (kcal/mol) of the Kirkwood dielectric sphere with multiple charges calculated by MIBPB, where RE is the relative error compared to the analytical electrostatic solvation free energy.
Table 5.3: Electrostatic solvation free energies (kcal/mol) calculated via the MIBPB software with the MSMS surface at different grid sizes.
Table 5.4: Electrostatic solvation free energies (kcal/mol) calculated via the MIBPB with the ESES surface at different grid sizes.
Table 5.5: R2 values and best fitting lines of electrostatic binding free energies with different grid sizes.
Table 6.1: Functional groups and corresponding number of molecules used in the [...].
Table 6.2: Atomic features used for ranking molecules.
Table 6.3: Molecules in SAMPLx sets involving bromine and/or iodine atoms.
Table 6.4: The RMSEs of the solvation free energy prediction with atomic surface area and van der Waals interaction models of nonpolar solvation free energy for the SAMPL0 test set. The molecule in the SAMPL0 set that contains a Br atom is excluded from this comparison. All results are in units of kcal/mol.
Table 6.5: The RMSEs of the leave-one-out test of the solvation free energy prediction with different methods, all in units of kcal/mol.
Table 6.6: The RMSEs of the solvation free energy prediction with different methods. The RMSEs inside and outside the parentheses denote the prediction errors including and excluding the molecules that contain Br atoms, respectively. All in units of kcal/mol.
Table 7.1: Atomic features used for LTR nearest neighbor searching.
Table 7.2: The RMSE and ME of the solvation free energy prediction with different parametrizations of the molecules. The errors are calculated by the proposed solvation models with different numbers of nearest neighbors involved. All in units of kcal/mol.
Table 7.3: The number of nearest neighbors involved in the solvation free energy prediction for different force field parametrizations.
Table 7.4: The RMSE and ME of the leave-one-out test of the solvation free energy prediction with different methods. For comparison, the numbers in parentheses are calculated by the method proposed in the work [256]. All in units of kcal/mol.
Table 7.5: The RMSEs and MEs of the solvation free energy predictions with different parametrizations. The numbers inside and outside parentheses are the results calculated by the HPK and LTR models, respectively. All errors are in units of kcal/mol.
Table 8.1: The RMSEs (kcal/mol) for the five-fold validation on the 7 clusters of the validation set and on the whole validation set (N=1322) with 10 different cutoff distances in the feature extraction.
Table 8.2: The RMSEs (kcal/mol) for the five-fold test on the validation set (N=1322) with FFT-BP calculated at different cutoff distances.
Table 8.3: The RMSEs (kcal/mol) for the validation set (N=1322) with different numbers of nearest neighbors and top features.
Table 8.4: The RMSEs (kcal/mol) for the five-fold cross validation on the training set (N=3589) with different numbers of nearest neighbors and top features.
Table 8.5: The RMSEs (kcal/mol) of the FFT-BP for the benchmark test set (N=100) with different numbers of nearest neighbors and top features.
Table 8.6: The RMSEs (kcal/mol) of the FFT-BP for the PDBBind Core 2007 test set (N=195) with different numbers of nearest neighbors and top features.
Table 8.7: The prediction RMSEs (kcal/mol) for the PDBBind v2015 core set (N=195) with different numbers of nearest neighbors and top features.

LIST OF FIGURES

Figure 3.1: The relations between the solvent radii and the RMSEs. (a) SAMPL0 test set; (b) Alkane set; (c) Alkene set; (d) Ether set; (e) Alcohol set; (f) Phenol set. Notably, there is a common local minimum at the solvent radius 3.0 Å for all test sets except for the alkene set.
Figure 3.2: The predicted and experimental solvation free energies for the 17 molecules in the SAMPL0 test set.
Figure 3.3: The predicted and experimental solvation free energies for 38 alkane molecules.
Figure 3.4: The predicted and experimental solvation free energies for 22 alkene molecules.
Figure 3.5: The predicted and experimental solvation free energies for the 17 ether molecules.
Figure 3.6: The predicted and experimental solvation free energies for the 25 alcohol molecules.
Figure 3.7: The predicted and experimental solvation free energies for the 18 phenol molecules.
Figure 3.8: The bar plot of the training and validation errors of alkanes.
Figure 3.9: The bar plot of the training and validation errors of alkenes.
Figure 3.10: The bar plot of the training and validation errors of the ethers.
Figure 3.11: The bar plot of the training and validation errors of alcohols.
Figure 3.12: The bar plot of the training and validation errors of phenols.
Figure 4.1: Three types of patches for SES: convex patches (red), saddle patches (green) and concave patches (blue).
Figure 4.2: Figure for InsideVS. Solid lines are the outline of a simple SES composed of two atoms (red) and one saddle patch (green). The black circle is the outline of the corresponding visibility sphere when the probe touches atom i and atom j simultaneously. InsideVS is tagged as true when the point is detected as inside the visibility sphere.
Figure 4.3: Flowchart for InsideVS.
Figure 4.4: Flowchart for InsideTorus.
Figure 4.5: Figure for InsideVT. A probe (red) centered at Cp is in contact with three atoms (blue) centered at C1, C2 and C3, respectively. InsideVT is tagged as true when the point is detected as inside the tetrahedron CpC1C2C3.
Figure 4.6: Figure for an unlabeled Cartesian grid point. The figures from left to right highlight part of the 1A2E model with different patches, respectively, where we can see that the Cartesian point (yellow) should be tagged as inside the SES. However, it does not satisfy the conditions proposed in Eq. (4.2.1). From left to right are: visualization with atoms only, visualization with atoms and saddle patches only, and visualization of the overall SES, respectively.
Figure 4.7: Validation for intersection with convex patches. Solid lines illustrate the outline of a simple SES composed of three atoms (red) and two saddle patches (green). P1P2 and P3P4 are the Cartesian edges. Note that here we only mark the possible intersections with convex patches. Edge P1P2 seems to have two intersections Q1 and Q2 with convex patches, where Q1 is actually an invalid intersection since it is inside atom 2. Similarly, the intersection point Q3 for edge P3P4 is not valid, since it is included by saddle patch 2.
Figure 4.8: Validation for intersection with saddle patches. Solid lines illustrate the outline of a simple SES composed of two atoms (red) and two saddle patches (green), where the two separate saddle patches belong to the same torus. The black dashed circle is the outline of the corresponding visibility sphere. Singularities are generated in this case and they are appropriately handled implicitly with our designed conditions. P1P2 and P3P4 are the Cartesian edges. Note that here we only mark the possible intersections with saddle patches. There are two intersections Q1 and Q2 for edge P1P2 with saddle patches, where Q1 is invalid since its associated Boolean tag IsInLemon is true. The intersection point Q3 for edge P3P4 with saddle patches is not valid either, since it is outside of the corresponding visibility sphere.
Figure 4.9: The MSMS surfaces for the biomolecule (PDB ID: 1dqz). Charts from top left to bottom right are the MSMS surfaces with the densities 5, 10, 25 and 50, respectively. MSMS fails to generate the correct surface at density 50, while it works well at the other densities.
Figure 4.10: The ESES surface for protein 1DQZ.
Figure 4.11: Convergence test of the surface area and enclosed volume for the PBSA test set compared to the results obtained at grid size 0.2 Å. (a) Area; (b) Volume.
Figure 4.12: The consistency of the area and volume calculation between the MSMS and ESES software packages for the part of the PBSA test set for which MSMS can generate the surface successfully without change of atomic radius at the density 100 (here 740 biomolecules are used) for volume calculation and analytic surface area by the MSMS software. The ESES results are generated at grid size 0.2 Å. (a) For the area consistency, the correlation is 0.9999 and the best fitting line is y = 0.9934x + 14.3307; (b) For the volume consistency, the correlation is 0.9999 and the best fitting line is y = 1.0040x - 23.9577.
Figure 4.13: The SES for the C60 molecule with probe radius 0.1 Å; the van der Waals radius of the carbon atom is set to be 0.8 Å.
Figure 4.14: The number of loops calculated at different grid sizes for the above C60 solvent excluded surface.
Figure 4.15: The solvent excluded surfaces and their topology for two biomolecules. (a) The ESES result for protein 2AVH with 3 loops and 1 cavity. (b) A cross section of protein 2AVH showing the loops. (c) The ESES result for protein 1af8 with 2 loops and 2 cavities. (d) A cross section of protein 1af8 showing the loops and cavities.
Figure 4.16: The level set representation of the SES for the C60 molecule. (a) The level set representation of the SES; (b), (c), and (d) represent the frames of the evolution at time 20, 40, and 60, respectively.
Figure 4.17: The level set representation of the SES for protein 1clh. (a) The level set representation of the original SES; (b), (c), and (d) Frames of propagated surfaces at time 25, 50, and 100, respectively.
Figure 4.18: The persistence diagrams of the loops (left chart) and cavities (right chart) in the SES of protein 1clh, respectively.
Figure 4.19: The convergence of the electrostatic solvation free energies calculated by using MSMS surfaces to those obtained by using the ESES surfaces. (a), (b) and (c) are for nucleic peptide molecules with PDB IDs 1A2E, 1BNA and 1L4J, respectively. (d), (e) and (f) are for protein molecules with PDB IDs 1a93, 1aca and 1b8w, respectively. All the energies are obtained by solving the PB equation with the MIBPB software at grid size 0.5 Å.
Figure 5.1: Left chart: binding of DNA-drug complex (PDB ID: 121D). Right chart: binding of barnase-barstar complex (PDB ID: 1b3s).
Figure 5.2: The relative errors of the electrostatic solvation free energies, compared to the results calculated at 0.2 Å, computed by the MIBPB method on surfaces generated by ESES and MSMS, averaged over 25 proteins.
Figure 5.3: Performance of different PB solvers, and the MIBPB solver on both MSMS and solvent excluded surfaces. From left to right, top to bottom, the 25 figures represent the molecules with PDB IDs: 1a2s, 1a63, 1a7m, 1ajj, 1bbl, 1bor, 1bpi, 1cbn, 1fca, 1frd, 1fxd, 1hpt, 1mbg, 1neq, 1o7b, 1ptq, 1r69, 1sh1, 1svr, 1uxc, 1vii, 1vjw, 2erl, 2pde, 451c, respectively.
Figure 5.4: The relative deviation of the electrostatic solvation free energies by different PB solvers and MIBPB on different surfaces compared to the results at the smallest grid.
Figure 5.5: The average CPU cost of the 25 biomolecules of the different PB solvers and MIBPB on different surfaces.
Figure 5.6: The relative deviation of the electrostatic solvation free energy with respect to that at grid size 0.3 Å for the 937 biomolecules.
Figure 5.7: Electrostatic binding free energy for all complexes with different grid sizes plotted against the one computed with a grid size of h = 0.2 Å. (a) DNA-drug with pair (0.2 Å, 0.3 Å); (b) DNA-drug with pair (0.2 Å, 0.7 Å); (c) DNA-drug with pair (0.2 Å, 1.1 Å); (d) Barnase-barstar with pair (0.2 Å, 0.3 Å); (e) Barnase-barstar with pair (0.2 Å, 0.7 Å); (f) Barnase-barstar with pair (0.2 Å, 1.1 Å); (g) RNA-peptide with pair (0.2 Å, 0.3 Å); (h) RNA-peptide with pair (0.2 Å, 0.7 Å); (i) RNA-peptide with pair (0.2 Å, 1.1 Å).
Figure 5.8: Binding electrostatic energy for DNA-drug complexes with grid sizes from 0.2 Å to 1.1 Å. The markers and PDB IDs are as follows: yellow circle: 102d, magenta circle: 109d, cyan circle: 121d, green circle: 127d, red circle: 129d, blue circle: 166d, black circle: 195d, yellow diamond: 1d30, magenta diamond: 1d63, cyan diamond: 1d64, green diamond: 1d86, red diamond: 1dne, blue diamond: 1eel, black diamond: 1fmq, yellow square: 1fms, magenta square: 1jtl, cyan square: 1lex, green square: 1prp, red square: 227d, blue square: 261d, black square: 264d, yellow triangle: 289d, magenta triangle: 298d, cyan triangle: 2dbe, green triangle: 302d, red triangle: 311d, blue triangle: 328d, black triangle: 360d.
Figure 5.9: Binding electrostatic energy for barnase-barstar complexes with grid sizes from 0.2 Å to 1.1 Å. The markers and PDB IDs are as follows: yellow circle: 1b27, magenta circle: 1b2s, cyan circle: 1b2u, green circle: 1b3s, red circle: 1x1u, blue circle: 1x1w, black circle: 1x1x, yellow diamond: 1x1y, magenta diamond: 2za4.
Figure 5.10: Binding electrostatic energy for RNA-peptide complexes with grid sizes from 0.2 Å to 1.1 Å. The markers and PDB IDs are as follows: yellow circle: 1a1t, magenta circle: 1a4t, cyan circle: 1biv, green circle: 1exy, red circle: 1g70, blue circle: 1hji, black circle: 1i9f, yellow diamond: 1mnb, magenta diamond: 1nyb, cyan diamond: 1qfq, red diamond: 1ull, green diamond: 1zbn, blue diamond: 2a9x, black diamond: 484d.
Figure 6.1: An illustration of the functional group scoring method for the prediction of the solvation free energy for molecule 2-Chlorosyringaldehyde (Pubchem ID: 53479), which contains four functional groups: aldehyde group, phenolic hydroxyl, ether and chlorinated hydrocarbon. We compute relative weights between phenolic hydroxyl and chlorinated hydrocarbon; phenolic hydroxyl and ether; phenolic hydroxyl and ester group; and ester group and aldehyde group. Then the relative weights are combined to generate the full set of weights ω1, ω2, ω3 and ω4 for solvation free energy prediction.
Figure 6.2: Thifensulfuron (Pubchem ID: 91729).
Figure 6.3: The leave-one-out error of the whole training set with 16 different charge and radius parametrizations. The nearest neighbor approach is employed for solvation free energy prediction.
Figure 6.4: The plot of optimal leave-one-out test results, where the optimal prediction is achieved when the BCC charge is used in conjunction with either the Amber MBondi2 or the ZAP9 radius. In both cases, the prediction RMSE is 1.33 kcal/mol. The correlations between predicted and experimental solvation free energies are 0.956 and 0.955, respectively, for the Amber MBondi2 and ZAP9 force fields. The corresponding R2 values are 0.913 and 0.911, respectively.
Figure 6.5: The prediction results for the SAMPL0 blind test set. The left chart shows the RMSEs between the experimental and predicted solvation free energies. The right chart shows the optimal predictions of the solvation free energies for the SAMPL0 test set.
Figure 6.6: The prediction results for the SAMPL1 blind test set. The left chart shows the RMSEs between the experimental and predicted solvation free energies. The right chart shows the optimal predictions of the solvation free energies for the SAMPL1 test set.
Figure 6.7: The prediction results for the SAMPL2 blind test set. The left chart shows the RMSEs between the experimental and predicted solvation free energies. The right chart shows the optimal predictions of the solvation free energies for the SAMPL2 test set.
Figure 6.8: The prediction results for the SAMPL3 blind test set. The left chart shows the RMSEs between the experimental and predicted solvation free energies. The right chart shows the optimal predictions of the solvation free energies for the SAMPL3 test set.
Figure 6.9: The prediction results for the SAMPL4 blind test set. The left chart shows the RMSEs between the experimental and predicted solvation free energies. The right chart shows the optimal predictions of the solvation free energies for the SAMPL4 test set.
Figure 7.1: The plot of solvation free energies of the central molecule and its neighbor molecules: the left chart is for the nearest neighbor, the right chart for the second nearest neighbor. In both cases, the horizontal axis represents the solvation free energy of the central molecule, and the vertical axis stands for that of the nearest neighbor molecule.
Figure 7.2: Localization of nearest neighbor molecules. The horizontal axis stands for the index of a target molecule and the vertical axis denotes the index of the nearest neighbor of the target molecule. Each block contains a query of molecules.
Figure 7.3: Correlations between features and experimental solvation free energies. The horizontal axes represent the experimental solvation free energies. From left to right, the vertical axes of the three charts represent total reaction energies, the absolute value of the mean reaction energies of all atoms, and the absolute value of the total reaction energy of hydrogen atoms, respectively.
Figure 7.4: Correlations between features and experimental solvation free energies. The horizontal axes represent the experimental solvation free energies. From left to right and top to bottom, the vertical axes of the four figures represent solvent excluded surface area, volume enclosed by the solvent excluded surface, and the van der Waals interaction of H and O atoms, respectively.
Figure 7.5: Illustration of prediction RMSEs obtained with different molecular parametrizations by the LTR and HPK models for the SAMPL0 test set.
Figure 7.6:
Illustration of prediction RMSEs obtained with different molecular parametrizations by the LTR and HPK models for the SAMPL1 test set.
Figure 7.7: Illustration of prediction RMSEs obtained with different molecular parametrizations by the LTR and HPK models for the SAMPL2 test set.
Figure 7.8: Illustration of prediction RMSEs obtained with different molecular parametrizations by the LTR and HPK models for the SAMPL3 test set.
Figure 7.9: Illustration of prediction RMSEs obtained with different molecular parametrizations by the LTR and HPK models for the SAMPL4 test set.
Figure 8.1: The prediction RMSE vs the cutoff distance.
Figure 8.2: Five-fold cross validation on the validation set (N=1322). Left chart: correlation between experimental binding affinities and FFT-BP predictions. Right chart: RMSEs for five groups. Here, the RMSEs are 1.55, 1.58, 1.55, 1.56, and 1.59 kcal/mol for the five groups, respectively. The overall Pearson correlation to the experimental binding affinities is 0.80.
Figure 8.3: Five-fold cross validation on the training set (3589 complexes). Left chart: correlation between experimental binding affinities and FFT-BP predictions. Right chart: RMSEs for five groups. Here, the RMSEs are 2.01, 1.96, 1.97, 1.98, and 2.00 kcal/mol for the five groups, respectively. The overall Pearson correlation to the experimental affinities is 0.70.
Figure 8.4: The correlation between experimental binding free energies and FFT-BP predictions on the benchmark test set (N=100), with an RMSE of 1.99 kcal/mol and a Pearson correlation of 0.75.
Figure 8.5: Performance comparison between different scoring functions on the benchmark test set (N=100). The binding affinity comparison was done for FFT-BP and 19 well-known scoring functions, namely LISA, KECSA, LISA+ [288,287], ITScore/SE [117], ITScore [116], X-Score [264], DFIRE [282], DrugScoreCSD [244], DrugScorePDB [96], Cerius2/PLP [88], SYBYL/G-Score [130], SYBYL/D-Score [171], SYBYL [74], Cerius2/PMF [180], DOCK/FF [171], Cerius2/LUDI [23], Cerius2 [1], SYBYL/F-Score [201], and AutoDock [179].
Figure 8.6: The correlation between experimental binding free energies and FFT-BP predictions on the PDBBind core 2007 set (N=195). The RMSE and Pearson correlation coefficient are 2.08 kcal/mol and 0.76, respectively.
Figure 8.7: Performance comparison between different scoring functions on the PDBBind v2007 core set (N=195). The performances of the other scoring functions are adopted from the literature [151,9,150,8,48].
Figure 8.8: The correlation between experimental binding free energies and FFT-BP predictions on the PDBBind v2015 core set (N=195). The RMSE and Pearson correlation coefficient are 1.92 kcal/mol and 0.78, respectively.

Chapter 1

Introduction

In numerous physical, chemical, and biological applications, we encounter the solvation problem, in which solvation effects play a crucial role in the whole process. These applications include chemical reactions, ion channel permeation, protein ligand binding, electron transfer, signal transduction, DNA specification, transcription, post-transcription modification, gene expression, protein synthesis, etc. [228,60,204,140,62,266]. There is a long history of efforts to study solvation effects. Many successful models have been developed for addressing solvation effects in different applications. Naturally, based on the physical modeling of the solvent, these models are broadly classified into three categories. Molecular mechanics modeling of the solvent yields the explicit solvent models [36,69,188]; a statistical mechanics description leads to the integral equation solvent models [91,119,113,67,110,203]; a continuum mechanics description gives the implicit solvent models [233,232,199,221,94,296,219]. Lately, more and more attention has been drawn to the study of solvation, especially in the biological and pharmaceutical sciences.

From the application point of view, one of our ultimate goals is to have a better understanding of solvation problems and the related issues, e.g., the protein ligand binding phenomena. More importantly, the models and numerical methods developed should agree with experiments; that is, the methodologies developed should be capable of providing accurate predictions of molecular solvation free energy and protein ligand binding free energy. This in turn can give some guidance to experiments and is applicable to industry. For these reasons, there are two major requirements on the models and numerical methods. First, the solvation model should not be heavily dependent on the force field parametrization [37,257]. Second, the numerical method should be accurate enough to capture the physically meaningful results provided by the models. For instance, in Cartesian mesh based numerical methods, the grid size should not affect the results too much [107,24]. These two themes, developing force field parametrization independent models and grid size independent numerical schemes, are carried throughout this dissertation.

In chapter 2, we review the three classical families of solvation models. The pros and cons of different solvation models are briefly discussed. We provide a relatively detailed discussion and mathematical description of the implicit solvation and integral equation based solvation models, both of which are more mathematically interesting from my point of view. The solvation models presented in this chapter form the outline of the whole dissertation, and in the later chapters we handle some detailed issues that appear in the solvation models.

In chapter 3, we present the recently proposed differential geometry based implicit solvation model. This model couples the polar and nonpolar solvation effects in a self-consistent manner. The total variation theory is employed for describing the solvent solute interface. In this model, the solvent solute interface and solute electrostatics are optimized simultaneously. It is different from the classical solvation models, which model the molecular surface separately. We provide a systematic parameter optimization strategy in this dissertation.

Chapter 4 presents the Eulerian solvent excluded surface (ESES), which is the solvent excluded surface embedded in the Cartesian mesh, designed for finite difference based numerical methods. Compared to the existing solvent excluded surface software, our surface is totally independent of density and analytical without any approximation. The molecular surface area, volume, and molecular electrostatic analysis indicate that the state-of-the-art software MSMS [212] converges to our ESES software both qualitatively and quantitatively. Besides the surface generation, we also study the biomolecular topological structures in this chapter via homology theory, which shows great success for understanding the topological structures of the molecules. The level set method in combination with persistent homology theory is used for studying the persistence of the topological features associated with the ESES. This is the
joint work with Beibei Liu et al.

In chapter 5, we present the numerical method for the coarse grid Poisson Boltzmann solver. Compared to all the existing Poisson Boltzmann software, we provide the most accurate and grid spacing independent reaction field energy calculation, thus meeting the basic theme of this dissertation. The solver is constructed from the following four components: (1) the solvated molecular conformation modeling; (2) the treatment of the singular charges arising from the solute molecular parametrization; (3) the treatment of the complex interface geometry in the discretization of the Poisson Boltzmann equation; and (4) the evaluation of the reaction field energy, in which the coarse grid evaluation of the electrostatic potential at atomic centers is addressed. A large number of tests, ranging from analytical tests to more than one thousand biomolecular tests, indicate that our software gives less than 0.4% error in the electrostatic calculation for solvation studies, even at a grid size of 1.1 Å. A further study of the electrostatic binding free energy with our software demonstrates its accuracy for studying protein ligand binding. This work is provided as a free online server for the electrostatic analysis of small molecules and biomolecules.

The blind solvation free energy prediction problem is considered in both chapters 6 and 7. In chapter 6, we propose a hybrid physical and statistical model for solvation free energy prediction, in which the solvation free energy is modeled as the sum of two isolated components, the polar and nonpolar energies. The Poisson model and its polarizable version are utilized for modeling the polar solvation free energy. A statistical model is adopted for the nonpolar solvation free energy, in which we assume that molecules of the same class admit the same set of parameters in the nonpolar solvation free energy function.

Motivated by the work in chapter 6, we propose in chapter 7 an approach to solvation free energy prediction that is less sensitive to the force field parametrization: learning to rank based solvation prediction. Instead of being decomposed, the solvation free energy itself is regarded as a single entity. The basic assumption now is that similar molecules have close solvation free energies, and that the solvation free energy is a function of the molecular descriptors. To this end, we employ the learning to rank method to find the neighbor molecules of the target molecule, then utilize the neighbor information to train a linear function, and finally predict the solvation free energy of the target molecule.

In chapter 8, we extend the learning to rank based solvation prediction to protein ligand binding free energy prediction. We propose a learning to rank based scoring function for accurate protein ligand binding affinity prediction. Our scoring function can be regarded as a hybrid force field and knowledge method. A large number of numerical results demonstrate the accuracy of the proposed scoring approach.

Finally, we summarize the contributions of this dissertation in chapter 9.

Chapter 2 Solvation Models

Solvent models are a variety of methods to account for the behavior of solvated condensed phases, which allow the simulation of chemical reactions and biological processes that take place in the solvated phases [222]. Such solvation incorporated simulations allow better predictions and improved understanding of the physical processes. The various solvation models can be generally classified based on the physical description of the solvent molecules: explicit solvent models model the solvent molecules at the atomic level [36, 69, 188]; implicit solvent models model the solvent simply as a dielectric continuum [233, 232, 199, 221, 94, 296, 219]; the integral equation based solvation models model the solvent distribution based on statistical mechanics theory [91, 119, 113, 67, 110, 203].

Implicit models are generally computationally efficient and can provide a reasonable description of the solvent behavior, but fail to account for the local fluctuations in solvent density around a solute molecule. The density fluctuation behavior is due to solvent ordering around a solute and is particularly prevalent when water is considered as the solvent. Explicit models are often less computationally economical, but can provide a physical, spatially resolved description of the solvent. However, many of these explicit models are computationally demanding and can fail to reproduce some experimental results, often due to certain fitting methods and parametrization. Integral equation based solvation models mediate the pros and cons of both implicit and explicit solvent models. In this chapter we provide a brief review of the different levels of solvation models, with emphasis on the implicit solvent models and integral equation based solvation models.

2.1 Explicit Solvent Models

Explicit solvent models treat the solvent molecules explicitly, i.e., the coordinates, and usually at least some of the molecular degrees of freedom, of the solvent are included in the model. This model provides the most realistic modeling of the solvent solute interaction among the different levels of solvation models, especially when the long range electrostatic interactions are dealt with by the Ewald summation or the Fast Multipole Method (FMM). These models generally occur in the application of molecular mechanics (MM), molecular dynamics (MD), and Monte Carlo (MC) simulations. These simulations often employ molecular mechanics force fields, which are generally empirical; the force fields are usually parameterized based on a higher level theory or experimental data [55, 125]. The explicit solvent model gives the most detailed description of the solvent, and in turn it is extremely computationally expensive.

2.2 Implicit Solvent Models

2.2.1 Introduction

Water has many chemically and biologically necessary properties, one of which is that it is a dielectric. As a dielectric, water screens (lessens) electrostatic interactions between charged particles. Water can therefore be crudely modeled as a dielectric continuum. In this manner, the electrostatic forces of a biological system can be expressed as a system of differential equations which can be solved for the electric field caused by a collection of charges.

Implicit solvent models are an important class of solvation models, called implicit because of the continuum description of the solvent. This class is generally believed to be the best compromise between accuracy and efficiency.

2.2.2 Poisson Boltzmann (PB) Model

The Poisson Boltzmann equation (PBE), which can be formulated as

$$-\nabla\cdot\left(\epsilon(\mathbf{r})\nabla\phi(\mathbf{r})\right) = \rho_m(\mathbf{r}) + \sum_i c_i q_i \exp\left(-\frac{q_i\phi(\mathbf{r})}{k_B T}\right), \tag{2.2.1}$$

is a nonlinear equation which is solved for the electrostatic potential $\phi(\mathbf{r})$, based on the position dependent permittivity function $\epsilon(\mathbf{r})$, the solute charge distribution $\rho_m(\mathbf{r})$, and the bulk concentration $c_i$ of the ion species with charge $q_i$. This equation exactly solves for the electrostatic field of a charge distribution in a dielectric. Mathematically speaking, the PBE is an elliptic interface problem with complex interface geometry and singular sources. The complex interface comes from the complex shape of molecules in the solvent, and the singular source is due to the solute charge distribution [76].

The PB model is applied in many different scientific fields. It is known in electrochemistry as Gouy-Chapman theory; in solution chemistry as Debye-Hückel theory; and in colloid chemistry as Derjaguin-Landau-Verwey-Overbeek (DLVO) theory. Only minor modifications are necessary to apply the PBE to various interfacial models, making it a highly useful tool in determining the electrostatic potential at surfaces [29].

2.2.3 Generalized Born (GB) Model

Even though the PB model calculates the electrostatic field in the solvent medium exactly, it is computationally very expensive. The Generalized Born (GB) equation provides an approximation of the PBE and a fast calculation of the electrostatic field in the solvent environment. The GB model treats atoms as charged spheres whose internal dielectric is lower than that of the environment. The screening which each atom, $i$, experiences is determined by the local environment: the more atom $i$ is surrounded by other atoms, the less its electrostatics will be screened, since it is more surrounded by low dielectric; this property is called one atom descreening another. Different GB models calculate atomic descreening by different approaches. Descreening is used to calculate the Born radius, $\alpha_i$, of each atom. The Born radius of an atom measures the degree of descreening. A large Born radius represents small screening (strong electric field), as if the atom were in vacuum. A small Born radius represents large screening (weak electric field), as if the atom were in bulk water. We give a short review of the basic ideas behind the GB model; more detailed theory can be found in [268, 191].

2.2.3.1 Generalized Born Equations

In a GB simulation, the total electrostatic force on an atom $i$ is the difference between the net Coulombic force and the GB force on atom $i$, where the GB force is contributed by the nearby atoms:

$$\mathbf{F}_i = \mathbf{F}_i^{\mathrm{Coulomb}} - \mathbf{F}_i^{\mathrm{GB}},$$

where the electrostatic forces are contributed by other nearby atoms within a cutoff. The GB force on atom $i$ is the derivative of the total GB energy with respect to the relative atom distances $r_{ij}$, i.e.,

$$\mathbf{F}_i^{\mathrm{GB}} = -\sum_j\left[\frac{dE_T^{\mathrm{GB}}}{dr_{ij}}\right]\hat{\mathbf{r}}_{ij} \tag{2.2.2}$$
$$= -\sum_j\left[\sum_k\frac{\partial E_T^{\mathrm{GB}}}{\partial\alpha_k}\frac{d\alpha_k}{dr_{ij}} + \frac{\partial E_{ij}^{\mathrm{GB}}}{\partial r_{ij}}\right]\hat{\mathbf{r}}_{ij}$$
$$= -\sum_j\left[\frac{\partial E_T^{\mathrm{GB}}}{\partial\alpha_i}\frac{d\alpha_i}{dr_{ij}} + \frac{\partial E_T^{\mathrm{GB}}}{\partial\alpha_j}\frac{d\alpha_j}{dr_{ij}} + \frac{\partial E_{ij}^{\mathrm{GB}}}{\partial r_{ij}}\right]\hat{\mathbf{r}}_{ij},$$

where the partial derivatives are included since the Born radius, $\alpha$, is a function of all relative atom distances. The total GB energy of the system is

$$E_T^{\mathrm{GB}} = \sum_i\sum_{j>i}E_{ij}^{\mathrm{GB}} + \sum_i E_{ii}^{\mathrm{GB}}, \tag{2.2.3}$$

where $E_{ii}^{\mathrm{GB}}$ is the Born radius dependent self energy of atom $i$, and the GB energy between atoms $i$ and $j$ is given by

$$E_{ij}^{\mathrm{GB}} = -k_e D_{ij}\frac{q_i q_j}{f_{ij}};$$

the dielectric term $D_{ij}$ is given by

$$D_{ij} = \frac{1}{\epsilon_m} - \frac{\exp(-\kappa f_{ij})}{\epsilon_s};$$

and the GB function is given by

$$f_{ij} = \sqrt{r_{ij}^2 + \alpha_i\alpha_j\exp\left(-\frac{r_{ij}^2}{4\alpha_i\alpha_j}\right)}.$$

The constants referred to in the above equations are listed below:

$k_e = 332.063711$ kcal Å/e$^2$ is the Coulomb constant.
$\epsilon_s$ is the dielectric constant of the solvent.
$\epsilon_m$ is the dielectric constant of the solute.
$\epsilon_0$ is the dielectric constant of the vacuum.
$\kappa$ is the inverse Debye screening length; the screening length is calculated from the ion concentration $c$ based on $\kappa^{-1} = \sqrt{\epsilon_0\epsilon_p kT/(2N_A e^2 c)}$.

2.2.4 Polarizable Continuum Model

The Polarizable Continuum Model (PCM) is another class of implicit solvent models. The key components of this class of models are the ab initio charge calculation through different levels of quantum mechanics theories, and the incorporation of the solvent solute polarization through a self-consistent coupling of the electron density and electrostatic field equations [238, 239, 71].

Consider the solute molecule $M$ in the solvent. In the electrostatic solvent-solute interaction, the charge distribution $\rho_M$ of the solute inside the cavity polarizes the dielectric continuum, which in turn polarizes the solute charge distribution. The PCM model nests the classical electrostatic problem into a quantum mechanical framework to study polarization effects. Let $H_M$ be the Hamiltonian of the solute $M$ in solvent, which depends on the coordinates of the $N_{el}$ electrons, $q = \{q_1, q_2, \cdots, q_{N_{el}}\}$, and on the coordinates of the $N_{nuc}$ nuclei, $Q = \{Q_1, Q_2, \cdots, Q_{N_{nuc}}\}$, and let $H_M^0$ be the corresponding Hamiltonian with the Born-Oppenheimer approximation in vacuum, which has the same dependence. In the PCM model, we have

$$H_M(q;Q) = H_M^0(q;Q) + V_{int}, \tag{2.2.4}$$

where $V_{int}$ is the solute solvent interaction potential. The related Schrödinger equation is

$$H_M(q;Q)\Psi^f(q;Q) = E^f(Q)\Psi^f(q;Q), \tag{2.2.5}$$

where the superscript $f$ indicates that the solution was obtained iteratively. All the relevant information about the solvent effect on the solute $M$ is contained in the eigenvalue $E^f$ and in the wave function $\Psi^f$. The charge of the solute molecule is the sum of a discrete nuclear charge distribution and the electron density function,

$$\rho_M(q;Q) = \rho_{nuc}(q;Q) + \rho_{el}(q;Q), \tag{2.2.6}$$

where the nuclear and electron charges are given, respectively, by

$$\rho_{nuc}(q;Q) = \sum_\alpha Z_\alpha\delta(\mathbf{r} - Q_\alpha), \tag{2.2.7}$$
$$\rho_{el}(q;Q) = -\int|\Psi^f(q;Q)|^2\,dq_1\cdots dq_{N_{el}}, \tag{2.2.8}$$

where $Z_\alpha$ is a nuclear charge and the index $\alpha$ runs over all the nuclei of $M$.

In Eq. (2.2.4), the interaction between the solute and solvent, $V_{int}$, which depends on $q$, $Q$, and a thermally averaged distribution function of the solvent molecules, $g_S$, is given by

$$V_{int} = V_{int}(q;Q;g_S). \tag{2.2.9}$$

In the basic continuum model, the interaction term is reduced to its classical electrostatic component,

$$V_{int} := V_\sigma(q;Q;\rho_M) = \sum_\alpha Z_\alpha\Phi_\sigma(Q_\alpha) - \sum_i\Phi_\sigma(q_i), \tag{2.2.10}$$

where $\Phi_\sigma(\mathbf{r})$ is the value of the electrostatic potential generated by the polarized dielectric at the position $\mathbf{r}$. The solvent solute interaction contribution to the total energy $E^f$ is given by

$$W_{MS} = \int\Psi^{f*}V_\sigma\Psi^f\,dq_1\cdots dq_{N_{el}} = \int\rho_M(\mathbf{r})\Phi_\sigma(\mathbf{r})\,d^3\mathbf{r}, \tag{2.2.11}$$

where the integration is taken over the whole solute solvent space.

2.2.4.1 Quantum Mechanical Problem

To obtain the nuclear and electron charge distributions in the PCM framework, we should solve the following Schrödinger equation with the inclusion of the solute solvent interaction,

$$H\Psi = E\Psi, \tag{2.2.12}$$

where the effective Hamiltonian $H$ is given by

$$H = H_M^0 + V_{int}. \tag{2.2.13}$$

2.2.4.2 Electrostatic Problem

The solute-solvent interaction is obtained by solving the classical electrostatic problem. Consider the Poisson equation

$$-\nabla\cdot\left(\epsilon(\mathbf{r})\nabla\Phi(\mathbf{r})\right) = 4\pi\rho_M(\mathbf{r}), \tag{2.2.14}$$

where the permittivity function is given by

$$\epsilon(\mathbf{r}) = \begin{cases} 1, & \mathbf{r}\in\Omega_m \\ \epsilon, & \mathbf{r}\in\Omega_s \end{cases} \tag{2.2.15}$$

where $\Omega_s$ and $\Omega_m$ represent the solvent and solute domains, respectively, and $\epsilon$ is the dielectric constant of the solvent medium. Due to the solvent effect, the electrostatic potential can be decomposed as

$$\Phi(\mathbf{r}) = \Phi_M(\mathbf{r}) + \Phi_\sigma(\mathbf{r}), \tag{2.2.16}$$

where $\Phi_M(\mathbf{r})$ is the electrostatic potential generated by the charge distribution $\rho_M$, which has an analytical expression as the convolution of the Green's function with the charge distribution $\rho_M$, and $\Phi_\sigma(\mathbf{r})$ is the reaction potential generated by the polarization of the dielectric medium.

The Poisson equation (2.2.14) is subject to the following far field boundary conditions,

$$\lim_{r\to\infty} r\Phi(\mathbf{r}) = \alpha, \qquad \lim_{r\to\infty} r^2\nabla\Phi(\mathbf{r}) = \beta, \tag{2.2.17}$$

with finite values for $\alpha$ and $\beta$. At the solute solvent contact surface, the following interface conditions should be added to Eq. (2.2.14):

$$[\Phi] = \Phi_{in} - \Phi_{out} = 0, \qquad [\epsilon\nabla\Phi\cdot\mathbf{n}] = \epsilon(\mathbf{r})\left.\frac{\partial\Phi}{\partial n}\right|_{in} - \epsilon(\mathbf{r})\left.\frac{\partial\Phi}{\partial n}\right|_{out} = 0, \tag{2.2.18}$$

where $\mathbf{n}$ is the normal direction pointing outward from the solute domain, and the subscripts "in" and "out" denote the limits taken from inside and outside the solute domain, respectively.

2.2.4.3 Dielectric Polarizable Continuum Model (DPCM)

There are various versions of the PCM, for instance, Dielectric PCM (DPCM) [238], Conductor-like PCM (C-PCM) [54], Integral Equation Formalism PCM (IEFPCM) [71], wavelet PCM [242], etc. It is very difficult to review all of these PCM models here. In this part, we specifically give a short review of the DPCM model. The basic idea is the Apparent Surface Charge (ASC) approach, which is similar to the induced surface charge approach for solving the Poisson Boltzmann equation [154, 94, 209].

For systems composed of regions of constant isotropic permittivity, the polarization vector is given by the gradient of the total potential,

$$\mathbf{P}_i(\mathbf{r}) = -\frac{\epsilon_i - 1}{4\pi}\nabla\Phi(\mathbf{r}), \tag{2.2.19}$$

where $\epsilon_i$ is the dielectric constant of region $i$. Here the solvent region has dielectric constant $\epsilon$, and we assume the dielectric constant of the solute cavity to be 1. At the boundary of two regions $i$ and $j$, there is an apparent surface charge distribution given by

$$\sigma_{ij} = -(\mathbf{P}_j - \mathbf{P}_i)\cdot\mathbf{n}_{ij}, \tag{2.2.20}$$

where $\mathbf{n}_{ij}$ is the unit vector at the boundary surface pointing from medium $i$ to medium $j$. Thus the surface charge for the PCM model is given by

$$\sigma = -\mathbf{P}\cdot\mathbf{n} = \frac{\epsilon - 1}{4\pi}\nabla\Phi_{out}\cdot\mathbf{n} = \frac{\epsilon - 1}{4\pi\epsilon}\nabla\Phi_{in}\cdot\mathbf{n}, \tag{2.2.21}$$

where the last equality follows from the interface conditions (2.2.18). Since $\Phi = \Phi_M + \Phi_\sigma$, the surface charge can be further expressed as

$$\sigma = \frac{\epsilon - 1}{4\pi\epsilon}\frac{\partial(\Phi_{M,in} + \Phi_{\sigma,in})}{\partial n};$$

therefore, the reaction potential due to the solvent polarization can be expressed as

$$\Phi_\sigma(\mathbf{r}) = \int_\Gamma\frac{\sigma(\mathbf{s})}{|\mathbf{r} - \mathbf{s}|}\,d^2\mathbf{s}, \tag{2.2.22}$$

where $\Gamma$ is the surface of the solute molecule.

2.3 Integral Equation Based Solvation Models

Compared to the explicit solvent theory, the continuum model simplifies the description of the solvent and reduces the complexity of the solvation model dramatically. Such implicit solvent models are often useful, but have a variety of limitations: they drastically average the response of water dipoles and ions to the fields created by solutes, miss most effects of atomic and molecular sizes, collapse all ion effects into a single ionic strength parameter, and fail to account for non-electrostatic aspects of solvation. In many situations these mean field approximations may be too severe [222].

There is a long-studied alternative approach to understanding the equilibrium properties of water and ions, based on the integral equation approach of Ornstein and Zernike. These ideas were originally applied to atomic liquids, and have been extended to molecular solvents such as water by a variety of methods, most notably via the Reference Interaction Site Model (RISM) [91, 119, 113, 67, 110, 203].

2.3.1 Ornstein-Zernike Equation

In homogeneous fluids the spatial number density distribution, $\rho(\mathbf{r})$, is uniform and provides little information. In contrast, the number density of a particle, 2, relative to a particle, 1, contains a wealth of information and, in the grand canonical ensemble, is given by

$$\rho^{(2)}(\mathbf{1},\mathbf{2}) = \frac{1}{\Xi}\sum_{N=2}^{\infty}\frac{Z_N}{(N-2)!}\int\exp[-\beta V_N]\,d\mathbf{r}^{N-2}, \tag{2.3.1}$$

where the position and orientation of molecular species are denoted by boldface numbers, e.g., $\mathbf{1} := (\mathbf{r}_1, \Omega_1)$, $\beta = 1/(k_B T)$ with $k_B$ the Boltzmann constant and $T$ the absolute temperature, $\Xi$ is the grand partition function, and $Z_N$ and $V_N$ are the $N$-particle partition function and potential.

For homogeneous fluids the 2-particle density distribution is related to the Pair Distribution Function (PDF), $g(\mathbf{1},\mathbf{2})$, through the equation

$$\rho^{(2)}(\mathbf{1},\mathbf{2}) = \rho_1\rho_2 g(\mathbf{1},\mathbf{2}), \tag{2.3.2}$$

where $\rho_1$ and $\rho_2$ are the bulk number densities of particles 1 and 2, respectively. When the orientational dependence is averaged out, $g(r)$ is known as the Radial Distribution Function (RDF). Alternatively, the PDF is also related to the potential of mean force, $w(\mathbf{1},\mathbf{2})$:

$$h(\mathbf{1},\mathbf{2}) + 1 = g(\mathbf{1},\mathbf{2}) = \exp[-\beta w(\mathbf{1},\mathbf{2})], \tag{2.3.3}$$

where $h(\mathbf{1},\mathbf{2})$ is the Total Correlation Function (TCF).

For homogeneous, multi-component, molecular liquids, the OZ integral equation is given by

$$h_{ij}(\mathbf{1},\mathbf{2}) = c_{ij}(\mathbf{1},\mathbf{2}) + \sum_k\rho_k\int c_{ik}(\mathbf{1},\mathbf{3})h_{kj}(\mathbf{3},\mathbf{2})\,d\mathbf{3}, \tag{2.3.4}$$

where we denote the molecular species by $i$, $j$ and $k$, $c(\mathbf{1},\mathbf{2})$ is the Direct Correlation Function (DCF), and the integration is performed over all space. Physically, we can interpret the TCF as the sum of contributions from the direct interaction of the two particles (the DCF) plus the interactions mediated by the surrounding particles (the convolution on the right hand side). As both $h$ and $c$ are unknown functions, a second, closure equation is required to find a solution.

2.3.2 Closures

The most general case for the closure equation is

$$g = \exp\{-\beta u + h - c + b\}, \tag{2.3.5}$$

where $u$ is the pair interaction potential, $b$ is the bridge function, and we have dropped the function arguments for brevity and generality.

In the Hyper-Netted Chain (HNC) approximation, the bridge function is set to zero, which yields the following HNC closure:

$$g = \exp\{-\beta u + h - c\}. \tag{2.3.6}$$

This has been found to produce very good results for ionic and polar systems. It also has an exact, closed form expression for the chemical potential when coupled with RISM theory. However, it does have drawbacks, including thermodynamic inconsistencies, poor results for neutral systems, difficulties with particle size asymmetries, and difficulty converging solutions [216].

To address the issue of convergence, Kovalenko and Hirata developed a partially linearized closure. Regions of enhanced density were linearized, avoiding the exponential density response for strong potential interactions. This linearization was later generalized to a Taylor series,

$$g = \begin{cases}\exp\{t^*\}, & \text{for } t^* < 0 \\ \sum_{i=0}^{n}\dfrac{(t^*)^i}{i!}, & \text{for } t^*\ge 0,\end{cases} \tag{2.3.7}$$

where $t^* = -\beta u + h - c$.

2.3.3 1D-RISM

Most modern biomolecular force fields use interaction site models in which a molecule is composed of a number of sites, typically atoms, that interact in a pair-wise fashion. Such models offer a very efficient way to deal with nonspherical molecules but require a practical method to apply the model in Eq. (2.3.4) to molecular species with multiple sites.

One approach (which, in practice, is restricted to molecules with a small number of sites) is to treat the molecules as rigid bodies and to orientationally average the site-site correlation functions for each site, reducing the equations to one dimension, i.e., orientational averaging is done about each site rather than, for example, averaging about the molecular center-of-mass. In the RISM approach this is achieved by treating the DCF as decomposable into the sum of site-site direct correlation functions,

$$c(\mathbf{1},\mathbf{2}) = \sum_{\alpha\in 1}\sum_{\gamma\in 2}c_{\alpha\gamma}(|\mathbf{r}_\alpha - \mathbf{r}_\gamma|), \tag{2.3.8}$$

where $c(\mathbf{1},\mathbf{2})$ is the DCF between molecules 1 and 2, and $\alpha$ and $\gamma$ are interaction sites on molecules 1 and 2, respectively.

Molecules are assumed to be rigid, and their shape enters the theory through the intramolecular correlation matrix, represented in Fourier space,

$$\hat{\omega}_{\alpha\gamma}(k) = \delta_{\alpha\gamma} + (1 - \delta_{\alpha\gamma})\frac{\sin(kl_{\alpha\gamma})}{kl_{\alpha\gamma}}, \tag{2.3.9}$$

where $\delta_{\alpha\gamma}$ is the Kronecker $\delta$-function and $l_{\alpha\gamma}$ is the distance between sites in the same type of molecule. For the same site, $\alpha = \gamma$, $l_{\alpha\gamma} = 0$ and $\hat{\omega}_{\alpha\gamma}(k) = 1$, while for sites belonging to different types of molecule $\hat{\omega}_{\alpha\gamma}(k) = 0$.

With the definition of the intramolecular correlation function, we can now express the molecular OZ integral equation (2.3.4) in terms of interaction sites rather than molecules. The multi-component 1D-RISM equation can be written explicitly for molecules 1 and 2 as

$$\rho_\alpha h_{\alpha\gamma}(r)\rho_\gamma = \sum_{\mu}^{N_{site}}\sum_{\nu}^{N_{site}}\omega_{\alpha\mu}(r)*c_{\mu\nu}(r)*\omega_{\nu\gamma}(r) + \sum_{\mu}^{N_{site}}\sum_{\nu}^{N_{site}}\omega_{\alpha\mu}(r)*c_{\mu\nu}(r)*\rho_\nu h_{\nu\gamma}(r)\rho_\gamma,$$

where $*$ is the convolution operator and $N_{site}$ is the total number of sites from all molecular species. The above equation can be written concisely in the following compact matrix form,

$$\rho h\rho = \omega * c * \omega + \omega * c * \rho h\rho, \tag{2.3.10}$$

with $\rho$ being a diagonal matrix of scalar values, $\omega$ and $c$ matrices of radially dependent functions, and all matrices of size $N_{site}\times N_{site}$.

2.3.4 3D-RISM

For macromolecular ions, which are composed of more than a few sites, the approximation of
spherically symmetric distribution functions begins to break down. One approach is to use a full 3D description of the macromolecular solute, U, while using orientationally averaged distributions for the solvent, V. If the solute is at infinite dilution, Eq. (2.3.4) can be rewritten as

$$h_{ij}^{VV}(\mathbf{i},\mathbf{j}) = c_{ij}^{VV}(\mathbf{i},\mathbf{j}) + \sum_k\rho_k^V\int c_{ik}^{VV}(\mathbf{i},\mathbf{k})h_{kj}^{VV}(\mathbf{k},\mathbf{j})\,d\mathbf{k}, \tag{2.3.11}$$
$$h_i^{UV}(\mathbf{1},\mathbf{i}) = c_i^{UV}(\mathbf{1},\mathbf{i}) + \sum_k\rho_k^V\int c_k^{UV}(\mathbf{1},\mathbf{k})h_{ki}^{VV}(\mathbf{k},\mathbf{i})\,d\mathbf{k}. \tag{2.3.12}$$

In the 3D-RISM model, Eq. (2.3.11) gives the TCF of the bulk solvent, which is then used in Eq. (2.3.12) to obtain the distribution of the solvent about the solute.

2.4 Conclusion

In this chapter, we provide a short review of the three classes of solvation models. The remainder of this dissertation focuses on solving some problems that arise from the above solvation models and on enriching the family of solvation models. The application of the continuum solvation models to the study of solvation and binding phenomena is also a main theme of this dissertation.

Chapter 3 Self Consistent Coupling of Polar and Nonpolar Solvation Free Energy

The general idea of implicit solvent models is to treat the solvent as a dielectric continuum and describe the solute with atomistic detail [62, 219, 112, 211, 126]. The total solvation free energy is decomposed into nonpolar and polar parts. There is a wide variety of ways to carry out this decomposition. For example, nonpolar energy contributions can be modeled in two stages: the work of displacing solvent when adding a rigid solute to the solvent, and the dispersive nonpolar interactions between the solute atoms and the surrounding solvent. The polar part is due to the electrostatic interactions and can be approximated by Generalized Born (GB) [64, 11, 241, 191, 84, 296, 138, 236, 178, 38, 99], Polarizable Continuum Model (PCM) [237] and Poisson-Boltzmann (PB) models [145, 76, 219, 62, 293, 6, 295]. Among them, GB models are heuristic approaches to polar solvation energy analysis. PCMs resort to quantum mechanical calculations of induced solute charges. PB methods can be formally derived from the Maxwell equations and statistical mechanics for electrolyte solutions [19, 182, 111] and therefore offer the promise of handling large biomolecules with sufficient accuracy and robustness [61, 190, 11].

Conceptually, the separation between the continuum solvent and the discrete (atomistic) solute introduces an interface definition. This may take the form of analytical functions [100, 98, 99] or nonsmooth boundaries dividing the solute-solvent domains. The van der Waals surface, the solvent accessible surface [147], and the Molecular Surface (MS) [206] are devised for this purpose and have found success in biophysical calculations [225, 161, 58, 142, 20, 68, 120, 155]. It has been noticed that the performance of implicit solvent models is very sensitive to the interface definition [65, 66, 186, 231]. This comes as no surprise because many of these popular interface definitions are ad hoc divisions of the solute and solvent domains based on rigid molecular geometry, neglecting solute-solvent energetic interactions. Additionally, geometric singularities [51, 212] associated with these surface definitions incur enormous computational instability [295, 278, 279] and lead to conceptual difficulty in interpreting the sharp interface [45].

The Differential Geometry (DG) theory of surfaces [275] and associated geometric Partial Differential Equations (PDEs) provide a natural description of the solvent-solute interface. In 2005, Wei and his collaborators introduced curvature-controlled PDEs for generating molecular surfaces for solvation analysis [271]. The variational solvent-solute interface, namely, the Minimal Molecular Surface (MMS), was constructed in 2006 by Wei and coworkers based on the DG theory of surfaces [13, 14, 15]. MMSs are constructed by solving the mean curvature flow, or the Laplace-Beltrami flow, and have been applied to the calculation of electrostatic potentials and solvation free energies [46, 15]. This approach was generalized to potential-driven geometric flows, which admit physical interactions, for the surface generation of biomolecules in solution [12]. While our approaches were employed and/or modified by many others [47, 281, 284, 285] for molecular surface and solvation analysis, our geometric PDE [271] and variational surface models [13, 15, 12] are, to our knowledge, the first of their kind for solvent-solute interface and solvation modeling.

Since surface area minimization is equivalent to the minimization of surface free energies, due to a constant surface tension, this approach can be easily incorporated into the variational formulation of the PB theory [218, 92] to result in DG-based full solvation models [43, 269], following a similar approach by Dzubiella et al. [70, 291]. The DG-based solvation models have been implemented in the Eulerian formulation, where the solvent-solute interface is embedded in the three-dimensional (3D) Euclidean space and behaves like a smooth characteristic function [43]. The resulting interface and associated dielectric function vary smoothly from their values in the solute domain to those in the solvent domain and are computationally robust. An alternative implementation is the Lagrangian formulation [44], in which the solvent-solute boundary is extracted as a sharp surface at a given isovalue and subsequently used in the solvation analysis, including nonpolar and polar modeling.

One major advantage of the DG based solvation model is that it enables the synergistic coupling between the solute and solvent domains via the variation procedure. As a result, the DG based solvation model is able to significantly reduce the number of free parameters that users must fit or adjust in applications to real-world systems [235]. It has been demonstrated that physical parameters, i.e., pressure and surface tension obtained from experimental data, can be directly employed in the DG-based solvation models for accurate solvation energy prediction [59]. Another advantage of the DG based solvation model is that it avoids the use of ad hoc surface definitions, and its interfaces, particularly ones generated from the Eulerian formulation [43], are free of the troublesome geometric singularities that commonly occur in conventional solvent-accessible and solvent-excluded surfaces [52, 212]. As a result, the DG based solvation model bypasses the sophisticated interface techniques required for solving the PB equation [278, 279, 90]. In particular, the smooth solvent-solute interface obtained from the Eulerian formulation [43] can be directly interpreted as the physical solvent-solute boundary profile. Additionally, the resulting smooth dielectric boundary can also have a straightforward physical interpretation. A further advantage of the DG based solvation model is that it is natural and easy to incorporate Density Functional Theory (DFT) in its variational formulation. Consequently, it is able to reevaluate and reassign the solute charge induced by solvent polarization during the solvation process [45]. The resulting total energy optimization process recreates or resembles the solvent-solute interactions, i.e., polarization, dispersion, and polar and nonpolar coupling, in a realistic solvation process. Recently, the DG based solvation model has been extended to DG based multiscale models for non-equilibrium processes in biomolecular systems [269, 273, 272, 40, 41]. These models recover the DG based solvation model at equilibrium [273].

Recently, we have demonstrated [46] that the DG based nonpolar solvation model is able to outperform many other methods [82, 246, 202] in solvation energy predictions for a large number of nonpolar molecules. The Root Mean Square Error (RMSE) of our predictions was below 0.4 kcal/mol, which clearly indicates the potential power of the DG based solvation formulation. However, the DG based full solvation model has not shown a similar superiority in accuracy, although it works very well [43, 44]. Having so many of the aforementioned advantages, the DG based solvation models ought to outperform other methods with a similar level of approximation. One obstacle that hinders the performance of our DG based full solvation model is the numerical instability in solving two strongly coupled and highly nonlinear PDEs, namely, the Generalized Laplace-Beltrami (GLB) equation and the generalized PB (GPB) equation. To avoid such instability, a strong parameter constraint was applied to the nonpolar part in our earlier work [43, 44], which results in a reduction of our model accuracy.

The objective of the present work is to explore a better parameter optimization of the DG based solvation models. A pair of conditions is prescribed to ensure the physical solution of the GLB equation, which leads to the well-posedness of the GPB equation. Such well-posedness in turn renders the solution of the GLB equation stable. The stable solution of the coupled GLB and GPB equations enables us to optimize the model parameters and produce highly accurate predictions of solvation free energies. Some of the best results are obtained in the solvation free energy prediction of more than a hundred molecules of both polar and nonpolar types.

The rest of this chapter is organized as follows. To establish the notation and facilitate further development, we present a brief review of the DG based solvation models in Section 3.1. By using the variational principle, we derive the coupled GLB and GPB equations. Necessary boundary conditions and initial values are prescribed to make this coupled system well-posed. Section 3.2 is devoted to parameter learning algorithms. We develop a protocol to stabilize the iterative solution process of the coupled nonlinear PDEs. We introduce perturbation and convex optimization methods to ensure stability of the numerical solution of the GLB equation in coupling with the GPB equation. The newly achieved stability in solving the coupled PDEs leads to an appropriate optimization of solvation free energies with respect to our model parameters. In Section 3.3, we show that for more than a hundred compounds of various types, including both polar and nonpolar molecules, the present DG solvation model offers some of the most accurate solvation free energy predictions, with an overall RMSE of 0.5 kcal/mol.

3.1 The DG based solvation model

The free energy functional for the DG based full solvation model can be expressed as [270, 43, 44]

$$G[S,\Phi] = \int\left\{\gamma|\nabla S| + pS + (1-S)U + S\left[-\frac{\epsilon_m}{2}|\nabla\Phi|^2 + \Phi\rho_m\right] + (1-S)\left[-\frac{\epsilon_s}{2}|\nabla\Phi|^2 - k_BT\sum_\alpha\rho_{\alpha 0}\left(e^{-\frac{q_\alpha\Phi}{k_BT}} - 1\right)\right]\right\}d\mathbf{r}, \quad \mathbf{r}\in\mathbb{R}^3, \tag{3.1.1}$$

where $\gamma$ is the surface tension, $p$ is the hydrodynamic pressure difference between solvent and solute, and $U$ denotes the solvent-solute non-electrostatic interactions, represented by the semi-discrete and semi-continuum Lennard-Jones potentials in the present work. Here $0\le S\le 1$ is a hypersurface or simply surface function that characterizes the solute domain and embeds the 2D surface in $\mathbb{R}^3$, whereas $1-S$ characterizes the solvent domain [43]. One may consider $S$ as the position-dependent volume fraction of the solute. Additionally, $\Phi$ is the electrostatic potential and $\epsilon_s$ and $\epsilon_m$ are the dielectric constants of the solvent and solute, respectively. Here $k_B$ is the Boltzmann constant, $T$ is the temperature, $\rho_{\alpha 0}$ denotes the reference bulk concentration of the $\alpha$th solvent species, and $q_\alpha$ denotes the charge valence of the $\alpha$th solvent species, which is zero for an uncharged solvent component. We use $\rho_m$ to represent the charge density of the solute. The charge density is often modeled by a point charge approximation,

$$\rho_m = \sum_j^{N_m}Q_j\delta(\mathbf{r} - \mathbf{r}_j),$$

where $Q_j$ denotes the partial charge of the $j$th atom in the solute. Alternatively, the charge density computed from DFT, which changes during the iteration or energy optimization, can be directly employed as well [45].

In Eq. (3.1.1), the first three terms constitute the so-called nonpolar solvation free energy functional, while the last two terms form the polar one. After the variation with respect to $S$, we obtain an elliptic equation for the surface function $S$,

$$\nabla\cdot\left(\gamma\frac{\nabla S}{|\nabla S|}\right) + V = 0, \tag{3.1.2}$$

where the potential driven term is given by

$$V = -p + U + \frac{\epsilon_m}{2}|\nabla\Phi|^2 - \Phi\rho_m - \frac{\epsilon_s}{2}|\nabla\Phi|^2 - k_BT\sum_\alpha\rho_{\alpha 0}\left(e^{-\frac{q_\alpha\Phi}{k_BT}} - 1\right).$$

It is a standard procedure to seek the solution of Eq. (3.1.2) by converting it into a parabolic equation [12]. As such, we construct the following Generalized Laplace-Beltrami (GLB) equation [43, 44],

$$\frac{\partial S}{\partial t} = |\nabla S|\left[\nabla\cdot\left(\gamma\frac{\nabla S}{|\nabla S|}\right) + V\right]. \tag{3.1.3}$$

Here we utilize the method proposed by Marquina and Osher to tame the direct steepest descent marching of Eq. (3.1.2) [166].

As in the nonpolar case, solving the generalized Laplace-Beltrami equation (3.1.3) generates the solvent-solute interface through the surface function $S$. Additionally, variation with respect to $\Phi$ gives rise to the generalized Poisson-Boltzmann (GPB) equation:

$$-\nabla\cdot\left(\epsilon(S)\nabla\Phi\right) = S\rho_m + (1-S)\sum_\alpha q_\alpha\rho_{\alpha 0}e^{-\frac{q_\alpha\Phi}{k_BT}}, \tag{3.1.4}$$

where $\epsilon(S) = (1-S)\epsilon_s + S\epsilon_m$ is the generalized permittivity function. As shown in our earlier work [270, 43], $\epsilon(S)$ is a smooth dielectric function gradually varying from $\epsilon_m$ to $\epsilon_s$. Thus, the solution procedure of the GPB equation avoids many of the numerical difficulties of solving elliptic equations with discontinuous coefficients [286, 295, 294, 280, 279] that arise in the standard PB equation.

The GLB (3.1.3) and GPB (3.1.4) equations form a highly nonlinear system, in which the GLB equation is solved for the interface profile $S$ of the solute and solvent. The interface profile determines the dielectric function $\epsilon(S)$ in the GPB equation. The GPB equation is solved for the electrostatic potential $\Phi$, which behaves as an external potential in the GLB equation. This strongly coupled system should be solved in self-consistent iterations.

For the GLB equation (3.1.3), the computational domain is $\Omega/\Omega_m^{\mathrm{vdW}}$, where $\Omega_m^{\mathrm{vdW}}$ is the solute van der Waals domain given by $\Omega_m^{\mathrm{vdW}} = \bigcup_i B(r_i^{\mathrm{vdW}})$. Here $B(r_i^{\mathrm{vdW}})$ is the $i$th ball in the solute, centered at $\mathbf{r}_i$ with van der Waals radius $r_i^{\mathrm{vdW}}$. We apply the following Dirichlet boundary condition to $S(\mathbf{r},t)$:

$$S(\mathbf{r},t) = \begin{cases}0, & \forall\mathbf{r}\in\partial\Omega \\ 1, & \forall\mathbf{r}\in\partial\Omega_m^{\mathrm{vdW}}.\end{cases} \tag{3.1.5}$$

The initial value of $S(\mathbf{r},t)$ is given by

$$S(\mathbf{r},0) = \begin{cases}1, & \forall\mathbf{r}\in\partial\Omega_m^{\mathrm{ext}} \\ 0, & \text{otherwise},\end{cases} \tag{3.1.6}$$

where $\partial\Omega_m^{\mathrm{ext}}$ is the boundary of the extended solute domain constructed by $\Omega_m^{\mathrm{ext}} = \bigcup_i B(r_i^{\mathrm{vdW}} + r^{\mathrm{probe}})$. Here $B(r_i^{\mathrm{vdW}} + r^{\mathrm{probe}})$ has an extended radius of $r_i^{\mathrm{vdW}} + r^{\mathrm{probe}}$, with $r^{\mathrm{probe}}$ being the probe radius, which is set to 1.4 Å in the present work.

For the GPB equation (3.1.4), the computational
domain is $\Omega$. We set the Dirichlet boundary condition via the Debye-Hückel expression,

\[
\Phi(\mathbf{r}) = \sum_{i=1}^{N_m}\frac{Q_i}{\epsilon_s|\mathbf{r}-\mathbf{r}_i|}e^{-\kappa|\mathbf{r}-\mathbf{r}_i|}, \quad \forall\,\mathbf{r}\in\partial\Omega, \tag{3.1.7}
\]

where $\kappa$ is the modified Debye-Hückel screening function [44], which is zero if there is no salt molecule in the solvent. Note that no interface condition [278] is needed as $S$ and $\epsilon(S)$ are smooth functions in general for $t>0$. Consequently, the resulting GPB equation (3.1.4) is easy to solve.

To compare with experimental solvation data, one needs to compute the total solvation free energy, which, in our DG based solvation model, is obtained as

\[
\Delta G = \Delta G^{P} + G^{NP}, \tag{3.1.8}
\]

where $\Delta G^{P}$ is the electrostatic solvation free energy,

\[
\Delta G^{P} = \frac{1}{2}\sum_{i=1}^{N_m}Q_i\left[\Phi(\mathbf{r}_i) - \Phi_h(\mathbf{r}_i)\right], \tag{3.1.9}
\]

where $\Phi_h$ is the solution of the above GPB model in a homogeneous system, obtained by setting a constant permittivity function $\epsilon(\mathbf{r}) = \epsilon_m$ in the whole domain $\Omega$. The nonpolar energy $G^{NP}$ is computed by

\[
G^{NP} = \int\left[\gamma|\nabla S| + pS + (1-S)U\right]d\mathbf{r}. \tag{3.1.10}
\]

The DG based solvation model is formulated as a coupled GLB and GPB equation system, in which the GLB equation provides the solvent-solute boundary for solving the GPB equation, while the GPB equation produces the external potential in the GLB equation for the evolution of the surface function $S$. The solution procedure for this coupled system has been discussed in our earlier work [43, 44]. Essentially, for the GLB equation, an Alternating Direction Implicit (ADI) scheme was utilized for the time integration, in conjunction with the second order method for the spatial discretization. The GPB equation was discretized by a standard second order scheme and the resulting algebraic equation system was solved by using a standard Krylov subspace method based solver [43, 44].

3.2 Parametrization methods and algorithms

To solve the above coupled equation system, a set of parameters that appear in the GLB equation, namely, surface tension $\gamma$, hydrodynamic pressure $p$, and the product of solvent density and well depth parameter of the $j$th atom, $\tilde{\varepsilon}_j := \rho\,\varepsilon_j$, should be predetermined. Unfortunately, this coupled system is unstable for certain choices of parameters. Specifically, for certain $V$, one may have $S>1$ or $S<0$, which leads to an unphysical $\epsilon(S)$ and an unphysical solution of GPB equation (3.1.4), and thus gives rise to a divergent $S$. This instability can seriously reduce the model accuracy [43, 44]. For a concise description of our algorithm, we assume that there is only one solvent component (water)
and denote the parameter set as:

\[
\mathbf{P} = \{\gamma, p, \tilde{\varepsilon}_1, \tilde{\varepsilon}_2, \cdots, \tilde{\varepsilon}_{N_T}\}, \tag{3.2.1}
\]

where $N_T$ is the number of types of atoms in the solute molecule.

As mentioned in the previous part, the parameter set $\mathbf{P}$ used in solving the coupled PDEs should meet two requirements, namely, the stability of solving the coupled PDEs and the optimal prediction of the solvation free energy (or fitting the experimental solvation free energy in the best approach). Based on these two criteria we introduce a two-stage numerical procedure to optimize the parameter set and solve the coupled PDEs:

- Explore the stability conditions of the coupled PDEs by introducing an auxiliary system via a small perturbation;
- Optimize the parameter set by an iterative scheme satisfying the stability constraint.

3.2.1 Stability conditions

In this part we investigate the stability conditions for the numerical solution to the coupled PDEs (3.1.3) and (3.1.4). The basic idea is to utilize a small perturbation method. It is known that omitting the external potential in the GLB equation yields the Laplace-Beltrami (LB) equation:

\[
\frac{\partial S}{\partial t} = |\nabla S|\,\nabla\cdot\left(\gamma\frac{\nabla S}{|\nabla S|}\right). \tag{3.2.2}
\]

This equation is of diffusion type and is well posed with the Dirichlet type of boundary conditions provided $\gamma>0$. Numerically it is easy to solve Eq. (3.2.2) to yield the profile of the solvent-solute boundary.

After solving the LB equation (3.2.2), we use the generated smooth profile of the solvent-solute boundary to determine the permittivity function in the GPB equation. For simplicity, we consider a pure water solvent,

\[
-\nabla\cdot\left(\epsilon(S)\nabla\Phi\right) = S\rho_m. \tag{3.2.3}
\]

Without the external potential, the system of Eqs. (3.2.2)-(3.2.3) can be solved stably by solving the LB equation and then the GPB equation.

Motivated by the above observation, if the external potential is dominated by the mean curvature term, the stability of the coupled GPB and GLB equations can be preserved. Based on numerical experiments, the Lennard-Jones interaction between the solvent and solute is usually small since this term is constrained by the nonpolar free energy in our model. In our method, we enforce the following constraint conditions to make the coupled system well-posed in the numerical sense:

\[
\gamma \ge \gamma_0 > 0, \tag{3.2.4}
\]

and

\[
|p| \le \beta\gamma, \tag{3.2.5}
\]

where $\gamma_0$ and $\beta$ are some appropriate positive constants.

In summary, the original problem is transformed into optimizing parameters in the following system to attain the best fit of the solvation free energy with experimental results:

\[
\begin{cases}
\dfrac{\partial S}{\partial t} = |\nabla S|\left[\nabla\cdot\left(\gamma\dfrac{\nabla S}{|\nabla S|}\right) - p + U + \dfrac{1}{2}\epsilon_m|\nabla\Phi|^2 - \dfrac{1}{2}\epsilon_s|\nabla\Phi|^2\right], \\[2ex]
-\nabla\cdot\left(\epsilon(S)\nabla\Phi\right) = S\rho_m, \\[1ex]
\gamma \ge \gamma_0 > 0, \\[1ex]
|p| \le \beta\gamma.
\end{cases} \tag{3.2.6}
\]

Note that the potential $\Phi\rho_m$ is omitted in the GLB equation of (3.2.6), because we have already enforced the Dirichlet boundary condition in the GLB equation, while $\rho_m$ is inside the van der Waals surface.

Remark 3.2.1. Based on a large number of numerical tests, it is found that there is no need to enforce the constraint conditions on the parameters that appear in the Lennard-Jones term. When this term is used to fit the solvation energy with experimental results, the parameters can be bounded in a small neighborhood of 0 automatically during the fitting procedure. These parameters essentially do not affect the numerical stability.

3.2.2 Self-consistent approach for solving the coupled PDEs

In this part, we propose a self-consistent approach to solve the coupled GLB and GPB equations for a given set of parameters. Basically, the coupled system is solved iteratively until both the electrostatic solvation free energy $\Delta G^{P}$ given in Eq. (3.1.9) and the surface function $S$ are converged. Here the surface function is said to be converged provided that the surface area and enclosed volume are both converged.

We present an algorithm for solving the following coupled systems:

\[
-\nabla\cdot\left(\epsilon(S)\nabla\Phi\right) = S\rho_m, \tag{3.2.7}
\]

and

\[
\frac{\partial S}{\partial t} = |\nabla S|\left[\nabla\cdot\left(\gamma\frac{\nabla S}{|\nabla S|}\right) + V_e\right], \tag{3.2.8}
\]

where $V_e$ is the external potential, which is defined as:

- Auxiliary system: $V_e = \frac{1}{2}(\epsilon_m - \epsilon_s)|\nabla\Phi|^2$,
- Full system: $V_e = -p + U + \frac{1}{2}(\epsilon_m - \epsilon_s)|\nabla\Phi|^2$.

Dirichlet boundary conditions are employed for both GPB (3.2.7) and GLB (3.2.8) equations with auxiliary and full external potentials, giving rise to a well-posed coupled system. The smooth profile of the solvent-solute boundary enables the direct use of the second order central difference scheme to achieve second order convergence in discretizing the GPB equation. The biconjugate gradient scheme is used to solve the resulting algebraic equation system. The GLB equation of both the auxiliary and full systems can be solved by the central difference discretization of the spatial domain and the forward Euler time integrator for the time domain discretization.

Remark 3.2.2. For the sake of simplicity, in the current work, we employed the central difference scheme for spatial domain discretization in both GPB and GLB equations, and the forward Euler integrator for the time domain discretization of the GLB equation. For stability, in the discretization of the GLB equation, the discretization step sizes of the temporal and spatial domains satisfy the Courant-Friedrichs-Lewy condition. To accelerate the numerical integration, a multigrid solver can be employed for the GPB equation, and an alternating direction implicit scheme [43], which is unconditionally stable, can be utilized for the temporal integration. However, detailed discussion of these accelerated schemes is beyond the scope of the present work.

The coupled GLB and GPB equations are solved in a self-consistent manner. A dynamical coupling is needed to solve the coupled system.

Remark 3.2.3. In solving the GLB equation, during each update, to ensure the stability, instead of a full update we update the solution partially, i.e., the updated solution is the weighted sum of the new solution of the current GLB equation, $S_{\rm new}$, and the old solution of the GLB equation in the previous step, $S_{\rm old}$:

\[
S = a_1 S_{\rm new} + (1 - a_1)S_{\rm old}, \tag{3.2.9}
\]

where $a_1$ is a constant and set to 0.5 in the present work.

3.2.3 Convex optimization for parameter learning

In this part, we present the parameter optimization scheme. In our approach, parameters start from an initial guess and then are updated sequentially until reaching convergence. Here the convergence is measured by the RMSE between the fitted and experimental solvation free energies for a given set of molecules.

Consider the parameter optimization for a given group of molecules, $\{T_1, T_2, \cdots, T_n\}$. As discussed above, the parameter set is $\mathbf{P}$. To optimize the parameter set $\mathbf{P}$, we start from GPB equation (3.2.7) and the auxiliary system of GLB equation (3.2.8) with $\gamma = 0.05$. After solving the initial coupled system by using Algorithm ??, we obtain the following quantities for each molecule in the training set:

\[
\left\{
\Delta G^{P}_j,\ \mathrm{Area}_j,\ \mathrm{Vol}_j,\
\left(\sum_{i=1}^{N_m}\delta_i^{1}\int_{\Omega_s}\left[\left(\frac{\sigma_s+\sigma_1}{\|\mathbf{r}-\mathbf{r}_i\|}\right)^{12} - 2\left(\frac{\sigma_s+\sigma_1}{\|\mathbf{r}-\mathbf{r}_i\|}\right)^{6}\right]d\mathbf{r}\right)_j,\ \cdots,\
\left(\sum_{i=1}^{N_m}\delta_i^{N_T}\int_{\Omega_s}\left[\left(\frac{\sigma_s+\sigma_{N_T}}{\|\mathbf{r}-\mathbf{r}_i\|}\right)^{12} - 2\left(\frac{\sigma_s+\sigma_{N_T}}{\|\mathbf{r}-\mathbf{r}_i\|}\right)^{6}\right]d\mathbf{r}\right)_j
\right\},
\tag{3.2.10-3.2.12}
\]

where $j = 1, 2, \cdots, n$. Here $N_m$ and $N_T$ denote the number of atoms and the number of types of atoms in a specific molecule. The last few terms involve semi-discrete and semi-continuum Lennard-Jones potentials [43]. Additionally,

\[
\delta_i^{j} =
\begin{cases}
1, & \text{if atom } i \text{ belongs to type } j, \\
0, & \text{otherwise},
\end{cases}
\]

where $i = 1, 2, \cdots, N_m$; $j = 1, 2, \cdots, N_T$; and $\sigma_i$, $i = 1, 2, \cdots, N_T$, is the atomic radius of the $i$th type of atoms. Therefore, atoms of the same type have a common atomic radius and parameter $\tilde{\varepsilon}$.

The predicted solvation free energy for molecule $j$ can be represented as:

\[
\begin{aligned}
\Delta G_j = {} & \Delta G^{P}_j
+ \tilde{\varepsilon}_1\left(\sum_{i=1}^{N_m}\delta_i^{1}\int_{\Omega_s}\left[\left(\frac{\sigma_s+\sigma_1}{\|\mathbf{r}-\mathbf{r}_i\|}\right)^{12} - 2\left(\frac{\sigma_s+\sigma_1}{\|\mathbf{r}-\mathbf{r}_i\|}\right)^{6}\right]d\mathbf{r}\right)_j \\
& + \cdots + \tilde{\varepsilon}_{N_T}\left(\sum_{i=1}^{N_m}\delta_i^{N_T}\int_{\Omega_s}\left[\left(\frac{\sigma_s+\sigma_{N_T}}{\|\mathbf{r}-\mathbf{r}_i\|}\right)^{12} - 2\left(\frac{\sigma_s+\sigma_{N_T}}{\|\mathbf{r}-\mathbf{r}_i\|}\right)^{6}\right]d\mathbf{r}\right)_j \\
& + \gamma\,\mathrm{Area}_j + p\,\mathrm{Vol}_j.
\end{aligned}
\tag{3.2.13-3.2.15}
\]

We denote the predicted solvation free energy for the given set of molecules as $\Delta G(\mathbf{P}) := \{\Delta G_1, \Delta G_2, \cdots, \Delta G_n\}$, which is a function of the parameter set $\mathbf{P}$, and denote the corresponding experimental solvation free energy as $\Delta G^{\rm Exp} := \{\Delta G^{\rm Exp}_1, \Delta G^{\rm Exp}_2, \cdots, \Delta G^{\rm Exp}_n\}$. Then the parameter optimization problem in the coupled PDEs given by Eqs. (3.2.6) can be transformed into the following regularized and constrained optimization problem:

\[
\min_{\mathbf{P}}\left(\|\Delta G(\mathbf{P}) - \Delta G^{\rm Exp}\|_2 + \lambda\|\mathbf{P}\|_2\right), \tag{3.2.16}
\]

subject to

\[
\gamma \ge \gamma_0, \tag{3.2.17}
\]

and

\[
|p| \le \beta\gamma, \tag{3.2.18}
\]

where $\|\cdot\|_2$ is the $L_2$ norm of the quantity and $\lambda$ is the regularization parameter, chosen to be 10 in the present work to ensure the dominance of the first term and avoid over-fitting. Here $\gamma_0$ and $\beta$ are set respectively to 0.05 and 0.1 in the present implementation, which guarantees the stability of the coupled system according to a large number of numerical tests.

With the quantities in Eqs. (3.2.10)-(3.2.12) held fixed, $\Delta G_j$ in Eqs. (3.2.13)-(3.2.15) is linear in $\mathbf{P}$, so the objective function (3.2.16) is a convex function; meanwhile, the solution domain restricted by constraints (3.2.17)-(3.2.18) forms a convex domain. Therefore the optimization problem given by Eqs. (3.2.16)-(3.2.18) is a convex optimization problem, which was studied by Grant and Boyd [103, 102].

After solving the above convex optimization problem, parameter set $\mathbf{P}$ is updated and used again in solving the coupled GLB and GPB system, i.e., Eqs. (3.2.8) and (3.2.7). Repeating the above procedure, a new group of predicted solvation free energies together with a new group of parameters is obtained. This procedure is repeated until the RMSE between the predicted and experimental solvation free energies in two sequential iterations is within a given threshold.

3.2.4 Algorithm for parameter optimization and solution of the coupled PDEs

Based on the preparation made in the previous two subsections, namely, the self-consistent approach for solving the coupled GLB and GPB system and the parameter optimization, we provide the combined algorithm for the parameter optimization and solving the coupled system for a given set of molecules.

Numerically, a self-consistent iteration is employed to solve the parameter optimization and the coupled PDEs, in which the parameter optimization problem is solved in the outer iteration, while the coupled PDEs are solved in the inner iteration.

3.3 Numerical results

In this section we present the numerical study of the DG based solvation model using the proposed parameter optimization algorithms. We explore the optimal solvent radius used in the van der Waals interactions. Due to the high nonlinearity, the solvent radius cannot be automatically optimized and its optimal value is obtained via searching the parameter domain. We show that for a group of molecules, there is a local minimum in the RMSE when the solvent radius is varied. The corresponding optimal solvent radius is adopted for other molecules. Additionally, we consider a large number of molecules with known experimental solvation free energies to test the proposed parameter optimization algorithms. These molecules are of both polar and nonpolar types and are divided into six groups: the SAMPL0 test set [185], and the alkane, alkene, ether, alcohol and phenol types [175]. It is found that our DG based solvation model works well for these molecules, with RMSEs between 0.33 and 0.76 kcal/mol (Tables 3.1-3.6). Finally, to demonstrate the predictive power of the present DG based solvation model, we perform a five-fold cross validation [108] for alkane, alkene, ether, alcohol and phenol types of molecules. It is found that training and validation errors are of the same level, which confirms the ability of our model for the solvation free energy prediction.

The SAMPL0 molecule structural conformations are adopted from the literature with ZAP 9 radii and the OpenEye-AM1-BCC v1 charges [185]. For other molecules, structural conformations are obtained from FreeSolv [175]. The Amber GAFF force field is utilized for the charge assignment [35]. The van der Waals radii as well as the atomic radii of hydrogen, carbon and oxygen atoms are set to 1.2, 1.7 and 1.5 Å, respectively. The grid spacing is set to 0.25 Å in all of our calculations (discretization and integration). The computational domain is set to the bounding box of the solute molecule with an extra length of 6.0 Å.

Table 3.1: The solvation free energy prediction for the SAMPL0 set. Energy is in the unit of kcal/mol.

Name                               ΔG^P     G^NP     ΔG       ΔG^Exp [185]   Error
Glycerol triacetate                -10.60    2.53    -8.07     -8.84         -0.77
Benzyl bromide                      -4.31    1.93    -2.38     -2.38          0.00
Benzyl chloride                     -4.45    1.18    -3.27     -1.93          1.34
m-Bis(trifluoromethyl)benzene       -2.62    3.70     1.08      1.07         -0.01
N,N-Dimethyl-p-methoxybenzamide     -8.35   -2.22   -10.57    -11.01         -0.45
N,N-4-Trimethylbenzamide            -6.93   -3.09   -10.03     -9.76          0.27
bis-2-Chloroethyl ether             -3.73   -0.14    -3.59     -4.23         -0.64
1,1-Diacetoxyethane                 -7.07    2.00    -5.07     -4.97          0.10
1,1-Diethoxyethane                  -3.58    0.43    -3.15     -3.28         -0.13
1,4-Dioxane                         -5.36   -0.38    -5.74     -5.05          0.69
Diethyl propanedioate               -7.07    1.40    -5.67     -6.00         -0.33
Dimethoxymethane                    -4.09    1.19    -2.90     -2.93         -0.03
Ethylene glycol diacetate           -7.66    1.90    -5.76     -6.34         -0.58
1,2-Diethoxyethane                  -3.64    0.45    -4.09     -3.54          0.55
Diethyl sulfide                     -2.21    0.76    -1.47     -1.43          0.04
Phenyl formate                      -7.10    2.08    -5.02     -4.08          0.94
Imidazole                          -11.54    2.71    -8.83     -9.81         -0.98
RMSE                                                                          0.60

Figure 3.1: The relations between the solvent radii and the RMSEs. (a) SAMPL0 test set; (b) Alkane set; (c) Alkene set; (d) Ether set; (e) Alcohol set; (f) Phenol set. Notably, there is a common local minimum at the solvent radius of 3.0 Å for all test sets except for the alkene set.

3.3.1 Solvent radius

In the present semi-discrete and semi-continuum Lennard-Jones potential,

\[
\tilde{\varepsilon}_k\int_{\Omega_s}\left[\left(\frac{\sigma_s+\sigma_i}{\|\mathbf{r}-\mathbf{r}_i\|}\right)^{12} - 2\left(\frac{\sigma_s+\sigma_i}{\|\mathbf{r}-\mathbf{r}_i\|}\right)^{6}\right]d\mathbf{r},
\]

the positions $\mathbf{r}_i$ ($i = 1, 2, \cdots, N_m$) are the coordinates of solute atoms, while $\mathbf{r}$ is not the position of a regular solvent atom or molecule. Since the solvent is treated as a continuum, $\mathbf{r}$ varies, in principle, continuously over the whole solvent domain. The distance $\|\mathbf{r}-\mathbf{r}_i\|$ is scaled by the sum of the solvent radius $\sigma_s$ and the solute radii $\sigma_i$. Because of the explicit representation of solute atoms, the solute atomic radii $\sigma_i$ are set to their van der Waals radii, the radii that define the van der Waals surface, which is used for setting up the boundary condition for the GLB equation. However, the continuum treatment of the solvent prevents us from simply associating $\sigma_s$ with the radius of the solvent molecule. Unlike the fully discrete Lennard-Jones potential in explicit solvent models, the semi-discrete and semi-continuum Lennard-Jones potential in our DG based solvation model describes the "interaction" of a solute atom with an arbitrary position in the solvent domain. In the numerical approximation, the arbitrary position is represented by a point of the grid mesh. Therefore, one cannot simply take the solvent radius in the present model as the radius of individual (discrete) solvent molecules. Additionally, it is noted that the solvent radius in the
present work and the solvent probe radius in the Poisson-Boltzmann theory are two different concepts.

In the present work, the solvent radius $\sigma_s$ is considered as an optimization parameter. Note that due to the nonlinear nature, this optimization cannot be carried out together or mixed with the parameter optimization discussed in the earlier section. We utilize a brute force approach for the solvent radius selection or optimization. Six sets of test examples are utilized to explore the appropriate solvent radius. The SAMPL0 test set [185] is a benchmark having 17 molecules. Additionally, we consider 38 alkane, 22 alkene, 17 ether, 25 alcohol, and 18 phenol molecules. The solvent radius is varied from 0.5 Å to 5.5 Å away from the van der Waals surface. Due to the fast decay property of the Lennard-Jones interactions, the above setting enables the full inclusion of the Lennard-Jones interactions in our model.

Figure 3.2: The predicted and experimental solvation free energy for the 17 molecules in the SAMPL0 test set.

Figure 3.1 depicts the RMSEs of the six test sets at different solvent radii calculated from the present DG based solvation model. In Figure 3.1(a), the result clearly demonstrates that with the increase of the solvent radius, the RMSE decreases dramatically initially. The minimum appears at 3.0 Å. A further increase of the solvent radius leads to a rapid jump in the RMSE before it stabilizes around 1.54 kcal/mol. It is noted that 3.0 Å is much larger than the commonly used solvent probe radius of 1.4 Å in Poisson-Boltzmann theory based implicit solvent models. For the other five test sets, although the behavior of the RMSE differs in each case, essentially all the RMSEs have a local minimum at the solvent radius of 3.0 Å. Therefore, in all the following computations, the solvent radius is set to 3.0 Å.

Figure 3.3: The predicted and experimental solvation free energies for 38 alkane molecules.

Figure 3.4: The predicted and experimental solvation free energies for 22 alkene molecules.

3.3.2 Optimization results

In this section, we illustrate the performance of our parameter optimization algorithms. First, we provide the regression results of the SAMPL0 test set [185]. Figure 3.2 shows the predicted and experimental solvation free energies based on the present model and optimization method. The predicted solvation free energies are highly consistent with the experimental ones. The RMSE is 0.60 kcal/mol.

Table 3.1 shows the breakup of polar, nonpolar and total predicted solvation free energies. The experimental values and errors are also provided [185].

Table 3.2: The solvation free energy prediction for the alkane set. All energies are in the unit of kcal/mol.

Name                               ΔG^P     G^NP     ΔG       ΔG^Exp [175]   Error
octane                              -0.13    2.89     2.76      2.88          0.12
ethane                              -0.04    1.70     1.66      1.83          0.17
propane                             -0.05    1.83     1.78      2.00          0.22
cyclopropane                        -0.08    2.43     2.35      0.75         -1.60
isobutane                           -0.07    2.09     2.02      2.30          0.28
2,2-dimethylbutane                  -0.07    2.34     2.27      2.51          0.24
isopentane                          -0.07    2.19     2.12      2.38          0.26
2,3-dimethylbutane                  -0.07    2.41     2.34      2.34          0.00
3-methylpentane                     -0.08    2.43     2.35      2.51          0.16
methylcyclopentane                  -0.10    1.76     1.66      1.59         -0.07
n-butane                            -0.07    2.03     1.96      2.10          0.14
isohexane                           -0.09    2.49     2.40      2.51          0.11
2,4-dimethylpentane                 -0.09    2.57     2.48      2.83          0.35
methylcyclohexane                   -0.10    1.68     1.58      1.70          0.12
n-pentane                           -0.08    2.25     2.17      2.30          0.13
hexane                              -0.09    2.51     2.42      2.48          0.06
cyclohexane                         -0.10    1.40     1.30      1.23         -0.07
nonane                              -0.14    3.11     2.97      3.13          0.16
heptane                             -0.11    2.73     2.62      2.67          0.05
cyclopentane                        -0.10    1.54     1.44      1.20         -0.24
cycloheptane                        -0.11    1.56     1.45      0.80         -0.65
cyclooctane                         -0.12    1.69     1.57      0.86         -0.71
neopentane                          -0.06    2.13     2.07      2.51          0.44
2,2,4-trimethylpentane              -0.08    2.74     2.66      2.89          0.23
3,3-dimethylpentane                 -0.07    2.58     2.51      2.56          0.05
2,3-dimethylpentane                 -0.08    2.72     2.64      2.52         -0.12
2,3,4-trimethylpentane              -0.08    2.96     2.88      2.56         -0.32
1,2-dimethylcyclohexane             -0.10    2.02     1.92      1.58         -0.34
3-methylhexane                      -0.09    2.74     2.65      2.71          0.06
3-methylheptane                     -0.11    2.94     2.83      2.97          0.14
1,4-dimethylcyclohexane             -0.11    2.02     1.91      2.11          0.20
2,2-dimethylpentane                 -0.08    2.64     2.56      2.88          0.32
2-methylhexane                      -0.10    2.73     2.63      2.93          0.30
decane                              -0.16    3.37     3.21      3.16         -0.06
propylcyclopentane                  -0.12    2.21     2.09      2.13          0.03
cis-1,2-Dimethylcyclohexane         -0.09    1.95     1.86      1.58         -0.28
2,2,5-trimethylhexane               -0.09    3.15     3.06      2.93         -0.13
pentylcyclopentane                  -0.15    2.73     2.58      2.55         -0.04
RMSE                                                                          0.36

Compared to our earlier prediction [43], in which the same model is employed but the parameters were not optimized in the present manner, the RMSE decreases dramatically from the previous 1.76 kcal/mol to 0.60 kcal/mol for the same test set. Note that the present RMSE (0.60 kcal/mol) is also significantly smaller than that of the explicit solvent approach (1.71 ± 0.05 kcal/mol) and that obtained by the PB based prediction (1.87 kcal/mol) under the same structure, charge and radius setting [185]. The present results reflect the efficiency of the proposed new parameter optimization algorithms and demonstrate the accuracy and power of our DG based solvation models.

Table 3.3: The solvation free energy prediction for the alkene set. All energies are in the unit of kcal/mol.

Name                               ΔG^P     G^NP     ΔG       ΔG^Exp [175]   Error
ethylene                            -0.27    0.96     0.69      1.28          0.59
isoprene                            -0.62    1.97     1.35      0.68         -0.67
but-1-ene                           -0.29    1.17     0.88      1.38          0.50
butadiene                           -0.56    1.75     1.19      0.56         -0.63
pent-1-ene                          -0.30    1.57     1.27      1.68          0.41
prop-1-ene                          -0.32    1.03     0.71      1.32          0.61
2-methylprop-1-ene                  -0.37    1.26     0.89      1.16          0.27
cyclopentene                        -0.37    1.17     0.79      0.56         -0.23
2-methylbut-2-ene                   -0.40    1.28     0.87      1.31          0.44
2,3-dimethylbuta-1,3-diene          -0.65    2.01     1.36      0.40         -0.95
3-methylbut-1-ene                   -0.27    1.45     1.18      1.83          0.65
1-methylcyclohexene                 -0.38    1.50     1.11      0.67         -0.45
penta-1,4-diene                     -0.53    1.91     1.38      0.93         -0.45
hex-1-ene                           -0.30    1.81     1.50      1.58          0.08
hexa-1,5-diene                      -0.51    1.88     1.37      1.01         -0.36
hept-1-ene                          -0.33    2.17     1.84      1.66         -0.18
hept-2-ene                          -0.34    1.96     1.62      1.68          0.06
4-Methyl-1-pentene                  -0.26    1.71     1.45      1.91          0.46
2-methylpent-1-ene                  -0.33    1.75     1.42      1.47          0.05
non-1-ene                           -0.36    2.81     2.45      2.06         -0.39
trans-2-Heptene                     -0.34    1.90     1.56      1.66          0.10
trans-2-Pentene                     -0.30    1.26     0.96      1.34          0.38
RMSE                                                                          0.46

Additionally, we investigate the solvation free energy prediction of two families of nonpolar molecules, alkane and alkene, which were studied previously by using the DG based nonpolar solvation model [46]. In the following, we demonstrate that the present DG based full solvation model can provide the same level of accuracy in the solvation free energy prediction for alkane and alkene molecules. Figures 3.3 and 3.4 depict the predicted and experimental solvation free energies for 38 alkane and 22 alkene molecules, respectively. Tables 3.2 and 3.3 list the polar, nonpolar, total and experimental solvation free energies for both families of solute molecules, respectively. Except for one alkane molecule, namely, cyclopropane, whose prediction error is 1.60 kcal/mol in magnitude, the errors for all other molecules are within 1 kcal/mol. The RMSEs of these two families are 0.36 and 0.46 kcal/mol, respectively. This level of accuracy is similar to our earlier results obtained by using our DG based nonpolar solvation model [46], which does not involve the electrostatic (polar) model and is computationally easier to optimize.

Figure 3.5: The predicted and experimental solvation free energy for the 17 ether molecules.

It is interesting to note that for both alkane and alkene molecules, the polar solvation free energy contribution is very small and the nonpolar part dominates the solvation free energy, which explains why the DG based nonpolar solvation model works extremely well for the solvation free energy prediction of alkane and alkene molecules [46]. Further, note that for almost all the alkane molecules, the polar solvation free energies $\Delta G^{P}$ are on the order of 0.1 kcal/mol in magnitude (Table 3.2), while alkene molecules have slightly larger polar free energies in magnitude (Table 3.3), which further verifies that alkene molecules have a stronger polarity than alkane molecules in general.

Figure 3.6: The predicted and experimental solvation free energy for the 25 alcohol molecules.

Figure 3.7: The predicted and experimental solvation free energy for the 18 phenol molecules.

Finally, we analyze three classes of polar solute molecules, namely, ether, alcohol, and phenol molecules. Figures 3.5, 3.6 and 3.7 illustrate the predicted and experimental solvation free energies for 17 ether, 25 alcohol, and 18 phenol molecules, respectively. Tables 3.4, 3.5 and 3.6 list the polar, nonpolar, total and experimental solvation free energies for the corresponding families of solute molecules. The RMSEs of these three families are 0.36, 0.33, and 0.76 kcal/mol, respectively.

Table 3.4: The solvation free energy prediction for the ether set. All energies are in the unit of kcal/mol.

Name                               ΔG^P     G^NP     ΔG       ΔG^Exp [175]   Error
ethoxyethane                        -4.08    2.33    -1.75     -1.59          0.16
2-methyltetrahydrofuran             -4.10    1.43    -2.67     -3.30         -0.63
tetrahydrofuran                     -4.36    1.36    -3.00     -3.47         -0.47
1-propoxypropane                    -3.75    2.29    -1.46     -1.16          0.30
methoxymethane                      -4.55    2.26    -2.29     -1.91          0.36
tetrahydropyran                     -4.17    1.09    -3.07     -3.12         -0.05
1-butoxybutane                      -3.88    2.33    -1.55     -0.83          0.72
trimethoxymethane                   -7.57    3.51    -4.06     -4.42         -0.36
methoxyethane                       -4.35    2.29    -2.06     -2.10         -0.04
1-methoxypropane                    -4.08    2.24    -1.84     -1.66          0.18
2-methoxypropane                    -4.12    2.20    -1.92     -2.01         -0.09
1-Ethoxypropane                     -4.26    2.32    -1.94     -1.81          0.13
1,3-Dioxolane                       -6.09    1.81    -4.28     -4.10          0.18
2,5-dimethyltetrahydrofuran         -3.86    1.42    -2.44     -2.92         -0.48
1,1,1-trimethoxyethane              -7.58    3.46    -4.12     -4.42         -0.30
2-methoxy-2-methyl-propane          -3.88    1.97    -1.91     -2.21         -0.30
1,4-dioxane                         -7.09    1.66    -5.44     -5.06          0.38
RMSE                                                                          0.36

From the results listed in Tables 3.4, 3.5 and 3.6 we note that for ether molecules, all the nonpolar energies are positive, which neutralizes some polar contributions to the total solvation free energies. For the alcohol molecules, the nonpolar energies are all negative, which enhances the polar contributions to the total solvation free energies. Since the surface part is always positive and the volume part is mostly positive, the attractive van der Waals interactions between alcohol molecules and water solvent must be very strong. Physically, there are strong solvent-solute hydrogen bonds that make alcohol molecules easily solvated. These solvent-solute interactions are described by the strong attractive van der Waals interactions in the present model. As for the phenol molecules, there is a mixed pattern for the nonpolar contributions.

Table 3.5: The solvation free energy prediction for the alcohol set. All energies are in the unit of kcal/mol.

Name                               ΔG^P     G^NP     ΔG       ΔG^Exp [175]   Error
ethylene glycol                     -6.98   -1.76    -8.73     -9.30         -0.57
butan-1-ol                          -3.33   -1.51    -4.84     -4.72          0.12
ethanol                             -3.49   -1.47    -4.96     -5.00         -0.04
methanol                            -3.69   -1.41    -5.10     -5.10          0.00
propan-1-ol                         -3.34   -1.48    -4.82     -4.85         -0.03
propan-2-ol                         -3.26   -1.36    -4.62     -4.74         -0.12
pentan-1-ol                         -3.36   -1.61    -4.97     -4.57          0.40
2-methylpropan-2-ol                 -3.10   -1.27    -4.37     -4.47         -0.10
2-methylbutan-2-ol                  -2.95   -1.17    -4.12     -4.43         -0.31
2-methylpropan-1-ol                 -3.20   -1.50    -4.70     -4.50          0.20
butan-2-ol                          -3.09   -1.32    -4.40     -4.62         -0.22
cyclopentanol                       -3.20   -1.68    -4.88     -5.49         -0.61
4-methylpentan-2-ol                 -2.65   -1.05    -3.69     -3.73         -0.04
cyclohexanol                        -3.21   -1.92    -5.13     -5.46         -0.33
hexan-1-ol                          -3.43   -1.53    -4.96     -4.40          0.56
heptan-1-ol                         -3.48   -1.62    -5.09     -4.21          0.88
2-methylbutan-1-ol                  -3.27   -1.29    -4.56     -4.42          0.14
cycloheptanol                       -3.07   -1.89    -4.96     -5.48         -0.52
2-methylpentan-3-ol                 -2.86   -0.93    -3.78     -3.88         -0.10
pentan-3-ol                         -3.01   -1.08    -4.10     -4.35         -0.25
4-Heptanol                          -2.90   -1.10    -3.99     -4.01         -0.02
2-methylpentan-2-ol                 -2.93   -1.08    -4.00     -3.92          0.08
2,3-Dimethyl-2-butanol              -2.89   -0.93    -3.82     -3.91         -0.09
hexan-3-ol                          -3.04   -1.27    -4.31     -4.06          0.25
pentan-2-ol                         -3.10   -1.23    -4.33     -4.39         -0.06
RMSE                                                                          0.33

The above study of a large variety of molecules indicates that the DG based solvation model together with the proposed parameter optimization algorithms can provide very accurate predictions of solvation free energies for both polar and nonpolar solute molecules.

Table 3.6: The solvation free energy prediction for the phenol set. All energies are in the unit of kcal/mol.

Name                               ΔG^P     G^NP     ΔG       ΔG^Exp [175]   Error
3-hydroxybenzaldehyde               -9.17    0.39    -8.78     -9.52         -0.74
4-hydroxybenzaldehyde               -9.60    0.19    -9.41     -8.83          0.58
o-cresol                            -5.32   -1.04    -6.36     -5.90          0.46
m-cresol                            -5.71   -0.86    -6.57     -5.49          1.08
phenol                              -5.81   -0.14    -6.95     -6.61          0.34
p-cresol                            -5.80   -1.05    -6.85     -6.13          0.72
naphthalen-1-ol                     -5.50   -0.75    -6.25     -7.67         -1.42
3,4-dimethylphenol                  -5.72   -0.49    -6.21     -6.50         -0.29
2,5-dimethylphenol                  -5.34   -0.48    -5.82     -5.91         -0.09
4-tert-butylphenol                  -5.55    0.86    -4.69     -5.91         -1.22
2,4-dimethylphenol                  -5.55   -1.03    -6.58     -6.01          0.57
3,5-dimethylphenol                  -5.69   -0.41    -6.10     -6.27         -0.17
naphthalen-2-ol                     -5.85   -0.72    -6.57     -8.11         -1.54
2,3-dimethylphenol                  -5.47   -1.13    -6.60     -6.16          0.44
2,6-dimethylphenol                  -5.07   -1.07    -6.14     -5.26          0.88
3-ethylphenol                       -5.67   -0.37    -6.04     -6.25         -0.21
4-propylphenol                      -5.79   -0.05    -5.84     -5.21          0.63
4-ethylphenol                       -5.76   -0.48    -6.24     -6.13          0.11
RMSE                                                                          0.76

Table 3.7: The partition of the molecules into sub-groups.

Molecule   Group 1   Group 2   Group 3   Group 4   Group 5
Alkane        8         8         8         7         7
Alkene        5         5         5         4         4
Ether         4         4         3         3         3
Alcohol       5         5         5         5         5
Phenol        4         4         4         3         3

3.3.3 Five-fold cross validation

Having verified that the DG based solvation model with the optimized parameters provides very good regression results, we perform a five-fold cross validation to further illustrate the predictive power of the present method for independent data sets. Specifically, the parameters learned from a group of molecules can be employed for the blind prediction of other molecules.

Figure 3.8: The bar plot of the training and validation errors of alkanes.

Figure 3.9: The bar plot of the training and validation errors of alkenes.

Figure 3.10: The bar plot of the training and validation errors of the ethers.

Figure 3.11: The bar plot of the training and validation errors of alcohols.

Figure 3.12: The bar plot of the training and validation errors of phenols.

To perform the five-fold cross validation, each type of molecules is subdivided into five sub-groups as uniformly as possible. Table 3.7 lists the number of molecules in each sub-group for each type of molecules. In our parameter optimization, we leave out one sub-group of molecules and use the rest of the molecules to establish our DG based solvation model. The optimized parameters are then employed for the blind prediction of solvation free energies of the left-out sub-group of molecules.

Figures 3.8, 3.9, 3.10, 3.11, and 3.12 demonstrate the cross validation results of the alkane, alkene, ether, alcohol, and phenol molecules, respectively. It is seen that training and validation errors are similar to each other, which verifies the ability of our model in the blind prediction of solvation free energies.

In the real prediction of the solvation free energy for a given molecule of unknown category, we can assign it to a given group, and then employ the DG based solvation model with the optimal parameters learned for this specific group for a blind prediction.

3.4 Conclusion

Differential geometry (DG) based solvation models have had considerable success in solvation analysis [269, 43, 44, 45]. In particular, the DG based nonpolar solvation model was shown to offer some of the most accurate solvation energy predictions of various nonpolar molecules [46]. However, the DG based full solvation model is subject to numerical instability in solving the generalized Laplace-Beltrami (GLB) equation, due to its coupling with the GPB equation. To stabilize the coupled GLB and GPB equations, a strong constraint on the van der Waals interaction was applied in our earlier work [43, 44, 45], which hinders the parameter optimization of our DG based solvation model. In the present work, we resolve this problem by introducing new parameter optimization algorithms, namely the perturbation method and convex optimization, for the DG based solvation model. New stability conditions are explicitly imposed on the parameter selection, which guarantees the stability and robustness of solving the GLB equation and leads to constrained optimization of the DG based solvation model. The new optimization algorithms are intensively validated by using a large number of test molecules, including the SAMPL0 test set [185] and alkane, alkene, ether, alcohol and phenol types of solutes. Regression results based on our new algorithms are highly consistent with experimental data. Additionally, a five-fold cross validation technique is employed to explore the ability of the DG based solvation models for the blind prediction of the solvation free energies for a variety of solute molecules. It is found that the errors in the training and validation sets are at the same level, which confirms our model's predictive power in solvation free energy analysis. The present DG based full solvation model provides a unified framework for analyzing both polar and nonpolar molecules.

Nevertheless, the capability of the DG based solvation model for blind solvation free energy prediction for general molecules is still quite limited. The blind solvation free energy prediction by some other models will be further discussed in the following chapters.

Chapter 4

Molecular Conformation in Solvent

4.1 Introduction

In the previous chapters, we presented several solvation models. Molecular conformation modeling is one of the most fundamental issues that we need to resolve, especially for the implicit solvation models. In the differential geometry based solvation model, the molecular conformation is modeled based on the variational principle. However, for general solvation models this modeling is inappropriate. In this chapter we present the models for the molecular conformation modeling in solvent. The solvated molecule is usually characterized as the cavity enclosed by the molecular surface, and in the literature there are mainly three categories of surface definitions:

- van der Waals (VDW) surface [53]: it is defined as the surface created when each atom is represented by a sphere with a radius equal to the van der Waals radius of that atom. The VDW surface for a molecule is the union of all the individual van der Waals spheres.
- Solvent Accessible Surface (SAS) [147, 206]: it is defined as the trace of the probe center when a probe is rolled along the atoms' van der Waals spheres.
- Solvent Excluded Surface (SES) [51, 52]: it is defined as the surface traced by the inward-facing surface of the probe when a probe is rolled along the atom spheres. It is composed of two parts: the contact surface, which is the part of the van der Waals surface that can be touched by the probe; and the reentrant surface, which is formed by the inward-facing part of the probe when it is in contact with more than one atom.

The SES is generally accepted to be the most accurate molecular conformation model in the implicit solvent model community [209, 94, 259]. Geometrically, the SES is relatively smooth compared to the other two popular surface definitions; nevertheless, it still has some geometric singularities, such as tips and cusps. Many efforts have been made in the past decades to develop analytical, robust, and efficient SES software. The pioneering work is due to Connolly [32, 51, 52], who formulated the mathematical representation of the SES for arbitrary biomolecules in terms of convex surfaces, saddle surfaces, and concave surfaces. The state-of-the-art triangulated SES software, MSMS, was developed by Sanner et al. [212]; it provides very fast SES generation and triangulation. Much work has also been done on the development of approximated SESs, for instance, the Smoothed Numerical Surface (SNS), which is used in the DelPhi software [209] for biomolecular electrostatics modeling, and GEPOL, which is the widely used approximation of the SES in the polarizable continuum model [238].

Besides the above three surface definitions, for computational convenience, many other surface definitions have also been introduced. In particular, the Gaussian surface was introduced to relieve the geometric singularity problem [159]. Many variants of the Gaussian surface have been proposed for different purposes [21].

In this chapter, we present our work on the development of the Eulerian representation of the SES, which is convenient for Cartesian mesh based numerical methods. Furthermore, it provides a paradigm for surface area, molecular volume, and molecular topological analysis.

This chapter is structured as follows. Section 4.2 introduces the algorithm for Eulerian solvent excluded surface (ESES) generation, which contains two parts, SES generation and embedding the SES into the Cartesian mesh. Section 4.3 discusses the area and volume calculation for the ESES. Section 4.4 addresses some topological structure issues on the ESES. The electrostatic analysis of the solvated molecule will be discussed in Section 4.5. This chapter ends with a conclusion.

4.2 Eulerian Solvent Excluded Surface (ESES)

Our algorithm is designed to develop an analytical Eulerian representation of the SES. This amounts to classifying the Cartesian grid points with respect to the analytical SES, accurately computing the locations of intersection points between the interface and Cartesian mesh lines, and calculating the associated outer normal directions. There are mainly three steps in this framework. First, an analytical SES is built. Second, all the Cartesian mesh grid points are classified as either inside or outside the surface. Third, the intersection points and associated outer normal directions are computed for each edge with one point inside and the other one outside. We will discuss these three steps in detail in the following subsections.

4.2.1 Construction of Solvent Excluded Surface

In the first step, an analytical SES is constructed mainly based on the algorithm proposed by Connolly [32, 51]. In the algorithm, the SES is divided into three types of patches, namely, convex patches, saddle patches, and concave patches, as shown in Fig. 4.1, where the three different patches are rendered in different colors. The convex patches (red) are the atomic surface accessible to a spherical probe rolling around; saddle patches (green) are the trace of the probe's inward-facing surface touching two atoms at the same time; concave patches (blue) are the spherical triangle faces of the probe touching three or more atoms simultaneously. The boundary curves of the convex patches are marked as convex edges, and the boundary curves of the concave patches are concave edges. Corresponding concave edges and convex edges compose the saddle patches. When a saddle/torus patch is free, its boundary convex edges are the complete contact circles. If there are no associated concave edges generated for a non-free torus, this torus is marked as blocked.

Figure 4.1: Three types of patches for SES: convex patches (red), saddle patches (green) and concave patches (blue).

Connolly [32, 51] provides the algorithm to compute the centers, radii, and boundary planes/edges/points of all the possible patches, where we perform torus construction, probe placement, and saddle face construction processes in order. Note that when an atom is detected as an interior atom (i.e., completely buried by other atoms), it will not be considered for the torus construction. Additionally, to make the algorithm compatible with the case when the probe touches more than three atoms at the same time, a special treatment of the concave edges is needed after probe placement. For a pair of concave edges that are opposite to each other, these two concave edges are eliminated and their corresponding concave patches are tagged as belonging to the same probe sphere. For concave edges that are the same, only one concave edge is kept and their corresponding concave faces are also tagged as a subset of the same probe sphere.

4.2.2 Classification of Cartesian Grid Points

In the second step, we cl
assifyCartesiangridpointsinthecomputationaldomainaseitherinsideoroutsidetheSES.Apparently,Cartesiangridpointsthatareoutsidethecollectionoftheaugmentedatomicspherescanbesafelylabeledasoutsidepoints,wheretheradiusoftheaugmentedatomisexpandedbyrpwithrpbeingtheradiusoftheprobe.ThusweonlyneedtopayextraattentiontotheCartesiangridpointsthatareincludedbytheaugmentedatoms.Uponcarefulobservation,aCartesiangridpointcanbeasinsideofthemolecularsurfaceifitInsideAtomj[(InsideVSjInsideVT)&!(InsideSaddleProbe)&!(InsideConcaveProbe)](4.2.1)EachcomponentintheexpressiongivenbyEq.(4.2.1)willbediscussedbelowindetails.Notethattheaboveequationisonlyatcondition,furtherray-tracingtechniquesisadaptedfordeterminingthestatusoftheremaininggridpoints.604.2.2.1InsideAtomLetanatomcenteredatCi=(Ci;x;Ci;y;Ci;z)withradiusRi,foragivenCartesiangridpointP=(Px;Py;Pz).InsideAtomistrueif:jCiPj2R2i0:(4.2.2)Otherwise,thegridpointPisoutsidetheatom,i.e.,InsideAtomisfalse.4.2.2.2InsideVSandInsideSaddleProbeInsideVSisanabbreviationforinsidethevisibilitysphereofthesaddlepatch,whichisproposedinKrone'swork[141].ThecenterandtheradiusofthevisibilityspherecanbecomputedbyEq.(4.2.3)andEq.(4.2.4),respectively.Cvs=jCpCijjCpCij+jCpCjjCj+jCpCjjjCpCij+jCpCjjCi;(4.2.3)Rvs=jRijCpCij(CpCi)+CiCvsj;(4.2.4)whereCvsandRvsstandforthecenterandradiusofthevisibilitysphere,respectively.Ci,Cj,andCpdenotethecenterofi-th,j-thatom,andtheprobe,respectively.jjmeansthemagnitudeofthevector.Riistheradiusofthei-thatom.AsillustratedinFig.4.2.FromFig.4.2wecanseethatInsideVSincludessomepossibleinteriorpointsintroducedbythesaddlepatches.InsideSaddleProbeeliminatesthepointsthatareaccessibletotheprobe(thepurplearea).Notethatwhenthecurrentsaddlepatchisnotfree,weonlycheckthepartofthevisibilityspherethatislocatedwithintherangeofsaddlefaces.61Figure4.2:FigureforInsideVS.SolidlinesaretheoutlineofasimpleSEScomposedoftwoatoms(red)andonesaddlepatch(green).Blackcircleistheoutlineofthecorrespondingvisibilityspherewhentheprobetouchesatomiandatomjsimultaneously.Ins
ideVSistaggedastruewhenthepointisdetectedasinsidethevisibilitysphere.Figure4.3:FlowchartforInsideVS.Figures4.3and4.4depictthewchartfordeterminingtheconditionsInsideVSandInsideSaddleProbe,respectively.Theequationsinvolvedintheabovetwowchartsare62Figure4.4:FlowchartforInsideTorus.listedbelow:f1(P;Tj)=jPCTj;VSj2R2Tj;VSf2(P;Tj)=~P2zr2p+(q~P2x+~P2y+RTj)2f3(P;Tj)=q~P2x+~P2yrp+RTjfquartic(P;Tj)=4R2Tj(~P2zr2p)+(r2p+R2Tj~P2x~P2y~P2z)2IsInLemon=(RTjrp)&f20&f30IsTorusCovered=TjisfreejcoveredbysaddlefaceswhereCVSandRVSarethecenterandradiusofthevisibilitysphere.Tjrepresentsthej-thgeneratedtoruswhentheprobetouchestwoatomsatthesametime.~Pisthelocationofthepointprojectedonthetorusparametrizationdomain.634.2.2.3InsideVSandInsideConcaveProbeInsideVTisshortforinsidethevisibilitytetrahedronoftheprobe,whichiscomposedofthecentersofthethreetouchingatomsandtheprobeasshowninFig.4.5.WhentheCartesianpointsisdetectedasinsidetheprobespherewithaposition,i.e.,touchingthreeormoreatomssimultaneously,thenitsInsideConcaveProbetagissettobetrue.WecanseethatInsideConcaveProbehelpstoexcludethepointsaccessibletotheprobeinthetetrahedron.Figure4.5:FigureforInsideVT.Aprobe(red)centeredatCpisincontactwiththreeatoms(blue)centeredatC1,C2andC3respectively.InsideVTistaggedastruewhenthepointisdetectedasinsidethetetrahedronCpC1C2C3.644.2.2.4RayTracingSinceEq.(4.2.1)onlyprovidesatconditionfortagginginsideCartesiangridpoints,theremayexistunlabeledinsideCartesiangridpointsbecauseoftheinterior"tunnel"struc-tureofSES,asdepictedinFig.4.6.ThesepointscanbecorrectlytaggedbycountingthenumberofintersectionpointswithSES.ForaCartesianedgewithoneCartesiangridpointP1labeledandtheotherpointP2unlabeled.PointP2sharesthesameBooleanlabelwithP1ifthereareevennumbersofpointsofintersectionsbetweenSESandthemesh.OtherwiseP2istaggedwiththeoppositelabelofP1.Tocomputethecorrectnumberofintersectionpointsanalytically,avalidationfortheintersectionpointsisneeded:Validationforintersectionswiththeconvexpatch(Fig.4.7)1.Notinsideofthenearbyatoms,byche
ckingthedistancetotheneighboringatomcenters2.Notcoveredbyitsassociatedsaddlepatches,bycheckingtheboundarynormaldirectionsValidationforintersectionswiththesaddlepatch(Fig.4.8)1.AssociatedIsInLemonisfalse2.InsideofitsassociatedvisibilitysphereValidationforintersectionswiththeconcavepatch1.LocatedinthescopeareaconstrainedbythethreetrianglefacesC1C2Cp,C2C3Cp,C1CpC3ofthetetrahedronCpC1C2C3,whereC1;C2;C3arethecentersofthethreetouchingatoms(Fig.4.5).65Figure4.6:FigureforanunlabeledCartesiangridpoint.Thefromlefttorighthighlightpartofthe1A2Emodelwithtpatchesrespectively,wherewecanseetheCartesianpoint(yellow)shouldbetaggedasinsidetheSES.However,itdoesnotsatisfytheconditionsproposedinEq.(4.2.1).Fromlefttorightare:visualizationwithatomonly,Visualizationwithatomsandsaddlepatchesonly,andVisualizationforoverallSES,respectively.2.OutsideofthenearbyprobesphereswithlocationsComputingtheanalyticalintersectionwithconvexpatchesandconcavepatchesisasimplequadraticequationwhileforsaddlepatchesitisaquarticequation.HereweadoptJenkins-Traubalgorithm[109]tosolvethequarticequationforbetternumericalstability.4.2.2.5ComputationoftheIntersectionCoordinatesAfterlabelingtheCartesiangridpoints,weonlyneedtocomputetheintersectioncoordinatesandcorrespondingnormalsforCartesianedgeswithoneCartesianpointinsideSESandtheotheroutside.Notethatthesamevalidationrulesdiscussedaboveareusedfordeterminingtheintersectioncoordinateswithnearbypatches.4.2.3SurfaceMorphologyInthispart,wewillpresentasurfacemorphologycomparisonwiththatgeneratedbytheMSMSsoftware[212].MSMSshortfortheMichealSannermolecularsurface,isbasedonthe66Figure4.7:Validationforintersectionwithconvexpatches.SolidlinesillustratetheoutlineofasimpleSEScomposedofthreeatoms(red)andtwosaddlepatches(green).P1P2andP3P4aretheCartesianedges.Notehereweonlymarkthepossibleintersectionswithconvexpatches.EdgeP1P2seemstohavetwointersectionsQ1andQ2withconvexpatches,whereQ1isactuallyaninvalidintersectionsinceitisinsideofatom2.Similarly,theintersectionpointQ3foredgeP3P4isn
otvalid,sinceitisincludedbysaddlepatch2.socalledreducedsurface.Itistheprogramthatcanhandlethegeometricsingularities.TheMSMSalgorithmconsiststhefollowingfourmajorstepsfortheSESgenerationandtriangulation:First,computethereducedsurfaceamolecule.Second,buildananalyticalrepresentationoftheSESfromthereducedsurface.Third,removealltheself-intersectingpartsfromtheanalyticSESbuiltabove.Fourth,produceatriangulationofthereducedsurfacebasedontheSES.DuetotheandrobustnessoftheMSMSsoftware,ithasbeenincorporatedintoseveralmolecularmodelingsoftwaresuites,e.g.,theVisualMolecularDynamicssoftware(VMD)[118],theUCSFChimera[196]etal.67Figure4.8:Validationforintersectionwithsaddlepatches.SolidlinesillustratetheoutlineofasimpleSEScomposedoftwoatoms(red)andtwosaddlepatches(green),wheretwoseparatesaddlepatchesbelongtothesametorus.Theblackdashedcircleistheoutlineofthecorrespondingvisibilitysphere.Singularitiesaregeneratedinthiscaseandtheyareappropriatelyhandledimplicitlywithourdesignedconditions.P1P2andP3P4aretheCartesianedges.Notehereweonlymarkthepossibleintersectionswithsaddlepatches.TherearetwointersectionsQ1andQ2foredgeP1P2withsaddlepatches,whereQ1isinvalidsinceitsassociatedBooleantagIsInLemonistrue.TheintersectionpointQ3foredgeP3P4withsaddlepatchesisnotvalideither,sinceitisoutsideofthecorrespondingvisibilitysphere.TheMSMSsurfacegenerationdependsonanadditionalparameterbesidestheintrinsicparametersfordescribingtheSES,namely,thesurfacedensity,whichistheapproximatednumberoftrianglesonperunitA2area.Usually,thelargerthesurfacedensityis,thecloserofMSMStotheanalyticalSES,providedthesurfacecanbegeneratedsuccessfully.DespitethegreatsuccessofoftheMSMS,itshouldbepointoutsomeproblemsthatweencounteredduringtheusageofMSMS.Ontheonehand,theMSMSsurfacemaychangetheatomicradiusofthemoleculeautomatically.Ontheotherhand,MSMSmayfailtogenerateSESathighdensity.AsdepictedinFig.4.9,MSMSgeneratestheSESforthemolecular(PDBID:1dqz)successfullyatdensity5,10,and25,whilefailsatdensity50.68Figure4.9:TheMSMSsurfacesforthebio
molecule(PDBID:1dqz).ChartsfromtoplefttobottomrightaretheMSMSsurfaceswiththedensities5,10,25and50,respectively.TheMSMSfailstogeneratecorrectsurfaceatdensity50,whileitworkswellotherdensities.69Foracomparison,wedepictstheSESformolecular1dqzgeneratedbyESESsoftwareinFig.4.10.Itiseasytoseethat,thesurfacesgeneratedbyMSMSatdensity5,10,25areconsistentwiththatbyESES.Withtheincreasingofthesurfacedensity,MSMSconvergestotheESES.Figure4.10:TheESESsurfaceforprotein1DQZ.4.3ESES:Area,VolumeCalculationThemolecularsurfaceareaandenclosedvolumearewidelyusedinmodelingthenonpolarsolvationfreeenergyinbiophysics,theaccuratecalculationoftheareaandvolumeisoffundamentalimportanceinthesolvationandbindinganalysis[56].TheESESembedsSESintheCartesianmesh,whichisrepresentedbytwosetsofgridpointsJ1andJ2,representthegridsinsideandoutsidetheSESrespectively.TogetherwiththeintersectionoftheCartesianmeshwiththeSESandtheassociatedouternormaldirection.WedenotetheirregulargridpointssetasJIrrwhereagridpointissaidtobeirregularifatleastoneofits70sixneighborgridslocatedinthetsideoftheSES.TheareaoftheSEScanbeapproximatedbythefollowingformula:Area=ZdSˇX(i;j;k)2Ijnxj+jnyj+jnzjh2;(4.3.1)wherehisthegridsize,Iisthesetofirregulargridpointsthatiseitherinsideoronthesurfacejnxjisthemagnitudeofthecomponentoftheouternormaldirectionatthe(x0;yj;zk),whichistheintersectionofwiththex-meshlinethatpassesthrough(xi;yj;zk),jnyjandjnzjareanalogously.Thevolumeenclosedbythemolecularsurfaceissimilarlyevaluatedby:Volume=Zmdrˇ120@X(i;j;k)2J1+X(i;j;k)2J1SJIrrin1Ah3;(4.3.2)whereJIrrinisthesetofirregulargridpointsthatinsidethesurfaceSmerekaoriginallyproposedthesimilarschemeforevaluatingthesurfaceintegrationanddidtheconvergenceanalysisinthework[223].SimilarschemeisalsousedforcalculatingthedielectricboundaryforceinthePBbasedimplicitsolventmoleculardynamicssimulationbyGengetal[89].Table4.1givesthegridentanalysisofthenumericalschemegivenbyEqs.(4.3.1)and(4.3.2)onthespherewithradius2,obviouslythenumericalschemeforareacalculationsisroughlyofsecondo
rderconvergence,whilethatforthecalculationofsurfaceenclosedvolumeisapproximatedofthirdorderconvergence.Totesttheaccuracyofthenumericalschemeforsurfaceareaandvolumecalculationtotherealbiomolecules,weperformthecalculationontheAmberPBSAtestset,which71Table4.1:Thegridmentanalysisoftheareaandvolumecalculationforthespherewithradius2.GridsizeAreaErrorOrderVolumeErrorOrder1.043.7556.5147.0013.4900.548.8241.442.1835.281.8652.850.2549.8390.431.7633.800.2872.700.12550.1930.072.5633.570.0582.300.062550.2430.021.7133.520.0073.05ExactValue50.2833.52containstwoparts,theproteinandnucleicacidmolecules.Thereare937moleculesintotalinthistestset,thenumberofatomsofwhichrangefromseveralhundredstoapproximatetenthousands.Thedatasetisdownloadedfromthewebhttp://rayl0.bio.uci.edu/rayl/.Figure.4.11depictstherelativeconvergenceofthenumericalschemeforthesurfaceareaandvolumecalculation,wheretherelativeerrorfortheareacalculationisas:RelativeErrorh:=jAreahArea0:2jArea0:2;whereRelativeErrorhstandsfortherelativeerroratthegridsizeh,AreahandArea0:2arethenumericalsurfaceareaatgridsizehand0.2A,respectively.Therelativeerrorforthevolumecalculationisdinthesameway.TofurtherverifytheaccuracyoftheareaandvolumecalculationintheESESsoftware.WecomparethatbybothESESandMSMSsoftware,toensuretheaccuracyoftheMSMScalculation,weusethedensity100forMSMSsurfacegeneration.Duetotherobustnessproblem,only740molecularsurfacesaregeneratedsuccessfullyatdensity100,thefailureofothersareeitherduetothefailureofsurfacegenerationorthechangeofradiusinthesurfacegeneration.Thedetailsaboutthiscomparisoncanbefoundin[157].TheresultsdepictedinFig.4.12demonstratetheexcellentconsistencybetweentheareaandvolume72(a)(b)Figure4.11:ConvergencetestofthesurfaceareaandenclosedvolumeforthePBSAtestsetcomparedtotheresultsobtainedatgridsize0.2A.(a)Area;(b)Volume.calculationbytwosoftware.4.3.1AtomicAreaIntheimplicitsolventmodeling,thewholesurfaceareausuallyemployedforacrudemodeloftheareacontributioninnonpolarsolvationfreeenergy,wherethenonpolarenergyismodeled
by $G_{NP1} = \gamma \cdot \text{Area}$, with $\gamma$ being the surface tension. A more accurate area model for the nonpolar solvation is based on the atomic surface areas, in which different types of atoms admit different surface tensions. As such, one can model the nonpolar solvation free energy by:

\[
G_{NP2} = \sum_j \gamma_j \, \text{Area}_j, \tag{4.3.3}
\]

where $\gamma_j$ and $\text{Area}_j$ are the atomic surface tension and the surface area of the $j$-th type of atoms, respectively.

Therefore, in addition to the surface area calculation, the ESES software also provides the atomic surface area calculation. In the ESES formulation, each patch is represented by some intersecting points with the Cartesian mesh lines. By the definition of the SES, the contact surface is contributed by one atom; the toric surface is generated by the probe contacting two atoms simultaneously; and the concave patch is associated with three atoms. This fact is inherited by the surface area partitioning.

Figure 4.12: The consistency of the area and volume calculations between the MSMS and ESES software packages for the part of the PBSA test set for which MSMS can generate the surface successfully without changing the atomic radii at density 100 (740 biomolecules are used here); the volume and the analytic surface area are those calculated by the MSMS software. The ESES results are generated at grid size 0.2 Å. (a) For the area consistency, the correlation is 0.9999 and the best fitting line is $y = 0.9934x + 14.3307$; (b) for the volume consistency, the correlation is 0.9999 and the best fitting line is $y = 1.0040x - 23.9577$.

Note that in our area formula, Eq. (4.3.1), the whole area is the cumulative contribution from each intersection. In the atomic area partitioning, we distribute the area that comes from each intersecting point to the associated atom. For a given intersection point with coordinate $\mathbf{r}$ and associated area $A_0$, the partitioning of the area is based on the following criteria:

If the intersection is associated with only one atom, the whole area $A_0$ is assigned to the corresponding atom.

If the intersection is associated with two atoms indexed by $i$ and $j$, whose centers are $\mathbf{r}_i$ and $\mathbf{r}_j$ and whose radii are $r_i$ and $r_j$, respectively, then we calculate the weighted distances from the intersection $\mathbf{r}$ to the two centers, given by

\[
d_i = \frac{\|\mathbf{r} - \mathbf{r}_i\|}{r_i}, \quad \text{and} \quad d_j = \frac{\|\mathbf{r} - \mathbf{r}_j\|}{r_j},
\]

respectively, where $\|\cdot\|$ denotes the Euclidean distance between two points. The whole area is assigned to the closer atom under the weighted distance measure.

If the intersection is associated with three atoms, we distribute the area in the same manner as in the two-atom case, and the whole area is attributed to the closest atom.

The above partitioning is based on the basic idea of the weighted Voronoi diagram [5].

4.4 ESES: Topological Analysis

Loops and cavities are omnipresent in SESs. Usually these loops or cavities are related to binding pockets or binding sites. Accurate and efficient algorithms for detecting the loops and cavities, together with measuring the size of these topological features, play an important role in practical applications, including computer-aided drug design. In this section, we provide an accurate and efficient numerical algorithm based on homology theory [133,73] for loop and cavity detection. Furthermore, we propose a level set method based filtration to generate persistent barcodes which characterize the size of loops and cavities.

Homology group theory provides an effective theoretical framework for computing the loops and cavities on a manifold. The homology group constructed on the cubical complex provides a practical methodology for computing the loops and cavities of the SES embedded in Euclidean space, i.e., the ESES; for the detailed theory of homology in the cubical complex setting, readers are referred to [133]. Furthermore, persistent homology theory [72] provides a way to measure the size of loops and cavities on the manifold.

4.4.1 Loops and Cavities Detection

In this part, we give a brief introduction to homology theory in the cubical complex setting, which provides the general framework for computing the loops and cavities in the Eulerian representation of the manifold; for details, readers are referred to [133]. In the cubical complex setting, homology theory is built up from geometric and algebraic building blocks. In the following, we review these concepts and then turn to the application of these tools to loop and cavity detection on the ESES.

4.4.1.1 Geometric Building Block

In the Eulerian representation, cubes are the basic geometric building blocks of homology theory. We need the following basic concepts of cubes:

An elementary non-degenerate interval is a closed interval $I \subset \mathbb{R}$ of the form $I = [m, m+1]$ for some integer $m$. An elementary degenerate interval is a point $I = [m, m]$ (or $I = [m]$ for simplicity).

An elementary cube $Q$, or $d$-cube, is a $d$-fold product of elementary intervals, i.e.,

\[
Q = I_1 \times I_2 \times \cdots \times I_d \subset \mathbb{R}^d,
\]

where each $I_i$, $i = 1, 2, \ldots, d$, is an elementary interval of non-degenerate or degenerate type, and $d$ is called the embedding number of $Q$, denoted $\operatorname{emb} Q = d$. The dimension of $Q$, denoted $\dim Q$, is defined to be the number of non-degenerate components in $Q$, and $\mathcal{K}_k$ denotes the set of all $k$-dimensional elementary cubes. Let $\mathcal{K} := \bigcup_{d=1}^{\infty} \mathcal{K}^d$ be the set of all elementary cubes, where $\mathcal{K}^d$ is the set of all elementary cubes in $\mathbb{R}^d$. The set of $k$-dimensional cubes with embedding number $d$ is $\mathcal{K}_k^d := \mathcal{K}_k \cap \mathcal{K}^d$. Obviously, if $Q \in \mathcal{K}_k^d$ and $P \in \mathcal{K}_{k'}^{d'}$, then $Q \times P \in \mathcal{K}_{k+k'}^{d+d'}$.

A set $X \subset \mathbb{R}^d$ is said to be cubical provided it can be written as a finite union of elementary cubes. For a given cubical set $X \subset \mathbb{R}^d$, we define the cubical set $\mathcal{K}(X)$ and the $k$-cube set $\mathcal{K}_k(X)$ by:

\[
\mathcal{K}(X) := \{ Q \in \mathcal{K} \mid Q \subset X \}, \qquad \mathcal{K}_k(X) := \{ Q \in \mathcal{K}(X) \mid \dim Q = k \};
\]

the elements of $\mathcal{K}_k(X)$ are called the $k$-cubes of $X$.

4.4.1.2 Algebraic Operations

To study the topological properties of the molecular surface in the Eulerian representation, the basic operations on the aforementioned geometric building blocks are presented in this part.

Each elementary $k$-cube $Q \in \mathcal{K}_k^d$ is associated with an algebraic object $\widehat{Q}$, which is called an elementary $k$-chain of $\mathbb{R}^d$. The set of all elementary $k$-chains of $\mathbb{R}^d$ is $\widehat{\mathcal{K}}_k^d := \{ \widehat{Q} \mid Q \in \mathcal{K}_k^d \}$, and the set of all elementary chains of $\mathbb{R}^d$ is $\widehat{\mathcal{K}}^d := \bigcup_{k=0}^{\infty} \widehat{\mathcal{K}}_k^d$.

The first algebraic operation to be defined on the cubical complex is the addition operation. First, we define the $k$-chains as finite linear combinations of elementary $k$-chains:

\[
c = a_1 \widehat{Q}_1 + a_2 \widehat{Q}_2 + \cdots + a_m \widehat{Q}_m, \quad a_i \in \mathbb{Z}, \; i = 1, 2, \ldots, m;
\]

the set of all such $k$-chains is denoted by $C_k^d$. The addition of two $k$-chains is defined by:

\[
\sum a_i \widehat{Q}_i + \sum b_i \widehat{Q}_i = \sum (a_i + b_i) \widehat{Q}_i.
\]

Clearly, under the addition operation, $C_k^d$ is an Abelian group.

Before defining the homology group on the cubical complex, we need to define the boundary operator on the cubical complex. To define the boundary operator, we need the following scalar and cubical products.

Definition 4.4.1. Scalar product: Let $c_1, c_2 \in C_k^d$, where $c_1 = \sum_{i=1}^{m} a_i \widehat{Q}_i$ and $c_2 = \sum_{i=1}^{m} b_i \widehat{Q}_i$. The scalar product of the chains $c_1$ and $c_2$ is defined as:

\[
\langle c_1, c_2 \rangle := \sum_{i=1}^{m} a_i b_i.
\]

Definition 4.4.2. Cubical product: For any elementary cubes $P \in \mathcal{K}_k^d$ and $Q \in \mathcal{K}_{k'}^{d'}$, the cubical product of $\widehat{P}$ and $\widehat{Q}$ is defined to be:

\[
\widehat{P} \diamond \widehat{Q} := \widehat{P \times Q}.
\]

Furthermore, for any $c_1 \in C_k^d$ and $c_2 \in C_{k'}^{d'}$, the cubical product is:

\[
c_1 \diamond c_2 = \sum_{P \in \mathcal{K}_k,\, Q \in \mathcal{K}_{k'}} \langle c_1, \widehat{P} \rangle \langle c_2, \widehat{Q} \rangle \, \widehat{P \times Q},
\]

and $c_1 \diamond c_2 \in C_{k+k'}^{d+d'}$.

For the cubical product, the following factorization property holds:

Lemma 4.4.3. For any $\widehat{Q} \in \widehat{\mathcal{K}}^d$ with $d > 1$, there exist unique elementary cubical chains $\widehat{I}$ and $\widehat{P}$ with $\operatorname{emb} I = 1$ and $\operatorname{emb} P = d - 1$, such that $\widehat{Q} = \widehat{I} \diamond \widehat{P}$.

With the previous preparation, the boundary operator can be defined in the following inductive manner.

Definition 4.4.4. Boundary operator: For $k \in \mathbb{Z}$, the cubical boundary operator

\[
\partial_k : C_k^n \to C_{k-1}^n
\]

is a homomorphism of Abelian groups, defined for an elementary chain $\widehat{Q} \in \widehat{\mathcal{K}}_k^n$ by induction on the embedding number $n$ as follows:

For $n = 1$, $Q$ is an elementary interval, i.e., $Q = [m]$ or $Q = [m, m+1]$ for some $m \in \mathbb{Z}$, and one defines

\[
\partial_k \widehat{Q} =
\begin{cases}
0, & \text{if } Q = [m], \\
\widehat{[m+1]} - \widehat{[m]}, & \text{if } Q = [m, m+1].
\end{cases}
\]

For $n > 1$, let $I = I_1(Q)$ and $P = I_2(Q) \times \cdots \times I_n(Q)$ so that $\widehat{Q} = \widehat{I} \diamond \widehat{P}$; then one defines

\[
\partial_k \widehat{Q} = \partial_{\dim I} \widehat{I} \diamond \widehat{P} + (-1)^{\dim I} \, \widehat{I} \diamond \partial_{\dim P} \widehat{P}.
\]

By linearity, the boundary operator extends to chains, i.e., if $c = \sum_{i=1}^{p} a_i \widehat{Q}_i$, then $\partial_k c = \sum_{i=1}^{p} a_i \partial_k \widehat{Q}_i$. It is easy to show that the boundary operator satisfies $\partial_{k-1} \circ \partial_k = 0$ for all $k > 1$ in the cubical complex setting.

Now consider a given manifold $X$ that is embedded in the Euclidean space $\mathbb{R}^d$ and represented as a cubical set. Let $\widehat{\mathcal{K}}_k(X) := \{ \widehat{Q} \mid Q \in \mathcal{K}_k(X) \}$ and let $C_k(X)$ be the subgroup of $C_k^d$ generated by the elements of $\widehat{\mathcal{K}}_k(X)$, which is called the set of $k$-chains of $X$. The boundary operator maps $C_k(X)$ to a subset of $C_{k-1}(X)$; thus one can restrict the boundary operator to the cubical set $X$. The boundary operator for the cubical set $X$, $\partial_k^X : C_k(X) \to C_{k-1}(X)$, can be defined by restricting $\partial_k : C_k^d \to C_{k-1}^d$ to $C_k(X)$.

The cubical chain complex for the cubical set $X \subset \mathbb{R}^d$ is defined as $\mathcal{C}(X) := \{ C_k(X), \partial_k^X \}_{k \in \mathbb{Z}}$, where $C_k(X)$ are the groups of cubical $k$-chains generated by $\mathcal{K}_k(X)$ and $\partial_k^X$ is the cubical boundary operator restricted to $X$.

For a given cubical set $X$, once the corresponding $k$-chain group $C_k(X)$ is defined, it is straightforward to introduce two subgroups of $C_k(X)$:

the $k$-cycle group $Z_k(X) := C_k(X) \cap \ker \partial_k \subset C_k(X)$;

the $k$-boundary group $B_k(X) := \operatorname{im} \partial_{k+1}^X = \partial_{k+1}(C_{k+1}(X)) \subset C_k(X)$.

The property $\partial_{k-1} \circ \partial_k = 0$ for all $k > 1$ implies that $B_k(X) \subset Z_k(X)$; therefore, we can define the following homology group.

Definition 4.4.5. Homology group: The $k$-th homology group of the cubical set $X$ is defined as the quotient group:

\[
H_k(X) := Z_k(X) / B_k(X).
\]

The $k$-th Betti number is defined as the rank of the $k$-th homology group, $\beta_k = \operatorname{rank} H_k$.

Remark 4.4.6. $H_k(X)$ describes the $k$-dimensional holes of $X$, e.g., $H_0(X)$ measures connected components, $H_1(X)$ measures loops, and $H_2(X)$ measures voids. In other words, $\beta_0$ is the number of connected components, $\beta_1$ is the number of loops, $\beta_2$ is the number of voids, and so on.

4.4.1.3 Homology on ESES

In this part, we discuss the computation of the homology of the manifold enclosed by the ESES. Note that during the ESES generation, all the grid points are classified as inside or outside the SES; the inside grid points can be used to construct the cubical complex that represents the manifold enclosed by the SES.

In this work the cubical homology computation is carried out by the Perseus persistent homology software [181], which is an efficient persistent homology program that can handle manifolds represented by both simplicial and cubical complexes [172]. The SES generated by the present ESES software provides suitable input data for the Perseus software.

We consider a benchmark test example, the C60 molecule, to illustrate the ESES performance in loop and cavity generation [251]. Figure 4.13 depicts the C60 surface generated with an atomic radius of 0.8 Å for all the carbon atoms and a probe radius of 0.1 Å. It has 32 rings. Figure 4.14 provides the number of loops generated by ESES at different grid sizes. When the grid size used for the surface generation is coarser than 0.6 Å, the numerical method cannot capture all the small loops in the C60 molecule, which are about 0.5 Å in diameter. However, when the grid size is finer than 0.6 Å, all the loops can be resolved. Note that the homology computation reports only 31 loops because one of the 32 loops can be expressed as a linear combination of all the other loops.

Figure 4.15 shows the solvent excluded surfaces generated with probe radius 1.4 Å for some molecules from the PBSA test set. Their corresponding numbers of loops and cavities, calculated at a grid resolution of 0.3 Å, are also presented.

Figure 4.13: The SES for the C60 molecule with probe radius 0.1 Å; the van der Waals radius of the carbon atom is set to 0.8 Å.

Figure 4.14: The number of loops calculated at different grid sizes for the above C60 solvent excluded surface.

Figure 4.15: The solvent excluded surfaces and their topology of two biomolecules. (a) The ESES result for protein 2AVH with 3 loops and 1 cavity. (b) A cross section of protein 2AVH showing the loops. (c) The ESES result for protein 1af8 with 2 loops and 2 cavities. (d) A cross section of protein 1af8 showing the loops and cavities.

4.4.2 Persistence

The method for detecting the loops and cavities of the manifold formed by the molecular surface has been provided in the previous part. Nevertheless, the ability to detect the number of loops and cavities alone is usually not very useful in practice. The size of the loops and cavities also needs an effective way to be characterized. Persistent homology theory provides a way to measure the size of the loops and cavities. The measure is called persistence in the filtration. We introduced a time propagation approach for generating the persistent homology in our recent work [251]. In the present work, we replace the geometric flow propagation by constant velocity propagation so as to uniformly measure the sizes of different loops or cavities.

To define the persistent homology, we need a filtration, i.e., a complex $K$ together with a nested sequence of sub-complexes $\{K^i\}_{0 \le i \le n}$, such that:

\[
\emptyset = K^0 \subset K^1 \subset \cdots \subset K^n = K;
\]

each sub-complex $K^i$ in the filtration has an associated chain group $C_k^i$, cycle group $Z_k^i$ and boundary group $B_k^i$, for all $i \ge 1$, and thus one has the following definition.

Definition 4.4.7. The $p$-persistent $k$-th homology group of $K^i$ is:

\[
H_k^{i,p} = Z_k^i / \left( B_k^{i+p} \cap Z_k^i \right);
\]

here $H_k^{i,p}$ captures the topological features of the filtrated complex that persist for at least $p$ steps in the filtration.

To measure the size of loops and cavities of the manifold formed by the SES, a slight modification of the previous filtration is needed. The modified sequence of complexes is defined as:

\[
\Omega_m = K^0 \subset K^1 \subset \cdots \subset K^n = K,
\]

where $K^0$, or $\Omega_m$, is the manifold formed by the SES, and $K$ is the manifold in which the loops and cavities of $K^0$ are filled.

The remaining ingredient for building the theoretical framework of persistence measuring is the filtration construction. The level set method provides a general theoretical framework and an effective numerical method for surface propagation with constant velocity. For the molecular surface $\Gamma$ we can give a level set representation $\phi(\mathbf{r})$, such that:

\[
\phi(\mathbf{r})
\begin{cases}
> 0, & \mathbf{r} \in \Omega_s, \text{ outside the molecular surface}; \\
= 0, & \mathbf{r} \in \Gamma, \text{ on the molecular surface}; \\
< 0, & \mathbf{r} \in \Omega_m, \text{ inside the molecular surface}.
\end{cases}
\tag{4.4.1}
\]

$\phi(\mathbf{r})$ can be chosen as the signed distance function to the molecular surface $\Gamma$.

To propagate the surface with a given velocity along the outer normal direction of the SES in the Eulerian representation, we introduce the time variable $t$ into the level set function, i.e., let $\phi(\mathbf{r}) = \phi(\mathbf{r}, t)$. Taking the derivative of the SES level set equation $\phi(\mathbf{r}, t) = 0$ with respect to $t$, one has

\[
\frac{\partial \phi}{\partial t} + \nabla \phi \cdot \frac{\partial \mathbf{r}}{\partial t} = 0.
\]

Note that $\frac{\partial \mathbf{r}}{\partial t}$ is exactly the surface propagation velocity, denoted $\vec{v}$. Projecting the velocity onto the outer normal direction of the surface, $\frac{\nabla \phi}{|\nabla \phi|}$, one obtains the projected velocity along the normal direction, $v_N := \vec{v} \cdot \frac{\nabla \phi}{|\nabla \phi|}$. Finally, the level set equation describing the surface propagation along the outer normal direction can be written as:

\[
\frac{\partial \phi}{\partial t} + v_N |\nabla \phi| = 0, \tag{4.4.2}
\]

which is a Hamilton-Jacobi equation. The level set equation, Eq. (4.4.2), is solved by a simple upwind scheme with periodic boundary conditions. For a detailed description and the numerical implementation of the level set method, readers are referred to [194,217].

First, we consider the benchmark test example, the SES of C60 generated with the same parameters as mentioned before. The loops on the SES can be classified into two categories, the pentagon and hexagon loops; these two types of loops are of different sizes. In the following, we solve the level set equation at a grid resolution of 0.1 Å in the spatial discretization, with a velocity of 0.1 Å per unit time along the outer normal direction. To satisfy the CFL condition, a time step of 0.002 is employed in the time discretization. Figure 4.16 depicts some frames of the evolution of the C60 SES driven by the level set equation. The different frames show that the SES propagates along the outer normal direction with a constant velocity and that, after some time, both types of loops are closed. The persistence time of these two types of loops reflects the size of the loops; in fact, it should be proportional to the size of the loops since the surface grows with a constant velocity.

Figure 4.16: The level set representation of the SES for the C60 molecule. (a) The level set representation of the SES; (b), (c), and (d) frames of the evolution at times 20, 40, and 60, respectively.

After the above validation on the C60 molecular surface, we apply the level set based persistent homology theory to biomolecules to investigate the size of loops and cavities of the corresponding manifolds enclosed by the SESs. All the computations are carried out at a grid resolution of 0.3 Å in the spatial discretization and 0.01 in the temporal discretization, which guarantees the stability of the numerical integrator. The surface is propagated with a constant velocity of 0.2 Å per unit time along the outer normal direction.

First we study the persistence of the loops and cavities of protein 1clh. The previous homology theory predicts that the SES generated with probe radius 1.4 Å and the Amber force field has 5 loops and 5 cavities.

Figure 4.17 shows frames of the growing surface at times 0, 25, 50, and 100, respectively. Intuitively, during the surface propagation, the molecule should become fatter; the numerical results are consistent with this intuition.

Figure 4.17: The level set representation of the SES for protein 1clh. (a) The level set representation of the original SES; (b), (c), and (d) frames of the propagated surfaces at times 25, 50, and 100, respectively.

Figure 4.18: The persistence diagrams of the loops (left chart) and cavities (right chart) in the SES of protein 1clh, respectively.

The left and right charts of Fig. 4.18 show the persistence of the loops and cavities on the molecular surface of protein 1clh, respectively. The persistence barcode of the loops shows that there are two quite short-lived loops, i.e., two tiny loops on the surface, and one medium-sized loop, together with two long-persistent loops. The largest loop has a persistence length of 73, which corresponds to a diameter of 14.6 Å. The cavity persistence barcode demonstrates that there is no short-lived cavity in the SES manifold. All five cavities have a relatively long persistence. The largest cavity has a persistence length of around 93, which corresponds to a cavity length of 18.6 Å.

4.5 ESES: Electrostatics Analysis

To further validate the present ESES software, we consider electrostatic solvation free energy calculations using both the ESES and MSMS surfaces. Here the electrostatic solvation free energies are computed based on the PB model introduced in the previous chapter; the numerical method will be presented in the next chapter. For the sake of simplicity, we consider the pure water solvent with the solvent dielectric constant set to 80 and the solute dielectric constant set to 1.

Figure 4.19: The convergence of the electrostatic solvation free energies calculated using the MSMS surfaces to those obtained using the ESES surfaces. (a), (b) and (c) are for nucleic acid molecules with PDB IDs 1A2E, 1BNA and 1L4J, respectively. (d), (e) and (f) are for protein molecules with PDB IDs 1a93, 1aca and 1b8w, respectively. All the energies are obtained by solving the PB equation with the MIBPB software at grid size 0.5 Å.

Clearly, for a given molecule with the same force field assignment and the same PB solver, when the SESs generated by different software packages are consistent with each other, the calculated electrostatic solvation free energies should be the same. In the following, we demonstrate that as the density of the MSMS surface increases, the MSMS surface based electrostatic solvation free energies converge to those based on the ESES surface.

Figure 4.19 shows the convergence of the electrostatic solvation free energies calculated using the MSMS surfaces to those calculated using the ESES surfaces. All the calculations are carried out by the highly accurate PB solver, the MIBPB software [90,278,280,293,253], in which the PB equation is solved on a Cartesian mesh. The MIBPB solver has been shown to be highly accurate and robust in calculating electrostatic solvation free energies [253]. We will discuss this PB solver in the next chapter. The MSMS surface is generated with densities varying from 10 to 100. Since the MSMS surface is given in the Lagrangian representation, i.e., a triangulation of the molecular surface, a Lagrangian to Cartesian transformation is employed to embed the MSMS surface in the Cartesian mesh [292].

4.6 Conclusion

The solvent excluded surface (SES) is the most popular surface definition in computational biophysics and molecular biology for biomolecular modeling and simulation. Existing SES software packages, such as MSMS [212], typically provide SESs in the Lagrangian representation. For applications in implicit solvent models, one needs to convert the triangulated SES into the Eulerian representation, i.e., the Cartesian domain. Additionally, the quality of the MSMS surface depends on the density selected, and the method might not work well at all required densities for certain molecules. Therefore, it is desirable to generate analytical SESs directly on the Cartesian mesh. This work offers a software package, called Eulerian solvent excluded surface (ESES), for the construction of SESs on the Cartesian mesh.

We generate analytical SESs based on Connolly's algorithm [51], which divides an SES into three types of patches: convex patches, saddle patches and concave patches. The mathematical representation, computing algorithm, and data structures for each individual patch are formulated. We immerse the analytical SES into the Euclidean space $\mathbb{R}^3$ and describe the surface or interface by its intersecting coordinates with the Cartesian mesh lines and the associated normal directions at all the intersecting points.

The proposed ESES
softwareisvalidatedbyalargenumberofbenchmarktests,includingmorphologicalvisualization,solvationanalysis,surfaceareaandenclosedvolumecalculation,andtopologicalfeatureanalysisandcharacterization.WeutilizetheAmberPBSAtestsetinourvalidation.TheMSMSsoftwareisemployedforcomparison.Inthemorphologicalvisualization,itisshownthatESESsuccessfullygeneratecorrectmorphologywhileMSMSdoesnotalwaysworkfortheAmbertestsetatalldensities.ESESalsoprovidessecondorderaccurateestimatesforbiomolecularsurfaceareaandenclosedvolume.ItisfoundthatelectrostaticsolvationfreeenergiescomputedusingtheESESareincloseconsistencewiththosecalculatedbasedonMSMS.Aspecialfeatureofthepresentsoftwareisthatitprovidesatomicsurfaceareascalculation,whichcanbeusedforatomicmodelingofnonpolarsolvationfreeenergies.Finally,weintroducehomologytheorytoaccuratelydetecttopologicalfeatures,namely,loopsandcavitieson/inSESs.Anovellevelsetbasedisproposedtomeasurethesizesofloopsandcavities.ThepresentESESsoftwarewillbeimprovedonafewaspects.First,abettermethodforrenderingthesurfaceneedtobeimplementedforthesurfacevisualization.Second,comparedtotheMSMSsoftware,ESESisanalyticalbutslower.Therefore,accelerationviamulti-threadcomputationaltechniqueswillbefurtherinvestigatedtomaketheESESsoftwaremoret.Third,arobusttriangulationtotheESESisdurableforthefurtherwork.93Chapter5CoarseGridPoissonBoltzmannSolver5.1IntroductionPoissonBoltzmann(PB)modelisamultiscalemodelwhichmodelstheelectrostaticsofthesolvatedmoleculesespeciallyforthebiologicalsystem,throughsophisticatedphysicalmodelingofthefocusingpart,e.g.,solvatedbiomoleculesmodeledwithatomisticdetail,theionsinthesolventismodeledasadensitydistribution,andthesolventismodeledasadielectriccontinuum.ThePBmodelisoneofthemostimportantimplicitsolventmodels,especiallyforthestudyofthebiologicalsystems.TherearemainlytwochallengesinnumericalsolutiontothePBmodel:Oneisthedescriptionofthesolvatedmolecularconformationstructure;theotheroneisdevelopmentoftheaccurateandconvergentnumericalsolutiontothePBequation(
PBE). The first issue is addressed in the previous chapter; the second issue will be discussed in this chapter. With proper force field parametrization of the solute molecule, the mathematical challenges in the numerical approximation of the PBE can be summarized as:

(i) efficient construction of the molecular surface of the solvated solute molecule;
(ii) treatment of the singular charges of the solute molecule, which are represented by the singular $\delta$-function;
(iii) treatment of the complex interface geometry in the elliptic interface problem of the PBE;
(iv) accurate evaluation of the electrostatic solvation free energy after solving the PBE.

There is extensive PB software available, and the numerical methods for solving the PB model can be classified into three categories: the finite difference method (FDM), the finite element method (FEM), and the boundary element method (BEM). Here we give a short review of the currently popular PB software and discuss the critical advantages of each package.

AFMPB [162]: The Adaptive Fast Multipole Poisson-Boltzmann (AFMPB) solver is a numerical simulation package for solving the linearized Poisson-Boltzmann (LPB) equation, which models electrostatic interactions in biomolecular systems. In this package, a boundary integral equation approach is applied to discretize the LPB equation.

APBS [7]: The Adaptive Poisson-Boltzmann Solver (APBS), together with PDB2PQR, are software packages designed to help users analyze the solvation properties of small molecules and macromolecules such as proteins, nucleic acids, and other complex systems. A multigrid algorithm is employed for the solution of the discretized PB equation. It is one of the most popular and widely used PB software packages.

Delphi [209]: The Delphi software is a very accurate and efficient PB software, in which the solvated molecular conformation is described by a smoothed numerical surface. The induced surface charge method is employed for accurate and efficient PB calculation. It has been tested that the effect of the grid size on the PB calculation is very small.

Amber PBSA [259]: Amber PBSA is the PB software in the Amber software suite, which solves both the linear and nonlinear forms of the PBE. Various algorithms are implemented to solve the linear PBE, such as conjugate gradient, modified incomplete Cholesky conjugate gradient (ICCG), geometric multigrid, and successive over-relaxation (SOR) methods; and to solve the nonlinear
PBE, such as the inexact Newton method in conjunction with modified ICCG or geometric multigrid, conjugate gradient, and SOR.

PBEQ-Charmm [127]: The PBEQ module within the CHARMM MD distribution allows the setting up and the numerical solution of the PBE on a discretized grid for a solute molecule.

MIBPB [39]: The Matched Interface and Boundary based Poisson-Boltzmann (MIBPB) solver is a software package for evaluating electrostatic properties of biomolecules via the solution of the PBE, an established two-scale model in biomolecular simulations. It distinguishes itself from other PBE solvers by rigorously enforcing the interface flux continuity condition. This chapter focuses on improving the accuracy and robustness of the previous MIBPB software.

ZAP [101]: The ZAP software produces PB electrostatic potentials and, from them, biologically interesting properties including solvent transfer energies, binding energies, pKa shifts, solvent forces, electrostatic descriptors, surface potentials, and effective dielectric constants. ZAP TK works well for small molecules, proteins, and macromolecular ensembles. Unique to ZAP TK is a dielectric function based on atom-centered Gaussians, which avoids the pitfalls of discrete dielectric constants [276].

This chapter is organized as follows. In Section 5.2 we formulate the PB model as an elliptic interface problem and present the interface and boundary conditions. In Section 5.3 we present the numerical methods for the four mathematical issues listed above: both the MSMS and ESES surfaces [157, 212] are utilized for characterizing the solute molecular conformation structure; the Green's function technique [49, 90] is adopted for removing the singular source in the PBE; the matched interface and boundary (MIB) method [278, 90] is used for handling the complex interface geometry; and the numerical method for evaluating the reaction energy is also discussed. The numerical results for the electrostatic solvation free energy and binding free energy calculations are presented in Sections 5.4 and 5.5, respectively. This chapter ends with a conclusion.

5.2 Poisson Boltzmann model

In this section, we review the PB model from the variational principle point of view. Consider an open domain $\Omega \subset \mathbb{R}^3$, $\Omega = \Omega_m \cup \Gamma \cup \Omega_s$, where $\Omega_m$ is the solute domain tha
t is enclosed by the surface formed by the biomolecule, and $\Omega_s$ is the solvent domain. $\Gamma$ is the molecular surface, for instance the SES, that separates the solute and solvent domains.

First, the charge distributions in the solute and solvent domains are modeled at the following different scales. Consider the atomistic modeling of the solute domain, in which the charge distribution $\rho_m(\mathbf{r})$ is given by

\[ \rho_m := \rho_m(\mathbf{r}) = \sum_{i=1}^{N_m} Q_i\,\delta(\mathbf{r}-\mathbf{r}_i), \]

where $Q_i$ is the partial charge of the $i$-th atom, and $N_m$ is the number of charged atoms. The solvent domain is modeled as a dielectric continuum medium, in which the charge distribution is modeled by the Boltzmann distribution, formulated as

\[ \rho_s := \rho_s(\mathbf{r}) = \sum_{j=1}^{N} q_j c_j\, e^{-q_j\phi/k_B T}, \]

where $N$ is the number of ionic species, the $c_j$ are the bulk concentrations of the ionic species, the $q_j$ are the charges of the ionic species, $k_B$ is the Boltzmann constant, and $T$ is the absolute temperature. Furthermore, the permittivity $\epsilon(\mathbf{r})$ is assumed to be constant in each of the solute and solvent domains:

\[ \epsilon(\mathbf{r}) = \begin{cases} \epsilon_s, & \text{if } \mathbf{r}\in\Omega_s, \\ \epsilon_m, & \text{if } \mathbf{r}\in\Omega_m. \end{cases} \]

In the PB model, $\epsilon_s$ and $\epsilon_m$ are usually set to 80 and 1, respectively. These constants are adopted for all the numerical results in this chapter. The electrostatic solvation free energy of the whole solvation system can be expressed as

\[ G_{\mathrm{elec}} = \int_{\Omega} \left( \chi_m \rho_m \phi - \chi_m \frac{\epsilon_m}{2}|\nabla\phi|^2 - \chi_s \frac{\epsilon_s}{2}|\nabla\phi|^2 - \chi_s k_B T \sum_{j=1}^{N} \left[ c_j \left( e^{-q_j\phi/k_B T} - 1 \right) \right] \right) d\mathbf{r}, \tag{5.2.1} \]

where we have introduced the characteristic functions $\chi_m$ and $\chi_s$ of the solute and solvent domains, respectively. Taking the variational derivative of $G_{\mathrm{elec}}$ with respect to $\phi$, i.e., setting $\delta G_{\mathrm{elec}}/\delta\phi = 0$, yields the following PB equation:

\[ -\nabla\cdot\left(\epsilon(\mathbf{r})\nabla\phi(\mathbf{r})\right) = \rho_m + \rho_s. \tag{5.2.2} \]

Eq. (5.2.2), derived from the variational principle, is consistent with the one introduced in Chapter 2.

In the remainder of this chapter, for the sake of simplicity, we only consider the case in which the ionic strength of the solvent is 0, i.e., pure water solvent. The ionic solvent case can be treated in the same manner. The PBE is well posed when subject to the following interface and boundary conditions:

- The far boundary condition: $\phi(\infty) = 0$. In practical computation, the following Debye-Hückel boundary condition is enforced:

\[ \phi(\mathbf{r}) = \sum_{i=1}^{N_m} \frac{Q_i}{4\pi\epsilon_s|\mathbf{r}-\mathbf{r}_i|}. \tag{5.2.3} \]

- Across the interface $\Gamma$, the continuity of the electrostatic potential and of the flux is enforced:

  - Continuity of electrostatic potential:

\[ [\phi]_\Gamma = \phi_s(\mathbf{r}) - \phi_m(\mathbf{r}) = 0. \tag{5.2.4} \]

  - Continuity of electrostatic flux:

\[ [\epsilon(\mathbf{r})\phi_n]_\Gamma = \epsilon_s\nabla\phi_s(\mathbf{r})\cdot\mathbf{n} - \epsilon_m\nabla\phi_m(\mathbf{r})\cdot\mathbf{n} = 0, \tag{5.2.5} \]

where $\mathbf{n} = (n_x, n_y, n_z)$ is the outer normal direction of the interface $\Gamma$, pointing from the solute domain to the solvent domain.

5.3 Numerical Method

In this section, we present the numerical method for solving the PBE. The method is highly accurate and makes a coarse grid PB solver possible without loss of accuracy. The numerical method mainly contains three parts:

- The construction of the solvent solute boundary; we choose the SES in this context.
- Treatment of the singular charges, represented by the $\delta$-function describing the solute charge distribution, which are handled by the Green's function technique.
- Treatment of the complex solvent solute boundary geometry; we adopt the matched interface and boundary (MIB) method for treating the versatile geometry.

Furthermore, we present the new electrostatic solvation free energy calculation scheme.

5.3.1 Solvent Solute Boundary

In the PB model, the solvent and solute domains are separated by the molecular surface, and each part is modeled as a dielectric medium with a given dielectric constant. The SES is the most widely used surface definition; its detailed description has already been presented in the previous chapter. In our coarse grid PB solver, we implement both ESES and MSMS: ESES is suited for extremely accurate calculation, while MSMS is intended for fast calculation with a slight reduction in accuracy.

5.3.1.1 MSMS Surface

The MSMS surface is built on the reduced surface of the original SES, and the MSMS software can handle the geometric singularities that arise from self-intersecting surfaces. For details about the MSMS surface, readers are referred to the work of Sanner et al. [212].

In our interface method based PB solver, we have to transform the Lagrangian represented surface, i.e., the triangulation of the SES, into the Eulerian representation. This transformation embeds the triangulated surface into a bounding box in the three dimensional Euclidean space $\mathbb{R}^3$. The embedding contains the following three major steps:

- Classify the inside and outside grid points.
- Find the intersections of the triangulated reduced SES with the Cartesian mesh lines.
- Find the outer normal direction at the intersection points.

The classification of the grid points can be done via a ray tracing technique, which is based on the discre
te Jordan curve lemma. The intersections and outer normal vectors can be computed with basic Euclidean geometry. The detailed steps are presented in Ref. [292]. Compared to the embedding of the analytical SES into the Euclidean space, this embedding is much simpler: since the MSMS surface is represented by simplicial complexes, all the equations associated with the transformation are linear.

5.3.1.2 ESES surface

The ESES surface, as presented in the previous chapter, is a direct Eulerian represented SES designed for finite difference type PB solvers. During the ESES generation and further processing, no numerical error is introduced in the grid point classification, the intersection detection, or the outer normal vector calculation. All three of these ingredients are computed without numerical error, which provides the foundation for the highly accurate PB solver.

5.3.2 Solving the Poisson Boltzmann equation

The numerical method for solving the PB model contains two parts: treatment of the singular charges and treatment of the complex geometry of the interface. The classical approach for handling the singular charges is the direct projection of these charges onto the Cartesian grid points. The deficiency of this approach is that, when a coarse grid is applied, a solute charge may be projected onto grid points in the solvent domain. This unreasonable projection may lead to unacceptable error when a large difference of the dielectric constants exists between the solvent and solute domains. Another treatment of the singular charges is based on the Green's function of the singular charges; in terms of accuracy, the Green's function formalism is superior to the classical direct projection method [90].

To handle the complex geometry of the interface in the PB model, or in a more general class of elliptic interface problems, we need to note the fact stated in Lemma 5.3.1, which helps in developing numerical schemes for elliptic interface problems.

Lemma 5.3.1. Suppose the interface $\Gamma$ is locally Lipschitz, and a given function $\phi$ defined in both $\Omega_m$ and $\Omega_s$ is continuous with well defined first order partial derivatives. Then, at any point of the interface, the derivatives of the function are continuous along the tangential directions, provided the function is continuous across the interface.

Proof. (Sketch of the proof.) Let us denote the function $\phi$ by $\phi_m$ and $\phi_s$ in $\Omega_m$ and $\Omega_s$, respectively, and let $f := \phi_m - \phi_s$; obviously $f|_\Gamma = 0$. For any $(x_0, y_0, z_0) \in \Gamma$ there is a tangential plane to the interface $\Gamma$ at $(x_0, y_0, z_0)$, and for any direction $\tau$ in this plane we have

\[ \frac{\partial\phi_m}{\partial\tau} - \frac{\partial\phi_s}{\partial\tau} = \frac{\partial f}{\partial\tau} = \nabla f\cdot\tau. \]

Further, note that $f|_\Gamma = 0$, which means $\Gamma \subset \{(x,y,z) \mid f(x,y,z) = 0\}$, i.e., the interface can be represented by the implicit function $f(x,y,z) = 0$. Hence $\nabla f$ is parallel to $\mathbf{n}$, the direction normal to the tangential plane at $(x_0, y_0, z_0)$, say $\nabla f = c\,\mathbf{n}$ for some scalar $c$. Therefore

\[ \frac{\partial\phi_m}{\partial\tau} - \frac{\partial\phi_s}{\partial\tau} = \nabla f\cdot\tau = c\,\mathbf{n}\cdot\tau = 0. \]

Since $\tau$ is arbitrary, the conclusion follows.

Based on the above lemma, we can introduce two more interface conditions along the tangential directions; thus a group of four interface conditions is available for designing numerical schemes for solving the PBE. This has been tested to be enough for designing a second order convergent numerical scheme for discretizing the PBE. One state-of-the-art accurate scheme, the MIB method, utilized the above fact without proof [278, 295]. Many other prestigious schemes also exist for solving the elliptic interface problem. The global second order convergence of elliptic interface numerical schemes is proved in Ref. [17], which provides a general framework for proving the convergence of general finite difference based elliptic interface schemes; Mayo's method [170, 168] and the immersed interface method (IIM) [115, 149] are utilized for the illustrations.

5.3.3 Electrostatic Solvation Free Energy Calculation

One major application of the PB model is evaluating the electrostatic solvation free energy, which also provides a good criterion for measuring the accuracy of the PB solver. In this section, we provide the numerical method for calculating electrostatic solvation free energies.

5.3.4 Reaction potential representation of the solvation free energy

In the conventional implicit solvent theory, the reaction potential is defined as the difference between the electrostatic potential in solvent and in vacuum, that is,

\[ \phi_{\mathrm{rec}}(\mathbf{r}) = \phi_{\mathrm{dielec}}(\mathbf{r}) - \phi_{\mathrm{vac}}(\mathbf{r}), \tag{5.3.1} \]

where $\phi_{\mathrm{rec}}$, $\phi_{\mathrm{dielec}}$ and $\phi_{\mathrm{vac}}$ are the reaction potential, the electrostatic potential in solvent, and the electrostatic potential in vacuum, respectively. The vacuum electrostatic potential is calculated by setting the solvent dielectric constant to the same value as that in the solute domain. The electrostatic solvation free energy, or reaction energy, is defined by

\[ \Delta G_{\mathrm{RF}} = \frac{1}{2}\sum_{i=1}^{N_m} Q(\mathbf{r}_i)\,\phi_{\mathrm{rec}}(\mathbf{r}_i), \tag{5.3.2} \]

where $Q(\mathbf{r}_i)$ is the charge at the position $\mathbf{r}_i$.

5.3.5 Approximate the atomic center rea
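ction potential

According to the equation for evaluating the electrostatic solvation free energy, Eq. (5.3.2), the electrostatic potentials at the atomic centers are needed. However, our numerical scheme only provides the electrostatic potential at the grid points of the Cartesian mesh, so we need to approximate the electrostatic potential at the atomic centers from that at the grid points. In this work, the trilinear interpolation scheme (three-point Lagrange interpolation applied along each coordinate direction) is utilized for interpolating the electrostatic potential at the atomic centers.

For any $\mathbf{r}_i := (x_0, y_0, z_0)$, suppose the closest grid point to $\mathbf{r}_i$ inside the solute domain is indexed by $(i, j, k)$. The following 27 grid points are utilized for the interpolation:

\[ \{(i+m,\, j+n,\, k+p) \mid m = -1, 0, 1;\ n = -1, 0, 1;\ p = -1, 0, 1\}. \tag{5.3.3} \]

The trilinear interpolation scheme is given in the following steps:

- Interpolate the values at the following 9 points, $\{(x_0,\, j+n,\, k+p) \mid n = -1, 0, 1;\ p = -1, 0, 1\}$, from the above 27 grid points. Due to the utilization of the uniform Cartesian mesh, we have

\[ \phi_{\mathrm{rec}}\big|_{(x_0,\,j+n,\,k+p)} = w_{x,-1}\,\phi_{\mathrm{rec}}\big|_{(i-1,\,j+n,\,k+p)} + w_{x,0}\,\phi_{\mathrm{rec}}\big|_{(i,\,j+n,\,k+p)} + w_{x,1}\,\phi_{\mathrm{rec}}\big|_{(i+1,\,j+n,\,k+p)}, \]

for $n = -1, 0, 1$ and $p = -1, 0, 1$, where $w_{x,-1}$, $w_{x,0}$ and $w_{x,1}$ are the Lagrange interpolation coefficients, which are easily obtained via Fornberg's method [77].

- Interpolate the values at the following 3 points, $\{(x_0,\, y_0,\, k+p) \mid p = -1, 0, 1\}$, from the nine points $\{(x_0,\, j+n,\, k+p) \mid n = -1, 0, 1;\ p = -1, 0, 1\}$. Similar to the above scheme, we have

\[ \phi_{\mathrm{rec}}\big|_{(x_0,\,y_0,\,k+p)} = w_{y,-1}\,\phi_{\mathrm{rec}}\big|_{(x_0,\,j-1,\,k+p)} + w_{y,0}\,\phi_{\mathrm{rec}}\big|_{(x_0,\,j,\,k+p)} + w_{y,1}\,\phi_{\mathrm{rec}}\big|_{(x_0,\,j+1,\,k+p)}, \]

for $p = -1, 0, 1$, where $w_{y,-1}$, $w_{y,0}$ and $w_{y,1}$ are the interpolation coefficients.

- Interpolate the value at the atom center $(x_0, y_0, z_0)$ from the 3 points $\{(x_0,\, y_0,\, k+p) \mid p = -1, 0, 1\}$, which is given by

\[ \phi_{\mathrm{rec}}\big|_{(x_0,\,y_0,\,z_0)} = w_{z,-1}\,\phi_{\mathrm{rec}}\big|_{(x_0,\,y_0,\,k-1)} + w_{z,0}\,\phi_{\mathrm{rec}}\big|_{(x_0,\,y_0,\,k)} + w_{z,1}\,\phi_{\mathrm{rec}}\big|_{(x_0,\,y_0,\,k+1)}, \]

where $w_{z,-1}$, $w_{z,0}$ and $w_{z,1}$ are the interpolation coefficients.

In sum, the approximation of the reaction potential at the atomic center $\mathbf{r}_i$ is given by

\[ \phi_{\mathrm{rec}}\big|_{(x_0,\,y_0,\,z_0)} = \sum_{m=-1}^{1}\sum_{n=-1}^{1}\sum_{p=-1}^{1} w_{x,m}\,w_{y,n}\,w_{z,p}\,\phi_{\mathrm{rec}}\big|_{(i+m,\,j+n,\,k+p)}. \tag{5.3.4} \]

5.3.5.1 The extension of the reaction potential

In Eq. (5.3.4), for atomic centers that are close to the boundary when a coarse grid is employed, some of these grid points may lie in the solvent domain. If we directly use the reaction potential computed by the PB solver at such points, the accuracy in evaluating the solvation free energy will be reduced. Therefore, we need to extend the reaction potential in the solute domain to the outside grid points that are referenced in interpolating the reaction potential at the atomic centers.

For a given grid point $(i_1, j_1, k_1)$ belonging to the above 27 grid points, the schemes used for extending the reaction potential at $(i_1, j_1, k_1)$, $\phi_{\mathrm{rec}}(i_1, j_1, k_1)$, are listed below in order of priority; the scheme employed should have as high a priority as possible.

- Top priority: Use the sum of the fictitious value and the extended solution of the boundary value problem (specifically for the Green's function treatment of the singular charge case) at the grid point $(i_1, j_1, k_1)$ as the extended reaction potential at $(i_1, j_1, k_1)$.
- Middle priority: Choose three consecutive grid points next to $(i_1, j_1, k_1)$ along a given direction in the solute domain to extrapolate the reaction potential at $(i_1, j_1, k_1)$.
- Low priority: Select two inside grid points neighboring $(i_1, j_1, k_1)$, say $(i_2, j_2, k_2)$ and $(i_3, j_3, k_3)$, and approximate the reaction potential at $(i_1, j_1, k_1)$ by

\[ \phi_{\mathrm{rec}}(i_1, j_1, k_1) = \phi_{\mathrm{rec}}(i_2, j_2, k_2) + \phi_{\mathrm{rec}}(i_3, j_3, k_3) - \phi_{\mathrm{rec}}(i, j, k). \]

5.3.6 Electrostatic Binding Free Energy Calculation

Another important application of the PB model is the calculation of the electrostatic binding free energy, which is a crucial part of the binding energy between different molecules and is of great importance for computer aided drug discovery.

Figure 5.1: Left chart: binding of DNA-drug complex (PDB ID: 121D). Right chart: binding of barnase-barstar complex (PDB ID: 1b3s).

Consider the binding of two molecules A and B, where the complex formed after binding is denoted by C. By a simple thermodynamic cycle analysis, the electrostatic binding free energy of this process is given by

\[ \Delta\Delta G_{\mathrm{el}} = (\Delta G_{\mathrm{RF}})_C - (\Delta G_{\mathrm{RF}})_A - (\Delta G_{\mathrm{RF}})_B + (\Delta\Delta G_{\mathrm{el}})_{\mathrm{coulomb}}, \tag{5.3.5} \]

where $(\Delta G_{\mathrm{RF}})_C$ is the electrostatic solvation free energy of the bound complex, $(\Delta G_{\mathrm{RF}})_A$ and $(\Delta G_{\mathrm{RF}})_B$ are the electrostatic solvation free energies of the unbound components, and $(\Delta\Delta G_{\mathrm{el}})_{\mathrm{coulomb}}$ is the electrostatic binding free energy of the two components in vacuum, which is computed by the formula

\[ (\Delta\Delta G_{\mathrm{el}})_{\mathrm{coulomb}} = \sum_{i,j} \frac{q_i q_j}{\epsilon_m r_{ij}}, \tag{5.3.6} \]

where the summation is taken over all pairs of atoms that have one member in each component of the complex, $q_i$ and $q_j$ are the corresponding charges of the given interacting pair of atoms, $r_{ij}$ is the distance between the pair, and $\epsilon_m$ i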
s the dielectric constant of the solute domain as before. Figure 5.1 illustrates two binding examples: the left chart is the binding of the DNA-drug complex, and the right chart is the binding of the barnase-barstar complex.

5.4 Numerical Results - Solvation Free Energy

In this section, a large number of numerical results verifying the grid spacing independence of the current PB solver are presented; the consistency with different PB solvers is also addressed. The robustness of the PB solver is directly verified by the large number of test examples. Our previous results show that, with increasing surface triangulation density [157], the solvation free energy on the MSMS surface converges to that on our analytical ESES; moreover, ESES is much more robust than the MSMS surface according to the test on the Amber test set.

5.4.1 Analytical Tests

In this subsection, the PB solver is tested on benchmark problems with exact solutions, namely, the dielectric sphere with a single centered charge and the dielectric sphere with multiple charges. The analytical solution to the former is given by the Born theory, while Kirkwood theory [136] gives the analytical solution to the latter.

5.4.1.1 Single center-distributed dielectric sphere

First, consider dielectric spheres of different radii with a single unit charge placed at the center. Born theory states that, for a dielectric sphere with radius $R$ and center charge $q$ placed in a solvent with dielectric constant $\epsilon$, with dielectric constant 1 for the sphere itself, the electrostatic solvation free energy is

\[ \Delta G_{\mathrm{RF}} = \frac{q^2}{2R}\left(\frac{1}{\epsilon} - 1\right). \tag{5.4.1} \]

Table 5.1 shows the electrostatic solvation free energy for the dielectric sphere with a central unit positive charge, calculated at grid sizes from 0.1 to 1.1 Å, together with the exact value. Different radii of the dielectric sphere are tested, where the radii are the typical atomic radii in the Amber force field.

Table 5.1: Electrostatic solvation free energies (kcal/mol) of dielectric spheres of different radii with a unit positive charge located at the center.

              Grid size (Å)
Atom radius   1.1      1.0      0.9      0.8      0.7      0.6      0.5      0.4      0.3      0.2      0.1      Exact
1.1           -143.63  -116.22  -148.63  -145.9   -146.03  -148.84  -149.00  -148.98  -149.01  -149.04  -149.05  -149.05
1.3           -125.75  -118.24  -126.03  -123.07  -125.75  -125.99  -126.03  -126.07  -126.10  -126.11  -126.12  -126.12
1.359         -120.36  -117.25  -120.61  -119.68  -120.21  -120.54  -120.59  -120.61  -120.63  -120.64  -120.65  -120.65
1.4           -116.88  -103.73  -117.10  -116.89  -116.97  -117.03  -117.07  -117.08  -117.10  -117.11  -117.11  -117.11
1.5           -109.17  -103.43  -109.18  -109.17  -109.22  -109.25  -109.28  -109.29  -109.29  -109.30  -109.31  -109.13
1.55          -105.68  -101.74  -105.61  -105.68  -105.70  -105.72  -105.74  -105.76  -105.77  -105.78  -105.78  -105.78
1.7           -94.44   -95.80   -96.34   -96.40   -96.39   -96.40   -96.42   -96.43   -96.44   -96.44   -96.45   -96.45
1.8           -90.96   -90.96   -91.02   -91.01   -91.04   -91.05   -91.07   -91.07   -91.08   -91.09   -91.09   -91.09
1.85          -88.52   -88.52   -88.57   -88.57   -88.58   -88.59   -88.61   -88.62   -88.62   -88.62   -88.63   -88.63
2.0           -81.86   -81.95   -81.90   -81.92   -81.95   -81.96   -81.98   -81.97   -81.98   -81.98   -81.98   -81.98

According to the above results, the electrostatic solvation free energies calculated by the current PB solver converge very fast to the exact value. Furthermore, it should be pointed out that the proposed method provides very reliable results even at very low grid resolution. For instance, for the dielectric sphere with radius 1.1 Å, the calculated reaction energy is -143.63 kcal/mol at grid spacing 1.1 Å (that is, with only one grid point inside the dielectric sphere), while the exact value is -149.05 kcal/mol.

5.4.1.2 Multiple charge dielectric sphere

To further test the accuracy of the PB solver, especially the Green's function treatment of the singular charges, in this part we test dielectric spheres with multiple charges distributed inside. The analytical solution to these cases is due to Kirkwood's work [136]. We consider the following five different distributions of point charges; the sphere radii are all set to 2 Å.

Case 1. Two positive unit charges symmetrically placed at (1, 0, 0) and (-1, 0, 0).

Case 2. Two positive unit charges symmetrically placed at (1, 0, 0) and (-1, 0, 0), and two negative unit charges symmetrically placed at (0, 1, 0) and (0, -1, 0).

Case 3. Two positive unit charges symmetrically placed at (1.2, 0, 0) and (-1.2, 0, 0), and two negative unit charges symmetrically placed at (0, 1.2, 0) and (0, -1.2, 0).

Case 4. Six positive unit charges placed at (0.4, 0.0, 0.0), (0.0, 0.8, 0.0), (0.0, 0.0, 1.2), (0.0, 0.0, -0.4), (-0.8, 0.0, 0.0) and (0.0, -1.2, 0.0).

Case 5. Six positive unit charges placed at (0.2, 0.2, 0.2), (0.5, 0.5, 0.5), (0.8, 0.8, 0.8), (0.2, 0.2, 0.2), (0.5, 0.5, 0.5) and (0.8, 0.8, 0.8).

The first three cases were employed by Li to compare Delphi and Amber PB performance [3]. The last two cases were used in our earlier work in 2007 [90], which also contains many test examples similar to the first three cases. These benchmark tests are more difficult than the Born model. We highly recommend that PB methodology developers consider these test cases before making any bold statement about the PB model and/or their methods.

Table 5.2 lists the electrostatic solvation free energies computed by the MIBPB method for the above multiple charge tests on a set of mesh sizes ranging from 1.1 to 0.1 Å. For Case 1, all MIBPB relative errors are less than 1%. For Case 2, the MIBPB relative error is smaller than 5% on all grid sizes. Case 3 is relatively difficult because the charges are located closer to the interface. For the last two cases, errors are bounded by 1.5% on all grid sizes.

Table 5.2: Electrostatic solvation free energies (kcal/mol) of the Kirkwood dielectric sphere with multiple charges calculated by MIBPB, where RE is the relative error compared to the analytical electrostatic solvation free energy.

        Case 1            Case 2            Case 3            Case 4             Case 5
Grid    ΔG_RF    RE       ΔG_RF   RE        ΔG_RF    RE       ΔG_RF     RE       ΔG_RF     RE
1.1     -351.12  0.39%    -63.61  1.27%     -135.41  0.00%    -2974.59  0.49%    -3079.70  1.43%
1.0     -352.54  0.80%    -65.66  4.53%     -152.32  12.45%   -2952.83  1.22%    -3138.85  0.47%
0.9     -348.84  0.25%    -60.70  3.36%     -131.88  2.59%    -2993.72  0.15%    -3099.03  1.80%
0.8     -351.20  0.42%    -64.13  2.10%     -137.33  1.42%    -2996.27  0.23%    -3110.38  0.44%
0.7     -350.56  0.24%    -63.24  0.68%     -135.17  0.16%    -2995.02  0.19%    -3103.84  0.65%
0.6     -350.68  0.27%    -63.77  1.52%     -137.34  1.43%    -2994.73  0.18%    -3118.62  0.18%
0.5     -350.39  0.19%    -63.50  1.09%     -136.64  0.92%    -2986.62  0.09%    -3124.43  0.00%
0.4     -350.02  0.08%    -63.10  0.46%     -136.27  0.64%    -2991.70  0.08%    -3119.22  0.15%
0.3     -349.81  0.02%    -61.95  1.36%     -136.20  0.59%    -2991.01  0.06%    -3120.24  0.13%
0.2     -349.72  0.00%    -62.90  0.14%     -135.79  0.29%    -2989.89  0.02%    -3122.89  0.04%
0.1     -349.64  0.02%    -62.81  0.00%     -135.40  0.00%    -2989.54  0.00%    -3123.90  0.01%
Exact   -349.73           -62.81            -135.40           -2989.30           -3124.30

5.4.2 Robustness on different SESs

In this part, we employ 25 biomolecules, the data set used in Li's work [3], as the test set for comparing the performance of the Delphi [209] and Amber PBSA [259] PB software. The data was originally downloaded from the Protein Data Bank. All the HETATM records in the PDB files are removed using the MMTSB toolset, and the AMBER99SB force field was empl
oyed for the PDB parametrization. The structures were further optimized by the AMBER software with the same protocol used in Ref. [3]. The PDB IDs for this set of proteins are: 1ajj, 1ptq, 1vjw, 1bor, 1fxd, 1sh1, 1hpt, 1fca, 1bpi, 1r69, 1bbl, 1vii, 2erl, 451c, 2pde, 1cbn, 1frd, 1uxc, 1mbg, 1neq, 1a2s, 1svr, 1o7b, 1a63, and 1a7m.

Table 5.3 shows the electrostatic solvation free energies (kcal/mol) calculated using the MIBPB method with SESs generated by the MSMS software at density 10. Similarly, Table 5.4 shows the electrostatic solvation free energies (kcal/mol) calculated by the MIBPB software with SESs generated by the ESES software. First, there are some minor discrepancies (less than 1%) between the electrostatic solvation free energies computed with the two SESs. MSMS accuracy depends on its triangular mesh density, and its results converge to those of ESES at a very high triangular mesh density, say 100 triangles per Å^2. Additionally, for a given surface, MIBPB is able to deliver very consistent results. The results listed in Tables 5.3 and 5.4 confirm the grid-size independence property of the MIBPB software; also, the numerical solution to the PB equation does not heavily depend on the solvent excluded surface generation.

Table 5.3: Electrostatic solvation free energies (kcal/mol) calculated via the MIBPB software with MSMS surface at different grid sizes.

       Grid size (Å)
ID     1.1       1.0       0.9       0.8       0.7       0.6       0.5       0.4       0.3       0.2
1ajj   -986.26   -984.17   -986.38   -986.89   -985.82   -986.68   -986.64   -986.85   -987.09   -987.15
1ptq   -723.95   -722.99   -724.50   -725.34   -724.54   -725.51   -725.18   -725.49   -725.64   -725.68
1vjw   -1120.64  -1117.29  -1119.03  -1119.37  -1120.61  -1121.60  -1121.98  -1121.52  -1121.95  -1122.12
1bor   -773.51   -774.38   -777.01   -776.91   -778.85   -778.04   -778.33   -778.65   -778.66   -778.7
1fxd   -2424.97  -2433.53  -2432.13  -2433.24  -2434.06  -2435.02  -2434.64  -2435.11  -2435.14  -2435.38
1sh1   -568.12   -570.69   -570.74   -571.80   -573.34   -573.98   -573.48   -573.88   -574.05   -574.32
1hpt   -662.19   -666.16   -664.79   -666.10   -667.18   -666.89   -667.57   -668.17   -668.25   -668.45
1fca   -1175.78  -1181.58  -1181.77  -1180.31  -1181.93  -1181.71  -1181.72  -1182.04  -1182.10  -1182.11
1bpi   -1150.37  -1148.62  -1149.68  -1150.61  -1151.08  -1151.53  -1152.48  -1152.38  -1152.63  -1152.82
1r69   -945.34   -938.93   -940.63   -940.05   -942.28   -942.08   -941.88   -941.98   -942.15   -942.23
1bbl   -846.48   -846.80   -847.00   -846.90   -846.64   -847.33   -847.45   -847.08   -847.14   -847.10
1vii   -683.20   -677.26   -679.27   -680.61   -679.94   -680.31   -681.19   -681.28   -681.40   -681.55
2erl   -919.91   -915.41   -918.49   -918.92   -919.00   -919.68   -919.84   -920.21   -920.33   -920.57
451c   -848.53   -843.01   -849.20   -847.92   -847.92   -847.63   -847.85   -848.21   -848.03   -848.07
2pde   -798.26   -799.19   -798.87   -799.86   -800.03   -798.81   -799.39   -799.44   -799.22   -799.26
1cbn   -298.67   -301.97   -303.85   -303.68   -304.11   -304.46   -304.41   -304.79   -304.81   -304.92
1frd   -2561.35  -2556.63  -2557.71  -2558.70  -2557.06  -2559.33  -2560.09  -2560.74  -2561.27  -2561.72
1uxc   -917.39   -914.88   -918.19   -919.73   -920.18   -921.01   -921.32   -921.66   -921.75   -922.02
1mbg   -1286.82  -1281.16  -1283.11  -1285.37  -1285.74  -1285.61  -1285.67  -1286.08  -1286.23  -1286.45
1neq   -1474.17  -1471.80  -1470.82  -1472.88  -1473.94  -1474.13  -1474.74  -1475.49  -1475.62  -1475.63
1a2s   -1846.33  -1842.61  -1849.40  -1847.77  -1848.00  -1847.38  -1848.25  -1848.48  -1848.86  -1849.09
1svr   -1319.62  -1320.81  -1320.06  -1323.10  -1322.45  -1323.89  -1324.12  -1324.76  -1324.68  -1324.99
1o7b   -1738.56  -1747.28  -1743.36  -1743.60  -1743.93  -1745.48  -1746.05  -1746.18  -1746.77  -1747.12
1a63   -1940.89  -1940.05  -1936.96  -1939.81  -1941.67  -1942.27  -1942.61  -1943.36  -1944.14  -1944.55
1a7m   -2028.72  -2033.44  -2032.74  -2029.40  -2031.87  -2033.64  -2033.31  -2033.66  -2034.26  -2034.32

Table 5.4: Electrostatic solvation free energies (kcal/mol) calculated via the MIBPB with ESES surface at different grid sizes.

       Grid size (Å)
ID     1.1       1.0       0.9       0.8       0.7       0.6       0.5       0.4       0.3       0.2
1ajj   -996.11   -995.63   -993.17   -993.03   -991.34   -988.16   -987.20   -986.99   -987.12   -987.03
1ptq   -720.68   -719.62   -720.68   -720.76   -721.48   -721.63   -721.71   -722.04   -721.95   -721.97
1vjw   -1122.67  -1123.35  -1124.63  -1127.13  -1124.10  -1125.55  -1125.15  -1125.20  -1125.06  -1125.22
1bor   -771.50   -773.85   -774.17   -772.50   -774.31   -773.74   -774.04   -774.37   -774.44   -774.45
1fxd   -2420.46  -2425.07  -2423.90  -2426.91  -2427.53  -2427.33  -2428.19  -2428.53  -2428.50  -2429.05
1sh1   -564.58   -564.16   -569.12   -569.23   -569.74   -569.50   -570.31   -570.59   -571.01   -571.20
1hpt   -661.72   -656.56   -661.03   -661.16   -663.54   -663.66   -664.00   -663.96   -664.10   -664.15
1fca   -1190.83  -1193.85  -1191.26  -1190.80  -1191.67  -1187.04  -1188.83  -1188.04  -1187.68  -1187.26
1bpi   -1139.46  -1145.12  -1147.15  -1145.35  -1146.96  -1147.57  -1147.99  -1148.14  -1148.28  -1148.36
1r69   -932.98   -937.44   -934.49   -936.89   -937.27   -937.32   -937.10   -937.44   -937.59   -937.66
1bbl   -838.68   -839.89   -841.26   -842.17   -843.02   -842.79   -842.00   -842.80   -842.73   -842.65
1vii   -672.12   -674.33   -674.94   -677.13   -676.71   -677.78   -677.69   -678.05   -678.42   -678.64
2erl   -914.31   -919.35   -918.70   -918.65   -917.98   -917.68   -918.68   -918.60   -918.79   -918.87
451c   -839.89   -838.39   -842.24   -841.94   -842.56   -842.44   -842.56   -843.26   -843.78   -843.96
2pde   -809.06   -814.63   -812.91   -810.62   -814.13   -813.32   -812.27   -813.04   -813.59   -813.42
1cbn   -300.65   -299.73   -301.79   -302.88   -301.94   -302.24   -302.37   -302.50   -302.60   -302.67
1frd   -2546.19  -2549.42  -2551.50  -2553.10  -2554.68  -2555.89  -2556.43  -2557.30  -2557.69  -2558.10
1uxc   -909.77   -908.98   -912.19   -915.23   -914.40   -916.34   -916.68   -916.93   -917.18   -917.44
1mbg   -1277.10  -1278.07  -1277.80  -1278.14  -1280.52  -1281.24  -1281.42  -1281.74  -1281.86  -1282.06
1neq   -1459.44  -1464.56  -1467.13  -1467.57  -1468.99  -1469.33  -1469.55  -1469.64  -1470.11  -1470.12
1a2s   -1834.76  -1838.39  -1841.63  -1842.01  -1841.72  -1842.06  -1842.67  -1843.19  -1843.19  -1843.61
1svr   -1309.47  -1310.67  -1314.35  -1316.24  -1317.25  -1317.10  -1317.75  -1317.89  -1318.42  -1318.37
1o7b   -1734.47  -1726.36  -1739.98  -1741.12  -1739.71  -1740.59  -1742.22  -1742.57  -1743.25  -1743.39
1a63   -1937.29  -1942.42  -1940.78  -1943.18  -1941.26  -1942.13  -1941.64  -1942.12  -1942.27  -1942.67
1a7m   -2038.64  -2032.48  -2034.48  -2034.91  -2038.38  -2040.39  -2037.91  -2038.62  -2038.73  -2038.26

Figure 5.2 depicts the relative errors of the electrostatic solvation free energies calculated by the MIBPB software on surfaces generated by both ESES and MSMS. These results demonstrate that the MIBPB software achieves the same level of accuracy on both surfaces. The relative errors are remarkably less than 0.4% when the mesh size ranges from 1.1 Å to 0.2 Å. Therefore, if one's goal is 1% relative error, one can use a mesh size as coarse as 1.1 Å in MIBPB based solvation analysis.

5.4.3 Comparison between different PB Solvers

In this part we compare the performance of our PB solver with the Amber PBSA and Delphi PB solvers. The comparison confirms the consistency of the three solvers and the grid independence of the MIBPB solver.

Figure 5.3 depicts the electrostatic solvation free energies for the 25 protein molecules calculated by the three different PB solvers at t
he grid sizes from 0.2 to 1.1 Å. The results demonstrate not only the nearly grid independent property of the MIBPB solver in computing the electrostatic solvation free energy, but also the consistency of the three PB solvers. In general, the solvation free energies calculated by Amber PBSA converge decreasingly to those of MIBPB, whereas those calculated by Delphi converge increasingly to those of MIBPB. The MIBPB results are grid size independent for both the MSMS and ESES surfaces.

Figure 5.2: The relative errors of the electrostatic solvation free energies, compared to the results calculated at 0.2 Å, computed by the MIBPB method on surfaces generated by ESES and MSMS, averaged over 25 proteins.

To measure the accuracy of the PB solver, since there is no analytical solution for the electrostatic solvation free energy of biomolecules, the grid dependence of the energies is examined. Errors are estimated either with respect to the analytical solution, if it is available, or with respect to the result at the finest grid if there is no analytical result:

\[ \text{Relative error} := \frac{|\Delta G_h - \Delta G_{\text{finest grid}}|}{|\Delta G_{\text{finest grid}}|}, \tag{5.4.2} \]

where $\Delta G_h$ and $\Delta G_{\text{finest grid}}$ are the electrostatic solvation free energies calculated at the grid size $h$ and at the finest grid used.

Figure 5.4 depicts the relative deviation of the three PB solvers, where MIBPB utilizes two different surfaces, at each grid size compared to the results at the grid size 0.2 Å. Obviously, the MIBPB solvers are much more stable than both Amber and Delphi, and the grid size independence of the MIBPB solver does not depend on the surface.

Figure 5.3: Performance of different PB solvers, and of the MIBPB solver on both MSMS and solvent excluded surfaces. From left to right, top to bottom, the 25 figures represent the molecules with PDB ID: 1a2s, 1a63, 1a7m, 1ajj, 1bbl, 1bor, 1bpi, 1cbn, 1fca, 1frd, 1fxd, 1hpt, 1mbg, 1neq, 1o7b, 1ptq, 1r69, 1sh1, 1svr, 1uxc, 1vii, 1vjw, 2erl, 2pde, 451c, respectively.

Figure 5.4: The relative deviation of the electrostatic solvation free energies by different PB solvers and MIBPB on different surfaces compared to the results at the smallest grid.

However, there is no free lunch. The stability of our PB solver in computing the electrostatic solvation free energy comes at the price of more CPU cost at a given grid spacing. At a given grid size, both AMBER PBSA and Delphi are several times faster than our PB solver. Different PB solvers have their own pros and cons. Our PB solver makes coarse grid calculation possible, which in turn speeds up the PB solver through coarse grid electrostatic calculation. Figure 5.5 illustrates the average CPU cost of the different PB solvers and of the MIBPB solver on both MSMS and solvent excluded surfaces. All the tests are carried out on the same Intel14 computing node at the High Performance Computing Center of Michigan State University.

The test on the above 25 biomolecules shows that the electrostatic solvation free energies calculated by the three different PB solvers are consistent with each other, that the MIBPB solver is more stable than both Amber and Delphi, and that the results of the MIBPB solver are almost grid size independent for both MSMS and solvent excluded surfaces. At the same grid size, both Amber and Delphi are several times faster than our PB solver.

Figure 5.5: The average CPU cost over the 25 biomolecules of the different PB solvers and MIBPB on different surfaces.

5.4.4 Convergence Test on PBSA Test Set

To further test the grid size independence of the current PB solver, in this part we apply our PB solver to a much larger test set, the Amber PBSA test set, which was already used as the benchmark for the ESES surface generation in the previous chapter. Figure 5.6 shows the relative deviation of the electrostatic solvation free energy with respect to that at grid size 0.3 Å for the 937 biomolecules. The deviation converges to 0 monotonically with the refinement of the grid size, and the error level is consistent with that of the above 25-protein test set. Also note that even at the grid size 1.1 Å the relative error is less than 0.4%, which further confirms the grid size independence of the current PB solver.

Figure 5.6: The relative deviation of the electrostatic solvation free energy with respect to that at grid size 0.3 Å for the 937 biomolecules.

5.5 Numerical Results - Binding Free Energy

After the study of the accuracy of the electrostatic solvation free energy, in this part we turn to the electrostatic binding free energy. Accurate electrostatic binding free energy calculation essentially depends on the accuracy of the electrostatic solvation free energy calculation. However, accurate electrostatic binding free energy calculation is much more challenging. Here the accuracy is measured in two ways: the first is the qualitative ranking of the molecules based on the energy; the second is the quantitative result of the energy value. Because the magnitude of the electrostatic binding free energy itself is usually much smaller than that of the solvation free energy, numerical error may lead to unacceptable error, which may yield different ranking results for the molecules when different grid spacings are used for the numerical solution of the PB equation [107, 24]. Recently, it has been shown that the widely used grid spacing of 0.5 Å produces unacceptable errors in $\Delta\Delta G_{\mathrm{el}}$ [107].

5.5.1 Data Sets

In the present work, we adopt three sets of biomolecular complexes employed in the literature [107] for solvation free energy and binding free energy estimations. Specifically, the first set, Data Set 1, is a collection of DNA-minor groove drug complexes having a narrow range of $\Delta\Delta G_{\mathrm{el}}$. The Protein Data Bank (PDB) IDs for this set are as follows: 102d, 109d, 121d, 127d, 129d, 166d, 195d, 1d30, 1d63, 1d64, 1d86, 1dne, 1eel, 1fmq, 1fms, 1jtl, 1lex, 1prp, 227d, 261d, 164d, 289d, 298d, 2dbe, 302d, 311d, 328d, and 360d. The second set, Data Set 2, includes various wild-type and mutant barnase-barstar complexes. Its PDB IDs are as follows: 1b27, 1b2s, 1b2u, 1b3s, 2aza4, 1x1w, 1x1y, 1x1u, and 1x1x. In the last set, Data Set 3, we investigate RNA-peptide complexes with the following PDB IDs: 1a1t, 1a4t, 1biv, 1exy, 1g70, 1hji, 1i9f, 1mnb, 1nyb, 1qfq, 1ull, 1zbn, 2a9x, and 484d. The details of the structural preprocessing can be found in the work of Harris et al.

5.5.2 Results and Discussion

As described above, we consider three sets of binding complexes, namely, drug-DNA, barnase-barstar and RNA-peptide systems. In the rest of this section, we explore the influence of grid spacing on PBE solvation and binding free energy estimations using our MIBPB solver.

Motivated by the well-converged estimations of electrostatic solvation free energies at very coarse grid spacings discussed previously, we are interested in predicting the binding free energies for all DNA-drug, barnase-barstar, and RNA-peptide complexes using our MIBPB package. We correlate the binding free energy calculated at the finest grid spacing, h = 0.2 Å, with the ones estimated at coarser mesh sizes, h = 0.3 Å, ..., 1.1 Å. Figure 5.7 illustrates these relationships with the regression lines whose parameters are given in Table 5.5. Indeed, the PB binding energy estimation behaves the same as the PB solvation calculation in our MIBPB technique. Specifically, R^2 is always 1 at the mesh size h = 0.3 Å. Moreover, these values ar
estillsatisfactoryatrelativelycoarsermeshsizes.Forexample,atthegridspacingofh=1.1,theR2andslopeoftheregressionlineforDNA-drug,barnase-barstar,andRNApeptidecomplexesare,respectively,(0.9747,1.0081),(0.9965,0.9974),and(0.9999,0.9974).Incontrast,theR-squaredvaluesreportedinRef.[107]computedbetween0.3Aand1.0AareunacceptableforSESs,andusuallylessthan0.62.Ourstatisticalmeasuresstronglysupportthereliablebindingenergypredictionofoursolveratcoarsegridsizes.ThetrendofbindingfreeenergyattgridspacingscanbeseenclearlyinFigs.5.8-5.10whichplotsGelagainstgridsizevaryingbetween0.2Aand1.1AforDNA-drug,barnase-barstar,andRNA-peptidecomplexes,respectively.BasedontheseoursolvercanrankthebindingfreeenergyforDNA-drugcomplexesatgridspacingof0.6Abarnase-barstarcomplexesatgridspacingof0.8AandRNA-peptidecomplexesattlycoarsegridspacingof1.1A.Therefore,wecandrawaconclusionthatthecommonuseofgridsizebeing0.5Aisstilladequateforpredictingthebindingenergyfreewithoutproducingamisleadingresult.121Figure5.7:Electrostaticbindingfreeenergy,forallcomplexeswithtgridsizesplottedagainsttheonecomputedwithagridsizeofh=0.2A(a)DNA-drugwithpair(0.2A,0.3A);(b)DNA-drugwithpair(0.2A,0.7A);(c)DNA-drugwithpair(0.2A,1.1A);(d)Barnase-barstarwithpair(0.2A,0.3A);(e)Barnase-barstarwithpair(0.2A,0.7A);(f)Barnase-barstarwithpair(0.2A,1.1A);(g)RNA-peptidewithpair(0.2A,0.3A);(h)RNApeptidewithpair(0.2A,0.7A);(i)RNA-peptidewithpair(0.2A,1.1A).122Table5.5:R2valuesandbestlinesofelectrostaticbindingfreeenergieswithtgridsizes.Gridsizes(pair)R2BestlineDNA-drug(0.2,0.3)1.0000y=0.9993x+0.0194(0.2,0.4)0.9999y=0.9987x+0.0273(0.2,0.5)0.9998y=1.0028x+0.0164(0.2,0.6)0.9991y=1.0047x+0.2256(0.2,0.7)0.9982y=1.0074x+0.1394(0.2,0.8)0.9966y=1.0110x+0.1484(0.2,0.9)0.9906y=0.9655x+1.2385(0.2,1.0)0.9875y=0.9827x+0.5894(0.2,1.1)0.9747y=1.0081x+0.0709Barnase-barstar(0.2,0.3)0.9999y=0.9974x+0.2035(0.2,0.4)0.9999y=0.9997x-0.0492(0.2,0.5)0.9995y=1.0318x-2.7755(0.2,0.6)0.9946y=0.9878x+1.5525(0.2,0.7)0.9932y=1.0090x+0.1819(0.2,0.8)0.9883y=0.9976x+3.7333(0.2
,0.9)0.9493y=0.9382x+5.3970(0.2,1.0)0.9384y=1.0912x-3.8377(0.2,1.1)0.8002y=0.9974x+18.2837RNA-peptide(0.2,0.3)1.0000y=0.9997x-0.0655(0.2,0.4)1.0000y=1.0001x-0.1106(0.2,0.5)1.0000y=1.0012x-0.2755(0.2,0.6)1.0000y=0.9999x+0.2021(0.2,0.7)0.9999y=1.0037x-0.3756(0.2,0.8)1.0000y=1.0004x+0.6673(0.2,0.9)0.9999y=0.9927x+1.9755(0.2,1.0)0.9997y=0.9923x+2.8775(0.2,1.1)0.9998y=0.9937x+1.79925.6ConclusionPoisson-Boltzmann(PB)theoryisanestablishedmodelforbiomolecularelectrostaticanaly-sisandhasbeenwidelyusedinelectrostaticsolvationandbindingenergyestimation.Inthischapter,wepresentagridsizealmostindependentPBsolver,i.e.,MIBPBsoftware,itmakestheaccurateandcoarsegridPBsoftwarepossible.ThemainthemeoftheMIBPBsolver123Figure5.8:BindingelectrostaticenergyforDNA-drugcomplexeswithgridsizesfrom0.2Ato1.1A.ThemarkersandPDBIDsareasfollowsyellowcircle:102d,magentacircle:109d,cyancircle:121d,greencircle:127d,redcircle:129d,bluecircle:166d,blackcircle:195d,yellowdiamond:1d30,magentadiamond:1d63,cyandiamond:1d64,greendiamond:1d86,reddiamond:1dne,bluediamond:1eel,blackdiamond:1fmq,yellowsquare:1fms,magentasquare:1jtl,cyansquare:1lex,greensquare:1prp,redsquare:227d,bluesquare:261d,blacksquare:264d,yellowtriangle:289d,magentatriangle:298d,cyantriangle:2dbe,greentriangle:302d,redtriangle:311d,bluetriangle:328d,blacktriangle:360d.isrigorouslytreatingfourdetailsthatreferredinthePBequation.First,themolecularsurfaceSES,isanalyticallyimplementedintheCartesiangridforthepurposeofthediscretizationofthePBequation,whichistfromthemolecularsurfaceusedinmostofthecurrentexistingPBsolvers,mostofwhichutilizedtheapproximatedSESinsteadoftheexactSES.Second,thesingularchargesaretreatedbytheGreen'sfunctioninsteadofconventionalmethodthatprojectthesingularchargestotheclosesteightgridpoints,theprojectionmethodsusuallyyieldsthechargesbeprojectedtothegridinthesolventdomainwhenthecoarsegridisemployed,theGreen'sfunctiontreatmentmakesthecoarsegridtreatingofthesingularchargespossible.Third,theinterfaceconditionsariseinthePBequation
are treated rigorously through interface condition matching. In the literature, there are some other methods that can treat these conditions for simple geometries, while our method can handle arbitrarily complex geometries. Fourth, the reaction field potential extension is utilized for the evaluation of the reaction field energy; this posterior treatment avoids the accuracy reduction due to the usage of the reaction field potential in the solvent.

Figure 5.9: Binding electrostatic energy for barnase-barstar complexes with grid sizes from 0.2 Å to 1.1 Å. The markers and PDB IDs are as follows: yellow circle: 1b27, magenta circle: 1b2s, cyan circle: 1b2u, green circle: 1b3s, red circle: 1x1u, blue circle: 1x1w, black circle: 1x1x, yellow diamond: 1x1y, magenta diamond: 2za4.

Figure 5.10: Binding electrostatic energy for RNA-peptide complexes with grid sizes from 0.2 Å to 1.1 Å. The markers and PDB IDs are as follows: yellow circle: 1a1t, magenta circle: 1a4t, cyan circle: 1biv, green circle: 1exy, red circle: 1g70, blue circle: 1hji, black circle: 1i9f, yellow diamond: 1mnb, magenta diamond: 1nyb, cyan diamond: 1qfq, red diamond: 1ull, green diamond: 1zbn, blue diamond: 2a9x, black diamond: 484d.

Due to the mathematically rigorous treatment and the utilization of a second order convergent scheme, the MIBPB solver converges very fast, which leads to the almost grid size independent property of the PB solver. This property was verified by a large number of test cases, including both analytic tests and real biomolecule tests; all the tests demonstrate that the current PB solver is grid size independent. Through the test of the stability with respect to different surfaces, the reduced molecular surface (MSMS surface) and the analytical Eulerian solvent excluded surface (ESES), the grid independence of the current PB solver is shown to be independent of the molecular surface used.

Furthermore, the present work solves two problems posed in the paper of Robert C. Harris et al. [107], namely, the qualitative ranking of biomolecular complexes by their electrostatic binding free energies at different grid spacings, and the convergent calculation of binding free energies on the solvent excluded surface.

Highly accurate and highly stable calculation of the binding free energy plays a critical role in computational chemistry and has many important pharmaceutical applications. A better estimation of binding free energies will help considerably in drug design and some other industries. The current MIBPB solver performs extremely accurately and stably for both electrostatic solvation and binding energy calculations. The current results show that the widely used grid size of 0.5 Å can give sufficiently accurate results for both qualitatively ranking the biomolecular complexes and quantitatively evaluating the electrostatic binding free energies. In fact, when the grid size is less than 0.7 Å the results are already suitable for ranking the complexes, the binding free energies calculated at the coarser grids are highly consistent and correlated with those at the finest grid, and the binding energies also converge to the benchmark results. In sum, this work developed a grid size independent PB solver, which makes coarse-grid PB calculations possible and indirectly speeds up the PB solver.

Chapter 6

Hybrid Physical and Statistical Models for Solvation Free Energy Prediction

6.1 Introduction

Solvation, in which separated solvent and solute molecules are combined to form a solution, is an elementary physical process in nature that provides a foundation for more complicated chemical and biological processes, such as ion channel permeation, protein-ligand binding, electron transfer, signal transduction, DNA specification, transcription, post-transcription modification, gene expression, protein synthesis, etc. Therefore, the understanding of solvation is a prerequisite for the quantitative study of other more complex chemical and biological processes and is of paramount importance to chemistry, physics and biology [60, 204, 228]. The most basic and reliable experimental observable of solvation is the solvation free energy, which measures the free energy change in the solvation process. As a result, one of the most important tasks in solvation modeling and computation is the prediction of solvation free energies, which has recently drawn much attention in computational biophysics and chemistry [60, 140, 204, 228, 62, 266].

A large variety of solvation models have been developed in the past few decades for solvation analysis and solvation free energy prediction. In general, these models can be classified into either knowledge-based models or physical-based models. Knowledge-based models usually utilize a large amount of available solvation free energy data to train statistical models for solvation free energy prediction. One example of this type of model makes use of solvent accessible surface areas, solvent excluded surface areas or calibrated atomic surface areas for solvation free energy predictions [260]. Knowledge-based models can provide relatively accurate predictions provided that enough experimental data are available. On the other hand, physical-based solvation models are based on fundamental laws of physics. These models can be further classified into three major classes. One of them is explicit solvent models, in which both solvent and solute molecules are described at the atomic or electronic level. This class of models is accurate in general, but usually computationally expensive [214, 156, 138]. Another class of models is integral equation based solvation theories [110, 203, 200], in which the solute molecule is still modeled at the atomic level, while the solvent is modeled by statistical mechanics, such as liquid density functional theory or the Ornstein-Zernike equation with the hypernetted-chain, Percus-Yevick or other specific closures. Integral equation based solvation models reduce the number of degrees of freedom dramatically compared to the explicit solvent models. A major feature of integral equation approaches is that these methods are able to provide a good approximation of solvent microstructures near the solvent-solute interface [91].

The third class of physical-based solvation models is implicit solvent models [192, 56, 211, 199, 83, 6, 90, 270, 43], in which the solute molecule can be modeled at either the atomic or the quantum mechanical level, while the solvent is described as a dielectric continuum. Compared to other methods, implicit solvent models have the advantages of involving a small number of degrees of freedom, highly accurate modeling of the strong and long-range electrostatic interactions that often dominate solvation phenomena, and convenient treatment of solvent-solute electronic polarization [56]. In implicit solvent models, the solvation free energy is typically divided into polar and nonpolar components. The polar solvation free energy is due to electrostatic interactions and can be modeled by a number of approaches, including the Born dielectric sphere model, the Kirkwood model [136], the Generalized Born (GB) model [124, 99, 95, 75, 64, 11], the Polarizable Continuum Model (PCM) [121, 239, 71] and Poisson-Boltzmann (PB) theory [195, 218, 208, 112]. Among these approaches, the Born dielectric sphere model, which treats the solute molecule as a dielectric sphere with a centrally located atomic charge, is the simplest one. The Kirkwood model gives analytical expressions for electrostatic potentials in a spherical solute-solvent system with multiple charges inside the solute molecule [136]. The GB model [124, 99, 95, 75, 64, 11] is able to deal with arbitrarily shaped molecules and accounts for the impact of each point charge by its effective distance (i.e., the Born radius) from the solvent-solute interface [234, 57]. The GB model is relatively fast, but depends on other methods, such as the PB theory, for its parametrization. The PCM describes solute electrons by using quantum mechanics so that solvation induced polarizations can be accounted for through an iterative procedure. A few different versions of PCMs have been reported, including the dielectric PCM, the integral equation formalism version of PCM, and the variational PCM models [121, 239, 71]. PCM is relatively computationally expensive due to its quantum mechanical charge calculation, while its accuracy is bounded by the PB type of treatment of the electrostatic potential as well as the quality of solvent-solute interfaces. The PB theory can be derived from the more fundamental Maxwell's equations [195, 218, 208, 112, 90]. It is more accurate than the GB model and computationally more efficient than the PCM. The PB model can be easily coupled with quantum mechanics for a more accurate description of the solute charge density [262, 263, 45]. It has become one of the most popular solvation models due to its relatively low computational cost and high modeling accuracy for biomolecules.

The nonpolar solvation component has been modeled by a number of terms. A popular approach, based on the Scaled-Particle Theory (SPT) for nonpolar solutes in aqueous solutions [226, 197], is to use a solvent-accessible surface area (SASA) term [230, 167]. It was shown that a Solvent-Accessible Volume (SAV) term is relevant in large length scale regimes [163, 114]. Recent studies indicate that SASA based solvation models may not describe van der Waals interactions near the solvent-solute interfacial region [84, 83, 50, 246]. A combination of surface area, surface enclosed volume, and van der Waals potential has been shown to provide accurate nonpolar solvation predictions [46, 248].

The polar and nonpolar components are typically decoupled in traditional implicit solvent models. Recently, new efforts have been made to couple polar and nonpolar components [70, 270, 43]. Differential geometry based solvation models make use of the differential geometry theory of surfaces to dynamically couple polar and nonpolar models by surface evolution, which is driven by free energy optimization [270, 45, 43, 44]. Coupled with an optimization procedure, this model was shown to deliver some of the best nonpolar solvation free energy predictions [46]. By applying constrained optimization to nonpolar parameter selection, this model provides state-of-the-art solvation free energy and cross validation results for a large number of solute molecules [248].

It is important to validate solvation models by experimental data. The SAMPLx blind solvation prediction project, aimed at testing protein-ligand binding models, also provides benchmark tests of the performance of solvation models. Many different solvation approaches have been tested on the challenging SAMPLx molecules [183, 177, 106, 205, 176, 173, 207, 174, 81, 105, 86, 87]. Implicit solvent models are among the most competitive approaches in SAMPLx tests.

The objective of the present work is to develop a protocol that hybridizes physical and knowledge (or statistical) models for blind solvation free energy prediction. In our approach, physical models are used for modeling the polar solvation free energy. Specifically, we utilize PB theory for computing electrostatic solvation free energies. Both a point charge based PB model and a KSDFT based polarizable PB model are developed in the present work. For the KSDFT based PB model, an iterative process is developed to take care of the solvent polarization and solute response [45]. In our statistical models, the nonpolar solvation free energy is predicted based on the similarity analysis of solute molecules. Essentially, we assume that similar molecules admit the same set of nonpolar parameters. In this work, we examine two statistical models. One of them is based on functional group scoring, and the other is based on a nearest neighbor approach.
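As a rough illustration of this hybrid idea, the sketch below assumes that the polar energy of each molecule has already been computed by a PB solver, and that the nonpolar part is a linear function of per-atom-type surface areas and the molecular volume whose coefficients are shared by similar molecules and fitted by standard squared-norm ridge regression. The function names (`solve_linear`, `fit_nonpolar`, `predict`), the feature layout and all numbers are hypothetical and are not taken from the present work.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_nonpolar(features, polar, experimental, lam):
    """Ridge fit of the shared nonpolar parameters.

    features[j] = [Area_1j, ..., Area_Nj, Vol_j] for molecule j (assumed layout).
    Minimises sum_j (polar_j + features_j . P - experimental_j)^2 + lam * |P|^2
    through the normal equations (F^T F + lam I) P = F^T (experimental - polar).
    """
    k = len(features[0])
    target = [e - g for e, g in zip(experimental, polar)]
    lhs = [[sum(f[i] * f[j] for f in features) + (lam if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    rhs = [sum(f[i] * t for f, t in zip(features, target)) for i in range(k)]
    return solve_linear(lhs, rhs)


def predict(polar_energy, feature, params):
    """Total solvation free energy = physical polar part + statistical nonpolar part."""
    return polar_energy + sum(x * w for x, w in zip(feature, params))


# Hypothetical training molecules of one similarity class (made-up numbers).
train_features = [[100.0, 20.0, 150.0], [80.0, 40.0, 160.0],
                  [120.0, 10.0, 170.0], [90.0, 30.0, 140.0]]
train_polar = [-3.0, -5.0, -2.0, -4.0]
train_experimental = [-1.6, -3.9, 0.1, -2.7]

params = fit_nonpolar(train_features, train_polar, train_experimental, lam=1e-3)
print("fitted nonpolar parameters:", params)
print("blind prediction:", predict(-3.5, [95.0, 25.0, 155.0], params))
```

The polar term enters only as a fixed offset, so the statistical step never needs to know how the PB energy was obtained; this is one simple way to keep the physical and statistical parts decoupled.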
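Both statistical models utilize machine learning methods to optimize parameters. A database containing the solvation free energies in aqueous solvent for 668 solute molecules is collected from the literature [31, 175] to validate the proposed solvation protocol. It is found that the present approach provides some of the best blind predictions of solvation free energies.

The rest of this chapter is organized as follows. In Section 6.2, we present the polar solvation models, in which both the classical Poisson model and the polarizable Poisson model are discussed. Section 6.3 presents two nonpolar solvation models, namely, the functional group scoring and nearest neighbor approaches. An algorithm for blind solvation free energy prediction is developed in Section 6.3.3. The leave-one-out test of 668 molecules is employed to validate the proposed protocol. Finally, the accuracy and robustness of the proposed protocol are demonstrated by five SAMPL test sets.

6.2 Polar Solvation Models

In this section, we derive the polarizable Poisson (Boltzmann) model based on the variational principle. We start by reviewing the Poisson (Boltzmann) model from the energetic variational point of view, and then introduce the energy functional for the polarizable Poisson (Boltzmann) model. There are two main advantages of the polarizable Poisson (Boltzmann) model:

- Ab initio calculation of the solute molecule charge distribution, instead of the force field parametrization.
- Solvent-solute polarization is incorporated into the solvation model.

6.2.1 Free Energy Functional

Consider the solution domain $\Omega$ formed by the solute and solvent molecules, which occupy the domains $\Omega_m$ and $\Omega_s$, respectively. Consider the following multiscale descriptions of the solution system.

- Continuum description of the solvent domain, which models the solvent as a dielectric continuum, and atomic modeling of the solute domain, where the solute molecule is modeled as atoms equipped with partial charges at the atomic centers. The separation of the solute and solvent domains is described by the molecular surface of the solute molecule.
- Continuum description of the solvent domain, while the solute domain is modeled at the electronic level. The molecular surface of the solute molecule is employed as the separation of the two domains.

In the following, we formulate the polar free energy functionals of the two multiscale models.

For the first multiscale model, that is, continuum modeling of the solvent domain and atomic modeling of the solute domain, the polar free energy functional can be written as:

\[
G^{p}_{1} = \int \left\{ \chi \left[ \rho_{\mathrm{total}}\,\phi - \frac{1}{2}\epsilon_m |\nabla\phi|^2 \right] + (1-\chi)\left[ -\frac{1}{2}\epsilon_s |\nabla\phi|^2 \right] \right\} d\mathbf{r}, \tag{6.2.1}
\]

where $\chi$ and $1-\chi$ are the characteristic functions of the solute and solvent domains, respectively. $\rho_{\mathrm{total}} = \sum_{i=1}^{N_m} q_i \delta(\mathbf{r}-\mathbf{r}_i)$ is the partial charge distribution inside the solute molecule, with $q_i$ being the partial charge located at the $i$th atomic center $\mathbf{r}_i$, $N_m$ the number of atoms of the solute molecule, and $\delta(\mathbf{r}-\mathbf{r}_i)$ the delta function at the position $\mathbf{r}_i$. $\epsilon_m$ and $\epsilon_s$ are the dielectric constants of the solute and solvent domains, respectively. $\phi := \phi(\mathbf{r})$ is the electrostatic potential in the solution.

For the second multiscale model, electronic modeling of the solute molecule replaces the atomic modeling of the first model. More specifically, we consider the Kohn-Sham Density Functional Theory (KSDFT) modeling of the electronic structure. The polar free energy functional can now be formulated as:

\[
G^{p}_{2} = \int \left\{ \chi \left[ \hat{\rho}_{\mathrm{total}}\,\phi - \frac{1}{2}\epsilon_m |\nabla\phi|^2 \right] + (1-\chi)\left[ -\frac{1}{2}\epsilon_s |\nabla\phi|^2 \right] + \chi \left[ \sum_i \frac{\hbar^2}{2m} |\nabla\psi_i|^2 + E_{XC}[n] \right] \right\} d\mathbf{r}, \tag{6.2.2}
\]

where $\hat{\rho}_{\mathrm{total}} = q\,n(\mathbf{r}) - q\,n_n(\mathbf{r})$ is the charge distribution in the solute domain, with $q$ being the unit charge of an electron, $n(\mathbf{r})$ the electron density, and $n_n(\mathbf{r}) = \sum_I Z_I \delta(\mathbf{r}-\mathbf{R}_I)$ the nucleus density, in which $Z_I$ and $\mathbf{R}_I$ are the atomic number and position vector of nucleus $I$, respectively. $\{\psi_i\}$ are the Kohn-Sham orbitals. $m := m(\mathbf{r})$ is the position dependent electron mass, and $\hbar = \frac{h}{2\pi}$ with $h$ being the Planck constant. The meanings of the shared notations are inherited from the previous energy functional $G^{p}_{1}$. $E_{XC}[n]$ is the exchange-correlation energy, a functional of the electron density that describes the many-particle interactions in the KSDFT setting.

Remark 6.2.1. The Kohn-Sham orbitals $\{\psi_i\}$ are subject to the following orthonormality constraint

\[
\langle \psi_i | \psi_j \rangle = \int \psi_i^{*}(\mathbf{r})\,\psi_j(\mathbf{r})\,d\mathbf{r} =
\begin{cases}
1, & i = j \\
0, & i \neq j
\end{cases} \tag{6.2.3}
\]

where $\psi_i^{*}(\mathbf{r})$ is the conjugate of $\psi_i(\mathbf{r})$.

Remark 6.2.2. The following relation between the Kohn-Sham orbitals and the electron density holds:

\[
n(\mathbf{r}) = \sum_i |\psi_i|^2. \tag{6.2.4}
\]

Remark 6.2.3. $\int \chi [\hat{\rho}_{\mathrm{total}}\,\phi]\,d\mathbf{r}$ describes the electron-electron and electron-nucleus two-body Coulombic interactions. To avoid double counting, we neglect the two-body Coulombic interaction energy in the KSDFT part of the above energy functional.

So far, we have formulated the polar energy functionals for both multiscale models of the electrostatic interaction in the solute-solvent system. In the following parts, we derive the governing equations of the two models by the variational principle.

6.2.2 Poisson model

In this subsection, we derive the governing equation for the model with atomic modeling of the solute molecule and continuum modeling of the solvent. The Euler-Lagrange equation indicates that taking $\frac{\delta G^{p}_{1}}{\delta \phi} = 0$ yields the following Poisson equation:

\[
\chi \left[ \rho_{\mathrm{total}} + \nabla\cdot(\epsilon_m \nabla\phi) \right] + (1-\chi)\left[ \nabla\cdot(\epsilon_s \nabla\phi) \right] = 0, \tag{6.2.5}
\]

which can be written as:

\[
-\nabla\cdot\left(\epsilon(\mathbf{r})\nabla\phi(\mathbf{r})\right) = \rho_{\mathrm{total}}, \tag{6.2.6}
\]

where the dielectric function $\epsilon(\mathbf{r})$ has the following form:

\[
\epsilon(\mathbf{r}) =
\begin{cases}
\epsilon_m, & \mathbf{r} \in \Omega_m \\
\epsilon_s, & \mathbf{r} \in \Omega_s
\end{cases} \tag{6.2.7}
\]

The above Poisson equation describes the electrostatic potential distribution in the solute-solvent system. In general, the solute and solvent domains take different dielectric constants, i.e., $\epsilon_s \neq \epsilon_m$.

6.2.3 Polarizable Poisson model

A more accurate physical description of the solute molecule is to model it at the electronic level, in which the charge distribution of the solute molecule is calculated by an ab initio approach. In the water solvent environment, especially for polar molecules, the interaction between solvent and solute molecules cannot be neglected, since it can change the charge distribution of the solute molecule dramatically compared to that in the vacuum environment. The second model presented above can be used to include the solvent effect on the solute molecule charge distribution. In this part, we derive the governing equations for the second model.

Taking the variational derivative of the polar free energy functional $G^{p}_{2}$ with respect to the electrostatic potential $\phi(\mathbf{r})$, i.e., $\frac{\delta G^{p}_{2}}{\delta \phi} = 0$, gives the following Poisson equation for the electrostatic potential:

\[
-\nabla\cdot\left(\epsilon(\mathbf{r})\nabla\phi(\mathbf{r})\right) = \hat{\rho}_{\mathrm{total}}. \tag{6.2.8}
\]

Due to the ab initio calculation of the charge distribution, an equation describing the electronic structure is required. To obtain it, we take the variational derivative of the polar free energy functional $G^{p}_{2}$ with respect to the Kohn-Sham orbitals, subject to the orthonormality constraints on the orbitals, which gives

\[
\frac{\delta \left[ G^{p}_{2} + \sum_{i,j} \lambda_{ij}\left( \delta_{ij} - \langle \psi_i | \psi_j \rangle \right) \right]}{\delta \psi_i} = 0,
\]

which leads to

\[
\chi [2q\phi]\,\psi_i - \chi \left[ 2\,\frac{\hbar^2}{2m} \nabla\cdot(\nabla\psi_i) \right] + \chi\,\frac{\delta E_{XC}[n]}{\delta \psi_i} - \chi \left[ 2 \sum_j \psi_j \lambda_{ji} \right] = 0,
\]

which can be written as

\[
\left( -\frac{\hbar^2}{2m}\nabla^2 + q\phi + V_{XC}[n] \right) \psi_i = \sum_j \psi_j \lambda_{ji}, \tag{6.2.9}
\]

where $V_{XC}[n] := \frac{1}{2\psi_i}\frac{\delta E_{XC}[n]}{\delta \psi_i}$.

Writing $H := -\frac{\hbar^2}{2m}\nabla^2 + q\phi + V_{XC}[n]$, Eq. (6.2.9) and the orthonormality of the orbitals give

\[
\lambda_{ij} = \langle \psi_i | H | \psi_j \rangle = \overline{\langle \psi_j | H | \psi_i \rangle} = \overline{\lambda_{ji}},
\]

that is, $\Lambda := (\lambda_{ij})$ is Hermitian. Hence there exists a unitary matrix $U$ such that $U^{*} \Lambda U = \mathrm{diag}(E_1, E_2, \ldots)$. Consider the change of basis $\{\hat{\psi}_i\} = \{\psi_i\} U$; then Eq. (6.2.9) becomes

\[
\left( -\frac{\hbar^2}{2m}\nabla^2 + q\phi + V_{XC}[n] \right) \sum_j \hat{\psi}_j (U^{*})_{ji} = \sum_{j,k} \hat{\psi}_k (U^{*})_{kj} \lambda_{ji}.
\]

Multiplying both sides by $U_{il}$ and contracting over the index $i$ yields

\[
\left( -\frac{\hbar^2}{2m}\nabla^2 + q\phi + V_{XC}[n] \right) \sum_{i,j} \hat{\psi}_j (U^{*})_{ji} U_{il} = \sum_{i,j,k} \hat{\psi}_k (U^{*})_{kj} \lambda_{ji} U_{il},
\]

which can be simplified to

\[
\left( -\frac{\hbar^2}{2m}\nabla^2 + q\phi + V_{XC}[n] \right) \hat{\psi}_l = E_l \hat{\psi}_l.
\]

Note that the unitary transformation does not change the energy of the basis. For the sake of simplicity, we do not distinguish between the two basis sets, and we end up with the following generalized Kohn-Sham equation for the Kohn-Sham orbitals:

\[
\left( -\frac{\hbar^2}{2m}\nabla^2 + q\phi + V_{XC}[n] \right) \psi_i = E_i \psi_i, \tag{6.2.10}
\]

where $V_{XC}[n] := \frac{1}{2\psi_i}\frac{\delta E_{XC}[n]}{\delta \psi_i} = \frac{dE_{XC}[n]}{dn}$. The above generalized Kohn-Sham equation can be further written as:

\[
\left( -\frac{\hbar^2}{2m}\nabla^2 + q\phi_{RF} + U_0 \right) \psi_i = E_i \psi_i, \tag{6.2.11}
\]

where $\phi_{RF} = \phi - \phi_0$ is the reaction field potential of the solute in the solvent environment, in which $\phi$ and $\phi_0$ are the electrostatic potentials generated by the solute in the solvent and solute environments, respectively, and $U_0$ is the effective Kohn-Sham potential in the vacuum environment. By denoting $U := q\phi_{RF} + U_0$ as the effective potential of the generalized Kohn-Sham equation, Eq. (6.2.11) can be simplified as:

\[
\left( -\frac{\hbar^2}{2m}\nabla^2 + U \right) \psi_i = E_i \psi_i. \tag{6.2.12}
\]

Remark 6.2.4. Compared to KSDFT in vacuum, in the generalized KSDFT the reaction field energy of the solute-solvent system is added into the Hamiltonian of KSDFT.

6.2.4 Boundary and interface conditions

To make the aforementioned two models well-posed, both boundary and interface conditions need to be specified. In terms of the boundary condition, the following far field boundary condition is generally enforced for the electrostatic potential equation:

\[
\phi(\infty) = 0. \tag{6.2.13}
\]

However, the practical numerical solution of the Poisson equation requires a finite size solute-solvent domain. In this case, the following Debye-Hückel boundary conditions are enforced for the electrostatic equations of the two models, respectively.

- For the model with atomic modeling of the solute molecule, we use the following boundary condition:

\[
\phi(\mathbf{r}) = \sum_{i=1}^{N_m} \frac{q_i}{4\pi\epsilon_s |\mathbf{r}-\mathbf{r}_i|}, \quad \forall \mathbf{r} \in \partial\Omega. \tag{6.2.14}
\]

- For the model with quantum mechanical modeling of the solute molecule, we use the following boundary condition:

\[
\phi(\mathbf{r}) = \int \frac{\hat{\rho}_{\mathrm{total}}(\tilde{\mathbf{r}})}{4\pi\epsilon_s |\mathbf{r}-\tilde{\mathbf{r}}|}\,d\tilde{\mathbf{r}}, \quad \forall \mathbf{r} \in \partial\Omega. \tag{6.2.15}
\]

Because the solute and solvent domains admit different dielectric constants, there is an interface $\Gamma$ between the two domains, which is described by the molecular surface of the solute molecule. Across this interface, the following conditions on the continuity of the electrostatic potential and the electrostatic flux are enforced:

\[
[\phi]\big|_{\Gamma} = \phi_s(\mathbf{r}) - \phi_m(\mathbf{r}) = 0, \quad \forall \mathbf{r} \in \Gamma, \tag{6.2.16}
\]

\[
[\epsilon \nabla\phi\cdot\mathbf{n}]\big|_{\Gamma} = \epsilon_s \nabla\phi_s(\mathbf{r})\cdot\mathbf{n} - \epsilon_m \nabla\phi_m(\mathbf{r})\cdot\mathbf{n} = 0, \quad \forall \mathbf{r} \in \Gamma, \tag{6.2.17}
\]

where $[\cdot]$ denotes the jump of a quantity across the interface $\Gamma$, and $\phi_s$ and $\phi_m$ denote the limits of the electrostatic potential at the interface point taken from the solvent and solute domains, respectively. $\mathbf{n}$ is the outer normal direction on the interface, pointing from the solute domain to the solvent domain.

6.2.5 Numerical Method

Mathematically, the polarizable Poisson model is formed by the following elliptic interface problem

\[
\begin{cases}
-\nabla\cdot\left(\epsilon(\mathbf{r})\nabla\phi(\mathbf{r})\right) = \hat{\rho}_{\mathrm{total}}, & \forall \mathbf{r} \in \Omega \\
\phi_s(\mathbf{r}) - \phi_m(\mathbf{r}) = 0, & \forall \mathbf{r} \in \Gamma \\
\epsilon_s \nabla\phi_s(\mathbf{r})\cdot\mathbf{n} - \epsilon_m \nabla\phi_m(\mathbf{r})\cdot\mathbf{n} = 0, & \forall \mathbf{r} \in \Gamma \\
\phi(\mathbf{r}) = \int \frac{\hat{\rho}_{\mathrm{total}}(\tilde{\mathbf{r}})}{4\pi\epsilon_s |\mathbf{r}-\tilde{\mathbf{r}}|}\,d\tilde{\mathbf{r}}, & \forall \mathbf{r} \in \partial\Omega
\end{cases} \tag{6.2.18}
\]

coupled with the following generalized KSDFT equation

\[
\left( -\frac{\hbar^2}{2m}\nabla^2 + U \right) \psi_i = E_i \psi_i. \tag{6.2.19}
\]

The elliptic interface problem is solved as above, but without the Green's function treatment of the singular charges; instead, the charge is locally projected to the nearest eight grid points in the discretized solute-solvent computational domain. The generalized KSDFT equation is solved by the SIESTA software, with the reaction field energy added into the effective Kohn-Sham potential. The Poisson model and the KSDFT are coupled: the electron density calculated by KSDFT is used as the charge source in the Poisson model and, in turn, the electrostatic potential calculated from the Poisson model is used by the generalized KSDFT for updating the electronic structure. The coupled models therefore need to be solved in a self-consistent manner; the pseudo-code for the self-consistent algorithm is given in Algorithm ??. During the communication between the two solvers, a total charge conservation scheme is utilized, which contains two steps:

- Project the charge density located on each mesh grid of the SIESTA solver to the nearest 8 grid points of the Poisson solver.
- Assemble the projected charges on the mesh grids of the Poisson solver.

Pseudopotentials are used to eliminate the complicated effects of core electrons in the SIESTA solver [224]. For all calculations, the default double-zeta plus single polarization (DZP) basis is used. The cutoff is set to 125 Rydberg, and the local density approximation (LDA) is used to approximate the exchange-correlation potential. The solution method is set to be "diagon".

6.2.6 Reaction Field Energy Calculation

In this part, we formulate the polar free energy, which is also called the reaction field energy in the previous parts. Because different numerical methods are used in the two polar solvation models, the polar solvation free energy is calculated by different formulas.

In the Poisson model, the reaction field potential can be written as $\phi_{RF}(\mathbf{r}) = \phi^{0}(\mathbf{r}) + \tilde{\phi}(\mathbf{r})$; in turn, the polar free energy can be written as:

\[
\Delta G^{p}_{1} = \sum_{i=1}^{N_m} q(\mathbf{r}_i)\,\phi_{RF}(\mathbf{r}_i). \tag{6.2.20}
\]

In the polarizable Poisson model, the reaction field potential can be written as $\phi_{RF}(\mathbf{r}) = \phi_{\mathrm{inhomo}}(\mathbf{r}) - \phi_{\mathrm{homo}}(\mathbf{r})$, where $\phi_{\mathrm{inhomo}}(\mathbf{r})$ is solved from the Poisson model with the following dielectric function:

\[
\epsilon(\mathbf{r}) =
\begin{cases}
\epsilon_s, & \mathbf{r} \in \Omega_s \\
\epsilon_m, & \mathbf{r} \in \Omega_m
\end{cases}
\]

and $\phi_{\mathrm{homo}}(\mathbf{r})$ is solved with the homogeneous permittivity function $\epsilon(\mathbf{r}) = \epsilon_s,\ \forall \mathbf{r} \in \Omega$. In the charge density formalism, the polar free energy can be written as:

\[
\Delta G^{p}_{2} = \int_{\Omega_m} q(\mathbf{r})\,\phi_{RF}(\mathbf{r})\,d\mathbf{r}. \tag{6.2.21}
\]

For brevity, and where no ambiguity arises, we do not distinguish between $\Delta G^{p}_{1}$ and $\Delta G^{p}_{2}$, and denote both as $\Delta G^{p}$.

6.3 Nonpolar solvation models

In our earlier work, the nonpolar solvation free energy was modeled by [270, 45, 43, 44]

\[
\Delta G^{NP} = \gamma A + pV + \rho_0 \int_{\Omega_s} U^{\mathrm{vdW}}\,d\mathbf{r}, \tag{6.3.1}
\]

where $\gamma$ and $A$ are, respectively, the surface tension and the surface area of the solute molecule. Additionally, $p$ and $V$ are, respectively, the hydrodynamic pressure and the surface enclosed volume. Finally, $\rho_0$ is the solvent bulk density, and $U^{\mathrm{vdW}}$ is the van der Waals interaction potential, i.e., the Lennard-Jones potential. The integration is over the solvent domain $\Omega_s$. This nonpolar solvation free energy model was shown to offer excellent predictions of experimental data for various nonpolar molecules [46]. In our recent work, we have demonstrated superb results for a large number of polar and nonpolar molecules when the nonpolar model in Eq. (6.3.1) is combined with a PB model for the polar solvation component [248].

The Lennard-Jones potential offers a physical model for dealing with vdW interactions near the solvent-solute interface. However, a drawback of this nonpolar term is that the probe radius enters the Lennard-Jones potential nonlinearly and cannot be optimized together with the other nonpolar parameters. Mathematically, for a small probe radius, the vdW term for a given atom is proportional to the atomic surface area. Therefore, the atomic surface area approach used by Kollman and co-workers [260] should have a modeling effect similar to that of the Lennard-Jones potential. Based on this observation, we propose the following nonpolar solvation energy model

\[
\Delta G^{NP} = \sum_{\alpha=1}^{N} \gamma_\alpha \mathrm{Area}_\alpha + p\,\mathrm{Vol} = \sum_{\alpha=1}^{N} \gamma_\alpha \sum_{i \in \alpha} \mathrm{Area}_i + p\,\mathrm{Vol}, \tag{6.3.2}
\]

where $N$ is the total number of atomic types in a given solute molecule. Here, $\gamma_\alpha$ and $\mathrm{Area}_\alpha$ are the surface tension and surface area of the $\alpha$th type of atoms, respectively, and $\mathrm{Area}_i$ is the surface area of the $i$th atom in the $\alpha$th type of atoms. In this nonpolar solvation free energy model, the parameters $\gamma_\alpha$ and $p$ need to be learned from a training set.

Both the Lennard-Jones and the atomic-surface-area based nonpolar solvation models are validated in the present work.

6.3.1 Modeling of nonpolar solvation free energy - scoring functional groups

6.3.1.1 Functional group modeling

In our nonpolar solvation model, we assume that molecules sharing the same functional group admit the same set of optimal parameters. A similar approach has been successfully used in the literature [220], including in our own work [46, 248]. In this work, we further incorporate machine learning methods for the nonpolar solvation free energy prediction. To this end, the whole data set, which contains 668 molecules, is partitioned into training sets and testing sets. In the leave-one-out test, at each step we leave only one molecule out as the test set, and all the other molecules are regarded as the training set. In a specific leave-SAMPLx-out study, all the SAMPLx molecules are left out as the testing set, while the remaining molecules are treated as the training set. Furthermore, the training set in this scenario is also divided into two parts: i) the mono-functional group molecules, in which the molecules of a specific functional group are used to train a set of parameters for all similar molecules; and ii) the poly-functional group molecules, in which scoring weights are used to weight the contributions of the parameters of the various functional groups involved. Table 6.1 lists the twenty functional groups used in the training set.

Table 6.1: Functional groups and corresponding number of molecules used in the classification.

Group                     Number   Group                      Number
alkynyl                   8        alkenyl                    38
aldehyde group            11       nitrile group              5
carboxyl                  7        ester group                34
ketone                    23       amino                      35
nitro                     9        alcoholic hydroxyl         33
phenolic hydroxyl         16       ether                      22
alkane                    38       aromatics                  33
nitrogen heterocyclic     19       chlorinated hydrocarbon    53
nitrate                   5        amide                      7
thiol                     4        thioether                  5

We denote by $\{T_1, T_2, \ldots, T_n\}$ a given set of $n$ molecules with a specific functional group from the above 20 groups. For the molecule indexed $j$, the solvation free energy calculated from the polar solvation free energy model, the atomic surface areas and the molecular volume are listed as:

\[
\left\{ \Delta G^{p}_{j}, \mathrm{Area}_{1j}, \mathrm{Area}_{2j}, \ldots, \mathrm{Area}_{Nj}, \mathrm{Vol}_j \right\}, \tag{6.3.3}
\]

where $j = 1, 2, \ldots, n$. In our approach, the solvation free energy is modeled as the sum of the polar and nonpolar parts. Thus, for the $j$th molecule, the corresponding modeled solvation free energy can be expressed as:

\[
\Delta G_j = \Delta G^{p}_{j} + \sum_{\alpha=1}^{N} \gamma_\alpha \mathrm{Area}_{\alpha j} + p\,\mathrm{Vol}_j := \Delta G_j(\mathbf{P}), \tag{6.3.4}
\]

where $\mathbf{P} := \{\gamma_1, \gamma_2, \ldots, \gamma_N, p\}$ is the set of parameters to be optimized.

For a given set of molecules with the same mono-functional group, we denote $\Delta G(\mathbf{P}) := \{\Delta G_1(\mathbf{P}), \Delta G_2(\mathbf{P}), \ldots, \Delta G_n(\mathbf{P})\}$, and the associated experimental solvation free energies are denoted as $\Delta G^{\mathrm{Exp}} := \{\Delta G^{\mathrm{Exp}}_1, \Delta G^{\mathrm{Exp}}_2, \ldots, \Delta G^{\mathrm{Exp}}_n\}$. Then the optimal parameter set can be obtained by solving the following Tikhonov regularized least squares problem (also known as ridge regression), which has a closed form solution:

\[
\arg\min_{\mathbf{P}} \left\{ \| \Delta G(\mathbf{P}) - \Delta G^{\mathrm{Exp}} \|_2^2 + \lambda \| \mathbf{P} \|_2^2 \right\}, \tag{6.3.5}
\]

where $\|\cdot\|_2$ is the $L_2$ norm of the quantity. Here $\lambda$ is the regularization parameter, chosen to be a large number, such as 100, in the present work to ensure the dominance of the first term and to avoid overfitting by controlling the magnitude of $\|\mathbf{P}\|_2$.

6.3.1.2 Scoring the functional groups

Most molecules in the SAMPL blind test sets involve poly-functional groups. In this case, we further employ the poly-functional group molecules in the training set for training the optimal relative scoring weights between different functional groups. From the relative scoring weights, the scoring weights of all the functional groups can be obtained through a simple normalization procedure.

We denote by $\{\tilde{T}_1, \tilde{T}_2, \ldots, \tilde{T}_{n'}\}$ a given set of poly-functional group molecules in the training set that have the same functional groups. The associated optimized parameter sets are $\mathbf{P}_1, \mathbf{P}_2, \ldots, \mathbf{P}_m$, where $m$ is the number of functional groups in this set, and each $\mathbf{P}_i$, $i = 1, 2, \ldots, m$, is learned by solving the regularized optimization problem given by Eq. (6.3.5). For the $j$th molecule in this poly-functional group set, we model its solvation free energy as:

\[
\Delta G_j(\boldsymbol{\omega}) = \sum_{i=1}^{m} \omega_i\,\Delta G_j(\mathbf{P}_i), \tag{6.3.6}
\]

where $\|\boldsymbol{\omega}\|_1 := \sum_{i=1}^{m} \omega_i = 1$, with $\omega_i$ being the scoring weight of the $i$th functional group.

The relative scoring weights for the $m$ functional groups associated with this set of poly-functional group molecules can be learned by solving the following constrained optimization problem,

\[
\arg\min_{\boldsymbol{\omega}} \| \Delta\tilde{G}(\boldsymbol{\omega}) - \Delta\tilde{G}^{\mathrm{Exp}} \|_2, \tag{6.3.7}
\]

subject to

\[
\|\boldsymbol{\omega}\|_1 = 1, \tag{6.3.8}
\]

and

\[
\omega_i \geq 0, \quad \forall i = 1, 2, \ldots, m, \tag{6.3.9}
\]

where $\Delta\tilde{G}(\boldsymbol{\omega})$ and $\Delta\tilde{G}^{\mathrm{Exp}}$ represent, respectively, the predicted and experimental solvation free energies for this group of poly-functional group molecules. Since both the optimization objective given by Eq. (6.3.7) and the constraints in Eqs. (6.3.8)-(6.3.9) are convex with respect to the scoring weights $\boldsymbol{\omega}$, the above constrained optimization can be easily solved via a convex optimization solver in the CVX software package [103, 102].

In the rest of this section, we provide an example to illustrate the functional group based approach for solvation free energy prediction. As shown in Fig. 6.1, the target molecule 2-Chlorosyringaldehyde (Pubchem ID: 53479) contains four different functional groups. We need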
Note that in the above molecular searching scheme, the relative weights for each pair of functional groups can be determined by solving the constrained optimization problem in Eqs. (6.3.7)-(6.3.8) for molecules containing the two corresponding functional groups. From the pairwise relative weights, the functional group scoring of the target molecule is obtained.

Figure 6.1: An illustration of the functional group scoring method for the prediction of the solvation free energy of the molecule 2-Chlorosyringaldehyde (Pubchem ID: 53479), which contains four functional groups: aldehyde group, phenolic hydroxyl, ether and chlorinated hydrocarbon. We compute relative weights between phenolic hydroxyl and chlorinated hydrocarbon; phenolic hydroxyl and ether; phenolic hydroxyl and ester group; and ester group and aldehyde group. Then the relative weights are combined to generate the full set of weights $\omega_1$, $\omega_2$, $\omega_3$ and $\omega_4$ for solvation free energy prediction.

6.3.2 Modeling of nonpolar solvation free energy - nearest neighbor approach

The above functional group based model has been tested and provides very accurate solvation free energy predictions, provided that, for each test molecule, parameters for all of the involved mono-functional groups can be determined and a set of molecules with the same poly-functional groups can be found in the training set as well. However, for some complex molecules that contain functional groups beyond those listed in Table 6.1, this method fails.

Figure 6.2: Thifensulfuron (Pubchem ID: 91729).

For instance, the SAMPL1 test molecule thifensulfuron, shown in Fig. 6.2, has a very complex structure and contains functional groups that cannot be found in Table 6.1. We therefore propose the following ranking algorithm for nearest neighbor searching.

6.3.2.1 Atomic feature based molecular ranking

To deal with general and complex molecules, we propose another approach to rank molecules, and develop a nearest neighbor approach for nonpolar solvation free energy prediction. Our molecular ranking is based on atomic features. Our basic assumption is that if two solute molecules have similar atomic features, their chemical properties are also similar. Therefore, these molecules should share the same parameter set for the nonpolar solvation free energy model. This approach is further described below.

Our method is based on the ansatz that the chemical properties of molecules are mainly determined by atomic features, including structural features and atomic electrostatic features. Among them, atomic structural features include atomic type, atom hybridization state and bonding information. Atomic electrostatic features include atomic charge, atomic dipole, atomic quadrupole and atomic electrostatic solvation free energy. Atomic features are selected based on the criterion that a given atomic feature is retained if it can effectively discriminate the previously mentioned mono-functional groups listed in Table 6.1. Table 6.2 lists all the atomic features that are used for molecule ranking.

Table 6.2: Atomic features used for ranking molecules.

Feature name
Number of atoms
Number of heavy atoms
Number of hydrogen atoms
Number of single bonds
Number of double bonds
Number of triple bonds
Number of aromatic bonds
Number of each type of atoms
Number of sp1 carbon atoms
Number of sp2 carbon atoms
Number of sp3 carbon atoms
Number of sp1 nitrogen atoms
Number of sp2 nitrogen atoms
Number of sp3 nitrogen atoms
Number of sp2 oxygen atoms
Number of sp3 oxygen atoms
Number of sp2 sulfur atoms
Number of sp3 sulfur atoms
Maximum of atomic reaction field energy
Minimum of atomic reaction field energy
Maximum reaction field energy of each type of atoms
Minimum reaction field energy of each type of atoms
Maximum of atomic reaction field energy
Average reaction field energy of each type of atoms
Total absolute charge
Total charge of each type of atoms
Total absolute charge of each type of atoms
Maximum charge of each type of atoms
Minimum charge of each type of atoms
The variation of each type of atomic charges
Maximum of atomic dipole
Maximum of each atomic dipole
Minimum of each atomic dipole
Variation of atomic dipole
Maximum quadrupole
Maximum quadrupole of each type of atoms
Variation of atomic quadrupole
Variation of each type of atom's quadrupole

We also construct atomic features by using statistical variables, i.e., the average, maximum, minimum, and variation of the quantities in Table 6.2. Finally, we rule out redundant features, where redundancy is due to the fact that some atomic features are 100% correlated with each other. In this case, only one of these highly correlated features is retained.

Atomic features are calculated by the following methods. Molecular structural information is parsed by the Open Babel software [169]. Atomic charge, atomic dipole and atomic quadrupole are obtained via the Distributed Multipole Analysis (DMA) method [227], in which the charge density is originally computed by density functional theory with B3LYP and the 6-31G basis in the Gaussian quantum chemistry software [80, 18, 148]. The atomic electrostatic solvation energy is calculated by our in-house MIBPB software [253, 39, 90].

With the above selected atomic features, we measure the similarity of molecules by the Pearson correlation coefficient of their atomic feature vectors. Specifically, we denote the features of two molecules as the vectors $X := \{x_1, x_2, \cdots, x_k\}$ and $Y := \{y_1, y_2, \cdots, y_k\}$; their similarity is then measured by

$$C_{XY} = \frac{\left| \sum_{i=1}^{k} (x_i - \bar{x})(y_i - \bar{y}) \right|}{\sqrt{\sum_{i=1}^{k} (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^{k} (y_i - \bar{y})^2}},$$

where $\bar{x} := \frac{1}{k} \sum_{i=1}^{k} x_i$ and $\bar{y} := \frac{1}{k} \sum_{i=1}^{k} y_i$, and $k$ is the dimension of the feature space. The higher the correlation between two atomic feature vectors, the more similar the two molecules are.

6.3.2.2 Nearest neighbor solvation free energy prediction

By using the above nearest molecule searching procedure, for a given molecule, we obtain the similarity ranking of the remaining molecules with respect to this specific one, as well as the correlations between molecules. In the nearest neighbor prediction of the nonpolar solvation free energy, we learn the parameters in the nonpolar solvation free energy expression, Eq. (6.3.2), from a given number of nearest molecules found in the training set. The number of nearest molecules used is based on the following principles:

All the molecules whose correlation with a given molecule is greater than 0.99 are used for the parameter learning for this given molecule;

If the above criterion yields fewer than 5 molecules, we use the 5 nearest molecules for the parameter learning. Here the 5-molecule nearest neighbor method is found to provide the best leave-one-out prediction.

6.3.3 Protocol for blind solvation free energy predictions

We utilize a unified protocol for the prediction of solvation free energy. First, for the polar solvation free energy $\Delta G^{p}$, we select either the Poisson model with a given point charge force field or the polarizable Poisson model. Second, for the calculation of the nonpolar solvation free energy we note that in general, the functional group
scoring approach can deliver better blind predictions than the nearest neighbor approach. Therefore, in our nonpolar energy prediction step, the functional group scoring method is used whenever it works. Otherwise, we utilize the nearest neighbor approach.

6.4 Results and discussions

6.4.1 Data processing and model validation

6.4.1.1 Data sets and force fields

We consider a total of 668 molecules, whose structures are downloaded from the Pubchem project. The experimental solvation free energies of these molecules are collected from the literature. This data set contains molecules from the SAMPL blind solvation prediction projects, ranging from SAMPL0 to SAMPL4 [184, 105, 86, 87, 106]; the remaining molecules in our data set are collected from the literature [30, 261, 153]. Coincidentally, there is a considerable overlap of our database with Mobley's solvation database, which is available from http://mobleylab.org/resources.html. More precisely, 589 molecules in our database are already covered by Mobley's. Detailed information on our data set is provided in the supporting material.

For both the standard Poisson model and the KSDFT based polarizable Poisson model, we consider four types of atomic radii, namely, the Amber 6, Amber bondi, Amber mbondi2 [35] and ZAP9 [183] parametrizations. Additionally, for the standard Poisson model, three sets of charge assignments, namely, OpenEye-AM1-BCC v1 parameters [123], Gasteiger [85], and Mulliken [35], are tested. Our MIBPB solver [253, 39, 90] is utilized to solve the Poisson interface problem to obtain electrostatic solvation energies. The probe radius is set to 1.4 Å for the ESES surface generation in all PB calculations.

For the KSDFT based polarizable Poisson model, the Poisson interface problem is coupled with the generalized KSDFT in a self-consistent approach. The charge density used by the Poisson interface problem is calculated ab initio by the generalized KSDFT; here, the generalization due to the additional solute-solvent reaction field energy is included in the effective KS potential. The Poisson interface problem is employed for calculating the reaction field energy. This approach is modified from our earlier differential geometry based polarizable PB model [45]. We iteratively couple SIESTA [224] and the MIBPB solver [253, 39, 90], in which a sharp solvent-solute interface is employed. The SIESTA software with the additional reaction field energy is used for the charge density calculation, and our in-house software is used for the reaction field energy calculation. A uniform mesh size of 0.25 Å is used for solving the Poisson equation in both the standard Poisson and polarizable Poisson models. In the polarizable Poisson model, the computed charge densities are mapped to the uniform mesh with conservation of the total charge.

For molecules containing iodine atoms, the current level of DFT method used in this work, including the Gaussian software, cannot handle this element appropriately. Therefore, for a uniform comparison, we ignore molecules containing iodine atoms. There is a similar situation for bromine atoms: there is no appropriate pseudo-potential for this atom in the SIESTA software. Therefore, molecules involving bromine atoms are excluded from our KSDFT based polarizable PB calculations. Table 6.3 lists the four molecules in SAMPLx that are not considered in some of our predictions.

Table 6.3: Molecules in SAMPLx sets involving bromine and/or iodine atoms.

Test set    Molecule
SAMPL0      benzyl bromide
SAMPL1      bromacil
SAMPL2      5-bromouracil, 5-iodouracil
SAMPL3      None
SAMPL4      None

6.4.1.2 Atomic surface area and molecular volume calculation

In our nonpolar solvation free energy model, both atomic areas and the surface enclosed volume are required. In a recently developed Eulerian solvent excluded surface (ESES) package, a second-order accurate numerical scheme for surface area calculations and a third-order accurate numerical scheme for volume estimations have been developed [157]. A weighted Voronoi diagram algorithm is implemented to partition a molecular surface area into atomic surface areas. These schemes have been intensively validated on a large number of test examples. In this work, both atomic areas and molecular volume are calculated directly by using our ESES software package [157].

6.4.1.3 Validation of atomic surface area based nonpolar model

In this work, instead of using the classical nonpolar model that includes the van der Waals interaction between the solvent and the solute [248], as shown in Eq. (6.3.1), we utilize the atomic surface area model given in Eq. (6.3.2) for the corresponding effect. In this part, we consider a set of numerical tests to verify the validity of
this treatment. Our numerical results indicate that the atomic surface area model provides results as good as those obtained by the van der Waals based nonpolar model.

The van der Waals interaction in Eq. (6.3.1) is modeled as a semi-continuous and semi-atomic potential. More specifically, we consider the 6-12 Lennard-Jones potential for modeling the van der Waals interaction between the continuum solvent and atomistic solute molecules [248]. The solvent radius in the Lennard-Jones potential is set to 1.4 Å, which is the same as that used in all other calculations.

Table 6.4 lists the RMSEs with four types of atomic radii and four charge parametrization methods for the prediction of the SAMPL0 solvation free energies. From these results, it is seen that for a given force field parametrization, the atomic surface area approach provides slightly more accurate nonpolar solvation free energy predictions. As shown in our earlier work [248], the performance of the semi-continuous and semi-atomic Lennard-Jones potential is sensitive to the choice of the probe radius, due to its nonlinear dependence on the radius, whereas the SES based atomic surface area approach is less sensitive to the probe radius. We conclude that in general the atomic surface area based nonpolar model is at least as good as the Lennard-Jones potential based one for modeling the nonpolar solvation free energy. Therefore, the atomic surface area based nonpolar model is employed in the rest of this chapter.

Table 6.4: The RMSEs of the solvation free energy prediction with the atomic surface area and van der Waals interaction models of nonpolar solvation free energy for the SAMPL0 test set. The molecule in the SAMPL0 set that contains a Br atom is excluded from this comparison. All results are in units of kcal/mol.

Nonpolar model         Radius          BCC    Mulliken  Gasteiger  SIESTA
Atomic surface area    Amber6          1.30   1.27      1.23       0.99
                       AmberBondi      1.40   1.30      1.30       0.93
                       AmberMbondi2    1.41   1.34      1.33       1.10
                       ZAP9            1.33   1.39      1.37       1.08
van der Waals          Amber6          1.42   1.31      1.34       0.93
                       AmberBondi      1.44   1.30      1.30       1.05
                       AmberMbondi2    1.45   1.38      1.30       1.18
                       ZAP9            1.44   1.39      1.32       1.32

6.4.2 Solvation predictions

6.4.2.1 Leave-one-out prediction

We examine the proposed nearest neighbor approach by a leave-one-out test on 668 molecules. In this examination, we select one molecule at a time and use all other molecules as the training set to predict the selected molecule's solvation free energy. This process is applied systematically to all the molecules in the whole data set of 668 molecules. Four different atomic radius sets are considered, together with four different charge force fields. Our results are illustrated in Fig. 6.3, and the associated values are listed in Table 6.5. In general, all radius sets and charge force fields perform similarly well. The maximum RMSE is below 1.7 kcal/mol for all methods over all molecules. More specifically, the Bondi and Mbondi radii offer the best overall results. Among the charge force fields, AM1-BCC charges appear to provide the best predictions, and their RMSEs are less than 1.5 kcal/mol for all four radius sets.

Table 6.5: The RMSEs of the leave-one-out test of the solvation free energy prediction with different methods, all in units of kcal/mol.

Radius          BCC    Mulliken  Gasteiger  SIESTA
Amber6          1.47   1.49      1.65       1.65
AmberBondi      1.34   1.48      1.66       1.40
AmberMbondi2    1.33   1.49      1.68       1.42
ZAP9            1.33   1.39      1.70       1.53

Figure 6.3: The leave-one-out error of the whole training set with 16 different charge and radius parametrizations. The nearest neighbor approach is employed for solvation free energy prediction.

Figure 6.4: The plot of the optimal leave-one-out test results, where the optimal prediction is achieved when the BCC charge is used in conjunction with either the Amber MBondi2 or the ZAP9 radius. In both cases, the prediction has an RMSE of 1.33 kcal/mol. The correlations between predicted and experimental solvation free energies are 0.956 and 0.955, respectively, for the Amber MBondi2 and ZAP9 force fields. The corresponding R2 values are 0.913 and 0.911, respectively.

The optimal result obtained with AM1-BCC charges and ZAP9 radii has an RMSE of 1.33 kcal/mol. Figure 6.4 plots the optimal results of the leave-one-out test. For a set of 643 molecules, which largely overlaps with the present data set, Mobley and Guthrie reported an RMSE of 1.51 kcal/mol [175]. Our results indicate that the present nearest neighbor approach can achieve highly accurate predictions of the solvation free energies for all the atomic radius and charge force fields.

6.4.2.2 SAMPLx blind predictions

In this section, we present results of the blind solvation free energy predictions based on the proposed protocol. All the SAMPL0-SAMPL4 challenges for solvation free energies are considered. We predict the solvation free energies with a leave-SAMPLx-out approach, in which the SAMPLx molecules and their experimental solvation free energies are regarded as unknown while information from the other molecules is utilized to predict the selected SAMPL test set, based on molecular formulas.

Figure 6.5: The prediction results for the SAMPL0 blind test set. The left chart shows the RMSEs between the experimental and predicted solvation free energies. The right chart shows the optimal predictions of the solvation free energies for the SAMPL0 test set.

Figure 6.6: The prediction results for the SAMPL1 blind test set. The left chart shows the RMSEs between the experimental and predicted solvation free energies. The right chart shows the optimal predictions of the solvation free energies for the SAMPL1 test set.

Figure 6.7: The prediction results for the SAMPL2 blind test set. The left chart shows the RMSEs between the experimental and predicted solvation free energies. The right chart shows the optimal predictions of the solvation free energies for the SAMPL2 test set.

Figure 6.8: The prediction results for the SAMPL3 blind test set. The left chart shows the RMSEs between the experimental and predicted solvation free energies. The right chart shows the optimal predictions of the solvation free energies for the SAMPL3 test set.

Figure 6.9: The prediction results for the SAMPL4 blind test set. The left chart shows the RMSEs between the experimental and predicted solvation free energies. The right chart shows the optimal predictions of the solvation free energies for the SAMPL4 test set.

We first consider the SAMPL0 test set, in which the molecules are diverse. One molecule in this test set contains a Br atom, for which our polarizable PB model does not offer an appropriate pseudo-potential. The structures of the SAMPL0 molecules can be found in the literature [183]. The left chart of Fig. 6.5 shows the RMSEs for each charge and atomic radius combination. It is clear that the change in atomic radii has a minor effect on the accuracy of predictions, while the change in charge force field has a much more significant effect. In other words, the solvation free energy prediction for this test set is more sensitive to the charge parametrization than to the atomic radii parametrization. In general, our KSDFT based polarizable Poisson model provides better predictions than those of the other charge force fields. It is noted that the SIESTA based polarizable Poisson model with Amber Bondi radius parameters delivers the best solvation free energy prediction. This result is depicted in the right chart of Fig. 6.5. The associated RMSE (0.93 kcal/mol) appears to be better than that in the literature [183] (i.e., 1.71 kcal/mol for the full SAMPL0 test set of 17 molecules). The best prediction in the literature has an RMSE of 1.34 kcal/mol [134]. Even though the molecule containing the Br atom is neglected in the SIESTA based ab initio charge calculation, the general pattern is that when any other force field is used, the prediction for that omitted molecule is very good. We believe that with a proper pseudopotential for the Br atom, our polarizable Poisson model can deliver an accurate prediction for this molecule as well. Moreover, for this test set, the other three charge parametrizations all provide a similar level of solvation free energy prediction.

Table 6.6: The RMSEs of the solvation free energy prediction with different methods. The RMSEs inside and outside the parentheses denote the prediction errors including and excluding the molecules that contain a Br atom, respectively. All in units of kcal/mol.

Test set   Radius          BCC          Mulliken     Gasteiger    SIESTA
SAMPL0     Amber6          1.30 (1.26)  1.27 (1.25)  1.23 (1.20)  0.99 (NA)
           AmberBondi      1.40 (1.37)  1.30 (1.27)  1.30 (1.27)  0.93 (NA)
           AmberMbondi2    1.41 (1.37)  1.34 (1.32)  1.33 (1.29)  1.10 (NA)
           ZAP9            1.33 (1.29)  1.39 (1.37)  1.37 (1.33)  1.08 (NA)
SAMPL1     Amber6          3.26 (3.27)  4.74 (4.77)  4.92 (4.96)  2.92 (NA)
           AmberBondi      3.06 (3.07)  4.65 (4.68)  5.52 (5.55)  2.89 (NA)
           AmberMbondi2    3.29 (3.30)  5.39 (5.41)  4.76 (4.82)  2.82 (NA)
           ZAP9            4.26 (4.35)  6.16 (6.16)  5.33 (5.45)  3.76 (NA)
SAMPL2     Amber6          2.09 (2.11)  3.51 (3.59)  4.78 (4.86)  3.46 (NA)
           AmberBondi      1.95 (1.97)  3.38 (3.47)  4.62 (4.72)  1.90 (NA)
           AmberMbondi2    1.90 (1.96)  3.55 (3.66)  4.65 (4.76)  2.35 (NA)
           ZAP9            2.05 (2.03)  3.19 (3.15)  4.56 (4.51)  1.93 (NA)
SAMPL3     Amber6          1.28         1.42         0.97         1.08
           AmberBondi      1.47         1.58         0.82         1.16
           AmberMbondi2    1.47         1.58         0.82         1.16
           ZAP9            1.55         1.28         0.78         1.33
SAMPL4     Amber6          1.28         1.20         1.08         1.41
           AmberBondi      1.12         1.41         1.10         1.07
           AmberMbondi2    1.09         1.33         1.03         1.04
           ZAP9            1.12         1.12         1.09         1.32

We next consider the SAMPL1 challenge set for solvation predictions [105]. This set not only contains 63 molecules, the largest number among all the SAMPL test sets, but also includes many molecules with extremely complicated structures. A detailed description of this set is given in the literature [105]. Most molecules in this set are druggable and very
complex. The difficulty of this test set comes from two aspects. On the one hand, the structures of the molecules in this set are very complicated. On the other hand, the knowledge in the training set that can be used for predicting the solvation free energies of these molecules is rare or insufficient. In addition to the molecular complexity, the reported experimental solvation free energies also carry large uncertainty [105]. Most earlier computational predictions report RMSEs of 3 to 4 kcal/mol when some extremely complex molecules are excluded. One of the best predictions for the whole set has an RMSE of 2.45 kcal/mol [134]. The best performance was shown to give an RMSE of 2.4 kcal/mol on a subset of the SAMPL1 test set that contains only 56 molecules [165].

Figure 6.6 plots the present blind predictions. One molecule (bromacil) that contains a Br atom is considered by all the charge models except for the polarizable Poisson method. It is noted that two of the present approaches, namely, the AM1-BCC semi-empirical charges and the ab initio charges, provide relatively accurate predictions for this test set. When Amber Mbondi2 atomic radii are applied together with the SIESTA charge calculation, the optimal solvation free energy prediction is achieved with an RMSE of 2.82 kcal/mol. When BCC charges and the Amber Bondi radii are used, the prediction for the SAMPL1 set without the molecule that contains a Br atom has an RMSE of 3.06 kcal/mol; the RMSE is 3.07 kcal/mol for the same test with all molecules involved. For this test set, the large prediction RMSE is mainly due to the extremely large error in predicting solvation free energies for some complex molecules, for which the error can be as large as 15 kcal/mol. This unreasonable prediction is due to the fact that an inappropriate molecular force field parametrization yields unreasonable electrostatic solvation free energies, which in turn lead to erroneous solvation free energy predictions. Similar to the SAMPL0 test set, the prediction is more sensitive to the charge parametrization. Nevertheless, the atomic radius parametrization also plays a critical role in the prediction accuracy for this test set.

SAMPL2 is another test set with almost the same level of difficulty as the SAMPL1 test set [137]. Compared with the SAMPL1 test set, it contains a few complex molecules, and most molecules in this test set are drug-like as well. Contrary to the molecules in the SAMPL1 test set, this set has less uncertainty in the experimental solvation free energies. The experimental solvation free energies of this test set are distributed over a wide range. Using all-atom molecular dynamics simulations and multiple starting conformations for blind prediction, Klimovich and Mobley reported an RMSE of 2.82 kcal/mol over the whole set and 1.86 kcal/mol over all the molecules except several hydroxyl-rich compounds [137]. One of the best predictions has an RMSE of 1.59 kcal/mol [134]. In the present work, the molecule containing an I atom (5-iodouracil) is excluded from all calculations. Additionally, 5-bromouracil, which has a Br atom, is excluded from the polarizable Poisson model. The RMSEs from various radius and charge force fields are given in Fig. 6.7. Apparently, this test set has a strong force field dependence as well. The RMSEs vary considerably from one charge force field to another. However, the performance of these predictions has a weak radius dependence. The best predictions, obtained from the combination of Amber MBondi2 radius parameters with the BCC charge force field, or of Amber Bondi radii with the modified SIESTA charge, both have an RMSE of 1.90 kcal/mol when the molecule with a Br atom is excluded from the prediction. When all molecules are included, the combination of the Amber MBondi2 radius and BCC charge parametrization gives the optimal prediction, with an RMSE of 1.96 kcal/mol, as depicted in the right chart of Fig. 6.7. Compared to the predictions for the SAMPL1 test set, these predictions are more accurate, for two reasons. First, the molecules in this set are slightly simpler and the experimental uncertainty is less severe for the deterministic prediction. Second, in our knowledge based models for the nonpolar solvation free energy prediction, we have more similar molecules in the training set, which enables us to obtain better nonpolar solvation free energy parameters in the nearest neighbor based approach. In contrast, in the SAMPL1 prediction, when the nearest neighbor approach is applied, the nearest molecules selected from the training set differ much from the SAMPL1 molecules.

The SAMPL3 test set is a relatively easy one with 36 solute molecules [87]. Its molecular structures are less versatile than those of the earlier test sets. Additionally, in this test set, there is no molecule that involves a Br or I atom. One difficulty in the prediction of this set is the lack of similar molecules in our database when all the SAMPL3 molecules are left out. The best prediction in the literature offers an RMSE of 1.29 kcal/mol [134]. The RMSEs of our blind predictions for the SAMPL3 test set are plotted in Fig. 6.8. In this case, the accuracy of predictions also depends strongly on the charge force field and weakly on the radius parameters. In general, Gasteiger charges are superb for this test set and their RMSEs are always smaller than 1 kcal/mol. Our best prediction, obtained from the combination of ZAP9 radius parameters and the Gasteiger charge force field, has a small RMSE of 0.78 kcal/mol and is depicted in the right chart of Fig. 6.8. In general, the Gasteiger charge performs very poorly in the solvation free energy prediction for the previous tests on complex molecules, while we see that this charge parametrization method is superior for the chlorinated hydrocarbon molecules in the present test set. Unfortunately, there is no uniformly optimal parametrization for all the molecules. This fact motivates us to seek a solvation free energy prediction that does not heavily depend on the force field parametrization.

The SAMPL4 test set is studied by using the proposed methods as well. A key feature of this test set is that its molecules are diversified. This test set also involves a wide range of solvation free energies. However, the structures of these molecules are not as complex as those in both the SAMPL1 and SAMPL2 test sets, which indicates a slightly easier task for solvation free energy prediction. This test set was studied by a number of researchers, and the best prediction in the literature has an RMSE of 1.2 kcal/mol [177]. Figure 6.9 illustrates the RMSEs of our predictions for a total of 16 charge and radius combinations. Compared with those in the literature, all of our predictions are of high quality. Our best result has an RMSE of 1.03 kcal/mol, and the corresponding results are given in the right chart of Fig. 6.9.

Table 6.6 lists the RMSEs of our blind predictions of solvation free energies for all five test sets using a total of 16 charge and radius implementations. Some comments are in order. First, all predictions are quite sensitive to the charge force field and relatively less dependent on the radius parameter selection. Additionally, it is difficult to identify a clear winner over all the test sets: some approaches perform better in one or two test sets, but do not do well in the remaining test sets. This phenomenon
highlights the difficulty of designing optimal models for solvation analysis. Moreover, the KSDFT based polarizable Poisson model more often provides better predictions over most charge force field and radius parameters. This indicates a potential to develop better physical models by improving the quantum charge density calculation. Finally, we point out that the blind prediction results presented in the present work are state-of-the-art compared to those in the literature [183, 177, 106, 205, 176, 173, 207, 174, 81, 105, 86, 87, 134].

6.5 Concluding Remarks

In this work, we proposed a hybrid physical and knowledge based protocol for the blind prediction of solvation free energies. The proposed protocol predicts nonpolar solvation free energies by statistical models while utilizing either the Poisson model or a Density Functional Theory (DFT) based polarizable Poisson model for polar solvation free energy calculations. For the knowledge based modeling of nonpolar solvation free energies, we utilized the assumption that molecules with the same functional group admit the same parametrization of the nonpolar solvation energy functional. For complex poly-functional-group molecules, we develop a scoring procedure to determine the optimal relative weight of each functional group. For extremely complex molecules for which the functional group scoring method fails, we further develop a molecule ranking algorithm to select an optimal set of nearest neighbor molecules for parameter training. We construct atomic features for the molecule ranking. Finally, we systematically integrate the above mentioned models and methods into a robust protocol for blind solvation free energy prediction.

In the present work, we considered an experimental database of 668 solvation molecules, the largest database ever constructed for solvation, to validate our approach. Among them, the SAMPL0 to SAMPL4 test sets are paid special attention. For the Poisson model or the DFT based polarizable Poisson model, four sets of atomic radius parameters (i.e., Amber 6, Amber bondi, Amber mbondi2 and ZAP9 radii) are combined with four sets of charge force fields (i.e., AM1-BCC, Mulliken, Gasteiger and SIESTA DFT) to arrive at a total of 16 different implementations. The resulting polar solvation free energies are utilized in our statistical approaches for blind predictions.
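The leave-one-out validation used in this chapter (Section 6.4.2.1) can be sketched generically. This is an illustrative sketch only: the `predict` callback is a hypothetical stand-in for whichever model is fitted on the remaining molecules (for example, the nearest neighbor approach), and `mean_predictor` is a toy baseline included to make the example runnable, not a method from this work.

```python
import numpy as np

def leave_one_out_rmse(features, energies, predict):
    """RMSE of leave-one-out predictions.

    features: (n_molecules, n_features) array of molecular descriptors.
    energies: (n_molecules,) array of experimental solvation free energies.
    predict:  callable(train_x, train_y, target_x) -> predicted energy
              (hypothetical interface chosen for this sketch).
    """
    features = np.asarray(features, dtype=float)
    energies = np.asarray(energies, dtype=float)
    n = len(energies)
    errors = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i          # hold out molecule i
        pred = predict(features[mask], energies[mask], features[i])
        errors[i] = pred - energies[i]
    return float(np.sqrt(np.mean(errors ** 2)))

def mean_predictor(train_x, train_y, target_x):
    """Toy baseline: predict the mean energy of the training molecules."""
    return float(np.mean(train_y))
```

Any model that follows the same three-argument interface can be passed as `predict`, so the 16 charge and radius combinations could be compared by calling `leave_one_out_rmse` once per combination.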
We carry out the leave-one-out validation of the whole database. The AM1-BCC charge force field delivers a low RMSE of 1.33 kcal/mol, which is the lowest for such a large test database, to the best of our knowledge. We further conduct a series of leave-SAMPLx-out blind tests. On average, the BCC parametrization in the Poisson model and the DFT based polarizable Poisson model perform better than the other charge force fields, especially for predicting the solvation free energies of complex molecules. We obtain some of the best known results. The optimal RMSEs for SAMPL0-SAMPL4 are, respectively, 0.93, 2.82, 1.90, 0.78, and 1.03 kcal/mol, which again are some of the best to the best of our knowledge.

From the solvation free energy predictions, particularly on the SAMPL1 and SAMPL2 test sets, we conclude that atomic charge parametrization is extremely important for the present physical models, namely, the Poisson model and the KSDFT based polarizable Poisson model. Without an appropriate charge parametrization, the prediction errors can be significant for molecules with complex structures. In general, among the four charge parametrization methods, both the semi-empirical BCC charge and the ab initio charge calculation from the generalized KSDFT can provide relatively reliable charge assignments. For this reason, a solvation free energy prediction method that does not heavily depend on the molecular parametrization is under our consideration. The essential idea is that if one does not partition the solvation free energy into two isolated parts, the prediction errors in the polar solvation free energy will not be propagated to the nonpolar solvation free energy prediction. Alternatively, our differential geometry based solvation model, which dynamically couples the polar and nonpolar models [270, 43], might also provide a less sensitive approach.

This work can be improved in a number of ways. First, the classification of the database into functional groups is not unique. Future studies will explore optimal molecular partitioning. Additionally, the selection and computation of atomic features need to be further investigated. It is possible to construct an optimal set of atomic features for solvation analysis and prediction. Further, in the current work, the molecular ranking for nearest neighbor searching is not yet optimal. More sophisticated machine learning and/or deep learning algorithms can be developed for this purpose. Finally, more versatile DFT solvers can be utilized to further improve our in-house polarizable Poisson model.

Chapter 7

Learning to Rank for Solvation Prediction

7.1 Introduction

In the last chapter, we reviewed many physics based solvation models. Besides the physical solvation models, there are knowledge based models, which are based on the combination of statistical learning theory with chemical and biological intuition. Many knowledge based models have been presented in the literature for solvation free energy prediction. For instance, the solvation free energy has been modeled by the atomic solvent accessible surface area [260]; another typical model is the movable type model, in which the solvation free energy itself is decomposed into contributions from atomic solvation free energies. A partition function is trained on the whole molecule database for atomic solvation free energy modeling [290]. There are many other interesting knowledge based solvation models; a comprehensive review of these models is beyond the scope of this work.

In our previous work [256], we proposed a hybrid physical and statistical model for solvation free energy prediction, in which we follow the classical polar and nonpolar decomposition of the whole solvation free energy. For the polar solvation free energy prediction, we applied the Poisson model or the polarizable Poisson model with different atomic parametrizations, especially charge models ranging from semi-empirical to ab initio calculation. For the nonpolar solvation free energy prediction, we modeled it by the molecular solvent excluded atomic surface area and the volume enclosed by the molecular surface. We follow the classical assumption that, in the nonpolar model, molecules of different functional groups have different sets of parameters. However, this approach only works for simple monofunctional group molecules; for polyfunctional group molecules, the set of molecules sharing the same functional groups is insufficient for accurate solvation free energy prediction. To make the method applicable to arbitrary complex molecules, we further introduced the nearest neighbor searching algorithm to detect the closest set of molecules to a given target molecule, where the similarity of the molecules is measured by the cosine similarity between the atomic features of the molecules.
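The similarity search recalled above can be sketched as follows. This is a minimal sketch under stated assumptions, not the code used in this work: the feature vectors are hypothetical, the similarity is the absolute Pearson correlation $C_{XY}$ of Section 6.3.2.1 (which equals the absolute cosine similarity of the mean-centered feature vectors), and the selection rule (all molecules with correlation above 0.99, but at least 5) follows Section 6.3.2.2.

```python
import numpy as np

def cosine(x, y):
    """Cosine similarity of two vectors (0.0 if either vector is zero)."""
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom > 0 else 0.0

def abs_pearson(x, y):
    """Absolute Pearson correlation, i.e. |cosine| of mean-centered vectors."""
    return abs(cosine(x - x.mean(), y - y.mean()))

def select_neighbors(target, training, threshold=0.99, min_neighbors=5):
    """Indices (most similar first) and similarities of the molecules used
    for parameter learning.

    Keeps every training molecule whose correlation with the target exceeds
    `threshold`; if fewer than `min_neighbors` qualify, keeps the
    `min_neighbors` most similar molecules instead.
    """
    sims = np.array([abs_pearson(target, row) for row in training])
    order = np.argsort(-sims, kind="stable")      # descending similarity
    n_above = int(np.sum(sims > threshold))
    n_keep = min(max(n_above, min_neighbors), len(order))
    return order[:n_keep], sims[order[:n_keep]]
```

In practice the columns of the feature matrix have very different scales (atom counts versus charges), so some normalization of the features before computing correlations may be needed; that choice is left open here.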
To make the nearest neighbor searching effective, we carefully selected atomic features that can distinguish molecules with different functional groups. The proposed nearest neighbor searching method is consistent with the classical functional group approach, because the nearest molecules found for a monofunctional group molecule have the same functional group as the target molecule. Besides this basic consistency in terms of chemical properties, i.e., the nearest neighbors have the same functional group as their central molecule, we further noticed that the nearest neighbors have solvation free energies quite close to that of their central molecule. The proposed protocol can provide accurate solvation free energy predictions provided the molecules are properly parameterized. However, when the molecular charge assignment is inappropriate, the error can be extremely large. In other words, the prediction by the previous model is sensitive to the molecular force field assignment.

This motivates us to seek a method that is insensitive, or at least much less sensitive, to the molecular parametrization. By an analysis of the model, we found that the main source of the large prediction error is the decoupling of the polar and nonpolar solvation free energies. In this approach, once the molecule is not properly parametrized, the Poisson model or its polarizable counterpart will lead to a huge error in the electrostatic calculation, which will be further propagated to the nonpolar solvation prediction.

Motivated by our previous work, we propose a new solvation free energy modeling paradigm. In this model, instead of treating the solvation free energy as two separate parts, i.e., polar and nonpolar parts, we consider the solvation free energy as a whole. The solvation free energy itself is modeled as a function of molecular descriptors, where these descriptors describe the molecule at the atomic level instead of at the molecular level; the fine scale enables a more accurate description and, more importantly, the molecular physical properties are captured by these descriptors. Based on the above assumption, we introduce a local solvation free energy learning framework. The procedure can be divided into two stages: first, applying Learning To Rank (LTR) theory for nearest neighbor searching; second, a local hyperplane approximation of the solvation free energy function, where the hyperplane is learned by regularized least squares based on the nearest neighbor information of a given target molecule.

This chapter is structured as follows. Section 7.2 is devoted to methods and algorithms. We provide a brief review of our previous nearest neighbor searching algorithm in Section 7.2.1, followed by the basic ansatz for building the solvation model in this work. The LTR based nearest neighbor searching method is described in Section 7.2.2, which is divided into three parts: i) query construction, which incorporates the previous nearest neighbor searching results; ii) feature selection; and iii) LTR for molecular neighbor detection. The nearest neighbor information based algorithm for solvation free energy prediction is presented in Section 7.2.3. Section 7.3 presents numerical results and discussions. After describing the data sets and force fields in Section 7.3.1, we present the leave-one-out validation of the proposed model in Section 7.3.2. We demonstrate that the present model is not very sensitive to atomic force field parametrization. SAMPLx blind challenges are presented in Section 7.3.3. Some of the best results in solvation free energy prediction are obtained.

7.2 Methods and algorithms

7.2.1 Basic assumptions

As mentioned above, our earlier HPK model suffers from its sensitivity to charge and radius parametrizations. A major goal of the present work is to develop a model that is less sensitive, or not sensitive, to feature parametrizations.

We have observed that the solvation free energy of a target molecule is quite close to that of its nearest neighbors. Figure 7.1 depicts the correlation between the experimental solvation free energy of a molecule and that of its nearest neighbors. The RMSEs of solvation free energies between molecules and their first and second nearest neighbors are 1.44 and 1.77 kcal/mol, respectively.

Figure 7.1: The plot of solvation free energies of the central molecules and their neighbor molecules; the left chart is for the nearest neighbor, the right chart for the second nearest neighbor. In both cases, the horizontal axis represents the solvation free energy of the central molecule, and the vertical axis represents that of the neighbor molecule.

Motivated by the above observation, we assume that there exists a feature vector to uniquely characterize and distinguish each s
olutemolecule.Moreover,weassumethatthesolvationfreeenergyofagivenmoleculeisafunctionalofitsatomicfeaturevector.GA=f(xA)(7.2.1)whereGAisthesolvationfreeenergyofsoluteA,fisaunknownfunctionalformodelingtherelationshipbetweensolvationfreeenergyandmoleculardescriptors,oratomicfeatures,andxAisthefeaturevectorofthegivensolutemolecule.Finally,weassumethatsimilarmoleculeshavesimilarsolvationfreeenergies.Itiswellknownthatforagivenfunctionwithsuitableregularity,itsorderTay-lorpolynomialprovidesaverygoodapproximationofthefunction,locally.Forthesol-vationfreeenergyfunctionGA=f(xA),itcanbelocallyapproximatedbyGAˇrf(xA0)(xAxA0)+GA0aroundthemolecularatomicfeaturevectorxA0andsolvationfreeenergyGA0.Foragivenmolecule,wecanpredictitssolvationfreeenergyinthefollowingfeaturing,learning-to-rankandlearning-to-predictprotocol:constructfeaturevectorsforallmolecules,includingthetargetone;nearestneighbormoleculestothetargetmoleculeinthedatabasewiththeknownsolvationfreeenergiesbyusingalearning-to-rankalgorithm;learnthelinearfunctionalrelationbetweenthesolvationfreeenergyandfeaturesac-cordingtoagroupofnearestneighbormolecules,andthenpredictthesolvationfreeenergyforthetargetmoleculebythelinearfunctional.Thefundamentalimportantansatzinthisworkisthatsimilarmoleculeshaveclosesolvation174freeenergies,whichincontrasttotheansatzusedinourHPKmodel:similarmoleculessharethesamesetofparametersinthenonpolarsolvationmodeling.Therefore,thesolvationfreeenergyisnolongermodeledbydecoupledpolarandnonpolarparts.Asaresult,thepresentmethodisnotverysensitivetochargeandradiusparametrization.Inthefollowingsections,weprovidedetaileddescriptionsonfeatureselection,nearestneighborsearchingalgorithmbasedonLTRandsolvationfreeenergyprediction.7.2.2Learning-to-rankalgorithmInthissection,weintroducetheLTRmethod.ThelistwiseLTRalgorithmisemployedtorankmolecules.InthetrainingprocedureoftheLTRmodel,thesolvationfreeenergyofmoleculeisusedasthemolecularlabel,whichisconsistentwithourbasicansatz.AscoringfunctionislearnedinthelistwiseLTRm
ethodonthesetoftrainingmoleculesandisutilizedforrankingthemoleculesinthesetoftestingmolecules.Thenearestneighborsearchingcanberegardedasatop-Nrecommendationproblem,whichismathematicallythesameastheitemsearchingintheworld-wide-web.7.2.2.1QueryconstructionInourLTRmodel,theuseofsolvationfreeenergyasalabel,basedonouransatzthatsimilarmoleculeshavesimilarsolvationfreeenergies,automaticallyimpliesthatsimilarsolvationfreeenergiesindicatesimilarmolecules.This,however,isnotreasonableingeneral.TocircumventthisinourLTRmodelfornearestneighborsearching,weneedtopartitionthewholedataset(whichcontainsatotal668moleculesasdescribedlater)intoanumberofsubsets,whereeachsubsetisregardedasaqueryintheLTRterminology.Thebasicrequirementonthequeryconstructionisthatmoleculesineachqueryshouldhave175somechemicalsimilarity.Additionally,werequirethateachqueryisinvarianttothenearestmoleculedetectionbasedonthecosinesimilarityoftheatomicfeaturesproposedinourearlierwork[256].Toachievethis,themoststraightforwardapproachisthateachqueryofmoleculescontainsthesamefunctionalgroup.However,thecomplexityofthemoleculesinthedatasetmakesthispartitionimpractical.Adirectrelaxationisthateachqueryofthemoleculeshavethesameelementtypes,whichwillbeusedforqueryconstructioninthiswork.Weconstructsixgroupsofmolecules,theyareformedbythemoleculeswithelementtype:i)H,C;ii)H,C,O;iii)H,C,N/H,C,N,O;iv)H,C,Cl;v)H,C,O,Cl;andvi)H,S,respectively.ThethirdgroupcontainsmoleculeseitherwithH,C,andNelementsorwithH,C,N,andO.Thisisduetothefactthatbasedonthecosinesimilarity,moleculesinthesetwocategoriesmayhavetheirnearestneighborsoverlapped.Fortheremainingmoleculesweiterativelyaddthemintotheabovesixgroupsbasedontheirnearestneighbor'sclasslabel.Themoleculesthatcannotbeintoanyoftheabovecategoryareregardedasanewquery.Letlabelmoleculesinthedatasetfrom1to668.Figure7.2showsthatthequeriesconstructedbasedontheaboveprocedureareinvarianttothenearestneighborsearchingbasedonthemeasureproposedinourearlierwork[256],whereeachblockdenotesaqueryofmolecules.Itis
easy to see that the nearest neighbors of the molecules are localized within each block. This invariance indicates that our query construction preserves the molecular chemical similarity, i.e., the molecules in each query are similar in the physical sense. Based on the above query construction, we can approximately regard close solvation free energies as indicating similar molecules within each query, which makes the LTR based nearest neighbor searching physically sound. We list all the queries in the Supplementary material.

Figure 7.2: Localization of nearest neighbor molecules. The horizontal axis stands for the index of a target molecule and the vertical axis denotes the index of the nearest neighbor of the target molecule. Each block contains a query of molecules.

Since our partition of the dataset is based on similarity with chemical constraints, we first discuss the similarity measure, the atomic feature selection based on chemical and physical properties that facilitate the measure, and the LTR algorithm for ranking the molecules. For nearest neighbor searching in each query, we emphasize that the nearest neighbor is measured by the closeness of the solvation free energies, instead of the similarity measure used before.

7.2.2.2 Feature selection

A fundamental assumption of our approach is that there exists a feature vector that can uniquely characterize and distinguish one molecule from another. Obviously, finding such a feature vector is one of the most important tasks. In our previous work [256], the goal of the feature selection was to find the closest molecule to a given target molecule in the sense of functional group similarity, so we selected features that can distinguish molecules with different functional groups, and designated molecules having the same functional groups as similar. Nevertheless, this may not be suitable for the fundamental assumption of this work, namely, that similar molecules have similar solvation free energies. The desired features should reflect the similarity in solvation free energy. To this end, we select those features whose Pearson correlations with the solvation free energies are larger than 0.65 or less than -0.65. Based on this criterion, we select the features listed in Table 7.1.

Table 7.1: Atomic features used for LTR nearest neighbor searching.

Feature name
Sum of atomic reaction field energy
Sum of absolute value of atomic reaction field energy
Sum of H atomic reaction field energy
Sum of absolute value of H atomic reaction field energy
Sum of O atomic reaction field energy
Sum of absolute value of O atomic reaction field energy
Minimum value of atomic reaction field energy
Maximum of the absolute value of reaction field energy
Minimum value of H atomic reaction field energy
Maximum of the absolute value of H atomic reaction field energy
Average value of atomic reaction field energy
Average of absolute value of atomic reaction field energy
Variation of atomic reaction field energy
Variation of the absolute value of reaction field energy
Variation of H atomic reaction field energy
Variation of absolute value of H atomic reaction field energy
Sum of absolute value of atomic charge
Sum of H atomic charge
Sum of absolute value of H atomic charge
Sum of O atomic charge
Sum of absolute value of O atomic charge
Minimum of atomic charge
Maximum of absolute value of atomic charge
Maximum of H atomic charge
Maximum of absolute value of H atomic charge
Average of absolute value of atomic charge
Variation of the atomic charge
Variation of the absolute value of atomic charge
Variation of the absolute value of H atomic charge
Variation of H atomic dipole

Mathematically, for a given molecule $A$, the sum of atomic reaction field energy is defined as

\[ \Delta G_{\mathrm{rf}} = \sum_{i=1}^{N_m} \Delta G_{\mathrm{rf},i}, \]

which is the same as the electrostatic solvation free energy of the solute molecule $A$, where $N_m$ is the number of atoms in solute $A$ and $\Delta G_{\mathrm{rf},i}$ is the reaction field energy contributed by the $i$th atom. The sum of the absolute value of atomic reaction field energy is

\[ \Delta G_{\mathrm{rf}}^{\mathrm{abs}} = \sum_{i=1}^{N_m} \left| \Delta G_{\mathrm{rf},i} \right|. \]

The other features can be defined mathematically in the same manner.

The above atomic features are calculated as follows. Atomic charges and dipoles are computed by using quantum mechanical theory. Atomic reaction field energies are computed by using PB theory. The maximum, minimum, sum, mean, and variance are computed by straightforward statistics.

Figure 7.3 plots some representative features against experimental solvation free energies. From left to right, the three charts show the correlations of experimental solvation free energies with the total reaction field energies, the absolute value of the mean reaction field energies of all atoms, and the absolute value of the total reaction field energy of hydrogen atoms, respectively. Their Pearson correlations are 0.87, -0.76, and -0.80, respectively.

Figure 7.3: Correlations between features and experimental solvation free energies. The horizontal axes represent the experimental solvation free energies. From left to right, the vertical axes of the three charts represent the total reaction field energies, the absolute value of the mean reaction field energies of all atoms, and the absolute value of the total reaction field energy of hydrogen atoms, respectively.

In many physics based solvation models, the nonpolar solvation free energy is usually modeled by the solvent excluded surface area, the volume enclosed by the molecular surface, and the van der Waals interaction. These terms are usually highly correlated with the nonpolar solvent-solute interaction. However, we note that they are not highly correlated with the total solvation free energy. Some typical terms are depicted in Fig. 7.4. Here the van der Waals interactions of the H and C atoms are calculated in the same way, with exactly the same parameters, as in [256], except that the surface tension parameters are ignored, which does not matter for the correlation measure. The correlations between the solvation free energy and these four features are -0.27, -0.26, 0.02, and -0.60, respectively. Although the correlation for the van der Waals interaction between the solvent and the H atoms of the solute molecule is relatively high, our tests show that adding this feature to our LTR ranking feature set does not affect the ranking results. We emphasize again that not adopting nonpolar solvation free energy related features for LTR does not mean that these features are irrelevant to the solvation free energy; they are simply not good molecular descriptors for the purpose of LTR.

Figure 7.4: Correlations between features and experimental solvation free energies. The horizontal axes represent the experimental solvation free energies. From left to right and top to bottom, the vertical axes of the four figures represent the solvent excluded surface area, the volume enclosed by the solvent excluded surface, and the van der Waals interaction of H and O atoms, respectively.

Remark 7.2.1. The high correlation between the reaction field energy calculated by the PB model and the solvation free energy indicates that the PB model is an effective approach for modeling solvation effects: the reaction field energy calculated by the PB model is consistent with the experimental solvation free energy.

7.2.2.3 LambdaMART for ranking

In general, LTR algorithms can be classified into three categories, i.e., pointwise, pairwise, and listwise approaches. Among them, listwise LTR algorithms are commonly regarded as the most advanced ones. In our model, each query of molecules forms a list, and thus with an accurate listwise LTR algorithm, the nearest neighbor of a given target molecule can be found accurately. In this work, we specifically select the state-of-the-art listwise ranking algorithm, LambdaMART, for ranking molecules in different queries. In this part, we provide a brief introduction to the LambdaMART algorithm. We also discuss how to apply the LambdaMART algorithm to our solvation modeling.

7.2.2.3.1 Information retrieval measures

LambdaMART is the boosted tree version of the LambdaRank algorithm, where lambda is the physical gradient that makes gradient descent applicable to the solution of the LTR problem. MART (multiple additive regression tree, also named Gradient Boosted Regression Tree (GBDT)) is a boosted tree for fitting the ranking score of the training set. In this part, we give a short review of the LambdaMART method. For more details about this algorithm, the reader is referred to the literature [26,78,27,25].

First, let us introduce the measures used in information retrieval research. Several ranking quality measures are used in this field, such as Mean Reciprocal Rank (MRR), Mean Average Precision (MAP), Expected Reciprocal Rank (ERR), and Normalized Discounted Cumulative Gain (NDCG). Among them, NDCG and ERR have the advantage of handling multiple levels of relevance, while MAP and MRR are suitable for binary relevance levels. The ranking measure frequently used for the LambdaMART algorithm is NDCG. To define NDCG, we first give the definition of the discounted cumulative gain (DCG) for a given set of search results:

\[ \mathrm{DCG@}T := \sum_{i=1}^{T} \frac{2^{l_i} - 1}{\log(1+i)}, \tag{7.2.2} \]

where $T$ is the truncation level and $l_i$ is the label of the $i$th listed molecule. We typically use five levels of relevance: $l_i \in \{0, 1, 2, 3, 4\}$. The NDCG is the normalized version of DCG,

\[ \mathrm{NDCG@}T := \frac{\mathrm{DCG@}T}{\max \mathrm{DCG@}T}, \tag{7.2.3} \]

where the denominator is the maximum DCG@T attainable for the query, so that $\mathrm{NDCG@}T \in [0, 1]$.

7.2.2.3.2 LambdaRank

The main drawback of the above ranking quality measures is that their gradient is everywhere either zero or not defined. This makes gradient descent type optimization techniques fail when applied directly to ranking problems. The key idea of LambdaRank is to introduce a physical gradient that directly optimizes the ranking objective function. In this part, we introduce the principle of the LambdaRank algorithm.

Consider two arbitrary molecules $M_i$ and $M_j$ in a given query. Let $\mathbf{x}_i$ and $\mathbf{x}_j$ be their feature vectors, and $y_i$ and $y_j$ be the corresponding solvation free energies, respectively. For notational simplicity, let $s_i := F(\mathbf{x}_i)$ and $s_j := F(\mathbf{x}_j)$ be the model learned solvation free energies of the $i$th and $j$th molecules, respectively. Further, we define the learned probability that $M_i$ should have a larger solvation free energy than $M_j$ to be

\[ P_{ij} := P(M_i \triangleright M_j) := \frac{1}{1 + e^{-(s_i - s_j)}} := \frac{1}{1 + e^{-\sigma_{ij}}}, \qquad \sigma_{ij} := s_i - s_j, \tag{7.2.4} \]

which models the probability by a simple sigmoid function. The associated known probability that $M_i$ should have a larger solvation free energy than $M_j$ is

\[ \bar{P}_{ij} := \frac{1 + S_{ij}}{2}, \tag{7.2.5} \]

where

\[ S_{ij} := \begin{cases} 0, & \text{for } y_i = y_j, \\ 1, & \text{for } y_i > y_j, \\ -1, & \text{for } y_i < y_j. \end{cases} \]

[...]

\[ C_{ij} := \begin{cases} |\Delta Z_{ij}| \log\left(1 + e^{-\sigma_{ij}}\right), & \text{for } S_{ij} = 1, \\ |\Delta Z_{ij}| \log\left(1 + e^{-\sigma_{ji}}\right), & \text{for } S_{ij} = -1, \end{cases} \tag{7.2.8} \]

where $\Delta Z_{ij}$ is the change of NDCG@T caused by swapping the rank positions of $M_i$ and $M_j$. It should be pointed out that the above target function incorporates both a ranking quality measure and a cross entropy cost function.

From now on, we assume that all $(i,j) \in I$, where $I := \{(i,j) \mid S_{ij} = 1\}$ is the set of index pairs $(i,j)$ for which $M_i$ is desired to have a larger solvation free energy than $M_j$. Thus, the expression of $C_{ij}$ can be simplified as

\[ C_{ij} := |\Delta Z_{ij}| \log\left(1 + e^{-\sigma_{ij}}\right). \tag{7.2.9} \]

By treating $\Delta Z_{ij}$ as a constant, we have

\[ \frac{\partial C_{ij}}{\partial s_i} = \frac{\partial C_{ij}}{\partial \sigma_{ij}} = \frac{-|\Delta Z_{ij}|}{1 + e^{\sigma_{ij}}} := -|\Delta Z_{ij}| \rho_{ij} = -\frac{\partial C_{ij}}{\partial \sigma_{ji}} = -\frac{\partial C_{ij}}{\partial s_j}. \]

We define the following $\lambda$-gradient:

\[ \lambda_{ij} := \frac{\partial C_{ij}}{\partial \sigma_{ij}} = \frac{-|\Delta Z_{ij}|}{1 + e^{\sigma_{ij}}} = -|\Delta Z_{ij}| \rho_{ij}. \tag{7.2.10} \]

Then we have $\partial C_{ij} / \partial s_i = \lambda_{ij}$, and the second gradient is

\[ \frac{\partial^2 C_{ij}}{\partial s_i^2} = \frac{\partial \lambda_{ij}}{\partial s_i} = \frac{|\Delta Z_{ij}| e^{\sigma_{ij}}}{\left(1 + e^{\sigma_{ij}}\right)^2} = |\Delta Z_{ij}| \rho_{ij} (1 - \rho_{ij}). \tag{7.2.11} \]

For a given query, assembling the pairwise costs yields the total cost function

\[ C = \sum_i \sum_{j:(i,j) \in I} C_{ij} + \sum_i \sum_{j:(j,i) \in I} C_{ji} = \sum_i \sum_{j:(i,j) \in I} |\Delta Z_{ij}| \log\left(1 + e^{-\sigma_{ij}}\right) + \sum_i \sum_{j:(j,i) \in I} |\Delta Z_{ji}| \log\left(1 + e^{-\sigma_{ji}}\right). \tag{7.2.12} \]

The first order gradient of $C$ is

\[ \frac{\partial C}{\partial s_i} = \sum_{j:(i,j) \in I} \frac{\partial C_{ij}}{\partial s_i} + \sum_{j:(j,i) \in I} \frac{\partial C_{ji}}{\partial s_i} = \sum_{j:(i,j) \in I} \left(-|\Delta Z_{ij}| \rho_{ij}\right) + \sum_{j:(j,i) \in I} \left(|\Delta Z_{ji}| \rho_{ji}\right) = \sum_{j:(i,j) \in I} (\lambda_{ij}) + \sum_{j:(j,i) \in I} (-\lambda_{ji}). \tag{7.2.13} \]

Thus, we define the $\lambda$-gradient for a given molecule $M_i$ in the query as

\[ \lambda_i := \sum_{j:(i,j) \in I} \lambda_{ij} + \sum_{j:(j,i) \in I} \lambda_{ij} = \sum_{j:(i,j) \in I} \lambda_{ij} - \sum_{j:(j,i) \in I} \lambda_{ji}. \tag{7.2.14} \]

Hence, we simply have

\[ \frac{\partial C}{\partial s_i} = \lambda_i. \tag{7.2.15} \]

Furthermore, it is easy to verify the following second order gradient expression:

\[ \frac{\partial^2 C}{\partial s_i^2} = \frac{\partial \lambda_i}{\partial s_i}. \tag{7.2.16} \]

7.2.2.3.3 Gradient boosting

Before introducing the MART, we first present the principle of gradient boosting. We consider the loss function $L = L(y, F)$ of the input score $y$ and the model function $F = F(\mathbf{x})$. The goal of gradient boosting is to minimize the loss function

\[ \min_F L(y, F). \tag{7.2.17} \]

Borrowing the idea from classical gradient descent, gradient boosting iteratively updates the model function $F$ in a given functional space,

\[ F_{n+1} = F_n + \rho f_n(\mathbf{x}), \tag{7.2.18} \]

where $f_n(\mathbf{x})$ is a model from the given functional space, i.e., a regression tree for the MART, that fits the residual $\{\tilde{y}_i\}$:

\[ \tilde{y}_i = -\left. \frac{\partial L(y_i, F)}{\partial F} \right|_{F = F_n}. \tag{7.2.19} \]

The shrinkage parameter $\rho$ is obtained by solving the optimization problem

\[ \rho = \arg\min_{\rho} L\left(y, F_n + \rho f_n(\mathbf{x})\right) \tag{7.2.20} \]

via a simple line search algorithm.

7.2.2.3.4 Gradient boosted regression trees

In the aforementioned gradient boosting algorithm, if we select the functional space to be the space of regression trees, the result is a GBDT algorithm. Mathematically, the regression tree is formulated as

\[ f(\mathbf{x}) = f\left(\mathbf{x}; \{\gamma_j, R_j\}_1^J\right) = \sum_{j=1}^{J} \gamma_j \mathbf{1}(\mathbf{x} \in R_j), \tag{7.2.21} \]

where $J$ is the number of leaves and $\{R_j\}_1^J$ are disjoint regions that cover the whole feature space, each of them being covered by one leaf. Here $\{\gamma_j\}$ are the values in the corresponding leaves. In this case, we construct $\{R_j\}_1^J$ to fit $\{\mathbf{x}_i, y_i\}$ by a least squares algorithm. According to Eq. (7.2.18), we have

\[ F_{n+1} = F_n + \sum_{j=1}^{J} \gamma_j \mathbf{1}(\mathbf{x} \in R_j). \]

Additionally, Eq. (7.2.20) indicates that

\[ \gamma_j = \arg\min_{\gamma} \sum_{i: \mathbf{x}_i \in R_j} L\left(y_i, F_n(\mathbf{x}_i) + \gamma\right) := \arg\min_{\gamma} \sum_{i: \mathbf{x}_i \in R_j} g, \tag{7.2.22} \]

where $g = g(y_i, F_n(\mathbf{x}_i)) = L(y_i, F_n(\mathbf{x}_i) + \gamma)$. When there is no closed form solution to Eq. (7.2.22), we can approximate it by a single Newton step:

\[ \gamma_j = -\frac{\sum_{i: \mathbf{x}_i \in R_j} \partial g / \partial F_n}{\sum_{i: \mathbf{x}_i \in R_j} \partial^2 g / \partial F_n^2}. \tag{7.2.23} \]

7.2.2.3.5 LambdaMART

The goal of the LambdaMART is to maximize

\[ \max_F C, \tag{7.2.24} \]

where $C$ is the total cost function defined by Eq. (7.2.12) and $F$ is a MART.

7.2.2.3.6 LambdaMART for molecule ranking

Now let us turn to the application of LambdaMART to solvation prediction. In each query of molecules, the solvation free energies themselves are regarded as the labels of the molecules, and the corresponding features are those listed in Table 7.1. Our method can be summarized as ranking the nearest neighbors of a target molecule based on their solvation free energies, and then learning a relation between features and solvation free energies for predicting the solvation free energy of the target molecule.

7.2.3 Functional estimation for solvation free energy prediction

We discuss the solvation free energy prediction for a given target molecule in this section. Based on our assumption that the solute solvation free energy is a functional of the feature vector, solvation free energy prediction amounts to constructing an energy functional around the target molecule. This functional is then used to predict the solvation free energy of the target molecule.

Consider the solvation free energy of a target molecule $A$ characterized by its feature vector $\mathbf{x}_A = (x_{A1}, x_{A2}, \cdots, x_{An})$, where $n$ is the dimension of the feature space, i.e., the space of all feature vectors. Here, we simply use the features utilized in the LTR procedure for training the local solvation function, since these features in some sense also reflect the order of the solvation free energy. However, we believe that an optimal selection of the features for training this function could lead to better predictions, since the nonpolar features were not used in our feature selection. For global solvation free energy function training, neglecting nonpolar features may lead to tremendous error, but in our LTR model the learning is restricted to the local sense.

From the LTR framework, we find $m$ feature vectors of the nearest neighbors and the corresponding known solvation free energies, $(\mathbf{x}_1, \Delta G_1), (\mathbf{x}_2, \Delta G_2), \cdots, (\mathbf{x}_m, \Delta G_m)$. Note that in general, the number of nearest neighbors found is far less than the dimension of the feature space, i.e., $m \ll n$.

In this work, we assume that the functional relation between features and solvation free energy for target molecule $A$ has the form

\[ \Delta G_A = b + \sum_{i=1}^{n} w_i x_{Ai}, \tag{7.2.25} \]

where $w_i$ is the weight for feature $x_{Ai}$ and $b$ can be intuitively understood as the height of the hyperplane embedded in the Euclidean space. Equation (7.2.25) can be regarded as a first order Taylor polynomial approximation of the solvation free energy $\Delta G_A = f(\mathbf{x}_A)$. Since $m \ll n$, a direct regression based on the least squares approach may lead to overfitting. To avoid overfitting, there are generally two strategies for determining $\{w_i\}$ and $b$:

- sparse solution via a compressed sensing approach;
- Tikhonov regularization based least squares fitting.

In this work, we use the second strategy for training the local regression model for solvation free energy prediction. The local regression problem is equivalent to solving the linear system in Eq. (7.2.26) in the $L_2$ sense:

\[ \begin{pmatrix} \Delta G_1 \\ \Delta G_2 \\ \vdots \\ \Delta G_m \end{pmatrix} = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{pmatrix} \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix} + \begin{pmatrix} b \\ b \\ \vdots \\ b \end{pmatrix}. \tag{7.2.26} \]

Equation (7.2.26) can be written as

\[ \Delta \mathbf{G} = \mathbf{x} \mathbf{w} + b \mathbf{1}, \tag{7.2.27} \]

where $\Delta \mathbf{G} = (\Delta G_1, \Delta G_2, \cdots, \Delta G_m)^T$, $\mathbf{w} = (w_1, w_2, \cdots, w_n)^T$, $\mathbf{1}$ is an $m$-dimensional column vector with all elements equal to 1, and the matrix $\mathbf{x}$ is given by

\[ \mathbf{x} = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{pmatrix}. \]

To avoid overfitting, we add an $L_2$ penalty on the weight vector $\mathbf{w}$, and thus Eq. (7.2.27) can be solved through the optimization problem

\[ \min_{\mathbf{w}, b} \left\| \Delta \mathbf{G} - \mathbf{x} \mathbf{w} - b \mathbf{1} \right\|_2^2 + \lambda \left\| \mathbf{w} \right\|_2^2 := \min_{\mathbf{w}, b} F, \tag{7.2.28} \]

where $\lambda$ is the regularization parameter, which is set to 100 in this work, and $\| \cdot \|_2$ denotes the $L_2$ norm. By solving $\partial F / \partial \mathbf{w} = 0$, we have

\[ \mathbf{w} = \left( \mathbf{x}^T \mathbf{x} + \lambda \mathbf{I} \right)^{-1} \left( \mathbf{x}^T \Delta \mathbf{G} - \mathbf{x}^T (b \mathbf{1}) \right), \tag{7.2.29} \]

where $\mathbf{I}$ is the $n \times n$ identity matrix. To find the value of $b$ that solves the optimization problem Eq. (7.2.28), we relax $b \mathbf{1}$ to an arbitrary vector $\mathbf{b} = (b_1, b_2, \cdots, b_m)^T$. By solving $\partial F / \partial \mathbf{b} = 0$, we have

\[ \mathbf{b} = \Delta \mathbf{G} - \mathbf{x} \mathbf{w}. \tag{7.2.30} \]

Therefore, we obtain the unbiased estimate of $b$ as

\[ b = \frac{\sum_{i=1}^{m} (\Delta \mathbf{G} - \mathbf{x} \mathbf{w})_i}{m}, \tag{7.2.31} \]

where $(\Delta \mathbf{G} - \mathbf{x} \mathbf{w})_i$ is the $i$th component of the vector $\Delta \mathbf{G} - \mathbf{x} \mathbf{w}$.

We can solve the optimization problem Eq. (7.2.28) by alternating iterations between Eq. (7.2.29) and Eq. (7.2.31), which is essentially an expectation-maximization (EM) algorithm. After obtaining the optimized parameters $\mathbf{w}$ and $b$, the solvation free energy of the target molecule $A$ is predicted by Eq. (7.2.25).

7.3 Numerical results and discussions

7.3.1 Dataset and feature parametrization

7.3.1.1 Dataset

In order to assess the performance of the present method, we consider the same dataset that was constructed in our earlier work [256]. With a total of 668 molecules, this dataset is the largest to date, to our knowledge, and contains both monofunctional group and polyfunctional group molecules. Experimental solvation free energies are collected from the literature [30,261,153]. The main part of our dataset, i.e., 589 molecules, overlaps with Mobley's solvation database (http://mobleylab.org/resources.html). All the structures of this dataset are downloaded from the PubChem project (https://pubchem.ncbi.nlm.nih.gov/). A more detailed description of the dataset can be found in our earlier work [256].

7.3.1.2 Atomic feature parametrization

In atomic feature generation, atomic charges and atomic dipoles are calculated via the distributed multipole analysis (DMA) method [227], in which the charge density is originally computed by density functional theory with B3LYP and the 6-31G basis set in the Gaussian quantum chemistry software [80,18,148]. Atomic reaction field energies (i.e., atomic electrostatic solvation energies) are calculated by our in-house MIBPB software [253,39,90] with a probe radius of 1.4 Å and dielectric constants of 1 and 80 for the solute and solvent domains, respectively. A uniform grid size of 0.25 Å is used in all atomic reaction field energy calculations.

To examine the sensitivity of the present approach to charge force fields, which was a major issue in our earlier HPK model, we utilize three types of atomic radii, namely, Amber 6, Amber bondi, and Amber mbondi2 [35]. Additionally, we consider three types of charge assignments, namely, OpenEye-AM1-BCC v1 parameters [123], Gasteiger [85], and Mulliken [35]. The combination of radius sets and charge sets gives rise to a total of nine different parametrizations, which have already been utilized in our earlier work [256] to obtain some of the best solvation prediction results. For the regularized least squares hyperplane fitting, the regularization parameter is set to 100.

7.3.2 Leave-one-out prediction

First, we consider the leave-one-out test on the whole dataset of 668 molecules. In this test, we regard the solvation free energy of one molecule as unknown, and use the remaining molecules to predict the solvation free energy of the target molecule. The purpose of the leave-one-out test is twofold. First, it helps with the parameter selection, i.e., the number of nearest neighbors to be used for the prediction of the target molecule's solvation free energy and the parameters used in training the LambdaMART. Second, the leave-one-out test can demonstrate the performance of the proposed model for solvation free energy prediction.

The performance of the leave-one-out test is measured by both the root mean square error (RMSE) and the mean error (ME), defined respectively by

\[ \mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} \left( \Delta G_i^{\mathrm{Pred}} - \Delta G_i^{\mathrm{Expl}} \right)^2}{N}} \tag{7.3.1} \]

and

\[ \mathrm{ME} = \frac{\sum_{i=1}^{N} \left( \Delta G_i^{\mathrm{Pred}} - \Delta G_i^{\mathrm{Expl}} \right)}{N}, \tag{7.3.2} \]

where $N$ is the total number of molecules in our dataset, and $\Delta G_i^{\mathrm{Expl}}$ and $\Delta G_i^{\mathrm{Pred}}$ stand for the experimental and predicted solvation free energies of the $i$th molecule, respectively.

The RMSE measures the accuracy of the prediction. A small RMSE indicates that the predictions for the whole dataset are uniformly accurate. The ME is used to determine whether the prediction is biased or not. If the ME is close to zero, the prediction is unbiased.

7.3.2.1 Number of neighbors involved

In applying our model, one has to determine how many nearest neighbors are to be involved in the solvation free energy prediction. In general, this number depends on the training dataset and parametrization. Numerically, one can use either leave-one-out or five-fold cross validation to determine the optimal number of nearest neighbors. Table 7.2 lists the RMSE and ME of our leave-one-out prediction using a total of 9 different combinations of atomic radii and charge force fields. The use of different numbers of nearest neighbors is examined as well.

Table 7.2: The RMSE and ME of the solvation free energy prediction with different parametrizations of the molecules. The errors are calculated by the proposed solvation model with different numbers of nearest neighbors involved (columns 1 to 8). All values are in kcal/mol.

Parametrization  Error       1       2       3       4       5       6       7       8
BCC+Amber6       RMSE    1.059   1.068   1.079   1.086   1.111   1.133   1.128   1.109
                 ME      0.030   0.043   0.037   0.029   0.037   0.035   0.050   0.045
BCC+Bondi        RMSE    1.010   1.071   1.078   1.093   1.010   1.084   1.086   1.095
                 ME      0.036   0.058   0.061   0.051   0.070   0.073   0.071   0.078
BCC+MBondi2      RMSE    1.099   1.106   1.136   1.139   1.177   1.177   1.175   1.148
                 ME      0.040   0.022   0.017   0.029   0.030   0.047   0.047   0.469
GAS+Amber6       RMSE    1.331   1.294   1.267   1.278   1.268   1.284   1.321   1.336
                 ME      0.002  -0.005  -0.008  -0.003   0.019  -0.003   0.019   0.038
GAS+Bondi        RMSE    1.203   1.193   1.227   1.249   1.276   1.302   1.311   1.340
                 ME     -0.011  -0.004   0.012   0.028   0.042   0.054   0.055   0.065
GAS+MBondi2      RMSE    1.165   1.200   1.179   1.163   1.175   1.192   1.207   1.220
                 ME     -0.018  -0.037  -0.031  -0.018   0.003   0.017   0.024   0.036
MUL+Amber6       RMSE    1.356   1.328   1.313   1.341   1.332   1.338   1.347   1.368
                 ME      0.025   0.037   0.036   0.037   0.037   0.057   0.071   0.076
MUL+Bondi        RMSE    1.281   1.252   1.229   1.238   1.227   1.250   1.280   1.305
                 ME      0.030   0.027   0.031   0.033   0.038   0.045   0.053   0.070
MUL+MBondi2      RMSE    1.264   1.286   1.277   1.261   1.275   1.285   1.316   1.326
                 ME     -0.012  -0.022  -0.020  -0.010   0.013   0.012   0.028   0.032

We note that, judged by the RMSEs, our method is not sensitive to the number of nearest neighbors. All of the top ten recommendations have the same level of accuracy. However, when the MEs are also taken into consideration, it is found that a large number of nearest neighbors typically contributes to a large ME. We propose to select the number of nearest neighbors based on the following criteria:

- The RMSE should be as small as possible to give an accurate prediction.
- The ME should be as close to zero as possible to give an unbiased prediction.
- At the same level of RMSE and ME, it is preferable to involve more molecules, which makes it easier to determine the solvation free energy functional.

Usually, there is a tradeoff among the aforementioned criteria in selecting the number of molecules for solvation prediction. Based on the above criteria, the number of nearest neighbors used for each force field parametrization in the solvation free energy prediction is listed in Table 7.3. We emphasize that the proposed method is quite robust with respect to different choices.

Table 7.3: The number of nearest neighbors involved in the solvation free energy prediction for different force field parametrizations.

Charge  Radius         Number of nearest neighbors
BCC     Amber6         4
        AmberBondi     4
        AmberMBondi2   3
GAS     Amber6         4
        AmberBondi     2
        AmberMBondi2   4
MUL     Amber6         3
        AmberBondi     3
        AmberMBondi2   4

Table 7.4: The RMSE and ME of the leave-one-out test of the solvation free energy prediction with different methods. For comparison, the numbers in parentheses are calculated by the method proposed in [256]. All values are in kcal/mol.

                          Charge
Radius         Error   BCC             Mulliken        Gasteiger
Amber6         RMSE    1.08 (1.47)     1.27 (1.49)     1.31 (1.65)
               ME      0.03 (-0.13)    -0.00 (-0.20)   0.03 (-0.19)
AmberBondi     RMSE    1.09 (1.34)     1.19 (1.48)     1.22 (1.66)
               ME      0.05 (-0.14)    -0.00 (-0.21)   0.03 (-0.13)
AmberMBondi2   RMSE    1.10 (1.33)     1.16 (1.49)     1.26 (1.68)
               ME      0.02 (-0.14)    -0.02 (-0.22)   -0.01 (-0.22)

7.3.2.2 Accuracy and sensitivity analysis

In this part, we compare the present leave-one-out predictions with those of our earlier HPK model [256] under the same radius and charge parametrizations. Table 7.4 lists the RMSEs and MEs of the current model predictions. For comparison, the corresponding values obtained by our previous HPK model are also listed in parentheses. From Table 7.4, we can conclude the following:

- The LTR solvation model is much more accurate than our previous HPK model. The best prediction by the present model has an RMSE of 1.08 kcal/mol, compared to the lowest RMSE of 1.33 kcal/mol achieved by the previous model. The worst RMSE of the present prediction is 1.31 kcal/mol, which is still better than the best result obtained by the previous model. Note that the worst earlier result has an RMSE of 1.68 kcal/mol [256].
- The LTR solvation model provides unbiased solvation predictions, as indicated by the ME results. The predictions with different molecular parametrizations all achieve near zero MEs. The MEs of the previous model are almost ten times larger than those of the LTR solvation model. Additionally, we note that regardless of the type of molecular parametrization, the previous predictions are biased toward one direction, whereas the present model has MEs of both signs.
- The LTR solvation model is less sensitive to the atomic feature parametrization than the HPK solvation model. The ranges of the RMSEs for the LTR and HPK models over the 9 different parametrizations are 1.08-1.31 and 1.33-1.68 kcal/mol, respectively. The larger range in prediction RMSEs indicates that the HPK model is more sensitive to atomic feature parametrization.

7.3.3 Blind prediction of SAMPLx challenges

In this part, we consider the blind prediction of solvation free energy for the SAMPLx challenge sets. Our LTR solvation model is applied to all five SAMPL test sets, i.e., SAMPL0-SAMPL4. We adopt the same protocol used in our previous leave-SAMPLx-out prediction [256]. Specifically, in each SAMPL test prediction, we exclude all the molecules of the given SAMPL set from our LTR process, and use the remaining molecules as our training set to find a set of nearest neighbors for each molecule in the SAMPL test set. Both RMSE and ME are evaluated to assess the performance of the proposed LTR model. The same 9 sets of charge and radius parametrizations are used in the leave-SAMPLx-out tests.

First, let us consider the solvation free energy prediction for the SAMPL0 test set, which contains a total of 17 molecules. All structures in this test set are relatively simple. However, the molecular species in this set are quite diverse. Many researchers have reported their solvation free energy predictions for this challenge set [183,134]. In our earlier work [256], we have shown that when Gasteiger charges and Amber6 radii are used for parametrization, our HPK model gives a blind prediction RMSE of 1.20 kcal/mol for the whole set. When one molecule that contains a Br atom was excluded, by using Amber Bondi radii and the DFT based polarizable Poisson model for the electrostatic calculation, our optimal prediction has an RMSE of 0.93 kcal/mol. Prior to our earlier work, the optimal prediction for this test set has an RMSE of 1.34
kcal/mol for the whole set [134]. Figure 7.5 depicts the present LTR results for a total of 9 charge and radius combinations. When the BCC charge is used, the RMSEs of our predictions with the three radius parametrizations are all less than 0.90 kcal/mol. Our optimal prediction has an RMSE of 0.76 kcal/mol, obtained from both the Amber6 and Amber Mbondi2 radius parametrizations in conjunction with the BCC charge assignment. Apart from delivering the same RMSE, these two parametrizations also give quite close MEs. For comparison with our earlier work, we have also plotted the RMSEs from our previous HPK model in Fig. 7.5. Except for the Mulliken charge with Amber6 or AmberBondi radii, the present predictions are more accurate than those of our previous HPK model.

Figure 7.5: Illustration of prediction RMSEs obtained with different molecular parametrizations by the LTR and HPK models for the SAMPL0 test set.

Having demonstrated the superiority of the proposed LTR model for the blind prediction of the SAMPL0 challenge set, we further consider the SAMPL1 test set, which is generally believed to be the most difficult one, for the following two reasons. First, the molecular structures in this test set are extremely complex compared to other molecules with known experimental solvation free energies. Second, the uncertainty of the SAMPL1 experimental data is very large. For some molecules the uncertainty is as large as 2.0 kcal/mol [105]. Nevertheless, it is extremely desirable to develop an accurate modeling paradigm for this test set because most molecules in it are druggable. The best prediction for the whole set has an RMSE of 2.45 kcal/mol [134]. On a subset of the SAMPL1 test set that contains only 56 molecules, the best performance was shown to give an RMSE of 2.4 kcal/mol. Our previous HPK model gives its best prediction for this set, with an RMSE of 2.82 kcal/mol, when DFT is used for the charge assignment. However, this result requires that one molecule containing a Br element be ignored due to the lack of a proper Br pseudopotential for the DFT software. The optimal prediction for the whole test set has an RMSE of 3.07 kcal/mol based on the AM1-BCC charge and Amber Bondi radius parametrization [256]. Additionally, our previous HPK model is very sensitive to the force field assignment. Our earlier RMSEs for 16 charge and radius combinations vary from 3.07 to
6.16 kcal/mol. Figure 7.6 illustrates the comparison of the LTR and HPK predictions on the whole SAMPL1 test set. It is easy to see that the LTR model is much more accurate. The optimal prediction has an RMSE as small as 2.14 kcal/mol, which is the best to our knowledge. Additionally, the present LTR model is very robust with respect to the change in force field. The minimum and maximum prediction RMSEs over 9 sets of charge and radius parametrizations are 2.14 and 2.81 kcal/mol, respectively. The difference between the maximum and minimum is 0.67 kcal/mol, which is much smaller than the experimental uncertainty of 2 kcal/mol.

Figure 7.6: Illustration of prediction RMSEs obtained with different molecular parametrizations by the LTR and HPK models for SAMPL1 test set.

Another test set is SAMPL2, which contains a total of 30 molecules [137]. The experimental uncertainty on these molecules is much less than that of the SAMPL1 test set. Nevertheless, accurate solvation prediction for this set is rare. Using all-atom molecular dynamics simulations and multiple starting conformations for blind prediction, Klimovich and Mobley reported an RMSE of 2.82 kcal/mol over the whole set and 1.86 kcal/mol over all the molecules except several hydroxyl-rich compounds [137]. The best prediction has an RMSE of 1.59 kcal/mol [134]. In our previous test, the molecule containing an I atom (5-iodouracil) was excluded from all calculations due to the lack of an appropriate charge force field. In this work, we also ignore this molecule for the same reason. The HPK model gives an optimal prediction with RMSE 1.96 kcal/mol. However, the RMSEs of the prediction vary over a large range, from 1.96 to 4.86 kcal/mol, when different charge and radius force fields are applied. In the present work, the optimal LTR prediction has an RMSE of 1.90 kcal/mol, a slight improvement over our earlier prediction. Moreover, the variation of RMSEs under different charge and radius parametrizations is only 1.2 kcal/mol (i.e., from 1.90 to 3.10 kcal/mol), which indicates the robustness of the present LTR model compared to the earlier HPK model. A comparison of HPK and LTR predictions is given in Fig. 7.7.

Figure 7.7: Illustration of prediction RMSEs obtained with different molecular parametrizations by the LTR and HPK models for SAMPL2 test set.

The SAMPL3 test set, which contains 36 molecules, is relatively easy for blind prediction. The structures of SAMPL3
molecules are relatively simple, and most molecules in this set are chlorinated hydrocarbon molecules [87]. The best prediction in the literature offers an RMSE of 1.29 kcal/mol [134]. Our earlier HPK model achieved the most accurate solvation free energy prediction for the whole set, especially when Gasteiger charge is used for parametrization. It is found that Gasteiger charges do give a good charge description for the chlorinated hydrocarbon molecules. Figure 7.8 depicts the RMSEs of the predictions by LTR and HPK models. It is easy to see that the two methods give almost the same optimal prediction: the LTR prediction has an optimal RMSE of 0.87 kcal/mol, while the HPK prediction has an optimal RMSE of 0.82 kcal/mol. The RMSEs of LTR predictions from different parametrizations vary over a small range of 0.61 kcal/mol (i.e., from 0.87 to 1.48 kcal/mol), which further verifies the robustness of the LTR solvation model.

Figure 7.8: Illustration of prediction RMSEs obtained with different molecular parametrizations by the LTR and HPK models for SAMPL3 test set.

Finally, we consider the SAMPL4 test set, which is a very popular one. Many explicit, implicit, integral equation and quantum approaches have been applied to this set [177]. Our previous test using the HPK model shows an extremely accurate and robust prediction with the optimal RMSE of 1.03 kcal/mol. The optimal prediction of the LTR model has an RMSE of 1.01 kcal/mol. The variation in prediction RMSEs is as small as 0.34 kcal/mol (i.e., from 1.01 to 1.35 kcal/mol) over 9 different parametrizations. For a comparison, we depict the prediction RMSEs of both the LTR and HPK models in Fig. 7.9. It is seen that for SAMPL4, both models give the same level of accuracy and robustness in the solvation free energy prediction.

Figure 7.9: Illustration of prediction RMSEs obtained with different molecular parametrizations by the LTR and HPK models for SAMPL4 test set.

Table 7.5 provides a summary of the LTR RMSEs and MEs for all SAMPL0-SAMPL4 test sets. These results indicate that overall, the LTR framework is more accurate in solvation predictions than our earlier HPK model for SAMPL test sets. This is especially true for complex and challenging molecules in SAMPL1 and SAMPL2 test sets. Furthermore, the present LTR solvation prediction model is much less sensitive to different charge and radius parametrizations.

Table 7.5: The RMSEs and MEs of the solvation free energy predictions with different parametrizations. The numbers inside and outside parentheses are the results calculated by the HPK and LTR models, respectively. All errors are in units of kcal/mol.

Test set  Radius         Error  BCC             Mulliken        Gasteiger
SAMPL0    Amber 6        RMSE    0.76 (1.26)     1.17 (1.25)     1.25 (1.20)
                         ME      0.30 (0.76)     0.46 (-0.12)    0.10 (-0.13)
          Amber Bondi    RMSE    0.86 (1.37)     1.15 (1.27)     1.38 (1.27)
                         ME      0.15 (0.86)     0.29 (-0.20)   -0.07 (-0.24)
          Amber MBondi2  RMSE    0.76 (1.37)     1.05 (1.32)     1.19 (1.29)
                         ME      0.29 (0.88)     0.42 (0.21)     0.34 (-0.25)
SAMPL1    Amber 6        RMSE    2.81 (3.27)     2.69 (4.77)     2.70 (4.96)
                         ME     -1.28 (0.88)    -0.43 (-2.28)   -0.33 (-1.12)
          Amber Bondi    RMSE    2.14 (3.07)     2.26 (4.68)     2.27 (5.55)
                         ME     -0.97 (0.99)    -0.42 (-2.28)   -0.12 (-1.26)
          Amber MBondi2  RMSE    2.38 (3.30)     2.20 (5.41)     2.74 (4.82)
                         ME     -1.15 (1.22)    -0.42 (-2.39)   -0.37 (-0.67)
SAMPL2    Amber 6        RMSE    1.90 (2.11)     2.04 (3.59)     2.39 (4.86)
                         ME      0.71 (-0.65)    1.64 (1.65)     1.40 (2.65)
          Amber Bondi    RMSE    2.48 (1.97)     2.09 (3.47)     2.52 (4.72)
                         ME      1.10 (-0.26)    1.41 (1.69)     1.53 (2.61)
          Amber MBondi2  RMSE    2.21 (1.96)     1.92 (3.66)     3.10 (4.76)
                         ME      1.14 (-0.46)    1.31 (2.62)     2.19 (1.79)
SAMPL3    Amber 6        RMSE    1.04 (1.28)     1.34 (1.42)     1.30 (0.97)
                         ME      0.04 (0.38)    -0.24 (0.72)    -0.15 (-0.16)
          Amber Bondi    RMSE    1.09 (1.47)     0.87 (1.58)     1.12 (0.82)
                         ME      0.12 (-0.56)   -0.05 (0.85)     0.04 (-0.09)
          Amber MBondi2  RMSE    1.18 (1.47)     1.34 (1.58)     1.48 (0.82)
                         ME      0.09 (-0.56)    0.24 (0.85)     0.00 (-0.09)
SAMPL4    Amber 6        RMSE    1.06 (1.28)     1.30 (1.20)     1.22 (1.08)
                         ME      0.27 (-0.14)    0.23 (0.11)     0.34 (0.18)
          Amber Bondi    RMSE    1.01 (1.12)     1.35 (1.41)     1.28 (1.10)
                         ME      0.02 (0.06)    -0.03 (0.31)     0.08 (0.33)
          Amber MBondi2  RMSE    1.10 (1.09)     1.21 (1.33)     1.10 (1.03)
                         ME      0.16 (0.15)     0.23 (0.14)     0.37 (-0.08)

Nevertheless, contrary to the small MAEs found in the leave-one-out tests, the errors grow considerably in the blind prediction of SAMPLx test sets, particularly for SAMPL1 and SAMPL2 test sets. Possible explanations for this phenomenon are the complexity of molecules and the lack of physically and chemically similar molecules in our database. We also point out that in the LTR predictions, large RMSEs and MEs occur simultaneously, which indicates that large RMSEs might come from biased predictions. This phenomenon is under our further investigation.

7.4 Concluding Remarks

This work proposed a Learning-To-Rank (LTR) model for solvation free energy prediction. More precisely, this approach combines an LTR based nearest neighbor searching methodology with a local hyperplane learning procedure for solvation free energy prediction. The proposed model is inspired by our previous Hybrid Physical and Knowledge (HPK) model [256] on the nearest neighbor parametrization for solvation free energy prediction, in which the nearest neighbor was detected according to the simple cosine similarity between the molecular atomic features. Our previous attempt on the nearest neighbor approach motivates a basic assumption utilized in this work, i.e., similar molecules have similar solvation free energies. In machine learning terminology, our previous nearest neighbor search method can be regarded as an unsupervised learning method. With the assumption that similar molecules have similar solvation free energies, the nearest neighbor searching problem can be cast into a supervised learning problem. As a result, the nearest neighbor quality can be improved dramatically, which further improves the accuracy of solvation free energy prediction. The present LTR method can be considered as an enhanced nearest neighbor searching method.

To implement our new supervised LTR model, we partition molecules into several groups according to their chemical compositions. Each group is regarded as a query in the LTR terminology. The query construction is of fundamental importance to make our model practical, since during the molecular ranking, we utilize the assumption that close solvation free energies imply similar molecules, which is generally not true without a proper query construction. Another fundamental assumption of this method is that there exists a feature vector that can uniquely characterize and distinguish one molecule from another. Obviously, the construction of the feature vector is of crucial importance to the performance of the present model. We utilize atomic features, such as atomic charge, dipole, reaction field energy, etc., evaluated by quantum mechanics, polarization theory and Poisson-Boltzmann theory. A state-of-the-art listwise LTR algorithm, LambdaMART, is adopted for training the LTR model. By using this algorithm, the quality of the nearest neighbor search improves significantly, which is supported by the fact that the difference between the solvation free energies of a target molecule and its neighbors decreased dramatically.
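To make the contrast with the earlier unsupervised search concrete, the following minimal Python sketch ranks database molecules by the cosine similarity of their feature vectors, which is the criterion attributed above to the HPK model. It is only an illustration under stated assumptions: the function names, the toy three-component feature vectors, and the choice of k are hypothetical and are not taken from the HPK or LTR implementations.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention assumed here for all-zero vectors
    return dot / (norm_u * norm_v)


def nearest_neighbors(target, database, k=3):
    """Indices of the k database vectors most similar to the target."""
    scored = [(cosine_similarity(target, x), i) for i, x in enumerate(database)]
    # Sort by decreasing similarity; break ties by database index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]


# Hypothetical toy feature vectors (not real molecular descriptors).
database = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
target = [1.0, 0.05, 0.0]
neighbors = nearest_neighbors(target, database, k=2)
```

In this unsupervised setting the ranking depends only on the features; the supervised LTR approach replaces the fixed cosine criterion with a ranking function trained on known solvation free energies.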
Furthermore, we assume that molecular solvation free energy is a function of molecular features. Based on this assumption, we develop a regularized least square based local hyperplane learning algorithm for solvation free energy prediction. Highly accurate solvation free energy prediction is confirmed by both the leave-one-out test over 668 solvation molecules and blind prediction of five SAMPL test sets, namely, SAMPL0, SAMPL1, SAMPL2, SAMPL3 and SAMPL4. Apart from providing the state-of-the-art solvation free energy predictions, the proposed LTR model is much less parametrization dependent. That is, the LTR model is less sensitive to the parametrization of charges and radii in the solvation free energy prediction. This robust property is especially evident from the accurate and robust solvation free energy prediction of complex molecules, for which our previous HPK predictions can differ by as much as 10 kcal/mol with different charge and radius parameterizations. A major reason for this robustness in LTR based solvation free energy prediction is that in our basic assumption, solvation free energy is modeled as a unity, instead of isolated polar and nonpolar components. This treatment makes the present model robust to the molecule parametrization, and avoids the error propagation from polar modeling to nonpolar modeling. In our earlier HPK model, inappropriate atomic charge or radii assignments can lead to a huge electrostatic error, which propagates to the nonpolar solvation free energy prediction.

This work is our attempt at developing an advanced machine learning based model for solvation free energy prediction. This model can be improved in a number of ways. One improvement is about query construction based on molecular element types. We believe that a more sophisticated query construction can further improve the accuracy of the nearest neighbor searching. Another potential improvement is a better feature selection. For example, one can select features according to their local correlations with the solvation free energies in a given query. The other improvement can be achieved through better feature design and more accurate feature evaluations. Many atomic features were computed via DFT in the present work. We believe that some other advanced quantum methodologies for atomic charge, dipole, and quadrupole calculations will significantly improve our prediction. The advantage of the DFT based polarizable Poisson model has been noticed in our previous work [256]. Therefore, some improvements in the reaction field energy can be valuable as well.

Overall, we believe that with a better set of molecule descriptors, molecular parametrization, and molecular partition, the proposed LTR based solvation free energy prediction can be further improved. The application of the proposed approach to protein-ligand binding is under our consideration.

Chapter 8

Protein Ligand Binding Free Energy Modeling

8.1 Introduction

Designing efficient drugs for curing diseases is of essential importance for the new century's life science. Indeed, one of the ultimate goals of molecular biology is to understand the molecular mechanism of human diseases and to develop efficient side-effect-free drugs for disease curing. Nevertheless, the drug discovery procedure is extremely complicated, and involves many scientific disciplines and technologies. As a brief summary, drug discovery contains the following seven major steps [22], namely, i) Disease identification; ii) Target hypothesis, i.e., the activation or inhibition of drug targets (usually proteins within the cell) is thought to alter the disease state; iii) Screening potential principal compounds that will bind to the target; iv) Optimizing the identified compounds with respect to their structural characteristics in the context of the target binding site; v) Preclinical test, in which both in vitro and in vivo tests will be performed; vi) Clinical trials to determine their bioavailability and therapeutic potential; and vii) Optimizing the chemical's efficacy, toxicity, and pharmacokinetics properties. Typically, the whole cost of a new drug development is estimated to be more than one billion dollars with more than ten years' group effort [283]. This large cost mostly comes from unsuitable chemical compounds that are used in the preclinical and clinical testing [2]. In terms of economical drug design, sophisticated and accurate computer aided compound screening methods become extremely important. Virtual screening (VS) methodologies focus on detecting a small set of highly promising candidates for further experimental testing [215]. Docking is one of the most important VS methodologies and is widely used in the Computer Aided Drug Design (CADD). It is a two-stage protocol [10].
The first step is sampling the ligand binding conformations, which determines the pose, orientation, and conformation of a molecule as docked to the target's binding site [28]. The second stage is protein-ligand binding affinity scoring. With the development of Molecular Dynamics (MD), Monte Carlo (MC), and Genetic Algorithm (GA) for pose generation, the sampling problem is relatively well resolved [265,146,187]. A major remaining challenge in achieving accurate docking is the development of accurate scoring functions for diverse protein ligand complexes.

One of the most important open problems in computational bioscience is the accurate prediction of the binding affinities of a large set of diverse protein-ligand complexes [10]. A desirable goal is to achieve less than 1 kcal/mol Root Mean Square Error (RMSE) in the prediction. Since the pioneer work in the 1980s and 1990s, the study of the scoring function and sampling techniques has been blooming in the CADD community [143,63,97,131]. In a recent review paper, Liu and Wang classify the existing popular scoring functions into four categories [158], namely, i) Force field based or physical based scoring functions; ii) Empirical or regression based scoring functions; iii) Potential of the Mean Force (PMF) or knowledge based scoring functions; and iv) Machine learning based scoring functions.

Physics based scoring functions provide some of the most accurate and detailed descriptions of the protein and ligand molecules in the solvent environment. Typical models that belong to this category are Molecular Mechanics Poisson-Boltzmann Surface Area (MMPBSA) and Molecular Mechanics Generalized-Born Surface Area (MMGBSA) [139,93] with a given force field parametrization of both solvent and solute molecules, like AMBER or CHARMM force fields [258,164,274]. In this framework, the binding free energy is modeled as a superposition of four parts: van der Waals (vdW) interactions, electrostatic interactions between protein and ligand, hydrogen bonding, and solvation effects. In addition to MMPBSA and MMGBSA, several other prestigious scoring functions also belong to this group, including COMBINE [193] and MedusaScore [277]. Physical based scoring functions are a class of dynamically improved methods: the VS can become more accurate with the further development of more advanced and comprehensive molecular mechanics force fields. Plenty of improvements have already been made to the accuracy of these scoring functions, such as QM/MM multiscale coupling [229] and polarizable force fields [198].

Empirical or regression based scoring functions, usually also called Multiple Linear Regression (MLR) scoring functions, typically model the protein-ligand binding affinity as contributions from vdW interaction, hydrogen bonding, desolvation, and metal chelation [287]. Several parameters are introduced in each of the above terms, and the scoring function is obtained by using the existing protein-ligand binding information to train these parameters in the given binding affinity function. Many other existing scoring functions also belong to this category, e.g., PLP [245], ChemScore [74], and X-Score [264]. A recent study on a congeneric series of thrombin inhibitors concludes that free energy contributions to protein-ligand binding are non-additive, showing some theoretical deficiencies of the MLR based scoring functions [16].

Machine learning algorithms do not explicitly require a given form of the binding affinity in terms of its related items, and thus do not require the additive assumption of energetic terms. Many machine learning based scoring functions have been proposed in the past few decades. These methods apply Quantitative Structure-Activity Relation (QSAR) principles to the prediction of the protein-ligand binding affinity. Representative work along this line is the Random Forest (RF) based scoring function, RF-Score [152]. In RF-Score, the random forest is selected as the basic regressor instead of the classical MLR, which is restricted to the pre-defined linear form of the binding affinity function. By utilizing the features calculated from the existing scoring functions, it achieves highly accurate five-fold cross validation results on the PDBBind v2013 refined set. Prediction results on the PDBBind v2007 core set further confirm the accuracy of the RF-Score [152]. Many other machine learning tools are utilized as the main skeletons of the scoring functions, like Support Vector Regression (SVR) [135], Multivariate Adaptive Regression Splines (MARS), k-Nearest Neighbors (kNN), Boosted Regression Trees (BRT), etc. [4]. The blooming of big data approaches and more accurate descriptor characterization of the protein-ligand binding have made machine learning type of scoring functions full of vitality in CADD. Machine learning based scoring functions can make continuous improvement through both advances in physical protein-ligand binding descriptors and discovery of new machine learning techniques.

Another important class of scoring functions is PMF based. This category of scoring functions is based on the statistical mechanics theory in which the protein ligand binding affinity is modeled as the sum of pairwise statistical potentials between protein and ligand atoms. The major merit of the PMF type of scoring functions is their simplicity in both concept and computation. This physical model captures major physical principles behind the protein ligand binding. In Knowledge-based and Empirical Combined Scoring Algorithm (KECSA), the binding affinity between protein and ligand is modeled by 49 pairwise modified Lennard-Jones types of potentials between different types of atoms [288]. Through a large number of training instances, the functional form of all these pairwise interaction potentials can be determined. The ligand binding conformation sampling procedure can also be incorporated into this theoretical framework [289]. There are many other interesting developments in the PMF based scoring functions, e.g., PMF [180], DrugScore [243], and IT-Score [116].

Essentially, the major purpose of the scoring function is to find the relative order of binding affinities of candidate chemicals to the target binding site. These ranked results are further used for the preclinical test in real drug design procedure. From this point of view, the scoring function development turns out to be the development of ranking methods. Many existing scoring functions are also developed from this perspective. For example, Learning To Rank (LTR) algorithms have been used to develop various scoring functions, including PTRank, RankNet, RankBoost, ListNet, and AdaRank [283,267,2,247]. Compared to other machine learning or simple MLR based scoring functions, the advantages of ranking based scoring functions are two-fold. First, they are applicable to identifying compounds on novel protein binding sites where no sufficient data are available for other machine learning algorithms. Second, they are suitable for the case that binding affinities are measured in different platforms since ranking can be more focused on relative order [283].

In this work, we propose a Feature Functional Theory - Binding Predictor (FFT-BP) for the blind prediction of binding affinity. The FFT-BP is constructed based on three assumptions, i.e., i) representability assumption: there exists a microscopic feature vector that can uniquely characterize and distinguish one molecular complex from another; ii) feature-function relationship assumption: the macroscopic features, including binding free energy, of a molecule or complex are functionals of microscopic feature vectors; and iii) similarity assumption: molecules with similar microscopic features have similar macroscopic features, such as binding free energies. FFT-BP has three distinguishing traits. A major trait of the proposed FFT-BP is its use of microscopic features derived from physical models, including Poisson Boltzmann (PB) theory [218,219,208,112,92,43,248], nonpolar solvation models [226,84,83,50,246,270,46], and components in MMPBSA [139]. As such, electrostatic solvation free energy, electrostatic binding affinity, atomic reaction field energies, and Coulombic interactions are utilized to represent the electrostatic effects of protein-ligand binding. Atomic pairwise van der Waals interactions are employed to model the dispersion interactions between the protein and ligand. We also make use of atomic surface areas and molecular volume in our FFT-BP to describe hydrophobic and entropy effects of the protein-ligand binding process. Another trait of the present FFT-BP is its feature-function relationship assumption, which avoids the use of additive modeling of the total binding affinity by the direct sum of various energy components. The machine learning algorithm automatically ranks the relative importance of each feature to the binding affinity. By utilizing the boosted regression tree type of algorithms for the ranking, our model can capture the nonlinear dependence of the binding affinity on each feature. The other trait of FFT-BP is its use of an advanced LTR algorithm, the multiple additive regression tree (MART), for ranking the nearest neighbors via microscopic features. This approach allows us to further improve our method by incorporating the state-of-the-art machine learning techniques.

This chapter is structured as follows. In Section 8.2, we present the theoretical background of FFT-BP, which consists of four parts: basic assumptions, microscopic feature selection, MART algorithm and binding affinity function. In Section 8.3, we verify the accuracy and robustness of our FFT-BP by a validation set, a training set and three standard test sets involving a variety of diverse protein-ligand complexes. We show that FFT-BP delivers some of the best binding affinity predictions.

8.2 Theory and algorithm

In this section, we present FFT for binding free energy prediction. First, we discuss the basic FFT assumptions. Additionally, feature selections are based on physical models. Moreover, protein-ligand complexes are ranked by a machine learning algorithm, i.e., the MART ranking algorithm. Finally, we describe a linear regression algorithm for approximating the binding free energy based on features from nearest neighbors ranked by the MART algorithm.

8.2.1 Basic assumptions

Our FFT is based on three assumptions, including representability, feature-functional relationship and similarity. These assumptions are described below.

8.2.1.1 Representability assumption

Without loss of generality, we consider a total of $N$ molecules or complexes $\{M_i\}_{i=1}^{N}$ with known names and geometric structures from related databases. One of the FFT basic assumptions is that there exists an $n$-dimensional microscopic feature vector, denoted as $\mathbf{x}_i = (x_{i1}, x_{i2}, \ldots, x_{in})$, to uniquely characterize and distinguish the $i$th molecule or complex. Here the vector components include various microscopic features, such as atomic types and numbers, atomic charges, atomic dipoles, atomic quadrupoles, atomic reaction field energies, electrostatic solvation or electrostatic binding free energies, atomic surface areas, pairwise atomic van der Waals interactions, etc.

For the $i$th molecule or complex, apart from its $n$ microscopic features, there are $l$ macroscopic features, or physical observables, $\mathbf{o}_i = (o_{i1}, o_{i2}, \ldots, o_{il})$, such as density, pressure, boiling point, enthalpy of formation, heat of combustion, solvation free energy, pKa, pH, viscosity, permittivity, electrical conductivity, binding free energy, etc. We combine the microscopic and macroscopic feature vectors to construct an extended feature vector $\mathbf{v}_i = (\mathbf{x}_i, \mathbf{o}_i)$ for the $i$th molecule. Extended feature vectors $\{\mathbf{v}_i\}_{i=1}^{N}$ span a vector space $V$, which commonly requires eight axioms for addition and multiplication, such as associativity, commutativity, identity element, and inverse elements of addition, compatibility of scalar multiplication with field multiplication, etc. Unlike the usual $L^p$ space, the extended feature space does not have the notion of nearness, angles or distances. We therefore need additional techniques, namely, machine learning algorithms, to study the nearness and distance between feature vectors. The selection of microscopic features depends on what physical or chemical prediction is of interest. In our approach, we utilize microscopic features from related physical models. For example, for solvation and binding free energy prediction, we select features that are derived from implicit solvent models and quantum mechanics.

Based on our assumption, microscopic features alone are able to characterize and distinguish molecules. In contrast, macroscopic features are used as the label in learning and ranking molecules for a given purpose. Therefore, for a given task, say binding free energy prediction, we do not include all the macroscopic features in the feature vector $\mathbf{o}_i$. We only select $\mathbf{o}_i = (o_{i1}) = \Delta G_i, \ \forall i = 1, \ldots, N$, where $\{\Delta G_i\}$ are known binding free energies from databases. The resulting extended vector is used for the binding free energy prediction.

8.2.1.2 Feature-function relationship assumption

In FFT, a general feature-function relationship is assumed for the $j$th physical observable $o_j$ of target molecule $A$

\[ o_{Aj} = f_j(\mathbf{x}_A, \mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_N), \tag{8.2.1} \]

where $f_j$ is an unknown function modeling the $j$th physical observable of molecule $A$ and $\mathbf{x}_A$ is the microscopic feature vector of the target molecule $A$. This relation applies to the prediction of various physical and chemical properties. In the present application, we are interested in the prediction of binding free energies for a set of diverse protein-ligand complexes. We construct a feature space for the training set, and the binding free energy of target molecular complex $AB$ can be given as a functional of extended feature vectors

\[ \Delta G_{AB} = f_{\rm binding}(\mathbf{x}_{AB}, \mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_N), \tag{8.2.2} \]

where $\Delta G_{AB}$ is the binding free energy of molecular complex $AB$, and $f_{\rm binding}$ is an unknown functional for modeling the relationship between binding free energy and extended features. Obviously, the determination of $f_{\rm binding}$ is a major task of the present work.

8.2.1.3 Similarity assumption

In the FFT, we assume that molecules with similar microscopic features have similar macroscopic features, or physical observables. In the present application, we assume that protein complexes with similar microscopic features will have similar binding free energies. This assumption provides the basis for utilizing supervised machine learning algorithms to rank protein-ligand complexes.

In our earlier HPK model, we assume that molecules with similar features have the same set of parameters in a physical model. As a result, solvation or binding free energies are still computed based on a physical model, while a machine learning algorithm is used to find out the nearest neighbors for modeling the physical parameters. In the present FFT, the binding free energy is not modeled by a physical model directly. However, the microscopic features are constructed from physical models.

8.2.2 Microscopic features

In physical models, such as MMPBSA and MMGBSA, the protein ligand binding affinity is given by the combination of molecular mechanics energy, solvation free energy, and entropy term

\[ \Delta G = \Delta E_{\rm MM} + \Delta G_{\rm solv} - T\Delta S, \tag{8.2.3} \]

where $\Delta E_{\rm MM}$, $\Delta G_{\rm solv}$, and $T\Delta S$ are the molecular mechanics energy, solvation free energy, and entropy terms, respectively. Further, the molecular mechanics energy can be decomposed as $\Delta E_{\rm Covalent}$, which is the sum of bond, angle, and torsion energy terms, and $\Delta E_{\rm Noncovalent}$, which includes the van der Waals term and a Coulombic term $\Delta E_{\rm Coul}$ [104]. Equation (8.2.3) is used as a guidance for the feature selection in our FFT-Score model.

8.2.2.1 Reaction field features

Molecular electrostatics is of fundamental importance in the protein solvation and binding processes [219,112,92]. In this work, we use a classical implicit solvent model, the PB theory, for modeling the molecular electrostatics in the solvent environment. This model is used for two purposes. On the one hand, the solvation effects during the protein ligand binding will be modeled via this theory. On the other hand, the electrostatic contribution to the protein ligand binding affinity is computed based on this model, as well.

For simplicity, we consider the linearized PB model in the pure water solvent, which is formulated as the following elliptic interface problem in mathematical terminology. The governing equation is given by

\[ -\nabla \cdot \left( \epsilon(\mathbf{r}) \nabla \phi(\mathbf{r}) \right) = \sum_{i=1}^{N_m} Q_i \, \delta(\mathbf{r} - \mathbf{r}_i), \tag{8.2.4} \]

with the interface conditions

\[ [\phi]\big|_{\Gamma} = 0, \tag{8.2.5} \]

and

\[ [\epsilon \phi_{\mathbf{n}}]\big|_{\Gamma} = 0, \tag{8.2.6} \]

where $\phi$ is the electrostatic potential over the whole solvent solute domain
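, $Q_i$ is the partial charge located at $\mathbf{r}_i$ and $\delta(\mathbf{r} - \mathbf{r}_i)$ is the delta function at point $\mathbf{r}_i$. The permittivity function $\epsilon(\mathbf{r})$ is given by

\[ \epsilon(\mathbf{r}) = \begin{cases} \epsilon_m = 1, & \mathbf{r} \in \Omega_m \\ \epsilon_s = 80, & \mathbf{r} \in \Omega_s \end{cases} \tag{8.2.7} \]

where $\Omega_m$ and $\Omega_s$ are solute and solvent domains, respectively. The two domains are separated by the molecular surface $\Gamma$. The following Debye-Hückel type of boundary condition is imposed to make the PB model well posed

\[ \phi(\mathbf{r}) = \sum_{i=1}^{N_m} \frac{Q_i}{4\pi\epsilon_s |\mathbf{r} - \mathbf{r}_i|}, \quad \text{if } \mathbf{r} \in \partial\Omega, \tag{8.2.8} \]

where $\Omega = \Omega_m \cup \Omega_s$. Molecular reaction field energy is computed by the following formula

\[ \Delta G^{\rm RF} = \sum_{i=1}^{N_m} \Delta G^{\rm RF}_i \tag{8.2.9} \]

where the $i$th atomic reaction field energy $\Delta G^{\rm RF}_i$ is given by

\[ \Delta G^{\rm RF}_i = \frac{1}{2} Q_i \left( \phi(\mathbf{r}_i) - \phi_{\rm home}(\mathbf{r}_i) \right) \tag{8.2.10} \]

where $\phi_{\rm home}$ is obtained through solving the PB model with $\epsilon(\mathbf{r}) = 1$ in the whole computational domain $\Omega$.

Note that atomic reaction field energies $\Delta G^{\rm RF}_i$ are used as features in our FFT based solvation model. Here the reaction field energy gives a good description of the solvation free energy. In our earlier study on the solvation model, we found that reaction field energy related molecular descriptors provide a very accurate characterization of the solvation effects. The study of a large number of small solute molecules demonstrates that by using these microscopic features in the solvation model, the predicted solvation free energy is in excellent agreement with the experimental solvation free energy. For example, the RMSE of our leave-one-out test for a large database of 668 molecules is around 1 kcal/mol [250].

Note that in Eq. (8.2.9), the whole reaction field energy is regarded as the sum of atomic reaction field energies. In the PB calculation, the solute molecule is usually assumed to be a homogeneous dielectric continuum with a uniform dielectric constant, which is an inappropriate assumption, since atoms in different environments should have different dielectric properties [265]. For this reason, we select the atomic reaction field energy as a microscopic feature and let the machine learning algorithm automatically take care of the possible differences in dielectric constants.

8.2.2.2 Electrostatic binding features

By using the PB model, we can further obtain the electrostatics contribution to the protein-ligand binding affinity. The electrostatics binding free energy is calculated by

\[ \Delta G_{\rm el} = (\Delta G^{\rm RF})_{\rm Com} - (\Delta G^{\rm RF})_{\rm Pro} - (\Delta G^{\rm RF})_{\rm Lig} + \Delta G_{\rm Coul}, \tag{8.2.11} \]

where $\Delta G_{\rm el}$ is the electrostatics binding free energy between protein and ligand, $(\Delta G^{\rm RF})_{\rm Pro}$ and $(\Delta G^{\rm RF})_{\rm Lig}$ are the reaction field energies of the protein and ligand, respectively, and $(\Delta G^{\rm RF})_{\rm Com}$ is that of the complex. Here $\Delta G_{\rm Coul}$ is the Coulombic interaction between the two parts in the vacuum environment, which is computed as

\[ \Delta G_{\rm Coul} = \sum_{i,j} \frac{Q_i Q_j}{r_{ij}}, \tag{8.2.12} \]

where $r_{ij}$ is the distance between two specific charges, and indexes $i$ and $j$ run over all the atoms in the protein and ligand molecules, respectively.

The PB model is solved by our in-house software, MIBPB [295,278,90,39], which is shown to be grid size independent. Its relative ranking orders of reaction field energy and binding free energy calculated with different grid sizes are consistent [252]. This numerical accuracy guarantees the preservation of relative ranking orders, which in turn avoids the influence of numerical errors on the prediction.

8.2.2.3 Atomic Coulombic interaction

Coulombic energy plays an important role in the molecular mechanics energy [167,139,104]. Coulombic energy calculation also depends on the dielectric medium. To this end, we consider the atomic Coulombic interactions in the vacuum environment. Specifically, for the $i$th atom in the protein molecule, we select the microscopic feature from atomic Coulombic energy as

\[ (\Delta G_{\rm Coul})_i = \sum_{j} \frac{Q_i Q_j}{r_{ij}}, \tag{8.2.13} \]

where the summation index $j$ runs over all the atoms in the ligand molecule. The Coulombic energy associated with the atoms in the ligand molecules can be defined analogously.

8.2.2.4 Atomic van der Waals interaction

It was shown that van der Waals interactions play an important role in solvation analysis [83,50,246,46,248]. We expect that van der Waals interactions are essential to the binding process as well. In this work, we consider the 6-12 Lennard-Jones (LJ) interaction potential for modeling the van der Waals interactions

\[ u_{ij}(\mathbf{r}_i, \mathbf{r}_j) = \epsilon_{ij} \left[ \left( \frac{r_i + r_j}{\|\mathbf{r}_i - \mathbf{r}_j\|} \right)^{12} - 2 \left( \frac{r_i + r_j}{\|\mathbf{r}_i - \mathbf{r}_j\|} \right)^{6} \right], \tag{8.2.14} \]

where $r_i$ and $r_j$ are atomic radii of the $i$th and $j$th atoms, respectively. $\epsilon_{ij}$ measures the depth of the attractive well at $\|\mathbf{r}_i - \mathbf{r}_j\| = r_i + r_j$. For features related to the van der Waals interactions, we select pairwise particle interactions as microscopic features for describing the van der Waals interactions between the protein and ligand. In these features, each atom type is collected together, and the well-depth parameters $\epsilon_{ij}$ are left as training parameters in the subsequent machine learning algorithm.

8.2.2.5 Atomic solvent excluded surface area and molecular volume

Molecular surface area and surface enclosed volume are usually employed in scaled-particle theory (SPT) to model the nonpolar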
solvation free energy [226, 197, 163] and/or entropy contribution to the protein-ligand binding affinity.

In our FFT-BP, the solvent excluded surface is employed for conformation modeling of the solvated molecule. The molecular surface area associated with each atom type and the molecular volume are used as microscopic features. These features are also computed by our in-house software, ESES [157], in which a second order convergent scheme based on the level set theory and third order volume schemes are implemented. In ESES, the molecular surface area is partitioned into atomic surface areas based on the power diagram theory.

8.2.2.6 Summary of microscopic features

We consider microscopic features of a protein-ligand complex. For the protein molecule, microscopic features are selected from the following types of atoms: C, N, O, and S. For the ligand molecule, atomic features are collected from C, N, O, S, P, F, Cl, Br, and I. We drop features from hydrogen atoms (H) because the positions of these atoms are typically not given in the original X-ray crystallography data, so their information may not be accurate. Coincidentally, this selection of representative atoms is consistent with that of some other existing scoring functions, e.g., Cyscore [33], AutoDock Vina [240], and RF-Score [9]. In our model, we collect atomic reaction field energies, the molecular reaction field energy, atomic van der Waals and Coulombic interactions, atomic surface areas, and the molecular volume as the building blocks of the feature space. Because binding is a dynamical process, the changes of the atomic reaction field energies, atomic surface areas, and molecular volumes between the bound and unbound states are selected as microscopic features as well.

8.2.3 MART ranking algorithm

In this subsection, we introduce the MART algorithm and describe its application to protein-ligand binding affinity scoring. MART is a list-wise LTR algorithm. For a given training set with feature vectors and an associated ranking order (here we simply use the protein-ligand binding affinity as the label value), it trains a function that optimally simulates the relation between features and labels. When applied to a protein-ligand complex in the test set, this trained function acts on the corresponding features and gives a predicted value. The predicted value reflects the binding affinity of the complex in the test set. In the web-search community, LambdaMART is one of the state-of-the-art LTR algorithms; here LambdaMART is a coupling of Lambda and MART. Compared to the classical MLR model for training functions that link features and labels, MART can capture nonlinear relationships. Furthermore, compared to most neural network based algorithms, it is more efficient. MART, also named GBDT (gradient boosting decision tree), is a very efficient ensemble method for regression. Meanwhile, due to the boosting of weak learners (usually quite simple models like decision trees), the overfitting problem can be avoided effectively.

The principles of GBDT are summarized as follows. For the training set, GBDT successively learns weak learners, and each weak learner is a regression tree with only a few levels, fitted to the residual of the previous forest with respect to the training set. The procedure starts from a regression tree for the training set, and regression trees are added to the forest gradually. Each succeeding regression tree is fitted to the residual of the previous forest. Instead of counting the whole contribution from each regression tree, shrinkage is adopted, which is a weight on the regression tree. This weight is obtained by solving an optimization problem via a simple line search algorithm. The weighted contributions from all regression trees make up the scoring function, which is the boosting of simple regression trees. Due to the simplicity of each regression tree, the overfitting problem can be bypassed efficiently.

In summary, MART learns a function between features and the binding free energy from the training set. In the testing step, this function assigns a predicted binding affinity to each sample in the testing set, and the ranking position of a given sample is determined by the obtained score. This ranking method is significantly different from the classical pairwise approaches, e.g., RankSVM [128, 129, 144], where ranking is based on the pairwise comparison between all sample pairs in the training set. The major drawback of these approaches is that they assume the same penalty for all pairs. In contrast, in most applications we only care about a few top ranking results for a given query. For a more comprehensive and mathematical description of MART, the reader is referred to the literature [26, 78]. MART is not the only possible choice of LTR algorithm.
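The boosting procedure summarized in this subsection can be illustrated with a minimal sketch. It assumes squared-error loss and depth-one regression trees (stumps), uses only the Python standard library, and the names `fit_stump`, `stump_predict`, `fit_gbdt`, and `gbdt_predict` are hypothetical; it is not the MART implementation used in this work, only an instance of the general idea of fitting each new tree to the residual of the current forest and adding it with a shrunken weight.

```python
# Minimal gradient-boosted regression stumps (squared loss), pure Python.
# Hypothetical helper names; a sketch of the GBDT idea, not the MART code
# used in this work.

def fit_stump(X, r):
    """Fit a depth-1 regression tree to residuals r.

    Returns (feature index, threshold, left mean, right mean)."""
    n, d = len(X), len(X[0])
    best = None
    for j in range(d):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds are midpoints between consecutive values.
        for lo, hi in zip(values, values[1:]):
            t = 0.5 * (lo + hi)
            left = [r[i] for i in range(n) if X[i][j] <= t]
            right = [r[i] for i in range(n) if X[i][j] > t]
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((v - ml) ** 2 for v in left)
                   + sum((v - mr) ** 2 for v in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    if best is None:  # every feature is constant: predict the mean residual
        mean = sum(r) / n
        return (0, float("inf"), mean, mean)
    return best[1:]


def stump_predict(stump, row):
    j, t, ml, mr = stump
    return ml if row[j] <= t else mr


def fit_gbdt(X, y, n_trees=50, shrinkage=0.1):
    """Boost stumps on the residual of the current forest."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    forest = []
    for _ in range(n_trees):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, resid)
        h = [stump_predict(stump, row) for row in X]
        hh = sum(v * v for v in h)
        if hh == 0.0:  # nothing left to fit
            break
        # Closed-form line search for squared loss: argmin_g sum (r - g*h)^2.
        # With mean-valued leaves this is 1 up to rounding; it is kept
        # explicit to mirror the description in the text.
        gamma = sum(ri * hi for ri, hi in zip(resid, h)) / hh
        weight = shrinkage * gamma
        forest.append((weight, stump))
        pred = [p + weight * v for p, v in zip(pred, h)]
    return base, forest


def gbdt_predict(model, row):
    base, forest = model
    return base + sum(w * stump_predict(s, row) for w, s in forest)


if __name__ == "__main__":
    X = [[0.0], [1.0], [2.0], [3.0]]
    y = [1.0, 1.0, 3.0, 3.0]
    model = fit_gbdt(X, y, n_trees=100, shrinkage=0.5)
    print([round(gbdt_predict(model, row), 6) for row in X])
```

In this sketch the shrinkage factor plays the role of the tree weight described above; a smaller value slows the fit and typically needs more trees.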
LTR algorithms other than MART, such as LambdaMART [26, 78] and ListNet [34], fit into our framework as well.

8.2.4 Method for binding affinity prediction

In this subsection, we discuss the FFT prediction of the binding free energy of a given target protein-ligand complex AB. Based on our assumption that the binding free energy is a functional of feature vectors, we construct a feature function around the target molecular complex and use it to predict the binding free energy. Even though the exact form of the function between features and binding affinity is unknown, locally it can be approximated by a linear function. In other words, locally we assume that the binding affinity is a linear function of the microscopic feature vector. The importance of various features can be ranked automatically during the machine learning procedure, and thus the number of influential features ($n$) can be reduced by selecting features of top importance to represent the binding affinity.

We assume that the target molecular complex AB is characterized by its feature vector $x^{AB} = (x^{AB}_1, x^{AB}_2, \cdots, x^{AB}_n)$, where $n$ is the dimension of the microscopic feature space, i.e., the space of all microscopic feature vectors. We also assume that by using the LTR algorithm, we can find the top $m$ nearest neighbors from the training set. The extended feature vectors of these nearest neighbor complexes are given by $\{v_i = (x_i, \Delta G_i)\}_{i=1}^{m}$. In general, the dimension of the feature space is much larger than the number of nearest neighbors used, i.e., $m \ll n$. Therefore, the direct least squares approach may lead to overfitting. To avoid overfitting, we utilize a Tikhonov regularization based least squares algorithm for training the binding affinity function. From the extended feature vectors, we can set up the following set of equations

\[
\begin{pmatrix} \Delta G_1 \\ \Delta G_2 \\ \vdots \\ \Delta G_m \end{pmatrix}
=
\begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1n} \\
x_{21} & x_{22} & \cdots & x_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
x_{m1} & x_{m2} & \cdots & x_{mn}
\end{pmatrix}
\begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix}
+
\begin{pmatrix} b \\ b \\ \vdots \\ b \end{pmatrix},
\tag{8.2.15}
\]

where $w_i = w_i(v_1, v_2, \cdots, v_m)$ and $b = b(v_1, v_2, \cdots, v_m)$ define the function for $\Delta G_i$. By the similarity assumption, the same functional form can be used for the target complex AB. For further derivation, we rewrite Eq. (8.2.15) as

\[
\Delta G = x w + b \mathbf{1},
\tag{8.2.16}
\]

where $\Delta G = (\Delta G_1, \Delta G_2, \cdots, \Delta G_m)^T$, $w = (w_1, w_2, \cdots, w_n)^T$, $\mathbf{1}$ is an $m$-dimensional column vector with all elements equal to 1, and the matrix $x$ is given by

\[
x =
\begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1n} \\
x_{21} & x_{22} & \cdots & x_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
x_{m1} & x_{m2} & \cdots & x_{mn}
\end{pmatrix}.
\]

To avoid overfitting, we add an $L_2$ penalty on the weight vector $w$ and solve Eq. (8.2.16) as an optimization problem

\[
\min_{w, b} \; \|\Delta G - x w - b \mathbf{1}\|_2^2 + \lambda \|w\|_2^2 := \min_{w, b} F,
\tag{8.2.17}
\]

where $\lambda$ is a regularization parameter, set to 10 in this work, and $\|\cdot\|_2$ denotes the $L_2$ norm of the quantity. By solving $\partial F / \partial w = 0$, we have

\[
w = \left( x^T x + \lambda I \right)^{-1} \left( x^T \Delta G - x^T (b \mathbf{1}) \right),
\tag{8.2.18}
\]

where $I$ is an $n \times n$ identity matrix. To determine $b$ from Eq. (8.2.17), we relax $b \mathbf{1}$ to an arbitrary vector $\mathbf{b} = (b_1, b_2, \cdots, b_m)^T$. By solving $\partial F / \partial \mathbf{b} = 0$, we have

\[
\mathbf{b} = \Delta G - x w.
\tag{8.2.19}
\]

An unbiased estimation of $b$ is given by

\[
b = \frac{\sum_{i=1}^{m} (\Delta G - x w)_i}{m},
\tag{8.2.20}
\]

where $(\Delta G - x w)_i$ is the $i$th component of the vector $\Delta G - x w$. The optimization problem in Eq. (8.2.17) is solved by alternately iterating Eqs. (8.2.18) and (8.2.20), which is an alternating scheme in the spirit of the Expectation-Maximization (EM) algorithm. After obtaining the optimized weights $w$ for the feature vector $x$ and the hyperplane height $b$, the binding free energy of the target molecular complex AB can be predicted as

\[
\Delta G^{AB} = b + \sum_{i=1}^{n} w_i x^{AB}_i.
\tag{8.2.21}
\]

Equation (8.2.21) can be regarded as a linear approximation of the binding free energy functional $\Delta G^{AB} = f(x^{AB}; v_1, v_2, \cdots, v_m)$.

Alternatively, we can also directly obtain the binding affinity of the target complex AB from the LTR ranking value if the ranking algorithm attempts to fit the target value. For general LTR algorithms, especially pairwise ranking algorithms, the direct use of the ranking score as a predicted binding affinity is not appropriate. However, the proposed protocol also applies to this scenario. These two approaches are compared in the present work.

8.3 Numerical results

In this section, we explore the validity, demonstrate the performance, and examine the limitations of the proposed FFT-BP. First, we describe the data sets used in this work. Then, we examine whether FFT-BP's performance depends on protein clusters, where each cluster contains one specific protein and tens or hundreds of ligands. Our test on a validation set of 1322 protein-ligand complexes from 7 clusters indicates that the performance of the proposed FFT-BP does not depend on protein clusters. By using the same test set, we also study the impact of the cut-off distance on FFT-BP prediction. Here the cut-off distance refers to the truncation distance used in the protein feature evaluation. Protein atoms within the cut-off distance are allowed to contribute to the atomic feature selection and calculation. To further benchmark the accuracy of the present FFT-BP, we carry out a five-fold cross validation on the training set (N=3589), which is derived from the PDBBind v2015 refined set [160]. Finally, we provide blind predictions on a benchmark set of 100 protein-ligand complexes [265], the PDBBind v2007 core set (N=195) [9], and the PDBBind v2015 core set (N=195) [160].

8.3.1 Dataset preparation

All data sets used in the present work are obtained from the PDBBind database [160], in which the PDBBind v2015 refined set of 3,706 entries was selected from a general set of 14,620 protein-ligand complexes for good quality with respect to binding data, crystal structures, and the nature of the complexes [160]. Because of the feature extraction, a pre-processing of the data is required in the present method.

8.3.1.1 Datasets

This work utilizes one validation set (N=1322), one training set (N=3589), and three test sets (N=195, N=195, and N=100).

8.3.1.1.1 Validation set (N=1322)

To explore the cluster dependence or independence and the optimal cut-off distance of the present FFT-BP, we select a subset of the PDBBind v2015 refined set with 1322 complexes in 7 different clusters. Each cluster contains one protein and a large number, ranging from 93 to 333, of small ligand molecules.

8.3.1.1.2 Training set (N=3589)

We carry out our FFT microscopic feature extraction on the PDBBind v2015 refined set via the force field parametrization described below, which leads to a parametrized set of 3589 protein-ligand complexes. Whenever a test set is employed, its entries are carefully excluded from the training set of 3589 complexes.

8.3.1.1.3 Test sets

The three test sets are standard ones described in the literature. PDB IDs of the training set and the validation set are given in the Supporting material. The PDBBind v2015 core set of 195 benchmark-quality complexes is employed as a test set. According to the literature, the PDBBind v2015 core set was selected with an emphasis on the diversity in structures and binding data. It contains 65 representative clusters from the refined set. Each cluster must have at least five protein-ligand complexes, and three complexes, one with the highest binding constant, another with the lowest binding constant, and the other with a medium binding constant, were selected from it for the PDBBind v2015 core set [160]. We also consider two additional test sets, the PDBBind v2007 core set of 195 complexes [48] and the benchmark set of 100 complexes [265], to benchmark the proposed FFT-BP against a large number of scoring functions. When the training set (N=3589) is applied to a test set, we exclude all the overlapping entries between the training set and the given test set and re-train on the reduced training set for the specific test.

8.3.1.2 Data pre-processing

FFT-BP utilizes microscopic features, which requires appropriate feature extraction from the data set. Before the feature generation, structure optimization and force field assignment are carried out. Protein structures with their corresponding ligands are prepared with the protein preparation wizard utility of the Schrödinger 2015-2 Suite [79, 213] with default parameters, except for the treatment of missing side chains. The protonation states for ligands are generated using Epik state penalties, and the H-bond networks for the complex are further optimized using PROPKA at pH 7.0 [210, 189]. The restrained minimization on heavy atoms for the complex structures is performed with the OPLS 2005 force field [132].

The atomic radii and charges for the complexes are parameterized by AmberTools14 [35]. For ligand molecules, charges are calculated by the antechamber module with the AM1-BCC semi-empirical charge method, and the atomic radii are assigned by using the mbondi2 radii set [122]. For protein molecules, radii and charges of each atom are parameterized by the Amber general force field with the tleap module [35].

Protein features are extracted with a cut-off distance. Specifically, we define a tight bounding box containing the ligand, then extend the feature generation domain along all directions around the box by the cut-off distance. We provide all the data involved in this work in the Supporting material, in which some protein-ligand structures that need special treatment are emphasized.

In the PDBBind database, the protein-ligand binding affinity is provided in terms of $\mathrm{p}K_d$. We convert all the energy units in the PDBBind database to kcal/mol. To derive the unit conversion formula, one notes that

\[
\Delta G = RT \ln K_d = -RT \ln K_{eq},
\]

where $\Delta G$ is the Gibbs free energy, $K_d$ is the dissociation constant, and $R$ is the gas constant. Since $\mathrm{p}K_d = -\log_{10} K_d$, at room temperature, $T = 298.15$ K, one has the following relation between these two units

\[
\Delta G = -1.3633 \, \mathrm{p}K_d.
\tag{8.3.1}
\]

8.3.2 Validation

In this section, we explore the properties of FFT-BP and validate its performance. The following two important issues have been examined for several existing scoring functions. The first issue is related to the protein-ligand binding affinity prediction of diverse multiple clusters, especially clusters with limited experimental data. The other issue is that a scoring method should be optimized with respect to the cut-off distance in the feature extraction to maintain sufficient accuracy and avoid unnecessary feature calculations. In existing work, LTR based scoring functions can predict cross-cluster binding affinity well [283]. For the random forest and some other machine learning algorithms, one typically selects a cut-off distance of 12 Å in the protein feature calculation [10]. In this work, we demonstrate the capability of the FFT-BP for accurate cross-cluster binding affinity prediction. Additionally, we explore the optimal cut-off distance for FFT-BP feature extraction. Finally, since the accuracy of the FFT-BP predictions depends on the numbers of nearest neighbors and top features, we investigate the robustness of the proposed FFT-BP with respect to the choice of nearest neighbors and top features. Two sets of protein-ligand complexes, i.e., the validation set (N=1322) and the training set (N=3589), are employed in this validation study.

8.3.2.1 Validation on the validation set (N=1322)

Table 8.1: The RMSEs (kcal/mol) for the five-fold validation on the 7 clusters of the validation set and on the whole validation set (N=1322) with 10 different cut-off distances in the feature extraction.

Test set    Group     5 Å   10 Å  15 Å  20 Å  25 Å  30 Å  35 Å  40 Å
Cluster 1   Group 1   1.90  1.86  1.76  1.73  1.81  1.90  1.81  1.82
            Group 2   2.07  2.15  2.38  2.23  2.35  2.25  2.21  2.24
            Group 3   2.31  1.98  2.04  1.95  1.85  1.87  1.87  1.89
            Group 4   1.89  1.75  1.58  1.63  1.63  1.67  1.62  1.66
            Group 5   2.35  2.22  2.09  2.05  2.14  1.67  2.10  2.12
            Average   2.11  2.01  1.99  1.93  1.97  2.13  1.93  1.96
Cluster 2   Group 1   1.39  1.33  1.31  1.32  1.39  1.43  1.46  1.42
            Group 2   1.66  1.24  1.31  1.23  1.19  1.14  1.15  1.19
            Group 3   1.39  1.28  1.14  1.21  1.28  1.31  1.37  1.37
            Group 4   1.44  1.33  1.35  1.36  1.37  1.38  1.35  1.40
            Group 5   1.53  1.44  1.38  1.49  1.36  1.37  1.38  1.33
            Average   1.49  1.33  1.34  1.32  1.32  1.33  1.34  1.35
Cluster 3   Group 1   2.56  2.40  2.65  2.41  2.53  2.62  2.61  2.60
            Group 2   2.07  2.13  2.08  2.10  2.11  2.11  2.11  2.09
            Group 3   1.54  1.53  1.52  1.57  1.55  1.51  1.52  1.50
            Group 4   1.82  1.75  1.70  1.71  1.64  1.68  1.70  1.72
            Group 5   2.14  2.23  2.20  2.15  2.18  2.26  2.26  2.27
            Average   2.05  2.03  2.07  2.01  2.03  2.08  2.08  2.08
Cluster 4   Group 1   1.59  1.78  1.80  1.88  1.76  1.68  1.72  1.72
            Group 2   1.41  1.47  1.53  1.25  1.34  1.39  1.37  1.34
            Group 3   1.58  1.46  1.50  1.59  1.56  1.52  1.55  1.55
            Group 4   1.91  1.76  1.76  1.87  1.83  1.84  1.80  1.78
            Group 5   1.57  1.54  1.61  1.73  1.81  1.84  1.74  1.67
            Average   1.62  1.61  1.64  1.68  1.67  1.67  1.65  1.62
Cluster 5   Group 1   2.01  2.43  1.83  1.64  1.60  1.65  1.67  1.69
            Group 2   2.15  2.08  1.89  1.88  1.92  1.86  1.94  1.88
            Group 3   2.52  2.26  2.54  2.42  2.41  2.37  2.39  2.40
            Group 4   1.65  1.70  1.30  1.37  1.25  1.33  1.35  1.36
            Group 5   3.18  2.87  2.89  2.49  2.59  2.54  2.56  2.67
            Average   2.39  2.31  2.18  2.03  2.03  2.02  2.05  2.07
Cluster 6   Group 1   3.17  2.99  3.03  2.95  3.00  3.02  2.90  2.92
            Group 2   2.09  1.83  1.83  1.82  1.91  1.83  1.88  1.86
            Group 3   1.68  1.71  1.55  1.65  1.69  1.55  1.63  1.58
            Group 4   1.73  1.69  1.55  1.60  1.60  1.51  1.58  1.58
            Group 5   2.30  1.97  2.04  2.13  2.03  2.06  2.05  2.05
            Average   2.26  2.09  2.08  2.09  2.10  2.07  2.06  2.06
Cluster 7   Group 1   1.83  1.97  2.16  1.93  1.68  1.66  2.01  1.90
            Group 2   1.92  1.99  1.97  1.97  2.00  1.93  2.09  2.06
            Group 3   1.68  1.69  1.45  1.39  1.35  1.39  1.44  1.51
            Group 4   2.27  2.11  2.13  1.91  2.14  2.14  2.39  2.36
            Group 5   1.76  1.40  1.29  1.32  1.41  1.35  1.38  1.39
            Average   1.90  1.83  1.81  1.71  1.73  1.71  1.88  1.86
Average               1.90  1.88  1.87  1.83  1.84  1.84  1.85  1.85
Whole set   Group 1   1.81  1.55  1.62  1.57  1.67  1.69  1.66  1.55
            Group 2   1.63  1.76  1.62  1.69  1.64  1.67  1.55  1.71
            Group 3   1.71  1.58  1.65  1.65  1.65  1.55  1.55  1.63
            Group 4   1.73  1.62  1.65  1.57  1.56  1.53  1.78  1.57
            Group 5   1.64  1.65  1.59  1.64  1.65  1.68  1.60  1.63
            Average   1.70  1.63  1.63  1.63  1.64  1.63  1.64  1.63

We validate the proposed FFT-BP on the validation set of 1322 complexes. We utilize the five-fold cross validation strategy to test the model and determine the optimal cut-off distance. In this strategy, the validation set of 1322 complexes is randomly partitioned into five essentially equal sized subsets. Of the five subsets, a single subset is retained as the test set for testing the FFT-BP, and the remaining four subsets are used as training data. First, we run a coarse test with cut-off distances from 5 to 50 Å using 5 Å as the step size, which helps to determine the rough optimal cut-off distance. Second, we carry out a refined search for the optimal cut-off distance based on the coarse test results, with a step size of 1 Å. At a given cut-off size, we do the five-fold cross validation on the validation set of 1322 complexes, together with the five-fold cross validation on each of the 7 clusters. Table 8.1 lists the RMSEs of all the five-fold cross validations with cut-off distances from 5 to 50 Å and step size 5 Å.

Table 8.2: The RMSEs (kcal/mol) for the five-fold test on the validation set (N=1322) with FFT-BP calculated at different cut-off distances.

Group     5 Å   6 Å   7 Å   8 Å   9 Å   10 Å  11 Å  12 Å  13 Å  14 Å  15 Å
Group 1   1.81  1.68  1.80  1.58  1.61  1.55  1.49  1.62  1.50  1.60  1.62
Group 2   1.63  1.67  1.61  1.79  1.67  1.76  1.63  1.65  1.76  1.72  1.62
Group 3   1.71  1.68  1.57  1.65  1.61  1.58  1.80  1.62  1.56  1.71  1.65
Group 4   1.73  1.56  1.46  1.58  1.64  1.62  1.74  1.58  1.55  1.70  1.65
Group 5   1.64  1.57  1.82  1.57  1.60  1.65  1.46  1.56  1.59  1.59  1.59
Average   1.70  1.64  1.66  1.64  1.63  1.63  1.63  1.60  1.60  1.66  1.63

Figure 8.1: The prediction RMSE vs the cut-off distance.

Results in Table 8.1 indicate that: 1) Overall, prediction over the whole set of 1322 complexes gives better results than predictions on individual clusters. Therefore, the proposed method favors blind cross-cluster predictions. 2) According to the results from the whole validation set tests, the feature cut-off distance has reached its optimal value at 10 Å. This distance is consistent with explicit solvent modeling, in which a 10 Å cut-off distance is designed to account for long range electrostatic interactions. To better estimate the optimal cut-off distance, we carry out a more accurate search in the range of 5 to 15 Å with a step size of 1 Å. Table 8.2 lists the RMSEs of the five-fold cross validation on the whole validation set of 1322 complexes. These results show that 12 Å is the optimal cut-off distance in the searched solution space, which is consistent with that used in the RF-Score [10]. We plot the relation between the cut-off distance and the prediction error in Fig. 8.1. In the rest of this work, the cut-off distance of 12 Å is utilized.

Table 8.3: The RMSEs (kcal/mol) for the validation set (N=1322) with different numbers of nearest neighbors and top features.

Number of nearest    Number of top features
neighbors            5     10    15    20    25    30    35    40    45    50
1                    1.60  1.60  1.60  1.61  1.61  1.61  1.61  1.61  1.61  1.62
2                    1.60  1.60  1.61  1.61  1.61  1.61  1.61  1.62  1.62  1.62
3                    1.60  1.59  1.60  1.70  1.66  1.68  1.71  1.70  1.69  1.70
4                    1.61  1.57  1.62  1.71  1.70  1.72  1.73  1.70  1.85  1.83
5                    1.61  1.60  1.67  1.74  1.75  1.74  1.75  1.73  1.78  1.77
6                    1.62  1.61  1.68  1.79  1.80  1.81  1.81  1.88  1.85  1.85
7                    1.61  1.61  1.65  1.78  1.77  1.78  1.78  1.81  1.82  1.82
8                    1.62  1.62  1.65  1.74  1.76  1.76  1.77  1.78  1.80  1.81
9                    1.62  1.61  1.65  1.74  1.75  1.76  1.76  1.76  1.78  1.77
10                   1.62  1.62  1.73  1.74  1.79  1.80  1.82  1.82  1.88  1.90

Finally, all the above predictions are based on the LTR ranking results. Alternatively, we can also carry out the prediction by using nearest neighbors and their associated features. We are interested in the difference between these two approaches. To this end, we compute the binding affinities in the five-fold tests with different numbers of nearest neighbors and top features involved. Here top features are ranked by the LTR algorithm automatically according to their importance during the complex ranking. We list the top 50 important features for protein-ligand binding for the validation set in the Supporting material. We note that the five most important features are the volume change, the atomic Coulombic interaction of S atoms, the area change of the C atoms in the protein and complex parts, and the electrostatic binding free energy. The RMSEs of the tests with different numbers of top features and nearest neighbors involved are presented in Table 8.3. The optimal result is obtained when four nearest neighbors and 10 top features are utilized, with an RMSE of 1.57 kcal/mol. It is seen that when 10 or fewer top features are employed, the prediction is quite accurate. However, with more features and more neighbors involved, the prediction becomes slightly worse. One possible reason is the reduced quality of the nearest neighbors involved in the prediction. Indeed, neighbors that are not very close to the target molecular complex may have a large effect on the prediction for the target complex. This problem also motivates us to seek a better set of features for protein-ligand binding analysis.

Figure 8.2: Five-fold cross validation on the validation set (N=1322). Left chart: correlation between experimental binding affinities and FFT-BP predictions. Right chart: RMSEs for the five groups. Here, RMSEs are 1.55, 1.58, 1.55, 1.56, and 1.59 kcal/mol for the five groups, respectively. The overall Pearson correlation to the experimental binding affinities is 0.80.

Figure 8.2 depicts the optimal prediction results (left chart) and the RMSEs for each group (right chart). It is seen that the RMSEs for all groups are almost the same, indicating the unbiased nature of five-fold cross validation. The success of the proposed FFT-BP is implied by the small RMSEs (1.55 to 1.59 kcal/mol) and the high overall Pearson correlation of 0.80.

8.3.2.2 Validation on the training set (N=3589)

Table 8.4: The RMSEs (kcal/mol) for the five-fold cross validation on the training set (N=3589) with different numbers of nearest neighbors and top features.

Number of nearest    Number of top features
neighbors            5     10    15    20    25    30    35    40    45    50
1                    2.00  1.99  2.00  2.00  2.00  2.01  2.01  2.01  2.01  2.02
2                    2.00  1.99  2.00  1.99  2.00  2.01  2.01  2.01  2.01  2.01
3                    2.01  2.00  2.00  2.00  2.00  2.00  2.02  2.01  2.01  2.01
4                    2.00  2.01  2.00  1.99  2.00  2.01  2.01  2.01  2.02  2.01
5                    2.01  2.00  2.01  2.01  2.01  2.01  2.01  2.01  2.01  2.02
6                    2.00  1.99  1.99  2.00  2.00  2.01  2.01  2.01  2.01  2.01
7                    2.00  2.00  2.00  2.00  2.01  2.01  2.02  2.02  2.02  2.02
8                    2.00  1.99  1.98  1.99  1.99  2.00  2.00  2.00  2.01  2.00
9                    2.00  2.00  2.00  2.01  2.02  2.05  2.05  2.05  2.05  2.04
10                   1.99  2.00  2.00  2.03  2.04  2.07  2.08  2.08  2.08  2.08

We also consider the five-fold cross validation on our training set of 3589 complexes. We randomly divide this data set into five groups with 717, 718, 718, 718, and 718 complexes, respectively. In the five-fold cross validation, each time we regard one group of molecules as the test set without binding affinity data, and use the remaining four groups to predict the binding affinities of the selected test set. Directly using the ranking score as the predicted binding affinity leads to an RMSE of 2.00 kcal/mol. Alternatively, we can predict binding affinities using the nearest neighbors and top features. Table 8.4 shows the RMSEs for the five-fold cross validation test on the training set (N=3589). The number of nearest neighbors is varied from 1 to 10 and the number of top features is changed from 5 to 50. The 50 most important features indicated by the LTR algorithm are provided in the Supporting material. The five most important features are the volume change, the electrostatic binding free energy, and the van der Waals interactions between C-S, C-O, and C-N pairs, respectively. The optimal prediction is achieved when 8 nearest neighbors and the top 10 features are used for binding affinity prediction, with the RMSE being 1.98 kcal/mol. Different numbers of nearest neighbors and top features basically give very consistent predictions.

Figure 8.3: Five-fold cross validation on the training set (3589 complexes). Left chart: correlation between experimental binding affinities and FFT-BP predictions. Right chart: RMSEs for the five groups. Here, RMSEs are 2.01, 1.96, 1.97, 1.98, and 2.00 kcal/mol for the five groups, respectively. The overall Pearson correlation to the experimental affinities is 0.70.

Compared to the five-fold test on the 1322 protein-ligand complexes, the prediction errors on this set are much larger, which is partially due to the fact that the structures in this test set are more complex. For example, binding-site metals are present without an appropriate treatment.
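The five-fold protocol and the two error measures reported in this subsection can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in this work: the helper names `five_fold_indices`, `rmse`, `pearson`, and `cross_validate` are hypothetical, the random seed is arbitrary, and the `fit` and `predict` callables stand in for whatever model is being validated.

```python
# Sketch of a k-fold cross validation with RMSE and Pearson correlation.
# Hypothetical helper names; standard library only.
import math
import random


def five_fold_indices(n, k=5, seed=0):
    """Randomly partition range(n) into k groups whose sizes differ by at most one."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def rmse(pred, truth):
    """Root-mean-square error between predictions and reference values."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))


def pearson(a, b):
    """Pearson correlation coefficient (assumes neither input is constant)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def cross_validate(features, labels, fit, predict, k=5, seed=0):
    """Hold out each group in turn, train on the rest, return one RMSE per group."""
    folds = five_fold_indices(len(labels), k, seed)
    scores = []
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(labels)) if i not in held_out]
        model = fit([features[i] for i in train], [labels[i] for i in train])
        pred = [predict(model, features[i]) for i in test]
        scores.append(rmse(pred, [labels[i] for i in test]))
    return scores


if __name__ == "__main__":
    # Group sizes for a set of 3589 items split five ways.
    print(sorted(len(f) for f in five_fold_indices(3589)))
```

For 3589 items this slicing yields one group of 717 and four groups of 718, matching the group sizes quoted above; which complexes fall into which group depends on the seed.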
We believe that a better treatment of metals and a classification of ligand molecules would improve the FFT-BP prediction.

Figure 8.3 depicts the optimal prediction results (left chart) and the RMSEs for each group (right chart). These tests demonstrate the following two facts. First, the five-fold cross validation prediction is unbiased. The prediction results do not depend on the data itself, and the RMSEs for all groups are almost at the same level. Second, when the protein-ligand complexes become diverse, the prediction becomes slightly worse due to the lack of similar complexes for certain clusters.

Table 8.5: The RMSEs (kcal/mol) of the FFT-BP for the benchmark test set (N=100) with different numbers of nearest neighbors and top features.

Number of nearest    Number of top features
neighbors            5     10    15    20    25    30    35    40    45    50
1                    2.00  2.00  2.00  2.00  2.00  2.00  2.00  2.00  2.00  2.00
2                    2.01  2.01  1.99  1.99  2.01  2.00  2.01  2.01  2.01  2.01
3                    2.00  2.00  2.00  2.00  2.00  2.00  2.00  2.01  2.01  2.01
4                    2.01  2.01  2.01  2.00  2.00  2.00  2.01  2.01  2.01  2.01
5                    2.01  2.01  2.01  2.01  2.01  2.01  2.00  2.01  2.01  2.01
6                    2.02  2.01  2.01  2.01  2.01  2.01  2.01  2.01  2.01  2.01
7                    2.02  2.02  2.01  2.01  2.01  2.01  2.01  2.01  2.01  2.01
8                    2.01  2.01  2.01  2.01  2.01  2.01  2.02  2.02  2.02  2.02
9                    2.01  2.01  2.01  2.01  2.01  2.01  2.02  2.01  2.01  2.01
10                   2.01  2.01  2.01  2.02  2.01  2.02  2.01  2.02  2.02  2.02

8.3.3 Blind predictions on three test sets

To further verify the accuracy of the FFT-BP, we perform blind predictions on three benchmark test sets. The training set (N=3589) that is processed from the PDBBind v2015 refined set is utilized for the training in all blind predictions. Due to the LTR algorithm used in our FFT-BP, the RMSE and correlation of our FFT-BP prediction would be around 0 kcal/mol and 1, respectively, had we included all the test set complexes in our training set. Therefore, in each blind prediction, we carefully exclude the overlapping test set complexes from the training set and re-train on the training set with a reduced number of complexes.

8.3.3.1 Prediction on the benchmark set (N=100)

First of all, we consider a popular benchmark set originally used by Wang et al. [265]. This set contains 100 protein-ligand complexes which involve a large variety of protein receptors. Originally this test set was used to test the performance of a large number of well-known scoring functions and docking algorithms [265]. Recently, Zheng et al. have utilized this test set to demonstrate the superb performance of their KECSA method [288]. In this work, we examine the accuracy and robustness of our FFT-BP on this benchmark test set.

Figure 8.4: The correlation between experimental binding free energies and FFT-BP predictions on the benchmark test set (N=100) with the RMSE of 1.99 kcal/mol and the Pearson correlation of 0.75.

Directly using the ranking score as the predicted binding affinity leads to an RMSE of 2.01 kcal/mol and a Pearson correlation coefficient of 0.75. Alternatively, we examine FFT-BP predictions using different numbers of nearest neighbors and top features. Table 8.5 lists the predicted RMSEs for the benchmark set (N=100). The numbers of nearest neighbors and top features vary from 1 to 10 and from 5 to 50, respectively. The 50 most important features indicated by the LTR algorithm are provided in the Supporting material. The five most important features are the volume change, the electrostatic binding free energy, the van der Waals interactions between C-S and C-C pairs, and the complex's area change. The optimal prediction is reached when 2 nearest neighbors and the top 15 or 20 features are used for binding prediction. The corresponding RMSEs and correlations for both cases are 1.99 kcal/mol and 0.75, respectively. Different numbers of nearest neighbors and top features basically give rise to very consistent predictions. We also note that the prediction errors for this test set of 100 complexes are very similar to those of the five-fold cross validation tests on our training set (N=3589). This consistency indicates the robustness of the proposed FFT-BP in binding affinity predictions.

Figure 8.5: Performance comparison between different scoring functions on the benchmark test set (N=100). The binding affinity comparisons were done for FFT-BP and 19 well-known scoring functions, namely LISA, KECSA, LISA+ [288, 287], ITScore/SE [117], ITScore [116], X-Score [264], DFIRE [282], DrugScoreCSD [244], DrugScorePDB [96], Cerius2/PLP [88], SYBYL/G-Score [130], SYBYL/D-Score [171], SYBYL [74], Cerius2/PMF [180], DOCK/FF [171], Cerius2/LUDI [23], Cerius2 [1], SYBYL/F-Score [201], and AutoDock [179].

Figure 8.4 illustrates the optimal prediction results compared to the experimental data. The RMSE and Pearson correlation coefficient are 1.99 kcal/mol and 0.75, respectively. This test set is a critical test set with diverse protein-ligand complexes and a wide range of experimental binding free energies. Most of our predictions are quite appealing, with less than 2 kcal/mol RMSE compared to experimental results. Many outstanding scoring functions have been tested on this test set, as summarized by Zheng et al. [288]. Here we also add our prediction to this list. As shown in Fig. 8.5, the performance of our FFT-BP is highlighted in red. The performance data of the other 19 scoring functions are taken from Ref. [288].

Table 8.6: The RMSEs (kcal/mol) of the FFT-BP for the PDBBind v2007 core test set (N=195) with different numbers of nearest neighbors and top features.

Number of nearest    Number of top features
neighbors            5     10    15    20    25    30    35    40    45    50
1                    2.10  2.10  2.10  2.10  2.10  2.10  2.10  2.10  2.10  2.10
2                    2.09  2.09  2.09  2.09  2.09  2.09  2.10  2.10  2.10  2.10
3                    2.10  2.10  2.10  2.09  2.09  2.09  2.09  2.10  2.09  2.09
4                    2.09  2.09  2.09  2.09  2.09  2.10  2.09  2.09  2.09  2.10
5                    2.09  2.10  2.09  2.10  2.11  2.10  2.11  2.11  2.11  2.10
6                    2.10  2.08  2.09  2.08  2.09  2.08  2.10  2.09  2.10  2.10
7                    2.09  2.09  2.10  2.10  2.11  2.12  2.12  2.12  2.12  2.12
8                    2.10  2.10  2.11  2.10  2.10  2.10  2.10  2.10  2.10  2.10
9                    2.09  2.10  2.10  2.10  2.10  2.10  2.11  2.11  2.11  2.11
10                   2.10  2.10  2.09  2.10  2.10  2.10  2.11  2.11  2.11  2.11

8.3.3.2 Prediction on the PDBBind v2007 core set (N=195)

The PDBBind v2007 core set (N=195), which contains high quality data, is mainly aimed at testing the performance of scoring functions [160]. It has been employed to study and compare many excellent scoring functions [151, 9, 150, 8, 48]. Here we also examine our FFT-BP on this test set. If we regard the ranking score itself as the predicted binding affinity, the RMSE is 2.09 kcal/mol for this test set, which is slightly larger than that of the five-fold test on the training set (N=3589) and that of the benchmark test set (N=100). Table 8.6 shows the prediction RMSEs for the PDBBind v2007 core set. We have varied the numbers of nearest neighbors and top features from 1 to 10 and from 5 to 50, respectively. The 50 most important features indicated by the LTR algorithm are provided in the Supporting material. The top five important features are the volume change, the electrostatic binding free energy, the van der Waals interaction between C-O pairs, the complex's area change, and the van der Waals interaction between C-C pairs. These features are basically consistent with those of all the previous test cases. The optimal FFT-BP prediction is found when 6 nearest neighbors and the top 15 or 30 features are used for binding prediction, with the RMSEs for both cases being 2.08 kcal/mol. The Pearson correlation coefficient of 0.76 is consistent with the earlier finding from the test set of 100 complexes. It is found that for this test set, the predictions based on different numbers of nearest neighbors and top features do not differ much from each other.

Figure 8.6: The correlation between experimental binding free energies and FFT-BP predictions on the PDBBind v2007 core set (N=195). The RMSE and Pearson correlation coefficient are 2.08 kcal/mol and 0.76, respectively.

Figure 8.6 illustrates the correlation between experimental binding free energies and the best predictions obtained by the FFT-BP. Obviously, there is a bias in the predicted binding affinities, which will be addressed in our future work. Li et al. have given a comparison of tests on the PDBBind v2007 core set (N=195) using many outstanding scoring functions [151]. In this context, we also plot the performance of our FFT-BP in terms of the Pearson correlation coefficient in Fig. 8.7. The FFT-BP correlation coefficient of 0.76 is highlighted in red.

Figure 8.7: Performance comparison between different scoring functions on the PDBBind v2007 core set (N=195). The performances of the other scoring functions are adopted from the literature [151, 9, 150, 8, 48].

8.3.3.3 Prediction on the PDBBind v2015 core set (N=195)

Finally, we perform a test on the PDBBind v2015 core set (N=195), which contains high quality experimental data. This test set is also quite challenging due to its diversity of 65 protein-ligand clusters and a wide binding affinity range. In a similar routine, we consider the FFT-BP prediction with different numbers of neighbors and top features. Table 8.7 shows the RMSEs of FFT-BP for the PDBBind v2015 core set (N=195). The top 50 features are also listed in the Supporting material. The most important features are similar to those in previous tests, which indicates that the volume change, the electrostatic binding free energy, and the van der Waals interactions are of fundamental importance to protein-ligand binding.

Table 8.7: The prediction RMSEs (kcal/mol) for the PDBBind v2015 core set (N=195) with different numbers of nearest neighbors and top features.

Number of nearest    Number of top features
neighbors            5     10    15    20    25    30    35    40    45    50
1                    1.95  1.95  1.95  1.95  1.95  1.95  1.95  1.95  1.95  1.95
2                    1.95  1.94  1.95  1.95  1.95  1.95  1.95  1.95  1.96  1.96
3                    1.94  1.94  1.95  1.95  1.95  1.95  1.95  1.95  1.95  1.95
4                    1.94  1.94  1.93  1.93  1.94  1.94  1.94  1.95  1.95  1.95
5                    1.95  1.95  1.92  1.94  1.95  1.95  1.96  1.95  1.95  1.95
6                    1.95  1.95  1.95  1.95  1.96  1.96  1.95  1.96  1.95  1.94
7                    1.95  1.93  1.93  1.94  1.95  1.97  1.95  1.95  1.95  1.95
8                    1.95  1.95  1.95  1.96  1.96  1.97  1.97  1.96  1.95  1.95
9                    1.94  1.94  1.94  1.94  1.94  1.95  1.95  1.94  1.94  1.94
10                   1.95  1.95  1.95  1.95  1.95  1.94  1.94  1.94  1.94  1.94

Figure 8.8: The correlation between experimental binding free energies and FFT-BP predictions on the PDBBind v2015 core set (N=195). The RMSE and Pearson correlation coefficient are 1.92 kcal/mol and 0.78, respectively.

It is worth noting that the RMSEs of the FFT-BP predictions are lower than those from the earlier test sets. A possible reason is that this data set is consistent with the training set, as both are obtained from the PDBBind v2015 refined set. Additionally, better data quality might also contribute to our better predictions. Our optimal prediction has an RMSE of 1.92 kcal/mol and a Pearson correlation coefficient of 0.78, when 5 nearest neighbors and 15 features are used for the prediction. Figure 8.8 plots the correlation between experimental binding free energies and FFT-BP predictions on the PDBBind v2015 core set (N=195). Compared to the earlier two blind predictions, the prediction on this set is more accurate. However, similar to the behavior on the two other test sets, the present prediction is biased. This issue will be studied in our future work.

8.4 Concluding remarks

In this work, we propose a new scoring function, feature functional theory - binding predictor (FFT-BP). FFT-BP is constructed based on three fundamental assumptions, namely, the representability, feature-function relationship, and similarity assumptions. A validation set of 1322 complexes, a training set of 3589 complexes, and three test sets with 100, 195, and 195 complexes are considered in the present work to validate the proposed method, explore its utility, demonstrate its performance, and reveal its deficiency. Extensive numerical experiments indicate that FFT-BP delivers some of the most accurate blind predictions in the field, with a root-mean-square error around 2 kcal/mol and a Pearson correlation coefficient around 0.76.

A major advantage of FFT-BP is that it extracts microscopic features from conventional implicit solvent models, so that the validity of these physical models for binding analysis and prediction can be systematically examined. Consequently, the proposed FFT-BP can be improved by improving our understanding of physical models. Another advantage of FFT-BP is that it provides a framework to systematically incorporate and continuously absorb advanced machine learning algorithms to improve its predictive power. A further advantage of FFT-BP is that it becomes more and more accurate as the existing binding database becomes larger and larger.

This work is our first attempt at exploring the mathematical modeling of the protein-ligand binding affinity. Our model can be further improved in several aspects. First, we have employed a very crude force field parametrization of the Poisson model. More accurate Poisson-Boltzmann (PB) modeling, such as a polarizable PB model, and feature extraction from more accurate quantum mechanics/molecular mechanics (QM/MM) models will improve the present FFT-BP. Additionally, we employ the MART algorithm for the ranking of molecules. More sophisticated machine learning algorithms, such as deep learning, can potentially improve FFT-BP prediction and eliminate the current prediction bias in test sets. Finally, a deficiency of the current model is that it neglects the metal effect on protein-ligand binding affinity. The incorporation of this effect into our model is under investigation.

Chapter 9

Dissertation Contribution

In this chapter, we summarize the contributions of this dissertation.

In Chapter 3, we proposed a novel parametrization method for the differential geometry based implicit solvent model; this work was published in [248]. The original solvation model was proposed by Wei [270] and implemented by Chen et al. [43]. In the work [43], parameter selection was considered. Nevertheless, that parametrization cannot give optimal predictions for polar molecules. I proposed a systematic parametrization scheme which incorporates PDE analysis and convex optimization techniques, and I implemented the parametrization scheme in their framework.

In Chapter 4, we presented an Eulerian solvent excluded surface. This is a collaborative work with Liu et al. [157]. In this work, I focused on the surface validation.

In Chapter 5, we proposed an electrostatic potential interpolation scheme and implemented the Eulerian solvent excluded surface in the MIBPB Poisson-Boltzmann software, to which many people have contributed [39]. The software is shown to be of second order convergence in the numerical solution of the Poisson-Boltzmann model. However, this software has some limitations in terms of robustness and highly accurate reaction field energy calculation. My work makes this software more robust and provides a reaction field energy calculation that is almost independent of grid spacing.

In Chapter 6, we developed a hybrid physical and knowledge based solvation prediction paradigm. This work can also be found in [255]. One major part is the polarizable Poisson model, in which I coupled the Poisson dielectric model with the Kohn-Sham density functional theory. This coupling has been investigated in several works [262, 42]; however, the solvent-solute interface conditions had not been explicitly implemented. I incorporated the interface of the SIESTA software used by Chen et al. [45] into the MIBPB software, and developed several schemes for the communication between these two software packages. This coupling can be regarded as an interface method based polarizable continuum model. Another contribution of this chapter is the development of a framework for solvation free energy prediction. Furthermore, we introduced the distributed multipole analysis [227] for characterizing the solute molecules.

In Chapter 7, we delivered a protocol that couples the learning to rank method and the implicit solvent model for solvation prediction. In contrast to some classical multiscale methods, where the microscopic models provide a parametrization for macroscopic models, the new coupling provides an implicit parametrization strategy, which is demonstrated by treating the atomic reaction field energy through the big data approach. A large number of numerical results show state-of-the-art accuracy and robustness for blind solvation free energy prediction. This work can also be found in [249].

In Chapter 8, we studied the protein-ligand docking problem. More specifically, we proposed a novel protein-ligand binding scoring function. This scoring function can be regarded as an extension of our learning to rank based solvation model to the protein-ligand binding scenario. The tests on several benchmark test sets verify the accuracy of the proposed scoring function. This work can also be found in [254].

BIBLIOGRAPHY

[1] CERIUS2 user manual; Accelrys, Inc.: San Diego, CA. pages 3-48, 2000.

[2] Shivani Agarwal, Deepak Dugar, and Shiladitya Sengupta. Ranking chemical str
ucturefordrugdiscovery:Anewmachinelearningapproach.JournalofChemicalInformationandModel,50:716{731,2010.[3]LiAnbang.Performancecomparisonofpoissonboltzmannequationsolversdelphiandpbsaincalculationofelectrostaticsolvationenergies.JournalofTheoreticalandCom-putationalChemistry,13(13):1450040,2014.[4]HossamM.AshtawyandNiharR.Mahapatra.Acomparativeassessmentofrankingaccuraciesofconventionalandmachine-learning-basedscoringfunctionsforprotein-ligandbindingyprediction.IEEE/ACMTransactionsoncomputationalbiologyandbioinformatics,9(5):1301{1313,2012.[5]FranzAurenhammer.Voronoidiagramsasurveyofafundamentalgeometricdatastructure.ACMComputingSurveys,23(3):345{405,1991.[6]N.A.Baker.Improvingimplicitsolventsimulations:aPoisson-centricview.CurrentOpinioninStructuralBiology,15(2):137{43,2005.[7]NathanA.Baker,DavidSept,MichaelJ.Holst,andJ.AndrewMccammon.Theadap-tivemultilevelelementsolutionofthePoisson-Boltzmannequationonmassivelyparallelcomputers.IBMJournalofResearchandDevelopment,45(3-4):427{438,2001.[8]PedroJ.Ballester.Machinelearningscoringfunctionsbasedonrandomforestandsupportvectorregression.Proceedingsofthe7thIAPRinternationalconferenceonPatternRecognitioninBioinformatics,pages14{25,2012.[9]PedroJ.BallesterandJohnB.O.Mitchell.Amachinelearningapproachtopredictingproteinligandbindingywithapplicationstomoleculardocking.Bioinformatics,26(9):1169{1175,2010.[10]PedroJ.Ballester,AdrianSchreyer,andL.BlundellTom.Doesamoreprecisechemicaldescriptionofprotein-ligandcomplexesleadtomoreaccuratepredictionofbindingy?JournalofChemicalInformationandModel,54:944{955,2014.253[11]D.BashfordandD.A.Case.GeneralizedBornmodelsofmacromolecularsolvationAnnualReviewofPhysicalChemistry,51:129{152,2000.[12]P.W.Bates,Z.Chen,Y.H.Sun,G.W.Wei,andS.Zhao.Geometricandpotentialdrivingformationandevolutionofbiomolecularsurfaces.J.Math.Biol.,59:193{231,2009.[13]P.W.Bates,G.W.Wei,andS.Zhao.Theminimalmolecularsurface.arXiv:q-bio/0610038v1,[q-bio.BM],2006.[14]P.W.Bates,G.W.Wei,andS.Zhao.Theminimalmolecularsurface.Midwe
stQuan-titativeBiologyConference,MissionPointResort,MackinacIsland,MI:September29{October1,2006.[15]P.W.Bates,G.W.Wei,andShanZhao.Minimalmolecularsurfacesandtheirappli-cations.JournalofComputationalChemistry,29(3):380{91,2008.[16]B.Baum,L.Muley,M.Smolinski,A.Heine,D.Hangauer,andG.Klebe.Non-additivityoffunctionalgroupcontributionsinprotein-ligandbinding:acompre-hensivestudybycrystallographyandisothermaltitrationcalorimetry.J.Mol.Bio,397(4):1042{1054,2010.[17]J.T.BealeandA.T.Layton.Ontheaccuracyofmethodsforellipticproblemswithinterfaces.Comm.Appl.Math.Comp.Sci.,,1:91{119,2006.[18]AxelD.Becke.Density-functionalthermochemistry.iii.theroleofexactexchange.JournalofChemicalPhysics,98(7):5648,1993.[19]D.BeglovandB.Roux.Solvationofcomplexmoleculesinapolarliquid:anintegralequationtheory.JournalofChemicalPhysics,104(21):8678{8689,1996.[20]C.A.S.Bergstrom,M.StraL.Lazorova,A.Avdeef,K.Luthman,andP.Ar-tursson.Absorptionoforaldrugsbasedonmolecularsurfaceproperties.JournalofMedicinalChemistry,46(4):558{570,2003.[21]J.Blinn.Ageneralizationofalgebraicsurfacedrawing.ACMTransactionsonGraph-ics,1(3):235{256,1982.[22]JoelR.BockandDavidA.Gough.Anewmethodtoestimateligand-receptorener-getics.MolecularandCellularProteomics,1(11):904{910,2002.254[23]H.J.Bohm.Thedevelopmentofasimpleempiricalscoringfunctiontoestimatethebindingconstantforaprotein-ligandcomplexofknownthree-dimensionalstructure.JournalofComputer-AidedMolecularDesign,8:234{256,1994.[24]A.H.BoschitschandM.O.Fenley.AFastandRobustPoisson-BoltzmannSolverBasedonAdaptiveCartesianGrids.JournalofChemicalTheoryandComputation,7:1524{1540,2011.[25]C.J.C.Burges,R.Ragno,andQuocVietLe.Learningtorankwithnonsmoothcostfunctions.AdvancesinNeuralInformationProcessingSystems,19:193{200,2007.[26]ChristopherJ.C.Burges.FromRankNettoLambdaRanktoLambdaMART:Anoverview.MicrosoftResearchTechnicalReport,82,2010.[27]C.J.C.Burges,T.Shaked,E.Renshaw,A.Lazier,M.Deeds,N.Hamilton,andG.Hul-lender.Learningtorankusinggradientdescent.InProc.ofICML,pages89{96,2005.[28]B.D
.Bursulaya,M.Totrov,R.Abagyan,andC.L.Brooks.Comparativestudyofseveralalgorithmsforliganddocking.JournalofComputer-AidedMolecularDesign,17:755{763,2003.[29]H.Butt,L.Graf,andM.Kappl.PhysicsandChemistryofInterfaces.Weinheim,Germany:Wiley-VCH.,2006.[30]S.Cabani,P.Gianni,VMollica,andLLepori.GroupContributionstotheThermo-dynamicPropertiesofNon-IonicOrganicSolutesinDiluteAqueousSolution.JournalofSolutionChemistry,10(8):563{595,1981.[31]SergioCabani,PaoloGianni,VincenzoMollica,andLucianoLepori.Groupcontribu-tionstothethermodynamicpropertiesofnon-ionicorganicsolutesindiluteaqueoussolution.JournalofSolutionChemistry,10:563{595,1981.[32]MichaelL.Cannolly.Solvent-accessiblesurfacesofproteinsandnucleicacids.Science,221(4612):709{713,1983.[33]YCaoandLLi.Improvedprotein-ligandbindingaypredictionbyusingacurvature-dependentsurface-areamodel.Bioinformatics,30(12):1674{1680,2014.[34]Z.Cao,T.Qin,T.Y.Liu,M.F.Tsai,andF.Li.Learningtorank:Frompairwiseapproachtolistwiseapproach.ICML,2007.255[35]D.A.Case,J.T.Berryman,R.M.Betz,D.S.Cerutti,T.E.CheathamIII,T.A.Darden,R.E.Duke,T.J.Giese,H.Gohlke,A.W.Goetz,N.Homeyer,S.Izadi,P.Janowski,J.Kaus,A.Kovalenko,T.S.Lee,S.LeGrand,P.Li,T.Luchko,R.Luo,B.Madej,K.M.Merz,G.Monard,P.Needham,H.Nguyen,H.T.Nguyen,I.Omelyan,A.Onufriev,D.R.Roe,A.Roitberg,R.Salomon-Ferrer,C.L.Simmerling,W.Smith,J.Swails,R.C.Walker,J.Wang,R.M.Wolf,X.Wu,D.M.York,andP.A.Kollman.Amber2015.UniversityofCalifornia,SanFrancisco,2015.[36]D.S.Cerutti,N.A.Baker,andJ.A.McCammon.Solventreactionpotentialinsideanunchargedglobularprotein:Abridgebetweenimplicitandexplicitsolventmodels?TheJournalofChemicalPhysics,127(15):155101,2007.[37]ArghyaChakravorty,LinLi,andEmilAlexov.Sensitivitytestofpoisson-boltzmannmodelingofelectrostaticcomponentofthebindingfreeenergy:ofenergymini-mization,forceeldparametersandmodelingprotocols.Preprint.[38]D.Chen,G.W.Wei,X.Cong,andG.Wang.Computationalmethodsforopticalmolecularimaging.CommunicationsinNumericalMethodsinEngineering,25:1137{1161,2009.[39]DuanChen,ZhanChen,Chang
junChen,W.H.Geng,andG.W.Wei.MIBPB:Asoftwarepackageforelectrostaticanalysis.J.Comput.Chem.,32:657{670,2011.[40]DuanChen,ZhanChen,andG.W.Wei.QuantumdynamicsincontinuumforprotontransportII:Variationalsolvent-soluteinterface.InternationalJournalforNumericalMethodsinBiomedicalEngineering,28:25{51,2012.[41]DuanChenandG.W.Wei.Quantumdynamicsincontinuumforprotontransport|Generalizedcorrelation.JChem.Phys.,136:134109,2012.[42]J.L.Chen,LouisNoodleman,D.A.Case,andD.Bashford.Incorporatingsolva-tionintodensity-functionalelectronic-structurecalculations.J.Phys.Chem.,98:11059{11068,1994.[43]Z.Chen,N.A.Baker,andG.W.Wei.tialgeometrybasedsolvationmodelsI:Eulerianformulation.J.Comput.Phys.,229:8231{8258,2010.[44]Z.Chen,N.A.Baker,andG.W.Wei.tialgeometrybasedsolvationmodelsII:Lagrangianformulation.J.Math.Biol.,63:1139{1200,2011.[45]Z.ChenandG.W.Wei.tialgeometrybasedsolvationmodelsIII:Quantumformulation.J.Chem.Phys.,135:194108,2011.256[46]Z.Chen,ShanZhao,J.Chun,D.G.Thomas,N.A.Baker,P.B.Bates,andG.W.Wei.Variationalapproachfornonpolarsolvationanalysis.JournalofChemicalPhysics,137(084101),2012.[47]L.T.Cheng,JoachimDzubiella,AndrewJ.McCammon,andB.Li.Applicationofthelevel-setmethodtotheimplicitsolvationofnonpolarmolecules.JournalofChemicalPhysics,127(8),2007.[48]T.Cheng,X.Li,Y.Li,Z.Liu,andR.Wang.Comparativeassesmentofscoringfunctionsonadiversetestset.J.Chem.Inf.Model.,49:1079{1093,2009.[49]I.L.Chern,J.-G.Liu,andW.-C.Weng.Accurateevaluationofelectrostaticsformacromoleculesinsolution.MethodsandApplicationsofAnalysis,10(2):309{28,2003.[50]NiharenduChoudhuryandBMontgomeryPettitt.Onthemechanismofhydropho-bicassociationofnanoscopicsolutes.JournaloftheAmericanChemicalSociety,127(10):3556{3567,2005.[51]M.L.Connolly.Analyticalmolecularsurfacecalculation.JournalofAppliedCrystal-lography,16(5):548{558,1983.[52]M.L.Connolly.Depthalgorithmsformolecularmodeling.J.Mol.Graphics,3:19{24,1985.[53]R.B.CoreyandL.Pauling.Molecularmodelsofaminoacids,peptidesandproteins.Rev.Sci.Instr.,24:621{627,1953.[54]MaurizioCo
ssi,NadiaRega,GiovanniScalmani,andVincenzoBarone.Energies,structures,andelectronicpropertiesofmoleculesinsolutionwiththec-pcmsolvationmodel.JournalofComputationalChemistry,24(6):669{681,2003.[55]C.J.Cramer.EssentialsofComputationalChemistry:TheoriesandModels.JohnWileyandSons,2013.[56]C.J.CramerandD.G.Truhlar.Implicitsolvationmodels:equilibria,structure,spectra,anddynamics.ChemicalReviews,99(8):2161{2200,1999.[57]ChristopherJ.CramerandDonaldG.Truhlar.Auniversalapproachtosolvationmodeling.Accountsofchemicalresearch,41:760{768,2008.257[58]PeterB.CrowleyandAdelGolovin.Cation-piinteractionsinprotein-proteininterfaces.Proteins:Structure,Function,andBioinformatics,59(2):231{239,2005.[59]M.Daily,J.Chun,A.Heredia-Langner,G.W.Wei,andN.A.Baker.Originofparameterdegeneracyandmolecularshaperelationshipsinwcalculationsofsolvationfreeenergies.JournalofChemicalPhysics,,139:204108,2013.[60]R.Daudel.Quantumtheoryofchemicalreactivity.InQuantumTheoryofChemicalReactivity,1973.[61]L.David,R.Luo,andM.K.Gilson.ComparisonofgeneralizedBornandPoissonmod-els:EnergeticsanddynamicsofHIVprotease.JournalofComputationalChemistry,21(4):295{309,2000.[62]M.E.DavisandJ.A.McCammon.Electrostaticsinbiomolecularstructureanddy-namics.ChemicalReviews,94:509{21,1990.[63]R.L.DesJarlais,R.P.Sheridan,J.S.Dixon,I.D.Kuntz,andR.Venkataraghavan.Dockingligandstomacromolecularreceptorsbymolecularshape.J.Med.Chem.,29:2149{2153,1986.[64]B.N.DominyandC.L.Brooks,III.DevelopmentofageneralizedBornmodelparameterizationforproteinsandnucleicacids.JournalofPhysicalChemistryB,103(18):3765{3773,1999.[65]F.Dong,M.Vijaykumar,andH.X.Zhou.Comparisonofcalculationandexperimentimplicatestelectrostaticcontributionstothebindingstabilityofbarnaseandbarstar.BiophysicalJournal,85(1):49{60,2003.[66]F.DongandH.X.Zhou.Electrostaticcontributiontothebindingstabilityofprotein-proteincomplexes.Proteins,65(1):87{102,2006.[67]J.P.Donley,J.G.Curro,andJ.D.McCoy.Adensityfunctionaltheoryforpaircorrelationfunctionsinmolecularliquids.TheJournalofchemicalphy
sics,101:3205,1994.[68]AnatolyI.Dragan,ChristopherM.Read,ElenaN.Makeyeva,EkaterinaI.Milgotina,MairE.Churchill,ColynCrane-Robinson,andPeterL.Privalov.DNAbindingandbendingbyHMGboxes:Energeticdeterminantsofspy.JournalofMolecularBiology,343(2):371{393,2004.258[69]J.DzubiellaandJ.-P.Hansen.Competitionofhydrophobicandcoulombicinterac-tionsbetweennanosizedsolutes.TheJournalofChemicalPhysics,121(11):5514{5530,September2004.[70]J.Dzubiella,J.M.J.Swanson,andJ.A.McCammon.Couplinghydrophobicity,dispersion,andelectrostaticsincontinuumsolventmodels.PhysicalReviewLetters,96:087802,2006.[71]B.MennucciE.CancesandJ.Tomasi.Anewintegralequationformalismforthepolarizablecontinuummodel:Theoreticalbackgroundandapplicationstoisotropicdielectrics.JornalofChemicalPhysics,107:3032,1997.[72]H.Edelsbrunner,D.Letscher,andA.Zomorodian.Topologicalpersistenceandsim-DiscreteComput.Geom.,28:511{533,2002.[73]HerbertEdelsbrunnerandJohnHarer.Computationaltopology:anintroduction.AmericanMathematicalSoc.,2010.[74]M.D.Eldridge,C.W.Murray,T.R.Auton,G.V.Paolini,andR.P.Mee.Empiricalscoringfunctions:I.thedevelopmentofafastempiricalscoringfunctiontoestimatethebindingyofligandsinreceptorcomplexes.J.Comput.Aided.Mol.Des,11:425{445,1997.[75]M.Feig,W.Im,andC.L.BrooksIII.ImplicitsolvationbasedongeneralizedBorntheoryintdielectricenvironments.JournalofChemicalPhysics,120(2):903{911,2004.[76]F.Fogolari,A.Brigo,andH.Molinari.ThePoisson-Boltzmannequationforbiomolec-ularelectrostatics:atoolforstructuralbiology.JournalofMolecularRecognition,15(6):377{92,2002.[77]B.Fornberg.Calculationofweightsinteformulas.SIAMRev,40:685{691,1998.[78]JeromeHFriedman.Greedyfunctionapproximation:agradientboostingmachine.Annalsofstatistics,pages1189{1232,2001.[79]R.A.Friesner,J.L.Banks,R.B.Murphy,T.A.Halgren,J.J.Klicic,D.T.Mainz,M.P.Repasky,E.H.Knoll,MShelley,J.K.PerryJK,D.E.Shaw,P.Francis,andP.S.Shenkin.Glide:anewapproachforrapid,accuratedockingandscoring.1.methodandassessmentofdockingaccuracy.J.Med.Chem.,47:1739,2004.259[80]M.J.Frisch,G.W.Tr
ucks,H.B.Schlegel,G.E.Scuseria,M.A.Robb,J.R.Cheese-man,G.Scalmani,V.Barone,B.Mennucci,G.A.Petersson,H.Nakatsuji,M.Cari-cato,X.Li,H.P.Hratchian,A.F.Izmaylov,J.Bloino,G.Zheng,J.L.Sonnenberg,M.Hada,M.Ehara,K.Toyota,R.Fukuda,J.Hasegawa,M.Ishida,T.Nakajima,Y.Honda,O.Kitao,H.Nakai,T.Vreven,J.A.Montgomery,Jr.,J.E.Peralta,F.Ogliaro,M.Bearpark,J.J.Heyd,E.Brothers,K.N.Kudin,V.N.Staroverov,R.Kobayashi,J.Normand,K.Raghavachari,A.Rendell,J.C.Burant,S.S.Iyen-gar,J.Tomasi,M.Cossi,N.Rega,J.M.Millam,M.Klene,J.E.Knox,J.B.Cross,V.Bakken,C.Adamo,J.Jaramillo,R.Gomperts,R.E.Stratmann,O.Yazyev,A.J.Austin,R.Cammi,C.Pomelli,J.W.Ochterski,R.L.Martin,K.Morokuma,V.G.Zakrzewski,G.A.Voth,P.Salvador,J.J.Dannenberg,S.Dapprich,A.D.Daniels,.Farkas,J.B.Foresman,J.V.Ortiz,J.Cioslowski,andD.J.Fox.Gaussian09RevisionE.01.GaussianInc.WallingfordCT2009.[81]J.Fu,Y.Liu,andJ.Wu.Fastpredictionofhydrationfreeenergiesforsampl4blindtestfromaclassicaldensityfunctionaltheory.TheJournalofComputer-AidedMolecularDesign,28:299{304,2014.[82]E.Gallicchio,M.M.Kubo,andR.M.Levy.Enthalpy-entropyandcavitydecomposi-tionofalkanehydrationfreeenergies:Numericalresultsandimplicationsfortheoriesofhydrophobicsolvation.JournalofPhysicalChemistryB,104(26):6271{6285,2000.[83]E.GallicchioandR.M.Levy.AGBNP:Ananalyticimplicitsolventmodelsuitableformoleculardynamicssimulationsandhigh-resolutionmodeling.JournalofCompu-tationalChemistry,25(4):479{499,2004.[84]E.Gallicchio,L.Y.Zhang,andR.M.Levy.TheSGB/NPhydrationfreeenergymodelbasedonthesurfacegeneralizedBornsolventreactionandnovelnonpolarhydrationfreeenergyestimators.JournalofComputationalChemistry,23(5):517{29,2002.[85]JGasteigerandMMarsili.Iterativepartialequalizationoforbitalelectronegativityarapidaccesstoatomiccharges.Tetrahedron,36:3219{3228,1980.[86]M.T.Geballe,A.G.Skillman,A.Nicholls,J.P.Guthrie,andJ.P.Taylor.Thesampl2blindpredictionchallenge:introductionandoverview.JournalofComputer-AidedMolecularDesign,24:259{279,2010.[87]MatthewT.GeballeandJ.P.Guthrie.TheSAMPL3blindpredictionchalle
nge:transferenergyoverview.JournalofComputer-AidedMolecularDesign,26:489{496,2012.260[88]DKGehlhaar,GMVerkhivker,PARejto,CJSherman,DBFogel,LJFogel,andSTFreer.MolecularrecognitionoftheinhibitorAG-1343byHIV-1protease:confor-mationallydockingbyevolutionaryprogramming.ChemBiol.,2(5):317{324,1995.[89]WeihuaGengandG.W.Wei.Multiscalemoleculardynamicsusingthematchedinterfaceandboundarymethod.Journalofcomputationalphysics,230(2):435{457,2011.[90]WeihuaGeng,SiningYu,andG.W.Wei.Treatmentofchargesingularitiesinimplicitsolventmodels.JournalofChemicalPhysics,127:114106,2007.[91]G.M.Giambasu,T.Luchko,D.Herschlag,D.M.York,andD.A.Case.Ioncountingfromexplicit-solventsimulationsand3d-rism.BiophysicalJournal,106:883{894,2014.[92]M.K.Gilson,M.E.Davis,B.A.Luty,andJ.A.McCammon.Computationofelec-trostaticforcesonsolvatedmoleculesusingthePoisson-Boltzmannequation.JournalofPhysicalChemistry,97(14):3591{3600,1993.[93]M.K.GilsonandHuanXiangZhou.Calculationofprotein-ligandbindingAnnualReviewofBiophysicsandBiomolecularStructur,36:21{42,2007.[94]MichaelK.Gilson,KimA.Sharp,andBarryH.Honig.Calculatingtheelectrostaticpotentialofmoleculesinsolution:Methodanderrorassessment.JornalofComputa-tionalChemistry,9:327{335,1988.[95]H.GohlkeandD.A.Case.Convergingfreeenergyestimates:MM-PB(GB)SAstud-iesontheprotein-proteincomplexras-raf.JournalofComputationalChemistry,25(2):238{250,2004.[96]HGohlke,MHendlich,andGKlebe.Knowledge-basedscoringfunctiontopredictprotein-ligandinteractions.JMolBiol.,295(2):337{356,2000.[97]D.S.GoodsellandA.J.Olson.Automateddockingofsubstratestoproteinsbysimulatedannealing.ProteinStruct.Funct.Genet.,8:195{202,1990.[98]J.A.GrantandB.T.Pickup.Agaussiandescriptionofmolecularshape.JournalofPhysicalChemistry,99:3503{3510,1995.[99]J.A.Grant,B.T.Pickup,M.T.Sykes,C.A.Kitchen,andA.Nicholls.TheGaussianGeneralizedBornmodel:applicationtosmallmolecules.PhysicalChemistryChemicalPhysics,9:4913{22,2007.261[100]J.AndrewGrant,BarryT.Pickup,andAnthonyNicholls.Asmoothpermittivityfunc-tionforPoisson-Boltzma
nnsolvationmethods.JournalofComputationalChemistry,22(6):608{640,2001.[101]J.A.Grant,B.Pickup,andA.Nicholls.Asmoothpermittivityfunctionforpoisson-boltzmannsolvationmethods.JComputChem,22:608,2001.[102]MichaelGrantandStephenBoyd.Graphimplementationsfornonsmoothconvexprograms.InV.Blondel,S.Boyd,andH.Kimura,editors,RecentAdvancesinLearn-ingandControl,LectureNotesinControlandInformationSciences,pages95{110.Springer-VerlagLimited,2008.http://stanford.edu/boyd/graph-dcp.html.[103]MichaelGrantandStephenBoyd.CVX:Matlabsoftwarefordisciplinedconvexpro-gramming,version2.1.http://cvxr.com/cvx,March2014.[104]PauletteA.Greenidge,ChristianKramer,Jean-ChristopheMozziconacci,andRo-mainM.Wolf.MM/GBSAbindingenergypredictiononthePDBBinddataset:Successes,failures,anddirectionsforfurtherimprovement.JournalofChemicalIn-formationandModel,53:201{209,2013.[105]J.PeterGuthrie.Ablindchallengeforcomputationalsolvationfreeenergies:Intro-ductionandoverview.JournalofPhysicalChemistryB,113:4501{4507,2009.[106]J.PeterGuthrie.Sampl4,ablindchallengeforcomputationalsolvationfreeenergies:thecompoundsconsidered.J.ComputAidedMolDes,28:151{168,2014.[107]RobertC.Harris,AleanderH.Boschitsch,andMarciaO.Fenley.ofgridspacinginPoisson-Boltzmannequationbindingenergyestimation.JournalofChemicalTheoryandComputation,9:3677{3685,2013.[108]TrevorHastie,RobertTibshirani,andJeromeFriedman.Theelementsofstatisti-callearning:Datamining,inference,andprediction.InTheElementsofStatisticalLearning:DataMining,Inference,andPrediction,SecondEdition.Springer,2009.[109]KennethHaugland.Polynomialequationsolver.http://www.codeproject.com/Articles/552678/Polynomial-Equation-Solver,2013.[110]F.Hirataandetal.Moleculartheoryofsolvation.InF.Hirata,editor,MolecularTheoryofSolvation.Springer,2003.262[111]ChristianHolm,PatrickKekicandRudolfPodgornik.Electrostaticctsinsoftmatterandbiophysics;NATOScienceSeries.KluwerAcademicPublishers,Boston,2001.[112]B.HonigandA.Nicholls.Classicalelectrostaticsinbiologyandchemistry.Science,268(5214):1144{9,19
95.[113]C.S.HsuandD.Chandler.Rismcalculationofthestructureofliquidacetonitrile.MolecularPhysics,36(1):215{224,1978.[114]DavidMHuangandDavidChandler.Temperatureandlengthscaledependenceofhydrophobicandtheirpossibleimplicationsforproteinfolding.ProceedingsoftheNationalAcademyofSciences,97(15):8324{8327,2000.[115]H.HuangandZ.L.Li.Convergenceanalysisoftheimmersedinterfacemethod.IMAJournalofNumericalAnalysis,19(4):583{608,1999.[116]S.Y.HuangandX.Zou.Aniterativeknowledge-basedscoringfunctiontopredictprotein-ligandinteractions:I.derivationofinteractionpotentials.J.Comput.Chem.,27:1865{1875,2006.[117]Sheng-YouHuangandXiaoqinZou.Inclusionofsolvationandentropyintheknowledge-basedscoringfunctionforproteinligandinteractions.J.Chem.Inf.Model.,50(2):262{273,2010.[118]W.Humphrey,A.Dalke,andK.Schulten.VMD{visualmoleculardynamics.JournalofMolecularGraphics,14(1):33{38,1996.[119]R.Ishizuka,S.H.Chong,andF.Hirata.AnintegralequationtheoryforinhomogeneousmolecularThereferenceinteractionsitemodelapproach.JournalofChemicalPhysics,128(3):34504{34504,2008.[120]RichardM.JacksonandMichaelJ.Sternberg.Acontinuummodelforprotein-proteininteractions:Applicationtothedockingproblem.JournalofMolecularBiology,250(2):258{275,1995.[121]BenedettaMennucciJacopoTomasiandRobertoCammi.Quantummechanicalcon-tunuumsolvationmodels.Chem.Rev.,105:2999{3093,2005.[122]A.Jakalian,D.B.Jack,andC.I.Bayly.Fast,tgenerationofhigh-qualityatomiccharges.AM1-BCCmodel:II.parameterizationandvalidation.JournalofComputationalChemistry,23(16):1623{1641,2002.263[123]ArazJakalian,BruceL.Bush,DavidB.Jack,andChristopherI.Bayly.Fast,tgenerationofhigh-qualityatomiccharges.am1-bccmodel:I.method.JournalofComputationalChemistry,21(2):132{146,2000.[124]B.Jayaram,D.Sprous,andD.L.Beveridge.Solvationfreeenergyofbiomacro-molecules:ParametersforamogeneralizedBornmodelconsistentwiththeAM-BERforceJournalofPhysicalChemistryB,102(47):9571{9576,1998.[125]F.Jensen.Introductiontocomputationalchemistry.JohnWileyandSons,2007.[126]R.JinnouchiandA.B.Anderson.
Electronic structure calculations of liquid-solid interfaces: Combination of density functional theory and modified Poisson-Boltzmann theory. Physical Review B, 77:245417, 2008.
[127] S. Jo, T. Kim, V. G. Iyer, and W. Im. CHARMM-GUI: a web-based graphical user interface for CHARMM. J. Comput. Chem., 29(11):1859–1865, 2008.
[128] T. Joachims. Optimizing search engines using clickthrough data. Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD), 2002.
[129] T. Joachims. Training linear SVMs in linear time. Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD), 2006.
[130] Gareth Jones, Peter Willett, Robert C. Glen, Andrew R. Leach, and Robin Taylor. Development and validation of a genetic algorithm for flexible docking. Journal of Molecular Biology, 267(3):727–748, 1997.
[131] W. L. Jorgensen. Rusting of the lock and key model for protein-ligand binding. Science, 254:954–955, 1991.
[132] William L. Jorgensen and Julian Tirado-Rives. The OPLS [optimized potentials for liquid simulations] potential functions for proteins, energy minimizations for crystals of cyclic peptides and crambin. J. Am. Chem. Soc., 110(6):1657–1666, 1988.
[133] T. Kaczynski, K. Mischaikow, and M. Mrozek. Computational homology. Springer-Verlag, 2004.
[134] Charles W. Kehoe, Christopher J. Fennell, and Ken A. Dill. Testing the semi-explicit assembly solvation model in the SAMPL3 community blind test. J. Comput. Aided Mol. Des., 26:563–568, 2012.
[135] Sarah L. Kinnings, Nina Liu, Peter J. Tonge, Richard M. Jackson, Lei Xie, and Philip E. Bourne. A machine learning based method to improve docking scoring functions and its application to drug repurposing. Journal of Chemical Information and Modeling, 51(2):408–419, 2011.
[136] J. G. Kirkwood. Theory of solution of molecules containing widely separated charges with special application to zwitterions. J. Chem. Phys., 2:351–361, 1934.
[137] Pavel V. Klimovich and David L. Mobley. Predicting hydration free energies using all-atom molecular dynamics simulations and multiple starting conformations. J. Comput. Aided Mol. Des., 24:307–316, 2010.
[138] P. Koehl. Electrostatics calculations: latest methodological advances. Current Opinion in Structural Biology, 16(2):142–51, 2006.
[139] P. A. Kollman, I. Massova, C. Reyes, B. Kuhn, S. Huo, L. Chong, M. Lee, T. Lee, Y. Duan, W. Wang, O. Donini, P. Cieplak, J. Srinivasan, D. A. Case, and T. E. Cheatham III. Calculating structures and free energies of complex molecules: combining molecular mechanics and continuum models. Accounts of Chemical Research, 33(12):889–97, 2000.
[140] M. M. Kreevoy and D. G. Truhlar. In Investigation of Rates and Mechanisms of Reactions, Part I. In C. F. Bernasconi, editor, Investigation of Rates and Mechanisms of Reactions, Part I, page 13. Wiley: New York, 1986.
[141] M. Krone, K. Bidmon, and T. Ertl. Interactive visualization of molecular surface dynamics. IEEE Transactions on Visualization and Computer Graphics, 15(6):1391–1398, Nov 2009.
[142] Leslie A. Kuhn, Michael A. Siani, Michael E. Pique, Cindy L. Fisher, Elizabeth D. Getzoff, and John A. Tainer. The interdependence of protein surface topography and bound water molecules revealed by surface accessibility and fractal density measures. Journal of Molecular Biology, 228(1):13–22, 1992.
[143] I. D. Kuntz, J. M. Blaney, S. J. Oatley, R. Langridge, and T. E. Ferrin. A geometric approach to macromolecule-ligand interactions. J. Mol. Biol., 161:269–288, 1982.
[144] T.-M. Kuo, C.-P. Lee, and C.-J. Lin. Large-scale kernel RankSVM. SIAM International Conference on Data Mining, 2014.
[145] G. Lamm. The Poisson-Boltzmann equation. In K. B. Lipkowitz, R. Larter, and T. R. Cundari, editors, Reviews in Computational Chemistry, pages 147–366. John Wiley and Sons, Inc., Hoboken, N.J., 2003.
[146] A. R. Leach, B. K. Shoichet, and C. E. Peishoff. Prediction of protein-ligand interactions. Docking and scoring: Successes and gaps. J. Med. Chem., 49:5851–5855, 2006.
[147] B. Lee and F. M. Richards. The interpretation of protein structures: estimation of static accessibility. J. Mol. Biol., 55(3):379–400, 1971.
[148] Chengteh Lee, Weitao Yang, and Robert G. Parr. Development of the Colle-Salvetti correlation-energy formula into a functional of the electron density. Physical Review B, 37(2):785, 1988.
[149] R. J. LeVeque and Z. L. Li. The immersed interface method for elliptic equations with discontinuous coefficients and singular sources. SIAM J. Numer. Anal., 31:1019–1044, 1994.
[150] Guo-Bo Li, Ling-Ling Yang, Wen-Jing Wang, Lin-Li Li, and Sheng-Yong Yang. ID-Score: A new empirical scoring function based on a comprehensive set of descriptors related to protein ligand interactions. J. Chem. Inf. Model., 53(3):592–600, 2013.
[151] H. Li, K. S. Leung, P. J. Ballester, and M. H. Wong. iStar: A web platform for large-scale protein-ligand docking. PLoS ONE, 9(1), 2014.
[152] Hongjian Li, Kwong-Sak Leung, Man Hon Wong, and Pedro J. Ballester. Substituting random forest for multiple linear regression improves binding affinity prediction of scoring functions: Cyscore as a case study. BMC Bioinformatics, 15(291), 2014.
[153] Jiabo Li, Tianhai Zhu, Gregory D. Hawkins, Paul Winget, Daniel A. Liotard, Christopher J. Cramer, and Donald G. Truhlar. Extension of the platform of applicability of the SM5.42R universal solvation model. Theoretical Chemistry Accounts, 103, 1999.
[154] Lin Li, Chuan Li, Subhra Sarkar, Jie Zhang, Shawn Witham, Zhe Zhang, Lin Wang, Nicholas Smith, Marharyta Petukh, and Emil Alexov. DelPhi: a comprehensive suite for DelPhi software and associated resources. BMC Biophysics, 5:9:2046–1682, 2012.
[155] V. J. Licata and N. M. Allewell. Functionally linked hydration changes in Escherichia coli aspartate transcarbamylase and its catalytic subunit. Biochemistry, 36(33):10161–10167, 1997.
[156] Jung-Hsin Lin, Nathan Andrew Baker, and J. Andrew McCammon. Bridging the implicit and explicit solvent approaches for membrane electrostatics. Biophysical Journal, 83(3):1374–1379, 2002.
[157] Beibei Liu, Bao Wang, Rundong Zhao, Yiying Tong, and Guo Wei Wei. ESES: software for Eulerian solvent excluded surface. Preprint, 2015.
[158] Jie Liu and Renxiao Wang. Classification of current scoring functions. Journal of Chemical Information and Modeling, 55(3):475–482, 2015.
[159] T. T. Liu, M. X. Chen, and B. Z. Lu. Parameterization for molecular Gaussian surface and a comparison study of surface mesh generation. J. Molecular Modeling, 21(5):113, 2015.
[160] Zhihai Liu, Yan Li, Li Han, Jie Liu, Zhixiong Zhao, Wei Nie, Yuchen Liu, and Renxiao Wang. PDB-wide collection of binding data: current status of the PDBbind database. Bioinformatics, 31(3):405–412, 2015.
[161] J. R. Livingstone, R. S. Spolar, and M. T. Record Jr. Contribution to the thermodynamics of protein folding from the reduction in water-accessible nonpolar surface area. Biochemistry, 30(17):4237–44, 1991.
[162] Benzhuo Lu, Xiaolin Cheng, Jingfang Huang, and J. Andrew McCammon. AFMPB: An Adaptive Fast Multipole Poisson-Boltzmann Solver for Calculating Electrostatics in Biomolecular Systems. Comput. Phys. Commun., 184:2618–2619, 2013.
[163] K. Lum, D. Chandler, and J. D. Weeks. Hydrophobicity at small and large length scales. Journal of Physical Chemistry B, 103(22):4570–7, 1999.
[164] A. D. MacKerell Jr., D. Bashford, M. Bellot, R. L. Dunbrack Jr., J. D. Evanseck, M. J. Field, S. Fischer, J. Gao, H. Guo, S. Ha, D. Joseph-McCarthy, L. Kuchnir, K. Kuczera, F. T. K. Lau, C. Mattos, S. Michnick, T. Ngo, D. T. Nguyen, B. Prodhom, W. E. Reiher III, B. Roux, M. Schlenkrich, J. C. Smith, R. Stote, J. Straub, M. Watanabe, J. Wiorkiewicz-Kuczera, D. Yin, and M. Karplus. All-atom empirical potential for molecular modeling and dynamics studies of proteins. Journal of Physical Chemistry B, 102(18):3586–3616, 1998.
[165] Aleksandr V. Marenich, Christopher J. Cramer, and Donald G. Truhlar. Performance of SM6, SM8, and SMD on the SAMPL1 test set for the prediction of small-molecule solvation free energies. Journal of Physical Chemistry B, 113:4538–4543, 2009.
[166] A. Marquina and S. Osher. Explicit algorithms for a new time dependent model based on level set motion for nonlinear deblurring and noise removal. SIAM Journal on Scientific Computing, 22(2):387–405, 2000.
[167] Irina Massova and Peter A. Kollman. Combined molecular mechanical and continuum solvent approach (MM-PBSA/GBSA) to predict ligand binding. Perspectives in Drug Discovery and Design, 18(1):113–135, 2000.
[168] A. Mayo. The fast solution of Poisson's and the biharmonic equations on irregular regions. SIAM J. Numer. Anal., 21:285–299, 1984.
[169] Paolo Mazzatorta, Lin-Anh Tran, Benoît Schilter, and Martin Grigorov. Integration of structure-activity relationship and artificial intelligence systems to improve in silico prediction of Ames test mutagenicity. J. Chem. Inf. Model., 47(1):34–38, 2007.
[170] A. McKenney, L. Greengard, and A. Mayo. A fast Poisson solver for complex geometries. J. Comput. Phys., 118:348–355, 1995.
[171] Elaine C. Meng, Brian K. Shoichet, and Irwin D. Kuntz. Automated docking with grid-based energy evaluation. Journal of Computational Chemistry, 13:505–524, 1992.
[172] K. Mischaikow and V. Nanda. Morse theory for filtrations and efficient computation of persistent homology. Discrete and Computational Geometry, 50(2):330–353, 2013.
[173] David L. Mobley, Christopher I. Bayly, Matthew D.
Cooper, and Ken A. Dill. Predictions of hydration free energies from all-atom molecular dynamics simulations. J. Phys. Chem. B, 113:4533–4537, 2009.
[174] David L. Mobley, Ken A. Dill, and John D. Chodera. Treating entropy and conformational changes in implicit solvent simulations of small molecules. J. Phys. Chem. B, 112:938–946, 2008.
[175] David L. Mobley and J. Peter Guthrie. FreeSolv: a database of experimental and calculated hydration free energies, with input files. Journal of Computer-Aided Molecular Design, 28:711–720, 2014.
[176] David L. Mobley, Shaui Liu, David S. Cerutti, William C. Swope, and Julia E. Rice. Alchemical prediction of hydration free energies for SAMPL. J. Comput. Aided Mol. Des., 26:551–562, 2012.
[177] David L. Mobley, Karisa L. Wymer, Nathan M. Lim, and J. Peter Guthrie. Blind prediction of solvation free energies from the SAMPL4 challenge. J. Comput. Aided Mol. Des., 28:135–150, 2014.
[178] J. Mongan, C. Simmerling, J. A. McCammon, D. A. Case, and A. Onufriev. Generalized Born model with a simple, robust molecular volume correction. Journal of Chemical Theory and Computation, 3(1):159–69, 2007.
[179] G. M. Morris, D. S. Goodsell, R. S. Halliday, R. Huey, W. E. Hart, R. K. Belew, and A. J. Olson. Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy function. Journal of Computational Chemistry, 19:1639–62, 1998.
[180] I. Muegge and Y. C. Martin. A general and fast scoring function for protein-ligand interactions: a simplified potential approach. J. Med. Chem., 42(5):791–804, 1999.
[181] Vidit Nanda. Perseus: the persistent homology software. Software available at http://www.sas.upenn.edu/vnanda/perseus.
[182] R. R. Netz and H. Orland. Beyond Poisson-Boltzmann: Fluctuation effects and correlation functions. European Physical Journal E, 1(2-3):203–14, 2000.
[183] Anthony Nicholls, David L. Mobley, J. Peter Guthrie, John D. Chodera, Christopher I. Bayly, Matthew D. Cooper, and Vijay S. Pande. Predicting small-molecule solvation free energies: An informal blind test for computational chemistry. J. Med. Chem., 51:769–799, 2008.
[184] Anthony Nicholls, David L. Mobley, J. Peter Guthrie, John D. Chodera, Christopher I. Bayly, Matthew D. Cooper, and Vijay S. Pande. Predicting small-molecule solvation free energies: an informal blind test for computational chemistry. Journal of Medicinal Chemistry, 51(4):769–779, 2008.
[185] Anthony Nicholls, David L. Mobley, Peter J. Guthrie, John D. Chodera, and Vijay S. Pande. Predicting small-molecule solvation free energies: An informal blind test for computational chemistry. Journal of Medicinal Chemistry, 51(4):769–79, 2008.
[186] M. Nina, W. Im, and B. Roux. Optimized atomic radii for protein continuum electrostatics solvation forces. Biophysical Chemistry, 78(1-2):89–96, 1999.
[187] F. N. Novikov, A. A. Zeifman, O. V. Stroganov, V. S. Stroylov, V. Kulkov, and G. G. Chilov. CSAR Scoring challenge reveals the need for new concepts in estimating protein-ligand binding affinity. Journal of Chemical Information and Modeling, 51:2090–2096, 2011.
[188] A. Okur, L. Wickstrom, M. Layten, R. Geney, K. Song, V. Hornak, and C. Simmerling. Improved efficiency of replica exchange simulations through use of a hybrid explicit/implicit solvation model. Journal of Chemical Theory and Computation, 2(2):420–433, 2006.
[189] Mats H. M. Olsson, Chresten R. Sondergaard, Michal Rostkowski, and Jan H. Jensen. PROPKA3: consistent treatment of internal and surface residues in empirical pKa predictions. J. Chem. Theory Comput., 7(2):525–537, 2011.
[190] A. Onufriev, D. Bashford, and D. A. Case. Modification of the generalized Born model suitable for macromolecules. Journal of Physical Chemistry B, 104(15):3712–3720, 2000.
[191] A. Onufriev, D. A. Case, and D. Bashford. Effective Born radii in the generalized Born approximation: the importance of being perfect. Journal of Computational Chemistry, 23(14):1297–304, 2002.
[192] M. Orozco and F. J. Luque. Theoretical methods for the description of the solvent effect in biomolecular systems. Chem. Rev., 100:4187–4225, 2000.
[193] A. R. Ortiz, M. T. Pisabarro, F. Gago, and R. C. Wade. Prediction of drug binding affinities by comparative binding energy analysis. J. Med. Chem., 38:2681–2691, 1995.
[194] Stanley Osher and Ronald P. Fedkiw. Level set methods: An overview and some recent results. J. Comput. Phys., 169(2):463–502, 2001.
[195] Insook Park, Yun Hee Jang, Sungu Hwang, and Doo Soo Chung. Poisson-Boltzmann continuum solvation models for nonaqueous solvents I. 1-octanol. Chemistry Letters, 32:4, 2003.
[196] E. F. Pettersen, T. D. Goddard, C. C. Huang, G. S. Couch, D. M. Greenblatt, E. C. Meng, and T. E. Ferrin. UCSF Chimera: a visualization system for exploratory
researchandanalysis.Journalofcomputationalchemistry,25(13):1605{1612,2004.[197]R.A.Pierotti.Ascaledparticletheoryofaqueousandnonaqeoussolutions.ChemicalReviews,76(6):717{726,1976.[198]J.W.Ponder,C.J.Wu,P.Y.Ren,V.S.Pande,J.D.Chodera,M.J.Schnieders,I.Haque,D.L.Mobley,D.S.Lambrecht,R.A.DiStasio,M.Head-Gordon,G.N.I.Clark,M.E.Johnson,andT.Head-Gordon.CurrentstatusoftheamoebapolarizableforceJ.Phys.Chem.B,114:2549{2564,2010.270[199]N.V.Prabhu,P.Zhu,andK.A.Sharp.Implementationandtestingofstable,fastimplicitsolvationinmoleculardynamicsusingthesmooth-permittivityPoisson-Boltzmannmethod.JournalofComputationalChemistry,25(16):2049{2064,2004.[200]RosaRamirezandDanielBorgis.Densityfunctionaltheoryofsolvationanditsrelationtoimplicitsolventmodels.JournalofPhysicalChemistryB,109:6754{6763,2005.[201]MRarey,BKramer,TLengauer,andGKlebe.Afastdockingmethodusinganincrementalconstructionalgorithm.JMolBiol.,261(3):470{489,1996.[202]E.L.Ratkova,G.N.Chuev,V.P.Sergiievskyi,andM.V.Fedorov.Anaccuratepredictionofhydrationfreeenergiesbycombinationofmolecularintegralequationstheorywithstructuraldescriptors.J.Phys.Chem.B,114(37):12068{2079,2010.[203]EkaterinaL.Ratkova,GennadyN.Chuev,VolodynyrP.Sergiievskyi,andMaximV.Fedorov.Anaccuratepredictionofhydrationfreeenergiesbycombinationofmolecularintegralequationstheorywithstructuraldescriptors.JournalofPhysicalChemistryB,114:12068{12079,2010.[204]C.Reichardt.Solventsandsolventinorganicchemistry.InSolventsandSolventctsinOrganicChemistry.VCH:NewYork,1990.[205]JensReinischandAndreasKlamt.Predictionoffreeenergiesofhydrationwithcosmo-rsonthesampl4dataset.J.ComputAidedMolDes,28:169{173,2014.[206]F.M.Richards.Areas,volumes,packing,andproteinstructure.AnnualReviewofBiophysicsandBioengineering,6(1):151{176,1977.[207]RobertC.Rizzo,TibaAynechi,DavidA.Case,andIrwinD.Kuntz.Estimationofabsolutefreeenergiesofhydrationusingcontinuummethods:Accuracyofpartialchargemodelsandoptimizationofnonpolarcontributions.JournalofChemicalTheoryandComputation,2:128{139,2006.[208]W.Rocch
ia,E.Alexov,andB.Honig.Extendingtheapplicabilityofthenonlinearpoisson-boltzmannequation:Multipledielectricconstantsandmultivalentions.J.Phys.Chem.,105:6507{6514,2001.[209]W.Rocchia,S.Sridharan,A.Nicholls,EAlexov,AChiabrera,andB.Honig.Rapidgrid-basedconstructionofthemolecularsurfaceandtheuseofinducedsurfacechargetocalculatereactionldenergies:Applicationstothemolecularsystemsandgeomet-ricobjects.JournalofComputationalChemistry,23:128{137,2002.271[210]MichalRostkowski,MatsHMOlsson,ChrestenRSondergaard,andJanHJensen.GraphicalanalysisofpH-dependentpropertiesofproteinspredictedusingPROPKA.BMCStructuralBiology,11(6),2011.[211]B.RouxandT.Simonson.Implicitsolventmodels.BiophysicalChemistry,78(1-2):1{20,1999.[212]M.F.Sanner,A.J.Olson,andJ.C.Spehner.Reducedsurface:Antwaytocomputemolecularsurfaces.Biopolymers,38:305{320,1996.[213]G.MadhaviSastry,MatveyAdzhigirey,TylerDay,RamakrishnaAnnabhimoju,andWoodySherman.Proteinandligandpreparation:parameters,protocols,andonvirtualscreeningenrichments.J.Comput.Aid.Mol.Des.,27:221{234,2013.[214]A.SavelyevandG.A.Papoian.Inter-DNAelectrostaticsfromexplicitsolventmolecu-lardynamicssimulations.JournaloftheAmericanChemicalSociety,129(19):6060{1,2007.[215]BKSchichet.Virtualscreeningofchemicallibraries.Nature,432(7019):862{865,2004.[216]TamarSchlick.InnovationsinBiomolecularModelingandSimulations:Volume1.RoyalSocietyofChemistry,2012.[217]J.A.Sethian.LevelSetMethodsandFastMarchingMethods,volume3ofMonographsonAppl.Comput.Math.CambridgeUniversityPress,Cambridge,2ndedition,1999.[218]K.A.SharpandB.Honig.CalculatingtotalelectrostaticenergieswiththenonlinearPoisson-Boltzmannequation.JournalofPhysicalChemistry,94:7684{7692,1990.[219]K.A.SharpandB.Honig.Electrostaticinteractionsinmacromolecules-theoryandapplications.AnnualReviewofBiophysicsandBiophysicalChemistry,19:301{332,1990.[220]DevleenaShlvakumar,JoshuaWillams,YujieWu,WolfgangDamm,JohnShelly,andWoodySherman.Predictionofabsolutesolvationfreenergiesusingmoleculardynam-icsfreeenergyperturbationandtheopl
sforceJournalofChemicalTheoryandComputation,6(5):1509{1519,2010.[221]DoreeNirBen-Tal,andBarryHonig.Calculationofalkanetowatersolvationfreeenergiesusingcontinuumsolventmodels.J.Phys.Chem.,100:2744{2752,1996.272[222]R.Skyner,J.L.McDonagh,C.R.Groom,T.vanMourik,andJ.B.O.Mitchell.Areviewofmethodsforthecalculationofsolutionfreeenergiesandthemodellingofsystemsinsolution.Phys.Chem.Chem.Phys,17(9):6174,2015.[223]PeterSmereka.Thenumericalapproximationofadeltafunctionwithapplicationtolevelsetmethods.J.Comput.Phys.,211(1):77{90,2006.[224]J.M.Soler,E.Artacho,J.D.Gale,A.Garca,J.Junquera,P.Ordejn,andD.Snchez-Portal.Thesiestamethodforab-initioorder-nmaterialssimulation.J.Phys.:Con-dens.Matt.,14:2745{2779,2002.[225]R.S.Spolar,J.H.Ha,andM.T.RecordJr.Hydrophobicinproteinfoldingandothernoncovalentprocessesinvolvingproteins.ProceedingsoftheNationalAcademyofSciencesoftheUnitedStatesofAmerica,86(21):8382{8385,1989.[226]F.H.Stillinger.Structureinaqueoussolutionsofnonpolarsolutesfromthestandpointofscaled-particletheory.J.SolutionChem.,2:141{158,1973.[227]A.J.Stone.Distributedmultipoleanalysis,orhowtodescribeamolecularchargedistribution.ChemicalPhysicsLetters,83(2):233{239,1981.[228]JoeyW.Storer,DavidJ.Giesen,GregoryD.Hawkins,GillianC.Lynch,ChristopherJ.Cramer,DonaldG.Truhlar,andDanielA.Liotard.Solvationmodelinginaqueousandnonaqueoussolvent,newtechniquesandareexaminationoftheclaisenrearrangement.InC.J.CramerandD.G.Truhlar,editors,Structure,Energetics,andReactivityinAqueousSolution:CharacterizationofChemicalandBiologicalSystems,568,pages24{49.AmericanChemicalSocietySymposium,1994.[229]Pin-ChihSu,Cheng-ChiehTsai,ShahilaMehboob,KirkE.Heveber,andMichaelE.Johnson.Comparisonofradiisets,entropy,qmmethods,andsamplingonMM-PBSA,MM-GBSA,andQM/MM-GBSAligandbindingenergiesoff.tularensisenoyl-acpreductase(fabl).JournalofComputationalChemistry,36:1859{1873,2015.[230]J.M.J.Swanson,R.H.Henchman,andJ.A.McCammon.Revisitingfreeenergycalculations:AtheoreticalconnectiontoMM/PBSAanddirectcalculationoftheassociatio
nfreeenergy.BiophysicalJournal,86(1):67{74,2004.[231]J.M.J.Swanson,J.Mongan,andJ.A.McCammon.Limitationsofatom-centereddielectricfunctionsinimplicitsolventmodels.JournalofPhysicalChemistryB,109(31):14769{72,2005.273[232]J.M.J.Swanson,J.A.Wagoner,N.A.Baker,andJ.A.McCammon.Optimizingthepoissondielectricboundaywithexplicitsolventforcesandenergies:Lessonslearnedwithatom-centereddielectricfunctions.JournalofChemicalTheoryandComputation,3(1):170{83,2007.[233]C.Tan,L.Yang,andR.Luo.HowwelldoesPoisson-Boltzmannimplicitsolventagreewithexplicitsolvent?Aquantitativeanalysis.JournalofPhysicalChemistryB,110(37):18680{18687,2006.[234]JianJ.Tan,WeiZ.Chen,andCunX.Wang.InvestigatinginteractionsbetweenHIV-1gp41andinhibitorsbymoleculardynamicssimulationandMM-PBSA/GBSAcalculations.JournalofMolecularStructure:Theochem.,766(2-3):77{82,2006.[235]D.G.Thomas,J.Chun,Z.Chen,G.W.Wei,andN.A.Baker.Parameterizationofageometricwimplicitsolvationmodel.J.Comput.Chem.,24:687{695,2013.[236]H.TjongandH.X.Zhou.GBr6NL:AgeneralizedBornmethodforaccuratelyre-producingsolvationenergyofthenonlinearPoisson-Boltzmannequation.JournalofChemicalPhysics,126:195102,2007.[237]JacopoTomasi,BenedettaMennucci,andRobertoCammi.Quantummechanicalcon-tinuumsolvationmodels.Chem.Rev.,105:2999{3093,2005.[238]JacopoTomasi,BenedettaMennucci,andRobertoCammi.Quantummechanicalcon-tinuumsolventmodels.Chem.Rev.,105:2999{3093,2005.[239]JacopoTomasiandMaurizioPersico.Molecularinteractionsinsolution:anoverviewofmethodsbasedoncontinuousdistributionsofthesolvent.Chem.Rev.,94:2027{2094,1994.[240]O.TrottandA.J.Olson.AutoDockVina:improvingthespeedandaccuracyofdockingwithanewscoringfunction,toptimization,andmultithreading.JComputatChem,31(2):455{461,2010.[241]V.TsuiandD.A.Case.Moleculardynamicssimulationsofnucleicacidswithagener-alizedBornsolvationmodel.JournaloftheAmericanChemicalSociety,122(11):2489{2498,2000.[242]WeijoV1,RandrianarivonyM,HarbrechtH,andFredianiL.Waveletformulationofthepolarizablecontinuummodel.JournalofComputationalChemistry,3
1(7):1469{1477,2010.274[243]H.F.G.Velec,H.Gohlke,andG.Klebe.Knowledge-basedscoringfunctionderivedfromsmallmoleculecrystaldatawithsuperiorrecognitionrateofnear-nativeligandposesandbetteryprediction.J.Med.Chem,48:6296{6303,2005.[244]HFVelec,HGohlke,andGKlebe.DrugScore(CSD)-knowledge-basedscoringfunctionderivedfromsmallmoleculecrystaldatawithsuperiorrecognitionrateofnear-nativeligandposesandbetteryprediction.JMedChem.,48(20):6296{303,2005.[245]G.Verkhivker,K.Appelt,S.T.Freer,andJ.E.Villafranca.Empiricalfreeenergycalculationsofligand-proteincrystallographiccomplexes.i.knowledgebasedligand-proteininteractionpotentialsappliedtothepredictionofhumanimmunovirusproteasebindingy.ProteinEng,8:677{691,1995.[246]J.A.WagonerandN.A.Baker.Assessingimplicitmodelsfornonpolarmeansolvationforces:theimportanceofdispersionandvolumeterms.ProceedingsoftheNationalAcademyofSciencesoftheUnitedStatesofAmerica,103(22):8331{6,2006.[247]N.WaleandG.Karypis.Targetforchemicalcompoundsusingtarget-ligandactivitydataandrankingbasedmethods.JournalofChemicalInformationandModel,49(10):2190{201,2009.[248]B.WangandG.W.Wei.Parameteroptimizationintialgeometrybasedsol-vationmodels.JournalChemicalPhysics,143:134119,2015.[249]BaoWang,ChengzhangWang,andGuoweiWei.Featurefunctionaltheory-solvationpredictorfortheblindpredictionofsolvationfreeenergy.Preprint.[250]BaoWang,ChengzhangWang,andGuoweiWei.Learningtorankforsolvationfreeenergyprediction.Preprint,2016.[251]BaoWangandG.W.Wei.Objective-orientedPersistentHomology.ArXive-prints,December2014.[252]BaoWangandGuo-WeiWei.Coarsegridpoissonboltzmannsolverwithoutlossofaccuracy.Preprint.[253]BaoWangandGuo-WeiWei.Accurate,robustandreliablecalculationsofpoisson-boltzmannsolvationandbindingenergies.Preprint,2016.[254]BaoWang,ZhixiongZhao,andGuoweiWei.Featurefunctionaltheory-bindingpredictorfortheblindpredictionofbindingfreeenergy.Preprint.275[255]BaoWang,ZhixiongZhao,andGuoweiWei.Hybridphysicalandstatisticalmodelsfortheblindpredictionofsolvationfreeenergies.Preprint.[256]BaoWa
ng,ZhixiongZhao,andGuoweiWei.Hybridphysicalandstatisticalmodelsfortheblindpredictionofsolvationfreeenergies.preprint,2016.[257]ChanghaoWang,PeterH.Nguyen,KevinPham,DanielleHuynh,Thanh-BinhNancyLe,HongliWang,PengyuRen,andRayLuo.Calculatingprotein-ligandbindingwithmmpbsa:Methodanderroranalysis.Preprint.[258]J.Wang,R.M.Wolf,J.W.Caldwell,P.A.Kollman,andD.A.Case.DevelopmentandtestingofageneralAMBERforceJournalofComputationalChemistry,25(9):1157{74,2004.[259]JunWang,QinCai,YeXiang,andRayLuo.ReducingGridDependenceinFinitePoissonBoltzmannCalculations.JournalofChemicalTheoryandCompu-tation,8:2741{2751,2012.[260]JunmeiWang,WeiWang,ShuanghongHuo,MatthewLee,andPeterA.Kollman.Solvationmodelbasedonweightedsolventaccessiblesurfacearea.JournalofPhysicalChemistryB,105:5055{5067,2001.[261]JunmeiWang,WeiWang,ShuanghongHuo,MatthewLes,andPeterA.Kollman.Solvationmodelbasedonweightedsolventccessiblesurfacearea.J.Phys.Chem.B,105:5055{5067,2001.[262]M.L.WangandC.F.Wong.Calculationofsolvationfreeenergyfromquantumme-chanicalchargedensityandcontinuumdielectrictheory.J.Phys.Chem.A,110:4873{4879,2006.[263]M.L.Wang,C.F.Wong,J.H.Liu,andP.X.Zhang.tquantummechanicalcalculationofsolvationfreeenergiesbasedondensityfunctionaltheory,numericalatomicorbitalsandpoisson-boltzmannequation.ChemicalPhysicsLetters,442:464{467,2007.[264]R.Wang,L.Lai,andS.Wang.Furtherdevelopmentandvalidationofempiricalscoringfunctionsforstructurebasedbindingyprediction.J.Comput.Aided.Mol.Des,16:11{26,2002.[265]RenxiaoWang,YiPinLu,andShaomengWang.Comparativeevaluationof11scoringfunctionsformoleculardocking.J.Med.Chem.,46:2287{2303,2003.276[266]A.WarshelandA.Papazyan.Electrostaticinmacromolecules:fundamentalconceptsandpracticalmodeling.CurrentOpinioninStructuralBiology,8(2):211{217,1998.[267]A.M.Wassermann,H.Geppert,andJ.R.Bajorath.Searchingfortarget-selectivecompoundsusingtcombinationsofmulticlasssupportvectormachinerankingmethods,kernelfunctions,andtdescriptors.JournalofChemicalInforma-tionandModel,49(3):582{92,2009.[268]StillWC,
TempczykA,HawleyRC,andHendricksonT.Semianalyticaltreatmentofsolvationformolecularmechanicsanddynamics.JAmChemSoc,112(16):6127{6129,1990.[269]G.W.Wei.GeneralizedPerona-Malikequationforimagerestoration.IEEESignalProcessingLett.,6:165{167,1999.[270]G.W.Wei.tialgeometrybasedmultiscalemodels.BulletinofMathematicalBiology,72:1562{1622,2010.[271]G.W.Wei,Y.H.Sun,Y.C.Zhou,andM.Feig.Molecularmultiresolutionsurfaces.arXiv:math-ph/0511001v1,pages1{11,2005.[272]Guo-WeiWei.Multiscale,multiphysicsandmultidomainmodelsI:Basictheory.Jour-nalofTheoreticalandComputationalChemistry,12(8):1341006,2013.[273]Guo-WeiWei,QiongZheng,ZhanChen,andKelinXia.Variationalmultiscalemodelsforchargetransport.SIAMReview,54(4):699{754,2012.[274]S.J.Weiner,P.A.Kollman,D.T.Nguyem,andD.A.Case.Anallatomforsimulationsofproteinsandnucleic-acids.JCompChem,7(2):230{252,1986.[275]T.J.Willmore.RiemannianGeometry.OxfordUniversityPress,USA,1997.[276]J.M.WordandA.Nicholls.Applicationofthegaussiandielectricboundaryinzaptothepredictionofproteinpkavalues.Proteins,79(12):3400{3409,2011.[277]S.Yin,L.Biedermannova,J.Vondrasek,andN.V.Dokholyan.Medusascore:Anacu-rateforcescoringfunctionforvirtualdrugscreening.JournalofChemicalInformationandModel,48:1656{1662,2008.277[278]S.N.Yu,W.H.Geng,andG.W.Wei.Treatmentofgeometricsingularitiesinimplicitsolventmodels.JournalofChemicalPhysics,126:244108,2007.[279]S.N.YuandG.W.Wei.Three-dimensionalmatchedinterfaceandboundary(MIB)methodfortreatinggeometricsingularities.J.Comput.Phys.,227:602{632,2007.[280]S.N.Yu,Y.C.Zhou,andG.W.Wei.Matchedinterfaceandboundary(MIB)methodforellipticproblemswithsharp-edgedinterfaces.J.Comput.Phys.,224(2):729{756,2007.[281]Z.Y.YuandC.Bajaj.Computationalapproachesforautomaticstructuralanalysisoflargebiomolecularcomplexes.IEEE/ACMTransComputBiolBioinform,5:568{582,2008.[282]CZhang,SLiu,QZhu,andYZhou.Aknowledge-basedenergyfunctionforprotein-ligand,protein-protein,andprotein-dnacomplexes.JMedChem.,48(7):2325{35,2005.[283]WeiZhang,LijuanJi,YananChen,KailinTang,Haipin
gWang,RuixinZhu,WeiJia,ZhiweiCao,andQiLiu.Whendrugdiscoverymeetswebsearch:Learningtorankforligand-basedvirtualscreening.JournalofCheminformatics,7(5),2015.[284]ShanZhao.Pseudo-time-couplednonlinearmodelsforbiomolecularsurfacerepresenta-tionandsolvationanalysis.InternationalJournalforNumericalMethodsinBiomedicalEngineering,27:1964{1981,2011.[285]ShanZhao.Operatorsplittingadischemesforpseudo-timecouplednonlinearsolvationsimulations.JournalofComputationalPhysics,257:1000{1021,2014.[286]ShanZhaoandG.W.Wei.High-orderFDTDmethodsviaderivativematchingforMaxwell'sequationswithmaterialinterfaces.J.Comput.Phys.,200(1):60{103,2004.[287]ZhengZhengandKennethM.MerzJr.Ligandidenscoringalgorithm(LISA).JournalofChemicalInformationandModel,51:1296{1306,2011.[288]ZhengZhengandKennethM.MerzJr.Developmentoftheknowledge-basedandempiricalcombinedscoringalgorithm(KECSA)toscoreproteinligandinteractions.JournalofChemicalInformationandModel,53:1073{1083,2013.[289]ZhengZheng,MelekN.Ucisik,andKennethM.MerzJr.Themovabletypemethodap-pliedtoproteinligandbinding.JournalofChemicalTheoryandComputation,9:5526{5538,2013.278[290]ZhengZheng,TingWang,PengfeiLi,andKennethM.MerzJr.KECSA-Movabletypeimplicitsolvationmodel(KMTISM).JournalofChemicalTheoryandComputation,11:667{682,2015.[291]S.G.Zhou,L.T.Cheng,H.Sun,J.W.Che,J.Dzubiella,B.Li,andJ.A.McCam-mon.Ls-vism:Asoftwarepackageforanalysisofbiomolecularsolvation.JournalofComputationalChemistry,36:1047{1059,2015.[292]Y.C.Zhou.Matchedinterfaceandboundary(mib)methodanditsapplicationstoimplicitsolventmodelingofbiomolecules.Ph.D.Thesis.,MichiganStateUniversity,2006.[293]Y.C.Zhou,M.Feig,andG.W.Wei.Highlyaccuratebiomolecularelectrostaticsincontinuumdielectricenvironments.JournalofComputationalChemistry,29:87{97,2008.[294]Y.C.ZhouandG.W.Wei.Ontheitious-domainandinterpolationformulationsofthematchedinterfaceandboundary(MIB)method.J.Comput.Phys.,219(1):228{246,2006.[295]Y.C.Zhou,ShanZhao,MichaelFeig,andG.W.Wei.Highordermatchedinterfaceandboundarymethodforellipticequation
swithdiscontinuouscotsandsingularsources.J.Comput.Phys.,213(1):1{30,2006.[296]J.Zhu,E.Alexov,andB.Honig.ComparativestudyofgeneralizedBornmodels:Bornradiiandpeptidefolding.JournalofPhysicalChemistryB,109(7):3008{22,2005.279