THE FLEXIBILITY-RIGIDITY INDEX (FRI): THEORY AND APPLICATIONS

By

Kristopher Opron

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Biochemistry and Molecular Biology - Doctor of Philosophy

2016

ABSTRACT

Since the first protein structures were solved in the 1950s, the protein data bank has grown to include over one hundred thousand macromolecular structures ranging in size from small peptides to large viral capsids. These experiments have shown that proteins exhibit a diverse range of structure and function and that these two aspects are closely related. In fact, it is often possible to predict a protein's function from its structure alone. Much of the focus to date has been on the more static regions of proteins for theoretical and practical reasons. However, it is important to note that even well folded proteins experience everlasting fluctuations due to the constant [...] from outside forces, which drive motions that are relevant to function such as side chain [...] and conformational shifts. The possible movements that can arise from these fluctuations are determined by a protein's structure. This means flexibility, or the ability to deform from the current conformation under external forces, is an intrinsic property of all proteins, and is closely tied to function. In order to better study protein function in ordered or disordered proteins, we require accurate, efficient, multiscale tools for evaluating flexibility.

This work puts forward a multiscale, multiphysics and multidomain model, the flexibility-rigidity index (FRI), to estimate the flexibility and conformational motions of macromolecular structures. The basic assumption of the present FRI theory is that the geometry or structure of a given protein, together with its specific environment, completely determines the biological function and properties including flexibility and charge. To this end, we utilize monotonically decreasing functions to measure the geometric compactness of a protein and quantify the topological connectivity of atoms or residues in the proteins and nucleic acids. We define the total rigidity of a molecule by a summation of atomic rigidities. A practical validation of the proposed FRI for flexibility analysis is provided by the prediction of B-factors, or temperature factors of proteins, measured by X-ray crystallography. We employ a test set of 263 structurally distinct proteins to examine the validity and robustness of the proposed FRI method for B-factor estimation or flexibility prediction. The basic FRI algorithm outperforms GNM on this test set by about 20%.

After validation of the basic FRI method we introduce a multikernel-based multiscale FRI (mFRI) strategy to analyze macromolecular flexibility. The essential idea is to employ two or three kernels, each parameterized with a different scale, to capture the multiple characteristic interaction scales of complex biomolecules. Based on an expanded test set containing 364 proteins, we show that the mFRI method is about 22% more accurate than the GNM method in B-factor prediction. Most importantly, we demonstrate that the present mFRI gives rise to excellent flexibility analysis for many proteins that are difficult cases for GNM and the previously introduced single-scale FRI methods. Finally, for a protein of N residues, we illustrate that the computational complexity of the proposed mFRI is of linear scaling O(N), in contrast to the order of O(N^3) for GNM.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
LIST OF ALGORITHMS
KEY TO ABBREVIATIONS
Chapter I. Summary
Chapter II. Background and Introduction
  2.1 Experimental methods for structural flexibility
  2.2 Computational methods for flexibility and dynamics
  2.3 The Flexibility-Rigidity Index
    2.3.1 fast FRI and anisotropic FRI
    2.3.2 FRI for Protein-Nucleic Acid Complexes
    2.3.3 gGNM, mGNM and mANM
    2.3.4 Machine learning and FRI for protein-protein interactions
Chapter III. Methods
  3.1 Flexibility-rigidity index (FRI)
  3.2 FRI correlation maps or matrices
  3.3 Fast flexibility-rigidity index (fFRI)
  3.4 Multiscale flexibility-rigidity index (mFRI)
  3.5 Anisotropic flexibility-rigidity index (aFRI)
    3.5.1 Anisotropic rigidity
    3.5.2 Anisotropic flexibility
  3.6 Generalized Gaussian network models (gGNMs)
  3.7 Multiscale Gaussian network model (mGNM)
    3.7.1 Type-1 mGNM
    3.7.2 Type-2 mGNM
  3.8 Multiscale anisotropic network model (mANM)
  3.9 gGNM mode calculations for predicting hinges
  3.10 Machine learning and feature selection
Chapter IV. Validation and Applications
  4.1 Basic FRI method
    4.1.1 FRI B-factor prediction
    4.1.2 Rigidity and flexibility visualization
  4.2 Fast FRI method
    4.2.1 fFRI parameter testing
    4.2.2 Comparison of B-factor predictions from fFRI, GNM and NMA
      4.2.2.1 FRI vs GNM and NMA
      4.2.2.2 fFRI vs GNM
  4.3 Multikernel multiscale FRI method
    4.3.1 mFRI B-factor prediction
      4.3.1.1 Multiscale correlations of macroproteins
      4.3.1.2 Parameterization of two-kernel based mFRI
      4.3.1.3 Three kernel based mFRI
    4.3.2 Computational complexity of mFRI
  4.4 Multiscale FRI applications
    4.4.1 Fitting hinge regions
    4.4.2 Other proteins that benefit from mFRI
      4.4.2.1 Cyan fluorescent protein
      4.4.2.2 Antibiotic synthesis protein from Thermus thermophilus
      4.4.2.3 Ribosomal subunit L14
      4.4.2.4 Marine snail toxin
  4.5 FRI for protein-nucleic acid complexes
    4.5.1 Coarse-grained representations of protein-nucleic acid complexes
    4.5.2 mFRI B-factor predictions for protein-nucleic acid structures
      4.5.2.1 Multikernel FRI testing on protein-nucleic structures
      4.5.2.2 Single kernel FRI testing
      4.5.2.3 Parameter-free multikernel FRI
  4.6 Protein-nucleic acid structure applications
    4.6.1 mFRI flexibility prediction for ribosomes
    4.6.2 aFRI conformational motion prediction on an RNA polymerase structure
  4.7 Generalized GNM, multiscale GNM and multiscale ANM methods
    4.7.1 Generalized Gaussian network model
      4.7.1.1 Comparison between gGNM and FRI
      4.7.1.2 Intrinsic behavior of gGNM at large cutoff distance
      4.7.1.3 Validation of gGNM with extensive experimental data
    4.7.2 Multiscale Gaussian network model
      4.7.2.1 Type-1 mGNM
      4.7.2.2 Type-2 mGNM
    4.7.3 Multiscale anisotropic network models
  4.8 mGNM and mANM applications
    4.8.1 B-factor prediction of difficult cases using mGNM
    4.8.2 Domain decomposition using mGNM
    4.8.3 Collective motion simulation using mANM
  4.9 FRI-based hinge prediction validation with known hinging proteins
    4.9.1 gGNM mode-based hinge prediction
    4.9.2 Machine learning feature ranking
    4.9.3 SVM model prediction results
Chapter V. Conclusions and Future Directions
  5.1 Conclusions
  5.2 Future directions
BIBLIOGRAPHY

LIST OF TABLES

Table 4.1: Average correlation coefficients for Cα B-factor prediction with FRI, GNM and NMA for three structure sets from Park et al.52 and a superset of 365 structures.

Table 4.2: Correlation coefficients for B-factor prediction obtained by optimal FRI (opFRI), parameter free FRI (pfFRI) and Gaussian network model (GNM) for small-size structures. †GNM and NMA values are taken from the coarse-grained (Cα) GNM and NMA results reported in Park et al.52 except where starred (*). Starred values indicate correlation coefficients, from our own test of GNM, that have significantly increased compared to the values reported by Park et al.52

Table 4.3: Correlation coefficients for B-factor prediction obtained by optimal FRI (opFRI), parameter free FRI (pfFRI) and Gaussian network model (GNM) for medium-size structures. †GNM and NMA values are taken from the coarse-grained (Cα) GNM and NMA results reported in Park et al.52 except where starred (*). Starred values indicate correlation coefficients, from our own test of GNM, that have significantly increased compared to the values reported by Park et al.52

Table 4.4: Correlation coefficients for B-factor prediction obtained by optimal FRI (opFRI), parameter free FRI (pfFRI) and Gaussian network model (GNM) for large-size structures. †GNM and NMA values are taken from the coarse-grained (Cα) GNM and NMA results reported in Park et al.52 except where starred (*). Starred values indicate correlation coefficients, from our own test of GNM, that have significantly increased compared to the values reported by Park et al.52

Table 4.5: Correlation coefficients for B-factor prediction obtained by optimal FRI (opFRI), parameter free FRI (pfFRI) and Gaussian network model (GNM) for a set of 365 proteins. GNM scores reported here are the result of our tests as described in Section 4.1.1.

Table 4.6: Correlation coefficients for B-factor prediction obtained by optimal FRI (opFRI), parameter free FRI (pfFRI) and Gaussian network model (GNM) for a set of 365 proteins. GNM scores reported here are the result of our tests as described in Section 4.1.1.

Table 4.7: Correlation coefficients for B-factor prediction obtained by optimal FRI (opFRI), parameter free FRI (pfFRI) and Gaussian network model (GNM) for a set of 365 proteins. GNM scores reported here are the result of our tests as described in Section 4.1.1.

Table 4.8: Correlation coefficients for B-factor prediction obtained by optimal FRI (opFRI), parameter free FRI (pfFRI) and Gaussian network model (GNM) for a set of 365 proteins. GNM scores reported here are the result of our tests as described in Section 4.1.1.

Table 4.9: Correlation coefficients for B-factor prediction obtained by optimal FRI (opFRI), parameter free FRI (pfFRI) and Gaussian network model (GNM) for a set of 365 proteins. GNM scores reported here are the result of our tests as described in Section 4.1.1.

Table 4.10: Average correlation coefficients (CC) of B-factor prediction for a set of 365 proteins using fFRI (R = 12). The improvements of the fFRI over the GNM prediction (0.565) are given in parentheses.
Table 4.11: Improvements in averaged correlation coefficients for the B-factor prediction of a set of 364 proteins due to the introduction of an additional kernel parameterized at a large scale (η2). Two exponential kernels with [...] = 25 are employed. The first kernel's scale value is set to η1 = 7.0 Å in all cases. The second kernel's scale value (η2) is varied and listed on the top of the table. Results are organized and split by the size of the structures based on the number of amino acids in order to show the impact of different η2 values on different sizes of proteins.

Table 4.12: Correlation coefficients (CCs) between predicted and experimental B-factors for the set of 64 protein-nucleic structures.78 Here N1, N2 and N3 values represent the number of atoms used for the M1, M2 or M3 representations for each structure. We use the parameter-free two-kernel mFRI model with one exponential kernel (κ = 1 and η = 18 Å) and one Lorentz kernel (υ = 3, η = 18 Å). PDB IDs marked with an asterisk (*) indicate structures containing only nucleic acid residues.

Table 4.13: The PDB IDs of the 203 high resolution protein-nucleic structures used in our single-kernel FRI parameter test. IDs marked with an asterisk indicate those containing only nucleic acid residues.

Table 4.14: MCCs of Gaussian network model (GNM),78 single kernel flexibility-rigidity index (FRI) and two-kernel mFRI for three coarse-grained representations (M1, M2, and M3). A set of 64 protein-nucleic acid structures78 is used.

Table 4.15: The best average PCCs with experimental B-factors. Results for GNM and mGNM are averaged over 362 proteins. Results for ANM and mANM are averaged over 300 proteins.

Table 4.16: 64 large-sized proteins in the 364-protein data set49 but not included in our mANM test due to limited computational resources.

Table 4.17: Case study of B-factor prediction for four proteins in three different schemes: GNM7, GNM20 and mGNM. In the case of 1WHI, we use mGNM with two kernels and three kernels (value in parentheses).

Table 4.18: gGNM-based hinge predictions for 32 protein structures compared with consensus hinge residues determined from literature and other hinge studies.39

Table 4.19: gGNM-based hinge predictions for 32 protein structures compared with consensus hinge residues determined from literature and other hinge studies.39 Y - The hinge(s) are completely and uniquely identified, P - A predicted hinge is offset from a true hinge position by less than 5 amino acids or there is a false positive or negative, N - Failure to identify any major hinges.

Table 4.20: Summary of hits for gGNM-based predictions of hinges for 32 PDBs. Full - The hinge(s) are completely and uniquely identified, Partial - A predicted hinge is offset from a true hinge position by less than 5 amino acids or there is a false positive or negative, None - Failure to identify any major hinges.

Table 4.21: Feature importance rankings by F-score. F-scores are calculated using the LIBSVM software.

Table 4.22: Feature importance rankings by random forest method. Importance values calculated using the R package caret command, varImp.

Table 4.23: SVM results for a model with eight of the top ranked features, FRIf, dFRI, FRIf, ishinge, ishinge3, hingedist, HP6, RES6 and ROT6.

Table 4.24: SVM results for a model with five mGNM-based features, ishinge, ishinge3, hingedist, Mode1 and cMode1.

LIST OF FIGURES

Figure 2.1: The structure of calmodulin (PDB ID: 1CLL) visualized in VMD34 and colored by experimental B-factors (top left) and GNM predicted B-factors (top right) with red representing the most flexible regions. Bottom, a comparison of predicted B-factor values from mFRI, GNM with a cutoff distance of 7 Å, and experimental B-factors taken from the PDB entry.

Figure 3.1: Correlation maps and secondary structure representations for four protein structures. Structures used include the alpha-spectrin SH3 domain, the tetramerization domain of the p53 tumor suppressor, the B1 immunoglobulin-binding domain of streptococcal protein G and a DNA binding protein from Methanococcus jannaschii, from left to right, top to bottom. Correlation maps are generated using Eq. (3.5) with υ = 2.5 and η = 1.0 Å. Secondary structure visualizations are generated with VMD.34 Colors represent distance and correlation values for each pair of atoms. The residue numbers for each Cα are listed along the x- and y-axes. The proteins are displayed in VMD's "new cartoon" representation and colored by secondary structure determined by STRIDE. The color scheme for secondary structure is: purple - α helix, blue - 3(10) helix, yellow - β-sheet, cyan - turn, white - coil.

Figure 3.2: Illustration of admissible correlation functions. (a) Correlation functions approach the ILF as κ → ∞ or υ → ∞ at η = 7 Å. (b) Effects of varying scale value η. Local correlation is obtained with large υ and small η values, whereas nonlocal correlation is generated by small υ and large η values.

Figure 3.3: Workflow of basic procedure in mGNM and mANM.

Figure 4.1: Correlation coefficients for experimental vs predicted B-factors using the Lorentz kernel (left) and exponential (right) kernel. The test set consists of 263 Cα-only PDB files. Scores below 0.5 are not shown. For the Lorentz kernel, υ values range from 0.5 to 10.0 at an interval of 0.5 and η values range from 1.0 Å to 40.0 Å at an interval of 1.0 Å. For the exponential kernel, κ values range from 0.5 to 10.0 at an interval of 0.5 and η values range from 0.5 Å to 20.0 Å at an interval of 0.5 Å.

Figure 4.2: Experimental B-factors (black) vs predicted B-factors (red) using the Lorentz (top) and exponential (bottom) correlation kernels. The structures used for comparison are 1DF4 (left) and 2Y7L (right). For these comparisons, the optimal parameters were used for υ, κ and η based on the parameter searches for each correlation kernel. For the Lorentz kernel, υ = 1.5 and η = 2.0 Å are the parameters used for 1DF4 and υ = 1.5 and η = 19 Å are used for 2Y7L. For the exponential kernel, κ = 0.5 and η = 1.0 Å are employed for 1DF4 and κ = 0.5 and η = 2.5 Å for 2Y7L.

Figure 4.3: Optimal υ parameter value for 263 proteins using the Lorentz correlation kernel. B-factor prediction was calculated for υ values ranging from 0.5 to 10 at an interval of 0.5 and η values ranging from 1.0 Å to 40.0 Å at an interval of 1.0 Å.

Figure 4.4: Phase diagram for Lorentz kernel optimal parameter values υ and η, colored by the size of structure and with shapes corresponding to correlation coefficient. Diamond - 0.5, downward triangle - 0.6, upward triangle - 0.7, square - 0.8, circle - 0.9. υ values range from 0.5 to 10.0 at an interval of 0.5 and η values range from 1.0 Å to 40 Å at an interval of 1.0 Å.

Figure 4.5: Optimal parameters for 263 structures using the exponential correlation kernel. Here κ values range from 0.5 to 10.0 at an interval of 0.5. η values range from 0.5 Å to 20.0 Å at an interval of 0.5 Å.

Figure 4.6: Phase diagram for exponential kernel optimal parameter values κ and η, colored by the size of structure and with shapes corresponding to correlation coefficient. Diamond - 0.5, downward triangle - 0.6, upward triangle - 0.7, square - 0.8, circle - 0.9. κ values range from 0.5 to 10.0 at an interval of 0.5 and η values range from 0.5 Å to 20 Å at an interval of 0.5 Å.

Figure 4.7: Complete results of optimal parameter searches using the exponential correlation kernel for structures 1DF4 (top left), 2Y7L (top right), 2Y9F (bottom left) and 3LAA (bottom right). Structures 1DF4 and 2Y7L (top) represent the high scoring structures, those with scores near 0.9. Structures 2Y9F and 3LAA (bottom) show the typical pattern of correlation scores for the majority of proteins tested. κ values range from 0.5 to 20.0 at an interval of 0.5 and η values range from 0.5 Å to 20 Å at an interval of 0.5 Å.

Figure 4.8: Comparison of correlation coefficients calculated using optimal parameters for both Lorentz and exponential correlation kernels. Average deviation = 0.0182 (left) and 0.0365 (right). For the Lorentz kernel optimal parameter search, υ values range from 0.5 to 10.0 at an interval of 0.5 and η values range from 1.0 Å to 40.0 Å at an interval of 1.0 Å. For the exponential kernel parameter search, κ values range from 0.5 to 10.0 at an interval of 0.5 and η values range from 0.5 Å to 20.0 Å at an interval of 0.5 Å. The parameter free Lorentz kernel uses υ = 2.5 and η = 1.0 Å and the parameter free exponential kernel uses κ = 1.5 and η = 5.0 Å.

Figure 4.9: Comparison of correlation coefficients calculated using optimal parameters and parameter free versions of the method. The optimized correlation coefficients are the highest scoring from a parameter search. For the Lorentz kernel optimal parameter search, υ values range from 0.5 to 10.0 at an interval of 0.5 and η values range from 1.0 Å to 40.0 Å at an interval of 1.0 Å. For the exponential kernel parameter search, κ values range from 0.5 to 10.0 at an interval of 0.5 and η values range from 0.5 Å to 20.0 Å at an interval of 0.5 Å. The parameter free Lorentz kernel uses υ = 2.5 and η = 1.0 Å and the parameter free exponential kernel uses κ = 1.5 and η = 5.0 Å. The line y = x is shown for reference. Points on the line indicate little or no difference between optimized parameters and the parameter free results. Average deviations are 0.0410, 0.0549, 0.0463, and 0.0540 (from left to right and from top to bottom).

Figure 4.10: Cα atoms of 1QD9 in VDW representation scaled by predicted B-factor (both images) and colored with electrostatics (right). Larger VDW radii represent more flexible atoms such as those near the surface of this soluble protein. Smaller VDW radii represent more rigid atoms such as those in the core of the protein. On the right, atoms are colored by electrostatics, revealing two charged domains. First, the outer amino acids have some areas of positive charge that interact with the bulk solvent. Second, a highly negatively charged portion of the protein core is highlighted in red. These charges are stabilized by internal water molecules.

Figure 4.11: The molecular surface of protein 1QD9 colored by B-factor (left) and continuous FRI representation (right). The flexibility index is calculated using the Lorentz method with υ = 2.5 and η = 1.0 Å. Images generated by VMD using BWR color bar and scale 10 to 50 for B-factors and 0.75 to 0.90 for the flexibility index. In both images, blue regions indicate low flexibility and red regions indicate high flexibility. On the left, B-factor is an atomistic representation of flexibility. On the right, FRI is used to predict flexibility and the continuum representation is mapped to the protein surface. The continuum prediction matches the experimental flexibility pattern closely except near the core of the protein, which contains some structural water not included in our model.

Figure 4.12: Parameter testing for exponential (left chart) and Lorentz (right chart) functions. Average correlation coefficient of B-factor predictions of 365 proteins is plotted against choice of η for a range of values for κ or υ.

Figure 4.13: The impact of box size on the average correlation coefficient for a set of 365 proteins. The fFRI is examined over a range of values for parameters ([...] and υ) to illustrate the relationship between accuracy and choice of box size R.

Figure 4.14: Comparison of correlation coefficients from B-factor prediction using GNM, coarse-grained (Cα) NMA and FRI methods. Top left: pfFRI vs opFRI for 365 proteins; Top right: opFRI vs GNM for 365 proteins; Bottom left: pfFRI vs GNM for 365 proteins; Bottom right: pfFRI vs NMA for three sets of proteins used by
Park et al.52 The correlation coefficients for NMA are adopted from Park et al.52 for three sets of proteins. For optimal FRI, parameter υ is optimized for a range from 0.1 to 10.0. For the parameter free version of the FRI (pfFRI), we set υ = 3 and η = 3 Å. The line y = x is included to aid in comparing scores.

Figure 4.15: Parameter testing for a two-kernel based mFRI method. Values for η are varied for each kernel, both Lorentz kernels. Here η values for either kernel are listed along the axes. The averaged correlation coefficient for B-factor prediction on a set of 364 proteins is shown in each cell of the matrix and color coded for convenience, with red representing the highest correlation coefficients and green the lowest. Obviously, the combination of a relatively small-scale kernel and a relatively large-scale kernel delivers the best prediction, which shows the importance of incorporating multiscale in protein flexibility analysis.

Figure 4.16: Computational efficiency of multikernel fast FRI (multifFRI) relative to single kernel fast FRI (fFRI) and GNM. The data sets used for the present study are the same as those listed in Table VIII of Ref. 49.

Figure 4.17: Comparison of B-factor predictions of calmodulin (PDB ID: 1CLL) using the GNM (cutoff distance is 7 Å) and FRI methods. Experimental B-factors show a hinge region in the middle as shown in Figure 2.1. One-kernel FRI (FRI-1K) is parameterized at υ = 3, η = 3.0 Å. Two-kernel FRI (FRI-2K) is parameterized at κ1 = 1, η1 = 3 Å, υ2 = 3, and η2 = 10 Å. Three-kernel FRI (FRI-3K) is parameterized at υ1 = 3, η1 = 3 Å, υ2 = 3, η2 = 7 Å, κ3 = 1, and η3 = 15 Å. The three kernel based mFRI delivers the best B-factor prediction for the hinge region.

Figure 4.18: Top, a visual comparison of experimental B-factors (left), FRI predicted B-factors (middle) and GNM predicted B-factors (right) for the engineered teal fluorescent protein, mTFP1 (PDB ID: 2HQK). Bottom, the experimental and predicted B-factor values plotted per residue. The GNM naming convention indicates the cutoff used for the GNM method in angstroms; for example, GNM7 is the GNM method with a cutoff of 7 Å.

Figure 4.19: Top, a visual comparison of experimental B-factors (left), FRI predicted B-factors (middle) and GNM predicted B-factors (right) for the engineered teal fluorescent protein, mTFP1 (PDB ID: 1V70). Bottom, the experimental and predicted B-factor values plotted per residue.

Figure 4.20: Top, a visual comparison of experimental B-factors (left), FRI predicted B-factors (middle) and GNM predicted B-factors (right) for the ribosomal protein L14 (PDB ID: 1WHI). Bottom, the experimental and predicted B-factor values plotted per residue.

Figure 4.21: Top, a visual comparison of atomic experimental B-factors (far left), C-alpha experimental B-factors (left), FRI predicted B-factors (right) and GNM predicted B-factors (far right) for the marine snail conotoxin (PDB ID: 1NOT). Bottom, the experimental and predicted B-factor values plotted per residue.

Figure 4.22: MCCs for single kernel parameter test using the M1 (squares), M2 (circles) and M3 (triangles) representations. Lorentz kernel with υ = 3 is used. The parameter η is varied to find the maximum MCC on the test set of structures. The results for a set of 64 protein-nucleic structures (PDB IDs listed in Table 4.12) are shown on the left, while results for a separate set of 203 structures (PDB IDs listed in Table 4.13) are shown on the right for more general selections.

Figure 4.23: Illustration highlighting atoms used for coarse-grained representations in protein-nucleic acid complexes for FRI and GNM. In addition to protein Cα atoms, Model M1 considers the backbone P atoms for nucleotides. Model M2 includes M1 atoms and adds the sugar O4' atoms for nucleotides. Model M3 includes M1 atoms and adds the sugar C4' atoms and the base C2 atoms for nucleotides.

Figure 4.24: Mean correlation coefficients (MCCs) for two-kernel FRI models on a set of 203 protein-nucleic structures. From left to right, MCC values are shown for M1, M2 and M3 representations. We use one Lorentz kernel with υ = 3.0 and one exponential kernel with κ = 1.0. The values of parameter η for both kernels are varied from 2 to 20 Å.

Figure 4.25: Complete ribosome with bound tRNAs (yellow (A site) and green (P site)) and mRNA Shine-Dalgarno sequence (orange), PDB ID: 4V4J. The same correlation coefficients and fitting parameters from the mFRI model of protein 1YIJ are used. A comparison of predicted and experimental B-factor data for Ribosome 50S subunit, PDB ID: 1YIJ. The CC value is 0.85 using the parameter free three-kernel mFRI model. Nucleic acids are shown as a smooth surface colored by FRI flexibility values (red for more flexible regions) while bound protein subunits are colored randomly and shown in a secondary structure representation. We achieve a CC value up to 0.85 using the parameter free three-kernel mFRI model with one exponential kernel (κ = 1 and η = 15 Å) and two Lorentz kernels (υ = 3, η = 3 Å and υ = 3, η = 7 Å).

Figure 4.26: The RNAP local aFRI mode for the bridge helix, trigger loop and nucleic acids from both open (PDB ID: 2PPB) and closed (PDB ID: 2O5J) configurations. Arrows represent the direction and relative magnitude of atomic fluctuations. Arrows for the bridge helix, trigger loop and nucleic acids are pictured as blue, white and yellow, respectively.

Figure 4.27: Illustration of protein 2Y7L. (a) Structure of protein 2Y7L having two domains; (b) Correlation map generated by using GNM-Lorentz indicating two domains; (c) Comparison of experimental B-factors and those predicted by GNM-Lorentz (η = 16 Å); (d) Comparison of experimental B-factors and those predicted by FRI-ILF (rc = 24 Å).

Figure 4.28: PCCs between various B-factors for protein 2Y7L. (a) Correlations between B^{GNM-ILF} and B^{Exp}, between B^{FRI-ILF} and B^{Exp}, and between B^{GNM-ILF} and B^{FRI-ILF}; (b) Correlations between B^{GNM-Lorentz} and B^{Exp}, between B^{FRI-Lorentz} and B^{Exp}, and between B^{GNM-Lorentz} and B^{FRI-Lorentz}.

Figure 4.29: PCCs between various B-factors averaged over 364 proteins. (a) Correlations between B^{GNM-ILF} and B^{Exp}, between B^{FRI-ILF} and B^{Exp}, and between B^{GNM-ILF} and B^{FRI-ILF}; (b) Correlations between B^{GNM-Lorentz} and B^{Exp}, between B^{FRI-Lorentz} and B^{Exp}, and between B^{GNM-Lorentz} and B^{FRI-Lorentz}.

Figure 4.30: The average PCCs over 362 proteins for Type-1 mGNM. (a) Two ILF kernels and their cutoff distances are systematically changed from 5 Å to 31 Å. (b) Two exponential kernels and their scales are systematically varied in the range of [1 Å, 26 Å].

Figure 4.31: The average PCCs over 362 proteins for Type-2 mGNM. (a) Two ILF kernels and their cutoff distances are systematically changed from 5 Å to 31 Å. (b) Two exponential kernels and their scales are systematically varied in the range of [1 Å, 26 Å].

Figure 4.32: The average PCCs over 300 proteins for mANM. (a) Two ILF kernels and their cutoff distances are systematically changed from 5 Å to 31 Å. (b) Two Gaussian kernels (κ = 2) and their scales are systematically varied in the range of [1 Å, 26 Å].

Figure 4.33: Comparison between Type-2 mGNM with exponential kernel and traditional GNM for the B-factor prediction of protein 1CLL. Two scales, η1 = 3 Å and η2 = 25 Å, are employed in mGNM. (a) Molecular surface colored by B-factors predicted by GNM with cutoff distance 7 Å. (b) Molecular surface colored by B-factors evaluated by our Type-2 mGNM. (c) Molecular surface colored by multiscale flexibility function in Equation (3.15). (d) B-factors predicted by traditional GNM with cutoff distances 7 Å (GNM7) and 20 Å (GNM20). (e) B-factors predicted by mGNM.

Figure 4.34: Comparison between Type-2 mGNM with exponential kernel and traditional GNM for protein 1V70 B-factor prediction. Two scales, η1 = 3 Å and η2 = 25 Å, are employed in mGNM. (a) Molecular surface colored by B-factors predicted by GNM with cutoff distance 7 Å. (b) Molecular surface colored by B-factors evaluated by our Type-2 mGNM. (c) Molecular surface colored by multiscale flexibility function in Equation (3.15). (d) B-factors predicted by traditional GNM with cutoff distances 7 Å (GNM7) and 20 Å (GNM20). (e) B-factors predicted by mGNM.

Figure 4.35: Comparison between Type-2 mGNM with exponential kernel and traditional GNM for protein 2HQK B-factor prediction. Two scales, η1 = 3 Å and η2 = 25 Å, are used for mGNM. (a) Molecular surface colored by B-factors predicted by GNM with cutoff distance 7 Å. (b) Molecular surface colored by B-factors evaluated by the Type-2 mGNM. (c) Molecular surface colored by multiscale flexibility function in Equation (3.15). (d) B-factors predicted by traditional GNM with cutoff distances 7 Å (GNM7) and 20 Å (GNM20). (e) B-factors predicted by mGNM.

Figure 4.36: Comparison between Type-2 mGNM with exponential kernel and traditional GNM for protein 1WHI B-factor prediction. Two mGNMs are used. The first one, mGNMK2, has two exponential kernels with κ = 1, η1 = 3 Å and η2 = 25 Å. The second mGNM, mGNMK3, has an extra exponential kernel with κ = 1 and η3 = 10 Å. (a) Molecular surface colored by B-factors predicted by GNM with cutoff distance 7 Å. (b) Molecular surface colored by B-factors evaluated by a Type-2 mGNM. (c) Molecular surface colored by multiscale flexibility function in Equation (3.15). (d) B-factors predicted by traditional GNM with cutoff distances 7 Å (GNM7) and 20 Å (GNM20). (e) B-factors predicted by two
mGNMs, mGNMK2 and mGNMK3.

Figure 4.37: Protein domain decomposition with Type-1 mGNM. The eigenvector (Fiedler vector) is used to decompose the protein into two domains. (a) protein 1ATN (chain A); (b) protein 3GRS.

Figure 4.38: Protein domain decomposition with Type-2 mGNM. The eigenvector (Fiedler vector) is used to decompose the protein into two domains. (a) protein 1ATN (chain A); (b) protein 3GRS. It can be seen that Type-2 mGNM fails in protein domain decomposition.

Figure 4.39: The collective motions of protein 1GRU (chain A). The seventh, eighth and ninth modes calculated from mANM are demonstrated in (a), (b) and (c), respectively.

Figure 4.40: The collective motions of protein 1URP (chain A). The seventh, eighth and ninth modes calculated from mANM are demonstrated in (a), (b) and (c), respectively.

Figure 4.41: Top, secondary structure representation of ovotransferrin with hinge residues highlighted by VdW representations of their C-alpha atoms. Bottom, values by residue for modes 1 and 2 (left y-axis) with cumulative sum (right y-axis). The maximum and minimum values of the cumulative sum correspond to hinge points.

Figure 4.42: Top, secondary structure representation of ribose binding protein with hinge residues highlighted by VdW representations of their C-alpha atoms. Bottom, values by residue for modes 1 and 2 (left y-axis) with cumulative sum (right y-axis). The maximum and minimum values of the cumulative sum correspond to hinge points.

Figure 4.43: Top, secondary structure representation of lactoferrin with hinge residues highlighted by VdW representations of their C-alpha atoms. Bottom, values by residue for modes 1 and 2 (left y-axis) with cumulative sum (right y-axis). The maximum and minimum values of the cumulative sum correspond to hinge points.

LIST OF ALGORITHMS

Algorithm 1: fFRI algorithm
Algorithm 2: Type-2 mGNM multiscale Kirchhoff matrix

KEY TO ABBREVIATIONS

aFRI - Anisotropic Flexibility-Rigidity Index
ANM - Anisotropic Network Model
fFRI - Fast Flexibility-Rigidity Index
FRI - Flexibility-Rigidity Index
gANM - Generalized Anisotropic Network Model
gGNM - Generalized Gaussian Network Model
GNM - Gaussian Network Model
mANM - Multiscale Anisotropic Network Model
MD - Molecular Dynamics
mFRI - Multiscale Flexibility-Rigidity Index
mGNM - Multiscale Gaussian Network Model
NMA - Normal Modes Analysis
NMR - Nuclear Magnetic Resonance Spectroscopy
pfFRI - Parameter-free Flexibility-Rigidity Index
SVM - Support Vector Machine

CHAPTER I. Summary

Recent technological and methodological advances have dramatically increased the size of the macromolecular structures that can be solved experimentally. This increase in scale has led to challenges in the theoretical description and computer simulation of proteins and nucleic acids. In response to this boom in structure size, there has been an increased interest in multiscale, multiphysics, and/or multidomain models. Such models aim to improve analysis of large macromolecules by reducing the number of degrees of freedom while maintaining modeling accuracy and achieving computational efficiency. To this end, the following work introduces a simple, accurate and efficient multiscale method for analyzing macromolecular flexibility and rigidity in atomic detail, the Flexibility-Rigidity Index or FRI.

The FRI theory is based on the assumption that the most fundamental properties of macromolecules are almost entirely determined by the geometric structure of the protein rather than its sequence, even though the structure is determined primarily by its sequence of amino acids. Simply put, FRI methods use the geometric compactness of a macromolecular structure to determine flexibility and motion at the atomic scale. Unlike the similar and well-known methods based on Normal Modes Analysis, FRI does not require matrix diagonalization for flexibility predictions. In the case of anisotropic calculations, FRI does require some matrix solving, but the FRI method allows for fewer and/or smaller matrices to be solved, thereby cutting down on calculation times drastically. The basic FRI algorithm's computational complexity is approximately O(N^2), where N is the number of atoms or residues, in contrast to O(N^3) for methods such as Normal Modes Analysis (NMA) and Gaussian Network Model (GNM) that require solving of a large matrix.

In our initial studies, we demonstrate that the proposed FRI gives rise to accurate predictions of protein B-factors for a set of 263 protein structures taken from X-ray crystallography data in the Protein Data Bank. We also show that a parameter-free formulation of FRI (pfFRI) is able to achieve about 95% accuracy of the regular FRI algorithm. Furthermore, we compare the accuracy and efficiency of FRI to that of the most popular approaches for flexibility analysis, NMA and GNM. An interpolation algorithm is also introduced in the work and is used to construct continuous atomic flexibility functions for visualization and use in multiscale multiphysics models.

Beyond the introduction of the basic FRI method, this work introduces various improvements and variations on the FRI method that improve computational efficiency, increase accuracy or add new utility. The first improvement introduced is the fast FRI (fFRI) algorithm for improving the computational run-time for flexibility analysis. The proposed fFRI further reduces the computational complexity from O(N^2) to O(N) through the implementation of the cell lists method. Intensive validation and comparisons indicate that fFRI is orders of magnitude more efficient and about 10% more accurate overall than some of the most popular methods in the field. The proposed fFRI is able to predict B-factors for α-carbons of the HIV virus capsid (313,236 residues) in less than 30 seconds on a single processor using only one core.

The next major addition we propose is the anisotropic FRI (aFRI) algorithm for the analysis of collective protein dynamics. The aFRI algorithm makes use of adaptive Hessian matrices, ranging from a completely global 3N × 3N matrix to completely local 3 × 3 matrices. These local 3 × 3 matrices only describe the motion of a single residue; however, these matrices include global correlation, giving rise to predictions that qualitatively match global calculations. The use of adaptive matrices allows for a significant decrease in computational running time due to the advantage of solving many small matrices rather than one large matrix. Eigenvectors obtained through the proposed aFRI algorithms are able to demonstrate collective motions similar to normal modes methods. A large set of proteins is used to compare the [...] of the FRI, fFRI, aFRI and GNM methods.

The next step in the development of FRI was the addition of a multiscale concept to the FRI method. Because protein interactions are inherently multiscale and protein flexibility is associat
edwithproteininteractions,proteinyshouldhaveamultiscalecharacteristic.Existingelasticnetworkmodelsaretypicallyparameterizedatasinglecut-distanceandthereforemayfailtoproperlypredictthethermalctuationofthemanymacromoleculesinvolvingmultiplecharacteristiclengthscales.Thereforeweintroduceamultiscaley-rigidityindex(mFRI)methodtoresolvethisproblem.TheproposedmFRIutilizestwoorthreecorrelationkernelsparametrizedattlengthscalestocap-turevariouslevelsofinteractionswithinandbetweenproteins.ItisshownthatthemFRImethodisabout20%moreaccuratethantheGaussianNetworkModelintheB-factorpre-dictionforasetof364proteinstructures.Additionally,weidentifymultipleinstanceswheremFRIisaccurateandGNMisveryinaccurate,possiblyduetothelackofamultiscaleaspect.Inadditiontotestingonproteins,wetesttheFRImethodformacromolecularcomplexesthatincludenucleicacids.Protein-nucleicacidcomplexesareimportantformanycellularprocessesincludingsomeofthemostessentialfunctionssuchastranscriptionandtranslation.Formanyprotein-nucleicacidcomplexes,exibilityofbothmacromoleculesisknowntobecriticalforspyand/orfunction.Therefore,wehaveextendedFRIyanalysistoprotein-nucleicacidcomplexes.WedemonstratebycomparisonwithexperimentaldatathatthemFRImultiscalestrategyisabletoaccuratelypredicttheyofprotein-nucleicacidcomplexes.Also,wetakeadvantageofthehighaccuracyandO(N)computationalcomplexityofourmultiscaleFRImethodtoinvestigatethebilityoflargeribosomalsubunitsandanentireribosome,whichistoanalyzebyalternativeapproachesduetoitssize.AsademonstrationoftheFRImethodforprotein-nucleicacidcomplexes,3weutilizeananisotropicFRIapproach,whichinvolveslocalizedHessianmatrices,tostudythetranslocationandactivesitedynamicsofbacterialRNApolymerase.AsGNMandANMaresomeofthemostpopularmethodsforthestudyofproteinyandrelatedfunctions,andtheFRImethodresemblesthesemethodsinsomeways,itisnecessarytoclarifytherelationshipbetweennormalmodes-basedmethodsandFRI.Tothisend,weproposegeneralizedGNMandANMmethods(gGNMandgANM)andshowthattheGNMKircmatrixcanbebuiltfromtheide
allow-passaspecialcaseofcorrelationfunctionsunderpinningtheFRImethod.WeproposeaframeworktoconstructgeneralizedKircmatriceswhosematrixinverseleadstogGNMs,whereas,thedirectinverseofitsdiagonalelementsgivesrisetoFRImethod.Inadditiontoexploringthisconnection,weintroducetwonewmultiscaleelasticnetworkmodels,namelymultiscaleGNM(mGNM)andmultiscaleANM(mANM),whichareabletoincorporateentscalesintogeneralizedKircorHessianmatrices.WeillustratethatgGNMsoutperformtheoriginalGNMmethodintheB-factorpredic-tionofthesetof364proteins.Wedemonstratethatforagivencorrelationfunction,FRIandgGNMmethodsprovideessentiallyidenticalB-factorpredictionswhenthescalevalueinthecorrelationfunctionistlylarge.ThemultiscaleaspectoftheproposedmGNMandmANMgivesrisetoatimprovement,morethan11%,inB-factorpredictionsovertheoriginalGNMandANMmethods.WefurtherdemonstratebofourmGNMmethodintheB-factorpredictionsofmanyproteinsthattheoriginalGNMmethodfailstoaccuratelypredictB-factorsfor.Also,weshowthatthepresentmGNMcanbeusedtoan-alyzeproteindomainseparationsandshowcasetheabilityofourmANMforthesimulationofproteincollectivemotions.AsanexplorationintooneofthemanyapplicationsofFRI,weexaminedthepotentialforFRIandaFRImethodsinpredictingproteinhinges.Thestudyofhingesandhinge4motionsinproteinshasbeenanimportanttopicandmuchresearchhasbeendoneinthepast.21,25,26,39,59IdenofhingeresiduesisusefulforinferringmotionandfunctionwhenmoleculesaretoolargeforMDsimulationonrelevanttimescales.OthermethodssuchasGNMandNMAhavebeenutilizedforthispurposeinthepast,leadingustotheideathatFRI-basedmethodscouldplaceatroleinhingeanalysis.SofarwehavetriedpredictinghingeusingmodesofmotioncalculatedfromFRIcorrelationmaps.Wehavealsotriedvariousmachine-learningmodelstopredicthingesusingacombinationofFRI-basedmetricsandvariousotherresidue-levelmetricsbasedonsolventaccessiblesurfacearea,side-chain,hydrophobicityandmanyotherproperties.Finally,weshowthathingepredictionsfromFRImodesareatleastasaccurateasthoseobtainedfromotherstateofthearthingepredictionsoftware.
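
The basic FRI computation summarized in this chapter can be sketched in a few lines of Python. The sketch is illustrative only and is not the software used in this work: it assumes unit weights, a generalized Lorentz kernel of the form given later in Eq. (3.5), and placeholder values for the scale `eta` and exponent `upsilon`. The function names `lorentz_kernel`, `flexibility_index` and `fit_b_factors` are hypothetical.

```python
import numpy as np


def lorentz_kernel(dist, eta=3.0, upsilon=3.0):
    """Generalized Lorentz correlation function 1 / (1 + (d / eta)^upsilon).

    eta (in Angstrom) and upsilon are illustrative placeholder values.
    """
    return 1.0 / (1.0 + (dist / eta) ** upsilon)


def flexibility_index(coords, eta=3.0, upsilon=3.0):
    """Return f_i = 1 / mu_i for each particle, assuming unit weights.

    mu_i is the sum of kernel values between particle i and all particles
    (the self term contributes 1). Building the full distance matrix is the
    O(N^2) step of the basic algorithm.
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]   # pairwise difference vectors
    dist = np.linalg.norm(diff, axis=-1)             # N x N distance matrix
    rigidity = lorentz_kernel(dist, eta, upsilon).sum(axis=1)
    return 1.0 / rigidity


def fit_b_factors(flex, b_exp):
    """Least-squares fit of B ~ a * f + b; returns (a, b, predicted B)."""
    a, b = np.polyfit(flex, b_exp, 1)
    return a, b, a * flex + b


# Toy example: six particles on a straight line, 3.8 Angstrom apart.
toy_coords = np.array([[3.8 * i, 0.0, 0.0] for i in range(6)])
toy_flex = flexibility_index(toy_coords)
```

In this toy chain the terminal particles have fewer close neighbors than the interior ones, so their rigidity is lower and their flexibility index is higher, which is the qualitative behavior the method relies on. No matrix diagonalization is involved; the only fitted quantities are the two regression constants.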
CHAPTER II. Background and Introduction

2.1 Experimental methods for structural flexibility

The field of structural biology has seen rapid growth in the last few decades. Since the first protein structures were solved in the late 1950s, the Protein Data Bank has grown to include over one hundred thousand macromolecular structures ranging in size from small peptides to large viral capsids. These experiments have shown that proteins exhibit a diverse range of structure and function and that these two aspects are closely related. In fact, it is often possible to predict a protein's function from its structure alone, especially when a homologous protein is available for comparison. Much of the focus to date has been on the more static regions of proteins for theoretical and practical reasons. However, it is important to note that even well-folded proteins experience everlasting fluctuations due to constant outside forces, which drive intrinsic motions such as atomic vibrations and conformational shifts. The movements that can arise are determined by a protein's structure. This means flexibility, or the ability to deform from the current conformation under external forces, is an intrinsic property of all proteins, and is closely tied to function. For instance, protein flexibility can enhance protein-protein and protein-ligand interactions by intermittently offering more favorable binding surfaces through small secondary structure and side chain fluctuations. Additionally, protein flexibility and motion amplify the probability of barrier crossing in enzymatic reactions. Therefore, the investigation of protein flexibility at multiple scales is vital to the understanding and prediction of protein functions. In fact, even the study of some completely disordered proteins is essential due to their connections to neurodegenerative diseases such as mad cow disease, Alzheimer's disease and Parkinson's disease. [15, 67] Therefore, in order to better study protein function in ordered or disordered proteins, we require accurate, efficient, multiscale tools for evaluating flexibility.

Currently, the most important experimental techniques for protein flexibility analysis are X-ray crystallography and Nuclear Magnetic Resonance (NMR). Among the over one hundred thousand structures in the Protein Data Bank (PDB), more than eighty percent were collected by X-ray crystallography. The Debye-Waller factor, or B-factor, is an experimental measure of disorder that can be directly computed from X-ray diffraction data. These B-factors have been observed to correlate with atomic flexibility from MD and NMA experiments, thereby making them an ideal experimental measure of flexibility for comparison with theoretical methods. However, it is important to remember that this is not a perfect correlation because B-factors can be affected by multiple factors including variations in atomic diffraction cross sections and chemical stability during the diffraction data collection. Therefore, only the B-factors for specific types of atoms, most often Cα atoms, can be directly interpreted as their relative flexibility without corrections.

The other major experimental method for accessing protein flexibility is NMR, which often provides structural flexibility information under physiological conditions, unlike X-ray crystallography, which requires specific conditions to form suitable crystals. NMR spectroscopy allows the characterization of protein flexibility in diverse spatial dimensions and a large range of time scales. About seven percent of the structures in the PDB are determined by NMR spectroscopy; however, it is unclear how to assign flexibility values to atoms based on NMR spectroscopy data. Therefore we are currently focused on comparing theoretical results to X-ray crystallography results only.

2.2 Computational methods for flexibility and dynamics

The experimental techniques mentioned in the previous section are incredibly powerful for studying protein structure and function; however, they do face some limitations due to technical challenges. For example, some proteins may be extremely difficult or impossible to crystallize, and others that do crystallize may do so in forms not relevant to their function. To address the cases where experimental techniques fail and to increase the efficiency of analysis we turn to theoretical approaches. There have been many distinct methods for flexibility and motion analysis proposed over the past few decades. The major examples, and those that can be in some way compared to FRI, are molecular dynamics (MD), NMA, machine-learning models, and multiscale, multiphysics simulations.

Molecular dynamics simulations are at the forefront of computational biochemistry and have contributed significantly to our understanding of the conformational landscapes of proteins, especially conformations that are not directly accessible via experimental techniques due to various technical or practical challenges. These simulations enable us to study proteins that are difficult to study experimentally such as amyloid fibrils, intrinsically disordered proteins, and partially disordered proteins. However, the dynamics of larger macromolecules and systems including multiple molecules typically occur at time scales that are intractable for MD simulations.

A major breakthrough with respect to the scale of protein simulations came with the introduction of normal mode analysis (NMA), [8, 29, 44, 64] a time-independent molecular mechanics method that is related to MD via the time-harmonic approximation. [52] The success of the initial NMA method led to the development of related methods that improve computational running time or add utility. The most notable examples of NMA-related methods are the elastic network model (ENM), [66] GNM, [4, 5, 27] and anisotropic network model (ANM). [3] The ENM and GNM methods use coarse-grained representations of macromolecules to speed up computation with only a minor loss in accuracy, and the ANM method provides motion predictions. All normal modes-related methods can be used to approximate protein flexibility or B-factors from the first few eigenvectors and eigenvalues of the interaction Hamiltonian in normal modes or the Kirchhoff matrix in ENM and GNM. These quantitative predictions of biomolecular flexibility and their applications are discussed in many review papers. [16, 46, 61, 77]

The lowest-energy eigenvalues from these calculations reflect the protein dynamics through even the longest relevant timescales, something which is typically beyond the reach of MD simulations. [5, 8, 44, 64, 66] The normal modes approaches have been improved in many aspects since their introduction, including crystal periodicity corrections [32, 40, 41, 62] and the introduction of the density-cluster rotational-translational blocking [18] to speed up calculations. Still, the computational complexity of these methods is dominated by a computationally inefficient diagonalization of the large $N \times N$ matrices used in normal modes. Due to the diagonalization step, these methods have running times that scale as $O(N^k)$, where $N$ is the matrix dimension and $k \approx 3$. So while normal modes calculations are typically more efficient for calculating long-time dynamics of proteins than MD, the method is not suitable for excessively large macromolecules and macromolecular complexes, e.g. systems with millions of amino acid residues such as those obtained from cryo-EM experiments or theoretical constructs.

An even more recent set of tools for flexibility analysis is that of knowledge-based, machine-learning methods. This category includes examples of flexibility prediction by neural networks, [55] support vector regression [81] and two-stage support vector regression. [51] These approaches typically utilize large protein data sets as training data. Therefore, the validity, accuracy and efficiency of these methods are dependent on the quality and representativeness of the training data set, qualities that are difficult to prove in the current state of structural biology. Yet another modern approach utilizes graph theory to analyze the bond networks in proteins, [36] employing both geometric and energetic criteria to identify the flexible and rigid regions. Unfortunately, this method relies on normal mode analysis and other costly algorithms, which limit it to the same scales as the normal mode tools.

Most recently there has been an increased interest in theoretical methods for flexibility analysis that are developed via multiscale formulations. Multiscale methods combine elastic mechanics and molecular mechanics to significantly reduce the degrees of freedom of large biomolecular systems. [10] For example, the classical theory of elasticity for DNA loops has been combined with the MD description of protein for protein-DNA interaction complexes. [70] Recently, the continuum elastic modeling of the Canham-Helfrich type of energy functional has been coupled with MD simulations to investigate the complex elastic behavior of Hepatitis B virus capsids. [57] Multiscale-based flexibility analysis has a wide range of technical variability. In the best scenario, multiscale methods can take advantage of each scale to achieve excellent modeling accuracy and computational efficiency. However, multiscale methods are typically technically demanding and computationally complex. A major issue in the field is how to go beyond the phenomenological domain and make these approaches quantitative and predictive. Reliable analysis and validation with experimental data are indispensable procedures. For these reasons, there is a need to further develop and validate innovative approaches for the flexibility analysis of biomolecular systems.

2.3 The Flexibility-Rigidity Index

This work aims to solve a major issue with the aforementioned methods, especially MD and NMA, which is the issue of poor scaling. In addition to providing improved scaling for predicting flexibility and long-time scale dynamics, it is vital to also match or improve upon the level of accuracy and utility offered by these other methods. To address these issues we propose an efficient, accurate method for protein B-factor prediction and flexibility analysis called the Flexibility-Rigidity Index or FRI.

The FRI method is based on some simplifying assumptions about macromolecules, including that protein dynamics are determined entirely by structure and that side chain effects can be ignored. The FRI algorithm is based on measurements of geometric compactness and topological connectivity of a protein structure at each residue. It is assumed that nearby atoms have stronger interactions and tend to confer stability to other nearby atoms in macromolecules and that this stabilizing effect decays with increasing distance. Physical interaction potentials are not directly used to represent the interactions in this method and are instead replaced by one or more monotonically decaying kernels that are parametrized empirically. In practice, this method gives rise to accurate predictions of protein flexibility or B-factors based on geometric compactness alone.

We noted after the publication of our earliest work on FRI [75] that the name "flexibility index" was proposed independently by von der Lieth et al. [71] and Jacobs et al. [36] for two different quantities to describe bond strengths. Both of these flexibility indices are distinct from the proposed FRI method. The FRI algorithm is solely structure-based and it does not reconstruct any protein interaction Hamiltonian. Only elementary arithmetic is needed in the FRI method for proteins. In particular, the FRI prediction of protein B-factors does not require a stringently minimized structure or a time-consuming matrix diagonalization or matrix decomposition step, nor does it involve any training procedure.

2.3.1 fast FRI and anisotropic FRI

Another objective of the present work is to introduce a fast FRI (fFRI) algorithm by using appropriate data structures because computational efficiency is critical for analyzing larger structures. The computational complexity of the proposed fFRI is of $O(N)$, compared to that of $O(N^2)$ for the original FRI algorithm and of $O(N^3)$ for the GNM, where $N$ is the number of atoms. We use the cell lists approach [2] to achieve this
reduction in the computational complexity with negligible loss in accuracy.

Another objective is to introduce anisotropic FRI (aFRI) algorithms for the motion analysis of biomolecules. Unlike ANM, [3, 52] which is completely global and has a $3N \times 3N$ Hessian matrix, the proposed aFRI algorithms utilize adaptive Hessian matrices, which vary from completely global to completely local. Even in the most local formulation of aFRI, there are collective motions predicted by three sets of eigenvectors. These three modes of motion turn out to correspond with the lowest energy, most dominant modes of global aFRI and ANM in most test cases. We demonstrate this utility of aFRI on multiple proteins and protein-nucleic acid systems.

It was noticed early in the development of FRI that there are a small number of structures for which FRI performs very poorly in flexibility prediction. Furthermore, those structures which cause problems for FRI are likely to be difficult for NMA and GNM as well. One such structure is pictured in Figure 2.1, where the GNM method fails to predict the high flexibility of a hinge region in calmodulin. There are a number of possible reasons for this and similar failures, which are highlighted in this work. The crystal environment, solvent type, co-factors, data collection conditions, and structural refinement procedures are all well-known effects [32, 40, 41, 62] that can interfere with flexibility estimations from X-ray experiments. However, there is one more important cause that has not been discussed in the literature to the best of our knowledge, namely, multiple characteristic length scales in a single protein structure. Contrary to very small molecules, macromolecules have a wide variety of characteristic length scales, from the small scale of intramolecular bonding to the large scale observed in protein-nucleic acid and protein-protein interactions. Therefore, it is reasonable to regard large proteins as multiscale molecules. When a GNM or FRI algorithm is parametrized at a single given cutoff or scale parameter, it captures only a subset of the characteristic length scales and inevitably misses other characteristic length scales of the protein. Consequently, neither method is able to provide accurate B-factor predictions for all macromolecules using a single characteristic length scale.

Therefore one of the objectives of the present work is to introduce a multiscale strategy for protein flexibility analysis. The essential idea is to assess protein topological connectivity and packing compactness at multiple scales by combining multiple FRI kernels or correlation functions. As a result, multiscale FRI (mFRI) is able to simultaneously capture the crucial characteristic length scales of a protein and provide improved B-factor predictions.

Figure 2.1: The structure of calmodulin (PDB ID: 1CLL) visualized in VMD [34] and colored by experimental B-factors (top left) and GNM predicted B-factors (top right) with red representing the most flexible regions. Bottom, a comparison of predicted B-factor values from mFRI, GNM with a cutoff distance of 7 Å, and experimental B-factors taken from the PDB entry.

2.3.2 FRI for Protein-Nucleic Acid Complexes

In addition to proteins, nucleic acids are among the most essential biomolecules for all known forms of life. Nucleic acids often function in association with proteins and play a crucial role in encoding, transmitting and expressing genetic information. Therefore it was necessary to develop FRI methods for nucleic acid chains and protein-nucleic acid complexes. Proteins and nucleic acid chains are dramatically different biomolecules, and amino acid residues and nucleotides have different length scales and interaction characteristics. Therefore, a good model should not only allow residues and/or nucleotides to be treated with different length scales, but also adopt a multiscale description of each residue and/or nucleotide. Unlike elastic network models that are parametrized in only one length scale for each particle, mFRI provides a simultaneous multiscale description. Therefore, the present mFRI is able to better capture multiscale collective interactions of protein-nucleic acid complexes. Additionally, many protein-nucleic acid complexes are very large biomolecules and therefore require considerable computational resources to analyze by conventional mode decomposition-based methods. The $O(N)$ scaling FRI methods provide a more efficient approach to the flexibility analysis of large protein-nucleic acid complexes.

2.3.3 gGNM, mGNM and mANM

Inspired by the improvements that multiple correlation kernels bring to the FRI method, we propose a method to incorporate multiscale correlations into the aforementioned mode decomposition-based methods, GNM and ANM. Our approach to address the link between FRI and normal modes methods is twofold. First, we propose a framework to construct a generalized GNM (gGNM). We reveal that the GNM Kirchhoff matrix can be constructed from the ideal low-pass filter (ILF), which is the limiting case of admissible FRI correlation functions. We demonstrate that FRI and gGNM are asymptotically equivalent when the cutoff value in the Kirchhoff matrix or the scale value in the correlation function is sufficiently large. This paves the way for understanding the connection between the GNM and FRI methods. To clarify this connection, we introduce a generalized Kirchhoff matrix to provide a starting point for the gGNM and FRI methods, which elucidates the similarity and difference between gGNM and FRI. Based on this new understanding of the gGNM working principle, we propose many correlation function-based gGNMs. We show that gGNM outperforms the original GNM for the B-factor prediction of a set of 364 proteins. Both gGNM and FRI deliver almost identical results when the scale parameter is sufficiently large. This approach sheds light on the construction of efficient gGNMs.

Additionally, we propose two new methods, multiscale GNM (mGNM) and multiscale ANM (mANM), to account for the multiscale features of biomolecules. The aim is to generalize the original GNM and ANM into a multikernel setting so that each kernel can be parametrized at a given characteristic length. This generalization is achieved through the use of a FRI assessment, which predicts the involvement of different scales, followed by an appropriate construction of multikernel GNM or multikernel ANM. This approach works because for a diagonally dominant matrix, the direct inverse of the diagonal element is essentially equivalent to the diagonal element of the inverse matrix. In this work we demonstrate by comparison with experimental data that the proposed mGNM and mANM are able to successfully capture the multiscale properties of protein structures and significantly improve the accuracy of these methods in protein flexibility prediction.

2.3.4 Machine learning and FRI for protein-protein interactions

Support Vector Machine (SVM) is a type of supervised machine learning that has grown in popularity recently due to successful applications across many different fields. Some examples of successful applications of SVM models include drug design, [37] image recognition and text classification, [12, 20, 38] microarray gene expression data analysis, [9, 11, 28, 30, 48, 53, 80] protein fold recognition, [14, 19, 43] protein-protein interaction [7] and protein secondary structure prediction. [33] The basic idea of applying an SVM in this context is to map a set of inputs into a feature space in which the input data become more separable than in the original input space, then construct a maximum-margin hyperplane which separates two classes within the feature space. In this case the two classes being separated are hot spot residues and non-hot spot residues. To test the viability of FRI-derived features for predicting protein-protein interactions, we chose to incorporate FRI-derived metrics into the KFC2 model, [83] an SVM model that uses residue-scale features to predict protein-protein binding hot spots.

CHAPTER III. Methods

3.1 Flexibility-rigidity index (FRI)

We initially consider only proteins as examples to illustrate the FRI algorithm, although other biomolecules, such as DNA and RNA, can be accommodated with a minor modification of the algorithm. We are particularly interested in a coarse-grained representation. However, methods for a full atom description can be formulated as well. We seek a structure-based algorithm to convert protein geometry into protein topology. To this end, we consider a protein with $N$ Cα atoms. Their locations are represented by $\{\mathbf{r}_j \mid \mathbf{r}_j \in \mathbb{R}^3,\ j = 1, 2, \ldots, N\}$. We denote by $\|\mathbf{r}_i - \mathbf{r}_j\|$ the Euclidean distance between the $i$th Cα atom and the $j$th Cα atom. The distance geometry of protein Cα atoms is utilized to establish the topological connectivity by using monotonically decreasing radial basis functions,

$$C_{ij} = \Phi(\|\mathbf{r}_i - \mathbf{r}_j\|; \eta_{ij}), \tag{3.1}$$

where $\eta_{ij}$ is a characteristic distance between particles, and $\Phi(\|\mathbf{r}_i - \mathbf{r}_j\|; \eta_{ij})$ is a correlation function, which is, in general, a real-valued monotonically decreasing function. As a correlation function, it satisfies

$$\Phi(\|\mathbf{r}_i - \mathbf{r}_i\|; \eta_{ii}) = 1 \tag{3.2}$$

$$\Phi(\|\mathbf{r}_i - \mathbf{r}_j\|; \eta_{ij}) = 0 \quad \text{as } \|\mathbf{r}_i - \mathbf{r}_j\| \to \infty. \tag{3.3}$$

Delta sequences of the positive type discussed in an earlier work [73] are all good choices. For example, one can use generalized exponential functions

$$\Phi(\|\mathbf{r}_i - \mathbf{r}_j\|; \eta_{ij}) = e^{-\left(\|\mathbf{r}_i - \mathbf{r}_j\| / \eta_{ij}\right)^{\kappa}}, \quad \kappa > 0 \tag{3.4}$$

and generalized Lorentz functions

$$\Phi(\|\mathbf{r}_i - \mathbf{r}_j\|; \eta_{ij}) = \frac{1}{1 + \left(\|\mathbf{r}_i - \mathbf{r}_j\| / \eta_{ij}\right)^{\upsilon}}, \quad \upsilon > 0. \tag{3.5}$$

Essentially, the correlation between any two particles should decay according to their distance. Therefore, many other alternatives can be used, and so during validation multiple functions are tested.

The correlation map or cross correlation is an important quantity for the GNM. We can define a similar correlation map by setting $C = \{C_{ij}\},\ i, j = 1, 2, \ldots, N$. The correlation map measures the connectivity of Cα atoms in the protein. We define an atomic rigidity index $\mu_i$ as the summation of topological connectivity

$$\mu_i = \sum_{j=1}^{N} w_{ij}\, \Phi(\|\mathbf{r}_i - \mathbf{r}_j\|; \eta_{ij}), \quad \forall i = 1, 2, \ldots, N, \tag{3.6}$$

where $w_{ij}$ is a weight function related to the atomic type. The atomic rigidity index $\mu_i$ manifests the rigidity or stiffness at the $i$th atom. In a general sense, the atomic rigidity index reflects the total interaction strength, including both bonded and non-bonded contributions. It is quite straightforward to define the averaged molecular rigidity index as a summation of atomic rigidity indices

$$\bar{\mu}_{\mathrm{MRI}} = \frac{1}{N} \sum_{i=1}^{N} \mu_i. \tag{3.7}$$

The averaged molecular rigidity index can be used to predict molecular thermal stability, bulk modulus, density (compactness), boiling points of isomers, the ratio of surface area over volume, surface tension, etc. A detailed investigation of these aspects is beyond the scope of the present work.

We are now ready to define a position dependent shear modulus

$$\mu(\mathbf{r}) = \sum_{j=1}^{N} w_j(\mathbf{r})\, \Phi(\|\mathbf{r} - \mathbf{r}_j\|; \eta_{ij}), \quad \mathbf{r} \in E, \tag{3.8}$$

where $w_j(\mathbf{r})$ is a weight function, $\mathbf{r}$ is in the proximity of $\mathbf{r}_i$ and $E$ is the macromolecular domain. In order to determine $w_j(\mathbf{r})$, we define an average rigidity (or averaged rigidity index function) by

$$\bar{\mu} = \frac{1}{V} \int \mu(\mathbf{r})\, d\mathbf{r}, \tag{3.9}$$

where $V$ is the volume of the macromolecule. If $w_j(\mathbf{r})$ is a constant, its value can be uniquely determined by a comparison of $\bar{\mu}$ with the experimental shear modulus [58] for a given macromolecule and correlation function.

We also define an atomic flexibility index as

$$f_i = \frac{1}{\mu_i}, \quad \forall i = 1, 2, \ldots, N. \tag{3.10}$$

Since the flexibility at each atom is proportional to its temperature fluctuation, we can express B-factors as

$$B_i^{t} = a f_i + b, \quad \forall i = 1, 2, \ldots, N \tag{3.11}$$

where $\{B_i^{t}\}$ are theoretically predicted B-factors, and $a$ and $b$ are two constants to be determined by a simple linear regression. We can also define the averaged molecular flexibility index (MFI) as a summation of atomic flexibility indices

$$\bar{f}_{\mathrm{MFI}} = \frac{1}{N} \sum_{i=1}^{N} f_i. \tag{3.12}$$

MFI should correlate with molecular stability and energy.

For the purpose of visualization, we define a continuous atomic flexibility function as

$$F(\mathbf{r}) = \sum_{j=1}^{N} B_j^{t}\, \Phi(\|\mathbf{r} - \mathbf{r}_j\|), \quad \mathbf{r} \in E, \tag{3.13}$$

where $\Phi(\|\mathbf{r} - \mathbf{r}_j\|)$ is a general interpolation function for scattered data. Wavelets, spline functions, and modified Shepard's method [56, 65] can be employed for the interpolation. One can map $F(\mathbf{r})$ to the molecular surface to visualize the protein flexibility. [75] Alternatively, one can compute the continuous atomic flexibility function by

$$F(\mathbf{r}) = \frac{1}{\sum_{j=1}^{N} w_j(\mathbf{r})\, \Phi(\|\mathbf{r} - \mathbf{r}_j\|; \eta_{ij})}, \quad \mathbf{r} \in E. \tag{3.14}$$

Similarly, we can also construct a continuous multiscale flexibility function,

$$f(\mathbf{r}) = b + \sum_{n=1} \frac{a^n}{\sum_{j=1}^{N} w_j^n\, \Phi(\|\mathbf{r} - \mathbf{r}_j\|; \eta^n)}. \tag{3.15}$$

One can map this continuous multiscale flexibility function onto a molecular surface to analyze the flexibility of the molecule.

3.2 FRI correlation maps or matrices

Similar to the cross correlations of GNM and other methods, FRI correlation maps computed using Eq. (3.1) qualitatively reflect the three-dimensional structure of a protein. As a consequence, distinct secondary structures such as α-helices and β-sheets exhibit characteristic patterns. After some study of the patterns it is possible to approximate a protein's secondary and tertiary structure from the patterns of the correlation map alone. However, unlike the cross correlations of GNM, FRI correlation maps carry more quantitative structural information. In fact, since the kernel used to generate the map is known, the distances between all atoms can be calculated and the three-dimensional structure can be reconstructed from the correlation map. Figure 3.1 displays four examples of correlation maps next to their corresponding three-dimensional structures. The scale bars of the correlation maps include distance values to emphasize the preservation of the 3D structural information.

Figure 3.1: Correlation maps and secondary structure representations for four protein structures. Structures used include the alpha-spectrin SH3 domain, the tetramerization domain of the p53 tumor suppressor, the B1 immunoglobulin-binding domain of streptococcal protein G and a DNA binding protein from Methanococcus jannaschii, from left to right, top to bottom. Correlation maps are generated using Eq. (3.5) with $\upsilon = 2.5$ and $\eta = 1.0$ Å. Secondary structure visualizations are generated with VMD. [34] Colors represent distance and correlation values for each pair of atoms. The residue numbers for each Cα are listed along the x- and y-axes. The proteins are displayed in VMD's "new cartoon" representation and colored by secondary structure determined by STRIDE. The color scheme for secondary structure is: purple - α-helix, blue - 3(10) helix, yellow - β-sheet, cyan - turn, white - coil.

As stated previously, each secondary structure exhibits a distinct pattern in the correlation maps. The pattern for an α-helix is shown in the first row of Fig. 3.1. The helix creates a band of high correlation extending about 4 amino acids in either direction from the diagonal. The correlation has a local maximum at the third neighbor residue, due to the structure of the helix (3.6 amino acid residues per turn). Therefore, the peak at the third residue serves as another signature of an α-helix in the FRI correlation map. An increase in correlation between two such neighboring atoms compared to other neighboring pairs indicates the interaction of the helix and another component. For example, in the third row of Fig. 3.1, the correlation strength between the 29th Cα and the 32nd Cα is higher, due to the interaction of the 29th Cα with the third and fourth β-sheets. This is an example of how this type of correlation kernel reflects tertiary structure information.

Other folds such as β-sheets are also easily identified by distinct patterns. One can easily distinguish parallel β-sheets from anti-parallel β-sheets by their patterns with this method. The second row of Fig. 3.1 is a good example of the pattern generated by anti-parallel β-sheets. Anti-parallel β-sheets appear as lines that are perpendicular to the diagonal of the map, and the intersection of the two lines of high correlation marks the turn between the strands. Parallel β-sheets appear as lines parallel to the diagonal. In the third row of Fig. 3.1, an anti-parallel β-sheet is formed by the first and last ten amino acids, resulting in a line in the top left and bottom right of the correlation matrix.

The last two rows of Fig. 3.1 both display complex patterns which reflect not only secondary structure information but also the three-dimensional arrangement of the secondary structure features. Clearly from the last correlation map, the first β-sheet interacts strongly with the first helix and the second β-sheet in a parallel manner. It also interacts to a lesser degree with the second helix and with the last β-sheet in an anti-parallel manner. These patterns and the stabilizing forces from the interactions they represent are lost if one uses a contact or Kirchhoff matrix based method instead of a monotonically decreasing radial basis function based correlation map.

3.3 Fast flexibility-rigidity index (fFRI)

As discussed in our earlier work, [75] the original FRI algorithm has the computational complexity of $O(N^2)$, mainly due to the construction of the correlation matrix. In the present work, we propose a fast FRI (fFRI) algorithm, which computes only the significant elements of the correlation matrix and at the same time maintains the accuracy of the method. As a result, the computational complexity of the fFRI algorithm is of $O(N)$. The essential idea is to partition the residues in a protein into cubic boxes according to their spatial locations. For each residue in a given box, we only compute its correlation matrix elements with all residues within the given box and with all residues in the adjacent 26 boxes. The accuracy and efficiency of this approach are determined by the box dimension. We select a box size of $R$ such that

$$\Phi(R; \eta) \le \varepsilon \tag{3.16}$$

where $\varepsilon > 0$ is a given truncation error. Therefore, for generalized exponential functions (3.4), we have

$$\frac{R}{\eta} \ge \left(\ln \frac{1}{\varepsilon}\right)^{1/\kappa}. \tag{3.17}$$

If we set $\varepsilon = 10^{-2}$, we have $R \approx 4.6\eta$ for $\kappa = 1$ and $R \approx 2.15\eta$ for $\kappa = 2$. Note that different $\kappa$ values have different optimal $\eta$ values. The higher the $\kappa$ value is, the larger the optimal $\eta$ is. Similarly, for generalized Lorentz functions (3.5), we choose the box size

$$\frac{R}{\eta} \ge \left(\frac{1 - \varepsilon}{\varepsilon}\right)^{1/\upsilon}. \tag{3.18}$$

Again, if we set $\varepsilon = 10^{-2}$, we have $R \approx 10\eta$ for $\upsilon = 2$ and $R \approx 4.6\eta$ for $\upsilon = 3$. An optimal $R$ should balance accuracy and efficiency. In Section 4.2, it is found that the selection of $R = 12$ Å is near optimal for both exponential and Lorentz functions. In Algorithm 1, we present a pseudocode to illustrate the truncation algorithm of the fFRI.

Algorithm 1: fFRI algorithm

    Input: atoms(N)                                      ▷ XYZ coordinates from PDB
    mincoor ← minval(atoms)                              ▷ Compute dimensions of bounding box
    maxcoor ← maxval(atoms)
    R ← boxsize                                          ▷ Set size of grid
    Nbox ← ceiling((maxcoor − mincoor)/R)                ▷ Compute number of boxes in each direction
    for ii ← 1, Natoms do                                ▷ Count the number of atoms in each box
        i, j, k ← ceiling((atoms(ii) − mincoor)/R)
        Natoms(i, j, k) ← Natoms(i, j, k) + 1
    end for
    for k ← 1, Nbox[3] do
        for j ← 1, Nbox[2] do
            for i ← 1, Nbox[1] do
                allocate(box(i, j, k))                   ▷ Allocate space for each box
            end for
        end for
    end for
    for ii ← 1, Natoms do                                ▷ Copy coordinates to appropriate box based on 3D coordinates
        i, j, k ← ceiling((atoms(ii) − mincoor)/R)
        box(i, j, k) ← atoms(ii)
    end for
    for k ← 1, Nbox[3] do                                ▷ Iterate over boxes
        for j ← 1, Nbox[2] do
            for i ← 1, Nbox[1] do
                for na ← 1, Natoms(i, j, k) do           ▷ Iterate over atoms in current box
                    for n ← k − 1, k + 1 do              ▷ Iterate over adjacent boxes
                        for m ← j − 1, j + 1 do
                            for l ← i − 1, i + 1 do
                                for nb ← 1, Natoms(l, m, n) do   ▷ Iterate over atoms in adjacent boxes
                                    dist ← distance(box(i, j, k)(na), box(l, m, n)(nb))
                                    FRI(na) ← FRI(na) + kernel(dist)   ▷ Accumulate the rigidity sum of Eq. (3.6)
                                end for
                            end for
                        end for
                    end for
                end for
            end for
        end for
    end for

3.4 Multiscale flexibility-rigidity index (mFRI)

The basic idea of multiscale FRI (mFRI) is quite simple. Since macromolecules are inherently multiscale in nature, we utilize multiple correlation kernels that are parameterized at multiple scales to characterize the multiscale flexibility of macromolecules:

$$f_i^n = \frac{1}{\sum_{j=1}^{N} w_j^n\, \Phi^n(\|\mathbf{r}_i - \mathbf{r}_j\|; \eta_j^n)}, \tag{3.19}$$

where $w_j^n$, $\Phi^n(\|\mathbf{r}_i - \mathbf{r}_j\|; \eta_j^n)$ and $\eta_j^n$ are the corresponding quantities associated with the $n$th kernel. We seek the minimization of the form

$$\min_{a^n,\, b} \left\{ \sum_i \left| \sum_n a^n f_i^n + b - B_i^{e} \right|^2 \right\} \tag{3.20}$$

where $\{B_i^{e}\}$ are the experimental B-factors. In principle, all parameters can be optimized. For simplicity and computational efficiency, we only determine $\{a^n\}$ and $b$ in the above minimization process. For each kernel $\Phi^n$, $w_j^n$ and $\eta_j^n$ will be selected according to the type of particles.

Specifically, for a simple Cα network, we can set $w_j^n = 1$ and choose a single kernel function parametrized at different scales. The predicted B-factors can be expressed as

$$B_i^{\mathrm{mFRI}} = b + \sum_{n=1} \frac{a^n}{\sum_{j=1}^{N} \Phi(\|\mathbf{r}_i - \mathbf{r}_j\|; \eta^n)}. \tag{3.21}$$

The difference between Eqs. (3.19) and (3.21) is that, in Eq. (3.19), both the kernel and the scale can be changed for different $n$. In contrast, in Eq. (3.21), only the scale is changed. One can use a given kernel, such as

$$\Phi(\|\mathbf{r} - \mathbf{r}_j\|; \eta^n) = \frac{1}{1 + \left(\|\mathbf{r} - \mathbf{r}_j\| / \eta^n\right)^{3}}, \tag{3.22}$$

to achieve good multiscale predictions.

3.5 Anisotropic flexibility-rigidity index (aFRI)

In this section, we propose a new anisotropic model based on the FRI method. In existing anisotropic methods, the Hessian matrix is always global, so the matrix contains all the $3N \times 3N$ elements for the $N$ particles in a molecule. In the aFRI model, the Hessian matrix is inherently local and adaptive. Its size may vary from $3 \times 3$ for a completely local aFRI to $3N \times 3N$ for a completely global aFRI, depending on the needs of the physical problem or the available computational resources. More local Hessian matrices are smaller and can be solved much faster due to the poor scaling of matrix solving algorithms.

To build the adaptive matrices of aFRI, we partition all the $N$ particles in a molecule into a total of $M$ clusters $\{c_1, c_2, \ldots, c_k, \ldots, c_M\}$. Cluster $c_k$ has $N_k$ particles or atoms so that $N = \sum_{k=1}^{M} N_k$. A cluster may be a region of physical interest in a molecule such as an alpha helix, a domain, or a binding site of a protein. One of the two extreme cases is that there is only one particle in each cluster. In that case there are $N$ clusters. The other case is that there is only one cluster containing the entire molecule. The result is a Hessian matrix for a cluster of any size that can be solved individually and retains some information about other cluster properties in the values of the diagonal. For example, if we are interested in the thermal fluctuation of a particular cluster $c_k$ with $N_k$ particles or atoms, we can find $3N_k$ eigenvectors for the cluster. Let us keep in mind that each position vector in $\mathbb{R}^3$ has three components, $\mathbf{r} = (x, y, z)$. We denote

$$\Phi^{ij}_{uv} = \frac{\partial}{\partial u_i} \frac{\partial}{\partial v_j} \Phi(\|\mathbf{r}_i - \mathbf{r}_j\|; \eta_{ij}), \quad u, v = x, y, z;\ i, j = 1, 2, \ldots, N. \tag{3.23}$$

Note that for each given $ij$, we define $\Phi^{ij} = (\Phi^{ij}_{uv})$ as a local anisotropic matrix

$$\Phi^{ij} = \begin{pmatrix} \Phi^{ij}_{xx} & \Phi^{ij}_{xy} & \Phi^{ij}_{xz} \\ \Phi^{ij}_{yx} & \Phi^{ij}_{yy} & \Phi^{ij}_{yz} \\ \Phi^{ij}_{zx} & \Phi^{ij}_{zy} & \Phi^{ij}_{zz} \end{pmatrix}. \tag{3.24}$$

Since rigidity and flexibility can both be anisotropic, it is natural to propose two different aFRI algorithms based on a rigidity Hessian matrix and a flexibility Hessian matrix.

3.5.1 Anisotropic rigidity

Anisotropic rigidity is defined by a rigidity Hessian matrix for an arbitrary cluster $c_k$. Let us denote by $\left(\mu^{ij}_{uv}(c_k)\right)$ a rigidity Hessian matrix for cluster $c_k$. Its elements are chosen as

$$\mu^{ij}_{uv}(c_k) = -w_{ij}\, \Phi^{ij}_{uv}, \quad i, j \in c_k;\ i \ne j;\ u, v = x, y, z \tag{3.25}$$

$$\mu^{ii}_{uv}(c_k) = \sum_{j=1}^{N} w_{ij}\, \Phi^{ij}_{uv}, \quad i \in c_k;\ u, v = x, y, z \tag{3.26}$$

$$\mu^{ij}_{uv}(c_k) = 0, \quad i, j \notin c_k;\ u, v = x, y, z. \tag{3.27}$$

The Hessian matrix $\left(\mu^{ij}_{uv}(c_k)\right)$ is of $3N_k \times 3N_k$ dimensions. Note that the diagonal part, $\mu^{ii}_{uv}(c_k)$, has built-in information from all the particles in the system, even if the cluster is completely localized, $N_k = 1, \forall k$.

A test of the anisotropic rigidity method is to check if it works for B-factor prediction. To predict B-factors with anisotropic rigidity we collect the diagonal terms of the rigidity Hessian matrix

$$\mu^{i}_{\mathrm{diag}} = \operatorname{Tr}\left(\mu^{i}_{uv}\right) \tag{3.28}$$

$$\phantom{\mu^{i}_{\mathrm{diag}}} = \sum_{j=1}^{N} w_{ij} \left[ \Phi^{ij}_{xx} + \Phi^{ij}_{yy} + \Phi^{ij}_{zz} \right]. \tag{3.29}$$

We then define a set of anisotropic rigidity (AR) based flexibility indices by

$$f_i^{\mathrm{AR}} = \frac{1}{\mu^{i}_{\mathrm{diag}}}. \tag{3.30}$$

B-factors can be predicted with a set of $\{f_i^{\mathrm{AR}}\}$ by using the linear regression in Eq. (3.11).

3.5.2 Anisotropic flexibility

To analyze biomolecular anisotropic motions in parallel to ANM, we need to examine their anisotropic flexibility. To this end, we further define a flexibility Hessian matrix $F(c_k)$ for cluster $c_k$ as

$$F^{ij}(c_k) = -\frac{1}{w_{ij}} \left(\Phi^{ij}\right)^{-1}, \quad i, j \in c_k;\ i \ne j;\ u, v = x, y, z \tag{3.31}$$

$$F^{ii}(c_k) = \sum_{j=1}^{N} \frac{1}{w_{ij}} \left(\Phi^{ij}\right)^{-1}, \quad i \in c_k;\ u, v = x, y, z \tag{3.32}$$

$$F^{ij}(c_k) = 0, \quad i, j \notin c_k;\ u, v = x, y, z, \tag{3.33}$$

where $\left(\Phi^{ij}\right)^{-1}$ denotes the unscaled inverse of matrix $\Phi^{ij}$ such that $\Phi^{ij} \left(\Phi^{ij}\right)^{-1} = |\Phi^{ij}|$. Similar to anisotropic rigidity, the diagonal part $F^{ii}(c_k)$ has built-in information from all particles in the system. Therefore, even if the partition of clusters is com
pletelylocalized(Nclusters),correlationamongatomicmotionsisretained.BydiagonalizingF(ck),weobtain3Nkeigen-vectorsfortheNkparticlesinclusterck.Sincetheselectionofckisarbitrary,eigenvectorsofallotherclusterscanbeattainedusingthesameprocedure.ToobtaintheB-factorpredictionfromthisanisotropicy,wedeasetofanisotropicy(AF)basedyindicesbyfAFi=Tr(F(ck))ii;(3.34)=(F(ck))iixx+(F(ck))iiyy+(F(ck))iizz:(3.35)ThenEq.(3.11)isemployedtoobtainB-factorpredictions.Inthiswork,weonlyconsiderthecoarse-grainedmodelinwhicheachresidueisrepre-sentedbyitsC.Tofurthersimplythemodel,thebetweenresiduesareignored.Theparameterwijisassumedtobe1andijissettoaconstant.3.6GeneralizedGaussiannetworkmodels(gGNMs)Toestablishnotationandfacilitatenewdevelopment,letuspresentabriefreviewoftheGNMandFRImethods.ConsideranN-particlecoarse-grainedrepresentationofabiomolecule.Wedenotefrijri2R3;i=1;2;;Ngthecoordinatesoftheseparticlesandrij=krirjktheEuclideanspacedistancebetweenithandjthparticles.Inanutshell,the28GNMpredictionoftheithB-factorofthebiomoleculecanbeexpressedas4,5BGNMi=a1ii;8i=1;2;;N;(3.36)whereaisaparameterthatcanberelatedtothethermalenergyand1)iiistheithdiagonalelementofthematrixinverseoftheKircmatrix,ij=8>>>>>>><>>>>>>>:1;i6=jandrijrc0;i6=jandrij>rcPNj;j6=iij;i=j;(3.37)wherercisadistance.TheGNMtheoryevaluatesthematrixinverseby1)ii=PNk=21kukuTkii,whereTisthetransposeandkandukarethektheigenvalueandeigenvectorofrespectively.Thesummationomitstheeignmodewhoseeigenvalueiszero.TheFRIpredictionoftheithB-factorofthebiomoleculecanbegivenby49,75BFRIi=a1PNj;j6=iwjrij;)+b;8i=1;2;;N;(3.38)whereaandbarengparameters,fi=1PNj;j6=iwjrij;)istheithilityindexandi=PNj;j6=iwjrij;)istheithrigidityindex.Here,wjisanatomicnumberdependedweightfunctionthatcanbesettowj=1foraCnetwork,andrij;)isareal-valuedmonotonicallydecreasingcorrelationfunctionsatisfyingthefollowingadmissibilitycondi-tionsrij;)=1asrij!0(3.39)rij;)=0asrij!1;(3.40)whereisascaleparameter.Deltasequencesofthepositivetype73aregoodchoices.Manyradialbasisfunctionsarealsoadm
issible.49,75CommonlyusedFRIcorrelationfunctions29Figure3.2:Illustrationofadmissiblecorrelationfunctions.(a)Correlationfunctionsap-proachtheILFas!1or˛!1at=7A.(b)ofvaryingscalevalue.Localcorrelationisobtainedwithlarge˛andsmallvalues.Whereas,nonlocalcorrelationisgeneratedbysmall˛andlargevalues.includethegeneralizedexponentialfunctionsrij;;)=e(rij);>0;(3.41)andgeneralizedLorentzfunctionsrij;;˛)=11+(rij)˛;˛>0:(3.42)AmajoradvantageoftheFRImethodisthatitdoesnotresorttomodedecompositionanditscomputationalcomplexitycanbereducedtoO(N)bymeansofthecelllistsalgorithmusedinfastFRI(fFRI).49Incontrast,themodedecompositionofNMAandGNMhasthecomputationalcomplexityofO(N3).TofurtherexplorethetheoreticalfoundationofGNM,weexaminetheparameterlimitsofthegeneralizedexponentialfunctions(3.4)andthegeneralizedLorentzfunctions(3.5).e(rij)!rij;rc)as!1(3.43)11+(rij)˛!rij;rc)as˛!1;(3.44)30whererc=andrij;rc)istheideallow-pass(ILF)usedintheGNMKircmatrixrij;rc)=8>>><>>>:1;rijrc0;rij>rc:(3.45)Relations(3.43)and(3.44)connectFRIcorrelationfunctionstotheGNMKircmatrix.ItisimportanttonotethattheILFusedinGNMisanadmissibleFRIcorrelationfunction.Mathematically,theILFisaspecialreal-valuedmonotonicallydecreasingcorrelationfunc-tionandalsoadmissibilityconditions(3.2)and(3.3).Infact,allFRIcorrelationfunctionsarelow-passaswell.Therefore,bothGNMandFRIadmitlow-passintheirconstructions.GNMisveryspecialinthesensethatthereisonlyoneILFusedeventhoughtherearemanyotherlow-passFigure3.2:illustratesthebehaviorandrelationshipoftheabovelow-passorcorrelationfunctions.ItisshownthatthegeneralizedexponentialfunctionandgeneralizedLorentzfunctionmaybelonger-rangingandtheformerdecaysfasterthanthelatterforagivenpower.Thecombinationofalowpowervalueandalargescalegivesrisetonon-localcorrelations.Earliertestsindicatethataparameterizationof˛=3and=3AforLorentzkernelFRIprovidesaccurateypredictionsonasetof364proteinsrelativetoGNM.49TodescribethemathematicalfoundationandrelationshipbetweentheGNMandFRImethods,weconsiderageneralizedKircmatrix76i
j=8>>><>>>:rij;);i6=jPNj;j6=iij;i=j;(3.46)whererij;)isanadmissibleFRIcorrelationfunction.ThegeneralizedKirchhomatrixincludestheKircmatrixasaspecialcase.ItisimportanttonotethateachdiagonalelementisanFRIrigidityindex:i=iiTherefore,thegeneralizedKircmatrix31providesastartingpointforboththeFRIandgGNMmethods.However,thebetweenthegGNMandFRImethodsisthattopredictB-factors,thegGNMrequiresthecalculationoftheinverseoftheKircmatrix(3.37),whereas,theFRItakesthedirectinverseofonlythediagonalelementsofthegeneralizedKircmatrix(3.46).3.7MultiscaleGaussiannetworkmodel(mGNM)TheideabehindmGNMistobuildamultiscaleKircmatrix,whichincorporatesvariousscalesinsteadofasingleone.DuetotheintrinsicrelationbetweenFRIandgGNMdiscussedinSection3.6,wemakeuseofthecoetsapproximatedfromtheFRIalgorithmtoconstructamultiscaleKircmatrix.Inthissection,wepresenttwotypesofalgorithmstoconstructanmGNMmethod.3.7.1Type-1mGNMFirst,weassumethatthemultiscaleKircmatrixtakestheform(3.47)=Xnann;whereanandn=ijn(rij;nj))arethecotandgeneralizedKircmatrixassociatedwiththenthkerneln(rij;n))parameterizedatanappropriatescalen.WeusethemFRImethodtoevaluatecotsfang.Basically,wehavemultiscalerigidityindexi=Pnannii.Then,fangaredeterminedviatheminimizationMinPi1iBei2,whichisequivalenttoMinan8<:XiXnannii1Bei29=;;(3.48)assumingthatBei>0.WiththemultiscaleKirchhomatrixgiveninEq.(3.47),wecarryoutroutineGNManalysisasdescribedinEq.(3.36).323.7.2Type-2mGNMAnotheralgorithmforconstructinganmGNMmethodmakesuseofcotsfrommFRIdirectlyviatherelationshipbetweenbiomolecularlocalpackingdensityandy.Basically,wechooseseveralkernelsparameterizedatvariousscalesandevaluatethebestcotsfangandb,withtheexperimentalB-factorsusingEquation(3.48).TheresultingmultiscaleyindexisthenusedtoconstructthegeneralizedKircmatrixasfollowsXnanfni+b=1ii;8i=1;2;;N:(3.49)Withtherelationfni=1ni;8i=1;2;;N,theaboveexpressioncanberewrittenas,ii=1Pnanni+b;8i=1;2;;N:(3.50)Usually,wecanusetwoorthreekernelsparameterizedattscales.Forinstance,ifweusetwokernels,wecanfurtherrewritetheabove
expression as

\[ \Gamma_{ii} = \frac{\mu_i^1 \mu_i^2}{a^1\mu_i^2 + a^2\mu_i^1 + b\,\mu_i^1\mu_i^2}, \quad \forall i = 1,2,\ldots,N. \tag{3.51} \]

Now the problem is to determine the non-diagonal terms of a multiscale Kirchhoff matrix. One simple approach is to subdivide either of the two rigidity indices. For example, we can choose to use the rigidity index for the first kernel. Since we have $\mu_i^n = \sum_{j, j \neq i}^{N} w_j^n \Phi^n(r_{ij};\eta^n)$, $n = 1, 2$, the diagonal term of the mGNM matrix can also be expressed as

\[ \Gamma_{ii} = \sum_{j, j \neq i} \frac{\{w_j^1 \Phi^1(r_{ij};\eta^1)\}\,\mu_i^2}{a^1\mu_i^2 + a^2\mu_i^1 + b\,\mu_i^1\mu_i^2}, \quad \forall i = 1,2,\ldots,N. \tag{3.52} \]

In this way, the full multiscale Kirchhoff matrix can be expressed as

\[ \Gamma_{ij} = \begin{cases} -\dfrac{\{w_j^1 \Phi^1(r_{ij};\eta^1)\}\,\mu_i^2}{a^1\mu_i^2 + a^2\mu_i^1 + b\,\mu_i^1\mu_i^2}, & i \neq j \\ -\sum_{j, j \neq i}^{N} \Gamma_{ij}, & i = j. \end{cases} \tag{3.53} \]

The problem with the matrix in Eq. (3.53) is that the resulting multiscale Kirchhoff matrix is not symmetric, which may lead to computational difficulty. To avoid a non-symmetric matrix, we propose an alternative construction that preserves the symmetry of the matrix. The alternative is to determine the diagonal terms $\Gamma_{ii}$ from Eq. (3.50) and then, on each row, equally distribute the diagonal term into the non-diagonal parts, under the condition that the resulting matrix remains symmetric. This is shown as an iterative scheme in Algorithm 2.

Algorithm 2: Type-2 mGNM multiscale Kirchhoff matrix
Input: $\Gamma_{ii}$, $i = 1, 2, \ldots, N$.  ▷ Diagonal terms are calculated from mFRI.
for $j \leftarrow 2, N$ do  ▷ For the first row and column of the multiscale Kirchhoff matrix.
    $\Gamma_{1j} = -\Gamma_{11}/(N-1)$  ▷ Equally distribute the diagonal term into the non-diagonal parts.
    $\Gamma_{j1} = \Gamma_{1j}$  ▷ Use the symmetry property.
end for
for $i \leftarrow 2, N-1$ do
    sum $= 0$
    for $k \leftarrow 1, i-1$ do
        $k_1 = k$
        $k_2 = i$
        sum $=$ sum $- \Gamma_{k_1 k_2}$  ▷ Sum over the terms of row $i$ already determined in previous iterations.
    end for
    for $j \leftarrow i+1, N$ do
        $\Gamma_{ij} = -(\Gamma_{ii} - \text{sum})/(N-i)$  ▷ Equally distribute the remaining diagonal term into the non-diagonal parts.
        $\Gamma_{ji} = \Gamma_{ij}$  ▷ Use the symmetry property.
    end for
end for

In the construction of the Type-2 mGNM, only the diagonal terms are determined using mFRI. In B-factor prediction, the non-diagonal values can be very flexible as long as they satisfy the network constraint that the summation of their values equals the diagonal term. We believe this is due to the fact that the success of mGNM in B-factor prediction is determined mostly by the packing information stored in the diagonal terms of its Kirchhoff matrix. In the following discussion, we only use the symmetric scheme in Algorithm 2 for the Type-2 mGNM.

3.8 Multiscale anisotropic network model (mANM)

In mANM, the generalized local $3 \times 3$ Hessian matrix $H^n_{ij}$ associated with the $n$th kernel can be written as

\[ H^n_{ij} = -\frac{\Phi^n(r_{ij};\eta^n)}{r_{ij}^2} \begin{bmatrix} (x_j-x_i)(x_j-x_i) & (x_j-x_i)(y_j-y_i) & (x_j-x_i)(z_j-z_i) \\ (y_j-y_i)(x_j-x_i) & (y_j-y_i)(y_j-y_i) & (y_j-y_i)(z_j-z_i) \\ (z_j-z_i)(x_j-x_i) & (z_j-z_i)(y_j-y_i) & (z_j-z_i)(z_j-z_i) \end{bmatrix}, \quad \forall i \neq j. \tag{3.54} \]

Note that Hinsen [31] has proposed a special case: $\Phi^n(r_{ij};\eta^n) = e^{-(r_{ij}/\eta^n)^2}$. We further take the diagonal parts as $H^n_{ii} = -\sum_{i \neq j} H^n_{ij}, \forall i = 1,2,\ldots,N$. Basically, each diagonal part is the summation of all the non-diagonal local matrices.

The key component of mANM is to construct a multiscale Hessian matrix from several Hessian matrices parameterized at different scales and to determine their coefficients in the multiscale Hessian matrix by using mFRI. It should be noticed that for B-factor prediction, the three diagonal terms of each particle in the inverse Hessian matrix are summed together. Therefore, in the Hessian matrix based mFRI, the rigidity index associated with the $n$th kernel is constructed as the summation of the diagonal terms,

\[ \mu_i^n = \sum_{i \neq j} \frac{\Phi^n(r_{ij};\eta^n)}{r_{ij}^2}\left[(x_j-x_i)^2 + (y_j-y_i)^2 + (z_j-z_i)^2\right] = \sum_{i \neq j} \Phi^n(r_{ij};\eta^n), \quad \forall i = 1,2,\ldots,N. \tag{3.55} \]

Indeed, the rigidity index of mANM above is the same as the mFRI rigidity index. Therefore, as far as B-factor prediction is concerned, the mFRI approach for constructing an mGNM should work for constructing an mANM as well.

We adopt the approach used in the Type-1 mGNM construction to construct an mANM. We propose a multiscale Hessian matrix $H = \sum_n a^n H^n$, for which the coefficients $a^n$ should be evaluated from

\[ \mathrm{Min}_{a^n} \left\{ \sum_i \left| \sum_n a^n \mu_i^n - \frac{1}{B_i^e} \right|^2 \right\}. \tag{3.56} \]

Again, different matrices $\{H^n\}$ should be parameterized at different scales. To clarify the proposed multiscale Gaussian network model and multiscale anisotropic network model, we present a flow chart in Fig. 3.3 to illustrate the basic procedure of the methods.

Figure 3.3: Work flow of the basic procedure in mGNM and mANM. Steps shown in the chart: (1) read in PDB data; (2) select kernel functions $\Phi^n$ and scale parameters $\eta^n$; (3) calculate rigidity index $\mu_i^n$ and flexibility index $f_i^n$; (4) evaluate coefficients $a^n$ and $b$ via the minimization; (5) construct the multiscale Kirchhoff matrix $\Gamma$ for mGNM (or Hessian matrix $H$ for mANM); (6) eigenvalue decomposition; (7) mGNM B-factor prediction (or mANM collective mode analysis).

3.9 gGNM mode calculations for predicting hinges

In a plot of gGNM mode values, hinges are often observed where the values of a gGNM mode switch sign and change value drastically. Furthermore, there should be a significant number of negative values on one side of the sign switch and positive values on the other side to indicate an actual separation between domains. Otherwise, there may be extraneous hinge predictions in regions that have many values close to zero and may switch signs many times over the span of just a few residues.

To more easily find potential hinge residues from gGNM modes as described above, and in a manner that filters out the minor sign changes, we take the cumulative sum of a mode and then determine the residue numbers corresponding to the maximum and/or minimum values of that series of cumulative sums. This technique can identify any number of hinges as long as the number of peaks is identified correctly. For this study, the code detects only the maximum and minimum of the cumulative sum of the mode, which limits detection to two hinges. This should not affect this study, except to possibly limit false positive predictions, because the proteins used for testing have at most two major hinge regions. Also, any residues too near either end of the molecule to be separating domains, i.e., residues within 35 residues of the end, are removed from the set of predicted hinges.

The eigenvalues of the relevant modes can be very similar in some cases while the number and/or locations of hinges predicted by those modes are different. In this case we may equally consider all modes with similar eigenvalues. This may lead to additional false positives but is necessary for ensuring maximum sensitivity, which is a priority.

3.10 Machine learning and feature selection

The features we test in an attempt to improve on the KFC2 SVM model include various metrics derived from FRI flexibility predictions, FRI-mode calculations, hydrophobicity, sequence-based statistics and details of the structure.

Features derived from FRI flexibility predictions include the FRI and mFRI flexibility indices, FRIf and mFRIf; the FRI and mFRI B-factor values and their difference, FRI, mFRI and dFRI; and the average mFRI B-factor value within six Angstroms, avgFRI.

Features derived from FRI-mode calculations are created from the two lowest-eigenvalue modes of motion. The lowest-eigenvalue mode typically corresponds to the most important or largest-amplitude motion; therefore, we derive most of the features from this mode. The features derived from the first mode include the raw values of the mode residue-by-residue, Mode1; the cumulative sum of the mode values, cMode1; the residue numbers where Mode1 is at a local maximum or minimum, ishinge; all residues within three of ishinge, ishinge3; and the number of residues to the nearest positive ishinge value, hingedist. Also included are features derived from mode 2, Mode2 and cMode2.

The hydrophobicity of individual residues and regions of a protein is the main driving force behind folding of a protein structure. Therefore, we suspect that hinge residues, which occupy specific positions between domains, are under some pressure to maintain a certain level of hydrophobicity, likely high hydrophobicity due to the solvent exposed nature of many hinges. The hydrophobicity derived features we test in this study include the hydrophobicity of a single residue, HP1, and the average hydrophobicity for residues in a six Angstrom sphere around a residue, HP6.

Two other metrics are also tested that describe the area within 6 Angstroms of a residue, RES6 and ROT6, the number of residues and the number of rotatable bonds in that region. These features have been used in machine learning models for predicting protein-protein interaction hot spots and showed relatively high predictive worth. In addition to RES6 and ROT6, we also look at solvent accessible surface area (SASA) based metrics to describe the area around a residue. Using the POPS software package we generate features for hydrophobic SASA, PhobSASA; hydrophilic SASA, PhilSASA; total SASA, TotSASA; number of overlapping atoms, N(ovrl); and the total surface area of the residue, Surf. Finally, we include PSSM features, one feature for each amino acid log likelihood at each position. PSSM derived features are named A(PSSM), G(PSSM), etc.

A wide range of metrics, derived from various molecular properties, were considered as potentially useful for hinge prediction. These metrics were compared by filtering by an F-score threshold and then ranking the remaining features by the Random Forest importance value.

CHAPTER IV. Validation and Applications

4.1 Basic FRI method

4.1.1 FRI B-factor prediction

To validate the original FRI method, we compare the B-factor predictions obtained from FRI with experimental B-factors from protein X-ray crystallography experiments as shown in Eq. (4.6). A set of 263 proteins was collected from the PDB with preference for high resolution (1.5 Å) protein-only structures that lack structural co-factors. The impact of co-factors on protein stability requires an all-atom model and is a topic that will be explored in our future work. The set of 263 proteins was converted to a C$_\alpha$-only format, and when atoms have multiple coordinates with occupancy <1.0 the highest occupancy coordinate was kept and all others were discarded. This is a potential source of error in the B-factor predictions. However, some proteins with multiple coordinates for atoms were among the highest scoring, which suggests that the impact in most cases is small.

The correlation coefficients of B-factor prediction are displayed in Fig. 4.1 for both exponential (expo) and Lorentz kernels. Each protein was tested with both the exponential and Lorentz correlation kernels across a range of parameter values of $\kappa$ and $\eta$ for the exponential kernel and $\upsilon$ and $\eta$ for the Lorentz kernel. Correlation coefficient scores for B-factor predictions below 0.5 account for just 19 out of 263 proteins for the Lorentz kernel based FRI and 14 out of 263 for the exponential kernel based FRI and are not shown in Fig. 4.1. The reasons for these low scores are the subject of future research and are likely related to the influence of crystal packing effects, structural ligands and side chains, none of which are approximated well by the C$_\alpha$ coarse-grained model. The accuracy of B-factor prediction is also dependent upon the quality of the experimental data. If multiple coordinates are reported for an atom along with multiple B-factors, then we do not have high confidence in the B-factor and thus the prediction will appear to be less accurate.

Figure 4.1: Correlation coefficients for experimental vs predicted B-factors using the Lorentz kernel (left) and exponential kernel (right). The test set consists of 263 C$_\alpha$-only PDB files. Scores below 0.5 are not shown. For the Lorentz kernel, $\upsilon$ values range from 0.5 to 10.0 at an interval of 0.5 and $\eta$ values range from 1.0 Å to 40.0 Å at an interval of 1.0 Å. For the exponential kernel, $\kappa$ values range from 0.5 to 10.0 at an interval of 0.5 and $\eta$ values range from 0.5 Å to 20.0 Å at an interval of 0.5 Å.

A comparison of the experimental vs predicted B-factors for two proteins, 1DF4 and 2Y7L, is shown in Fig. 4.2 to demonstrate the accuracy of our FRI method. These two proteins were in the top five highest correlation coefficients for B-factor predictions using the exponential (2Y7L: 0.928, 1DF4: 0.909) and Lorentz (2Y7L: 0.928, 1DF4: 0.917) kernels. It can be seen from the correlation scores and Figs. 4.1 and 4.2 that both correlation kernels give similar results, especially for these highly accurate predictions.

Figure 4.2: Experimental B-factors (black) vs predicted B-factors (red) using the Lorentz (top) and exponential (bottom) correlation kernels. The structures used for comparison are 1DF4 (left) and 2Y7L (right). For these comparisons, the optimal parameters were used for $\upsilon$, $\kappa$ and $\eta$ based on the parameter searches for each correlation kernel. For the Lorentz kernel, $\upsilon = 1.5$ and $\eta = 2.0$ Å are the parameters used for 1DF4 and $\upsilon = 1.5$ and $\eta = 19$ Å are used for 2Y7L. For the exponential kernel, $\kappa = 0.5$ and $\eta = 1.0$ Å are employed for 1DF4 and $\kappa = 0.5$ and $\eta = 2.5$ Å for 2Y7L.

B-factor prediction was calculated for each protein at a range of parameter values in each kernel. The Lorentz kernel requires two parameters, $\upsilon$ and $\eta$, while the exponential kernel requires $\kappa$ and $\eta$. The aim is to find values for these parameters that are suitable for most or all proteins so that the method may be made parameter free. The parameters which result in the highest correlation coefficient for each protein are displayed in Fig. 4.3 and Fig. 4.5 for the Lorentz and exponential kernels, respectively.

The optimal value for $\upsilon$ in the Lorentz kernel is found to be near 2.5 for most proteins in the test set. The optimal value for $\eta$ is typically the highest or lowest tested. The results of the parameter search for $\upsilon$ and $\eta$ are shown in Fig. 4.3. This result is a close match to the findings of Yang et al. [79] and their parameter free ENM (pfENM) model. In the pfENM, spring constants are scaled by an inverse power. Yang et al. tested powers 1-10 and found that second and third inverse power relationships were the most accurate for B-factor predictions. [79] In our study we also test non-integer powers over the range 0.5 to 10.0 and come to a similar conclusion.

Figure 4.3: Optimal $\upsilon$ parameter value for 263 proteins using the Lorentz correlation kernel. B-factor prediction was calculated for $\upsilon$ values ranging from 0.5 to 10 at an interval of 0.5 and $\eta$ values ranging from 1.0 Å to 40.0 Å at an interval of 1.0 Å.

Figure 4.4: Phase diagram for Lorentz kernel optimal parameter values $\upsilon$ and $\eta$, colored by the size of structure and with shapes corresponding to correlation coefficient. Diamond - 0.5, downward triangle - 0.6, upward triangle - 0.7, square - 0.8, circle - 0.9. $\upsilon$ values range from 0.5 to 10.0 at an interval of 0.5 and $\eta$ values range from 1.0 Å to 40 Å at an interval of 1.0 Å.

Figure 4.5: Optimal parameters for 263 structures using the exponential correlation kernel. Here $\kappa$ values range from 0.5 to 10.0 at an interval of 0.5. $\eta$ values range from 0.5 Å to 20.0 Å at an interval of 0.5 Å.

The optimal value for $\upsilon$ is plotted against the optimal value for $\eta$ and colored by the size of protein in Fig. 4.4. There is no clear pattern based on protein size except that some smaller proteins (under 100 atoms) prefer very high values of $\upsilon$, which may be due to a lack of long range interactions.

For the exponential kernel, the optimal $\kappa$ value for most proteins is between 0.5 and 1 while the optimal $\eta$ values are more spread out, with the majority of proteins having optimal $\eta$ values from 0.5 Å to 8 Å. This ambiguity in the optimal $\eta$ parameter value makes the choice of parameters for a parameter free version difficult; however, the testing of the parameter free exponential kernel method shows that it performs as well as the parameter free Lorentz kernel method. The optimal values for $\kappa$ and $\eta$ for all proteins in the test set are shown in Fig. 4.5. Optimal values for $\kappa$ are 0.5 or 1.0 in most cases, with a significant peak at $\kappa = 10$, which is the highest value tested. Optimal values for $\eta$ are more varied and there is no clear choice for a parameter free version. There is a large peak at the highest value tested ($\eta = 20$ Å), as there was for $\kappa$; however, these two peaks do not correspond to the same set of proteins. This point is illustrated in Fig. 4.6, which compares $\kappa$ and $\eta$ values. Figure 4.6 also shows that there is no relationship between number of atoms or correlation coefficient and the parameters $\kappa$ and $\eta$.

To further inform our choice of parameters for the parameter free exponential method we look at the patterns of correlation scores for every $\kappa$ and $\eta$ value combination in Fig. 4.7. The parameter maps show that for most proteins the choice of $\kappa$ is most important and that for $\kappa$ values around 1 there are many choices for $\eta$ that result in very similar correlation coefficients.

To test parameter free versions of the FRI method we chose $\upsilon = 2.5$ and $\eta = 1.0$ Å for the Lorentz kernel and $\kappa = 1.5$ and $\eta = 5.0$ Å for the exponential kernel. These choices were made based on the parameter searches and limited tests of various parameter values.

In Fig. 4.8 we compare the exponential and Lorentz kernel performance based on correlation coefficients from B-factor prediction. The correlation coefficients were highest overall when using the exponential kernel with optimized parameters. The average correlation coefficient of B-factor prediction using the exponential kernel is 0.681 using optimal parameters and 0.627 using the parameter free version. The average correlation coefficient of B-factor prediction using the Lorentz kernel is 0.668 using optimal parameters and 0.627 using the parameter free version. The difference between the exponential and Lorentz kernels is small when using optimized parameters, with an average deviation of just 0.0182. The parameter free versions of the kernels also produce very similar correlation coefficients, with an average deviation of 0.0365. The parameter free Lorentz and exponential kernels appear to have similar performance

Figure 4.6: Phase diagram for exponential kernel optimal parameter values $\kappa$ and $\eta$, colored by the size of structure and with shapes corresponding to correlation coefficient. Diamond - 0.5, downward triangle - 0.6, upward triangle - 0.7, square - 0.8, circle - 0.9. $\kappa$ values range from 0.5 to 10.0 at an interval of 0.5 and $\eta$ values range from 0.5 Å to 20 Å at an interval of 0.5 Å.

Figure 4.7: Complete results of optimal parameter searches using the exponential correlation kernel for structures 1DF4 (top left), 2Y7L (top right), 2Y9F (bottom left) and 3LAA (bottom right). Structures 1DF4 and 2Y7L (top) represent the high scoring structures, those with scores near 0.9. Structures 2Y9F and 3LAA (bottom) show the typical pattern of correlation scores for the majority of proteins tested. $\kappa$ values range from 0.5 to 20.0 at an interval of 0.5 and $\eta$ values range from 0.5 Å to 20 Å at an interval of 0.5 Å.

Figure 4.8: Comparison of correlation coefficients calculated using optimal parameters for both Lorentz and exponential
correlation kernels. Average deviation = 0.0182 (left) and 0.0365 (right). For the Lorentz kernel optimal parameter search, $\upsilon$ values range from 0.5 to 10.0 at an interval of 0.5 and $\eta$ values range from 1.0 Å to 40.0 Å at an interval of 1.0 Å. For the exponential kernel parameter search, $\kappa$ values range from 0.5 to 10.0 at an interval of 0.5 and $\eta$ values range from 0.5 Å to 20.0 Å at an interval of 0.5 Å. The parameter free Lorentz kernel uses $\upsilon = 2.5$ and $\eta = 1.0$ Å and the parameter free exponential kernel uses $\kappa = 1.5$ and $\eta = 5.0$ Å.

and these results do not indicate a clear advantage in using either kernel.

In Fig. 4.9 we compare the correlation coefficients from the parameter free and optimized versions of the method for both correlation kernels. In each case the optimized method outperforms the parameter free method no matter which kernel is used. Again this suggests that neither kernel has an advantage over the other for this method. The maximal average deviation among these methods is 0.0549, meaning that the parameter free exponential kernel captures 94% of the best results generated by the optimized Lorentz kernel for this set of proteins. Similarly, the parameter free exponential kernel captures 94% of the best results from the optimized exponential kernel. It is worthwhile to note that the parameter free Lorentz kernel ($\upsilon = 2.5$ and $\eta = 1.0$ Å) is able to capture 95% of the best results generated by either the optimized exponential or Lorentz kernel for this set of proteins. Therefore, it appears that both parameter free kernels are very robust for practical applications.

Figure 4.9: Comparison of correlation coefficients calculated using optimal parameters and parameter free versions of the method. The optimized correlation coefficients are the highest scoring from a parameter search. For the Lorentz kernel optimal parameter search, $\upsilon$ values range from 0.5 to 10.0 at an interval of 0.5 and $\eta$ values range from 1.0 Å to 40.0 Å at an interval of 1.0 Å. For the exponential kernel parameter search, $\kappa$ values range from 0.5 to 10.0 at an interval of 0.5 and $\eta$ values range from 0.5 Å to 20.0 Å at an interval of 0.5 Å. The parameter free Lorentz kernel uses $\upsilon = 2.5$ and $\eta = 1.0$ Å and the parameter free exponential kernel uses $\kappa = 1.5$ and $\eta = 5.0$ Å. The line $y = x$ is shown for reference. Points on the line indicate little or no difference between optimized parameters and the parameter free results. Average deviations are 0.0410, 0.0549, 0.0463, and 0.0540 (from left to right and from top to bottom).

Figure 4.10: C$_\alpha$ atoms of 1QD9 in VDW representation scaled by predicted B-factor (both images) and colored with electrostatics (right). Larger VDW radii represent more flexible atoms such as those near the surface of this soluble protein. Smaller VDW radii represent more rigid atoms such as those in the core of the protein. On the right, atoms are colored by electrostatics, revealing two charged domains. First, the outer amino acids have some areas of positive charge that interact with the bulk solvent. Second, a highly negatively charged portion of the protein core is highlighted in red. These charges are stabilized by internal water molecules.

4.1.2 Rigidity and flexibility visualization

From the above analysis, the rigidity and flexibility indices can be obtained at the coordinates of the C$_\alpha$ atoms in the protein. Such values can be utilized directly for visualization. For the purpose of visualization, it is sufficient to plot either rigidity or flexibility. A large value of the flexibility index can be represented by a large atomic radius in the visualization while a small flexibility index corresponds to a small atomic radius. Therefore, we scale atomic van der Waals radii by their flexibility indices as shown in Fig. 4.10 for 1QD9. Clearly, C$_\alpha$s located near the molecular boundary are more flexible.

Additionally, the flexibility index can be visualized together with electrostatic potential. Specifically, the flexibility is represented by the atomic size while the electrostatics is illustrated by color, as shown in the right chart of Fig. 4.10. There is a correlation between flexibility and partial charge in this structure: charged residues are slightly less flexible. From these figures we see the image of a typical soluble protein with partially charged residues on the solvent-solute boundary and a less flexible core. It is well known that the partially charged outer protein surface is responsible for many protein functions in enzymes, cell signaling and ligand binding. Interestingly, this soluble protein has a highly charged core made up of many negatively charged residues interacting with a network of water molecules. This results in a negatively charged, rigid core which is represented by small, red VDW spheres.

Furthermore, in order to study the elastic dynamics, elastostatics, and collective motion of a macromolecule, continuous atomic rigidity and flexibility functions are required in our multiscale, multiphysics and multidomain models. The spatially scattered information at each C$_\alpha$ coordinate needs to be interpolated into continuous atomic rigidity and flexibility functions. In this work, we employ the modified Shepard's method to interpolate rigidity and flexibility values at C$_\alpha$ coordinates to build their continuous functions. [56,65] The essence of Shepard's method is to blend local interpolants with locally supported weight functions. For example, the atomic flexibility function can be expressed as

\[ F(\mathbf{r}) = \sum_{i=1}^{N} W_i(\mathbf{r})\,Q_i(\mathbf{r}), \tag{4.1} \]

where the locally supported weight function is defined as

\[ W_i(\mathbf{r}) = \frac{p_i(\|\mathbf{r}-\mathbf{r}_i\|;R_i)}{\sum_{i=1}^{N} p_i(\|\mathbf{r}-\mathbf{r}_i\|;R_i)}, \tag{4.2} \]

\[ p_i(\|\mathbf{r}-\mathbf{r}_i\|;R_i) = \begin{cases} \left[\dfrac{R_i - \|\mathbf{r}-\mathbf{r}_i\|}{R_i\,\|\mathbf{r}-\mathbf{r}_i\|}\right]^2, & \|\mathbf{r}-\mathbf{r}_i\| < R_i \\ 0, & \text{otherwise}, \end{cases} \tag{4.3} \]

where $R_i > 0$ is a constant radius with the $i$th C$_\alpha$ as its center. Its value varies with $i$ so as to include sufficient numbers of points in its domain when necessary. [65] Our input data are a set of atomic flexibility indices $\{f_i\}$ or the predicted B-factors $\{B_i^t\}$ located at C$_\alpha$s. We denote by $\mathbf{r} = (x, y, z)$, $\mathbf{r} \in S_E$, a general position inside the elastic domain of a macromolecule, and the local interpolant is a nodal function defined as

\[ Q_i(\mathbf{r}) = a_{i1}x^2 + a_{i2}y^2 + a_{i3}z^2 + a_{i4}xy + a_{i5}xz + a_{i6}yz + a_{i7}x + a_{i8}y + a_{i9}z + a_{i10}, \tag{4.4} \]

where $a_{ij}$ are coefficients and $Q_i(\mathbf{r})$ is a quadratic polynomial function which interpolates the predicted B-factors at a neighboring set of C$_\alpha$ locations, namely

\[ Q_i(\mathbf{r}_j) = B_j^t\,\delta_{ij}, \tag{4.5} \]

where $\delta_{ij}$ is the Kronecker delta function. For a given $i$th C$_\alpha$, Eq. (4.5) is repeatedly employed on all C$_\alpha$s within the given sphere of radius $R_i$ and results in a number of algebraic equations. The algebraic equations are solved by using the weighted least squares method, which determines the coefficients $a_{ij}$. For sufficiently large data, we can choose 32 surrounding atomic flexibility indices to fit the coefficients. [65] Note that the atomic rigidity function ($\mu(\mathbf{r})$) can be constructed in the same manner by replacing $B_j^t$ with $\mu_j$.

In Fig. 4.11 we compare an atomistic and a continuous representation for the flexibility of protein 1QD9. The molecular surface on the left is colored by X-ray B-factors, while the molecular surface on the right is colored by the interpolated flexibility values. Overall, the interpolated values mimic the B-factor pattern closely. However, the predicted flexibility at the inner ring of the structure is higher than that given by X-ray B-factors due to the fact that water molecules fill part of the inner core in the full structure. The B-factor color map is discontinuous. In contrast, the flexibility map generated with the FRI method has the advantage of being continuous both on the surface and in the interior of the protein. The atomic rigidity function and atomic flexibility function constructed in the present work will be utilized to study macromolecular elastic dynamics, elastostatics and elastic vibration in our future work.

Figure 4.11: The molecular surface of protein 1QD9 colored by B-factor (left) and continuous FRI representation (right). The flexibility index is calculated using the Lorentz method with $\upsilon = 2.5$ and $\eta = 1.0$ Å. Images generated by VMD using the BWR color bar and scale 10 to 50 for B-factors and 0.75 to 0.90 for the flexibility index. In both images, blue regions indicate low flexibility and red regions indicate high flexibility. On the left, B-factor is an atomistic representation of flexibility. On the right, FRI is used to predict flexibility and the continuum representation is mapped to the protein surface. The continuum prediction matches the experimental flexibility pattern closely except near the core of the protein, which contains some structural water not included in our model.

4.2 Fast FRI method

4.2.1 fFRI parameter testing

Figure 4.12: Parameter testing for exponential (left chart) and Lorentz (right chart) functions. The average correlation coefficient of B-factor predictions of 365 proteins is plotted against the choice of $\eta$ for a range of values for $\kappa$ or $\upsilon$.

To analyze the best parameters for Lorentz and exponential functions, we study their behavior in Fig. 4.12, where each function is tested over a range of parameters. For exponential type functions, $\kappa = 1$ and $\eta = 3$ Å give rise to a near optimal parameter-free FRI. Similarly, for Lorentz type functions, $\upsilon = 3$ and $\eta = 3$ Å offer near optimal results. It is seen from Fig. 4.12 that exponential functions are quite sensitive to the parameter values, while Lorentz functions are relatively robust. This study provides a basis for the selection of parameter free FRI (pfFRI) schemes.

It is interesting to analyze the performance of the proposed fFRI in terms of accuracy and efficiency. To this end, we first explore the impact of box size on the correlation coefficients of a few fFRI schemes in Fig. 4.13. For each given $\kappa$ and $\upsilon$, the best $\eta$ found in Fig. 4.12 is employed. It is seen from Fig. 4.13 that both exponential and Lorentz types of functions are able to achieve their near optimal performance at $R = 12$ Å. Therefore, we recommend $R = 12$ Å, $\eta = 3$ Å and $\kappa = 1$ for the exponential type of fFRI method. Similarly, $R = 12$ Å, $\eta = 3$ Å and $\upsilon = 3$ are near optimal for Lorentz type fFRI methods.

Figure 4.13: The impact of box size on the average correlation coefficient for a set of 365 proteins. The fFRI is examined over a range of values for parameters ($\kappa$ and $\upsilon$) to illustrate the relationship between accuracy and choice of box size $R$.

4.2.2 Comparison of B-factor predictions from fFRI, GNM and NMA

4.2.2.1 FRI vs GNM and NMA

In order to compare the FRI and GNM, we re-analyzed the structures from Park et al. [52] with the GNM method with a cutoff value of 7 Å, the same value used by the authors. It was found that some correlation coefficients were low for GNM due to multiple coordinates for some C$_\alpha$ atoms in some PDB data and missing C$_\alpha$ atoms in others. To ensure a fair comparison between the FRI and GNM we re-analyzed the structures using GNM after processing the PDB files to fix these issues. We removed all but the highest occupancy coordinates for each atom and used every C$_\alpha$ atom from the original PDB files to run the GNM B-factor prediction code and calculate corrected correlation coefficients. In Tables 4.2, 4.3 and 4.4, optimal and parameter free FRI are compared to the GNM data reported by Park et al. [52] The newly calculated correlation coefficient is shown only if there is a significant improvement using our processed PDB files. On the other hand, Tables 4.5 through 4.9 list all correlation coefficients for GNM from our own tests using our processed PDB files. These correlation coefficients are typically the same as those reported by Park et al. [52] although some have changed. The use of our processed PDB files leads to a slight increase in the average scores for the GNM in our analysis.

Figure 4.14: Comparison of correlation coefficients from B-factor prediction using GNM, coarse-grained (C$_\alpha$) NMA and FRI methods. Top left: pfFRI vs opFRI for 365 proteins; top right: opFRI vs GNM for 365 proteins; bottom left: pfFRI vs GNM for 365 proteins; bottom right: pfFRI vs NMA for three sets of proteins used by Park et al. [52] The correlation coefficients for NMA are adopted from Park et al. [52] for three sets of proteins. For optimal FRI, parameter $\upsilon$ is optimized for a range from 0.1 to 10.0. For the parameter free version of the FRI (pfFRI), we set $\upsilon = 3$ and $\eta = 3$ Å. The line $y = x$ is included to aid in comparing scores.

Table 4.1: Average correlation coefficients for C$_\alpha$ B-factor prediction with FRI, GNM and NMA for three structure sets from Park et al. [52] and a superset of 365 structures.

PDB set     opFRI    pfFRI    GNM      NMA
Small       0.667    0.594    0.541    0.480
Medium      0.664    0.605    0.550    0.482
Large       0.636    0.591    0.529    0.494
Superset    0.673    0.626    0.565    NA

To directly compare the FRI with GNM and NMA, we calculated the correlation coefficient of C$_\alpha$ B-factor predictions for the three structure sets taken from Park et al. [52] To further compare the FRI and GNM, we also calculated the accuracy of these two methods on a superset of 365 structures. Two versions of the FRI are used for these tests. The first, optimal FRI (opFRI), searches a wide range of parameters for the highest scoring parameter and the second, parameter free FRI (pfFRI), uses $\upsilon = 3$ and $\eta = 3$ Å in all cases. The correlation coefficients for the three sets proposed by Park et al. are reported in Tables 4.2, 4.3 and 4.4 for FRI, GNM and NMA.

The results of the B-factor predictions for the superset are shown in Fig. 4.14. Using the top left chart as an example, both axes are correlation coefficients. For each circle, its x-coordinate is its correlation coefficient for pfFRI, while its y-coordinate is its correlation coefficient for opFRI. Since all circles are located above the diagonal line, opFRI always outperforms pfFRI.

The average correlation scores for optimal FRI, parameter free FRI, GNM and NMA for each set of structures are listed in Table 4.1. As shown in Table 4.1 and Fig. 4.14, opFRI outperforms pfFRI in many cases although the majority of structures have little difference in their score for each method. Both optimal and parameter free FRI methods outperform GNM and NMA for most structures. B-factor prediction with the FRI is most accurate for smaller structures (<70 residues). All three methods tend to perform worse as the structures get larger except in the case of NMA, where the medium-sized structures scored slightly lower than the large-sized structures. This behavior is expected because as proteins get larger their structures become more complex and may include structural co-factors and more amino acid side chain interactions that contribute to the protein's stability. The coarse-grained C$_\alpha$ representation used in these methods is unable to capture these kinds of details.

The average increase in correlation coefficients when using the FRI over GNM on the superset of 365 proteins is 0.096 for opFRI and 0.059 for pfFRI. Additionally, opFRI and pfFRI are more accurate on average than GNM and NMA for all three sets of structures used by Park et al. [52] From these results we conclude that both opFRI and pfFRI are more accurate on average than either GNM or NMA.

4.2.2.2 fFRI vs GNM

Table 4.10 lists the average correlation coefficients of B-fa
ctorpredictionfor365proteinsusingfFRIschemesatagiventruncation(R=12A).ItisseenthattheproposedfFRIschemesimplementedineitherexponential(=3Aand=1)orLorentz(=3Aand˛=3)areatleast10%moreaccuratethantheGNM.4.3MultikernelmultiscaleFRImethodInthissection,weimplementandvalidatetheproposedmFRIforB-factorprediction.Animmediateconcernistheaccuracyofmulti-kernelFRImethodwhichistestedbytheB-58Table4.2:CorrelationcotsforB-factorpredictionobtainedbyoptimalFRI(opFRI),parameterfreeFRI(pfFRI)andGaussiannormalmode(GNM)forsmall-sizestructures.yGNMandNMAvaluesaretakenfromthecoarse-grained(C)GNMandNMAresultsreportedinParketal.52exceptwherestarred(*).Starredvaluesindicatecorrelationcots,fromourowntestofGNM,thathavesitlyincreasedcomparedtothevaluesreportedbyParketal.52PDBIDNopFRIpfFRIGNMyNMAy1AIE310.5880.4160.1550.7121AKG160.3730.3500.185-0.2291BX7510.7260.6230.7060.8681ETL120.7100.6090.6280.3551ETM120.5440.3930.4320.0271ETN120.0890.023-0.274-0.5371FF4650.7180.6130.6740.5551GK7390.8450.7730.8210.8221GVD520.7810.7320.5910.5701HJE130.8110.6860.6160.5621KYC150.7960.7630.7540.7841NOT130.7460.6220.5230.5671O06200.9100.8740.8440.9001OB4160.7760.7630.750*0.9301OB7160.7370.5450.652*0.9521P9I290.7540.7420.6250.6031PEF180.8880.8260.8080.8881PEN160.5160.4650.2700.0561Q9B430.7460.7260.6560.6461RJU360.5170.4470.4310.2351U06550.4740.4290.4340.3771UOY640.7130.6530.6710.6281USE400.4380.146-0.142-0.3991VRZ210.7920.6950.677*-0.2031XY280.6190.5700.5620.4581YJO60.3750.3330.4340.4451YZM460.8420.8340.9010.9392DSX520.3370.3330.1270.4332JKU350.8050.6950.6560.8502NLS360.6050.5590.5300.0882OL960.9090.9040.6890.8862OLX40.9170.8880.8850.7766RXN450.6140.5740.5940.30459Table4.3:CorrelationcotsforB-factorpredictionobtainedbyoptimalFRI(opFRI),parameterfreeFRI(pfFRI)andGaussiannormalmode(GNM)formedium-sizestructures.yGNMandNMAvaluesaretakenfromthecoarse-grained(C)GNMandNMAresultsreportedinParketal.52exceptwherestarred(*).Starredvaluesindicatecorrelationcots,fromourowntestofGNM,thathavesitlyincreasedcomparedtothevaluesrepor
tedbyParketal.52PDBIDNopFRIpfFRIGNMyNMAy1ABA870.7270.6980.6130.0571CYO880.7510.7020.7410.7741FK5930.5900.5680.4850.3621GXU880.7480.6340.4210.5811I71830.5490.5160.5490.3801LR7730.6790.6570.6200.7951N7E950.6510.6090.4970.3851NNX930.7950.7890.6310.5171NOA1130.6220.6040.6150.4851OPD850.5550.4090.3980.7961QAU1120.6780.6720.6200.5331R7J900.7890.6210.3680.0781UHA830.7260.6650.638*0.3081ULR870.6390.5940.4950.2231USM770.8320.8090.7980.7801V05960.6290.5990.6320.3891W2L970.6910.5640.3970.4321X3O800.6000.5590.6540.4531Z21960.6620.6380.4330.2891ZVA750.7560.5790.6900.5792BF9360.6060.5540.680*0.5212BRF1000.7950.7640.7100.5352CE0990.7060.5980.5290.6282E3H810.6920.6820.6050.6322EAQ890.7530.6900.6950.6882EHS750.7200.7130.7470.5652FQ3850.7190.6920.3480.5082IP6870.6540.5780.5720.8262MCM1130.7890.7130.6390.6432NUH1040.8350.6910.7710.6852PKT930.1620.003-0.193*-0.1652PLT990.5080.4840.509*0.1872QJL990.5940.5840.5940.4972RB8930.7270.6140.5170.4853BZQ990.5320.5160.4660.3515CYT1030.4410.4210.3310.10260Table4.4:CorrelationcotsforB-factorpredictionobtainedbyoptimalFRI(opFRI),parameterfreeFRI(pfFRI)andGaussiannormalmode(GNM)forlarge-sizestructures.yGNMandNMAvaluesaretakenfromthecoarse-grained(C)GNMandNMAresultsreportedinParketal.52exceptwherestarred(*).Starredvaluesindicatecorrelationcots,fromourowntestofGNM,thathavesitlyincreasedcomparedtothevaluesreportedbyParketal.52PDBIDNopFRIpfFRIGNMyNMAy1AHO640.6980.6250.5620.3391ATG2310.6130.5780.4970.1541BYI2240.5430.4910.5520.1331CCR1110.5800.5120.3510.5301E5K1880.7460.7320.8590.6201EW41060.6500.6440.5470.4471IFR1130.6970.6890.6370.3301NKO1220.6190.5350.3680.3221NLS2380.6690.5300.523*0.3851O082210.5620.3330.3090.6161PMY1230.6710.6540.6850.7021PZ41140.8280.7810.8430.8441QTO1220.5430.5200.3340.7251RRO1120.4350.3720.5290.5461UKU1020.6650.6610.7420.7201V701050.6220.4920.1620.2851WBE2040.5910.5770.5490.5741WHI1220.6010.5390.2700.4141WPA1070.6340.5770.4170.3802AGK2330.7050.6940.5120.5142C712050.6580.6490.5600.5842CG7900.5510.5390.3790.3082CWS2270.6470.6400.69
60.5242HQK2130.8240.8090.3650.7432HYK2380.5850.5750.5100.5932I241130.5930.4980.4940.4412IMF2030.6520.6250.5140.4012PPN1070.6770.6380.6680.4682R161760.5820.4950.618*0.4112V9V1350.5550.5480.5280.5942VIM1040.4130.3930.2120.2212VPA2040.7630.7550.5760.5942VYO2100.6750.6480.7290.7393SEB2380.8010.7120.8260.7203VUB1010.6250.6100.6070.36561Table4.5:CorrelationcotsforB-factorpredictionobtainedbyoptimalFRI(opFRI),parameterfreeFRI(pfFRI)andGaussiannormalmode(GNM)forasetof365proteins.GNMscoresreportedherearetheresultofourtestsasdescribedinSection4.1.1PDBIDNopFRIpfFRIGNMPDBIDNopFRIpfFRIGNM1ABA870.7270.6980.6131PEF180.8880.8260.8081AGN14920.3310.0510.1701PEN160.5160.4650.2701AHO640.6980.6250.5621PMY1230.6710.6540.6851AIE310.5880.4160.1551PZ41140.8280.7810.8431AKG160.3730.3500.1851Q9B430.7460.7260.6561ATG2310.6130.5780.4971QAU1120.6780.6720.6201BGF1240.6030.5390.5431QKI39120.8090.7510.6451BX7510.7260.6230.7061QTO1220.5430.5200.3341BYI2240.5430.4910.5521R291220.6500.6310.5561CCR1110.5800.5120.3511R7J900.7890.6210.3681CYO880.7510.7020.7411RJU360.5170.4470.4311DF4570.9120.8890.8321RRO1120.4350.3720.5291E5K1880.7460.7320.8591SAU1140.7420.6710.5961ES52600.6530.6380.6771TGR1040.7200.7110.7141ETL120.7100.6090.6281TZV1410.8370.8200.8411ETM120.5440.3930.4321U06550.4740.4290.4341ETN120.0890.023-0.2741U7I2670.7780.7620.6911EW41060.6500.6440.5471U9C2210.6000.5770.5221F8R19320.8780.8590.7381UHA830.7260.6650.6381FF4650.7180.6130.6741UKU1020.6650.6610.7421FK5930.5900.5680.4851ULR870.6390.5940.4951GCO10440.7660.6930.6461UOY640.7130.6530.6711GK7390.8450.7730.8211USE400.4380.146-0.1421GVD520.7810.7320.5911USM770.8320.8090.7981GXU880.7480.6340.4211UTG700.6910.6100.5381H6V29270.4880.4290.3061V05960.6290.5990.6321HJE130.8110.6860.6161V701050.6220.4920.1621I71830.5490.5160.5491VRZ210.7920.6950.6771IDP4410.7350.7150.6901W2L970.6910.5640.3971IFR1130.6970.6890.6371WBE2040.5910.5770.5491K8U890.5530.5310.3781WHI1220.6010.5390.2701KMM14990.7490.7440.5581WLY3220.6950.6790.6661KNG1440.5470.5360.5121WPA1070.6340
.5770.4171KR41100.6350.6120.4661X3O800.6000.5590.6541KYC150.7960.7630.7541XY1180.8320.6450.4471LR7730.6790.6570.6201XY280.6190.5700.5621MF71940.6870.6810.7001Y6X870.5960.5240.36662Table4.6:CorrelationcotsforB-factorpredictionobtainedbyoptimalFRI(opFRI),parameterfreeFRI(pfFRI)andGaussiannormalmode(GNM)forasetof365proteins.GNMscoresreportedherearetheresultofourtestsasdescribedinSection4.1.1PDBIDNopFRIpfFRIGNMPDBIDNopFRIpfFRIGNM1N7E950.6510.6090.4971YJO60.3750.3330.4341NKD590.7500.7030.6311YZM460.8420.8340.9011NKO1220.6190.5350.3681Z21960.6620.6380.4331NLS2380.6690.5300.5231ZCE1460.8080.7570.7701NNX930.7950.7890.6311ZVA750.7560.5790.6901NOA1130.6220.6040.6152A504570.5640.5240.2811NOT130.7460.6220.5232AGK2330.7050.6940.5121O06200.9100.8740.8442AH19390.6840.5930.5211O082210.5620.3330.3092B0A1860.6390.6030.4671OB4160.7760.7630.7502BCM4130.5550.5510.4771OB7160.7370.5450.6522BF9360.6060.5540.6801OPD850.5550.4090.3982BRF1000.7950.7640.7101P9I290.7540.7420.6252C712050.6580.6490.5602CE0990.7060.5980.5292OLX40.9170.8880.8852CG7900.5510.5390.3792PKT930.1620.003-0.1932COV5340.8460.8230.8122PLT990.5080.4840.5092CWS2270.6470.6400.6962PMR760.6930.6820.6192D5W12140.6890.6820.6812POF4400.6820.6510.5892DKO2530.8160.8120.6902PPN1070.6770.6380.6682DPL5650.5960.5380.6582PSF6080.5260.5000.5652DSX520.3370.3330.1272PTH1930.8220.7840.7672E104390.7980.7960.6922Q4N1530.7110.6670.7402E3H810.6920.6820.6052Q524120.7560.7480.6212EAQ890.7530.6900.6952QJL990.5940.5840.5942EHP2480.8040.8040.7732R161760.5820.4950.6182EHS750.7200.7130.7472R6Q1380.6030.5400.5292ERW530.4610.2530.1992RB8930.7270.6140.5172ETX3890.5800.5560.6322RE22380.6520.6130.6732FB61160.7910.7860.7402RFR1540.6930.6710.7532FG11570.6200.6170.5842V9V1350.5550.5480.5282FN95600.6070.5950.6112VE85150.7440.6430.6162FQ3850.7190.6920.3482VH7940.7750.7260.5962G69990.6220.5900.4362VIM1040.4130.3930.2122G7O680.7850.7840.6602VPA2040.7630.7550.5762G7S1900.6700.6440.6492VQ41060.6800.6790.5552GKG1220.6880.6460.7112VY81490.7700.7240.5332GOM1210.5860.5840
.4912VYO2100.6750.6480.72963Table4.7:CorrelationcotsforB-factorpredictionobtainedbyoptimalFRI(opFRI),parameterfreeFRI(pfFRI)andGaussiannormalmode(GNM)forasetof365proteins.GNMscoresreportedherearetheresultofourtestsasdescribedinSection4.1.1PDBIDNopFRIpfFRIGNMPDBIDNopFRIpfFRIGNM2GXG1400.8470.7800.5202W1V5480.6800.6800.5712GZQ1910.5050.3820.3692W2A3500.7060.6380.5892HQK2130.8240.8090.3652W6A1170.8230.7480.6472HYK2380.5850.5750.5102WJ5960.4840.4400.3572I241130.5930.4980.4942WUJ1000.7390.5980.5982I493980.7140.6830.6012WW71500.4990.4710.3562IBL1080.6290.6250.3522WWE1110.6920.5820.6282IGD610.5850.4810.3862X1Q2400.5340.4780.4432IMF2030.6520.6250.5142X251680.6320.5980.4032IP6870.6540.5780.5722X3M1660.7440.7170.6552IVY880.5440.4830.2712X5Y1710.7180.7050.6942J322440.8630.8480.8552X9Z2620.5830.5780.5742J9W2000.7160.7050.6622XHF3100.6060.5910.5692JKU350.8050.6950.6562Y0T1010.7780.7740.7982JLI1000.7790.6130.6222Y721700.7800.7540.7662JLJ1150.7410.7200.5272Y7L3190.9280.7970.7472MCM1130.7890.7130.6392Y9F1490.7710.7620.6642NLS360.6050.5590.5302YLB4000.8070.8070.6752NR71940.8030.7850.7272YNY3150.8130.8040.7062NUH1040.8350.6910.7712ZCM3570.4580.4220.4202O6X3060.8140.7990.6512ZU13600.6890.6720.6532OA21320.5710.4560.4583A0M1480.8070.7120.3922OCT1920.5670.5500.5403A7L1280.7130.6630.7562OHW2560.6140.5390.4753AMC6140.6750.6690.5812OKT3420.4330.4110.3363AUB1160.6140.6080.6372OL960.9090.9040.6893B5O2300.6440.6290.6013BA13120.6610.6240.6213MD4120.8600.7810.9143BED2610.8450.8200.6843MD5120.6490.413-0.2183BQX1390.6340.4810.2973MEA1660.6690.6690.6003BZQ990.5320.5160.4663MGN3480.2050.1190.1933BZZ1000.4850.4500.6003MRE3830.6610.6410.5673DRF5470.5590.5490.4883N113250.6140.5830.5173DWV3250.7070.6610.5473NE02080.7060.6450.6593E5T2280.5020.4890.2963NGG940.6960.6890.7193E7R400.7060.6870.6423NPV4950.7020.6530.6773EUR1400.4310.4270.5773NVG60.7210.6170.59764Table4.8:CorrelationcotsforB-factorpredictionobtainedbyoptimalFRI(opFRI),parameterfreeFRI(pfFRI)andGaussiannormalmode(GNM)forasetof365proteins.GNMscore
sreportedherearetheresultofourtestsasdescribedinSection4.1.1PDBIDNopFRIpfFRIGNMPDBIDNopFRIpfFRIGNM3F2Z1490.8240.7920.7403NZL730.6270.5830.5063F7E2540.8120.8030.8113O0P1940.7270.7060.7343FCN1580.6400.6060.6323O5P1280.7340.6980.6303FE7910.5830.5330.2763OBQ1500.6490.6450.6553FKE2500.5250.4760.4353OQY2340.6980.6860.6373FMY660.7010.6550.5563P6J1250.7740.7670.8103FOD480.5320.440-0.1263PD71880.7700.7230.5893FSO2210.8310.8170.7933PES1650.6970.6420.6833FTD2400.7220.7130.6343PID3870.5370.5310.6423FVA60.8350.8250.7893PIW1540.7580.7440.7173G1S4180.7710.7000.6303PKV2210.6250.5970.5683GBW1610.8200.7470.5103PSM940.8760.7900.7453GHJ1160.7320.5110.1963PTL2890.5430.5410.4683HFO1970.6910.6700.5183PVE3470.7180.6670.5683HHP12340.7200.7160.6833PZ93570.7090.7090.6783HNY1560.7930.7230.7583PZZ120.9450.9220.9503HP41830.5340.5000.5733Q2X60.9220.9040.8663HWU1440.7540.7480.8413Q6L1310.6220.5770.6053HYD70.9660.9500.8673QDS2840.7800.7450.5683HZ81920.6170.5020.4753QPA1970.5870.4420.5033I2V1240.4860.4410.3013R6D2210.6880.6690.4953I2Z1380.6130.5990.3173R871320.4520.4190.2863I4O1350.7350.7140.7383RQ91620.5100.4030.2423I7M1340.6670.6350.6953RY01280.6160.6060.4703IHS1690.5860.5650.4093RZY1390.8000.7840.8493IVV1490.8170.7970.6933S0A1190.5620.5240.5263K6Y2270.5860.5350.3013SD2860.5230.4210.2373KBE1400.7050.7040.6113SEB2380.8010.7120.8263KGK1900.7840.7750.6803SED1240.7090.6580.7123KZD850.6470.6110.4753SO61500.6750.6660.6303L412200.7180.7160.6693SR36370.6190.6110.6243LAA1690.8270.6470.6593SUK2480.6440.6330.5673LAX1060.7340.7300.5843SZH6970.8170.8150.6973LG38330.6580.6140.5893T0H2080.8080.7750.6943LJI2720.6120.6080.5513T3K1220.7960.7480.7353LJI2720.6120.6080.5513T3K1220.7960.7480.7353M3P2490.5840.5540.3383T471410.5920.5270.44765Table4.9:CorrelationcotsforB-factorpredictionobtainedbyoptimalFRI(opFRI),parameterfreeFRI(pfFRI)andGaussiannormalmode(GNM)forasetof365proteins.GNMscoresreportedherearetheresultofourtestsasdescribedinSection4.1.1PDBIDNopFRIpfFRIGNMPDBIDNopFRIpfFRIGNM3M8J1780.7300.7280.6283TDN3570.4580
.4190.2403M9J2100.6390.5740.2963TOW1520.5780.5560.5713M9Q1760.5910.5100.4713TUA2100.6650.6580.5883MAB1730.6640.5910.4513TYS750.8530.8000.7913U6G2480.6350.6320.5264DT41600.7760.7380.7163U97770.7530.7360.7124EK32870.6800.6800.6743UCI720.5890.5260.4954ERY3180.7400.7010.6883UR86370.6660.6520.5974ES1950.6480.6250.5513US61480.6980.5860.5534EUG2250.5700.5290.4053V1A480.5310.4870.5834F014480.6330.3720.6883V752850.6040.5960.4914F3J1430.6170.5980.5513VN01930.8400.8370.8124FR91410.6710.6550.5013VOR1820.6020.5570.4844G14150.4670.3230.3563VUB1010.6250.6100.6074G2E1510.7600.7550.7583VVV1080.8330.7410.7534G5X5500.7860.7540.7433VZ91630.7850.7490.6954G6C6580.5910.5900.5283W4Q7730.7370.7250.6494G7X1940.6880.5870.6243ZBD2130.6510.5160.6324GA21440.5280.4850.4063ZIT1520.4300.4040.3924GMQ920.6780.6280.5503ZRX2210.5900.5620.3914GS3900.5440.5220.5473ZSL1380.6910.6870.5264H4J2360.8100.8060.6893ZZP740.5240.4600.4484H891680.6820.5880.5963ZZY2260.7460.7090.7284HDE1680.7450.7280.6154A021660.6180.5160.3034HJP2810.7030.6490.5104ACJ1670.7480.7460.7594HWM1170.6380.6220.4994AE71860.7240.7170.7174IL7850.4460.4040.3164AM13450.6740.6190.4604J113570.6200.5620.4014ANN1760.5510.5360.4704J5O2200.7930.7570.7774AVR1880.6800.6050.6504J5Q1460.7420.7420.6894AXY540.7000.6230.7204J783050.6580.6480.6084B6G5580.7650.7560.6694JG21850.7460.7360.5434B9G2920.8440.8160.7634JVU2070.7230.6970.5534DD53870.6150.5960.3514JYP5340.6880.6820.5384DKN4230.7810.7610.5394KEF1330.5800.5300.3244DND950.7630.7500.5825CYT1030.4410.4210.3314DPZ1090.7300.7260.6516RXN450.6140.5740.5944DQ73280.6900.6830.37666Table4.10:Averagecorrelationcots(CC)ofB-factorpredictionforasetof365proteinsusingfFRI(R=12).TheimprovementsofthefFRIovertheGNMprediction(0.565)aregiveninparentheses.ExponentialparametersAvg.CCLorentzparametersAvg.CC=0.5,=0.50.615(8.8%)˛=2.5,=2.00.622(10.1%)=1.0,=3.00.623(10.3%)˛=3.0,=3.00.626(10.8%)=1.5,=6.00.619(9.6%)˛=3.5,=4.00.623(10.3%)factorpredictionofasetof364proteinstructures.49AnotherconcernistheparameterizationofmFRIandhowthec
hoiceoftheparameterB-factorpredictionforstructuresoftsizes.Finally,weexaminewhethertheproposedmFRIisascomputationallytastheoriginalfFRI.4.3.1mFRIB-factorpredictionTotesttheaccuracyofmFRIonproteinstructuresweuseatestsetcontaining364proteinstructures.ThisisthesametestsetusedinourpreviousFRIpaperwheretheProteinDataBank(PDB)identitiesarelisted49anditcontainstestsetsusedinGNMstudies.52ThistestsetomitsonestructurepresentinpreviousFRIstudies(PDBID:1AGN)duetounrealisticB-factordata.ToquantitativelyassesstheperformanceoftheproposedmultikernelbasedmFRImethod,weconsiderthecorrelationcotCc=Ni=1BeiBeBtiBtNi=1(BeiBe)2Ni=1(BtiBt)21=2;(4.6)wherefBti;i=1;2;;NgareasetofpredictedB-factorsbyusingtheproposedmethodandfBei;i=1;2;;NgareasetofexperimentalB-factorsextractedfromthePDBHereBtandBethestatisticalaveragesoftheoreticalandexperimentalB-factors,respectively.4.3.1.1MultiscalecorrelationsofmacroproteinsToillustratethemultiscalebehaviorofyanalysis,weneedtoconstructcorrela-tionfunctionswithsharpkernelresponsesimilartothatofaHeavisidestepfunction.To67thisend,weset=25fortheexponentialtypeofcorrelationkernels.Inthiscase,theone-kernelFRImethodbehavesliketheGNMmethod.Thebestperformanceforone-kernelFRIisobtainedat1=7Aandtheassociatedaveragedcorrelationcotforthe364testsetis0.540,whichissimilartothatobtainedbyusingGNM.49Obviously,thetypeofkernelbehaviorobtainedat=25doesnotrecognizeanylarge-scalecorrelationbeyond7Ainmacromolecules.Tocapturelarge-scalecorrelations,weemploythesecondexponentialkernelwithitsscale(2>1)varyingoverarangeofvaluesasshowninTable4.11:.N2=9A2=12A2=15A2=17A2=20A2=25A0-990.0550.0830.1000.1020.0970.083100-1990.0610.0930.1010.1000.0990.093200-2990.0510.0870.0970.0970.0950.087300-3990.0690.1080.1150.1190.1230.108400-4990.0790.1260.1480.1570.1550.126500+0.0640.1070.1360.1430.1400.107Overall0.0600.0940.1060.1080.1060.094Table4.11:ImprovementsinaveragedcorrelationcotsfortheB-factorpredictionofasetof364proteinsduetotheintroductionofanadditionalkernelparameterizedatalargescale(2).Twoe
xponentialkernelswith=25areemployed.Thekernel'sscalevalueissetto1=7:0Ainallcases.Thesecondkernel'sscalevalue(2)isvariedandlistedonthetopofthetable.Resultsareorganizedandsplitbythesizeofthestructuresbasedonthenumberofaminoacidsinordertoshowtheimpactoft2valuesontsizesofproteins.Toanalyzethescalebehaviorduetoproteinsize,weclassify364proteinsinto6groups.Theimprovementsofaveragedcorrelationcotsduetotheintroductionofanaddi-tionalkernelarelistedinTable4.11:foranumberoflargescalevalues2.First,theB-factorpredictionsfromvarioussizeclassesareantlybfromtheintroductionofthelarge-scalekernel.Additionally,atthescalevalueof2=17A,theaveragedcorrelationcotis0.648andtheassociatedimprovementtotheoriginalFRIorGNMmethodsis20%forthesetof364proteins.NotethatthismultiscaleimprovementcannotbeeasilyachievedbyGNM,NMA,oranyothermodedecompositionbasedmethods.Moreover,the68large-scalekernelleadstothemosttimprovementintheB-factorpredictionforrelativelylargeproteins,proteinswith400-499residues,whichindicatesthatlargeproteinshavemoretmultiscalecorrelationsthansmallproteinsdo.Finally,theimprove-mentintheB-factorpredictionforproteinswithmorethan500residuesisnotasmuchasthatforproteinswith400-499residues,whichindicatesthattwoscalesarenotenoughtocaptureallthemultiscalecorrelationsinproteinswithmorethan500residues.Thisobser-vationsuggeststhatthreekernelsormultikernelsareneededfortheB-factorpredictionofexcessivelylargeproteins.4.3.1.2parameterizationoftwo-kernelbasedmFRITofurtherunderstandthetwo-kernelbasedmFRImethod,weconsiderthecombinationoftwotypesofkernels.PrevioustestsofsinglekernelFRIindicatethattheLorentztypeandexponentialtypeofcorrelationkernelsarethetwomostaccuratesinglekerneltypes.Thisleadsustotrythecombinationofthesetwotypesofkernels.Thesamesetof364proteinsisemployedtotestourmethod.Tosimplifytheparametersearches,wesettheparameteroftheexponentialkernelto1.0andsetthe˛parameteroftheLorentzkernelto3.0,whicharetheoptimalvaluesfromsinglekerneltests.49OurresultsaredepictedinFig.4.15:.Asexpected,theaddi
tionofasecondcorrelationkernelresultsinanoverallincreaseinaccuracyforB-factorpredictions.ForsinglekernelFRI,theaveragecorrelationcotforB-factorpredictiononthesetof364structuresis0.626.Byswitchingtothetwo-kernelFRItheaveragedcorrelationcotforthissetincreasedupto0.663.TheimprovementintheB-factorpredictionaccuracyissixpercentoverprevioussinglekernelFRImethods.Thisaveragedcorrelationcoientisalsobetterthanthevalueof0.640achievedwithtwosharpresponseexponentialkernels(=25).69Figure4.15:Parametertestingforatwo-kernelbasedmFRImethod.Valuesforarevariedforeachkernel,bothLorentzkernels.Herevaluesforeitherkernelarelistedalongtheaxises.TheaveragedcorrelationcotforB-factorpredictiononasetof364proteinsisshownineachcellofthematrixandcolorcodedforconveniencewithredrepresentingthehighestcorrelationcotsandgreenthelowest.Obvious,thecombinationofarelativelysmall-scalekernelandarelativelylarge-scalekerneldeliversbestprediction,whichshowstheimportanceofincorporatingmultiscaleinproteintyanalysis.Figure4.15:indicatesthatthebestresultsareattainedeitherfromthecombination70ofarelativelysmall-scaleexponentialkernelandarelativelylarge-scaleLorentzkernel,orfromthecombinationofarelativelysmall-scaleLorentzkernelandarelativelylarge-scaleexponentialkernel.Thecombinationoftwosmall-scalekernelsorthecombinationoftwolarge-scalekernelsdoesnotmuchimprovementtotheoriginalsinglekernelFRImethod.Thisbehaviorprovesagaintheimportanceofincorporatingmultiscaleintheyanalysisofmacromolecules.4.3.1.3ThreeKernelbasedmFRIThelatestmultikernelFRImethodcombinesthreekernels.Aftersometestingwehavedecideduponusingonekernelofexponentialdecay(=1)andtwokernelsofLorentztype(˛=3)withtscale()parametervalues.Thechoiceofkernelsandparametersisdrivenbytheideathateachkernelshouldcaptureinteractionsoftranges,e.g.,short-,medium-andlong-rangeinteractionseachbeingrepresentedbyatkernel.Theexponentialkernelischosentorepresenttheslowestdecayingforceswith3=15Aand=1whilethetwoLorentztypeofkernelscapturerelativeshort-andmedium-rangeinteractionsw
ithparameters˛=3,1=3:0Aand2=7A,respectively.Theassociatedaveragedcorrelationcotforthe364testsetis0.689,whichisabout22%betterthanwhatobtainedbyusingtheGNMmethod.49Othercombinationsofkernelparametersweretriedinwhichtheexponentialkernelexhibitedthequickestdecay,however,theydidnotperformaswellinB-factorpredictiontests.ThefastdecayingLorentzkernel,1=3Aand˛=3,maybewellsuitedtocapturetheofchemicalbondsduetoitsparticularshapeofdecaywhichhighlyfavorsinteractionsbelow3.0A.4.3.2ComputationalcomplexityofmFRIIthasbeenpreviouslybeendemonstratedthatthecomputationalcomplexityofthesinglekernelFRImethodisasymptoticallyofO(N2).Bymakinguseofthecelllistsalgorithm,71fFRIachievesacomputationalcomplexityofO(N).TheadditionofmultiplekernelstotheFRImethoddoesnotthisaspectofscaling,however,therunningtimeforB-factorpredictiondoesincreasewitheachadditionalkernelslightly.Indeed,themulti-kernelregressionrequirestooptimizeonemoreparameterwiththeadditionofeachnewkernel.TheimpactofthesechangesontherunningtimeofFRI-basedB-factorpredictionisshowninFigure4.16:.Weemploythesamedatasetsandtestconditionsasthosedescribedinourearlierpaper49forthepresenttest.ThedatausedfortestingmFRIandfFRIarethesameasthoseusedintestingthefFRIinTableVIIIofRef.49IntestingtheGNM,thesamedatasetasthatlistedinTableVIIIofRef.49isemployed.72Figure4.16:ComputationalofmultikernelfastFRI(multifFRI)relativetosinglekernelfastFRI(fFRI)andGNM.ThedatasetsusedforthepresentncystudyarethesameasthoselistedinTableVIIIofRef.49ClearlytheimpactofextrakernelsdoesnottheessentiallylinearscalingoffFRIwithlinesofforfFRIandmultikernelfastFRI(multifFRI)beingt=7106N0:957andt=8106N0:959respectively.Theincreaseincomputationtimeisminorespeciallyformoleculeswithsmallernumbersofatoms.Incontrast,thelineoffortheGNMist=4108N3:09.49Notethateachincreaseinoneadditionalkernelleadstoonlyonemoreparameter,forwhichthetimeisnegligiblysmall.Onlyinextremecases,73withsystemsfarlargerthanthosecurrentlystudiedatomistically,mightsinglekernelFRIbepreferred.Therefore,itispreferable
to use multikernel based mFRI over single kernel FRI provided there is a significant increase in accuracy and reliability, as was demonstrated previously. Note that the largest test molecule is an HIV virus capsid, which has more than 313,236 amino acid residues. It would take the GNM more than 120 years to finish the prediction, even if computer memory were not a problem. In contrast, the proposed mFRI does the job in about 30 seconds or less on a single workstation, depending on the processing power.

4.4 Multiscale FRI applications

The improvement in the averaged correlation coefficient for B-factor prediction on a set of 364 proteins discussed in the last section obscures the fact that some structures show much larger improvements than just ten percent. In this section, we highlight some examples where mFRI is up to three times more accurate than GNM and single-kernel FRI. The proposed mFRI provides excellent B-factor predictions for many cases in which the previous single-scale theories and algorithms do not work at all.

We explore some of the advantages of using FRI for flexibility analysis by examining applications of the multikernel FRI method. First, we explore the improvement in representing hinges in protein structures that comes from using one-, two- and three-kernel methods. Then, we highlight some other instances of flexibility prediction where the multiple kernel approach is clearly superior to singly parameterized methods such as GNM and the original FRI formulation. In all cases, GNM is used to predict B-factors using the suggested cutoff parameterization of 7 Å. An attempt was made to find a more optimal cutoff parameter for GNM for each structure in this section, at the values of 5, 6 and 8 Å. If a significantly better parameterization was found, the results of B-factor prediction using that parameter are included in the figure and indicated by the name of the data series. GNM7 represents a GNM with a cutoff of 7 Å, and other parameterizations are named similarly with the value changed to the new parameter. For example, a cutoff parameterization of 5 Å would be labeled as GNM5. In all cases, the mFRI B-factor predictions have a higher correlation to the experimental data than any parameterization of GNM.

4.4.1 Fitting hinge regions

Protein hinge regions have been shown to be correlated with active sites and catalysis in enzymes. Flexibility plays a major role in the specificity of binding of
a protein to other proteins, nucleic acids or other molecules. An active site or docking region that is more flexible will accommodate more varied substrates or partners, while more rigid domains are more specific. Protein hinges are also found separating large domains of proteins. In this context, the hinges can be very important for protein conformational changes. The protein featured in this section, calmodulin, is a good example of a hinge that affects both structure and function. The central region of calmodulin shown in Figure 2.1 is a long α-helix which is unwound or kinked at the middle when no calcium is bound to the two distal metal coordinating domains. In both forms, with or without calcium bound, this helix retains a large degree of flexibility based on B-factor values from the PDB files (1CLL and 1CFD).

Many tools exist for the prediction and analysis of hinges in proteins using bioinformatics [26], graph theory [21, 39, 59] and energetics. [25] The proposed mFRI has capabilities similar to those of these tools. The mFRI can be used to predict hinge regions by finding regions of high FRI values or predicted B-values.

A comparison of various pfFRI methods and GNM for the B-factor prediction of calcium-bound calmodulin is displayed in Figure 4.17. B-factor prediction by single kernel FRI and GNM is unable to accurately predict the hinge region in the middle of the protein with any parameter. Two- and three-kernel based mFRI methods, on the other hand, are much more accurate in the hinge region. As more kernels are added, the accuracy can be seen to grow, but sufficient accuracy is achieved at three kernels.

Figure 4.17: Comparison of B-factor predictions of calmodulin (PDB ID: 1CLL) using the GNM (cutoff distance is 7 Å) and FRI methods. Experimental B-factors show a hinge region in the middle as shown in Figure 2.1. One-kernel FRI (FRI-1K) is parameterized at α = 3, η = 3.0 Å. Two-kernel FRI (FRI-2K) is parameterized at κ1 = 1, η1 = 3 Å, α2 = 3, and η2 = 10 Å. Three-kernel FRI (FRI-3K) is parameterized at α1 = 3, η1 = 3 Å, α2 = 3, η2 = 7 Å, κ3 = 1, and η3 = 15 Å. The three-kernel based mFRI delivers the best B-factor prediction for the hinge region.

4.4.2 Other proteins that benefit from mFRI

In this section we look at four specific cases to demonstrate why a multiscale approach is required to capture the complexity of interactions or correlations. In each case, we have used both three-kernel based mFRI and GNM to predict B-factors for the structures. When GNM performed poorly, different cutoff parameters were tried to see if there is a more ideal parameterization. The results of B-factor prediction are mapped onto the residues for visual comparison and also plotted against the experimental values for more detail.

4.4.2.1 Cyan fluorescent protein

Figure 4.18: Top, a visual comparison of experimental B-factors (left), FRI predicted B-factors (middle) and GNM predicted B-factors (right) for the engineered teal fluorescent protein, mTFP1 (PDB ID: 2HQK). Bottom, the experimental and predicted B-factor values plotted per residue. The GNM naming convention indicates the cutoff used for the GNM method in angstroms; for example, GNM7 is the GNM method with a cutoff of 7 Å.

Cyan fluorescent protein (CFP), shown in Figure 4.18, is a homolog of the famous green fluorescent protein (GFP). Isolated from the crystal jellyfish in the 1990s [60], GFP enabled a revolution in biochemistry by allowing the tagging and tracking of a wide range of molecules. CFP was found later in Anthozoa species, which have turned out to be a good source of fluorescent proteins with varied emission spectra. [47] In this example we examine the flexibility of an engineered CFP1 (PDB ID: 2HQK), mTFP1. It is clear in Figure 4.18 that GNM B-factor predictions contain a large error around residues 50-60, which is very pronounced at the recommended cutoff of 7 Å and is still somewhat problematic when the cutoff is changed to 8 Å. mFRI, on the other hand, has no issue with this particular region. Upon further inspection, it is clear that the region is the small, alpha-helical region suspended in the center of the beta-barrel. It is not surprising that the prediction for this sort of region would be highly cutoff dependent in a scheme such as GNM, which has hard cutoffs for connectivity. It would appear that this structure is dominated by short-range interactions, but the region of residues 50-60 is influenced to a large degree by mid-range interactions; therefore there are at least two important scales of interaction in this case. It follows then that mFRI, which has kernels to capture short- and mid-range interactions, would perform better than the GNM7 or GNM8 methods alone in B-factor predictions, which is exactly what we see from the results in Figure 4.18.

4.4.2.2 Antibiotic synthesis protein from Thermus thermophilus

Figure 4.19: Top, a visual comparison of experimental B-factors (left), FRI predicted B-factors (middle) and GNM predicted B-factors (right) for the probable antibiotic synthesis protein from Thermus thermophilus (PDB ID: 1V70). Bottom, the experimental and predicted B-factor values plotted per residue.

A similar situation exists with the structure 1V70, a probable antibiotic synthesis protein, which is shown in Figure 4.19. As in the last example, the problematic portion for B-factor prediction comes at the end of a protein chain. In this case there is an overestimation of flexibility for residues 1-10 when using GNM. Again, varying the cutoff parameter from the recommended 7 Å gives marginally better results; however, no parameterization is able to reach the accuracy of mFRI.

4.4.2.3 Ribosomal subunit L14

Figure 4.20: Top, a visual comparison of experimental B-factors (left), FRI predicted B-factors (middle) and GNM predicted B-factors (right) for the ribosomal protein L14 (PDB ID: 1WHI). Bottom, the experimental and predicted B-factor values plotted per residue.

The third example is a biologically important molecule, ribosomal subunit L14, a component of the 60S ribosomal subunit. [17] Depicted in Figure 4.20, L14 is a structurally diverse protein containing regions of alpha helix, beta-barrel, parallel beta strands and a beta-hairpin motif. The pattern of flexibility predicted by GNM for this structure is shown to be exaggerated, as the rigid areas are predicted to be more rigid than they actually are and vice versa. This pattern exists in most GNM results due to the use of a hard cutoff in the Kirchhoff matrix. Such a hard cutoff will inevitably lead to the overestimation of bond importance near the edge of the cutoff. Therefore, if a large number of interactions exist for a particular atom near the cutoff point, there is likely to be a large error in the estimation of flexibility for that atom. This is likely what is happening with the errors in the GNM calculations for the proteins in Figures 4.18, 4.19 and 4.20: the residues at the end of the chain may be near the edge of the cutoff distance for many interactions with the bulk of the protein. While adjusting GNM's cutoff distance may temper the error being introduced, it cannot eliminate it completely without a change to a soft decaying kernel method such as
FRI.

4.4.2.4 Marine snail toxin

Figure 4.21: Top, a visual comparison of atomic experimental B-factors (far left), Cα experimental B-factors (left), FRI predicted B-factors (right) and GNM predicted B-factors (far right) for the marine snail conotoxin (PDB ID: 1NOT). Bottom, the experimental and predicted B-factor values plotted per residue.

The final example is not a protein but a peptide molecule, a predatory marine snail toxin, shown in Figure 4.21. This peptide adopts a cyclical secondary structure which is made up of two connected loops created by two disulfide bonds. In this structure there happens to be a particular residue at the beginning of the chain which is much more flexible than the others. This is a difficult case for flexibility prediction, especially for coarse-grained predictions, as there may be side-chain interactions making large contributions to the flexibility of some atoms, and two disulfide bonds cross-link the peptide. Nevertheless, mFRI is able to accurately reproduce the high flexibility of the first residue. GNM, on the other hand, is unable to recreate the pattern of flexibility at any parameterization. This is again due to the use of a hard cutoff in the GNM method and the use of a single kernel. The differences in distances between residues in this structure are too subtle to be captured by a method that treats distance with a hard cutoff. The kernels used in FRI are sensitive enough to detect the differences in distances between atoms in this structure, which leads to the single stand-out residue.

4.5 FRI for protein-nucleic acid complexes

In this section, we parameterize and test the previously described mFRI on protein-nucleic acid structures. An immediate concern is whether the proposed mFRI is as efficient on protein-nucleic acid structures as it is on protein-only structures, as shown in a previous study.[74] The accuracy of the mFRI method is tested by the B-factor prediction of two sets of protein-nucleic acid structures: a set of 64 molecules used in a recent GNM study[78] and a set of 203 molecules for more accurate parameterization of mFRI.

4.5.1 Coarse-grained representations of protein-nucleic acid complexes

In this section, we consider flexibility analysis of protein-nucleic acid complexes. To this end, we need coarse-grained representations. We consider three coarse-grained representations of nucleic acids to be used in conjunction with the Cα-only representation used for proteins. These three models are identical to those used by Yang et al.[78] and are named M1, M2 and M3.

Figure 4.22: MCCs for the single kernel parameter test using the M1 (squares), M2 (circles) and M3 (triangles) representations. The Lorentz kernel with α = 3 is used. The parameter η is varied to find the maximum MCC on the test set of structures. The results for a set of 64 protein-nucleic acid structures (PDB IDs listed in Table 4.12) are shown on the left, while results for a separate set of 203 structures (PDB IDs listed in Table 4.13) are shown on the right for more general selections.

Figure 4.23: Illustration highlighting atoms used for coarse-grained representations in protein-nucleic acid complexes for FRI and GNM. In addition to protein Cα atoms, Model M1 considers the backbone P atoms for nucleotides. Model M2 includes M1 atoms and adds the sugar O4' atoms for nucleotides. Model M3 includes M1 atoms and adds the sugar C4' atoms and the base C2 atoms for nucleotides.

Model M1 consists of the backbone P atoms and protein Cα atoms. Model M2 contains the same atoms as M1 but also includes the sugar O4' atoms, located on the ribose portion of the nucleotide. Model M3 includes the atoms from M1 and adds the sugar C4' atoms and the C2 atoms, a carbon from the base portion of the nucleotide; see Fig. 4.23. Model M1 is similar to the protein Cα representation because both are backbone-only representations. The P atoms in M1 are 6 bonds apart, while Cα atoms are 3 bonds apart. As pointed out by Yang et al.,[78] nucleotides are approximately three times more massive than amino acids, so model M3, with three nodes per nucleotide, is consistent in this sense with using Cα atoms for the protein representation.

4.5.2 mFRI B-factor predictions for protein-nucleic acid structures

To parameterize and test the accuracy of multikernel fFRI on protein-nucleic acid structures, we use a dataset from Yang et al.[78] containing 64 structures. In addition, we construct a larger database of 203 high resolution structures. This expanded protein-nucleic acid structure set was obtained by searching the Protein Data Bank (PDB) for structures that contain both protein and DNA and that have an X-ray resolution between 0.0 and 1.75 Å. All PDB files are processed by removing low occupancy atomic coordinates for structures having residues with multiple possible coordinates. The PDB IDs of the 64 and 203 structures can be found in Table 4.12 and Table 4.13, respectively.

To quantitatively assess the performance of the proposed multikernel FRI method, we consider the correlation coefficient (CC)

\[
\mathrm{CC} = \frac{\sum_{i=1}^{N} \left(B^{e}_{i} - \bar{B}^{e}\right)\left(B^{t}_{i} - \bar{B}^{t}\right)}{\left[\sum_{i=1}^{N} \left(B^{e}_{i} - \bar{B}^{e}\right)^{2} \sum_{i=1}^{N} \left(B^{t}_{i} - \bar{B}^{t}\right)^{2}\right]^{1/2}}, \qquad (4.7)
\]

where $\{B^{t}_{i}, i = 1, 2, \ldots, N\}$ is the set of B-factors predicted by the proposed method and $\{B^{e}_{i}, i = 1, 2, \ldots, N\}$ is the set of experimental B-factors read from the PDB file. Here $\bar{B}^{t}$ and $\bar{B}^{e}$ are the statistical averages of the theoretical and experimental B-factors, respectively.

4.5.2.1 Multikernel FRI testing on protein-nucleic acid structures

Previous tests of single kernel FRI indicate that the Lorentz type and exponential type correlation kernels are the two most accurate kernel types. This leads us to try the combination of these two types of kernels. The resulting multikernel FRI method requires four parameters, namely, κ and η for the exponential kernel and α and η for the Lorentz kernel.

4.5.2.2 Single kernel FRI testing

In order to compare FRI and GNM methods for protein-nucleic acid structures, we test our single kernel FRI over a range of η values. For this test we use the Lorentz kernel with α = 3 for B-factor prediction on both structure sets and all three representations (M1, M2 and M3). The results are shown in Figure 4.22. For the 64 structure set, single kernel FRI has a maximum mean correlation coefficient (MCC) to experimental B-factors of 0.620, 0.612 and 0.555 for the M1, M2 and M3 representations, respectively. Comparatively, GNM had an MCC of approximately 0.59, 0.58 and 0.55 for M1, M2 and M3 on the same data set.[78] The maximum MCCs for FRI on the larger data set for M1, M2 and M3 are 0.613, 0.625 and 0.586, respectively. The M1 and M2 representations perform better than the M3 representation.

Figure 4.24: Mean correlation coefficients (MCCs) for two-kernel FRI models on a set of 203 protein-nucleic acid structures. From left to right, MCC values are shown for the M1, M2 and M3 representations. We use one Lorentz kernel with α = 3.0 and one exponential kernel with κ = 1.0. The values of the parameter η for both kernels are varied from 2 to 20 Å.

4.5.2.3 Parameter-free multikernel FRI

As with protein-only structures, we develop multikernel FRIs to improve the accuracy of prediction on protein-nucleic acid structures. In order to simplify the FRI method, we try to develop an accurate parameter-free version of a two-kernel mFRI. We use a combination of one Lorentz and one exponential kernel. The values of the parameters α and κ are set to 3.0 and 1.0, respectively, based on the results of previous FRI studies.[49] The optimal values for η in both kernels are determined by testing a range of possible values from 2 to 20 Å. All three representations (M1, M2 and M3) described previously are considered. The results of these tests on the set of 203 protein-nucleic acid structures are shown in Figure 4.24. As expected, the addition of another kernel results in an overall increase in accuracy for the 203 complex set. For two-kernel mFRI, the MCCs increase up to 0.68 for M1, 0.67 for M2 and 0.63 for M3. The choice of η turns out to be very robust, based on the results shown in Figure 4.24.

Table 4.12: Correlation coefficients (CCs) between predicted and experimental B-factors for the set of 64 protein-nucleic acid structures.[78] Here N1, N2 and N3 represent the number of atoms used for the M1, M2 or M3 representations of each structure. We use the parameter-free two-kernel mFRI model with one exponential kernel (κ = 1 and η = 18 Å) and one Lorentz kernel (α = 3, η = 18 Å). PDB IDs marked with an asterisk (*) indicate structures containing only nucleic acid residues.

              M1              M2              M3
PDB ID        CC      N1      CC      N2      CC      N3
1asy          0.647   1114    0.645   1248    0.631   1382
1b23          0.751   471     0.774   537     0.714   603
1c0a          0.763   653     0.704   721     0.598   789
1CX0          0.821   162     0.763   234     0.627   306
1drz          0.846   162     0.754   234     0.585   306
1efw          0.537   1286    0.647   1412    0.660   1538
1egk*         0.273   104     0.298   212     0.267   320
1ehz*         0.623   62      0.706   124     0.722   186
1evv*         0.710   62      0.769   124     0.770   186
1f7u          0.577   670     0.588   734     0.603   798
(ID missing)  0.759   6482    0.793   9310    0.809   12138
(ID missing)  0.520   991     0.549   1066    0.568   1141
1fg0*         0.720   498     0.723   996     0.721   1494
(ID missing)  0.687   61      0.576   122     0.439   183
1fjg          0.461   3915    0.585   5428    0.600   6941
1gid*         0.649   316     0.643   632     0.583   948
1gtr          0.724   603     0.747   677     0.645   751
1h3e          0.717   507     0.724   586     0.645   663
1h4s          0.671   1011    0.704   1076    0.626   1141
1hr2*         0.599   313     0.589   628     0.585   943
1i94          0.489   3923    0.615   5437    0.652   6951
1i9v*         0.615   73      0.631   147     0.642   220
1j1u          0.730   372     0.671   446     0.456   520
1j2b          0.686   1300    0.712   1448    0.672   1596
1j5a          0.532   3158    0.548   5932    0.510   8706
1j5e          0.427   3909    0.546   5422    0.553   6935
1jj2          0.799   6567    0.839   9443    0.836   12319
1jzx          0.586   3158    0.600   5932    0.561   8706
1l8v*         0.700   312     0.688   626     0.672   940
1l9a          0.849   211     0.789   336     0.675   461
1lng          0.780   183     0.595   280     0.405   377
1m5k          0.904   402     0.841   622     0.760   842
1m5o          0.921   405     0.872   629     0.810   853
1mfq          0.773   341     0.688   468     0.543   595
1mms          0.507   317     0.548   433     0.646   549
1n32          0.388   3916    0.494   5447    0.517   6978
1nbs*         0.547   270     0.566   540     0.573   810
1o0c          0.766   602     0.758   676     0.636   750
1qf6          0.608   710     0.578   779     0.540   848
1qrs          0.671   603     0.672   677     0.586   751
1qtq          0.620   602     0.640   676     0.596   750
1qu2          0.520   991     0.549   1066    0.568   1141
1qu3          0.579   954     0.599   1029    0.613   1104
1rc7          0.599   256     0.566   296     0.470   336
1s72          0.823   6636    0.839   9507    0.831   12378
1ser          0.748   855     0.743   917     0.657   978
1sj3          0.880   167     0.805   240     0.614   313
1tn2*         0.686   62      0.712   124     0.676   186
1tra*         0.624   62      0.670   124     0.660   186
1ttt          0.578   1401    0.564   1587    0.515   1773
1u0b          0.757   535     0.754   609     0.621   683
1u6b          0.476   312     0.490   531     0.506   750
1u9s*         0.446   155     0.432   310     0.419   465
1vby          0.877   167     0.792   240     0.587   313
1vc0          0.878   167     0.804   240     0.611   313
1vc5          0.861   164     0.840   234     0.685   304
1y0q*         0.491   230     0.484   463     0.472   696
1y26*         0.677   70      0.697   141     0.709   212
1yfg*         0.565   64      0.600   128     0.623   192
1yhq          0.835   6636    0.840   9507    0.831   12378
1yij          0.836   6636    0.851   9507    0.842   12378
2tra*         0.614   65      0.614   130     0.613   195
3tra*         0.645   64      0.615   128     0.620   192
4tra*         0.679   62      0.715   124     0.694   186

Table 4.13: The PDB IDs of the 203 high resolution protein-nucleic acid structures used in our single-kernel FRI parameter test. IDs marked with an asterisk indicate those containing only nucleic acid residues.

1A1H 1A1I 1AAY 1AZP 1BF4 1C8C 1D02 1D2I 1DC1 1DFM
1DP7 1DSZ 1EGW 1EON 1F0V 1FIU 1H6F 1I3W 1JK2 1JX4
1K3W 1K3X 1L1Z 1L3L 1L3S 1L3T 1L3V 1LLM 1MNN 1NJX
1NK0 1NK4 1OJ8 1ORN 1PFE 1QUM 1R2Z 1RFF 1RH6 1SX5
1T9I 1U4B 1VTG 1WTO 1WTQ 1WTV 1XJV 1XVK 1XVN 1XVR
1XYI 1ZS4 2ADW 2AXY 2BCQ 2BCR 2BOP 2C62 2C7P 2EA0
2ETW 2EUW 2EUX 2EUZ 2EVF 2EVG 2FMP 2GB7 2HAX 2HEO
2HHV 2IBT 2IH2 2ITL 2NQ9 2O4A 2OAA 2ODI 2P2R 2PY5
2Q10 2R1J 2VLA 2VOA 2WBS 2XHI 2Z70 2ZKD 3BIE 3BKZ
3BM3 3BS1 3D2W 3EY1 3EYI 3FC3 3FDE 3FDQ 3FSI 3FYL
3G00 3G9M 3G9O 3G9P 3GO3 3GOX 3GPU 3GQ4 3HPO 3HT3
3HTS 3I0W 3I2O 3I3M 3I49 3I8D 3IGK 3JR5 3JX7 3JXB
3JXY 3JXZ 3KDE 3KXT 3M4A 3MR3 3MXM 3NDH 3O1M 3O1P
3O1S 3O1T 3O1U 3OQG 3PV8 3PVI 3PX0 3PX4 3PX6 3PY8
3QEX 3RKQ 3RZG 3S57 3S5A 3SAU 3SJM 3TAN 3TAP 3TAQ
3TAR 3THV 3TI0 3U6E 3U6P 3V9W 3ZDA 3ZDB 3ZDC 3ZDD
4A75 4B21 4B9S 4DFK 4DQI 4DQP 4DQQ 4DS4 4DS5 4DSE
4DSF 4E0D 4ECQ 4ECV 4ECX 4ED0 4ED2 4ED7 4ED8 4EZ6
4F1H 4F2R 4F2S 4F3O 4F4K 4F8R 4FPV 4GZ1 4GZN 4HC9
4HIK 4HIM 4HLY 4HTU 4HUE 4HUF 4HUG 4IBU 4IX7 4KLG
4KLI 4KLM 4KMF

Table 4.14: MCCs of the Gaussian network model (GNM),[78] single kernel flexibility-rigidity index (FRI) and two-kernel mFRI for three coarse-grained representations (M1, M2 and M3). A set of 64 protein-nucleic acid structures[78] is used.

Representation  GNM[78]  FRI     Two-kernel mFRI
M1              0.59     0.620   0.666
M2              0.58     0.612   0.668
M3              0.55     0.555   0.620

We have also carried out a similar test of two-kernel mFRI (α = 3.0 and κ = 1.0) for the set of 64 protein-nucleic acid structures. Note that this set has many large complexes. The MCCs for the M1, M2 and M3 models are 0.668, 0.666 and 0.620, respectively, which are similar to what we have found for the set of 203 structures. The set of 64 structures includes 19 structures composed of nucleic acids and no amino acids. The MCCs for this nucleic acid-only subset are 0.608, 0.617 and 0.603 for the M1, M2 and M3 models. The correlation coefficients for all 64 individual molecular complexes are listed in Table 4.12.

To summarize the performance of the Gaussian network model, single kernel FRI and two-kernel mFRI, we list their MCCs for the 64 protein-nucleic acid structures in Table 4.14. It can be seen that FRI outperforms GNM in all three representations, and that two-kernel mFRI significantly improves the accuracy of our method further, achieving up to 15% improvement compared with GNM.[78] Based on our earlier test,[50] we believe that our three-kernel mFRI can deliver a better prediction.

4.6 Protein-nucleic acid structure applications

In this section we explore the applications of the mFRI and aFRI methods to large protein-nucleic acid complexes. We highlight a few particular examples where mFRI improves upon previous FRI methods, in particular for the flexibility prediction of ribosomes. Further, we show how aFRI is well suited for the study of the dynamics of large macromolecular complexes, using the bacterial RNA polymerase active site as an example.

4.6.1 mFRI flexibility prediction for ribosomes

Some of the largest and most biologically important structures that contain both protein and nucleic acids are ribosomes. Ribosomes are the protein synthesizers of the cell and connect amino acids into polymer chains. In ribosomes, proteins and RNA interact through intermolecular effects such as electrostatic interactions, hydrogen bonding, hydrophobic interactions, base stacking and base pairing. RNA tertiary structures can significantly influence protein-RNA interactions. Ribosomes are primarily composed of RNA with many smaller associated proteins, as shown in Fig. 4.25. The top of Fig. 4.25 shows the 50S subunit of the ribosome (PDB ID: 1YIJ) with the nucleic acids in a smooth surface representation and the bound protein subunits shown in a secondary structure representation.

The set of 64 structures used in our tests contains a number of ribosomal subunits. Due to their multiscale nature, these structures also happen to be among those that benefit the most from using multikernel FRI over single kernel FRI or GNM. For example, in the case of the ribosome 50S subunit structure (PDB ID: 1YIJ), B-factor prediction with three-kernel FRI yields a CC value of 0.85, while that of single kernel FRI is only around 0.3. GNM does not provide a good B-factor prediction for this structure either. The three-kernel mFRI model we used consists of one exponential kernel (κ = 1 and η = 15 Å) and two Lorentz kernels (α = 3, η = 3 Å and α = 3, η = 7 Å). The comparison between mFRI-predicted and experimental B-factors for the ribosome 50S subunit structure is shown in Fig. 4.25.

By using the fitting coefficients from the above 50S subunit (1YIJ) flexibility analysis, we have obtained flexibility predictions for the entire ribosome (PDB ID: 4V4J) as well as many protein subunits and other RNAs that associate with it; see Fig. 4.25. To avoid confusion, the B-factors for 4V4J are uniquely determined by using not only the same three-kernel mFRI model as in the case of 1YIJ, but also its fitting parameters (a1, a2, a3 and b). Again, the FRI values are mapped by color to the smooth surface of the nucleic acids; however, in these bottom figures the protein subunits are omitted to draw attention instead to the various types of RNA involved in this structure.

Figure 4.25: (a) Complete ribosome with bound tRNAs (yellow (A site) and green (P site)) and mRNA Shine-Dalgarno sequence (orange), PDB ID: 4V4J. The same coefficients and parameters from the mFRI model of 1YIJ are used. (b) A comparison of predicted and experimental B-factor data for the ribosome 50S subunit, PDB ID: 1YIJ. (c) Ribosome 50S subunit, PDB ID: 1YIJ. Nucleic acids are shown as a smooth surface colored by FRI flexibility values (red for more flexible regions), while bound protein subunits are colored randomly and shown in a secondary structure representation. We achieve a CC value of up to 0.85 using the parameter-free three-kernel mFRI model with one exponential kernel (κ = 1 and η = 15 Å) and two Lorentz kernels (α = 3, η = 3 Å and α = 3, η = 7 Å).

4.6.2 aFRI conformational motion prediction on an RNA polymerase structure

RNA polymerase is one of the essential enzymes for all life on Earth as we know it today and possibly from the very beginning of life.[13,35] Despite this importance, the mechanisms for many of the polymerase's functions are still not well understood on the atomic level. Considerable effort has been spent, both experimentally and computationally, to understand RNA polymerase function in more detail, but many questions remain. The study of RNA polymerase experimentally or computationally is difficult and often expensive due to the size of the system and the variety of molecules involved. The minimal required elements for a bacterial or eukaryotic RNA polymerase include multiple protein subunits, a double stranded DNA molecule, a single stranded RNA molecule, free nucleotides, various ions (Mg2+, Zn2+, Na+, etc.) and solvent. A typical setup for this system in all-atom molecular dynamics (MD) includes 300,000 atoms when solvated. With this number of atoms and current computer power, it is often not feasible to simulate these molecules on biologically relevant time scales using MD.

Perhaps the most popular tool for studying long time dynamics of biomolecules is normal mode analysis (NMA) and its related methods such as the anisotropic network model (ANM). These methods have been successfully used to study the dynamics of many proteins; however, at their maximum accuracy, their computational complexity is of O(N^3), where N is the number of atoms. This is a problem because many cellular functions involve a large number of macromolecules with many thousands to millions of residues to consider. Therefore, future computational studies of biomolecules beyond the protein scale will require methods
with better scaling properties such as FRI and aFRI.

In this example, we use completely local anisotropic FRI to examine correlated motions in regions near the active site of bacterial RNA polymerase, including the bridge helix, trigger loop and nucleic acid chains. We examine the relationship between these components' motions and their contributions to critical functions such as catalysis and translocation. We use the anisotropic rigidity form in Section 3.5 with the Lorentz kernel (α = 2 and η = 3 Å).

Figure 4.26: The RNAP local aFRI first mode for the bridge helix, trigger loop and nucleic acids from both open (PDB ID: 2PPB) and closed (PDB ID: 2O5J) configurations. (a) RNA polymerase with closed trigger loop. (b) Correlated motion near the active site. (c) aFRI mode 1, open trigger loop (TL). (d) aFRI mode 1, closed TL. Arrows represent the direction and relative magnitude of atomic fluctuations. Arrows for the bridge helix, trigger loop and nucleic acids are pictured as blue, white and yellow, respectively.

Figure 4.26a is a representation of RNA polymerase (PDB ID: 2PPB) that shows these important features, which are buried in the core of the largest protein subunits, β and β′. The bridge helix and trigger loop, shown in green and blue respectively, are parts of the protein that have been implicated in most of the essential functions of the polymerase. Mutational studies of these regions result in modulation of the polymerase speed and accuracy, both positively and negatively, indicating the regions are important for normal functioning of the enzyme. How these regions aid these functions and how they interact remains an open question.

With this demonstration of local aFRI analysis we hope to shed some light on how these essential parts of RNA polymerase work together. Local aFRI, as described in earlier work, is much less computationally costly than global aFRI or NMA and has been shown to give qualitatively similar results for small to large single proteins. To further validate the local aFRI method, we compare the conclusions from a local aFRI study of RNAP to those of NMA based studies. The RNA polymerase elongation complex is a relatively large system, but it is still tenable for NMA methods. NMA has been applied to both bacterial and eukaryotic RNA polymerase in the past,[23,69] which provides us with a point of comparison for our results.

Local aFRI produces three modes of motion, sorted from lowest to highest vibration frequency according to eigenvalue, as in NMA. In Figure 4.26 we present findings from the lowest frequency mode, effectively focusing on the most dominant motion of each conformation. Two major conformations of RNA polymerase are considered, those with open and closed trigger loop regions (Figures 4.26c and 4.26d). A closed trigger loop is one that is completely folded into two parallel alpha helices, while an open trigger loop has a region of disordered loop between two shorter helices and is slightly bent away from the bridge helix. The closing or folding of the trigger loop into the closed conformation is assumed to follow binding of an NTP in the active site and to precede catalysis. After catalysis, it is suspected that the trigger loop opens or unfolds to facilitate translocation and permit new NTPs to enter the active site.

The results of aFRI analysis on the effect of trigger loop closing reveal a distinct change in correlated motions between the open and closed trigger loop conformations. These changes involve interactions between the bridge helix, the trigger loop and the nucleic acid regions. In Figure 4.26b, regions of high correlation are color coded, which reveals that the bridge helix is composed of two highly self-correlated portions, suggesting the presence of a hinge in the bridge helix. In fact, the central portion of the bridge helix has been observed as a kinked or bent helix in a yeast RNAP structure.[72] Additionally, it is observed that a portion of the bridge helix and the N-terminal helix of the trigger loop are highly correlated in the closed trigger loop structure only. This set of two helices is situated directly next to the active site and could provide stability to aid catalysis after trigger loop closing.

Additionally, the correlation between nucleic acids and protein shows marked differences between the open trigger loop and closed trigger loop structures. The motions indicated in Figures 4.26c and 4.26d show that the open trigger loop structure is primed to translocate, based on the direction of highly correlated motions of the upstream and downstream nucleic acids. By contrast, the closed trigger loop nucleic acid motions are considerably less correlated and not in the direction of translocation. This is the expected relationship, as it matches the results from previous biological and NMA studies of RNA polymerase.[23]

These differences between the closed trigger loop and open trigger loop structures reveal potentially important structural changes that arise as the RNA polymerase switches between open and closed trigger loop conformations during the transition between translocation and catalysis. Specifically, the results for the closed trigger loop conformation suggest the presence of a stabilized catalytic area which is made of the N-terminal helix of the trigger loop and the bridge helix. The results for the open trigger loop conformation show no such coordination of the active site helices and instead indicate a less defined hinge and coordinated motion in the direction of translocation. Taken together, these results provide a potential explanation for how trigger loop opening and closing are correlated with translocation and catalysis, respectively.

4.7 Generalized GNM, multiscale GNM and multiscale ANM methods

4.7.1 Generalized Gaussian network model

4.7.1.1 Comparison between gGNM and FRI

Based on the analysis in Section 3.6, it is straightforward to construct correlation function-based gGNMs via the matrix inverse of the generalized Kirchhoff matrix (3.46), which leads to many new gGNMs, including the original GNM as a special limiting case. Also, it is possible to construct an FRI method using the Kirchhoff matrix of GNM. In light of these observations it is necessary to directly compare the performance of the related methods and to explore whether there is any further relationship between these two approaches, specifically between the diagonal elements of the gGNM matrix inverse and the direct inverse of the diagonal elements of a generalized Kirchhoff matrix.

To address this question, we select two representative correlation functions, the Lorentz (α = 3) and ILF functions, to construct the generalized Kirchhoff matrix (3.46). The Lorentz function is a frequently used correlation function in our earlier work.[49] In contrast, the ILF function, while typical in GNM, is an extreme case of an FRI correlation function not previously considered in work on FRI. The resulting two generalized Kirchhoff matrices (3.46) can be used for calculating either the gGNM matrix inverse or the inverse diagonal elements of the FRI matrix. This results in four possible combinations of methods, namely, FRI-Lorentz, FRI-ILF, GNM-Lorentz and GNM-ILF.

To test the methods described above, we analyze the flexibility of a protein from the pathogenic fungus Candida albicans (Protein Data Bank ID: 2Y7L) with 319 residues, as shown in Fig. 4.27(a). We consider the coarse-grained Cα representation of protein 2Y7L. We denote by B_GNM-ILF, B_FRI-ILF, B_GNM-Lorentz and B_FRI-Lorentz the B-factors predicted by the GNM-ILF, FRI-ILF, GNM-Lorentz and FRI-Lorentz methods, respectively. The experimental B-factors from X-ray diffraction, B_Exp, are also displayed for comparison. The Pearson product-moment correlation coefficient (PCC) is used to measure the strength of the linear relationship or dependence between each pair of sets of predicted or experimental B-factors. Since the performance of these methods depends on their parameters, namely the cutoff distance (rc) in the ILF and the scale value (η) in the Lorentz function, the theoretical B-factors are computed over a wide range of rc and η values to find the parameters that work best for each method.

Figure 4.28 depicts PCCs between various sets of B-factors for protein 2Y7L. As shown in Fig. 4.28(a), the cutoff distance rc of the ILF is varied from 5 Å to 64 Å. The PCCs between B_GNM-ILF and B_Exp, and between B_FRI-ILF and B_Exp, indicate that both GNM-ILF and FRI-ILF are able to provide accurate predictions of flexibility compared to the experimental B-factors. The best predictions are attained around rc = 24 Å, which is significantly larger than the commonly used GNM cutoff distance of 7-9 Å.

4.7.1.2 Intrinsic behavior of gGNM at large cutoff distance

It is interesting to observe that GNM-ILF and FRI-ILF provide essentially identical predictions when the cutoff distance is equal to or larger than 20 Å. This phenomenon indicates that when the cutoff is sufficiently large, the diagonal elements of the gGNM inverse matrix and the direct inverse of the diagonal elements of the FRI correlation matrix become linearly dependent. To examine the relation between GNM-ILF and FRI-ILF, we compute PCCs between B_GNM-ILF and B_FRI-ILF over the same range of cutoff distances. As shown in Fig. 4.28(a), there is a strong linear dependence between B_GNM-ILF and B_FRI-ILF for rc of roughly 10 Å or more.

Figure 4.27: Illustration of protein 2Y7L. (a) Structure of protein 2Y7L having two domains; (b) Correlation map generated by using GNM-Lorentz indicating two domains; (c) Comparison of experimental B-factors and those predicted by GNM-Lorentz (η = 16 Å); (d) Comparison of experimental B-factors and those predicted by FRI-ILF (rc = 24 Å).

Figure 4.28: PCCs between various B-factors for protein 2Y7L. (a) Correlations between B_GNM-ILF and B_Exp
, between B_FRI-ILF and B_Exp, and between B_GNM-ILF and B_FRI-ILF; (b) Correlations between B_GNM-Lorentz and B_Exp, between B_FRI-Lorentz and B_Exp, and between B_GNM-Lorentz and B_FRI-Lorentz.

To understand this dependence at large cutoff distance, we consider an extreme case in which the cutoff distance is equal to or even larger than the protein size, so that all the particles within the network are fully connected. In this situation, we can analytically calculate the ith diagonal element of the GNM inverse matrix

\[
\left(\Gamma^{-1}\left(\Phi(r_{ij}; r_c \to \infty)\right)\right)_{ii} = \frac{N-1}{N^{2}}, \qquad (4.8)
\]

and the FRI inverse of the ith diagonal element

\[
\frac{1}{\sum_{j, j \neq i}^{N} \Phi(r_{ij}; r_c \to \infty)} = \frac{1}{N-1}. \qquad (4.9)
\]

These results explain the strong asymptotic correlation between B_GNM-ILF and B_FRI-ILF in Fig. 4.28(a). They also explain why the predictions of the original GNM and FRI-ILF deteriorate when rc is sufficiently large: all the predicted B-factors become identical, either $(N-1)/N^{2}$ or $1/(N-1)$. The two methods deliver very similar results, especially when the total number of particles is very large, as we have $\frac{(N-1)/N^{2}}{1/(N-1)} \to 1$ when $N \to \infty$.

The performance of GNM-Lorentz and FRI-Lorentz is compared in Fig. 4.28(b), where the scale parameter η ranges from 0.5 Å to 64 Å. It is seen from these results that the GNM-Lorentz method is a successful new approach. In fact, it outperforms the original GNM. A comparison of the predicted B-factors and the experimental B-factors is plotted in Figs. 4.27(c) and 4.27(d) for GNM-Lorentz and FRI-ILF, respectively. It is seen that B_FRI-ILF more closely matches the experimental B-factors than B_GNM-Lorentz does, due to the different schemes employed by the two methods as shown in Eqs. (3.36) and (3.38), respectively.

As shown in Fig. 4.28(b), the predictions from GNM-Lorentz and FRI-Lorentz become identical once η reaches about 5 Å. A strong correlation between B_GNM-Lorentz and B_FRI-Lorentz is revealed at an even smaller scale value. This behavior leads to a general relation

\[
\left(\Gamma^{-1}\left(\Phi(r_{ij}; \eta)\right)\right)_{ii} \to \frac{c}{\sum_{j, j \neq i}^{N} \Phi(r_{ij}; \eta)}, \qquad \eta \to \infty, \qquad (4.10)
\]

where c is a constant. Relation (4.10) means that the correlation function based gGNM is equivalent to the FRI for a given admissible correlation function when the scale parameter is sufficiently large. This relation is certainly true for the ILF, as analytically proved in Eqs. (4.8) and (4.9). Relation (4.10) is a very interesting and powerful result, not only for the sake of understanding the GNM and FRI methods and their relationship, but also for the design of accurate and efficient new methods.
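The fully connected limit behind Eqs. (4.8) and (4.9) can be checked numerically. The sketch below is an illustration rather than code from this work. It assumes an ILF kernel whose cutoff exceeds the size of the whole structure, so that every off-diagonal entry of the Kirchhoff matrix is -1, and the helper names (fully_connected_kirchhoff, gnm_inverse_diagonal, fri_inverse_rigidity) are hypothetical. The GNM side sums over the eigenmodes with nonzero eigenvalue, while the FRI side directly inverts the row sum of the correlation function.

```python
import numpy as np


def fully_connected_kirchhoff(n):
    """Kirchhoff matrix for n nodes when the ILF cutoff exceeds the protein size.

    Every pair is in contact, so off-diagonal entries are -1 and each
    diagonal entry equals the contact count n - 1.
    """
    return n * np.eye(n) - np.ones((n, n))


def gnm_inverse_diagonal(gamma, tol=1e-8):
    """Diagonal of the pseudo-inverse, summed over modes with nonzero eigenvalue."""
    evals, evecs = np.linalg.eigh(gamma)
    keep = evals > tol  # drop the trivial zero mode
    return np.sum(evecs[:, keep] ** 2 / evals[keep], axis=1)


def fri_inverse_rigidity(gamma):
    """Inverse rigidity index: 1 / sum_{j != i} Phi(r_ij), with Phi(r_ij) = -Gamma_ij."""
    off_diagonal_sum = -(gamma.sum(axis=1) - np.diag(gamma))
    return 1.0 / off_diagonal_sum


if __name__ == "__main__":
    for n in (10, 100, 1000):
        gamma = fully_connected_kirchhoff(n)
        gnm = gnm_inverse_diagonal(gamma)[0]
        fri = fri_inverse_rigidity(gamma)[0]
        # Columns: N, GNM diagonal, FRI inverse, and their ratio
        print(n, gnm, fri, gnm / fri)
```

In this limit every node receives the same value from each method, so the predicted B-factors carry no per-residue information, and the ratio of the two values is ((N-1)/N)^2, which approaches 1 as N grows.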
It should be noticed that our findings are consistent with the previous finding[54] that the local packing density described by the direct inverse of the diagonal terms represents only the leading order, but not the entire set, of the dynamics described by gGNM. Our results reveal an interesting connection between FRI and gGNM when the characteristic distance is sufficiently large.

Figure 4.29: PCCs between various B-factors averaged over 364 proteins. (a) Correlations between B_GNM-ILF and B_Exp, between B_FRI-ILF and B_Exp, and between B_GNM-ILF and B_FRI-ILF; (b) Correlations between B_GNM-Lorentz and B_Exp, between B_FRI-Lorentz and B_Exp, and between B_GNM-Lorentz and B_FRI-Lorentz.

4.7.1.3 Validation of gGNM with extensive experimental data

It remains to be proven that the above findings from a single protein are translatable and verifiable on a large class of biomolecules. To this end, we consider a set of 364 proteins, a subset of the 365 proteins utilized and documented in our earlier work.[49] The omitted protein is 1AGN, which has been found to have unrealistic experimental B-factors. We carry out systematic studies of the four methods over a range of cutoff distances or scale values. For each given rc or η, the PCCs between two sets of B-factors are averaged over the 364 proteins.

Figure 4.29 illustrates our results. Figure 4.29(a) plots the results of the ILF implemented in both the GNM and FRI methods, with the cutoff distance varied from 4 Å to 23 Å. Figure 4.29(b) depicts similar results obtained by using the Lorentz function implemented in the two methods. The scale value η is varied over the range of 0.5 Å to 10 Å.

First, it is evident that the proposed new method, GNM-Lorentz, is very accurate for the B-factor prediction of the 364 proteins, as shown in Fig. 4.29(b). The best GNM-Lorentz prediction is about 10.7% better than that of the original GNM shown in Fig. 4.29(a). In fact, GNM-Lorentz outperforms the original GNM over a wide range of parameters for this set of proteins, which indicates that the proposed generalization is valuable. Similarly, FRI-Lorentz is also about 10% more accurate than FRI-ILF in B-factor prediction. Since the ILF is a special case and there are infinitely many FRI correlation functions, there is a wide variety of correlation function based gGNMs that are expected to deliver more accurate flexibility analysis than the original GNM does.

Additionally, the FRI-Lorentz method attains the best average prediction for the 364 proteins among the four methods, as shown in the zoomed-in parts of Fig. 4.29(b). However, for a given correlation function, the difference between FRI and gGNM predictions is very small. Moreover, for a given admissible FRI function, gGNM and FRI B-factor predictions are strongly linearly correlated and reach near 100% correlation when rc > 9 Å or η > 0.5 Å for the 364 proteins, as demonstrated in Fig. 4.29. This offers a solid confirmation of Eq. (4.10). Therefore, correlation function based gGNMs, including the original GNM as a special case, are indeed equivalent to the corresponding FRI methods in flexibility analysis for a wide range of commonly used scale values.

Furthermore, it has been shown that the fast FRI is a linear scaling method,[49] while gGNM scales as O(N^3) due to its matrix inverse procedure. As a result, the accumulated CPU times for the B-factor predictions of the 364 proteins at rc = 7 Å or η = 3 Å are 0.88, 1.57, 5071.32 and 4934.79 seconds for FRI-ILF, FRI-Lorentz, GNM-ILF and GNM-Lorentz, respectively. The test is performed on a cluster with 8 Intel Xeon 2.50 GHz CPUs and 128 GB memory. gGNM methods are very fast for small proteins, and most of the accumulated gGNM CPU time is due to the computation of the three largest proteins (1F8R, 1H6V and 1QKI) in the test set.

Figure 4.30: The average PCCs over 362 proteins for Type-1 mGNM. (a) Two ILF kernels; their cutoff distances are systematically changed from 5 Å to 31 Å. (b) Two exponential kernels; their scales are systematically varied in the range of [1 Å, 26 Å].

Finally, it is worth mentioning that the earlier FRI rigidity index includes the contribution from the self correlation.[49,75] The present findings do not change if the summation in the generalized Kirchhoff matrix (3.46) is modified to include the diagonal term and the calculation of the gGNM matrix inverse is then modified to include the contribution from the first eigenmode,

\[
\left(\Gamma^{-1}\right)_{ii} = \sum_{k=1}^{N} \lambda_{k}^{-1} \left[u_{k} u_{k}^{T}\right]_{ii}.
\]

In fact, this modification makes the generalized Kirchhoff matrix less singular and faster converging.

4.7.2 Multiscale Gaussian network model

4.7.2.1 Type-1 mGNM

We validate our two types of mGNM with various parameter values over a set of 362 proteins. The two largest proteins, 1H6V and 1QKI, are removed from our earlier data set of 364 proteins[49] due to limited computational resources. Two kinds of kernels, ILF and exponential, are employed. To explore the multiscale behavior, we use two kernels of the same type but with different characteristic distances in our mGNM schemes. For the ILF kernel based test, the cutoff distances in both kernels vary from 5 Å to 31 Å. For the exponential kernel based test, we set κ = 1 and vary η in both kernels within the range of [1 Å, 26 Å]. The PCCs with experimental B-factors are averaged over the 362 proteins.

Figure 4.31: The average PCCs over 362 proteins for Type-2 mGNM. (a) Two ILF kernels; their cutoff distances are systematically changed from 5 Å to 31 Å. (b) Two exponential kernels; their scales are systematically varied in the range of [1 Å, 26 Å].

The results for the Type-1 mGNM are shown in Figures 4.30(a) and (b). When two ILF kernels are used, as in Figure 4.30(a), we can see that the largest average PCCs are concentrated around the region where the two kernels have dramatically different cutoff distances, with one being around 7 Å and the other ranging from 14 to 20 Å. Our results indicate that in this set of proteins there is a multiscale property that is better described by mGNM parameterized at different cutoff distances. Moreover, the best PCC is distributed around a cutoff distance of 7 Å, which is consistent with the optimal cutoff distance (7 Å) recommended for the traditional GNM method. Similar multiscale behavior can also be observed for the exponential kernel based mGNM, as shown in Figure 4.30(b).

4.7.2.2 Type-2 mGNM

The results of Type-2 mGNMs with ILF kernels and exponential kernels are shown in Figures 4.31(a) and (b), respectively. The multiscale property is observed to improve predictions in both cases. Compared with Type-1 mGNM, Type-2 mGNM is able to achieve better average PCCs with respect to experimental B-factors. For two ILF kernels, the best average PCC for traditional GNM is 0.567. Type-1 mGNM significantly improves it to 0.607, and Type-2 mGNM achieves the best average PCC of 0.614. Similar results are observed in the exponential kernel models. For the generalized GNM, the best average PCC is about 0.608. This is improved to 0.629 in Type-1 mGNM and further improved to 0.642 in Type-2 mGNM. Detailed comparisons are summarized in Table 4.15.

Table 4.15: The best average PCCs with experimental B-factors. Results for GNM and mGNM are averaged over 362 proteins. Results for ANM and mANM are averaged over 300 proteins.

Kernel       GNM    Type-1 mGNM  Type-2 mGNM    Kernel    ANM    mANM
ILF          0.567  0.607        0.614          ILF       0.490  0.531
Exponential  0.608  0.629        0.642          Gaussian  0.518  0.546

4.7.3 Multiscale anisotropic network models

Table 4.16: The 64 large-sized proteins in the 364-protein data set[49] that are not included in our mANM test due to limited computational resources.

1F8R 1GCO 1H6V 1IDP 1KMM 1QKI 1WLY 2A50 2AH1 2BCM
2COV 2D5W 2DPL 2E10 2ETX 2FN9 2I49 2O6X 2OKT 2POF
2PSF 2Q52 2VE8 2W1V 2W2A 2XHF 2Y7L 2YLB 2YNY 2ZCM
2ZU1 3AMC 3BA1 3DRF 3DWV 3G1S 3HHP 3LG3 3MGN 3MRE
3N11 3NPV 3PID 3PTL 3PVE 3PZ9 3SRS 3SZH 3TDN 3UR8
3W4Q 4AM1 4B6G 4B9G 4DD5 4DKN 4DQ7 4ERY 4F01 4G5X
4G6C 4J11 4J78 4JYP

To study the performance of the multiscale anisotropic network model, we use 300 proteins obtained from the data set of 364 proteins by removing the largest 64 proteins listed in Table 4.16. The Hessian matrix used in mANM is 3N × 3N, which is 9 times larger than the corresponding Kirchhoff matrix in gGNM. This poses more challenges, as the computational time grows cubically with the dimension of the Hessian matrix. We consider ILF kernel and Gaussian kernel (κ = 2) based mANM methods in our test study. Our results are plotted in Figure 4.32.

Figure 4.32: The average PCCs over 300 proteins for mANM. (a) Two ILF kernels; their cutoff distances are systematically changed from 5 Å to 31 Å. (b) Two Gaussian kernels (κ = 2); their scales are systematically varied in the range of [1 Å, 26 Å].

First, one can still see the multiscale effect in this set of proteins, as the best average PCC values of mANM are achieved at the combination of a relatively small cutoff distance (7 Å) and a relatively large cutoff distance. These values are much higher than those on the diagonal, which represent the average PCC values of the traditional (single kernel) ANM. For the Gaussian kernel based mANM, we see a similar pattern. However, it achieves better predictions than the ILF kernel based mANM. These results are also listed in Table 4.15. Although the ANM methods are not as accurate as the GNM methods, they are able to offer unique collective motions that otherwise cannot be obtained by the GNM methods.

4.8 mGNM and mANM applications

Having demonstrated the ability of mGNM and mANM to capture protein multiscale behavior and improve B-factor predictions, we consider a few applications to showcase the proposed methods. First, we take on a set of proteins that fail the original GNM in various ways. This analysis might shed light on why the proposed mGNM works better than the original GNM. Additionally, GNM
and ANM can provide domain information for a protein structure. It is well known that GNM eigenvectors can be used to indicate the possible divisions of domains and domain-domain interactions. Finally, ANM eigenvectors are widely used to predict the collective motions of a protein near its equilibrium.

4.8.1 B-factor prediction of difficult cases using mGNM

It is well known that the traditional GNM does not work well in the B-factor prediction for certain proteins for various reasons [50, 52]. Park et al. have shown that GNM PCCs with experimental B-factors can be negative [52]. In this work, we demonstrate that the mGNM method is able to deliver more satisfactory B-factor predictions by capturing multiscale features. To demonstrate this we consider four proteins, 1CLL, 1V70, 2HQK and 1WHI. The Type-2 mGNM with two exponential kernels is used for these applications. As depicted in Figure 4.31(b), there is a wide range of scale parameters that deliver accurate B-factor predictions. We choose κ = 1, η₁ = 3 Å and κ = 1, η₂ = 25 Å to use in this test. For comparison with the original method, the traditional GNM, or GNM-ILF, is employed with two different cutoff distances, namely 7 Å and 20 Å, which are denoted as GNM7 and GNM20, respectively.

Figures 4.33, 4.34, 4.35 and 4.36 illustrate the results. In each figure, protein surfaces are colored by B-factor values predicted by GNM7, mGNM and the flexibility function in Eq. (3.15), respectively, in (a), (b) and (c). The comparisons of B-factors predicted by GNM7 and GNM20 with those of experiments are demonstrated in (d). Similarly, the comparisons of the B-factors predicted by mGNM with those of experiments are plotted in (e). A summary of the related PCC values is listed in Table 4.17.

Figure 4.33: Comparison between Type-2 mGNM with exponential kernel and traditional GNM for the B-factor prediction of protein 1CLL. Two scales, η₁ = 3 Å and η₂ = 25 Å, are employed in mGNM. (a) Molecular surface colored by B-factors predicted by GNM with cutoff distance 7 Å. (b) Molecular surface colored by B-factors evaluated by our Type-2 mGNM. (c) Molecular surface colored by the multiscale flexibility function in Equation (3.15). (d) B-factors predicted by traditional GNM with cutoff distances 7 Å (GNM7) and 20 Å (GNM20). (e) B-factors predicted by mGNM.

Figure 4.34: Comparison between Type-2 mGNM with exponential kernel and traditional GNM for protein 1V70 B-factor prediction. Two scales, η₁ = 3 Å and η₂ = 25 Å, are employed in mGNM. (a) Molecular surface colored by B-factors predicted by GNM with cutoff distance 7 Å. (b) Molecular surface colored by B-factors evaluated by our Type-2 mGNM. (c) Molecular surface colored by the multiscale flexibility function in Equation (3.15). (d) B-factors predicted by traditional GNM with cutoff distances 7 Å (GNM7) and 20 Å (GNM20). (e) B-factors predicted by mGNM.

Figure 4.35: Comparison between Type-2 mGNM with exponential kernel and traditional GNM for protein 2HQK B-factor prediction. Two scales, η₁ = 3 Å and η₂ = 25 Å, are used for mGNM. (a) Molecular surface colored by B-factors predicted by GNM with cutoff distance 7 Å. (b) Molecular surface colored by B-factors evaluated by the Type-2 mGNM. (c) Molecular surface colored by the multiscale flexibility function in Equation (3.15). (d) B-factors predicted by traditional GNM with cutoff distances 7 Å (GNM7) and 20 Å (GNM20). (e) B-factors predicted by mGNM.

Figure 4.36: Comparison between Type-2 mGNM with exponential kernel and traditional GNM for protein 1WHI B-factor prediction. Two mGNMs are used. The first one, mGNMK2, has two exponential kernels with κ = 1, η₁ = 3 Å and η₂ = 25 Å. The second mGNM, mGNMK3, has an extra exponential kernel with κ = 1 and η₃ = 10 Å. (a) Molecular surface colored by B-factors predicted by GNM with cutoff distance 7 Å. (b) Molecular surface colored by B-factors evaluated by a Type-2 mGNM. (c) Molecular surface colored by the multiscale flexibility function in Equation (3.15). (d) B-factors predicted by traditional GNM with cutoff distances 7 Å (GNM7) and 20 Å (GNM20). (e) B-factors predicted by two mGNMs, mGNMK2 and mGNMK3.

Table 4.17: Case study of B-factor prediction for four proteins in three different schemes: GNM7, GNM20 and mGNM. In the case of 1WHI, we use mGNM with two kernels and three kernels (value in parentheses).

PDB ID   GNM7    GNM20   mGNM
1CLL     0.261   0.235   0.763
1V70     0.162   0.548   0.750
2HQK     0.365   0.781   0.833
1WHI     0.270   0.370   0.484 (0.766)

Flexible hinges are sometimes important to protein functions, but they are not always easily detected by GNM type methods [25, 39]. As shown in Figure 4.33, the original GNM parameterized at cutoff distance 7 or 20 Å does not work well for the flexible hinge located around residues 65-85. In fact, the GNM method cannot predict the flexible hinge at any given cutoff distance, whereas the two-kernel mGNM is able to capture the hinge behavior. Protein 1V70 shown in Figure 4.34 is another difficult case for the traditional
GNM method. At cutoff distance 7 Å, it severely over-predicts the B-factors of the first 12 residues. However, its prediction improves if a larger cutoff distance is used. In contrast, the two-kernel mGNM provides a very good prediction. Figure 4.35 illustrates one more interesting situation. The traditional GNM with cutoff distance 7 Å over-predicts the B-factors for residues near number 58. However, at a large cutoff distance of 20 Å, it is able to offer accurate results. In this case, mGNM is able to further improve the accuracy. The case of 1WHI given in Figure 4.36 is difficult for both methods tested. GNMs with two different parameterizations do not work well and the two-kernel mGNM, while more accurate, still does not reach a PCC greater than 0.5. Its PCC of 0.484 is just a minor improvement over the GNM PCCs, 0.270 (obtained at r_c = 7 Å) and 0.370 (obtained at r_c = 20 Å). It should be noted that mGNM can simultaneously incorporate several scales; therefore, we employ an extra kernel with κ = 1, η₃ = 10 Å to deal with this protein. As shown in Table 4.17 and Figure 4.36, the three-kernel mGNM is able to deliver a good PCC of 0.766.

4.8.2 Domain decomposition using mGNM

Mathematically, the smallest nonzero eigenvalue is called the algebraic connectivity or Fiedler value, and the related eigenvector is called the Fiedler vector. It is known that the Fiedler vector can be used to decompose a protein into two domains. Each particle in the protein is assigned a value (element) from the Fiedler vector and these particles are grouped according to their positive or negative signs. The particles with zero values can be classified into either group, as they are usually in a linking region between two domains.

To test the performance of the mGNM methods, we utilize two test proteins, 1ATN (chain A) and 3GRS, which are also used by Kundu et al. [42] We compare the performance of two types of mGNMs. In Type-1 mGNM, we use the exponential kernels with κ = 1, η₁ = 3 Å and κ = 1, η₂ = 25 Å. In Type-2 mGNM, we use three exponential kernels: the same two kernels as Type-1 mGNM plus an extra kernel parameterized as κ = 1, η₃ = 10 Å. The results are depicted in Figures 4.37 and 4.38, respectively.

Figure 4.37: Protein domain decomposition with Type-1 mGNM. The Fiedler vector (the eigenvector of the smallest nonzero eigenvalue) is used to decompose the protein into two domains. (a) Protein 1ATN (chain A); (b) protein 3GRS.

Figure 4.38: Protein domain decomposition with Type-2 mGNM. The Fiedler vector (the eigenvector of the smallest nonzero eigenvalue) is used to decompose the protein into two domains. (a) Protein 1ATN (chain A); (b) protein 3GRS. It can be seen that Type-2 mGNM fails in protein domain decomposition.

It can be seen that Type-1 mGNM delivers a great decomposition, which is also consistent with the prediction from traditional GNM [42]. However, the Type-2 mGNM does not produce a reasonable result. This is because the Type-2 mGNM algorithm is designed to construct a symmetric Kirchhoff matrix with the required diagonal elements, and its non-diagonal elements do not properly reflect the protein connectivity. At the same time, the PCCs of Type-1 mGNM for 1ATN and 3GRS are 0.460 and 0.658, whereas the PCCs of Type-2 mGNM for 1ATN and 3GRS are 0.660 and 0.666. These results indicate that the B-factor values are mainly dictated by the diagonal matrix elements while the domain separation is determined by the non-diagonal matrix elements.

4.8.3 Collective motion simulation using mANM

GNM is an isotropic model which quantifies the general atomic flexibility in a molecule. In contrast, ANM is designed to describe anisotropic properties, such as the collective motions of a molecule near equilibrium. Typically, the first six modes, corresponding to six zero (or near zero) eigenvalues, represent the trivial translational and rotational modes of a complex biomolecule. Global modes that are unique to the biomolecular structure are described by the eigenvectors associated with the next smallest nonzero eigenvalues. Due to its simplicity, accuracy and availability, ANM is widely used to study the dynamics of biomolecules. In the present work, we have designed an mANM method to maintain the aforementioned properties.

To validate mANM for anisotropic mode analysis, we use two test proteins, 1GRU (chain A) and 1URP (chain A). The protein 1GRU is chaperonin GroEL, a benchmark test for ANM [68, 82]. We employ mANM with two Gaussian kernels (κ = 2) with η = 5 Å and η = 20 Å. We compute the eigenvectors associated with the first three nonzero eigenvalues. As illustrated in Figure 4.39, the mANM results are in excellent agreement with those of ANM for chaperonin GroEL [68, 82]. To further validate the mANM method, we examine another test case, 1URP. This molecule is a ribose-binding protein and its anisotropic motions have been studied previously [45]. We utilize the same set of parameters described above. Figure 4.40 demonstrates the mANM results for this structure, and again the results are in close agreement with the traditional ANM analysis [45].

Figure 4.39: The collective motions of protein 1GRU (chain A). The seventh, eighth and ninth modes calculated from mANM are demonstrated in (a), (b) and (c), respectively.

Figure 4.40: The collective motions of protein 1URP (chain A). The seventh, eighth and ninth modes calculated from mANM are demonstrated in (a), (b) and (c), respectively.

4.9 FRI-based hinge prediction validation with known hinging proteins

In this section we take an in-depth look at the hinges predicted by gGNM modes on a high quality set of test cases borrowed from the StoneHinge [39] study and from earlier studies by Flores et al. This set includes 32 structures, the open and closed conformations of 16 different proteins. Each protein in the set has a known hinge type motion mentioned in the source literature.

4.9.1 gGNM mode-based hinge prediction

The FRI-based hinge predictions for 19 out of the 32 structures studied are clear matches, while another 11 cases are partial hits, where there is at least one true positive and one false positive or false negative. Together there are 30 of the 32 structures for which the predictions are at least partially accurate. Complete results for gGNM mode-based hinge prediction are shown in Tables 4.18, 4.19 and 4.20. Prediction accuracy is determined under the loose criterion used by earlier comprehensive hinge studies, a 14 residue window around the predicted hinge point.

A case where the literature hinges and gGNM mode hinge predictions are in perfect agreement, the open and closed forms of ovotransferrin, is shown in Figure 4.41. The previously identified hinge residues are residues 333 and 342. Visual inspection of the region shows that the residues from 333 to 342 are all random coil; therefore we consider this range to be a hinge region. gGNM mode hinge prediction places the center of the hinge at residue 344 in the closed conformation and 339 in the open conformation, in very close agreement with the 333 to 342 range. Therefore we count this among the full hits.

Next we examine the proteins on which the automatic gGNM-based first mode hinge prediction does not agree with the known hinge residues, the case for 2 of 32 structures. It is important to note that in both of these cases, the second gGNM mode provides accurate predictions. Failure to predict hinges accurately can happen because gGNM modes are not accurate for a particular structure (due to a variety of reasons including missing ligands, unaccounted-for crystal effects, etc.), because the hinging motions of consequence are not the hinges between the largest domains, or because there are multiple modes with very similar, low eigenvalues.

One example where the important hinge is not the one dividing domains is lactoferrin, shown in Figure 4.43. The case of lactoferrin has been difficult for many hinge prediction methods such as StoneHinge and TLSMD, while others such as FlexOracle can readily and accurately identify the biologically important hinges. This is likely due to the fact that the most biologically interesting hinging motion is not at a domain separation, but rather is a smaller motion unique to one domain. Some earlier hinge studies have considered this as the only hinge to predict; however, there is some evidence from crystallographic studies that lactoferrin does in fact have a hinging motion between its two largest domains. This hinging between domains is what is suggested by the first gGNM mode. Furthermore, the second gGNM mode identifies the smaller, biologically relevant hinging. Therefore, this may be a case where the initial literature-based identification of hinge residues was wrong rather than a failure of gGNM-based hinge prediction.

Another example where the second mode contains important hinge predictions is the open conformation of ribose binding protein, Figure 4.42. The closed conformation prediction for ribose binding protein is a perfect hit; however, the open conformation gives the correct hinge predictions only when considering the second gGNM mode. Based on this result we suggest always considering the hinge predictions from the second mode of gGNM, particularly when the hinge(s) predicted by the first mode do not appear to be at a hinging region upon visual inspection of the structure.

Table 4.18: gGNM-based hinge predictions for 32 protein structures compared with consensus hinge residues determined from literature and other hinge studies [39].

Protein name                Closed  Prediction  Consensus          Open  Prediction             Consensus
Ovotransferrin              1aiv    344         333,342            1ovt  339                    333,342
Adenylate kinase            1ake    110,168     124-126,161-163    2ak3  113,173                124-126,161-163
CAPK                        1atp    126         119-126            1ctp  126                    119-126
Biotin carboxylase          1bnc    114,208     130-131,203-204    1dv2  107,210                130-131,203-204
DNA polymerase beta         1bpd    97          79-83,91-93        2bpg  143 (86,259 m2)        79-83,91-93
Calmodulin                  1cll    80          76-80              1cfd  78                     76-80
Elastase                    1ezm    142         132-135            1u4g  142                    132-135
GluR2                       1fto    108,218     214-215            1ftm  108,218                214-215
Lir-1                       1g0x    95          95-96              1p7q  96                     95-96
Bence-Jones protein         4bjl    115         108-116            4bjl  112                    108-116
Inorganic pyrophosphatase   1k20    187         188-192            1k23  188                    188-192
Phosphoglycerate kinase     1kf0    194         201-205,402-404    1hdi  194                    201-205,402-404
Lactoferrin                 1lfh    343         90,250 (340*)      1lfg  340 (91,250 m2)        90,250 (340*)
LAO binding protein         1lst    89,192      89-91,182-194      2lao  89,191                 89-91,182-194
Glutamine binding protein   1wdn    87,182      85-90,178-185      1ggg  87,182                 85-90,178-185
Ribose binding protein      2dri    103,235     103-104,235-236    1urp  151 (103,235 m2)       103-104,235-236

Figure 4.41: Top, secondary structure representation of ovotransferrin with hinge residues highlighted by VdW representations of their C-alpha atoms. Bottom, values by residue for modes 1 and 2 (left y-axis) with cumulative sum (right y-axis). The maximum and minimum values of the cumulative sum correspond to hinge points. Panels: (a) 1AIV, view 1; (b) 1AIV, view 2; (c) 1AIV, Mode 1.

Figure 4.42: Top, secondary structure representation of ribose binding protein with hinge residues highlighted by VdW representations of their C-alpha atoms. Bottom, values by residue for modes 1 and 2 (left y-axis) with cumulative sum (right y-axis). The maximum and minimum values of the cumulative sum correspond to hinge points. Panels: (a) 1URP, view 1; (b) 1URP, view 2; (c) 1URP, Mode 1; (d) 1URP, Mode 2.

Figure 4.43: Top, secondary structure representation of lactoferrin with hinge residues highlighted by VdW representations of their C-alpha atoms. Bottom, values by residue for modes 1 and 2 (left y-axis) with cumulative sum (right y-axis). The maximum and minimum values of the cumulative sum correspond to hinge points. Panels: (a) 1LFG, view 1; (b) 1LFG, view 2; (c) 1LFG, Mode 1; (d) 1LFG, Mode 2; (e) 1LFG, Mode 1, domain-only calculation.

Table 4.19: gGNM-based hinge predictions for 32 protein structures compared with consensus hinge residues determined from literature and other hinge studies [39]. Y: the hinge(s) are completely and uniquely identified. P: a predicted hinge is offset from a true hinge position by less than 5 amino acids, or there is a false positive or negative. N: failure to identify any major hinges.

Protein name                Closed  Prediction  Consensus          Open  Prediction   Consensus
Ovotransferrin              1aiv    Y           333,342            1ovt  Y            333,342
Adenylate kinase            1ake    P           124-126,161-163    2ak3  P            124-126,161-163
CAPK                        1atp    Y           119-126            1ctp  Y            119-126
Biotin carboxylase          1bnc    P           130-131,203-204    1dv2  P            130-131,203-204
DNA polymerase beta         1bpd    P           79-83,91-93        2bpg  N (P mode2)  79-83,91-93
Calmodulin                  1cll    Y           76-80              1cfd  Y            76-80
Elastase                    1ezm    Y           132-135            1u4g  Y            132-135
GluR2                       1fto    P           214-215            1ftm  P            214-215
Lir-1                       1g0x    Y           95-96              1p7q  Y            95-96
Bence-Jones protein         4bjl    Y           108-116            4bjl  Y            108-116
Inorganic pyrophosphatase   1k20    Y           188-192            1k23  Y            188-192
Phosphoglycerate kinase     1kf0    P           201-205,402-404    1hdi  P            201-205,402-404
Lactoferrin                 1lfh    P           90,250 (340*)      1lfg  P            90,250 (340*)
LAO binding protein         1lst    Y           89-91,182-194      2lao  Y            89-91,182-194
Glutamine binding protein   1wdn    Y           85-90,178-185      1ggg  Y            85-90,178-185
Ribose binding protein      2dri    Y           235-236            1urp  N (Y mode2)  235-236

Table 4.20: Summary of hits for gGNM-based predictions of hinges for 32 PDBs. Full: the hinge(s) are completely and uniquely identified. Partial: a predicted hinge is offset from a true hinge position by less than 5 amino acids, or there is a false positive or negative. None: failure to identify any major hinges.

Yes       19
Partial   11
No         2

4.9.2 Machine learning feature ranking

The first round of feature ranking, calculating the F-score, included all 55 considered features and the complete results are shown in Table 4.21. The F-score serves as a useful filter because it is quickly calculated and the scores are independent of the other variables tested, which is advantageous because it does not incorrectly weight features due to the presence of many correlated features, a weakness of random forest. The disadvantage of this approach is that there is no indication of which variables are highly correlated and therefore typically should not be used in the same model. A second round of feature ranking was done on the top features based on F-score and these results are displayed in Table 4.22.

The top four features by F-score are all derived from gGNM mode 1 calculations. These features include ishinge3, ishinge, cMode1 and hingedist. The F-scores for the latter three features are similar, while the F-score of the first, ishinge3, is more than three times the highest of those three. This discrepancy is probably due to the fact that many hinges in this data set span multiple residues and ishinge3 essentially marks seven residue regions as hinges based on gGNM mode
1 values. Therefore, we suspect these features are all essentially predicting the same thing, and this emphasizes the importance of trying multiple slightly different feature formulations to find the one with the most value as a predictor.

Interestingly, the feature scores fall sharply after the gGNM mode derived features. The features that follow, ranked fifth through eleventh in Table 4.21, are, in order, isH, HP6, RES6, isC, insec, ROT6 and P(PSSM). Two of these features are secondary structure related, three features describe the local environment within 6 Angstroms and one feature is sequence related. While the F-scores of these are considerably lower than the top four, these features deserve some future, in-depth analysis to see if other features related to them could be created that are more useful for predictions.

After these features there are the mode 2 derived features and the FRI flexibility index related features. As mentioned earlier, it is sometimes hard to identify the most important mode when two modes have very similar eigenvalues. Therefore it may be that mode 2 features are only useful in those cases. With this in mind, we can refine the model further by checking for similar eigenvalues for the lowest modes of each structure and, if they are found to be sufficiently close, combining the hinge predictions from both modes when creating features. Under this scheme, the hinges of all 32 structures are at least partially predicted correctly.

4.9.3 SVM model prediction results

This section provides an example of what is achievable using support vector machine modeling with gGNM mode features. Listed in Tables 4.23 and 4.24 are the hinge residues predicted by two of the more accurate SVM models we are able to create with the features tested. The features used in the first model include FRIf, dFRI, FRIf, ishinge, ishinge3, hingedist, HP6, RES6 and ROT6. The features used in the second model include ishinge, ishinge3, hingedist, Mode1 and cMode1. Many combinations of features were tried and, as expected from the feature ranking results, the only essential features are those derived from mode 1 of mGNM. The results from the SVM models are very similar to those from the mGNM mode predictions; however, the SVM model tends to predict fewer hinge residues. In some cases this serves to remove false positives, but it also causes false negatives. In the end, it would be just as good to use the mGNM predictions rather than take the extra step of using the SVM model.

Table 4.21: Feature importance rankings by F-score. F-scores are calculated using the LIBSVM software.

Rank  F-score   Feature name                   Rank  F-score   Feature name
1     0.056996  ishinge3                       31    0.000135  D(PSSM)
2     0.017780  ishinge                        32    0.000099  Phil/Å²
3     0.016408  cMode1                         33    0.000085  prop1
4     0.010333  hingedist                      34    0.000083  isB
5     0.003040  isH                            35    0.000082  isT
6     0.002399  HP6                            36    0.000081  C(PSSM)
7     0.002279  RES6                           37    0.000070  H(PSSM)
8     0.001591  isC                            38    0.000064  Phob/Å²
9     0.001551  insec                          39    0.000047  N(PSSM)
10    0.001375  ROT6                           40    0.000043  A(PSSM)
11    0.001301  P(PSSM)                        41    0.000036  Surf/Å²
12    0.001283  Mode2                          42    0.000035  E(PSSM)
13    0.001184  cMode2                         43    0.000034  HP1
14    0.000525  N(overl)                       44    0.000025  Q(PSSM)
15    0.000495  FRI f-index                    45    0.000014  S(PSSM)
16    0.000473  Avg. mFRI within 6 Å           46    0.000012  L(PSSM)
17    0.000461  Mode1                          47    0.000010  K(PSSM)
18    0.000422  Difference of mFRI and FRI     48    0.000010  Y(PSSM)
19    0.000409  isE                            49    0.000009  I(PSSM)
20    0.000409  G(PSSM)                        50    0.000003  Total/Å²
21    0.000400  mFRI B-factor                  51    0.000002  V(PSSM)
22    0.000400  prop5                          52    0.000002  %SASA
23    0.000379  T(PSSM)                        53    0.000001  R(PSSM)
24    0.000324  mFRI f-index                   54    0.000000  isI
25    0.000302  prop3                          55    0.000000  nosec
26    0.000275  M(PSSM)
27    0.000241  isG
28    0.000209  FRI B-factor
29    0.000151  W(PSSM)
30    0.000143  F(PSSM)

Table 4.22: Feature importance rankings by random forest method. Importance values are calculated using the varImp command of the R package caret.

Rank  Importance  Importance (scaled)  Feature name
1     0.79        100.00               cMode1
2     0.73        85.78                ishinge5
3     0.61        53.53                RES6
4     0.60        51.52                HP6
5     0.59        48.30                PSSM(P)
6     0.58        45.94                cMode2
7     0.58        44.81                ROT6
8     0.57        44.07                insec
9     0.56        41.97                isC
10    0.55        37.96                N(overlap)
11    0.40        0.00                 isH

Table 4.23: SVM results for a model with eight of the top ranked features, FRIf, dFRI, FRIf, ishinge, ishinge3, hingedist, HP6, RES6 and ROT6.

Closed  Prediction         Consensus          Open  Prediction         Consensus
1aiv    342                333,342            1ovt  341-342            333,342
1ake    -                  124-126,161-163    2ak3  -                  124-126,161-163
1atp    123-126            119-126            1ctp  123-126            119-126
1bnc    115-116,205-206    130-131,203-204    1dv2  104-108            130-131,203-204
1bpd    -                  79-83,91-93        2bpg  92-93              79-83,91-93
1cll    76-80              76-80              1cfd  76-80              76-80
1ezm    142                132-135            1u4g  141                132-135
1fto    105-108,215-221    214-215            1ftm  105-108,215-221    214-215
1g0x    92-98              95-96              1p7q  93-99              95-96
4bjl    114-117            108-116            4bjl  114-117            108-116
1k20    190                188-192            1k23  188-191            188-192
1kf0    -                  201-205,402-404    1hdi  -                  201-205,402-404
1lfh    341-342            90,250 (340*)      1lfg  338-343            90,250 (340*)
1lst    89-92,189-195      89-91,182-194      2lao  87-92,189          89-91,182-194
1wdn    85-88,179-185      85-90,178-185      1ggg  84-88,179-185      85-90,178-185
2dri    102-104,235-236    235-236            1urp  -                  235-236

Table 4.24: SVM results for a model with five mGNM-based features, ishinge, ishinge3, hingedist, Mode1 and cMode1.

Closed  Prediction         Consensus          Open  Prediction         Consensus
1aiv    -                  333,342            1ovt  341                333,342
1ake    124-126,161-163    124-126,161-163    2ak3  -                  124-126,161-163
1atp    122-126            119-126            1ctp  123-126            119-126
1bnc    205-206            130-131,203-204    1dv2  104-108            130-131,203-204
1bpd    81                 79-83,91-93        2bpg  79-82              79-83,91-93
1cll    77-83              76-80              1cfd  76-80              76-80
1ezm    139                132-135            1u4g  139-141            132-135
1fto    109-111,214        214-215            1ftm  105-108,215-221    214-215
1g0x    92-94              95-96              1p7q  -                  95-96
4bjl    116-117            108-116            4bjl  114-117            108-116
1k20    190                188-192            1k23  188-191            188-192
1kf0    195-197,253        201-205,402-404    1hdi  -                  201-205,402-404
1lfh    -                  90,250 (340*)      1lfg  343                90,250 (340*)
1lst    90-92              89-91,182-194      2lao  90,186-190         89-91,182-194
1wdn    85-88,179-185      85-90,178-185      1ggg  84-88,179-185      85-90,178-185
2dri    102,235            235-236            1urp  -                  235-236

CHAPTER V. Conclusions and Future Directions

5.1 Conclusions

In living organisms, proteins and nucleic acids carry out a vast variety of functions including providing structural support, catalyzing chemical reactions, replicating DNA, or responding to stimuli. Many of these functions are performed through synergistic interactions or correlations over multiple length scales, including atomic, van der Waals, residue, alpha-beta complex, domain-domain and protein-protein interactions. Popular existing flexibility methods such as the Gaussian network model do not directly account for the multiscale nature of macromolecular interactions and fail to predict Debye-Waller factors or B-factors for many proteins that involve multiple length characteristics.

This work puts forward a multiscale, multiphysics and multidomain model, the flexibility-rigidity index (FRI), to estimate the static property of macromolecules. A basic assumption of the present FRI theory is that the geometry or structure of a given protein together with its specific environment, namely, solvent, assembly or crystal lattice, completely determines the biological function and properties including flexibility, rigidity and energy. As such, the present approach bypasses the construction of the Hamiltonian and interaction potentials. A possible drawback of the present method is that the full geometric and topological information of a protein complex is usually not available, which contributes to modeling errors. We utilize monotonically decreasing functions to measure the geometric compactness of a protein and quantify the topological connectivity of atoms or residues in the proteins and nucleic acids. Physically, FRI characterizes the total interaction strength at each atom or residue and thus it reflects the atomic rigidity and flexibility. Additionally, we define the total rigidity of a molecule by a summation of atomic rigidities.

A practical validation of the proposed FRI for flexibility analysis is provided by the prediction of B-factors, or temperature factors of proteins, measured by X-ray crystallography. We employ a set of 263 proteins to examine the validity, explore the reliability and demonstrate the robustness of the proposed FRI method for B-factor and/or flexibility prediction. We analyze the performance of two classes of correlation kernels, specifically the exponential type and the Lorentz type, for B-factor prediction. The exponential type of correlation kernel involves two parameters, exponential order and characteristic length. The Lorentz type of correlation kernel also involves two parameters, power order and characteristic length. By searching the parameter space for optimal predictions, parameter-free correlation kernels are obtained. It is found that the parameter-free correlation kernel of the Lorentz type is able to retain about 95% accuracy compared to the optimized results.

After validation of the basic FRI method we introduced a multikernel-based multiscale FRI (mFRI) strategy to analyze macromolecular flexibility. The essential idea is to employ two or three kernels, each parameterized with a different scale, to capture the multiple characteristic interaction scales of complex biomolecules. Based on an expanded test set containing 364 proteins, we show that the mFRI method is about 20% more accurate than the GNM method in B-factor prediction. Additionally, we demonstrate that the present mFRI gives rise to excellent flexibility analysis for many proteins that are difficult cases for GNM and the previously introduced single-scale FRI methods. Finally, for a protein of N residues, we illustrate that the computational complexity of the proposed mFRI is of linear scaling O(N), in contrast to the order of O(N³) for GNM.

An increased interest in large systems of macromolecular complexes is what requires and inspires the latest advances in the FRI methods. FRI has proven to be well suited to make calculations on scales relevant to current biochemical and biophysical research. In particular, fFRI boasts a computational complexity on the scale of O(N), meaning that it far outpaces alternative tools such as GNM. Additionally, FRI has been previously demonstrated to maintain superior accuracy to previous methods even at such efficient computational complexity. Now FRI's utility has been extended to the nucleic acid domain, enabling study of many important biological systems such as the RNA polymerase example featured in this work. Due to the unique formulation of FRI and aFRI we were able to analyze a complex system for biologically relevant details that cannot be accessed by global methods or time-dependent methods.

A contributing factor for FRI's increased efficiency compared to existing methods is that GNM and NMA are essentially global methods, in the sense that they rely on the solution of the global eigenvalue problem to predict local atomic properties, e.g., B-factors. In contrast, FRI is a local method and utilizes the local geometric information to predict local atomic properties. In parallel, there are the (global) band theory of solids and the (local) atomic orbital model of solids. The former is good for describing many global physical properties such as electrical conductivity and thermal lattice motions in terms of excitations, while the latter is more powerful for explaining localized chemical reactivity and catalysis of solids.

One of the major drawbacks of GNM is the poor scaling with the number of residues or atoms in the system. The matrix diagonalization of normal modes methods is of O(N³) computational complexity, where N is the number of residues. The computational complexity of the original FRI algorithm is of O(N²). In the present work, we propose a fast FRI (fFRI) algorithm, which further reduces the computational complexity of FRI to O(N). Neither FRI nor fFRI involves the time consuming matrix decomposition. As a result, it takes less than 30 seconds for the fFRI method to predict the B-factors of an HIV virus structure with more than three hundred thousand residues, which would require many years for GNM to compute. Additionally, both the exponential-based parameter-free fFRI and the Lorentz-based parameter-free fFRI are about 10% more accurate than the GNM in the B-factor prediction of 364 proteins.

Anisotropic motions between protein domains are known to correlate with protein functions. To describe protein anisotropic fluctuations, we also introduce anisotropic FRI (aFRI) algorithms. We introduce an adaptive aFRI method that partitions the molecule into many clusters with variable sizes. We specifically examine two extreme cases, a one-cluster partition and an N-cluster partition, which result in a single completely global 3N × 3N Hessian matrix and N completely localized 3 × 3 Hessian matrices, respectively. The computational complexity of aFRI varies from O(N³) to O(N). Although aFRI Hessian matrices can be completely local, they still contain much non-local correlation. As such, all three protein modes predicted by the completely local aFRI exhibit highly collective global motions. The eigenmodes obtained from the completely global aFRI closely resemble those of the anisotropic network model (ANM) [3, 6]. However, modes constructed from the completely local aFRI show different collective motion patterns. Since there is no analytical solution for collective motions, it is not possible to judge whose collective motions are more correct. In general, the eigenmodes of ANM and the completely global aFRI exhibit a slightly better synergistic effect than modes generated by using the completely local aFRI.

In addition to the quantitative aspects, the proposed FRI has a few visual applications. First, the correlation maps of the FRI are capable of revealing both short- and long-distance interactions or connectivity. Since correlation map elements are directly related to the original distances by a known radial basis function, the distances can be labeled on the map as well. Additionally, the predicted B-factors can be plotted as the radii of residues to visualize the amplitude of thermal fluctuations. This plot becomes even more interesting when atomic spheres are colored with the electrostatics [75]. The close correlation between flexibility and large electrostatic potentials can be unveiled, which sheds light on intrinsic protein structural properties. Moreover, the predicted B-factors can be plotted with secondary structures to give an overall picture of structural flexibility. Finally, as continuous functions, the atomic rigidity function and atomic flexibility function can be projected onto protein molecular surfaces or other surface representations to analyze flexibility.

Another application of FRI and aFRI is the analysis of protein domains. Existing methods, such as GNM and ANM, are well known for domain analysis. The present FRI provides a clear correlation map for domain identification. It is found that aFRI gives rise to highly collective domain motion patterns, although not all parts of a domain move uniformly in aFRI modes of motion.

Protein-nucleic acid complexes are essential to all living organisms. The function of these complexes depends crucially on their flexibility, an intrinsic property of a macromolecule. However, for many large protein-nucleic acid complexes, such as ribosomes and RNA polymerases, the present flexibility analysis approaches can be problematic due to their computational complexity scaling of O(N³) and their neglect of multiscale effects. Therefore we also introduce the flexibility-rigidity index (FRI) methods parameterized [49, 50, 75] for the flexibility analysis of protein-nucleic acid structures. We show that a multiscale FRI (mFRI) realized by multiple kernels parameterized at multiple length scales is able to significantly outperform the Gaussian network model (GNM) for the B-factor prediction of a set of 64 protein-nucleic acid complexes [78]. The FRI methods are not only accurate, but also efficient, as their computational complexity scales as O(N). Additionally, anisotropic FRI (aFRI), which has cluster Hessian matrices, offers collective motion analysis for any given cluster, i.e., a subunit or domain in a biomolecular complex.

We can apply FRI methods to a large ribosomal subunit (1YIJ) with multiple subunits. We note that both the original single-scale FRI and GNM do not work well for this structure. It is found that the multiscale strategy is crucial for the flexibility analysis of multi-subunit structures. The correlation coefficients between FRI predictions and experimental B-factors for 1YIJ improve from 0.3 for single-scale FRI to 0.85 for multiscale FRI. We further use the fitting coefficients obtained from 1YIJ to predict the flexibility of an entire ribosome, 4V4J. We found that mFRI has an advantage for analyzing large biomolecular complexes due to both higher speed and higher accuracy.

We have also demonstrated the utility of the anisotropic FRI (aFRI) for analyzing the translocation of an RNA polymerase, which involves protein, DNA, RNA, nucleotide substrates and various ions. Both experimental and computational studies of RNA polymerases are difficult and expensive due to the size and complexity of the biomolecular complex. The molecular mechanism of RNA polymerase translocation is an interesting, open research topic. The present work makes use of localized aFRI to elucidate the synergistic local motions of a bacterial RNA polymerase. These findings are consistent with those from much more expensive molecular dynamics simulations and normal mode analysis [23, 24].

Also, to clarify the relationship between normal modes methods and FRI, we construct a series of generalized Gaussian network models (gGNMs). We show that the original Kirchhoff matrix used in GNM can be constructed by using the ideal low-pass filter (ILF), which is a special case of a family of admissible correlation kernels (or functions) used in FRI. Based on this connection, we propose a framework to construct generalized Kirchhoff matrices for both GNM and FRI. More specifically, the inverse of the generalized Kirchhoff matrices leads to many gGNMs and the direct inverse of the diagonal terms gives rise to FRI. We reveal the identical behavior between gGNM and FRI at a large cutoff distance or characteristic scale for protein B-factor predictions.

Additionally, we propose multiscale Gaussian network models (mGNMs) based on the relationship of GNM and FRI. Essentially, we develop a two-step procedure to construct mGNMs. In the first step, we utilize mFRI to come up with an optimal combination of multiscale kernels. In the second step, we try to implement the same combination of multiscale kernels in the generalized Kirchhoff matrices for mGNMs. However, this step is not unique because for a given Kirchhoff matrix, GNM and FRI are connected only through diagonal elements. Two types of schemes, Type-1 mGNM and Type-2 mGNM, are proposed in this work. Moreover, we propose multiscale anisotropic network models (mANMs) based on the similarity between ANM and GNM and the connection between GNM and FRI. Since ANM is typically less accurate than GNM in B-factor prediction [49, 52], its main utility is for collective motion analysis. We therefore have developed mANMs to maintain the physical conn
ectivityofproteinatomsintheKircmatrix.WehavecarriedoutintensivenumericalexperimentstovalidatetheproposedgGNM,mGNMandmANMmethodsforB-factorpredictions.ThegGNMmethodisexaminedoverasetof364proteins.ItisfoundthattheproposedgGNMisabout10%moreaccuratethanGNMinB-factorprediction.FormGNM,weuseonlyasetof362proteinsduetolimitedcomputerresources.WeshowthatmGNMcanachieveabout13%improvementoverGNM.Similarly,theproposedmANMisabout11%moreaccuratethanitscounterpart,ANM,inB-factorpredictionoverasetof300proteins.Further,weconsiderthreetypesofapplicationsoftheproposedmGNMandmANMmethods.OnetypeofapplicationistoanalyzetheyofproteinsthatfailtheoriginalGNMmethodinvariousways.WeemployfourproteinstodemonstratetheadvantageoftheproposedmGNMinyanalysis.Anotherapplicationisthestudyofproteindomainseparations.ThenontrivialeigenmodeofthemultiscaleKircmatrixisused.WefoundfromtheanalysisoftwoproteinsthatType-1mGNMdoesagoodjobindomainanalysiswhileType-2mGNMdoesnotworkforthispurpose.Theotherapplicationconcernstheprotein'scollectivemotions.mANMisfoundtosimilarresultstothoseoftheoriginalANMmethod.ItisimportanttonotethatthemGNMandmANMmethodsarenotlimitedtotheexamplesshowninthiswork.ThedesignofnewmGNMandmANMmethodsisstillanopenproblem.Essentially,wehopethesenewmethodsaret,accurateandrobust.138Moresp,highaccuracyinB-factorpredictionisamaincriterion.Additionally,havingtheabilitytoprovidecorrectproteindomainanalysisisadesirablepropertyaswell.FormANM,thecapabilityoferingcorrectmotionanalysisisamajorrequirement.Thequalityofbothdomainandmotionanalysesdependsonhowtodesignnon-diagonalmatrixelementssoastoproperlythephysicalconnectivityamongparticles.Inthefuture,wewillcarefullyconsiderthepresentmANMforotherinterestingapplications,namelyanisotropicB-factors22andconformationalchanges.63Thestudyofhingeshasbeenanimportanttopicandmuchresearchhasbeendoneinthepast.21,25,26,39,59IdenofhingeresiduesisusefulforinferringmotionandfunctionwhenmoleculesaretoolargeforMDsimulationonrelevanttimescales.Othermethods,suchasGNMandNMAha
vebeenutilized.FRI-basedmethodscouldplaceatroleinhingeanalysis.Intestssofar,gGNMmode-basedhingepredictionsareatleastpartiallycorrectforallofthestructuresanalyzedusingtheautomatic,simpleanalysismethodwepropose.Furthermore,withsomehumaninterpretationalongwithconsiderationofthesecondmode,itispossibletopositivelyidentifyalmosteverysinglehingeinthetestset.Featurerankingresultsdemonstratethatmanyofthemolecularcharacteristicsthatareusefulinothermachinelearningmodels,suchasSVM-basedhotspotpredictors,arenotusefulpredictorsofhinging.Featuresbasedonxibility,sequenceandsecondarystructurehavelittlecorrelationtohingingresiduesbasedonF-score.ThebestSVMmodelthatcouldbeproducedusesgGNMmode-basedpredictionsastheonlyfeature,andthepredictionsfromthesemodelsshowonlyminorinsensitivityandspycomparedtothepredictionstakendirectlyfromobservinggGNMmode-basedfeaturesalone.Therefore,unlessotherfeaturesarefoundthataremorepredictive,gGNMmode-basedpredictionsarejustasusefulasanySVMmodelbuiltuponthem.Inthisstudywehavelaidoutaframeworkforamachinelearningmodelforhingedetection.Unfortunately,noneofthe139featureswehavetriedsofarimprovethemodeltlybeyondamodelbasedpurelyonhingepredictionfromgGNMmodes.Nevertheless,thereisstillthepossibilityforotherfeatureswehavenotyettestedtoimprovethismodel.5.2FuturedirectionsOneofthemostimportantfuturegoalsforFRIistomakethesoftwareeasytoaccessandeasytouse.AmajorsteptowardcompletingthisgoalisthedevelopmentofawebserverforthevariousFRItools.Developmenthasbegunonaweb-basedtoolthatallowsuserstorunFRItoolsfory,hingeandanisotropicmotionpredictions.ThewebtoolaccommodatesanystandardformatPDBofproteinsand/ornucleicacids.InadditiontothewebserverversionoftheFRItools,weplantohostvariousexecutableforFRItoolsfortheWindowsandLinuxplatformsaswellasthesourcecode.Finally,weplantocreateplug-insforthePyMolandVisualMolecularDynamicsprogramstoenablequickFRItoolaccesswithinthesepopulartools.Hopefully,withincreasedaccessibility,FRImethodswillcompletelyreplacenormalmodesmethodsasthemostpopularto
olforcalculatingyandlong-timedynamicsofmacromolecules.AnisotropicB-factorsprovideapossibleopportunityforfurthervalidationoftheanisotropicFRImethod.Unfortunately,thenumberofstructureswithanisotropicB-factorsismuchlowerthanstructureswithisotropicB-factors.Additionally,thereareveryfewtoolsthataredesignedtoreadanisotropicB-factorsfromPDBformatInthenearfutureweplantotestaFRIagainstallofthePDBstructuresthatincludeanisotropicB-factorvalues.WealsoplantofurtherpursueimplementationofFRIbasedfeaturesinmachinelearningmodels.AlthoughthisinitialattemptatimprovingamachinelearningmodelwithFRIwasnotmetwithmuchsuccess,webelievethereareotherapplicationswhereFRImaybeofuse.Inparticular,yisknowntoplayaroleinsmallmoleculebindingandprotein-nucleicacidbinding.ThereforeweareattemptingtouseFRIyandgGNMmode140calculationstoimprovemachinelearningmodelsfortheseapplications.Workhasbegunonaprotein-nucleicbindingmodelinspiredbytheDBSImodelfromJulieMitchellattheUniversityofWisconsin.WeaimtoincludeFRI-basedfeaturesandtoimprovetheaccuracyoftheelectrostaticscalculationsinsuchmodelstoimprovethemodel'soverallpredictionaccuracy.141BIBLIOGRAPHY142BIBLIOGRAPHY[1]Hui-wangAi,JHenderson,SRemington,andRCampbell.Directedevolutionofamonomeric,brightandphotostableversionofclavulariacyantprotein:struc-turalcharacterizationandapplicationsinorescenceimaging.Biochem.J,400:531{540,2006.[2]M.P.AllenandD.J.Tildesley.ComputerSimulationofLiquids.Oxford:ClarendonPress,1987.[3]A.R.Atilgan,S.R.Durrell,R.L.Jernigan,M.C.Demirel,O.Keskin,andI.Bahar.Anisotropyofuationdynamicsofproteinswithanelasticnetworkmodel.Biophys.J.,80:505{515,2001.[4]I.Bahar,A.R.Atilgan,M.C.Demirel,andB.Erman.Vibrationaldynamicsofproteins:ofslowandfastmodesinrelationtofunctionandstability.Phys.Rev.Lett,80:2733{2736,1998.[5]I.Bahar,A.R.Atilgan,andB.Erman.Directevaluationofthermalinproteinsusingasingle-parameterharmonicpotential.FoldingandDesign,2:173{181,1997.[6]ABakan,L.M.Meireles,andI.Bahar.Prody:Proteindynamicsinferredfromtheoryandexperiments.Bioinformatic
s,27:1575{1577,2011.[7]JoelRBockandDavidAGough.Predictingprotein{proteininteractionsfromprimarystructure.Bioinformatics,17(5):455{460,2001.[8]B.R.Brooks,R.E.Bruccoleri,B.D.Olafson,D.J.States,S.Swaminathan,andM.Karplus.Charmm:Aprogramformacromolecularenergy,minimization,anddy-namicscalculations.J.Comput.Chem.,4:187{217,1983.[9]M.P.S.Brown,W.N.Grundy,D.Lin,N.Cristianini,C.W.Sugnet,T.S.Furey,M.Ares,andD.Haussler.Knowledge-basedanalysisofmicroarraygeneexpressiondatabyusingsupportvectormachines.ProceedingsoftheNationalAcademyofSciences,97(1):262267,Apr2000.[10]MichaelF.Brown.CurvatureForcesinMembraneLipid-ProteinInteractions.Bio-chemistry,51(49):9782{9795,DEC112012.[11]MichaelPSBrown,WilliamNobleGrundy,DavidLin,NelloCristianini,CharlesSugnet,ManuelAres,andDavidHaussler.Supportvectormachineofmicroarraygeneexpressiondata.UniversityofCalifornia,SantaCruz,TechnicalReportUCSC-CRL-99-09,1999.[12]R.Burbidge,M.Trotter,B.Buxton,andS.Holden.Drugdesignbymachinelearning:supportvectormachinesforpharmaceuticaldataanalysis.Computers&Chemistry,26(1):514,2001.143[13]ZacharyFBurton.Theoldandnewtestamentsofgeneregulation:Evolutionofmulti-subunitrnapolymerasesandco-evolutionofeukaryotecomplexitywiththernapiictd.Transcription,5(3),2014.[14]Yu-DongCai,Xiao-JunLiu,Xue-biaoXu,andGuo-PingZhou.Supportvectorma-chinesforpredictingproteinstructuralclass.BMCbioinformatics,2(1):1,2001.[15]F.ChitiandC.M.Dobson.Proteinmisfolding,functionalamyloid,andhumandisease.Annu.Rev.Biochem.,75:333{366,2006.[16]Q.CuiandI.Bahar.Normalmodeanalysis:theoryandapplicationstobiologicalandchemicalsystems.ChapmanandHall/CRC,2010.[17]ChristopherDavies,StephenWWhite,andVRamakrishnan.Thecrystalstructureofribosomalproteinl14revealsanimportantorganizationalcomponentofthetranslationalapparatus.Structure,4(1):55{66,1996.[18]OmarN.A.DemerdashandJulieC.Mitchell.Density-clusterNMA:Anewproteindecompositiontechniqueforcoarse-grainednormalmodeanalysis.Proteins:StructureFunctionandBioinformatics,80(7):1766{1779,JUL2012.[19]ChrisHQDingand
Inna Dubchak. Multi-class protein fold recognition using support vector machines and neural networks. Bioinformatics, 17(4):349–358, 2001.
[20] Susan Dumais, John Platt, David Heckerman, and Mehran Sahami. Inductive learning algorithms and representations for text categorization. In Proceedings of the seventh international conference on Information and knowledge management, pages 148–155. ACM, 1998.
[21] Ugur Emekli, Dina Schneidman-Duhovny, Haim Wolfson, Ruth Nussinov, and Turkan Haliloglu. HingeProt: automated prediction of hinges in protein structures. Proteins, 70(4):1219–1227, 2008.
[22] Eran Eyal, Chakra Chennubhotla, Lee-Wei Yang, and Ivet Bahar. Anisotropic fluctuations of amino acids in protein structures: insights from x-ray crystallography and elastic network models. Bioinformatics, 23(13):i175–i184, 2007.
[23] Michael Feig and Zachary F Burton. RNA polymerase II flexibility during translocation from normal mode analysis. Proteins: Structure, Function, and Bioinformatics, 78(2):434–446, 2010.
[24] Michael Feig and Zachary F Burton. RNA polymerase II with open and closed trigger loops: active site dynamics and nucleic acid translocation. Biophysical Journal, 99(8):2577–2586, 2010.
[25] Samuel Flores and Mark Gerstein. FlexOracle: predicting flexible hinges by identification of stable domains. BMC Bioinformatics, 8(1), 2007.
[26] Samuel Flores, Long Lu, Julie Yang, Nicholas Carriero, and Mark Gerstein. Hinge atlas: relating protein sequence to sites of structural flexibility. BMC Bioinformatics, 8, 2007.
[27] P. J. Flory. Statistical thermodynamics of random networks. Proc. Roy. Soc. Lond. A, 351:351–378, 1976.
[28] Terrence S Furey, Nello Cristianini, Nigel Duffy, David W Bednarski, Michel Schummer, and David Haussler. Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics, 16(10):906–914, 2000.
[29] N. Go, T. Noguti, and T. Nishikawa. Dynamics of a small globular protein in terms of low-frequency vibrational modes. Proc. Natl. Acad. Sci., 80:3696–3700, 1983.
[30] Isabelle Guyon, Jason Weston, Stephen Barnhill, and Vladimir Vapnik. Gene selection for cancer classification using support vector machines. Machine Learning, 46(1-3):389–422, 2002.
[31] K. Hinsen. Analysis of domain motions by approximate normal mode calculations. Proteins, 33:417–429, 1998.
[32] K. Hinsen. Structural flexibility in proteins: impact of the crystal environment. Bioinformatics, 24:521–528, 2008.
[33] Sujun Hua and Zhirong Sun. A novel method of protein secondary structure prediction with high segment overlap measure: support vector machine approach. Journal of Molecular Biology, 308(2):397–407, 2001.
[34] W. Humphrey, A. Dalke, and K. Schulten. VMD – visual molecular dynamics. Journal of Molecular Graphics, 14(1):33–38, 1996.
[35] Lakshminarayan M Iyer, Eugene V Koonin, and L Aravind. Evolutionary connection between the catalytic subunits of DNA-dependent RNA polymerases and eukaryotic RNA-dependent RNA polymerases and the origin of RNA polymerases. BMC Structural Biology, 3(1):1, 2003.
[36] D. J. Jacobs, A. J. Rader, L. A. Kuhn, and M. F. Thorpe. Protein flexibility predictions using graph theory. Proteins: Structure, Function, and Genetics, 44(2):150–165, Aug 1 2001.
[37] Thorsten Joachims. Text categorization with support vector machines: Learning with many relevant features. Machine Learning: ECML-98, Lecture Notes in Computer Science, pages 137–142, 1998.
[38] Thorsten Joachims. Text categorization with support vector machines: Learning with many relevant features. Springer, 1998.
[39] Kevin S Keating, Samuel C Flores, Mark B Gerstein, and Leslie A Kuhn. StoneHinge: hinge prediction by network analysis of individual protein structures. Protein Science, 18(2):359–371, 2009.
[40] D. A. Kondrashov, A. W. Van Wynsberghe, R. M. Bannen, Q. Cui, and G. N. Phillips, Jr. Protein structural variation in computational models and crystallographic data. Structure, 15:169–177, 2007.
[41] S. Kundu, J. S. Melton, D. C. Sorensen, and G. N. Phillips, Jr. Dynamics of proteins in crystals: comparison of experiment with simple models. Biophys. J., 83:723–732, 2002.
[42] S. Kundu, D. C. Sorensen, and G. N. Phillips, Jr. Automatic domain decomposition of proteins by a Gaussian network model. Proteins: Structure, Function, and Bioinformatics, 57(4):725–733, 2004.
[43] Christina S Leslie, Eleazar Eskin, and William Noble. The spectrum kernel: A string kernel for SVM protein classification. In Pacific Symposium on Biocomputing, volume 7, pages 566–575, 2002.
[44] M. Levitt, C. Sander, and P. S. Stern. Protein normal-mode dynamics: Trypsin inhibitor, crambin, ribonuclease and lysozyme. J. Mol. Biol., 181(3):423–447, 1985.
[45] H. Y. Li, Z. X. Cao, L. L. Zhao, and J. H. Wang. Analysis of conformational motions and residue fluctuations for Escherichia coli ribose-binding protein revealed with elastic network models. International Journal of Molecular Sciences, 14(5):10552–10569, 2013.
[46] J. P. Ma. Usefulness and limitations of normal mode analysis in modeling dynamics of biomolecular complexes. Structure, 13:373–380, 2005.
[47] Mikhail V Matz, Arkady F Fradkov, Yulii A Labas, Aleksandr P Savitsky, Andrey G Zaraisky, Mikhail L Markelov, and Sergey A Lukyanov. Fluorescent proteins from nonbioluminescent Anthozoa species. Nature Biotechnology, 17(10):969–973, 1999.
[48] Sayan Mukherjee, P Tamayo, D Slonim, A Verri, T Golub, J Mesirov, and T Poggio. Support vector machine classification of microarray data. 1999.
[49] K. Opron, K. L. Xia, and G. W. Wei. Fast and anisotropic flexibility-rigidity index for protein flexibility and fluctuation analysis. Journal of Chemical Physics, 140:234105, 2014.
[50] Kristopher Opron, K. L. Xia, and G. W. Wei. Communication: Capturing protein multiscale thermal fluctuations. Journal of Chemical Physics, 142(211101), 2015.
[51] Xiao-Yong Pan and Hong-Bin Shen. Robust Prediction of B-Factor Profile from Sequence Using Two-Stage SVR Based on Random Forest Feature Selection. Protein and Peptide Letters, 16(12):1447–1454, 2009.
[52] J. K. Park, Robert Jernigan, and Zhijun Wu. Coarse grained normal mode analysis vs. Gaussian network model for protein residue-level structural fluctuations. Bulletin of Mathematical Biology, 75:124–160, 2013.
[53] Paul Pavlidis, Jason Weston, Jinsong Cai, and William Noble Grundy. Gene functional classification from heterogeneous data. Proceedings of the fifth annual international conference on Computational biology (RECOMB '01), 2001.
[54] A. J. Rader, C. Chennubhotla, L. W. Yang, I. Bahar, and Q. Cui. The Gaussian network model: Theory and applications. Normal mode analysis: Theory and applications to biological and chemical systems, 9:41–64, 2006.
[55] P. Radivojac, Z. Obradovic, D. K. Smith, G. Zhu, S. Vucetic, C. J. Brown, J. D. Lawson, and A. K. Dunker. Protein flexibility and intrinsic disorder. Protein Sci., 13:71–80, 2004.
[56] R. J. Renka. Multivariate interpolation of large sets of scattered data. ACM Transactions on Mathematical Software, 14(2):139–148, Jun 1988.
[57] Wouter H Roos, Melissa M Gibbons, Anton Arkhipov, Charlotte Uetrecht, N R Watts, P T Wingfield, Alasdair C Steven, Albert J R Heck, Klaus Schulten, William S Klug, and Gijs J L Wuite. Squeezing protein shells: How continuum elastic models, molecular dynamics simulations, and experiments coalesce at the nanoscale. Biophysical Journal, 99:1175–1181, 2010.
[58] David Sept and Fred C. MacKintosh. Microtubule Elasticity: Connecting All-Atom Simulations with Continuum Mechanics. Physical Review Letters, 104(1), Jan 8 2010.
[59] Maxim Shatsky, Ruth Nussinov, and Haim J Wolfson. FlexProt: alignment of flexible protein structures without a predefinition of hinge regions. Journal of Computational Biology, 11(1):83–106, 2004.
[60] Osamu Shimomura, Frank H Johnson, and Yo Saiga. Extraction, purification and properties of aequorin, a bioluminescent protein from the luminous hydromedusan, Aequorea. Journal of Cellular and Comparative Physiology, 59(3):223–239, 1962.
[61] L. Skjaerven, S. M. Hollup, and N. Reuter. Normal mode analysis for proteins. Journal of Molecular Structure: Theochem, 898:42–48, 2009.
[62] G. Song and R. L. Jernigan. vGNM: a better model for understanding the dynamics of proteins in crystals. J. Mol. Biol., 369(3):880–893, 2007.
[63] F. Tama and Y. H. Sanejouand. Conformational change of proteins arising from normal mode calculations. Protein Eng., 14:1–6, 2001.
[64] M. Tasumi, H. Takeuchi, S. Ataka, A. M. Dwivedi, and S. Krimm. Normal vibrations of proteins: Glucagon. Biopolymers, 21:711–714, 1982.
[65] William I. Thacker, Jingwei Zhang, Layne T. Watson, Jeffrey B. Birch, Manjula A. Iyer, and Michael W. Berry. Algorithm 905: SHEPPACK: Modified Shepard Algorithm for Interpolation of Scattered Multivariate Data. ACM Transactions on Mathematical Software, 37(3), Sep 2010.
[66] M. M. Tirion. Large amplitude elastic motions in proteins from a single-parameter, atomic analysis. Phys. Rev. Lett., 77:1905–1908, 1996.
[67] V. Uversky and A. K. Dunker. Controlled chaos. Science, 322:1340–1341, 2008.
[68] A. Uyar, N. Kantarci-Carsibasi, T. Haliloglu, and P. Doruker. Features of large hinge-bending conformational transitions. Prediction of closed structure from open state. Biophysical Journal, 106(12):2656–2666, 2014.
[69] Adam Van Wynsberghe, Guohui Li, and Qiang Cui. Normal-mode analysis suggests protein flexibility modulation throughout RNA polymerase's functional cycle. Biochemistry, 43(41):13083–13096, 2004.
[70] E Villa, A Balaeff, L Mahadevan, and K Schulten. Multiscale method for simulating protein-DNA complexes. Multiscale Modeling & Simulation, 2(4):527–553, 2004.
[71] C. W. von der Lieth, K. Stumpf-Nothof, and U. Prior. A bond flexibility index derived from the constitution of molecules. Journal of Chemical Information and Computer Science, 36:711–716, 1996.
[72] Dong Wang, David A Bushnell, Kenneth D Westover, Craig D Kaplan, and Roger D Kornberg. Structural basis of transcription: role of the trigger loop in substrate specificity and catalysis. Cell, 127(5):941–954, 2006.
[73] G. W. Wei. Wavelets generated by using discrete singular convolution kernels. Journal of Physics A: Mathematical and General, 33:8577–8596, 2000.
[74] K. L. Xia, X. Feng, Y. Y. Tong, and G. W. Wei. Persistent homology for the quantitative prediction of fullerene stability. Journal of Computational Chemistry, 36:408–422, 2015.
[75] K. L. Xia, K. Opron, and G. W. Wei. Multiscale multiphysics and multidomain models – Flexibility and rigidity. Journal of Chemical Physics, 139:194109, 2013.
[76] K. L. Xia and G. W. Wei. A stochastic model for protein flexibility analysis. Physical Review E, 88:062709, 2013.
[77] L. W. Yang and C. P. Chng. Coarse-grained models reveal functional dynamics – I. Elastic network models – theories, comparisons and perspectives. Bioinformatics and Biology Insights, 2:25–45, 2008.
[78] Lee-Wei Yang, A Rader, Xiong Liu, Cristopher Jursa, Shann Chen, Hassan Karimi, and Ivet Bahar. oGNM: online computation of structural dynamics using the Gaussian network model. Nucleic Acids Research, 34(Web Server issue):W24–W31, 2006.
[79] Lei Yang, Guang Song, and Robert L. Jernigan. Protein elastic network models and the ranges of cooperativity. Proceedings of the National Academy of Sciences of the United States of America, 106(30):12347–12352, Jul 28 2009.
[80] Chen-Hsiang Yeang, Sridhar Ramaswamy, Pablo Tamayo, Sayan Mukherjee, Ryan M Rifkin, Michael Angelo, Michael Reich, Eric Lander, Jill Mesirov, and Todd Golub. Molecular classification of multiple tumor types. Bioinformatics, 17(suppl 1):S316–S322, 2001.
[81] Z Yuan, T L Bailey, and R D Teasdale. Prediction of protein B-factor profiles. Proteins: Structure, Function, and Bioinformatics, 58(4):905–912, Mar 1 2005.
[82] W. Zheng, B. R. Brooks, and D. Thirumalai. Allosteric transitions in the chaperonin GroEL are captured by a dominant normal mode that is most robust to sequence variations. Biophys. J., 93:2289–2299, 2007.
[83] Xiaolei Zhu and Julie C Mitchell. KFC2: A knowledge-based hot spot prediction method based on interface solvation, atomic density, and plasticity features. Proteins: Structure, Function, and Bioinformatics, 79(9):2671–2683, 2011.